Everything you need to know about data quality

Learn to identify, manage, and nurture high-quality data with these industry best practices.
26 April 2023
Alain Briancon

VP Data Science


In the world of market research, data is king. Data allows businesses to connect with consumers, provide valuable products and services, and rise above the competition. But, alas, not all data is helpful. Inaccurate, incomplete, inconsistent, and otherwise skewed data can muddy the waters, making the path to intelligent, informed business decisions unclear and risky. That’s why market researchers must have a solid grasp of data quality and demand it from their partners.

Data quality measures how well a dataset (or an individual data point) fulfills an intended purpose; different purposes require different levels of quality. Objectives can be as varied as gauging brand awareness, mapping seasonality to trigger sales campaigns, and understanding consumer purchasing behavior across demographics. Does this mean that data quality is subjective? Not entirely: quality is both subjective and objective. On the objective side, there are critical measures of good and bad data.

Let us delve deeper into the concept of data quality, its metrics, and the means of improving it.

What is data quality?

Data quality measures how well qualitative or quantitative information serves an intended purpose. Put differently, data is deemed high-quality if it accurately represents real-world constructs.

For instance, imagine a company attempting to assess brand awareness. Data quality is high if the data yielded from a survey precisely evaluates consumer sentiments, opinions, and behavior. On the contrary, data quality is compromised if the questionnaire delivers data that paints a grossly skewed picture.

Ergo, data quality is closely aligned with trustworthiness and reliability. When data quality is high, market researchers feel confident using the information to make critical business moves. Conversely, when data quality is low, market researchers may feel trepidatious about using the information as a springboard for company decisions like boosting production or increasing sales prices.

Why is data quality important?

For decades, companies have relied on intuition to make critical decisions. Years of experience built a consensus view of what matters and of the ins and outs of markets and technology. But alas, gut feelings aren’t always warranted, especially when market or technology disruptions are present. Hoping to eliminate the fickleness of human emotion and bias, many contemporary corporations have adopted a data-driven decision-making model.

The value proposition is clear: Rather than making choices on a whim or deferring to the loudest person in the room, businesses extract insights from quantitative and qualitative information to ensure the best decision is taken.

Consider this example: C-suite executives of a software development company hope to unveil enhancements to their accounting tool for small businesses. Some existing clients are on version X; others are on version Y.

Before investing time and resources in product development and determining possible upgrade prices, the company’s market research team conducts a survey to assess existing demand for the software and price elasticity. Here, capturing each respondent’s current version is more important than the tenure of the account. Since information from this questionnaire will determine the company’s next move (that is, whether or not executives give engineering the OK to proceed), data quality is of the utmost importance.

Trusted information can help companies:

  • Extract greater value from market research efforts
  • Reduce risks and costs associated with production
  • Improve tradeoffs between options
  • Target consumers better
  • Develop more effective marketing campaigns
  • Improve customer relations

There is no doubt that proper data management gives businesses a competitive edge. By better understanding consumer opinions and behaviors, companies can make efficient and effective decisions that outperform rivals.

Consequences of poor data quality

Accurate data allows a company to flourish. The opposite is also true: compromised data quality can quickly tank a business. Low-quality data can result in the following:

  • Reduced efficiency: When market researchers base decisions on flawed data, they risk wasting two essential resources: time and money. They may, for example, release a product for which there is no demand. Or, they may launch a marketing campaign that doesn’t resonate with the target consumer.
  • Missed opportunities: When data quality is compromised, companies miss revenue-generating opportunities. For example, executives may fail to realize there is, in fact, a need for a particular product or service. Or, they may attribute brand awareness to social media outreach when, in reality, out-of-home advertising is the contributing source of conversions. As a result, they may invest marketing dollars in the less effective media vehicle.
  • Strained customer relations: Market research’s primary objective is to understand your target consumer better. Sadly, that understanding loses precision when data is biased or skewed by outliers. In turn, companies might be perceived as turning a blind eye to the market, appear disconnected, or even be dismissed as arrogant.

Aspects of data quality

High-quality data can move businesses forward. But how, exactly, is data quality assessed? How can you determine if data should be used to make critical business decisions or abandoned altogether?

As a general rule, there are seven aspects of data quality. These dimensions can allow you to determine the trustworthiness of a particular dataset.

Fidelity or accuracy

Data fidelity refers to the degree to which data represents reality. In other words, fidelity measures whether or not the information collected is correct.

As with most things about data, fidelity can be compromised by human error. For example, a survey respondent may accidentally mistype their zip code or select the wrong entry from a pull-down menu when completing a questionnaire. Though an honest mistake, this foible can compromise data quality if you assess purchasing behavior by location or brand preference. This has to be distinguished from dishonest survey takers who may purposefully lie about demographic information to qualify for monetary rewards—the latter an example of survey fraud, which must be tackled proactively.

Other factors that influence data fidelity include:

  • Data Decay: Fidelity may be high initially but degrade over time. For example, a survey respondent’s income or number of dependents living in the same home may change.
  • Manual Entry: As previously noted, a survey participant may mistype a value. Similarly, a market researcher may transpose numbers or letters during data entry or analysis. Such errors can be incredibly impactful.
  • Data Movement and Integration: Data can also be altered inadvertently when it is moved from one system to another whose formatting differs. Is 4/6/23 the 6th of April or the 4th of June? You had better be sure (see the sketch below).
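
To see how format ambiguity corrupts dates in transit, here is a minimal pandas sketch (the values and column are hypothetical): the same strings parse to different dates depending on the convention assumed, so an explicit format should always travel with the data.

```python
import pandas as pd

# Hypothetical date strings exported by another system.
raw = pd.Series(["4/6/23", "12/1/23"])

# Month-first (US) reading: "4/6/23" -> 6 April 2023
us_read = pd.to_datetime(raw, format="%m/%d/%y")

# Day-first (UK) reading: "4/6/23" -> 4 June 2023
uk_read = pd.to_datetime(raw, format="%d/%m/%y")

print(us_read.tolist())  # [2023-04-06, 2023-12-01]
print(uk_read.tolist())  # [2023-06-04, 2023-01-12]
```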

Completeness

Completeness measures whether each data entry is “full,” i.e., free of missing values (NaNs). In other words, this metric seeks to determine whether there are missing records, fields, rows, or columns.

Generally speaking, there are two types of missing records:

  • Unit Nonresponse: This occurs when a member of the survey sample fails to complete the questionnaire.
  • Item Nonresponse: Item nonresponse occurs when a survey participant fails to answer one or more survey questions.

Both phenomena can affect the quality of your survey results, potentially leaving insufficient data to draw meaningful insights (for example, via cross-tab analysis).

It is important to note that the completeness needed for a project is subjective; it depends on the purpose of the study. It is up to the market researchers (working with their survey partner) to determine the acceptable response level. Data science methods can assess whether the missing data follows specific patterns (a minimal check is sketched below). It is also up to market researchers to distinguish critical data (information integral to the study) from non-critical data.
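
A minimal completeness check, assuming a pandas DataFrame with hypothetical survey columns, might look like this:

```python
import pandas as pd

# Hypothetical survey export; None marks item nonresponse.
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "age":           [34, 51, None, 29],
    "income":        [72000, None, None, 41000],
    "brand_pref":    ["A", "B", "A", None],
})

# Share of missing values per column (item nonresponse).
print(df.isna().mean())

# Respondents who skipped a critical field such as income.
print(df.loc[df["income"].isna(), "respondent_id"].tolist())

# Crude pattern check: does income nonresponse vary with age?
print(df.groupby(df["income"].isna())["age"].mean())
```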

Consistency

Consistency is the degree to which a survey would yield similar results if conducted again under the same conditions; this relates to the statistical concepts of reliability and confidence levels. In other words, it is an assessment of whether the questionnaire dependably measures what you aim to measure as a market researcher.

Consistency may also refer to whether specific data points gathered through your questionnaire are congruent with those gathered elsewhere. For example, a respondent may report a particular income during a pre-screening survey, then designate a dramatically lower income during the actual study.

Sometimes market researchers intentionally ask the same question twice (or in slightly different forms) to check for these conflicting responses; a simple cross-check is sketched below. This survey quality check should be used sparingly, however, as redundancy risks triggering survey dropout.
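
As a sketch of such a cross-check (the columns, tolerance, and data are hypothetical), one can flag respondents whose repeated answers diverge beyond a chosen threshold:

```python
import pandas as pd

# Hypothetical incomes reported at pre-screen vs. in the study.
df = pd.DataFrame({
    "respondent_id":    [1, 2, 3],
    "income_prescreen": [45000, 80000, 60000],
    "income_study":     [47000, 30000, 61000],
})

# Flag respondents whose two answers diverge by more than 20%.
ratio = (df["income_study"] - df["income_prescreen"]).abs() / df["income_prescreen"]
df["inconsistent"] = ratio > 0.20

print(df.loc[df["inconsistent"], "respondent_id"].tolist())  # [2]
```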

Timeliness

Timeliness refers to the relevance of the data. In other words, how recently was the data collected?

As a general rule, companies should make decisions using the most up-to-date information possible. Otherwise, stale data could result in erroneous decision-making. Case in point: Suppose a company conducted a survey to assess consumer buying behavior before the pandemic. Since COVID-19 shifted purchasing habits, this data is no longer accurate. Hence, a further study should be conducted to evaluate the target audience better.

Validity

Valid data refers to data correctly formatted per predetermined standards set by market researchers.

For example, a survey may ask that respondents provide their birthdays in the British format (day, month, year). Responses provided in the American format (month, day, year) would be considered invalid. Telephone numbers are another example: a survey may ask for the respondent’s phone number using only digits, with no symbols, so any response submitted with symbols would not be valid. A minimal validation sketch follows.
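
This sketch checks responses against two hypothetical format rules (the exact standards would be set by the research team):

```python
import re

# Hypothetical format rules: birthday as day/month/year (zero-padded),
# phone number as digits only, no symbols.
BIRTHDAY_RE = re.compile(r"^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$")
PHONE_RE = re.compile(r"^\d{7,15}$")

def is_valid(birthday: str, phone: str) -> bool:
    return bool(BIRTHDAY_RE.match(birthday)) and bool(PHONE_RE.match(phone))

print(is_valid("24/04/1987", "4155550123"))    # True
print(is_valid("04/24/1987", "415-555-0123"))  # False: month-first and symbols
```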

Uniqueness

Unique data only appears once in a dataset. In other words, there are no duplicates.

Unfortunately, data duplication is a common occurrence. In addition, dishonest and fraudulent survey takers may intentionally disguise their identity to collect rewards multiple times. The risk, of course, is that these respondents offer no genuine insight into your target audience. Worse yet, disingenuous survey takers are often incredibly sophisticated and challenging to detect. That’s why anti-fraud software is a must; a simple de-duplication pass, sketched below, is only a first line of defense.
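
A basic de-duplication pass, assuming responses keyed by a hypothetical respondent_id, might look like this (note that sophisticated fraudsters will not share an obvious key, which is where dedicated anti-fraud tooling comes in):

```python
import pandas as pd

# Hypothetical responses keyed by respondent_id.
df = pd.DataFrame({
    "respondent_id": [101, 102, 102, 103],
    "brand_pref":    ["A", "B", "B", "C"],
})

# Surface duplicate respondents before analysis.
print(df[df.duplicated(subset="respondent_id", keep=False)])

# Keep only the first submission from each respondent.
df = df.drop_duplicates(subset="respondent_id", keep="first")
```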

Integrity

Data integrity refers to the fidelity and completeness of data as it’s maintained over time and across formats. Unfortunately, there are various threats to data integrity, from human error (e.g., a market researcher accidentally deleting a row in Excel) to data decay. Hence, maintaining data integrity is a continual, ongoing process that requires a meticulous approach.

How to improve your data quality

High-quality data offers a window into the consumer psyche, allowing market researchers to understand better what motivates the target market. But alas, compromised data can sour results, wasting countless company dollars.

Luckily, there are several ways you can improve data quality.

1. Know your niche audience

When market researchers conduct panel surveys, they hope to gain insight into how potential customers think, feel, and act. However, market researchers must first determine their niche audience for these insights to be valuable.

A niche audience, or target audience, is a group of people who are most likely to purchase a product or service. These individuals often share demographic traits like age, gender, location, education, and socioeconomic status.

It’s essential to have a clear idea of your target audience before conducting a survey. Why? Because surveying these specific types of individuals increases the fidelity of your data. If, for example, your company mainly sells products to middle-aged women, the dataset will be more accurate if you survey middle-aged women.

To help with this, Kantar offers an extensive research panel of more than 170 million people. As one of the biggest and best sources of global survey takers, we can easily connect you with your target niche, allowing your business to collect more accurate and representative data.

2. Engage your survey respondents

As a market researcher, boredom is your arch-nemesis. It triggers panellists to speed through questions, straight-line, fill open-ended fields with gibberish, and abandon questionnaires altogether. Unfortunately, these actions can spoil the quality of your data, leaving you with a dataset that is neither accurate nor complete.

Kantar has developed an entire library of online survey training modules to support the collection of trustworthy data. Created with award-winning online survey design knowledge and best practices to improve survey effectiveness, these online classes will teach you how to craft surveys that keep respondents happy and engaged rather than listless and weary. In return, you can expect higher-quality responses.

3. Reduce fraud

Kantar found in Q4 2022 that companies discard up to 38% of the data they collect because of quality concerns and panel fraud. Fortunately, market researchers can combat lazy and dishonest panellists through effective survey design. You can, for example, remove superfluous questions to keep the survey length under 10 minutes. Or, you can use iconography to keep survey takers engaged.

Despite these efforts, fraudulent panellists will continue to be an issue as long as there is a monetary reward. Often located overseas, these scammers are highly sophisticated and understand how to disguise their IP address, device type, and other red flags that would give away their identity. Their goal is to extract as much money as quickly as possible; hence, they can be pretty aggressive in their methods.

4. Implement data validation processes

Data validation involves checking the accuracy, completeness, and integrity of data during entry or migration. Automated validation rules and manual verification can help identify and rectify errors promptly; a rule-based sketch follows.
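
One way to automate such rules, sketched here with hypothetical checks and field names, is to run every incoming record through a list of named predicates and report each violation:

```python
# Hypothetical rule-based checks run as each record is entered or migrated.
RULES = [
    ("age in plausible range", lambda r: 16 <= r.get("age", -1) <= 99),
    ("income is non-negative", lambda r: r.get("income", -1) >= 0),
    ("brand_pref is present",  lambda r: bool(r.get("brand_pref"))),
]

def validate(record: dict) -> list[str]:
    """Return the name of every rule the record violates."""
    return [name for name, check in RULES if not check(record)]

print(validate({"age": 34, "income": 72000, "brand_pref": "A"}))  # []
print(validate({"age": 7, "income": -5, "brand_pref": ""}))       # all three
```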

5. Conduct data cleansing and standardization

Data cleansing involves identifying and rectifying inaccuracies, inconsistencies, and duplications in datasets. Standardization ensures uniformity and consistency by defining guidelines and formats for data entry; the sketch below normalizes one free-text field.
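
As a small illustration (the field, variants, and canonical labels are hypothetical), standardizing a free-text country column might involve trimming, lowercasing, stripping punctuation, and mapping known variants onto one label:

```python
import pandas as pd

# Hypothetical free-text country field with inconsistent entries.
df = pd.DataFrame({"country": [" USA", "usa", "United States", "U.S.A.", "uk"]})

canonical = {"usa": "United States",
             "united states": "United States",
             "uk": "United Kingdom"}

cleaned = (df["country"].str.strip()
                        .str.lower()
                        .str.replace(".", "", regex=False))

# Map known variants to a canonical label; keep unmapped values as-is.
df["country_std"] = cleaned.map(canonical).fillna(df["country"])
print(df)
```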

6. Leverage technology and AI designed for improving data quality

AI-driven tools can automate data validation, cleansing, and standardization processes. Leveraging advanced analytics and machine learning algorithms can further enhance data quality efforts; one common pattern, anomaly detection over respondent behavior, is sketched below.
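
To make this concrete, here is an illustrative anomaly-detection sketch (not Kantar’s Qubed, and with entirely hypothetical behavioral features) using scikit-learn’s IsolationForest to flag respondents whose behavior deviates from the pack:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-respondent features: median seconds per question,
# straight-lining score (0-1), and open-ended response length.
X = np.array([
    [12.0, 0.10, 140],
    [11.0, 0.15, 120],
    [13.0, 0.05, 160],
    [ 1.5, 0.95,   3],  # suspiciously fast, straight-lined, terse
])

model = IsolationForest(contamination=0.25, random_state=0).fit(X)
print(model.predict(X))  # -1 marks likely anomalies, e.g. [ 1  1  1 -1]
```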

To thwart spammers, Kantar developed Qubed, a proprietary anti-fraud technology based on deep neural networks, the benchmark technique for AI-based classification. Qubed employs the latest artificial intelligence technology to detect fraud where humans or other standard measures cannot.

How Kantar’s Qubed improves data quality

Qubed works using a four-pronged approach:

  • Assessing Domain Knowledge: Qubed’s core AI is trained on five years of data collection and labelling, plus detailed knowledge of breached ISPs/IPs and fraudsters’ and bots’ points of access. Knowing the attack vectors allows Qubed to block most fraudsters before they even attempt a study.
  • Assessing Key Factors: Qubed analyses the full history of each user, looking at every data point of every event collected. It evaluates reconcile/acceptance rates, activity patterns, rates of starting/finishing studies versus those indicated and versus like-to-like users, open-ended response quality, demographic sensibility/consistency, device/browser fingerprints, and much more.
  • Machine Learning: Qubed continuously improves through real-time machine learning. That means it evolves and learns new patterns automatically as scammers develop new ploys, placing you under constant, vigilant protection.
  • Identifying Types of Fraudulence: Not all red flags are triggered by actual scammers. Straight-lining, for instance, could result from respondent fatigue rather than fraudulence. With this in mind, Kantar designed Qubed to distinguish and categorize different sources of survey fraud, which are in turn dealt with by different measures.

Feel confident in your data-informed decisions with Kantar

Improving data quality should be a top priority if your company aims to boost revenue and foster brand awareness. Fortunately, Kantar's Profiles division has developed a science-backed quality data formula that affords market research partners highly accurate, valid, and trustworthy information.

Our formula encompasses three key elements:

  • An expansive research panel of 170M+ people. Quality data begins with a representative sample of survey respondents. To provide you with just that, Kantar offers the biggest and best source of human respondents.
  • Productive panellists. Even better, our survey respondents are satisfied and engaged. This results in a 23 percent higher survey completion rate than the industry average.
  • State-of-the-art fraud protection. To combat pesky bots and fraudulent survey takers, our R&D team has developed proprietary anti-fraud software that prevents four times more fraud than any other tool on the market.

If your business wants to make smart, data-informed decisions, the first step is to partner with Kantar. As an industry leader, we understand how to conduct market research that yields informative, helpful, and high-quality data.
