What Do We Mean When We Say Quality of Data?

June 20, 2016

Do marketers want quality data? Without hesitation, marketers will say yes and then quickly change the subject to the kind of audiences they want to reach and what they want to pay. But what they may not realize is that their ability to use data to reach those audiences hinges on what we mean when we say quality. So what do we really mean when we refer to the quality of the data? As it turns out, quality is an incredibly subjective word, because we all use it differently.


Accuracy is the most common thing marketers mean when they use the word quality. Simply put, marketers are asking whether the information contained inside the data files is correct. If the cookies say males 18-34, are you actually reaching men between those ages? Not surprisingly, this speaks to one of the longest-running issues in data-driven marketing. Quality inputs return quality results, so it’s imperative that the data marketers use is accurate, but accuracy is only the minimum threshold when it comes to quality.

Fraud & Viewability

Right now, the two biggest issues facing marketers and publishers alike are fraud and viewability. These are serious challenges. Unfortunately, fraud and viewability are sometimes mislabeled as quality issues. The problem with defining fraud and viewability as quality issues is that it makes it sound as if the problem can be solved by better data. In fact, quality data (cookie files that are accurate because they contain what they say they do) can still run into fraud and viewability issues because those are problems that revolve around better business practices and enhanced industry policing.

100% Matching Expectation

There’s a common misperception among marketers that digital media buys should yield a 100 percent match rate, and that anything that falls short of that goal is likely the result of poor quality data. The reality is more complicated.

When an advertiser buys time on a television show like The Big Bang Theory, they accept that the demo they want will represent only a fraction of the total audience. Online, there’s a lot less waste, but advertisers still face a different kind of matching challenge. Shared computers, bots, and deleted cookies combine to make a 100 percent match rate impossible. What that means is that when an advertiser buys 10 million impressions in a specific demo, they should expect to match at a rate that is slightly less than 100 percent, but significantly more than what they’re likely to find on television.
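To make the arithmetic concrete, here is a minimal sketch of how a match rate might be computed for that 10 million impression buy. The loss figures for shared computers, bots, and deleted cookies are hypothetical numbers chosen purely for illustration, not measured values, and the function name is an assumption.

```python
def match_rate(matched: int, purchased: int) -> float:
    """Share of purchased impressions that reached the intended demo."""
    if purchased <= 0:
        raise ValueError("purchased must be positive")
    return matched / purchased


# Hypothetical figures for illustration only; real losses vary by campaign.
purchased = 10_000_000
lost_to = {
    "shared_computers": 400_000,
    "bots": 300_000,
    "deleted_cookies": 300_000,
}

# Impressions that still reached the intended demo after the losses above.
matched = purchased - sum(lost_to.values())
rate = match_rate(matched, purchased)
print(f"Match rate: {rate:.1%}")
```

The point of the sketch is only that each source of loss subtracts from the total, so the rate lands below 100 percent even when the underlying data is accurate.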

So why do advertisers wrongly assume that a match rate below 100 percent is a quality problem? The misunderstanding is often predicated on the fact that the sheer number of vendors at play insulates the advertiser from the mechanics of the online media buy.

In fact, many advertisers fail to dig much deeper than an agency’s promise to deliver X impressions in a demo at Y price. In practice, digital media is an incredibly complex and granular ecosystem that rewards marketers who drill down into the details of each campaign. The quality of the data can improve match rates, but to expect a 100 percent match rate is to misunderstand the fundamentals of online media.

Probabilistic vs. Deterministic

Broadly speaking, there are two kinds of data, and understanding the difference goes a long way to understanding what we mean by quality data.

Typically, marketers regard what’s known as deterministic data as being of the highest quality because it targets a known person. By way of example, I did an online search for a hot tub last year. When I saw ads on Facebook for the same model I had searched for, I knew that those ads were based on deterministic data matching. Both Facebook and Google, in different ways, are deeply personal, and therefore the data they have is tied to a specific person. For the hot tub advertiser, those ads relied on deterministic data: I had expressed a clear interest, and so the advertiser was able to reach me.

Unfortunately, that hot tub advertiser can’t rely exclusively on deterministic data because there isn’t enough of it to sustain a campaign. To compensate, the advertiser must go out into the wild and use probabilistic data to look for people who are probably interested in a hot tub. Instead of targeting a specific person who wants a hot tub, the advertiser uses as many data points as possible to triangulate on a prospect. Probabilistic and deterministic data matching represent different methodologies. Both are necessary to achieve scale, but weighing the relative quality of deterministic matching against probabilistic matching is akin to comparing apples to oranges.
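The difference between the two methodologies can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular platform works: the `Profile` fields, the signal names, the weights, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Profile:
    user_id: Optional[str]  # known identity (e.g. a login), if any
    searched_products: Set[str] = field(default_factory=set)
    signals: Set[str] = field(default_factory=set)


def deterministic_match(profile: Profile, product: str) -> bool:
    """A known person who explicitly expressed interest in the product."""
    return profile.user_id is not None and product in profile.searched_products


# Hypothetical signal weights for a hot tub campaign (illustration only).
HOT_TUB_WEIGHTS: Dict[str, float] = {
    "homeowner": 0.4,
    "visited_pool_supply_site": 0.3,
    "read_backyard_article": 0.2,
}


def probabilistic_score(profile: Profile, weights: Dict[str, float]) -> float:
    """Sum the weights of the signals present on the profile, capped at 1.0."""
    return min(1.0, sum(w for s, w in weights.items() if s in profile.signals))


def probabilistic_match(profile: Profile, weights: Dict[str, float],
                        threshold: float = 0.5) -> bool:
    """Treat the profile as a prospect if enough signals line up."""
    return probabilistic_score(profile, weights) >= threshold


known = Profile(user_id="u123", searched_products={"hot tub"})
likely = Profile(user_id=None, signals={"homeowner", "visited_pool_supply_site"})
unlikely = Profile(user_id=None, signals={"read_backyard_article"})

print(deterministic_match(known, "hot tub"))           # known person, explicit interest
print(probabilistic_match(likely, HOT_TUB_WEIGHTS))    # anonymous, several signals
print(probabilistic_match(unlikely, HOT_TUB_WEIGHTS))  # anonymous, one weak signal
```

The deterministic check returns a yes or no about a specific person, while the probabilistic check returns a judgment about likelihood, which is why the two are hard to compare on a single quality scale.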

OK, so what does quality really mean?

Marketers will likely never settle on a single definition of quality. That’s OK. What’s important is that marketers achieve a sophisticated understanding of the term in all its uses. The better marketers grasp this topic, the more they’ll be able to leverage the strengths of their data partners. Just as important, marketers who speak the language of data fluently will run campaigns that not only reach the most precise audience possible, but also move that audience to action.

Written by Jason Downie, GM Data Solutions