I believe I’m not the only one who once thought that data quality and data value mean the same thing. In data management, we take many actions to improve the quality of data:

  • We apply different data cleaning strategies;
  • We predict missing values;
  • We impose a structure on the data;
  • etc.
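
As a rough illustration of the first two actions, here is a minimal Python sketch. It assumes a toy list of records and uses two hypothetical helpers, `clean_records` and `completeness`, that are not taken from any library. Cleaning is reduced to name normalisation and de-duplication, and "predicting" a missing value is reduced to filling in the mean, which is only one of many possible strategies. Completeness is likewise just one simple proxy for quality.

```python
from statistics import mean


def clean_records(records):
    """Normalise names, drop duplicate names, and fill missing ages with the mean."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()  # " alice " becomes "Alice"
        if name in seen:  # keep only the first record per name
            continue
        seen.add(name)
        cleaned.append({"name": name, "age": rec.get("age")})

    # Mean imputation: a deliberately simple stand-in for value prediction.
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill_value = mean(known_ages) if known_ages else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill_value
    return cleaned


def completeness(records):
    """Fraction of records with no missing field (a simple quality proxy)."""
    if not records:
        return 0.0
    full = sum(1 for r in records if all(v is not None for v in r.values()))
    return full / len(records)


# Hypothetical toy data: one duplicate and one missing age.
raw = [
    {"name": " alice ", "age": 30},
    {"name": "Alice", "age": 30},
    {"name": "bob", "age": None},
    {"name": "Carol", "age": 40},
]
cleaned = clean_records(raw)
print(completeness(raw), completeness(cleaned))
```

The completeness score rises after cleaning, but nothing in the sketch measures whether anyone actually needs these records.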

At the end of the day, we end up with higher-quality data. But does that necessarily guarantee higher value as well? In a recent SIGMOD blog post, Julia Stoyanovich (Drexel University) and Fabian M. Suchanek (Télécom ParisTech University) interview several well-known database researchers. In one part of the post, Divesh Srivastava (AT&T Labs) and Xin Luna Dong (Google) describe the difference between data value and data quality:

It is worth noting that while value and quality of big data may be correlated, they are conceptually different. For example, one can have high quality data about the names of all the countries in North America, but this list of names may not have much perceived value. In contrast, even relatively incomplete data about the shopping habits of people can be quite valuable to online advertisers.

Big Data is usually characterised by four V’s: Volume, Variety, Veracity and Velocity. More recently, researchers have started talking about a fifth V, Value, which evidently makes sense.

Photo from “Knowledgenet” website, a data analytics firm.