6

I'm looking for some kind of lightweight framework for assessing the quality of datasets, as a publisher. The most widely known one is Tim Berners-Lee's "5 star open data", but it's actually not very helpful. Virtually all of our datasets would get exactly 3 stars, as linked data is not a priority for us.

What else is there?

Stanislav Kralin
Steve Bennett

2 Answers

2

You can benchmark your data against the Open Data Census, run linting tools (CSV Lint, JSON Lint, etc.) over it, and publish your datasets as Data Packages.
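
As a rough illustration of what a lint-style check covers, here is a minimal Python sketch using only the standard library. It is not how CSV Lint itself works. The function name `check_csv`, the sample data, and the "share of non-empty cells" completeness measure are all assumptions made for this example. It reports one well-formedness problem (rows whose cell count differs from the header) and one simple quality measure.

```python
import csv
import io


def check_csv(text):
    """Return (ragged_rows, completeness) for CSV text that has a header row.

    Hypothetical helper for illustration only; not part of any linting tool.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], 0.0
    header, body = rows[0], rows[1:]
    width = len(header)
    # Well-formedness: 1-based row numbers (the header is row 1) whose
    # cell count differs from the header's.
    ragged = [i for i, row in enumerate(body, start=2) if len(row) != width]
    # Quality (assumed definition): share of non-empty cells among the
    # cells that the header promises.
    expected = width * len(body)
    filled = sum(1 for row in body for cell in row[:width] if cell.strip())
    completeness = filled / expected if expected else 0.0
    return ragged, completeness


# Made-up sample: row 3 has an empty cell, row 4 is one cell short.
sample = "id,name,city\n1,Ann,Oslo\n2,,Bergen\n3,Cy\n"
ragged_rows, completeness = check_csv(sample)
print(ragged_rows, round(completeness, 2))
```

A file can pass the first check and still score poorly on the second, which is why linting alone covers only part of what "quality" might mean.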

I'm curious as to why linked data is not a priority for you. Or rather, what your end goal(s) are.

albert
  • There are very few immediate benefits in linked data. So, compared to other goals like publishing more data, higher quality, increasing uptake, etc., linked data is a low priority. – Steve Bennett Nov 10 '16 at 02:25
  • Also, the Open Data Census doesn't address quality (aside from its one "is the data up to date" question). What "linting tools" would address data quality (as opposed to, say, well-formedness)? And I'm not sure I understand how creating data packages helps describe quality? – Steve Bennett Nov 10 '16 at 02:29
  • updated some linting links; data quality includes valid documents, so linting applies. malformed documents aren't machine-readable. i guess i'm confused by the entire question then. can you define quality in this context? – albert Nov 10 '16 at 03:45
  • also, certainly not trying to argue, but there are many benefits to linked data now. most specifically around search and discoverability/findability. really, i'm just confused. what are your goals if not to be linked? – albert Nov 10 '16 at 03:58
  • can you give any specific examples? You're saying if we put semantic web URLs of other entities in our datasets, more people will be able to find our datasets? I don't really understand your last question - our goals are to allow people to access our data, download it, do stuff with it. That certainly doesn't depend on being "linked". – Steve Bennett Nov 11 '16 at 02:39
  • linked open data makes your content more findable in search, not just in terms of SERPs, but in the content users see on SERP pages. lod isn't just semantic web urls; microformats are part of lod. they display on SERP pages. i'm not sure how to word this another way...it's like saying you don't care about SEO. you can always just publish data and do nothing/see what happens. none of what you want depends on being linked, but you'll get more if it is linked. like if you build with accessibility in mind, your content will go further, etc. – albert Nov 11 '16 at 13:12
  • all that said...it's entirely your prerogative to publish what you want/how you want. back to the quality question...then i don't know how to answer/respond/help without more clarity on what you mean by data quality. – albert Nov 11 '16 at 13:13
2

In December 2016, the Data on the Web Best Practices Working Group (Group Page, Group Charter) published the Data Quality Vocabulary. This vocabulary is a meta-framework for frameworks similar to the one you need.

In the document, these points are relevant to your question:

  • Feel free to create your own framework; in fact, you probably should create your own;
  • W3C, it seems, does not insist on the 5-star data quality scheme;
  • W3C does not even insist on ubiquitous RDF usage when publishing open data;
  • However, W3C does insist on RDF usage when measuring data quality
    (by the very fact of publishing this document in the form of an RDF vocabulary).

See also this answer for more details.
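
To make the last point concrete, here is a minimal Python sketch of what a single quality measurement might look like when expressed with the vocabulary. The DQV terms (`dqv:QualityMeasurement`, `dqv:computedOn`, `dqv:isMeasurementOf`, `dqv:value`) come from the published vocabulary; the helper name `measurement_turtle`, the `example.org` URIs, and the completeness metric itself are assumptions made up for this illustration, since the vocabulary leaves the choice of metrics to the publisher.

```python
DQV_PREFIXES = (
    "@prefix dqv: <http://www.w3.org/ns/dqv#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def measurement_turtle(dataset_uri, metric_uri, value):
    """Render one DQV quality measurement as Turtle text.

    Hypothetical helper: it only formats strings and does not validate
    the URIs or the resulting RDF.
    """
    return (
        DQV_PREFIXES
        + "\n"
        + "[] a dqv:QualityMeasurement ;\n"
        + f"    dqv:computedOn <{dataset_uri}> ;\n"
        + f"    dqv:isMeasurementOf <{metric_uri}> ;\n"
        # Fixed two-decimal formatting keeps the literal a valid xsd:decimal.
        + f'    dqv:value "{value:.2f}"^^xsd:decimal .\n'
    )


# Both URIs below are placeholders, not real resources.
ttl = measurement_turtle(
    "http://example.org/dataset/1",
    "http://example.org/metric/completeness",
    0.78,
)
print(ttl)
```

The metric URI would point at your own definition of the metric, which is where a home-grown framework plugs into the vocabulary.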


From a practical point of view, it is easier to measure the quality of 5-star data than of 3- or 4-star data.

There are common requirements for Linked Open Data, whereas the quality of non-Linked Open Data depends on the particular technical and non-technical requirements that have to be fulfilled. These requirements may vary from format to format or from country to country.

For example, this is the rating page of a random Russian federal body in a special system that measures open data quality. The page also provides links to:


Finally, have a look at:

Stanislav Kralin