Enhancing DCAT support in CKAN (DCAT-AP v3, scheming integration, and more)
A review of the recent developments in CKAN's DCAT support, and how you can get involved
Data quality is a complex measure of data properties across various dimensions. It gives us a picture of the extent to which the data are appropriate for their purpose. What are the main dimensions of data quality?
Processability - is the data in a convenient format, one that can be machine-processed into structured form? Or is it in a closed proprietary format? (This is item 2 of the 5 stars of openness.)
Accuracy - the data reflect the real-world state. For example, a company name is the real company name, and a company identifier exists in the official register of companies. This can be measured in an automated way using various lists and mappings. (NB: data can be complete but not accurate.)
Credibility - the extent to which the data is regarded as true and credible. It can vary from source to source, and even a single source can contain both automatically and manually entered data. This is not readily measurable in an automated way.
Timeliness (age of data) - the extent to which the data is sufficiently up to date for the task at hand. For example, data scraped from an unstructured PDF that was published today but contains contracts from three months ago is not timely. This can be measured by comparing the publishing date (or scraping date) with the dates within the data source.
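As a rough illustration of the two automatable checks above, the following Python sketch scores accuracy against a reference list and measures timeliness as the lag between the publishing date and the newest date in the data. Everything named here is an assumption made for the example: the register contents, the field names (company_id, company_name, signed) and the helper functions accuracy_ratio and lag_in_days are hypothetical, and a real check would load an actual official register.

```python
from datetime import date

# Hypothetical stand-in for an official register of companies (identifier -> name).
OFFICIAL_REGISTER = {
    "12345678": "Example Holdings Ltd",
    "87654321": "Sample Trading Co",
}


def accuracy_ratio(records, register):
    """Share of records whose identifier is registered and whose name matches the register."""
    if not records:
        return 0.0
    accurate = sum(
        1
        for r in records
        if r["company_id"] in register
        and register[r["company_id"]] == r["company_name"]
    )
    return accurate / len(records)


def lag_in_days(published, record_dates):
    """Days between the publishing (or scraping) date and the newest date in the data."""
    return (published - max(record_dates)).days


# Illustrative records: one name mismatch and one unknown identifier.
records = [
    {"company_id": "12345678", "company_name": "Example Holdings Ltd", "signed": date(2024, 1, 10)},
    {"company_id": "87654321", "company_name": "Sample Trading", "signed": date(2024, 1, 20)},
    {"company_id": "00000000", "company_name": "Ghost Company", "signed": date(2024, 1, 5)},
    {"company_id": "12345678", "company_name": "Example Holdings Ltd", "signed": date(2024, 1, 15)},
]

accuracy = accuracy_ratio(records, OFFICIAL_REGISTER)
lag = lag_in_days(date(2024, 4, 19), [r["signed"] for r in records])
print(f"accuracy: {accuracy:.2f}, lag: {lag} days")
```

The exact-match comparison on names is deliberately strict. In practice one would probably normalize case, whitespace and legal-form suffixes before comparing, and choose a lag threshold that fits the task at hand.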
Some other dimensions can also be measured, but require that one has multiple datasets describing the same things:
Integrity - can multiple datasets be correctly joined together? Are all references valid? (This is measurable in an automated way.)
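One possible way to automate the integrity check is to look for references in one dataset that match no key in the other. The sketch below assumes two small in-memory datasets; the names suppliers, contracts, supplier_id and dangling_references are hypothetical and chosen only for this example.

```python
def dangling_references(child_rows, parent_rows, foreign_key, primary_key):
    """Return child rows whose foreign key matches no primary key in the parent dataset."""
    known_keys = {row[primary_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in known_keys]


# Illustrative datasets describing the same things: contracts refer to suppliers.
suppliers = [
    {"supplier_id": "S1", "name": "Example Holdings Ltd"},
    {"supplier_id": "S2", "name": "Sample Trading Co"},
]
contracts = [
    {"contract_id": "C1", "supplier_id": "S1"},
    {"contract_id": "C2", "supplier_id": "S3"},  # refers to a supplier that does not exist
    {"contract_id": "C3", "supplier_id": "S2"},
]

broken = dangling_references(contracts, suppliers, "supplier_id", "supplier_id")
integrity = 1 - len(broken) / len(contracts)
print(f"{len(broken)} dangling reference(s), integrity: {integrity:.2f}")
```

Reporting the offending rows alongside the ratio makes it easier to trace whether a broken join comes from a missing record or from a mistyped identifier.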
Next time we will talk about "What is acceptable data quality?"