CKAN extensions Archiver and QA upgraded
Popular CKAN extensions ‘Archiver’ and ‘QA’ have recently been significantly upgraded. Now it is relatively simple to add automatic broken link checking and 5 stars of openness grading to any CKAN site. At a time when many open data portals suffer from quality problems, adding these reports makes it easy to identify the problems and get credit when they are resolved.
Although these extensions have been around for a few years, most of the development has happened on forks while the core languished. In the past couple of months there has been a big push to merge the efforts from the US (data.gov), Finland, Greece, Slovakia and the Netherlands, and particularly those from the UK (data.gov.uk), into core. It’s been a big leap forward in functionality. Installers no longer need to customize templates: you get details of broken links and 5 stars shown on every dataset simply by installing and configuring the extensions. And now that we’re all on the same page, we can work together better from now on.
The Archiver Extension regularly tries out all datasets’ data links to see if they are still working. File URLs that do work are downloaded and the user is offered the ‘cached’ copy. Otherwise, URLs that are broken are marked in red and listed in a report. See more: ckanext-archiver repo, docs and demo images
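To make the idea of link checking concrete, here is a minimal sketch in plain Python. It is not the ckanext-archiver implementation: the function names `fetch_status` and `find_broken_links` are hypothetical, and treating any HTTP status of 400 or above (or a failed request) as "broken" is an assumption made for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails.

    Hypothetical helper for illustration; not part of ckanext-archiver.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # Malformed URL, DNS failure, refused connection, timeout, etc.
        return None


def find_broken_links(urls, fetch=fetch_status):
    """Return the URLs whose fetch failed or returned a status >= 400.

    The 400 threshold is an assumption, not the extension's actual rule.
    """
    broken = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken
```

The `fetch` parameter is injectable so the reporting logic can be exercised without network access, for example by passing a dictionary's `get` method that maps URLs to canned status codes.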
The QA Extension analyses the data files that Archiver has downloaded to reliably determine their format (CSV, XLS, PDF, etc.), rather than trusting the format that the publisher has declared. This information is combined with the data licence and whether the data is currently accessible to give a rating out of 5 according to Tim Berners-Lee’s 5 Stars of Openness. A file that has no open licence, or is not available, gets 0 stars. If it passes those tests but is only a PDF then it gets 1 star. A machine-readable but proprietary format like XLS gets 2 stars, and an open format like CSV gets 3 stars. 4 and 5 star data is that which uses standard schemas and references other datasets, which tends to mean RDF. See more: ckanext-qa repo, docs and demo images
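The scoring rules described above can be sketched as a small Python function. This is an illustration only, not the ckanext-qa code: `openness_score` and `FORMAT_STARS` are hypothetical names. Only the PDF, XLS and CSV scores come from the text; scoring RDF as 4 and defaulting unrecognised formats to 1 star are assumptions, and the sketch does not attempt to tell 4 stars from 5, since that depends on whether the data references other datasets.

```python
# Illustrative mapping from detected format to stars.
# PDF, XLS and CSV follow the description in the text.
FORMAT_STARS = {
    "PDF": 1,
    "XLS": 2,
    "CSV": 3,
    "RDF": 4,  # assumption: the text only says 4-5 stars "tends to mean RDF"
}


def openness_score(has_open_licence, is_available, detected_format):
    """Return a 0-4 star rating for one data file (hypothetical sketch)."""
    # No open licence, or a broken link, means 0 stars regardless of format.
    if not has_open_licence or not is_available:
        return 0
    # Assumption: an unrecognised or missing format still earns 1 star,
    # because the file is available under an open licence.
    return FORMAT_STARS.get((detected_format or "").upper(), 1)
```

For example, `openness_score(True, True, "csv")` gives 3, while the same file with a broken link or without an open licence gives 0.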