Enhancing DCAT support in CKAN (DCAT-AP v3, scheming integration, and more)

A review of the recent developments in CKAN's DCAT support, and how you can get involved

Summary

Long-standing support for the DCAT standard in CKAN has recently been expanded with the release of ckanext-dcat v2.0.0, which adds support for DCAT-AP v3 and integrates with ckanext-scheming. We want to keep expanding DCAT support in CKAN and are keen to hear from publishers and consumers about how you are using DCAT and what is missing.

Sharing metadata with DCAT

Sharing data and metadata is one of the key features of data portals. To fulfill their role as data disseminators, portals need ways to share information about the datasets they host with other systems. Most data portals, CKAN included, offer some form of API for programmatic discovery and import of metadata records, but different systems use different representation formats, API endpoints, and so on.
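
As a rough illustration of that kind of programmatic access, here is a minimal sketch that reads one dataset record through CKAN's Action API (`package_show`). The site URL and dataset name are placeholders, and the helper names are made up for this example. Other portals would need a different URL layout and response handling, which is exactly the interoperability gap described above.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def package_show_url(site_url: str, dataset_id: str) -> str:
    """Build the Action API URL that returns one dataset's metadata."""
    base = site_url.rstrip("/") + "/"
    query = urlencode({"id": dataset_id})
    return urljoin(base, "api/3/action/package_show") + "?" + query


def unwrap_result(payload: dict) -> dict:
    """Action API responses carry the record under 'result' next to a 'success' flag."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API call failed: {payload.get('error')}")
    return payload["result"]


def fetch_dataset(site_url: str, dataset_id: str) -> dict:
    """Fetch a dataset dict from a CKAN site (performs a network request)."""
    with urlopen(package_show_url(site_url, dataset_id), timeout=10) as response:
        return unwrap_result(json.load(response))


if __name__ == "__main__":
    # Placeholder site and dataset name; replace with a real CKAN instance.
    print(package_show_url("http://my-ckan-site.org", "my-ckan-dataset"))
```

The dictionary returned by `fetch_dataset` is CKAN-specific, so a consumer written against it will not work unchanged with a non-CKAN portal.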

To provide a common vocabulary for sharing metadata across data portals, DCAT was formalized as a W3C Recommendation in 2014, building on earlier work at different institutions. The CKAN community has been close to these discussions from early on, with initial efforts like spec.dataportals.org dating back to 2012.

DCAT defines classes for entities such as Catalogs and Datasets, and describes which properties should be included for each of them, as well as the relationships between them. Other profiles can be built on top of the DCAT recommendation. These can focus on specific regions or administrations, like DCAT-AP and DCAT-US for data portals in the European Union and the US respectively, or be domain-specific, like GeoDCAT-AP and StatDCAT-AP.

CKAN + DCAT = ckanext-dcat

As DCAT grew in popularity and more organizations adopted it, work consolidated around ckanext-dcat, a CKAN extension that helps data publishers expose and consume metadata serialized using DCAT. Ckanext-dcat is one of the most popular CKAN extensions, and it is used on major sites such as the German, Irish and Swiss national data portals. It lets sites expose datasets created in CKAN in a semantic RDF format that complies with the DCAT specifications.

So a CKAN dataset that looks like this in the CKAN API:

{
    "id": "425e361b-bad9-4a8f-8cc4-2e147c4e8c18",
    "name": "my-ckan-dataset",
    "title": "An example CKAN dataset",
    "description": "Some notes about the data",
    "temporal_coverage": [
        {
            "start": "2024-01-01",
            "end": "2024-12-31"
        }
    ],
    "resources": [
        {
            "id": "df0fc449-fddf-41af-910a-f972b458956c",
            "name": "Some data in CSV format",
            "url": "http://my-ckan-site.org/dataset/425e361b-bad9-4a8f-8cc4-2e147c4e8c18/resource/df0fc449-fddf-41af-910a-f972b458956c/download/data.csv",
            "format": "CSV"
        }
    ]
}

Will look like this as a DCAT serialization:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://my-ckan-site.org/dataset/425e361b-bad9-4a8f-8cc4-2e147c4e8c18> a dcat:Dataset ;
    dct:identifier "425e361b-bad9-4a8f-8cc4-2e147c4e8c18" ;
    dct:temporal [ a dct:PeriodOfTime ;
            dcat:endDate "2024-12-31"^^xsd:date ;
            dcat:startDate "2024-01-01"^^xsd:date ] ;
    dct:title "An example CKAN dataset" ;
    dcat:distribution <http://my-ckan-site.org/dataset/425e361b-bad9-4a8f-8cc4-2e147c4e8c18/resource/df0fc449-fddf-41af-910a-f972b458956c> .

<http://my-ckan-site.org/dataset/425e361b-bad9-4a8f-8cc4-2e147c4e8c18/resource/df0fc449-fddf-41af-910a-f972b458956c> a dcat:Distribution ;
    dct:format "CSV" ;
    dct:title "Some data in CSV format" ;
    dcat:accessURL <http://my-ckan-site.org/dataset/425e361b-bad9-4a8f-8cc4-2e147c4e8c18/resource/df0fc449-fddf-41af-910a-f972b458956c/download/data.csv> .

The extension is organized around profiles: it ships built-in base profiles compatible with different versions of DCAT-AP, which site maintainers can adapt or extend to support more targeted profiles.
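
To make the idea of layered profiles more concrete, here is a standard-library-only sketch in which each "profile" is a function that contributes triples for a dataset, and profiles are applied in order. This is not the ckanext-dcat API (the extension itself builds on rdflib and its own profile classes). The function names, the N-Triples output, and the URI layout are assumptions chosen to mirror the example above.

```python
from typing import Callable

Triple = tuple[str, str, str]
Profile = Callable[[dict, str], list[Triple]]

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Format an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Format a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def base_profile(dataset: dict, dataset_uri: str) -> list[Triple]:
    """Hypothetical base profile: core dataset properties only."""
    return [
        (iri(dataset_uri), iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (iri(dataset_uri), iri(DCT + "identifier"), literal(dataset["id"])),
        (iri(dataset_uri), iri(DCT + "title"), literal(dataset["title"])),
    ]


def distribution_profile(dataset: dict, dataset_uri: str) -> list[Triple]:
    """Hypothetical extra profile: one dcat:Distribution per CKAN resource."""
    triples: list[Triple] = []
    for resource in dataset.get("resources", []):
        dist_uri = f"{dataset_uri}/resource/{resource['id']}"
        triples += [
            (iri(dataset_uri), iri(DCAT + "distribution"), iri(dist_uri)),
            (iri(dist_uri), iri(RDF_TYPE), iri(DCAT + "Distribution")),
            (iri(dist_uri), iri(DCT + "title"), literal(resource["name"])),
            (iri(dist_uri), iri(DCAT + "accessURL"), iri(resource["url"])),
        ]
    return triples


def serialize(dataset: dict, site_url: str, profiles: list[Profile]) -> str:
    """Apply each profile in order and emit the combined triples as N-Triples."""
    dataset_uri = f"{site_url.rstrip('/')}/dataset/{dataset['id']}"
    lines = []
    for profile in profiles:
        for subject, predicate, obj in profile(dataset, dataset_uri):
            lines.append(f"{subject} {predicate} {obj} .")
    return "\n".join(lines)
```

In this sketch, a site maintainer targeting a more specific profile would append another function to the list passed to `serialize`, leaving the base profile untouched.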

What’s new in ckanext-dcat 2.0.0

Having supported DCAT-AP version 2 for a while, ckanext-dcat added support for the recently announced DCAT-AP version 3 in its new 2.0.0 release, allowing sites to update their DCAT support. More significantly, it also added integration with ckanext-scheming, the most common way to customize the CKAN metadata schema. Ckanext-scheming lets maintainers define different metadata schemas for a CKAN site using just configuration files, and takes care of handling the fields internally, validating them, displaying the right template snippets in the UI, and so on. Ckanext-dcat now provides pre-built schemas for ckanext-scheming that sites can adapt to get out-of-the-box DCAT support.
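
The idea of a schema defined purely as configuration, with validation derived from it, can be sketched as follows. The schema is written as a Python dict whose keys (`dataset_fields`, `field_name`, `label`, `required`, `repeating_subfields`) follow the general shape of a ckanext-scheming schema, but the `check` key and the `validate` function are hypothetical and exist only for this illustration. Real schemas are configuration files processed by ckanext-scheming's own validators.

```python
from datetime import date

# Schema expressed as data. "check" is a made-up key for this sketch.
SCHEMA = {
    "dataset_fields": [
        {"field_name": "title", "label": "Title", "required": True},
        {
            "field_name": "temporal_coverage",
            "label": "Temporal coverage",
            "repeating_subfields": [
                {"field_name": "start", "label": "Start", "check": "iso_date"},
                {"field_name": "end", "label": "End", "check": "iso_date"},
            ],
        },
    ]
}


def is_iso_date(value) -> bool:
    """Return True if value is a string that parses as an ISO 8601 date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def validate(dataset: dict, schema: dict) -> list[str]:
    """Collect human-readable errors for a dataset dict, driven only by the schema."""
    errors = []
    for field in schema["dataset_fields"]:
        name = field["field_name"]
        value = dataset.get(name)
        if field.get("required") and not value:
            errors.append(f"{field['label']}: missing value")
        subfields = field.get("repeating_subfields")
        if subfields and value:
            # Repeating subfields are stored as a list of dicts, one per entry.
            for index, item in enumerate(value):
                for sub in subfields:
                    sub_value = item.get(sub["field_name"])
                    if sub_value is None:
                        continue
                    if sub.get("check") == "iso_date" and not is_iso_date(sub_value):
                        errors.append(
                            f"{field['label']} [{index}] {sub['label']}: not a valid date"
                        )
    return errors
```

Adding a field to the site then means editing `SCHEMA` rather than the validation code, which is the property that makes pre-built, adaptable schemas practical.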

What next?

We want to continue expanding the capabilities of ckanext-dcat to make it easier for publishers to expose their sites' metadata. Planned work includes:

  • Provide support for DCAT-US as a built-in base profile
  • Improve the base profiles with support for codelists, multilingual properties, etc.
  • Explore support for other DCAT entities like Dataset Series. Some initial discussion can be found in this issue if you want to provide feedback or your potential use case.
  • Expand support to other application profiles like DCAT-AP-HVD

To do this, and to make CKAN the best-equipped data catalog for DCAT, we need your help! We need to hear from data publishers and consumers in various jurisdictions about how you use DCAT, how your adoption of DCAT v3 is evolving, and how solid support in CKAN can help you.

As part of the ongoing POSE-funded initiative to strengthen the CKAN community, we plan to foster a community of practice around CKAN and DCAT. In the meantime, if you are interested in this work, have questions, or want to provide feedback, feel free to reach out by creating a discussion on GitHub.