CKAN-based source code of the European Data Portal now available
Version 1 of the European Data Portal was released on 18 February 2016. Today, the source code of the CKAN-based European Data Portal and its extensions is released on GitLab. So, what extensions were made to CKAN?
What can I find on the European Data Portal?
Let’s start with some background information to understand the function of the portal and its components. The primary purpose of the European Data Portal is to harvest the metadata of national, regional and local Open Data portals and act as a single point of access to all Open Data available across Europe. The number of harvested datasets increased from 240,000 in November 2015 to over 415,000 datasets today. All metadata is made available in 6 languages: English, French, German, Italian, Polish and Spanish.
Besides harvesting, the portal offers information on providing data, to support public bodies in releasing more data. The Goldbook provides an overview of everything you need to know as a data holder who wants to start publishing data. You can also find an explanation of how a portal can be harvested by the European Data Portal. The section on using data underlines the benefits of re-using Open Data and offers a checklist of the key steps to go through before using data. Share your story and take part in our survey to raise further awareness at public sector level for the release of more data. Finally, the Library contains a large collection of additional training material, use cases, reports and so on.
In a nutshell, the Portal contains metadata, training material, reports, use cases and more. But how does the CKAN platform integrated into the Portal really work?
The CKAN extensions of the European Data Portal
To integrate the required functionality into CKAN, a dedicated extension was developed. This extension covers several aspects of the overall concept, most notably support for multilingual metadata, the implementation of the DCAT application profile for data portals in Europe (DCAT-AP) standard, and synchronisation with a Linked Data triplestore.
The multilingual feature, which enables the current search functionality in six languages, was realised by adding a metadata field that holds translations into arbitrary languages for each respective metadata field. Fields of both datasets and resources are taken into account. When rendering the views, the current language setting determines which translation is served. This feature is used to integrate CKAN with an external service, where all relevant metadata is automatically translated into multiple languages by a machine translation service.
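The translation lookup described above can be illustrated with a minimal Python sketch. It is not taken from the released extension: the field name `translation`, its nested layout and the helper `localized` are all assumptions made for illustration only.

```python
from typing import Any, Dict

# Hypothetical name of the extra metadata field that stores translations.
# The released extension may use a different key and layout.
TRANSLATION_FIELD = "translation"


def localized(dataset: Dict[str, Any], field: str, lang: str) -> Any:
    """Return the value of `field` in `lang`, or the original value if no translation exists."""
    translations = dataset.get(TRANSLATION_FIELD, {})
    per_field = translations.get(field, {})
    return per_field.get(lang, dataset.get(field))


# Example dataset in a CKAN-like dictionary form (illustrative data only).
dataset = {
    "title": "Air quality measurements",
    "notes": "Hourly readings from monitoring stations.",
    TRANSLATION_FIELD: {
        "title": {
            "de": "Luftqualitätsmessungen",
            "fr": "Mesures de la qualité de l'air",
        },
    },
}

print(localized(dataset, "title", "de"))  # translated title
print(localized(dataset, "notes", "de"))  # no translation stored, original text is used
```

Keeping the translations in one additional field leaves the original metadata fields untouched, so a view can fall back to the source text whenever a language is missing.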
The CKAN core schema was considerably extended to fully support DCAT-AP, which provides an RDF vocabulary for describing public datasets. To this end, a mapping from the vocabulary to the JSON-style schema of CKAN was designed and implemented. In addition, a mechanism was added to automatically replicate the metadata into a Virtuoso triplestore. As a result, all datasets are directly available via a SPARQL endpoint. Besides being available as JSON and HTML, every dataset can be served as RDF.
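To give a rough idea of what such a mapping involves, the following Python sketch turns a few standard CKAN dataset fields (`name`, `title`, `notes`, `tags`) into RDF statements using DCAT and Dublin Core terms, serialised as N-Triples. It is an illustrative assumption, not the mapping implemented in the portal: the function `dataset_to_ntriples`, the base URI and the choice of fields are hypothetical, and DCAT-AP covers far more properties.

```python
import json
from typing import Any, Dict, List

# Namespaces of the vocabularies used by DCAT-AP.
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Quote a string as a plain literal; JSON escaping is used as an approximation of N-Triples escaping."""
    return json.dumps(value, ensure_ascii=False)


def dataset_to_ntriples(dataset: Dict[str, Any], base_uri: str) -> List[str]:
    """Map a CKAN-style dataset dictionary to a list of N-Triples lines."""
    subject = f"<{base_uri}{dataset['name']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{DCAT}Dataset> ."]
    if dataset.get("title"):
        lines.append(f"{subject} <{DCT}title> {literal(dataset['title'])} .")
    if dataset.get("notes"):
        # CKAN stores the dataset description in the "notes" field.
        lines.append(f"{subject} <{DCT}description> {literal(dataset['notes'])} .")
    for tag in dataset.get("tags", []):
        lines.append(f"{subject} <{DCAT}keyword> {literal(tag['name'])} .")
    return lines


# Illustrative dataset; the base URI below is a placeholder, not the portal's.
example = {
    "name": "air-quality",
    "title": "Air quality measurements",
    "notes": "Hourly readings from monitoring stations.",
    "tags": [{"name": "environment"}, {"name": "air"}],
}

triples = dataset_to_ntriples(example, "https://example.org/dataset/")
print("\n".join(triples))
```

Statements of this kind are what a triplestore holds once the metadata has been replicated, which is what makes the datasets queryable through a SPARQL endpoint.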
The activities for extending CKAN in the context of the European Data Portal are in line with similar efforts of the CKAN community to enhance the software further towards Linked Data and multilingual support (see extensions ckanext-fluent and ckanext-dcat). The examples given are just some initiatives that will be further developed in the near future.