CKAN meets The DataTank

Pieter Colpaert - December 18, 2013 in Extensions

Today, we at OKF Belgium are thrilled to announce a CKAN extension, integrating DataTank functionality into CKAN.

The DataTank is a data adapter for machine-readable data. It can take data in a wide range of formats, from a CSV file to a SPARQL endpoint, and put an HTTP interface on top of it. This interface is a REST API that lets you read the data in a different format, page through it, and query it. The latest version of The DataTank was released two weeks ago.
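To make the idea of a data adapter concrete, here is a toy Python sketch of the kind of work such an interface does for each request: parse a CSV source, filter it on field values, cut out one page and serialise that page as JSON or CSV. It is purely illustrative and is not The DataTank's own implementation; the functions `read_csv` and `adapt` and the `page`/`page_size` parameters are hypothetical.

```python
import csv
import io
import json


def read_csv(text):
    """Parse CSV text into a list of dicts; the first row is the header."""
    return list(csv.DictReader(io.StringIO(text)))


def adapt(rows, fmt="json", page=1, page_size=2, **filters):
    """Filter rows on exact field matches and return one page as JSON or CSV."""
    matching = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    start = (page - 1) * page_size
    chunk = matching[start:start + page_size]
    if fmt == "json":
        return json.dumps(chunk)
    if fmt == "csv":
        out = io.StringIO()
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(chunk)
        return out.getvalue()
    raise ValueError("unsupported format: " + fmt)


SAMPLE = "city,country\nGhent,BE\nLeuven,BE\nLille,FR\nAntwerp,BE\n"
# Second page (two rows per page) of the Belgian cities
print(adapt(read_csv(SAMPLE), "json", page=2, page_size=2, country="BE"))
# -> [{"city": "Antwerp", "country": "BE"}]
```

The same source rows can be served in another format simply by changing `fmt`, which is the essence of reading "the data in a different format".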

We asked ourselves: what if we could combine its power with CKAN, a great data registry which stores metadata for all kinds of data (machine-readable or not), with great search functionality and integrated data storage? This new CKAN extension is the answer.

To use the extension, you need to be running both The DataTank and CKAN. When you add a JSON or XML resource (file) to CKAN, it is automatically added to your DataTank instance, making it instantly usable by app developers. The DataTank can also export metadata using DCAT.

We’re currently working on deeper, smarter integration of The DataTank into CKAN. We want to extend CKAN’s “add dataset” interface so that users can add extra information about a file (for example, whether a CSV file has a header row), which will then be added to the DataTank’s discovery document. All help in developing this further is welcome! If you can code in Python, know how to create extra fields in CKAN and know how to call an HTTP API, you’ll love contributing.

In the longer term, The DataTank has more features in the pipeline: reintroducing SPECTQL, a query language for filtering and querying API sources that was developed for an earlier version of TDT; automatic mappings to RDF, using tdt/streamingrdfmapper, from machine-readable data whose model is known; analytics on top of usage; and so on.

We’d love for more people to get involved in the project. Here are some suggestions:

We look forward to hearing from you!

Please help to translate CKAN 2.2

Mark Wainwright - December 13, 2013 in Releases

Wondering how to spend the holiday season? How would you like to help to make the next version of CKAN available in more languages?

CKAN, the world’s leading open-source data portal software, is available in over 30 languages – mainly because of the work of volunteers helping to translate it. The next release will be CKAN 2.2. The strings to be translated for this version have now been uploaded to Transifex:

https://www.transifex.com/projects/p/ckan/resource/2-2/

If you can help with any of the translations, please head over there. If you’re not already on the translation team for your favourite language, you can sign up / log in to Transifex and visit the team page for details of how to join it.

The release won’t be finalised until the new year, so you have until 6 January to get it finished. Special kudos to the Danish team, who finished their translation before the announcement was made, only an hour and a quarter after the new strings were uploaded earlier today!

CKAN for research data management: a round-up from St Andrews

Mark Wainwright - November 28, 2013 in Deployments, News

A new blog post from Birgit Plietzsch at St Andrews University provides an interesting survey of projects using CKAN for research data management (RDM). St Andrews themselves have a pilot project in this area, and Dr Plietzsch had solicited input from other projects on the ‘ckan4rdm’ mailing list. The post summarises the responses she received.

It’s noticeable that there are now quite a few RDM projects using CKAN in production environments or in pilots, most of them newcomers since the CKAN4RDM workshop earlier this year. Another project in the area that didn’t make it into the St Andrews round-up is EDaWaX, subject of a recent post on this blog.

If you’re interested in using CKAN in a research data management setting, it is worth joining the (low-traffic) ckan4rdm list, and perhaps sending it a note introducing yourself and your plans in the area.

Business developer wanted for world leading open data publication software team

Gavin Chait - November 25, 2013 in Jobs

If you’re a business developer with some IT experience in the public sector who’d like to help build exciting open data projects then we’d love to hear from you.

The Open Knowledge Foundation is recruiting a full-time, UK-based, business developer with experience of IT project management to join our main Services infrastructure team to develop new market and sales opportunities in the public sector. Some project management will also be required to implement contracts.

Our team provides high quality professional and technical services to clients around the world to help achieve our vision of a world empowered by open knowledge. We implement this through support of infrastructure development and capacity-building for open knowledge data sites for governments, institutions and organisations.

CKAN, our open data publication platform, is an open source web-based software project written in Python. It allows users to submit, search for and find open data. CKAN is the catalogue behind the UK government’s data.gov.uk and the US data.gov. It also powers over 60 other data catalogues around the world.

If you have a background in IT, the public sector, and business development, with a keen interest in open data, and enjoy working on open-source products, we’d love to hear from you.

You’d be part of a team:

  • developing and concluding business opportunities for deploying and developing open data software infrastructure, training and consulting projects for governments and organisations;
  • developing and managing our network of professional services partners;
  • supporting project management and delivery of software deployments, training and consulting engagements.

Requirements

  • At least 2-3 years’ business development experience;
  • Self-management in a business development IT environment;
  • Business-to-business and / or public-sector business development experience;
  • Ability to write winning business proposals and RFPs;
  • Brilliant communication skills both verbal and written;
  • Some experience in IT project management;
  • Enthusiasm about open data and open knowledge.

How to apply

Email us at jobs@okfn.org with the subject line ‘Business Developer – CKAN/Services’, and a copy of your CV by 15 December 2013.

Being based in and around London would be a plus but we are happy to consider applications from elsewhere in the UK.

Remuneration is commensurate with ability and will be based on targets and commission.

About the Open Knowledge Foundation

The Open Knowledge Foundation is a multi-award-winning, community-based not-for-profit organisation. We build tools, provide advice and develop communities in the area of open knowledge: data, content and information which can be freely shared and used. We believe that by creating an open knowledge commons we can make a significant contribution to improving governance, research and the economy.

Our world-leading software platform, CKAN (http://ckan.org), is a powerful open source data management system that makes data accessible and usable – by streamlining publishing, sharing, finding and using data. As well as harvesting, cataloguing and advanced searching, it can store data and provide rich data APIs, visualization and exploration tools. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open, available and usable.

We’re changing the world by promoting a global shift towards more open ways of working in government, arts, sciences and much more.

Senior Python developer wanted for world leading open data publication software team

Gavin Chait - October 28, 2013 in Jobs

If you are an outstanding Python web developer with sound JavaScript skills, a keen interest in Open Data, and enjoy working on open-source products, we’d love to hear from you.

The Open Knowledge Foundation is recruiting a senior, full-time Python web developer, primarily to work on CKAN, our open-source Open Data web portal. The successful applicant will also be required to develop JavaScript-based data visualisations and other data-driven services.

Our team provides high quality professional and technical services to clients around the world to help achieve our vision of a world empowered by open knowledge. We implement this through support of infrastructure development and capacity-building for open knowledge data sites for governments, institutions and organisations.

CKAN, our open data publication platform, is an open source web-based software project written in Python. It allows users to submit, search for and find open data. As well as powering the US government’s data.gov website, CKAN is also behind over 60 other data catalogues around the world.

You’d be involved in:

  • Customising CKAN for different governments and organisations;
  • Helping develop new features for CKAN;
  • Working on other Python and JavaScript programming projects in the OKF.

Requirements

  • Web app development experience in Python and JavaScript;
  • PostgreSQL, Linux, Git (essential);
  • Experience in Solr, Pylons, Bootstrap, jQuery or CSS a plus;
  • Enthusiasm about Open Source, Open Data and Open Knowledge.

The Open Knowledge Foundation is a virtual, distributed organisation with team members working remotely on four continents. The team meets face-to-face occasionally through the year, especially for strategic planning and review. This is a full-time, remote (home-based) position for a UK-based candidate.

How to apply

Email us your CV and a cover letter describing your interest in this role to jobs@okfn.org, with the subject line “Senior python developer – CKAN/Services”, by 22nd November 2013.

About the Open Knowledge Foundation

The Open Knowledge Foundation (OKF) is an internationally recognized non-profit working to open knowledge and see it used to empower and improve the lives of citizens around the world. We build tools, provide advice and develop communities in the area of open knowledge: data, content and information which can be freely shared and used. We believe that by creating an open knowledge commons we can make a significant contribution to improving governance, research and the economy.

The last two years have seen rapid growth in our activities, increasing our annual revenue to £2m and our team to over 35 across four continents. We are a virtual organisation with the whole team working remotely, although we have informal clusters in London, Cambridge and Berlin.

The OKF is an international leader in its field and has extensive experience in building open source tools and communities around open material. The Foundation’s software development work includes some of the most innovative and widely acclaimed projects in the area. For example, its CKAN project is the world’s leading open source data portal platform – used by data.gov, data.gov.uk, the European Commission’s open data portal, and numerous national, regional and local portals from Austria to Brazil. The award-winning OpenSpending project enables users to explore over 13 million government spending transactions from around the world. The Foundation also has an active global network which includes Working Groups and Local Groups in dozens of countries, including groups, ambassadors and partners in 21 of Europe’s 27 Member States.

We’re changing the world by promoting a global shift towards more open ways of working in government, arts, sciences and much more.

Project CKAN

Irina Bolychevsky - October 18, 2013 in News, Roadmap

Over the last few years CKAN has seen impressive growth in technology, uptake, number of deployments and in the vendor and developer communities. It is now the basis of dozens of major sites around the world, including national data portals in the UK, US, Canada, Brazil, Australia, Germany, Austria and Norway. Once, almost all core CKAN development was done by the Open Knowledge Foundation; now, there are an increasing number of developers and providers, deploying, customising and working with CKAN.

We believe that, as with many open-source projects when they achieve a certain size, the time has come to bring some more structure to the community of CKAN developers and users. By doing so we aim to provide a solid foundation for the future growth of the project, and to more explicitly empower its growing array of stakeholders.

We are therefore proposing to create an independent, self-governed CKAN project at the Open Knowledge Foundation, separate from our own CKAN developments and offerings, to guide the future development and direction of the software. The main proposed actions are:

  • To establish a steering group and advisory board to oversee the project and represent the growing number of stakeholders.
  • To establish specific groups or teams to look after specific areas; in particular, a “technical group” to oversee technical development and a “content and outreach group” to oversee materials (including project website) and to drive community and user engagement.
  • To establish a membership model for stakeholders to support the long-term sustainability of the project.

The project will still have its formal institutional home at the Open Knowledge Foundation, and enjoy support and participation from our CKAN team. But it will be autonomous and will have its own independent governance, from a board drawn from major CKAN stakeholders. The Open Knowledge Foundation will continue to contribute at all levels, but this approach will allow others – from government users to suppliers of CKAN services – to have a formal role in the future development and direction of CKAN.

Over the next couple of weeks we will be introducing a new structure for development (how to become a core contributor etc) and governance (steering committee and supporting ckan.org as a member) and we would love to hear your ideas and feedback. Please either get in touch or place ideas in this open project ckan document and watch this space for more posts soon!

Geospatial update: MapBox, pycsw, and CKAN at FOSS4G

Adrià Mercader - October 12, 2013 in Extensions, Feature, News, Presentations

There has been a lot of recent work on CKAN’s popular and widely used spatial extension. The extension adds a spatial field to datasets and allows spatial metadata to be harvested from a variety of sources, queried (including with a map search), and exposed using the CSW standard for geospatial discovery. It also provides map previews for spatial data formats such as GeoJSON.

In this post I’ll describe two major new features, an overhaul and expansion of the documentation, and a recent presentation at FOSS4G, the open-source geospatial conference.

Support for MapBox and other tiles

The spatial extension displays maps for various purposes. By default, these use MapQuest-OSM tiles, based on OpenStreetMap data and provided by MapQuest.

However, users wanting to customize the default maps for their own instances can now use map tiles hosted on MapBox. MapBox makes it really easy to create beautiful custom maps, using their online editor or the more advanced TileMill desktop tool. To use your own handcrafted maps, you just need to set a couple of configuration options. You can see MapBox tiles in action in the screenshot of a file preview below, or on our demo site, and all the details for setting it up yourself can be found in the documentation.

[Image: map preview]

Support for custom tiles is not limited to MapBox: any tileset that follows the XYZ convention for web maps can be used on the widgets of your CKAN instance, even Stamen’s famous watercolor maps, as in the example below of map-based search filtering:
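The XYZ convention addresses each tile by a zoom level z and a column x and row y: at zoom z the Web Mercator world is split into 2^z by 2^z tiles, with y counted from the top (north). The Python sketch below uses the standard OpenStreetMap "slippy map" tile formula to work out which tile covers a given point and fill in a tile URL template. It only illustrates the convention; the template host `tiles.example.org` is a placeholder, not a real tile server, and the helper names are hypothetical.

```python
import math


def deg2tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Web Mercator: project the latitude, then count rows from the north edge
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the east/south edges into the last column/row
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(template, lat_deg, lon_deg, zoom):
    """Fill an XYZ URL template such as '.../{z}/{x}/{y}.png'."""
    x, y = deg2tile(lat_deg, lon_deg, zoom)
    return template.format(z=zoom, x=x, y=y)


# Placeholder tile server, for illustration only
TEMPLATE = "https://tiles.example.org/{z}/{x}/{y}.png"
print(tile_url(TEMPLATE, 52.95, -1.15, 12))  # a tile over Nottingham
```

Because every XYZ tileset shares this addressing scheme, a map widget only needs a different URL template to switch between, say, OpenStreetMap-based tiles, MapBox tiles and Stamen's watercolor style.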

[Image: watercolour tiles]

Integration with pycsw

CKAN offers support for the CSW standard, a specification from the Open Geospatial Consortium for exposing geospatial catalogues over the web. CKAN can both harvest remote CSW servers, and expose its own records via a CSW interface. Until now, the latter has been done via a custom plugin, which provided a very limited subset of the specification and was quite flaky.

To improve that we have added features to integrate with pycsw, an excellent open source Python library that provides a full CSW implementation. At present only datasets harvested from other spatial sources can be exposed via pycsw, but hopefully more general support will be added in the future.
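CSW requests can be sent as plain HTTP GET requests with key-value parameters. As a sketch, assuming the pycsw-backed endpoint is reachable at a path such as `/csw` on your CKAN site (check the extension documentation for the actual URL), the snippet below builds a GetCapabilities request and a basic GetRecords request with the standard library only. Parameter names follow CSW 2.0.2; the host name and the `csw_url` helper are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical CSW endpoint of a CKAN site with the pycsw integration enabled
CSW_ENDPOINT = "http://ckan.example.org/csw"


def csw_url(endpoint, request, **extra):
    """Build a CSW 2.0.2 key-value-pair (HTTP GET) request URL."""
    params = {"service": "CSW", "version": "2.0.2", "request": request}
    params.update(extra)  # request-specific parameters
    return endpoint + "?" + urlencode(params)


# Describe the service: supported operations, output schemas and so on
capabilities = csw_url(CSW_ENDPOINT, "GetCapabilities")

# Ask for the first ten records as brief Dublin Core ("csw:Record") entries
records = csw_url(
    CSW_ENDPOINT, "GetRecords",
    typeNames="csw:Record", elementSetName="brief",
    resultType="results", maxRecords=10,
)
print(capabilities)
print(records)
```

Any CSW client, such as a desktop GIS or another catalogue's harvester, would issue requests of this shape against the endpoint.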

To find out more about CKAN and CSW, look at the documentation.

New and revamped documentation

The documentation had outgrown the README file and was hard to follow, so we moved it into proper Sphinx docs hosted online with the core CKAN docs. The full geospatial extension docs are now at docs.ckan.org/projects/ckanext-spatial.

This wasn’t just a cosmetic change: all the documentation has been restructured, cleaned up and updated, so hopefully topics like installation, geo-indexing datasets and the spatial harvesters are now clearer and easier to set up.

We aim soon to give the same treatment to the documentation of the spatial extension’s favourite companion, the harvester extension. As usual, any suggestions or pull requests to improve the documentation are very welcome.

CKAN at FOSS4G

FOSS4G is the main international conference for open source geospatial software. It is organized by different local chapters of OSGeo, an organization that promotes, incubates and fosters communities around free geo-related projects.

This year we went to Nottingham in the UK to present CKAN at FOSS4G for the first time ever. Although not as popular as veteran catalogues like GeoNetwork or GeoNode, CKAN is starting to become well known in the geospatial community, and there was a lot of interest in potential features and integration with other tools.

Sadly the presentation doesn’t seem to have been recorded, but you can check out the slides here, or download them as a PDF (5.5 MB).

Partner profile: Liip, Switzerland

Mark Wainwright - October 9, 2013 in Deployments, Partners

The Open Knowledge Foundation’s CKAN Professional Partnership Programme means that governments and other users all over the world can get paid support from a certified local provider, with access to the core development team if necessary. This post is the first of a series on current CKAN partners.


Liip AG is a web development company based in Switzerland, which does large-scale, high-quality projects in a range of areas, including e-commerce, online learning, mobile – and, of course, Open Data. Their first big project as a CKAN Partner is opendata.admin.ch, the federal Open Data portal for Switzerland. The site, which Liip developed together with five government agencies and the Open Government Data consultancy itopia, was officially launched on 16 September at OKCon in Geneva.

[Image: opendata.admin.ch]

Switzerland’s new Open Data portal, opendata.admin.ch

The current site is a pilot, produced for the Federal Archive, and experience from using it will guide the future development of Open Data in Switzerland. It is hoped that it will foster economic growth as well as government transparency and efficiency. A study commissioned by the Federal Archive concluded that open government data in Switzerland had the potential to be worth over a billion Euros a year in economic growth.

At present the site has over 1600 datasets, including regional boundaries, demographics, election data, weather data, and more. Much of the data is harvested from a range of government bodies, such as the Federal Statistical Office and the Meteorological Office. To this end Liip wrote a number of custom harvesters to extract the datasets from different existing information systems, using CKAN’s harvesting infrastructure. To make the system easily and robustly scalable when other data providers – such as cities and cantons – join in future, they designed an architecture with a central CKAN installation harvesting from two satellite installations, which themselves harvest from the other systems.

As well as the custom harvesters, they also wrote a number of other custom extensions to adjust the look and feel of the site. Like CKAN itself, all their extensions are openly licensed under the GNU Affero General Public License (AGPL), and they have contributed to the core code, particularly in the area of CKAN’s multilingual capability, which is essential in a country like Switzerland with four national languages.

At the moment Liip is integrating datasets and webservices of two offices of the canton of Zurich into the federal pilot portal, and helping the city of Zurich to migrate their current open government data portal to a state-of-the-art solution using CKAN.

opendata.admin.ch marks a significant milestone in Open Data in Switzerland. Only a week before its launch, the National Council (the lower house of Switzerland’s parliament) voted by a large majority in favour of an ‘Open Government Data masterplan’. Hopefully we will be hearing much more of Swiss open data in the future.

EDaWaX: Choosing CKAN for managing research data

Hendrik Bunke - September 25, 2013 in Deployments, Extensions, Feature

This is a guest post by Hendrik Bunke of the EDaWaX project, cross-posted from the project blog. EDaWaX is a German project which aims to greatly increase the amount of research data in Economics that is made open.


One aim of EDaWaX is to develop and implement a web-platform prototype for a publication-related research data archive. We’ve chosen CKAN – an open source data portal platform – as basis for this prototype.

This post describes the reasons for this decision and tries to give some insights into CKAN, its features and technology. We’ll also discuss these features both in regard to our special use case and to suitability for research data management in general.

Before you proceed, it might be useful to have at least a quick look at an article that covers a similar topic far more extensively than this blog post. It’s written by Joss Winn and is titled Open Data and the Academy: An Evaluation of CKAN for Research Data Management. The paper was made available on Google Drive so others could comment on or even add to it. I’m also mentioning Joss’ paper to show that there is already an ongoing discussion (for example in this mailing list thread) about how to adapt CKAN, which is at the moment mainly used for government data, for research data management.

This post focuses on our specific EDaWaX perspective and also provides a more technical introduction (installation, writing extensions, using the API etc.). In addition, we describe our own CKAN extensions that add basic theme customisations and custom metadata fields.

We hope this will be useful for those who are looking for a decent solution for a research data repository and have heard only a little or (most probably) nothing at all about CKAN yet.

EDaWaX criteria for research data archive software

We won’t go into detail about the EDaWaX project here. In short, EDaWaX is looking for ways to publish and curate research data in economics. Our focus is on publication-related data, meaning especially the data that authors of journal papers have used for their articles. One objective of the project is the development of a data archive for journals using an integral approach.

Our projected web application should demonstrate some features that the EDaWaX studies revealed to be important for replication purposes. We evaluated several software packages dealing with data publishing and had only a few, very general but fundamental, criteria for the software:

  • Open Source: This is a fundamental principle for us, but there are also practical reasons for this. We want to be able to modify and extend the software, and we would like to share our extensions.

  • API (reading and writing): This is quite important for a modular and flexible infrastructure. We also want to provide integration packages for other systems (CMS or special e-journal software). We think that research data must not just be stored in, perhaps closed, ‘data-silos’, but should be accessible and reusable as much as possible. An API opens up a lot more possibilities for this purpose.

  • Simple User Interface: We are mainly targeting authors and editorial offices who don’t have the time, resources and know-how to learn and use complicated UIs and workflows. This is also important for lowering the general barriers for publishing research data.

  • RDF metadata representation: We are aware that this might be a somewhat ‘avant-garde’ criterion. But we predict that it will be more and more important in the near future to have a general, linkable and machine-readable metadata interface, so our research data can be used and adopted as widely as possible.

The main ‘opponents’ of CKAN in this small ‘contest’ were Dataverse and Nesstar. But while both are well-established platforms dedicated specifically to research data (which CKAN is not), neither met most of our criteria. Nesstar is proprietary rather than open software: you have to pay for the server component, and the only way to upload data is the so-called ‘Publisher’, a Windows-only client. That’s a no-go for us. Dataverse’s main problem compared to CKAN (besides the fact that it is an unfriendly beast of Java software ;-) is the lack of a decent API. There is now at least a read-only API (since March 2012), but you cannot use it to upload data. So, in the end there was no question which software we would choose: CKAN. Let’s see in detail why.

What is CKAN?

CKAN (an abbreviation for “Comprehensive Knowledge Archive Network”, which does not exactly describe its actual use cases today) is an open-source (check) web platform for publishing and sharing data. Written in Python, it offers a simple, nice-looking and very friendly user interface (check) and by default provides an RDF metadata representation for each dataset (check). The feature that, in our view, makes CKAN really outstanding is its API, which allows access to nearly every function of the system, including writing and deleting datasets (check; more on that later).

CKAN has many, many more features that could be listed here, including harvesting, data visualisation and preview, full-text and fuzzy search, and faceting. It is widely used, mainly in the field of open government data, where it has become a de facto standard software package. CKAN powers the national open data portals of the UK, the USA, Australia and the EU, to name only a few well-known examples. The CKAN website has an impressive list of all known production instances.

The very active development of CKAN is led and organised by the Open Knowledge Foundation (OKF). There are around ten people at OKF who are mainly working on CKAN, most of them developers, and in addition at least 30-40 developers are contributing actively.

If you want to contribute to CKAN or develop an extension, you should subscribe to the developer mailing list. There’s also a general discussion list, and if you want to use CKAN for research data management (and in the end that’s what this article is all about), please subscribe right away to the fairly new ckan4rdm list mentioned above.

CKAN’s source code can be found on GitHub. It is written very cleanly (enforced by clear coding standards), so as a reasonably experienced Python developer you won’t have major difficulties understanding it.

If you just want to test the front end, i.e. the user interface of CKAN, you don’t have to install it yourself. OKF is running the public open data portal datahub.io, where you can register and upload data. The portal is not only for testing purposes. For example, all the RDF data of the Linked Data cloud is registered there. There’s also a ‘pure’ demo site that gives you a first impression.

However, if you’re really considering CKAN to power your open data portal you should of course install it yourself.

Installation

Before you start to install and use CKAN, please have a look at its extensive and excellent documentation. We will only give some hints here to get you started.

If you have decided to give CKAN a try, you have two install options. If your machine runs 64-bit Ubuntu 12.04, you can install all the needed packages via apt-get. The package installation does all the necessary basic configuration for you, so it might be more convenient. However, this method is less flexible, so we would not recommend it. Moreover, if you want to develop your own CKAN extension (more on that later) or use another OS platform, you must use the second method and install CKAN from source. I didn’t find that too difficult and, again, if you have some experience as a developer it will be no real problem. In this scenario CKAN is installed via git and pip in a virtualenv, tools you will most probably already be familiar with. The application is then run with paster. Under the hood, CKAN uses the Python framework Pylons (which has since merged with another framework to become Pyramid; but that’s another story).

In addition to the core CKAN package, you will have to install and configure some packages that CKAN requires. Nothing fancy here, though. CKAN uses PostgreSQL as its database, and for searching and indexing it relies on Solr, which involves installing a Java JDK and a Java application server such as Jetty or Tomcat. It’s worth mentioning that you can run CKAN without Solr, but you’ll lose a lot of advanced search functionality, such as faceting. The same goes for the database: besides the almighty PostgreSQL you can also use the lightweight SQLite. This is quite handy for testing or development, but it is not recommended or supported for production installations.
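CKAN finds these services through its .ini configuration file: the database through the `sqlalchemy.url` option and Solr through `solr_url`, both in the `[app:main]` section. As a sketch, a few lines of Python can report which backends a given configuration points at. The sample values below are made up and the `backends` helper is hypothetical; note that CKAN ini files contain Paste-style `%(here)s` references, so interpolation is switched off.

```python
import configparser
from urllib.parse import urlparse

# Excerpt of a CKAN-style .ini file (illustrative values only)
SAMPLE_INI = """
[app:main]
use = egg:ckan
sqlalchemy.url = postgresql://ckan_default:secret@localhost/ckan_default
solr_url = http://127.0.0.1:8983/solr
cache_dir = %(here)s/data
"""


def backends(ini_text):
    """Report which database and search backends a CKAN config points at."""
    # interpolation=None leaves %(here)s references untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    app = parser["app:main"]
    db = urlparse(app["sqlalchemy.url"])
    return {
        "database": db.scheme,        # e.g. 'postgresql' or 'sqlite'
        "db_host": db.hostname,       # None for a file-based SQLite URL
        "solr": app.get("solr_url"),  # None if Solr is not configured
    }


print(backends(SAMPLE_INI))
```

Switching a development instance to SQLite would then amount to pointing `sqlalchemy.url` at an SQLAlchemy SQLite URL such as `sqlite:///ckan.db`.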

API

As mentioned before, we think it’s the API that makes CKAN really outstanding. “All of a CKAN website’s core functionality (everything you can do with the web interface and more) can be used by external code that calls the CKAN API”, as the documentation states. And that’s true. You can

  • get all sorts of lists for packages, groups or tags;

  • get a full metadata representation of any dataset or resource (which is the actual data or file);

  • do all the kinds of searches you can do with the web interface;

  • create, update and delete datasets, resources and other objects. I’m emphasizing this because it’s really a killer feature, which enables you to develop your very own application based on API calls to an external CKAN installation. It makes, for example, mobile apps possible. Or you can write plugins for your local CMS, journal system or whatever.

From a programmer’s perspective this is just great, great, great. And even with our focus, open research data management, it enables a lot more usage scenarios than a simple web portal with a closed, proprietary database would do.
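To give a flavour of these calls without a client library, here is a sketch against CKAN's version 3 Action API, where each action function is exposed at `/api/3/action/<name>` and write actions take a JSON body plus the user's API key in the `Authorization` header. The instance URL, API key and dataset name are placeholders, and `action_request` is a hypothetical helper that only builds the requests; you would send them with `urllib.request.urlopen`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

CKAN_URL = "http://ckan.example.org"  # hypothetical CKAN instance
API_KEY = "your-ckan-api-key"         # placeholder; found on your user page


def action_request(base_url, action, params=None, api_key=None, write=False):
    """Build a GET (read) or JSON POST (write) request for a CKAN action."""
    url = "%s/api/3/action/%s" % (base_url.rstrip("/"), action)
    headers = {}
    if api_key:
        headers["Authorization"] = api_key  # identifies the calling user
    if write:
        headers["Content-Type"] = "application/json"
        body = json.dumps(params or {}).encode("utf-8")
        return Request(url, data=body, headers=headers, method="POST")
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers=headers, method="GET")


# Read: full-text search over datasets
search = action_request(CKAN_URL, "package_search", {"q": "economics", "rows": 10})

# Write: create a dataset, then delete it again
create = action_request(CKAN_URL, "package_create",
                        {"name": "my-dataset", "title": "My dataset"},
                        api_key=API_KEY, write=True)
delete = action_request(CKAN_URL, "package_delete", {"id": "my-dataset"},
                        api_key=API_KEY, write=True)
print(search.full_url)
```

Sending `search` with `urlopen` and decoding the JSON response gives a dictionary whose `result` key holds the search results; the write requests follow the same pattern, which is what makes external applications, CMS plugins or mobile apps on top of CKAN straightforward.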

And in fact, the API is quite easy to use. There are client libraries for most common web programming languages (Python, Java, Perl, PHP, JavaScript, Ruby), so you don’t need to write the basic functions on your own. A very simple Python script like the one below is enough to upload a file to a CKAN instance:

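import ckanclient

# Connect to a CKAN instance; the API key is needed for write operations
ckan = ckanclient.CkanClient(
    api_key='<your_CKAN_api_key>',                      # replace with your own API key
    base_location='http://<url_of_CKAN_instance>/api')  # the instance's API root
# Upload a local file to the instance's file store
upmsg = ckan.upload_file('<your_local_filename>')
print(upmsg)  # this is not necessary ;-)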

For demonstration and testing purposes we’ve developed a small sample application. It was built with the Pyramid framework and can fully manage the datasets of a given group on an external CKAN instance. The screenshots below show the list of packages and a form for creating a new dataset. Since this instance is for development and evaluation purposes only it’s not public, but hopefully the screenshots will give you a first impression of what’s possible.

Custom application using the API: list of packages

Pyramid app: add_form

It’s worth mentioning that, of course, writing your own application around the CKAN API also lets you simply add features CKAN might not have. For instance, the little red X mark on the right side of each package (screenshot #1) deletes the package directly. That’s something CKAN’s UI does not offer by default.
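As a rough idea of what such a delete button might do behind the scenes, the sketch below prepares a POST request to CKAN’s package_delete action using the standard library. It assumes a CKAN 2.x instance that accepts the API key in an Authorization header; the function names, host and key are placeholders. As far as we know, package_delete in CKAN 2.x marks a dataset as deleted rather than purging it from the database.

```python
import json
from urllib.request import Request, urlopen


def build_delete_request(base_url, api_key, dataset_id):
    """Prepare (but do not send) a POST to CKAN's package_delete action."""
    url = "%s/api/3/action/package_delete" % base_url.rstrip("/")
    body = json.dumps({"id": dataset_id}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def delete_dataset(base_url, api_key, dataset_id):
    """Send the request and report CKAN's success flag (network call)."""
    with urlopen(build_delete_request(base_url, api_key, dataset_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))["success"]
```

Keeping request construction separate from sending makes the part that encodes CKAN’s conventions easy to check without a live instance.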

OK, you say, I get it, the API is great. But I don’t want to program an external application. I just want to stick with the original platform; I only need a different look and some more specialized functionality. So, is CKAN extensible?

Short answer: Yes.

Long Answer: Writing Extensions

Adding a custom theme or functionality is done with so-called extensions. CKAN extensions are ordinary Python packages containing one or more plugin classes. You can create the skeleton of a new one with paster from within your virtual environment:

paster create -t ckanext ckanext-mycustomextension

Note that you must use the prefix ckanext-; otherwise CKAN won’t detect and load your package. You then have to install it in development mode in the Python path of your virtual environment. That’s done the usual way with

cd <path_to_your_extension>

python setup.py develop

or, alternatively,

pip install -e <path_to_your_extension>

Please refer to the docs for a detailed description of how to write extensions. Basically, you use the so-called plugins toolkit (ckan.plugins.toolkit) and a whole bunch of interface classes that let you hook your own code into CKAN’s core functionality. You will most probably also need to override some Jinja templates, especially if you want to create a new look for your portal.

CKAN provides some basic example extensions that will quickly give you a rough understanding of how the plugin mechanism works. In addition, there are many (many!) CKAN extensions already available. You can browse them on GitHub.

So, what are the extensions we are developing for EDaWaX?

EDaWaX extensions and implementations

Basically, we are working on two extensions for EDaWaX at the moment. The first one, called ckanext-edawax, is mainly for the UI. It tweaks some templates and UI elements (logo, fonts, colors etc.). In addition, it removes elements we do not need at the moment, like ‘groups’ or facet fields, and it renames the default ‘Organizations’ to ‘Journals’, since journals are our only type of organization and we’d like to reflect this focus. We will also add new elements, like a proposed citation in the dataset view. These screenshots give you an idea of the prototype.

EDaWaX custom frontpage

EDaWaX datasets view

EDaWaX single dataset view
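Hiding facets and relabelling organizations, as ckanext-edawax does, is the kind of change CKAN 2.x supports through the IFacets plugin interface, whose dataset_facets(facets_dict, package_type) method receives an ordered mapping from facet field names to display labels and returns the mapping to use. The sketch below contains only the dictionary logic such a method could delegate to, so it runs without CKAN installed; the helper name and the sample default facets are assumptions, and renaming ‘Organizations’ elsewhere in the UI would still need template or translation changes.

```python
from collections import OrderedDict


def edawax_facets(facets_dict):
    """Hypothetical helper: drop unused facets and relabel organizations.

    Meant to be called from an IFacets.dataset_facets implementation;
    returns a new OrderedDict and leaves the input unchanged.
    """
    hidden = {"groups"}  # facets the portal does not use at the moment
    result = OrderedDict()
    for field, label in facets_dict.items():
        if field in hidden:
            continue
        # On this portal every organization is a journal.
        result[field] = "Journals" if field == "organization" else label
    return result
```

Working on a copy keeps the function free of side effects, which makes it trivial to test outside CKAN.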

Our second package, ckanext-dara, relates to metadata. CKAN offers only a rather general and limited set of metadata for datasets (like title, description, author) that does not reflect any common schema. You can, nevertheless, add arbitrary fields for each dataset via the web interface, but that’s not schema-based. CKAN’s approach here is to avoid extensive metadata forms, which might impair the usability of the portal, and not to specialise in certain types or categories of data, like, you name it, research data. Dedicated research data applications like Dataverse do have an advantage here: Dataverse’s metadata forms are based on the well-known and very extensive DDI schema. CKAN is not originally a research data management tool, and the lack of decent metadata schema support is one point where this hurts. However, this more general approach, together with the plugin infrastructure (a feature that Dataverse does not offer, AFAIK), enables us to customise the dataset forms, add specific (meta‑)data, and guarantee compatibility with a given schema. For EDaWaX this will be the da|ra schema, which is itself partially based on the well-known but less complex DataCite schema. The Germany-based da|ra is basically a DOI registration service for social science and economic data. Since we will automatically register DOIs with da|ra for the datasets in our CKAN portal, it makes perfect sense to use their schema (which we must do anyway when submitting our metadata).
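The arbitrary per-dataset fields mentioned above are what CKAN calls extras. In a package dictionary, as returned by package_show or passed to package_create, they appear as a list of key/value pairs, and nothing checks them against a schema. A minimal sketch with made-up field names, converting flat metadata into that shape and back:

```python
def to_extras(metadata):
    """Convert a flat dict into CKAN's list-of-pairs 'extras' format.

    Values are stringified on the assumption that extras are stored as text;
    keys are sorted only to make the output deterministic.
    """
    return [{"key": k, "value": str(v)} for k, v in sorted(metadata.items())]


def from_extras(extras):
    """Rebuild a flat dict from a package's 'extras' list."""
    return {item["key"]: item["value"] for item in extras}
```

The resulting list can be passed as the extras field of a package_create or package_update call. Note that any key is accepted here; enforcing a real schema such as da|ra needs validation of its own, which is where a dedicated extension comes in.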

ckanext-dara is the CKAN extension where all the metadata functionality as well as the DOI registration will be added. We also plan to publish this package as open source on GitHub. So far, development has concentrated on extending the standard CKAN dataset forms with da|ra-specific metadata. The problem here is the conflict between usability and the aspiration to collect as much metadata as possible. You know, we are working in a library. For librarians metadata is important. Very important. You could say that librarians think in metadata. We want every single detail of an object to be described in metadata, if possible in very cryptic metadata. Since metadata schemas are often (if not always) created by librarians, they tend to be rather exhaustive. The current da|ra schema has more than 100 elements, which is still a small set compared to DDI, with up to 846 elements. Now please imagine you’re a scientist or the editor of a scientific journal who is asked to upload research data to our CKAN-based platform. Would you like to see a form that asks for ~100 metadata elements? You certainly wouldn’t. Chances are good that you would immediately (and perhaps cursing) leave the site and forget about this open data thing for the next two decades or so.

To deal with this conflict we are following a twofold approach. First, we are dividing the da|ra metadata schema into three levels of necessity. Level 1 contains very basic metadata elements that are mandatory, like title, main author, or publication year. These level 1 fields correspond to the mandatory fields of the DataCite metadata schema. For this level we only need two or three new fields in addition to the ones already implemented in CKAN. Level 2 contains elements that are necessary (or perhaps better: desirable) for making the dataset, or special properties of it, searchable. And finally, level 3 covers the special metadata we need to ensure future reuse by integrating authority files. Integrating these authority files should make it possible to link persons to their affiliations, articles, research data, keywords or special fields of research.

Second, we will try to integrate these different levels of metadata as seamlessly as possible into CKAN’s UI. The general idea is to let users choose which metadata they would like to equip their data with. To achieve this we collapse the subforms for level 2 and level 3 in the dataset form with a little help from jQuery. The following screenshots give you an idea. Please note that this is an early stage of development and nothing’s finalised yet; level 3 is not implemented at all so far. However, you can see that the form for level 2 (as later the one for level 3) is collapsed by default, so the “quick’n’dirty” user won’t have to deal with it if she does not want to. We are still thinking about the motivation/information text and its presentation, though.

ckanext-dara addform

ckanext-dara addform extended
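The three-level split described above can be modelled very simply. In the sketch below the field names are hypothetical stand-ins, not actual da|ra element names; level 1 uses the three examples named above (title, main author, publication year). A form handler could use missing_level1 to reject incomplete submissions, while completed_levels reports how far a user went beyond the mandatory minimum.

```python
# Hypothetical grouping of da|ra-style fields into the three levels.
LEVELS = {
    1: ["title", "author", "publication_year"],  # mandatory
    2: ["keywords", "temporal_coverage", "geographic_coverage"],  # searchability
    3: ["author_authority_id", "affiliation_authority_id"],  # authority files
}


def missing_level1(dataset):
    """Return the mandatory (level 1) fields that are absent or empty."""
    return [field for field in LEVELS[1] if not dataset.get(field)]


def completed_levels(dataset):
    """List the levels whose fields are all filled in, in ascending order."""
    return [level for level, fields in sorted(LEVELS.items())
            if all(dataset.get(field) for field in fields)]
```

Keeping the level assignment in one table means the collapsed subforms and the validation can be driven from the same definition.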

Journal CMS add-on

In addition to building the open data portal itself, we are planning to develop an add-on for an existing E-Journal, using the CKAN API. This will be done for the CMS Plone, which is the base of the E-Journal ‘Economics’. It is mainly meant as a test case for the usability of the CKAN API in editorial offices. Editors of ‘Economics’ have some experience with Dataverse (and are not always happy with it), so we have a very good setting here. Generally, we consider integration into third-party systems to be very important for the acceptance of CKAN as a repository for publication-related research data. Users should not be bothered with having to use two (or even more) different systems for data and text. This approach also gives the closest possible integration of data and articles. Dataverse, for example, will develop such functionality for OJS (Open Journal Systems), hopefully within the next two years. CKAN has a kind of head start here due to its great API, but I think we need to popularise CKAN in this respect, so this package will be developed as a ‘proof of concept’.
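Since Plone is itself written in Python, such an add-on could query CKAN directly. As one hypothetical building block, the sketch below builds a package_search URL that restricts results to one organization (here, a journal) via a Solr filter query, and extracts the fields an editorial view might list. The organization name economics and the host are placeholders, and the assumed result shape is CKAN 2.x’s, with matching datasets under results.

```python
from urllib.parse import urlencode


def journal_datasets_url(base_url, journal_org, rows=20):
    """URL for a package_search limited to one organization (journal)."""
    query = urlencode({"fq": "organization:%s" % journal_org, "rows": rows})
    return "%s/api/3/action/package_search?%s" % (base_url.rstrip("/"), query)


def summarize(search_result):
    """Pick (name, title) pairs from a package_search result dictionary."""
    return [(pkg["name"], pkg.get("title", ""))
            for pkg in search_result["results"]]
```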

Conclusion

At least for our use case in the EDaWaX context, CKAN has so far proven to be the best available solution for an open research data portal. Due to its current focus on open government data, it still leaves some things to be desired when it comes to research data. CKAN is focused on data publishing, not data curation. This shows up clearly in its very basic support for metadata. But as we’ve shown, CKAN has two fantastic features – the API and the plugin mechanism – that facilitate the development of extensions and third-party apps for use cases in the field of research data. Development in this direction has already started (not only in our project), and it’s foreseeable that these efforts will be ramped up soon.

So, if publishing open research data is on your agenda, please consider using CKAN and give it a serious try. It’s worth it. If you have any questions, comments or criticism, please leave a note in the comments section. Please also feel free to write an email to h.bunke <at> zbw.eu if you have a more specific question.

Open Data in Bermuda: local developers create Bermuda.io

Mark Wainwright - September 25, 2013 in Deployments, News

Two developers based in Bermuda have launched a new online CKAN-based repository of Bermuda public open data, Bermuda.io.

Louis Galipeau and Andrew Simons (below) took key public documents which were not previously available online, and put them on the site, which they set up for the purpose. Previously, members of the public could consult the documents and data only by looking at hard copies at the Bermuda National Library and Bermuda Archives. With the launch of Bermuda.io, users can now freely view or download them anywhere. Where necessary, documents have been scanned to get an electronic copy.

[Photo: Galipeau and Simons]

Galipeau and Simons plan to publish a wide range of public data in both human and machine readable formats. They are currently compiling two decades’ worth of financial statements from all government-controlled organizations and public funds. The following have already been published, with records going back 20 years:

  • The annual report of the Auditor General
  • The “Budget Book” (Estimates of Revenue and Expenditure for the Year)
  • The audited financial statements of the Consolidated Fund of The Government of Bermuda
  • The Bermuda Digest of Statistics
  • The Census Report

The developers have started with these documents because they go to the heart of the operations of Government. The Budget Book details planned revenue and spending, while the audited financials of the Consolidated Fund show the actual figures. The Auditor General’s report provides an independent opinion on the government’s financial management. Finally, the Digest of Statistics and the Census report contain economic and demographic data which provide important context.

Andrew Simons says the documents will enable people to have more informed discussions and debates. “They answer questions like ‘What’s the revenue from lobster licences each season?’, ‘What school renovations are planned for the following year?’ and ‘How much does a firefighter earn?’” He hopes the site will foster wider civic engagement on the island – and adds that it would not have been possible without CKAN.

The developers welcome feedback and suggestions on the site. Anyone interested can follow the project blog or Twitter account.