
CKAN Vision Update

  • James Gardner
  • 20 Apr 2021
For the last 6 months the CKAN team has been expanding and growing, focusing on catalog interoperability and metadata revisioning/moderation technology amongst other features. At the excellent OKCon in Berlin we really crystallised our mission. Below I'm going to set out our plans and roadmap for the next 6 months. Key things we took from the conference:
  • CKAN is now firmly part of people's thinking about open data
  • This space is rapidly evolving and demand is growing for tools like CKAN
  • We can't take it for granted that everyone will see that it's natural -- and necessary -- to build an open data platform using open tools and software
At the Open Knowledge Foundation we firmly believe that the goals of open data can only be fully realised if the tools and infrastructure used to collect, clean up, publish, revise, improve and understand data are free to share and improve on too. CKAN, our Data Hub software, is already a key tool but we can do much more to make it attractive and useful to both institutions and ordinary data wranglers. We also know that we want to develop our collaboration with other organisations working in the software and open knowledge space, in particular potential partners who could deliver CKAN as part of their offering (please get in touch at info@ckan.org if that is you!). To help support this goal we also want to see a shift of focus in the CKAN team over the next 6 months to deliver value to two different sets of people in particular:
  • Governments, councils, universities and any other organisations that want to run their own data catalog
  • End users who either want to get hold of some key data or who otherwise want to engage in a data community interpreting data
At the moment, having a single CKAN has meant we haven't been focusing enough on either group. We recognise that the site that existed at ckan.net didn't have the polish it needed in either its user interface or its social activity features, and that the CKAN software we deploy to countries and cities doesn't integrate as well as we'd like with the infrastructure those countries and cities already have. We're going to address these issues by creating one flavour of CKAN specifically for governments, universities, cities etc. (the proposal is to call these portal instances) and one CKAN instance designed to cater specifically to data users (this will be called the data hub). In summary, our plan going forward is set out below.

CKAN Portals

The reality is that the portal owners we already work with (from governments, cities, etc.) very rarely install CKAN standalone. More often, CKAN is integrated into their existing WordPress, Drupal, Java or other content management system infrastructure. CKAN has good facilities for integration with such systems but we know we can do even better.
One of the pressures on the current CKAN team is that different clients usually require some degree of customisation work, and often that work is specific to the business practices of those clients. As a result the code we produce for them doesn't necessarily benefit the wider CKAN community. The more time we spend building client-specific customisations, the less time we have to put together the free software platform we'd like to see the world's open data built on. There is an opportunity for the CKAN team here too, though. If we can provide a fantastic integration API for the core products we see portal owners using, we can work with the commercial partners who deliver those portals to help them integrate CKAN. The commercial partners could then deal with the client-specific customisations they are expert in, and the CKAN team could concentrate on delivering the core features that would benefit all those partners.
The most well-known instance of CKAN is <http://data.gov.uk/>, which already runs on a Drupal front-end. The Cabinet Office in the UK have agreed to open source the Drupal code under a BSD license, so we'll start by building top-class integration with the current data.gov.uk Drupal codebase and then aim to support both Drupal 6 and 7 over the next couple of months. You can see the Drupal CKAN page here. Once we've demonstrated that Drupal and CKAN can integrate well, we plan to go further and adopt Drupal as the official front-end for CKAN portals. After all, there's no point in the CKAN team duplicating coding work building a front-end in Python if we already have a great one in Drupal.
There are other communities in Spain and Germany who are also interested in building a Drupal-CKAN portal and who may be able to contribute to the effort, and there are still others who have their own Drupal data tools which we may be able to build on too, so this seems a win-win situation. After all, CKAN's aim is to ensure that open data is set free, not to ensure that a Python-only solution is used for open data catalogs. The knowledge we gain from formal Drupal integration will also help us integrate with other CMSs so that CKAN can become the standard catalog plugin for anyone wanting an open data system. Having CKAN used as part of many existing systems is one thing, but the real value comes when you can link them up. That's where the hub comes in, so let's look at that next.

Linking up data in The Data Hub

Users of CKAN have always been confused between ckan.net (the data website) and ckan.org (the software project site). The CKAN team have always wanted ckan.net to be the central hub for all open data, but if users get confused by the name we won't do that very well. To begin to address the problem I'm pleased to confirm that, after discussion on the CKAN discuss list, the ckan.net site has been renamed to The Data Hub and can now be accessed at thedatahub.org. As part of the change you can also expect a vastly improved user experience over the next 6 months, now that we can focus thedatahub.org specifically on end users and not on our portal owners.
Having an improved name and user interface is one thing, but the true power of the CKAN software will only be realised if each CKAN instance can be connected up with others. We already deliver interoperability in a number of ways, most notably through CKAN's harvesting functionality. We want to see a world where larger CKAN portals integrate data from smaller child portals and in turn thedatahub.org pulls in data from all CKANs, so that end users can find or submit data easily without needing to know about the existence of the original portals.
We plan to implement functionality that allows any comments, notes, improvements or clean-ups made to a dataset on any CKAN instance to be federated to all CKAN instances with a copy of that dataset, so that we build a truly distributed network of people working on data. This in turn would mean that portal owners who choose to use CKAN as a plugin to their platform will automatically get access to an entire community of data curators, wranglers, analysts and users who will collaboratively improve the data the individual portals host.

Example

As an example, consider this case. CKAN currently runs the Manchester data catalog and the UK data portal and may soon be running a Welsh catalog too. You can imagine a world where users in Wales enter data on wales.data.gov.uk according to the organisational structure of the Wales Assembly Government, users in Manchester enter data on datagm.org.uk according to the structures of the Manchester council and everyone else enters data directly on data.gov.uk according to the departmental structure of the UK government. The data.gov.uk portal would then "pull in" datasets from both Manchester and Wales so that the data didn't need entering twice. On the data.gov.uk portal people would be able to find data for the whole of the UK, even though some of it came from Wales and some from Manchester.
CKAN also hosts a site called publicdata.eu where we pull in datasets from all over Europe, including data.gov.uk. You can imagine that publicdata.eu could then itself be pulled into thedatahub.org so that we have a complete hierarchy of datasets all being pulled into each other.
Now imagine what would happen if a Welsh user left a comment in the Welsh portal. That comment could also be sent to data.gov.uk, publicdata.eu and thedatahub.org (since all of them hold a copy of the original record). The data.gov.uk portal and thedatahub.org may be keen to publish Welsh comments and include a copy on their own page for the dataset. The publicdata.eu site might decide that its target users aren't interested in comments and so would ignore the notification. It isn't just the Welsh portal that could generate comments though. Someone could leave a comment on thedatahub.org and that would be sent through the network as well. And of course it isn't just comments: these notifications could be sent for lots of different kinds of activity.
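To make the flow above a bit more concrete, here is a minimal sketch in Python of how such a network might behave. It is a toy model only, not CKAN code: the Portal class, its method names and the dataset identifiers are all hypothetical and invented for illustration, and it assumes every portal simply forwards a notification to its neighbours whether or not it chooses to publish the comment itself.

```python
class Portal:
    """A hypothetical catalog node in the network (not a real CKAN class)."""

    def __init__(self, name, publishes_comments=True):
        self.name = name
        self.publishes_comments = publishes_comments
        self.sources = []    # portals this one pulls datasets from
        self.consumers = []  # portals that pull datasets from this one
        self.datasets = {}   # dataset id -> name of the portal it was entered on
        self.comments = {}   # dataset id -> comments published on this portal

    def pull_from(self, source):
        """Declare that this portal pulls in datasets from `source`."""
        self.sources.append(source)
        source.consumers.append(self)

    def add_dataset(self, dataset_id):
        """Enter a dataset directly on this portal."""
        self.datasets[dataset_id] = self.name

    def harvest(self):
        """Copy datasets from every source, letting each source harvest first."""
        for source in self.sources:
            source.harvest()
            for dataset_id, origin in source.datasets.items():
                self.datasets.setdefault(dataset_id, origin)

    def comment(self, dataset_id, text):
        """Leave a comment here and notify every portal holding a copy."""
        self._notify(dataset_id, text, seen=set())

    def _notify(self, dataset_id, text, seen):
        # Stop at portals already notified or without a copy of the dataset.
        if self.name in seen or dataset_id not in self.datasets:
            return
        seen.add(self.name)
        # Each portal decides for itself whether to publish the comment...
        if self.publishes_comments:
            self.comments.setdefault(dataset_id, []).append(text)
        # ...but passes the notification on either way (an assumption).
        for neighbour in self.sources + self.consumers:
            neighbour._notify(dataset_id, text, seen)


wales = Portal("wales.data.gov.uk")
manchester = Portal("datagm.org.uk")
uk = Portal("data.gov.uk")
europe = Portal("publicdata.eu", publishes_comments=False)
hub = Portal("thedatahub.org")

uk.pull_from(wales)
uk.pull_from(manchester)
europe.pull_from(uk)
hub.pull_from(europe)

# Made-up dataset identifiers, entered once on the portal they belong to.
wales.add_dataset("welsh-schools")
manchester.add_dataset("manchester-buses")
uk.add_dataset("uk-spending")

hub.harvest()  # pulls everything up through the hierarchy

wales.comment("welsh-schools", "Is the 2010 data available?")
hub.comment("manchester-buses", "Route numbers need cleaning up.")
```

In this sketch the Welsh comment ends up published on the Welsh portal, data.gov.uk and thedatahub.org, while publicdata.eu passes it along without publishing it. The comment left on thedatahub.org travels the other way and reaches datagm.org.uk, but never the Welsh portal, which holds no copy of that dataset.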

In Conclusion

What we therefore want to build is a powerful, distributed set of nodes that link dataset metadata together (much like the current web does for HTML documents) so that all users can share in, and contribute to, a free world of open data. I hope that is something you are as excited to be involved in as I am.