Some new CKAN data portals

February 14, 2013 in Deployments

As the benefits of Open Data are recognised more and more widely, more data portals are appearing. Here are some recent new platforms with hot fresh data baked in a CKAN oven, as well as a couple that are coming soon. Only about half the portals mentioned here have had direct involvement by the Open Knowledge Foundation, a striking testament to CKAN’s open-source licence.

For a fuller list of CKAN data catalogues around the world, see this page.

Greek Open Data hub

The newest CKAN site is part of the Greek Open Data hub, which was announced today by the Open Knowledge Foundation’s Greek chapter. It includes a CKAN data repository as well as apps and info about the Greek Linked Open Data cloud.

Uruguay

Uruguay’s national open data portal catalogodatos.gub.uy was launched on 5 December. National and local government agencies can add datasets to the catalogue.

Aragón, Spain

Aragón Open Data provides data held by the Spanish region of Aragón. It is published by the Aragón government and went live this month.

Queensland, Australia

The state of Queensland launched the first governmental CKAN portal in Australia, Queensland Government data, in December 2012, with the help of the Open Knowledge Foundation. As the state premier said at the time, “The data held by the Queensland Government is the property of the people of Queensland. Therefore, where it is suitable for release, it will be released.”

publicdata.eu

The recent upgrade to publicdata.eu means that it is currently the only live site running the codebase for CKAN’s forthcoming version 2.0. Read more about the upgrade here.

Coming soon

Many other data publishers are looking at how to implement open data platforms. Here are a couple that have made announcements recently.

USA

As we noted in another post, the US government recently announced the decision to move its Open Data portal data.gov to run CKAN. This will for the first time combine general data in a single catalogue with the geodata previously catalogued at geo.data.gov. The Open Knowledge Foundation is working with the US government on the transition, including building new harvesting capabilities for geodata.

Germany

The German government has announced its intention to launch a national governmental data portal at govdata.de. The portal is being built by Fraunhofer Fokus, whose detailed description of the architecture of the site shows how CKAN will sit at the centre of the portal.

Check out the new-look publicdata.eu

February 5, 2013 in Deployments, Feature, News, Releases

A beta version of CKAN 2.0 will be released soon, but in the meantime a sneak preview has come online with a major upgrade of publicdata.eu. The site, run by the Open Knowledge Foundation as part of the LOD2 project, collects records together in one place from a number of data catalogues throughout the European Union. It is the first production site up and running the code that will become CKAN 2.0.

Personalisation

A significant addition in CKAN 2.0 is its personalisation features, which allow you to ‘follow’ particular datasets, groups, and users. For example, logged-in users see a ‘Follow’ button when viewing a dataset. A dashboard then shows the activity and updates you’re interested in, as shown. Details of how the personalisation features work are likely to change, and feedback is welcome while we’re working on them.

[IMG: dashboard on publicdata.eu]

Linked data

Ivan Ermilov of the University of Leipzig, another LOD2 partner, has implemented a CSV to RDF converter that is integrated with publicdata.eu. This means that many datasets now have data available in RDF, the native format of Linked Data, in line with the aims of the LOD2 project. With this addition, the site not only collects data from different European catalogues so that they can be searched together, but also enhances them with RDF – opening up interesting possibilities for harvesting.
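
To give a feel for what a CSV to RDF conversion involves, here is a toy sketch in Python. It is not the converter integrated with publicdata.eu: the example.org namespace, the one-subject-per-row mapping and the sample data are all assumptions made for illustration, and every cell is emitted as a plain string literal in N-Triples.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration only; not one used by publicdata.eu.
BASE = "http://example.org/"


def csv_to_ntriples(csv_text, base=BASE):
    """Map each CSV row to one subject and each cell to one triple."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}row/{row_number}>"
        for column, value in row.items():
            predicate = f"<{base}column/{quote(column)}>"
            # Escape backslashes, quotes and newlines so the literal
            # remains valid N-Triples.
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Made-up sample data, keyed by country code.
sample = "country,value\nDE,10\nGB,20\n"
triples = csv_to_ntriples(sample)
for triple in triples:
    print(triple)
```

A real converter would also choose meaningful vocabularies and datatypes for each column; this sketch only shows the row-and-column mapping.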

CKAN’s metadata records are themselves available in RDF and on publicdata.eu this is now exposed on the user interface, where each dataset page has buttons offering its metadata in RDF and JSON.
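
The same metadata can be requested by a script as well as through the buttons. The sketch below only builds the two URLs and does not contact the site. It assumes CKAN's usual conventions (the package_show action for JSON and a .rdf suffix on the dataset page for RDF); the exact paths served by publicdata.eu may differ, and the dataset name is a hypothetical placeholder.

```python
from urllib.parse import quote, urlencode


def metadata_urls(site_url, dataset_name):
    """Build the assumed JSON and RDF metadata URLs for one dataset."""
    site = site_url.rstrip("/")
    return {
        # Action API call returning the dataset's metadata as JSON.
        "json": f"{site}/api/3/action/package_show?{urlencode({'id': dataset_name})}",
        # RDF serialisation of the same record.
        "rdf": f"{site}/dataset/{quote(dataset_name)}.rdf",
    }


# "example-dataset" is a placeholder, not a real record on the site.
urls = metadata_urls("http://publicdata.eu/", "example-dataset")
print(urls["json"])
print(urls["rdf"])
```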

Interface

A major change under the surface is that in CKAN 2.0, the ‘templating’ system has changed, making it much easier to develop custom themes for CKAN-powered sites. publicdata.eu uses a brand-new theme using the new templating. Users might also notice improvements to the UI, with better signposting around the site setting the current page in context, as on the dataset page shown below. The biggest UI change in CKAN 2.0 is to the dataset creation process. Though this is not visible at publicdata.eu, curious readers may try it out at demo.ckan.org.

Each dataset page includes a link to the source from which it was harvested, and there are buttons for sharing the dataset on Google+, Facebook and Twitter.

[IMG: dataset page]

Upgrading the harvesters

Work is in progress on improving publicdata.eu’s harvesting system. This will be completed soon, enabling the harvesting to be brought up-to-date. Please bear with us while this happens and watch this space for more information!

US government’s data.gov to use CKAN

February 4, 2013 in Deployments, News

This post is cross-posted from the main Open Knowledge Foundation blog.

You may have seen hints of it before, but the US government data portal, data.gov, has just announced officially that its next iteration – “data.gov 2.0” – will incorporate CKAN, the open-source data management system whose development is led and co-ordinated by the Open Knowledge Foundation. The OKF itself is one of the organisations helping to implement the upgrade.

Like all governments, the US collects vast amounts of data in the course of its work. Because of its commitment to Open Data, tens of thousands of datasets are openly published through data.gov. The new-look data.gov will be a major enhancement, and will for the first time bring together geospatial data with other kinds of data in one place.

CKAN is fast becoming an industry standard, and the US will become the latest to benefit from its powerful user interface for searching and browsing, rich metadata support, harvesting systems to help ingest data from existing government IT systems, and machine interface, helping developers to find and re-use the data. The partnership is also excellent news for CKAN, which is being improved with enhancements to its features for ingesting and handling geodata.

As it happens, CKAN itself is also moving towards a version 2.0. In fact, after months of hard work, the beta-version of CKAN 2.0 will hopefully be released in a couple of weeks. To keep up to date with developments, follow the CKAN blog or follow @CKANproject on Twitter.

Workshop: CKAN for Research Data Management

January 7, 2013 in Deployments, Events

A workshop in London in February will bring together people who are using or considering CKAN for managing research data. Anyone interested should see the announcement and get in touch with the organiser, Joss Winn.

Although most CKAN installations are for finding government and other official data, its value is increasingly being recognised for other kinds of data – including academic research data. It was recently adopted by Bristol University’s data.bris, which will use it to harvest metadata from existing systems, and provide a searchable interface and fixed URLs for datasets.

The Orbital Project at Lincoln University decided last year to adopt CKAN. The Kaptur project on visual arts research data, in partnership with four higher education institutions, is one of the latest to be evaluating CKAN.

Join CKAN team for Open Government Platform (OGPL) webinar, 19-20th December

December 19, 2012 in Events, News

The CKAN team will be joining a webinar on the Open Government Platform (OGPL), an open source platform for open data and open government.

Jeanne Holm, Evangelist for Data.gov, writes:

We’ve been working hard on the Open Government Platform (OGPL), an open source capability for open data and open government around the world. This has been an active collaboration with the National Informatics Centre of the Government of India and the US Government Data.gov team. The decision to move to an open source platform has been both challenging and rewarding.

As with any open source capability, the code is only as strong as the community around it. We are getting close to releasing the first complete package of OGPL and would like to get your ideas, feedback, and commits before we proceed. To help with this, we will be holding two information webinars this week (Wednesday and Thursday, December 19 and 20, 2012), and have updated the code and documentation on Github.

We’re delighted that CKAN will be part of the OGPL (watch this space for further details!). An agenda and further details for how to join the calls are available here.

CKAN in EC Open Source speech

December 19, 2012 in Events, News

Neelie Kroes, European Commissioner for Digital Agenda, proudly namechecked CKAN in a video address to an Open Source Conference in Amsterdam last week. Referring to the Open Data Portal being developed for the European Commission, she said:

We are building a portal for open data – so citizens can get a wealth of Commission data in one place, easy to find, easy to search, and easy to use and re-use. [...] Not only that, but our portal will be based entirely on open source solutions. It uses the CKAN system, built in Europe, that many other governments are also using: including the UK and Australia, and now under consideration by the US and Canada.

As Kroes mentions, the US and Canada have been considering CKAN for some time. Excitingly, both have decided in its favour, and have CKAN data portals due to launch in the spring.

We are hiring!

November 15, 2012 in Jobs

If you’re a Python web-developer who’d like to help build exciting open data projects then we’d love to hear from you.

Python Developers – CKAN/Services team

Role Description

The OKFN is recruiting junior and senior Python web developers, primarily to work on CKAN, our open source, open data catalogue. There will also be further responsibilities to develop data visualisation and other data-driven services.

CKAN is an open source web-based product written in Python. It allows users to submit, search for and find open data. As well as powering thedatahub.org, CKAN is the catalogue behind the UK government’s data.gov.uk website. It also powers over 30 other catalogues around the world. If you are an outstanding Python web developer, with a keen interest in open data and enjoy working on open-source products, we’d love to hear from you.

You’d be involved in:

  • customising CKAN for different governments and organisations;
  • helping develop new features for CKAN;
  • supporting existing software deployments;
  • working on other Python programming projects in the OKFN.

Note: We are interested in hearing from people who are available both full and part time.

Requirements

Essential:

  • Web app development experience in Python;
  • PostgreSQL;
  • Linux (Ubuntu/Redhat);
  • Git;
  • Enthusiasm about open data and open knowledge.

Bonus points for any of these (not essential though):

  • Solr;
  • Knowledge of the CKAN codebase and extensions;
  • Project management/consultancy experience;
  • Semantic web/RDF;
  • jQuery and CSS;
  • S3 and EC2;
  • Experience with Agile methods;
  • Knowledge of the geo-spatial community.

About the Open Knowledge Foundation

The Open Knowledge Foundation (OKFN) is a multi-award winning community-based, not-for-profit organisation. The Foundation now has projects and partnerships throughout the world and is especially active in Europe. We build tools and communities to create, use and share open knowledge – content and data that everyone can use, share and build on. We believe that by creating an open knowledge commons and developing tools and communities around this we can make a significant contribution to improving governance, research and the economy.

We’re changing the world by promoting a global shift towards more open ways of working in government, arts, sciences and much more. We don’t just talk about ideas, we deliver extraordinary software, events and publications.

How to apply

If you’d like to apply, please email jobs@okfn.org with the subject line “Python Developer – CKAN/Services” and a copy of your CV by Friday 23rd November and we’ll take it from there. We are flexible on employee versus contractor but we normally contract. We are also flexible on full or part time. Being based in and around London would be a plus but we are happy to consider applications from elsewhere.

 

Introducing the new Datastore

October 26, 2012 in Feature, News

CKAN’s new Datastore has been in development for a while. It’s now finally been released with the latest version of CKAN, version 1.8. We’re excited about the Datastore and we hope you will be, too. Let’s have a look at what it offers to both data publishers and developers.

Overview

A CKAN instance can act as a registry for data, storing and serving metadata like title, URL, publisher, etc. It can also store the data, using the Filestore and Datastore. While the Filestore stores entire files, the Datastore provides a database for structured storage of data together with a powerful API that allows easy create, read, update and delete operations.

A large data source is of limited use if not in structured form (such as a table); any application that uses the data needs it to have structure. The Datastore preserves the structure in your data and puts it behind a database, enabling queries and updates in situ. An application can query the data without needing to download the whole data file first – especially useful where the file is very large. One application that uses the Datastore is CKAN’s built-in previewer using Recline, which plots graphs and maps of tabular data.

For publishers

[IMG: a preview of data from the DataStore in Recline]

The new version of the DataStore uses a full database for your data, unlike the old version which simply indexed it in a search engine. This means there is now a much more powerful machine interface (API) to the data (see below). For example, you can choose to update or add data points individually, rather than re-uploading an entire file.
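
As a rough illustration of updating individual data points, the sketch below builds the JSON body for such a request without sending it. It assumes the datastore_upsert action with resource_id, records and method fields, as described in CKAN's Datastore API documentation; check the docs for your CKAN version before relying on it. The resource id and column names are hypothetical placeholders.

```python
import json

ALLOWED_METHODS = {"upsert", "insert", "update"}


def build_upsert_body(resource_id, records, method="upsert"):
    """Return the JSON body for an assumed datastore_upsert call."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return json.dumps(
        {"resource_id": resource_id, "records": records, "method": method},
        sort_keys=True,
    )


# Change a single row instead of re-uploading the whole file.
# "RESOURCE_ID", "country_code" and "amount" are placeholders.
body = build_upsert_body(
    "RESOURCE_ID",
    [{"country_code": "GB", "amount": 125.0}],
)
print(body)
```

Matching an existing row requires the table to have a unique key to compare against, so the columns used here would need to be set up accordingly when the resource is first created.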

An important improvement in searching is that queries can connect different resources together – greatly extending the possibilities for using your data, especially if you have used standard names (or Linked Data URIs) to identify the subjects of your data. For example, suppose you have some data indexed by country, using standard ISO country codes (GB for Great Britain, DE for Germany and so on). If there is another dataset also broken down by country code, we can easily combine the two and create a dataset that shows country-level patterns across the two data sources. The result will update itself automatically when one of the source resources changes.
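
The country-code example can be sketched with a small, self-contained join. This uses Python's built-in sqlite3 module as a stand-in for the Datastore's PostgreSQL backend, and the table names, column names and figures are invented for illustration.

```python
import sqlite3

# In-memory database standing in for two Datastore resources.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE spending (country_code TEXT, amount REAL);
    CREATE TABLE population (country_code TEXT, people INTEGER);
    INSERT INTO spending VALUES ('GB', 120.0), ('DE', 200.0);
    INSERT INTO population VALUES ('GB', 60), ('DE', 80);
    """
)

# Combine the two sources on their shared ISO country code.
rows = conn.execute(
    """
    SELECT s.country_code, s.amount / p.people AS amount_per_person
    FROM spending AS s
    JOIN population AS p ON p.country_code = s.country_code
    ORDER BY s.country_code
    """
).fetchall()

print(rows)
```

Because the join is evaluated at query time, rerunning it after either table changes gives the updated result.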

In summary, the new DataStore helps ensure your data can be used easily in as many ways and by as many people as possible – including you.

For developers / data hackers

[IMG: the Datastore API returns JSON]

Let’s have a quick look at how you might use the Datastore. The API is described fully in the docs. While its create and update options are good news for publishers, developers will be most interested in the search and query capabilities. The new version of the Datastore uses an underlying PostgreSQL database, which enabled us to build a very powerful API. As described in the documentation, there are three different API endpoints for searching. We’ll look at each in turn.

SQL endpoint

The most powerful API endpoint is the datastore_search_sql endpoint. This allows arbitrary SQL queries on the Datastore. Results are still returned as JSON, which you can easily use in your application. We tried to make the wrapper around the database as thin as possible; as a result, the SQL endpoint gives you full control over your query. You can filter and aggregate data from a resource, or combine it with other resources using joins, as in the example above. Joining and SQL search are the most powerful features of the Datastore for developers who want to work with the data, and we hope people will come up with many uses for them.
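
A request to the SQL endpoint might be put together as follows. This is a sketch rather than a definitive recipe: it assumes the action is reachable under /api/3/action/ and takes the query in an sql parameter, and that each resource is exposed as a table named after its resource id (hence the double quotes). RESOURCE_ID_A, RESOURCE_ID_B and the column names are placeholders. The code only builds the URL; it does not call the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sql_search_url(site_url, sql):
    """Build a GET URL for an assumed datastore_search_sql call."""
    site = site_url.rstrip("/")
    return f"{site}/api/3/action/datastore_search_sql?{urlencode({'sql': sql})}"


# Join two resources on a shared country code column.
query = (
    'SELECT a.country_code, a.amount, b.people '
    'FROM "RESOURCE_ID_A" AS a '
    'JOIN "RESOURCE_ID_B" AS b ON a.country_code = b.country_code'
)

url = sql_search_url("http://demo.ckan.org", query)
print(url)

# Decoding the query string gives back the original SQL unchanged.
decoded = parse_qs(urlsplit(url).query)["sql"][0]
```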

HTSQL endpoint

As an alternative to using pure SQL, there is an endpoint for HTSQL. HTSQL is an easy to use, SQL-like language which you can use directly in a browser’s location bar. At present the HTSQL endpoint does not allow joining resources, but we are working on a way to do that as well.

Search endpoint

If you don’t need the full power of the SQL (or HTSQL) endpoint, you can use the datastore_search endpoint. It allows exact matching of certain fields via filters, or searching via the query parameter. The query parameter allows you to use PostgreSQL’s full text search, which lets you search across all fields of a resource and returns a ranking of the results. To use full PostgreSQL text search, set the plain parameter to False. This enables queries as described in the PostgreSQL docs. Again, we tried to make the wrapper as thin as we could to give you as much control as possible.
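
Here is a hedged sketch of a datastore_search request combining an exact-match filter with a full text query. It assumes the action accepts a JSON body with resource_id, filters, q and plain fields and lives under /api/3/action/; the resource id and the country_code column are placeholders. The request object is built but not sent.

```python
import json
import urllib.request


def build_search_request(site_url, resource_id, q=None, filters=None, plain=True):
    """Build (but do not send) a POST request for datastore_search."""
    payload = {"resource_id": resource_id}
    if filters:
        # Exact matching on the named fields.
        payload["filters"] = filters
    if q is not None:
        payload["q"] = q
        # plain=False asks for full PostgreSQL text search syntax.
        payload["plain"] = plain
    return urllib.request.Request(
        f"{site_url.rstrip('/')}/api/3/action/datastore_search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


request = build_search_request(
    "http://demo.ckan.org",
    "RESOURCE_ID",
    q="health & education",  # PostgreSQL text search syntax: both terms required
    filters={"country_code": "GB"},
    plain=False,
)
print(request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would actually send it.
```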

Interested? Check out the documentation on how to get started. And why not use the DataStore API and data from the DataHub at your next hack day? The DataHub is a public CKAN instance – create a group there for your event and upload data in advance.

As usual, please write to the dev list with any questions, and we’ll be happy to help.

CKAN 1.8 released

October 22, 2012 in Feature, Releases

The CKAN team is pleased to announce the release of its new major version, CKAN 1.8. Thanks to all the team and external contributors who have made it possible!

This release includes new features like the ability to follow users and datasets, and visualize their activity on a dashboard, and a completely revamped version of the Datastore. The new version of the Datastore is powered by PostgreSQL, which avoids having to install extra dependencies and allows us to support more powerful features, like a full SQL query interface. Users of the previous version of the Datastore based on ElasticSearch can contact the mailing list for details on how to migrate their data to the new version.

Apart from new features, we are working hard on internal refactorings to make the CKAN code base more maintainable and provide a more solid integration for extensions, making it easier to build them and upgrade them between CKAN versions. Some of these changes have been included in this release: have a look at the “API changes and deprecation” section in the CHANGELOG to see if you need to update any existing extensions.

We have also made available version 1.7.2, which contains important bug fixes for those users that want to stay on the CKAN 1.7 line.

Please refer to the documentation for instructions on how to install or upgrade CKAN, either via packages or from source, and feel free to contact the mailing list or the IRC channel if you have any questions.

v1.8 2012-10-19

Note: This version requires a requirements upgrade on source installations

Note: This version requires a database upgrade

Note: This version does not require a Solr schema upgrade

Major

  • New ‘follow’ feature that allows logged in users to follow other users or datasets (#2304)
  • New user dashboard that shows an activity stream of all the datasets and users you are following. Thanks to Sven R. Kunze for his work on this (#2305)
  • New version of the Datastore. It has been completely rewritten to use PostgreSQL as backend, it is more stable and fast and supports SQL queries (#2733)
  • Clean up and simplifying of CKAN’s dependencies and source install instructions. Ubuntu 12.04 is now supported for source installs (#2428, #2592)
  • Big speed improvements when indexing datasets (#2788)
  • New action API reference docs, which individually document each function and its arguments and return values (#2345)
  • Updated translations, added Japanese and Korean translations

Minor

  • Add source install upgrade docs (#2757)
  • Mark more strings for translation (#2770)
  • Allow sort ordering of dataset listings on group pages (#2842)
  • Reenable simple search option (#2844)
  • Accessibility enhancements on templates

Bug fixes

  • Fix for relative url being used when doing file upload to local storage
  • Various fixes on IGroupForm (#2750)
  • Fix group dataset sort (#2722)
  • Fix adding existing datasets to organizations (#2843)
  • Fix 500 error in related controller (#2856)
  • Fix for non-open licenses appearing open
  • Editing organization removes all datasets (#2845)

API changes and deprecation

  • Template helper functions are now restricted by default. By default only those helper functions listed in lib.helpers.allowed_functions are available to templates. The full functions can still be made available by setting ckan.restrict_template_vars = false in your ini file. Only restricted functions will be allowed in future versions of CKAN.
  • Deprecated functions related to the old faceting data structure have been removed: helpers.py:facet_items(), facets.html:facet_sidebar(), facets.html:facet_list_items(). Internal use of the old facets datastructure (attached to the context, c.facets) has been superseded by use of the improved facet data structure, c.search_facets. The old data structure is still available on c.facets, but is deprecated, and will be removed in future versions. (#2313)
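
As a small illustration of the ckan.restrict_template_vars setting mentioned above, the sketch below parses an ini fragment and reads the option. It uses Python's standard configparser as a stand-in for CKAN's own config loading, and assumes the option sits in the [app:main] section of the ini file; the fallback of True mirrors the restricted-by-default behaviour described in the notes.

```python
import configparser

# Fragment of a site ini file; the [app:main] section name is an assumption.
INI_TEXT = """
[app:main]
ckan.restrict_template_vars = false
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# If the option is missing, fall back to True (restricted by default).
restricted = parser.getboolean(
    "app:main", "ckan.restrict_template_vars", fallback=True
)
print("helpers restricted:", restricted)
```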

v1.7.2 2012-10-19

Minor

  • Documentation enhancements regarding file uploads

Bug fixes

  • Fixes for licences i18n
  • Remove sensitive data from user dict (#2784)
  • Fix bug in feeds controller (#2869)
  • Show dataset author and maintainer names even if they have no emails
  • Fix URLs for some Amazon buckets
  • Other minor fixes

News from the CKAN team 8 Oct 2012

October 9, 2012 in Feature, News, Releases

There have been some changes of plan and a lot of work behind the scenes since the last round-up of CKAN news back in July. Major areas of work include the new Datastore and significant overhauls to CKAN’s web user interface. The release of CKAN 1.8 has been delayed, in order that it can include the new Datastore. Some details are below.

New Datastore

CKAN’s Datastore provides structured storage and querying of data, and underpins CKAN’s previews as well as other custom-made applications to access and process data. In current releases of CKAN, the Datastore is built on elasticsearch, an open-source search engine which was quick to build and is good for providing full-text search of a data resource. However, elasticsearch is limited in what it can do, and we have wanted for a while to replace it with a full database-backed system. We’ve now had a chance to spend some time on this – David has built a new Datastore based on PostgreSQL, and Dominik has been adding improvements and working on the migrations.

The new Datastore will have a full SQL search API, enabling more complicated queries, including queries that join data across different data tables. (We’ve also found some reliability problems with elasticsearch which it will hopefully solve.)

Version 2.0

The new Datastore is one of several major changes in progress to the code base. As a result work has started on a code branch for version 2.0 to reflect the fact that it will be a significant change. The changes include the new Jinja templating system (and a resulting change to all the default templates), which will not cause much visible change for users, but will make life much easier for anyone wanting to run their own customised version of CKAN. Legacy support for the old templates will be included, so that instances with existing customisations can keep them until they switch to the new system.

New UI

CKAN’s default user interface hasn’t changed significantly in 5 years and badly needed attention. That changed dramatically for the better recently when the team gained a front end developer – first Aron and now JohnM – each of whom has worked with Toby on improving the UI. The results can be seen on the CKAN demo site. They will be integrated into CKAN’s default UI in version 2.0.

Organizations and authorization

CKAN 2.0 will also include a new access authorization system. At its heart will be Organizations, a new feature to enable a smoother workflow for data publishers. An Organization will be a collection of datasets and users, so that a dataset can only be changed by users in the relevant Organization. (For a community instance like the DataHub, all users will belong to a default Organization.)
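
The rule described here can be modelled in a few lines. This is a conceptual sketch only, not CKAN's implementation: the Organization class, the can_edit function and the sample names are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    """A collection of datasets and the users allowed to change them."""

    name: str
    members: set = field(default_factory=set)
    datasets: set = field(default_factory=set)


def can_edit(user, dataset, organizations):
    """A user may change a dataset only via an Organization that holds both."""
    return any(
        dataset in org.datasets and user in org.members
        for org in organizations
    )


# Invented example data.
transport = Organization("transport", {"alice"}, {"bus-timetables"})
health = Organization("health", {"bob"}, {"hospital-waiting-times"})
orgs = [transport, health]

print(can_edit("alice", "bus-timetables", orgs))
print(can_edit("bob", "bus-timetables", orgs))
```

In this model a community instance would simply place every user and dataset in one default Organization.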

Where’s 1.8?

The release of CKAN 1.8 has been delayed, in order that it can include the new Datastore. To anyone who’s been waiting eagerly for 1.8, apologies! On reflection, we decided that for existing sites running 1.7.1 to upgrade now, only to have to do another upgrade shortly when the new Datastore is added, would be an unnecessary effort for little gain. The good news is that the Datastore has now been added to the 1.8 branch and will be undergoing testing over the next couple of weeks. An added complication is that sites using the old elasticsearch Datastore will need a way to migrate to the new one.

Comings and goings

Our first front end developer Aron left recently, and John Martin has joined the team to replace him. Dominik, a university student from Potsdam who joined us as an intern for the summer, will be going back to college soon, but we hope to be seeing more of him! We also said farewell recently to Ross, who made a great contribution during his time on the team as a developer, and Toby too is imminently off to pastures new. All the best to them from the CKAN team.