
CKAN 2.6.0 released, patch versions for 2.3.x, 2.4.x and 2.5.x available

Adrià Mercader - November 2, 2016 in Releases

We are happy to announce that CKAN 2.6.0 is now released. In addition, new patch releases for older versions of CKAN are now available to download and install.

CKAN 2.6

The 2.6.0 release includes improvements to how private datasets are shown in search results, as well as several other minor improvements and over 50 bug fixes. You can check all individual changes in the CHANGELOG. Thank you very much to the almost 30 community members who have submitted patches since the last release.

If you have customizations or extensions, we suggest you trial the upgrade first in a test environment and refer to the changes in the changelog. Upgrade instructions are below. As there aren’t many major changes since the last version, upgrading should be relatively straightforward.

Note that as previously announced, starting from this version, CKAN requires at least Python 2.7 and Postgres 9.2.

CKAN patch releases

These new patch releases for CKAN 2.3.x, 2.4.x and 2.5.x (2.3.5, 2.4.4 and 2.5.3) fix important bugs and security issues, so users are strongly encouraged to upgrade to the latest patch release for the CKAN version they are using.

For a list of the fixes included you can check the CHANGELOG.
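
As a quick way to see whether an install needs one of these patch releases, here is a minimal Python sketch that compares an installed version string against the latest patch versions announced in this post (2.3.5, 2.4.4, 2.5.3 and 2.6.0). The function names are hypothetical, the table will go out of date as new releases appear, and reading the installed version from your deployment is left out.

```python
# Latest patch release for each CKAN line, as announced on November 2, 2016.
# This table is a snapshot and will go stale as later releases appear.
LATEST_PATCH = {
    (2, 3): (2, 3, 5),
    (2, 4): (2, 4, 4),
    (2, 5): (2, 5, 3),
    (2, 6): (2, 6, 0),
}


def parse_version(text):
    """Turn '2.4.3' into (2, 4, 3); a missing patch number counts as 0."""
    parts = tuple(int(part) for part in text.strip().split("."))
    return parts + (0,) * (3 - len(parts))


def upgrade_advice(installed):
    """Say whether an installed CKAN version is on the latest patch release."""
    version = parse_version(installed)
    latest = LATEST_PATCH.get(version[:2])
    if latest is None:
        return "CKAN %d.%d is not covered by these releases" % version[:2]
    if version < latest:
        return "upgrade to %d.%d.%d" % latest
    return "up to date"
```

For example, `upgrade_advice("2.4.3")` suggests moving to 2.4.4, in line with the release policy that only the latest patch release of each line is supported.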

Upgrading

For details on how to upgrade, see the Upgrading CKAN documentation for your install method.

If you find any issues, you can let the technical team know on the mailing list or the IRC channel.

Link Digital’s Enterprise CKAN Stack for AWS is Now Available on GitHub

Steven De Costa - October 13, 2016 in Deployments, Featured, Partners

As part of the commitment made at the White House Open Data Roundtable, Datashades, also trading as Link Digital, has recently released the preview of an Enterprise CKAN Stack for AWS.

The stack presents Link Digital’s best practice, with independently scalable layers, easily adapted to CI workflows and automated system maintenance. It is now freely available to use on our Datashades GitHub repository.

This OpsWorks stack has been in active use by Link Digital and presents a basis on which Link Digital builds and supports its Government Open Data platforms. Hence, the project can justly be called “eating your own dog food”.

Although a number of improvements are still in progress, we believe that the newly published alpha version of the project will add value to the Public Data community.

To build an OpsWorks stack you will need these CloudFormation templates.
When entering parameters for the CloudFormation template you will need the following cookbook URL for the OpsWorks stack.

Steven De Costa at the IODC CKAN Booth

A longer monologue from a dev list discussion:

Attached is our high-level architecture using RDS on AWS, for UAT and PROD: appendix_8_updated_aws-hosting-environment-2.

CloudFormation scripts for building out CKAN in a high-availability (HA) configuration can be found at https://github.com/DataShades/ckan-aws-templates

The OpsWorks version is here: https://github.com/DataShades/opswx-ckan-cookbook

Happy to collaborate on this and make it shine brighter :)

There are a few other relevant scripts in our DataShades set of repos, such as the Auto Scaling Group (ASG) one here: https://github.com/DataShades/updateasg

And the general cloud storage one here: https://github.com/DataShades/ckanext-cloudstorage

And the S3 related one here: https://github.com/DataShades/ckanext-s3filestore

We’ve also improved the SSO approach with SAML2: https://github.com/DataShades/ckanext-saml2

We have also begun some work on manipulating ACLs, which is important for private dataset resources you’d want to switch to ‘public’ when published: https://github.com/DataShades/ckanext-acl
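
To make the ACL idea concrete, here is a minimal sketch, assuming resources are stored as objects in an S3 bucket and that a dataset’s visibility is given by CKAN’s `private` flag. It is not how ckanext-acl works; the function names and the `key_for_resource` helper are hypothetical. The client only needs a boto3-style `put_object_acl(Bucket=..., Key=..., ACL=...)` method, so a recording stand-in is included for trying it without AWS.

```python
class RecordingS3Client:
    """Stand-in for a boto3 S3 client that just records put_object_acl calls."""

    def __init__(self):
        self.calls = []

    def put_object_acl(self, **kwargs):
        self.calls.append(kwargs)


def acl_for(dataset):
    """Canned S3 ACL matching the dataset's CKAN visibility."""
    return "private" if dataset.get("private", False) else "public-read"


def sync_resource_acls(dataset, s3_client, bucket, key_for_resource):
    """Apply the dataset's visibility to every resource object it owns.

    key_for_resource (hypothetical) maps a CKAN resource dict to its object
    key; the real key layout depends on the storage extension in use.
    """
    acl = acl_for(dataset)
    keys = []
    for resource in dataset.get("resources", []):
        key = key_for_resource(resource)
        s3_client.put_object_acl(Bucket=bucket, Key=key, ACL=acl)
        keys.append(key)
    return keys
```

With boto3 installed, `boto3.client("s3")` could be passed in place of the stand-in, for instance from a hook that runs when a dataset is switched from private to public.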

Although it is not formally part of the CKAN roadmap, I have a working model of where I’d like CKAN to head when it comes to enterprise file/data storage and access. If you are familiar with the concept of resource views, the idea I’m keen to pursue is similar: resource containers (not para-virtualization containers, but storage or access point containers). The idea is to make CKAN extensible via a type of extension that lets it do more orchestration around how data is stored and made usable below the metadata discovery layer.

The story would be something like:
As a platform operator, I need to be able to configure a variety of storage and access endpoint possibilities, so that custodians can select where data is placed based on type of data or business need.

Resource container extensions would then be built to accommodate things like:

  1. Big data, transnational data feeds
  2. Semantic lakes
  3. Large file storage blobs
  4. Self-declarative structured data (likely using data packaging/Frictionless Data)
  5. For cost auditing and accountability – storage into specified paid cloud accounts (different AWS, Azure, etc. accounts based on organisation)

I would imagine that resource view and resource container extensions would be paired in many cases, so that the view gives greater access to and control over the data, including the ability to query it and extract insights.
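
One way to picture the resource container idea in code is a small plugin interface plus an operator-configured registry that picks a container for each resource. The sketch below is purely illustrative: `ResourceContainer`, `ContainerRegistry` and the two example containers are hypothetical names, not part of CKAN’s plugin interfaces, and the selection rules (file size, presence of a schema) are assumptions standing in for ‘type of data or business need’.

```python
from abc import ABC, abstractmethod


class ResourceContainer(ABC):
    """A storage or access endpoint that a custodian can choose for a resource."""

    name = "base"

    @abstractmethod
    def accepts(self, resource):
        """Return True if this container suits the resource."""

    @abstractmethod
    def place(self, resource):
        """Return the access URL the resource would be exposed at."""


class BlobContainer(ResourceContainer):
    """Large file storage, e.g. an object store bucket."""

    name = "blob"

    def __init__(self, base_url, min_size_bytes):
        self.base_url = base_url.rstrip("/")
        self.min_size_bytes = min_size_bytes

    def accepts(self, resource):
        return resource.get("size", 0) >= self.min_size_bytes

    def place(self, resource):
        return "%s/%s" % (self.base_url, resource["id"])


class DataPackageContainer(ResourceContainer):
    """Self-declarative structured data published together with a schema."""

    name = "datapackage"

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def accepts(self, resource):
        return resource.get("format", "").upper() == "CSV" and "schema" in resource

    def place(self, resource):
        return "%s/%s/datapackage.json" % (self.base_url, resource["id"])


class ContainerRegistry:
    """Containers configured by the platform operator, in priority order."""

    def __init__(self, containers):
        self.containers = list(containers)

    def choose(self, resource, preferred=None):
        # Honour the custodian's choice first, if that container accepts it.
        for container in self.containers:
            if container.name == preferred and container.accepts(resource):
                return container
        # Otherwise fall back to the first container that accepts it.
        for container in self.containers:
            if container.accepts(resource):
                return container
        raise LookupError("no container accepts resource %r" % resource.get("id"))
```

A custodian’s explicit choice is honoured through `preferred` when that container accepts the resource; otherwise the operator’s priority order decides.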

The European Data Portal has around 650k datasets. Once a CKAN portal gets to that size, it can be a chore to do anything quickly across the entire set of data. However, with the entire catalog readable via the API, there is a place for other tools to come into the picture and provide meta-analysis or broader views over all the data in a portal.
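
As an example of such a tool, the sketch below walks a whole catalog through CKAN’s Action API by paging through `package_search` with its `rows` and `start` parameters. The portal URL is a placeholder and the page size of 1000 is a conservative assumption, since portals can cap how many rows one request returns. The `fetch` argument lets you swap in a different HTTP client or a stub.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_datasets(portal_url, rows=1000, fetch=fetch_json):
    """Yield every dataset dict from a CKAN portal, one search page at a time."""
    start = 0
    while True:
        query = urlencode({"rows": rows, "start": start})
        url = "%s/api/3/action/package_search?%s" % (portal_url.rstrip("/"), query)
        payload = fetch(url)
        if not payload.get("success"):
            raise RuntimeError("package_search failed for %s" % url)
        results = payload["result"]["results"]
        for dataset in results:
            yield dataset
        start += len(results)
        # Stop on an empty page or once the reported total has been reached.
        if not results or start >= payload["result"]["count"]:
            break
```

Because it is a generator, a meta-analysis job can process datasets as they arrive instead of holding hundreds of thousands of records in memory.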

CKAN’s structure allows data ownership and custodianship to remain flexible as the governing entities change over time. If we keep those functions lightweight and build the more intensive data processing tasks within a resource container layer, then I think that is the big win :) I see datastore and filestore as examples of resource containers. Datapusher is an example of an ETL that works with datastore, but similar tools and concepts can be worked into the model, and the open source goodness can grow organically to meet lots of different organisational needs.

Where CKAN differs from other portal software, in my experience, is that it can be used for open Government data, research data, private sector data and ‘data as knowledge’ in virtually any situation. Other portal software appears to be built around capturing a particular market opportunity to generate data as knowledge for a particular customer segment – civic hackers, jurisdictional bureaucrats, open data policy implementations, etc.

CKAN’s harvesting is good, but certainly not perfect. The approach for pushing from CKAN to elsewhere is likely to be used more in our future work, or as we refactor the architecture of current implementations. See: https://github.com/DataShades/ckanext-syndicate

By using multiple CKAN environments, it is pretty easy to have catalogs of ‘working data’ that then push to the ‘published data’ catalog. We use this approach for Government open data where, from the bottom up, agency data is collected into CKAN-based information asset registers. Sometimes the data doesn’t even exist yet, but the data management plan can at least be registered before the dataset is populated with resources. Once the data is ready, it can be published and syndicated upward to a higher-level jurisdictional portal, such as a council, city, state or province. Such datasets can then be syndicated upward again into a national or regional portal, perhaps with further ETL functions in place to combine similarly structured data from multiple agencies into a master dataset that presents a larger view of the entire data collection effort.
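
A minimal sketch of the ‘push upward’ step, assuming the upstream portal exposes CKAN’s Action API and accepts an API key in the `Authorization` header (as CKAN 2.x does). It only builds the `package_create` request; the choice of fields to copy and the function name are illustrative assumptions and do not reflect how ckanext-syndicate is implemented.

```python
import json
from urllib.request import Request

# Illustrative choice of fields to copy upward; local ids stay behind.
SYNDICATED_FIELDS = ("name", "title", "notes", "license_id", "tags", "resources")


def build_syndication_request(dataset, upstream_url, api_key, owner_org):
    """Build a package_create call that copies a published dataset upstream."""
    if dataset.get("private", False):
        raise ValueError("only published (public) datasets are syndicated")
    payload = {f: dataset[f] for f in SYNDICATED_FIELDS if f in dataset}
    payload["owner_org"] = owner_org
    # Tags and resources get new ids upstream, so keep only portable keys.
    payload["tags"] = [{"name": tag["name"]} for tag in payload.get("tags", [])]
    payload["resources"] = [
        {k: v for k, v in res.items() if k not in ("id", "package_id")}
        for res in payload.get("resources", [])
    ]
    return Request(
        "%s/api/3/action/package_create" % upstream_url.rstrip("/"),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)`; a real syndication job would also need to handle datasets that already exist upstream, for example with `package_update`.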

If the domain of data collection differs, such as in a field of research, then the same architecture can still apply. Multiple research schools of chemistry, for example, could publish working data locally then syndicate upward into a global repository that allows for meta analysis of all research outcomes over the entire domain’s efforts. We’re working on a project in just this manner that is referenced here: http://linkdigital.com.au/news/2016/09/building-mdbox-an-open-access-simulation-data-repository-on-ckan-and-aws

Lastly, published open data is the result of effort put into a process of data collection and, usually, some analysis and clean-up. The tools used to process, prepare, collect or visualise data are all part of the value a dataset represents. To bridge data and code, we’ve released a very simple resource view for GitHub repositories that can be found here: https://github.com/DataShades/ckanext-githubrepopreview

Open Government initiatives are formed around principles of transparency, participation and collaboration. There is a desire to enable public-private collaboration over the long term and there is a role for Government to act as impresario to stimulate new markets and economic activity from publishing open data (ref: https://www.nesta.org.uk/sites/default/files/government_as_impresario.pdf). The reason we built the GitHub resource view is to encourage open source projects to emerge in connection to public datasets, via linking the opportunity for discovery of helpful code with the discovery of helpful datasets.

Sorry for the long monologue! I could have more succinctly just said CKAN rocks, check out all the open source goodness surrounding it and jump in :)

Registration Open for CKANCon and Call for Speakers Closing This Friday

Steven De Costa - August 18, 2016 in Featured

We are less than two months from CKANCon 2016, an official pre-event of this year’s International Open Data Conference, taking place in Madrid on October 4!
As our community continues to grow rapidly, CKANCon will be a great opportunity to learn more about what others are doing with CKAN, and how you can use it in your organization.
We’ve had significant interest in speaking opportunities for this year’s event which is wonderful to see! Many speaking applications have come in the past few days so we are extending the deadline for speaker requests to this Friday, August 19. If you are interested in speaking, please fill out the CKANCon Speaker Request form before 12:00 a.m. EST this Friday.
Finally, we are happy to announce that registration for the event is now open! You can register today for both in person and online participation!
Looking forward to seeing everyone soon!

Join us for CKANCon 2016!

ashleycasovan - July 14, 2016 in Association, Community, Events, Feature, Featured

Join us in Madrid, Spain on October 4th, for CKANCon 2016, one of the official International Open Data Conference Pre-Events.

UPDATE: We are happy to announce that registration for the event is now open! You can register today for both in person and online participation!

CKANCon is a day packed with talks and discussions showcasing the incredible work people are doing with CKAN. This includes topics ranging from uses and best CKAN practices to technical services and new extensions. New, long-standing, and future CKAN users are encouraged to attend. Full details, including speakers and breakout sessions, will be announced soon.

If you’re interested in showcasing your CKAN work, please email ckancon@ckan.org! We are looking for speakers to give short talks about upcoming features, extensions, integrations and anything else CKAN.

CKAN patch releases 2.3.4, 2.4.3 and 2.5.2 now available

Adrià Mercader - March 31, 2016 in Releases

The CKAN team is happy to announce that the new patch releases for CKAN 2.3.x, 2.4.x and 2.5.x are now available to download and install.

These patch releases fix important bugs and security issues, so users are strongly encouraged to upgrade to the latest patch release for the CKAN version they are using.

Patch release upgrades are very straightforward and do not contain any backwards-incompatible changes or involve any changes to the requirements, database or Solr schema.

As stated in the releases policy, the latest patch release is the only one officially supported.

For details on how to upgrade, see the Upgrading CKAN documentation for your install method.

If you find any problems, let us know on the mailing list or the IRC channel.

CKAN based source code of the European Data Portal now available

wendyc - March 20, 2016 in Deployments

The European Data Portal version 1 was released on 18 February 2016. Today, the source code of the CKAN-based European Data Portal and its extensions has been released on GitLab. So, what extensions were made to CKAN?

What can I find on the European Data Portal?

Let’s start with some background information to understand the function of the portal and its components. The primary purpose of the European Data Portal is to harvest the metadata of national, regional and local Open Data portals and act as a single point of access to all Open Data available across Europe. The number of harvested datasets increased from 240,000 in November 2015 to over 415,000 datasets today. All metadata is made available in 6 languages: English, French, German, Italian, Polish and Spanish.

Alongside the harvesting process, the portal offers information on providing data, to support public bodies in releasing more of it. The Goldbook provides an overview of everything you need to know as a data holder who wants to start publishing data. You can also find an explanation of how a portal can be harvested by the European Data Portal. The ‘using data’ section highlights the benefits of re-using Open Data, along with a checklist of the key steps to go through before using data. Share your story and take part in our survey to raise further awareness at public sector level for the release of more data. Finally, the Library contains a large amount of additional training material, use cases, reports and more.

In a nutshell, the Portal contains metadata, training material, reports, use cases and more. But how does the CKAN platform integrated into the Portal actually work?

The CKAN extensions of the European Data Portal

To integrate the required functionality into CKAN, a dedicated extension was developed. It covers several aspects of the overall concept, most notably support for multilingual metadata, an implementation of the DCAT application profile for data portals in Europe (DCAT-AP) standard, and synchronisation with a Linked Data triplestore.

The multilingual feature, which enables the current search functionality in six different languages, was realised by adding an additional metadata field that holds translations into arbitrary languages for each metadata field. Fields of both datasets and resources are taken into account. When rendering views, the current language setting determines which translation is served. This feature is used to integrate CKAN with an external service in which all relevant metadata is automatically translated into multiple languages by a machine translation service.
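
To illustrate the approach, here is a sketch of a dataset carrying one extra field with per-language translations, and a lookup that serves the current language with a fallback. The field name `translation` and its nested layout are assumptions for illustration; the European Data Portal extension’s actual schema may differ.

```python
def localized(item, field, lang, fallback_lang="en"):
    """Return a dataset or resource field in the requested language.

    Assumes a hypothetical extra field, "translation", shaped like
    {"fr": {"title": "..."}, "de": {"title": "..."}}.
    """
    translations = item.get("translation", {})
    for code in (lang, fallback_lang):
        value = translations.get(code, {}).get(field)
        if value:
            return value
    # No usable translation: serve the untranslated core field.
    return item.get(field, "")


# Example record in the assumed layout.
dataset = {
    "title": "Luchtkwaliteit",
    "translation": {
        "en": {"title": "Air quality"},
        "fr": {"title": "Qualité de l'air"},
    },
}
```

Because datasets and resources share the same layout in this sketch, one lookup function serves both; a request for a language with no translation falls back to English and then to the original field.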

The CKAN core schema was considerably extended to fully support DCAT-AP, which provides an RDF vocabulary for describing public datasets. To do so, a mapping from the vocabulary to CKAN’s JSON-style schema was designed and implemented. In addition, a mechanism was added to automatically replicate the metadata into a Virtuoso triplestore, so all datasets are directly available via a SPARQL endpoint. Besides JSON and HTML, every dataset can also be served as RDF.
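
The core of such a mapping is a table from CKAN’s JSON fields to DCAT terms. The sketch below maps a handful of fields (`title`, `notes`, `tags`, resource `url`) to `dct:title`, `dct:description`, `dcat:keyword`, `dcat:distribution` and `dcat:accessURL`, and writes N-Triples that a triplestore could load. It covers only a small part of DCAT-AP, and the URI pattern for datasets and distributions is an assumption; it is not the mapping used by the European Data Portal or by ckanext-dcat.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Serialise a Python string as an N-Triples string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"%s"' % escaped


def dataset_to_ntriples(dataset, base_uri):
    """Map a CKAN dataset dict to DCAT-AP style N-Triples lines."""
    base = base_uri.rstrip("/")
    subject = "<%s/dataset/%s>" % (base, dataset["name"])
    triples = [
        (subject, "<%s>" % RDF_TYPE, "<%sDataset>" % DCAT),
        (subject, "<%stitle>" % DCT, literal(dataset["title"])),
    ]
    if dataset.get("notes"):
        triples.append((subject, "<%sdescription>" % DCT, literal(dataset["notes"])))
    for tag in dataset.get("tags", []):
        triples.append((subject, "<%skeyword>" % DCAT, literal(tag["name"])))
    # Each CKAN resource becomes a dcat:Distribution of the dataset.
    for res in dataset.get("resources", []):
        dist = "<%s/dataset/%s/resource/%s>" % (base, dataset["name"], res["id"])
        triples.append((subject, "<%sdistribution>" % DCAT, dist))
        triples.append((dist, "<%s>" % RDF_TYPE, "<%sDistribution>" % DCAT))
        triples.append((dist, "<%saccessURL>" % DCAT, "<%s>" % res["url"]))
    return ["%s %s %s ." % t for t in triples]
```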

The activities for extending CKAN in the context of the European Data Portal are in line with similar efforts of the CKAN community to enhance the software further towards Linked Data and multilingual support (see extensions ckanext-fluent and ckanext-dcat). The examples given are just some initiatives that will be further developed in the near future.

Access the Source Code on GitLab!

Authors: Wendy Carrara, Eva van Steenbergen and Fabian Kirstein on behalf of the European Data Portal

CKAN extensions Archiver and QA upgraded

davidread - January 27, 2016 in Data Quality, Extensions

Popular CKAN extensions ‘Archiver’ and ‘QA’ have recently been significantly upgraded. Now it is relatively simple to add automatic broken link checking and 5 stars of openness grading to any CKAN site. At a time when many open data portals suffer from quality problems, adding these reports makes it easy to identify the problems and get credit when they are resolved.

While these extensions have been around for a few years, most of the development has happened on forks while the core languished. In the past couple of months there has been a big push to merge all the efforts from the US (data.gov), Finland, Greece, Slovakia and the Netherlands, and particularly those from the UK (data.gov.uk), into core. It’s been a big leap forward in functionality. Installers no longer need to customize templates: details of broken links and 5-star ratings are shown on every dataset simply by installing and configuring the extensions. And now that we’re all on the same page, we can work together better from here on.

The Archiver Extension regularly checks every dataset’s data links to see whether they still work. Files at working URLs are downloaded, and users are offered the ‘cached’ copy. Broken URLs are marked in red and listed in a report. See more: ckanext-archiver repo, docs and demo images
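
The basic broken-link test can be sketched with the standard library alone. This is a simplified illustration rather than the Archiver’s code: the function names, the 30-second timeout and the reason strings are assumptions, and it does not download or cache the file.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Turn an HTTP status code into a broken/working verdict."""
    if 200 <= status < 300:
        return {"is_broken": False, "reason": "OK"}
    return {"is_broken": True, "reason": "Server returned HTTP %d" % status}


def check_link(url, timeout=30):
    """Try to open a resource URL and report whether it looks broken."""
    try:
        # urlopen follows redirects, so the status is that of the final URL.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.getcode())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses surface as exceptions rather than return values.
        return classify_status(err.code)
    except ValueError:
        # Raised for strings that are not valid URLs at all.
        return {"is_broken": True, "reason": "Invalid URL"}
    except OSError as err:
        # URLError (DNS failure, refused connection) and timeouts land here.
        return {"is_broken": True, "reason": "Could not connect: %s" % err}
```

The `except` order matters: `HTTPError` is a subclass of `URLError`, which in turn is an `OSError`, so the most specific case is handled first.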

The QA Extension analyses the data files that Archiver has downloaded to reliably determine their format (CSV, XLS, PDF, etc.), rather than trusting the format the publisher has declared. This information is combined with the data licence and whether the data is currently accessible to give a rating out of 5 according to Tim Berners-Lee’s 5 Stars of Openness. A file that has no open licence, or is not available, gets 0 stars. If it passes those tests but is only a PDF, it gets 1 star. A machine-readable but proprietary format like XLS gets 2 stars, and an open format like CSV gets 3 stars. 4- and 5-star data uses standard schemas and references other datasets, which tends to mean RDF. See the ckanext-qa repo, docs and demo images.
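
The grading rules above translate fairly directly into code. This sketch guesses the format from a file’s first bytes and then applies the 0 to 3 star rules; it is an illustration under stated assumptions, not ckanext-qa’s implementation. The magic-byte table is deliberately small, a ZIP container (which could be XLSX) is simply scored as 1 star, and 4 and 5 stars are not attempted.

```python
# Leading bytes ("magic numbers") of a few binary formats.
MAGIC = (
    (b"%PDF", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "XLS"),  # OLE2 container (legacy Office)
    (b"PK\x03\x04", "ZIP"),  # ZIP container; XLSX and ODS also start this way
)


def sniff_format(head):
    """Guess a file's format from its first bytes, ignoring what was declared."""
    for magic, fmt in MAGIC:
        if head.startswith(magic):
            return fmt
    try:
        text = head.decode("utf-8")
    except UnicodeDecodeError:
        return "UNKNOWN"
    first_line = text.lstrip().split("\n", 1)[0]
    if first_line.startswith(("{", "[")):
        return "JSON"
    if "," in first_line:
        return "CSV"
    return "UNKNOWN"


def openness_stars(fmt, licence_is_open, is_available):
    """Apply the 0 to 3 star rules; 4 and 5 stars are not attempted here."""
    if not licence_is_open or not is_available:
        return 0
    if fmt == "XLS":
        return 2  # machine-readable but proprietary
    if fmt in ("CSV", "JSON"):
        return 3  # machine-readable and open
    return 1  # PDF, or anything not shown to be machine-readable
```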

Code of Conduct

Adrià Mercader - January 27, 2016 in Association, Community

As the CKAN community grows and includes more people from various backgrounds it seems like a good time to adopt a Code of Conduct that will ensure it remains a welcoming place for everybody.

The Code of Conduct can be accessed on the main CKAN repository:

https://github.com/ckan/ckan/blob/master/CONDUCT.rst

Rather than trying to come up with a useful one ourselves, we have adopted one based on The Open Code of Conduct.

As stated in the code, if you feel it has been breached you can contact conduct at ckan.org. This currently forwards to the members of the tech team.

As ever, feel free to send us any comments or feedback.

CKAN 2.5 released, patch versions for 2.0.x, 2.1.x, 2.2.x, 2.3.x and 2.4.x available

davidread - December 17, 2015 in Releases

We are happy to announce that CKAN 2.5 is now released. In addition, new patch releases for older versions of CKAN are now available to download and install.

CKAN 2.5

The 2.5 release (actually 2.5.1, as we skipped 2.5.0) offers speed improvements to the home page, searching and several other key pages and the API. In addition, CKAN extensions can provide language translations in a more integrated way. It is also now easy to customize the file uploader to work with different cloud providers. 2.5 also includes plenty of other improvements contributed by the CKAN developer community during the past 4 months, as detailed in the CHANGELOG.

If you have customizations or extensions, we suggest you trial the upgrade first in a test environment and refer to the changes in the changelog. Upgrade instructions are below.

CKAN patch releases

These new patch releases for CKAN 2.0.x, 2.1.x, 2.2.x, 2.3.x and 2.4.x fix important bugs and security issues, so users are strongly encouraged to upgrade to the latest patch release for the CKAN version they are using.

For a list of the fixes included you can check the CHANGELOG.

Upgrading

For details on how to upgrade, see the Upgrading CKAN documentation for your install method.

If you find any issues, you can let the technical team know on the mailing list or the IRC channel.

ESRC Consumer Data Research Centre

alexsingleton - October 28, 2015 in Deployments

Integral to our activities as part of the ESRC Consumer Data Research Centre, we spent the summer working on a project that would create a searchable catalogue of the various data holdings that we are assembling, including retailer data that we have negotiated access to, but also a wealth of value added open data products. The site is available here: data.cdrc.ac.uk

One aspect we were especially pleased with is the introduction of data stores for each local authority in the UK. Each has its own URL; Liverpool’s, for example, can be found here: data.cdrc.ac.uk/lad/liverpool

We do not believe in simply replicating data sources available elsewhere; instead, we have added value to each open data deposit by re-engineering it into new formats that are optimized for simple analysis and that we hope will lower barriers to entry. As of 5/10/15 we had created 8,738k separate data items on a very wide variety of topics.

Not every local authority has the resources to create its own datastore, and for those that do, we hope that what we have created will be complementary. We have also linked many of the outputs to our mapping interface, which is available here: maps.cdrc.ac.uk

Some Technical Bits…

As the location of this blog post suggests, we used the CKAN platform for development, as it is open source and widely used by the other data stores we were familiar with. We have, however, made some considerable customisations to the off-the-shelf version.

Our development infrastructure was built on Docker images for all the services CKAN relies on, including a service management and configuration system. We were also dealing with multiple uploads created using R, Python or PostGIS, so we scripted a bulk dataset uploading tool.
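
A bulk uploader of this kind usually starts from a manifest listing the files to publish. The sketch below assumes a hypothetical CSV manifest with `dataset_title`, `file_name`, `file_url` and `format` columns, groups rows into one `package_create` payload per dataset, and derives a valid CKAN dataset name (lowercase letters, digits, `-` and `_`, at most 100 characters) from each title. It is not the CDRC team’s tool.

```python
import csv
import io
import re


def slugify(title):
    """Derive a CKAN-safe dataset name: lowercase alphanumerics, '-' and '_'."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def payloads_from_manifest(manifest_text, owner_org):
    """Group manifest rows into one package_create payload per dataset."""
    datasets = {}
    for row in csv.DictReader(io.StringIO(manifest_text)):
        name = slugify(row["dataset_title"])
        package = datasets.setdefault(name, {
            "name": name,
            "title": row["dataset_title"],
            "owner_org": owner_org,
            "resources": [],
        })
        # Each manifest row becomes one resource of its dataset.
        package["resources"].append({
            "name": row["file_name"],
            "url": row["file_url"],
            "format": row["format"].upper(),
        })
    return list(datasets.values())
```

Each payload could then be POSTed to the portal’s `/api/3/action/package_create` endpoint with an API key.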

Some specific customisation:

For the products/topics/LADs/National/Regional search tabs:

  • Added support for filters based on products/topics/LADs
  • Added groups/labels for Open/Safeguarded/Secure datasets
  • Added an interactive map on the front page (based on our maps.cdrc.ac.uk platform)
  • Added a Twitter feed
  • Added a blog proxy to a WordPress blog aggregator
  • Added download tracking
  • Improved the efficiency of the CKAN code for the group listing
  • Added a GeoJSON preview on the dataset pages
  • Prevented users who are not logged in from downloading; unfortunately we need this functionality to provide usage data to our funding body (sorry!)
  • Added system-wide notification messages
  • Added Google Analytics tracking code

Data blog

Customized the WordPress theme Cerauno to fit into CKAN

Other additions

  • A plugin was developed to improve the user registration form (https://github.com/esrc-cdrc/ckan-ckanext-userextra)
  • Added checkboxes for newsletter options and a dropdown menu for sectors
  • Customized the metadata with the third-party plugin ckanext-scheming
  • Added a commenting system with the third-party plugin ckanext-ytpcomments
  • Improved the user experience of the commenting system by changing its look and feel, and allowed in-place commenting and editing.

We also made some major bug fixes and improvements to CKAN, including:

  • Fixed the tracking system (broken by latest releases of CKAN)
  • Fixed the type system for groups

Besides these, there were various other small bug fixes and improvements to CKAN and third-party plugins.

We hope that these and our continuing contributions have been of use, and that you enjoy our data store: data.cdrc.ac.uk

Thanks in particular go to Data Scientists Wen Li, Hai Nguyen and Michail Pavlis for their hard work; they spent much of the summer on this project.