Join us for CKANCon 2016!

ashleycasovan - July 14, 2016 in Association, Community, Events, Feature, Featured

Join us in Madrid, Spain on October 4th, for CKANCon 2016, one of the official International Open Data Conference Pre-Events.

UPDATE: We are happy to announce that registration for the event is now open! You can register today for both in-person and online participation!

CKANCon is a day packed with talks and discussions showcasing the incredible work people are doing with CKAN. This includes topics ranging from uses and best CKAN practices to technical services and new extensions. New, long-standing, and future CKAN users are encouraged to attend. Full details, including speakers and breakout sessions, will be announced soon.

If you’re interested in showcasing your CKAN work, please email ckancon@ckan.org! We are looking for speakers to give short talks about upcoming features, extensions, integrations and anything else CKAN.

Implementing VectorTiles Preview of Geodata on HDX

chadhendrix - September 16, 2015 in Extensions, Feature, Visualization

This post is a modified version of a post on the HDX blog. It has been adapted here to highlight the information of most interest to the CKAN community. You can see the original post here.

Humanitarian data is almost always inherently geographic. Even the data in a simple CSV file will generally correspond to some piece of geography: a country, a district, a town, a bridge, or a hospital, for example.

HDX has built on CKAN’s preview capabilities with the ability to preview large (up to 500MB) vector geographic datasets in a variety of formats.  Resources uploaded (or linked) to HDX with the format strings ‘geojson’, ‘zipped shapefile’, or ‘kml’ will trigger the creation of a geo preview. Here is an example showing administrative boundaries for Colombia:
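As a rough illustration of the trigger described above, here is a minimal sketch of such a format check. The three format strings come from the post; the function name, the dict shape, and the case-insensitive matching are assumptions made for the example, not HDX's actual code.

```python
# Format strings named in the post. Matching them case-insensitively and
# ignoring surrounding whitespace is an assumption for this sketch.
GEO_PREVIEW_FORMATS = {"geojson", "zipped shapefile", "kml"}


def triggers_geo_preview(resource):
    """Return True if a CKAN-style resource dict has a geo-previewable format."""
    fmt = (resource.get("format") or "").strip().lower()
    return fmt in GEO_PREVIEW_FORMATS
```

A resource with any other format, or with no format set at all, would simply be skipped.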


[Image: geo preview of administrative boundaries for Colombia]

To minimize bandwidth in the interest of often poorly-connected field locations, we built the preview from vector tiles. This means that details are removed at small scales but will reappear as you zoom in.

The preview is created only for the first layer it encounters in a resource. If the resource contains multiple layers, the others will not show up. For those cases, you can create separate resources for each layer and they will be available in the preview. Multiple geometry types (polygon + line, for example) in kml or geojson are not yet supported.

Implementation

It’s a common problem in interactive mapping: to preview the whole geographic dataset, we would need to send all of the data to the browser, but that can require a long download or even crash the browser. The classic solution is to use a set of pre-rendered map tiles — static map images made for different zoom levels and cut into tiny pieces called tiles.  The browser has to load only a few of these pieces for any given view of the map. However, because they are just raster images, the user cannot interact with them in any advanced way.

We wanted to maintain interactivity with the data, eventually having hover effects or allowing users to customize styling, so we knew that we needed a different approach. We reached out to our friends at GeoNode, who pointed us to the recently developed Vector Tiles Specification.

The vector tile solution is a similar approach to traditional map tiles, but instead of creating static image tiles, it involves cutting the geodata layer into small tiles of vector data. Each zoom level receives a simplification (level of detail, or LoD) pass, which reduces the number of vertices displayed, similar to the way that 3D video games or simulators reduce the number of polygons in distant objects to improve performance. This means that for any given zoom level and location, the browser needs to download only the vertices necessary to fill the map.  You can learn more about how vector tiles work in this helpful FOSS4G NA talk from earlier this year.
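To make the "only download what fills the map" idea concrete, here is a small sketch of the standard XYZ (Web Mercator) tile arithmetic that tiled maps commonly use. It is not taken from the HDX code; the function names are made up for the example, and it only shows how a view maps to a handful of tile addresses, not how the vector data inside each tile is simplified.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return (x, y) of the XYZ tile containing a WGS84 point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the outer edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles needed to cover a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]
```

At zoom level z the world is split into 2^z by 2^z tiles, so a typical map view only ever touches a few of them, whatever the size of the full dataset.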

Because vector tiles are a somewhat-new technology, there wasn’t any off-the-shelf framework to let us integrate them with our CKAN instance. Instead, we built a custom solution from several existing components (along with our own integration code):

Our architecture looks like this:

[Image: architecture diagram]

The GISRestLayer orchestrates the entire process by notifying each component when there is a task to do. It then informs CKAN when the task is complete, and a dataset has a geo preview available.  It can take a minute or longer to generate the preview, so the asynchronous approach — managed through Redis Queue (RQ) — was essential to let our users continue to work while the process is running. A special HDX team member, Geodata Preview Bot, is used to make the changes to CKAN. This makes the nature of the activity on the dataset clear to our users.
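The asynchronous pattern described above can be sketched in a few lines. Note that this deliberately uses a thread pool from the Python standard library as a stand-in, not Redis Queue, and that the function names and the shape of the result are hypothetical; it only illustrates "queue the slow job, return at once, notify when done".

```python
from concurrent.futures import ThreadPoolExecutor


def build_geo_preview(resource_id):
    """Placeholder for the slow work (convert, load, cut tiles); hypothetical."""
    return {"resource_id": resource_id, "preview_ready": True}


def submit_preview_job(executor, resource_id, notify):
    """Queue the job and return immediately; `notify` is called when it finishes."""
    future = executor.submit(build_geo_preview, resource_id)
    future.add_done_callback(lambda done: notify(done.result()))
    return future
```

In this sketch `notify` plays the role of the step that tells CKAN a preview is available; the caller is free to carry on with other work as soon as `submit_preview_job` returns.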

Future development

This approach gives HDX a good foundation for adding new geodata features in the future. We will be conducting research to understand what users think is important to add next. Here are some initial new-feature ideas:

  • Automatically generate additional download formats so that every geodataset is available in zipped shapefile, GeoJSON, KML, etc.
  • Allow the contributing user to specify the order of the resources in the map legend (and therefore which one appears by default).
  • Allow users to preview multiple datasets on the same map for comparison.
  • Automatically apply different symbol colors to different resources in the same dataset.
  • Allow users to style the geographic data, changing colors and symbols.
  • Allow users to configure and embed maps of their data in their organization or crisis pages.
  • Provide OGC-compliant web services of contributed datasets (WFS, WMS, etc.).
  • Allow external geographic data services (WMS, WFS, etc) to be added to a map preview.
  • Make our vector tiles available as a web service.

If any of these enhancements sound useful or you have new ideas, send us an email at hdx.feedback@gmail.com. If you have geodata to share with the HDX community, start adding your data here.

We would like to say a special thanks to Jeffrey Johnson who pointed us toward the vector tiles solution and to the contributors of all the open source projects listed above! In addition to GISRestLayer, you’ll find the rest of our code here.

Building tools for Open Data adoption

denis - September 11, 2015 in Community, Feature, Featured

At DataCats, we are focused on a simple problem — how do we make sure every single government has easy access to get up and running with Open Data? In other words, how do we make it as easy as possible for governments of all levels to start publishing open data?

The answer, as you might tell by this blog, is CKAN. But CKAN uses a very non-traditional technology stack, especially by government standards. Python, PostgreSQL, Solr, and Unix are not in the toolbox of most IT departments. This is true not only for local governments in Europe and North America, but also for almost all governments in the developing world.

Our answer to this problem is a pair of software projects which, like CKAN, are Free and Open Source Software. The first is the eponymously named datacats, and the second is named CKAN Multisite. The two projects together aim to solve the operational difficulties in deploying and managing CKAN installations.

datacats is a command line library built on Docker, a popular new alternative to virtualization that is experiencing explosive growth in industry. It aims to help CKAN developers easily get set up and running with one or more CKAN development instances, as well as deploy those easily on any provider – be it Amazon, Microsoft Azure, Digital Ocean, or a plain old physical server data centre.

Our team has been using datacats to develop a number of large CKAN projects for governments here in Canada and around the world. Being open source, we get word every week of another IT department somewhere that is trying it out.

CKAN Multisite is a companion project to datacats, targeted at system administrators who wish to manage one or more CKAN instances on their infrastructure. The project was very generously sponsored by U.S. Open Data. Multisite provides a simple API and a web interface through which starting, stopping, and managing CKAN servers is as simple as pressing a button. In essence it gives you your very own CKAN cloud.

CKAN is an open source project that many national and large city governments depend on as the cornerstone of their open data programs. We hope that these two open source projects will help the CKAN ecosystem continue to grow. If you are a sysadmin or a developer working on CKAN, give them a try and, if you have the appetite, consider contributing to the projects themselves.

Presenting public finance just got easier

Tryggvi Björgvinsson - March 20, 2015 in Extensions, Feature, Featured, Releases, Visualization

CKAN 2.3 is out! The world-famous data handling software suite which powers data.gov, data.gov.uk and numerous other open data portals across the world has been significantly upgraded. How can this version open up new opportunities for existing and coming deployments? Read on.

One of the new features of this release is the ability to create extensions that get called before and after a new file is uploaded, updated, or deleted on a CKAN instance.

This may not sound like a major improvement, but it creates a lot of new opportunities. Now it’s possible to analyse the files (which are called resources in CKAN) and put them to new uses based on that analysis. To showcase how this works, Open Knowledge, in collaboration with the Mexican government, the World Bank (via Partnership for Open Data), and the OpenSpending project, has created a new CKAN extension which uses this new feature.

It’s actually two extensions. One, called ckanext-budgets, listens for creation and updates of resources (i.e. files) in CKAN, and when that happens the extension analyses the resource to see if it conforms to the data file part of the Budget Data Package specification. The Budget Data Package specification is a relatively new specification for budget publications, designed for comparability, flexibility, and simplicity. It’s similar to data packages in that it provides metadata around simple tabular files, like a CSV file. If the CSV file (a resource in CKAN) conforms to the specification (i.e. the columns have the correct titles), then the extension automatically creates the Budget Data Package metadata based on the CKAN resource data and makes the complete Budget Data Package available.
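The "columns have the correct titles" check can be pictured with a short sketch like the one below. The column names in `REQUIRED_COLUMNS` are placeholders chosen for illustration only (the real list is defined by the Budget Data Package specification), and comparing titles case-insensitively is likewise an assumption; this is not the code of ckanext-budgets.

```python
import csv
import io

# Placeholder column titles for illustration; the actual required columns
# are defined by the Budget Data Package specification.
REQUIRED_COLUMNS = {"id", "amount"}


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column titles that the CSV header row lacks."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {title.strip().lower() for title in header}
    return sorted(set(required) - present)


def conforms(csv_text, required=REQUIRED_COLUMNS):
    """True if every required column title appears in the header row."""
    return not missing_columns(csv_text, required)
```

Only the header row is inspected here, which matches the idea that a resource qualifies on the basis of its column titles.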

It might sound very technical, but it really is very simple. You add or update a CSV file resource in CKAN and it automatically checks if it contains budget data in order to publish it in a standardised form. In other words, CKAN can now automatically produce standardised budget resources which make integration with other systems a lot easier.

The second extension, called ckanext-openspending, shows how easy such an integration around standardised data is. The extension takes the published Budget Data Packages and automatically sends them to OpenSpending. From there OpenSpending does its own thing: it analyses the data, aggregates it and makes it very easy to use for those who use OpenSpending’s visualisation library.

So thanks to a perhaps seemingly insignificant extension feature in CKAN 2.3, getting beautiful and understandable visualisations of budget spreadsheets is now only an upload to a CKAN instance away (and can only get easier as the two extensions improve).

To learn even more, see this report about the CKAN and OpenSpending integration efforts.

Geospatial update: MapBox, pycsw, and CKAN at FOSS4G

Adrià Mercader - October 12, 2013 in Extensions, Feature, News, Presentations

There has been a lot of recent work on CKAN’s popular and widely used spatial extension. The extension adds a spatial field to datasets and allows spatial metadata to be harvested from a variety of sources, queried (including with a map search), and exposed using the CSW standard for geospatial discovery. It also provides map previews for spatial data formats such as GeoJSON.

In this post I’ll describe two major new features, an overhaul and expansion of the documentation, and a recent presentation at FOSS4G, the open-source geospatial conference.

Support for MapBox and other tiles

The spatial extension displays maps for various purposes. By default, these use MapQuest-OSM tiles, based on OpenStreetMap data and provided by MapQuest.

However, users wanting to customize the default maps for their own instances can now use map tiles hosted on MapBox. MapBox makes it really easy to create beautiful custom maps, using their online editor or the more advanced TileMill desktop tool. To use your own handcrafted tiles, you just need to set a couple of configuration options. You can see the MapBox tiles in action in the screenshot of a file preview below, or on our demo site, and all details to set it up yourself can be found in the documentation here.

[Image: map preview]

Support for custom tiles is not limited to MapBox: any tileset that follows the XYZ convention for web maps can be used on the widgets of your CKAN instance, even Stamen’s famous watercolor maps, as in the example below of map-based search filtering:
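For readers unfamiliar with the XYZ convention: a tileset is addressed by a URL template with placeholders for zoom level, column and row. The sketch below shows how such a template is typically filled in; the function name and the example host are hypothetical and not part of the spatial extension.

```python
def tile_url(template, z, x, y):
    """Fill an XYZ-style URL template with zoom (z), column (x) and row (y)."""
    side = 2 ** z  # tiles per axis at this zoom level
    if not (0 <= x < side and 0 <= y < side):
        raise ValueError("tile (%d, %d) does not exist at zoom %d" % (x, y, z))
    return (template.replace("{z}", str(z))
                    .replace("{x}", str(x))
                    .replace("{y}", str(y)))
```

With a made-up template such as `https://tiles.example.org/{z}/{x}/{y}.png`, the tile at zoom 3, column 2, row 5 resolves to `https://tiles.example.org/3/2/5.png`.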

[Image: watercolour tiles]

Integration with pycsw

CKAN offers support for the CSW standard, a specification from the Open Geospatial Consortium for exposing geospatial catalogues over the web. CKAN can both harvest remote CSW servers and expose its own records via a CSW interface. Until now, the latter has been done via a custom plugin, which provided a very limited subset of the specification and was quite flaky.

To improve that we have added features to integrate with pycsw, an excellent open source Python library that provides a full CSW implementation. At present only datasets harvested from other spatial sources can be exposed via pycsw, but hopefully more general support will be added in the future.
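To give an idea of what talking to a CSW endpoint looks like, here is a minimal sketch that builds key-value-pair request URLs for CSW 2.0.2. The endpoint address is a made-up example and the helper name is hypothetical; where your instance actually exposes its CSW interface depends on how pycsw is deployed.

```python
from urllib.parse import urlencode


def csw_url(endpoint, request="GetCapabilities", **extra):
    """Build a CSW 2.0.2 key-value-pair request URL for the given endpoint."""
    params = {"service": "CSW", "version": "2.0.2", "request": request}
    params.update(extra)  # e.g. id=... for a GetRecordById request
    return endpoint + "?" + urlencode(params)
```

A GetCapabilities request is the usual first call, since the response describes which operations and output formats the catalogue supports.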

To find out more about CKAN and CSW, look at the documentation.

New and revamped documentation

The documentation had outgrown the README file and was hard to follow, so we moved it into proper Sphinx docs hosted online with the core CKAN docs. The full geospatial extension docs are now at docs.ckan.org/projects/ckanext-spatial.

This wasn’t just a cosmetic change: all the documentation has been restructured, cleaned up and updated, so hopefully topics like installation, geo-indexing datasets and the spatial harvesters are now clearer and easier to set up.

We aim to do a similar job soon updating the documentation for the spatial extension’s favourite companion, the harvester extension. As usual, any suggestions or pull requests to improve the documentation are very welcome.

CKAN at FOSS4G

FOSS4G is the main international conference for open source geospatial software. It is organized by different local chapters of OSGeo, an organization that promotes, incubates and fosters communities around free geo-related projects.

This year we dropped in to Nottingham in the UK to present CKAN at FOSS4G for the first time ever. Although not as popular as other veteran catalogues like GeoNetwork or GeoNode, CKAN is starting to be well known in the geospatial community, and there was a lot of interest in potential features and integration with other tools.

Sadly the presentation doesn’t seem to have been recorded, but you can check out the slides here, or download them as PDF (5.5 MB).

EDaWaX: Choosing CKAN for managing research data

Hendrik Bunke - September 25, 2013 in Deployments, Extensions, Feature

This is a guest post by Hendrik Bunke of the EDaWaX project, cross posted from the project blog. EDaWaX is a German project which aims to greatly increase the amount of research data in Economics that is made open.



One aim of EDaWaX is to develop and implement a web-platform prototype for a publication-related research data archive. We’ve chosen CKAN – an open source data portal platform – as the basis for this prototype.

This post describes the reasons for this decision and tries to give some insights into CKAN, its features and technology. We’ll also discuss these features both in regard to our special use case and to suitability for research data management in general.

Before you proceed, it might be useful to have at least a short look at an article that covers a similar topic and does it far more extensively than this blog post. It’s written by Joss Winn and is titled Open Data and the Academy: An Evaluation of CKAN for Research Data Management. The paper was made available on GoogleDrive so others could comment or even add to the article. I’m also mentioning Joss’ paper to show that there is already an ongoing discussion – for example in this mailing list thread – about how to adapt CKAN, which is at the moment mainly used for government data, for research data management.

This post focuses on our special EDaWaX perspective and also provides a more technical introduction (installation, writing extensions, using the API etc.). In addition we describe our own CKAN extensions that add basic theme customisations and custom metadata fields.

We hope this will be useful for those who are looking for a decent solution for a research data repository and have heard only a little or (most probably) nothing at all about CKAN yet.

EDaWaX criteria for research data archive software

We won’t go into detail about the EDaWaX project here. In short, EDaWaX is looking for ways to publish and curate research data in economics. Our focus is on publication-related data, meaning especially the data that authors of journal papers have used for their articles. One objective of the project is the development of a data archive for journals using an integral approach.

Our projected web application should demonstrate some features that the EDaWaX studies revealed to be important for replication purposes. We evaluated several software packages dealing with data publishing and had only a few, very general but fundamental, criteria for the software:

  • Open Source: This is a fundamental principle for us, but there are also practical reasons for this. We want to be able to modify and extend the software, and we would like to share our extensions.

  • API (reading and writing): This is quite important for a modular and flexible infrastructure. We also want to provide integration packages for other systems (CMS or special e-journal software). We think that research data must not just be stored in, perhaps closed, ‘data-silos’, but should be accessible and reusable as much as possible. An API opens up a lot more possibilities for this purpose.

  • Simple User Interface: We are mainly targeting authors and editorial offices who don’t have the time, resources and know-how to learn and use complicated UIs and workflows. This is also important for lowering the general barriers for publishing research data.

  • RDF metadata representation: We are aware that this might be a somewhat ‘avant-garde’ criterion. But we predict that it will be more and more important in the near future to have a general, linkable and machine-readable metadata interface, so our research data can be used and adopted as widely as possible.

The main ‘opponents’ of CKAN in this small ‘contest’ were Dataverse and Nesstar. But while both are well established platforms dedicated especially to research data (which CKAN is not), neither met most of our criteria. Nesstar is proprietary rather than open software: you have to pay for the server component, and the only way to upload data is the so-called ‘Publisher’, a Windows-only client. That’s a no-go for us. Dataverse’s main problem compared to CKAN (besides the fact that it is an unfriendly Java-beast of software ;-) is the lack of a decent API. There is at least now a reading API (since March 2012), but you cannot use it to upload data. So, in the end there was no question which software we would choose: CKAN. Let’s see in detail why.

What is CKAN?

CKAN — an abbreviation for “Comprehensive Knowledge Archive Network”, which does not exactly describe its actual use-cases today — is an open-source (check) web platform for publishing and sharing data. Written in Python, it offers a simple, nice looking, and very friendly user interface (check) and provides by default a RDF metadata representation for each dataset (check). The feature that, in our view, makes CKAN really outstanding is its API, which allows access to nearly every function of the system including writing and deleting of datasets (check; more on that later).

CKAN has many, many more features that could be listed here, including harvesting, data visualisation and preview, full-text and fuzzy search, and faceting. It is widely used, mainly in the field of open government data, where it has become a de facto standard software package. CKAN powers the national open data portals of the UK, the USA, Australia and the EU, to name only a few well-known examples. The CKAN website has an impressive list with all known production instances.

The very active development of CKAN is led and organised by the Open Knowledge Foundation (OKF). There are around ten people at OKF who are mainly working on CKAN, most of them developers, and in addition at least 30-40 developers are contributing actively.

If you want to contribute to CKAN or develop an extension you should subscribe to the developer mailing list. There’s also a general discussion list, and if you want to use CKAN for research data management (and in the end that’s what this article is all about) please immediately subscribe to the quite newly established and already mentioned list ckan4rdm.

CKAN’s source code can be found at github. It is written very cleanly (forced by clear coding standards). As a reasonably experienced Python developer you won’t have major difficulties in understanding the code.

If you just want to test the front end, i.e. the user interface of CKAN, you don’t have to install it yourself. OKF is running the public open data portal datahub.io, where you can register and upload data. The portal is not only for testing purposes. For example, all the RDF data of the Linked Data cloud is registered there. There’s also a ‘pure’ demo site that gives you a first impression.

However, if you’re really considering CKAN to power your open data portal you should of course install it yourself.

Installation

Before you start to install and use CKAN please have a look at its extensive and excellent documentation. We will only give some hints here to get you started.

If you have decided to give CKAN a try, you have two install options. If your machine runs 64-bit Ubuntu 12.04, you can try to install all needed packages via apt-get. The package installation will do all necessary basic configuration for you, so it might be more convenient. However, this method also involves some lack of flexibility, so we would not recommend it. Moreover, if you want to develop your own CKAN extension (more on that later) or use another OS platform, you must use the second method and install CKAN from source. I didn’t find that to be too difficult and, again, if you have some experience as a developer it will be no real problem. In this scenario CKAN will be installed via git and pip in a virtualenv, which you will most probably already be familiar with. The application is then run with paster. Under the hood, CKAN uses the Python framework Pylons (which has now merged with another framework and is called Pyramid; but that’s another story).

In addition to the core package of CKAN, you will have to install and configure some packages that CKAN requires. Nothing fancy here, though. CKAN uses PostgreSQL as its database, and for searching and indexing it relies on Solr, which involves the installation of a Java JDK and a Java application server like Jetty or Tomcat. It’s worth mentioning that you can run CKAN without Solr, but you’ll lose a lot of advanced search functionality like faceting for instance. The same goes for the database. Besides the almighty PostgreSQL you can also use the lightweight SQLite. This is quite handy for testing purposes or for development, but not recommended or supported for production installations.

API

As mentioned before, we think it’s the API that makes CKAN really outstanding. “All of a CKAN website’s core functionality (everything you can do with the web interface and more) can be used by external code that calls the CKAN API”, as the documentation states. And that’s true. You can

  • get all sorts of lists for packages, groups or tags;

  • get a full metadata representation of any dataset or resource (which is the actual data or file);

  • do all the kinds of searches you can do with the web interface;

  • create, update and delete datasets, resources and other objects. I’m emphasizing this because it’s really a killer feature, which enables you to develop your very own application based on API calls to an external CKAN installation. It makes, for example, mobile apps possible. Or you can write plugins for your local CMS, journal system or whatever.

From a programmer’s perspective this is just great, great, great. And even with our focus, open research data management, it enables a lot more usage scenarios than a simple web portal with a closed, proprietary database would do.
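To illustrate the kinds of calls listed above without any client library, here is a minimal sketch that prepares requests for CKAN's action API and unpacks its JSON replies. It assumes the instance exposes the version 3 action API under `/api/3/action/`; the helper names and the example site URL are made up, and actually sending the request (for instance with `urllib.request.urlopen`) is left out.

```python
import json
from urllib.request import Request


def action_request(site_url, action, payload=None, api_key=None):
    """Prepare a POST request for one action, e.g. 'package_show'."""
    url = site_url.rstrip("/") + "/api/3/action/" + action
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Writing actions (create, update, delete) need the user's API key.
        headers["Authorization"] = api_key
    body = json.dumps(payload or {}).encode("utf-8")
    return Request(url, data=body, headers=headers)


def parse_response(raw_bytes):
    """Return the 'result' of an action API reply, or raise on failure."""
    reply = json.loads(raw_bytes.decode("utf-8"))
    if not reply.get("success"):
        raise RuntimeError(reply.get("error"))
    return reply["result"]
```

Reading calls such as `package_list` or `package_show` and writing calls such as `package_create` all follow this same request and reply shape, which is what makes it straightforward to wrap the API in your own application.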

And in fact, it is quite easy to use the API. There are client libraries for any common web programming language (Python, Java, Perl, PHP, Javascript, Ruby), so you don’t need to write the basic functions on your own. A very simple Python script like the one below is sufficient to upload a file to a CKAN instance:

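import ckanclient

# Replace the placeholder strings with your own values.
ckan = ckanclient.CkanClient(
    api_key="<your_CKAN_api_key>",
    base_location="<url_of_CKAN_instance>/api")
# Upload the local file and keep whatever the client reports back.
upmsg = ckan.upload_file("<your_local_filename>")
print(upmsg)  # this is not necessary ;-)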

For demonstration and testing purposes we’ve developed a small sample application. It was built with the Pyramid framework and can completely manage the datasets of a certain group at an external CKAN instance. The demo pics show the list of the packages and a form to create a new dataset. Since this instance is for developing and evaluation purposes only it’s not public, but hopefully the pics will give you a first impression of what’s possible.

Custom application using the API: list of packages

add_dataset

Pyramid app: add_form

It’s worth mentioning that, of course, writing your own application around the CKAN API also allows you to simply add features CKAN might not have. So, for instance, the little red X mark at the right side of all packages (screenshot #1) enables a direct deletion of the package. That’s something CKAN’s UI does not offer by default.

OK, you’re saying, I got it, the API is great. But I don’t want to program an external application. I just want to stick with the original platform, but I need a different look, and even more special functionalities. So, is CKAN extensible?

Short answer: Yes.

Long Answer: Writing Extensions

Adding a custom theme or functionality is done with so called extensions. CKAN extensions are ordinary Python packages containing one or more plugin classes. You can create them with paster in your virtual environment.

paster create -t ckanext ckanext-mycustomextension

Note that you must use the prefix ckanext-; otherwise CKAN won’t detect and load your package. You then have to install it as a develop package in the Python path of your virtual environment. That’s done the usual way with

cd <path_to_your_extension>

python setup.py develop

or even

pip install -e <path_to_your_extension>

Please refer to the docs for a detailed description of writing extensions. Basically you use the so-called PluginToolkit and a whole bunch of interface classes with which you can hook into CKAN core functionality with your own code. You will most probably also need to override some Jinja templates, especially if you want to create a new look for your portal.

CKAN provides some basic example extensions that will quickly give you a rough understanding of how the plugin mechanism works. In addition there are many (many!) CKAN extensions already available. You can browse them at github.

So, what are the extensions we are developing for EDaWaX?

EDaWaX extensions and implementations

Basically we are working on two extensions for EDaWaX at the moment. The first one, called ckanext-edawax, is mainly for the UI. It tweaks some templates and UI elements (logo, fonts, colors etc.). In addition it removes elements we do not need at the moment, like ‘groups’ or facet fields, and it renames the default ‘Organizations’ to ‘Journals’, since this is our only type of organization and we’d like to reflect this focus. We will also add new elements, like proposed citation in a dataset view. You can get the idea of the prototype with these screenshots.

edawax_frontpage

EDaWaX custom frontpage

edawax_datasets

EDaWaX datasets view

edawax_single

EDaWaX single dataset view

Our second package, ckanext-dara, relates to metadata. CKAN offers only a general and limited set of metadata for datasets (like title, description, author) that does not reflect any common schema. You can, nevertheless, add arbitrary fields via the web interface for each dataset. But that’s not schema based. The approach of CKAN here is to avoid extensive metadata forms that might restrict the usability of the portal, and also not to specialise on certain types or categories of data, like, you name it, research data. Dedicated research data applications like Dataverse do have an advantage here. Dataverse’s metadata forms are based on the well-known and very extensive DDI schema. CKAN is not originally a research data management tool, and the lack of decent metadata schema support is one point where this hurts. However, this more general approach as well as the plugin infrastructure (a feature that Dataverse does not offer, AFAIK) enables us to customise the dataset forms, add specific (meta‑)data, and to guarantee compatibility with a given schema. For EDaWaX this will be the da|ra schema, which itself is partially based on the well-known, but less complex DataCite schema. The German-based da|ra is basically a DOI registration service for social science and economic data. Since we will automatically register DOIs for our datasets in the CKAN portal with da|ra, it makes perfect sense that we use their schema (which we must do anyway when submitting our metadata).

ckanext-dara is the CKAN-extension where all the metadata functionality as well as the DOI registration will be added. It is also planned to publish this package as Open Source on github. So far the development has concentrated on extending the standard CKAN dataset forms with da|ra specific metadata. The problem here is the conflict between usability and the aspiration to get as much metadata as possible. You know, we are working in a library. For librarians metadata is important. Very important. You could say that librarians think in metadata. We want every single detail of an object to be described in metadata, if possible in very cryptic metadata. Since metadata schemas are often (if not always) created by librarians, they tend to be kind of exhaustive. The current da|ra schema knows more than 100 elements, which is only a small set compared to DDI which knows up to 846 elements. Now please imagine you’re a scientist or the editor of a scientific journal who is asked to upload research data to our CKAN based platform. Would you like to see a form that asks for ~100 metadata elements? You certainly wouldn’t. Chances are good that you would immediately (and perhaps cursing) leave the site and forget about this open data thing for the next two decades or so.

To deal with this conflict we are following a twofold approach. First, we are dividing the da|ra metadata schema into three levels that reflect how necessary each element is. Level 1 contains very basic metadata elements that are mandatory, like title, main author, or the publication year. These level 1 fields correspond to the mandatory fields of the DataCite metadata schema. For this level we only need two or three new fields in addition to the ones that are already implemented in CKAN. Level 2 contains elements that are necessary (or perhaps better: desirable) for the searchability of the dataset or that describe special properties of it. And finally, Level 3 reflects the special metadata we need to ensure future reuse by integrating authority files. By integrating these authority files it should be possible to link persons to their affiliations, to articles, to research data, to keywords, or to special fields of research.
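To make the three levels a little more concrete, here is a minimal sketch in Python of how such a split could be represented and checked. It is not the ckanext-dara implementation: apart from title, main author and publication year, all field names are hypothetical placeholders, and the helper functions are invented for illustration.

```python
# Sketch only: field names other than title / author / publication year
# are hypothetical and do not come from the da|ra schema itself.
LEVELS = {
    1: ["title", "author", "publication_year"],  # mandatory basics
    2: ["keywords", "language"],                  # hypothetical: searchability
    3: ["author_authority_id"],                   # hypothetical: authority files
}


def missing_mandatory(metadata):
    """Return the level 1 fields that are absent or empty in `metadata`."""
    return [field for field in LEVELS[1] if not metadata.get(field)]


def highest_complete_level(metadata):
    """Return the highest n such that every field of levels 1..n is filled."""
    complete = 0
    for level in sorted(LEVELS):
        if all(metadata.get(field) for field in LEVELS[level]):
            complete = level
        else:
            break
    return complete
```

A dataset with only the level 1 fields filled in would be accepted, while the higher levels remain optional extras that a form can offer without forcing them on the user.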

Second, we will try to integrate these different levels of metadata as seamlessly as possible into CKAN’s UI. The general idea behind this is to give users the choice of which metadata functionalities they would like to equip their data with. To achieve this we collapse the subforms for level 2 and level 3 in the dataset form with a little help from jQuery. The following screenshots give you an idea. Please note that this is an early stage of development and nothing’s finalised yet. We have not implemented level 3 yet. However, you can see that the form for level 2 (as well as later for level 3) is collapsed by default, so the “quick’n’dirty” user won’t have to deal with it if she does not want to. We are still thinking, however, about the motivation/information text and its presentation.

[IMG: ckanext-dara addform]

[IMG: ckanext-dara addform extended]

Journal CMS add-on

In addition to building the open data portal itself we are planning to develop an add-on for an existing E-Journal, using the CKAN API. This will be done for the CMS Plone, which is the base of the E-Journal ‘Economics’. It should mainly be a test case for the usability of the CKAN API for editorial offices. Editors of ‘Economics’ have some experience with Dataverse (and are not always happy with it) so we have a very good setting here. Generally we consider integration in third-party systems to be very important for the acceptance of CKAN as a repository for publication-related research data. Users should not be bothered with having to use two (or even more) different systems for data and text. This approach also gives the tightest integration of data and articles. Dataverse, for example, will develop such functionalities for OJS (Open Journal Systems), hopefully within the next two years. CKAN has a kind of head start here due to its great API, but I think we need to popularise CKAN in this respect, so this package will be developed as a ‘proof-of-concept’.
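As a rough illustration of what a third-party system such as a journal CMS would do, the following Python sketch builds a request for CKAN's package_create action. It is only a sketch under assumptions: the portal URL, API key and dataset values are hypothetical, the helper function is invented for this example, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_package_create_request(site_url, api_key, dataset):
    """Build (but do not send) a POST request for CKAN's package_create action."""
    url = site_url.rstrip("/") + "/api/3/action/package_create"
    body = json.dumps(dataset).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


request = build_package_create_request(
    "https://data.example.org",  # hypothetical portal URL
    "my-api-key",                # hypothetical API key of the editor's account
    {"name": "replication-data-2013", "title": "Replication data"},
)
# urllib.request.urlopen(request) would actually send it; that call is
# left out here so the sketch runs without a CKAN instance.
```

An add-on living inside the CMS could send such a request when an article is published, so that editors never have to leave their usual system.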

Conclusion

At least for our use case in the EDaWaX context, CKAN has proven so far to be the best available solution for an open research data portal. Due to its current focus on open government data, it might show some desiderata regarding the use for research data. CKAN is focused on data publishing, not data curation. This shows up clearly in its very basic support for metadata. But as we’ve shown, CKAN has two fantastic features – the API and the plugin mechanism – that facilitate the development of extensions and third-party apps for use-cases in the field of research data. Development in this direction has started already (not only in our project) and it’s foreseeable that those efforts will be ramped up soon.

So, if publishing open research data is on your schedule, please consider using CKAN and give it a serious try. It’s worth it. If you have any questions, comments or criticism, please leave a note in the comments section. Please also feel free to write an email to h.bunke <at> zbw.eu in case you have a more specific question.

CKAN 2.1 released

Adrià Mercader - August 13, 2013 in Feature, News, Releases

We are happy to announce that the new CKAN 2.1 version is available to
download and install.

This version adds exciting new features, including an interface for
bulk dataset updates (shown below), improved previews for text files, a new
redesigned dashboard and significant improvements to the
documentation. Have a look at the
changelog
for a full list of all
new features and fixes, and play around on our demo site,
which has also been updated.

[IMG: bulk update]

The new version is available to install
as usual from packages or
source, depending on your needs.
If you want to upgrade from a previous release, have a look at the
upgrade documentation.

Patch releases

Apart from the new 2.1 version, there are new patch releases available
for previous CKAN versions that fix bugs and security issues. Users
are strongly encouraged to upgrade to the latest patch release for the
CKAN version they are using, as this is the only one that will be
supported (remember that patch releases don’t contain backwards
incompatible changes). See the
release policy
for more details.

For details on how to upgrade, see the following links depending on
your CKAN version and install method:

  • 2.0.x → 2.0.2
      • Package upgrade
      • Source upgrade
  • 1.8.x → 1.8.2
  • 1.7.x → 1.7.4
      • Package upgrade
      • Source upgrade

If you find any problems, let us know on the mailing list or the IRC channel (#ckan on freenode).

Mapping GeoJSON in CKAN

Dominik Moritz - July 31, 2013 in Extensions, Feature, News

A new feature in CKAN enables users to preview geographical data in GeoJSON files, a widely-used open format for geodata. The preview renders the data on an interactive map.

GeoJSON is a simple file format for storing spatial features such as points and areas together with attributes. The new preview loads GeoJSON data into a Leaflet map, which shows the MapQuest base layer in the background. Points are shown as markers, and areas as polygons; click on a point or area, and a pop-up will show the details associated with it. You can also pan and zoom the map. Try it out in the demo below.
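For readers who have not seen the format, here is a small Python sketch that builds a GeoJSON document with one point and one area, each carrying attributes. The helper functions and the sample features are made up for illustration; only the overall GeoJSON structure (FeatureCollection, Feature, geometry, properties) is fixed by the format.

```python
import json


def point_feature(lon, lat, properties):
    """Return a GeoJSON Feature for a single point (longitude comes first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def polygon_feature(ring, properties):
    """Return a GeoJSON Feature for an area, closing the ring if needed."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # first and last position must be identical
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        point_feature(-3.7038, 40.4168, {"name": "Madrid"}),
        polygon_feature([[0, 0], [1, 0], [1, 1], [0, 1]], {"name": "Unit square"}),
    ],
}
geojson_text = json.dumps(collection, indent=2)
```

The `properties` of each feature are the attributes that would show up in the pop-up when a marker or polygon is clicked.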

The preview is part of CKAN’s spatial extension, which already offers previews for WMS resources. To enable the GeoJSON preview, ensure that the spatial extension is installed, and add geojson_preview to the ckan.plugins list in your CKAN config file. Also, if you want users to be able to preview files linked on remote servers, add the resource_proxy extension. Now, if you create a new resource with a GeoJSON file and set the file type to geojson (or gson), you can navigate to the resource page, where the preview will appear automatically.
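The config change itself is a one-line edit, but as a sketch of what it amounts to, the Python snippet below reads a sample config and works out the new ckan.plugins value. The sample file contents and the helper function are assumptions for illustration; in practice you would simply edit the line in your own .ini file by hand.

```python
import configparser

# Hypothetical excerpt of a CKAN config file; a real one has many more settings.
SAMPLE_INI = """\
[app:main]
ckan.plugins = stats text_preview
"""


def enable_plugins(ini_text, new_plugins):
    """Return the ckan.plugins value with `new_plugins` appended if missing."""
    # interpolation=None keeps values such as %(here)s in real configs untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    current = parser.get("app:main", "ckan.plugins", fallback="").split()
    for name in new_plugins:
        if name not in current:
            current.append(name)
    return " ".join(current)


plugins_line = enable_plugins(SAMPLE_INI, ["geojson_preview", "resource_proxy"])
```

The resulting string is what would follow `ckan.plugins =` in the config file; CKAN needs a restart to pick up the change.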

By default, CKAN offers previews for many file types including images, web sites, CSV files, Excel spreadsheets, JSON files and PDFs. These previews are implemented in CKAN extensions and use the IResourcePreview extension interface. This means that new previews can be added very easily, and previews are not limited to the default ones that come with CKAN. If you want to write another kind of preview, have a look at the examples in CKAN and the interface documentation.

Adding custom previews to CKAN

Dominik Moritz - March 13, 2013 in Extensions, Feature

A new feature in CKAN 2.0 enables you to add custom previews for different file types.

CKAN’s ability to preview resources gives users a quick way to check if they have the dataset they need – as well as to begin to explore the data. CKAN has built-in previews for certain filetypes, such as images and CSV files. However, if you have resources in another format, users cannot preview them. Custom previews provide a simple way to add previews for more filetypes, or even modify existing preview methods.

[IMG: preview]

Map preview of CSV file in CKAN

New previews can be built as a CKAN extension with the help of the IResourcePreview interface. This is the same interface that CKAN already uses for its built-in previews. (Note that, while most of these are enabled by default, the built-in PDF preview is not. To enable PDF preview, you need to edit the CKAN .ini file and add pdf_preview to ckan.plugins.)

A preview extension must implement three methods:

  • can_preview to indicate that it can preview the dataset,
  • preview_template to return the template for the preview, and
  • setup_template_variables to add the data that should be rendered to the template.
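
The three methods above can be sketched as a plain Python class. This is an illustration under assumptions, not a working plugin: the Markdown format, the class name and the template name are hypothetical, and the class does not import CKAN so that it can run on its own. A real extension would subclass ckan.plugins.SingletonPlugin and declare that it implements IResourcePreview.

```python
class MarkdownPreview(object):
    """Hypothetical preview for Markdown resources (sketch, no CKAN import)."""

    FORMATS = ("md", "markdown")

    def can_preview(self, data_dict):
        # data_dict is assumed to carry the resource being viewed.
        resource_format = data_dict["resource"].get("format", "")
        return resource_format.lower() in self.FORMATS

    def setup_template_variables(self, context, data_dict):
        # Assumption: a real plugin would make these values available to the
        # template through CKAN; here they are returned so the sketch is testable.
        return {"resource_url": data_dict["resource"].get("url")}

    def preview_template(self, context, data_dict):
        # Hypothetical template file shipped with the extension.
        return "markdown_preview.html"


preview = MarkdownPreview()
sample = {"resource": {"format": "MD", "url": "http://example.org/readme.md"}}
```

With `sample` as input, `preview.can_preview(sample)` tells CKAN whether this plugin should handle the resource, and the other two methods decide what is rendered.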

If you happen to write an extension that previews files that are fetched via an Ajax call, you should also have a look at the resource proxy extension. This offers a workaround for the same origin policy which normally prevents files being fetched from a domain that is different from the domain of the CKAN site.
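
To see when the same origin policy comes into play, the following Python sketch compares the origin (scheme, host and port) of the CKAN site with that of a resource URL. The function names and URLs are illustrative assumptions, and this is not how the resource proxy extension itself is implemented; it only shows the check a browser effectively performs.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def needs_proxy(site_url, resource_url):
    """True if an Ajax request from the site to the resource is cross-origin."""
    return origin(site_url) != origin(resource_url)
```

A file uploaded to the CKAN site itself has the same origin and can be fetched directly, whereas a file linked on another server does not, which is the case the resource proxy is meant to work around.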

To get started with your own preview extension, I recommend that you read the extension documentation linked above, and then have a look at the built-in preview extensions at ckanext. If you have any questions, let us know on the mailing list and we’ll try to help.

European Union launches CKAN data portal

Mark Wainwright - February 22, 2013 in Deployments, Feature, News

The European Commission (EC) has unveiled a new data portal, which will be used to publish data from the EC and other bodies of the European Union.

This major project was announced last year, and it went live in December for testing before today’s announcement. The portal includes extensive CKAN customisation and development work by the Open Knowledge Foundation, including a multilingual extension enabling data descriptions (metadata) to be made available in different languages: at present the metadata is offered in English, French, German, Italian and Polish. The portal was originally planned for EC data, but it will now also hold data from the European Environment Agency, and hopefully in time a number of other EU bodies as well.

The EU has been a key mover in driving the Open Data agenda in member states, so it is fitting that it is now promoting transparency and re-use of its own data holdings by making them available in one place. It has for some years been encouraging member states to publish data via dedicated portals, and it also supports the OKF’s work on publicdata.eu, a prototype of a pan-European data portal harvesting data from catalogues across the Union, via the LOD2 research project.