Implementing VectorTiles Preview of Geodata on HDX

chadhendrix - September 16, 2015 in Extensions, Feature, Visualization

This post is a modified version of a post on the HDX blog. It has been modified here to highlight the information of most interest to the CKAN community. You can see the original post here.

Humanitarian data is almost always inherently geographic. Even the data in a simple CSV file will generally correspond to some piece of geography: a country, a district, a town, a bridge, or a hospital, for example.

HDX has built on CKAN’s preview capabilities with the ability to preview large (up to 500MB) vector geographic datasets in a variety of formats.  Resources uploaded (or linked) to HDX with the format strings ‘geojson’, ‘zipped shapefile’, or ‘kml’ will trigger the creation of a geo preview. Here is an example showing administrative boundaries for Colombia:


[Image: geo preview of administrative boundaries for Colombia]

To minimize bandwidth in the interest of often poorly-connected field locations, we built the preview from vector tiles. This means that details are removed at small scales but will reappear as you zoom in.

The preview is created only for the first layer it encounters in a resource. If the resource contains multiple layers, the others will not show up. For those cases, you can create separate resources for each layer and they will be available in the preview. Multiple geometry types (polygon + line, for example) in kml or geojson are not yet supported.

Implementation

It’s a common problem in interactive mapping: to preview the whole geographic dataset, we would need to send all of the data to the browser, but that can require a long download or even crash the browser. The classic solution is to use a set of pre-rendered map tiles — static map images made for different zoom levels and cut into tiny pieces called tiles.  The browser has to load only a few of these pieces for any given view of the map. However, because they are just raster images, the user cannot interact with them in any advanced way.

We wanted to maintain interactivity with the data, eventually having hover effects or allowing users to customize styling, so we knew that we needed a different approach. We reached out to our friends at Geonode who pointed us to the recently developed Vector Tiles Specification.

The vector tile solution is a similar approach to traditional map tiles, but instead of creating static image tiles, it involves cutting the geodata layer into small tiles of vector data. Each zoom level receives a simplification (level of detail, or LoD) pass, which reduces the number of vertices displayed, similar to the way that 3D video games or simulators reduce the number of polygons in distant objects to improve performance. This means that for any given zoom level and location, the browser needs to download only the vertices necessary to fill the map.  You can learn more about how vector tiles work in this helpful FOSS4G NA talk from earlier this year.
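To make the "download only what fills the map" idea concrete, here is a minimal sketch of how a map client can work out which tiles cover a view. It assumes the common XYZ / Web Mercator tiling scheme, which the post does not state explicitly, and the function names are illustrative rather than taken from the HDX code.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles needed to cover a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]
```

A view that spans a small box needs only a handful of tiles, however large the underlying dataset is, which is why the approach suits poorly connected locations.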

Because vector tiles are a somewhat-new technology, there wasn’t any off-the-shelf framework to let us integrate them with our CKAN instance. Instead, we built a custom solution from several existing components (along with our own integration code):

Our architecture looks like this:

[Image: HDX geo preview architecture diagram]

The GISRestLayer orchestrates the entire process by notifying each component when there is a task to do. It then informs CKAN when the task is complete, and a dataset has a geo preview available.  It can take a minute or longer to generate the preview, so the asynchronous approach — managed through Redis Queue (RQ) — was essential to let our users continue to work while the process is running. A special HDX team member, Geodata Preview Bot, is used to make the changes to CKAN. This makes the nature of the activity on the dataset clear to our users.
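The asynchronous pattern can be illustrated with a short sketch. HDX manages its jobs through Redis Queue; the example below swaps in Python's standard-library `queue` and a worker thread so that it runs without a Redis server. The function `build_geo_preview` is a hypothetical stand-in for the slow tile-generation step, not part of GISRestLayer.

```python
import queue
import threading


def build_geo_preview(resource_id):
    """Hypothetical stand-in for the slow tile-generation step."""
    return {"resource_id": resource_id, "status": "preview ready"}


def worker(jobs, results):
    """Process jobs until a None sentinel arrives, recording each result."""
    while True:
        resource_id = jobs.get()
        if resource_id is None:
            break
        results.append(build_geo_preview(resource_id))


def run_demo(resource_ids):
    """Enqueue every resource, then wait for the worker to drain the queue."""
    jobs = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for resource_id in resource_ids:
        jobs.put(resource_id)  # returns immediately; the caller is not blocked
    jobs.put(None)  # sentinel: no more work
    thread.join()
    return results
```

The point of the design is that the part of the system that accepts an upload only enqueues a job and moves on; a separate worker does the slow work and reports back when it is finished.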

Future development

This approach gives HDX a good foundation for adding new geodata features in the future. We will be conducting research to understand what users think is important to add next. Here are some initial new-feature ideas:

  • Automatically generate additional download formats so that every geodataset is available in zipped shapefile, GeoJSON, KML, etc.
  • Allow the contributing user to specify the order of the resources in the map legend (and therefore which one appears by default).
  • Allow users to preview multiple datasets on the same map for comparison.
  • Automatically apply different symbol colors to different resources in the same dataset.
  • Allow users to style the geographic data, changing colors and symbols.
  • Allow users to configure and embed maps of their data in their organization or crisis pages.
  • Provide OGC-compliant web services of contributed datasets (WFS, WMS, etc.).
  • Allow external geographic data services (WMS, WFS, etc) to be added to a map preview.
  • Make our vector tiles available as a web service.

If any of these enhancements sound useful or you have new ideas, send us an email at hdx.feedback@gmail.com. If you have geodata to share with the HDX community, start adding your data here.

We would like to say a special thanks to Jeffrey Johnson who pointed us toward the vector tiles solution and to the contributors of all the open source projects listed above! In addition to GISRestLayer, you’ll find the rest of our code here.

Building tools for Open Data adoption

denis - September 11, 2015 in Community, Feature, Featured

At DataCats, we are focused on a simple problem — how do we make sure every single government has easy access to get up and running with Open Data? In other words, how do we make it as easy as possible for governments of all levels to start publishing open data?

The answer, as you might guess from this blog, is CKAN. But CKAN uses a very non-traditional technology stack, especially by government standards. Python, PostgreSQL, Solr, and Unix are not in the toolbox of most IT departments. This is true not only for local government in Europe and North America, but also for almost all government in the developing world.

Our answer to this problem is two software projects which, like CKAN, are Free and Open Source Software. The first is the eponymously named datacats, and the second is named CKAN Multisite. The two projects together aim to solve the operational difficulties in deploying and managing CKAN installations.

datacats is a command line library built on Docker, a popular new alternative to virtualization that is experiencing explosive growth in industry. It aims to help CKAN developers easily get set up and running with one or more CKAN development instances, as well as deploy those easily on any provider – be it Amazon, Microsoft Azure, Digital Ocean, or a plain old physical server data centre.

Our team has been using datacats to develop a number of large CKAN projects for governments here in Canada and around the world. Being open source, we get word every week of another IT department somewhere that is trying it out.

CKAN Multisite is a companion project to datacats, targeted at system administrators who wish to manage one or more CKAN instances on their infrastructure. The project was very generously sponsored by U.S. Open Data. Multisite provides a simple API and a web interface through which starting, stopping, and managing CKAN servers is as simple as pressing a button. In essence it gives you your very own CKAN cloud.

CKAN is an open source project that many national and large city governments depend on as the cornerstone of their open data programs. We hope that these two open source projects will help the CKAN ecosystem continue to grow. If you are a sysadmin or a developer working on CKAN, give them a try and, if you have the appetite, consider contributing to the projects themselves.

Matthew Fullerton and some interesting CKAN extension development.

Steven De Costa - August 21, 2015 in Community, Extensions

Note: This is a re-post from one of our CKAN community contributors, Matthew Fullerton. He has been working on some interesting extensions, which are outlined below. You can support Matthew’s work by providing comments below, or you can link through to his GitHub profile to comment or get in touch there.

 

Styling GeoJSON data

The GeoView extension makes it easy to add resource views of GeoJSON data. In our extended extension, attributes of the features (lines, points) in the FeatureCollection are styled according to MapBox’s SimpleStyle spec.

Here’s an example where the file has been processed to add colors based on traffic flow state:
https://smartlane.io/dataset/differentgeovisualizations/resource/49f0fcffb3c848c8b1c6ddc33e4a83fe

And another where the points are styled to (vaguely) look like colored traffic lights:
https://smartlane.io/dataset/differentgeovisualizations/resource/a4e397adcbd948bfa77a296c5fcc9559
(watch out, it can take a while to load)
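As a rough illustration of the kind of preprocessing described above, the sketch below adds simplestyle properties (`stroke`, `stroke-width`, `marker-color`) to each feature of a FeatureCollection. The `flow_state` attribute name and the colour mapping are assumptions made up for the example; they are not taken from Matthew's extension.

```python
import json

# Hypothetical mapping from a traffic-flow state to a colour.
FLOW_COLOURS = {"free": "#2ecc71", "slow": "#f39c12", "jammed": "#e74c3c"}
DEFAULT_COLOUR = "#7e7e7e"


def style_features(collection, state_key="flow_state"):
    """Return a copy of a FeatureCollection with simplestyle properties added."""
    styled = json.loads(json.dumps(collection))  # simple deep copy
    for feature in styled["features"]:
        props = feature.get("properties") or {}  # properties may be null
        colour = FLOW_COLOURS.get(props.get(state_key), DEFAULT_COLOUR)
        geom_type = (feature.get("geometry") or {}).get("type")
        if geom_type == "Point":
            props["marker-color"] = colour
        else:
            props["stroke"] = colour
            props["stroke-width"] = 4
        feature["properties"] = props
    return styled
```

Because the styling lives in the feature properties, any viewer that understands the simplestyle keys can render the file without extra configuration.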

Realtime GeoJSON data

Using leaflet.realtime, an extension for the leaflet library that CKAN (GeoView) uses to visualize GeoJSON, maps can have changing points or colors/styles.

Here is an example of traffic lights changing according to pre-recorded data:
https://smartlane.io/dataset/trafficlightstreamfrankfurtniederrad/resource/b6e4319ef29b480bad6d214a753d3c2d

I’ll try to add a demo with moving data points soon; it ought to work without any further code changes. The problem is often getting the live data in GeoJSON format… but we have a backend for preprocessing other data.

Realtime data plotting

By making only a few small changes, we are able to continuously update Graph views. You can see the changing (or not) temperature in our office here:
https://smartlane.io/dataset/temperaturesensor/resource/bd6456385541499e861bf9c97e60f35a

That’s an example for ‘lines and points’ but it works for things like bar graphs too. Last week we had people competing to achieve the best time in a remote controlled robot race where their time was automatically displayed as a bar on a ‘leader board’. For good measure we had an automatically updating histogram of the times too. Updating the actual data in CKAN is easy thanks to the DataStore API.
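For readers who have not used the DataStore API, here is a minimal sketch of how a new reading could be pushed with the `datastore_upsert` action. The site URL, API key, resource id and record fields are placeholders, and the helper name is made up for this example; it only builds the request so that you can inspect it before sending.

```python
import json
import urllib.request


def build_upsert_request(site_url, api_key, resource_id, records):
    """Build (but do not send) a datastore_upsert request for new records."""
    payload = {
        "resource_id": resource_id,
        "records": records,
        "method": "insert",  # append rows; no unique key required
    }
    return urllib.request.Request(
        site_url.rstrip("/") + "/api/3/action/datastore_upsert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a sensor script could do that on a timer, and a continuously updating view would pick up the new rows.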

Matthew Fullerton

Freelance Software Developer and EXIST Stipend holder with the start up project “Tapestry” http://www.smartlane.de/

Two new CKAN extensions – Webhooks and Geopusher

Steven De Costa - August 16, 2015 in Extensions

Denis Zgonjanin recently shared the following update on two new extensions via the CKAN Dev mail list.

If you are working on CKAN extensions and would like to share details with other developers then post your updates via the mail list. We’ll always look at promoting the great work of community contributions via this blog :) If you have an interesting CKAN story to share feel free to ping @starl3n to organise a guest post.

From Denis:

Webhooks

A problem I’ve had personally is having my open data apps know when a dataset they’ve been using has been updated. You can of course poll CKAN periodically, but then you need cron jobs or a queue, and when you’re using a cheap PaaS like Heroku for your apps, integrating queues and cron is just an extra hassle.

This extension lets people register a URL with CKAN, which CKAN will call when a certain event happens – for example, a dataset update. The extension uses the built-in CKAN celery queue, so as to be non-blocking.

If you do end up using it, there are still a bunch of nice features to be built, including a simple web interface through which users can register webhooks (right now they can only be created through the action API).
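On the receiving side, an app only needs a small HTTP endpoint for CKAN to call. The sketch below uses Python's standard library and assumes the webhook arrives as a POST with a JSON body; the payload format of the extension is not described here, so the handler simply stores whatever JSON it gets. All names are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events seen so far, kept in memory for the example


def handle_payload(body):
    """Parse a webhook body; return (HTTP status, parsed event or None)."""
    try:
        event = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return 400, None
    received.append(event)
    return 204, event


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, _ = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def make_server(port=8000):
    """Create a server bound to localhost; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

With something like this deployed, the app reacts when CKAN calls it and no longer needs a cron job to poll for changes.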

Geopusher

So you know how you have a lot of Shapefiles and KML files in your CKANs (because government), but your users prefer GeoJSON? This extension will automatically convert shapefiles and KML into GeoJSON, and create a new GeoJSON resource within that dataset. There are some cases where this won’t work, depending on the complexity of the SHP or KML file, but it works well in general.

This extension also uses the built-in celery queue to do its work, so for both of these extensions you will need to start the celery daemon in order to use them:

`paster --plugin=ckan celeryd -c development.ini`

Beauty behind the scenes

Tryggvi Björgvinsson - August 5, 2015 in Deployments, Extensions, Featured

Good things can often go unnoticed, especially if they’re not immediately visible. Last month the government of Sweden, through Vinnova, released a revamped version of their open data portal, Öppnadata.se. The portal still runs on CKAN, the open data management system. It even has the same visual feeling but the principles behind the portal are completely different. The main idea behind the new version of Öppnadata.se is automation. Open Knowledge teamed up with the Swedish company Metasolutions to build and deliver an automated open data portal.

Responsive design

In modern web development, one aspect of website automation called responsive design has become very popular. With this technique the website automatically adjusts the presentation depending on the screen size. That is, it knows how best to present the content given different screen sizes. Öppnadata.se got a slight facelift in terms of tweaks to its appearance, but the big news on that front is that it now has a responsive design. The portal looks different if you access it on mobile phones or if you visit it on desktops, but the content is still the same.

These changes were contributed to CKAN. They are now a part of the CKAN core web application as of version 2.3. This means everyone can now have responsive data portals as long as they use a recent version of CKAN.

[Image: New Öppnadata.se]

[Image: Old Öppnadata.se]

Data catalogs

Perhaps the biggest innovation of Öppnadata.se is how the automation process works for adding new datasets to the catalog. Normally with CKAN, data publishers log in and create or update their datasets on the CKAN site. CKAN has for a long time also supported something called harvesting, where an instance of CKAN goes out and fetches new datasets and makes them available. That’s a form of automation, but it’s dependent on specific software being used or special harvesters for each source. So harvesting from one CKAN instance to another is simple. Harvesting from a specific geospatial data source is simple. Automatically harvesting from something you don’t know, and that doesn’t exist yet, is hard.

That’s the reality which Öppnadata.se faces. Only a minority of public organisations and municipalities in Sweden publish open data at the moment. So a decision hasn’t been made by a majority of the public entities for what software or solution will be used to publish open data.

To tackle this problem, Öppnadata.se relies on an open standard from the World Wide Web Consortium called DCAT (Data Catalog Vocabulary). The open standard describes how to publish a list of datasets and it allows Swedish public bodies to pick whatever solution they like to publish datasets, as long as one of its outputs conforms with DCAT.

Öppnadata.se actually uses a DCAT application profile which was specially created for Sweden by Metasolutions and defines in more detail what to expect, for example that Öppnadata.se expects to find dataset classifications according to the Eurovoc classification system.

Thanks to this effort significant improvements have been made to CKAN’s support for RDF and DCAT. They include application profiles (like the Swedish one) for harvesting and exposing DCAT metadata in different formats. So a CKAN instance can now automatically harvest datasets from a range of DCAT sources, which is exactly what Öppnadata.se does. For Öppnadata.se, the CKAN support also makes it easy for Swedish public bodies who use CKAN to automatically expose their datasets correctly so that they can be automatically harvested by Öppnadata.se. For more information have a look at the CKAN DCAT extension documentation.

Dead or alive

The Web is decentralised and always changing. A link to a webpage that worked yesterday might not work today because the page was moved. When automatically adding external links, for example, links to resources for a dataset, you run into the risk of adding links to resources that no longer exist.

To counter that, Öppnadata.se uses a CKAN extension called Dead or alive. It may not be the best name, but that’s what it does: it checks whether a link is dead or alive. The checking itself is performed by an external service called deadoralive. The extension just serves up a set of links for the external service to check. In this way dead links are automatically marked as broken and system administrators of Öppnadata.se can find problematic public bodies and notify them that they need to update their DCAT catalog (this is not automatic because nobody likes spam).
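The core of any link checker is small. The sketch below shows one way to test a single URL with Python's standard library. It is not the deadoralive implementation; the HEAD request, the timeout, and the rule that any status below 400 counts as alive are all assumptions made for this example.

```python
import urllib.error
import urllib.request


def check_link(url, timeout=10, opener=urllib.request.urlopen):
    """Return a small report saying whether a URL currently responds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
            return {"url": url, "alive": status < 400, "status": status}
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404 or 500.
        return {"url": url, "alive": False, "status": err.code}
    except OSError as err:
        # DNS failure, refused connection, timeout and similar problems.
        return {"url": url, "alive": False, "status": None, "reason": str(err)}
```

Some servers reject HEAD requests, so a production checker would probably fall back to GET and retry a few times before declaring a link broken.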

These are only the automation highlights of the new Öppnadata.se. Other changes were made that have little to do with automation but are still not immediately visible, so a lot of Öppnadata.se’s beauty happens behind the scenes. That’s also the case for other open data portals. You might just visit your open data portal to get some open data, but you might not realise the amount of effort and coordination it takes to get that data to you.

Image of Swedish flag by Allie_Caulfield on Flickr (cc-by)

CKAN 2.4 release and patch releases

davidread - July 22, 2015 in Releases, Uncategorized

We are happy to announce that CKAN 2.4 is now released. In addition, new patch releases for older versions of CKAN are now available to download and install.

CKAN 2.4

The 2.4 release brings a way to set the CKAN config via environment variables and via the API, which is useful for automated deployment setups. 2.4 also includes plenty of other improvements contributed by the CKAN developer community during the past 4 months, as detailed in the 2.4.0 CHANGELOG.
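To show why environment-based configuration helps automated deployments, here is a small sketch of the general idea: values found in the environment override those read from the config file. This is not CKAN's own code, and the variable-to-option mapping below is an assumption to be checked against the CKAN configuration documentation.

```python
import os

# Assumed mapping from environment variable to config option;
# verify the exact names against the CKAN documentation.
ENV_TO_CONFIG = {
    "CKAN_SQLALCHEMY_URL": "sqlalchemy.url",
    "CKAN_SOLR_URL": "solr_url",
    "CKAN_SITE_URL": "ckan.site_url",
}


def apply_env_overrides(config, environ=None):
    """Return a copy of config with any set environment values applied."""
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for env_name, option in ENV_TO_CONFIG.items():
        value = environ.get(env_name)
        if value:  # ignore unset or empty variables
            merged[option] = value
    return merged
```

The same config file can then be shipped to every environment, with only the environment variables differing between development, staging and production.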

If you have customizations or extensions, we suggest you trial the upgrade first in a test environment and refer to the changes in the changelog. Upgrade instructions are below.

CKAN patch releases

These new patch releases for CKAN 2.0.x, 2.1.x, 2.2.x and 2.3.x fix important bugs and security issues, so users are strongly encouraged to upgrade to the latest patch release for the CKAN version they are using.

For a list of the fixes included you can check the CHANGELOG:

Upgrading

For details on how to upgrade, see the following links depending on your install method:

If you find any issue, you can let the technical team know in the mailing list or the IRC channel.

 

Some introductory presentations for CKAN

Steven De Costa - June 8, 2015 in Community, Presentations

Reposted from the CKAN Association LinkedIn group. Feel free to join if you use LinkedIn.

Thanks to Augusto Herrmann Batista and OK Brazil for allowing the following repost:

I recently presented a couple of “lightning courses” to introduce an audience to CKAN.

One was at the Linked Open Data Brasil conference in Florianópolis, Brazil, in November 2014. It is in Portuguese.

http://www.slideshare.net/AugustoHerrmannBatis/minicurso-de-ckan

The other one was presented at the IV Moscow Urban Forum, in Russia, in December 2014. This one is in English.

http://www.slideshare.net/AugustoHerrmannBatis/ckan-overview

Feel free to share and reuse, as they are CC-BY.

Bazinga! Minutes from the CKAN Association Steering Group – 1 April (no joke)

Steven De Costa - April 1, 2015 in Association, Featured

Readme.txt

The following minutes represent what the Steering Group discussed today, but please remember it’s also just a meeting (context: no real work is ever done in a meeting). The objective is to discuss and assign actions when needed, to make decisions when needed and to generally align everyone in the various ways each member is already supporting the CKAN project. Reading between the lines of this update there are a few points to call out and make mention of.

  1. The Steering Group (SG) are renewed with energy and determination. While the last meeting might have been some time ago we have set ourselves the objective of meeting weekly (after next week) because it is clear that the CKAN project is advancing rapidly and support from the SG needs to align with the velocity of the project without any risk of holding it back. Let’s add some buzzwords and suggest that the SG is aiming to bootstrap the project and intersect on multiple vectors to achieve maximum lift via regular and meaningful engagement with its project stakeholders (Please don’t take that last sentence seriously).
  2. ‘Distill out a 1-3 pager’ in relation to the business plan means getting lean and putting focus on the most essential parts of the CKAN Association business plan. Long docs with much wordage are great in some situations but in the case of the CKAN project we have an avid community of exceptionally bright people who are fine with the key objectives, strategies and tactics put forward in the most succinct way possible.
  3. If there is to be an operating model for the SG then it will be this: Say what is going to get done. Get it done+. Let everyone know it is done.
  4. Some awesome questions are answered at the end of this post.

+ In some cases things might not actually get done but we will strive to do the best we can. Yes, we’ll be transparent with goals. Yes, we’ll be happy to take any and all feedback. Yes, we are working for the CKAN project and are ultimately governed via public peer review by the project’s community.

CKAN Association Steering Group Meeting 1 April 2015

  • Present: Ashley Casovan (Chair), Steven De Costa, Rufus Pollock (Secretary)
  • Apologies: Antonio Acuna

Minutes

  1. Steering Group Goals (next quarter)
    1. Announce more clearly existence and purpose of Steering Group
      1. steering group email alias: steering-group@ckan.org (goes to group)
    2. Announce objectives which are
      1. Finalise business plan (have now had out for consultation for some time)
        1. Distill out a 1-3 pager
        2. Finalise and announce to list
        3. hangout on air to announce
      2. Community meetings
        1. Technical team to run at least one (general) developer community meeting in next 2m
        2. At least one users community meeting in next 2m
  2. Responsibilities of the SG
    1. Like a board – see http://ckan.org/about/steering-group/
    2. Similarities to Drupal Board: job is to support the community in moving the project forward – self-determination
  3. Review Actions
    1. https://trello.com/b/D6zxiuFJ/ckan-association-steering-group – primarily business plan and response to questions [note this Trello board is private]
  4. CKAN Event at IODC
    1. CIOs and CTO – CKAN is part of national and regional infrastructure
    2. efficiency gains on open data
    3. https://github.com/ckan/ideas-and-roadmap/issues/120
    4. Technical capability
  5. Review of student position description
    1. Ashley to send out to SG members for comment
  6. Meeting schedule: SG will meet weekly (for present) every Thursday at 12 noon UK (for 30m)
  7. Publishing minutes from this meeting – will aim to send asap

Your questions answered

Q: Is the SG interested in increasing transparency of the SG meetings? How will this be achieved?

SG: Yes. This was discussed and we would like to propose the next meeting be run in two parts. One part will be closed to attend to some regular business of the group with regard to coordinating efforts. The second half will be broadcast as a Hangout on Air for people to watch. We’ll aim to collect questions ahead of the meetings and address them during the broadcast with further options for an active Q&A session from the audience.

Q: Have the SG determined whether members of the association are yet contributing funds, or developers to the project? What are they? What happens if members don’t?

SG: There is ongoing work in this area. Most members are contributing in-kind (not exclusively developers). We’d be happy to make the pledged contributions public via the members listing on CKAN.org. At this time it is an honour system with regard to meeting membership obligations that are provided in-kind. If a member is suspected of not providing the expected level of in-kind contribution then the Steering Group will investigate and consider appropriate actions upon conclusion of such investigations.

Q: How does the SG see its role with respect to providing direction for the project?

SG: Support the community of both technical stakeholders and users in ways which allow them to act in concert to move the project forward in the direction these stakeholders determine to be best for the project.

Q: How is the SG raising more funds, other than membership, to further fund development of CKAN?

SG: This is a question the Steering Group is working through currently. Our focus is on the Business Plan and putting strategic objectives down for all to see via that document. Grant applications and the coordination of requirements to meet the needs of a group of platform owners is also being considered. With the latter the proposed approach is to release an expression of interest for funding support against specific development activities. Those who highly value such activities would be asked to help contribute to a pool of funds that would then see the development work paid for.

CKAN Association Steering Group – about to set sail!

Steven De Costa - March 31, 2015 in Association, Events, Featured

The CKAN Association Steering Group will be meeting in about 30 hours from now. I wanted to make sure we took the opportunity to ask for community questions regarding the CKAN project.

So, please comment here with any questions you might like discussed and/or answered by the steering group :)

This will be my first chance to catch up with everyone in the group so I will have lots of questions of my own. I’m also keen to provide updates on how I see things are going with regard to developing and extending the CKAN community and its reach with regard to communications activities. We have a modest starting point, so updates will be easy to provide. It would be great to get comments via this post on what more people would like to see. However, there are many action items incomplete from within the Community and Communications Team so I’ll also be reporting on that. We don’t yet have a list of CKAN vendors and this is clearly needed based on the number of CKAN Dev list requests regarding upgrade questions when planning a move to 2.3.

Some great positive indicators I see for the project are the number of people active on the CKAN Dev email list and the high volume of quality conversations that are taking place there. It appears that the 2.3 release has been the catalyst needed for a fantastic reinvestment (at least publicly) from both the regular technical team members and the wider community of awesome people doing amazing stuff within their own open data projects. I would like those on the steering group to recognise this change and actively work to support ###MOAR###!

As a new member of the steering group I should introduce myself. You can see the bio attached to this post but for a fresh video-cast of something I’m involved in within my local area you can also take a look at the Australian Open Knowledge Chapter Board meeting that was held earlier today. The video is embedded below. I do actually mention the work I’ve been doing within the CKAN association at some point so please excuse the ‘inception’-like self referential nature of all this.

The main message here is – steering group meeting in about 30 hours. Please comment on this post to amplify your voice within that forum.

Rock on!
Steven

 

Presenting public finance just got easier

Tryggvi Björgvinsson - March 20, 2015 in Extensions, Feature, Featured, Releases, Visualization

CKAN 2.3 is out! The world-famous data handling software suite which powers data.gov, data.gov.uk and numerous other open data portals across the world has been significantly upgraded. How can this version open up new opportunities for existing and coming deployments? Read on.

One of the new features of this release is the ability to create extensions that get called before and after a new file is uploaded, updated, or deleted on a CKAN instance.

This may not sound like a major improvement, but it creates a lot of new opportunities. Now it’s possible to analyse the files (which are called resources in CKAN) and put them to new uses based on that analysis. To showcase how this works, Open Knowledge, in collaboration with the Mexican government, the World Bank (via Partnership for Open Data), and the OpenSpending project, has created a new CKAN extension which uses this new feature.

It’s actually two extensions. One, called ckanext-budgets, listens for the creation and update of resources (i.e. files) in CKAN, and when that happens the extension analyses the resource to see if it conforms to the data file part of the Budget Data Package specification. The Budget Data Package specification is a relatively new specification for budget publications, designed for comparability, flexibility, and simplicity. It’s similar to data packages in that it provides metadata around simple tabular files, like a csv file. If the csv file (a resource in CKAN) conforms to the specification (i.e. the columns have the correct titles), then the extension automatically creates the Budget Data Package metadata based on the CKAN resource data and makes the complete Budget Data Package available.

It might sound very technical, but it really is very simple. You add or update a csv file resource in CKAN and it automatically checks if it contains budget data in order to publish it in a standardised form. In other words, CKAN can now automatically produce standardised budget resources which make integration with other systems a lot easier.
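The column check described above can be sketched in a few lines. The set of required column titles here is hypothetical and stands in for whatever the Budget Data Package specification actually requires. In a real extension the check would run from CKAN's resource hooks; here it is a plain function so it can be tried on its own.

```python
import csv
import io

# Hypothetical required column titles; the real list comes from the
# Budget Data Package specification.
REQUIRED_COLUMNS = frozenset({"id", "amount"})


def looks_like_budget_data(csv_text, required=REQUIRED_COLUMNS):
    """Return True if the CSV header contains every required column title."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    found = {name.strip().lower() for name in header}
    return set(required) <= found
```

Only files that pass a check like this would go on to have their Budget Data Package metadata generated; everything else is left untouched as an ordinary resource.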

The second extension, called ckanext-openspending, shows how easy such an integration around standardised data is. The extension takes the published Budget Data Packages and automatically sends them to OpenSpending. From there OpenSpending does its own thing, analyses the data, aggregates it and makes it very easy to use for those who use OpenSpending’s visualisation library.

So thanks to a perhaps seemingly insignificant extension feature in CKAN 2.3, getting beautiful and understandable visualisations of budget spreadsheets is now only an upload to a CKAN instance away (and can only get easier as the two extensions improve).

To learn even more, see this report about the CKAN and OpenSpending integration efforts.