
You are browsing the archive for James Gardner.

Building CKAN Debian Packages

James Gardner - November 14, 2011 in Deployments, Extensions, Tutorial

As I’m sure you are aware, it has been possible to install CKAN from Debian packages on an Ubuntu 10.04 system using apt-get for some time.

As part of the release of CKAN 1.5 we had a community meeting about CKAN’s Debian packaging approach and how it could be improved. As a result of the meeting, two key changes were recommended:

  • Multiple CKAN instances on the same machine needed to be supported
  • A Python virtualenv should be created for each instance so that different instances could run different versions of extensions

I’m pleased to announce that both these changes have been made in CKAN 1.5 along with full support for setting up Solr search as part of the install as well as the following improvements:

  • Eggs are cached by the build system for quicker builds
  • You don’t need to be root to install CKAN (you now use sudo)
  • All the old bugs/permission issues/edge cases I was aware of are fixed
  • All the subtleties I’m aware of are documented
  • You can now specify the CKAN instance hostname when you create an instance rather than fiddling with /etc/hosts and apache configs
  • Solr and Postgres are now optional to support the case when you host them on different machines such as with datacatalogs.org
  • The packaging version an instance was created with is saved in /var/lib/ckan/$INSTANCE/packaging_version.txt so we have a chance of doing upgrades in future if we wish

The whole process of installing CKAN from packages and setting up new CKAN instances is now documented in detail at the URL below:

CKAN Package Install: http://docs.ckan.org/docs/ckan/en/ckan-1.5/install-from-package.html#run-the-package-installer

I won’t duplicate the instructions or advice in the docs here, but I do want to talk a bit more about how I built the packages. The CKAN packages are built with a tool I wrote for the purpose called “buildkit”. I’m pleased to announce that this tool is now available as open source too, so anyone wishing to package CKAN themselves, rather than relying on the official packages, can do so. There are some instructions on how to do this at the URL below:

Building and testing CKAN Ubuntu Packages: http://packages.python.org/buildkit/manual.html#example-building-and-testing-the-ckan-package-install

The important thing is that the entire process can be boiled down to a single script. CKAN 1.5 includes such a script, and you can see it here:

https://bitbucket.org/okfn/ckan/src/release-v1.5/build.sh

If you are interested in building Debian packages for CKAN, have a go at installing buildkit and then try to run the script above. You can get buildkit from the buildkit page on the Python Package Index or you can clone the git source with git clone git://git.3aims.com/buildkit. That should get you started!

CKAN Workshop at OGD Camp in Warsaw

James Gardner - October 18, 2011 in Events

The CKAN team will be running a hands-on workshop at the Open Government Data Camp 2011 in Warsaw tomorrow before the main conference begins on Thursday.

We will give demos of the CKAN platform and will also cover installing, customising and extending a CKAN instance. Additionally, we’ll be there to talk about using the API, technical architecture and product roadmap, as well as showcasing our very own community instance: thedatahub.org.

Come along to get involved, ask questions and find out more about what open source open data platforms are!

You can sign up at our Eventbrite page here. The workshop begins at 11:00 am and the team will be around until 18:00 to answer questions. We’ll also be at the main conference days on Thursday and Friday, so we hope to talk to many of you there.

CKAN at PyCon UK

James Gardner - September 23, 2011 in Data, Events

The CKAN team are organising a talk, a workshop and a sprint at the forthcoming PyCon UK conference in Coventry this weekend.

We plan to divide the CKAN slot into two parts:

  • Talk – 30 mins description of what open data is all about, how CKAN can help and the technical background around how we use Python and support extensions
  • Workshop – 20 mins helping people to get up and running with their own instance (perhaps adding some data and changing the theme too if there is time)

The Sprint will run from 14:20 on Saturday until 5pm on Monday and we’ll be concentrating on:

  1. Geospatial features
  2. Our “webstore” for hosting data

If you are coming to the conference feel free to drop in.

More information:

Python Web Developer (September 2011)

James Gardner - September 12, 2011 in Events, Jobs

The OKF is looking for more Python web developers interested in open data to work on CKAN. CKAN (as I’m sure you’ll know) is our web-based product built in Python using SQLAlchemy, Pylons and other libraries. It allows users to submit, search for and find open datasets. As well as powering The Data Hub, CKAN is the catalogue behind the UK Government’s high profile data.gov.uk website and the European Union’s Public Data site. It also powers over 20 other catalogues around the world including those in Norway, Holland and Finland, with more on the way.

If you are a really good web developer with a keen interest in open data, and enjoy working in Python on open source products, we’d love to hear from you. As well as CKAN and depending on your skills you might also like to work on:
  • The WebStore – our SQLite based solution for allowing people to process data online and plot the results
  • The DataHub – our public catalogue which will include more social features
  • Geospatial features – such as plotting data on maps, and harvesting geospatial data from other sources
  • Drupal integration
Requirements

Essential:
  • Web app development experience in python (experience with SQLAlchemy, Pylons, Flask highly desirable)
  • PostgreSQL
  • Linux (preferably Ubuntu)
  • Enthusiasm about open data and open knowledge
Bonus points for any of these (not essential though):
  • Drupal
  • Geo-spatial work (OpenLayers, OGC standards, CSW servers, WMS servers etc)
  • Experience with agile methods
  • Sysadmin or Devops skills
  • Debian packaging skills
  • Redis, Solr, RabbitMQ
  • Semantic web/RDF
  • Expert JavaScript, jQuery and CSS
We are looking to hire solid developers, particularly those who take pleasure in finishing code and seeing it deployed.

About the organisation

The Open Knowledge Foundation is a not-for-profit founded in 2004 that builds communities and tools for the creation, sharing and use of open knowledge – any information that people can use, reuse and redistribute. The Foundation is a great place to work. It’s a small team, so there’s opportunity to make a big difference. There’s always lots of stuff going on; interesting people popping in and out all the time; press exposure; quite a broad remit; and open-ended possibilities. In keeping with the spirit of the organisation, you can find out a lot about the different projects using Google or from the OKF website.

Contact Info:

To apply to work in the CKAN team please send your CV or blog URL to james.gardner@okfn.org and we’ll take it from there. We are flexible on employee versus contractor but we normally contract. We are also running a sprint at PyConUK on 24th September so feel free to come along.

If you fancy getting involved in CKAN straight away, why not set up an instance, join the CKAN-discuss mailing list, look at some of the outstanding tickets and start contributing as a community member? We’re very happy to hire community members who have already shown their worth too.

CKAN Vision Update

James Gardner - July 8, 2011 in News, Roadmap

For the last 6 months the CKAN team has been expanding and growing, focusing on catalog interoperability and metadata revisioning/moderation technology amongst other features. At the excellent OKCon in Berlin we really crystallised our mission. Below I’m going to set out our plans and roadmap for the next 6 months.

Key things we took from the conference:

  • CKAN is now firmly part of people’s thinking about open data
  • This space is rapidly evolving and demand is growing for tools like CKAN
  • We can’t take for granted that everyone will see that it’s natural, and necessary, to build an open data platform using open tools and software

At the Open Knowledge Foundation we firmly believe that it is only possible to fully realise the goals of open data if the tools and infrastructure to collect, clean up, publish, revise, improve and understand data are free to share and improve on too.

CKAN, our Data Hub software, is already a key tool but we can do much more to make it attractive and useful to both institutions and ordinary data wranglers. We also know that we want to develop our collaboration with other organizations working in the software and open knowledge space, in particular potential partners who could deliver CKAN as part of their offering (please get in touch at info@ckan.org if that is you!).

To help support this goal we also want to see a shift of focus in the CKAN team over the next 6 months to deliver value to two different sets of people in particular:

  • Governments, councils, universities and any other organisations who will want to run their own data catalog
  • End users who either want to get hold of some key data or who otherwise want to engage in a data community interpreting data

At the moment having a single CKAN has meant we haven’t been focusing enough on either group. We recognise that the site that existed at ckan.net didn’t have the necessary polish on either the user interface or the social activity features it needed and the CKAN software we deploy to countries and cities doesn’t integrate as well as we’d like with the existing infrastructure those countries and cities have.

We’re going to address these issues by creating one flavour of CKAN specifically for governments, universities, cities etc (it is proposed these will be called portal instances) and one CKAN instance that is designed to cater specifically to data users (this will be called the data hub).

In summary our plan going forward is:

CKAN Portals

The reality is that the portal owners we already work with (from governments, cities, etc) very rarely install CKAN standalone. More often, CKAN is integrated into their existing WordPress, Drupal, Java or other content management system infrastructure. CKAN has good facilities for integration with such systems but we know we can do even better.

One of the pressures on the current CKAN team is that different clients usually require some degree of customisation work and often the work is specific to the business practices of those clients. As a result the code we produce for them doesn’t necessarily benefit the wider CKAN community. The more time we spend building client-specific customisations, the less time we have to put together the free software platform we’d like to see the world’s open data built on.

There is an opportunity for the CKAN team here though too. If we can provide a fantastic integration API for the core products we see portal owners using, we can work with the commercial partners who deliver those portals to help them integrate CKAN. That would mean the commercial partners could deal with the client-specific customisations they are expert in and the CKAN team can concentrate on delivering the core features that would benefit all those partners.

The most well-known instance of CKAN is http://data.gov.uk/ which already runs on a Drupal frontend. The Cabinet Office in the UK have agreed to open source the Drupal code under a BSD license, so we’ll start by building top class integration with the current data.gov.uk Drupal codebase and then aim to support both Drupal 6 and 7 over the next couple of months. You can see the Drupal CKAN page here.

Once we’ve demonstrated that Drupal and CKAN can integrate well, we plan to go further and adopt Drupal as the official CKAN front-end for CKAN portals. After all, there’s no point in the CKAN team duplicating coding work building a front-end in Python if we already have a great one in Drupal.

There are other communities in Spain and Germany who are also interested in building a Drupal-CKAN portal and who may be able to contribute to the effort, and there are still others who have their own Drupal data tools which we may be able to build on too, so this seems a win-win situation. After all, CKAN’s aim is to ensure that open data is set free, not to ensure that a Python-only solution is used for open data catalogs.

The knowledge we gain from formal Drupal integration will also help us integrate with other CMSs so that CKAN can become the standard catalog plugin for anyone wanting an open data system.

Having CKAN used as part of many existing systems is one thing, but the real value comes from when you can link them up. That’s where the hub comes in so let’s look at that next.

Linking up data in The Data Hub

Users of CKAN have always been confused between ckan.net (the data website) and ckan.org (the software project site). The CKAN team have always wanted ckan.net to be the central hub for all open data but if users get confused by the name we won’t do that very well.

To begin to address the problem I’m pleased to confirm that after discussion on the CKAN discuss list the ckan.net site has been renamed to The Data Hub and can now be accessed at thedatahub.org.

As part of the change you can also expect a vastly improved user experience over the next 6 months, now that we can focus thedatahub.org specifically on end users and not on our portal owners.

Having an improved name and user interface is one thing, but the true power of the CKAN software will only be realised if each CKAN instance can be connected up with others. We already have interoperability functionality delivered in a number of ways, but most notably with CKAN’s harvesting functionality.

We want to see a world where larger CKAN portals integrate data from smaller child portals and in turn thedatahub.org pulls in data from all CKANs so that end users can find or submit data easily without needing to know about the existence of the original portals.

We plan to implement functionality that allows any comments, notes, improvements or clean-ups made to a dataset on any CKAN instance to be federated to all CKAN instances with a copy of that dataset so that we build a truly distributed network of people working on data.

This in turn would mean that portal owners who choose to use CKAN as a plugin to their platform will automatically get access to an entire community of data curators, wranglers, analysts and users who will collaboratively improve the data those portal owners host.

Example

As an example, consider this case. CKAN currently runs the Manchester data catalog and the UK data portal and may soon be running a Welsh catalog too.

You can imagine a world where users in Wales enter data on wales.data.gov.uk according to the organisational structure of the Wales Assembly Government, users in Manchester enter data on datagm.org.uk according to the structures of the Manchester council and everyone else enters data directly on data.gov.uk according to the departmental structure of the UK government.

The data.gov.uk portal would then “pull in” datasets from both Manchester and Wales so that the data didn’t need entering twice. On the data.gov.uk portal people would be able to find data for the whole of the UK, even though some of it came from Wales and some from Manchester.

CKAN also hosts a site called publicdata.eu where we pull in datasets from all over Europe, including data.gov.uk. You can imagine that publicdata.eu could then itself be pulled into thedatahub.org so that we have a complete hierarchy of datasets all being pulled into each other.
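The hierarchy described above can be modelled in a few lines of JavaScript. This is purely an illustrative sketch and not CKAN's harvesting code: the `portals` structure, the dataset names and the `visibleDatasets` function are all invented for the example, and it assumes the "pulls from" relationships contain no cycles.

```javascript
// Hypothetical model: each portal has its own datasets and a list of
// portals it pulls datasets in from. Dataset names are made up.
var portals = {
    'wales.data.gov.uk': { own: ['welsh-schools'], pullsFrom: [] },
    'datagm.org.uk': { own: ['manchester-buses'], pullsFrom: [] },
    'data.gov.uk': {
        own: ['uk-spending'],
        pullsFrom: ['wales.data.gov.uk', 'datagm.org.uk']
    },
    'publicdata.eu': { own: [], pullsFrom: ['data.gov.uk'] },
    'thedatahub.org': { own: [], pullsFrom: ['publicdata.eu'] }
};

// Returns every dataset a user could find on the named portal:
// its own datasets plus everything pulled in, recursively.
function visibleDatasets(name) {
    var portal = portals[name];
    var found = portal.own.slice();
    portal.pullsFrom.forEach(function (child) {
        visibleDatasets(child).forEach(function (dataset) {
            // Skip datasets that have already been collected
            if (found.indexOf(dataset) === -1) {
                found.push(dataset);
            }
        });
    });
    return found;
}

console.log(visibleDatasets('data.gov.uk'));
console.log(visibleDatasets('thedatahub.org'));
```

In this model a dataset entered once in Wales or Manchester becomes findable on every portal further up the chain without being entered again.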

Now imagine what would happen if a Welsh user left a comment in the Welsh portal. That comment could also be sent to all of data.gov.uk, publicdata.eu and thedatahub.org (since all hold a copy of the original record). The data.gov.uk portal and thedatahub.org may be keen to publish Welsh comments and include a copy on the dataset page in the UK portal.

The publicdata.eu site might decide that its target users aren’t interested in comments so would ignore the notification. It isn’t just the Welsh portal that could generate comments though. Someone could leave a comment on thedatahub.org too and that would be sent through the network as well. And of course it isn’t just comments: these notifications could be sent for lots of different activity.
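The notification flow in this example might look something like the sketch below. None of this exists in CKAN in this form: the node names come from the example, while `links`, `publishesComments` and `notify` are hypothetical. It also assumes that a node which ignores a notification locally still passes it on to its neighbours.

```javascript
// Hypothetical network of portals that hold a copy of the same dataset.
var nodes = {
    'wales.data.gov.uk': {
        links: ['data.gov.uk'], publishesComments: true, comments: []
    },
    'data.gov.uk': {
        links: ['wales.data.gov.uk', 'publicdata.eu'],
        publishesComments: true, comments: []
    },
    'publicdata.eu': {
        links: ['data.gov.uk', 'thedatahub.org'],
        publishesComments: false, comments: []
    },
    'thedatahub.org': {
        links: ['publicdata.eu'], publishesComments: true, comments: []
    }
};

// Delivers a comment from the origin node to every reachable node.
// Each node decides for itself whether to publish it. Returns the
// names of the nodes that published the comment.
function notify(origin, comment) {
    var seen = {};
    var published = [];
    var queue = [origin];
    seen[origin] = true;
    while (queue.length > 0) {
        var name = queue.shift();
        var node = nodes[name];
        if (node.publishesComments) {
            node.comments.push(comment);
            published.push(name);
        }
        // Forward to neighbours that have not yet received the notification
        node.links.forEach(function (next) {
            if (!seen[next]) {
                seen[next] = true;
                queue.push(next);
            }
        });
    }
    return published;
}

console.log(notify('wales.data.gov.uk', 'Useful dataset, thanks!'));
```

The same call works with any node as the origin, so a comment left on thedatahub.org travels back down the network in the same way.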

In Conclusion

What we therefore want to build is a powerful, distributed set of nodes that link dataset metadata together (much like the current web does for HTML documents) so that all users can share in, and contribute to, a free world of open data. I hope that is something you are as excited to be involved in as I am.

Python Web Expert Jobs

James Gardner - December 16, 2010 in Uncategorized

The Open Knowledge Foundation is looking for really good Python web developers to join our organisation to work on CKAN, our open source web-based catalogue system built on Pylons.

As you probably already know, CKAN allows users to submit, search and find open data packages. As well as powering ckan.net, CKAN software runs the UK Government’s data.gov.uk site (which is often in the news, most recently with the UK release to the public of all government spending over £25,000), and it also powers over 20 other catalogues around the world with more on the way in Norway, Holland and Finland. Providing central places where people can register, find, and download datasets is a key part of building the web of data.

The Open Knowledge Foundation is a great place to work. It’s a small team, so there’s opportunity to make a big difference. There’s always lots of stuff going on; interesting people popping in and out all the time; some press exposure; some activism; quite a broad remit; and open-ended possibilities. In keeping with the spirit of the organisation, you can find out a lot about the different projects using Google.

Week to week, work in the CKAN team generally involves a Monday morning team catch-up via Skype where we each update other team members on what we achieved the previous week and the areas we plan to work on during the coming week. There are often discussions on our mailing lists about new functionality that has been suggested by someone in the community or new features required by a particular government organisation. The tasks are then broken down into tickets on our trac so that the community can see what we are planning. We are all working towards a common goal so to a large extent team members usually choose what they want to work on in a particular week. Those of us that live in similar locations also try to work physically together as much as possible.

At the moment the core team consists of Rufus Pollock, James Gardner, John Bywater, Nils Toedtmann, Friedrich Lindenberg, David Read, Seb Bacon, Richard Pope, Will Waites and others. Between us we have skills in Python, Pylons, PostgreSQL, SQLAlchemy, Genshi, Solr, AMQP, cloud server deployments, project management, analysis and design, the semantic web and RDF. The more of these you have experience with the better but Python, Pylons, SQLAlchemy, PostgreSQL, the web (including REST) are pretty essential. In other words, we are looking for more than your typical code monkey.

The work is varied and interesting but here’s a snapshot of the sort of things we are doing this month:

  • Creating a JSON-P data proxy API to allow browser-based mashups to be built directly against data in CKAN
  • Providing facilities for geo-spatial data and the ability to harvest information from INSPIRE metadata records as part of the UK Location Programme
  • Moving functionality into extension packages so that the codebase can continue to be maintainable whilst supporting many different customers
  • Refactoring the tests so that they pass very quickly, facilitating a more test-driven approach
  • Building a more community-focussed site at ckan.org and making it easier to run CKAN instances
  • Changing the data.gov.uk site to help separate the concepts of data owners and organisations which help make the data available

If you are a really good developer with a keen interest in open data, and enjoy working with open source, we’d love to hear from you. Whilst we are a not-for-profit we recognise the value really good developers can bring to a team and your pay would reflect this. It would be great if you were based in London but remote working is perfectly acceptable too. We have team members in Berlin, Gloucester and Edinburgh as well as London.

To find out more about CKAN you can clone the CKAN Mercurial repository from ckan.org, follow the latest install README, browse the mailing list archives or try submitting any real data sets you know of to ckan.net.

To learn more about the Open Knowledge Foundation visit http://okfn.org.

To apply for the position please send your CV or blog URL to james.gardner@okfn.org and we’ll take it from there. We are flexible on employee versus contractor but we normally contract.

Also if you know of any excellent Python developers who you think may be interested in this position, please forward them this post.

Open Data Day: Announcing CKAN Data Proxy

James Gardner - December 4, 2010 in Uncategorized

Today is Open Data Day, where developers and enthusiasts from all over the world get together to make improvements to the world of open data. The London event was organised by myself and Rufus Pollock and held at the Trampery (thanks to the generous support of Charles Armstrong and Trampoline Systems).

The main thrust of the London event was to make improvements to CKAN, the open data catalogue. One of the problems Rufus and I have wanted to see solved for a little while is that it currently isn’t possible to build mashups against the data held in CKAN unless you set up your own server to proxy the data.

To solve this problem we wrote a small WSGI application called jsonpdataproxy which we’ve installed on Google App Engine at http://1.latest.jsonpdataproxy.appspot.com/. This little application allows you to specify the URL of a small Excel file and it will return some JSON-P data for the sheet you’ve specified.

The significance of this little tool is huge. It means that you can now build browser-based applications directly against data sets without needing your own server. Anyone can do it!

Here is an example URL:

http://1.latest.jsonpdataproxy.appspot.com/?sheet=1&indent=4&url=http://research.dwp.gov.uk/asd/asd4/r1_values.xls

If you click it you’ll see a JSON-P data structure with a response key showing some key information about what is being returned (including the name of the sheet) and a data key representing the rows in the spreadsheet.
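A request URL like the one above can also be assembled in code. The following is a minimal sketch that uses only the three query parameters visible in the example URL (`url`, `sheet` and `indent`). The helper name `buildProxyUrl` is made up for illustration and is not part of jsonpdataproxy, and the sketch assumes the proxy decodes percent-encoded query values in the usual way.

```javascript
// Base address of the proxy, as given in the post.
var PROXY_BASE = 'http://1.latest.jsonpdataproxy.appspot.com/';

// Hypothetical helper: builds a proxy request URL for a spreadsheet.
// sheet and indent are optional and are left out when undefined.
function buildProxyUrl(fileUrl, sheet, indent) {
    var params = [];
    if (sheet !== undefined) {
        params.push('sheet=' + encodeURIComponent(sheet));
    }
    if (indent !== undefined) {
        params.push('indent=' + encodeURIComponent(indent));
    }
    // Encode the spreadsheet URL so that any query string it carries
    // is not confused with the proxy's own parameters.
    params.push('url=' + encodeURIComponent(fileUrl));
    return PROXY_BASE + '?' + params.join('&');
}

var exampleUrl = buildProxyUrl(
    'http://research.dwp.gov.uk/asd/asd4/r1_values.xls', 1, 4);
console.log(exampleUrl);
```

Encoding the `url` value matters once the spreadsheet address itself contains `?` or `&` characters.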

Because this is JSON-P you can get this data from any domain using standard jQuery. Here’s an example you can try yourself which will show the data in a table:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" dir="ltr">
<head>
<title>API Examples</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<script type="text/javascript" 
    src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script>
</head>
<body>
    <h1>Dynamic Data Table via a Proxy</h1>
    <div id="result"></div>
    <script type="text/javascript">
        // Request sheet 1 of the spreadsheet from the proxy as JSON-P
        $.ajax({
            url: 'http://1.latest.jsonpdataproxy.appspot.com/',
            type: 'GET',
            data: {
                'url': 'http://research.dwp.gov.uk/asd/asd4/r1_values.xls',
                'sheet': '1'
            },
            success: function(data){
                if (data['error'] !== undefined){
                    // The proxy returned an error object; show its title
                    alert(data.error.title);
                } else {
                    // Build one table row per spreadsheet row
                    var content = '<table>';
                    for (var i=0; i<data.response.length; i++) {
                        content += '<tr>';
                        for (var j=0; j<data.response[i].length; j++) {
                             content += '<td>'+data.response[i][j]+'</td>';
                        }
                        content += '</tr>';
                    }
                    content += '</table>';
                    $('#result').html(content);
                }
            },
            error: function() {
                alert('Failed to get spreadsheet data.');
            },
            dataType: 'jsonp',
            // Use a fixed callback name rather than one generated by jQuery
            jsonpCallback: 'callback'
        });
    </script>
</body>
</html>

If you download this example, save it as simple.html and open it in a browser you’ll see the data from the Excel file!
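The example inserts cell values straight into the page as HTML, so a cell containing `<` or `&` could break the table. A slightly more defensive variant, sketched below, escapes each value first. It assumes the rows arrive as an array of arrays, as in the loop in the example; `escapeHtml` and `rowsToTable` are illustrative names and not part of jQuery or the proxy.

```javascript
// Replaces the characters that have special meaning in HTML.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Turns an array of rows (each an array of cell values) into table markup.
function rowsToTable(rows) {
    var content = '<table>';
    for (var i = 0; i < rows.length; i++) {
        content += '<tr>';
        for (var j = 0; j < rows[i].length; j++) {
            content += '<td>' + escapeHtml(rows[i][j]) + '</td>';
        }
        content += '</tr>';
    }
    return content + '</table>';
}

console.log(rowsToTable([['Region', 'Value'], ['A & B', 42]]));
```

Inside the `success` callback of the example, the loop could then be replaced with `$('#result').html(rowsToTable(data.response));`.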

Next step for us: integrate this into CKAN’s package page to begin to be able to provide data previews!