Last weekend saw the Development Data Challenge, a two-day hackathon to develop projects using international development data, organised by PublishWhatYouFund (PWYF). The event was expertly chaired by the excellent Mark Brough, research officer at PWYF, and kindly hosted by the Guardian. About twenty people attended, with programmers and non-programmers present in about equal numbers.
A range of excellent questions had been submitted in advance by Guardian readers and others. Proceedings began with discussing these, grouping them, and winnowing out a few on which we felt we might make some headway in a couple of days. A number of potentially useful datasets had also been collected on the DataHub, alongside other sources.
Mapping water in South Sudan
The group I found myself in included my new colleague Dominik Moritz, a student from Germany who is showing his mettle as an intern developer with CKAN, and mapping expert Sam Larsen, among others. Our brief was to look at the effectiveness of aid, originally in Malawi - because someone had a good source of Malawi data - but we changed course dramatically with the arrival of Sara-Jayne Farmer, a Brit now based in New York who has worked with UNDP and had some data from the world's newest country, South Sudan, that she was keen to explore. The data included the locations of settlements and of wells and other water sources, so a natural question was: how close are the settlements to water sources? As Sara pointed out, if you are 6 miles from water, then walking to fetch it is a 4-hour round trip - which probably means that if you are a girl you don't go to school.
A low-cost intervention?
While our coders worked on plotting the data we had, I searched for hydrological data that we could use to enrich the results, and Sara researched licensing conditions that would enable us to publish the data. A demo of the resulting code shows how much can be done in two days, and also how any attempt at interpretation quickly exposes gaps in the data. With it you can view, for example, settlements more than 5,000 m from a water source (in a single state, Central Equatoria). But how many people live there? We did not have population data for the settlements, so we can't say. Overlaying a hydrological map (the image layer) shows that these most affected villages are all sitting on an aquifer. On the surface, it certainly seems that drilling wells here would be a low-cost way to improve access to water for those who most need it. (On the other hand, it may be, for example, that water there is so accessible that the villages have hand-dug wells that are not recorded in the data.)
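For readers curious how a "settlements more than 5,000 m from water" filter might work, here is a minimal sketch in Python. It is not the code from the demo: the function names, the record layout and the coordinates are all made up for illustration, and it assumes straight-line (great-circle) distance, which will understate real walking distance.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_to_nearest_source(settlement, sources):
    """Distance in metres from one settlement to its closest water source."""
    return min(
        haversine_m(settlement["lat"], settlement["lon"], s["lat"], s["lon"])
        for s in sources
    )


def settlements_beyond(settlements, sources, threshold_m=5000):
    """Settlements whose nearest water source is further than threshold_m."""
    return [
        s for s in settlements
        if distance_to_nearest_source(s, sources) > threshold_m
    ]


# Hypothetical sample records, NOT taken from the South Sudan dataset.
settlements = [
    {"name": "A", "lat": 4.85, "lon": 31.60},
    {"name": "B", "lat": 4.95, "lon": 31.60},
]
water_sources = [{"name": "well-1", "lat": 4.851, "lon": 31.601}]

far = settlements_beyond(settlements, water_sources)
print([s["name"] for s in far])
```

Comparing every settlement with every source like this is fine for one state's worth of points; a larger dataset would probably want a spatial index instead.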
The real challenge
Dominik has written up the technical aspects of the coding process on the School of Data blog, and Laura Newman has written an overview of the other projects on the day. The take-home message from all of them is simple: for development projects to deliver aid effectively and reliably, we need lots of data - financial data, health data, demographic data, economic data, even, as we've seen, geological data. As more and more data becomes freely available, the real challenge is to make full use of it.