
The Creator of CKAN, Rufus Pollock, on Open Data, AI, and the Tools We Need to Make Sense of It All

Rufus Pollock, creator of CKAN and founder of the Open Knowledge Foundation, shares the origin story of CKAN, lessons from 25 years in open data, and a vision for building sensemaking tools in the age of AI.

Yoana Popova
06 Jul 2025

🔑 Key Takeaways (TL;DR)

➤ CKAN was born from a personal need: Rufus wanted data to answer serious questions about the world—but it was hard to find or locked away.

➤ Open data alone isn’t enough: We need tools and infrastructure that help us make sense of it.

➤ Inspired by open source and CPAN: CKAN aimed to do for data what Debian and Linux package managers did for software.

➤ Sensemaking is the next frontier: The future isn’t just about more data—it’s about helping people understand and act on it.

➤ AI is both a risk and an opportunity: It could centralize power—or it could help us clean, enrich, and contextualize data at scale.

➤ The real challenge is the wisdom gap: Can we learn to use our most powerful tools wisely, collectively, and for good?

Who is Rufus Pollock?

Rufus Pollock is a leading thinker, technologist, and entrepreneur in the global open data movement. He is best known as the original creator of CKAN, the world’s leading open-source data management system, and as the founder of the Open Knowledge Foundation (OKF), an international nonprofit that has pioneered open data infrastructure and advocacy since 2004.

Over the past 25 years, Rufus has helped shape the global open data ecosystem. He served as an adviser on open data policy to the UK Government, the US White House, the World Bank, and the UN. He is also the creator of Frictionless Data — a framework of lightweight specifications and tooling to make working with data easier, faster, and more interoperable — and founder of DataHub.io, an open platform for discovering and sharing datasets.

Among his many contributions:

Frictionless Data: He created the Frictionless Data standard and ecosystem, enabling effortless data publishing, sharing, and validation using lightweight specifications like Data Package and Tabular Data Package.
DataHub: He launched DataHub.io, a community-driven data catalog powered by CKAN and Frictionless standards.
Thought leadership: He has written extensively on information justice, digital power, and public infrastructure in books like The Open Revolution (openrevolution.net) and essays on rufuspollock.com.
Open Definition: He was instrumental in formulating the original Open Definition, which underpins many legal and policy frameworks for open data worldwide, and contributed early work that helped seed Creative Commons.
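
To make the "lightweight specifications" idea concrete, here is a minimal sketch of a Data Package descriptor, written as a Python dictionary and saved as datapackage.json. The dataset name, file path, and field names are invented for illustration, and the check_descriptor helper is a hypothetical, simplified check, not the validation shipped with the Frictionless tooling.

```python
import json

# A hypothetical descriptor: one CSV resource with a small table schema.
descriptor = {
    "name": "world-population",
    "resources": [
        {
            "name": "population",
            "path": "population.csv",
            "schema": {
                "fields": [
                    {"name": "year", "type": "integer"},
                    {"name": "population", "type": "integer"},
                ]
            },
        }
    ],
}


def check_descriptor(candidate: dict) -> list:
    """Return a list of problems found by a deliberately minimal check."""
    problems = []
    resources = candidate.get("resources")
    if not resources:
        problems.append("descriptor has no resources")
        return problems
    for index, resource in enumerate(resources):
        # Each resource has to say where its data lives: a path or inline data.
        if "path" not in resource and "data" not in resource:
            problems.append(f"resource {index} has neither 'path' nor 'data'")
    return problems


if __name__ == "__main__":
    print(check_descriptor(descriptor))
    # Serialize the descriptor the way it would sit next to the data files.
    print(json.dumps(descriptor, indent=2))
```

The point of the sketch is that the descriptor is just a small JSON file living alongside the data, so publishing a dataset does not require any particular platform.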

Currently, Rufus is:

President at Datopian — a company delivering modern data portals built on CKAN, PortalJS, and open standards.
Co-founder of Life Itself — a research and action institute pioneering new ways of living and systems change.

At CKAN Monthly Live #33, Rufus returned to the community to reflect on CKAN’s origin story — how it began with a personal need to find usable, trustworthy data; how it evolved from a wiki to powering national open data portals; and why we need better tools for sensemaking in an age of information overload and AI disruption.

“CKAN was never meant to become a big open source project. It started because I needed something that didn’t exist — a CPAN for data.”

The Origin Story: Why CKAN Was Created

In the late 1990s, Rufus—then a curious teenager—was asking big questions:

“How many people can the Earth support? Are we going to run out of fossil fuels? What’s the population going to be in 2050?”

He found books filled with tables and graphs. But the raw data—the numbers behind the charts—was either impossible to find or locked behind expensive paywalls.

“You’d read a book with loads of data... but when you went looking for the datasets, they just weren’t available. You could go and find reports, but they might cost like $10,000 or $20,000.”

He realized that the problem wasn’t just knowledge—it was infrastructure. The data wasn’t open, and there were no tools to manage and share it efficiently.

While studying at Cambridge, he encountered open-source tools like Linux and Debian. The experience of using a community-powered, modular system blew his mind.

“Wouldn’t that be possible for data?” he asked.

CKAN wasn’t initially planned as a product. It began as a tool to power a single site—ckan.net (now datahub.io). The idea was simple: what if data had the same collaborative infrastructure as software?

CKAN: The CPAN of Data

The origins of CKAN lie in a simple but powerful analogy: what CPAN did for software, CKAN could do for data.

“CKAN is named CKAN because of CPAN.”

➤ The name CKAN comes from CPAN — the Comprehensive Perl Archive Network.

➤ The first version was a wiki (built in MoinMoin), then rewritten in Python using Pylons.

➤ Official launch: Creative Commons Summit 2007 in Dubrovnik.

➤ Originally built to run the catalog site ckan.net (now reborn as datahub.io).

In the early 2000s, Rufus was inspired by the design of open-source ecosystems like Linux and Debian—especially the idea of reusable software components managed through a package registry. CPAN, the Comprehensive Perl Archive Network, stood out as a model: a centralized catalog where developers could publish, discover, and build upon each other’s code.

He envisioned something similar for datasets:
→ A registry where data could be published once, discovered easily, and reused globally.
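
In today's CKAN, the "discovered easily" part of that registry idea is exposed through the Action API, whose package_search action takes a free-text query. The sketch below only builds the request URL and summarizes a response; the portal address is a placeholder, and SAMPLE_RESPONSE is a hand-written stand-in shaped like a package_search reply, not real portal data.

```python
from urllib.parse import urlencode


def build_search_url(portal_url: str, query: str, rows: int = 5) -> str:
    """Build a package_search URL for a CKAN portal (portal_url is a placeholder)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_results(payload: dict) -> list:
    """Return (dataset name, sorted resource formats) for each search hit."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful API call")
    summaries = []
    for dataset in payload["result"]["results"]:
        formats = sorted(
            {res["format"].upper() for res in dataset.get("resources", []) if res.get("format")}
        )
        summaries.append((dataset["name"], formats))
    return summaries


# Hand-written sample shaped like a package_search response; not real data.
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "name": "world-population",
                "title": "World Population",
                "resources": [
                    {"url": "https://example.org/population.csv", "format": "csv"},
                    {"url": "https://example.org/population.json", "format": "JSON"},
                ],
            }
        ],
    },
}

if __name__ == "__main__":
    print(build_search_url("https://example.org", "population"))
    print(summarize_results(SAMPLE_RESPONSE))
```

Against a live portal, the URL from build_search_url would be fetched with any HTTP client and the decoded JSON passed to summarize_results.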

CKAN began humbly. The first version was a wiki, built with MoinMoin, running the catalog site ckan.net (now reborn as DataHub.io). In 2007, it was rewritten in Python using Pylons, and officially launched at the Creative Commons Summit in Dubrovnik.

At the time, there was no grand roadmap. CKAN was not created to be a global open-source standard. It was simply a tool to solve a problem: how to make datasets findable, reusable, and trustworthy.

But the timing was right.

How CKAN Became a Global Tool

Around 2008–2010, the open data movement accelerated and gained political traction. Governments in the US, UK, and elsewhere needed working tools, and CKAN was ready. They began reaching out:

“We need something like this now.”

CKAN was already mature, working, and open source—so they adopted it. From the UK and US to Australia and Canada, it quickly became the backbone of dozens of national open data portals.

CKAN’s Global Expansion

From 2009 to 2014, CKAN rapidly evolved:

Used by governments across the world—UK, US, Canada, Finland, Australia, and more.
Major updates (CKAN 2.0+) introduced a powerful plugin architecture.
Competing with proprietary data platforms, CKAN emerged as the leading open-source data management system.

“It started as a tool for one site, and became a global infrastructure.”

People weren’t just publishing open government data anymore. They were using CKAN for:

Internal data governance
Academic data sharing (FAIR)
Machine learning pipelines
NGO data workflows

Rufus credits the community—including early contributors like Adrià Mercader and Steven De Costa—for growing CKAN into what it is today.

“CKAN today is this mature, powerful, extensible platform. It’s the world’s leading open source data management system.”

Open Data Alone Isn’t Enough

❓ Why More Data Isn’t Always Better

At the beginning of the open data movement, many believed something simple: more data → more insight → better action.

“If we just had more data… we’d get knowledge. From knowledge would come insight. And from insight, we’d get action.”

But reality didn’t follow that path.

Why?

People interpret data through their own mental models.
Access to knowledge doesn’t guarantee understanding—or change.
In some cases, presenting evidence makes people hold on to false beliefs even more firmly (the “backfire effect”).

Data doesn’t speak for itself. It needs interpretation. Framing. Meaning.

“Information without meaning creates even more confusion. Openness requires open minds.”

Sensemaking: The Missing Layer

A central theme of Rufus’ talk was sensemaking—how individuals and societies interpret information and decide what to do.

He cited the story of a firefighter who survived by lighting a counter-fire—while others ran and died. Why? They couldn’t make sense of what he was doing. Their mental models failed.

“We’re always making sense—especially in times of crisis or change. And our tools need to support that.”

This insight reshaped his own work:

CKAN was part of the solution: infrastructure for data.
But we also need tools for narratives, context, stories, and human connection.
That’s why he co-founded Life Itself, a collective exploring deeper cultural transformation.

The Meta-Crisis: Why This All Matters

From climate collapse to AI risk, we’re facing multiple interwoven crises.

Rufus calls this the meta-crisis—a breakdown in the systems we use to make sense of the world.

Ecological limits are being breached.
Our technologies outpace our wisdom.
Trust in institutions is eroding.
Polarization is rising.

And data? It’s part of the solution—but only if we use it well.

AI + CKAN: A Double-Edged Sword

Rufus addressed the emerging role of AI with nuance:

The Risks

AI systems can centralize power.
Closed models pose governance challenges.
Children being raised by AI tutors could reshape human development—just as social media already has.

The Opportunities

Automating metadata cleanup.
Enhancing discoverability.
Supporting storytelling and context generation.
Improving data workflows, especially for under-resourced teams.

“The AI + open data combo is incredibly powerful—but it must be democratic, ethical, and human-centered.”

The Funding Problem: A Deeper Barrier

Open data and open source still suffer from one key challenge: unsustainable funding.

Rufus argued that current models (like crowdfunding or corporate donations) aren't enough. He proposed a radical solution:

Remuneration Rights

Inspired by how Spotify or Netflix pay creators.
Public funds would pay for open infrastructure and open data—based on usage, not patents or monopolies.
Governments already fund closed software—why not open?
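
As a rough illustration of what "based on usage" could mean in practice, the sketch below splits a fixed public fund across open projects in proportion to their usage. The pro-rata rule, the project names, and the numbers are all assumptions made for this example; the talk does not specify a formula.

```python
def allocate_fund(fund_total: float, usage: dict) -> dict:
    """Split fund_total across projects in proportion to their usage counts.

    This pro-rata rule is an assumption for illustration only.
    """
    total_usage = sum(usage.values())
    if total_usage == 0:
        # Nothing was used, so nothing is paid out.
        return {name: 0.0 for name in usage}
    return {name: fund_total * count / total_usage for name, count in usage.items()}


if __name__ == "__main__":
    # Hypothetical usage counts (for example, downloads) for two open projects.
    print(allocate_fund(1000.0, {"open-dataset": 3, "open-library": 1}))
```

How usage is measured (downloads, API calls, citations) is the hard design question, and this sketch deliberately leaves it open.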

This could scale to music, software, and even medicine.
The full argument is in his book: The Open Revolution (free to download).

What's Next for CKAN?

CKAN has moved beyond open data. It's now used for:

Internal data governance
FAIR academic data
Machine learning pipelines (e.g., Croissant profiles)
Custom enterprise use cases

The platform remains modular, extensible, and open by design.
And the vision? A universal “data fabric” for trustworthy, contextual, human-readable data.

“We don’t just need data. We need ways to tell stories with it, to make sense of it, and to act on it—together.”
