What CKAN Made Possible: A Sixteen-Year Open Data Story
From NYCBigApps to federal metadata standards: a co-founder's story of why CKAN's real value was never the portal
The rise of artificial intelligence presents a major inflection point for open source data platforms like CKAN. However, my view is that CKAN's response should not be to define a collective "AI standard," but rather to double down on its core mission as an agnostic, high-provenance data management system that feeds the machine learning ecosystem.
I argue that attempts to shackle the project to a single, common way of integrating AI would make the CKAN community more brittle. Instead, I believe a variety of approaches to AI integration will ultimately prove more powerful for the ecosystem.
CKAN is a data management platform that should remain agnostic regarding how its data is used. The project's success is tied to its purpose, which is fundamentally about CKAN itself, not about the future of AI.
CKAN's true contribution to the AI era lies in its ability to provide high-quality, trustworthy data, specifically data with provenance: traceable, verifiable, and reusable for machine learning.
This concept of provenance is vital because it bears directly on the learnability of the data and its reusability for machine learning. That focus requires CKAN to be machine operable for machine learning. My involvement in groups like MLCommons and the Croissant meetings reflects this commitment to ensuring CKAN data is structured and available for machine learning frameworks.
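As a rough illustration of what "machine operable" might look like in practice, here is a minimal Python sketch that maps a CKAN dataset record onto a small JSON-LD description loosely modeled on Croissant. The input keys (`notes`, `license_url`, `metadata_modified`, `resources`, and so on) follow the shape CKAN's `package_show` action returns. The function name, the `checksum` key, and the sample dataset are assumptions made for illustration only. The output is not a validated Croissant file.

```python
import json
from typing import Any


def ckan_package_to_jsonld(package: dict[str, Any]) -> dict[str, Any]:
    """Map a CKAN package dict to a small Croissant-inspired JSON-LD record.

    Hypothetical helper: a simplified sketch, not a complete Croissant export.
    """
    files = []
    for res in package.get("resources", []):
        entry = {
            "@type": "cr:FileObject",
            "@id": res.get("id"),
            "name": res.get("name"),
            "contentUrl": res.get("url"),
            "encodingFormat": res.get("mimetype") or res.get("format"),
            # Assumption: "checksum" is an illustrative key. CKAN's "hash"
            # field does not guarantee a particular algorithm.
            "checksum": res.get("hash"),
        }
        # Drop empty values so the record only states what is known.
        files.append({k: v for k, v in entry.items() if v})

    record = {
        "@type": "sc:Dataset",
        "name": package.get("name"),
        "description": package.get("notes"),
        "license": package.get("license_url") or package.get("license_title"),
        "dateModified": package.get("metadata_modified"),
        "distribution": files,
    }
    return {k: v for k, v in record.items() if v}


# Made-up dataset, shaped like a CKAN package_show result.
example = {
    "name": "city-budget-2024",
    "notes": "Adopted budget by department.",
    "license_title": "Creative Commons Attribution",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "metadata_modified": "2024-05-01T12:00:00",
    "resources": [
        {
            "id": "res-1",
            "name": "budget.csv",
            "url": "https://example.org/budget.csv",
            "format": "CSV",
            "mimetype": None,
            "hash": "",
        }
    ],
}

record = ckan_package_to_jsonld(example)
print(json.dumps(record, indent=2))
```

The point of the sketch is that the provenance fields (license, modification date, source URL) travel with the data, so a downstream machine learning pipeline can check where a file came from before it trains on it.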
I contend that the generative nature of AI, which I call "warm reasoning" or entropic reasoning, must be confined by "cold reasoning": the verifiable, defined facts and statistics of reality.
Just as statistics historically served as "state data" (the root of the word itself) to build confidence in the current state of a jurisdiction, CKAN data manages the constraints of reality. This foundation of fact-based, cold reasoning is essential for managing state resources, particularly in areas like economics.
I believe the future of AI is not centralized with organizations like OpenAI or Anthropic. Instead, it lies with people and organizations owning their own understanding locally.
In this decentralized future, CKAN serves a crucial function: it becomes the domain of knowledge or the domain of data for a local jurisdiction or company. By intrinsically recording and verifying data, CKAN ensures that information within a domain of control aligns with local expectations and usefulness.
CKAN's value is not AI itself; it is the essential foundation for AI: a data storage capability that allows for provenance and provides cold reasoning to confine AI's warm reasoning.
The community needs to recognize its own autonomy, and its own accountability for forging ahead. There are many pathways by which the CKAN project can continue to strengthen an ecosystem of open, transparent, interoperable, and standards-based ways for end users and platform managers to gain substantial value from CKAN as an open source data management platform.
This is an accountability the ecosystem players know well: the drive to innovate and build apart, even as we build together. There is no single solution, only the many worlds of possible solutions that the 'many minds' of an open source community can bring into perfect superposition all at once.
Together we will keep optimizing CKAN so that it remains the 'superpositional' choice in a world that might otherwise collapse its own wave function and offer a single way forward: one that most might want, but that many would still dismiss out of hand.
This is an opinion piece. The views expressed are the author's own. CKAN is an open source project stewarded by Datopian and Link Digital, with the Open Knowledge Foundation as purpose trustee.
Steven De Costa, CKAN co-steward since 2012, reflects on the reciprocal value at CKAN's core, the impact of AI on open data, and why CKAN is built to last another 20 years.