The rise of artificial intelligence presents a major inflection point for open source data platforms like CKAN. In my view, however, CKAN should not respond by defining a collective "AI standard." It should instead double down on its core mission as an agnostic, high-provenance data management system that feeds the machine learning ecosystem.
The Power of Agnosticism
I argue that attempts to shackle the project to a single, common way of integrating AI would make the CKAN community more brittle and weaker. Instead, I believe a variety of approaches to AI integration will ultimately be more powerful for the ecosystem.
CKAN is a data management platform that should remain agnostic about how its data is used. The project's success is tied to its own purpose as a data management platform, not to the future of AI.
CKAN's Indispensable Role: Provenance and Operability
CKAN's true contribution to the AI era lies in its ability to provide high-quality, trustworthy data: specifically, data with provenance that is traceable, verifiable, and reusable for machine learning.
Provenance is vital because it underpins both the learnability of the data and its reusability for machine learning. This focus, in turn, requires CKAN to be machine-operable for machine learning. My involvement in groups like MLCommons and the Croissant meetings reflects this commitment to ensuring that CKAN data is structured and available to machine learning frameworks.
Bounding AI: The Necessity of Cold Reasoning
I contend that the generative nature of AI, which I call "warm" or entropic reasoning, must be confined by "cold reasoning": the verifiable, defined facts and statistics of reality.
Just as statistics historically served as "state data," building confidence in the current condition of a jurisdiction, CKAN data captures the constraints of reality. This foundation of fact-based, cold reasoning is essential for managing state resources, particularly in areas like economics.
🌡️ Warm Reasoning
The generative nature of AI: entropic reasoning that must be confined by verifiable facts and the defined statistics of reality.
🧊 Cold Reasoning
The verifiable, defined facts and statistics of reality that CKAN data provides, essential for managing state resources and grounding AI outputs.
The Future is Local Ownership
I believe the future of AI is not centralized in organizations like OpenAI or Anthropic. Instead, it lies with people and organizations owning their own understanding locally.
In this decentralized future, CKAN serves a crucial function: it becomes the domain of knowledge, or domain of data, for a local jurisdiction or company. By intrinsically recording and verifying data, CKAN ensures that information within a domain of control aligns with local expectations and remains useful.
CKAN's value is not AI itself. It is the essential foundation for AI: a data storage capability that preserves provenance and provides cold reasoning to confine AI's warm reasoning.
What the CKAN Community Needs Today
The community needs to understand its own autonomy and its own accountability for forging ahead. That means finding the many pathways by which the CKAN project will keep strengthening an ecosystem of open, transparent, interoperable, and standards-based ways for end users and platform managers to gain substantial value from CKAN as an open source data management platform.
This is an accountability the ecosystem players know well: the drive to innovate and build apart, even as we build together. There is no single solution, only the many worlds of possible solutions that only the "many minds" of an open source community can bring into perfect superposition all at once.
Together we will optimize CKAN co-constructively and make it the only "superpositional" choice for a world that might otherwise collapse its own wave function and "singularly" offer just one way forward: the one most might want, but many would still dismiss out of hand.
Steven De Costa
Co-Steward, CKAN · Chairman, Link Digital
Steven De Costa is a co-steward of the CKAN project and chairman of Link Digital, one of the world's leading CKAN implementation specialists. He has represented CKAN at the White House Open Data Roundtable and contributes to international machine learning data standards through MLCommons and related forums.