
What CKAN Made Possible: A Sixteen-Year Open Data Story

From NYCBigApps to federal metadata standards — a co-founder's story of why CKAN's real value was never the portal

datHere's tagline is AI-Ready Data Infrastructure you can build on — with standards-based, best-of-breed, open source solutions to make your data Useful, Usable and Used. Every word in that tagline is the product of hard lessons learned over sixteen years of open data work. CKAN has been at the centre of that journey from the beginning — not as a product we happened to use, but as the platform that made the work tractable in the first place.

This is that story.

Public Service Ambitions

Public service was a natural calling for me, growing up with parents who were lifelong civil servants. I was driven by a desire to modernize and streamline the overwhelming paperwork that defined their daily professional lives.

Following my graduation in 1989, I balanced university teaching, my Master's studies, and part-time work at IBM Philippines. I even passed the civil service exam, as I wanted to apply what I learned to modernizing government. However, my parents had become disillusioned with the public sector due to the instability caused by political shifts. My father, who served as Manila's City Personnel Officer, was punitively reassigned to Manila Zoo after he blocked the hiring of unqualified, politically connected individuals. It took a four-year legal battle with the Civil Service Commission for him to be vindicated and restored to his rightful role.

Because of these experiences, my parents urged me toward a private-sector career. They gave me a one-year deadline to secure a permanent role at IBM, but a hiring freeze redirected my path. Though I argued that "the pot of gold is in our own backyard," my parents prevailed, and I emigrated to the United States in the summer of 1991. I landed in NYC and stayed with my brother in Queens. Later that year, I was hired and sponsored by Axiom Systems Group and started my career in the private sector.

Before Open Data Had a Name

Before Sami Baig and I started building together, I got bitten by the semantic web bug with my first open source project — Semantic MediaWiki. I managed to parlay that passion into my job and started the Knowledge Engineering Practice at TCG. We focused on the applications of the Semantic Web in the enterprise — linked data, ontology management, and critically, metadata. I also led the Goldensource Enterprise Data Management practice.

The question that kept surfacing, across every client engagement, was the same one: the data exists, but nobody can find it, trust it, or use it at scale. That was true in financial services in the mid-2000s. It turned out to be equally true in government data a few years later.

So when I heard about Mayor Bloomberg's NYCBigApps initiative in 2009, I thought this would be the perfect opportunity to finally apply what I'd learned over two decades to open data. In 2010, I convinced my boss to enter NYCBigApps 2.0 with partners Revelytix and Spry. We won the Large Organization Award with NYCDataWeb — a knowledge graph of NYC's open data. The following year, Sami and I entered NYCBigApps 3.0 as independents, working out of his basement for six months. We built NYCFacets — a semantic knowledgebase that computed and derived additional metadata from raw open data. We called this "extrametadata": two million facts retrieved from raw open data by inference rather than manual curation. NYCFacets won the Grand Prize.

The portal problem was always, underneath, a metadata problem.

The insight that drove NYCFacets — that better metadata exponentially increases the value of the underlying data — is the same insight that drives datHere today. We have spent the intervening years trying to operationalise it at every level of the stack.

Ontodia: CKAN as a Platform, Not Just a Portal

After the NYCBigApps win, Sami and I founded Ontodia, working out of NYU's Varick Incubator. We were appointed as a CKAN Professional Services Partner in 2013 — our first formal engagement with CKAN as infrastructure rather than product — and began deploying portals for Newark, Jersey City, the UNDP, the NYC Department of Education, Pittsburgh, San Antonio, Boston, and others. In parallel, we ran research projects with DARPA SBIR, NYU's Center for Urban Science and Progress, BetaNYC, and IBM Watson Research Center.

What we discovered quickly was that CKAN's real value was not the portal interface — it was the API, the metadata model, and the extension architecture that let you build on top of it without forking it. We built NYCpedia on that foundation: a semantic encyclopedia of New York City computing hyperlocal KPIs from raw open data at the neighbourhood level. Then CivicDashboards in 2015 — a productised version, taglined "Your Data in Context", pre-computing indicators for all US states, 3,000+ counties, and 30,000+ municipalities loosely following ISO-37120 standards. CKAN was the connective tissue that made aggregation and federation tractable.
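
To make the point about the API concrete, here is a minimal Python sketch of how a client might treat a CKAN site as a queryable metadata service rather than a web page. It uses the `package_search` action of CKAN's Action API and the standard `success`/`result` response envelope. The base URL, the sample payload, and the helper names `package_search_url` and `summarize_search` are illustrative assumptions, not code from any of the projects described here.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API URL for the package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_search(payload):
    """Reduce a package_search response to a match count and per-dataset summaries."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    datasets = []
    for ds in result["results"]:
        # Collect the distinct, non-empty resource formats for each dataset.
        formats = sorted(
            {r.get("format", "") for r in ds.get("resources", []) if r.get("format")}
        )
        datasets.append(
            {"name": ds["name"], "title": ds.get("title", ""), "formats": formats}
        )
    return result["count"], datasets


# Hypothetical response body, trimmed to the fields used above.
SAMPLE_RESPONSE = json.loads("""
{
  "success": true,
  "result": {
    "count": 1,
    "results": [
      {
        "name": "street-trees",
        "title": "Street Trees",
        "resources": [{"format": "CSV"}, {"format": "GeoJSON"}, {"format": "CSV"}]
      }
    ]
  }
}
""")

# "portal.example.org" is a placeholder host, not a real CKAN instance.
url = package_search_url("https://portal.example.org/", "trees", rows=5)
count, datasets = summarize_search(SAMPLE_RESPONSE)
```

In a live setting the JSON would come from an HTTP GET on `url`. Because every CKAN site exposes the same actions, the same few lines can be pointed at many portals, which is what makes aggregation across sites practical.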

The OpenGov Years — and What They Clarified

Ontodia was acquired by OpenGov in 2016 as its first acquisition. Sami and I ran the Open Data CKAN SaaS business through 2019, deploying roughly 100 data portals across the US — including several WhatWorks Cities. Operationally instructive. Also clarifying about what we didn't want to do.

A one-size-fits-all SaaS model is fundamentally in tension with CKAN's nature as an extensible platform.

The portals we deployed were technically CKAN, but the value that makes CKAN worth using — its composability, its extension ecosystem, its openness — was largely suppressed in a SaaS wrapper optimized for uniformity. The free rider problem was real: deploying CKAN as a closed product while contributing minimally back to the project is not a sustainable model for anyone.

The GSA data.gov multi-tenant CKAN opportunity — an ambitious project OpenGov's team had prepared a strong response for — was ultimately passed over by management at the last minute, as it was professional services and not SaaS. We watched a significant chance to demonstrate what CKAN could do at federal scale get set aside for commercial reasons. That stung. It also clarified exactly what we wanted to build next.

I joined Datopian in 2019 as US Lead, primarily to help with the GSA multi-tenant CKAN project that Datopian and CivicActions ultimately won, and to help manage development of the Gates Foundation's CKAN-based Data Exchange.

datHere: "Build With, Not For"

Sami and I founded datHere in 2020 with a specific thesis: CKAN is most valuable when treated as a platform — PaaS, not SaaS — and open data infrastructure works best when built with client-partners, not delivered to them. We call it the "Build With, Not For" playbook. We use client use cases to inform what we build, then generalise and open-source the solution with the client's consent.

This isn't altruism — we tell clients it's in their enlightened self-interest. They don't end up with bespoke software they have to maintain. Their contribution to the digital commons is acknowledged, which sometimes helps with sponsors and funders. And the wider ecosystem can reuse and contribute to continued improvement of the solution. It's also what CKAN's own stewardship model demonstrates: commercial investment in the commons generates compounding returns for everyone who builds on it.

⚠️ The SaaS Trap
Uniform deployment suppresses CKAN's core value: composability, extensibility, and the ability to build purpose-fit infrastructure without forking. Clients get a portal. They don't get a platform. And the project gets minimal contribution back.

🚀 The PaaS Opportunity
Treating CKAN as a platform means the client's specific use case shapes the solution, and the generalised output goes back to the commons. The client funds infrastructure that outlasts the engagement. Everyone compounds.

What the Work Actually Looks Like

The engagements datHere has taken on since 2020 span sectors but share a common shape. An early hedge fund client during COVID needed an Enterprise Data Inventory — a CKAN-based catalogue that automatically populated metadata on a nightly basis from its dispersed data holdings. The pilot revealed a hard constraint: metadata inferencing with standard Python tooling was too slow to be operationally viable. That failure directly produced qsv — our open-source tool for high-speed metadata inference from tabular data. A client problem became a community asset.
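
For readers unfamiliar with the term, "metadata inference" here means deriving column-level facts (types, null counts, cardinality) directly from the data instead of asking someone to fill in a form. The Python sketch below shows the idea on a tiny in-memory CSV. It is only an illustration under stated assumptions: it is not how qsv is implemented, the sample data and the function names `infer_type` and `profile_csv` are hypothetical, and a pure-Python loop like this is the kind of approach that becomes too slow on large holdings.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Return the narrowest type that every non-empty value parses as."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "null"
    # Try the candidates from most to least specific.
    for name, parse in (("integer", int), ("float", float), ("date", date.fromisoformat)):
        try:
            for v in non_empty:
                parse(v)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text):
    """Compute simple inferred metadata for each column of a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            columns[name].append((row.get(name) or "").strip())
    return {
        name: {
            "type": infer_type(vals),
            "nulls": sum(v == "" for v in vals),
            "cardinality": len({v for v in vals if v != ""}),
        }
        for name, vals in columns.items()
    }


# Hypothetical sample data, for illustration only.
SAMPLE_CSV = "\n".join([
    "station_id,reading,sampled_on,notes",
    "1,3.5,2020-03-01,ok",
    "2,4,2020-03-02,",
    "3,,2020-03-03,recheck",
])

profile = profile_csv(SAMPLE_CSV)
```

The resulting dictionary is the sort of record that could be written into a catalogue entry on a nightly schedule, so the catalogue describes the data as it is rather than as someone last remembered to document it.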

Since then: helping Mathematica build their internal data catalog and serving as partners on the NSF National Secure Data Service (NSDS) Demonstration Projects — the Federal Data Usage Platform and the Standard Application Process Portal. Working with the Texas Water Development Board and the Internet of Water Coalition on water data infrastructure. Supporting the Western Pennsylvania Regional Data Center — a client relationship that has now run continuously since 2014, across Ontodia, OpenGov, and datHere. And partnering with the University of Pittsburgh on the NSF POSE grant to scale the CKAN ecosystem — a direct, funded effort to formalise and grow the community infrastructure the project depends on.

We are now active contributors to the CKAN 3.0 roadmap, members of the MLCommons Datasets and Croissant Working Groups, and contributors to DCAT-US v3 — the federal metadata standard that will govern how US government datasets are described and discovered. The metadata work that started in a basement in 2011 is now upstream of how AI systems will find and consume public data.

A client problem became a community asset — that's the feedback loop that makes open source infrastructure compound over time.

What CKAN Made Possible

Here is what I think is genuinely true, after sixteen years: CKAN is the reason a small, mission-driven team could do the work we've done without institutional backing or venture capital. The extension architecture meant we never had to fork. The API meant we could build on top without being locked into the portal metaphor. The community meant we were never working alone, even when the client roster was thin. And the project's track record — powering the world's first open government data portals, deployed across 60+ countries on six continents — meant that when we walked into a federal agency or a research institution and said "we work with CKAN," we were credible before we'd said anything else.

That credibility is not incidental. It is what the twenty-year investment in CKAN as a project — by Rufus Pollock, by the Open Knowledge Foundation, by Datopian and Link Digital as co-stewards, by the hundreds of contributors who have shaped the codebase and the community — has made possible for everyone who builds on it. datHere is one data point in that story. There are hundreds more.

Joel Natividad Co-founder · datHere

Joel Natividad is co-founder of datHere, a mission-driven data engineering company focused on open source data infrastructure. He has been building on CKAN since 2013 — through Ontodia, OpenGov, and Datopian, where he served as US Lead. He is a member of the MLCommons Datasets and Croissant Working Groups, a contributor to DCAT-US v3, and an active participant in the CKAN 3.0 roadmap.