What CKAN Made Possible: A Sixteen-Year Open Data Story

From NYCBigApps to federal metadata standards — a co-founder's story of why CKAN's real value was never the portal

Joel Natividad
CKAN infrastructure, community story, datHere, data portal, metadata, open data, open source
08 Jun 2026

The portal problem is always, underneath, a metadata problem. That insight came to us in a basement in Queens in 2011, building NYCFacets on raw open data from New York City. CKAN was the foundation that made it tractable. Sixteen years later, it still is.

This is that story.

Public Service Ambitions

Public service was a natural calling for me, growing up with parents who were lifelong civil servants. I was driven by a desire to modernize and streamline the overwhelming paperwork that defined their daily professional lives.

Following my graduation in 1989, I balanced university teaching, my Master's studies, and part-time work at IBM Philippines. I even passed the civil service exam, as I wanted to apply what I learned to modernizing government. However, my parents had become disillusioned with the public sector due to the instability caused by political shifts. My father, who served as Manila's City Personnel Officer, was punitively reassigned to Manila Zoo after he blocked the hiring of unqualified, politically-connected individuals. It took a four-year legal battle with the Civil Service Commission for him to be vindicated and restored to his rightful role.

Because of these experiences, my parents urged me toward a private-sector career. They gave me a one-year deadline to secure a permanent role at IBM, but a hiring freeze redirected my path. I argued that "the pot of gold is in our own backyard," but my parents insisted, and I emigrated to the United States in the summer of 1991. I landed in NYC and stayed with my brother in Queens. Later that year, I was hired and sponsored by Axiom Systems Group and started my career in the private sector.

Before Open Data Had a Name

Before Sami Baig and I started building together, I got bitten by the Semantic Web bug through my first open source project, Semantic MediaWiki. I managed to parlay that passion into my job and started the Knowledge Engineering Practice at TCG. We focused on applications of the Semantic Web in the enterprise: linked data, ontology management, and, critically, metadata. I also led the GoldenSource Enterprise Data Management practice.

The question that kept surfacing, across every client engagement, was the same one: the data exists, but nobody can find it, trust it, or use it at scale. That was true in financial services in the mid-2000s. It turned out to be equally true in government data a few years later.

So when I heard about Mayor Bloomberg's NYCBigApps initiative in 2009, I thought this would be the perfect opportunity to finally apply what I'd learned over two decades to open data. In 2010, I convinced my boss to enter NYCBigApps 2.0 with partners Revelytix and Spry. We won the Large Organization Award with NYCDataWeb — a knowledge graph of NYC's open data. The following year, Sami and I entered NYCBigApps 3.0 as independents, working out of his basement for six months. We built NYCFacets — a semantic knowledgebase that computed and derived additional metadata from raw open data. We called this "extrametadata": two million facts retrieved from raw open data by inference rather than manual curation. NYCFacets won the Grand Prize.

The portal problem was always, underneath, a metadata problem.

The insight that drove NYCFacets — that better metadata exponentially increases the value of the underlying data — is the same insight that drives datHere today. We have spent the intervening years trying to operationalise it at every level of the stack.

Ontodia: CKAN as a Platform, Not Just a Portal

After the NYCBigApps win, Sami and I founded Ontodia, working out of NYU's Varick Incubator. We were appointed as a CKAN Professional Services Partner in 2013 — our first formal engagement with CKAN as infrastructure rather than product — and began deploying portals for Newark, Jersey City, the UNDP, the NYC Department of Education, Pittsburgh, San Antonio, Boston, and others. In parallel, we ran research projects with DARPA SBIR, NYU's Center for Urban Science and Progress, BetaNYC, and IBM Watson Research Center.

What we discovered quickly was that CKAN's real value was not the portal interface. It was the API, the metadata model, and the extension architecture that let you build on top of it without forking it. We built NYCpedia on that foundation: a semantic encyclopedia of New York City that computed hyperlocal KPIs from raw open data at the neighbourhood level. Then, in 2015, came CivicDashboards, a productised version taglined "Your Data in Context" that pre-computed indicators for all US states, 3,000+ counties, and 30,000+ municipalities, loosely following the ISO 37120 standard. CKAN was the connective tissue that made aggregation and federation tractable.
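To make "the API, not the portal" concrete, here is a minimal sketch of querying a CKAN instance through its Action API. It relies on the standard `package_search` action and the usual `success` / `result` response envelope; the portal URL, the sample response, and the helper names (`package_search_url`, `summarize`, `search`) are illustrative assumptions, not code from any project described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API URL for the package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(response):
    """Reduce a package_search response to (name, title, formats) tuples."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    summary = []
    for dataset in response["result"]["results"]:
        # Collect the distinct resource formats attached to each dataset.
        formats = sorted(
            {r["format"].upper() for r in dataset.get("resources", []) if r.get("format")}
        )
        summary.append((dataset["name"], dataset.get("title", ""), formats))
    return summary


def search(base_url, query, rows=10):
    """Query a live portal (needs network access; not exercised below)."""
    with urlopen(package_search_url(base_url, query, rows), timeout=30) as resp:
        return summarize(json.load(resp))


# Hand-written sample in the shape package_search returns (illustrative data).
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {
                "name": "street-trees",
                "title": "Street Trees",
                "resources": [{"format": "csv"}, {"format": "JSON"}],
            },
            {"name": "budget-2015", "title": "Budget 2015", "resources": []},
        ],
    },
}

if __name__ == "__main__":
    print(package_search_url("https://portal.example.org/", "water quality", rows=5))
    print(summarize(SAMPLE_RESPONSE))
```

Because every CKAN instance exposes the same actions, a loop over several base URLs with `search` is enough to start aggregating across portals, which is the kind of federation the paragraph above describes.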


The OpenGov Years — and What They Clarified

Ontodia was acquired by OpenGov in 2016 as its first acquisition. Sami and I ran the Open Data CKAN SaaS business through 2019, deploying roughly 100 data portals across the US, including several What Works Cities. The experience was operationally instructive. It was also clarifying about what we didn't want to do.

The portals we deployed were genuinely CKAN under the hood — but in a managed-SaaS model, the very things that make CKAN worth choosing (its composability, its extension ecosystem, its openness) naturally give way to the uniformity and supportability a productized service depends on. That's a defensible tradeoff for a SaaS business. It also kept me thinking about a deeper question for the whole ecosystem — the classic free rider problem: open source only stays healthy over the long run when the organizations that build on it also invest back into it.

The GSA data.gov multi-tenant CKAN opportunity — one our team had prepared a strong response for — ultimately went a different direction, as it was a professional-services engagement rather than a managed-SaaS one. Watching a chance to show what CKAN could do at federal scale go elsewhere stung. But it also clarified exactly what I wanted to build next.

I joined Datopian in 2019 as US Lead, primarily to help with the GSA multi-tenant CKAN project that Datopian and CivicActions ultimately won, and to help manage development of the Gates Foundation's CKAN-based Data Exchange.

What the CKAN Ecosystem Taught Us

The OpenGov years clarified something the CKAN ecosystem had been demonstrating all along: infrastructure built with the community outlasts infrastructure built for it. Sami and I founded datHere in 2020 around that principle. We use client use cases to inform what we build, then generalise and open-source the solution with the client's consent. We call it "Build With, Not For".

It isn't altruism — we tell clients it's in their enlightened self-interest. They don't end up with bespoke software they have to maintain. Their contribution to the digital commons is acknowledged, which sometimes helps with sponsors and funders. The wider ecosystem can reuse and improve the solution. CKAN's own stewardship model demonstrates this: commercial investment in the commons generates compounding returns for everyone who builds on it.

⚠️ The SaaS Trap: Uniform deployment suppresses CKAN's core value: composability, extensibility, and the ability to build purpose-fit infrastructure without forking. Clients get a portal. They don't get a platform. And the project gets minimal contribution back.

🚀 The PaaS Opportunity: Treating CKAN as a platform means the client's specific use case shapes the solution, and the generalised output goes back to the commons. The client funds infrastructure that outlasts the engagement. Everyone compounds.

What the Work Actually Looks Like

The engagements datHere has taken on since 2020 span sectors but share a common shape. An early hedge fund client during COVID needed an Enterprise Data Inventory — a CKAN-based catalogue that automatically populated metadata on a nightly basis from its dispersed data holdings. The pilot revealed a hard constraint: metadata inferencing with standard Python tooling was too slow to be operationally viable. That failure directly produced qsv — our open-source tool for high-speed metadata inference from tabular data. A client problem became a community asset.
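For readers unfamiliar with the term, the sketch below shows what metadata inference from tabular data means at its simplest: scanning a CSV and deriving a type, a null count, and a cardinality for each column. It is a standard-library illustration only, not qsv's implementation or interface, and the function names and sample data are hypothetical.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Return the narrowest type that fits every non-empty value."""
    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if not values:
        return "empty"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(date.fromisoformat):
        return "date"
    return "string"


def profile_csv(text):
    """Derive per-column metadata (type, nulls, cardinality) from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = (row[name] or "").strip()
            if value:  # empty cells count as nulls
                columns[name].append(value)
    return {
        name: {
            "type": infer_type(values),
            "nulls": rows - len(values),
            "cardinality": len(set(values)),
        }
        for name, values in columns.items()
    }


# Illustrative sample data, not taken from any client engagement.
SAMPLE_CSV = """station_id,reading_date,flow_cfs,notes
101,2020-03-01,12.5,ok
102,2020-03-02,,
101,2020-03-03,14,gauge replaced
"""

if __name__ == "__main__":
    for column, meta in profile_csv(SAMPLE_CSV).items():
        print(column, meta)
```

A row-by-row pass in pure Python like this is easy to write, but it is also the sort of approach that struggles when run nightly across large, dispersed holdings, which is the constraint the pilot surfaced.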

Since then: helping Mathematica build their internal data catalog and serving as partners on the NSF National Secure Data Service (NSDS) Demonstration Projects — the Federal Data Usage Platform and the Standard Application Process Portal. Working with the Texas Water Development Board and the Internet of Water Coalition on water data infrastructure. Supporting the Western Pennsylvania Regional Data Center — a client relationship that has now run continuously since 2014, across Ontodia, OpenGov, and datHere. And partnering with the University of Pittsburgh on the NSF POSE grant to scale the CKAN ecosystem — a direct, funded effort to formalise and grow the community infrastructure the project depends on.

We are now active contributors to the CKAN 3.0 roadmap, members of the MLCommons Datasets and Croissant Working Groups, and contributors to DCAT-US v3 — the federal metadata standard that will govern how US government datasets are described and discovered. The metadata work that started in a basement in 2011 is now upstream of how AI systems will find and consume public data.

A client problem became a community asset — that's the feedback loop that makes open source infrastructure compound over time.


What CKAN Made Possible

Here is what I think is genuinely true, after sixteen years: CKAN is the reason a small, mission-driven team could do the work we've done without institutional backing or venture capital. The extension architecture meant we never had to fork. The API meant we could build on top without being locked into the portal metaphor. The community meant we were never working alone, even when the client roster was thin. And the project's track record — powering the world's first open government data portals, deployed across 60+ countries on six continents — meant that when we walked into a federal agency or a research institution and said "we work with CKAN," we were credible before we'd said anything else.

That credibility is not incidental. It is what the twenty-year investment in CKAN as a project — by Rufus Pollock, by the Open Knowledge Foundation, by Datopian and Link Digital as co-stewards, by the hundreds of contributors who have shaped the codebase and the community — has made possible for everyone who builds on it. datHere is one data point in that story. There are hundreds more.

Those lessons — standards-based, best-of-breed, open source, data that is useful, usable, and used — are baked into every word of how we describe our work today. But they didn't come from a strategy session. They came from CKAN.

Joel Natividad Co-founder · datHere

Joel Natividad is co-founder of datHere, a mission-driven data engineering company focused on open source data infrastructure. He has been building on CKAN since 2013 — through Ontodia, OpenGov, and Datopian, where he served as US Lead. He is a member of the MLCommons Datasets and Croissant Working Groups, a contributor to DCAT-US v3, and an active participant in the CKAN 3.0 roadmap.
