CKAN is widely used to publish datasets containing millions of records. While smaller exports performed well, very large Datastore dumps exposed a scalability limitation.
Consider a dataset with 13 million records — not unusual for census data, public health records, or economic indicators. In previous CKAN versions, downloading this data through the Datastore could take 30 minutes or more, assuming the request did not time out first.
The technical cause was offset pagination. To fetch page 1,000, the database skips 999,000 rows. To fetch page 2,000, it skips 1,999,000 rows. Each page requires more work than the last.
In simple terms: it's like reaching page 1,000 of a book by flipping through the previous 999 pages, then reaching page 1,001 by flipping through 1,000. The further you go, the slower it gets.
At large scale, this approach does not perform reliably.
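The offset cost described above can be seen in a minimal sketch. This uses SQLite and an invented records table purely for illustration, not CKAN's actual Datastore schema or code:

```python
import sqlite3

# Illustrative table, not CKAN's real Datastore schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (_id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO records (value) VALUES (?)",
                 [(f"row-{i}",) for i in range(10_000)])

PAGE_SIZE = 100

def fetch_page_offset(page_number):
    # OFFSET forces the database to walk past and discard every row
    # before the requested page, so each page costs more than the last.
    offset = page_number * PAGE_SIZE
    return conn.execute(
        "SELECT _id, value FROM records ORDER BY _id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()

# Fetching page 50 quietly scans 5,000 rows just to throw them away.
page = fetch_page_offset(50)
```

On a 13-million-row table the same query shape skips millions of rows per request, which is where the slowdown comes from.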
The solution
CKAN 2.12 replaces offset pagination with keyset pagination.
Instead of counting rows, the system uses the indexed ID field as a bookmark. The database jumps directly to the correct position every time, regardless of how far into the dataset you are. It's like using a book's index instead of flipping pages.
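A sketch of the keyset idea, again using SQLite and an illustrative table rather than CKAN's actual implementation: the last ID seen becomes the bookmark for the next request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (_id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO records (value) VALUES (?)",
                 [(f"row-{i}",) for i in range(10_000)])

PAGE_SIZE = 100

def fetch_after(last_id):
    # The index on _id lets the database seek straight to the row
    # after the bookmark, no matter how deep into the dataset we are.
    return conn.execute(
        "SELECT _id, value FROM records WHERE _id > ? ORDER BY _id LIMIT ?",
        (last_id, PAGE_SIZE),
    ).fetchall()

# Stream the whole table page by page at constant per-page cost.
last_id, total = 0, 0
while True:
    page = fetch_after(last_id)
    if not page:
        break
    total += len(page)
    last_id = page[-1][0]  # bookmark for the next request
```

Every page here is a cheap indexed lookup, which is why performance stays flat as the dataset grows.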
Results:
- That 13-million-record dataset now downloads in 2 minutes instead of 30
- 15x performance improvement
- Downloads complete reliably — timeouts eliminated
- Performance remains consistent as dataset size increases
Advanced filtering
Improved download performance is only part of the change. CKAN 2.12 also introduces a way to filter data before export.
In earlier versions, users typically had to download entire datasets and apply filters locally, even when only a subset of the data was required. For large datasets, this approach was slow and often impractical.
With CKAN 2.12, users can apply structured filters directly at the Datastore level. For example, a request can specify a defined time range, age range, and a limited set of regions:

year BETWEEN 2020 AND 2023 AND age BETWEEN 18 AND 65 AND region IN ('North', 'Central', 'East')
Only matching records are returned, reducing the amount of data transferred and processed.
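To illustrate filtering at the source rather than after download, here is a minimal SQLite sketch applying the example filter above. The table and its contents are invented; the actual request syntax is defined by the Advanced Query Filter specification, not shown here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (year INTEGER, age INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [
        (2019, 30, "North"),    # excluded: year out of range
        (2021, 40, "Central"),  # included
        (2022, 17, "East"),     # excluded: age out of range
        (2023, 65, "North"),    # included
        (2021, 50, "South"),    # excluded: region not in the set
    ],
)

# The filter runs inside the database; only matching records
# ever leave it, instead of the full table being exported.
rows = conn.execute(
    """SELECT * FROM survey
       WHERE year BETWEEN 2020 AND 2023
         AND age BETWEEN 18 AND 65
         AND region IN ('North', 'Central', 'East')"""
).fetchall()
```

Two of the five rows match, so only those two are transferred.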
This functionality is provided by the Advanced Query Filter specification developed by Adrià Mercader from the CKAN Tech Core Team.
The specification introduces a consistent and scalable filtering model, including:
- Range queries: population > 100000 or year BETWEEN 2020 AND 2026
- Complex logic: nested AND/OR conditions like (age > 25 AND income < 50000) OR education = 'graduate'
- Unified syntax: a consistent query language across all CKAN instances
The filtering system does two things. First, it enables the new pagination approach. Second, it allows users to filter data before download rather than retrieving everything and filtering locally.
These changes were integrated and prepared for release by Ian Ward and will be included in CKAN 2.12.
What you get
If you publish data:
- Large dataset exports now complete reliably without manual intervention
- Users can download complete datasets without contacting support
- Infrastructure costs decrease as database load becomes predictable
If you use data:
- Census datasets with 10+ million records download in minutes, not hours
- No more timeout errors halfway through large exports
- Filter datasets before download to extract only relevant records
- Build data pipelines that don't break when dataset size increases
If you manage CKAN:
- Reduced support burden for failed downloads
- Consistent, predictable database performance under load
- Advanced filtering capabilities without custom development
Availability
These improvements will be included in the upcoming CKAN 2.12 release.