CKAN is widely used to publish datasets containing millions of records. While smaller exports performed well, very large Datastore dumps exposed a scalability limitation.
Consider a dataset with 13 million records — not unusual for census data, public health records, or economic indicators. In previous CKAN versions, downloading this data through the Datastore could take 30 minutes or more, assuming the request did not time out first.
The technical cause was offset pagination. With pages of 1,000 rows, fetching page 1,000 means the database must first read and discard 999,000 rows. To fetch page 2,000, it discards 1,999,000 rows. Each page requires more work than the last.
In simple terms, it is like finding page 1,000 in a book by flipping through the 999 pages before it, then starting over from the cover and flipping through 1,000 pages to find page 1,001. The further you go, the slower it gets.
At large scale, this approach does not perform reliably.
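The cost described above can be put into numbers with a short sketch. It assumes a page size of 1,000 rows, which is what the figures in the text imply. The function names are illustrative and are not part of CKAN.

```python
def rows_skipped(page: int, page_size: int = 1000) -> int:
    """Rows the database steps over before it can return `page` (1-based)."""
    return (page - 1) * page_size


def total_rows_skipped(total_rows: int, page_size: int = 1000) -> int:
    """Skipped rows summed over every page of one full export."""
    pages = -(-total_rows // page_size)  # ceiling division
    return sum(rows_skipped(p, page_size) for p in range(1, pages + 1))


print(rows_skipped(1000))              # 999000
print(rows_skipped(2000))              # 1999000
print(total_rows_skipped(13_000_000))  # 84493500000
```

Under these assumptions, exporting 13 million records page by page discards roughly 84 billion rows along the way, several thousand times the size of the dataset itself.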
The solution
CKAN 2.12 replaces offset pagination with keyset pagination.
Instead of counting rows, the system uses the indexed ID field as a bookmark. The database jumps directly to the correct position every time, regardless of how far into the dataset you are. It's like using a book's index instead of flipping pages.
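As a minimal sketch of the bookmark idea, the example below pages through an in-memory SQLite table by remembering the last ID it saw. It is not CKAN's implementation: the table name `records`, the column `value`, and the helper `fetch_all_keyset` are assumptions made for illustration, and SQLite stands in for the Datastore's database.

```python
import sqlite3


def fetch_all_keyset(conn, page_size=1000):
    """Yield every row in _id order, using the last seen _id as a bookmark."""
    last_id = 0  # assumes IDs are positive integers
    while True:
        # The indexed _id lets the database seek straight to the bookmark
        # instead of stepping over all earlier rows.
        rows = conn.execute(
            "SELECT _id, value FROM records WHERE _id > ? ORDER BY _id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break
        yield from rows
        last_id = rows[-1][0]  # move the bookmark to the end of this page


# Hypothetical sample data: 5,000 rows in an illustrative table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (_id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO records (_id, value) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 5001)],
)

exported = list(fetch_all_keyset(conn, page_size=1000))
print(len(exported))  # 5000
```

Each query carries only the bookmark and the page size, so the work per page stays roughly constant no matter how deep into the table the export has progressed.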
Results:
That 13-million-record dataset now downloads in 2 minutes instead of 30
15x performance improvement
Downloads complete reliably — timeouts eliminated
Performance remains consistent as dataset size increases
This keyset pagination solution was developed by Yan Rudenko.
Advanced filtering
Improved download performance is only part of the change. CKAN 2.12 also introduces a way to filter data before export.
In earlier versions, users typically had to download entire datasets and apply filters locally, even when only a subset of the data was required. For large datasets, this approach was slow and often impractical.
With CKAN 2.12, users can apply structured filters directly at the Datastore level. For example, a request can specify a time range, an age range, and a limited set of regions.
A condition equivalent to the SQL expression year BETWEEN 2020 AND 2023 AND age BETWEEN 18 AND 65 AND region IN ('North', 'Central', 'East') can now be expressed directly in search or delete queries.
Only matching records are returned, reducing the amount of data transferred and processed.
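To illustrate the effect of filtering at the source, the sketch below runs the same SQL condition against a small in-memory SQLite table. It does not use the CKAN API or its filter syntax. The table `survey` and its rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (year INTEGER, age INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [
        (2019, 30, "North"),    # excluded: year out of range
        (2021, 17, "Central"),  # excluded: age out of range
        (2022, 40, "West"),     # excluded: region not in the list
        (2020, 18, "North"),    # included: BETWEEN bounds are inclusive
        (2023, 65, "East"),     # included
    ],
)

regions = ["North", "Central", "East"]
placeholders = ", ".join("?" for _ in regions)
query = (
    "SELECT year, age, region FROM survey "
    "WHERE year BETWEEN ? AND ? AND age BETWEEN ? AND ? "
    f"AND region IN ({placeholders}) ORDER BY year"
)
# Values are passed as parameters rather than pasted into the SQL string.
matches = conn.execute(query, (2020, 2023, 18, 65, *regions)).fetchall()
print(matches)  # [(2020, 18, 'North'), (2023, 65, 'East')]
```

Only two of the five rows leave the database, which is the saving the text describes when the same idea is applied to millions of records.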
This functionality is provided by the Advanced Query Filter specification developed by Adrià Mercader from the CKAN Tech Core Team.
The specification introduces a consistent and scalable filtering model, including:
Range queries: population > 100000 or year BETWEEN 2020 AND 2026
Complex logic: Nested AND/OR conditions like (age > 25 AND income < 50000) OR education = 'graduate'
Unified syntax: Consistent query language across all CKAN instances
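The nested condition in the list above can be modelled as a small tree that is compiled into a parameterized WHERE clause. The tuple-based structure below is hypothetical and is not the Advanced Query Filter syntax. The names `FILTER`, `compile_filter`, and the `people` table are likewise assumptions for illustration.

```python
import sqlite3

# Hypothetical filter tree, NOT the CKAN specification's syntax.
# ("and" | "or", [children]) combines conditions; (column, op, value) is a leaf.
FILTER = ("or", [
    ("and", [("age", ">", 25), ("income", "<", 50000)]),
    ("education", "=", "graduate"),
])

ALLOWED_COLUMNS = {"age", "income", "education"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def compile_filter(node):
    """Return (sql, params) for a filter tree; values become parameters."""
    if node[0] in ("and", "or"):
        parts, params = [], []
        for child in node[1]:
            sql, child_params = compile_filter(child)
            parts.append(sql)
            params.extend(child_params)
        return "(" + f" {node[0].upper()} ".join(parts) + ")", params
    column, op, value = node
    # Column names and operators are checked against whitelists because
    # they are interpolated into the SQL text.
    if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
        raise ValueError(f"unsupported condition: {node!r}")
    return f"{column} {op} ?", [value]


where, params = compile_filter(FILTER)
print(where)  # ((age > ? AND income < ?) OR education = ?)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER, income INTEGER, education TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (30, 40000, "secondary"),  # matches the AND branch
        (20, 30000, "graduate"),   # matches the OR branch
        (40, 90000, "secondary"),  # no match: income too high
        (22, 20000, "secondary"),  # no match: age too low
    ],
)
rows = conn.execute(
    f"SELECT age, income, education FROM people WHERE {where} ORDER BY age",
    params,
).fetchall()
print(rows)  # [(20, 30000, 'graduate'), (30, 40000, 'secondary')]
```

Keeping the filter as structured data rather than a raw SQL string is one way a server could validate each condition before running it. Whether CKAN takes this exact approach is not stated in the text.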
The filtering system does two things. First, it enables the new pagination approach. Second, it allows users to filter data before download rather than retrieving everything and filtering locally.
These changes were integrated and prepared for release by Ian Ward and will be included in CKAN 2.12.
What you get
If you publish data:
Large dataset exports now complete reliably without manual intervention
Users can download complete datasets without contacting support
Infrastructure costs decrease as database load becomes predictable
If you use data:
Census datasets with 10+ million records download in minutes, not hours
No more timeout errors halfway through large exports
Filter datasets before download to extract only relevant records
Build data pipelines that don't break when dataset size increases
If you manage CKAN:
Reduced support burden for failed downloads
Consistent, predictable database performance under load
Advanced filtering capabilities without custom development
Availability
These improvements will be included in the upcoming CKAN 2.12 release.