Feature Tour
CKAN is a fully-featured, mature, open source data management solution. CKAN provides a streamlined way to make your data discoverable and presentable. Each dataset is given its own page with a rich collection of metadata, making it a valuable and easily searchable resource. Check out our live interactive demo!
Publish & find datasets
Publish datasets via import or through a web interface. Search by keyword or filter by tags. See dataset information at a glance. Full change history lets you easily undo changes or view old versions.
Store & manage data
Store the raw data and metadata. Visualise structured data with interactive tables, graphs and maps. Get statistics and usage metrics for your datasets. Search geospatial data on a map by area.
Engage with users & others
Federate networks with other CKAN nodes. Theme with CSS or integrate with a CMS. Build a community with extensions that allow users to comment on and follow datasets.
Publish and Manage Data
An intuitive web interface allows dataset publishers and curators to easily register, update and refine datasets in a distributed authorisation model called ‘Organizations’. ‘Organizations’ allow each publisher to have its own dataset entry and approval process with numerous members. This means responsibility can be distributed, with access managed by each department’s or agency’s admins instead of centrally.
You can add and edit data in CKAN in many ways, including:
- Directly via the web interface
- Using CKAN’s rich JSON API
- Via custom spreadsheet importers
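As a rough illustration of the spreadsheet route, the sketch below turns rows of a CSV export into dataset dictionaries of the shape CKAN's JSON API accepts for dataset creation (`name`, `title`, `notes`, `owner_org`, `private`, `tags`). The column names (`title`, `description`, `tags`), the semicolon tag separator and the `slugify` helper are assumptions made for this example, not part of any particular importer.

```python
import csv
import io
import re


def slugify(title):
    """Hypothetical helper: derive a URL-safe dataset name from a title."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def rows_to_datasets(csv_text, owner_org):
    """Convert spreadsheet rows into dataset dicts ready to send to the API."""
    datasets = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        datasets.append({
            "name": slugify(row["title"]),
            "title": row["title"],
            "notes": row["description"],
            "owner_org": owner_org,   # the publishing Organization
            "private": True,          # stays private until an admin approves it
            "tags": [{"name": t.strip()}
                     for t in row["tags"].split(";") if t.strip()],
        })
    return datasets


# Assumed spreadsheet layout: one dataset per row.
sheet = (
    "title,description,tags\n"
    "Road Traffic Counts 2012,Annual counts per road link,transport; roads\n"
)
datasets = rows_to_datasets(sheet, "department-of-national-statistics")
```

Each resulting dictionary could then be posted to the API or reviewed by hand before publication.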
Many organisations already have their data in repositories with well-defined processes and procedures for publishing and managing data. In this case the data can simply be pulled into CKAN from the existing repositories at regular intervals.
To facilitate this model we’ve developed a sophisticated and customisable “harvesting” mechanism which can fetch and import records from many different repository sources, including:
- Geospatial CSW Servers (see geospatial for more information)
- Existing web catalogues
- Simple HTML index pages or Web Accessible Folders
- ArcGIS, Geoportal Servers and Z39.50 databases
- Other CKAN instances
This functionality is used on data.gov to bring in data from hundreds of agencies, and on data.gov.uk to implement a Discovery Metadata Service that fulfils the UK’s obligations under the EU INSPIRE directive. It is also used on publicdata.eu to pull in information from other catalogues to make them all searchable in one place.
- Publisher (Organization) admin dashboard: manage members and datasets, approve datasets for publication, and manage harvest sources, all from each organization’s admin page.
- Forms: Create portal or publisher specific forms that pre-fill certain fields or have additional required fields to fit individual requirements.
Datasets can be public or private. Private datasets are visible only to logged-in members of the publishing Organization that owns them (e.g. Department of National Statistics). Admins can approve datasets for publication with our bulk editing tool, which lets you search, facet and pick datasets to make public or private.
Search and Discovery
CKAN provides a rich search experience which allows for quick ‘Google-style’ keyword search as well as faceting by tags and browsing between related datasets. Users can quickly see what datasets are available, in which formats and with which licence, straight from the search results. All dataset fields are searchable (see below for the metadata we bring out into the interface).
Search on all dataset attributes – users can search on all dataset metadata, everything from title to tags to publisher name.
Full-text search – search full-text fields.
Fuzzy-matching – option to search for closely matching terms instead of exact matches.
Faceted search – drill-down via facets – for example tags, format, licence, publisher. Ability to consecutively narrow the search by further facets allowing users to limit their search to datasets with specific formats or tags after they see the search results.
Search via API – All search facilities can also be made available over an API.
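To make the search options above concrete, here is a minimal sketch that builds a query URL for the `package_search` action of CKAN's API, combining a keyword query (`q`), a filter (`fq`) and facet counts (`facet.field`). The site URL and the helper name `build_search_url` are placeholders for this example; the field names `res_format`, `tags` and `license_id` are the ones commonly indexed, but a given portal may expose others.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs


def build_search_url(site_url, query, filters=None, facet_fields=(), rows=10):
    """Build a package_search URL (nothing is sent over the network)."""
    params = {"q": query, "rows": rows}
    if filters:
        # fq narrows the result set without affecting relevance ranking
        params["fq"] = " AND ".join(
            '{}:"{}"'.format(field, value) for field, value in filters.items()
        )
    if facet_fields:
        # facet.field is passed as a JSON-encoded list of field names
        params["facet.field"] = json.dumps(list(facet_fields))
    return site_url.rstrip("/") + "/api/3/action/package_search?" + urlencode(params)


url = build_search_url(
    "https://demo.example.org/",   # placeholder portal address
    "traffic",
    filters={"res_format": "CSV"},
    facet_fields=["tags", "license_id"],
)
```

Fetching this URL would return a JSON document whose `result` holds the matching datasets together with counts for each requested facet, which is what a drill-down interface needs.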
A CKAN portal provides a rich set of metadata for each dataset.
Title – allows intuitive labelling of the dataset for search, sharing and linking.
Unique identifier – dataset has a unique URL which is customizable by the publisher.
Groups – display of which groups the dataset belongs to if applicable. Groups (such as science data) allow easier data linking, finding and sharing amongst interested publishers and users.
Description – additional information describing or analysing the data. This can either be static or an editable wiki which anyone can contribute to instantly or via admin moderation.
Data preview – preview .csv data quickly and easily in browser to see if this is the dataset you want.
Revision history – CKAN allows you to display a revision history for datasets which are freely editable by users (as is the case on thedatahub.org).
Extra fields – these hold any additional information, such as location data (see geospatial feature) or types relevant to the publisher or dataset. How and where extra fields display is customizable.
Licence – instant view of whether the data is available under an open licence or not. This makes it clear to users whether they have the rights to use, change and re-distribute the data.
Tags – see what labels the dataset in question belongs to. Tags also allow for browsing between similarly tagged datasets in addition to enabling better discoverability through tag search and faceting by tags.
Multiple formats (if provided) – see the different formats the data has been made available in quickly in a table, with any further information relating to specific files provided inline.
API key – allows access to every metadata field of the dataset via the API and, if you have the relevant permissions, the ability to change the data.
CKAN has advanced geospatial features, covering data preview, search, and discovery.
Where structured data with location information is loaded into CKAN’s DataStore, CKAN can plot the data on an interactive map. The screenshot shows a map view of a sample dataset, with markers showing individual data points and full details shown for records as they are selected.
CKAN can understand a location associated with a dataset, and use this to offer geospatial search capabilities via the web interface and API. A user searching for datasets can filter the results by geographical location, specifying a bounding box to limit the area they are interested in. CKAN understands different co-ordinate geometries and parses location information accordingly.
To ensure your datasets can be easily integrated with other systems, CKAN includes tools to import geo-coded metadata in a number of formats and make it queryable (‘discoverable’) according to the INSPIRE standard. It can import major metadata schemas such as ISO19139, GEMINI 2.1 and FGDC, and can handle records hosted in a variety of ways, including the geospatial CSW standard, WAFs, ArcGIS portals, Geoportal Servers and Z39.50 databases. CKAN can also serve geospatial packages via its own CSW interface. The architecture is extensible, making it easy to support other standards and distribution services.
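The idea behind bounding-box filtering can be sketched in a few lines. The code below is not CKAN's implementation; it only illustrates the test involved, namely whether a dataset's extent overlaps the box the user has drawn. The dataset names and coordinates are invented for the example, and boxes that cross the antimeridian are not handled.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a, b):
    """True when the two boxes overlap or touch on both axes."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


def filter_by_bbox(extents, query):
    """Return the names of datasets whose extent overlaps the query box."""
    return [name for name, extent in extents.items() if intersects(extent, query)]


# Invented dataset extents (longitude/latitude in degrees).
extents = {
    "london-air-quality": BBox(-0.51, 51.28, 0.33, 51.69),
    "paris-cycle-lanes": BBox(2.22, 48.81, 2.47, 48.90),
    "uk-flood-zones": BBox(-8.65, 49.86, 1.77, 60.86),
}
south_east_england = BBox(-1.0, 50.5, 1.5, 52.0)
matches = filter_by_bbox(extents, south_east_england)
# matches == ["london-air-quality", "uk-flood-zones"]
```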
For more detailed documentation on CKAN’s geospatial functionality, see the Geospatial Capabilities documentation.
CKAN provides several key features that allow users of a CKAN portal to communicate and collaborate on data.
Comments extension – users can add comments and discussion on a dataset. The extension can be enabled or disabled at any time.
Share – users can quickly and easily promote and discuss a dataset using Twitter and Facebook integration.
RSS/Atom feeds – create feeds of any changes and revisions to datasets and groups.
Follow extension – ‘follow’ a dataset to be informed of any changes, updates or new activity.
To do extension – flag a dataset with an issue or instructions of what is missing or still ‘to do’. This allows for a community driven effort for improving and adding to metadata.
CKAN’s data previewing tool has a host of powerful features for previewing data stored in the DataStore.
Table view: If structured data is uploaded or linked to CKAN as a .csv or Excel table, the DataStore loads it into a database, allowing CKAN to give a range of ways to view and process the data. Initially it is displayed as a table. The user can sort the data on particular columns, filter or facet by values, or hide columns entirely.
Graphing data: You can also display the data on a graph, choosing the variables on the axes and comparing a number of variables by graphing them together on the same y-axis.
Mapping data: If the table has columns that CKAN recognises as latitude and longitude, it can plot the data points on a map, which can be panned (dragged) and zoomed. Selecting a data point displays all the field values in the corresponding row.
Image data: CKAN’s previewing is not restricted to tabular data. Common image formats will be displayed, and if a resource is a web page, it will also be previewed directly in the CKAN dataset.
Roll your own: CKAN’s built-in previews use the DataStore’s API. If you have your own data previewing tools or are planning to build them, it’s easy to plug them into the API so that you can create visualisations on the fly, without the need for users to download the data.
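For anyone building their own preview, a first step is usually a query against the DataStore. The sketch below assembles a URL for the `datastore_search` action with a filter, a sort order and a row limit. The portal address, the resource id and the column names are placeholders, and the helper `build_datastore_search_url` is defined here purely for illustration.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs


def build_datastore_search_url(site_url, resource_id, filters=None,
                               sort=None, limit=100):
    """Build a datastore_search URL (nothing is sent over the network)."""
    params = {"resource_id": resource_id, "limit": limit}
    if filters:
        # filters is a JSON object of exact-match column/value pairs
        params["filters"] = json.dumps(filters)
    if sort:
        # comma-separated column names, each optionally followed by asc/desc
        params["sort"] = sort
    return site_url.rstrip("/") + "/api/3/action/datastore_search?" + urlencode(params)


preview_url = build_datastore_search_url(
    "https://demo.example.org",        # placeholder portal address
    "placeholder-resource-id",         # id of a resource loaded into the DataStore
    filters={"station": "A"},          # assumed column name
    sort="pm10 desc",                  # assumed column name
    limit=20,
)
```

A chart or map widget could fetch this URL from the browser and draw the returned rows directly, so users never have to download the full file.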
CKAN is highly customisable. See screenshots below for examples of various themed CKAN instances. You can customise the appearance of your CKAN portal yourself using the documentation here: http://docs.ckan.org/en/latest/theming.html or alternatively the CKAN team or a CKAN partner can do this for you as part of a support contract, CKAN Hosted plan or general set-up fee. See our Pricing Page or Contact us for more details.
As well as holding metadata and links to the offsite data, CKAN can provide secure storage for the data itself. When creating the dataset or resource, you can either link to data hosted elsewhere, or upload it in the same action as registering it on CKAN.
Data can be stored in any format. For structured data, e.g. when a spreadsheet is uploaded, CKAN provides a rich API for the data itself, allowing users to query, retrieve and use data instantly from datasets in CKAN without needing to download or process it first. CKAN’s own visualisation tools use this to display data previews, graphs and visualisations of the data on the dataset resource page.
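As a small illustration of using stored data without downloading the file, the sketch below unpacks a DataStore-style API response. CKAN's API wraps results in an envelope with a `success` flag and a `result` object; for DataStore queries the rows sit under `result.records`. The sample payload, its column names and the `CkanApiError` class are invented for this example.

```python
import json


class CkanApiError(Exception):
    """Hypothetical error type raised when the envelope reports a failure."""


def records_from_response(body):
    """Return the rows from a DataStore response, or raise if it failed."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        raise CkanApiError(envelope.get("error"))
    return envelope["result"]["records"]


# Invented sample of what a DataStore query might return.
sample = """
{
  "success": true,
  "result": {
    "total": 2,
    "records": [
      {"_id": 1, "station": "A", "pm10": 21.5},
      {"_id": 2, "station": "B", "pm10": 34.0}
    ]
  }
}
"""
records = records_from_response(sample)
worst = max(records, key=lambda row: row["pm10"])
```

Here `worst` is the row for station B, picked out from the parsed rows without touching the original spreadsheet.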
CKAN uses the Open Knowledge Foundation’s Versioned Domain Model (VDM) to keep a complete history of all edits and versions of dataset metadata.
CKAN allows you to pick and choose which features you want to use for your data portal. There are over 60 different extensions, all of which can be independently added in any configuration you choose. Anyone can build new extensions and contribute.
Because the harvesting functionality can be used to pull in metadata from other CKAN instances, it can also be used to create a federated network of CKAN nodes which share data between each other.
This is useful if, for example, a national portal wanted to aggregate information from local government CKAN instances, or if a topic-specific CKAN instance was created which aggregated a subset of datasets from other CKAN sources.
CKAN follows the DCAT standard for data catalogue metadata, so data can also be federated from other non-CKAN catalogues.
API
CKAN provides a rich RESTful JSON API for querying and accessing dataset information. The API gives access to:
- Full querying / searching (with all features of the main interface, including full-text search, querying on any attribute and faceting)
- Full dataset information, including download links
- Stored data
- Dataset listings by publisher, or by theme, etc
- Recent activity and additions (also available via RSS/Atom feed)
- Statistics on dataset usage, such as the number of downloads of dataset resources (using the Google Analytics extension)
- RDF version of the catalogue (using the rdf extension)
- CSV & JSON dumps of entire catalogue
The API is fully documented at http://docs.ckan.org/.
In addition to the read API, a write API can be provided for authorised users that allows for full update of dataset information (metadata). This enables publishers to easily integrate dataset publication with existing tools and workflows.
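A publishing tool might call the write API along the following lines. This is a minimal sketch that prepares, but does not send, a request to the `package_patch` action (available in recent CKAN versions, to the best of my knowledge), which updates only the fields supplied. The portal address, API key, dataset name and the helper `build_package_patch` are placeholders for illustration.

```python
import json
import urllib.request


def build_package_patch(site_url, api_key, dataset_id, **changes):
    """Prepare a POST that updates selected metadata fields of one dataset."""
    payload = {"id": dataset_id}
    payload.update(changes)
    return urllib.request.Request(
        url=site_url.rstrip("/") + "/api/3/action/package_patch",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": api_key,   # the authorised user's API key
        },
        method="POST",
    )


request = build_package_patch(
    "https://demo.example.org",        # placeholder portal address
    "my-api-key",                      # placeholder credential
    "road-traffic-counts-2012",        # placeholder dataset name
    notes="Counts revised after the March audit.",
    private=False,
)
# urllib.request.urlopen(request) would submit the change to a real portal.
```

Keeping request construction separate from sending makes it easy to log or review changes before they reach the portal, which suits existing approval workflows.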