CKAN 3.0 Product Strategy Research (part 2)

As promised, here are five more of the 37 interviews Alexander Gostev has conducted with various stakeholders during the engagement process. These insights will help make CKAN 3.0 even better than before. More updates will be coming soon!

STAKEHOLDER ENGAGEMENT RESULTS 6-10 of 37


Respondent 6: Manager

Interview date: 16 June 2022

OVERVIEW

The organization operates in the Pacific Islands region and works on publishing, managing and supplying data to decision-makers in local governments. The interviewee is responsible for data procurement and data stewardship. A major UI update is planned in the near future.

  • The respondent looked into Socrata and used DKAN for a PoC but switched to CKAN because of its larger, more active community and the greater number of extensions available.

Plugins used

  • Data integration extensions
  • Data harvesters
  • Metadata extensions

CKAN issues

  • Generally, CKAN is good 🟢
  • Customization can be frustrating 🔴
  • He needs a UI form for adding licenses and managing legal documents 💡
  • Program managers want more visibility from their tech partners so that malfunctions are caught faster, for example when a harvester stops working.
  • He needs a workflow for restricted datasets where the metadata is available but the data itself is restricted 💡

Jobs customers do:

In general, it’s a centralized catalogue that provides:

  • Preview of datasets
  • Grouping of datasets

National governments

  • Easy access to decision-ready data products
  • Links to other data sources

Large research institutions

  • To know what data exists in a particular area

Other online services used for working with data

  • .Stat (dotstat) platform - statistical platform
    • To publish indicators for UN SDGs and other international development frameworks
  • Geo-spatial databases that provide OGC endpoints, e.g. WMS, WFS, CSW
    • GeoServer
  • Spatial data explorer
    • Open-source tool: TerriaJS
    • Integrated via an iframe


Respondent 7: Manager

Interview date: 16 June 2022

OVERVIEW

The interviewee installs, customises and maintains CKAN. He also provides governance and training to departmental staff. He built an automated, data-driven, collaboratively edited, web-based reporting system.

He’s a CKAN maintainer. He manages an internal-facing CKAN instance and is focused on workflow automation.

The interviewee says that CKAN is easy to install 🟢

CKAN integrations

  • OpenRefine (formerly Google Refine) https://openrefine.org/ 💡
    • Spreadsheet on steroids
    • Very easy to clean the data
      • Duplicates
  • https://nationalmap.gov.au/ 💡
    • If the dataset has csv-geo (❓) you can display it on a map
    • TerriaJS
    • Immediate value
    • Plugin but not upgraded
    • CKAN Cesium Preview
    • Configure CKAN as data source
  • CKAN XQA - rudimentary plugin, it can be upgraded 💡

Issues

  • The interviewee’s datasets are for internal use, but he wants a more flexible visibility model than just public or private 💡
    • Sensitive data
    • Visibility option: public sector only
    • CKAN dataset visibility
    • Use case: metadata public, data private, single datasets can be shared
  • He’s limited to tech that can run on a small footprint
    • Kubernetes
    • Docker compose
    • CKAN 🟢
  • People don’t use technology on their own; it should be part of their workflow. The respondent achieves this internally by creating a policy. The controlled parameters are:
    • Quality of data published
    • Amount of data delivered
    • Tags added
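
The "metadata public, data private, single datasets can be shared" use case from the list above can be made concrete with a small sketch. This is not CKAN's permission model or API: the `Visibility` levels, the `Dataset` and `User` classes and the `shared_with` field are all hypothetical names, chosen only to illustrate how metadata and data could carry separate visibility settings.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    # Hypothetical levels, taken from the interview notes rather than from CKAN.
    PUBLIC = "public"
    PUBLIC_SECTOR = "public_sector"
    PRIVATE = "private"


@dataclass
class User:
    name: str
    is_public_sector: bool = False
    is_owner_org_member: bool = False


@dataclass
class Dataset:
    name: str
    metadata_visibility: Visibility
    data_visibility: Visibility
    # Names of users a single dataset has been shared with explicitly.
    shared_with: frozenset = frozenset()


def _allowed(level, user, dataset):
    """Return True if `user` (None means anonymous) passes the given level."""
    if level is Visibility.PUBLIC:
        return True
    if user is None:
        return False
    if user.is_owner_org_member or user.name in dataset.shared_with:
        return True
    return level is Visibility.PUBLIC_SECTOR and user.is_public_sector


def can_view_metadata(user, dataset):
    return _allowed(dataset.metadata_visibility, user, dataset)


def can_download(user, dataset):
    return _allowed(dataset.data_visibility, user, dataset)


# Metadata is public, the data is private, and the dataset is shared with one user.
budget = Dataset("budget-2022", Visibility.PUBLIC, Visibility.PRIVATE,
                 shared_with=frozenset({"alice"}))
print(can_view_metadata(None, budget))      # anonymous visitor sees the metadata
print(can_download(None, budget))           # but cannot download the data
print(can_download(User("alice"), budget))  # the user it was shared with can
```

Keeping the two visibility settings separate is what allows a catalogue to stay searchable while the underlying files remain restricted.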

Ideas

  • It’s critical to get data engineers to use CKAN

Respondent 8: Data Distributor

Interview date: 16 June 2022

OVERVIEW

The interviewee has been in the industry for 2-3 years. His primary role is data distribution: choosing the technology (CKAN), defining a flow of services, and installing CKAN instances. He has also worked as a developer and created a CKAN client library for Java, which he uses himself.

Current usage

  • As a data distributor, he works a lot on metadata, as people should know exactly what they’re buying.
  • The most exciting CKAN feature for the interviewee is the preview, which he uses when he wants to show visualizations to people. 🟢
    • CKAN + Python
    • CKAN as a visualization tool
    • Plugin: Data → algo → visualization
    • Created frontend based on CKAN → visualization on top of it

Business Model

  • He got funding for some of his projects
  • 15 countries in the consortium
  • The interviewee gets data from different countries in Europe and Asia
  • His clients (data exploiters) are from the EU (AI, data analysis)
  • The respondent has 100+ datasets published
  • He would like to standardize descriptors automatically (15 pages)
    • Still figuring out how to attach descriptors to the dataset 💡
    • How to attach an HTML document to the published dataset automatically.
    • Doesn’t want a long description in the UI, as it gets in the way of the dataset download button.
  • The respondent developed his own data access system for data distribution 💡:
    • Open access
    • Controlled access
    • Manual
  • He develops his own plugins to cover his needs
    • Has a number of plugins in his pipeline

Issues

  • Publishing issue - doesn’t know how to add tags 🔴
  • The interviewee wants to contribute as a developer. He has an idea but needs help in creating a PR

Ideas

  • If a plugin contains a new feature, it would be good to separate that feature from the plugin and upload it to CKAN. At the moment the respondent has to get in touch with the contributor, but with guidelines he could build his own plugin from the features he liked (reusability) 💡
  • Getting feedback from the community (community feature) 💡

How to make CKAN support easier

  • Gitter works great. All questions get answered 🟢
  • FAQ would help. The obvious stuff is important for newbies 💡

Used Plugins

  • Harvesting
  • Scheming
  • DCAT
  • Google Analytics
  • Keycloak - identity provider
  • Helm for deployment (customized it heavily)

CKAN 3.0 top-3 directions of improvement

  1. Expand distribution portal: catalogue, data sets
  2. Expand data exploration: if you provide API, you can connect your datasets
  3. Easier deployment with Docker: it took a couple of weeks to deploy. But as it’s open source, the documentation is there (a big plus)
  4. Support: keeping up to date is important. Documentation helps with troubleshooting
  5. Solr: it was hard to understand how to install it; this was fixed by installing with Docker
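
To illustrate the "if you provide API, you can connect your datasets" point in item 2, here is a minimal sketch of how a client might talk to CKAN's Action API. The `package_search` action and its `q` and `rows` parameters are part of that API; the portal address `https://data.example.org`, the helper names and the sample response are placeholders made up for this example, and no network request is made.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, rows=10):
    """Build a package_search URL for a CKAN portal's Action API."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def extract_titles(response):
    """Pull dataset titles out of a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [dataset.get("title", dataset.get("name"))
            for dataset in response["result"]["results"]]


# data.example.org is a placeholder, not a real portal.
url = build_search_url("https://data.example.org", "water quality", rows=5)
print(url)

# A hand-written response in the shape the Action API returns, for illustration only.
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "river-monitoring", "title": "River monitoring"},
            {"name": "rainfall-2021", "title": "Rainfall 2021"},
        ],
    },
}
print(extract_titles(sample_response))
```

In a real integration the URL would be fetched with an HTTP client and the JSON body decoded before being passed to `extract_titles`.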

Respondent 9: Data Consultant

Interview date: 16 June 2022

OVERVIEW

The interviewee works with government and enterprise clients, managing the delivery of data-driven solutions. He does not contribute to CKAN but follows the project.

The respondent is now interested in CKAN’s enterprise capabilities for internal use: multiple data environments, and consumers who don’t understand data.

Products the respondent used

  • Socrata,
  • JKAN (smaller out-of-the-box one),
  • Mashta (Australia),
  • Dataverse (from Harvard, data marketplace),
  • Data Republic (Australia),
  • data.world,
  • open-metadata.org,
  • one library from Ln on data discovery and cataloguing

CKAN features, strengths, weaknesses

  • CKAN misses a lot as a catalogue 🔴
  • For the data portal, important functionality is missing as well 🔴
    • Ability to create a workflow between several users who manage data 💡
    • User management and roles: developers, data publishers, users 💡
  • Integration with internal tools can be better 🔴
    • Connectors don’t cover his needs
    • Authenticators for corporate environments 💡
    • More data sources 💡
  • UI doesn’t matter as everyone uses their own UI (+1 for decoupling frontend) 💡
  • Dockerization is helpful and works great 🟢

Success metrics for the interviewee’s customers

  • Case 1: Search and discoverability - the ability to identify data assets they didn’t know existed.
  • Case 2: % of users in the organization who use CKAN as storage and then move the data elsewhere for analysis, for example into Tableau.
  • Case 3: % of team members who use CKAN in their workflow. Engagement metric for non-tech users.

Respondent 10: Data Manager

Interview date: 24 June 2022

OVERVIEW

The interviewee manages a CKAN data portal that aggregates data from several other CKAN instances. They use Magda in their setup for searching and cataloguing. This is done to improve UX: reaching the data in the fewest clicks 💡.

CKAN visually looks better than Magda 🟢

Success metrics for the respondent’s customers

  • End users:
    • Number of dataset downloads 💡
    • Number of visitors
  • Managers:
    • Number of datasets
    • Number of downloaded datasets
    • Accessibility and ease of use for data custodians (UX for publishing data) 💡

CKAN 3.0 top-3 directions of improvement

  1. Improve the visualization of data (maybe as a plugin)
    1. Ability to put a particular visualization onto data
      1. At the moment, when you load data, you get the same table and graph for everything
    2. Mapping program (map for specialized data)
      1. Magda has a good preview map
  2. User management/data sharing
    1. These datasets are open
    2. These are open if you apply
  3. Customizable dashboard - linking Tableau, Power BI, and Google Data Studio

CKAN issues

  • Logging - the interviewee found it difficult to track usage by API
    • How many people (by API) have downloaded datasets
    • Simple statistics on content usage and engagement (Twitter, Reddit style) 💡
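
One rough way to approach the API usage question is to count requests in the web server's access log. The sketch below assumes the request paths have already been extracted from the log and that they follow the usual CKAN URL shapes (`/api/3/action/<action>` for API calls and `/dataset/<id>/resource/<id>/download/...` for file downloads). The function name and the sample paths are invented for this example, and it counts requests rather than distinct people.

```python
import re
from collections import Counter

# Assumed URL shapes; adjust the patterns if a portal is configured differently.
API_CALL = re.compile(r"^/api/(?:3/)?action/(?P<action>\w+)")
DOWNLOAD = re.compile(r"^/dataset/(?P<dataset>[^/]+)/resource/[^/]+/download")


def summarize_usage(request_paths):
    """Count API calls per action and downloads per dataset."""
    api_calls = Counter()
    downloads = Counter()
    for path in request_paths:
        api = API_CALL.match(path)
        if api:
            api_calls[api.group("action")] += 1
            continue
        download = DOWNLOAD.match(path)
        if download:
            downloads[download.group("dataset")] += 1
    return api_calls, downloads


# Invented sample paths, standing in for lines parsed from an access log.
sample_paths = [
    "/api/3/action/package_search?q=water",
    "/api/3/action/package_show?id=rainfall-2021",
    "/api/3/action/package_search?q=roads",
    "/dataset/rainfall-2021/resource/abc123/download/rainfall.csv",
    "/dataset/rainfall-2021/resource/def456/download/stations.csv",
    "/dataset/rainfall-2021",
]
api_calls, downloads = summarize_usage(sample_paths)
print(api_calls.most_common())
print(downloads.most_common())
```

Counting distinct users would additionally need an identifier per request, such as an API token or client address, which depends on what the portal logs.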

How to make CKAN support easier

  • Training for people who are jumping into managing CKAN.
  • Options for doing things in different ways
  • Best practices