
CKAN 3.0 Product Strategy Research (part 1)

Since April, our Product Manager Alexander Gostev has been gathering input from stakeholders on what they would like to see in the upcoming CKAN 3.0. As we wrap up the engagement process, we are sharing the interviews with the community. Below are the first 5 of 37 interviews, and we invite everyone to join the discussion and share comments. This feedback will help ensure that CKAN 3.0 is even better than before. Stay tuned for more updates.



STAKEHOLDER ENGAGEMENT RESULTS 5 of 37


Respondent 1: Entrepreneur

Interview date - 10 May 2022

OVERVIEW

The interviewee is an avid CKAN user who is at once a Promoter, Facilitator, and End User. He has worked in open data for 12 years, starting with first-generation solutions.

His ideas were based on solid experience and real-world observations and revolved around several directions of CKAN evolution:

  1. Making CKAN accessible for non-data professionals by cleaning up data after it’s uploaded
    1. Accessibility is hugely important for growing the user base and getting a bigger market share
    2. Data storage capability
    3. Pre-populating metadata reduces the effort needed to publish a dataset
    4. Automatic join by location is the first target for automation, as it’s the most popular way to join two datasets
    5. The mission is to get users from the boring work to the fun stuff faster: from data cleaning to data storytelling
  2. Secure internal data sharing
    1. Multiple big projects were about using CKAN as internal data storage
    2. A tech solution can provide value as a legal shield (enabler) and convenience feature (cost cutter)
  3. Increase re-usability of plugins
    1. CKAN plugins are one of the competitive advantages of our system
    2. Plugins currently lack version certifications
    3. We can write case studies for plugins to encourage re-usability
    4. How can we ensure plugin operability besides certifications?

Given that the interviewee’s ideas are based on real-world usage observations, I’d rank his input highly. Although these are high-level directions that will most likely require significant effort to achieve, we can look for low-hanging fruit in each of them and test it on the market.


Respondent 2: Developer

Interview date - 1 June 2022

OVERVIEW

The interviewee has been with CKAN for 11 years, starting with version 1, when he was working on the UK government’s open data portal. He was involved in forming the core team and community in 2013-2014. The tech team has remained constant the whole time.

For CKAN, he has been the release manager for most releases. He reviews pull requests and does mostly maintenance work.

He is also part of [global, non-profit network that promotes and shares information at no charge], where they create open data solutions for non-profits such as charitable foundations.

CKAN problem areas

  • The interviewee admitted that he is biased, as he works a lot with developers
  • Core is OK 🟢
  • Learning CKAN is difficult: implementation, deployment 🔴
  • UI is old. Everybody changes it. The structure is clunky 🔴
  • Search provider flexibility: Solr is hard to install and maintain, and there are many more hosted options, so users should have a way to choose an easier one 💡

CKAN 3.0

  1. Making it easier to start updating plugins and to install updates
  2. Making it easier to install plugins / customise
  3. Making it easier to maintain, install updates, and highlight outdated things to update

We have this thing, we need a plan and resources.

Data portals respondent likes

What is important to do

  • Ideas should be collected
  • Maybe we’re missing voices. What are different customers’ needs?
  • Someone should go and talk to customers and listen to them
  • We’re developing what we think makes sense, but how do we know we are on the right track?
  • What are the important problems that can be solved with CKAN?


Respondent 3: Developer

Interview date - 2 June 2022

OVERVIEW

The interviewee has 15 years of experience in IT, telecom, and healthcare, mostly working with Java. He likes open source and the idea of open data. In the open-data ecosystem, he is a developer and solutions producer who helps users open their data. He isn’t involved in data engineering, but he wants to get into it. The solutions he provides are always custom rather than generic.

The respondent’s clients are

  • Currently: only governmental organisations
  • Previously: NGOs, governments (80%), scientific institutions

CKAN 3.0 top priorities should be

  1. Clean up the old codebase. Make it easier for new developers to read (accessibility for developers)
  2. Decouple the UI (to be able to use one’s own frontend framework and to make adding a custom UI easier)

Plugins frequently used

  • xloader - core
  • datapusher - core
  • scheming - core
  • harvester - core
  • ticket
  • other ones are custom made

Which existing data portals the interviewee liked

Good portals are those where data is plentiful and easily discoverable

CKAN issues

  • Nothing big, CKAN is mature enough 🟢
  • Client developers expect to be able to swap out Solr 💡
  • Switching to one’s own data store should be possible 💡
  • It’s important to standardise open data → tell how open data should work

Latest positive changes

✅ Pylons → Flask

✅ Python → Python 3


Respondent 4: Business developer

Interview date - 3 June

OVERVIEW

The interviewee has been a business developer for seven years, working for an open data consulting company. Her experience is focused mostly on CKAN products and the government sector.

They build their own UI for every new project. She is interested in expanding user management and user roles, and they usually do a lot of harvesting.

CKAN 3.0 top-3 directions of improvement

  1. Flexibility. Adding a new data type is currently costly. A flexible schema is needed.
  2. Ability to use different data storage. The respondent is interested in using graph databases (starting with Linked Data).
  3. Visualisation. Tools for end users.

Core plugins are

  • dcat
  • xloader
  • scheming
  • showcase
  • hierarchy
  • harvest

Her clients are mostly:

  • Governmental institutions in [European country]
    • National data catalogues
    • Municipalities
  • Private company in the Energy sector, for internal usage

CKAN issues that the interviewee has

  • A good dockerised setup for hosting is needed
    • A lot of people struggle with the current implementation
    • You have to install the container, then install plugins one by one
    • You have to update every container
  • When supporting CKAN
    • Better error messages for the harvester plugin
      • A better description of what happened for the developer
      • Currently it’s hard to provide clients with a good error message

Jobs that the interviewee’s customers usually do with CKAN

  • Harvesting
    • Dashboard - where it fails, what works, what doesn’t, what has stopped, create test
  • Upload data
  • Metadata
    • DCAT
    • Onboarding customers on how to work with metadata to follow the country's standard
    • Harvesting a lot to pre-populate metadata
  • Data analysis. Examples:
    • Apps showing where public toilets are
    • Where trash is picked up
  • User management
    • Questions come up regularly
    • Public administrations always need more roles (permissions, who’s responsible, who’s the author, owner, publisher)


Respondent 5: Developer

Interview date - 3 June

OVERVIEW

The respondent works with open data in several roles. He builds data portals for clients, contributes to CKAN as a developer, and analyses open data on his own. He has a certain attachment to Latin America and has created websites for governmental institutions in two Latin American countries. He contributes because he loves open source and open data, so CKAN is the perfect intersection of both worlds for him.

CKAN alternatives for Latin American governmental institutions

  • Local governments in [a Latin American country] use WordPress mostly
  • Small governments in Latin America are resource-constrained and have just one web developer
  • Even in one of the largest cities in one of the biggest Latin American countries, only three people work with data
  • Only ministries and highest-level gov’t institutions can use CKAN

Core plugins for him and governmental clients are

  • scheming
  • datastore
  • xloader
  • datapusher

CKAN 3.0 top-3 directions of improvement

For Developer

  1. Cleaning up and refactoring old code
  2. Infrastructure and architecture design improvements
  3. Solr must be optional (CKAN should not depend on Solr)

For User

  1. UI simplification:
  • Wants to have an overview of features - what’s used and what isn’t
  • The goal is to get rid of features that aren’t used (Sergey moved Activities to a plugin)
  • He likes the modular approach to the system - you have what you need
  • Change the workflow for creating a dataset and uploading a resource (the resource should come first, then the dataset step, then metadata)
  • Decoupling frontend and backend is good (even in a monolithic repo we can have a separate UI layer)

CKAN issues

  1. Solr is number one
  2. Unit test infrastructure is extremely slow (25 mins is too much)
    1. Our initiatives are out of budget
  3. Moving closer to Flask is good, and it will make it possible to remove many unnecessary things from the code.
    1. How we create the Flask app can be improved, so we should learn how to use modern approaches.
  4. Docs for end users
    1. Documentation is obsolete
  5. Docs for developers
    1. Needed, as we provide the framework for building plugins

What can help with CKAN support

  1. Supporting materials
    1. Guide with examples
    2. Repo with examples
  • Once we struggled a lot with engaging users
  • The process of uploading data could be very bureaucratic

Jobs customers do

  • Uploading data
  • User roles
    • Sensitive data
    • Admin / Publisher / Owner
    • Who can see the data
    • Who can edit the data
    • Who can use it
  • Authorisation
    • Azure
    • Amazon
    • Google Drive
  • Visualisations
    • Tools to engage with users
  • Community
    • Engaging
    • Forum
    • Comments
    • Discussion
    • Feedback on the dataset