Do you know how your Open Data is being used?
Measuring the Performance of the Western PA Regional Data Center
by David Walker
The Western Pennsylvania Regional Data Center is a shared open data project established in 2015 to make public information more accessible and useful in the Pittsburgh region. It is managed by the University of Pittsburgh Center for Urban and Social Research, in partnership with Allegheny County and the City of Pittsburgh. The Regional Data Center manages a community open data portal, and serves as a community information intermediary by offering a number of services, products, and programs to help people find and use data.
During the planning phase, project organizers also developed a performance management framework for the Regional Data Center. This framework was designed to track the degree to which people were finding, interacting with, and taking action as a result of using data. We use this information to learn more about our users, uncover new opportunities for programs and initiatives, and adjust course as necessary. Our intent has always been to openly share our performance statistics with our data user community, project partners, funders, publishers, and advisors.
Initially, we tracked our performance statistics on a monthly basis using a Google Sheets spreadsheet. We transcribed overall site usage stats from Google Analytics, and we also tracked a number of other program measures on the spreadsheet, including the number of classes at local universities that use data, and the number of news articles that mentioned the project or used data from the open data portal. We quickly realized that the spreadsheet made it very difficult to communicate with our stakeholders. Our initial system also did not allow publishers to directly evaluate the results of their investment in open data. It was clear that they wanted an easily navigable interface for viewing usage statistics on individual datasets, and that it was important to them to be able to share this information within their own organizations.
As a first attempt to provide such an interface, we developed a dashboard that tracks a variety of measures, including the number of visitors our website receives each month, the total number of pages viewed, and the counts of pageviews and downloads of each dataset and resource.
- The dashboard’s landing page provides a picture of overall use of our CKAN instance. We can track monthly usage in terms of number of visitors and pageviews. This data clearly shows that usage has grown, and is highest when university students are in class. We have also seen usage spikes soon after popular datasets are released, as was the case in April 2016 with the release of crash data, and April 2017 with the release of tax lien data and a package of data related to cardiovascular disease and social determinants of health.
- The “Resource stats” tab allows users to view sparklines depicting the monthly download history of each resource, along with other pageview and download metrics. Users can also download this information as CSV files. The search bar enables filtering by name of the dataset, resource, or publisher. Users can re-sort the table by clicking on the name of a column.
- The “Dataset stats” tab provides similar functionality to the “Resource stats” tab, but here all download stats are obtained by summing over the downloads of all the resources within that dataset. Publishers can use these statistics to see what issues the community is interested in. This information enables us to detect surges in usage when we feature a dataset in our newsletter or on Twitter, a sure indication that we are reaching an audience.
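The relationship between the two stats tabs can be illustrated with a short sketch. This is not the dashboard's actual code: the record layout, the function names `dataset_monthly_downloads` and `search`, and all of the sample values are assumptions made up for illustration. It shows how resource-level monthly download counts might be summed into dataset-level totals, and how a search term might filter by publisher, dataset, or resource name.

```python
from collections import defaultdict

# Made-up records for illustration only (not real usage figures):
# (publisher, dataset, resource, month, downloads)
RECORDS = [
    ("Publisher A", "dataset-a", "resource-1.csv", "2017-03", 10),
    ("Publisher A", "dataset-a", "resource-1.csv", "2017-04", 25),
    ("Publisher A", "dataset-a", "resource-2.csv", "2017-04", 5),
    ("Publisher B", "dataset-b", "resource-3.csv", "2017-04", 7),
]


def dataset_monthly_downloads(records):
    """Sum resource-level downloads into per-dataset monthly totals."""
    totals = defaultdict(lambda: defaultdict(int))
    for _publisher, dataset, _resource, month, downloads in records:
        totals[dataset][month] += downloads
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {dataset: dict(months) for dataset, months in totals.items()}


def search(records, query):
    """Keep records whose publisher, dataset, or resource name contains the query."""
    q = query.lower()
    return [r for r in records if any(q in field.lower() for field in r[:3])]


totals = dataset_monthly_downloads(RECORDS)
print(totals["dataset-a"])  # {'2017-03': 10, '2017-04': 30}
```

The per-month series for each dataset or resource is the kind of data a sparkline would plot, and a month-over-month jump in that series is what a surge after a newsletter or Twitter feature would look like.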