In May 2021, the University of Virginia Faculty Senate passed a set of Open Access Guidelines and Recommendations. This was the first university-wide resolution on the subject, though the School of Data Science adopted its own open access policy a few months earlier. At the Claude Moore Health Sciences Library, we primarily serve the School of Medicine, the School of Nursing, and the Medical Center (collectively known as UVA Health). Though we now knew that the majority of the university’s faculty supported open access, we wanted to learn more about the attitudes of our library’s constituents in the health sciences.
We decided to create a dashboard to visualize what proportion of recent journal articles published by UVA Health authors were open access. This would help us better understand the current publication landscape and identify particular authors who consistently published in open access journals, and who therefore might be worth contacting as potential partners for outreach on the topic. UVA has institutional subscriptions to commercial software that measures research impact, but the license terms for those products restrict how their data can be shared, which would limit the possibilities for future development of our project. To align with the larger project’s goal of encouraging open behaviours, we wanted to create the dashboard using only open data sources. Additionally, using open data sources and sharing our code would allow anyone to replicate the dashboard or adapt it to study their own institution’s publications.
To restrict our search to UVA Health authors, we needed an API that included institutional affiliation for all authors. We initially considered other options, but the Europe PMC API proved to be the best data source for this project: it focuses on biomedical publications, and its core metadata option includes the institutional affiliation for every author, separated into individual fields.
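As an illustration of the kind of request described above, here is a minimal sketch that asks the Europe PMC REST search endpoint for core metadata and pages through the results with its cursor mechanism. This is not the dashboard's actual code: the affiliation string, the year range, and the helper names (`build_params`, `fetch_page`, `iter_records`) are assumptions made for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_params(affiliation, first_year, last_year, cursor_mark="*", page_size=1000):
    """Return the query parameters for one page of core-metadata results."""
    # Assumed query: an affiliation phrase combined with a publication-year range.
    query = f'AFF:"{affiliation}" AND PUB_YEAR:[{first_year} TO {last_year}]'
    return {
        "query": query,
        "resultType": "core",  # core metadata carries per-author affiliations
        "format": "json",
        "pageSize": page_size,
        "cursorMark": cursor_mark,  # "*" requests the first page
    }


def fetch_page(params):
    """Request one page from Europe PMC and decode the JSON body."""
    with urlopen(f"{SEARCH_URL}?{urlencode(params)}", timeout=30) as response:
        return json.load(response)


def iter_records(affiliation, first_year, last_year, fetch=fetch_page):
    """Yield every record, following nextCursorMark until the results run out."""
    cursor = "*"
    while True:
        page = fetch(build_params(affiliation, first_year, last_year, cursor))
        results = page.get("resultList", {}).get("result", [])
        if not results:
            return
        yield from results
        next_cursor = page.get("nextCursorMark")
        # Stop when the service returns no new cursor.
        if not next_cursor or next_cursor == cursor:
            return
        cursor = next_cursor
```

Calling `list(iter_records("University of Virginia", 2017, 2021))` would collect every matching record. The `fetch` argument exists only so the paging logic can be exercised without a network connection.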
After deciding on a data source, we needed to find a way to process and visualize the data. We ingested the Europe PMC API core metadata in JSON into a pandas dataframe and used Streamlit to create the visualization. Streamlit is an open source Python library that can be used to create interactive web applications. The free Sharing option allows users to create up to three complete web apps with shareable URLs using code stored in a public GitHub repository. With only three files (one for infrastructure, one for Python libraries, and one for code), users can configure the environment for diverse analysis projects. We used Streamlit’s built-in caching feature to improve the speed of the application and Altair data visualization tools to create the charts.
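The processing step can be sketched roughly as follows: each nested JSON record is reduced to a flat row, and the rows are then summarized as the share of open access articles per year. The field names used here (`pubYear`, `isOpenAccess`, `authorList`, `authorAffiliationDetailsList`) are assumptions about the core metadata layout and should be checked against a real response. The sample records are invented for illustration, and the sketch uses only the standard library so that it runs anywhere.

```python
from collections import defaultdict


def flatten(record):
    """Reduce one core-metadata record to the fields a chart needs."""
    affiliations = []
    for author in record.get("authorList", {}).get("author", []):
        details = author.get("authorAffiliationDetailsList", {}).get("authorAffiliation", [])
        affiliations.extend(d["affiliation"] for d in details if d.get("affiliation"))
    return {
        "id": record.get("id"),
        "year": int(record["pubYear"]) if record.get("pubYear") else None,
        "open_access": record.get("isOpenAccess") == "Y",
        "affiliations": affiliations,
    }


def open_access_share(rows):
    """Return {year: fraction of that year's articles that are open access}."""
    totals = defaultdict(int)
    open_counts = defaultdict(int)
    for row in rows:
        if row["year"] is None:
            continue  # skip records without a publication year
        totals[row["year"]] += 1
        open_counts[row["year"]] += row["open_access"]  # True counts as 1
    return {year: open_counts[year] / totals[year] for year in sorted(totals)}


# Invented sample records, shaped like the assumed API output.
sample = [
    {"id": "a", "pubYear": "2020", "isOpenAccess": "Y"},
    {"id": "b", "pubYear": "2020", "isOpenAccess": "N"},
    {"id": "c", "pubYear": "2021", "isOpenAccess": "Y",
     "authorList": {"author": [{"authorAffiliationDetailsList": {
         "authorAffiliation": [{"affiliation": "University of Virginia"}]}}]}},
]

rows = [flatten(record) for record in sample]
shares = open_access_share(rows)
```

The list of flat dictionaries in `rows` can be passed directly to `pandas.DataFrame(rows)` to obtain a dataframe for charting. In a Streamlit app, the slow download step that precedes this is the natural candidate for caching.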
The Open Data Dashboard now shows the proportion of open access versus subscription-only articles published by UVA-affiliated authors and indexed in Europe PMC from 2017 to the present. We have also added an exploratory data analysis widget, made using the Python library Sweetviz, to provide a quick visualization of the dataset and help us uncover potential areas for future analysis.
We wanted to use the Europe PMC affiliation data to sort articles by the department or unit at UVA Health to which their authors belong, but this has proved challenging because the names are written in varying ways. Because of the risk of misattributing articles, we have not drilled down into the data by department, even though this use of the data would likely be of the greatest interest to the library’s constituents.
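To make the difficulty concrete, here is a minimal sketch of one possible approach, which the dashboard does not implement: normalize each affiliation string and look for known spellings of a unit's name. The alias table is hypothetical, and a real one would have to be compiled and checked by hand, which is exactly where the risk of misattribution arises.

```python
import re

# Hypothetical alias table: canonical unit name -> spellings that may appear
# in affiliation strings (already lowercased, without punctuation).
UNIT_ALIASES = {
    "School of Nursing": ["school of nursing"],
    "Department of Pediatrics": [
        "department of pediatrics",
        "dept of pediatrics",
        "pediatrics department",
    ],
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def match_units(affiliation):
    """Return the canonical units whose aliases occur in the affiliation."""
    cleaned = normalize(affiliation)
    return sorted(
        unit
        for unit, aliases in UNIT_ALIASES.items()
        if any(alias in cleaned for alias in aliases)
    )
```

Plain substring matching like this fails silently on any spelling that is missing from the table, so unmatched affiliations would still need manual review before department-level figures could be trusted.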
We received assistance from numerous sources while working on this project. When we initially encountered problems in getting the full data set to load, Dr. Maaly Nassar quickly replied to our email to the Europe PMC Developer Forum and provided a piece of code which solved our problem. James Bennett at our local Code for America team recommended Sweetviz, and Randy Zwitch from Streamlit was available on Twitter for assistance setting up caching. More information on the project can be found in this blog post and our GitHub repository is available here.
Please direct any questions on the Open Data Dashboard to:
Lucy Carr Jones (lgc3t[at]virginia.edu), MSIS, Library Assistant, University of Virginia Claude Moore Health Sciences Library; and Anson Parker (adp6j[at]virginia.edu), Web Developer, University of Virginia Claude Moore Health Sciences Library.