News blog

Updates from Europe PMC, a global database of life sciences literature

Europe PMC team

25 October 2021

Guest post

Using the Europe PMC REST API to Study Open Access Publishing at the University of Virginia


In May 2021, the University of Virginia Faculty Senate passed a set of Open Access Guidelines and Recommendations. This was the first university-wide resolution on the subject, though the School of Data Science had adopted its own open access policy a few months earlier. At the Claude Moore Health Sciences Library, we primarily serve the School of Medicine, the School of Nursing, and the Medical Center (collectively known as UVA Health). Although we now knew that the majority of the university’s faculty supported open access, we wanted to learn more about the attitudes of our library’s constituents in the health sciences.

We decided to create a dashboard to visualize what proportion of recent journal articles by UVA Health authors were open access. This would help us better understand the current publication landscape and identify authors who consistently published in open access journals and might therefore be good partners for outreach on the topic. UVA has institutional subscriptions to commercial software that measures research impact, but the license terms for those products restrict how their data can be shared, which would limit future development of our project. To align with the larger project’s goal of encouraging open behaviours, we wanted to build the dashboard using only open data sources. Using open data sources and sharing our code would also allow anyone to replicate the dashboard or adapt it to study their own institution’s publications.

To restrict our search to UVA Health authors, we needed an API that included the institutional affiliation of every author. We considered other options, but the Europe PMC API proved to be the best data source for this project: it focuses on biomedical publications, and its core metadata option includes the institutional affiliation of every author, separated into individual fields.
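A minimal sketch of such a query, using only Python's standard library and not taken from the project's own code: it calls the Europe PMC REST `search` endpoint with `resultType=core` and `format=json`, and follows `cursorMark` pagination until no more results come back. The page size of 1000 and the example `AFF:` and `PUB_YEAR` query syntax follow the public API documentation as understood here and should be checked against it; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Public Europe PMC REST search endpoint (no API key needed).
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, cursor_mark: str = "*", page_size: int = 1000) -> str:
    """Build a search URL requesting core metadata as JSON."""
    params = {
        "query": query,
        "resultType": "core",  # core metadata includes per-author affiliations
        "format": "json",
        "pageSize": page_size,  # assumed documented maximum of 1000 per page
        "cursorMark": cursor_mark,  # "*" requests the first page
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> dict:
    """Download a URL and parse the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_results(query: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every record matching `query`, following cursorMark pagination."""
    cursor = "*"
    while True:
        page = fetch(build_search_url(query, cursor))
        results = page.get("resultList", {}).get("result", [])
        yield from results
        next_cursor = page.get("nextCursorMark")
        # Stop on an empty page or when the cursor no longer advances.
        if not results or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

For example, `list(iter_results('AFF:"University of Virginia" AND PUB_YEAR:[2017 TO 2021]'))` would collect the matching records. The `fetch` parameter exists only so the paging logic can be exercised without network access.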

After choosing a data source, we needed a way to process and visualize the data. We loaded the Europe PMC API core metadata, returned as JSON, into a pandas DataFrame and used Streamlit to build the visualization. Streamlit is an open source Python library for creating interactive web applications. Its free Sharing option lets users deploy up to three complete web apps with shareable URLs from code stored in a public GitHub repository. With only three files (one for infrastructure, one listing the Python libraries, and one for the code), users can configure the environment for a wide range of analysis projects. We used Streamlit’s built-in caching to speed up the application and the Altair visualization library to create the charts.
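The ingestion step might look like the following sketch, which flattens core-format records into a pandas DataFrame with one row per article, author and affiliation, then keeps the rows whose affiliation mentions the University of Virginia. The JSON field names (`authorList`, `authorAffiliationDetailsList`, `isOpenAccess`, `pubYear`) are those seen in `resultType=core` responses and should be checked against a live response; treating `isOpenAccess == "Y"` as open access is an assumption, since the post does not say which field the dashboard used. The column and function names are hypothetical.

```python
from typing import Iterable

import pandas as pd

COLUMNS = ["id", "pub_year", "is_open_access", "author", "affiliation"]


def records_to_frame(records: Iterable[dict]) -> pd.DataFrame:
    """Flatten core-format records into one row per (article, author, affiliation)."""
    rows = []
    for rec in records:
        for author in rec.get("authorList", {}).get("author", []):
            affs = author.get("authorAffiliationDetailsList", {}).get("authorAffiliation", [])
            # Authors without affiliation data get a single row with an empty string.
            for aff in affs or [{}]:
                rows.append({
                    "id": rec.get("id"),
                    "pub_year": rec.get("pubYear"),
                    # Assumption: "Y" in isOpenAccess marks an open access article.
                    "is_open_access": rec.get("isOpenAccess") == "Y",
                    "author": author.get("fullName"),
                    "affiliation": aff.get("affiliation", ""),
                })
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["pub_year"] = pd.to_numeric(df["pub_year"], errors="coerce")
    return df


def uva_rows(df: pd.DataFrame, name: str = "University of Virginia") -> pd.DataFrame:
    """Keep rows whose affiliation mentions `name` (case-insensitive, literal match)."""
    mask = df["affiliation"].str.contains(name, case=False, regex=False)
    return df[mask]
```

In a Streamlit app, a loader that calls the API and builds this frame would typically be decorated with `@st.cache_data` (current releases) or `@st.cache` (releases available in 2021), so the API is not queried again on every rerun.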

The Open Data Dashboard now shows the proportion of open access versus subscription-only articles by UVA-affiliated authors indexed in Europe PMC from 2017 to the present. We have also added an exploratory data analysis widget, built with the Python library Sweetviz, that gives a quick visual overview of the dataset and helps us identify potential areas for future analysis.
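The headline figure, the share of open access articles per year, reduces to a group-and-count. A sketch with pandas, assuming a DataFrame with one row per article and hypothetical columns `pub_year` and a boolean `is_open_access`; this is an illustration, not the dashboard's own code.

```python
import pandas as pd

LABELS = {True: "Open access", False: "Subscription only"}


def open_access_share_by_year(articles: pd.DataFrame) -> pd.DataFrame:
    """Count open access vs subscription-only articles per year, plus the OA share.

    Expects one row per article with columns `pub_year` and boolean `is_open_access`.
    """
    labelled = articles.assign(access=articles["is_open_access"].map(LABELS))
    counts = (
        labelled.groupby(["pub_year", "access"]).size()
        .unstack(fill_value=0)
        # Guarantee both columns exist even if no year has one kind of article.
        .reindex(columns=list(LABELS.values()), fill_value=0)
    )
    counts["oa_share"] = counts["Open access"] / counts.sum(axis=1)
    return counts
```

If the input has one row per author rather than per article, dropping duplicate article IDs first (for example with `df.drop_duplicates("id")`) keeps articles with several UVA authors from being counted more than once. After `reset_index()`, the count columns can be reshaped with `melt` into the long format that an Altair stacked bar chart expects.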

We wanted to use the Europe PMC affiliation data to group articles by the department or unit of UVA Health their authors belong to, but this has proved challenging because department names are written in many different ways, so any automated grouping risks misattributing or missing articles. Because of this risk, we have not broken the data down by department, even though this use of the data would likely be of the greatest interest to the library’s constituents.
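To illustrate why department-level grouping is fragile, here is a hypothetical normalization sketch, not something the dashboard does: it lowercases affiliation strings, expands two common abbreviations, strips punctuation, and then looks for names from a short, made-up department list. The abbreviation patterns and department names are assumptions chosen for the example.

```python
import re

# Hypothetical abbreviation patterns; real affiliation strings vary far more.
ABBREVIATIONS = {
    r"\bdept\b\.?": "department",
    r"\buniv\b\.?": "university",
}

# Hypothetical target units to look for after normalization.
DEPARTMENTS = ["department of medicine", "department of pediatrics", "school of nursing"]


def normalize_affiliation(text: str) -> str:
    """Lowercase, expand known abbreviations, drop punctuation, collapse spaces."""
    text = text.lower()
    for pattern, replacement in ABBREVIATIONS.items():
        text = re.sub(pattern, replacement, text)
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def match_departments(affiliation: str) -> list[str]:
    """Return every hypothetical department name found in the normalized string."""
    normalized = normalize_affiliation(affiliation)
    return [dept for dept in DEPARTMENTS if dept in normalized]
```

With these rules, "Dept. of Medicine, Univ. of Virginia" and "Department of Medicine, University of Virginia School of Medicine" both map to the Department of Medicine, but "UVA Children's Hospital" matches nothing at all. Every variant the rules do not anticipate is silently missed, which is the misattribution risk in practice.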

We received help from many sources while working on this project. When we first had trouble loading the full dataset, Dr. Maaly Nassar quickly replied to our email to the Europe PMC Developer Forum with a piece of code that solved the problem. James Bennett of our local Code for America team recommended Sweetviz, and Randy Zwitch from Streamlit helped us on Twitter with setting up caching. More information on the project can be found in this blog post, and our GitHub repository is available here.

Please direct any questions on the Open Data Dashboard to:
Lucy Carr Jones (lgc3t[at]virginia.edu), MSIS, Library Assistant, University of Virginia Claude Moore Health Sciences Library; and Anson Parker (adp6j[at]virginia.edu), Web Developer, University of Virginia Claude Moore Health Sciences Library.

2 comments on "Using the Europe PMC REST API to Study Open Access Publishing at the University of Virginia"


Europe PMC features use case examples from programmatic users on the Europe PMC API Case Studies page (https://europepmc.org/API-case-studies). If you have used the Europe PMC API and would like to share your story, please get in touch via helpdesk[at]europepmc.org.

You can now read more about the UVA Health Open Data Dashboard in the following publication: Parker, A., Heflin, A., & Jones, L. (2021). Analyzing University of Virginia Health publications using open data, Python, and Streamlit. Journal of the Medical Library Association, 109(4), 688–689. https://doi.org/10.5195/jmla.2021.1360


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

