Thursday, 21 December 2017

Looking back, looking forward: a year in review

As 2017 concludes, we reflect on Europe PMC’s landmark achievements. Let’s take a closer look at the highlights of this year.

Cutting-edge search and viewing mechanisms

Search results now display snippets - fragments of the publication text that contain matches to your query. Snippets highlight the search term in its context, making it easier to identify the relevant content you were looking for. The snippet link on the article abstract allows you to navigate directly to where your search terms appear in the full text. Another new feature is the publication year filter, which lets you limit your search results to a particular year or a range of years.
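To make the snippet idea concrete, here is a toy sketch that cuts a fixed-width window of text around each match and wraps the match in <b> tags. It rests on our own assumptions (plain case-insensitive substring matching, a 30-character window, HTML-style highlighting) and is not the algorithm Europe PMC uses; the function name make_snippets is hypothetical.

```python
import re

def make_snippets(text: str, term: str, context: int = 30, max_snippets: int = 3) -> list[str]:
    """Return short fragments of `text` around case-insensitive matches of `term`,
    with each match wrapped in <b>...</b> (a toy illustration, not Europe PMC's code)."""
    snippets = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        fragment = (
            text[start:match.start()]
            + "<b>" + match.group(0) + "</b>"
            + text[match.end():end]
        )
        # Mark truncated edges with "..." so the reader knows this is a fragment
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + fragment + suffix)
        if len(snippets) >= max_snippets:
            break
    return snippets
```

For example, make_snippets("abc DNA xyz", "dna", context=2) returns ["...c <b>DNA</b> x..."].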

In addition to these search improvements we have focused on the abstract and full text display. We have added a citation graph to the publication pages, showing how frequently the article was referenced over time. The graph is a visual representation of the “cited by” information available via the Citations tab. The redesigned affiliation view helps you navigate through the list of authors, highlighting those that belong to the same institution.

User services

Last year we introduced Author Profile pages based on publicly available ORCID data. This year we have made them more discoverable with the “Suggested Authors” feature. An author search in Europe PMC now brings up a Suggested Authors box linking to matching researchers who have an ORCID iD. The box displays up to two of the most prolific matching researchers and links to their Author Profile pages. Currently, over 600,000 biomedical and life science researchers are actively publishing and using their ORCID iDs, and they have claimed about 4.7 million articles available in Europe PMC.

We have also made it easier to manage your accounts in Europe PMC. User accounts allow you to save frequently run searches. You can sign in using your ORCID, Twitter, or Europe PMC Plus credentials, or create a separate Europe PMC account. If you have several accounts because you have used different sign-in methods, you can now merge them to keep all your saved searches in one place.

Integrated research data

Data integration is a unique feature of Europe PMC. To help researchers navigate the data-rich literature, we have developed the SciLite tool. It highlights text-mined biological terms (annotations) in scientific articles and links them to related data records in public resources. Since its launch in July last year, SciLite has expanded to cover both abstracts and open access full text articles. The range of SciLite annotations has also broadened as new providers have joined the scheme. In addition to core named entities (gene/protein names, organisms, diseases, chemicals, Gene Ontology terms, database accession numbers), phosphorylation events, and gene functions, users can now find gene-disease associations from Open Targets and DisGeNET, as well as protein-protein interactions from the IntAct database. SciLite can also now display a list of the individual terms found for each annotation type.

To make it easier to access all primary data associated with a study, Europe PMC has integrated with the BioStudies database. BioStudies acts as a data container, combining supplemental data and linked data residing in public repositories in a single location. Linked data is identified by text-mined accession numbers for over 20 major data resources in the life sciences, including ENA, PDBe, and UniProt. Over a million Europe PMC articles now have corresponding BioStudies records, which can be explored by clicking on the BioStudies link on the abstract pages.
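Accession numbers can be mined from text partly because many of them have a recognisable shape. As a rough illustration, the sketch below finds candidate UniProtKB accessions using the accession-format regular expression published by UniProt. It is a simplified toy under our own assumptions: a production pipeline would also use surrounding context to reject false positives, and ENA, PDBe and the other resources use different formats that this sketch does not cover. The function name is hypothetical.

```python
import re

# UniProtKB accession format (as documented by UniProt), wrapped in word
# boundaries so that only whole tokens match.
UNIPROT_ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)

def find_uniprot_accessions(text: str) -> list[str]:
    """Return candidate UniProt accessions in order of first appearance, without duplicates."""
    found = []
    for match in UNIPROT_ACCESSION.finditer(text):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found
```

On the sentence "TP53 (UniProt P04637) and A0A023GPI8 were studied", this returns ["P04637", "A0A023GPI8"]; the gene symbol TP53 and a year such as 2017 do not match the pattern.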

Support for text and data mining

We want to make the most of having 1.8 million open access articles and 4.5 million full text articles in Europe PMC, so we encourage text-mining groups and developers to build new technologies that improve information retrieval and researcher workflows. All open content in Europe PMC can be accessed via APIs or via the FTP site. This year we have added the literature-data cross-links and supplementary data files (including figures for open access articles) to the content available for bulk download. We have also expanded the Europe PMC programmatic tools suite with the Annotations API, which allows retrieval of all text-mined annotations from Europe PMC and other providers. The API enables a wider community of biomedical scientists to exploit the results of text-mining in their own research.
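For readers who want to try the APIs, the sketch below builds a query against the Europe PMC RESTful search service and pages through results using a cursor. The endpoint, the query/format/resultType/pageSize/cursorMark parameters and the resultList/nextCursorMark response fields reflect our understanding of the public API and should be checked against the current documentation; the helper names and the example query are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Europe PMC RESTful search endpoint (verify against the current API documentation)
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_search_url(query: str, page_size: int = 25, cursor_mark: str = "*") -> str:
    """Compose a search request URL; cursor_mark "*" asks for the first page."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
        "cursorMark": cursor_mark,
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)

def iter_results(query: str, max_pages: int = 2):
    """Yield result records page by page, following nextCursorMark (needs network access)."""
    cursor = "*"
    for _ in range(max_pages):
        with urllib.request.urlopen(build_search_url(query, cursor_mark=cursor)) as resp:
            page = json.load(resp)
        results = page.get("resultList", {}).get("result", [])
        yield from results
        next_cursor = page.get("nextCursorMark")
        # Stop when a page is empty or the cursor stops advancing
        if not results or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

Cursor-based paging is used here rather than page numbers because it suits walking through large result sets; for whole-corpus downloads, the FTP site is the more appropriate route.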

User community

We envision Europe PMC as an innovation platform, open for new developments coming from the community itself. We continuously undertake outreach and engagement efforts to foster a collaborative community of Europe PMC users and stakeholders. We have expanded Europe PMC user support with online training covering different aspects of the service, from literature and data search to programmatic access. We have set up a public developer forum for discussions, questions and suggestions about Europe PMC web services. Join the group to connect with the Europe PMC developers and other power users.

Among other community news, the Wellcome Trust/DBT India Alliance joined Europe PMC, bringing our funder family to 28 members, so that the research they fund can be archived in Europe PMC in support of their open access policies. Our funders have added over 2,300 grants since the spring, and the database now includes more than 60,000 grants, cross-linked to the articles they supported.

It is critical for us to understand the needs and goals of our user community, and to shape Europe PMC services accordingly. At the beginning of this year we conducted a user research study to better understand literature search behaviour, and published a user research report. In the summer we ran a user survey with over 300 participants. The feedback from the user research and the survey will inform next year’s development.

After looking back at Europe PMC’s accomplishments, we are looking forward to new and exciting goals for 2018. You can see what’s in store for 2018 on the Europe PMC roadmap. As always, we welcome your feedback. Leave a comment, send us an e-mail, or connect with us via Twitter. We wish you all happy holidays. Season’s greetings from Europe PMC!

Tuesday, 19 December 2017

PubMed Central Canada is retiring: response from Europe PMC

The PubMed Central International (PMCI) network is a collaborative effort between the PMCI repositories, publishers, and funding organisations that wish to preserve and provide free access to journal articles authored by the researchers they support. For many years, the PMCI network has consisted of three nodes: PMC USA, Europe PMC and PMC Canada. As of 23 February 2018, the network will be reduced to two nodes, following the decision by the Canadian Institutes of Health Research (CIHR) and the National Research Council (NRC) to close down PMC Canada.

Europe PMC is committed to the PMCI collaboration, and will continue to provide a world-class service, as it has done for the past decade, with funds secured until 2021. The funding of Europe PMC is shared across twenty-eight international life science funding organisations, providing a resilient sustainability model for this key research infrastructure (1).

The European Bioinformatics Institute (EMBL-EBI), which runs Europe PMC, and the NCBI, which runs PMC USA, have a long history of collaboration in the provision of open life sciences data resources, starting with the International Nucleotide Sequence Database Collaboration (INSDC). The PMCI network is well-aligned with these resources, and plays an integral part in fulfilling our shared goals to enable international open science.

Tuesday, 14 November 2017

Harness the power of text-mining for biomedical discovery: introducing Europe PMC Annotations API

We are excited to announce the launch of the Europe PMC Annotations API, which provides programmatic access to annotations text-mined from biomedical abstracts and open access full text articles. The Annotations API is part of Europe PMC’s programmatic tools suite and is freely available on the Europe PMC website:

The exponential growth in scientific data and scholarly content cannot be managed by conventional means of information discovery. Text-mining offers a practical way to scale up information extraction and advance biomedical research. However, its application is still limited, partly because of the technical know-how needed to set up a text-mining pipeline. Nonetheless, even non-specialists can capitalize on text-mining outputs: making them openly available enables a broad audience of researchers and developers to address current challenges in biomedical literature analysis. For that reason, Europe PMC has established a community annotation platform. It consolidates text-mined annotations from various providers and makes them available both on the Europe PMC website, as text highlights via the SciLite application, and now programmatically, via the Europe PMC Annotations API.

What types of annotations are available via the API?

The text-mining platform hosts a variety of annotation types, including the following:
  • Named entities: gene/protein names, organisms, diseases, Gene Ontology terms, chemicals and accession numbers.
  • Biological events: protein phosphorylation.
  • Relationships: gene-disease associations.
  • Text phrases: GeneRIF (Gene Reference into Function), as well as protein-protein interactions.

The number and diversity of annotation types available to users grows as new providers join the scheme. Annotations are currently contributed by a number of providers, including Europe PMC’s own text-mining pipeline for named entities, the Swiss Institute of Bioinformatics for gene function annotations, DisGeNET and the Open Targets Platform for gene-disease relations, IntAct for manually curated protein-protein interactions, and NaCTeM for phosphorylation events.

What can you do using the Annotations API?

The Annotations API supports a range of user workflows. For example, you can retrieve all mentions of chemical entities found in the Results sections of articles, or retrieve all articles that discuss the involvement of the FFA2 protein in diabetes. Users can combine several parameters (such as annotation type, annotation provider, or article section) to refine a query. The filter parameter switches between two options: retrieving only the specified annotations for each article, or retrieving all annotations for the articles that contain the specified annotation. Output formats include XML, ID_LIST, JSON and JSON-LD; JSON-LD produces a linked data representation of the annotations, which makes them easier to exchange and consume across different platforms.
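The two filter options are easiest to see on a small local example. The sketch below models annotations as plain records and applies each option in turn; it imitates the behaviour described above without calling the API, and the Annotation class, the filter_annotations function and the article IDs are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    article_id: str  # hypothetical article identifier
    entity: str      # the tagged term, e.g. a gene or disease name
    ann_type: str    # annotation type, e.g. "Diseases"

def filter_annotations(annotations: list[Annotation], entity: str,
                       only_matching: bool = True) -> list[Annotation]:
    """Local model of the two filter options (not the real API)."""
    if only_matching:
        # Option 1: only the annotations that tag the requested entity
        return [a for a in annotations if a.entity == entity]
    # Option 2: every annotation of any article containing the requested entity
    matching_articles = {a.article_id for a in annotations if a.entity == entity}
    return [a for a in annotations if a.article_id in matching_articles]
```

With an article annotated for both FFA2 and diabetes, the first option returns only the FFA2 annotation, while the second also returns the diabetes annotation from the same article, which is what makes the "FFA2 in diabetes" kind of question answerable.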

The following modules are available for the Annotations API:
  • get /annotationsByArticleIds: Get the annotations contained in the list of articles specified by the user. The user can specify article IDs, annotation type, annotation provider, or article section.
  • get /annotationsByEntity: Get the annotations of the articles which have at least one annotation tagging the specified entity (e.g. BRCA1, metformin, or cancer).
  • get /annotationsByProvider: Get the annotations of the articles which have at least one annotation provided by the specified provider (e.g. Open Targets).
  • get /annotationsByRelationship: Get the annotations of the articles which have at least one annotation tagging two specified entities related to each other (e.g. NRGN and schizophrenia for a gene-disease relationship).
  • get /annotationsBySectionAndOrType: Get the annotations of the articles which have at least one annotation of a given type inside a given article section (e.g. all organism annotations in the “Materials and Methods” section). The user can also choose to specify only the annotation type or only the article section.
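As a starting point for scripting against these modules, here is a minimal sketch that composes request URLs. The base URL and the query parameter names (entity, type, format and so on) are assumptions for illustration; please check them against the Annotations API documentation before relying on them. The annotations_url helper is hypothetical.

```python
import urllib.parse

# Assumed base URL of the Annotations API; confirm against the API documentation
ANNOTATIONS_BASE = "https://www.ebi.ac.uk/europepmc/annotations_api"

# The five modules listed above
MODULES = {
    "annotationsByArticleIds",
    "annotationsByEntity",
    "annotationsByProvider",
    "annotationsByRelationship",
    "annotationsBySectionAndOrType",
}

def annotations_url(module: str, **params) -> str:
    """Build a GET URL for one module; parameters set to None are left out."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    query = {key: value for key, value in params.items() if value is not None}
    return f"{ANNOTATIONS_BASE}/{module}?{urllib.parse.urlencode(query)}"
```

For instance, annotations_url("annotationsByEntity", entity="BRCA1", format="JSON") yields a URL ending in /annotationsByEntity?entity=BRCA1&format=JSON. Dropping None values keeps optional filters such as section or provider out of the request unless you set them.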

Advancing science with text-mining

Systematic analysis of research literature offers immense potential for advancing scientific knowledge. However, extracting facts and evidence is a difficult and time-consuming task. That’s where text-mining comes into play, helping to advance the discovery process. Our goal in developing the Annotations API was to make text-mining outputs available to a wider community of biomedical scientists, and we hope that with this new tool anyone will be able to harness the power of text-mining for the benefit of their own research.

Thursday, 31 August 2017

Wellcome Trust/DBT India Alliance joins Europe PMC

We're delighted to announce that the Wellcome Trust/DBT India Alliance joins Europe PMC as a new funder. This brings the Europe PMC funder family to 28 members.


The Wellcome Trust/DBT India Alliance is an equal partnership initiative between the Wellcome Trust (UK) and the Department of Biotechnology (Government of India). The broad aim of the India Alliance is to build excellence in the Indian biomedical scientific community by supporting future leaders in the field.

Researchers funded by the Wellcome Trust/DBT India Alliance will join thousands of others who make their published research articles freely available from Europe PMC as soon as possible, and in any event within six months of publication.

For more information about joining Europe PMC, visit our website:

Wednesday, 19 July 2017

Blow your own trumpet: build an academic CV with ease

Whether applying for a postdoc, a grant, or a tenure position, you will need to compile a publication list every now and then. There are many ways to do so, from maintaining an Excel sheet to building your profile on a faculty page. One option, however, requires minimal effort on your part and keeps a stable record of all your achievements. It is our all-time favourite – an ORCID iD. Getting one is easy: you register with the ORCID foundation at
How does it help you to maintain a publication list? Very simple - we have already taken care of that. We automatically generate Author Profiles for biomedical researchers based on their ORCID record. You can find yours under the following link:[your ORCID iD]. We list all your publications in Europe PMC that you have claimed and made public in your ORCID profile, conveniently linking to abstracts and free full text. There are links to your ORCID and ImpactStory pages to help keep all related information in one place.
We know that showing impact is important, so you can see citation counts for every paper and even sort papers in your profile by “times cited” to see which of your works made the biggest impact. All citations come from open sources and no subscription is required. The other option is to sort your publication list by date, to display the most recent publications first.
Do you fancy something more visual than a dry list? Download a publication graph from your Author Profile. It shows your academic output year by year as well as your yearly citation rate. An extra touch for open science champions – open access papers are marked blue on the publication graph. Looks neat, doesn’t it?
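Behind a publication graph like this sits a simple aggregation: count papers per year, and count the open access ones separately. The sketch below does that for a list of (year, is_open_access) pairs. The input format and the function name are hypothetical, and the real Author Profile graph also plots citations, which this sketch leaves out.

```python
from collections import Counter

def yearly_output(papers: list[tuple[int, bool]]) -> dict[int, tuple[int, int]]:
    """Map each publication year to (total papers, open access papers)."""
    totals = Counter(year for year, _ in papers)
    open_access = Counter(year for year, is_oa in papers if is_oa)
    # Counter returns 0 for missing keys, so years without open access papers report 0
    return {year: (totals[year], open_access[year]) for year in sorted(totals)}
```

For example, two papers in 2015 (one open access), two in 2016 (one open access) and one open access paper in 2017 give {2015: (2, 1), 2016: (2, 1), 2017: (1, 1)}.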
If at this point you are wondering how long it will take you to add all your publications, no worries. We know how busy and stressful academic research can be, so we made sure that updating the list won’t take much time or effort. We have teamed up with ORCID to develop a simple tool that lets you claim the papers you find in Europe PMC and add them to your ORCID record. It works like a charm: authorise the tool, scroll through the suggested items, pick your contributions from the list – and voilà – they are added to your record. More and more publishers now make your job even easier by adding your newly published manuscript to your ORCID record on your behalf. All you need to do is provide your ORCID iD upon submission.

You can link to your Author Profile page from your institutional or personal website, or attach the graph image to your application. The link to your profile will also appear whenever people search for your work on Europe PMC, helping to promote the science you do.
With an author profile that lists all your publications it is easy to get credit where it’s due, and keep your academic CV in check. Try it for yourself with Europe PMC.

Friday, 12 May 2017

Preprints and the ASAPBio "Central" Services

Jo McEntyre, EMBL-EBI; Thomas Lemberger, EMBO; Mark Patterson, eLife; Kristen Rattan, Collaborative Knowledge Foundation; Alfonso Valencia, Barcelona Supercomputer Centre.
The use of preprints in the life sciences offers tantalising opportunities to change the way research results are communicated and reused, and the work of ASAPbio has been key in engaging the scientific community to promote their uptake. We fully support these goals, and consequently submitted a response to the recent ASAPbio Request for Applications (RFA). In light of ASAPbio’s understandable recent decision to suspend the RFA process for four months, we are making our proposal public here, to encourage and contribute to ongoing, open discussions on these matters.
Our consortium is led by the European Bioinformatics Institute (EMBL-EBI), with collaborators in the Collaborative Knowledge Foundation, the Barcelona Supercomputer Centre, eLife and EMBO. We appreciate that not everyone interested in preprints will have time to read the full proposal, so we summarise some of the main points here.
We put in a response to the RFA because we share the excitement and enthusiasm that has emerged recently around the use of preprints in the biological sciences. The reason for our excitement is simple - alongside the rapid communication of research, we see massive potential for innovation based on preprint content. We envision that the best route to enable these goals is through a reasonable number of preprint servers and services, coordinated through the operation of agreed community standards. The standards will allow content to be federated and/or aggregated across servers, depending on the use cases. This model allows a diversity of approaches to addressing the opportunities and challenges that preprints bring.
Between us, we are developing infrastructure and services for publishing processes, article enrichment, text and data mining tools, bioinformatics, and mechanisms for data integration and discovery. But more important than our individual contributions, we are also embedded in broader researcher and developer communities that are as enthusiastic as we are about the opportunities for innovation that preprints offer. Alongside the core elements in the ASAPbio RFA, the fundamental theme of our proposal is therefore to enable those communities to engage with preprint content and contribute to moving scientific communication "beyond the PDF".
Our proposal is to combine existing and emerging open-source software and open data infrastructure to facilitate the ingestion of preprints from any source into a community archive and then share the content in different ways. This satisfies not only the scientific imperative of rapidly discoverable research results, but also creates a platform for innovation that has the promise to make information discovery faster and more effective in the future.
In short, the central services we envisage will enable any interested party to develop "plug-in" applications that can be used - optionally and in any order - in any part of the system. Some applications might work on individual documents prior to release (for example in quality control); others might work on the collection as a whole, post release; some might be fundamental "mission-critical" steps (like document conversions); and some might be more experimental. We propose to engage the developer and text- and data-mining communities through open challenges to invent new applications based on preprints. No-one knows where the next "killer app" will come from, so we want to foster broad participation and expose these developments to the wider scientific community.
The top priority is to support the uptake of preprints by the scientific community and ensure their citability and discoverability. But in order to realise transformative developments in the future, there are necessities beyond this.
Most critical among these is the ability to reuse preprints. By this, we mean not only that the content has a license that supports reuse (the CC-BY license), but also that the content is readily available as a whole, so that would-be application developers and text-miners do not have to struggle to gather content together. Most peer-reviewed literature is still subject to access and reuse restrictions and is highly distributed – with preprints we have a unique opportunity to support unrestricted and comprehensive reuse from the outset.
Secondly, quality metadata and the consistent application of standards are essential. We care about open standards like JATS for structuring the XML of full text articles, and are open to discussion about how this may evolve to support preprints in the future. Author names with ORCIDs, machine readable data citation, correctly identified institutions and funding sources are all critical for a connected research management ecosystem. Given these building blocks, others could develop tools that reduce the repetitive reporting burden on researchers, or services and indicators to give a wider stakeholder group a better understanding of the influence and impact of research.
Finally, a governance structure that represents the interests of the community is a necessity, as services around preprints need to remain current and address evolving user needs over time. This approach to preprints infrastructure lends itself to reuse within different disciplinary contexts, providing a basis for cross-disciplinary standards of core elements, yet allowing adaptation by those communities according to their specific scientific requirements.
Central services are a crucial part of biology today. It is hard to imagine how biology could progress without resources such as the wwPDB, or the International Nucleotide Sequence Database Collaboration. We are excited about preprints because they offer a tremendous opportunity to move science forward in parallel with these data resources, enabling integration of research outputs and knowledge discovery. We welcome comments and discussion as we move towards these shared goals, supporting science into the future.

Friday, 28 April 2017

Keeping track of published literature

Modern scientists are busier than ever. Their typical days are filled not only with experimental work, but also with teaching, supervising, mentoring, grant applications, budget planning… The list goes on and on. No wonder there is barely any time left to stay on top of the field. Keeping track of published literature is made easier by following these simple tips:

Newest first

Don’t get lost in the long list of publications. To find the most recent articles on Europe PMC, sort results by date. If you want to limit your search to a specific date range – last week or last month – set this in the advanced search.
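If you prefer to type a date range straight into the search bar, something like the sketch below builds one for the last N days. The FIRST_PDATE:[start TO end] field syntax is our assumption about Europe PMC’s query syntax; the advanced search form will show you the exact field it generates. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

def last_n_days_query(terms: str, days: int, today: Optional[date] = None) -> str:
    """Combine search terms with a publication-date range covering the last `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    # Assumed Europe PMC field syntax for a first-publication-date range
    return f"({terms}) AND FIRST_PDATE:[{start.isoformat()} TO {end.isoformat()}]"
```

For example, last_n_days_query("oxidative DNA damage", 7, today=date(2017, 4, 28)) produces (oxidative DNA damage) AND FIRST_PDATE:[2017-04-21 TO 2017-04-28].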

Focus on what’s important

Citations are the currency of the academic world. Familiarise yourself with the most cited papers in your area by using “Times cited” as a sorting order. For your convenience, citation counts are displayed for each publication in the search results on Europe PMC.

Follow your colleagues

Are you already familiar with the experts in your field? Check for publications from a specific author by typing their last name into the search bar. For scientists with common last names, such as Smith or Wu, paste their unique ORCID ID into the search bar to match the author exactly.
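Because every ORCID iD ends in a check character, a mistyped iD can often be caught before you paste it into the search bar. The sketch below implements the ISO 7064 MOD 11-2 check that ORCID documents for its identifiers. The function names are our own, and 0000-0002-1825-0097 is ORCID’s published example iD, not a real researcher.

```python
import re

def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """True if `orcid` has the 0000-0000-0000-000X layout and a correct check character."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    compact = orcid.replace("-", "")
    return compact[15] == orcid_check_char(compact[:15])
```

For example, is_valid_orcid("0000-0002-1825-0097") is True, while changing the last digit to 8 makes it False.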

Automate repetitive tasks

Don’t waste your time on a job that your computer can do for you. Doing the same search every now and then? Instead of typing a long query into the search bar every time, save your search and recall it with one click. In Europe PMC all of your saved searches appear in your account. Create an account or log in with ORCID or Twitter.

Stay alert

With a busy schedule, it is easy to miss an exciting discovery. Any search, including those by keyword, author, or journal, can be turned into an RSS feed on Europe PMC. This way, whenever an article on your topic is added to the database, it will appear in your feed reader.

Tuesday, 28 February 2017

Sort it out: Sorting your search results with Europe PMC

Imagine you are exploring a new topic. You start by searching for relevant papers in the field. You type your query, click the search button, and end up with thousands of scientific articles, waiting to be read. How do you identify which papers to focus on?

In Europe PMC, search results can be sorted differently to help you navigate through the literature. By default, results are sorted by relevance. But how is relevance defined? Let’s look at a search example: say we are interested in oxidative DNA damage. Once you type that string into a search box, the sorting algorithm ranks all your results and displays them in order. The algorithm takes into account how often search terms are found in the text. A document that mentions "DNA" ten times is likely to be more relevant to you than one with a single mention. The relevance score also depends on how many of the search terms a document contains. For instance, papers discussing "oxidative damage" or "DNA damage" are less appropriate than the ones specifically covering "oxidative DNA damage".
Rare terms carry more weight than common ones. Note that we expand your search with synonyms, so whenever you search for DNA damage, you will also discover articles mentioning genotoxic stress. You can switch off synonym expansion in the advanced search, or simply place your search terms in double quotes for an exact match, e.g. "oxidative DNA damage".
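The factors described above (how often a term occurs, how many of the query terms a document covers, and how rare each term is) are the ingredients of classic TF-IDF-style scoring. The toy scorer below combines them. It is a stand-in we chose for illustration, not Europe PMC’s actual ranking, which also weighs content type and recency and applies synonym expansion; the function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance_score(query: str, doc: str, corpus: list[str]) -> float:
    """Toy TF-IDF-style score of `doc` for `query`, relative to `corpus`."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    doc_sets = [set(tokenize(d)) for d in corpus]
    tf = Counter(tokenize(doc))
    total, matched = 0.0, 0
    for term in terms:
        if tf[term] == 0:
            continue
        df = sum(1 for s in doc_sets if term in s)
        idf = math.log((len(corpus) + 1) / (df + 1)) + 1  # rarer terms get larger weights
        total += math.log1p(tf[term]) * idf  # more mentions help, with diminishing returns
        matched += 1
    # Coverage factor: documents containing more of the query terms rank higher
    return total * matched / len(terms)
```

Ranking the titles "Oxidative DNA damage in cells", "DNA damage response" and "Oxidative stress" against the query oxidative DNA damage puts them in exactly that order, because the first covers all three terms and the last covers only one.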

Scientific abstracts are ranked higher than articles, and papers are considered more relevant than books and other documents. Nonetheless, you can always change the type of content you are looking for via the "Popular content sets" on the right-hand side of the search results, or in the advanced search.

Relevance sorting already places more recent publications higher in the list, but if you want the latest papers above anything else, you can simply sort by date. For instance, if you are eager to see the latest manuscript from your collaborators, or the new publications in your favourite journal, just look at the most recent papers. What if, instead, you are interested in classic articles that pioneered the field and laid the foundation for future research? Good news: you don't need to scroll to the last page of results. Simply sort by date in reverse order, and you can see how a scientific field has progressed.

Another way to search for foundational articles is by sorting results by times cited. Citation counts can help you find experts in the field, or help you identify the most impactful works. When ordering results by times cited, the number of citations is displayed for each paper.
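The three sort orders discussed here amount to sorting the same records by different keys. The sketch below shows them side by side on a few made-up papers; the Paper record and the function names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Paper:
    title: str
    published: date
    times_cited: int

def newest_first(papers: list[Paper]) -> list[Paper]:
    return sorted(papers, key=lambda p: p.published, reverse=True)

def oldest_first(papers: list[Paper]) -> list[Paper]:
    # Reverse date order: foundational papers surface at the top
    return sorted(papers, key=lambda p: p.published)

def most_cited(papers: list[Paper]) -> list[Paper]:
    # Highest citation count first; sorted() is stable, so ties keep their input order
    return sorted(papers, key=lambda p: p.times_cited, reverse=True)
```

With paper A (2001, 500 citations), B (2016, 12 citations) and C (1998, 300 citations), newest first gives B, A, C; oldest first gives C, A, B; and most cited gives A, C, B.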

With Europe PMC you can find research that matters with the click of a button. Sorting has never been easier!