Friday, 12 May 2017

Preprints and the ASAPbio "Central" Services

Jo McEntyre, EMBL-EBI; Thomas Lemberger, EMBO; Mark Patterson, eLife; Kristen Rattan, Collaborative Knowledge Foundation; Alfonso Valencia, Barcelona Supercomputing Center.
The use of preprints in the life sciences offers tantalising opportunities to change the way research results are communicated and reused, and the work of ASAPbio has been key in engaging the scientific community to promote their uptake. We fully support these goals, and consequently submitted a response to the recent ASAPbio Request for Applications (RFA). In light of ASAPbio’s understandable recent decision to suspend the RFA process for four months, we are making our proposal public here, to encourage and contribute to ongoing, open discussions on these matters.
Our consortium is led by the European Bioinformatics Institute (EMBL-EBI), with collaborators at the Collaborative Knowledge Foundation, the Barcelona Supercomputing Center, eLife and EMBO. We appreciate that not everyone interested in preprints will have time to read the full proposal, so we summarise some of the main points here.
We responded to the RFA because we share the excitement and enthusiasm that has emerged recently around the use of preprints in the biological sciences. The reason for our excitement is simple: alongside the rapid communication of research, we see massive potential for innovation based on preprint content. We believe the best route to these goals is a reasonable number of preprint servers and services, coordinated through agreed community standards. The standards will allow content to be federated and/or aggregated across servers, depending on the use cases. This model allows a diversity of approaches to addressing the opportunities and challenges that preprints bring.
Between us, we are developing infrastructure and services for publishing processes, article enrichment, text and data mining tools, bioinformatics, and mechanisms for data integration and discovery. But more important than our individual contributions, we are also embedded in broader researcher and developer communities that are as enthusiastic as we are about the opportunities for innovation that preprints offer. Alongside the core elements in the ASAPbio RFA, the fundamental theme of our proposal is therefore to enable those communities to engage with preprint content and contribute to moving scientific communication "beyond the PDF".
Our proposal is to combine existing and emerging open-source software and open data infrastructure to facilitate the ingestion of preprints from any source into a community archive and then share the content in different ways. This not only satisfies the scientific imperative of rapidly discoverable research results, but also creates a platform for innovation that promises to make information discovery faster and more effective in the future.
In short, the central services we envisage will enable any interested party to develop "plug-in" applications that can be used - optionally and in any order - in any part of the system. Some applications might work on individual documents prior to release (for example in quality control); others might work on the collection as a whole, post release; some might be fundamental "mission-critical" steps (like document conversions); and some might be more experimental. We propose to engage the developer and text- and data-mining communities through open challenges to invent new applications based on preprints. No-one knows where the next "killer app" will come from, so we want to foster broad participation and expose these developments to the wider scientific community.
The top priority is to support the uptake of preprints by the scientific community and ensure their citability and discoverability. But in order to realise transformative developments in the future, there are necessities beyond this.
Most critical among these is the ability to reuse preprints. By this, we mean not only that the content has a license that supports reuse (the CC-BY license), but also that the content is readily available as a whole, so that would-be application developers and text-miners do not have to struggle to gather content together. Most peer-reviewed literature is still subject to access and reuse restrictions and is highly distributed – with preprints we have a unique opportunity to support unrestricted and comprehensive reuse from the outset.
Secondly, quality metadata and the consistent application of standards are essential. We care about open standards like JATS for structuring the XML of full text articles, and are open to discussion about how this may evolve to support preprints in the future. Author names with ORCIDs, machine readable data citation, correctly identified institutions and funding sources are all critical for a connected research management ecosystem. Given these building blocks, others could develop tools that reduce the repetitive reporting burden on researchers, or services and indicators to give a wider stakeholder group a better understanding of the influence and impact of research.
Finally, a governance structure that represents the interests of the community is a necessity, as services around preprints need to remain current and address evolving user needs over time.
This approach to preprints infrastructure lends itself to reuse within different disciplinary contexts, providing a basis for cross-disciplinary standards of core elements, yet allowing adaptation by those communities according to their specific scientific requirements.
Central services are a crucial part of biology today. It is hard to imagine how biology could progress without resources such as the wwPDB, or the International Nucleotide Sequence Database Collaboration. We are excited about preprints because they offer a tremendous opportunity to move science forward in parallel with these data resources, enabling integration of research outputs and knowledge discovery. We welcome comments and discussion as we move towards these shared goals, supporting science into the future.

Friday, 28 April 2017

Keeping track of published literature

Modern scientists are busier than ever. Their typical days are filled not only with experimental work, but also with teaching, supervising, mentoring, grant applications, budget planning… The list goes on and on. No wonder there is barely any time left to stay on top of the field. Keeping track of published literature is made easier by following these simple tips:

Newest first

Don’t get lost in the long list of publications. To find the most recent articles on Europe PMC, sort results by date. If you want to limit your search to a specific date range – last week or last month – set this in the advanced search.

Focus on what’s important

Citations are the currency of the academic world. Familiarise yourself with the most cited papers in your area by using “Times cited” as a sorting order. For your convenience, citation counts are displayed for each publication in the search results on Europe PMC.

Follow your colleagues

Are you already familiar with the experts in your field? Check for publications from a specific author by typing their last name into the search bar. For scientists with common last names, such as Smith or Wu, paste their unique ORCID iD into the search bar to match the author exactly.
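
If you keep a list of ORCID iDs for colleagues you follow, a quick check can catch a mistyped iD before you paste it into the search bar. The last character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below is only an illustration of that check; the function names are our own, and it says nothing about how Europe PMC itself matches authors.

```python
def orcid_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").strip().upper()
    if len(compact) != 16:
        return False
    if any(ch not in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
# Changing the last character should make the check fail.
print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this only confirms that the iD is well formed; it cannot tell you whether the iD belongs to the person you have in mind.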

Automate repetitive tasks

Don’t waste your time on a job that your computer can do for you. Doing the same search every now and then? Instead of typing a long query into the search bar every time, save your search and recall it with one click. In Europe PMC all of your saved searches appear in your account. Create an account or log in with ORCID or Twitter.

Stay alert

With a busy schedule, it is easy to miss an exciting discovery. Any search, including those by keywords, author, or scientific journal, can be turned into an RSS feed on Europe PMC. This way, once an article on your topic is added to the database, your feed reader will alert you to it.
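
If you would rather script your alerts than use a feed reader, an RSS feed is plain XML and easy to process. The sketch below is a minimal illustration that assumes a standard RSS 2.0 layout; the sample feed, its titles and its links are made up, and the real Europe PMC feed may carry additional elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, standing in for the feed of a saved search.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Hypothetical search feed</title>
    <item><title>Paper A</title><link>https://example.org/a</link></item>
    <item><title>Paper B</title><link>https://example.org/b</link></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs for feed items whose link has not been seen."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link not in seen_links:
            fresh.append((title, link))
    return fresh


# Pretend we already read Paper A last week.
already_seen = {"https://example.org/a"}
for title, link in new_items(SAMPLE_FEED, already_seen):
    print(title, link)
```

In practice you would download the feed on a schedule and store the links you have already seen between runs.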

Tuesday, 28 February 2017

Sort it out: Sorting your search results with Europe PMC


Imagine you are exploring a new topic. You start by searching for relevant papers in the field. You type your query, click the search button, and end up with thousands of scientific articles, waiting to be read. How do you identify which papers to focus on?


In Europe PMC, search results can be sorted differently to help you navigate through the literature. By default, results are sorted by relevance. But how is relevance defined? Let’s look at a search example: say we are interested in oxidative DNA damage. Once you type that string into the search box, the sorting algorithm ranks all your results and displays them in order. The algorithm takes into account how often search terms are found in the text. A document that mentions "DNA" ten times is likely to be more relevant to you than one with a single mention. The relevance score also depends on how many of the search terms a document contains. For instance, papers discussing "oxidative damage" or "DNA damage" are ranked as less relevant than the ones specifically covering "oxidative DNA damage".
Rare terms will be more relevant than common ones. Note that we expand your search by including synonyms, so whenever you search for DNA damage, you will also discover articles mentioning genotoxic stress. You can switch off synonym expansion in the advanced search, or simply place your search terms in double quotes for an exact match, e.g. "oxidative DNA damage".
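
To make these ideas concrete, here is a toy ranking function. It is not the Europe PMC algorithm, only a minimal sketch of the three ingredients described above: how often a term occurs in a document, how many of the query terms a document contains, and how rare each term is across the collection. The weighting formula and the example documents are our own assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def rank(query, documents):
    """Return document indices ordered from most to least relevant."""
    doc_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    scores = []
    for counts in doc_counts:
        score = 0.0
        for term in set(tokenize(query)):
            # Number of documents that contain the term at least once.
            doc_freq = sum(1 for c in doc_counts if term in c)
            if doc_freq == 0:
                continue
            # Terms found in fewer documents receive a larger weight.
            rarity = math.log(1 + n_docs / doc_freq)
            # Frequent mentions and more matching terms both raise the score.
            score += counts[term] * rarity
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


DOCS = [
    "oxidative stress in yeast",
    "dna damage and dna repair",
    "oxidative dna damage causes dna strand breaks",
    "protein folding",
]
print(rank("oxidative dna damage", DOCS))
```

In this toy collection, the document that covers all three query terms comes first, followed by the one that matches two of them.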



Scientific abstracts are ranked higher than articles, and papers are considered more relevant than books and other documents. Nonetheless, you can always change the type of content you are looking for via the "Popular content sets" on the right-hand side of the search results, or in the advanced search.



Using our relevance sorting, more recent publications will appear higher in the list, but if you want the latest papers over anything else, you can simply sort by date. For instance, if you are eager to see the latest manuscript from your collaborators, or the new publications from your favourite journal, just look at the most recent papers. What if, instead, you are interested in classic articles that have pioneered the field and laid the foundation for future research? Good news: you don't need to scroll to the last page of results. Simply sort by date, in reverse order. Now you can see how a scientific field has progressed.



Another way to search for foundational articles is by sorting results by times cited. Citation counts can help you find experts in the field, or help you identify the most impactful works. When ordering results by times cited, the number of citations is displayed for each paper.
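
If you export a set of results and want to reorder them yourself, the same sort orders are easy to reproduce. The sketch below uses made-up records; the field names "title", "year" and "cited" are assumptions for illustration, not the names used in any Europe PMC export.

```python
from operator import itemgetter

# Hypothetical search results.
RESULTS = [
    {"title": "Paper A", "year": 1998, "cited": 950},
    {"title": "Paper B", "year": 2016, "cited": 12},
    {"title": "Paper C", "year": 2009, "cited": 310},
]


def sort_results(results, field, descending=True):
    """Return a new list ordered by the given field."""
    return sorted(results, key=itemgetter(field), reverse=descending)


newest_first = sort_results(RESULTS, "year")
oldest_first = sort_results(RESULTS, "year", descending=False)
most_cited = sort_results(RESULTS, "cited")

for label, ordered in [
    ("Newest first", newest_first),
    ("Oldest first", oldest_first),
    ("Most cited", most_cited),
]:
    print(label, [record["title"] for record in ordered])
```

Note that in this made-up example the oldest paper is also the most cited, which is why sorting by oldest first and by times cited can both surface foundational work.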



With Europe PMC you can find research that matters with the click of a button. Sorting has never been easier!

Tuesday, 20 December 2016

Europe PMC - a year in review

This has been a very busy and fruitful year for Europe PMC. With holidays approaching we take a look back at some of the Europe PMC achievements and milestones of 2016.
We kicked off in January with a new, mobile-friendly homepage and information pages. The final design was based on user research in which we gathered lots of feedback from you to come up with design concepts, which were thoroughly tested again before release. We still have work to do on improving these pages and we intend to chip away at making them better, based on feedback, throughout 2017.
Our spring was focussed on the migration of manuscript submission services from JISC to the EMBL-EBI. We installed a database (developed by JISC) that contains information on all the grants awarded by the funders of Europe PMC. This information is important to us to authorize the upload of manuscripts from grant holders, but we also make the data discoverable on the Europe PMC website via the grant finder, or programmatically via the GRIST API. Over 2,500 grants have been added by our funders since the spring and the database now includes more than 56,000 grants, crosslinked to the articles they supported.
Summer was a hot time for us. One of the key objectives for Europe PMC is to enable application developers and text miners to reuse the open access full text content, in order to find innovative approaches to searching, filtering and reading the scientific literature. Towards this goal, in July we launched SciLite - a tool that allows text-mined terms to be displayed as an overlay on research articles. For Europe PMC users, SciLite makes it easier to scan articles for key concepts and provides deep links with related data. For text miners, it is an opportunity to showcase their work to a wider public. SciLite is based on the Web Annotation Data Model and is open to any text-mining group. So far, SciLite operates on articles with a CC-0, CC-BY or CC-BY-NC licence, and has contributions from the National Centre for Text Mining (NaCTeM) and the Bibliomics and Text Mining group at the University of Applied Sciences, Geneva, as well as from the core Europe PMC text-mining pipeline; the number of contributors is set to grow in 2017. All the full text annotations are available via a SPARQL API.
As a part of our mission to deliver a world-class service for all of our users, we took a major step towards improving our search by migrating Europe PMC indexing to Solr in the late summer. This has already significantly improved the speed with which we can return search results, as well as making our indexing of new content more efficient. As well as these basics, Solr comes with some useful features built-in, such as filters, auto-suggestions, and snippets, which we plan to start exploiting next year. Other search improvements introduced in 2016 include the addition of reverse sort orders for “date” and “times cited”, so that you can see results ranked by “oldest first” or “least cited”, and the ability to search by embargo date (embargo dates are now also displayed on abstract pages).
Finally, for the summer, our family of 26 funders was joined by the Academy of Medical Sciences (AMS), who became a member of Europe PMC so that the research they fund can be archived in Europe PMC, supporting their open access policies.
We are always open to new opportunities to improve on existing services. As of this year, our External Link providers, who link out from the content in Europe PMC to their resources, can display an image alongside the link itself. Already the iconic Altmetric donut is apparent on articles under the “External Links” tab, the CitePeer icon alerts users to alternative reference lists on the Related Articles tab, and the ImpactStory badge can be seen on the Author Profile pages. The Author Profile pages are generated for every researcher with a paper in Europe PMC listed in their ORCID profile, and their number has been constantly growing throughout the year. Currently, there are about 350,000 life science researchers actively publishing and using their ORCIDs, having claimed about 3.5 million articles available in Europe PMC.
Our autumn has been about building a program of community engagement and starting research projects with users that will inform next year’s development. The Europe PMC team has now expanded to include a community manager. We hope that this will help us to connect and reach out to all of our stakeholders, create opportunities for networking and discussions, as well as to identify useful synergies. It’s critical to us to understand the needs of our user community and their goals, and manage Europe PMC services accordingly. Finally, as part of our community engagement, we aspire to be open about our development plans in order to gain feedback at the earliest opportunity. So we have concluded this year with the publication of a roadmap that summarizes our recent accomplishments and highlights our near-term development plans, which we will update quarterly on the Europe PMC website. As always, we welcome your feedback. Leave a comment, send us an e-mail, or connect with us via Twitter.
We are looking forward to 2017 and wish all our users happy holidays. Season’s greetings from Europe PMC!

Monday, 7 November 2016

Growing connections: new Community Manager at Europe PMC

What exactly is Europe PMC? How is it different from PubMed Central? What can I do with Europe PMC?
Those are some of the questions I had myself when I first heard of Europe PMC. And now my task is to make sure that everyone can answer them easily.
My name is Maria Levchenko and I am a recent addition to the Europe PMC team. As a Community Manager I am responsible for reaching out to all of our users, informing them about new developments, but also gathering their valuable feedback, in order to make our services even more user-friendly and helpful for everyone.
My background is in biomedical research. I have also briefly worked at the European Research Council – one of the science funders that support Europe PMC – to better understand how funding agencies shape the way we do science.
My interest in Europe PMC stemmed from its goal to create a world-class literature resource. I am passionate about tools that help the scientific community in its quest for knowledge, and I believe that Europe PMC can do for research what Google does for our everyday life.
Our mission is to build open, full-text scientific literature resources and support innovation by engaging users, enabling contributors and integrating related research data. Europe PMC provides access and adds value to more than 30 million abstracts and almost 4 million full-text articles. All of the content in Europe PMC is free to read and we strongly support open access, in order to make the scientific literature available to everyone. On top of the core content, we see Europe PMC as a community-sourced platform for new developments that will improve the ways we can search, browse and integrate the literature with research data. We strive to be open about our work so that anyone can use the services and create new tools. We want to foster a sense of collaborative community, and this is where I come into play. My role is to connect with you, our users, provide you with the means to read and use the content and make your own contributions, helping to make Europe PMC a useful resource for scientists, developers and the general public alike.
In the coming weeks and months, I will be sharing news about our work, providing useful materials on how to use our service, as well as responding to your feedback.
We invite you to tell us what you think about Europe PMC and its role in your professional life. Leave a comment, send us an e-mail at helpdesk@europepmc.org, or connect with us via Twitter @EuropePMC_news.
Looking forward to hearing from you!

Friday, 9 September 2016

SciLite Annotations

Manual curation is vital for maintaining quality information in biological databases. However, with the exponential growth in biological data, this approach is both challenging and time-consuming. We wanted to create a scalable and sustainable process which could complement manual curation.

We recently launched a new service - SciLite - that allows text-mined annotations to be displayed on research articles. The aim of this effort is to promote sustainability of curated databases by bridging the gap between literature and data. To this end, SciLite:
  • supports database curation processes, by highlighting biological concepts, making it easier to find key concepts described in articles;
  • provides a mechanism to link those concepts to the related resources, for efficient data integration. 
SciLite's open design enables community-driven annotations, from text-mining groups, and manual annotations, to be made available to Europe PMC users.

What are SciLite annotations?

SciLite annotations enable biological terms and concepts, such as genes/proteins, diseases, organisms and accession numbers, to be highlighted on full text articles in Europe PMC. Using the check-boxes on the right-hand side of article pages, readers can select the types of concepts that they are most interested in and matching annotations for that article will be highlighted on the article text as below. 

Clicking on the highlighted terms in the text opens a popup with a link to the related database record. In the example below, an interactive Protein Data Bank (PDB) structure model is also provided when you click on a highlighted PDB accession number.


When annotations are provided by the text-mining community, the source of the annotation is displayed. In the example below, the Gene Reference into Function (GeneRIF) annotations were provided by the Text Mining group at HES-SO, Geneva, Switzerland.


Another text-mining group - NaCTeM - has provided phosphorylation event annotations, as shown below.



Take a look at this example article with annotations.
Annotations are displayed on articles with a CC-BY, CC-BY-NC or CC-0 license.

How are annotations generated?

The biological terms and concepts are identified by text mining algorithms, which are developed by a variety of text mining groups. Any text-mining group can participate in this scheme. Once concepts of interest have been identified within the text, they are formatted according to the W3C Web Annotation Data Model, and stored in a triple store via the EMBL-EBI RDF Platform. If you are a text miner, find out how to provide annotations to Europe PMC at our SciLite annotations page.
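
For text miners who have not met the W3C Web Annotation Data Model before, the sketch below builds one annotation as JSON-LD. It is a rough illustration only: the article URL, the database URL, the creator and the quoted text are all hypothetical, the function name is our own, and the exact structure that SciLite expects may differ. The "@context", "Annotation" type, "linking" motivation and "TextQuoteSelector" come from the W3C model itself.

```python
import json


def make_annotation(article_url, exact, prefix, suffix, resource_url, creator):
    """Build a Web Annotation linking a quoted term to a database record."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "linking",
        "creator": creator,
        # The body is the resource that the highlighted term points to.
        "body": resource_url,
        # The target is the article plus a selector locating the term in the text.
        "target": {
            "source": article_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


# All values below are placeholders, not real records.
example = make_annotation(
    article_url="https://example.org/articles/PMC0000000",
    exact="1ABC",
    prefix="the crystal structure (PDB: ",
    suffix=") was solved",
    resource_url="https://example.org/pdb/1ABC",
    creator="https://example.org/groups/hypothetical-text-mining-group",
)
print(json.dumps(example, indent=2))
```

The prefix and suffix give a little surrounding context, so the quoted term can still be located when it occurs more than once in an article.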

We're improving annotations with your help

Because annotations are generated automatically by text mining algorithms, we want to ensure that annotations are useful to users of Europe PMC. On each annotation there is the opportunity to provide feedback by either marking the annotation as incorrect, or endorsing useful annotations.
Screenshot showing feedback feature for an annotation
This information is fed back to the Europe PMC team and will be acted upon, helping to improve the annotations overall.


Thursday, 4 August 2016

Academy of Medical Sciences joins Europe PMC

We're delighted to announce that the Academy of Medical Sciences has joined Europe PMC as a new funder, bringing the total number of funders to 27.

The Academy of Medical Sciences represents the diverse spectrum of medical science – from basic research through clinical application to healthcare delivery. Their mission is to promote medical science and its translation into benefits for society.

Scientists and clinicians funded by the Academy of Medical Sciences will join thousands of others who make their published research articles freely available from Europe PMC as soon as possible, and in any event within six months of publication.

For more information about joining Europe PMC, visit our website:
http://europepmc.org/Joining