Friday, 17 October 2014

Evidence Finder: testing, testing!

Evidence Finder (EvF) provides a new way of searching Europe PMC that will help you find the most relevant articles more quickly. By surfacing “facts” based on biological concepts, EvF enhances the Europe PMC search, targeting relevant sentences from within full text articles. Over the next few weeks, we will be running an experiment on the Europe PMC website that will incorporate EvF and explore how it is used.


How EvF works on the Europe PMC website
 
When you search for a gene, protein, disease or metabolite in Europe PMC, a panel of questions relevant to the query will appear next to the search results (left: EvF questions generated in response to a search for malaria).








Clicking on a question will trigger a search for assertions related to that question in the full text content of articles in Europe PMC. Relevant sentences from the articles will be displayed right on the search results page so you can quickly assess the relevance of search results (below: EvF results in answer to the question “What causes malaria?”).


The Experiment

EvF is developed by NaCTeM and has been available via the Europe PMC labs site since February 2012. For a six-week period (from 15th October 2014 to 26th November 2014), EvF questions will appear on the full text search results page of the main Europe PMC website. The goal of this experiment is to give Europe PMC users the opportunity to try EvF in the context of their normal searching. After this trial period, we will analyse the use of EvF, and if it is popular and useful, this functionality could be incorporated into the Europe PMC website.

Try these examples to get started using EvF:

RAF1 | diabetes | COPD | bevacizumab | hypertension

“Evidence Finder pushes the limits of searching the scientific literature,” says Jo McEntyre, head of Literature Services at EMBL-EBI. “There is so much information buried deep within research articles; we have to constantly invent better ways to capture and capitalise on these insights. Now, in the era of big data and open science, we can be more innovative than ever before.”

Feedback


We are very interested in any feedback you would like to give us on your experiences using Evidence Finder, and would love to hear from you: please use the blue Feedback tab at the bottom of every Europe PMC page, post a comment on this blog entry, or email us at helpdesk@europepmc.org.

Furthermore, if you are a text or data miner who has developed an algorithm or application that adds value to the full text open access content, and would like to explore how to link up with Europe PMC, we would like to hear from you.

Further information about EvF can be found in the FAQ:


Stay in touch with what's happening at Europe PMC by following us on Twitter: @EuropePMC_News

Friday, 26 September 2014

Linking grants to your publication

It is important that you add all appropriate grant information to relevant papers on Europe PMC to support:
  • Grant Reporting
  • Compliance with Funder open access policies
  • Data Consistency    
  • Resource Discovery
  • Author Claiming
There are a variety of routes that enable grants to be linked to papers in Europe PMC. We have outlined these below so that authors and Funders can choose the most appropriate route(s) to ensure that the grants used to support their research are correctly attributed:

1. Using the Europe PMC plus Grant Linking module
This option is only available to PIs (Principal Investigators) of grants from the Europe PMC Funders. Log in to Europe PMC plus and select the Grant Linking tab to associate papers with your grants.







Note: Linking grants to PubMed papers does not signify compliance with your Funder’s mandate. Full text versions must either be deposited directly by the Publisher or self-archived. See http://europepmc.org/FAQs#respubs for more information.

2. When self-archiving author manuscripts via Europe PMC plus
All appropriate grant associations can be made during the submission process for research articles by Europe PMC Funder grantholders.

















3. Bulk grant linking
This option is open to members of the Europe PMC Funders group, who can submit a list of articles (identified by PubMed ID) and grant IDs. We will then create the appropriate grant-article associations, which will display in both Europe PMC and PubMed.

4. When the paper is indexed by NLM indexers
This option only applies where the research has been supported by one of the original eight funders of Europe PMC. Grants will automatically be added to PubMed papers during the indexing process if the grant has been correctly acknowledged in the paper.

5. Direct pipelines
Grant-article associations are collated from various sources, including Researchfish; all associations are applied to Europe PMC, and validated ones are pushed through to PubMed.


You can search Europe PMC using a grant ID to find all papers associated with a particular grant number, using either the advanced search form or the following search syntax:



Europe PMC also has a Grant Lookup Tool which holds detailed, consolidated grant information across all Funders. The data is also available via an API.

This post is by Rob Rowbotham, Europe PMC Helpdesk Manager.
For more information, please contact the Europe PMC Helpdesk at:
helpdesk@EuropePMC.org


Wednesday, 20 August 2014

Using Europe PMC’s export options as a research funder

Recently Europe PMC released new export format options, to help users get Europe PMC’s wealth of metadata (and our open access papers) into the file formats they like to use.


We’ve already posted about how useful this function is for researchers, but it’s great for research funders too. Funders can export the results of a carefully constructed search as a tab-separated file and use Excel to manipulate the metadata as they wish.



Funders can find out how many papers acknowledging their funding have been made open access, what research areas papers are being published in, or which grants papers are linked to, among other things!
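For funders who prefer a script to a spreadsheet, the same kind of summary can be sketched in a few lines of Python. This is only an illustration: the column names (PMID, IS_OPEN_ACCESS, GRANT_ID) and the sample rows are hypothetical, so check the header row of your own export file and adjust them to match.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for an exported tab-separated file.
# The column names below are assumptions, not the actual export headers.
SAMPLE_TSV = (
    "PMID\tIS_OPEN_ACCESS\tGRANT_ID\n"
    "1001\tY\tG-001\n"
    "1002\tN\tG-001\n"
    "1003\tY\tG-002\n"
)


def summarise(tsv_text):
    """Count total papers, open access papers, and papers per grant."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    total = 0
    open_access = 0
    papers_per_grant = Counter()
    for row in reader:
        total += 1
        if row["IS_OPEN_ACCESS"] == "Y":
            open_access += 1
        papers_per_grant[row["GRANT_ID"]] += 1
    return total, open_access, papers_per_grant


total, open_access, papers_per_grant = summarise(SAMPLE_TSV)
print(f"{open_access} of {total} papers are open access")
for grant_id, count in sorted(papers_per_grant.items()):
    print(f"{grant_id}: {count} paper(s)")
```

To run this against a real export, read the file's contents into a string (or pass an open file to csv.DictReader directly) in place of SAMPLE_TSV.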



Wednesday, 6 August 2014

New export options now available: use Europe PMC to populate your bibliographic references in EndNote, Reference Manager etc.

Europe PMC has released new export format options, indicated in the image below:



The RIS export format is typically used by bibliographic applications such as Reference Manager and EndNote, so you can now easily import Europe PMC citations into them.

You can also email citations to yourself, or others, by selecting this destination option from the Export menu and filling in the required address fields.

Identifying articles and populating your reference list just got easier with Europe PMC.

More information can be found here: http://europepmc.org/FAQs#Export_Citations


Stay up-to-date with Europe PMC news by following us on Twitter, @EuropePMC_News




Friday, 18 July 2014

Taylor & Francis Open Access Survey: translating values into licences

Guest post from Alex Green, Transformation Project Co-ordinator, Wellcome Trust

Last month saw the publication of the 2014 Taylor & Francis Open Access Survey. Combining responses from just over 7,900 authors who published with Taylor & Francis in 2012 (9% of the total), this represents the opinions of authors from across the world in roughly the proportions they have published with Taylor & Francis – although my inner data geek really wants to get hold of the full dataset to apply some weighting to the under-represented authors of East and South East Asia.

The Taylor & Francis survey shows strong support for open access (OA) publishing by authors and a clear belief in its benefits compared with traditional publication, which has increased from the 2013 survey. 70% of respondents also disagreed or strongly disagreed with the statement ‘There are no fundamental benefits to open access publication’, up 10% from 2013.

However, when read alongside this evidence of increasingly widespread positive attitudes to OA publishing, the findings on licensing are striking. In particular, the preference for more restrictive licences seems at odds with the attitudes and values expressed by authors in response to the Section 1 questions on that topic. For example, in Question 5, 71% of authors were happy for their work to be re-used without prior knowledge or permission for non-commercial gain provided they receive attribution. This is equivalent to a CC-BY-NC licence, but only 18% of authors selected this licence as their first or second choice when answering Question 6 (see graphic below).

Question 5


Question 6

This mismatch between author attitudes and licence preference seems common to nearly all responses. Again in Question 5, authors seem concerned about commercial reuse of their work without prior knowledge, with 65% disagreeing that it is acceptable. However, in Question 6, 47% selected Copyright Assignment as their first or second preference for licensing, making it the second most popular licence along with Exclusive License to Publish. Having assigned copyright away, authors would have little control over any commercial reuse, not least by the publisher they have assigned the copyright to.

There are several hypotheses we could explore to account for this seeming contradiction. Looking down the results for Question 6, there is a clear drop-off towards the bottom of the list for ‘preferred licences’. It would be interesting to know if these options were always presented in the survey in the same order that they are presented in the report. If so, we may be seeing response order effects (Krosnick and Alwin 1987), but unfortunately full details of the methodology are not openly available along with the report.

The contradictions between responses may also be partly due to a degree of satisficing, an effect where respondents choose adequate answers rather than optimal answers because it is easier to pick the first acceptable, or most familiar, option than fully evaluate all options (Simon 1957, Krosnick 1991). This could be leading to selection of the more familiar Copyright Assignment and Exclusive License to Publish over the various Creative Commons licence combinations.

Finally, it’s probably relevant to note that the definition boxes provided to respondents at the start of the survey gave explanations of different modes of OA publishing, repositories and text- and data-mining but not the differences between the licences. These aren’t always easy to grasp, especially when completing a survey at speed – I don’t mind admitting that I had to check with a colleague on the exact differences between the various licence options. Perhaps, as Dr David Green, Global Journals Publishing Director said in the Taylor & Francis press release, there is still ‘much work left to do in simplifying our policies and documentation so that our author communities are in no doubt as to what their OA options are’.
 

Monday, 7 July 2014

The right to read is the right to mine: Text and data mining copyright exceptions introduced in the UK.

New copyright exceptions for text and data mining for non-commercial research have recently come into effect and this is welcome news for UK researchers and research, argues Ross Mounce. Here he provides a brief overview of the past issues discouraging text and data mining and what the future holds now that these exceptions have been introduced. But despite legal barriers being removed, many technical barriers still remain. Furthermore, it remains to be decided what formally constitutes ‘non-commercial’ research.

After eight long years, including not one but two expert-led reviews of intellectual property, new copyright exceptions came into force on June 1st 2014, some of which in particular will enable and empower UK academic research. All disciplines are set to benefit from this: the humanities, the social sciences, science, technology and medicine.
Of particular interest to myself and other researchers is the ‘Exception for copying of works for use by text and data analytics’. In order to understand why this is so important, let me take you back to how it was before the copyright exception came into force (and how the legal situation still is for researchers in most other European countries):

Content Mining: mining one or more types of media for information; media as data (Image credit: DeclanTM, Flickr, CC BY)
The situation before the copyright exception
Before this exception came into force in the UK, for subscription-access content, you’d essentially have to ask permission from the publisher before you started analysing. If you proceeded, without permission, to download electronic copies of ‘their’ copyrighted materials (see author’s note at the bottom) en masse for analysis, you would be infringing ‘their’ copyright – it would be illegal, and they could take legal action against you, even if your analysis was undertaken for non-commercial academic research purposes. Depending on the exact subscription-access agreement held with your institution (the details of which your institution may not be able to disclose, because of confidentiality clauses!), the publisher could even ask for additional fees to be paid to cover this ‘additional’ type of usage if it is not covered in the subscription agreement. Many agreements did, and still do, explicitly prohibit text and data mining.

If one did ask for permission, the process was complex and lengthy, involving many employees, and much bureaucracy, for each publisher. That’s if the publisher agrees to give permission. In a study by the Publishing Research Consortium it was found that “only 35% of the respondents [publishers] state that permission is granted in the majority or in 100% of the cases for all requests” (p. 106) – and that sample of publishers included open access publishers that, by definition, allow mining. Thus publishers can deny, and have denied, permission for content mining research on ‘their’ works.

The situation after the introduction of the UK copyright exception for TDM
After June 1st 2014, for research conducted in the UK, under the jurisdiction of UK law, for ‘non-commercial’ research purposes (more of which later…), the new copyright exception overrides anything in subscription contracts that prohibits content mining. As Peter Murray-Rust puts it: The Right To Read Is The Right To Mine, and provided you are in the UK, and doing ‘non-commercial’ research that is now true, and legal. This provides welcome and useful protection for researchers against litigious publishers.
No researcher, doing ‘non-commercial’ research in the UK, needs to agree to, nor abide by the terms of any text and data mining ‘licence’ that publishers may wish to impose upon researchers.

Schemes such as CrossRef’s text and data mining services will be heavily advertised to researchers by the major publishers, in order to try and control the way in which researchers do content mining both through legal means (the licensing) and technical means (the API). The use of such services entails agreeing to detailed and lengthy licensing agreements, which many probably won’t read. If you do read the full terms and conditions you’ll find them disappointingly limiting, which is why organisations such as LIBER have publicly criticized these terms.

Even with some legal barriers now removed, technical barriers remain
Despite legal barriers being removed, non-trivial technical barriers still remain which can be problematic for content mining. Most websites, for instance, have rate limits. If you are detected attempting to crawl or scrape too many pages (i.e. research articles) within too short a time-span, your access to that website may be blocked. Publishers such as BioMed Central (BMC) have a crawl rate limit of one article per second, which is an acceptable rate limit for researchers. Through Elsevier’s text-mining API there’s a limit of 10,000 articles per week, which is equivalent to a rate limit of roughly 1 article every 60 seconds. At that rate limit it would take ~21 years to go through all 11 million articles that Elsevier control access to through their Science Direct platform – not really feasible! The rate limit imposed is entirely artificial – researchers with good internet connections could crawl many articles per second if they were allowed to. The publisher sets the rate ‘allowed’ and, despite this new copyright exception, to get the rate limit changed a researcher would still have to beg permission from the publisher, which the publisher is fully within their rights to either grant or not.
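The arithmetic behind these figures is easy to check. The short Python sketch below uses only the numbers quoted above (10,000 articles per week, 11 million articles, one article per second); the function names are made up for illustration, and the year length is an average-calendar-year assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY  # 604,800
WEEKS_PER_YEAR = 365.2425 / 7           # average calendar year (assumption)


def seconds_per_article(articles_per_week):
    """Average gap between articles implied by a weekly cap."""
    return SECONDS_PER_WEEK / articles_per_week


def years_to_mine(total_articles, articles_per_week):
    """Years needed to work through a corpus at a fixed weekly cap."""
    return (total_articles / articles_per_week) / WEEKS_PER_YEAR


def days_to_mine_at_one_per_second(total_articles):
    """Days needed at a crawl rate of one article per second."""
    return total_articles / SECONDS_PER_DAY


# Weekly cap of 10,000 articles: about one article per minute.
print(round(seconds_per_article(10_000), 2))          # 60.48

# 11 million articles at that cap: about 21 years.
print(round(years_to_mine(11_000_000, 10_000), 1))    # 21.1

# The same corpus at one article per second: about 127 days.
print(round(days_to_mine_at_one_per_second(11_000_000)))  # 127
```

At one article per second the same 11 million articles would take around four months rather than two decades, which is why the weekly cap matters far more in practice than the per-second crawl limit.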

Open Access publishers tend to be exceedingly helpful to content miners: BMC, Hindawi, and MDPI to name but a few, make available whole content dumps (i.e. everything they publish), openly available to download by anyone for any purpose which greatly facilitates content mining. For biomedical researchers, the PubMed Central Open Access Subset and Europe PMC also allow downloading of full-text dumps but these are limited to CC BY papers only (another reason of many why CC BY is the preferred licence of open access publishers).

Other less-helpful publishers sometimes pay external firms like Atypon to populate their websites with booby-trapped links that block access to the entire subscribing institution if clicked. These links, called ‘spider-trap’ links, inevitably end up doing more harm than good, as in the recent #ACSgate debacle whereby over 200 institutions had their access to one publisher’s content blocked by people innocently clicking on a DOI-like URL that was openly available on the publisher’s website.

Image source: Shutterstock. Copyright: dmitriyGo
Why do these publishers dislike crawling and scraping so much? Scraping the web is a normal, legitimate activity for researchers; even a recent European Commission report says so:
‘Scraping’ the World-wide web for data is today a familiar activity for the digitally literate researcher. (p. 11)

With over 50 million scholarly articles out there, and millions being published each and every year in popular fields like biomedical science, content mining is fast becoming a necessity. Human eyes can only read so much. Computers, and computational techniques to help us comprehensively and rigorously mine the literature, are a boon for research. One expert report on the state of content mining argues that “European academics are falling behind their Asian and North American counterparts” – this new copyright exception will thus help the competitiveness of UK research in the global sphere.

The only nagging question remaining to address is: what is ‘non-commercial’?
I won’t pretend to give a convincing answer to this. I simply don’t know, and I can see it being a potentially difficult sticking point for many.

For my own research on extracting data from evolutionary tree figures (phylogeny), I can feel safe that this subject and use-case might not readily be definable as ‘commercial’, but for other researchers I can imagine it may not be so easy to safely and surely classify their research as ‘non-commercial’. Indeed, a recent court case in Germany seemed to indicate that ‘non-commercial’ use was only safely equivalent to personal use. The consequences, risks and side-effects of ‘non-commercial’ remain largely untested in case law and can prevent much more usage than you might think. Will publishers be eager to sue academic researchers for what they perceive to be commercial mining? I hope not, but sadly it would not surprise me if they did.

Author’s Note: I feel pained to discuss the copyright owned by publishers over work written by academics, hence the inverted commas when discussing ‘their’ copyright. Part of the reason academia got into this copyright-pickle in the first place is that we allowed publishers (and still do for some!) to take copyright away from the authors with completely unnecessary copyright transfer agreements (CTAs). Publishers do NOT need a CTA to publish your work, so don’t sign them! You can instead retain your copyright over your work, and just give them a non-exclusive license to publish. Keep your copyright!

Disclaimer & Warning: None of this article constitutes formal, vetted legal advice and should not be relied on or treated as a substitute for specific advice relevant to particular circumstances. Academic publishers and even societies can, and do take legal action against research-related activities, if they feel so inclined.

About the Author
Dr Ross Mounce is a BBSRC-funded postdoc at the University of Bath, working on the PLUTo project to liberate phyloinformatic data from the literature. He is working with The Content Mine team to encourage the adoption and use of content mining tools and techniques, including giving a workshop at this year’s Open Knowledge Festival 2014 (Berlin, July). A keen advocate for open scholarship, he can also be found at OpenCon 2014 (Washington D.C., November) – the student and early career researcher conference on Open Access, Open Education and Open Data.


This piece originally appeared on The London School of Economics and Political Science's Impact Blog under a Creative Commons CC BY license. One of the images has been replaced, and another omitted (please see individual images for copyright restrictions). The article gives the views of the author.