Literature clients (indra.literature)

indra.literature.get_full_text(paper_id, idtype, preferred_content_type='text/xml')[source]

Return the content and the content type of an article.

This function retrieves the content of an article by its PubMed ID, PubMed Central ID, or DOI. It prioritizes full-text content when available and falls back to the abstract from PubMed otherwise.

Parameters:
  • paper_id (string) – ID of the article.
  • idtype ('pmid', 'pmcid', or 'doi) – Type of the ID.
  • preferred_content_type (Optional[st]r) – Preference for full-text format, if available. Can be one of ‘text/xml’, ‘text/plain’, ‘application/pdf’. Default: ‘text/xml’
Returns:

  • content (str) – The content of the article.
  • content_type (str) – The content type of the article.
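
As a rough usage sketch, the wrapper below checks the two enumerated arguments against the values listed above before delegating to get_full_text. The names checked_get_full_text and fetch are hypothetical and not part of INDRA; the fetch parameter exists only so the wrapper can be exercised without network access.

```python
VALID_IDTYPES = ('pmid', 'pmcid', 'doi')
VALID_CONTENT_TYPES = ('text/xml', 'text/plain', 'application/pdf')


def checked_get_full_text(paper_id, idtype,
                          preferred_content_type='text/xml', fetch=None):
    """Validate arguments, then call get_full_text (hypothetical wrapper)."""
    if idtype not in VALID_IDTYPES:
        raise ValueError('idtype must be one of %s' % (VALID_IDTYPES,))
    if preferred_content_type not in VALID_CONTENT_TYPES:
        raise ValueError('preferred_content_type must be one of %s'
                         % (VALID_CONTENT_TYPES,))
    if fetch is None:
        # Imported lazily so the sketch loads even without INDRA installed.
        from indra.literature import get_full_text as fetch
    # The documented return value is a (content, content_type) pair.
    content, content_type = fetch(
        paper_id, idtype, preferred_content_type=preferred_content_type)
    return content, content_type
```

With INDRA installed, calling checked_get_full_text with a PMID and idtype 'pmid' would go through to the real function; passing a stand-in for fetch keeps the call offline.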

indra.literature.id_lookup(paper_id, idtype)[source]

Take an ID of type PMID, PMCID, or DOI and look up the other IDs.

If the DOI is not found in Pubmed, try to obtain it through a reverse lookup in CrossRef using article metadata.

Parameters:
  • paper_id (string) – ID of the article.
  • idtype ('pmid', 'pmcid', or 'doi') – Type of the ID.
Returns:

ids – A dictionary with the following keys: pmid, pmcid and doi.

Return type:

dict
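
A small sketch of how the returned dictionary might be inspected. It assumes that an identifier which could not be resolved is either missing from the dictionary or stored as None; that convention is an assumption, not something stated here. The helper name unresolved_ids is hypothetical.

```python
def unresolved_ids(ids):
    """List which of the three documented keys have no value.

    Assumes unresolved identifiers are absent or None (an assumption).
    """
    return [key for key in ('pmid', 'pmcid', 'doi') if not ids.get(key)]
```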

Pubmed client (indra.literature.pubmed_client)

Search and get metadata for articles in Pubmed.

indra.literature.pubmed_client.expand_pagination(pages)[source]

Convert a page number to long form, e.g., from 456-7 to 456-457.
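
The conversion can be illustrated with a short standalone function. This is a sketch of the idea only, not the INDRA implementation: it handles a single numeric start-end range and returns anything else unchanged. The name expand_pagination_sketch is hypothetical.

```python
def expand_pagination_sketch(pages):
    """Expand an abbreviated page range such as '456-7' to '456-457'."""
    parts = pages.split('-')
    if len(parts) != 2:
        return pages  # not a simple start-end range
    start, end = parts[0].strip(), parts[1].strip()
    if not (start.isdigit() and end.isdigit()):
        return pages  # leave non-numeric page labels untouched
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[:len(start) - len(end)] + end
    return '%s-%s' % (start, end)
```

Ranges that are already in long form, single page numbers, and non-numeric labels pass through unchanged.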

indra.literature.pubmed_client.get_abstract(pubmed_id, prepend_title=True)[source]

Get the abstract of an article in the Pubmed database.

indra.literature.pubmed_client.get_article_xml[source]

Get the XML metadata for a single article from the Pubmed database.

indra.literature.pubmed_client.get_ids[source]

Search Pubmed for paper IDs given a search term.

The options are passed as named arguments. For details on parameters that can be used, see https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch

Some useful parameters to pass are:
  • db=’pmc’ to search PMC instead of pubmed
  • reldate=2 to search for papers within the last 2 days
  • mindate=‘2016/03/01’, maxdate=‘2016/03/31’ to search for papers in March 2016.
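
To show how such named arguments end up as ESearch query parameters, the sketch below builds a request URL by hand with the standard library. It does not reproduce how get_ids works internally; build_esearch_url is a hypothetical helper, and the default db='pubmed' is an assumption drawn from the note about db=’pmc’.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'


def build_esearch_url(search_term, **kwargs):
    """Build an ESearch URL from a search term and named options."""
    params = {'db': 'pubmed', 'term': search_term}
    # Named arguments such as db, reldate, mindate or maxdate override
    # or extend the defaults.
    params.update(kwargs)
    return ESEARCH_URL + '?' + urlencode(params)
```

For example, build_esearch_url('BRAF', db='pmc', reldate=2) yields a URL that restricts the search to PMC and to the last two days.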

indra.literature.pubmed_client.get_ids_for_gene[source]

Get the curated set of articles for a gene in the Entrez database.

Search parameters for the Gene database query can be passed in as keyword arguments.

Parameters:
  • hgnc_name (string) – The HGNC name of the gene. This is used to obtain the HGNC ID (using the hgnc_client module) and in turn used to obtain the Entrez ID associated with the gene. Entrez is then queried for that ID.

indra.literature.pubmed_client.get_issns_for_journal[source]

Get a list of the ISSN numbers for a journal given its NLM ID.

Structure of the XML output returned by the NLM Catalog query:

NLMCatalogRecordSet
  NLMCatalogRecord
    NlmUniqueID
    DateCreated
    DateRevised
    DateAuthorized
    DateCompleted
    DateRevisedMajor
    TitleMain
    MedlineTA
    TitleAlternate +
    AuthorList
    ResourceInfo
      TypeOfResource
      Issuance
      ResourceUnit
    PublicationTypeList
    PublicationInfo
      Country
      PlaceCode
      Imprint
      PublicationFirstYear
      PublicationEndYear
    Language
    PhysicalDescription
    IndexingSourceList
      IndexingSource
        IndexingSourceName
        Coverage
    GeneralNote +
    LocalNote
    MeshHeadingList
    Classification
    ELocationList
    LCCN
    ISSN +
    ISSNLinking
    Coden
    OtherID +
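
Given the structure above, the ISSN entries can be pulled out with the standard library's ElementTree. This is only a sketch under that assumed structure, not the INDRA implementation; the sample record is a made-up placeholder, and extract_issns is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Placeholder record following the outline above; the values are invented.
SAMPLE_XML = """<NLMCatalogRecordSet>
  <NLMCatalogRecord>
    <NlmUniqueID>0000000</NlmUniqueID>
    <ISSN>1111-2222</ISSN>
    <ISSN>3333-444X</ISSN>
    <ISSNLinking>1111-2222</ISSNLinking>
  </NLMCatalogRecord>
</NLMCatalogRecordSet>"""


def extract_issns(xml_string):
    """Return the text of every ISSN element in an NLM Catalog result."""
    root = ET.fromstring(xml_string)
    # './/ISSN' matches ISSN elements only, not ISSNLinking.
    return [el.text for el in root.findall('.//ISSN') if el.text]
```

The "+" markers in the outline indicate elements that can repeat, which is why the sketch collects a list.
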
indra.literature.pubmed_client.get_metadata_for_ids(pmid_list, get_issns_from_nlm=False)[source]

Get article metadata for up to 200 PMIDs from the Pubmed database.

Parameters:
  • pmid_list (list of PMIDs as strings) – Can contain 1-200 PMIDs.
  • get_issns_from_nlm (boolean) – Look up the full list of ISSN numbers for the journal associated with the article, which helps to match articles to CrossRef search results. Defaults to False, since the extra lookup slows down the query.
Returns:

Contains the following fields: ‘doi’, ‘title’, ‘authors’, ‘journal_title’, ‘journal_abbrev’, ‘journal_nlm_id’, ‘issn_list’, ‘page’.

Return type:

dict
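
Because a single call accepts at most 200 PMIDs, longer lists need to be split first. A minimal batching sketch follows; batch_pmids is a hypothetical helper, not part of pubmed_client.

```python
def batch_pmids(pmid_list, batch_size=200):
    """Yield consecutive slices of pmid_list with at most batch_size items."""
    for start in range(0, len(pmid_list), batch_size):
        yield pmid_list[start:start + batch_size]
```

Each yielded slice can then be passed to get_metadata_for_ids in turn.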

indra.literature.pubmed_client.get_title(pubmed_id)[source]

Get the title of an article in the Pubmed database.

Pubmed Central client (indra.literature.pmc_client)

indra.literature.pmc_client.filter_pmids(pmid_list, source_type)[source]

Filter a list of PMIDs for ones with full text from PMC.

Parameters:
  • pmid_list (list) – List of PMIDs to filter.
  • source_type (string) – One of ‘fulltext’, ‘oa_xml’, ‘oa_txt’, or ‘auth_xml’.
Returns:

List of PMIDs available in the specified source/format type.

Return type:

list

indra.literature.pmc_client.id_lookup(paper_id, idtype=None)[source]

Take a Pubmed ID, Pubmed Central ID, or DOI and use the Pubmed ID mapping service to look up all the other IDs. The IDs are returned in a dictionary.

CrossRef client (indra.literature.crossref_client)

indra.literature.crossref_client.doi_query(pmid, search_limit=10)[source]

Get the DOI for a PMID by matching CrossRef and Pubmed metadata.

Searches CrossRef using the article title and then accepts search hits only if they have a matching journal ISSN and page number with what is obtained from the Pubmed database.
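
The acceptance rule described above can be sketched as a filter over search hits. The dictionary layout is an assumption made for illustration: each hit is taken to carry an 'ISSN' list and a 'page' string. The function name matching_hits is hypothetical, and this is not the INDRA implementation.

```python
def matching_hits(hits, pubmed_issns, pubmed_page):
    """Keep only hits whose ISSN and page agree with the Pubmed metadata."""
    wanted_issns = set(pubmed_issns)
    accepted = []
    for hit in hits:
        # Assumed layout: 'ISSN' is a list of strings, 'page' is a string.
        if not wanted_issns & set(hit.get('ISSN', [])):
            continue  # no journal ISSN in common
        if hit.get('page') != pubmed_page:
            continue  # page numbers disagree
        accepted.append(hit)
    return accepted
```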

Return a list of links to the full text of an article given its DOI. Each list entry is a dictionary with keys:
  • URL: the URL to the full text
  • content-type: e.g. text/xml or text/plain
  • content-version
  • intended-application: e.g. text-mining
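
A list of link dictionaries with these keys can be narrowed down to the one link a text-mining pipeline would want. The sketch below assumes only the four keys named above; pick_link is a hypothetical helper.

```python
def pick_link(links, content_type='text/xml', application='text-mining'):
    """Return the URL of the first link matching both criteria, else None."""
    for link in links:
        if (link.get('content-type') == content_type
                and link.get('intended-application') == application):
            return link['URL']
    return None
```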

indra.literature.crossref_client.get_metadata[source]

Return the metadata of an article, given its DOI, from CrossRef as a JSON dict.

Elsevier client (indra.literature.elsevier_client)

For information on the Elsevier API, see:
indra.literature.elsevier_client.download_article(doi)[source]

Download an article in XML format from Elsevier.

indra.literature.elsevier_client.get_abstract(doi)[source]

Get the abstract of an article from Elsevier.

indra.literature.elsevier_client.get_article(doi, output='txt')[source]

Get the full body of an article from Elsevier. There are two output modes: ‘txt’ strips all XML tags and joins the pieces of text in the main text, while ‘xml’ simply takes the tag containing the body of the article and returns it as is. In the latter case, downstream code needs to be able to interpret Elsevier’s XML format.
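
The difference between the two output modes can be illustrated on a toy document. The sketch below is not the Elsevier client's code: the tag name 'body' and the sample markup are placeholders, XML namespaces are ignored, and extract_body is a hypothetical function.

```python
import xml.etree.ElementTree as ET


def extract_body(xml_string, output='txt', body_tag='body'):
    """Return the body element as plain text ('txt') or as markup ('xml')."""
    root = ET.fromstring(xml_string)
    body = root.find('.//' + body_tag)
    if body is None:
        return None
    if output == 'xml':
        # Hand back the body element unchanged, serialized as a string.
        return ET.tostring(body, encoding='unicode')
    # 'txt' mode: drop the tags and join the remaining pieces of text.
    pieces = [text.strip() for text in body.itertext()]
    return ' '.join(piece for piece in pieces if piece)
```

In 'txt' mode the caller gets a single string with no markup, while 'xml' mode preserves the nested structure for downstream parsing.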

indra.literature.elsevier_client.get_dois[source]

Search ScienceDirect through the API for articles.

See http://api.elsevier.com/content/search/fields/scidir for constructing a query string to pass here. Example: ‘abstract(BRAF) AND all(“colorectal cancer”)’
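
A query string in the form of the example can be assembled from field and value pairs. This sketch only covers the pattern visible in the example (field(value) clauses joined with AND, with multi-word values quoted); scidir_query is a hypothetical helper, and the full query syntax is defined by the Elsevier documentation linked above.

```python
def scidir_query(*clauses):
    """Join (field, value) pairs into 'field(value) AND field(value)'."""
    parts = []
    for field, value in clauses:
        if ' ' in value:
            # Quote multi-word values so they are kept together as a phrase.
            value = '"%s"' % value
        parts.append('%s(%s)' % (field, value))
    return ' AND '.join(parts)
```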