ISI (indra.sources.isi)

This module provides an input interface and processor to the ISI reading system.

The reader runs inside a Docker container. Before running it, set Docker's memory and swap space to the maximum. To process NXML files, install the nxml2txt utility (https://github.com/spyysalo/nxml2txt) and set the configuration variable NXML2TXT_PATH to its location. Because the reader works only with Python 2, also make sure that PYTHON2_PATH is set in your config file or environment and points to a Python 2 executable.
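As a rough pre-flight check, the sketch below looks for the two settings named above in the process environment. It is only an illustration: the helper name missing_isi_settings is hypothetical, it assumes both settings are supplied as environment variables, and it does not inspect the INDRA config file, which is an equally valid place to set them.

```python
import os

# The two settings named in the paragraph above.
REQUIRED_SETTINGS = ("NXML2TXT_PATH", "PYTHON2_PATH")


def missing_isi_settings(env=None):
    """Return the required setting names that are unset or empty in env.

    Hypothetical helper: it checks environment variables only and does
    not read the INDRA config file.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

Calling missing_isi_settings() with no arguments checks the current environment; an empty list means both variables are present there.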

ISI API (indra.sources.isi.api)

indra.sources.isi.api.process_json_file(file_path, pmid=None, extra_annotations=None, add_grounding=True)[source]

Extracts statements from the given ISI output file.

Parameters:
  • file_path (str) – The ISI output file from which to extract statements
  • pmid (int) – The PMID of the document being processed, or None if not specified
  • extra_annotations (dict) – Extra annotations to be added to each statement from this document (can be the empty dictionary)
  • add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped
indra.sources.isi.api.process_nxml(nxml_filename, pmid=None, extra_annotations=None, cleanup=True, add_grounding=True)[source]

Process an NXML file using the ISI reader.

First converts NXML to plain text and preprocesses it, then runs the ISI reader, and processes the output to extract INDRA Statements.

Parameters:
  • nxml_filename (str) – The NXML file to process
  • pmid (Optional[str]) – The PMID of this NXML file, to be added to the Evidence object of the extracted INDRA Statements
  • extra_annotations (Optional[dict]) – Additional annotations to add to the Evidence object of all extracted INDRA Statements. An extra annotation keyed ‘interaction’ is ignored, since the processor uses that key to store the corresponding raw ISI output.
  • cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True
  • add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped
Returns:

ip – A processor containing extracted Statements

Return type:

indra.sources.isi.processor.IsiProcessor
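The note on extra_annotations can be illustrated with a small caller-side sketch. It assumes nothing about the processor's internals: the helper drop_reserved_annotations is hypothetical and simply removes the ‘interaction’ key before the dictionary is passed in, which makes it explicit which annotations can reach the Evidence objects.

```python
# Key that the documentation above says is ignored by the processor.
RESERVED_KEY = "interaction"


def drop_reserved_annotations(extra_annotations):
    """Return a copy of extra_annotations without the reserved key.

    Hypothetical helper; the input dictionary is left unmodified.
    """
    if extra_annotations is None:
        return {}
    return {
        key: value
        for key, value in extra_annotations.items()
        if key != RESERVED_KEY
    }
```

The result could then be passed as the extra_annotations argument of process_nxml.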

indra.sources.isi.api.process_output_folder(folder_path, pmids=None, extra_annotations=None, add_grounding=True)[source]

Recursively extracts statements from all ISI output files in the given directory and subdirectories.

Parameters:
  • folder_path (str) – The directory to traverse
  • pmids (Optional[str]) – PMID mapping to be added to the Evidence of the extracted INDRA Statements
  • extra_annotations (Optional[dict]) – Additional annotations to add to the Evidence object of all extracted INDRA Statements. An extra annotation keyed ‘interaction’ is ignored, since the processor uses that key to store the corresponding raw ISI output.
  • add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped
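
To preview which files a recursive traversal of this kind would visit, the following sketch walks a directory tree and collects candidate output files. It is not the library's implementation: the function name find_isi_output_files is hypothetical, and the .json extension is an assumption based on the reader output being JSON.

```python
import os


def find_isi_output_files(folder_path, extension=".json"):
    """Return sorted paths of files under folder_path ending in extension.

    Hypothetical helper; the default extension is an assumption.
    """
    matches = []
    # os.walk visits folder_path and every subdirectory beneath it.
    for dirpath, _dirnames, filenames in os.walk(folder_path):
        for filename in filenames:
            if filename.endswith(extension):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

Listing the files first can help confirm that folder_path points at the intended directory before calling process_output_folder.
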
indra.sources.isi.api.process_preprocessed(isi_preprocessor, num_processes=1, output_dir=None, cleanup=True, add_grounding=True)[source]

Process a directory of abstracts and/or papers preprocessed using the specified IsiPreprocessor, to produce a list of extracted INDRA statements.

Parameters:
  • isi_preprocessor (indra.sources.isi.preprocessor.IsiPreprocessor) – Preprocessor object that has already preprocessed the documents we want to read and process with the ISI reader
  • num_processes (Optional[int]) – Number of processes to parallelize over
  • output_dir (Optional[str]) – The directory into which to put reader output; if omitted or None, uses a temporary directory.
  • cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True
  • add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped
Returns:

ip – A processor containing extracted statements

Return type:

indra.sources.isi.processor.IsiProcessor

indra.sources.isi.api.process_text(text, pmid=None, cleanup=True, add_grounding=True)[source]

Process a string using the ISI reader and extract INDRA statements.

Parameters:
  • text (str) – A text string to process
  • pmid (Optional[str]) – The PMID associated with this text (or None if not specified)
  • cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True
  • add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped
Returns:

ip – A processor containing statements

Return type:

indra.sources.isi.processor.IsiProcessor
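
A call might look like the sketch below. It assumes INDRA is installed and that the Docker setup described at the top of this page is working. The wrapper name read_with_isi and its api argument are hypothetical; the argument exists only so the sketch can be exercised without the reader. The None check is a defensive assumption, not documented behavior.

```python
def read_with_isi(text, pmid=None, api=None):
    """Run process_text on text and return the extracted Statements."""
    if api is None:
        # Imported lazily so this module loads even without INDRA installed.
        from indra.sources.isi import api
    ip = api.process_text(text, pmid=pmid, cleanup=True, add_grounding=True)
    # Defensive assumption: treat a missing processor as "no statements".
    if ip is None:
        return []
    # The IsiProcessor exposes its results via the `statements` attribute.
    return ip.statements
```

For example, read_with_isi("MEK phosphorylates ERK.", pmid="12345") would return whatever Statements the reader extracts from that sentence; the sentence and PMID here are placeholders.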

ISI Processor (indra.sources.isi.processor)

class indra.sources.isi.processor.IsiProcessor(reader_output, pmid=None, extra_annotations=None, add_grounding=True)[source]

Processes the output of the ISI reader.

reader_output

json – The output JSON of the ISI reader as a json object.

verbs

set[str] – A set of verbs that have appeared in the processed ISI output

pmid

str – The PMID to assign to the extracted Statements

extra_annotations

dict – Annotations to be included with each extracted Statement

statements

list[indra.statements.Statement] – Extracted statements

get_statements()[source]

Process reader output to produce INDRA Statements.
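
Once get_statements() has run, the attributes listed above can be inspected directly. The sketch below tallies extracted Statements by class name and lists the verbs seen; summarize_processor is a hypothetical helper that relies only on the documented statements and verbs attributes.

```python
from collections import Counter


def summarize_processor(ip):
    """Return (statement counts by class name, sorted verbs) for a processor.

    Hypothetical helper; `ip` is any object with `statements` and `verbs`.
    """
    counts = Counter(type(stmt).__name__ for stmt in ip.statements)
    return dict(counts), sorted(ip.verbs)
```

The returned pair gives a quick overview of what a document yielded without iterating over the Statements by hand.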