ISI (indra.sources.isi)

This module provides an input interface and processor to the ISI reading system.

The ISI reader runs inside a Docker container. Before running it, set Docker's memory and swap space to the maximum.

ISI API (indra.sources.isi.api)

indra.sources.isi.api.process_json_file(file_path, pmid=None, extra_annotations=None, add_grounding=True, molecular_complexes_only=False)

Extracts statements from the given ISI output file.

Parameters
  • file_path (str) – The ISI output file from which to extract statements

  • pmid (int) – The PMID of the document being processed, or None if not specified

  • extra_annotations (dict) – Extra annotations to be added to each statement from this document (can be the empty dictionary)

  • add_grounding (Optional[bool]) – If True, the grounding of the extracted Statements is mapped.

  • molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.

indra.sources.isi.api.process_nxml(nxml_filename, pmid=None, extra_annotations=None, **kwargs)

Process an NXML file using the ISI reader.

First converts NXML to plain text and preprocesses it, then runs the ISI reader, and processes the output to extract INDRA Statements.

Parameters
  • nxml_filename (str) – The NXML file to process

  • pmid (Optional[str]) – The PMID of this NXML file, to be added to the Evidence object of the extracted INDRA Statements

  • extra_annotations (Optional[dict]) – Additional annotations to add to the Evidence object of all extracted INDRA Statements. An annotation under the key ‘interaction’ is ignored, since the processor uses that key to store the corresponding raw ISI output.

  • num_processes (Optional[int]) – Number of processes to parallelize over

  • cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True

  • add_grounding (Optional[bool]) – If True, the grounding of the extracted Statements is mapped.

  • molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.

Returns

ip – A processor containing extracted Statements

Return type

indra.sources.isi.processor.IsiProcessor
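
A minimal usage sketch for process_nxml follows. It assumes that INDRA is installed and that the ISI Docker container can be started when the default reader is used. The helper names clean_annotations and read_nxml, and the reader parameter used to inject a stand-in, are hypothetical and exist only for illustration. Since an annotation under the key 'interaction' is documented as ignored, the sketch drops that key up front to make the behavior explicit to the caller.

```python
def clean_annotations(extra_annotations=None):
    """Return a copy of the annotations without the reserved 'interaction' key."""
    annotations = dict(extra_annotations or {})
    # 'interaction' is documented as ignored: the processor uses that key
    # for the raw ISI output, so a caller-supplied value would not survive.
    annotations.pop('interaction', None)
    return annotations


def read_nxml(nxml_filename, pmid=None, extra_annotations=None, reader=None):
    """Run process_nxml (or an injected stand-in) and return its statements."""
    if reader is None:
        # Imported lazily so this sketch can be loaded without INDRA installed.
        from indra.sources.isi import api
        reader = api.process_nxml
    ip = reader(nxml_filename, pmid=pmid,
                extra_annotations=clean_annotations(extra_annotations))
    # The documented return value is an IsiProcessor with a statements list.
    return ip.statements
```

Passing a stand-in for reader allows the call pattern to be exercised without Docker; omitting it uses the real process_nxml.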

indra.sources.isi.api.process_output_folder(folder_path, pmids=None, extra_annotations=None, add_grounding=True, molecular_complexes_only=False)

Recursively extracts statements from all ISI output files in the given directory and subdirectories.

Parameters
  • folder_path (str) – The directory to traverse

  • pmids (Optional[str]) – PMID mapping to be added to the Evidence of the extracted INDRA Statements

  • extra_annotations (Optional[dict]) – Additional annotations to add to the Evidence object of all extracted INDRA Statements. An annotation under the key ‘interaction’ is ignored, since the processor uses that key to store the corresponding raw ISI output.

  • add_grounding (Optional[bool]) – If True, the grounding of the extracted Statements is mapped.

  • molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.
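
The recursive traversal described above can be approximated with the standard library. This sketch only illustrates the idea of walking a directory tree and pairing each output file with a PMID; it is not INDRA's implementation. The '.json' suffix, the treatment of pmids as a dictionary keyed by a file's base name, and the helper name find_isi_outputs are all assumptions made for illustration.

```python
import os


def find_isi_outputs(folder_path, pmids=None, suffix='.json'):
    """Yield (file_path, pmid) pairs for every matching file under folder_path."""
    pmids = pmids or {}
    for dirpath, dirnames, filenames in os.walk(folder_path):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            # Assumption: files are keyed by their base name without the suffix.
            key = name[:-len(suffix)]
            yield os.path.join(dirpath, name), pmids.get(key)
```

Each yielded path could then be handed to process_json_file together with its PMID.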

indra.sources.isi.api.process_preprocessed(isi_preprocessor, num_processes=1, output_dir=None, cleanup=True, add_grounding=True, molecular_complexes_only=False)

Process a directory of abstracts and/or papers preprocessed using the specified IsiPreprocessor, and extract INDRA Statements from the reader output.

Parameters
  • isi_preprocessor (indra.sources.isi.preprocessor.IsiPreprocessor) – Preprocessor object that has already preprocessed the documents we want to read and process with the ISI reader

  • num_processes (Optional[int]) – Number of processes to parallelize over

  • output_dir (Optional[str]) – The directory into which to put reader output; if omitted or None, uses a temporary directory.

  • cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True

  • add_grounding (Optional[bool]) – If True, the grounding of the extracted Statements is mapped.

  • molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.

Returns

ip – A processor containing extracted statements

Return type

indra.sources.isi.processor.IsiProcessor
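
The interplay between output_dir and cleanup can be pictured with a small standard-library sketch. It illustrates the general pattern (fall back to a temporary directory when no output directory is given, then remove temporary folders when cleanup is True) and is not the code INDRA runs. The names with_output_dir and work are hypothetical, and the sketch assumes that a directory supplied by the caller is never deleted.

```python
import os
import shutil
import tempfile


def with_output_dir(work, output_dir=None, cleanup=True):
    """Call work(directory), using a temporary directory if none is given."""
    created = None
    if output_dir is None:
        created = tempfile.mkdtemp(prefix='isi_output_')
        output_dir = created
    try:
        return work(output_dir)
    finally:
        # Assumption: only a directory created here is ever removed.
        if cleanup and created is not None:
            shutil.rmtree(created, ignore_errors=True)
```

Setting cleanup=False in such a pattern leaves the temporary directory in place, which is useful for inspecting raw reader output after a run.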

indra.sources.isi.api.process_text(text, pmid=None, **kwargs)

Process a string using the ISI reader and extract INDRA statements.

Parameters
  • text (str) – A text string to process

  • pmid (Optional[str]) – The PMID associated with this text (or None if not specified)

  • num_processes (Optional[int]) – Number of processes to parallelize over

  • cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True

  • add_grounding (Optional[bool]) – If True, the grounding of the extracted Statements is mapped.

  • molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.

Returns

ip – A processor containing extracted Statements

Return type

indra.sources.isi.processor.IsiProcessor
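
A hedged usage sketch for process_text follows. It assumes that INDRA is installed and that the ISI Docker container can be started when the default reader is used. The wrapper read_text, its reader parameter, and the DOCUMENTED_OPTIONS constant are hypothetical; they are included so that the keyword arguments listed above can be checked before a potentially long reading run starts.

```python
# Keyword arguments documented for process_text above.
DOCUMENTED_OPTIONS = {'num_processes', 'cleanup', 'add_grounding',
                      'molecular_complexes_only'}


def read_text(text, pmid=None, reader=None, **kwargs):
    """Run process_text (or an injected stand-in) and return its statements."""
    unknown = set(kwargs) - DOCUMENTED_OPTIONS
    if unknown:
        # Fail early rather than after the reader has been launched.
        raise TypeError('Unsupported options: %s' % ', '.join(sorted(unknown)))
    if reader is None:
        # Imported lazily so this sketch can be loaded without INDRA installed.
        from indra.sources.isi import api
        reader = api.process_text
    ip = reader(text, pmid=pmid, **kwargs)
    return ip.statements
```

A call such as read_text('MEK binds ERK.', pmid='12345', cleanup=False) would keep the temporary folders for inspection; the sentence and PMID here are placeholders.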

ISI Processor (indra.sources.isi.processor)

class indra.sources.isi.processor.IsiProcessor(reader_output, pmid=None, extra_annotations=None, add_grounding=False)

Processes the output of the ISI reader.

Parameters
  • reader_output (json) – The output of the ISI reader as a parsed JSON object.

  • pmid (Optional[str]) – The PMID to assign to the extracted Statements

  • extra_annotations (Optional[dict]) – Annotations to be included with each extracted Statement

  • add_grounding (Optional[bool]) – If True, Gilda is used as a service to ground the Agents in the extracted Statements.

verbs

The set of verbs that have appeared in the processed ISI output

Type

set[str]

statements

Extracted statements

Type

list[indra.statements.Statement]

get_statements()

Process reader output to produce INDRA Statements.

retain_molecular_complexes()

Filter the statements to Complexes between molecular entities.
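
Once a processor has been obtained, its documented statements and verbs attributes can be inspected directly. The sketch below tallies statements by class name and lists the verbs in sorted order. The helper summarize_processor is hypothetical, and it accepts any object exposing those two attributes, an assumption that lets the example run without INDRA or Docker.

```python
from collections import Counter


def summarize_processor(ip):
    """Summarize a processor's statements by type, along with its verbs."""
    # Count statements by the name of their class (e.g. Complex).
    counts = Counter(type(stmt).__name__ for stmt in ip.statements)
    return {
        'statement_counts': dict(counts),
        'verbs': sorted(ip.verbs),
    }
```

Comparing the summary before and after calling retain_molecular_complexes() is one way to see how many statements the filter removes.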