ISI (indra.sources.isi)
This module provides an input interface and processor to the ISI reading system.
The reader is set up to run within a Docker container. For the ISI reader to run, set the Docker memory and swap space to the maximum.
ISI API (indra.sources.isi.api)
- indra.sources.isi.api.process_json_file(file_path, pmid=None, extra_annotations=None, add_grounding=True, molecular_complexes_only=False)
Extracts statements from the given ISI output file.
- Parameters
file_path (str) – The ISI output file from which to extract statements
pmid (int) – The PMID of the document being preprocessed, or None if not specified
extra_annotations (dict) – Extra annotations to be added to each statement from this document (can be the empty dictionary)
add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped.
molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.
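As a minimal sketch of how the parameters above fit together, the hypothetical helper below wraps process_json_file and keeps only molecular complexes. It assumes that the returned processor exposes its results in a `statements` attribute, which the text above does not state. The helper name and its `process` argument (a hook for substituting a stand-in when INDRA is not installed) are illustrative and not part of INDRA.

```python
def complexes_from_isi_output(file_path, pmid=None, process=None):
    """Return the Statements extracted from one ISI output file."""
    if process is None:
        # Imported lazily so this sketch can be loaded without INDRA installed.
        from indra.sources.isi.api import process_json_file as process
    ip = process(
        file_path,
        pmid=pmid,
        extra_annotations={},           # the empty dictionary is allowed
        add_grounding=True,             # map the grounding of extracted Statements
        molecular_complexes_only=True,  # keep only Complex statements
    )
    # Assumption: the processor stores its results in a `statements` attribute.
    return list(ip.statements)
```

With INDRA installed, calling the helper with only a file path and a PMID would go through the real process_json_file.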
- indra.sources.isi.api.process_nxml(nxml_filename, pmid=None, extra_annotations=None, **kwargs)
Process an NXML file using the ISI reader.
First converts NXML to plain text and preprocesses it, then runs the ISI reader, and processes the output to extract INDRA Statements.
- Parameters
nxml_filename (str) – NXML file to process
pmid (Optional[str]) – PMID of this NXML file, to be added to the Evidence object of the extracted INDRA statements
extra_annotations (Optional[dict]) – Additional annotations to add to the Evidence object of all extracted INDRA statements. Extra annotations called ‘interaction’ are ignored since this is used by the processor to store the corresponding raw ISI output.
num_processes (Optional[int]) – Number of processes to parallelize over
cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True
add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped.
molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.
- Returns
ip – A processor containing extracted Statements
- Return type
indra.sources.isi.processor.IsiProcessor
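The rule that an ‘interaction’ entry in extra_annotations is ignored can be illustrated with a small stand-alone sketch. This is not INDRA's implementation: the function name `merge_annotations` and its arguments are hypothetical, and the sketch only shows one way a reserved key could take precedence over a user-supplied value.

```python
def merge_annotations(raw_interaction, extra_annotations=None):
    """Combine user annotations with the raw ISI output for one Statement."""
    # Copy so that the caller's dictionary is never modified.
    annotations = dict(extra_annotations or {})
    # A user-supplied 'interaction' entry is overwritten: the key is
    # reserved for the corresponding raw ISI output.
    annotations['interaction'] = raw_interaction
    return annotations
```

In practice, this means callers should choose a different key for their own metadata if they want it to survive.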
- indra.sources.isi.api.process_output_folder(folder_path, pmids=None, extra_annotations=None, add_grounding=True, molecular_complexes_only=False)
Recursively extracts statements from all ISI output files in the given directory and subdirectories.
- Parameters
folder_path (str) – The directory to traverse
pmids (Optional[str]) – PMID mapping to be added to the Evidence of the extracted INDRA Statements
extra_annotations (Optional[dict]) – Additional annotations to add to the Evidence object of all extracted INDRA statements. Extra annotations called ‘interaction’ are ignored since this is used by the processor to store the corresponding raw ISI output.
add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped.
molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.
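The recursive traversal described above can be sketched with the standard library alone. The helper below is hypothetical and is not how INDRA necessarily locates its files. In particular, the assumption that ISI output files end in `.json` is not confirmed by the text above.

```python
import os


def find_isi_output_files(folder_path, suffix='.json'):
    """Yield paths of candidate ISI output files under folder_path."""
    # os.walk visits folder_path and every subdirectory beneath it.
    for dirpath, _dirnames, filenames in os.walk(folder_path):
        for name in sorted(filenames):
            # Assumption: output files are identified by their suffix.
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```

Listing the files this way before calling process_output_folder can help confirm that a directory contains the expected reader output.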
- indra.sources.isi.api.process_preprocessed(isi_preprocessor, num_processes=1, output_dir=None, cleanup=True, add_grounding=True, molecular_complexes_only=False)
Process a directory of abstracts and/or papers preprocessed using the specified IsiPreprocessor, to produce a list of extracted INDRA statements.
- Parameters
isi_preprocessor (indra.sources.isi.preprocessor.IsiPreprocessor) – Preprocessor object that has already preprocessed the documents we want to read and process with the ISI reader
num_processes (Optional[int]) – Number of processes to parallelize over
output_dir (Optional[str]) – The directory into which to put reader output; if omitted or None, uses a temporary directory.
cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True
add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped.
molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.
- Returns
ip – A processor containing extracted statements
- Return type
indra.sources.isi.processor.IsiProcessor
- indra.sources.isi.api.process_text(text, pmid=None, **kwargs)
Process a string using the ISI reader and extract INDRA statements.
- Parameters
text (str) – A text string to process
pmid (Optional[str]) – The PMID associated with this text (or None if not specified)
num_processes (Optional[int]) – Number of processes to parallelize over
cleanup (Optional[bool]) – If True, the temporary folders created for preprocessed reading input and output are removed. Default: True
add_grounding (Optional[bool]) – If True, the extracted Statements’ grounding is mapped.
molecular_complexes_only (Optional[bool]) – If True, only Complex statements between molecular entities are retained after grounding.
- Returns
ip – A processor containing statements
- Return type
indra.sources.isi.processor.IsiProcessor
ISI Processor (indra.sources.isi.processor)
- class indra.sources.isi.processor.IsiProcessor(reader_output, pmid=None, extra_annotations=None, add_grounding=False)
Processes the output of the ISI reader.
- Parameters
reader_output (json) – The output of the ISI reader as a JSON object.
pmid (Optional[str]) – The PMID to assign to the extracted Statements
extra_annotations (Optional[dict]) – Annotations to be included with each extracted Statement
add_grounding (Optional[bool]) – If True, Gilda is used as a service to ground the Agents in the extracted Statements.