TEES (indra.sources.tees)

The TEES processor requires an installation of TEES. To install TEES:

  1. Clone the latest stable version of TEES using

    git clone https://github.com/jbjorne/TEES.git

  2. Put the cloned TEES repository in one of these three places: the same directory as INDRA, your home directory, or ~/Downloads. If you put TEES anywhere else, you will need to tell INDRA where it is, for example by setting the TEES_PATH environment variable, which process_text checks before the default locations.

  3. Run configure.py within the TEES installation to install TEES dependencies.
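The lookup order from step 2, together with the TEES_PATH variable mentioned under process_text, can be sketched as follows. This is a minimal sketch, not INDRA's own code: the helper name find_tees_path is hypothetical, the caller supplies the candidate list, and a directory is assumed to be a TEES checkout if it contains configure.py at its top level.

```python
import os
from typing import Iterable, Optional


def find_tees_path(candidates: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the first directory that looks like TEES.

    TEES_PATH, if set, is checked before the candidate directories.
    """
    env_path = os.environ.get("TEES_PATH")
    ordered = ([env_path] if env_path else []) + list(candidates)
    for path in ordered:
        expanded = os.path.expanduser(path)
        # Assumption: a TEES checkout has configure.py at its top level.
        if os.path.isfile(os.path.join(expanded, "configure.py")):
            return expanded
    return None


# Two of the default locations from step 2 (git clone names the folder TEES).
# The "same directory as INDRA" location depends on your setup and is omitted.
CANDIDATES = ["~/TEES", "~/Downloads/TEES"]
```

Calling find_tees_path(CANDIDATES) returns None when no candidate matches, which is the point where a real lookup would raise an error.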

TEES API (indra.sources.tees.api)

This module provides a simplified API for invoking the Turku Event Extraction System (TEES) on text and extracting INDRA statements from TEES output.

See publication: Jari Björne, Sofie Van Landeghem, Sampo Pyysalo, Tomoko Ohta, Filip Ginter, Yves Van de Peer, Sofia Ananiadou and Tapio Salakoski, PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations. Proceedings of BioNLP 2012, pages 82-90, 2012.

indra.sources.tees.api.extract_output(output_dir)

Extract the text of the a1, a2, and sentence segmentation files from the TEES output directory. These files are located within a compressed archive.

Parameters

output_dir (str) – Directory containing the output of the TEES system

Returns

  • a1_text (str) – The text of the TEES a1 file (specifying the entities)

  • a2_text (str) – The text of the TEES a2 file (specifying the event graph)

  • sentence_segmentations (str) – The text of the XML file specifying the sentence segmentation
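To show what the a1 text contains, assume it follows the BioNLP shared-task standoff layout: one entity per line, with an ID, a tab, the entity type plus start and end character offsets, a tab, and the covered text. Under that assumption, a minimal parser looks like this (parse_a1 and Entity are hypothetical names, not part of INDRA):

```python
from typing import Dict, NamedTuple


class Entity(NamedTuple):
    entity_type: str
    start: int  # character offset of the first character
    end: int  # offset one past the last character
    text: str


def parse_a1(a1_text: str) -> Dict[str, Entity]:
    """Map entity IDs (T1, T2, ...) to their type, offsets, and text."""
    entities = {}
    for line in a1_text.splitlines():
        if not line.startswith("T"):
            continue  # skip blank or non-entity lines
        ident, annotation, text = line.split("\t", 2)
        entity_type, start, end = annotation.split()
        entities[ident] = Entity(entity_type, int(start), int(end), text)
    return entities


# Offsets refer to the sentence "BRAF phosphorylates MEK1."
a1_example = "T1\tProtein 0 4\tBRAF\nT2\tProtein 20 24\tMEK1\n"
```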

indra.sources.tees.api.process_text(text, pmid=None, python2_path=None)

Processes the specified plain text with TEES and converts the output to supported INDRA statements. Checks for the TEES installation in the TEES_PATH environment variable and the configuration file; if it is not found there, checks the candidate paths in tees_candidate_paths. Raises an exception if TEES cannot be found in any of these places.

Parameters
  • text (str) – Plain text to process with TEES

  • pmid (str) – The PMID of the paper the text comes from, stored in the Evidence object of each statement. Set to None if unspecified.

  • python2_path (str) – TEES is only compatible with Python 2. The processor invokes this external Python 2 interpreter so that it can itself be run from either Python 2 or Python 3. If None, searches for an executable named python2 in the PATH environment variable.

Returns

tp – A TEESProcessor object containing a list of INDRA statements extracted from the TEES output

Return type

TEESProcessor
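The python2_path fallback can be pictured as below. This is a sketch that assumes the PATH search behaves like the standard library's shutil.which; resolve_python2 is a hypothetical name, not an INDRA function.

```python
import shutil
from typing import Optional


def resolve_python2(python2_path: Optional[str] = None) -> Optional[str]:
    """Return the interpreter to use for TEES (hypothetical helper)."""
    if python2_path is not None:
        return python2_path  # an explicit path always wins
    # Fall back to an executable named python2 on PATH; None if absent.
    return shutil.which("python2")
```

In normal use none of this is needed: call process_text("BRAF phosphorylates MEK1.") and read the statements attribute of the returned TEESProcessor, passing python2_path only when no python2 executable is on your PATH.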

indra.sources.tees.api.run_on_text(text, python2_path)

Runs TEES on the given text in a temporary directory and returns the path to that directory, which holds the TEES output.

The caller should delete this directory when done with it. This function runs TEES and produces TEES output files but does not process TEES output into INDRA statements.

Parameters
  • text (str) – Text from which to extract relationships

  • python2_path (str) – The path to the python 2 interpreter

Returns

output_dir – Temporary directory with TEES output. The caller should delete this directory when done with it.

Return type

str
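Because run_on_text leaves cleanup to the caller, it is safest to wrap the follow-up processing in try/finally so the directory is removed even if processing fails. The sketch below uses a hypothetical stand-in, fake_run_on_text, instead of a real TEES run; with INDRA and TEES installed you would call run_on_text and then extract_output at the marked point.

```python
import os
import shutil
import tempfile
from typing import Tuple


def fake_run_on_text(text: str) -> str:
    """Hypothetical stand-in for run_on_text: no TEES, just a temp dir."""
    output_dir = tempfile.mkdtemp(prefix="tees_")
    with open(os.path.join(output_dir, "input.txt"), "w") as handle:
        handle.write(text)
    return output_dir


def read_then_clean_up(text: str) -> Tuple[str, str]:
    """Return (output_dir, contents); output_dir is deleted before returning."""
    output_dir = fake_run_on_text(text)
    try:
        # A real caller would call extract_output(output_dir) here.
        with open(os.path.join(output_dir, "input.txt")) as handle:
            contents = handle.read()
    finally:
        # The caller owns the temporary directory, so it removes it.
        shutil.rmtree(output_dir, ignore_errors=True)
    return output_dir, contents
```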

TEES Processor (indra.sources.tees.processor)

This module takes the TEES parse graph generated by parse_tees and converts it into INDRA statements.

See publication: Jari Björne, Sofie Van Landeghem, Sampo Pyysalo, Tomoko Ohta, Filip Ginter, Yves Van de Peer, Sofia Ananiadou and Tapio Salakoski, PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations. Proceedings of BioNLP 2012, pages 82-90, 2012.

class indra.sources.tees.processor.TEESProcessor(a1_text, a2_text, sentence_segmentations, pmid)

Converts the output of the TEES reader to INDRA statements.

Only extracts a subset of INDRA statements. Currently supported statements are:

  • Phosphorylation

  • Dephosphorylation

  • Binding

  • IncreaseAmount

  • DecreaseAmount

Parameters
  • a1_text (str) – The TEES a1 output file, with entity information

  • a2_text (str) – The TEES a2 output file, with the event graph

  • sentence_segmentations (str) – The TEES sentence segmentation XML output

  • pmid (int) – The PMID of the paper the text comes from, or None if unspecified. Stored in the Evidence object for each statement.

statements

A list of INDRA statements extracted from the provided text via TEES

Type

list[indra.statements.Statement]
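The event graph in the a2 file can be pictured with plain dictionaries. Assuming BioNLP standoff event lines such as "E1<TAB>Phosphorylation:T3 Theme:T2", each event becomes a node whose type comes from its trigger and whose arguments become relation-labelled edges. This is a simplified, hypothetical parser (parse_a2_events is not an INDRA function); it skips trigger and modifier lines and does not build the networkx graph the processor uses.

```python
from typing import Dict, List, Tuple

# source node -> list of (target node, relation)
Edges = Dict[str, List[Tuple[str, str]]]


def parse_a2_events(a2_text: str) -> Tuple[Dict[str, str], Edges]:
    """Return event types and relation-labelled edges from a2 event lines."""
    event_types: Dict[str, str] = {}
    edges: Edges = {}
    for line in a2_text.splitlines():
        if not line.startswith("E"):
            continue  # trigger (T) and modifier (M) lines are skipped here
        event_id, body = line.split("\t", 1)
        trigger, *arguments = body.split()
        event_types[event_id] = trigger.split(":")[0]
        edges[event_id] = [
            (target, relation)
            for relation, target in (arg.split(":", 1) for arg in arguments)
        ]
    return event_types, edges


a2_example = (
    "T3\tPhosphorylation 5 19\tphosphorylates\n"
    "E1\tPhosphorylation:T3 Theme:T2\n"
    "E2\tPositive_regulation:T4 Theme:E1 Cause:T1\n"
)
```

Nested events appear naturally: E2's Theme edge points at the event E1 rather than at an entity.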

connected_subgraph(node)

Returns the subgraph containing the given node, its ancestors, and its descendants.

Parameters

node (str) – The node around which to build the subgraph.

Returns

subgraph – The subgraph containing the specified node.

Return type

networkx.DiGraph
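Here the subgraph is the node plus everything reachable by following edges forwards (descendants) or backwards (ancestors). That is narrower than the node's weakly connected component: a sibling that merely shares an ancestor is left out. A stdlib sketch over a plain adjacency dict follows (connected_nodes is a hypothetical name); with networkx, the same node set is {node} | nx.ancestors(G, node) | nx.descendants(G, node), which can then be passed to G.subgraph.

```python
from collections import deque
from typing import Dict, List, Set


def _reachable(start: str, neighbors: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search over neighbors; returns nodes reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def connected_nodes(successors: Dict[str, List[str]], node: str) -> Set[str]:
    """Return node, its ancestors, and its descendants."""
    predecessors: Dict[str, List[str]] = {}
    for source, targets in successors.items():
        for target in targets:
            predecessors.setdefault(target, []).append(source)
    descendants = _reachable(node, successors)
    ancestors = _reachable(node, predecessors)
    return {node} | ancestors | descendants


# E2 points at event E1 and entity T3; E1 has two entity arguments.
graph = {"E2": ["E1", "T3"], "E1": ["T1", "T2"], "E3": ["T4"]}
```

For example, querying T1 yields T1, E1 and E2 but not T2, even though T2 is another argument of E1.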

find_event_parent_with_event_child(parent_name, child_name)

Finds all event nodes (nodes whose is_event attribute is True) of type parent_name that have a child event node of type child_name.
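A minimal sketch of this search, assuming each node carries is_event and type fields. The dict-based graph and the name find_parents_with_event_child are hypothetical stand-ins for the processor's networkx graph and method.

```python
from typing import Dict, List, Tuple

Nodes = Dict[str, dict]  # name -> {"is_event": bool, "type": str, ...}
Edges = Dict[str, List[Tuple[str, str]]]  # name -> [(target, relation)]


def find_parents_with_event_child(
    nodes: Nodes, edges: Edges, parent_type: str, child_type: str
) -> List[str]:
    """Event nodes of parent_type with at least one child event of child_type."""
    matches = []
    for name, attrs in nodes.items():
        if not attrs["is_event"] or attrs["type"] != parent_type:
            continue
        for target, _relation in edges.get(name, []):
            child = nodes[target]
            if child["is_event"] and child["type"] == child_type:
                matches.append(name)
                break  # one matching child is enough
    return matches


nodes = {
    "T1": {"is_event": False, "type": "Protein", "text": "BRAF"},
    "T2": {"is_event": False, "type": "Protein", "text": "MEK1"},
    "E1": {"is_event": True, "type": "Phosphorylation"},
    "E2": {"is_event": True, "type": "Positive_regulation"},
    "E3": {"is_event": True, "type": "Positive_regulation"},
}
edges = {
    "E1": [("T2", "Theme")],
    "E2": [("E1", "Theme"), ("T1", "Cause")],
    "E3": [("T2", "Theme")],  # entity child only, so E3 does not match
}
```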

find_event_with_outgoing_edges(event_name, desired_relations)

Gets a list of event nodes with the specified event_name and outgoing edges annotated with each of the specified relations.

Parameters
  • event_name (str) – Look for event nodes with this name

  • desired_relations (list[str]) – Look for event nodes with outgoing edges annotated with each of these relations

Returns

event_nodes – Event nodes that fit the desired criteria

Return type

list[str]
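The filter can be sketched as follows, assuming event_name is compared against each node's type field. The dict-based graph and the function name are hypothetical; the Binding input mirrors the Theme/Theme2 pattern described under process_binding_statements.

```python
from typing import Dict, Iterable, List, Tuple

Nodes = Dict[str, dict]  # name -> {"is_event": bool, "type": str, ...}
Edges = Dict[str, List[Tuple[str, str]]]  # name -> [(target, relation)]


def find_events_with_relations(
    nodes: Nodes, edges: Edges, event_type: str, desired_relations: Iterable[str]
) -> List[str]:
    """Event nodes of event_type with an outgoing edge for every relation."""
    wanted = set(desired_relations)
    matches = []
    for name, attrs in nodes.items():
        if not attrs["is_event"] or attrs["type"] != event_type:
            continue
        present = {relation for _target, relation in edges.get(name, [])}
        if wanted <= present:  # every desired relation is present
            matches.append(name)
    return matches


nodes = {
    "T1": {"is_event": False, "type": "Protein", "text": "ERK"},
    "T2": {"is_event": False, "type": "Protein", "text": "RSK"},
    "E1": {"is_event": True, "type": "Binding"},
    "E2": {"is_event": True, "type": "Binding"},
}
edges = {
    "E1": [("T1", "Theme"), ("T2", "Theme2")],
    "E2": [("T1", "Theme")],  # no Theme2 edge, so E2 is not returned
}
```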

general_node_label(node)

Used for debugging: gives a short text description of a graph node.

get_entity_text_for_relation(node, relation)

Looks for an edge from node to some other node, such that the edge is annotated with the given relation. If such an edge exists and the node at its other end is an entity, returns that entity’s text. Otherwise, returns None.

Looks for an edge from node to some other node, such that the edge is annotated with the given relation. If there exists such an edge, returns the name of the node it points to. Otherwise, returns None.
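The two edge lookups described above can be sketched together: related_node returns the target of the first outgoing edge with the given relation, and entity_text_for_relation returns that target's text only when it is an entity. Both names, and the dict-based graph standing in for the processor's networkx graph, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Nodes = Dict[str, dict]  # name -> {"is_event": bool, "type": str, "text": ...}
Edges = Dict[str, List[Tuple[str, str]]]  # name -> [(target, relation)]


def related_node(edges: Edges, node: str, relation: str) -> Optional[str]:
    """Target of the first outgoing edge labelled relation, or None."""
    for target, edge_relation in edges.get(node, []):
        if edge_relation == relation:
            return target
    return None


def entity_text_for_relation(
    nodes: Nodes, edges: Edges, node: str, relation: str
) -> Optional[str]:
    """Text of the related node if it is an entity; None otherwise."""
    target = related_node(edges, node, relation)
    if target is None or nodes[target]["is_event"]:
        return None
    return nodes[target]["text"]


nodes = {
    "T1": {"is_event": False, "type": "Protein", "text": "BRAF"},
    "E1": {"is_event": True, "type": "Phosphorylation"},
    "E2": {"is_event": True, "type": "Positive_regulation"},
}
edges = {"E2": [("E1", "Theme"), ("T1", "Cause")]}
```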

node_has_edge_with_label(node_name, edge_label)

Looks for an edge from node_name to some other node with the specified label. Returns the node to which this edge points if it exists, or None if it doesn’t.

Parameters
  • G – The graph object

  • node_name – Node that the edge starts at

  • edge_label – The text in the relation property of the edge

node_to_evidence(entity_node, is_direct)

Computes an evidence object for a statement.

We assume that the entire event happens within a single sentence, and get the sentence text by finding the sentence that contains the provided node, which corresponds to one of the entities participating in the event.

The Evidence’s pmid is whatever was provided to the constructor (perhaps None), and the annotations are the subgraph containing the provided node, its ancestors, and its descendants.
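Assuming the sentence segmentation XML has already been parsed into character spans (its schema is not described here, so that step is omitted), choosing the evidence sentence for an entity reduces to an interval lookup on the entity's offset. Sentence and sentence_text_for_offset are hypothetical names, and the sketch stops before building an INDRA Evidence object.

```python
from typing import List, NamedTuple, Optional


class Sentence(NamedTuple):
    start: int  # character offset of the first character
    end: int  # offset one past the last character
    text: str


def sentence_text_for_offset(sentences: List[Sentence], offset: int) -> Optional[str]:
    """Return the text of the sentence whose span contains offset."""
    for sentence in sentences:
        if sentence.start <= offset < sentence.end:
            return sentence.text
    return None


document = "BRAF phosphorylates MEK1. ERK binds RSK."
sentences = [
    Sentence(0, 25, document[0:25]),
    Sentence(26, 40, document[26:40]),
]
```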

print_parent_and_children_info(node)

Used for debugging: prints a short description of a node, its children, its parents, and its parents’ children.

process_binding_statements()

Looks for Binding events in the graph and extracts them into INDRA statements.

In particular, looks for a Binding event node with outgoing edges with relations Theme and Theme2; the entities these edges point to are the two constituents of the Complex INDRA statement.
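Each Binding event with both a Theme and a Theme2 argument yields one pair of participants. The sketch below stops at entity-text pairs and does not build INDRA Complex objects; the Event record (arguments keyed by relation) and binding_pairs are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    event_type: str
    arguments: Dict[str, str]  # relation -> target entity ID


def binding_pairs(
    events: List[Event], entity_text: Dict[str, str]
) -> List[Tuple[str, str]]:
    """(Theme, Theme2) entity-text pairs for Binding events with both edges."""
    pairs = []
    for event in events:
        if event.event_type != "Binding":
            continue
        theme = event.arguments.get("Theme")
        theme2 = event.arguments.get("Theme2")
        if theme is not None and theme2 is not None:
            pairs.append((entity_text[theme], entity_text[theme2]))
    return pairs


events = [
    Event("Binding", {"Theme": "T1", "Theme2": "T2"}),
    Event("Binding", {"Theme": "T3"}),  # only one participant: skipped
    Event("Phosphorylation", {"Theme": "T2"}),
]
entity_text = {"T1": "ERK", "T2": "RSK", "T3": "MEK1"}
```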

process_decrease_expression_amount()

Looks for Negative_Regulation events with a specified Cause and a Gene_Expression theme, and processes them into INDRA statements.

process_increase_expression_amount()

Looks for Positive_Regulation events with a specified Cause and a Gene_Expression theme, and processes them into INDRA statements.
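Both expression-amount patterns share one shape: a regulation event whose Cause is a regulator and whose Theme is a Gene_Expression event, itself pointing at the expressed gene. The sketch below maps the regulation type (spelled as in the two descriptions above) to an IncreaseAmount or DecreaseAmount label and returns plain tuples rather than INDRA statements; the graph layout and function names are hypothetical, and the Cause and gene nodes are assumed to be entities with a text field.

```python
from typing import Dict, List, Optional, Tuple

Nodes = Dict[str, dict]  # name -> {"type": str, "text": str (entities only)}
Edges = Dict[str, List[Tuple[str, str]]]  # name -> [(target, relation)]

# Regulation event type -> INDRA statement type it would produce.
KINDS = {"Positive_Regulation": "IncreaseAmount", "Negative_Regulation": "DecreaseAmount"}


def _target(edges: Edges, node: str, relation: str) -> Optional[str]:
    """Target of the first outgoing edge with this relation, or None."""
    for target, edge_relation in edges.get(node, []):
        if edge_relation == relation:
            return target
    return None


def amount_changes(nodes: Nodes, edges: Edges) -> List[Tuple[str, str, str]]:
    """(cause text, expressed gene text, statement type) for each match."""
    results = []
    for name, attrs in nodes.items():
        kind = KINDS.get(attrs["type"])
        cause = _target(edges, name, "Cause")
        theme = _target(edges, name, "Theme")
        if kind is None or cause is None or theme is None:
            continue
        if nodes[theme]["type"] != "Gene_Expression":
            continue
        gene = _target(edges, theme, "Theme")
        if gene is not None:
            results.append((nodes[cause]["text"], nodes[gene]["text"], kind))
    return results


nodes = {
    "T1": {"type": "Protein", "text": "p53"},
    "T2": {"type": "Protein", "text": "MDM2"},
    "E1": {"type": "Gene_Expression"},
    "E2": {"type": "Positive_Regulation"},
}
edges = {"E1": [("T2", "Theme")], "E2": [("E1", "Theme"), ("T1", "Cause")]}
```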

process_phosphorylation_statements()

Looks for Phosphorylation events in the graph and extracts them into INDRA statements.

In particular, looks for a Positive_regulation event node with a child Phosphorylation event node.

  • If Positive_regulation has an outgoing Cause edge, that’s the subject.

  • If Phosphorylation has an outgoing Theme edge, that’s the object.

  • If Phosphorylation has an outgoing Site edge, that’s the site.
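The three rules above can be sketched as a walk from each Positive_regulation node to its Phosphorylation children, collecting (subject, object, site) texts and leaving None wherever an edge is missing. The dict-based graph and function names are hypothetical, and the sketch does not construct INDRA Phosphorylation statements.

```python
from typing import Dict, List, Optional, Tuple

Nodes = Dict[str, dict]  # name -> {"type": str, "text": str (entities only)}
Edges = Dict[str, List[Tuple[str, str]]]  # name -> [(target, relation)]


def _target(edges: Edges, node: str, relation: str) -> Optional[str]:
    """Target of the first outgoing edge with this relation, or None."""
    for target, edge_relation in edges.get(node, []):
        if edge_relation == relation:
            return target
    return None


def phosphorylations(nodes: Nodes, edges: Edges) -> List[Tuple[Optional[str], ...]]:
    """(subject, object, site) texts; None where an edge is missing."""
    results = []
    for name, attrs in nodes.items():
        if attrs["type"] != "Positive_regulation":
            continue
        for child, _relation in edges.get(name, []):
            if nodes[child]["type"] != "Phosphorylation":
                continue
            roles = (
                _target(edges, name, "Cause"),  # subject
                _target(edges, child, "Theme"),  # object
                _target(edges, child, "Site"),  # site
            )
            results.append(
                tuple(nodes[n]["text"] if n is not None else None for n in roles)
            )
    return results


nodes = {
    "T1": {"type": "Protein", "text": "BRAF"},
    "T2": {"type": "Protein", "text": "MEK1"},
    "T3": {"type": "Entity", "text": "S218"},
    "E1": {"type": "Phosphorylation"},
    "E2": {"type": "Positive_regulation"},
}
edges = {
    "E1": [("T2", "Theme"), ("T3", "Site")],
    "E2": [("E1", "Theme"), ("T1", "Cause")],
}
```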

indra.sources.tees.processor.s2a(s)

Makes an Agent from a string describing the agent.