Preassembler (indra.preassembler)

class indra.preassembler.Preassembler(hierarchies, stmts=None, matches_fun=None, refinement_fun=None)

De-duplicates statements and arranges them in a specificity hierarchy.

Parameters:
  • hierarchies (dict[indra.preassembler.hierarchy_manager]) – A dictionary of hierarchies with keys such as ‘entity’ (hierarchy of entities, primarily specifying relationships between genes and their families) and ‘modification’ pointing to HierarchyManagers.
  • stmts (list of indra.statements.Statement or None) – A set of statements to perform pre-assembly on. If None, statements should be added using the add_statements() method.
  • matches_fun (Optional[function]) – A function which takes a Statement object as argument and returns a string key that is used for duplicate recognition. If supplied, it overrides the use of the built-in matches_key method of each Statement being assembled.
  • refinement_fun (Optional[function]) – A function which takes two Statement objects and a hierarchies dict as arguments and returns True or False. If supplied, it overrides the built-in refinement_of method of each Statement being assembled.
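A matches_fun only has to map a Statement to a string key. The sketch below is hypothetical: key_by_type_and_agents() keys on the statement type and agent names only, so statements differing just in residue or position would be treated as duplicates. It assumes each Statement provides agent_list() and each Agent a name attribute; ToyAgent and the stand-in Phosphorylation class exist only so the sketch runs without INDRA.

```python
from typing import List, Optional


def key_by_type_and_agents(stmt) -> str:
    """Hypothetical matches_fun: key on statement type and agent names only.

    Unlike the built-in matches_key, this ignores details such as residue
    and position, so more statements are treated as duplicates.
    """
    names = [ag.name if ag is not None else 'None'
             for ag in stmt.agent_list()]
    return '%s(%s)' % (type(stmt).__name__, ', '.join(names))


# Minimal stand-ins so the sketch runs without INDRA; these are NOT the
# indra.statements classes, they only mimic agent_list() and Agent.name.
class ToyAgent:
    def __init__(self, name: str):
        self.name = name


class Phosphorylation:
    def __init__(self, enz: Optional[ToyAgent], sub: ToyAgent,
                 residue: Optional[str] = None):
        self.enz, self.sub, self.residue = enz, sub, residue

    def agent_list(self) -> List[Optional[ToyAgent]]:
        return [self.enz, self.sub]


st1 = Phosphorylation(ToyAgent('BRAF'), ToyAgent('MAP2K1'))
st2 = Phosphorylation(ToyAgent('BRAF'), ToyAgent('MAP2K1'), residue='S')
print(key_by_type_and_agents(st1))  # Phosphorylation(BRAF, MAP2K1)
print(key_by_type_and_agents(st1) == key_by_type_and_agents(st2))  # True
```

With INDRA installed, such a function would be supplied as Preassembler(hierarchies, stmts, matches_fun=key_by_type_and_agents).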
stmts

Starting set of statements for preassembly.

Type: list of indra.statements.Statement
unique_stmts

Statements resulting from combining duplicates.

Type: list of indra.statements.Statement
related_stmts

Top-level statements after building the refinement hierarchy.

Type: list of indra.statements.Statement
hierarchies

A dictionary of hierarchies with keys such as ‘entity’ and ‘modification’ pointing to HierarchyManagers.

Type: dict[indra.preassembler.hierarchy_manager]
add_statements(stmts)

Add to the current list of statements.

Parameters: stmts (list of indra.statements.Statement) – Statements to add to the current list.
combine_duplicate_stmts(stmts)

Combine evidence from duplicate Statements.

Statements are deemed to be duplicates if they have the same key returned by the matches_key() method of the Statement class. This generally means that statements must be identical in terms of their arguments and can differ only in their associated Evidence objects.

This function keeps the first instance of each set of duplicate statements and merges into it the Evidence lists of all the other statements in that set.
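The de-duplication rule can be sketched in a few lines of plain Python. ToyStatement and combine_duplicates_sketch() are hypothetical stand-ins: each statement carries a precomputed key in place of matches_key(), evidence is a list of strings, and the kept statement is modified in place.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyStatement:
    """Stand-in for a Statement: a precomputed key plus an evidence list."""
    key: str
    evidence: List[str] = field(default_factory=list)


def combine_duplicates_sketch(stmts: List[ToyStatement]) -> List[ToyStatement]:
    """Keep the first statement per key and merge in the others' evidence."""
    first_by_key: Dict[str, ToyStatement] = {}
    for stmt in stmts:
        if stmt.key not in first_by_key:
            # First occurrence: keep it (dicts preserve insertion order).
            first_by_key[stmt.key] = stmt
        else:
            # Duplicate: append its evidence to the kept statement.
            first_by_key[stmt.key].evidence.extend(stmt.evidence)
    return list(first_by_key.values())


stmts = [ToyStatement('Phos(MAP2K1,MAPK1,T,185)', ['evidence 1']),
         ToyStatement('Phos(MAP2K1,MAPK1,T,185)', ['evidence 2'])]
uniq = combine_duplicates_sketch(stmts)
print(len(uniq), uniq[0].evidence)  # 1 ['evidence 1', 'evidence 2']
```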

Parameters: stmts (list of indra.statements.Statement) – Set of statements to de-duplicate.
Returns: Unique statements with accumulated evidence across duplicates.
Return type: list of indra.statements.Statement

Examples

De-duplicate and combine evidence for two statements differing only in their evidence lists:

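>>> from indra.preassembler import Preassembler
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> from indra.statements import Agent, Evidence, Phosphorylation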
>>> map2k1 = Agent('MAP2K1')
>>> mapk1 = Agent('MAPK1')
>>> stmt1 = Phosphorylation(map2k1, mapk1, 'T', '185',
... evidence=[Evidence(text='evidence 1')])
>>> stmt2 = Phosphorylation(map2k1, mapk1, 'T', '185',
... evidence=[Evidence(text='evidence 2')])
>>> pa = Preassembler(hierarchies)
>>> uniq_stmts = pa.combine_duplicate_stmts([stmt1, stmt2])
>>> uniq_stmts
[Phosphorylation(MAP2K1(), MAPK1(), T, 185)]
>>> sorted([e.text for e in uniq_stmts[0].evidence]) # doctest:+IGNORE_UNICODE
['evidence 1', 'evidence 2']
combine_duplicates()

Combine duplicates among stmts and save result in unique_stmts.

A wrapper around the method combine_duplicate_stmts().

combine_related(return_toplevel=True, poolsize=None, size_cutoff=100)

Connect related statements based on their refinement relationships.

This function takes as a starting point the unique statements (with duplicates removed) and returns a modified flat list containing only the top-level statements, that is, those that are not refined by any other existing statement. In other words, the more general versions of a given statement do not appear at the top level, but instead are listed in the supported_by field of the top-level statements.

If unique_stmts has not been initialized with the de-duplicated statements, combine_duplicates() is called internally.

After this function is called, the attribute related_stmts is set as a side effect.

The procedure for combining statements in this way involves a series of steps:

  1. The statements are grouped by type (e.g., Phosphorylation) and each type is iterated over independently.
  2. Statements of the same type are then grouped according to their Agents’ entity hierarchy component identifiers. For instance, ERK, MAPK1 and MAPK3 are all in the same connected component in the entity hierarchy and therefore all Statements of the same type referencing these entities will be grouped. This grouping ensures that relations are only possible within Statement groups and not among groups. For two Statements to be in the same group at this step, the Statements must be of the same type and the Agents at each position in the Agent lists must either be in the same hierarchy component or, if they are not in the hierarchy, must have identical entity_matches_keys. Statements with None in one of the Agent list positions are collected separately at this stage.
  3. Statements with None at either the first or second position are iterated over. For a statement with a None as the first Agent, the second Agent is examined; then the Statement with None is added to all Statement groups with a corresponding component or entity_matches_key in the second position. The same procedure is performed for Statements with None at the second Agent position.
  4. The statements within each group are then compared; if one statement represents a refinement of the other (as defined by the refinement_of() method implemented for the Statement), then the more refined statement is added to the supports field of the more general statement, and the more general statement is added to the supported_by field of the more refined statement.
  5. A new flat list of statements is created that contains only those statements that have no supports entries (statements that do have such entries are not lost, because they remain retrievable from the supported_by fields of other statements). This list is returned to the caller.
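The five steps can be sketched in plain Python. Everything here is a hypothetical toy: statements are records with a type, two agent names and an optional residue; the ISA table, component() and refines() stand in for INDRA's entity hierarchy components and refinement_of(); and step 3 (statements with None agents) is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import permutations
from typing import Dict, List, Optional, Tuple

# Hypothetical toy entity hierarchy: child -> parent ("isa").
ISA: Dict[str, str] = {'MAPK1': 'ERK', 'MAPK3': 'ERK', 'BRAF': 'RAF'}


def ancestors(name: str) -> List[str]:
    """All isa ancestors of name, nearest first."""
    chain = []
    while name in ISA:
        name = ISA[name]
        chain.append(name)
    return chain


def component(name: str) -> str:
    """Top of the isa chain, used here as the hierarchy component id."""
    chain = ancestors(name)
    return chain[-1] if chain else name


@dataclass(eq=False)
class Stmt:
    stmt_type: str
    agents: Tuple[str, str]
    residue: Optional[str] = None
    supports: List['Stmt'] = field(default_factory=list, repr=False)
    supported_by: List['Stmt'] = field(default_factory=list, repr=False)


def refines(a: Stmt, b: Stmt) -> bool:
    """Toy stand-in for refinement_of: is a strictly more specific than b?"""
    if a.stmt_type != b.stmt_type:
        return False
    # Every agent of a must equal, or be a descendant of, b's agent at the
    # same position, so all isa relations point the same way.
    agents_ok = all(x == y or y in ancestors(x)
                    for x, y in zip(a.agents, b.agents))
    residue_ok = b.residue is None or a.residue == b.residue
    differs = (a.agents, a.residue) != (b.agents, b.residue)
    return agents_ok and residue_ok and differs


def combine_related_sketch(stmts: List[Stmt]) -> List[Stmt]:
    # Steps 1-2: group by statement type and per-position component ids.
    groups = defaultdict(list)
    for s in stmts:
        key = (s.stmt_type, tuple(component(ag) for ag in s.agents))
        groups[key].append(s)
    # Step 4: compare statements only within their group.
    for group in groups.values():
        for a, b in permutations(group, 2):
            if refines(a, b):
                b.supports.append(a)      # general statement supports refined one
                a.supported_by.append(b)  # refined one is supported by general
    # Step 5: the top level holds statements with no supports entries.
    return [s for s in stmts if not s.supports]


general = Stmt('Phosphorylation', ('MAP2K1', 'ERK'))
specific = Stmt('Phosphorylation', ('MAP2K1', 'MAPK1'), residue='T')
top = combine_related_sketch([general, specific])
print([(s.agents, s.residue) for s in top])  # [(('MAP2K1', 'MAPK1'), 'T')]
```

Grouping first means the quadratic pairwise comparison only runs inside each group rather than over the whole statement set.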

On multi-core machines, the algorithm can be parallelized by setting the poolsize argument to the desired number of worker processes. This feature is only available in Python 3.4 and above.

Note

Subfamily relationships must be consistent across arguments

For now, we require that merges can only occur if the isa relationships are all in the same direction for all the Agents in a Statement. For example, the two statements RAF_family -> MEK1 and BRAF -> MEK_family would not be merged, since BRAF isa RAF_family but MEK_family is not a MEK1 (rather, MEK1 isa MEK_family). In the future this restriction could be revisited.
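The rule in the note can be checked position by position. The sketch below is hypothetical: ISA is a toy child-to-parent table containing only the families named above, and refinement_direction() reports which statement's agents are more specific, returning None when the directions disagree, as in the RAF_family -> MEK1 versus BRAF -> MEK_family example.

```python
from typing import Dict, Optional, Sequence

# Hypothetical toy isa table: child -> parent.
ISA: Dict[str, str] = {'BRAF': 'RAF_family', 'MEK1': 'MEK_family'}


def isa(child: str, parent: str) -> bool:
    """True if child is (transitively) a member of parent."""
    while child in ISA:
        child = ISA[child]
        if child == parent:
            return True
    return False


def refinement_direction(agents_a: Sequence[str],
                         agents_b: Sequence[str]) -> Optional[str]:
    """Return 'a_refines_b', 'b_refines_a', 'identical', or None.

    None means some position holds unrelated agents, or the isa relations
    point in different directions at different positions.
    """
    directions = set()
    for x, y in zip(agents_a, agents_b):
        if x == y:
            continue
        if isa(x, y):
            directions.add('a_refines_b')
        elif isa(y, x):
            directions.add('b_refines_a')
        else:
            return None  # unrelated agents at this position
    if not directions:
        return 'identical'
    return directions.pop() if len(directions) == 1 else None


# RAF_family -> MEK1 vs. BRAF -> MEK_family: mixed directions, no merge.
print(refinement_direction(['RAF_family', 'MEK1'], ['BRAF', 'MEK_family']))  # None
# BRAF -> MEK1 refines RAF_family -> MEK_family at both positions.
print(refinement_direction(['BRAF', 'MEK1'], ['RAF_family', 'MEK_family']))  # a_refines_b
```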

Parameters:
  • return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned. Default: True
  • poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
  • size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
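The dispatch rule described by poolsize and size_cutoff can be sketched on its own. split_groups() is a hypothetical helper that only decides where each group would be compared; it does not start any worker processes. Keeping small groups in the parent presumably avoids the overhead of shipping a handful of statements to another process.

```python
from typing import Dict, List, Optional, Tuple


def split_groups(groups: Dict[str, list], poolsize: Optional[int],
                 size_cutoff: int = 100) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: decide where each statement group is compared.

    Returns (keys of groups for worker processes, keys of groups compared
    in the parent process).
    """
    if poolsize is None:
        # No parallelization: every group stays in the parent process.
        return [], list(groups)
    to_workers = [k for k, g in groups.items() if len(g) >= size_cutoff]
    in_parent = [k for k, g in groups.items() if len(g) < size_cutoff]
    return to_workers, in_parent


groups = {'big': list(range(250)), 'small': list(range(3))}
print(split_groups(groups, poolsize=4))     # (['big'], ['small'])
print(split_groups(groups, poolsize=None))  # ([], ['big', 'small'])
```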
Returns:

The returned list contains Statements representing the more concrete/refined versions of the Statements involving particular entities. The attribute related_stmts is also set to this list. However, if return_toplevel is False then all statements are returned, irrespective of level of specificity. In this case the relationships between statements can be accessed via the supports/supported_by attributes.

Return type:

list of indra.statements.Statement

Examples

A more general statement with no information about a Phosphorylation site is identified as supporting a more specific statement:

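>>> from indra.preassembler import Preassembler
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> from indra.statements import Agent, Phosphorylation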
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> combined_stmts = pa.combine_related() # doctest:+ELLIPSIS
>>> combined_stmts
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> combined_stmts[0].supported_by
[Phosphorylation(BRAF(), MAP2K1())]
>>> combined_stmts[0].supported_by[0].supports
[Phosphorylation(BRAF(), MAP2K1(), S)]
find_contradicts()

Return pairs of contradicting Statements.

Returns: contradicts – A list of Statement pairs that are contradicting.
Return type: list(tuple(Statement, Statement))
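What counts as a contradiction depends on the Statement types involved, and this page does not spell it out. The sketch below therefore assumes a hypothetical OPPOSITES table (Activation vs. Inhibition, Phosphorylation vs. Dephosphorylation) and treats two statements as contradicting when they have the same agents and opposite types; ToyStmt and find_contradicts_sketch() are illustrative names, not INDRA API.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class ToyStmt(NamedTuple):
    stmt_type: str
    subj: str
    obj: str


# Hypothetical table of statement types that contradict each other.
OPPOSITES = {frozenset({'Activation', 'Inhibition'}),
             frozenset({'Phosphorylation', 'Dephosphorylation'})}


def find_contradicts_sketch(
        stmts: List[ToyStmt]) -> List[Tuple[ToyStmt, ToyStmt]]:
    """Return pairs with the same agents and opposite statement types."""
    pairs = []
    for a, b in combinations(stmts, 2):
        same_agents = (a.subj, a.obj) == (b.subj, b.obj)
        if same_agents and frozenset({a.stmt_type, b.stmt_type}) in OPPOSITES:
            pairs.append((a, b))
    return pairs


stmts = [ToyStmt('Activation', 'BRAF', 'MAP2K1'),
         ToyStmt('Inhibition', 'BRAF', 'MAP2K1'),
         ToyStmt('Activation', 'MAP2K1', 'MAPK1')]
# A single (Activation, Inhibition) pair for BRAF -> MAP2K1.
print(find_contradicts_sketch(stmts))
```

Comparing all pairs is quadratic; grouping statements by their agents first, much as combine_related() groups by hierarchy component, would reduce the number of comparisons.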
indra.preassembler.flatten_evidence(stmts, collect_from=None)

Add evidence from supporting stmts to evidence for supported stmts.

Parameters:
  • stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
  • collect_from (str in ('supports', 'supported_by')) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’.
Returns:

stmts – Statement hierarchy identical to the one passed, but with the evidence lists for each statement now containing all of the evidence associated with the statements they are supported by.

Return type:

list of indra.statements.Statement
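A minimal sketch of the default collect_from='supported_by' behavior, assuming a toy Node class in place of Statement and plain strings as evidence. collect_evidence() walks the supported_by links transitively, so evidence from supporters of supporters is included too, and it skips statements and evidence already seen; flatten_evidence_sketch() updates the top-level statements in place. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    """Toy stand-in for a Statement in a preassembled hierarchy."""
    evidence: List[str]
    supported_by: List['Node'] = field(default_factory=list, repr=False)


def collect_evidence(stmt: Node) -> List[str]:
    """Evidence of stmt plus that of every statement reachable via supported_by."""
    evidence: List[str] = []
    visited = set()
    stack = [stmt]
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue  # a supporter can be reached along several paths
        visited.add(id(node))
        for ev in node.evidence:
            if ev not in evidence:
                evidence.append(ev)
        stack.extend(node.supported_by)
    return evidence


def flatten_evidence_sketch(stmts: List[Node]) -> List[Node]:
    """Replace each top-level statement's evidence with the collected list."""
    for stmt in stmts:
        stmt.evidence = collect_evidence(stmt)
    return stmts


general = Node(['foo', 'bar'])
specific = Node(['baz', 'bak'], supported_by=[general])
flattened = flatten_evidence_sketch([specific])
print(sorted(flattened[0].evidence))  # ['bak', 'bar', 'baz', 'foo']
```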

Examples

Flattening evidence adds the two pieces of evidence from the supporting statement to the evidence list of the top-level statement:

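>>> from indra.preassembler import Preassembler, flatten_evidence
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> from indra.statements import Agent, Evidence, Phosphorylation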
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1,
... evidence=[Evidence(text='foo'), Evidence(text='bar')])
>>> st2 = Phosphorylation(braf, map2k1, residue='S',
... evidence=[Evidence(text='baz'), Evidence(text='bak')])
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related() # doctest:+ELLIPSIS
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> [e.text for e in pa.related_stmts[0].evidence] # doctest:+IGNORE_UNICODE
['baz', 'bak']
>>> flattened = flatten_evidence(pa.related_stmts)
>>> sorted([e.text for e in flattened[0].evidence]) # doctest:+IGNORE_UNICODE
['bak', 'bar', 'baz', 'foo']
indra.preassembler.flatten_stmts(stmts)

Return the full set of unique statements in a preassembled statement graph.

The flattened list of statements returned by this function can be compared to the original set of unique statements to make sure no statements have been lost during the preassembly process.
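The traversal and the sanity check mentioned above can be sketched as follows. Node and flatten_stmts_sketch() are hypothetical; a statement's name plays the role of its matches_key for de-duplication, and the walk follows supported_by links from the top-level statements.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)
class Node:
    """Toy stand-in for a Statement; name plays the role of a matches_key."""
    name: str
    supported_by: List['Node'] = field(default_factory=list, repr=False)


def flatten_stmts_sketch(stmts: List[Node]) -> List[Node]:
    """All distinct statements reachable from stmts through supported_by."""
    seen: Dict[str, Node] = {}
    stack = list(stmts)
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen[node.name] = node
            stack.extend(node.supported_by)
    return list(seen.values())


general = Node('Phosphorylation(BRAF(), MAP2K1())')
specific = Node('Phosphorylation(BRAF(), MAP2K1(), S)', supported_by=[general])
unique_stmts = [general, specific]
flattened = flatten_stmts_sketch([specific])  # only the top level is passed in
# The check described above: no statement was lost during preassembly.
print({n.name for n in flattened} == {n.name for n in unique_stmts})  # True
```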

Parameters: stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
Returns: stmts – List of all statements contained in the hierarchical statement graph.
Return type: list of indra.statements.Statement

Examples

Calling combine_related() on two statements results in one top-level statement; calling flatten_stmts() recovers both:

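>>> from indra.preassembler import Preassembler, flatten_stmts
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> from indra.statements import Agent, Phosphorylation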
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related() # doctest:+ELLIPSIS
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> flattened = flatten_stmts(pa.related_stmts)
>>> flattened.sort(key=lambda x: x.matches_key())
>>> flattened
[Phosphorylation(BRAF(), MAP2K1()), Phosphorylation(BRAF(), MAP2K1(), S)]
indra.preassembler.render_stmt_graph(statements, reduce=True, english=False, rankdir=None, agent_style=None)

Render the statement hierarchy as a pygraphviz graph.

Parameters:
  • statements (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
  • reduce (bool) – Whether to perform a transitive reduction of the edges in the graph. Default is True.
  • english (bool) – If True, the statements in the graph are represented by their English-assembled equivalent; otherwise they are represented as text-formatted Statements.
  • rankdir (str or None) – Argument to pass through to the pygraphviz AGraph constructor specifying graph layout direction. In particular, a value of ‘LR’ specifies a left-to-right direction. If None, the pygraphviz default is used.
  • agent_style (dict or None) –

    Dict of attributes specifying the visual properties of nodes. If None, the following default attributes are used:

    agent_style = {'color': 'lightgray', 'style': 'filled',
                   'fontname': 'arial'}
    
Returns:

Pygraphviz graph with nodes representing statements and edges pointing from each statement to the statements in its supported_by list.

Return type:

pygraphviz.AGraph
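The edge construction and the optional transitive reduction (reduce=True) can be sketched without pygraphviz. The Node class, stmt_edges() and transitive_reduction() below are hypothetical; edges go from each statement to the entries of its supported_by list, and the reduction drops an edge u -> w whenever w remains reachable from u through another path, which is valid for the acyclic graphs a refinement hierarchy produces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(eq=False)
class Node:
    name: str
    supported_by: List['Node'] = field(default_factory=list, repr=False)


def stmt_edges(stmts: List[Node]) -> Set[Tuple[str, str]]:
    """Edges from each statement to each entry of its supported_by list."""
    edges = set()
    seen = set()
    stack = list(stmts)
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        for sby in node.supported_by:
            edges.add((node.name, sby.name))
            stack.append(sby)
    return edges


def transitive_reduction(edges: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Drop u->w when w is still reachable from u by another path (DAG only)."""
    succ: Dict[str, Set[str]] = {}
    for u, w in edges:
        succ.setdefault(u, set()).add(w)

    def reachable(start: str, target: str, skip: Tuple[str, str]) -> bool:
        stack, seen = [start], set()
        while stack:
            u = stack.pop()
            for w in succ.get(u, ()):
                if (u, w) == skip or w in seen:
                    continue
                if w == target:
                    return True
                seen.add(w)
                stack.append(w)
        return False

    return {e for e in edges if not reachable(e[0], e[1], skip=e)}


c = Node('Phosphorylation(BRAF(), MAP2K1())')
b = Node('Phosphorylation(BRAF(), MAP2K1(), S)', supported_by=[c])
a = Node('Phosphorylation(BRAF(), MAP2K1(), S, 218)', supported_by=[b, c])
edges = stmt_edges([a])
print(len(edges), len(transitive_reduction(edges)))  # 3 2
```

The reduced edge set could then be loaded into a graph with pygraphviz's AGraph.add_edge(); a library routine such as networkx.transitive_reduction() is an alternative to the hand-written reduction.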

Examples

Pattern for getting statements and rendering as a Graphviz graph:

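>>> from indra.preassembler import Preassembler, render_stmt_graph
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> from indra.statements import Agent, Phosphorylation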
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related() # doctest:+ELLIPSIS
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> graph = render_stmt_graph(pa.related_stmts)
>>> graph.write('example_graph.dot') # To make the DOT file
>>> graph.draw('example_graph.png', prog='dot') # To make an image

Resulting graph:

[Figure: example statement graph rendered by Graphviz]