Preassembler (indra.preassembler)

class indra.preassembler.Preassembler(ontology, stmts=None, matches_fun=None, refinement_fun=None)

De-duplicates statements and arranges them in a specificity hierarchy.

Parameters
  • ontology (indra.ontology.IndraOntology) – An INDRA Ontology object.

  • stmts (list of indra.statements.Statement or None) – A list of statements on which to perform preassembly. If None, statements should be added using the add_statements() method.

  • matches_fun (Optional[function]) – A function which takes a Statement object as argument and returns a string key that is used for duplicate recognition. If supplied, it overrides the use of the built-in matches_key method of each Statement being assembled.

  • refinement_fun (Optional[function]) – A function which takes two Statement objects and an ontology as arguments and returns True or False. If supplied, it overrides the built-in refinement_of method of each Statement being assembled.

stmts

Starting set of statements for preassembly.

Type

list of indra.statements.Statement

unique_stmts

Statements resulting from combining duplicates.

Type

list of indra.statements.Statement

related_stmts

Top-level statements after building the refinement hierarchy.

Type

list of indra.statements.Statement

ontology

An INDRA Ontology object.

Type

indra.ontology.IndraOntology

add_statements(stmts)

Add to the current list of statements.

Parameters

stmts (list of indra.statements.Statement) – Statements to add to the current list.

combine_duplicate_stmts(stmts)

Combine evidence from duplicate Statements.

Statements are deemed to be duplicates if they have the same key returned by the matches_key() method of the Statement class (or by matches_fun, if one was supplied to the Preassembler). This generally means that statements must be identical in terms of their arguments and can differ only in their associated Evidence objects.

This function keeps the first instance of each set of duplicate statements and merges the lists of Evidence from all of the other statements.
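The keep-first, merge-evidence behavior can be illustrated with a minimal, self-contained sketch. It uses a hypothetical ToyStmt class and a hypothetical toy_matches_key function in place of real INDRA Statements and matches_key(); it shows the idea only and is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyStmt:
    """Hypothetical stand-in for an INDRA Statement (not the real class)."""
    subj: str
    obj: str
    residue: Optional[str] = None
    evidence: List[str] = field(default_factory=list)


def toy_matches_key(stmt: ToyStmt) -> str:
    # Hypothetical key: every field except the evidence list.
    return f"{stmt.subj}|{stmt.obj}|{stmt.residue}"


def combine_duplicates(stmts: List[ToyStmt],
                       key_fun: Callable[[ToyStmt], str] = toy_matches_key
                       ) -> List[ToyStmt]:
    """Keep the first statement for each key; append the others' evidence."""
    first_by_key: Dict[str, ToyStmt] = {}
    for stmt in stmts:
        key = key_fun(stmt)
        if key not in first_by_key:
            # Store a copy so the caller's first instance is not mutated.
            first_by_key[key] = ToyStmt(stmt.subj, stmt.obj, stmt.residue,
                                        list(stmt.evidence))
        else:
            first_by_key[key].evidence.extend(stmt.evidence)
    # dicts preserve insertion order, so output follows first appearance.
    return list(first_by_key.values())


stmts = [ToyStmt('MAP2K1', 'MAPK1', 'T185', ['evidence 1']),
         ToyStmt('MAP2K1', 'MAPK1', 'T185', ['evidence 2'])]
uniq = combine_duplicates(stmts)
print(len(uniq), uniq[0].evidence)  # 1 ['evidence 1', 'evidence 2']
```

Copying the first instance keeps the caller's input unchanged; whether the real method copies or mutates its input is not stated in this documentation.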

Parameters

stmts (list of indra.statements.Statement) – Set of statements to de-duplicate.

Returns

Unique statements with accumulated evidence across duplicates.

Return type

list of indra.statements.Statement

Examples

De-duplicate and combine evidence for two statements differing only in their evidence lists:

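>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Evidence, Phosphorylation
>>> from indra.preassembler import Preassembler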
>>> map2k1 = Agent('MAP2K1')
>>> mapk1 = Agent('MAPK1')
>>> stmt1 = Phosphorylation(map2k1, mapk1, 'T', '185',
... evidence=[Evidence(text='evidence 1')])
>>> stmt2 = Phosphorylation(map2k1, mapk1, 'T', '185',
... evidence=[Evidence(text='evidence 2')])
>>> pa = Preassembler(bio_ontology)
>>> uniq_stmts = pa.combine_duplicate_stmts([stmt1, stmt2])
>>> uniq_stmts
[Phosphorylation(MAP2K1(), MAPK1(), T, 185)]
>>> sorted([e.text for e in uniq_stmts[0].evidence])
['evidence 1', 'evidence 2']

combine_duplicates()

Combine duplicates among stmts and save result in unique_stmts.

A wrapper around the method combine_duplicate_stmts().

combine_related(return_toplevel=True, ...)

Connect related statements based on their refinement relationships.

This function takes as a starting point the unique statements (with duplicates removed) and returns a modified flat list of statements containing only those statements that are not refined by any other statement. In other words, the more general versions of a given statement do not appear at the top level, but are instead listed in the supported_by field of the top-level statements.

If unique_stmts has not been initialized with the de-duplicated statements, combine_duplicates() is called internally.

After this function is called the attribute related_stmts is set as a side-effect.

The procedure for combining statements in this way involves a series of steps:

  1. The statements are subjected to (built-in or user-supplied) filters that group them based on potential refinement relationships. For instance, the ontology-based filter positions each statement within the ontology based on its agent arguments, and determines potential refinements based on paths in the ontology graph.

  2. Each statement is then compared with the set of statements it can potentially refine, as determined by the pre-filters. If the statement represents a refinement of the other (as defined by the refinement_of() method implemented for the Statement), then the more refined statement is added to the supports field of the more general statement, and the more general statement is added to the supported_by field of the more refined statement.

  3. A new flat list of statements is created that contains only those statements that have no supports entries. Statements with supports entries are left out of this list but are not lost, because they remain retrievable from the supported_by fields of other statements. This list is returned to the caller.
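A toy version of the three steps above, assuming a hypothetical ToyStmt class whose refinement_of() treats a statement with a residue as refining an otherwise identical one without. It skips pre-filtering and compares all ordered pairs, so it illustrates the linking and the top-level selection, not INDRA's filtered implementation.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import List, Optional


@dataclass(eq=False)  # identity equality: each object is one graph node
class ToyStmt:
    """Hypothetical Phosphorylation-like statement (not INDRA's class)."""
    enz: str
    sub: str
    residue: Optional[str] = None
    supports: List["ToyStmt"] = field(default_factory=list, repr=False)
    supported_by: List["ToyStmt"] = field(default_factory=list, repr=False)

    def refinement_of(self, other: "ToyStmt") -> bool:
        # Hypothetical rule: same agents, and self specifies a residue
        # that other leaves open.
        return (self.enz == other.enz and self.sub == other.sub
                and other.residue is None and self.residue is not None)


def combine_related(stmts: List[ToyStmt]) -> List[ToyStmt]:
    # Step 1 (pre-filtering) is skipped: every ordered pair is compared,
    # which is exactly the n-by-n cost the real filters are meant to avoid.
    for specific, general in permutations(stmts, 2):
        # Step 2: link the pair in both directions.
        if specific.refinement_of(general):
            general.supports.append(specific)
            specific.supported_by.append(general)
    # Step 3: the top level holds statements with no supports entries.
    return [s for s in stmts if not s.supports]


general = ToyStmt('BRAF', 'MAP2K1')
specific = ToyStmt('BRAF', 'MAP2K1', 'S')
top = combine_related([general, specific])
print(top)                  # [ToyStmt(enz='BRAF', sub='MAP2K1', residue='S')]
print(top[0].supported_by)  # [ToyStmt(enz='BRAF', sub='MAP2K1', residue=None)]
```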

Note

Subfamily relationships must be consistent across arguments

For now, merges can occur only if the isa relationships point in the same direction for all the agents in a Statement. For example, the two statements RAF_family -> MEK1 and BRAF -> MEK_family would not be merged: BRAF isa RAF_family, but MEK_family is not a MEK1 (rather, MEK1 isa MEK_family). In the future, this restriction could be revisited.
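The direction requirement in this note can be expressed as a position-by-position check: a pair is acceptable only if each agent of the more specific statement isa (or is identical to) the corresponding agent of the more general one. The hand-written ISA table and the helper names below are hypothetical stand-ins for the ontology lookups INDRA performs.

```python
from typing import Dict, Optional, Sequence

# Hypothetical isa edges (child -> parent), hand-written for illustration;
# a real check would consult the INDRA ontology instead.
ISA: Dict[str, str] = {'BRAF': 'RAF_family', 'MEK1': 'MEK_family'}


def isa_or_same(child: str, parent: str) -> bool:
    """True if child equals parent or reaches it by following isa edges."""
    node: Optional[str] = child
    while node is not None:
        if node == parent:
            return True
        node = ISA.get(node)
    return False


def consistent_refinement(specific: Sequence[str],
                          general: Sequence[str]) -> bool:
    """True only if every agent of `specific` isa (or equals) the agent at
    the same position in `general`, i.e. all isa links point one way.
    Assumes both statements have the same number of agents."""
    return all(isa_or_same(s, g) for s, g in zip(specific, general))


print(consistent_refinement(['BRAF', 'MEK1'], ['RAF_family', 'MEK_family']))  # True
print(consistent_refinement(['BRAF', 'MEK_family'], ['RAF_family', 'MEK1']))  # False
```

The mixed pair fails in both directions, which is why neither statement is placed above the other.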

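Parameters

return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned. Default: True

Returns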

The returned list contains Statements representing the more concrete/refined versions of the Statements involving particular entities. The attribute related_stmts is also set to this list. However, if return_toplevel is False then all statements are returned, irrespective of level of specificity. In this case the relationships between statements can be accessed via the supports/supported_by attributes.

Return type

list of indra.statements.Statement

Examples

A more general statement with no information about a Phosphorylation site is identified as supporting a more specific statement:

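>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler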
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> combined_stmts = pa.combine_related() 
>>> combined_stmts
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> combined_stmts[0].supported_by
[Phosphorylation(BRAF(), MAP2K1())]
>>> combined_stmts[0].supported_by[0].supports
[Phosphorylation(BRAF(), MAP2K1(), S)]

find_contradicts()

Return pairs of contradicting Statements.

Returns

contradicts – A list of Statement pairs that are contradicting.

Return type

list(tuple(Statement, Statement))

normalize_equivalences(ns, rank_key=None)

Normalize to one of a set of equivalent concepts across statements.

This function changes Statements in place without returning a value.

Parameters
  • ns (str) – The db_refs namespace for which the equivalence relation should be applied.

  • rank_key (Optional[function]) – A function handle which assigns a sort key to each entry in the given namespace to allow prioritizing in a controlled way which concept is normalized to.
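One way to picture normalize_equivalences is to map every member of an equivalence class to a single representative chosen via rank_key. In this sketch, the function signature, the representation of agents as plain db_refs dicts, the explicit equivalence_classes argument, and the rule that the smallest key wins are all assumptions; the real method works on Statements and presumably derives equivalences from the ontology.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Set


def normalize_equivalences(db_refs_list: List[Dict[str, str]], ns: str,
                           equivalence_classes: Iterable[Set[str]],
                           rank_key: Optional[Callable[[str], Any]] = None
                           ) -> None:
    """Rewrite db_refs[ns] in place so each equivalence class uses one entry.

    Hypothetical rule: the entry with the smallest rank_key wins (plain
    string order when rank_key is None); ties fall back to string order.
    """
    def sort_key(entry: str):
        return (rank_key(entry), entry) if rank_key is not None else entry

    representative: Dict[str, str] = {}
    for eq_class in equivalence_classes:
        chosen = min(eq_class, key=sort_key)
        for entry in eq_class:
            representative[entry] = chosen
    for db_refs in db_refs_list:
        if db_refs.get(ns) in representative:
            db_refs[ns] = representative[db_refs[ns]]


refs = [{'ONT': 'flooding'}, {'ONT': 'flood'}, {'TEXT': 'rain'}]
normalize_equivalences(refs, 'ONT', [{'flood', 'flooding'}], rank_key=len)
print(refs)  # [{'ONT': 'flood'}, {'ONT': 'flood'}, {'TEXT': 'rain'}]
```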

normalize_opposites(ns, rank_key=None)

Normalize to one of a pair of opposite concepts across statements.

This function changes Statements in place without returning a value.

Parameters
  • ns (str) – The db_refs namespace for which the opposite relation should be applied.

  • rank_key (Optional[function]) – A function handle which assigns a sort key to each entry in the given namespace to allow prioritizing in a controlled way which concept is normalized to.

indra.preassembler.find_refinements_for_statement(stmt, filters)

Return refinements for a single statement given initialized filters.

Parameters
  • stmt (indra.statements.Statement) – The statement whose relations should be found.

  • filters (list[indra.preassembler.refinement.RefinementFilter]) – A list of refinement filter instances. The filters passed to this function need to have been initialized with stmts_by_hash.

Returns

A set of statement hashes that this statement refines.

Return type

set
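The filter chaining implied by the possibly_related argument can be sketched as follows: the first filter starts from all statements, and each later filter only narrows the set it receives. PredicateFilter and toy_find_refinements are hypothetical names, statements are plain tuples keyed by made-up integer hashes, and the last filter plays the role of a refinement-confirmation step with a hypothetical residue rule.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical statement shape: (type, enzyme, substrate, residue).
Stmt = Tuple[str, str, str, Optional[str]]


class PredicateFilter:
    """Hypothetical filter that keeps candidates passing a predicate."""

    def __init__(self, keep: Callable[[Stmt, Stmt], bool]):
        self.keep = keep
        self.stmts_by_hash: Dict[int, Stmt] = {}

    def initialize(self, stmts_by_hash: Dict[int, Stmt]) -> None:
        self.stmts_by_hash = stmts_by_hash

    def get_less_specifics(self, stmt: Stmt,
                           possibly_related: Optional[Set[int]] = None
                           ) -> Set[int]:
        # None means no earlier filter ran: start from every statement.
        pool = (set(self.stmts_by_hash) if possibly_related is None
                else possibly_related)
        # A filter may only narrow the pool handed to it.
        return {h for h in pool if self.keep(stmt, self.stmts_by_hash[h])}


def toy_find_refinements(stmt: Stmt, filters) -> Set[int]:
    """Chain the filters, feeding each one the previous result."""
    possibly_related: Optional[Set[int]] = None
    for f in filters:
        possibly_related = f.get_less_specifics(stmt, possibly_related)
    return possibly_related if possibly_related is not None else set()


same_type = PredicateFilter(lambda s, o: s[0] == o[0])
confirm = PredicateFilter(lambda s, o: s[1:3] == o[1:3]
                          and o[3] is None and s[3] is not None)
stmts = {1: ('Phosphorylation', 'BRAF', 'MAP2K1', None),
         2: ('Phosphorylation', 'BRAF', 'MAP2K1', 'S'),
         3: ('Activation', 'BRAF', 'MAP2K1', None)}
for f in (same_type, confirm):
    f.initialize(stmts)
print(toy_find_refinements(stmts[2], [same_type, confirm]))  # {1}
```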

indra.preassembler.flatten_evidence(stmts, collect_from=None)

Add evidence from supporting stmts to evidence for supported stmts.

Parameters
  • stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().

  • collect_from (str in ('supports', 'supported_by')) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’.

Returns

stmts – Statement hierarchy identical to the one passed, but with the evidence lists for each statement now containing all of the evidence associated with the statements they are supported by.

Return type

list of indra.statements.Statement
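A sketch of the collection step for collect_from='supported_by': walk the supported_by links from a top-level statement and gather evidence, visiting each statement once even if it is reachable along several paths. ToyStmt and collect_evidence are hypothetical; the sketch returns the merged list rather than rewriting the statements, and walking supports instead would correspond to collect_from='supports'.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class ToyStmt:
    """Hypothetical statement node (not INDRA's class)."""
    name: str
    evidence: List[str] = field(default_factory=list)
    supported_by: List["ToyStmt"] = field(default_factory=list, repr=False)


def collect_evidence(stmt: ToyStmt) -> List[str]:
    """Evidence of stmt plus all evidence reachable through supported_by."""
    seen = set()
    collected: List[str] = []
    stack = [stmt]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue  # a shared ancestor in the DAG is counted only once
        seen.add(id(current))
        collected.extend(current.evidence)
        stack.extend(current.supported_by)
    return collected


general = ToyStmt('BRAF -> MAP2K1', ['foo', 'bar'])
specific = ToyStmt('BRAF -> MAP2K1 @S', ['baz', 'bak'], [general])
print(sorted(collect_evidence(specific)))  # ['bak', 'bar', 'baz', 'foo']
```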

Examples

Flattening evidence adds the two pieces of evidence from the supporting statement to the evidence list of the top-level statement:

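>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Evidence, Phosphorylation
>>> from indra.preassembler import Preassembler, flatten_evidence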
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1,
... evidence=[Evidence(text='foo'), Evidence(text='bar')])
>>> st2 = Phosphorylation(braf, map2k1, residue='S',
... evidence=[Evidence(text='baz'), Evidence(text='bak')])
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> pa.combine_related() 
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> [e.text for e in pa.related_stmts[0].evidence]
['baz', 'bak']
>>> flattened = flatten_evidence(pa.related_stmts)
>>> sorted([e.text for e in flattened[0].evidence])
['bak', 'bar', 'baz', 'foo']

indra.preassembler.flatten_stmts(stmts)

Return the full set of unique statements in a pre-assembled statement graph.

The flattened list of statements returned by this function can be compared to the original set of unique statements to make sure no statements have been lost during the preassembly process.

Parameters

stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().

Returns

stmts – List of all statements contained in the hierarchical statement graph.

Return type

list of indra.statements.Statement

Examples

Calling combine_related() on two statements results in one top-level statement; calling flatten_stmts() recovers both:

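>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler, flatten_stmts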
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> pa.combine_related() 
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> flattened = flatten_stmts(pa.related_stmts)
>>> flattened.sort(key=lambda x: x.matches_key())
>>> flattened
[Phosphorylation(BRAF(), MAP2K1()), Phosphorylation(BRAF(), MAP2K1(), S)]

indra.preassembler.render_stmt_graph(statements, reduce=True, english=False, rankdir=None, agent_style=None)

Render the statement hierarchy as a pygraphviz graph.

Parameters
  • statements (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().

  • reduce (bool) – Whether to perform a transitive reduction of the edges in the graph. Default is True.

  • english (bool) – If True, the statements in the graph are represented by their English-assembled equivalent; otherwise they are represented as text-formatted Statements.

  • rankdir (str or None) – Argument to pass through to the pygraphviz AGraph constructor specifying graph layout direction. In particular, a value of ‘LR’ specifies a left-to-right direction. If None, the pygraphviz default is used.

  • agent_style (dict or None) –

    Dict of attributes specifying the visual properties of nodes. If None, the following default attributes are used:

    agent_style = {'color': 'lightgray', 'style': 'filled',
                   'fontname': 'arial'}
    

Returns

Pygraphviz graph with nodes representing statements and edges pointing from supported statements to supported_by statements.

Return type

pygraphviz.AGraph
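The reduce option refers to transitive reduction: an edge is dropped when a longer path already connects the same two statements. A minimal pure-Python sketch on a DAG of string node labels follows; transitive_reduction is a hypothetical helper, and render_stmt_graph may compute the reduction differently.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def transitive_reduction(edges: Set[Edge]) -> Set[Edge]:
    """Drop each edge u->v that a longer path u->...->v already implies.

    Assumes a DAG, as a refinement hierarchy should be.
    """
    succ: Dict[str, Set[str]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable_indirectly(u: str, target: str) -> bool:
        # Depth-first search that starts from u's other successors.
        stack = [w for w in succ.get(u, ()) if w != target]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(u, v) for u, v in edges if not reachable_indirectly(u, v)}


# specific -> mid -> general, plus a redundant specific -> general edge
edges = {('specific', 'mid'), ('mid', 'general'), ('specific', 'general')}
print(sorted(transitive_reduction(edges)))
# [('mid', 'general'), ('specific', 'mid')]
```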

Examples

Pattern for getting statements and rendering as a Graphviz graph:

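>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler, render_stmt_graph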
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> pa.combine_related() 
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> graph = render_stmt_graph(pa.related_stmts)
>>> graph.write('example_graph.dot') # To make the DOT file
>>> graph.draw('example_graph.png', prog='dot') # To make an image

[Figure: resulting example statement graph rendered by Graphviz]

Refinement filter classes and functions (indra.preassembler.refinement)

This module implements classes and functions that are used for finding refinements between INDRA Statements as part of the knowledge-assembly process. These are imported by the preassembler module.

class indra.preassembler.refinement.OntologyRefinementFilter(ontology)

This filter uses an ontology to position statements and their agents, significantly narrowing down the set of possible relations for a given statement.

Parameters

ontology (indra.ontology.OntologyGraph) – An INDRA ontology graph.

extend(stmts_by_hash)

Extend the initial data structures with a set of new statements.

Parameters

stmts_by_hash (dict[int, indra.statements.Statement]) – A dict of statements keyed by their hashes.

get_related(stmt, possibly_related=None, direction='less_specific')

Return a set of statement hashes that a given statement is potentially related to.

Parameters
  • stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.

  • possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.

  • direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.

Returns

A set of INDRA Statement hashes that are potentially related to the given statement.

Return type

set of int

initialize(stmts_by_hash)

Initialize the filter class with a set of statements.

The filter can build up some useful data structures in this function before being applied to any specific statements.

Parameters

stmts_by_hash (dict[int, indra.statements.Statement]) – A dict of statements keyed by their hashes.

class indra.preassembler.refinement.RefinementConfirmationFilter(ontology, refinement_fun=None)

This class runs the refinement function between potentially related statements to confirm whether they are indeed, conclusively, in a refinement relationship with each other.

In this sense, this isn’t a real filter, though implementing it as one is convenient. This filter is meant to be used as the final component in a series of pre-filters.

get_related(stmt, possibly_related=None, direction='less_specific')

Return a set of statement hashes that a given statement is potentially related to.

Parameters
  • stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.

  • possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.

  • direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.

Returns

A set of INDRA Statement hashes that are potentially related to the given statement.

Return type

set of int

class indra.preassembler.refinement.RefinementFilter

A filter which is applied to one or more statements to eliminate candidate refinements that are not possible according to some criteria. By applying a series of such filters, the preassembler can avoid doing n-by-n comparisons to determine refinements among n statements.

The filter class can take any number of constructor arguments that it needs to perform its task. The base class’ constructor initializes a shared_data attribute as an empty dict.

It also needs to implement an initialize function which is called with a stmts_by_hash argument, containing a dict of statements keyed by hash. This function can build any data structures that may be needed to efficiently apply the filter later. It can store any such data structures in the shared_data dict to be accessed by other functions later.

Finally, the class needs to implement a get_related function, which takes a single INDRA Statement as input to return the hashes of potentially related other statements that the filter was initialized with. The function also needs to take a possibly_related argument which is either None (no other filter was run before) or a set, which is the superset of possible relations as determined by some other previously applied filter.
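A self-contained sketch of a filter that follows the protocol described above: a shared_data dict set up in the constructor, an initialize() that builds an index, and a get_related() that respects possibly_related and direction. AgentPairFilter and its residue-based specificity rule are hypothetical, and the class does not subclass RefinementFilter only so that it runs without INDRA installed.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical statement shape: (enzyme, substrate, residue).
Stmt = Tuple[str, str, Optional[str]]


class AgentPairFilter:
    """Hypothetical filter following the documented filter protocol."""

    def __init__(self):
        self.shared_data: dict = {}

    def initialize(self, stmts_by_hash: Dict[int, Stmt]) -> None:
        self.shared_data['stmts_by_hash'] = {}
        self.shared_data['by_pair'] = {}  # (enzyme, substrate) -> hashes
        self.extend(stmts_by_hash)

    def extend(self, stmts_by_hash: Dict[int, Stmt]) -> None:
        self.shared_data['stmts_by_hash'].update(stmts_by_hash)
        for h, stmt in stmts_by_hash.items():
            self.shared_data['by_pair'].setdefault(stmt[:2], set()).add(h)

    def get_related(self, stmt: Stmt,
                    possibly_related: Optional[Set[int]] = None,
                    direction: str = 'less_specific') -> Set[int]:
        stmts = self.shared_data['stmts_by_hash']
        candidates = set(self.shared_data['by_pair'].get(stmt[:2], ()))
        if possibly_related is not None:
            candidates &= possibly_related  # never widen an earlier result
        if direction == 'less_specific':
            # Only a statement with a residue can refine one without.
            if stmt[2] is None:
                return set()
            return {h for h in candidates if stmts[h][2] is None}
        if direction == 'more_specific':
            if stmt[2] is not None:
                return set()
            return {h for h in candidates if stmts[h][2] is not None}
        raise ValueError(f'Unknown direction: {direction}')

    def get_less_specifics(self, stmt, possibly_related=None):
        return self.get_related(stmt, possibly_related, 'less_specific')

    def get_more_specifics(self, stmt, possibly_related=None):
        return self.get_related(stmt, possibly_related, 'more_specific')


stmts = {1: ('BRAF', 'MAP2K1', None), 2: ('BRAF', 'MAP2K1', 'S'),
         3: ('BRAF', 'MAPK1', None)}
flt = AgentPairFilter()
flt.initialize(stmts)
print(flt.get_less_specifics(stmts[2]))  # {1}
print(flt.get_more_specifics(stmts[1]))  # {2}
```

Building the index once in initialize() is what lets each lookup avoid scanning the whole corpus.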

extend(stmts_by_hash)

Extend the initial data structures with a set of new statements.

Parameters

stmts_by_hash (dict[int, indra.statements.Statement]) – A dict of statements keyed by their hashes.

get_less_specifics(stmt, possibly_related=None)

Return a set of hashes of statements that are potentially related and less specific than the given statement.

get_more_specifics(stmt, possibly_related=None)

Return a set of hashes of statements that are potentially related and more specific than the given statement.

get_related(stmt, possibly_related=None, direction='less_specific')

Return a set of statement hashes that a given statement is potentially related to.

Parameters
  • stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.

  • possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.

  • direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.

Returns

A set of INDRA Statement hashes that are potentially related to the given statement.

Return type

set of int

initialize(stmts_by_hash)

Initialize the filter class with a set of statements.

The filter can build up some useful data structures in this function before being applied to any specific statements.

Parameters

stmts_by_hash (dict[int, indra.statements.Statement]) – A dict of statements keyed by their hashes.

class indra.preassembler.refinement.SplitGroupFilter(split_groups)

This filter implements splitting statements into two groups and only considering refinement relationships between the groups but not within them.
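A sketch of the split-group idea, assuming split_groups maps each statement hash to a group label and that every statement is assigned to a group; the structure the real constructor expects is not documented here. ToyStmt and ToySplitGroupFilter are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional, Set


class ToyStmt(NamedTuple):
    """Hypothetical statement carrying its own hash."""
    stmt_hash: int
    text: str


class ToySplitGroupFilter:
    """Hypothetical sketch: relate statements only across groups."""

    def __init__(self, split_groups: Dict[int, str]):
        # Assumed shape: statement hash -> group label.
        self.split_groups = split_groups
        self.all_hashes: Set[int] = set()

    def initialize(self, stmts_by_hash: Dict[int, ToyStmt]) -> None:
        self.all_hashes = set(stmts_by_hash)

    def get_related(self, stmt: ToyStmt,
                    possibly_related: Optional[Set[int]] = None,
                    direction: str = 'less_specific') -> Set[int]:
        pool = self.all_hashes if possibly_related is None else possibly_related
        own_group = self.split_groups[stmt.stmt_hash]
        # The split is symmetric, so direction does not change the result.
        return {h for h in pool if self.split_groups[h] != own_group}


stmts = {1: ToyStmt(1, 'from reading'), 2: ToyStmt(2, 'from reading'),
         3: ToyStmt(3, 'curated')}
sg = ToySplitGroupFilter({1: 'reading', 2: 'reading', 3: 'curated'})
sg.initialize(stmts)
print(sg.get_related(stmts[1]))  # {3}
print(sg.get_related(stmts[3]))  # {1, 2}
```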

get_related(stmt, possibly_related=None, direction='less_specific')

Return a set of statement hashes that a given statement is potentially related to.

Parameters
  • stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.

  • possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.

  • direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.

Returns

A set of INDRA Statement hashes that are potentially related to the given statement.

Return type

set of int

indra.preassembler.refinement.get_agent_key(agent)

Return a key for an Agent for use in refinement finding.

Parameters

agent (indra.statements.Agent or None) – An INDRA Agent whose key should be returned.

Returns

The key that maps the given agent to the ontology, with special handling for ungrounded and None Agents.

Return type

tuple or None

indra.preassembler.refinement.get_relevant_keys(agent_key, all_keys_for_role, ontology, direction)

Return relevant agent keys for an agent key for refinement finding.

Parameters
  • agent_key (tuple or None) – An agent key of interest.

  • all_keys_for_role (set) – The set of all agent keys in a given statement corpus with a role matching that of the given agent_key.

  • ontology (indra.ontology.IndraOntology) – An IndraOntology instance with respect to which relevant other agent keys are found for the purposes of refinement.

  • direction (str) – The direction in which to find relevant agents. The two options are ‘less_specific’ and ‘more_specific’ for agents that are less and more specific, per the ontology, respectively.

Returns

The set of relevant agent keys which this given agent key can possibly refine.

Return type

set
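A sketch of the relevant-key lookup against a hypothetical hand-written parent map instead of an IndraOntology. For 'less_specific' it collects ontology ancestors, for 'more_specific' descendants; it keeps the agent's own key (an assumption, since two statements about the same agent can still refine each other through other fields) and intersects the result with all_keys_for_role. The special handling of None and ungrounded keys is not modeled, and relevant_keys is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # hypothetical (namespace, identifier) agent key


def _reachable(start: Key, edges: Dict[Key, Set[Key]]) -> Set[Key]:
    """All nodes reachable from start by following edges."""
    seen: Set[Key] = set()
    stack = list(edges.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def relevant_keys(agent_key: Key, all_keys_for_role: Set[Key],
                  parents: Dict[Key, Set[Key]], direction: str) -> Set[Key]:
    if direction == 'less_specific':
        related = _reachable(agent_key, parents)  # ancestors
    elif direction == 'more_specific':
        children: Dict[Key, Set[Key]] = {}
        for child, child_parents in parents.items():
            for parent in child_parents:
                children.setdefault(parent, set()).add(child)
        related = _reachable(agent_key, children)  # descendants
    else:
        raise ValueError(f'Unknown direction: {direction}')
    # Keep the agent's own key and restrict to keys present in the corpus.
    return (related | {agent_key}) & all_keys_for_role


parents = {('HGNC', 'BRAF'): {('FPLX', 'RAF')},
           ('HGNC', 'RAF1'): {('FPLX', 'RAF')}}
corpus = {('HGNC', 'BRAF'), ('FPLX', 'RAF'), ('HGNC', 'MAP2K1')}
print(relevant_keys(('HGNC', 'BRAF'), corpus, parents, 'less_specific'))
```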

Custom preassembly functions (indra.preassembler.custom_preassembly)

This module contains a library of functions that are useful for building custom preassembly logic for some applications. They are typically used as matches_fun or refinement_fun arguments to the Preassembler and other modules.

indra.preassembler.custom_preassembly.agent_grounding_matches(agent)

Return an Agent matches key just based on grounding, not state.

indra.preassembler.custom_preassembly.agent_name_matches(agent)

Return a sorted, normalized bag of words as the name.

indra.preassembler.custom_preassembly.agent_name_stmt_type_matches(stmt)

Return a matches key based on the statement type and the normalized agent names.

indra.preassembler.custom_preassembly.agents_stmt_type_matches(stmt)

Return a matches key just based on Agent grounding and Stmt type.
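A sketch of a name-based matches key in the spirit of agent_name_matches and agent_name_stmt_type_matches. The normalization shown (lowercasing, splitting on non-alphanumeric characters, sorting the words) and the key format are assumptions, not necessarily what INDRA does; name_bag_of_words and stmt_type_name_key are hypothetical names.

```python
import re
from typing import Sequence


def name_bag_of_words(name: str) -> str:
    """Lowercase, split on non-alphanumeric runs, sort, and rejoin."""
    words = re.findall(r'[a-z0-9]+', name.lower())
    return ' '.join(sorted(words))


def stmt_type_name_key(stmt_type: str, agent_names: Sequence[str]) -> str:
    """Hypothetical matches key: statement type plus normalized names.

    Agent order is preserved, so swapping subject and object changes the key.
    """
    return str((stmt_type, tuple(name_bag_of_words(n) for n in agent_names)))


k1 = stmt_type_name_key('Influence', ['Food insecurity', 'conflict'])
k2 = stmt_type_name_key('Influence', ['insecurity, FOOD', 'Conflict'])
print(k1 == k2)  # True
```

Because the key is a plain string, a function like this could be passed wherever a matches_fun is accepted, so that differently worded names collapse into one statement during duplicate combination.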