Preassembler (indra.preassembler)
- class indra.preassembler.Preassembler(ontology, stmts=None, matches_fun=None, refinement_fun=None)[source]
De-duplicates statements and arranges them in a specificity hierarchy.
- Parameters
ontology (indra.ontology.IndraOntology) – An INDRA Ontology object.
stmts (list of indra.statements.Statement or None) – A set of statements to perform pre-assembly on. If None, statements should be added using the add_statements() method.
matches_fun (Optional[function]) – A function which takes a Statement object as argument and returns a string key that is used for duplicate recognition. If supplied, it overrides the use of the built-in matches_key method of each Statement being assembled.
refinement_fun (Optional[function]) – A function which takes two Statement objects and an ontology as arguments and returns True or False. If supplied, it overrides the built-in refinement_of method of each Statement being assembled.
- stmts
Starting set of statements for preassembly.
- Type
list of indra.statements.Statement
- unique_stmts
Statements resulting from combining duplicates.
- Type
list of indra.statements.Statement
- related_stmts
Top-level statements after building the refinement hierarchy.
- Type
list of indra.statements.Statement
- ontology
An INDRA Ontology object.
- Type
dict[indra.preassembler.ontology_graph.IndraOntology]
- add_statements(stmts)[source]
Add to the current list of statements.
- Parameters
stmts (list of indra.statements.Statement) – Statements to add to the current list.
- combine_duplicate_stmts(stmts)[source]
Combine evidence from duplicate Statements.
Statements are deemed to be duplicates if they have the same key returned by the matches_key() method of the Statement class. This generally means that statements must be identical in terms of their arguments and can differ only in their associated Evidence objects.
This function keeps the first instance of each set of duplicate statements and merges the lists of Evidence from all of the other statements.
- Parameters
stmts (list of indra.statements.Statement) – Set of statements to de-duplicate.
- Returns
Unique statements with accumulated evidence across duplicates.
- Return type
list of indra.statements.Statement
Examples
De-duplicate and combine evidence for two statements differing only in their evidence lists:
>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Evidence, Phosphorylation
>>> from indra.preassembler import Preassembler
>>> map2k1 = Agent('MAP2K1')
>>> mapk1 = Agent('MAPK1')
>>> stmt1 = Phosphorylation(map2k1, mapk1, 'T', '185',
...     evidence=[Evidence(text='evidence 1')])
>>> stmt2 = Phosphorylation(map2k1, mapk1, 'T', '185',
...     evidence=[Evidence(text='evidence 2')])
>>> pa = Preassembler(bio_ontology)
>>> uniq_stmts = pa.combine_duplicate_stmts([stmt1, stmt2])
>>> uniq_stmts
[Phosphorylation(MAP2K1(), MAPK1(), T, 185)]
>>> sorted([e.text for e in uniq_stmts[0].evidence])
['evidence 1', 'evidence 2']
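The key-based de-duplication described for combine_duplicate_stmts() can be illustrated without INDRA installed. The following is a minimal sketch, not the library's implementation: ToyStatement and combine_duplicates_sketch are hypothetical names, and the toy matches_key() simply returns a tuple of the non-evidence fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyStatement:
    """Hypothetical stand-in for an INDRA Statement (not the real class)."""
    enzyme: str
    substrate: str
    residue: Optional[str] = None
    evidence: list = field(default_factory=list)

    def matches_key(self):
        # Everything except the evidence contributes to the key.
        return (self.enzyme, self.substrate, self.residue)


def combine_duplicates_sketch(stmts, matches_fun=None):
    """Keep the first statement per key and pool evidence from the rest."""
    key_of = matches_fun or (lambda s: s.matches_key())
    unique = {}
    for stmt in stmts:
        key = key_of(stmt)
        if key not in unique:
            # First instance: copy so the input objects are left untouched.
            unique[key] = ToyStatement(stmt.enzyme, stmt.substrate,
                                       stmt.residue, list(stmt.evidence))
        else:
            unique[key].evidence.extend(stmt.evidence)
    return list(unique.values())


s1 = ToyStatement('MAP2K1', 'MAPK1', 'T', ['evidence 1'])
s2 = ToyStatement('MAP2K1', 'MAPK1', 'T', ['evidence 2'])
s3 = ToyStatement('BRAF', 'MAP2K1', None, ['evidence 3'])
uniq = combine_duplicates_sketch([s1, s2, s3])
print([(s.enzyme, s.evidence) for s in uniq])
# [('MAP2K1', ['evidence 1', 'evidence 2']), ('BRAF', ['evidence 3'])]
```

Passing a different matches_fun to the sketch changes what counts as a duplicate, which mirrors the role of the matches_fun constructor argument described above.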
- combine_duplicates()[source]
Combine duplicates among stmts and save result in unique_stmts.
A wrapper around the method combine_duplicate_stmts().
- combine_related(return_toplevel=True, filters=None)[source]
Connect related statements based on their refinement relationships.
This function takes as a starting point the unique statements (with duplicates removed) and returns a modified flat list of statements containing only those statements which are not refined by any other existing statement. In other words, the more general versions of a given statement do not appear at the top level, but instead are listed in the supported_by field of the top-level statements.
If unique_stmts has not been initialized with the de-duplicated statements, combine_duplicates() is called internally.
After this function is called the attribute related_stmts is set as a side-effect.
The procedure for combining statements in this way involves a series of steps:
1. The statements are subjected to (built-in or user-supplied) filters that group them based on potential refinement relationships. For instance, the ontology-based filter positions each statement, based on its agent arguments, within the ontology, and determines potential refinements based on paths in the ontology graph.
2. Each statement is then compared with the set of statements it can potentially refine, as determined by the pre-filters. If the statement represents a refinement of the other (as defined by the refinement_of() method implemented for the Statement), then the more refined statement is added to the supports field of the more general statement, and the more general statement is added to the supported_by field of the more refined statement.
3. A new flat list of statements is created that contains only those statements that have no supports entries (statements containing such entries are not eliminated, because they will be retrievable from the supported_by fields of other statements). This list is returned to the caller.
Note
Subfamily relationships must be consistent across arguments
For now, we require that merges can only occur if the isa relationships are all in the same direction for all the agents in a Statement. For example, the two statement groups: RAF_family -> MEK1 and BRAF -> MEK_family would not be merged, since BRAF isa RAF_family, but MEK_family is not a MEK1. In the future this restriction could be revisited.
- Parameters
return_toplevel (Optional[bool]) – If True, only the top level statements are returned. If False, all statements are returned. Default: True
filters (Optional[list[indra.preassembler.refinement.RefinementFilter]]) – A list of RefinementFilter classes that implement filters on possible statement refinements. For details on how to construct such a filter, see the documentation of indra.preassembler.refinement.RefinementFilter. If no user-supplied filters are provided, the default ontology-based filter is applied. If a list of filters is provided here, the indra.preassembler.refinement.OntologyRefinementFilter isn’t appended by default, and should be added by the user, if necessary. Default: None
- Returns
The returned list contains Statements representing the more concrete/refined versions of the Statements involving particular entities. The attribute related_stmts is also set to this list. However, if return_toplevel is False then all statements are returned, irrespective of level of specificity. In this case the relationships between statements can be accessed via the supports/supported_by attributes.
- Return type
list of indra.statements.Statement
Examples
A more general statement with no information about a Phosphorylation site is identified as supporting a more specific statement:
>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> combined_stmts = pa.combine_related()
>>> combined_stmts
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> combined_stmts[0].supported_by
[Phosphorylation(BRAF(), MAP2K1())]
>>> combined_stmts[0].supported_by[0].supports
[Phosphorylation(BRAF(), MAP2K1(), S)]
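The bookkeeping behind the supports and supported_by fields can be sketched in plain Python. This is an illustrative toy, not INDRA's code: ToyStatement, its refinement_of() rule and combine_related_sketch are hypothetical, and the sketch compares every pair of statements, which is exactly the n-by-n work that the refinement filters are meant to avoid.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Optional


@dataclass(eq=False)
class ToyStatement:
    """Hypothetical stand-in for a Statement; not the INDRA class."""
    enzyme: str
    substrate: str
    residue: Optional[str] = None
    supports: list = field(default_factory=list)
    supported_by: list = field(default_factory=list)

    def refinement_of(self, other):
        # Assumed toy rule: same agents, and this statement names a
        # residue that the other one leaves unspecified.
        same_agents = (self.enzyme == other.enzyme
                       and self.substrate == other.substrate)
        return (same_agents and self.residue is not None
                and other.residue is None)


def combine_related_sketch(unique_stmts):
    """Link refinements pairwise and return the top-level statements."""
    for refined, general in permutations(unique_stmts, 2):
        if refined.refinement_of(general):
            general.supports.append(refined)
            refined.supported_by.append(general)
    # Top level: statements that are not refined by any other statement.
    return [s for s in unique_stmts if not s.supports]


st1 = ToyStatement('BRAF', 'MAP2K1')
st2 = ToyStatement('BRAF', 'MAP2K1', residue='S')
top_level = combine_related_sketch([st1, st2])
print([(s.enzyme, s.substrate, s.residue) for s in top_level])
# [('BRAF', 'MAP2K1', 'S')]
```

As in the doctest above, the more general statement ends up in the supported_by list of the top-level statement.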
- normalize_equivalences(ns, rank_key=None)[source]
Normalize to one of a set of equivalent concepts across statements.
This function changes Statements in place without returning a value.
- Parameters
ns (str) – The db_refs namespace for which the equivalence relation should be applied.
rank_key (Optional[function]) – A function handle which assigns a sort key to each entry in the given namespace to allow prioritizing in a controlled way which concept is normalized to.
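How a rank_key could steer the choice among equivalent concepts can be sketched as follows. This is a toy under stated assumptions, not the library's logic: normalize_sketch, the equivalents mapping and the 'NS' namespace are hypothetical, and the sketch assumes that the entry with the smallest sort key is the one normalized to.

```python
def normalize_sketch(db_refs_list, ns, equivalents, rank_key=None):
    """Rewrite db_refs[ns] in place to one representative per group."""
    for db_refs in db_refs_list:
        current = db_refs.get(ns)
        if current is None:
            continue  # this agent has no entry in the namespace
        group = set(equivalents.get(current, ())) | {current}
        # Assumption: the smallest sort key wins.
        db_refs[ns] = sorted(group, key=rank_key)[0]


# Hypothetical equivalence relation between two concept identifiers.
equivalents = {'concept/a': {'concept/b'}, 'concept/b': {'concept/a'}}
refs = [{'NS': 'concept/b'}, {'NS': 'concept/a'}, {'TEXT': 'x'}]
normalize_sketch(refs, 'NS', equivalents)
print(refs)
# [{'NS': 'concept/a'}, {'NS': 'concept/a'}, {'TEXT': 'x'}]
```

Like the documented methods, the sketch modifies its input in place and returns nothing.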
- normalize_opposites(ns, rank_key=None)[source]
Normalize to one of a pair of opposite concepts across statements.
This function changes Statements in place without returning a value.
- Parameters
ns (str) – The db_refs namespace for which the opposite relation should be applied.
rank_key (Optional[function]) – A function handle which assigns a sort key to each entry in the given namespace to allow prioritizing in a controlled way which concept is normalized to.
- indra.preassembler.find_refinements_for_statement(stmt, filters)[source]
Return refinements for a single statement given initialized filters.
- Parameters
stmt (indra.statements.Statement) – The statement whose relations should be found.
filters (list[indra.preassembler.refinement.RefinementFilter]) – A list of refinement filter instances. The filters passed to this function need to have been initialized with stmts_by_hash.
- Returns
A set of statement hashes that this statement refines.
- Return type
set
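The idea of passing a statement through a sequence of filters, each narrowing the candidate set left by the previous one, can be sketched as below. The names find_refinements_sketch, same_type, lacks_residue and the STMTS table are hypothetical; statements are represented by their integer hashes for brevity, and the module-level table plays the role that stmts_by_hash initialization plays for real filters.

```python
def find_refinements_sketch(stmt, filters):
    """Pass stmt through each filter, narrowing the candidate hashes."""
    possibly_related = None  # None means "no filter has run yet"
    for get_related in filters:
        possibly_related = get_related(stmt, possibly_related)
        if not possibly_related:
            return set()  # nothing left for later filters to confirm
    return possibly_related if possibly_related is not None else set()


# Toy corpus keyed by hash: (statement type, residue or None).
STMTS = {1: ('Phosphorylation', None),
         2: ('Phosphorylation', 'S'),
         3: ('Activation', None)}


def same_type(stmt_hash, possibly_related):
    found = {h for h, s in STMTS.items()
             if h != stmt_hash and s[0] == STMTS[stmt_hash][0]}
    return found if possibly_related is None else found & possibly_related


def lacks_residue(stmt_hash, possibly_related):
    found = {h for h, s in STMTS.items() if s[1] is None}
    return found if possibly_related is None else found & possibly_related


print(find_refinements_sketch(2, [same_type, lacks_residue]))
# {1}
```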
- indra.preassembler.flatten_evidence(stmts, collect_from=None)[source]
Add evidence from supporting stmts to evidence for supported stmts.
- Parameters
stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
collect_from (str in ('supports', 'supported_by')) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’.
- Returns
stmts – Statement hierarchy identical to the one passed, but with the evidence lists for each statement now containing all of the evidence associated with the statements they are supported by.
- Return type
list of indra.statements.Statement
Examples
Flattening evidence adds the two pieces of evidence from the supporting statement to the evidence list of the top-level statement:
>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Evidence, Phosphorylation
>>> from indra.preassembler import Preassembler, flatten_evidence
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1,
...     evidence=[Evidence(text='foo'), Evidence(text='bar')])
>>> st2 = Phosphorylation(braf, map2k1, residue='S',
...     evidence=[Evidence(text='baz'), Evidence(text='bak')])
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> pa.combine_related()
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> [e.text for e in pa.related_stmts[0].evidence]
['baz', 'bak']
>>> flattened = flatten_evidence(pa.related_stmts)
>>> sorted([e.text for e in flattened[0].evidence])
['bak', 'bar', 'baz', 'foo']
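The recursive collection that evidence flattening implies can be sketched on toy objects. This is an illustration only: ToyStatement and collect_evidence are hypothetical, evidence is reduced to plain strings, and the sketch makes no claim about how the library copies statements or handles repeated evidence.

```python
class ToyStatement:
    """Hypothetical stand-in for a Statement; not the INDRA class."""

    def __init__(self, name, evidence):
        self.name = name
        self.evidence = list(evidence)
        self.supports = []
        self.supported_by = []


def collect_evidence(stmt, collect_from='supported_by', seen=None):
    """Gather evidence from stmt and everything reachable via collect_from."""
    seen = set() if seen is None else seen
    if id(stmt) in seen:  # guard against visiting a statement twice
        return []
    seen.add(id(stmt))
    gathered = list(stmt.evidence)
    for related in getattr(stmt, collect_from):
        gathered.extend(collect_evidence(related, collect_from, seen))
    return gathered


general = ToyStatement('Phosphorylation(BRAF, MAP2K1)', ['foo', 'bar'])
specific = ToyStatement('Phosphorylation(BRAF, MAP2K1, S)', ['baz', 'bak'])
specific.supported_by.append(general)
general.supports.append(specific)

print(sorted(collect_evidence(specific)))
# ['bak', 'bar', 'baz', 'foo']
```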
- indra.preassembler.flatten_stmts(stmts)[source]
Return the full set of unique stmts in a pre-assembled stmt graph.
The flattened list of statements returned by this function can be compared to the original set of unique statements to make sure no statements have been lost during the preassembly process.
- Parameters
stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
- Returns
stmts – List of all statements contained in the hierarchical statement graph.
- Return type
list of indra.statements.Statement
Examples
Calling combine_related() on two statements results in one top-level statement; calling flatten_stmts() recovers both:
>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler, flatten_stmts
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> pa.combine_related()
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> flattened = flatten_stmts(pa.related_stmts)
>>> flattened.sort(key=lambda x: x.matches_key())
>>> flattened
[Phosphorylation(BRAF(), MAP2K1()), Phosphorylation(BRAF(), MAP2K1(), S)]
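The check that no statements are lost during preassembly amounts to a graph traversal followed by a set comparison. A minimal sketch, assuming the hierarchy is given as a hypothetical supported_by mapping from each statement label to the labels of the more general statements it refines (flatten_graph is likewise a made-up name):

```python
def flatten_graph(top_level, supported_by):
    """Return every node reachable from top_level, each exactly once."""
    seen, order, stack = set(), [], list(top_level)
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already collected via another top-level statement
        seen.add(node)
        order.append(node)
        stack.extend(supported_by.get(node, []))
    return order


# Two site-specific statements share one chain of more general statements.
supported_by = {
    'P(BRAF, MAP2K1, S218)': ['P(BRAF, MAP2K1, S)'],
    'P(BRAF, MAP2K1, S222)': ['P(BRAF, MAP2K1, S)'],
    'P(BRAF, MAP2K1, S)': ['P(BRAF, MAP2K1)'],
}
unique = {'P(BRAF, MAP2K1)', 'P(BRAF, MAP2K1, S)',
          'P(BRAF, MAP2K1, S218)', 'P(BRAF, MAP2K1, S222)'}
top = ['P(BRAF, MAP2K1, S218)', 'P(BRAF, MAP2K1, S222)']

flattened = flatten_graph(top, supported_by)
print(set(flattened) == unique)
# True
```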
- indra.preassembler.render_stmt_graph(statements, reduce=True, english=False, rankdir=None, agent_style=None)[source]
Render the statement hierarchy as a pygraphviz graph.
- Parameters
statements (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
reduce (bool) – Whether to perform a transitive reduction of the edges in the graph. Default is True.
english (bool) – If True, the statements in the graph are represented by their English-assembled equivalent; otherwise they are represented as text-formatted Statements.
rankdir (str or None) – Argument to pass through to the pygraphviz AGraph constructor specifying graph layout direction. In particular, a value of ‘LR’ specifies a left-to-right direction. If None, the pygraphviz default is used.
agent_style (dict or None) –
Dict of attributes specifying the visual properties of nodes. If None, the following default attributes are used:
agent_style = {'color': 'lightgray', 'style': 'filled', 'fontname': 'arial'}
- Returns
Pygraphviz graph with nodes representing statements and edges pointing from supported statements to supported_by statements.
- Return type
pygraphviz.AGraph
Examples
Pattern for getting statements and rendering as a Graphviz graph:
>>> from indra.ontology.bio import bio_ontology
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler, render_stmt_graph
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(bio_ontology, [st1, st2])
>>> pa.combine_related()
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> graph = render_stmt_graph(pa.related_stmts)
>>> graph.write('example_graph.dot')  # To make the DOT file
>>> graph.draw('example_graph.png', prog='dot')  # To make an image
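The effect of the reduce option, a transitive reduction of the edges, can be shown on a small edge list. This is a plain-Python sketch of the general technique for directed acyclic graphs, not the routine the library or pygraphviz uses; transitive_reduction and the node labels are hypothetical.

```python
def transitive_reduction(edges):
    """Drop each edge (u, v) of a DAG if v is reachable from u another way."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)

    def reachable(start, target, skip_edge):
        # Depth-first search that is not allowed to use skip_edge.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return [(u, v) for u, v in edges if not reachable(u, v, (u, v))]


edges = [('P(S218)', 'P(S)'), ('P(S)', 'P()'), ('P(S218)', 'P()')]
print(transitive_reduction(edges))
# [('P(S218)', 'P(S)'), ('P(S)', 'P()')]
```

The direct edge from the most specific to the most general node is dropped because the same relationship is already implied by the two-step path.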
Refinement filter classes and functions (indra.preassembler.refinement)
This module implements classes and functions that are used for finding refinements between INDRA Statements as part of the knowledge-assembly process. These are imported by the preassembler module.
- class indra.preassembler.refinement.OntologyRefinementFilter(ontology)[source]
This filter uses an ontology to position statements and their agents, which substantially narrows the set of possible relations for a given statement.
- Parameters
ontology (indra.ontology.OntologyGraph) – An INDRA ontology graph.
- get_related(stmt, possibly_related=None, direction='less_specific')[source]
Return a set of statement hashes that a given statement is potentially related to.
- Parameters
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type
set
- class indra.preassembler.refinement.RefinementConfirmationFilter(ontology, refinement_fun=None)[source]
This class runs the refinement function between potentially related statements to confirm whether they are indeed in a refinement relationship with each other.
In this sense, this isn’t a real filter, though implementing it as one is convenient. This filter is meant to be used as the final component in a series of pre-filters.
- get_related(stmt, possibly_related=None, direction='less_specific')[source]
Return a set of statement hashes that a given statement is potentially related to.
- Parameters
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type
set
- class indra.preassembler.refinement.RefinementFilter[source]
A filter which is applied to one or more statements to eliminate candidate refinements that are not possible according to some criteria. By applying a series of such filters, the preassembler can avoid doing n-by-n comparisons to determine refinements among n statements.
The filter class can take any number of constructor arguments that it needs to perform its task. The base class’ constructor initializes a shared_data attribute as an empty dict.
It also needs to implement an initialize function which is called with a stmts_by_hash argument, containing a dict of statements keyed by hash. This function can build any data structures that may be needed to efficiently apply the filter later. It can store any such data structures in the shared_data dict to be accessed by other functions later.
Finally, the class needs to implement a get_related function, which takes a single INDRA Statement as input to return the hashes of potentially related other statements that the filter was initialized with. The function also needs to take a possibly_related argument which is either None (no other filter was run before) or a set, which is the superset of possible relations as determined by some other previously applied filter.
- get_less_specifics(stmt, possibly_related=None)[source]
Return a set of hashes of statements that are potentially related and less specific than the given statement.
- get_more_specifics(stmt, possibly_related=None)[source]
Return a set of hashes of statements that are potentially related and more specific than the given statement.
- get_related(stmt, possibly_related=None, direction='less_specific')[source]
Return a set of statement hashes that a given statement is potentially related to.
- Parameters
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type
set
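The filter contract described for RefinementFilter (a shared_data dict, an initialize step over stmts_by_hash, and a get_related lookup that respects possibly_related) can be sketched as a standalone class. This is a hypothetical toy that does not subclass the real base class: TypeFilterSketch is a made-up name, statements are plain dicts with assumed 'hash' and 'type' keys, and the direction argument is accepted but ignored because the toy criterion is symmetric.

```python
class TypeFilterSketch:
    """Hypothetical filter: only statements of the same type can be related."""

    def __init__(self):
        self.shared_data = {}

    def initialize(self, stmts_by_hash):
        # Index hashes by statement type once, so later lookups are cheap.
        by_type = {}
        for stmt_hash, stmt in stmts_by_hash.items():
            by_type.setdefault(stmt['type'], set()).add(stmt_hash)
        self.shared_data['by_type'] = by_type

    def get_related(self, stmt, possibly_related=None,
                    direction='less_specific'):
        # Same-type membership is symmetric, so direction is unused here.
        candidates = self.shared_data['by_type'].get(stmt['type'], set())
        candidates = candidates - {stmt['hash']}
        if possibly_related is None:
            return candidates  # no earlier filter ran
        return candidates & possibly_related  # only ever narrow the set


stmts_by_hash = {
    1: {'hash': 1, 'type': 'Phosphorylation'},
    2: {'hash': 2, 'type': 'Phosphorylation'},
    3: {'hash': 3, 'type': 'Activation'},
}
type_filter = TypeFilterSketch()
type_filter.initialize(stmts_by_hash)
print(type_filter.get_related(stmts_by_hash[2]))
# {1}
```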
- class indra.preassembler.refinement.SplitGroupFilter(split_groups)[source]
This filter implements splitting statements into two groups and only considering refinement relationships between the groups but not within them.
- get_related(stmt, possibly_related=None, direction='less_specific')[source]
Return a set of statement hashes that a given statement is potentially related to.
- Parameters
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type
set
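The between-groups-only idea behind a split-group filter can be sketched with a small function. The shape of the split_groups argument is an assumption here (a dict mapping each statement hash to a group label), and cross_group_candidates is a hypothetical name rather than part of the library.

```python
def cross_group_candidates(stmt_hash, split_groups, possibly_related=None):
    """Return hashes that belong to a different group than stmt_hash."""
    own_group = split_groups[stmt_hash]
    others = {h for h, group in split_groups.items() if group != own_group}
    if possibly_related is None:
        return others
    return others & possibly_related


# Assumed layout: hashes 10 and 11 in group 0, hashes 20 and 21 in group 1.
split_groups = {10: 0, 11: 0, 20: 1, 21: 1}
print(sorted(cross_group_candidates(10, split_groups)))
# [20, 21]
```

A split of this kind is useful when refinements within each group are already known and only relations between an existing corpus and a new batch still need to be found.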
- indra.preassembler.refinement.get_agent_key(agent)[source]
Return a key for an Agent for use in refinement finding.
- Parameters
agent (indra.statements.Agent or None) – An INDRA Agent whose key should be returned.
- Returns
The key that maps the given agent to the ontology, with special handling for ungrounded and None Agents.
- Return type
tuple or None
- indra.preassembler.refinement.get_relevant_keys(agent_key, all_keys_for_role, ontology, direction)[source]
Return relevant agent keys for an agent key for refinement finding.
- Parameters
agent_key (tuple or None) – An agent key of interest.
all_keys_for_role (set) – The set of all agent keys in a given statement corpus with a role matching that of the given agent_key.
ontology (indra.ontology.IndraOntology) – An IndraOntology instance with respect to which relevant other agent keys are found for the purposes of refinement.
direction (str) – The direction in which to find relevant agents. The two options are ‘less_specific’ and ‘more_specific’ for agents that are less and more specific, per the ontology, respectively.
- Returns
The set of relevant agent keys which this given agent key can possibly refine.
- Return type
set
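Finding relevant agent keys through an ontology can be sketched with a simple child-to-parents mapping. Everything here is illustrative: relevant_keys_sketch, the parents dict and the (namespace, identifier) key tuples are hypothetical, the sketch assumes that an agent key is always relevant to itself, and it does not model the special handling of ungrounded or None agents.

```python
def relevant_keys_sketch(agent_key, all_keys_for_role, parents, direction):
    """Return keys from all_keys_for_role related to agent_key via parents."""
    if direction == 'less_specific':
        def step(key):  # walk up towards more general concepts
            return parents.get(key, ())
    elif direction == 'more_specific':
        def step(key):  # walk down towards more specific concepts
            return [c for c, ps in parents.items() if key in ps]
    else:
        raise ValueError(direction)
    found, stack = {agent_key}, [agent_key]
    while stack:
        for nxt in step(stack.pop()):
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    # Only keys that actually occur in the corpus for this role matter.
    return found & all_keys_for_role


BRAF, RAF1, RAF = ('HGNC', 'BRAF'), ('HGNC', 'RAF1'), ('FPLX', 'RAF')
parents = {BRAF: [RAF], RAF1: [RAF]}  # toy isa relations
all_keys = {BRAF, RAF, ('HGNC', 'MAP2K1')}

print(sorted(relevant_keys_sketch(BRAF, all_keys, parents, 'less_specific')))
# [('FPLX', 'RAF'), ('HGNC', 'BRAF')]
```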
Custom preassembly functions (indra.preassembler.custom_preassembly)
This module contains a library of functions that are useful for building custom preassembly logic for some applications. They are typically used as matches_fun or refinement_fun arguments to the Preassembler and other modules.
- indra.preassembler.custom_preassembly.agent_grounding_matches(agent)[source]
Return an Agent matches key just based on grounding, not state.
- indra.preassembler.custom_preassembly.agent_name_matches(agent)[source]
Return a sorted, normalized bag of words as the name.
- indra.preassembler.custom_preassembly.agent_name_polarity_matches(stmt, sign_dict)[source]
Return a key for normalized agent names and polarity.
- indra.preassembler.custom_preassembly.agent_name_stmt_matches(stmt)[source]
Return the normalized agent names.
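A key built from a sorted, normalized bag of words, as described for agent_name_matches(), can be sketched as follows. The normalization steps are assumptions made for illustration (lower-casing and splitting on non-alphanumeric characters); the library's own normalization may differ, and name_bag_key is a hypothetical name.

```python
import re


def name_bag_key(name):
    """Lower-case, split on non-alphanumerics, sort, and re-join."""
    words = re.findall(r'[a-z0-9]+', name.lower())
    return '_'.join(sorted(words))


print(name_bag_key('Food  Security'))
# food_security
print(name_bag_key('security, food') == name_bag_key('Food Security'))
# True
```

Because word order and punctuation no longer matter, two names that differ only in those respects map to the same key and would be treated as matching.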