Preassembly (indra.preassembler)

Preassembler (indra.preassembler)

class indra.preassembler.Preassembler(hierarchies, stmts=None)[source]

De-duplicates statements and arranges them in a specificity hierarchy.

Parameters:
  • hierarchies (dict[indra.preassembler.hierarchy_manager]) – A dictionary of hierarchies with keys such as ‘entity’ (hierarchy of entities, primarily specifying relationships between genes and their families) and ‘modification’ pointing to HierarchyManagers
  • stmts (list of indra.statements.Statement or None) – A set of statements to perform pre-assembly on. If None, statements should be added using the add_statements() method.
stmts

list of indra.statements.Statement – Starting set of statements for preassembly.

unique_stmts

list of indra.statements.Statement – Statements resulting from combining duplicates.

related_stmts

list of indra.statements.Statement – Top-level statements after building the refinement hierarchy.

hierarchies

dict[indra.preassembler.hierarchy_manager] – A dictionary of hierarchies with keys such as ‘entity’ and ‘modification’ pointing to HierarchyManagers

add_statements(stmts)[source]

Add to the current list of statements.

Parameters:stmts (list of indra.statements.Statement) – Statements to add to the current list.
static combine_duplicate_stmts(stmts)[source]

Combine evidence from duplicate Statements.

Statements are deemed to be duplicates if they have the same key returned by the matches_key() method of the Statement class. This generally means that statements must be identical in terms of their arguments and can differ only in their associated Evidence objects.

This function keeps the first instance of each set of duplicate statements and merges the lists of Evidence from all of the other statements.

Parameters:stmts (list of indra.statements.Statement) – Set of statements to de-duplicate.
Returns:Unique statements with accumulated evidence across duplicates.
Return type:list of indra.statements.Statement

Examples

De-duplicate and combine evidence for two statements differing only in their evidence lists:

>>> map2k1 = Agent('MAP2K1')
>>> mapk1 = Agent('MAPK1')
>>> stmt1 = Phosphorylation(map2k1, mapk1, 'T', '185',
... evidence=[Evidence(text='evidence 1')])
>>> stmt2 = Phosphorylation(map2k1, mapk1, 'T', '185',
... evidence=[Evidence(text='evidence 2')])
>>> uniq_stmts = Preassembler.combine_duplicate_stmts([stmt1, stmt2])
>>> uniq_stmts
[Phosphorylation(MAP2K1(), MAPK1(), T, 185)]
>>> sorted([e.text for e in uniq_stmts[0].evidence]) 
['evidence 1', 'evidence 2']
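
The merging rule described above can also be sketched without INDRA. This is only an illustration: ToyStatement and its matches_key() are hypothetical stand-ins for the real Statement class, and the key format is an assumption. Unlike the method documented here, the sketch copies the first statement instead of modifying it in place.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStatement:
    """Hypothetical stand-in for an INDRA Statement."""
    kind: str
    args: tuple
    evidence: list = field(default_factory=list)

    def matches_key(self):
        # Assumed key: statement type plus arguments, ignoring evidence.
        return (self.kind, self.args)


def combine_duplicates_sketch(stmts):
    """Keep the first statement per key and merge evidence from the rest."""
    unique = {}
    for stmt in stmts:
        key = stmt.matches_key()
        if key not in unique:
            # Copy the evidence list so the input statement is not mutated.
            unique[key] = ToyStatement(stmt.kind, stmt.args,
                                       list(stmt.evidence))
        else:
            unique[key].evidence.extend(stmt.evidence)
    # Dicts preserve insertion order, so first occurrences come first.
    return list(unique.values())


s1 = ToyStatement('Phosphorylation', ('MAP2K1', 'MAPK1', 'T', '185'),
                  ['evidence 1'])
s2 = ToyStatement('Phosphorylation', ('MAP2K1', 'MAPK1', 'T', '185'),
                  ['evidence 2'])
uniq = combine_duplicates_sketch([s1, s2])
```

Here uniq holds a single statement whose evidence list contains both pieces of evidence.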
combine_duplicates()[source]

Combine duplicates among stmts and save result in unique_stmts.

A wrapper around the static method combine_duplicate_stmts().

combine_related(return_toplevel=True, poolsize=None, size_cutoff=100)[source]

Connect related statements based on their refinement relationships.

This function takes as a starting point the unique statements (with duplicates removed) and returns a modified flat list of statements containing only those statements that are not refined by any other existing statement. In other words, the more general versions of a given statement do not appear at the top level, but instead are listed in the supported_by field of the top-level statements.

If unique_stmts has not been initialized with the de-duplicated statements, combine_duplicates() is called internally.

After this function is called the attribute related_stmts is set as a side-effect.

The procedure for combining statements in this way involves a series of steps:

  1. The statements are grouped by type (e.g., Phosphorylation) and each type is iterated over independently.
  2. Statements of the same type are then grouped according to their Agents’ entity hierarchy component identifiers. For instance, ERK, MAPK1 and MAPK3 are all in the same connected component in the entity hierarchy and therefore all Statements of the same type referencing these entities will be grouped. This grouping assures that relations are only possible within Statement groups and not among groups. For two Statements to be in the same group at this step, the Statements must be the same type and the Agents at each position in the Agent lists must either be in the same hierarchy component, or if they are not in the hierarchy, must have identical entity_matches_keys. Statements with None in one of the Agent list positions are collected separately at this stage.
  3. Statements with None at either the first or second position are iterated over. For a statement with a None as the first Agent, the second Agent is examined; then the Statement with None is added to all Statement groups with a corresponding component or entity_matches_key in the second position. The same procedure is performed for Statements with None at the second Agent position.
  4. The statements within each group are then compared; if one statement represents a refinement of the other (as defined by the refinement_of() method implemented for the Statement), then the more refined statement is added to the supports field of the more general statement, and the more general statement is added to the supported_by field of the more refined statement.
  5. A new flat list of statements is created that contains only those statements that have no supports entries (statements containing such entries are not eliminated, because they will be retrievable from the supported_by fields of other statements). This list is returned to the caller.
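
Steps 1, 4 and 5 above can be sketched on toy data as follows. This is a simplified illustration, not the library's implementation: ToyStmt and its refinement_of() rule are hypothetical, and the grouping by hierarchy component and the handling of None agents (steps 2 and 3) are left out.

```python
from itertools import permutations


class ToyStmt:
    """Hypothetical stand-in for a Statement with refinement links."""

    def __init__(self, kind, enz, sub, residue=None):
        self.kind, self.enz, self.sub, self.residue = kind, enz, sub, residue
        self.supports = []       # more refined statements
        self.supported_by = []   # more general statements

    def refinement_of(self, other):
        # Assumed toy rule: same agents, and this statement adds a residue
        # that the other statement leaves unspecified.
        return ((self.enz, self.sub) == (other.enz, other.sub)
                and self.residue is not None and other.residue is None)


def combine_related_sketch(unique_stmts):
    # Step 1: group by statement type.
    groups = {}
    for stmt in unique_stmts:
        groups.setdefault(stmt.kind, []).append(stmt)
    # Step 4: compare statements within each group and link refinements.
    for group in groups.values():
        for refined, general in permutations(group, 2):
            if refined.refinement_of(general):
                general.supports.append(refined)
                refined.supported_by.append(general)
    # Step 5: keep only statements that have no supports entries.
    return [s for s in unique_stmts if not s.supports]


st1 = ToyStmt('Phosphorylation', 'BRAF', 'MAP2K1')
st2 = ToyStmt('Phosphorylation', 'BRAF', 'MAP2K1', 'S')
top = combine_related_sketch([st1, st2])
```

In this sketch only st2 remains at the top level, and st1 is reachable through st2.supported_by.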

On multi-core machines, the algorithm can be parallelized by setting the poolsize argument to the desired number of worker processes. This feature is only available on Python 3.4 and above.

Note

Subfamily relationships must be consistent across arguments

For now, we require that merges can only occur if the isa relationships are all in the same direction for all the agents in a Statement. For example, the two statement groups: RAF_family -> MEK1 and BRAF -> MEK_family would not be merged, since BRAF isa RAF_family, but MEK_family is not a MEK1. In the future this restriction could be revisited.
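
The direction requirement in this note can be illustrated with a small check. The hierarchy table and helper names below are hypothetical and exist only for this sketch.

```python
# Hypothetical toy hierarchy: child -> set of direct parents.
ISA = {'BRAF': {'RAF_family'}, 'MEK1': {'MEK_family'}}


def isa_or_equal(child, parent):
    """True if the two names are identical or child isa parent."""
    return child == parent or parent in ISA.get(child, set())


def refines(specific_agents, general_agents):
    """True only if every agent in the first list equals, or isa, the
    agent at the same position in the second list."""
    return all(isa_or_equal(s, g)
               for s, g in zip(specific_agents, general_agents))


a = ('RAF_family', 'MEK1')
b = ('BRAF', 'MEK_family')
mixed_ab = refines(a, b)   # False: RAF_family is not a BRAF
mixed_ba = refines(b, a)   # False: MEK_family is not a MEK1
same_direction = refines(('BRAF', 'MEK1'), ('RAF_family', 'MEK_family'))
```

Neither ordering of the mixed-direction pair passes the check, while the pair in which both agents are more specific does.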

Parameters:
  • return_toplevel (Optional[bool]) – If True only the top level statements are returned. If False, all statements are returned. Default: True
  • poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
  • size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
Returns:

The returned list contains Statements representing the more concrete/refined versions of the Statements involving particular entities. The attribute related_stmts is also set to this list. However, if return_toplevel is False then all statements are returned, irrespective of level of specificity. In this case the relationships between statements can be accessed via the supports/supported_by attributes.

Return type:

list of indra.statements.Statement

Examples

A more general statement with no information about a Phosphorylation site is identified as supporting a more specific statement:

>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> combined_stmts = pa.combine_related() 
>>> combined_stmts
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> combined_stmts[0].supported_by
[Phosphorylation(BRAF(), MAP2K1())]
>>> combined_stmts[0].supported_by[0].supports
[Phosphorylation(BRAF(), MAP2K1(), S)]
find_contradicts()[source]

Return pairs of contradicting Statements.

Returns:contradicts – A list of Statement pairs that are contradicting.
Return type:list(tuple(Statement, Statement))
indra.preassembler.flatten_evidence(stmts, collect_from=None)[source]

Add evidence from supporting stmts to evidence for supported stmts.

Parameters:
  • stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
  • collect_from (str in ('supports', 'supported_by')) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’.
Returns:

stmts – Statement hierarchy identical to the one passed, but with the evidence lists for each statement now containing all of the evidence associated with the statements they are supported by.

Return type:

list of indra.statements.Statement

Examples

Flattening evidence adds the two pieces of evidence from the supporting statement to the evidence list of the top-level statement:

>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1,
... evidence=[Evidence(text='foo'), Evidence(text='bar')])
>>> st2 = Phosphorylation(braf, map2k1, residue='S',
... evidence=[Evidence(text='baz'), Evidence(text='bak')])
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related() 
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> [e.text for e in pa.related_stmts[0].evidence] 
['baz', 'bak']
>>> flattened = flatten_evidence(pa.related_stmts)
>>> sorted([e.text for e in flattened[0].evidence]) 
['bak', 'bar', 'baz', 'foo']
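
The collection step can be sketched without INDRA as a walk over the chosen link attribute. Node and collect_evidence() are hypothetical names used only for this illustration, and the sketch returns a new list instead of updating the statements.

```python
class Node:
    """Hypothetical stand-in for a Statement in a refinement hierarchy."""

    def __init__(self, name, evidence):
        self.name = name
        self.evidence = list(evidence)
        self.supports = []
        self.supported_by = []


def collect_evidence(stmt, collect_from='supported_by', seen=None):
    """Return stmt's own evidence plus that of all linked statements."""
    seen = set() if seen is None else seen
    seen.add(id(stmt))
    collected = list(stmt.evidence)
    for other in getattr(stmt, collect_from):
        # Skip statements already visited through another path.
        if id(other) not in seen:
            collected.extend(collect_evidence(other, collect_from, seen))
    return collected


general = Node('Phosphorylation(BRAF, MAP2K1)', ['foo', 'bar'])
refined = Node('Phosphorylation(BRAF, MAP2K1, S)', ['baz', 'bak'])
refined.supported_by.append(general)
general.supports.append(refined)

flat = sorted(collect_evidence(refined))
```

With the default collect_from, the refined statement gathers evidence from the general one; passing 'supports' reverses the direction of collection.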
indra.preassembler.flatten_stmts(stmts)[source]

Return the full set of unique stmts in a pre-assembled stmt graph.

The flattened list of statements returned by this function can be compared to the original set of unique statements to make sure no statements have been lost during the preassembly process.

Parameters:stmts (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
Returns:stmts – List of all statements contained in the hierarchical statement graph.
Return type:list of indra.statements.Statement

Examples

Calling combine_related() on two statements results in one top-level statement; calling flatten_stmts() recovers both:

>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related() 
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> flattened = flatten_stmts(pa.related_stmts)
>>> flattened.sort(key=lambda x: x.matches_key())
>>> flattened
[Phosphorylation(BRAF(), MAP2K1()), Phosphorylation(BRAF(), MAP2K1(), S)]
indra.preassembler.render_stmt_graph(statements, reduce=True, english=False, rankdir=None, agent_style=None)[source]

Render the statement hierarchy as a pygraphviz graph.

Parameters:
  • statements (list of indra.statements.Statement) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related().
  • reduce (bool) – Whether to perform a transitive reduction of the edges in the graph. Default is True.
  • english (bool) – If True, the statements in the graph are represented by their English-assembled equivalent; otherwise they are represented as text-formatted Statements.
  • rankdir (str or None) – Argument to pass through to the pygraphviz AGraph constructor specifying graph layout direction. In particular, a value of ‘LR’ specifies a left-to-right direction. If None, the pygraphviz default is used.
  • agent_style (dict or None) –

    Dict of attributes specifying the visual properties of nodes. If None, the following default attributes are used:

    agent_style = {'color': 'lightgray', 'style': 'filled',
                   'fontname': 'arial'}
    
Returns:

Pygraphviz graph with nodes representing statements and edges pointing from supported statements to supported_by statements.

Return type:

pygraphviz.AGraph

Examples

Pattern for getting statements and rendering as a Graphviz graph:

>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related() 
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> graph = render_stmt_graph(pa.related_stmts)
>>> graph.write('example_graph.dot') # To make the DOT file
>>> graph.draw('example_graph.png', prog='dot') # To make an image

Resulting graph:

[Figure: example statement graph rendered by Graphviz]

Entity grounding curation and mapping (indra.preassembler.grounding_mapper)

class indra.preassembler.grounding_mapper.GroundingMapper(gm, agent_map=None)[source]

Maps grounding of INDRA Agents based on a given grounding map.

gm

dict – The grounding map, a dictionary mapping strings (entity names) to a dictionary of database identifiers.

agent_map

Optional[dict] – A dictionary mapping strings to grounded INDRA Agents with given state.

map_agent(agent, do_rename)[source]

Return the given Agent with its grounding mapped.

This function grounds a single agent. It returns the new Agent object (which might be a different object if we load a new agent state from json) or the same object otherwise.

Parameters:
  • agent (indra.statements.Agent) – The Agent to map.
  • do_rename (bool) – If True, the Agent name is updated based on the mapped grounding. If do_rename is True the priority for setting the name is FamPlex ID, HGNC symbol, then the gene name from Uniprot.
Returns:

  • grounded_agent (indra.statements.Agent) – The grounded Agent.
  • maps_to_none (bool) – True if the Agent is in the grounding map and maps to None.
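
The shape of a grounding map and the meaning of maps_to_none can be sketched as follows. The map contents and the helper map_db_refs() are hypothetical; the sketch works on plain db_refs dictionaries instead of Agent objects and does not cover renaming or the agent_map.

```python
# Hypothetical grounding map: entity text -> database identifiers, or
# None for texts that should be treated as having no valid grounding.
toy_gm = {
    'ERK': {'TEXT': 'ERK', 'FPLX': 'ERK'},
    'MEK1': {'TEXT': 'MEK1', 'UP': 'Q02750'},
    'growth': None,
}


def map_db_refs(db_refs, gm):
    """Return (new_db_refs, maps_to_none) for one agent's db_refs dict."""
    text = db_refs.get('TEXT')
    if text not in gm:
        return db_refs, False      # not in the map: leave unchanged
    if gm[text] is None:
        return db_refs, True       # explicitly mapped to None
    return dict(gm[text]), False   # replace with the curated grounding


mapped, to_none = map_db_refs({'TEXT': 'MEK1'}, toy_gm)
```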

map_agents(stmts, do_rename=True)[source]

Return a new list of statements whose agents have been mapped

Parameters:
  • stmts (list of indra.statements.Statement) – The statements whose agents need mapping
  • do_rename (Optional[bool]) – If True, the Agent name is updated based on the mapped grounding. If do_rename is True the priority for setting the name is FamPlex ID, HGNC symbol, then the gene name from Uniprot. Default: True
Returns:

mapped_stmts – A list of statements given by mapping the agents from each statement in the input list

Return type:

list of indra.statements.Statement

map_agents_for_stmt(stmt, do_rename=True)[source]

Return a new Statement whose agents have been grounding mapped.

Parameters:
  • stmt (indra.statements.Statement) – The Statement whose agents need mapping.
  • do_rename (Optional[bool]) – If True, the Agent name is updated based on the mapped grounding. If do_rename is True the priority for setting the name is FamPlex ID, HGNC symbol, then the gene name from Uniprot. Default: True
Returns:

mapped_stmt – The mapped Statement.

Return type:

indra.statements.Statement

rename_agents(stmts)[source]

Return a list of mapped statements with updated agent names.

Creates a new list of statements without modifying the original list.

The agents in a statement should be renamed if the grounding map has updated their db_refs. If an agent contains a FamPlex grounding, the FamPlex ID is used as a name. Otherwise, if it contains a Uniprot ID, an attempt is made to find the associated HGNC gene name. If one can be found, it is used as the agent name and the associated HGNC ID is added as an entry to the db_refs. If neither a FamPlex ID nor an HGNC name can be found, the original name is kept.

Parameters:stmts (list of indra.statements.Statement) – List of statements whose Agents need their names updated.
Returns:mapped_stmts – A new list of Statements with updated Agent names
Return type:list of indra.statements.Statement
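
The naming priority described above can be sketched as a simple fallback chain. The 'FPLX' key and the lookup table are assumptions made for this illustration; the real method consults INDRA's own resources and also updates db_refs.

```python
# Hypothetical lookup table standing in for a Uniprot-to-HGNC resource.
UP_TO_HGNC_NAME = {'Q02750': 'MAP2K1', 'P28482': 'MAPK1'}


def choose_name(original_name, db_refs):
    """Pick an agent name: FamPlex ID, then HGNC name, then the original."""
    if 'FPLX' in db_refs:
        return db_refs['FPLX']
    hgnc_name = UP_TO_HGNC_NAME.get(db_refs.get('UP'))
    if hgnc_name is not None:
        return hgnc_name
    return original_name


name_from_up = choose_name('MEK1', {'UP': 'Q02750'})
name_from_fplx = choose_name('Erk', {'FPLX': 'ERK', 'UP': 'P28482'})
name_fallback = choose_name('foo', {'TEXT': 'foo'})
```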
update_agent_db_refs(agent, agent_text, do_rename=True)[source]

Update db_refs of agent using the grounding map

If the grounding map is missing one of the HGNC symbol or Uniprot ID, attempts to reconstruct one from the other.

Parameters:
  • agent (indra.statements.Agent) – The agent whose db_refs will be updated
  • agent_text (str) – The agent_text to find a grounding for in the grounding map dictionary. Typically this will be agent.db_refs[‘TEXT’] but there may be situations where a different value should be used.
  • do_rename (Optional[bool]) – If True, the Agent name is updated based on the mapped grounding. If do_rename is True the priority for setting the name is FamPlex ID, HGNC symbol, then the gene name from Uniprot. Default: True
Raises:
  • ValueError – If the grounding map contains an HGNC symbol for agent_text but no HGNC ID can be found for it.
  • ValueError – If the grounding map contains both an HGNC symbol and a Uniprot ID, but the HGNC symbol and the gene name associated with the gene in Uniprot do not match or if there is no associated gene name in Uniprot.
indra.preassembler.grounding_mapper.agent_texts(agents)[source]

Return a list of all agent texts from a list of agents.

None values are associated with agents that have no agent text.

Parameters:agents (list of indra.statements.Agent) –
Returns:agent texts from input list of agents
Return type:list of str/None
indra.preassembler.grounding_mapper.agent_texts_with_grounding(stmts)[source]

Return agent text groundings in a list of statements with their counts

Parameters:stmts (list of indra.statements.Statement) –
Returns:List of tuples of the form (text: str, ((name_space: str, ID: str, count: int)…), total_count: int)

Where the counts within the tuple of groundings give the number of times an agent with the given agent_text appears grounded with the particular name space and ID. The total_count gives the total number of times an agent with text appears in the list of statements.

Return type:list of tuple
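
The tuple structure described above can be built from plain data as in the sketch below. The input format (one triple per agent mention) and the final sort order are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical input: (agent_text, name_space, ID) for each agent mention.
mentions = [
    ('ERK', 'FPLX', 'ERK'),
    ('ERK', 'FPLX', 'ERK'),
    ('ERK', 'UP', 'P28482'),
    ('MEK1', 'UP', 'Q02750'),
]

# Count how often each text appears with each (name_space, ID) grounding.
counts = defaultdict(Counter)
for text, name_space, db_id in mentions:
    counts[text][(name_space, db_id)] += 1

twg = []
for text, groundings in counts.items():
    grounding_counts = tuple((ns, db_id, n)
                             for (ns, db_id), n in groundings.most_common())
    twg.append((text, grounding_counts, sum(groundings.values())))
# Assumed ordering: most frequently mentioned text first.
twg.sort(key=lambda entry: entry[2], reverse=True)
```

Each entry of twg then has the form (text, ((name_space, ID, count), ...), total_count).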
indra.preassembler.grounding_mapper.all_agents(stmts)[source]

Return a list of all of the agents from a list of statements.

Only agents that are not None and have a TEXT entry are returned.

Parameters:stmts (list of indra.statements.Statement) –
Returns:agents – List of agents that appear in the input list of indra statements.
Return type:list of indra.statements.Agent
indra.preassembler.grounding_mapper.get_agents_with_name(name, stmts)[source]

Return all agents within a list of statements with a particular name.

indra.preassembler.grounding_mapper.get_sentences_for_agent(text, stmts, max_sentences=None)[source]

Returns evidence sentences with a given agent text from a list of statements

Parameters:
  • text (str) – An agent text
  • stmts (list of indra.statements.Statement) – INDRA Statements to search in for evidence statements.
  • max_sentences (Optional[int/None]) – Cap on the number of evidence sentences to return. Default: None
Returns:

sentences – Evidence sentences from the list of statements containing the given agent text.

Return type:

list of str

indra.preassembler.grounding_mapper.load_grounding_map(grounding_map_path, ignore_path=None, lineterminator='\r\n')[source]

Return a grounding map dictionary loaded from a csv file.

In the file pointed to by grounding_map_path, the number of name_space ID pairs can vary per row and commas are used to pad out entries containing fewer than the maximum amount of name spaces appearing in the file. Lines should be terminated with both a carriage return and a new line by default.

Optionally, one can specify another csv file (pointed to by ignore_path) containing agent texts that are degenerate and should be filtered out.

Parameters:
  • grounding_map_path (str) – Path to csv file containing grounding map information. Rows of the file should be of the form <agent_text>,<name_space_1>,<ID_1>,… <name_space_n>,<ID_n>
  • ignore_path (Optional[str]) – Path to csv file containing terms that should be filtered out during the grounding mapping process. The file should be of the form <agent_text>,,…, where the number of commas that appear is the same as in the csv file at grounding_map_path. Default: None
  • lineterminator (Optional[str]) – Line terminator used in input csv file. Default: '\r\n'
Returns:

g_map – The grounding map constructed from the given files.

Return type:

dict
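
The row format, including comma padding, can be illustrated with the standard csv module. This is a sketch of the file layout only: the sample rows are illustrative, parse_grounding_rows() is a hypothetical helper, and the filtering via ignore_path is not shown.

```python
import csv
import io

# Illustrative file contents. The second row has one name space/ID pair
# fewer than the first, so it is padded with two empty cells.
csv_text = 'MEK1,UP,Q02750,HGNC,6840\r\nERK,FPLX,ERK,,\r\n'


def parse_grounding_rows(text):
    """Build {agent_text: {name_space: ID}} from padded csv rows."""
    g_map = {}
    for row in csv.reader(io.StringIO(text, newline='')):
        agent_text, rest = row[0], row[1:]
        # Pair up name spaces and IDs, skipping the empty padding cells.
        refs = {ns: db_id
                for ns, db_id in zip(rest[0::2], rest[1::2]) if ns}
        g_map[agent_text] = refs
    return g_map


toy_g_map = parse_grounding_rows(csv_text)
```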

indra.preassembler.grounding_mapper.protein_map_from_twg(twg)[source]

Build map of entity texts to validate protein grounding.

Looks at the grounding of the entity texts extracted from the statements and finds proteins where there is grounding to a human protein that maps to an HGNC name that is an exact match to the entity text. Returns a dict that can be used to update/expand the grounding map.

Parameters:twg (list of tuple) – list of tuples of the form output by agent_texts_with_grounding
Returns:protein_map – dict keyed on agent text with associated values {‘TEXT’: agent_text, ‘UP’: uniprot_id}. Entries are for agent texts where the grounding map was able to find a human protein grounded to this agent_text in Uniprot.
Return type:dict
indra.preassembler.grounding_mapper.save_base_map(filename, grouped_by_text)[source]

Dump a list of agents along with groundings and counts into a csv file

Parameters:
  • filename (str) – Filepath for output file
  • grouped_by_text (list of tuple) – List of tuples of the form output by agent_texts_with_grounding
indra.preassembler.grounding_mapper.save_sentences(twg, stmts, filename, agent_limit=300)[source]

Write evidence sentences for stmts with ungrounded agents to csv file.

Parameters:
  • twg (list of tuple) – list of tuples of ungrounded agent_texts with counts of the number of times they are mentioned in the list of statements. Should be sorted in descending order by the counts. This is of the form output by the function ungrounded_texts().
  • stmts (list of indra.statements.Statement) –
  • filename (str) – Path to output file
  • agent_limit (Optional[int]) – Number of agents to include in output file. Takes the top agents by count.
indra.preassembler.grounding_mapper.ungrounded_texts(stmts)[source]

Return a list of all ungrounded entities ordered by number of mentions

Parameters:stmts (list of indra.statements.Statement) –
Returns:ungroundc – list of tuples of the form (text: str, count: int) sorted in descending order by count.
Return type:list of tuple

Site curation and mapping (indra.preassembler.sitemapper)

class indra.preassembler.sitemapper.MappedStatement(original_stmt, mapped_mods, mapped_stmt)[source]

Information about a Statement found to have invalid sites.

Parameters:
  • original_stmt (indra.statements.Statement) – The statement prior to mapping.
  • mapped_mods (list of tuples) – A list of invalid sites, where each entry in the list has two elements: ((gene_name, residue, position), mapped_site). If the invalid position was not found in the site map, mapped_site is None; otherwise it is a tuple consisting of (residue, position, comment). Note that some entries in the site map are curated errors, that is, sites that are known to be frequent misattributions to certain proteins. Such sites are mapped to tuples (None, None, comment).
  • mapped_stmt (indra.statements.Statement) – The statement after mapping. Note that if no information was found in the site map, it will be identical to the original statement.
class indra.preassembler.sitemapper.SiteMapper(site_map, use_cache=False)[source]

Use curated site information to standardize modification sites in stmts.

Parameters:
  • site_map (dict (as returned by load_site_map())) – A dict mapping tuples of the form (gene, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where gene is the string name of the gene (canonicalized to HGNC); orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position, and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.).
  • use_cache (Optional[bool]) – If True, the SITEMAPPER_CACHE_PATH from the config (or environment) is loaded and cached mappings are read and written to the given path. Otherwise, no cache is used. Default: False

Examples

Fixing site errors on both the modification state of an agent (MAP2K1) and the target of a Phosphorylation statement (MAPK1):

>>> map2k1_phos = Agent('MAP2K1', db_refs={'UP':'Q02750'}, mods=[
... ModCondition('phosphorylation', 'S', '217'),
... ModCondition('phosphorylation', 'S', '221')])
>>> mapk1 = Agent('MAPK1', db_refs={'UP':'P28482'})
>>> stmt = Phosphorylation(map2k1_phos, mapk1, 'T','183')
>>> (valid, mapped) = default_mapper.map_sites([stmt])
>>> valid
[]
>>> mapped  
[
MappedStatement:
    original_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
    mapped_mods: (('MAP2K1', 'S', '217'), ('S', '218', 'off by one'))
                 (('MAP2K1', 'S', '221'), ('S', '222', 'off by one'))
                 (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))
    mapped_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
]
>>> ms = mapped[0]
>>> ms.original_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
>>> ms.mapped_mods 
[(('MAP2K1', 'S', '217'), ('S', '218', 'off by one')), (('MAP2K1', 'S', '221'), ('S', '222', 'off by one')), (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))]
>>> ms.mapped_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
map_sites(stmts, do_methionine_offset=True, do_orthology_mapping=True, do_isoform_mapping=True)[source]

Check a set of statements for invalid modification sites.

Statements are checked against Uniprot reference sequences to determine if residues referred to by post-translational modifications exist at the given positions.

If there is nothing amiss with a statement (modifications on any of the agents, modifications made in the statement, etc.), then the statement goes into the list of valid statements. If there is a problem with the statement, the offending modifications are looked up in the site map (site_map), and an instance of MappedStatement is added to the list of mapped statements.

Parameters:
  • stmts (list of indra.statements.Statement) – The statements to check for site errors.
  • do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification at 1 site position greater than the given one; if there exists such a site, creates the mapping. Default is True.
  • do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
  • do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
Returns:

2-tuple containing (valid_statements, mapped_statements). The first element of the tuple is a list of valid statements (indra.statements.Statement) that were not found to contain any site errors. The second element of the tuple is a list of mapped statements (MappedStatement) with information on the incorrect sites and corresponding statements with correctly mapped sites.

Return type:

tuple
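
The idea behind do_methionine_offset can be sketched on a toy sequence. This is a simplification under stated assumptions: the sketch only checks which residue sits at a position, whereas the description above refers to known modification sites, and both helper functions and the sequence are hypothetical.

```python
def check_site(sequence, residue, position):
    """True if sequence has the given residue at the 1-based position."""
    idx = int(position) - 1
    return 0 <= idx < len(sequence) and sequence[idx] == residue


def methionine_offset(sequence, residue, position):
    """Return the position shifted up by one if the residue is found
    there, otherwise None."""
    shifted = int(position) + 1
    if check_site(sequence, residue, shifted):
        return str(shifted)
    return None


toy_seq = 'MASTK'  # hypothetical sequence, not a real protein
valid_as_given = check_site(toy_seq, 'S', '2')
suggested = methionine_offset(toy_seq, 'S', '2')
```

In this toy case the site S2 is invalid as given, and the offset check suggests position 3.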

indra.preassembler.sitemapper.default_mapper = <indra.preassembler.sitemapper.SiteMapper object>

A default instance of SiteMapper that contains the site information found in resources/curated_site_map.csv.

indra.preassembler.sitemapper.load_site_map(path)[source]

Load the modification site map from a file.

The site map file should be a comma-separated file with six columns:

Gene: HGNC gene name
OrigRes: Original (incorrect) residue
OrigPos: Original (incorrect) residue position
CorrectRes: The correct residue for the modification
CorrectPos: The correct residue position
Comment: Description of the reason for the error.
Parameters:path (string) – Path to the comma-separated site map file.
Returns:A dict mapping tuples of the form (gene, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where gene is the string name of the gene (canonicalized to HGNC); orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position, and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.).
Return type:dict
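
The mapping from the six columns to the returned dict can be sketched with the standard csv module. The sample rows reuse the corrections shown in the SiteMapper example; the presence of a header line and the comma delimiter are assumptions of this sketch.

```python
import csv
import io

# Illustrative site map with an assumed header line.
site_csv = (
    'Gene,OrigRes,OrigPos,CorrectRes,CorrectPos,Comment\n'
    'MAP2K1,S,217,S,218,off by one\n'
    'MAPK1,T,183,T,185,off by two; mouse sequence\n'
)

toy_site_map = {}
for row in csv.DictReader(io.StringIO(site_csv)):
    # Key: the site as originally (incorrectly) reported.
    key = (row['Gene'], row['OrigRes'], row['OrigPos'])
    # Value: the corrected site plus the reason for the correction.
    toy_site_map[key] = (row['CorrectRes'], row['CorrectPos'],
                         row['Comment'])
```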

Hierarchy manager (indra.preassembler.hierarchy_manager)

class indra.preassembler.hierarchy_manager.HierarchyManager(rdf_file, build_closure=True, uri_as_name=True)[source]

Store hierarchical relationships between different types of entities.

Used to store, e.g., entity hierarchies (proteins and protein families) and modification hierarchies (serine phosphorylation vs. phosphorylation).

Parameters:
  • rdf_file (string) – Path to the RDF file containing the hierarchy.
  • build_closure (Optional[bool]) – If True, the transitive closure of the hierarchy is generated up front to speed up processing. Default: True
  • uri_as_name (Optional[bool]) – If True, entries are accessed directly by their URIs. If False entries are accessed by finding their name through the hasName relationship. Default: True
graph

instance of rdflib.Graph – The RDF graph containing the hierarchy.

build_transitive_closures()[source]

Build the transitive closures of the hierarchy.

This method constructs dictionaries which contain terms in the hierarchy as keys and either all the “isa+” or “partof+” related terms as values.
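
A transitive closure of this kind can be sketched over a plain dictionary of edges. The function and the toy "isa" edges are hypothetical; the class itself works on an RDF graph.

```python
def transitive_closure(edges):
    """Map each term to every term reachable through one or more edges."""
    closure = {}
    for start in edges:
        reached, stack = set(), list(edges[start])
        while stack:
            term = stack.pop()
            if term not in reached:
                reached.add(term)
                stack.extend(edges.get(term, ()))
        closure[start] = reached
    return closure


# Hypothetical "isa" edges: child -> direct parents.
isa_edges = {'MAPK1': ['ERK'], 'MAPK3': ['ERK'], 'ERK': ['MAPK_family']}
isa_closure = transitive_closure(isa_edges)
```

Once the closure is available, checking an indirect relationship becomes a set lookup, for example 'MAPK_family' in isa_closure['MAPK1'].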

Return True if two entities have the specified relationship.

This relation is constructed possibly through multiple links connecting the two entities directly or indirectly.

Parameters:
  • ns1 (str) – Namespace code for an entity.
  • id1 (str) – URI for an entity.
  • ns2 (str) – Namespace code for an entity.
  • id2 (str) – URI for an entity.
  • closure_dict (dict) – A dictionary mapping node names to nodes that have the specified relationship, directly or indirectly. Empty if this has not been precomputed.
  • relation_func (function) – Function with arguments (node, graph) that generates objects with some relationship with node on the given graph.
Returns:

True if t1 has the specified relationship with t2, either directly or through a series of intermediates; False otherwise.

Return type:

bool

extend_with(rdf_file)[source]

Extend the RDF graph of this HierarchyManager with another RDF file.

Parameters:rdf_file (str) – An RDF file which is parsed such that the current graph and the graph described by the file are merged.
find_entity(x)[source]

Get the entity that has the specified name (or synonym).

Parameters:x (string) – Name or synonym for the target entity.
get_children(uri)[source]

Return all (not just immediate) children of a given entry.

Parameters:uri (str) – The URI of the entry whose children are to be returned. See the get_uri method to construct this URI from a name space and id.
get_parents(uri, type='all')[source]

Return parents of a given entry.

Parameters:
  • uri (str) – The URI of the entry whose parents are to be returned. See the get_uri method to construct this URI from a name space and id.
  • type (str) – ‘all’: return all parents irrespective of level; ‘immediate’: return only the immediate parents; ‘top’: return only the highest level parents
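
The three values of type can be illustrated on a toy parent table. The table and get_parents_sketch() are hypothetical, and plain names are used in place of URIs.

```python
# Hypothetical direct-parent table: term -> list of immediate parents.
direct_parents = {'MAPK1': ['ERK'], 'ERK': ['MAPK_family']}


def get_parents_sketch(term, type='all'):
    """Return parents of term at the requested level as a set."""
    immediate = set(direct_parents.get(term, ()))
    if type == 'immediate':
        return immediate
    # Walk upwards to collect parents at every level.
    all_parents, stack = set(), list(immediate)
    while stack:
        parent = stack.pop()
        if parent not in all_parents:
            all_parents.add(parent)
            stack.extend(direct_parents.get(parent, ()))
    if type == 'all':
        return all_parents
    # 'top': keep only parents that have no parents of their own.
    return {p for p in all_parents if not direct_parents.get(p)}


parents_all = get_parents_sketch('MAPK1', 'all')
parents_immediate = get_parents_sketch('MAPK1', 'immediate')
parents_top = get_parents_sketch('MAPK1', 'top')
```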
is_opposite(ns1, id1, ns2, id2)[source]

Return True if two entities are in an “is_opposite” relationship

Parameters:
  • ns1 (str) – Namespace code for an entity.
  • id1 (str) – URI for an entity.
  • ns2 (str) – Namespace code for an entity.
  • id2 (str) – URI for an entity.
Returns:

True if t1 has an “is_opposite” relationship with t2.

Return type:

bool

isa(ns1, id1, ns2, id2)[source]

Return True if one entity has an “isa” relationship to another.

Parameters:
  • ns1 (str) – Namespace code for an entity.
  • id1 (string) – URI for an entity.
  • ns2 (str) – Namespace code for an entity.
  • id2 (str) – URI for an entity.
Returns:

True if t1 has an “isa” relationship with t2, either directly or through a series of intermediates; False otherwise.

Return type:

bool

isa_or_partof(ns1, id1, ns2, id2)[source]

Return True if two entities are in an “isa” or “partof” relationship

Parameters:
  • ns1 (str) – Namespace code for an entity.
  • id1 (str) – URI for an entity.
  • ns2 (str) – Namespace code for an entity.
  • id2 (str) – URI for an entity.
Returns:

True if t1 has an “isa” or “partof” relationship with t2, either directly or through a series of intermediates; False otherwise.

Return type:

bool

partof(ns1, id1, ns2, id2)[source]

Return True if one entity is “partof” another.

Parameters:
  • ns1 (str) – Namespace code for an entity.
  • id1 (str) – URI for an entity.
  • ns2 (str) – Namespace code for an entity.
  • id2 (str) – URI for an entity.
Returns:

True if t1 has a “partof” relationship with t2, either directly or through a series of intermediates; False otherwise.

Return type:

bool

exception indra.preassembler.hierarchy_manager.UnknownNamespaceException[source]