Source code for indra.statements

"""
Statements represent mechanistic relationships between biological agents.

Statement classes follow an inheritance hierarchy, with all Statement types
inheriting from the parent class :py:class:`Statement`. At
the next level in the hierarchy are the following classes:

- :py:class:`Complex`
- :py:class:`Modification`
- :py:class:`SelfModification`
- :py:class:`RegulateActivity`
- :py:class:`RegulateAmount`
- :py:class:`ActiveForm`
- :py:class:`Translocation`
- :py:class:`Gef`
- :py:class:`Gap`
- :py:class:`Conversion`

There are several types of Statements representing post-translational
modifications that further inherit from
:py:class:`Modification`:

- :py:class:`Phosphorylation`
- :py:class:`Dephosphorylation`
- :py:class:`Ubiquitination`
- :py:class:`Debiquitination`
- :py:class:`Sumoylation`
- :py:class:`Desumoylation`
- :py:class:`Hydroxylation`
- :py:class:`Dehydroxylation`
- :py:class:`Acetylation`
- :py:class:`Deacetylation`
- :py:class:`Glycosylation`
- :py:class:`Deglycosylation`
- :py:class:`Farnesylation`
- :py:class:`Defarnesylation`
- :py:class:`Geranylgeranylation`
- :py:class:`Degeranylgeranylation`
- :py:class:`Palmitoylation`
- :py:class:`Depalmitoylation`
- :py:class:`Myristoylation`
- :py:class:`Demyristoylation`
- :py:class:`Ribosylation`
- :py:class:`Deribosylation`
- :py:class:`Methylation`
- :py:class:`Demethylation`

There are additional subtypes of :py:class:`SelfModification`:

- :py:class:`Autophosphorylation`
- :py:class:`Transphosphorylation`

Interactions between proteins are often described simply in terms of their
effect on a protein's "activity", e.g., "Active MEK activates ERK", or "DUSP6
inactives ERK".  These types of relationships are indicated by the
:py:class:`RegulateActivity` abstract base class which has subtypes

- :py:class:`Activation`
- :py:class:`Inhibition`

while the :py:class:`RegulateAmount` abstract base class has subtypes

- :py:class:`IncreaseAmount`
- :py:class:`DecreaseAmount`

Statements involve one or more biological *Agents*, typically proteins,
represented by the class :py:class:`Agent`. Agents can have several types
of context specified on them including

- a specific post-translational modification state (indicated by one or
  more instances of :py:class:`ModCondition`),
- other bound Agents (:py:class:`BoundCondition`),
- mutations (:py:class:`MutCondition`),
- an activity state (:py:class:`ActivityCondition`), and
- cellular location

The *active* form of an agent (in terms of its post-translational modifications
or bound state) is indicated by an instance of the class
:py:class:`ActiveForm`.

Agents also carry grounding information which links them to database entries.
These database references are represented as a dictionary in the `db_refs`
attribute of each Agent. The dictionary can have multiple entries. For
instance, INDRA's input Processors produce genes and proteins that carry both
UniProt and HGNC IDs in db_refs, whenever possible. Bioentities provides a name
space for protein families that are typically used in the literature.  More
information about Bioentities can be found here:
https://github.com/sorgerlab/bioentities

+------------------------+------------------+--------------------------+
| Type                   | Database         | Example                  |
+========================+==================+==========================+
| Gene/Protein           | HGNC             | {'HGNC': '11998'}        |
+------------------------+------------------+--------------------------+
| Gene/Protein           | UniProt          | {'UP': 'P04637'}         |
+------------------------+------------------+--------------------------+
| Gene/Protein family    | Bioentities      | {'BE': 'ERK'}            |
+------------------------+------------------+--------------------------+
| Gene/Protein family    | InterPro         | {'IP': 'IPR000308'}      |
+------------------------+------------------+--------------------------+
| Gene/Protein family    | Pfam             | {'PF': 'PF00071'}        |
+------------------------+------------------+--------------------------+
| Gene/Protein family    | NextProt family  | {'NXPFAM': '03114'}      |
+------------------------+------------------+--------------------------+
| Chemical               | ChEBI            | {'CHEBI': 'CHEBI:63637'} |
+------------------------+------------------+--------------------------+
| Chemical               | PubChem          | {'PUBCHEM': '42611257'}  |
+------------------------+------------------+--------------------------+
| Metabolite             | HMDB             | {'HMDB': 'HMDB00122'}    |
+------------------------+------------------+--------------------------+
| Process, location, etc.| GO               | {'GO': 'GO:0006915'}     |
+------------------------+------------------+--------------------------+
| Process, disease, etc. | MeSH             | {'MESH': 'D008113'}      |
+------------------------+------------------+--------------------------+
| General terms          | NCIT             | {'NCIT': 'C28597'}       |
+------------------------+------------------+--------------------------+
| Raw text               | TEXT             | {'TEXT': 'Nf-kappaB'}    |
+------------------------+------------------+--------------------------+


The evidence for a given Statement, which could include relevant citations,
database identifiers, and passages of text from the scientific literature, is
contained in one or more :py:class:`Evidence` objects associated with the
Statement.
"""

from __future__ import absolute_import, print_function, unicode_literals
from builtins import dict, str
from future.utils import python_2_unicode_compatible
import os
import abc
import sys
import uuid
import rdflib
import logging
import textwrap
import networkx
from collections import namedtuple
from collections import OrderedDict as _o
from indra.util import unicode_strs
import indra.databases.hgnc_client as hgc
import indra.databases.uniprot_client as upc

logger = logging.getLogger('indra_statements')

# Python 2
try:
    basestring
# Python 3
except:
    basestring = str

[docs]class BoundCondition(object): """Identify Agents bound (or not bound) to a given Agent in a given context. Parameters ---------- agent : :py:class:`Agent` Instance of Agent. is_bound : bool Specifies whether the given Agent is bound or unbound in the current context. Default is True. Examples -------- EGFR bound to EGF: >>> egf = Agent('EGF') >>> egfr = Agent('EGFR', bound_conditions=[BoundCondition(egf)]) BRAF *not* bound to a 14-3-3 protein (YWHAB): >>> ywhab = Agent('YWHAB') >>> braf = Agent('BRAF', bound_conditions=[BoundCondition(ywhab, False)]) """ def __init__(self, agent, is_bound=True): self.agent = agent self.is_bound = is_bound def to_json(self): json_dict = _o({'agent': self.agent.to_json(), 'is_bound': self.is_bound}) return json_dict @classmethod def _from_json(cls, json_dict): agent_entry = json_dict.get('agent') if agent_entry is None: logger.error('BoundCondition missing agent.') return None agent = Agent._from_json(agent_entry) if agent is None: return None is_bound = json_dict.get('is_bound') if is_bound is None: logger.warning('BoundCondition missing is_bound, defaulting to True.') is_bound = True bc = BoundCondition(agent, is_bound) assert(unicode_strs(bc)) return bc
@python_2_unicode_compatible
[docs]class MutCondition(object): """Mutation state of an amino acid position of an Agent. Parameters ---------- position : str Residue position of the mutation in the protein sequence. residue_from : str Wild-type (unmodified) amino acid residue at the given position. residue_to : str Amino acid at the position resulting from the mutation. Examples -------- Represent EGFR with a L858R mutation: >>> egfr_mutant = Agent('EGFR', mutations=(MutCondition('858', 'L', 'R'))) """ def __init__(self, position, residue_from, residue_to=None): self.position = position self.residue_from = get_valid_residue(residue_from) self.residue_to = get_valid_residue(residue_to) def matches(self, other): return (self.matches_key() == other.matches_key()) def matches_key(self): key = (str(self.position), str(self.residue_from), str(self.residue_to)) return str(key) def equals(self, other): pos_match = (self.position == other.position) residue_from_match = (self.residue_from == other.residue_from) residue_to_match = (self.residue_to == other.residue_to) return (pos_match and residue_from_match and residue_to_match) def to_json(self): json_dict = _o({'position': self.position, 'residue_from': self.residue_from, 'residue_to': self.residue_to}) return json_dict @classmethod def _from_json(cls, json_dict): position = json_dict.get('position') residue_from = json_dict.get('residue_from') residue_to = json_dict.get('residue_to') mc = cls(position, residue_from, residue_to) assert(unicode_strs(mc)) return mc def __str__(self): s = '(%s, %s, %s)' % (self.residue_from, self.position, self.residue_to) return s def __repr__(self): return 'MutCondition' + str(self) def refinement_of(self, other): from_match = (self.residue_from == other.residue_from or \ (self.residue_from is not None and other.residue_from is None)) to_match = (self.residue_to == other.residue_to or \ (self.residue_to is not None and other.residue_to is None)) pos_match = (self.position == other.position or \ (self.position is not None and other.position is None)) return (from_match and to_match and pos_match)
@python_2_unicode_compatible
[docs]class ModCondition(object): """Post-translational modification state at an amino acid position. Parameters ---------- mod_type : str The type of post-translational modification, e.g., 'phosphorylation'. Valid modification types currently include: 'phosphorylation', 'ubiquitination', 'sumoylation', 'hydroxylation', and 'acetylation'. If an invalid modification type is passed an InvalidModTypeError is raised. residue : str or None String indicating the modified amino acid, e.g., 'Y' or 'tyrosine'. If None, indicates that the residue at the modification site is unknown or unspecified. position : str or None String indicating the position of the modified amino acid, e.g., '202'. If None, indicates that the position is unknown or unspecified. is_modified : bool Specifies whether the modification is present or absent. Setting the flag specifies that the Agent with the ModCondition is unmodified at the site. Examples -------- Doubly-phosphorylated MEK (MAP2K1): >>> phospho_mek = Agent('MAP2K1', mods=( ... ModCondition('phosphorylation', 'S', '202'), ... ModCondition('phosphorylation', 'S', '204'))) ERK (MAPK1) unphosphorylated at tyrosine 187: >>> unphos_erk = Agent('MAPK1', mods=( ... ModCondition('phosphorylation', 'Y', '187', is_modified=False))) """ def __init__(self, mod_type, residue=None, position=None, is_modified=True): if mod_type not in modtype_conditions: logger.warning('Unknown modification type: %s' % mod_type) self.mod_type = mod_type self.residue = get_valid_residue(residue) if isinstance(position, int): self.position = str(position) else: self.position = position self.is_modified = is_modified def refinement_of(self, other, mod_hierarchy): if self.is_modified != other.is_modified: return False type_match = (self.mod_type == other.mod_type or \ mod_hierarchy.isa('INDRA', self.mod_type, 'INDRA', other.mod_type)) residue_match = (self.residue == other.residue or \ (self.residue is not None and other.residue is None)) pos_match = (self.position == other.position or \ (self.position is not None and other.position is None)) return (type_match and residue_match and pos_match) def matches(self, other): return (self.matches_key() == other.matches_key()) def matches_key(self): key = (str(self.mod_type), str(self.residue), str(self.position), str(self.is_modified)) return str(key) def __str__(self): ms = '%s' % self.mod_type if self.residue is not None: ms += ', %s' % self.residue if self.position is not None: ms += ', %s' % self.position if not self.is_modified: ms += ', False' ms = '(' + ms + ')' return ms def __repr__(self): return str(self) def to_json(self): json_dict = _o({'mod_type': self.mod_type}) if self.residue is not None: json_dict['residue'] = self.residue if self.position is not None: json_dict['position'] = self.position json_dict['is_modified'] = self.is_modified return json_dict @classmethod def _from_json(cls, json_dict): mod_type = json_dict.get('mod_type') if not mod_type: logger.error('ModCondition missing mod_type.') return None if mod_type not in modtype_to_modclass.keys(): logger.warning('Unknown modification type: %s' % mod_type) residue = json_dict.get('residue') position = json_dict.get('position') is_modified = json_dict.get('is_modified') if is_modified is None: logger.warning('ModCondition missing is_modified, defaulting to True') is_modified = True mc = ModCondition(mod_type, residue, position, is_modified) assert(unicode_strs(mc)) return mc def equals(self, other): type_match = (self.mod_type == other.mod_type) residue_match = (self.residue == other.residue) pos_match = (self.position == other.position) is_mod_match = (self.is_modified == other.is_modified) return (type_match and residue_match and pos_match and is_mod_match) def __hash__(self): return hash(self.matches_key())
[docs]class ActivityCondition(object): """An active or inactive state of a protein. Examples -------- Kinase-active MAP2K1: >>> mek_active = Agent('MAP2K1', ... activity=ActivityCondition('kinase', True)) Transcriptionally inactive FOXO3: >>> foxo_inactive = Agent('FOXO3', ... activity=ActivityCondition('transcription', False)) Parameters ---------- activity_type : str The type of activity, e.g. 'kinase'. The basic, unspecified molecular activity is represented as 'activity'. Examples of other activity types are 'kinase', 'phosphatase', 'catalytic', 'transcription', etc. is_active : bool Specifies whether the given activity type is present or absent. """ def __init__(self, activity_type, is_active): if activity_type not in activity_types: logger.warning('Invalid activity type: %s' % activity_type) self.activity_type = activity_type self.is_active = is_active def refinement_of(self, other, activity_hierarchy): if self.is_active != other.is_active: return False if self.activity_type == other.activity_type: return True if activity_hierarchy.isa('INDRA', self.activity_type, 'INDRA', other.activity_type): return True def equals(self, other): type_match = (self.activity_type == other.activity_type) is_act_match = (self.is_active == other.is_active) return (type_match and is_act_match) def matches(self, other): return (self.matches_key() == other.matches_key()) def matches_key(self): key = (str(self.activity_type), str(self.is_active)) return str(key) def to_json(self): json_dict = _o({'activity_type': self.activity_type, 'is_active': self.is_active}) return json_dict @classmethod def _from_json(cls, json_dict): activity_type = json_dict.get('activity_type') is_active = json_dict.get('is_active') if not activity_type: logger.error('ActivityCondition missing activity_type, ' + 'defaulting to `activity`') activity_type = 'activity' if is_active is None: logger.warning('ActivityCondition missing is_active, ' + 'defaulting to True') is_active = True ac = ActivityCondition(activity_type, is_active) assert(unicode_strs(ac)) return ac def __str__(self): s = '%s' % self.activity_type if not self.is_active: s += ', False' s = '(' + s + ')' return s def __repr__(self): return str(self)
@python_2_unicode_compatible
[docs]class Agent(object): """A molecular entity, e.g., a protein. Parameters ---------- name : str The name of the agent, preferably a canonicalized name such as an HGNC gene name. mods : list of :py:class:`ModCondition` Modification state of the agent. bound_conditions : list of :py:class:`BoundCondition` Other agents bound to the agent in this context. mutations : list of :py:class:`MutCondition` Amino acid mutations of the agent. activity : :py:class:`ActivityCondition` Activity of the agent. location : str Cellular location of the agent. Must be a valid name (e.g. "nucleus") or identifier (e.g. "GO:0005634")for a GO cellular compartment. db_refs : dict Dictionary of database identifiers associated with this agent. """ def __init__(self, name, mods=None, activity=None, bound_conditions=None, mutations=None, location=None, db_refs=None): self.name = name if mods is None: self.mods = [] # Promote to list elif isinstance(mods, ModCondition): self.mods = [mods] else: self.mods = mods if bound_conditions is None: self.bound_conditions = [] # Promote to list elif isinstance(bound_conditions, BoundCondition): self.bound_conditions = [bound_conditions] else: self.bound_conditions = bound_conditions if mutations is None: self.mutations = [] elif isinstance(mutations, MutCondition): self.mutations = [mutations] else: self.mutations = mutations self.activity = activity self.location = get_valid_location(location) if db_refs is None: self.db_refs = {} else: self.db_refs = db_refs def matches(self, other): return self.matches_key() == other.matches_key() def matches_key(self): # NOTE: Making a set of the mod matches_keys might break if # you have an agent with two phosphorylations at serine # with unknown sites. act_key = (self.activity.matches_key() if self.activity else None) key = (self.entity_matches_key(), sorted([m.matches_key() for m in self.mods]), sorted([m.matches_key() for m in self.mutations]), act_key, self.location, len(self.bound_conditions), tuple((bc.agent.matches_key(), bc.is_bound) for bc in sorted(self.bound_conditions, key=lambda x: x.agent.name))) return str(key) def entity_matches(self, other): return self.entity_matches_key() == other.entity_matches_key() def entity_matches_key(self): db_refs_key = 'BE:%s;UP:%s;HGNC:%s' % (self.db_refs.get('BE'), self.db_refs.get('UP'), self.db_refs.get('HGNC')) return str((self.name, db_refs_key)) # Function to get the namespace to look in def get_grounding(self): be = self.db_refs.get('BE') if be: return ('BE', be) hgnc = self.db_refs.get('HGNC') if hgnc: if isinstance(hgnc, list): hgnc = hgnc[0] return ('HGNC', hgc.get_hgnc_name(str(hgnc))) up = self.db_refs.get('UP') if up: if isinstance(up, list): up = up[0] up_mnemonic = upc.get_mnemonic(up) if up_mnemonic and up_mnemonic.endswith('HUMAN'): gene_name = upc.get_gene_name(up, web_fallback=False) if gene_name: return ('HGNC', gene_name) else: return ('UP', up) return (None, None) def isa(self, other, hierarchies): # Get the namespaces for the comparison (self_ns, self_id) = self.get_grounding() (other_ns, other_id) = other.get_grounding() # If one of the agents isn't grounded to a relevant namespace, # there can't be an isa relationship if not all((self_ns, self_id, other_ns, other_id)): return False # Check for isa relationship return hierarchies['entity'].isa(self_ns, self_id, other_ns, other_id) def refinement_of(self, other, hierarchies): # Make sure the Agent types match if type(self) != type(other): return False # ENTITIES # Check that the basic entity of the agent either matches or is related # to the entity of the other agent. If not, no match. # If the entities, match, then we can continue if not (self.entity_matches(other) or self.isa(other, hierarchies)): return False # BOUND CONDITIONS # Now check the bound conditions. For self to be a refinement of # other in terms of the bound conditions, it has to include all of the # bound conditions in the other agent, and add additional context. # TODO: For now, we do not check the bound conditions of the bound # conditions. # FIXME: This matching procedure will get confused if the same # entity is included more than once in one of the sets--this will # be picked up as a match # Iterate over the bound conditions in the other agent, and make sure # they are all matched in self. for bc_other in other.bound_conditions: # Iterate over the bound conditions in self to find a match bc_found = False for bc_self in self.bound_conditions: if (bc_self.is_bound == bc_other.is_bound) and \ bc_self.agent.refinement_of(bc_other.agent, hierarchies): bc_found = True # If we didn't find a match for this bound condition in self, then # no refinement if not bc_found: return False # MODIFICATIONS # Similar to the above, we check that self has all of the modifications # of other. # Here we need to make sure that a mod in self.mods is only matched # once to a mod in other.mods. Otherwise ('phoshporylation') would be # considered a refinement of ('phosphorylation', 'phosphorylation') matched_indices = [] # This outer loop checks that each modification in the other Agent # is matched. for other_mod in other.mods: mod_found = False # We need to keep track of indices for this Agent's modifications # to make sure that each one is used at most once to match # the modification of one of the other Agent's modifications. for ix, self_mod in enumerate(self.mods): if self_mod.refinement_of(other_mod, hierarchies['modification']): # If this modification hasn't been used for matching yet if not ix in matched_indices: # Set the index as used matched_indices.append(ix) mod_found = True break # If we didn't find an exact match for this mod in other, then # no refinement if not mod_found: return False # MUTATIONS # Similar to the above, we check that self has all of the mutations # of other. matched_indices = [] # This outer loop checks that each mutation in the other Agent # is matched. for other_mut in other.mutations: mut_found = False # We need to keep track of indices for this Agent's mutations # to make sure that each one is used at most once to match # the mutation of one of the other Agent's mutations. for ix, self_mut in enumerate(self.mutations): if self_mut.refinement_of(other_mut): # If this mutation hasn't been used for matching yet if not ix in matched_indices: # Set the index as used matched_indices.append(ix) mut_found = True break # If we didn't find an exact match for this mut in other, then # no refinement if not mut_found: return False # LOCATION # If the other location is specified and this one is not then self # cannot be a refinement if self.location is None: if other.location is not None: return False # If both this location and the other one is specified, we check the # hierarchy. elif other.location is not None: # If the other location is part of this location then # self.location is not a refinement if not hierarchies['cellular_component'].partof( 'INDRA', self.location, 'INDRA', other.location): return False # ACTIVITY if self.activity is None: if other.activity is not None: return False elif other.activity is not None: if not self.activity.refinement_of(other.activity, hierarchies['activity']): return False # Everything checks out return True def equals(self, other): matches = (self.name == other.name) and\ (self.activity == other.activity) and \ (self.location == other.location) and \ (self.db_refs == other.db_refs) if len(self.mods) == len(other.mods): for s, o in zip(self.mods, other.mods): matches = matches and s.equals(o) else: return False if len(self.mutations) == len(other.mutations): for s, o in zip(self.mutations, other.mutations): matches = matches and s.equals(o) else: return False if len(self.bound_conditions) == len(other.bound_conditions): for s, o in zip(self.bound_conditions, other.bound_conditions): matches = matches and s.agent.equals(o.agent) and \ s.is_bound == o.is_bound else: return False return matches def to_json(self): json_dict = _o({'name': self.name}) if self.mods: json_dict['mods'] = [mc.to_json() for mc in self.mods] if self.mutations: json_dict['mutations'] = [mc.to_json() for mc in self.mutations] if self.bound_conditions: json_dict['bound_conditions'] = [bc.to_json() for bc in self.bound_conditions] if self.activity is not None: json_dict['activity'] = self.activity.to_json() if self.location is not None: json_dict['location'] = self.location json_dict['db_refs'] = self.db_refs return json_dict @classmethod def _from_json(cls, json_dict): name = json_dict.get('name') db_refs = json_dict.get('db_refs', {}) mods = json_dict.get('mods', []) mutations = json_dict.get('mutations', []) activity = json_dict.get('activity') bound_conditions = json_dict.get('bound_conditions', []) location = json_dict.get('location') if not name: logger.error('Agent missing name.') return None if not db_refs: db_refs = {} agent = Agent(name, db_refs=db_refs) agent.mods = [ModCondition._from_json(mod) for mod in mods] agent.mutations = [MutCondition._from_json(mut) for mut in mutations] agent.bound_conditions = [BoundCondition._from_json(bc) for bc in bound_conditions] agent.location = location if activity: agent.activity = ActivityCondition._from_json(activity) return agent def __str__(self): attr_strs = [] if self.mods: mod_str = 'mods: ' mod_str += ', '.join(['%s' % m for m in self.mods]) attr_strs.append(mod_str) if self.activity: attr_strs.append('%s: %s' % (self.activity.activity_type, self.activity.is_active)) if self.mutations: mut_str = 'muts: ' mut_str += ', '.join(['%s' % m for m in self.mutations]) attr_strs.append(mut_str) if self.bound_conditions: attr_strs += ['bound: [%s, %s]' % (b.agent.name, b.is_bound) for b in self.bound_conditions] if self.location: attr_strs += ['location: %s' % self.location] #if self.db_refs: # attr_strs.append('db_refs: %s' % self.db_refs) attr_str = ', '.join(attr_strs) agent_name = self.name return '%s(%s)' % (agent_name, attr_str) def __repr__(self): return str(self)
@python_2_unicode_compatible
[docs]class Evidence(object): """Container for evidence supporting a given statement. Parameters ---------- source_api : str or None String identifying the INDRA API used to capture the statement, e.g., 'trips', 'biopax', 'bel'. source_id : str or None For statements drawn from databases, ID of the database entity corresponding to the statement. pmid : str or None String indicating the Pubmed ID of the source of the statement. text : str Natural language text supporting the statement. annotations : dict Dictionary containing additional information on the context of the statement, e.g., species, cell line, tissue type, etc. The entries may vary depending on the source of the information. epistemics : dict A dictionary describing various forms of epistemic certainty associated with the statement. """ def __init__(self, source_api=None, source_id=None, pmid=None, text=None, annotations=None, epistemics=None): self.source_api = source_api self.source_id = source_id self.pmid = pmid self.text = text if annotations: self.annotations = annotations else: self.annotations = {} if epistemics: self.epistemics = epistemics else: self.epistemics = {} def matches_key(self): key = str((self.source_api, self.source_id, self.pmid, self.text, self.annotations, self.epistemics)) return key def equals(self, other): matches = (self.source_api == other.source_api) and\ (self.source_id == other.source_id) and\ (self.pmid == other.pmid) and\ (self.text == other.text) and\ (self.annotations == other.annotations) and\ (self.epistemics == other.epistemics) return matches def to_json(self): json_dict = _o({}) if self.source_api: json_dict['source_api'] = self.source_api if self.pmid: json_dict['pmid'] = self.pmid if self.source_id: json_dict['source_id'] = self.source_id if self.text: json_dict['text'] = self.text if self.annotations: json_dict['annotations'] = self.annotations if self.epistemics: json_dict['epistemics'] = self.epistemics return json_dict @classmethod def _from_json(cls, json_dict): source_api = json_dict.get('source_api') source_id = json_dict.get('source_id') pmid = json_dict.get('pmid') text = json_dict.get('text') annotations = json_dict.get('annotations', {}) epistemics = json_dict.get('epistemics', {}) ev = Evidence(source_api=source_api, source_id=source_id, pmid=pmid, text=text, annotations=annotations, epistemics=epistemics) return ev def __str__(self): ev_str = 'Evidence(%s, %s, %s, %s)' % \ (self.source_api, self.pmid, self.annotations, self.text) return ev_str def __repr__(self): if sys.version_info[0] >= 3: return str(self) else: return str(self).encode('utf-8') return str(self)
[docs]class Statement(object): """The parent class of all statements. Parameters ---------- evidence : list of :py:class:`Evidence` If a list of Evidence objects is passed to the constructor, the value is set to this list. If a bare Evidence object is passed, it is enclosed in a list. If no evidence is passed (the default), the value is set to an empty list. supports : list of :py:class:`Statement` Statements that this Statement supports. supported_by : list of :py:class:`Statement` Statements supported by this statement. """ def __init__(self, evidence=None, supports=None, supported_by=None): if evidence is None: self.evidence = [] elif isinstance(evidence, Evidence): self.evidence = [evidence] elif isinstance(evidence, list): self.evidence = evidence else: raise ValueError('evidence must be an Evidence object, a list ' '(of Evidence objects), or None.') # Initialize supports/supported_by fields, which should be lists self.supports = supports if supports else [] self.supported_by = supported_by if supported_by else [] self.belief = 1 self.uuid = '%s' % uuid.uuid4() def matches(self, other): return self.matches_key() == other.matches_key() def entities_match(self, other): self_key = self.entities_match_key() other_key = other.entities_match_key() if len(self_key) != len(other_key): return False for self_agent, other_agent in zip(self_key, other_key): if self_agent is None or other_agent is None: continue if self_agent != other_agent: return False return True def entities_match_key(self): key = tuple(a.entity_matches_key() if a is not None else None for a in self.agent_list()) return key def print_supports(self): print('%s supported_by:' % str(self)) if self.supported_by: print('-->') for s in self.supported_by: s.print_supports() def __repr__(self): if sys.version_info[0] >= 3: return str(self) else: return str(self).encode('utf-8') def equals(self, other): if len(self.agent_list()) == len(other.agent_list()): for s, o in zip(self.agent_list(), other.agent_list()): if (s is None and o is not None) or\ (s is not None and o is None): return False if s is not None and o is not None and not s.equals(o): return False else: return False if len(self.evidence) == len(other.evidence): for s, o in zip(self.evidence, other.evidence): if not s.equals(o): return False else: return False return True
[docs] def to_json(self): """Return serialized Statement as a json dict.""" stmt_type = type(self).__name__ ### For backwards compatibility, could be removed later all_stmts = [self] + self.supports + self.supported_by for st in all_stmts: try: uid = st.uuid except AttributeError: st.uuid = '%s' % uuid.uuid4() ################## json_dict = _o({'type': stmt_type}) if self.evidence: evidence = [ev.to_json() for ev in self.evidence] json_dict['evidence'] = evidence json_dict['id'] = '%s' % self.uuid if self.supports: json_dict['supports'] = \ ['%s' % st.uuid for st in self.supports] if self.supported_by: json_dict['supported_by'] = \ ['%s' % st.uuid for st in self.supported_by] def get_sbo_term(cls): sbo_term = stmt_sbo_map.get(cls.__name__.lower()) while not sbo_term: cls = cls.__bases__[0] sbo_term = stmt_sbo_map.get(cls.__name__.lower()) return sbo_term sbo_term = get_sbo_term(self.__class__) json_dict['sbo'] = \ 'http://identifiers.org/sbo/SBO:%s' % sbo_term return json_dict
@classmethod def _from_json(cls, json_dict): stmt_type = json_dict.get('type') stmt_cls = getattr(sys.modules[__name__], stmt_type) stmt = stmt_cls._from_json(json_dict) evidence = json_dict.get('evidence', []) stmt.evidence = [Evidence._from_json(ev) for ev in evidence] stmt.supports = json_dict.get('supports', []) stmt.supported_by = json_dict.get('supported_by', []) stmt.belief = json_dict.get('belief', 1.0) stmt_id = json_dict.get('id') if not stmt_id: stmt_id = '%s' % uuid.uuid4() stmt.uuid = stmt_id return stmt
[docs] def to_graph(self): """Return Statement as a networkx graph.""" def json_node(graph, element, prefix): if not element: return None node_id = '|'.join(prefix) if isinstance(element, list): graph.add_node(node_id, label='') # Enumerate children and add nodes and connect to anchor node for i, sub_element in enumerate(element): sub_id = json_node(graph, sub_element, prefix + ['%s' % i]) if sub_id: graph.add_edge(node_id, sub_id, label='') elif isinstance(element, dict): graph.add_node(node_id, label='') # Add node recursively for each element # Connect to this node with edge label according to key for k, v in element.items(): if k == 'id': continue elif k == 'name': graph.node[node_id]['label'] = v continue elif k == 'type': graph.node[node_id]['label'] = v continue sub_id = json_node(graph, v, prefix + ['%s' % k]) if sub_id: graph.add_edge(node_id, sub_id, label=('%s' % k)) else: if isinstance(element, basestring) and \ element.startswith('http'): element = element.split('/')[-1] graph.add_node(node_id, label=('%s' % element)) return node_id jd = self.to_json() graph = networkx.DiGraph() json_node(graph, jd, ['%s' % self.uuid]) return graph
@python_2_unicode_compatible
[docs]class Modification(Statement): """Generic statement representing the modification of a protein. Parameters ---------- enz : :py:class`indra.statement.Agent` The enzyme involved in the modification. sub : :py:class:`indra.statement.Agent` The substrate of the modification. residue : str or None The amino acid residue being modified, or None if it is unknown or unspecified. position : str or None The position of the modified amino acid, or None if it is unknown or unspecified. evidence : list of :py:class:`Evidence` Evidence objects in support of the modification. """ def __init__(self, enz, sub, residue=None, position=None, evidence=None): super(Modification, self).__init__(evidence) self.enz = enz self.sub = sub self.residue = get_valid_residue(residue) if isinstance(position, int): self.position = str(position) else: self.position = position def matches_key(self): if self.enz is None: enz_key = None else: enz_key = self.enz.matches_key() key = (type(self), enz_key, self.sub.matches_key(), str(self.residue), str(self.position)) return str(key) def agent_list(self): return [self.enz, self.sub] def set_agent_list(self, agent_list): if len(agent_list) != 2: raise ValueError("Modification has two agents in agent_list.") self.enz = agent_list[0] self.sub = agent_list[1] def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Check agent arguments if self.enz is None and other.enz is None: enz_refinement = True elif self.enz is None and other.enz is not None: enz_refinement = False elif self.enz is not None and other.enz is None: enz_refinement = True else: enz_refinement = self.enz.refinement_of(other.enz, hierarchies) sub_refinement = self.sub.refinement_of(other.sub, hierarchies) if not (enz_refinement and sub_refinement): return False # For this to be a refinement of the other, the modifications either # have to match or have this one be a subtype of the other; in # addition, the sites have to match, or this one has to have site # information and the other one not. residue_matches = (other.residue is None or\ (self.residue == other.residue)) position_matches = (other.position is None or\ (self.position == other.position)) return (residue_matches and position_matches) def equals(self, other): matches = super(Modification, self).equals(other) matches = matches and\ (self.residue == other.residue) and\ (self.position == other.position) return matches def _get_mod_condition(self): """Return a ModCondition corresponding to this Modification.""" mod_type = modclass_to_modtype[self.__class__] if isinstance(self, RemoveModification): mod_type = modtype_to_inverse[mod_type] mc = ModCondition(mod_type, self.residue, self.position, True) return mc def to_json(self): generic = super(Modification, self).to_json() json_dict = _o({'type': generic['type']}) if self.enz is not None: json_dict['enz'] = self.enz.to_json() json_dict['enz']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000460' # enzymatic catalyst if self.sub is not None: json_dict['sub'] = self.sub.to_json() json_dict['sub']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000015' # substrate if self.residue is not None: json_dict['residue'] = self.residue if self.position is not None: json_dict['position'] = self.position json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): enz = json_dict.get('enz') sub = json_dict.get('sub') residue = json_dict.get('residue') position = json_dict.get('position') evidence = json_dict.get('evidence', []) if enz: enz = Agent._from_json(enz) if sub: sub = Agent._from_json(sub) stmt = cls(enz, sub, residue, position) return stmt def __str__(self): res_str = (', %s' % self.residue) if self.residue is not None else '' pos_str = (', %s' % self.position) if self.position is not None else '' s = ("%s(%s, %s%s%s)" % (type(self).__name__, self.enz, self.sub, res_str, pos_str)) return s
class AddModification(Modification): pass class RemoveModification(Modification): pass @python_2_unicode_compatible
[docs]class SelfModification(Statement): """Generic statement representing the self-modification of a protein. Parameters ---------- enz : :py:class`indra.statement.Agent` The enzyme involved in the modification, which is also the substrate. residue : str or None The amino acid residue being modified, or None if it is unknown or unspecified. position : str or None The position of the modified amino acid, or None if it is unknown or unspecified. evidence : list of :py:class:`Evidence` Evidence objects in support of the modification. """ def __init__(self, enz, residue=None, position=None, evidence=None): super(SelfModification, self).__init__(evidence) self.enz = enz self.residue = get_valid_residue(residue) if isinstance(position, int): self.position = str(position) else: self.position = position def __str__(self): res_str = (', %s' % self.residue) if self.residue is not None else '' pos_str = (', %s' % self.position) if self.position is not None else '' s = ("%s(%s%s%s)" % (type(self).__name__, self.enz, res_str, pos_str)) return s def matches_key(self): key = (type(self), self.enz.matches_key(), str(self.residue), str(self.position)) return str(key) def agent_list(self): return [self.enz] def set_agent_list(self, agent_list): if len(agent_list) != 1: raise ValueError("SelfModification has one agent.") self.enz = agent_list[0] def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Check agent arguments if not self.enz.refinement_of(other.enz, hierarchies): return False # For this to be a refinement of the other, the modifications either # have to match or have this one be a subtype of the other; in # addition, the sites have to match, or this one has to have site # information and the other one not. residue_matches = (other.residue is None or\ (self.residue == other.residue)) position_matches = (other.position is None or\ (self.position == other.position)) return (residue_matches and position_matches) def equals(self, other): matches = super(SelfModification, self).equals(other) matches = matches and\ (self.residue == other.residue) and\ (self.position == other.position) return matches def to_json(self): generic = super(SelfModification, self).to_json() json_dict = _o({'type': generic['type']}) if self.enz is not None: json_dict['enz'] = self.enz.to_json() json_dict['enz']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000460' # enzymatic catalyst if self.residue is not None: json_dict['residue'] = self.residue if self.position is not None: json_dict['position'] = self.position json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): enz = json_dict.get('enz') residue = json_dict.get('residue') position = json_dict.get('position') if enz: enz = Agent._from_json(enz) stmt = cls(enz, residue, position) return stmt
[docs]class Phosphorylation(AddModification): """Phosphorylation modification. Examples -------- MEK (MAP2K1) phosphorylates ERK (MAPK1) at threonine 185: >>> mek = Agent('MAP2K1') >>> erk = Agent('MAPK1') >>> phos = Phosphorylation(mek, erk, 'T', '185') """ pass
[docs]class Autophosphorylation(SelfModification): """Intramolecular autophosphorylation, i.e., in *cis*. Examples -------- p38 bound to TAB1 cis-autophosphorylates itself (see :pmid:`19155529`). >>> tab1 = Agent('TAB1') >>> p38_tab1 = Agent('P38', bound_conditions=[BoundCondition(tab1)]) >>> autophos = Autophosphorylation(p38_tab1) """ pass
[docs]class Transphosphorylation(SelfModification): """Autophosphorylation in *trans.* Transphosphorylation assumes that a kinase is already bound to a substrate (usually of the same molecular species), and phosphorylates it in an intra-molecular fashion. The enz property of the statement must have exactly one bound_conditions entry, and we assume that enz phosphorylates this molecule. The bound_neg property is ignored here. """ pass
[docs]class Dephosphorylation(RemoveModification): """Dephosphorylation modification. Examples -------- DUSP6 dephosphorylates ERK (MAPK1) at T185: >>> dusp6 = Agent('DUSP6') >>> erk = Agent('MAPK1') >>> dephos = Dephosphorylation(dusp6, erk, 'T', '185') """ pass
[docs]class Hydroxylation(AddModification): """Hydroxylation modification.""" pass
[docs]class Dehydroxylation(RemoveModification): """Dehydroxylation modification.""" pass
[docs]class Sumoylation(AddModification): """Sumoylation modification.""" pass
[docs]class Desumoylation(RemoveModification): """Desumoylation modification.""" pass
[docs]class Acetylation(AddModification): """Acetylation modification.""" pass
[docs]class Deacetylation(RemoveModification): """Deacetylation modification.""" pass
[docs]class Glycosylation(AddModification): """Glycosylation modification.""" pass
[docs]class Deglycosylation(RemoveModification): """Deglycosylation modification.""" pass
[docs]class Ribosylation(AddModification): """Ribosylation modification.""" pass
[docs]class Deribosylation(RemoveModification): """Deribosylation modification.""" pass
[docs]class Ubiquitination(AddModification): """Ubiquitination modification.""" pass
[docs]class Deubiquitination(RemoveModification): """Deubiquitination modification.""" pass
[docs]class Farnesylation(AddModification): """Farnesylation modification.""" pass
[docs]class Defarnesylation(RemoveModification): """Defarnesylation modification.""" pass
[docs]class Geranylgeranylation(AddModification): """Geranylgeranylation modification.""" pass
[docs]class Degeranylgeranylation(RemoveModification): """Degeranylgeranylation modification.""" pass
[docs]class Palmitoylation(AddModification): """Palmitoylation modification.""" pass
[docs]class Depalmitoylation(RemoveModification): """Depalmitoylation modification.""" pass
[docs]class Myristoylation(AddModification): """Myristoylation modification.""" pass
[docs]class Demyristoylation(RemoveModification): """Demyristoylation modification.""" pass
[docs]class Methylation(AddModification): """Methylation modification.""" pass
[docs]class Demethylation(RemoveModification): """Demethylation modification.""" pass
@python_2_unicode_compatible
[docs]class RegulateActivity(Statement): """Regulation of activity. This class implements shared functionality of Activation and Inhibition statements and it should not be instantiated directly. """ # The constructor here is an abstractmethod so that this class cannot # be directly instantiated. __metaclass__ = abc.ABCMeta @abc.abstractmethod def __init__(self): pass def __setstate__(self, state): if 'subj_activity' in state: logger.warning('Pickle file is out of date!') state.pop('subj_activity', None) self.__dict__.update(state) def matches_key(self): key = (type(self), self.subj.matches_key(), self.obj.matches_key(), str(self.obj_activity), str(self.is_activation)) return str(key) def agent_list(self): return [self.subj, self.obj] def set_agent_list(self, agent_list): if len(agent_list) != 2: raise ValueError("%s has two agents." % self.__class__.__name__) self.subj = agent_list[0] self.obj = agent_list[1] def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False if self.is_activation != other.is_activation: return False if self.subj.refinement_of(other.subj, hierarchies) and \ self.obj.refinement_of(other.obj, hierarchies): obj_act_match = (self.obj_activity == other.obj_activity) or \ hierarchies['activity'].isa('INDRA', self.obj_activity, 'INDRA', other.obj_activity) if obj_act_match: return True else: return False else: return False def to_json(self): generic = super(RegulateActivity, self).to_json() json_dict = _o({'type': generic['type']}) if self.subj is not None: json_dict['subj'] = self.subj.to_json() if self.is_activation: json_dict['subj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000459' # stimulator else: json_dict['subj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000020' # inhibitor if self.obj is not None: json_dict['obj'] = self.obj.to_json() if self.is_activation: json_dict['obj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000643' # stimulated else: json_dict['obj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000642' # inhibited if self.obj_activity is not None: json_dict['obj_activity'] = self.obj_activity json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): subj = json_dict.get('subj') obj = json_dict.get('obj') obj_activity = json_dict.get('obj_activity') if subj: subj = Agent._from_json(subj) if obj: obj = Agent._from_json(obj) stmt = cls(subj, obj, obj_activity) return stmt def __str__(self): obj_act_str = ', %s' % self.obj_activity if \ self.obj_activity != 'activity' else '' s = ("%s(%s, %s%s)" % (type(self).__name__, self.subj, self.obj, obj_act_str)) return s def __repr__(self): return self.__str__() def equals(self, other): matches = super(RegulateActivity, self).equals(other) matches = matches and\ (self.obj_activity == other.obj_activity) and\ (self.is_activation == other.is_activation) return matches
[docs]class Inhibition(RegulateActivity): """Indicates that a protein inhibits or deactivates another protein. This statement is intended to be used for physical interactions where the mechanism of inhibition is not explicitly specified, which is often the case for descriptions of mechanisms extracted from the literature. Parameters ---------- subj : :py:class:`Agent` The agent responsible for the change in activity, i.e., the "upstream" node. obj : :py:class:`Agent` The agent whose activity is influenced by the subject, i.e., the "downstream" node. obj_activity : Optional[str] The activity of the obj Agent that is affected, e.g., its "kinase" activity. evidence : list of :py:class:`Evidence` Evidence objects in support of the modification. """ def __init__(self, subj, obj, obj_activity='activity', evidence=None): super(RegulateActivity, self).__init__(evidence) self.subj = subj self.obj = obj if obj_activity not in activity_types: logger.warning('Invalid activity type: %s' % obj_activity) self.obj_activity = obj_activity self.is_activation = False
[docs]class Activation(RegulateActivity): """Indicates that a protein activates another protein. This statement is intended to be used for physical interactions where the mechanism of activation is not explicitly specified, which is often the case for descriptions of mechanisms extracted from the literature. Parameters ---------- subj : :py:class:`Agent` The agent responsible for the change in activity, i.e., the "upstream" node. obj : :py:class:`Agent` The agent whose activity is influenced by the subject, i.e., the "downstream" node. obj_activity : Optional[str] The activity of the obj Agent that is affected, e.g., its "kinase" activity. evidence : list of :py:class:`Evidence` Evidence objects in support of the modification. Examples -------- MEK (MAP2K1) activates the kinase activity of ERK (MAPK1): >>> mek = Agent('MAP2K1') >>> erk = Agent('MAPK1') >>> act = Activation(mek, erk, 'kinase') """ def __init__(self, subj, obj, obj_activity='activity', evidence=None): super(RegulateActivity, self).__init__(evidence) self.subj = subj self.obj = obj if obj_activity not in activity_types: logger.warning('Invalid activity type: %s' % obj_activity) self.obj_activity = obj_activity self.is_activation = True
class GtpActivation(Activation): pass @python_2_unicode_compatible
[docs]class ActiveForm(Statement): """Specifies conditions causing an Agent to be active or inactive. Types of conditions influencing a specific type of biochemical activity can include modifications, bound Agents, and mutations. Parameters ---------- agent : :py:class:`Agent` The Agent in a particular active or inactive state. The sets of ModConditions, BoundConditions, and MutConditions on the given Agent instance indicate the relevant conditions. activity : str The type of activity influenced by the given set of conditions, e.g., "kinase". is_active : bool Whether the conditions are activating (True) or inactivating (False). """ def __init__(self, agent, activity, is_active, evidence=None): super(ActiveForm, self).__init__(evidence) self.agent = agent if agent.activity is not None: logger.warning('Agent in ActiveForm should not have ' + 'ActivityConditions.') agent.activity = None if activity not in activity_types: logger.warning('Invalid activity type: %s' % activity) self.activity = activity self.is_active = is_active def matches_key(self): key = (type(self), self.agent.matches_key(), str(self.activity), str(self.is_active)) return str(key) def agent_list(self): return [self.agent] def set_agent_list(self, agent_list): if len(agent_list) != 1: raise ValueError("ActiveForm has one agent.") self.agent = agent_list[0] def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Check agent arguments if not self.agent.refinement_of(other.agent, hierarchies): return False # Make sure that the relationships and activities match if (self.is_active == other.is_active) and \ (self.activity == other.activity or \ hierarchies['activity'].isa('INDRA', self.activity, 'INDRA', other.activity)): return True else: return False def to_json(self): generic = super(ActiveForm, self).to_json() json_dict = _o({'type': generic['type']}) json_dict.update({'agent': self.agent.to_json(), 'activity': self.activity, 'is_active': self.is_active}) json_dict['agent']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000644' # modified json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): agent = json_dict.get('agent') if agent: agent = Agent._from_json(agent) else: logger.error('ActiveForm statement missing agent') return None activity = json_dict.get('activity') is_active = json_dict.get('is_active') if activity is None: logger.warning('ActiveForm activity missing, defaulting ' + 'to `activity`') activity = 'activity' if is_active is None: logger.warning('ActiveForm is_active missing, defaulting ' + 'to True') is_active = True stmt = cls(agent, activity, is_active) return stmt def __str__(self): s = ("ActiveForm(%s, %s, %s)" % (self.agent, self.activity, self.is_active)) return s def equals(self, other): matches = super(ActiveForm, self).equals(other) matches = matches and\ (self.activity == other.activity) and\ (self.is_active == other.is_active) return matches
@python_2_unicode_compatible
[docs]class HasActivity(Statement): """States that an Agent has or doesn't have a given activity type. With this Statement, one cane express that a given protein is a kinase, or, for instance, that it is a transcription factor. It is also possible to construct negative statements with which one epxresses, for instance, that a given protein is not a kinase. Parameters ---------- agent : :py:class:`Agent` The Agent that that statement is about. Note that the detailed state of the Agent is not relevant for this type of statement. activity : str The type of activity, e.g., "kinase". has_activity : bool Whether the given Agent has the given activity (True) or not (False). """ def __init__(self, agent, activity, has_activity, evidence=None): super(HasActivity, self).__init__(evidence) if agent.activity is not None: logger.warning('Agent in HasActivity should not have ' + 'ActivityConditions.') agent.activity = None self.agent = agent if activity not in activity_types: logger.warning('Invalid activity type: %s' % activity) self.activity = activity self.has_activity = has_activity def matches_key(self): key = (type(self), self.agent.matches_key(), str(self.activity), str(self.has_activity)) return str(key) def agent_list(self): return [self.agent] def set_agent_list(self, agent_list): if len(agent_list) != 1: raise ValueError("HasActivity has one agent.") self.agent = agent_list[0] def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Check agent arguments if not self.agent.refinement_of(other.agent, hierarchies): return False # Make sure that the relationships and activities match if (self.has_activity == other.has_activity) and \ (self.activity == other.activity or \ hierarchies['activity'].isa(self.activity, other.activity)): return True else: return False def __str__(self): s = ("HasActivity(%s, %s, %s)" % (self.agent, self.activity, self.has_activity)) return s def equals(self, other): matches = super(HasActivity, self).equals(other) matches = matches and\ (self.activity == other.activity) and\ (self.has_activity == other.has_activity) return matches
@python_2_unicode_compatible
[docs]class Gef(Statement): """Exchange of GTP for GDP on a small GTPase protein mediated by a GEF. Represents the generic process by which a guanosine exchange factor (GEF) catalyzes nucleotide exchange on a GTPase protein. Parameters ---------- gef : :py:class:`Agent` The guanosine exchange factor. ras : :py:class:`Agent` The GTPase protein. Examples -------- SOS1 catalyzes nucleotide exchange on KRAS: >>> sos = Agent('SOS1') >>> kras = Agent('KRAS') >>> gef = Gef(sos, kras) """ def __init__(self, gef, ras, evidence=None): super(Gef, self).__init__(evidence) self.gef = gef self.ras = ras def matches_key(self): key = (type(self), self.gef.matches_key(), self.ras.matches_key()) return str(key) def agent_list(self): return [self.gef, self.ras] def set_agent_list(self, agent_list): if len(agent_list) != 2: raise ValueError("Gef has two agents.") self.gef = agent_list[0] self.ras = agent_list[1] def __str__(self): s = ("Gef(%s, %s)" % (self.gef.name, self.ras.name)) return s def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Check the GEF if self.gef.refinement_of(other.gef, hierarchies) and \ self.ras.refinement_of(other.ras, hierarchies): return True else: return False def equals(self, other): matches = super(Gef, self).equals(other) return matches def to_json(self): generic = super(Gef, self).to_json() json_dict = _o({'type': generic['type']}) if self.gef is not None: json_dict['gef'] = self.gef.to_json() json_dict['gef']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000013' # catalyst if self.ras is not None: json_dict['ras'] = self.ras.to_json() json_dict['ras']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000015' # substrate json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): gef = json_dict.get('gef') ras = json_dict.get('ras') evidence = json_dict.get('evidence') if gef: gef = Agent._from_json(gef) if ras: ras = Agent._from_json(ras) stmt = cls(gef, ras) return stmt
@python_2_unicode_compatible
[docs]class Gap(Statement): """Acceleration of a GTPase protein's GTP hydrolysis rate by a GAP. Represents the generic process by which a GTPase activating protein (GAP) catalyzes GTP hydrolysis by a particular small GTPase protein. Parameters ---------- gap : :py:class:`Agent` The GTPase activating protein. ras : :py:class:`Agent` The GTPase protein. Examples -------- RASA1 catalyzes GTP hydrolysis on KRAS: >>> rasa1 = Agent('RASA1') >>> kras = Agent('KRAS') >>> gap = Gap(rasa1, kras) """ def __init__(self, gap, ras, evidence=None): super(Gap, self).__init__(evidence) self.gap = gap self.ras = ras def matches_key(self): key = (type(self), self.gap.matches_key(), self.ras.matches_key()) return str(key) def agent_list(self): return [self.gap, self.ras] def set_agent_list(self, agent_list): if len(agent_list) != 2: raise ValueError("Gap has two agents.") self.gap = agent_list[0] self.ras = agent_list[1] def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Check the GAP if self.gap.refinement_of(other.gap, hierarchies) and \ self.ras.refinement_of(other.ras, hierarchies): return True else: return False def __str__(self): s = ("Gap(%s, %s)" % (self.gap.name, self.ras.name)) return s def equals(self, other): matches = super(Gap, self).equals(other) return matches def to_json(self): generic = super(Gap, self).to_json() json_dict = _o({'type': generic['type']}) if self.gap is not None: json_dict['gap'] = self.gap.to_json() json_dict['gap']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000013' # catalyst if self.ras is not None: json_dict['ras'] = self.ras.to_json() json_dict['ras']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000015' # substrate json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): gap = json_dict.get('gap') ras = json_dict.get('ras') evidence = json_dict.get('evidence') if gap: gap = Agent._from_json(gap) if ras: ras = Agent._from_json(ras) stmt = cls(gap, ras) return stmt
@python_2_unicode_compatible
[docs]class Complex(Statement): """A set of proteins observed to be in a complex. Parameters ---------- members : list of :py:class:`Agent` The set of proteins in the complex. Examples -------- BRAF is observed to be in a complex with RAF1: >>> braf = Agent('BRAF') >>> raf1 = Agent('RAF1') >>> cplx = Complex([braf, raf1]) """ def __init__(self, members, evidence=None): super(Complex, self).__init__(evidence) self.members = members def matches_key(self): key = (type(self), tuple(m.matches_key() for m in sorted(self.members, key=lambda x: x.matches_key()))) return str(key) def entities_match_key(self): key = tuple(a.entity_matches_key() if a is not None else None for a in sorted(self.members, key=lambda x: x.matches_key())) return key def agent_list(self): return self.members def set_agent_list(self, agent_list): self.members = agent_list def __str__(self): s = "Complex(%s)" % (', '.join([('%s' % m) for m in self.members])) return s def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Make sure the length of the members list is the same. Note that this # treats Complex([A, B, C]) as distinct from Complex([A, B]), rather # than as a refinement. if len(self.members) != len(other.members): return False # Check that every member in other is refined in self, but only once! self_match_indices = set([]) for other_agent in other.members: for self_agent_ix, self_agent in enumerate(self.members): if self_agent_ix in self_match_indices: continue if self_agent.refinement_of(other_agent, hierarchies): self_match_indices.add(self_agent_ix) break if len(self_match_indices) != len(other.members): return False else: return True def equals(self, other): matches = super(Complex, self).equals(other) return matches def to_json(self): generic = super(Complex, self).to_json() json_dict = _o({'type': generic['type']}) members = [m.to_json() for m in self.members] json_dict['members'] = members json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): members = json_dict.get('members') evidence = json_dict.get('evidence', []) members = [Agent._from_json(m) for m in members] stmt = cls(members) return stmt
@python_2_unicode_compatible
[docs]class Translocation(Statement): """The translocation of a molecular agent from one location to another. Parameters ---------- agent : :py:class:`Agent` The agent which translocates. from_location : Optional[str] The location from which the agent translocates. This must be a valid GO cellular component name (e.g. "cytoplasm") or ID (e.g. "GO:0005737"). to_location : Optional[str] The location to which the agent translocates. This must be a valid GO cellular component name or ID. """ def __init__(self, agent, from_location=None, to_location=None, evidence=None): super(Translocation, self).__init__(evidence) self.agent = agent self.from_location = get_valid_location(from_location) self.to_location = get_valid_location(to_location) def agent_list(self): return [self.agent] def set_agent_list(self, agent_list): if(len(agent_list) != 1): raise ValueError("Translocation has 1 agent") self.agent = agent_list[0] def __str__(self): s = ("Translocation(%s, %s, %s)" % (self.agent, self.from_location, self.to_location)) return s def refinement_of(self, other, hierarchies=None): # Make sure the statement types match if type(self) != type(other): return False # Check several conditions for refinement ch = hierarchies['cellular_component'] ref1 = self.agent.refinement_of(other.agent, hierarchies) ref2 = (other.from_location is None or self.from_location == other.from_location or ch.partof('INDRA', self.from_location, 'INDRA', other.from_location)) ref3 = (other.to_location is None or self.to_location == other.to_location or ch.partof('INDRA', self.to_location, 'INDRA', other.to_location)) return (ref1 and ref2 and ref3) def equals(self, other): matches = super(Translocation, self).equals(other) matches = matches and (self.from_location == other.from_location) matches = matches and (self.to_location == other.to_location) return matches def matches_key(self): key = (type(self), self.agent.matches_key(), str(self.from_location), str(self.to_location)) return str(key) def to_json(self): generic = super(Translocation, self).to_json() json_dict = _o({'type': generic['type']}) json_dict['agent'] = self.agent.to_json() if self.from_location is not None: json_dict['from_location'] = self.from_location if self.to_location is not None: json_dict['to_location'] = self.to_location json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): agent = json_dict.get('agent') if agent: agent = Agent._from_json(agent) else: logger.error('Translocation statement missing agent') return None from_location = json_dict.get('from_location') to_location = json_dict.get('to_location') stmt = cls(agent, from_location, to_location) return stmt
@python_2_unicode_compatible
[docs]class RegulateAmount(Statement): """Superclass handling operations on directed, two-element interactions.""" def __init__(self, subj, obj, evidence=None): super(RegulateAmount, self).__init__(evidence) self.subj = subj if obj is None: raise ValueError('Object of %s cannot be None.' % type(self).__name__) self.obj = obj def matches_key(self): if self.subj is None: subj_key = None else: subj_key = self.subj.matches_key() key = (type(self), subj_key, self.obj.matches_key()) return str(key) def agent_list(self): return [self.subj, self.obj] def set_agent_list(self, agent_list): if len(agent_list) != 2: raise ValueError("%s has two agents in agent_list." % type(self).__name__) self.subj = agent_list[0] self.obj = agent_list[1] def to_json(self): generic = super(RegulateAmount, self).to_json() json_dict = _o({'type': generic['type']}) if self.subj is not None: json_dict['subj'] = self.subj.to_json() if isinstance(self, IncreaseAmount): json_dict['subj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000459' # stimulator else: json_dict['subj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000020' # inhibitor if self.obj is not None: json_dict['obj'] = self.obj.to_json() if isinstance(self, IncreaseAmount): json_dict['obj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000011' # product else: json_dict['obj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000010' # reactant json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): subj = json_dict.get('subj') obj = json_dict.get('obj') evidence = json_dict.get('evidence') if subj: subj = Agent._from_json(subj) if obj: obj = Agent._from_json(obj) stmt = cls(subj, obj) return stmt def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False # Check agent arguments if self.subj is None and other.subj is None: subj_refinement = True elif self.subj is None and other.subj is not None: subj_refinement = False elif self.subj is not None and other.subj is None: subj_refinement = True else: subj_refinement = self.subj.refinement_of(other.subj, hierarchies) obj_refinement = self.obj.refinement_of(other.obj, hierarchies) return (subj_refinement and obj_refinement) def equals(self, other): matches = super(RegulateAmount, self).equals(other) return matches def __str__(self): s = ("%s(%s, %s)" % (type(self).__name__, self.subj, self.obj)) return s
[docs]class DecreaseAmount(RegulateAmount): """Degradation of a protein, possibly mediated by another protein. Note that this statement can also be used to represent inhibitors of synthesis (e.g., cycloheximide). Parameters ---------- subj : :py:class`indra.statement.Agent` The protein mediating the degradation. obj : :py:class:`indra.statement.Agent` The protein that is degraded. evidence : list of :py:class:`Evidence` Evidence objects in support of the degradation statement. """ pass
[docs]class IncreaseAmount(RegulateAmount): """Synthesis of a protein, possibly mediated by another protein. Parameters ---------- subj : :py:class`indra.statement.Agent` The protein mediating the synthesis. obj : :py:class:`indra.statement.Agent` The protein that is synthesized. evidence : list of :py:class:`Evidence` Evidence objects in support of the synthesis statement. """ pass
[docs]class Conversion(Statement): """Conversion of molecular species mediated by a controller protein. Parameters ---------- subj : :py:class`indra.statement.Agent` The protein mediating the conversion. obj_from : list of :py:class:`indra.statement.Agent` The list of molecular species being consumed by the conversion. obj_to : list of :py:class:`indra.statement.Agent` The list of molecular species being created by the conversion. evidence : list of :py:class:`Evidence` Evidence objects in support of the synthesis statement. """ def __init__(self, subj, obj_from=None, obj_to=None, evidence=None): super(Conversion, self).__init__(evidence) self.subj = subj self.obj_from = obj_from if obj_from is not None else [] if isinstance(obj_from, Agent): self.obj_from = [obj_from] self.obj_to = obj_to if obj_to is not None else [] if isinstance(obj_to, Agent): self.obj_to = [obj_to] def matches_key(self): keys = [type(self)] keys += [self.subj.matches_key() if self.subj else None] keys += [agent.matches_key() for agent in self.obj_to] keys += [agent.matches_key() for agent in self.obj_from] return str(keys) def agent_list(self): return [self.subj] + self.obj_from + self.obj_to def set_agent_list(self, agent_list): num_obj_from = len(self.obj_from) num_obj_to = len(self.obj_to) if len(agent_list) != 1 + num_obj_from + num_obj_to: raise Exception('Conversion agent number must be preserved ' 'when setting agent list.') self.subj = agent_list[0] self.obj_from = agent_list[1:num_obj_from+1] self.obj_to = agent_list[num_obj_from+1:] def to_json(self): generic = super(Conversion, self).to_json() json_dict = _o({'type': generic['type']}) if self.subj is not None: json_dict['subj'] = self.subj.to_json() json_dict['subj']['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000013' # catalyst json_dict['obj_from'] = [o.to_json() for o in self.obj_from] for of in json_dict['obj_from']: of['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000010' # reactant json_dict['obj_to'] = [o.to_json() for o in self.obj_to] for ot in json_dict['obj_to']: ot['sbo'] = \ 'http://identifiers.org/sbo/SBO:0000011' # product json_dict.update(generic) return json_dict @classmethod def _from_json(cls, json_dict): subj = json_dict.get('subj') obj_from = json_dict.get('obj_from') obj_to = json_dict.get('obj_to') evidence = json_dict.get('evidence') if subj: subj = Agent._from_json(subj) if obj_from: obj_from = [Agent._from_json(o) for o in obj_from] if obj_to: obj_to = [Agent._from_json(o) for o in obj_to] stmt = cls(subj, obj_from, obj_to) return stmt def refinement_of(self, other, hierarchies): # Make sure the statement types match if type(self) != type(other): return False if self.subj is None and other.subj is None: subj_refinement = True elif self.subj is None and other.subj is not None: subj_refinement = False elif self.subj is not None and other.subj is None: subj_refinement = True else: subj_refinement = self.subj.refinement_of(other.subj, hierarchies) def refinement_agents(lst1, lst2): if len(lst1) != len(lst2): return False # Check that every agent in other is refined in self, but only once! self_match_indices = set([]) for other_agent in lst2: for self_agent_ix, self_agent in enumerate(lst1): if self_agent_ix in self_match_indices: continue if self_agent.refinement_of(other_agent, hierarchies): self_match_indices.add(self_agent_ix) break if len(self_match_indices) != len(lst2): return False return True obj_from_refinement = refinement_agents(self.obj_from, other.obj_from) obj_to_refinement = refinement_agents(self.obj_to, other.obj_to) return (subj_refinement and obj_from_refinement and obj_to_refinement) def equals(self, other): matches = super(Conversion, self).equals(other) return matches def __str__(self): s = ("%s(%s, %s, %s)" % (type(self).__name__, self.subj, self.obj_from, self.obj_to)) return s
def stmts_from_json(json_in): if not isinstance(json_in, list): st = Statement._from_json(json_in) return st else: stmts = [] uuid_dict = {} for json_stmt in json_in: st = Statement._from_json(json_stmt) stmts.append(st) uuid_dict[st.uuid] = st for st in stmts: for i, uid in enumerate(st.supports): try: st.supports[i] = uuid_dict[uid] except KeyError: pass for i, uid in enumerate(st.supported_by): try: st.supported_by[i] = uuid_dict[uid] except KeyError: pass return stmts def stmts_to_json(stmts_in): if not isinstance(stmts_in, list): json_dict = stmts_in.to_json() return json_dict else: json_dict = [st.to_json() for st in stmts_in] return json_dict
[docs]def get_valid_residue(residue): """Check if the given string represents a valid amino acid residue.""" if residue is not None and amino_acids.get(residue) is None: res = amino_acids_reverse.get(residue.lower()) if res is None: raise InvalidResidueError(residue) else: return res return residue
[docs]def get_valid_location(location): """Check if the given location represents a valid cellular component.""" # If we're given None, return None if location is not None and cellular_components.get(location) is None: loc = cellular_components_reverse.get(location) if loc is None: raise InvalidLocationError(location) else: return loc return location
def _read_activity_types(): """Read types of valid activities from a resource file.""" this_dir = os.path.dirname(os.path.abspath(__file__)) ac_file = this_dir + '/resources/activity_hierarchy.rdf' g = rdflib.Graph() with open(ac_file, 'r'): g.parse(ac_file, format='nt') act_types = set() for s, p, o in g: subj = s.rpartition('/')[-1] obj = o.rpartition('/')[-1] act_types.add(subj) act_types.add(obj) return sorted(list(act_types)) activity_types = _read_activity_types() def _read_cellular_components(): """Read cellular components from a resource file.""" this_dir = os.path.dirname(os.path.abspath(__file__)) cc_file = this_dir + '/resources/cellular_components.tsv' cellular_components = {} cellular_components_reverse = {} with open(cc_file, 'rt') as fh: lines = fh.readlines() for lin in lines[1:]: terms = lin.strip().split('\t') cellular_components[terms[1]] = terms[0] cellular_components_reverse[terms[0]] = terms[1] return cellular_components, cellular_components_reverse cellular_components, cellular_components_reverse = _read_cellular_components() def _read_amino_acids(): """Read the amino acid information from a resource file.""" this_dir = os.path.dirname(os.path.abspath(__file__)) aa_file = this_dir + '/resources/amino_acids.tsv' amino_acids = {} amino_acids_reverse = {} with open(aa_file, 'rt') as fh: lines = fh.readlines() for lin in lines[1:]: terms = lin.strip().split('\t') key = terms[2] val = {'full_name': terms[0], 'short_name': terms[1], 'indra_name': terms[3]} amino_acids[key] = val for v in val.values(): amino_acids_reverse[v] = key return amino_acids, amino_acids_reverse amino_acids, amino_acids_reverse = _read_amino_acids() # Mapping between modification type strings and subclasses of Modification modtype_to_modclass = {str(cls.__name__.lower()): cls for cls in \ AddModification.__subclasses__() + \ RemoveModification.__subclasses__()} # Add modification as a generic type modtype_to_modclass['modification'] = Modification modclass_to_modtype = {cls: str(cls.__name__.lower()) for cls in \ AddModification.__subclasses__() + \ RemoveModification.__subclasses__()} # Add modification as a generic type modclass_to_modtype[Modification] = 'modification' # These are the modification types that are valid in ModConditions modtype_conditions = [modclass_to_modtype[mt] for mt in \ AddModification.__subclasses__()] modtype_conditions.append('modification') def _get_mod_inverse_maps(): modtype_to_inverse = {} modclass_to_inverse = {} for cls in AddModification.__subclasses__(): modtype = modclass_to_modtype[cls] modtype_inv = 'de' + modtype cls_inv = modtype_to_modclass[modtype_inv] modtype_to_inverse[modtype] = modtype_inv modtype_to_inverse[modtype_inv] = modtype modclass_to_inverse[cls] = cls_inv modclass_to_inverse[cls_inv] = cls return modtype_to_inverse, modclass_to_inverse modtype_to_inverse, modclass_to_inverse = _get_mod_inverse_maps() stmt_sbo_map = { 'acetylation': '0000215', 'glycosylation': '0000217', 'hydroxylation': '0000233', 'methylation': '0000214', 'myristoylation': '0000219', 'palmitoylation': '0000218', 'phosphorylation': '0000216', 'farnesylation': '0000222', 'geranylgeranylation': '0000223', 'ubiquitination': '0000224', 'dephosphorylation': '0000330', 'addmodification': '0000210', # addition of a chemical group 'removemodification': '0000211', # removal of a chemical group 'modification': '0000182', # conversion 'conversion': '0000182', # conversion 'autophosphorylation': '0000216', # phosphorylation 'transphosphorylation': '0000216', # phosphorylation 'decreaseamount': '0000179', # degradation 'increaseamount': '0000183', # transcription 'complex': '0000526', # protein complex formation 'translocation': '0000185', # transport reaction 'regulateactivity': '0000182', # conversion 'activeform': '0000412', # biological activity 'rasgef': '0000172', # catalysis 'rasgap': '0000172', # catalysis 'statement': '0000231' # occuring entity representation }
[docs]class InvalidResidueError(ValueError): """Invalid residue (amino acid) name.""" def __init__(self, name): ValueError.__init__(self, "Invalid residue name: '%s'" % name)
[docs]class InvalidLocationError(ValueError): """Invalid cellular component name.""" def __init__(self, name): ValueError.__init__(self, "Invalid location name: '%s'" % name)
def draw_stmt_graph(stmts): try: import matplotlib.pyplot as plt except Exception: logger.error('Could not import matplotlib, not drawing graph.') return try: import pygraphviz except Exception: logger.error('Could not import pygraphviz, not drawing graph.') return import numpy g = networkx.compose_all([stmt.to_graph() for stmt in stmts], 'composed_stmts') plt.figure() plt.ion() g.graph['graph'] = {'rankdir': 'LR'} pos = networkx.drawing.nx_agraph.graphviz_layout(g, prog='dot') g = g.to_undirected() # Draw nodes options = { 'marker': 'o', 's': 200, 'c': [0.85, 0.85, 1], 'facecolor': '0.5', 'lw': 0, } ax = plt.gca() nodelist = list(g) xy = numpy.asarray([pos[v] for v in nodelist]) node_collection = ax.scatter(xy[:, 0], xy[:, 1], **options) node_collection.set_zorder(2) # Draw edges networkx.draw_networkx_edges(g, pos, arrows=False, edge_color='0.5') # Draw labels edge_labels = {(e[0], e[1]): e[2].get('label') for e in g.edges(data=True)} networkx.draw_networkx_edge_labels(g, pos, edge_labels=edge_labels) node_labels = {n[0]: n[1].get('label') for n in g.nodes(data=True)} for key, label in node_labels.items(): if len(label) > 25: parts = label.split(' ') parts.insert(int(len(parts)/2), '\n') label = ' '.join(parts) node_labels[key] = label networkx.draw_networkx_labels(g, pos, labels=node_labels) ax.get_xaxis().set_visible(False) ax.get_yaxis().set_visible(False) plt.show()