Using natural language to build models

In this tutorial we build a simple model using natural language, then contextualize and parameterize it, and export it into different formats.

Read INDRA Statements from a natural language string

First we import INDRA’s API to the TRIPS reading system. We then define a block of text, stored in the model_text variable, that describes the mechanism to be modeled. Finally, we call indra.trips.process_text, which sends a request to the TRIPS web service, receives a response, and processes the extraction knowledge base to obtain a list of INDRA Statements.

In [1]: from indra import trips

In [2]: model_text = 'MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1.'

In [3]: tp = trips.process_text(model_text)

At this point tp.statements should contain 2 INDRA Statements: a Phosphorylation Statement and a Dephosphorylation Statement. Note that the evidence sentence for each Statement is propagated:

In [4]: for st in tp.statements:
   ...:     print('%s with evidence "%s"' % (st, st.evidence[0].text))
   ...: 
Phosphorylation(MAP2K1(), MAPK1()) with evidence "MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1."
Dephosphorylation(DUSP6(), MAPK1()) with evidence "MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1."

Assemble the INDRA Statements into a rule-based executable model

We next use INDRA’s PySB Assembler to automatically assemble a rule-based model representing the biochemical mechanisms described in model_text. First a PysbAssembler object is instantiated, then the list of INDRA Statements is added to the assembler. Finally, the assembler’s make_model method is called, which assembles the model and returns it while also storing it in pa.model. Notice that we pass policies='two_step' as an argument of make_model. This directs the assembler to use rules in which enzymatic catalysis is modeled as a two-step process: the enzyme and substrate first bind reversibly, and the enzyme-substrate complex then irreversibly produces and releases the product.

In [5]: from indra.assemblers.pysb_assembler import PysbAssembler

In [6]: pa = PysbAssembler()

In [7]: pa.add_statements(tp.statements)

In [8]: pa.make_model(policies='two_step')
Out[8]: <Model 'None' (monomers: 3, rules: 6, parameters: 9, expressions: 0, compartments: 0) at 0x7feb00471e48>
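The two-step policy described above corresponds to the familiar enzyme-substrate scheme. As a sketch, assuming mass-action kinetics and using generic symbols that are not INDRA names (E for the enzyme, S for the substrate, ES for the complex, P for the product, and k_f, k_r, k_c for the binding, unbinding and catalytic rate constants), the scheme and the resulting rate equations can be written as:

```latex
\[
E + S \;\underset{k_r}{\overset{k_f}{\rightleftharpoons}}\; ES \;\xrightarrow{\;k_c\;}\; E + P
\]
\[
\begin{aligned}
\frac{d[ES]}{dt} &= k_f\,[E][S] - (k_r + k_c)\,[ES] \\
\frac{d[P]}{dt}  &= k_c\,[ES]
\end{aligned}
\]
```

Each enzymatic Statement therefore contributes three rules (bind, dissociate, catalyze) and three rate constants, which is consistent with the 6 rules reported for the two Statements in this model.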

At this point pa.model contains a PySB model object with 3 monomers,

In [9]: for monomer in pa.model.monomers:
   ...:     print(monomer)
   ...: 
Monomer('MAP2K1', ['mapk1'])
Monomer('DUSP6', ['mapk1'])
Monomer('MAPK1', ['phospho', 'map2k1', 'dusp6'], {'phospho': ['u', 'p']})

6 rules,

In [10]: for rule in pa.model.rules:
   ....:     print(rule)
   ....: 
Rule('MAP2K1_phosphorylation_bind_MAPK1_phospho', MAP2K1(mapk1=None) + MAPK1(phospho='u', map2k1=None) >> MAP2K1(mapk1=1) % MAPK1(phospho='u', map2k1=1), kf_mm_bind_1)
Rule('MAP2K1_phosphorylation_MAPK1_phospho', MAP2K1(mapk1=1) % MAPK1(phospho='u', map2k1=1) >> MAP2K1(mapk1=None) + MAPK1(phospho='p', map2k1=None), kc_mm_phosphorylation_1)
Rule('MAP2K1_dissoc_MAPK1', MAP2K1(mapk1=1) % MAPK1(map2k1=1) >> MAP2K1(mapk1=None) + MAPK1(map2k1=None), kr_mm_bind_1)
Rule('DUSP6_dephosphorylation_bind_MAPK1_phospho', DUSP6(mapk1=None) + MAPK1(phospho='p', dusp6=None) >> DUSP6(mapk1=1) % MAPK1(phospho='p', dusp6=1), kf_dm_bind_1)
Rule('DUSP6_dephosphorylation_MAPK1_phospho', DUSP6(mapk1=1) % MAPK1(phospho='p', dusp6=1) >> DUSP6(mapk1=None) + MAPK1(phospho='u', dusp6=None), kc_dm_dephosphorylation_1)
Rule('DUSP6_dissoc_MAPK1', DUSP6(mapk1=1) % MAPK1(dusp6=1) >> DUSP6(mapk1=None) + MAPK1(dusp6=None), kr_dm_bind_1)

and 9 parameters (6 kinetic rate constants and 3 total protein amounts) that are set to nominal but plausible values,

In [11]: for parameter in pa.model.parameters:
   ....:     print(parameter)
   ....: 
Parameter('kf_mm_bind_1', 1e-06)
Parameter('kr_mm_bind_1', 0.1)
Parameter('kc_mm_phosphorylation_1', 100.0)
Parameter('kf_dm_bind_1', 1e-06)
Parameter('kr_dm_bind_1', 0.1)
Parameter('kc_dm_dephosphorylation_1', 100.0)
Parameter('MAP2K1_0', 10000.0)
Parameter('DUSP6_0', 10000.0)
Parameter('MAPK1_0', 10000.0)
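To get a feel for what these nominal values imply, the six rules can be written out by hand as mass-action ODEs and integrated with a simple fixed-step Runge-Kutta scheme. This is only an illustrative sketch in plain Python, not how PySB simulates the model; the species names (m, d, ku, kp, mku, dkp), the function names, the step size and the end time are all assumptions made for this example.

```python
# Hand-derived mass-action ODEs for the six rules listed above.
# All names here are illustrative; only the numeric values come from
# the parameter listing.
KF, KR, KC = 1e-06, 0.1, 100.0  # binding, unbinding, catalysis (both enzymes)
TOTAL = 10000.0                 # MAP2K1_0, DUSP6_0 and MAPK1_0


def derivatives(y):
    """Return the time derivatives of (m, d, ku, kp, mku, dkp).

    m, d  : free MAP2K1 and free DUSP6
    ku, kp: free unphosphorylated / phosphorylated MAPK1
    mku   : MAP2K1 bound to unphosphorylated MAPK1
    dkp   : DUSP6 bound to phosphorylated MAPK1
    """
    m, d, ku, kp, mku, dkp = y
    bind_m, unbind_m, cat_m = KF * m * ku, KR * mku, KC * mku
    bind_d, unbind_d, cat_d = KF * d * kp, KR * dkp, KC * dkp
    return (
        -bind_m + unbind_m + cat_m,   # m
        -bind_d + unbind_d + cat_d,   # d
        -bind_m + unbind_m + cat_d,   # ku
        -bind_d + unbind_d + cat_m,   # kp
        bind_m - unbind_m - cat_m,    # mku
        bind_d - unbind_d - cat_d,    # dkp
    )


def rk4_step(y, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = derivatives(y)
    k2 = derivatives(tuple(a + 0.5 * dt * b for a, b in zip(y, k1)))
    k3 = derivatives(tuple(a + 0.5 * dt * b for a, b in zip(y, k2)))
    k4 = derivatives(tuple(a + dt * b for a, b in zip(y, k3)))
    return tuple(
        a + dt / 6.0 * (b + 2.0 * c + 2.0 * e + f)
        for a, b, c, e, f in zip(y, k1, k2, k3, k4)
    )


def simulate(t_end=500.0, dt=0.01):
    """Integrate from the initial state in which all MAPK1 is unphosphorylated."""
    y = (TOTAL, TOTAL, TOTAL, 0.0, 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        y = rk4_step(y, dt)
    return y


final = simulate()
for name, value in zip(('m', 'd', 'ku', 'kp', 'mku', 'dkp'), final):
    print('%s = %.2f' % (name, value))
```

Because the kinase and the phosphatase share the same nominal rate constants and total amounts, the free unphosphorylated and phosphorylated MAPK1 pools settle toward an even split, with only a small fraction of MAPK1 held in enzyme-substrate complexes.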

The model also contains extensive annotations that tie the monomers to database identifiers and also annotate the semantics of each component of each rule.

In [12]: for annotation in pa.model.annotations:
   ....:     print(annotation)
   ....: 
Annotation(MAP2K1, 'http://identifiers.org/uniprot/Q02750', 'is')
Annotation(MAP2K1, 'http://identifiers.org/ncit/C17808', 'is')
Annotation(MAP2K1, 'http://identifiers.org/hgnc/HGNC:6840', 'is')
Annotation(DUSP6, 'http://identifiers.org/uniprot/Q16828', 'is')
Annotation(DUSP6, 'http://identifiers.org/ncit/C106026', 'is')
Annotation(DUSP6, 'http://identifiers.org/hgnc/HGNC:3072', 'is')
Annotation(MAPK1, 'http://identifiers.org/uniprot/P28482', 'is')
Annotation(MAPK1, 'http://identifiers.org/ncit/C17589', 'is')
Annotation(MAPK1, 'http://identifiers.org/hgnc/HGNC:6871', 'is')
Annotation(MAP2K1_phosphorylation_bind_MAPK1_phospho, 'beb73899-c0bc-40dd-b4f7-323c888728ac', 'from_indra_statement')
Annotation(MAP2K1_phosphorylation_MAPK1_phospho, 'MAP2K1', 'rule_has_subject')
Annotation(MAP2K1_phosphorylation_MAPK1_phospho, 'MAPK1', 'rule_has_object')
Annotation(MAP2K1_phosphorylation_MAPK1_phospho, 'beb73899-c0bc-40dd-b4f7-323c888728ac', 'from_indra_statement')
Annotation(MAP2K1_dissoc_MAPK1, 'beb73899-c0bc-40dd-b4f7-323c888728ac', 'from_indra_statement')
Annotation(DUSP6_dephosphorylation_bind_MAPK1_phospho, '5ece83d5-e544-4093-8efd-c12de7c1484c', 'from_indra_statement')
Annotation(DUSP6_dephosphorylation_MAPK1_phospho, 'DUSP6', 'rule_has_subject')
Annotation(DUSP6_dephosphorylation_MAPK1_phospho, 'MAPK1', 'rule_has_object')
Annotation(DUSP6_dephosphorylation_MAPK1_phospho, '5ece83d5-e544-4093-8efd-c12de7c1484c', 'from_indra_statement')
Annotation(DUSP6_dissoc_MAPK1, '5ece83d5-e544-4093-8efd-c12de7c1484c', 'from_indra_statement')
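The identifiers.org URLs in the monomer annotations follow a namespace/identifier pattern, so they are easy to turn into a lookup table. The following is a minimal sketch in plain Python; the helper name split_identifiers_url is hypothetical, and it assumes the URL strings have already been pulled out of the annotations (here they are copied from the MAPK1 lines of the listing above).

```python
from urllib.parse import urlparse


def split_identifiers_url(url):
    """Split 'http://identifiers.org/<namespace>/<id>' into (namespace, id).

    Hypothetical helper for illustration; not part of INDRA or PySB.
    """
    parsed = urlparse(url)
    if parsed.netloc != 'identifiers.org':
        raise ValueError('not an identifiers.org URL: %s' % url)
    namespace, _, identifier = parsed.path.lstrip('/').partition('/')
    if not namespace or not identifier:
        raise ValueError('missing namespace or identifier: %s' % url)
    return namespace, identifier


# URLs copied from the MAPK1 annotations printed above.
mapk1_urls = [
    'http://identifiers.org/uniprot/P28482',
    'http://identifiers.org/ncit/C17589',
    'http://identifiers.org/hgnc/HGNC:6871',
]
mapk1_refs = dict(split_identifiers_url(url) for url in mapk1_urls)
print(mapk1_refs)
```

The rule annotations use plain strings (a monomer name or a Statement UUID) rather than URLs, so a helper like this would only be applied to the annotations whose predicate is 'is'.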

Set the model to a particular cell line context

We can use INDRA’s contextualization module, which is built into the PysbAssembler, to set the amounts of proteins in the model to total amounts measured (or estimated) in a given cancer cell line. In this example, we use the A375 melanoma cell line.

In [13]: pa.set_context('A375_SKIN')
ConnectionError: HTTPConnectionPool(host='general.bigmech.ndexbio.org', port=8081): Max retries exceeded with url: /context/expression/cell_line (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7feafecf5dd8>: Failed to establish a new connection: [Errno 110] Connection timed out',))

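After a successful call, the PySB model has total protein amounts set consistent with the A375 cell line. In the run captured here the context service could not be reached, so the initial conditions below still show the default amount of 10000 for each protein: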

In [14]: for monomer_pattern, parameter in pa.model.initial_conditions:
   ....:     print('%s = %d' % (monomer_pattern, parameter.value))
   ....: 
MAP2K1(mapk1=None) = 10000
DUSP6(mapk1=None) = 10000
MAPK1(phospho='u', map2k1=None, dusp6=None) = 10000
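Since set_context queries a remote web service, it can be useful to guard the call so that a network failure leaves the model with its default amounts instead of stopping the session. The sketch below is one possible pattern, not part of INDRA: the wrapper set_context_or_keep_defaults and the stand-in function unreachable_service are hypothetical names. It relies on the fact that connection errors raised by the requests library derive from OSError, as does Python's built-in ConnectionError used by the stand-in.

```python
def set_context_or_keep_defaults(set_context, cell_line):
    """Call set_context(cell_line) and report whether it succeeded.

    Returns True on success and False if the call failed with an
    I/O-type error, in which case the model is left untouched.
    Hypothetical helper for illustration only.
    """
    try:
        set_context(cell_line)
    except OSError as err:
        print('Could not set context for %s (%s); keeping default amounts.'
              % (cell_line, err))
        return False
    return True


def unreachable_service(cell_line):
    """Stand-in that mimics a context service that cannot be reached."""
    raise ConnectionError('connection timed out')


ok = set_context_or_keep_defaults(unreachable_service, 'A375_SKIN')
print(ok)
```

With the assembler from this tutorial, the same wrapper would be called as set_context_or_keep_defaults(pa.set_context, 'A375_SKIN').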

Exporting the model into other common formats

From the assembled PySB format it is possible to export the model into other common formats such as SBML, BNGL and Kappa. One can also generate a Matlab or Mathematica script with ODEs corresponding to the model.

pa.export_model('sbml')
pa.export_model('bngl')

One can also pass a file name as the second argument of the export_model method to save the exported model directly to a file:

pa.export_model('sbml', 'example_model.sbml')