Using natural language to build models

In this tutorial we build a simple model using natural language, then contextualize and parameterize it, and export it into different formats.

Read INDRA Statements from a natural language string

First we import INDRA’s API to the TRIPS reading system. We then define, in the model_text variable, a block of text that describes the mechanism to be modeled. Finally, we call indra.trips.process_text, which sends a request to the TRIPS web service, gets a response, and processes the extraction knowledge base to obtain a list of INDRA Statements.

In [1]: from indra import trips

In [2]: model_text = 'MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1.'

In [3]: tp = trips.process_text(model_text)

At this point tp.statements should contain 2 INDRA Statements: a Phosphorylation Statement and a Dephosphorylation Statement. Note that the evidence sentence for each Statement is propagated:

In [4]: for st in tp.statements:
   ...:     print('%s with evidence "%s"' % (st, st.evidence[0].text))
Phosphorylation(MAP2K1(), MAPK1()) with evidence "MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1."
Dephosphorylation(DUSP6(), MAPK1()) with evidence "MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1."

Assemble the INDRA Statements into a rule-based executable model

We next use INDRA’s PySB Assembler to automatically assemble a rule-based model representing the biochemical mechanisms described in model_text. First a PysbAssembler object is instantiated, then the list of INDRA Statements is added to the assembler. Finally, the assembler’s make_model method is called, which assembles the model and returns it while also storing it in pa.model. Notice that we pass policies='two_step' to make_model. This directs the assembler to use rules in which enzymatic catalysis is modeled as a two-step process: enzyme and substrate first bind reversibly, and the enzyme-substrate complex then irreversibly produces and releases the product.
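For a single enzyme E (here MAP2K1) and substrate S (MAPK1), the two-step policy corresponds to the classic mass-action scheme below, where S_p is the phosphorylated substrate. The symbols k_f, k_r and k_c are our own notation, chosen to match the kf/kr/kc prefixes of the parameters the assembler generates; the DUSP6 half of the model follows the same scheme with S_p as substrate and S as product.

```latex
\mathrm{E} + \mathrm{S}
  \;\underset{k_r}{\overset{k_f}{\rightleftharpoons}}\; \mathrm{ES}
  \;\xrightarrow{k_c}\; \mathrm{E} + \mathrm{S_p}

\begin{aligned}
\frac{d[\mathrm{E}]}{dt}   &= -k_f[\mathrm{E}][\mathrm{S}] + (k_r + k_c)[\mathrm{ES}] \\
\frac{d[\mathrm{S}]}{dt}   &= -k_f[\mathrm{E}][\mathrm{S}] + k_r[\mathrm{ES}] \\
\frac{d[\mathrm{ES}]}{dt}  &= k_f[\mathrm{E}][\mathrm{S}] - (k_r + k_c)[\mathrm{ES}] \\
\frac{d[\mathrm{S_p}]}{dt} &= k_c[\mathrm{ES}]
\end{aligned}
```

The derivatives of E and ES sum to zero, as do those of S, ES and S_p, so total enzyme and total substrate are conserved. Under a quasi-steady-state assumption for ES this scheme reduces to Michaelis-Menten kinetics with K_M = (k_r + k_c)/k_f.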

In [5]: from indra.assemblers.pysb_assembler import PysbAssembler

In [6]: pa = PysbAssembler()

In [7]: pa.add_statements(tp.statements)

In [8]: pa.make_model(policies='two_step')
Out[8]: <Model 'None' (monomers: 3, rules: 6, parameters: 9, expressions: 0, compartments: 0) at 0x7f0016b71110>

At this point pa.model contains a PySB model object with 3 monomers,

In [9]: for monomer in pa.model.monomers:
   ...:     print(monomer)
Monomer(u'DUSP6', [u'mapk1'])
Monomer(u'MAP2K1', [u'mapk1'])
Monomer(u'MAPK1', [u'phospho', u'map2k1', u'dusp6'], {u'phospho': [u'u', u'p']})
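The monomer listing follows a recognizable pattern: each enzyme gets a binding site named after its substrate, and the substrate gets one binding site per enzyme plus a phospho modification site with states u (unmodified) and p (modified). The sketch below reproduces that pattern from (statement type, enzyme, substrate) triples. derive_sites and MOD_SITES are hypothetical names; this illustrates the observed output and is not INDRA's assembly code.

```python
# Modification site used for each statement type in this example (assumption).
MOD_SITES = {"Phosphorylation": "phospho", "Dephosphorylation": "phospho"}


def derive_sites(statements):
    """Map each monomer name to its site list and per-site states.

    statements: iterable of (statement_type, enzyme, substrate) triples.
    """
    sites = {}   # monomer name -> ordered list of site names
    states = {}  # monomer name -> {site name: [allowed states]}
    for stmt_type, enzyme, substrate in statements:
        sites.setdefault(enzyme, [])
        sites.setdefault(substrate, [])
        mod_site = MOD_SITES[stmt_type]
        # The modification site comes first on the substrate, with u/p states.
        if mod_site not in sites[substrate]:
            sites[substrate].insert(0, mod_site)
            states.setdefault(substrate, {})[mod_site] = ["u", "p"]
        # Reciprocal binding sites, named after the partner in lowercase.
        if substrate.lower() not in sites[enzyme]:
            sites[enzyme].append(substrate.lower())
        if enzyme.lower() not in sites[substrate]:
            sites[substrate].append(enzyme.lower())
    return sites, states


if __name__ == "__main__":
    stmts = [("Phosphorylation", "MAP2K1", "MAPK1"),
             ("Dephosphorylation", "DUSP6", "MAPK1")]
    sites, states = derive_sites(stmts)
    for name in sorted(sites):
        print(name, sites[name], states.get(name, {}))
```

Because both statement types share the phospho site, the Dephosphorylation statement adds only the dusp6 binding site to MAPK1.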

6 rules,

In [10]: for rule in pa.model.rules:
   ....:     print(rule)
Rule(u'MAP2K1_phosphorylation_bind_MAPK1_phospho', MAP2K1(mapk1=None) + MAPK1(phospho=u'u', map2k1=None) >> MAP2K1(mapk1=1) % MAPK1(phospho=u'u', map2k1=1), kf_mm_bind_1)
Rule(u'MAP2K1_phosphorylation_MAPK1_phospho', MAP2K1(mapk1=1) % MAPK1(phospho=u'u', map2k1=1) >> MAP2K1(mapk1=None) + MAPK1(phospho=u'p', map2k1=None), kc_mm_phosphorylation_1)
Rule(u'MAP2K1_dissoc_MAPK1', MAP2K1(mapk1=1) % MAPK1(map2k1=1) >> MAP2K1(mapk1=None) + MAPK1(map2k1=None), kr_mm_bind_1)
Rule(u'DUSP6_dephosphorylation_bind_MAPK1_phospho', DUSP6(mapk1=None) + MAPK1(phospho=u'p', dusp6=None) >> DUSP6(mapk1=1) % MAPK1(phospho=u'p', dusp6=1), kf_dm_bind_1)
Rule(u'DUSP6_dephosphorylation_MAPK1_phospho', DUSP6(mapk1=1) % MAPK1(phospho=u'p', dusp6=1) >> DUSP6(mapk1=None) + MAPK1(phospho=u'u', dusp6=None), kc_dm_dephosphorylation_1)
Rule(u'DUSP6_dissoc_MAPK1', DUSP6(mapk1=1) % MAPK1(dusp6=1) >> DUSP6(mapk1=None) + MAPK1(dusp6=None), kr_dm_bind_1)

and 9 parameters (6 kinetic rate constants and 3 total protein amounts) that are set to nominal but plausible values,

In [11]: for parameter in pa.model.parameters:
   ....:     print(parameter)
Parameter(u'kf_mm_bind_1', 1e-06)
Parameter(u'kr_mm_bind_1', 0.1)
Parameter(u'kc_mm_phosphorylation_1', 100.0)
Parameter(u'kf_dm_bind_1', 1e-06)
Parameter(u'kr_dm_bind_1', 0.1)
Parameter(u'kc_dm_dephosphorylation_1', 100.0)
Parameter(u'DUSP6_0', 10000.0)
Parameter(u'MAP2K1_0', 10000.0)
Parameter(u'MAPK1_0', 10000.0)
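To see what the rules and parameters above imply dynamically, here is a minimal, stdlib-only sketch that writes the six rules as mass-action ODEs and integrates them with explicit Euler. It is a hypothetical re-implementation for illustration, not code generated by PySB (which provides its own simulators). The species names are ours; the parameter names and values are copied from the listing, and MAPK1 starts fully unphosphorylated, matching the initial condition the assembler sets.

```python
"""Mass-action ODE sketch of the two-step MAP2K1/DUSP6/MAPK1 model (illustrative)."""

PARAMS = {
    "kf_mm_bind_1": 1e-06,
    "kr_mm_bind_1": 0.1,
    "kc_mm_phosphorylation_1": 100.0,
    "kf_dm_bind_1": 1e-06,
    "kr_dm_bind_1": 0.1,
    "kc_dm_dephosphorylation_1": 100.0,
}
TOTALS = {"DUSP6_0": 10000.0, "MAP2K1_0": 10000.0, "MAPK1_0": 10000.0}
SPECIES = ("MAP2K1", "DUSP6", "MAPK1_u", "MAPK1_p", "MAP2K1:MAPK1", "DUSP6:MAPK1")


def rhs(state, p):
    """Return time derivatives for the six species, one flux per rule."""
    map2k1, dusp6, mapk1_u, mapk1_p, c_mm, c_dm = state
    v_bind_mm = p["kf_mm_bind_1"] * map2k1 * mapk1_u   # MAP2K1 + MAPK1(u) -> complex
    v_diss_mm = p["kr_mm_bind_1"] * c_mm               # complex -> MAP2K1 + MAPK1(u)
    v_cat_mm = p["kc_mm_phosphorylation_1"] * c_mm     # complex -> MAP2K1 + MAPK1(p)
    v_bind_dm = p["kf_dm_bind_1"] * dusp6 * mapk1_p    # DUSP6 + MAPK1(p) -> complex
    v_diss_dm = p["kr_dm_bind_1"] * c_dm               # complex -> DUSP6 + MAPK1(p)
    v_cat_dm = p["kc_dm_dephosphorylation_1"] * c_dm   # complex -> DUSP6 + MAPK1(u)
    return (
        -v_bind_mm + v_diss_mm + v_cat_mm,  # free MAP2K1
        -v_bind_dm + v_diss_dm + v_cat_dm,  # free DUSP6
        -v_bind_mm + v_diss_mm + v_cat_dm,  # free unphosphorylated MAPK1
        v_cat_mm - v_bind_dm + v_diss_dm,   # free phosphorylated MAPK1
        v_bind_mm - v_diss_mm - v_cat_mm,   # MAP2K1:MAPK1 complex
        v_bind_dm - v_diss_dm - v_cat_dm,   # DUSP6:MAPK1 complex
    )


def simulate(t_end=300.0, dt=5e-3, p=PARAMS, totals=TOTALS):
    """Integrate with explicit Euler; dt * (kr + kc) = 0.5 keeps the fast binding mode stable."""
    state = [totals["MAP2K1_0"], totals["DUSP6_0"], totals["MAPK1_0"], 0.0, 0.0, 0.0]
    for _ in range(int(round(t_end / dt))):
        deriv = rhs(state, p)
        state = [x + dt * dx for x, dx in zip(state, deriv)]
    return dict(zip(SPECIES, state))


if __name__ == "__main__":
    km = (PARAMS["kr_mm_bind_1"] + PARAMS["kc_mm_phosphorylation_1"]) / PARAMS["kf_mm_bind_1"]
    print("(kr + kc) / kf =", km)
    for name, amount in simulate().items():
        print(f"{name:>14s} {amount:10.2f}")
```

Because (kr + kc)/kf is about 1.0e8, far above the 10000 molecules of each protein, each enzyme-substrate complex holds only about one molecule or fewer, and with symmetric kinase and phosphatase parameters the free MAPK1 pool relaxes toward an even split between the u and p states.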

The model also contains extensive annotations that tie the monomers to database identifiers and annotate the semantics of each rule: its subject, its object, and the INDRA Statement it was assembled from.

In [12]: for annotation in pa.model.annotations:
   ....:     print(annotation)
Annotation(DUSP6, u'', u'is')
Annotation(DUSP6, u'', u'is')
Annotation(DUSP6, u'', u'is')
Annotation(MAP2K1, u'', u'is')
Annotation(MAP2K1, u'', u'is')
Annotation(MAP2K1, u'', u'is')
Annotation(MAPK1, u'', u'is')
Annotation(MAPK1, u'', u'is')
Annotation(MAPK1, u'', u'is')
Annotation(MAP2K1_phosphorylation_bind_MAPK1_phospho, u'd46753c6-5a32-4974-b7b3-b782dfd1403d', u'from_indra_statement')
Annotation(MAP2K1_phosphorylation_MAPK1_phospho, u'MAP2K1', u'rule_has_subject')
Annotation(MAP2K1_phosphorylation_MAPK1_phospho, u'MAPK1', u'rule_has_object')
Annotation(MAP2K1_phosphorylation_MAPK1_phospho, u'd46753c6-5a32-4974-b7b3-b782dfd1403d', u'from_indra_statement')
Annotation(MAP2K1_dissoc_MAPK1, u'd46753c6-5a32-4974-b7b3-b782dfd1403d', u'from_indra_statement')
Annotation(DUSP6_dephosphorylation_bind_MAPK1_phospho, u'5ef281f9-012f-477d-ac65-c8543cc3a6f5', u'from_indra_statement')
Annotation(DUSP6_dephosphorylation_MAPK1_phospho, u'DUSP6', u'rule_has_subject')
Annotation(DUSP6_dephosphorylation_MAPK1_phospho, u'MAPK1', u'rule_has_object')
Annotation(DUSP6_dephosphorylation_MAPK1_phospho, u'5ef281f9-012f-477d-ac65-c8543cc3a6f5', u'from_indra_statement')
Annotation(DUSP6_dissoc_MAPK1, u'5ef281f9-012f-477d-ac65-c8543cc3a6f5', u'from_indra_statement')
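Because every rule carries a from_indra_statement annotation holding the UUID of its source Statement, each rule can be traced back to the Statement, and hence the evidence sentence, it came from. The sketch below groups rule names by Statement UUID. The tuples are copied from the listing above in (subject, object, predicate) order, and the helper name rules_by_statement is hypothetical; in a live session you would build the tuples from pa.model.annotations instead.

```python
from collections import defaultdict

# (subject, object, predicate) triples copied from the annotation listing above.
ANNOTATIONS = [
    ("MAP2K1_phosphorylation_bind_MAPK1_phospho", "d46753c6-5a32-4974-b7b3-b782dfd1403d", "from_indra_statement"),
    ("MAP2K1_phosphorylation_MAPK1_phospho", "MAP2K1", "rule_has_subject"),
    ("MAP2K1_phosphorylation_MAPK1_phospho", "MAPK1", "rule_has_object"),
    ("MAP2K1_phosphorylation_MAPK1_phospho", "d46753c6-5a32-4974-b7b3-b782dfd1403d", "from_indra_statement"),
    ("MAP2K1_dissoc_MAPK1", "d46753c6-5a32-4974-b7b3-b782dfd1403d", "from_indra_statement"),
    ("DUSP6_dephosphorylation_bind_MAPK1_phospho", "5ef281f9-012f-477d-ac65-c8543cc3a6f5", "from_indra_statement"),
    ("DUSP6_dephosphorylation_MAPK1_phospho", "DUSP6", "rule_has_subject"),
    ("DUSP6_dephosphorylation_MAPK1_phospho", "MAPK1", "rule_has_object"),
    ("DUSP6_dephosphorylation_MAPK1_phospho", "5ef281f9-012f-477d-ac65-c8543cc3a6f5", "from_indra_statement"),
    ("DUSP6_dissoc_MAPK1", "5ef281f9-012f-477d-ac65-c8543cc3a6f5", "from_indra_statement"),
]


def rules_by_statement(annotations):
    """Return {statement UUID: [rule names]} from 'from_indra_statement' triples."""
    groups = defaultdict(list)
    for subject, obj, predicate in annotations:
        if predicate == "from_indra_statement":
            groups[obj].append(subject)
    return dict(groups)


if __name__ == "__main__":
    for uuid, rules in rules_by_statement(ANNOTATIONS).items():
        print(uuid, rules)
```

Each of the two Statements maps to three rules (bind, catalyze, dissociate), which is exactly what the two_step policy produces per enzymatic Statement.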

Set the model to a particular cell line context

We can use INDRA’s contextualization module, which is built into the PysbAssembler, to set the amounts of proteins in the model to the total amounts measured (or estimated) in a given cancer cell line. In this example, we use the A375 melanoma cell line.
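Conceptually, contextualization replaces each total-amount parameter (the parameters ending in _0 listed earlier) with a cell-line-specific value. The sketch below illustrates that idea under stated assumptions: apply_context is a hypothetical helper, not INDRA's set_context, and it assumes the expression data arrive as a nested {gene: {cell_line: amount}} mapping. Genes without data, or an empty response, leave the nominal amounts untouched.

```python
def apply_context(initials, expression, cell_line):
    """Return a new {parameter_name: value} dict with cell-line amounts applied.

    initials:   {'<GENE>_0': nominal amount}, as in the parameter listing.
    expression: assumed shape {gene: {cell_line: amount}}; may be empty.
    """
    updated = dict(initials)  # copy so the nominal values stay available
    if not expression:
        return updated        # no data at all: keep every nominal amount
    for param_name in initials:
        # Strip the '_0' suffix to recover the gene name (naming assumption).
        gene = param_name[:-2] if param_name.endswith("_0") else param_name
        amount = expression.get(gene, {}).get(cell_line)
        if amount is not None:
            updated[param_name] = float(amount)
    return updated
```

Returning a new dictionary rather than mutating the input keeps the nominal amounts around for comparison with the contextualized ones.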

In [13]: pa.set_context('A375_SKIN')

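The set_context call requests protein expression data from a remote web service, so it requires network access. When the request succeeds, the total protein amounts in the PySB model are set consistent with the A375 cell line. In the run recorded here the service could not be reached, so the initial amounts below remain at their nominal value of 10000: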

In [14]: for monomer_pattern, parameter in pa.model.initial_conditions:
   ....:     print('%s = %d' % (monomer_pattern, parameter.value))
DUSP6(mapk1=None) = 10000
MAP2K1(mapk1=None) = 10000
MAPK1(phospho=u'u', map2k1=None, dusp6=None) = 10000

Exporting the model into other common formats

From the assembled PySB model it is possible to export the model into other common formats such as SBML, BNGL and Kappa. One can also generate a MATLAB or Mathematica script with ODEs corresponding to the model.


One can also pass a file name argument to the export_model method to save the exported model directly to a file:

pa.export_model('sbml', 'example_model.sbml')
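Formats such as BNGL are plain text. As a rough illustration of what such an export contains, the sketch below renders the parameter listing from earlier as a BNGL parameters block. It is hypothetical and covers only one section: the real exporter also writes molecule types, seed species and reaction rules, and its whitespace and number formatting may differ.

```python
# Parameter names and values copied from the parameter listing above.
PARAMETERS = [
    ("kf_mm_bind_1", 1e-06),
    ("kr_mm_bind_1", 0.1),
    ("kc_mm_phosphorylation_1", 100.0),
    ("kf_dm_bind_1", 1e-06),
    ("kr_dm_bind_1", 0.1),
    ("kc_dm_dephosphorylation_1", 100.0),
    ("DUSP6_0", 10000.0),
    ("MAP2K1_0", 10000.0),
    ("MAPK1_0", 10000.0),
]


def bngl_parameters_block(params):
    """Return a BNGL 'begin parameters ... end parameters' section as a string."""
    lines = ["begin parameters"]
    for name, value in params:
        # repr() gives the shortest round-tripping float text, e.g. 1e-06.
        lines.append(f"    {name} {value!r}")
    lines.append("end parameters")
    return "\n".join(lines)


if __name__ == "__main__":
    print(bngl_parameters_block(PARAMETERS))
```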