INDRA Database REST Client (indra.sources.indra_db_rest)

The INDRA database client allows querying a web service that serves content from a database of INDRA Statements collected and pre-assembled from various sources.

Access to the web service requires a URL (INDRA_DB_REST_URL) and an API key (INDRA_DB_REST_API_KEY), both of which may be set in your INDRA config file or as environment variables. If you do not have these but would like to access the database REST API, please contact the INDRA developers.
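The snippet below is a minimal sketch of where these two settings can come from. It reads them from environment variables only. The helper name read_db_rest_settings is hypothetical, and the sketch does not consult the INDRA config file, which the client also supports.

```python
import os
from typing import Optional, Tuple


def read_db_rest_settings() -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (url, api_key) from the environment.

    Either value is None when the corresponding variable is unset. The real
    client also looks in the INDRA config file, which this sketch skips.
    """
    url = os.environ.get("INDRA_DB_REST_URL")
    api_key = os.environ.get("INDRA_DB_REST_API_KEY")
    return url, api_key


url, api_key = read_db_rest_settings()
if url is None or api_key is None:
    print("INDRA_DB_REST_URL and/or INDRA_DB_REST_API_KEY are not set")
```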

API to the INDRA Database REST Service (indra.sources.indra_db_rest.api)

INDRA has been used to generate and maintain a database of causal relations as INDRA Statements. The contents of the INDRA Database can be accessed programmatically through this API.

The API includes three high-level query functions that cover many common use cases:

get_statements():

Get statements by agent information and Statement type, e.g. “Statements with object MEK and type Inhibition” (This query function has a generic name to maintain backward compatibility.)

get_statements_for_papers():

Get Statements based on the papers they are drawn from, for instance “Statements from the paper with PMID 12345”.

get_statements_by_hash():

Distinct INDRA Statements are associated with a unique numeric hash. This function can be used to query the database for the Statements, along with their provenance, that correspond to a given list of hashes.

Queries with more complex constraints can be made using the query language API in indra.sources.indra_db_rest.query along with this function:

get_statements_from_query():

This function works alongside the Query “language” to execute arbitrary requests for Statements based on statement metadata indexed in the Database.

There are also two functions relating to the submission and retrieval of curations. It is possible to enter feedback on the correctness of text-mined Statements, which we call “curations”. submit_curation() allows you to submit your curations, and get_curations() allows you to retrieve existing curations (an API key is required).

Limits, timeouts and threading

Some queries may return a large number of statements, requiring the client to assemble results from multiple successive requests to the REST API. The behavior of the client can be controlled by several parameters to the query functions.

For example, consider the query for Statements whose subject is TNF:

>>> from indra.sources.indra_db_rest.api import get_statements
>>> p = get_statements("TNF")
>>> stmts = p.statements

Because there are many Statements associated with TNF, the client will make multiple paged requests to get all the results. The maximum number of Statements returned can be limited using the limit argument:

>>> p = get_statements("TNF", limit=1000)
>>> stmts = p.statements

For longer requests the client can work in a background thread after a timeout is reached. This can be done by specifying a timeout (in seconds) using the timeout argument. While the client continues retrieval, the first page of the statement results is available in the statements_sample attribute:

>>> p = get_statements("TNF", timeout=5)
>>> some_stmts = p.statements_sample
>>> # ...Do some other work...
>>> # Wait for the requests to finish before getting the final result.
>>> p.wait_until_done()
>>> stmts = p.statements

Note that timeout only specifies how long the client blocks while waiting for the result; unless strict_stop is set, retrieval continues on a background thread until it is complete. If desired, you can supply a timeout of 0 to get the processor back immediately, leaving the entire query to run in the background.
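The blocking behavior described above (wait at most timeout seconds, then keep working in the background) can be illustrated with the standard threading module. This is a toy model under stated assumptions: BackgroundLoader is a hypothetical class, the "pages" are plain Python lists standing in for REST responses, and strict_stop is not modeled.

```python
import threading
import time
from typing import List, Optional, Sequence


class BackgroundLoader:
    """Hypothetical stand-in for a processor that keeps loading after a timeout."""

    def __init__(self, pages: Sequence[List[str]], delay: float = 0.01):
        self.statements_sample: List[str] = []
        self.statements: List[str] = []
        self._pages = pages
        self._delay = delay
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        collected: List[str] = []
        for i, page in enumerate(self._pages):
            time.sleep(self._delay)  # simulate one paged REST request
            collected.extend(page)
            if i == 0:
                self.statements_sample = list(page)  # first page available early
        self.statements = collected  # filled in only once every page has arrived

    def start(self, timeout: Optional[float] = None) -> "BackgroundLoader":
        self._thread.start()
        # Block for at most `timeout` seconds (indefinitely if None); the
        # thread keeps running afterwards if it has not finished.
        self._thread.join(timeout)
        return self

    def is_working(self) -> bool:
        return self._thread.is_alive()

    def wait_until_done(self) -> None:
        self._thread.join()


loader = BackgroundLoader([["s1", "s2"], ["s3"], ["s4"]]).start(timeout=0)
print(loader.is_working())  # usually True: the first fake request is still "in flight"
loader.wait_until_done()
print(loader.statements)  # ['s1', 's2', 's3', 's4']
```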

You can check if the process is still running using the is_working method:

>>> p = get_statements("TNF", timeout=0)
>>> p.is_working()
True

If you don’t want the client to make multiple paged requests and instead want to get only the results from the first request, you can set “persist” to False (the request job can still be put in the background with timeout=0).

>>> p = get_statements("TNF", persist=False)
>>> stmts = p.statements

For additional details on these and other parameters controlling statement retrieval see the function documentation.
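To make the interaction between limit and persist concrete, here is a minimal paging loop against a fake, in-memory page source. The function fetch_all and the fake_fetch page source are hypothetical; the real client's paging (offsets, page sizes, cycling through Statement types) may differ.

```python
from typing import Callable, List, Optional


def fetch_all(fetch_page: Callable[[int], List[str]], page_size: int,
              limit: Optional[int] = None, persist: bool = True) -> List[str]:
    """Hypothetical paging loop: request pages by offset until done."""
    results: List[str] = []
    offset = 0
    while True:
        page = fetch_page(offset)
        results.extend(page)
        if limit is not None and len(results) >= limit:
            return results[:limit]  # enough results: stop early
        if not persist or not page or len(page) < page_size:
            # persist=False keeps only the first response; an empty or
            # short page means there is nothing left to fetch.
            return results
        offset += len(page)


DATA = [f"stmt{i}" for i in range(25)]


def fake_fetch(offset: int) -> List[str]:
    """Fake server returning pages of at most 10 items."""
    return DATA[offset:offset + 10]


print(len(fetch_all(fake_fetch, page_size=10)))                 # 25
print(len(fetch_all(fake_fetch, page_size=10, limit=12)))       # 12
print(len(fetch_all(fake_fetch, page_size=10, persist=False)))  # 10
```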

Using the Query Language

There are several metadata and data values indexed in the INDRA Database, allowing for complex queries. Using the Query language, these attributes can be combined in arbitrary ways using logical operators. For example, you may want to find Statements in which MEK is inhibited, drawn from papers related to breast cancer (MeSH ID D001943), and supported by more than 10 pieces of evidence:

>>> from indra.sources.indra_db_rest.api import get_statements_from_query
>>> from indra.sources.indra_db_rest.query import HasAgent, HasType, \
...     FromMeshIds, HasEvidenceBound
>>> query = (HasAgent("MEK", namespace="FPLX") & HasType(["Inhibition"])
...          & FromMeshIds(["D001943"]) & HasEvidenceBound(["> 10"]))
>>> p = get_statements_from_query(query)
>>> stmts = p.statements

In addition to joining constraints with “&” (an intersection, an “and”) as shown above, you can also form unions (a.k.a. “or”s) using “|”:

>>> query = (
...     (
...         HasAgent("MEK", namespace="FPLX")
...         | HasAgent("MAP2K1", namespace="HGNC-SYMBOL")
...     )
...     & HasType(['Inhibition'])
... )
>>> p = get_statements_from_query(query, limit=10)

For more details and examples of the Query architecture, see the indra.sources.indra_db_rest.query documentation.
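One practical consequence of reusing Python's & and | operators is that their precedence is fixed by the language, and & binds more tightly than |. An unparenthesized a | b & c therefore means a | (b & c), which is why the union in the example above is wrapped in its own parentheses. The effect is easy to see with built-in sets, which use the same operators:

```python
a, b, c = {1, 2}, {2, 3}, {3, 4}

ungrouped = a | b & c    # parsed as a | (b & c)
grouped = (a | b) & c    # explicit grouping changes the result

print(sorted(ungrouped))  # [1, 2, 3]
print(sorted(grouped))    # [3]
```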

Evidence Filtering

Queries can constrain results based on properties of the original evidence, from its text references (such as the PMID) to the readers that produced it and whether it comes from reading or from a database; all of these can affect which evidence is included in the result. By default, such queries filter not only the Statements but also their associated evidence, so that, for example, if you query for Statements from a given paper, the evidence returned with those Statements comes only from that paper.

>>> from indra.sources.indra_db_rest.api import get_statements_for_papers
>>> p = get_statements_for_papers([('pmid', '20471474'),
...                                ('pmcid', 'PMC3640704')])
>>> all(ev.text_refs.get('PMID') == '20471474'
...     or ev.text_refs.get('PMCID') == 'PMC3640704'
...     for s in p.statements for ev in s.evidence)
True

You can deactivate this feature by setting filter_ev to False:

>>> p = get_statements_for_papers([('pmid', '20471474'),
...                                ('pmcid', 'PMC3640704')], filter_ev=False)
>>> all(ev.text_refs.get('PMID') == '20471474'
...     or ev.text_refs.get('PMCID') == 'PMC3640704'
...     for s in p.statements for ev in s.evidence)
False
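As a rough offline model of what filter_ev changes, the sketch below applies a paper filter to Statements represented as plain dictionaries. The dictionary layout and the function name filter_by_paper are assumptions made for illustration; the actual filtering happens on the server.

```python
from typing import Dict, Iterable, List, Tuple


def filter_by_paper(statements: List[Dict], paper_ids: Iterable[Tuple[str, str]],
                    filter_ev: bool = True) -> List[Dict]:
    """Hypothetical model of a paper filter with and without filter_ev.

    Each statement is assumed to be {"evidence": [{"text_refs": {...}}, ...]},
    with text_refs keys in upper case (e.g. "PMID"), as in the example above.
    """
    wanted = {(id_type.upper(), value) for id_type, value in paper_ids}

    def from_wanted_paper(ev: Dict) -> bool:
        return any(ref in wanted for ref in ev.get("text_refs", {}).items())

    results = []
    for stmt in statements:
        matching = [ev for ev in stmt["evidence"] if from_wanted_paper(ev)]
        if not matching:
            continue  # the statement is not drawn from any requested paper
        evidence = matching if filter_ev else stmt["evidence"]
        results.append({**stmt, "evidence": evidence})
    return results


stmts = [
    {"name": "A", "evidence": [{"text_refs": {"PMID": "20471474"}},
                               {"text_refs": {"PMID": "12345"}}]},
    {"name": "B", "evidence": [{"text_refs": {"PMID": "67890"}}]},
]
ids = [("pmid", "20471474"), ("pmcid", "PMC3640704")]
print(len(filter_by_paper(stmts, ids)[0]["evidence"]))                   # 1
print(len(filter_by_paper(stmts, ids, filter_ev=False)[0]["evidence"]))  # 2
```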

Curation Submission

Suppose you run a query and get some Statements with evidence, and while looking through the results you want to record a judgment about one piece of evidence, for example that it correctly supports the Statement, or that it does not really support it. Using the API, you can provide this feedback by submitting a curation.

>>> from indra.statements import pretty_print_stmts
>>> from indra.sources.indra_db_rest.api import get_statements, submit_curation
>>> p = get_statements(agents=["TNF"], ev_limit=3, limit=1)
>>> pretty_print_stmts(p.statements)
[LIST INDEX: 0] Activation(TNF(), apoptotic process())
================================================================================
EV INDEX: 0       These published reports in their aggregate support that TNFR2
SOURCE: reach     can lower the threshold of bioavailable TNFalpha needed to
PMID: 19774075    cause apoptosis through TNFR1 thus amplifying extrinsic cell
                  death pathways.
--------------------------------------------------------------------------------
EV INDEX: 1       Our results indicate that IE86 inhibits tumor necrosis factor
SOURCE: reach     (TNF)-alpha induced apoptosis and that the anti-apoptotic
PMID: 19502735    activity of this viral protein correlates with its expression
                  levels.
--------------------------------------------------------------------------------
EV INDEX: 2       This relationship between PUFAs and their anti-inflammatory
SOURCE: reach     metabolites and type 1 DM is supported by the observation that
PMID: 28824543    in a mfat-1 transgenic mouse model whose islets contained
                  increased levels of n-3 PUFAs and significantly lower amounts
                  of n-6 PUFAs compared to the wild type, were resistant to
                  apoptosis induced by TNF-alpha, IL-1beta, and gamma-IFN.
--------------------------------------------------------------------------------
>>> submit_curation(p.statements[0].get_hash(), "correct", "usr@bogusemail.com",
...                 pa_json=p.statements[0].to_json(),
...                 ev_json=p.statements[0].evidence[1].to_json())
{'ref': {'id': 11919}, 'result': 'success'}

indra.sources.indra_db_rest.api.get_statements(subject=None, object=None, agents=None, stmt_type=None, use_exact_type=False, limit=None, persist=True, timeout=None, strict_stop=False, ev_limit=10, sort_by='ev_count', tries=3, api_key=None)

Get Statements from the INDRA DB web API matching given agents and type.

You get a DBQueryStatementProcessor object, which allows Statements to be loaded in a background thread, providing a sample of the “best” content promptly in the statements_sample attribute and populating the statements attribute when the paged load is complete. What counts as “best” is determined by the sort_by parameter, which may be ‘belief’, ‘ev_count’, or None.

Parameters
  • subject/object (str) – Optionally specify the subject and/or object of the statements you wish to get from the database. By default, the namespace is assumed to be HGNC gene names, however you may specify another namespace by including “@<namespace>” at the end of the name string. For example, if you want to specify an agent by chebi, you could use “CHEBI:6801@CHEBI”, or if you wanted to use the HGNC id, you could use “6871@HGNC”.

  • agents (list[str]) – A list of agents, specified in the same manner as subject and object, but without specifying their grammatical position.

  • stmt_type (str) – Specify the type of interaction you are interested in, as indicated by the sub-classes of INDRA’s Statements. This argument is not case sensitive. If the statement class given has sub-classes (e.g. RegulateAmount has IncreaseAmount and DecreaseAmount), then by default both the class itself and its subclasses will be queried. If you do not want this behavior, set use_exact_type=True. Note that if limit is set, it is possible that only the exact statement type will be returned, as it is searched first. The processor then cycles through the types, getting a page of results for each type and adding it to the quota, until the maximum number of statements is reached.

  • use_exact_type (bool) – If stmt_type is given, and you only want to search for that specific statement type, set this to True. Default is False.

  • limit (Optional[int]) – Select the maximum number of statements to return. When set to less than 500, the effect is much the same as setting persist to False, and will guarantee a faster response. Default is None.

  • persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).

  • timeout (positive int or None) – If an int, block until the work is done and statements are retrieved, or until the timeout has expired, in which case the results so far will be returned in the response object and further results will be added in a separate thread as they become available. If None, block indefinitely until all statements are retrieved. Default is None.

  • strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.

  • ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.

  • sort_by (Optional[str]) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.

  • tries (Optional[int]) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.

  • api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
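Two of the parameters above involve small, well-defined transformations that can be sketched offline. The snippet below is illustrative only: expand_stmt_type walks a toy class hierarchy (not INDRA's Statement classes) to show what querying a type together with its subclasses means, and parse_agent_spec splits the "name@namespace" form accepted by subject, object, and agents. Both function names are hypothetical.

```python
from typing import List, Optional, Tuple


# Toy hierarchy mirroring the example in the stmt_type description.
# These are NOT INDRA's Statement classes.
class Statement:
    pass


class RegulateAmount(Statement):
    pass


class IncreaseAmount(RegulateAmount):
    pass


class DecreaseAmount(RegulateAmount):
    pass


def expand_stmt_type(cls: type, use_exact_type: bool = False) -> List[str]:
    """Hypothetical: the class name plus all (transitive) subclass names."""
    names = [cls.__name__]
    if not use_exact_type:
        for sub in cls.__subclasses__():
            names.extend(expand_stmt_type(sub))
    return names


def parse_agent_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Hypothetical: split "ID@NAMESPACE" into (ID, NAMESPACE).

    Returns None for the namespace when no "@" is present, leaving the
    default (HGNC gene names, per the description above) to the server.
    """
    agent_id, sep, namespace = spec.rpartition("@")
    if not sep:
        return spec, None
    return agent_id, namespace


print(expand_stmt_type(RegulateAmount))
print(expand_stmt_type(RegulateAmount, use_exact_type=True))  # ['RegulateAmount']
print(parse_agent_spec("CHEBI:6801@CHEBI"))  # ('CHEBI:6801', 'CHEBI')
print(parse_agent_spec("TNF"))               # ('TNF', None)
```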

Returns

processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.

Return type

DBQueryStatementProcessor

indra.sources.indra_db_rest.api.get_statements_for_papers(ids, limit=None, ev_limit=10, sort_by='ev_count', persist=True, timeout=None, strict_stop=False, tries=3, filter_ev=True, api_key=None)

Get Statements extracted from the papers with the given ref ids.

Parameters
  • ids (list[tuple[str, str]]) – A list of (id type, id) tuples, for example: [('pmid', '12345'), ('pmcid', 'PMC12345')]. The type can be any one of ‘pmid’, ‘pmcid’, ‘doi’, ‘pii’, ‘manuscript_id’, or ‘trid’, which is the primary key id of the text references in the database.

  • limit (Optional[int]) – Select the maximum number of statements to return. When set to less than 500, the effect is much the same as setting persist to False, and will guarantee a faster response. Default is None.

  • ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.

  • filter_ev (bool) – Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).

  • sort_by (Optional[str]) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.

  • persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).

  • timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.

  • strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.

  • tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.

  • api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
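Because an unsupported id type is an easy mistake to make, a client-side check against the types listed for ids can catch it before any request is sent. check_paper_ids is a hypothetical helper, and it assumes the types are written exactly (in lower case) as in the list above.

```python
from typing import Iterable, List, Tuple

# The id types listed in the description of the ids parameter.
VALID_ID_TYPES = {"pmid", "pmcid", "doi", "pii", "manuscript_id", "trid"}


def check_paper_ids(ids: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical pre-flight check; raises ValueError on a bad entry."""
    checked = []
    for entry in ids:
        if len(entry) != 2:
            raise ValueError(f"Expected an (id_type, id) pair, got {entry!r}")
        id_type, value = entry
        if id_type not in VALID_ID_TYPES:
            raise ValueError(f"Unsupported id type {id_type!r}")
        checked.append((id_type, str(value)))  # e.g. a trid may be an int
    return checked


print(check_paper_ids([("pmid", "12345"), ("pmcid", "PMC12345")]))
```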

Returns

processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.

Return type

DBQueryStatementProcessor

indra.sources.indra_db_rest.api.get_statements_by_hash(hash_list, limit=None, ev_limit=10, sort_by='ev_count', persist=True, timeout=None, strict_stop=False, tries=3, api_key=None)

Get Statements from a list of hashes.

Parameters
  • hash_list (list[int or str]) – A list of statement hashes.

  • limit (Optional[int]) – Select the maximum number of statements to return. When set to less than 500, the effect is much the same as setting persist to False, and will guarantee a faster response. Default is None.

  • ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.

  • sort_by (Optional[str]) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.

  • persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).

  • timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.

  • strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.

  • tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.

  • api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
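The tries parameter describes a simple retry loop. A minimal sketch follows, assuming that a timed-out request surfaces as Python's built-in TimeoutError; call_with_tries is a hypothetical name, and the real client may retry on other errors or wait differently between attempts.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_tries(request: Callable[[], T], tries: int = 3,
                    wait: float = 0.0) -> T:
    """Hypothetical: call `request` up to `tries` times on TimeoutError."""
    if tries < 1:
        raise ValueError("tries must be > 0")
    last_error: Optional[TimeoutError] = None
    for attempt in range(1, tries + 1):
        try:
            return request()
        except TimeoutError as err:  # assumption: only timeouts are retried
            last_error = err
            if attempt < tries:
                time.sleep(wait)  # give the server a moment (e.g. to cache)
    assert last_error is not None
    raise last_error


attempts = {"n": 0}


def flaky_request() -> str:
    """Fake request that times out twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


print(call_with_tries(flaky_request, tries=3))  # ok
```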

Returns

processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.

Return type

DBQueryStatementProcessor

indra.sources.indra_db_rest.api.get_statements_from_query(query, limit=None, ev_limit=10, sort_by='ev_count', persist=True, timeout=None, strict_stop=False, tries=3, filter_ev=True, api_key=None)

Get Statements using a Query.

Example

>>> from indra.sources.indra_db_rest.query import HasAgent, FromMeshIds
>>> query = HasAgent("MEK", "FPLX") & FromMeshIds(["D001943"])
>>> p = get_statements_from_query(query, limit=100)
>>> stmts = p.statements

Parameters
  • query (Query) – The query to be evaluated in return for statements.

  • limit (Optional[int]) – Select the maximum number of statements to return. When set to less than 500, the effect is much the same as setting persist to False, and will guarantee a faster response. Default is None.

  • ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.

  • filter_ev (bool) – Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).

  • sort_by (Optional[str]) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.

  • persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).

  • timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.

  • strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.

  • tries (Optional[int]) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.

  • api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.

Returns

processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.

Return type

DBQueryStatementProcessor

indra.sources.indra_db_rest.api.submit_curation(hash_val, tag, curator_email, text=None, source='indra_rest_client', ev_hash=None, pa_json=None, ev_json=None, api_key=None, is_test=False)

Submit a curation for the given statement at the relevant level.

Parameters
  • hash_val (int) – The hash corresponding to the statement.

  • tag (str) – A very short phrase categorizing the error or type of curation, e.g. “grounding” for a grounding error, or “correct” if you are marking a statement as correct.

  • curator_email (str) – The email of the curator.

  • text (str) – A brief description of the problem.

  • source (str) – The name of the access point through which the curation was performed. The default is ‘indra_rest_client’, meaning this function was used directly. Any higher-level application should identify itself here.

  • ev_hash (int) – A hash of the sentence and other evidence information. Elsewhere referred to as source_hash.

  • pa_json (None or dict) – The JSON of a statement you wish to curate. If not given, it may be inferred (best effort) from the given hash.

  • ev_json (None or dict) – The JSON of an evidence you wish to curate. If not given, it cannot be inferred.

  • api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.

  • is_test (bool) – Used in testing. If True, no curation will actually be added to the database.

indra.sources.indra_db_rest.api.get_curations(hash_val=None, source_hash=None, api_key=None)

Get the curations for a specific statement and evidence.

If neither hash_val nor source_hash is given, all curations will be retrieved. This requires the user to have extra permissions, as determined by their API key.

Parameters
  • hash_val (Optional[int]) – The hash of a statement whose curations you want to retrieve.

  • source_hash (Optional[int]) – The hash generated for a piece of evidence for which you want curations. The hash_val must be provided to use the source_hash.

  • api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.

Returns

curations – A list of dictionaries containing the curation data.

Return type

list

Advanced Query Construction (indra.sources.indra_db_rest.query)

The Query architecture allows the construction of arbitrary queries for content from the INDRA Database.

Specifically, queries constructed using this language of classes are converted into optimized SQL by the INDRA Database REST API. Different classes represent different types of constraints and are named, as much as possible, to fit together when spoken aloud in English. For example:

>>> HasAgent("MEK") & HasAgent("ERK") & HasType(["Phosphorylation"])

will find any Statement that has an agent MEK and an agent ERK and has the type phosphorylation.

Query Classes (the building blocks)

Broadly, query classes can be broken into 3 types: queries on the meaning of a Statement, queries on the provenance of a Statement, and queries that combine groups of queries.

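Meaning of a Statement: for example, HasAgent and HasType.

Provenance of a Statement: for example, HasReadings, HasDatabases, HasSources, HasOnlySource, and FromMeshIds.

Combining Queries: And and Or.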

There is also a special class, EmptyQuery, which is useful when programmatically building a query.

Building Nontrivial Queries (how to put the blocks together)

In practice you should not use And or Or very often but instead make use of the overloaded & and | operators to put Queries together into more complex structures. In addition you can invert a query, i.e., essentially ask for Statements that do not meet certain criteria, e.g. “not has readings”. This can be accomplished with the overloaded ~ operator, e.g. ~HasReadings().

The query class works by representing and producing a particular JSON structure which is recognized by the INDRA Database REST service, where it is translated into a similar but more sophisticated Query language used by the Readonly Database client. The Query class implements the basic methods used to communicate with the REST Service in this way.
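To illustrate how operator overloading can build such a JSON structure, here is a toy miniature of the pattern. The Toy* classes are hypothetical and far simpler than the real Query classes, but their to_simple_json() output mimics the shape shown for q.to_simple_json() in Example 4: a "class" name, a "constraint" dictionary, and an "inverted" flag that ~ toggles.

```python
import copy
from typing import Dict, List, Optional


class ToyQuery:
    """Hypothetical base class; not INDRA's Query."""

    json_name = "Query"

    def __init__(self) -> None:
        self.inverted = False

    def constraint(self) -> Dict:
        raise NotImplementedError

    def to_simple_json(self) -> Dict:
        return {"class": self.json_name, "constraint": self.constraint(),
                "inverted": self.inverted}

    def __and__(self, other: "ToyQuery") -> "ToyQuery":
        return ToyAnd([self, other])

    def __or__(self, other: "ToyQuery") -> "ToyQuery":
        return ToyOr([self, other])

    def __invert__(self) -> "ToyQuery":
        flipped = copy.copy(self)  # shallow copy: the original stays unchanged
        flipped.inverted = not self.inverted
        return flipped


class ToyHasAgent(ToyQuery):
    json_name = "HasAgent"

    def __init__(self, agent_id: Optional[str] = None, namespace: str = "NAME",
                 role: Optional[str] = None, agent_num: Optional[int] = None):
        super().__init__()
        self.agent_id, self.namespace = agent_id, namespace
        self.role, self.agent_num = role, agent_num

    def constraint(self) -> Dict:
        return {"agent_id": self.agent_id, "namespace": self.namespace,
                "role": self.role, "agent_num": self.agent_num}


class _ToyCombined(ToyQuery):
    def __init__(self, queries: List[ToyQuery]):
        super().__init__()
        self.queries = queries

    def constraint(self) -> Dict:
        # Nested combinations are simply nested; no flattening is attempted.
        return {"queries": [q.to_simple_json() for q in self.queries]}


class ToyAnd(_ToyCombined):
    json_name = "And"


class ToyOr(_ToyCombined):
    json_name = "Or"


q = ToyHasAgent("MEK", namespace="FPLX") & ~ToyHasAgent("ERK", namespace="FPLX")
print(q.to_simple_json())
```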

Examples

First, a few examples of typical usage of a query object (see the get_statements_from_query documentation for more usage details):

Example 1: Get statements that have database evidence and have either MEK or MAP2K1 as a name for any of its agents.

>>> from indra.sources.indra_db_rest.api import get_statements_from_query
>>> from indra.sources.indra_db_rest.query import *
>>> q = (HasAgent('MEK') | HasAgent('MAP2K1')) & HasDatabases()
>>> p = get_statements_from_query(q)
>>> p.statements
[Activation(MEK(), ERK()),
 Phosphorylation(MEK(), ERK()),
 Activation(MAP2K1(), ERK()),
 Activation(RAF1(), MEK()),
 Phosphorylation(RAF1(), MEK()),
 Phosphorylation(MAP2K1(), ERK()),
 Activation(BRAF(), MEK()),
 Inhibition(2-(2-amino-3-methoxyphenyl)chromen-4-one(), MEK()),
 Activation(MAP2K1(), MAPK1()),
 Activation(MAP2K1(), MAPK3()),
 Phosphorylation(MAP2K1(), MAPK1()),
 Phosphorylation(BRAF(), MEK()),
 Activation(MEK(), MAPK1()),
 Complex(BRAF(), MAP2K1()),
 Phosphorylation(MAP2K1(), MAPK3()),
 Activation(MEK(), MAPK3()),
 Complex(MAP2K1(), RAF1()),
 Activation(RAF1(), MAP2K1()),
 Inhibition(trametinib(), MEK()),
 Phosphorylation(MEK(), MAPK3()),
 Complex(MAP2K1(), MAPK1()),
 Phosphorylation(MEK(), MAPK1()),
 Inhibition(selumetinib(), MEK()),
 Phosphorylation(PAK1(), MAP2K1(), S, 298)]

Example 2: Get statements that have an agent MEK and an agent ERK and more than 10 evidence.

>>> q = HasAgent('MEK') & HasAgent('ERK') & HasEvidenceBound(["> 10"])
>>> p = get_statements_from_query(q)
>>> p.statements
[Activation(MEK(), ERK()),
 Phosphorylation(MEK(), ERK()),
 Complex(ERK(), MEK()),
 Inhibition(MEK(), ERK()),
 Dephosphorylation(MEK(), ERK()),
 Complex(ERK(), MEK(), RAF()),
 Phosphorylation(MEK(), ERK(), T),
 Phosphorylation(MEK(), ERK(), Y),
 Activation(MEK(), ERK(mods: (phosphorylation))),
 IncreaseAmount(MEK(), ERK())]

Example 3: An example of using the ~ feature.

>>> q = HasAgent('MEK', namespace='FPLX') & ~HasAgent('ERK', namespace='FPLX')
>>> p = get_statements_from_query(q)
>>> p.statements[:10]
[Phosphorylation(None, MEK()),
 Phosphorylation(RAF(), MEK()),
 Activation(RAF(), MEK()),
 Activation(MEK(), MAPK()),
 Inhibition(U0126(), MEK()),
 Inhibition(MEK(), apoptotic process()),
 Activation(MEK(), cell population proliferation()),
 Activation(RAF1(), MEK()),
 Phosphorylation(MEK(), MAPK()),
 Phosphorylation(RAF1(), MEK())]

And now an example showing the different methods of the Query object:

Example 4: a tour demonstrating key utilities of a query object.

Consider the last query we wrote. You can examine the simple JSON sent to the server:

>>> q.to_simple_json()
{'class': 'And',
 'constraint': {'queries': [{'class': 'HasAgent',
    'constraint': {'agent_id': 'MEK',
     'namespace': 'FPLX',
     'role': None,
     'agent_num': None},
    'inverted': False},
   {'class': 'HasAgent',
    'constraint': {'agent_id': 'ERK',
     'namespace': 'FPLX',
     'role': None,
     'agent_num': None},
    'inverted': True}]},
 'inverted': False}

Or you can retrieve the more “true” JSON representation that is generated by the server from your simpler query:

>>> q.get_query_json()
{'class': 'Intersection',
 'constraint': {'query_list': [{'class': 'HasAgent',
    'constraint': {'_regularized_id': 'MEK',
     'agent_id': 'MEK',
     'agent_num': None,
     'namespace': 'FPLX',
     'role': None},
    'inverted': False},
   {'class': 'HasAgent',
    'constraint': {'_regularized_id': 'ERK',
     'agent_id': 'ERK',
     'agent_num': None,
     'namespace': 'FPLX',
     'role': None},
    'inverted': True}]},
 'inverted': False}

And last of all you can retrieve a human readable English description of the query from the server:

>>> query_english = q.get_query_english()
>>> print("I am finding statements that", query_english)
I am finding statements that do not have an agent where FPLX=ERK and have an
agent where FPLX=MEK

class indra.sources.indra_db_rest.query.Query

Bases: object

The parent of all query objects.

get(result_type, limit=None, sort_by=None, offset=None, timeout=None, n_tries=2, api_key=None, **other_params)

Get results from the API of the given type.

Parameters
  • result_type (str) – The options are ‘statements’, ‘interactions’, ‘relations’, ‘agents’, and ‘hashes’, indicating the type of result you want.

  • limit (Optional[int]) – The maximum number of statements you want to try to retrieve. The server will by default limit the results, and any value exceeding that limit will be “overruled”.

  • sort_by (Optional[str]) – The value can be ‘default’, ‘ev_count’, or ‘belief’.

  • offset (Optional[int]) – The offset of the query to begin at.

  • timeout (Optional[int]) – The number of seconds to wait for the request to return before giving up. This timeout is applied to each try separately.

  • n_tries (Optional[int]) – The number of times to retry the request before giving up. Each try will have timeout seconds to complete before it gives up.

  • api_key (str or None) – Override or use in place of the API key given in the INDRA config file.

  • filter_ev (bool) – (for result_type='statements') Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).

  • ev_limit (int) – (for result_type='statements') Limit the number of evidence returned per Statement.

  • with_hashes (bool) – (for result_type='relations' or result_type='agents') Choose whether the hashes for each Statement should be included along with each grouped heading.

  • complexes_covered (list[int]) – (for result_type='agents') A list (or set) of complexes that have already come up in the agent groups returned. This prevents duplication.

get_query_json()

Generate a compiled JSON rep of the query on the server.

get_query_english(timeout=None)

Get the string representation of the query.

copy()

Make a copy of the query.

to_simple_json()

Generate the JSON from the object rep.

Return type

dict

class indra.sources.indra_db_rest.query.And(queries)

Bases: indra.sources.indra_db_rest.query.Query

The intersection of two queries.

These are generally generated from the use of &, for example:

>>> q_and = HasAgent('MEK') & HasAgent('ERK')

class indra.sources.indra_db_rest.query.Or(queries)

Bases: indra.sources.indra_db_rest.query.Query

The union of two queries.

These are generally generated from the use of |, for example:

>>> q_or = HasOnlySource('reach') | HasOnlySource('medscan')

class indra.sources.indra_db_rest.query.HasAgent(agent_id=None, namespace='NAME', role=None, agent_num=None)

Bases: indra.sources.indra_db_rest.query.Query

Find Statements with the given agent in the given position.

NOTE: At this time 2 agent queries do NOT necessarily imply that the 2 agents are different. For example:

>>> HasAgent("MEK") & HasAgent("MEK")

will get any Statements that have an agent with name MEK, not Statements with two agents called MEK. This may change in the future; in the meantime, you can get around this fairly well by specifying the roles:

>>> HasAgent("MEK", role="SUBJECT") & HasAgent("MEK", role="OBJECT")

Or, for a more complicated case, consider a query for Statements where one agent is MEK and the other has namespace FPLX. Naturally, any agent labeled as MEK will also have namespace FPLX (MEK is a FamPlex identifier), and in general you will not want to constrain which role is MEK and which is the “other” agent. To accomplish this you need to use |:

>>> (
...   HasAgent("MEK", role="SUBJECT")
...   & HasAgent(namespace="FPLX", role="OBJECT")
... ) | (
...   HasAgent("MEK", role="OBJECT")
...   & HasAgent(namespace="FPLX", role="SUBJECT")
... )

Parameters
  • agent_id (Optional[str]) – The ID string naming the agent, for example ‘ERK’ (FPLX or NAME) or ‘plx’ (TEXT), and so on. If None, the query must then be constrained by the namespace.

  • namespace (Optional[str]) – By default, this is NAME, indicating that the agent’s canonical, grounded name will be used. Other options include, but are not limited to: AUTO (in which case GILDA will be used to guess the proper grounding of the entity), FPLX (FamPlex), CHEBI, CHEMBL, HGNC, UP (UniProt), and TEXT (for raw text mentions). If agent_id is None, namespace must be specified and must not be NAME, TEXT, or AUTO.

  • role (Optional[str]) – None by default. Options are “SUBJECT”, “OBJECT”, or “OTHER”.

  • agent_num (Optional[int]) – None by default. The regularized position of the agent in the Statement’s list of agents.
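To see why the role constraints matter, the matching rule described above can be modeled locally: an agent constraint is satisfied if any single agent in a Statement meets it, so two constraints may be satisfied by the same agent. This is a hypothetical, stdlib-only sketch; `has_agent`, `both`, `either`, and the toy records are illustrative stand-ins, not INDRA's implementation or data model, and the real filtering happens on the database server.

```python
from typing import Any, Callable, Dict, List, Optional

Agent = Dict[str, Any]
Pred = Callable[[List[Agent]], bool]

# Hypothetical toy statements: each agent has a role and a dict of
# groundings keyed by namespace. Not INDRA's real data model.
STMTS: Dict[str, List[Agent]] = {
    "mek_erk": [{"role": "SUBJECT", "ids": {"NAME": "MEK", "FPLX": "MEK"}},
                {"role": "OBJECT", "ids": {"NAME": "ERK", "FPLX": "ERK"}}],
    "raf_mek": [{"role": "SUBJECT", "ids": {"NAME": "RAF", "FPLX": "RAF"}},
                {"role": "OBJECT", "ids": {"NAME": "MEK", "FPLX": "MEK"}}],
    "mek_tp53": [{"role": "SUBJECT", "ids": {"NAME": "MEK", "FPLX": "MEK"}},
                 {"role": "OBJECT", "ids": {"NAME": "TP53", "HGNC": "11998"}}],
}


def has_agent(agent_id: Optional[str] = None, namespace: str = "NAME",
              role: Optional[str] = None) -> Pred:
    """True if ANY single agent satisfies all the given constraints."""
    def pred(agents: List[Agent]) -> bool:
        for ag in agents:
            if role is not None and ag["role"] != role:
                continue
            ids = ag["ids"]
            if agent_id is None and namespace in ids:
                return True
            if agent_id is not None and ids.get(namespace) == agent_id:
                return True
        return False
    return pred


def both(p: Pred, q: Pred) -> Pred:
    return lambda agents: p(agents) and q(agents)


def either(p: Pred, q: Pred) -> Pred:
    return lambda agents: p(agents) or q(agents)


def run(pred: Pred) -> List[str]:
    return sorted(key for key, agents in STMTS.items() if pred(agents))


# Without roles, the MEK agent alone also satisfies the FPLX constraint.
naive = both(has_agent("MEK"), has_agent(namespace="FPLX"))
# With roles, the two constraints must be met by agents in different roles.
with_roles = either(
    both(has_agent("MEK", role="SUBJECT"),
         has_agent(namespace="FPLX", role="OBJECT")),
    both(has_agent("MEK", role="OBJECT"),
         has_agent(namespace="FPLX", role="SUBJECT")),
)
print(run(naive))
print(run(with_roles))
```

The naive query also returns "mek_tp53", whose only FPLX agent is MEK itself; the role-based union drops it.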

class indra.sources.indra_db_rest.query.FromMeshIds(mesh_ids)

Bases: indra.sources.indra_db_rest.query.Query

Get Statements that came from papers annotated with the given MeSH IDs.

Parameters

mesh_ids (list) – A list of canonical MeSH IDs, of the “C” or “D” variety, e.g. “D000135”.

class indra.sources.indra_db_rest.query.HasHash(stmt_hashes)

Bases: indra.sources.indra_db_rest.query.Query

Find Statements whose hash is contained in the given list.

Parameters

stmt_hashes (list or set or tuple) – A collection of integers, where each integer is a shallow matches key hash of a Statement (frequently simply called “mk_hash” or “hash”).

class indra.sources.indra_db_rest.query.HasSources(sources)

Bases: indra.sources.indra_db_rest.query.Query

Find Statements with support from the given list of sources.

For example, find Statements that have support from both medscan and reach.

Parameters

sources (list or set or tuple) – A collection of strings, each string the canonical name for a source. The result will include statements that have evidence from ALL sources that you include.

class indra.sources.indra_db_rest.query.HasOnlySource(only_source)

Bases: indra.sources.indra_db_rest.query.Query

Find Statements that come exclusively from one source.

For example, find statements that come only from sparser.

Parameters

only_source (str) – The only source that spawned the statement, e.g. signor or reach.
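The difference between HasSources and HasOnlySource comes down to two set comparisons: a subset test versus an equality test. The functions below are a hypothetical local sketch of the documented semantics applied to made-up source sets; they are not INDRA code.

```python
from typing import Dict, Iterable, Set


def has_sources(stmt_sources: Set[str], required: Iterable[str]) -> bool:
    # HasSources: evidence from ALL listed sources (other sources allowed).
    return set(required) <= stmt_sources


def has_only_source(stmt_sources: Set[str], only_source: str) -> bool:
    # HasOnlySource: evidence from exactly one source, the one given.
    return stmt_sources == {only_source}


# Hypothetical mapping of statement labels to the sources supporting them.
support: Dict[str, Set[str]] = {
    "A": {"reach", "medscan"},
    "B": {"reach", "medscan", "sparser"},
    "C": {"sparser"},
}
print(sorted(k for k, s in support.items() if has_sources(s, ["medscan", "reach"])))
print(sorted(k for k, s in support.items() if has_only_source(s, "sparser")))
```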

class indra.sources.indra_db_rest.query.HasReadings

Bases: indra.sources.indra_db_rest.query.Query

Find Statements with support from readings.

class indra.sources.indra_db_rest.query.HasDatabases

Bases: indra.sources.indra_db_rest.query.Query

Find Statements with support from databases.

class indra.sources.indra_db_rest.query.HasType(stmt_types, include_subclasses=False)

Bases: indra.sources.indra_db_rest.query.Query

Get Statements with the given type.

For example, you can find Statements that are Phosphorylations or Activations, or you could find all subclasses of RegulateActivity.

Parameters
  • stmt_types (set or list or tuple) – A collection of strings, where each string is the class name of a Statement type. Names must match exactly, including spelling and capitalization.

  • include_subclasses (bool) – (optional) Default is False. If True, each Statement type given in the list will be expanded to include all of its subclasses.
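Expanding a type to its subclasses amounts to walking the class hierarchy. The sketch below assumes a tiny hypothetical stand-in hierarchy (not INDRA's real Statement classes, whose hierarchy is larger) and uses Python's `type.__subclasses__()` recursively to mimic include_subclasses and the exact, case-sensitive name matching described above. `expand_types` is an illustrative name only.

```python
from typing import Iterable, List, Set


# Hypothetical stand-ins for a few Statement classes. Simplified:
# intermediate parent classes (e.g. for Phosphorylation) are omitted.
class Statement: ...
class RegulateActivity(Statement): ...
class Activation(RegulateActivity): ...
class Inhibition(RegulateActivity): ...
class Phosphorylation(Statement): ...


def all_subclasses(cls: type) -> Set[type]:
    """Recursively collect every direct and indirect subclass of `cls`."""
    found: Set[type] = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found


def expand_types(stmt_types: Iterable[str],
                 include_subclasses: bool = False) -> List[str]:
    """Hypothetical helper mimicking HasType's include_subclasses option."""
    registry = {c.__name__: c for c in {Statement} | all_subclasses(Statement)}
    names: Set[str] = set()
    for name in stmt_types:
        cls = registry[name]  # exact, case-sensitive match; KeyError otherwise
        names.add(name)
        if include_subclasses:
            names |= {sub.__name__ for sub in all_subclasses(cls)}
    return sorted(names)


print(expand_types(["RegulateActivity"]))
print(expand_types(["RegulateActivity"], include_subclasses=True))
```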

class indra.sources.indra_db_rest.query.HasNumAgents(agent_nums)

Bases: indra.sources.indra_db_rest.query.Query

Get Statements with the given number of agents.

For example, HasNumAgents([1, 3, 4]) will return Statements with either 1, 3, or 4 agents (the latter two mostly being complexes).

Parameters

agent_nums (tuple) – A collection of integers, each indicating a number of agents.

class indra.sources.indra_db_rest.query.HasNumEvidence(evidence_nums)

Bases: indra.sources.indra_db_rest.query.Query

Get Statements with the given evidence count.

For example, HasNumEvidence([2,3,4]) will return Statements that have either 2, 3, or 4 pieces of evidence.

Parameters

evidence_nums (Tuple[Union[int, str]]) – A list of numbers greater than 0, each indicating a number of evidence.

class indra.sources.indra_db_rest.query.HasEvidenceBound(evidence_bounds)

Bases: indra.sources.indra_db_rest.query.Query

Get Statements with given bounds on their evidence count.

For example, HasEvidenceBound(["< 10", ">= 5"]) will return Statements with fewer than 10 and at least 5 pieces of evidence.

Parameters

evidence_bounds (Union[Iterable[str], str]) – An iterable (e.g. list) of strings such as “< 2” or “>= 4”. The argument of the inequality must be a natural number (0, 1, 2, …) and the inequality operation must be one of: <, >, <=, >=, ==, !=.
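The bound strings follow a simple "operator, then natural number" format, and all bounds must hold at once. The sketch below parses that format locally with the standard `operator` module; `parse_bounds` and `within_bounds` are hypothetical helpers for illustration, not how the server actually interprets the bounds.

```python
import operator
import re
from typing import Callable, Dict, Iterable, List, Union

# Map each allowed inequality operator to a Python comparison function.
OPS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}
# Two-character operators are tried first so "<=" is not read as "<".
BOUND_RE = re.compile(r"^\s*(<=|>=|==|!=|<|>)\s*(\d+)\s*$")


def parse_bounds(bounds: Union[Iterable[str], str]) -> List[Callable[[int], bool]]:
    """Turn strings like "< 10" into predicates on an evidence count."""
    if isinstance(bounds, str):
        bounds = [bounds]
    checks = []
    for bound in bounds:
        match = BOUND_RE.match(bound)
        if match is None:
            raise ValueError(f"Invalid evidence bound: {bound!r}")
        op, num = OPS[match.group(1)], int(match.group(2))
        # Default arguments bind the current op/num to this lambda.
        checks.append(lambda n, op=op, num=num: op(n, num))
    return checks


def within_bounds(ev_count: int, bounds: Union[Iterable[str], str]) -> bool:
    # Every bound must hold (they are combined with AND).
    return all(check(ev_count) for check in parse_bounds(bounds))


print([n for n in range(12) if within_bounds(n, ["< 10", ">= 5"])])
```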

class indra.sources.indra_db_rest.query.FromPapers(paper_list)

Bases: indra.sources.indra_db_rest.query.Query

Get Statements that came from a given list of papers.

Parameters

paper_list (list[(<id_type>, <paper_id>)]) – A list of tuples, where each tuple gives an ID type (e.g. ‘pmid’) and an ID value for a particular paper.

class indra.sources.indra_db_rest.query.EmptyQuery

Bases: indra.sources.indra_db_rest.query.Query

A query that is empty.

INDRA Database REST Processor (indra.sources.indra_db_rest.processor)

Retrieving the results of large queries from the INDRA Database REST API generally involves multiple individual calls. The Processor classes defined here manage the retrieval process for results of two types, Statements and Statement hashes. Instances of these Processors are returned by the query functions in indra.sources.indra_db_rest.api.

class indra.sources.indra_db_rest.processor.IndraDBQueryProcessor(query, limit=None, sort_by='ev_count', timeout=None, strict_stop=False, persist=True, tries=3, api_key=None)

Bases: object

The parent of all db query processors.

Parameters
  • query (Query) – The query to be evaluated in return for statements.

  • limit (int or None) – Select the maximum number of statements to return. When set below 500, the effect is much the same as setting persist to False, and the response will be faster because no follow-up queries are needed. Default is None.

  • sort_by (str or None) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.

  • persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).

  • timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.

  • strict_stop (bool) – If True, the query will only be given timeout to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False. NOTE: in practice, due to overhead, the precision of the timeout is only around +/-0.1 seconds.

  • tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.

  • api_key (str or None) – Override or use in place of the API key given in the INDRA config file.
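The interaction of limit, persist, and tries can be pictured with a local simulation. This is a hypothetical sketch, not the processor's actual code: `fetch_page` stands in for one request to the REST service and returns a page of results plus the offset of the next page (None when the results are complete). The real processor's paging, the 500-result threshold, and its timeout handling are not modeled and may differ.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
# One page of results plus the offset of the next page (None when complete).
Page = Tuple[List[int], Optional[int]]


def with_retries(call: Callable[[], T], tries: int) -> T:
    """Call `call`, retrying on TimeoutError up to `tries` attempts in total."""
    for attempt in range(1, tries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == tries:
                raise
    raise ValueError("tries must be > 0")


def fetch_all(fetch_page: Callable[[int], Page], limit: Optional[int] = None,
              persist: bool = True, tries: int = 3) -> List[int]:
    """Hypothetical paging loop mimicking the limit/persist/tries options."""
    results: List[int] = []
    offset: Optional[int] = 0
    while offset is not None:
        page, next_offset = with_retries(lambda: fetch_page(offset), tries)
        results.extend(page)
        if limit is not None and len(results) >= limit:
            return results[:limit]
        if not persist:
            break  # give up and pass along what was returned so far
        offset = next_offset
    return results


def make_fake_fetch(data: List[int], page_size: int,
                    failures: int = 0) -> Callable[[int], Page]:
    """Simulated service whose first `failures` calls time out."""
    state = {"failures_left": failures}

    def fetch(offset: int) -> Page:
        if state["failures_left"] > 0:
            state["failures_left"] -= 1
            raise TimeoutError("simulated slow response")
        end = offset + page_size
        return data[offset:end], (end if end < len(data) else None)

    return fetch


hashes = list(range(1, 26))  # 25 fake statement hashes
print(len(fetch_all(make_fake_fetch(hashes, 10, failures=1))))
print(fetch_all(make_fake_fetch(hashes, 10), persist=False))
print(len(fetch_all(make_fake_fetch(hashes, 10), limit=15)))
```

With persist=False only the first page comes back; with a single simulated timeout, the retry still recovers all 25 results.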

get_ev_counts()

Get a dictionary of evidence counts.

get_belief_scores()

Get a dictionary of belief scores.

get_source_counts()

Get the source counts as a dict per statement hash.

cancel()

Cancel the job, stopping the thread running in the background.

is_working()

Check if the thread is running.

timed_out()

Check if the processor timed out.

wait_until_done(timeout=None)

Wait for the background load to complete.

static print_quiet_logs()

Print the logs that were suppressed during the query.
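The documented behaviour of timeout, strict_stop, and the thread-control methods can be approximated with a standard pattern: the work runs on a background thread, a timeout is a time-limited join, and cancellation is a flag the worker checks between requests. The class below is a hypothetical, stdlib-only sketch of that pattern (`ToyBackgroundLoader` is not part of INDRA, and the real processor's internals may differ); each sleep stands in for one request to the service.

```python
import threading
import time
from typing import List, Optional


class ToyBackgroundLoader:
    """Hypothetical sketch: load 'pages' on a background thread."""

    def __init__(self, n_pages: int, page_delay: float,
                 timeout: Optional[float] = None, strict_stop: bool = False):
        self.results: List[int] = []
        self._n_pages = n_pages
        self._page_delay = page_delay
        self._cancel_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        if timeout is not None:
            # Wait at most `timeout` seconds for the thread to finish.
            self._thread.join(timeout)
            if strict_stop and self._thread.is_alive():
                # strict_stop=True: abandon the remaining work.
                self.cancel()
            # strict_stop=False: return now; the thread keeps loading.

    def _run(self) -> None:
        for page in range(self._n_pages):
            if self._cancel_event.is_set():
                return  # stop between requests once cancelled
            time.sleep(self._page_delay)  # stands in for one HTTP request
            self.results.append(page)

    def cancel(self) -> None:
        self._cancel_event.set()

    def is_working(self) -> bool:
        return self._thread.is_alive()

    def wait_until_done(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)


# Non-strict timeout: the constructor returns early, loading continues.
loader = ToyBackgroundLoader(n_pages=5, page_delay=0.05, timeout=0.01)
loader.wait_until_done()
print(loader.is_working(), loader.results)  # False [0, 1, 2, 3, 4]

# Strict timeout: the work is abandoned after roughly 0.05 s.
strict = ToyBackgroundLoader(n_pages=100, page_delay=0.01, timeout=0.05,
                             strict_stop=True)
strict.wait_until_done()
print(len(strict.results) < 100)
```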

class indra.sources.indra_db_rest.processor.DBQueryStatementProcessor(query, limit=None, sort_by='ev_count', ev_limit=10, filter_ev=True, timeout=None, strict_stop=False, persist=True, use_obtained_counts=False, tries=3, api_key=None)

Bases: indra.sources.indra_db_rest.processor.IndraDBQueryProcessor

A Processor to get Statements from the server.

For information on thread control and other methods, see the docs for IndraDBQueryProcessor.

Parameters
  • query (Query) – The query to be evaluated in return for statements.

  • limit (int or None) – Select the maximum number of statements to return. When set below 500, the effect is much the same as setting persist to False, and the response will be faster because no follow-up queries are needed. Default is None.

  • ev_limit (int or None) – Limit the amount of evidence returned per Statement. Default is 10.

  • filter_ev (bool) – Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).

  • sort_by (str or None) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.

  • persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).

  • timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.

  • strict_stop (bool) – If True, the query will only be given timeout to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.

  • tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.

  • api_key (str or None) – Override or use in place of the API key given in the INDRA config file.

get_ev_count_by_hash(stmt_hash)

Get the total evidence count for a statement hash.

get_ev_count(stmt)

Get the total evidence count for a statement.

get_belief_score_by_hash(stmt_hash)

Get the belief score for a statement hash.

get_belief_score_by_stmt(stmt)

Get the belief score for a statement.

get_hash_statements_dict()

Return a dict of Statements keyed by hashes.

get_source_count_by_hash(stmt_hash)

Get the source counts for a given statement hash.

get_source_count(stmt)

Get the source counts for a given statement.

merge_results(other_processor)

Merge the results of this processor with those of another.

class indra.sources.indra_db_rest.processor.DBQueryHashProcessor(*args, **kwargs)

Bases: indra.sources.indra_db_rest.processor.IndraDBQueryProcessor

A processor to get hashes from the server.

Parameters
  • query (Query) – The query to be evaluated in return for statements.

  • limit (int or None) – Select the maximum number of statements to return. When set below 500, the effect is much the same as setting persist to False, and the response will be faster because no follow-up queries are needed. Default is None.

  • sort_by (str or None) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.

  • persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).

  • timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.

  • tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.