INDRA Database REST Client (indra.sources.indra_db_rest)¶
The INDRA database client allows querying a web service that serves content from a database of INDRA Statements collected and pre-assembled from various sources.
Access to the web service requires a URL (INDRA_DB_REST_URL) and an API key (INDRA_DB_REST_API_KEY), both of which may be placed in your config file or set as environment variables. If you do not have these but would like to access the database REST API, please contact the INDRA developers.
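As a quick sanity check before running queries, the two settings can be looked up in the environment. The following is a minimal sketch: the helper name `missing_db_rest_settings` is hypothetical and not part of INDRA, and it inspects only environment variables, not the INDRA config file.

```python
import os

# The two setting names are taken from the text above.
REQUIRED_SETTINGS = ("INDRA_DB_REST_URL", "INDRA_DB_REST_API_KEY")


def missing_db_rest_settings(environ=None):
    """Return the required setting names that are unset or empty.

    Hypothetical helper for illustration only; not part of INDRA.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_SETTINGS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_db_rest_settings()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("Both settings are present in the environment.")
```

A setting reported as missing here may still be defined in the config file, so an empty result is sufficient but not necessary for the client to work.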
API to the INDRA Database REST Service (indra.sources.indra_db_rest.api)¶
INDRA has been used to generate and maintain a database of causal relations as INDRA Statements. The contents of the INDRA Database can be accessed programmatically through this API.
The API includes three high-level query functions that cover many common use cases:
get_statements(): Get Statements by agent information and Statement type, e.g. “Statements with object MEK and type Inhibition”. (This query function has a generic name to maintain backward compatibility.)
get_statements_for_papers(): Get Statements based on the papers they are drawn from, for instance “Statements from the paper with PMID 12345”.
get_statements_by_hash(): Each distinct INDRA Statement is associated with a unique numeric hash. This function can be used to query the database for Statements, and their provenance, by hash.
Queries with more complex constraints can be made using the query language API in indra.sources.indra_db_rest.query along with this function:
get_statements_from_query(): This function works alongside the Query “language” to execute arbitrary requests for Statements based on Statement metadata indexed in the Database.
There are also two functions relating to the submission and retrieval of curations. It is possible to enter feedback on the correctness of text-mined Statements, which we call “curations”. submit_curation() allows you to submit your curations, and get_curations() allows you to retrieve existing curations (an API key is required).
Limits, timeouts and threading¶
Some queries may return a large number of statements, requiring the client to assemble results from multiple successive requests to the REST API. The behavior of the client can be controlled by several parameters to the query functions.
For example, consider the query for Statements whose subject is TNF:
>> from indra.sources.indra_db_rest.api import get_statements
>> p = get_statements("TNF")
>> stmts = p.statements
Because there are many Statements associated with TNF, the client will make multiple paged requests to get all the results. The maximum number of Statements returned can be limited using the limit argument:
>> p = get_statements("TNF", limit=1000)
>> stmts = p.statements
For longer requests the client can work in a background thread after a timeout is reached. This can be done by specifying a timeout (in seconds) using the timeout argument. While the client continues retrieval, the first page of the statement results is available in the statements_sample attribute:
>> p = get_statements("TNF", timeout=5)
>> some_stmts = p.statements_sample
>>
>> # ...Do some other work...
>>
>> # Wait for the requests to finish before getting the final result.
>> p.wait_until_done()
>> stmts = p.statements
Note that the timeout specifies how long the client should block for the result, but that the result will continue to be retrieved until it is completed on a background thread. If desired one can supply a timeout of 0 and get the processor immediately, leaving the entire query to happen in the background.
You can check if the process is still running using the is_working method:
>> p = get_statements("TNF", timeout=0)
>> p.is_working()
True
If you don’t want the client to make multiple paged requests and instead want to get only the results from the first request, you can set “persist” to False (the request job can still be put in the background with timeout=0).
>> p = get_statements("TNF", persist=False)
>> stmts = p.statements
For additional details on these and other parameters controlling statement retrieval see the function documentation.
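The interplay of timeout, background retrieval, and the first-page sample can be illustrated without contacting the service. The sketch below is a toy stand-in built on the standard threading module; the class `ToyPagedLoader` and its `page_delay` argument are hypothetical, and this is not how the INDRA client is actually implemented.

```python
import threading
import time


class ToyPagedLoader:
    """Toy stand-in for a paged query processor (not INDRA's class)."""

    def __init__(self, pages, timeout=None, page_delay=0.0):
        self.statements_sample = None  # Filled in after the first page.
        self.statements = []           # Complete once the thread finishes.
        self._pages = pages
        self._page_delay = page_delay
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        # Block for at most `timeout` seconds; None blocks until done,
        # 0 returns immediately and leaves all work in the background.
        self._thread.join(timeout)

    def _run(self):
        for index, page in enumerate(self._pages):
            time.sleep(self._page_delay)  # Simulate one request.
            if index == 0:
                self.statements_sample = list(page)
            self.statements.extend(page)

    def is_working(self):
        return self._thread.is_alive()

    def wait_until_done(self, timeout=None):
        self._thread.join(timeout)


loader = ToyPagedLoader([["s1", "s2"], ["s3"]], timeout=0, page_delay=0.05)
still_running = loader.is_working()  # Retrieval continues in the background.
loader.wait_until_done()
all_results = loader.statements
```

The key design point is that the timeout bounds only how long the caller blocks, not how long retrieval runs.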
Using the Query Language¶
There are several metadata and data values indexed in the INDRA Database, allowing for complex queries. Using the Query language, these attributes can be combined in arbitrary ways using logical operators. For example, you may want to find Statements in which MEK is inhibited, that were found in papers related to breast cancer, and that have more than 10 pieces of evidence:
>> from indra.sources.indra_db_rest.api import get_statements_from_query
>> from indra.sources.indra_db_rest.query import HasAgent, HasType, \
>>     FromMeshIds, HasEvidenceBound
>>
>> query = (HasAgent("MEK", namespace="FPLX") & HasType(["Inhibition"])
>>          & FromMeshIds(["D001943"]) & HasEvidenceBound(["> 10"]))
>>
>> p = get_statements_from_query(query)
>> stmts = p.statements
In addition to joining constraints with “&” (an intersection, an “and”) as shown above, you can also form unions (a.k.a. “or”s) using “|”:
>> query = (
>>     (
>>         HasAgent("MEK", namespace="FPLX")
>>         | HasAgent("MAP2K1", namespace="HGNC-SYMBOL")
>>     )
>>     & HasType(['Inhibition'])
>> )
>>
>> p = get_statements_from_query(query, limit=10)
For more details and examples of the Query architecture, see the documentation of the query module (indra.sources.indra_db_rest.query).
Evidence Filtering¶
Queries can constrain results based on properties of the original evidence: the text references (such as the PMID), the readers that produced it, and whether the evidence comes from a reading system or a database can all affect which evidences are included in the result. By default, such queries filter not only the Statements but also their associated evidence. For example, if you query for Statements from a given paper, the evidences returned with those Statements come only from that paper.
>> from indra.sources.indra_db_rest.api import get_statements_for_papers
>> p = get_statements_for_papers([('pmid', '20471474'),
>>                                ('pmcid', 'PMC3640704')])
>> all(ev.text_refs.get('PMID') == '20471474'
>>     or ev.text_refs.get('PMCID') == 'PMC3640704'
>>     for s in p.statements for ev in s.evidence)
True
You can deactivate this feature by setting filter_ev to False:
>> p = get_statements_for_papers([('pmid', '20471474'),
>>                                ('pmcid', 'PMC3640704')], filter_ev=False)
>> all(ev.text_refs.get('PMID') == '20471474'
>>     or ev.text_refs.get('PMCID') == 'PMC3640704'
>>     for s in p.statements for ev in s.evidence)
False
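The effect of evidence filtering can be mimicked locally on plain data. The sketch below uses dictionaries as stand-ins for evidence objects; the function `filter_evidence` is hypothetical and only illustrates the idea of keeping evidence whose text references match the queried papers, not the filtering actually performed by the service.

```python
def filter_evidence(evidence_list, allowed_refs):
    """Keep evidence whose text refs match any (id_type, id_value) pair.

    Hypothetical helper; evidence is modeled as a dict with a
    "text_refs" mapping whose keys are upper-case id types.
    """
    kept = []
    for ev in evidence_list:
        refs = ev.get("text_refs", {})
        if any(refs.get(id_type.upper()) == id_value
               for id_type, id_value in allowed_refs):
            kept.append(ev)
    return kept


evidence = [
    {"text_refs": {"PMID": "20471474"}},
    {"text_refs": {"PMID": "11111111"}},
    {"text_refs": {"PMCID": "PMC3640704"}},
    {},  # e.g. evidence without any text references
]
papers = [("pmid", "20471474"), ("pmcid", "PMC3640704")]
matching = filter_evidence(evidence, papers)
```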
Curation Submission¶
Suppose you run a query and get some Statements with some evidence; you look through the results and find an evidence that does not really support the Statement. Using the API it is possible to provide feedback by submitting a curation.
>> from indra.statements import pretty_print_stmts
>> from indra.sources.indra_db_rest.api import get_statements, submit_curation
>> p = get_statements(agents=["TNF"], ev_limit=3, limit=1)
>> pretty_print_stmts(p.statements)
[LIST INDEX: 0] Activation(TNF(), apoptotic process())
================================================================================
EV INDEX: 0       These published reports in their aggregate support that TNFR2
SOURCE: reach     can lower the threshold of bioavailable TNFalpha needed to
PMID: 19774075    cause apoptosis through TNFR1 thus amplifying extrinsic cell
                  death pathways.
--------------------------------------------------------------------------------
EV INDEX: 1       Our results indicate that IE86 inhibits tumor necrosis factor
SOURCE: reach     (TNF)-alpha induced apoptosis and that the anti-apoptotic
PMID: 19502735    activity of this viral protein correlates with its expression
                  levels.
--------------------------------------------------------------------------------
EV INDEX: 2       This relationship between PUFAs and their anti-inflammatory
SOURCE: reach     metabolites and type 1 DM is supported by the observation that
PMID: 28824543    in a mfat-1 transgenic mouse model whose islets contained
                  increased levels of n-3 PUFAs and significantly lower amounts
                  of n-6 PUFAs compared to the wild type, were resistant to
                  apoptosis induced by TNF-alpha, IL-1beta, and gamma-IFN.
--------------------------------------------------------------------------------
>>
>> submit_curation(p.statements[0].get_hash(), "correct", "usr@bogusemail.com",
>>                 pa_json=p.statements[0].to_json(),
>>                 ev_json=p.statements[0].evidence[1].to_json())
{'ref': {'id': 11919}, 'result': 'success'}
- indra.sources.indra_db_rest.api.get_statements(subject=None, object=None, agents=None, stmt_type=None, use_exact_type=False, limit=None, persist=True, timeout=None, strict_stop=False, ev_limit=10, sort_by='ev_count', tries=3, use_obtained_counts=False, api_key=None)[source]¶
Get Statements from the INDRA DB web API matching given agents and type.
You get a DBQueryStatementProcessor object, which allows Statements to be loaded in a background thread, providing a sample of the “best” content available promptly in the statements_sample attribute, and populates the statements attribute when the paged load is complete. The “best” is determined by the sort_by attribute, which may be either ‘belief’ or ‘ev_count’ or None.
- Parameters
subject/object (str) – Optionally specify the subject and/or object of the statements you wish to get from the database. By default, the namespace is assumed to be HGNC gene names, however you may specify another namespace by including “@<namespace>” at the end of the name string. For example, if you want to specify an agent by chebi, you could use “CHEBI:6801@CHEBI”, or if you wanted to use the HGNC id, you could use “6871@HGNC”.
agents (list[str]) – A list of agents, specified in the same manner as subject and object, but without specifying their grammatical position.
stmt_type (str) – Specify the types of interactions you are interested in, as indicated by the sub-classes of INDRA’s Statements. This argument is not case sensitive. If the statement class given has sub-classes (e.g. RegulateAmount has IncreaseAmount and DecreaseAmount), then both the class itself and its subclasses will be queried by default. If you do not want this behavior, set use_exact_type=True. Note that if limit is set, it is possible that only the exact statement type will be returned, as this is the first searched. The processor then cycles through the types, getting a page of results for each type and adding it to the quota, until the maximum number of statements is reached.
use_exact_type (bool) – If stmt_type is given, and you only want to search for that specific statement type, set this to True. Default is False.
limit (Optional[int]) – Select the maximum number of statements to return. When set less than 500 the effect is much the same as setting persist to false, and will guarantee a faster response. Default is None.
persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).
timeout (positive int or None) – If an int, block until the work is done and statements are retrieved, or until the timeout has expired, in which case the results so far will be returned in the response object, and further results will be added in a separate thread as they become available. If None, block indefinitely until all statements are retrieved. Default is None.
strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.
ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.
sort_by (Optional[str]) – Str options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.
tries (Optional[int]) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.
use_obtained_counts (Optional[bool]) – If True, evidence counts and source counts are reported based on the actual evidences returned for each statement in this query (as opposed to all existing evidences, even if not all were returned). Default: False
api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
- Returns
processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.
- Return type
DBQueryStatementProcessor
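The “@<namespace>” suffix convention used by the subject, object, and agents arguments can be illustrated with a small parser. This is only a sketch of the string convention described above: the function `split_agent_string` is hypothetical, and the client's own parsing may differ in detail.

```python
def split_agent_string(agent_str):
    """Split "<id>@<namespace>" into (id, namespace).

    Hypothetical helper. The namespace is None when no "@<namespace>"
    suffix is present, in which case the default namespace applies.
    """
    # Split on the last "@" so that IDs such as "CHEBI:6801" stay intact.
    agent_id, separator, namespace = agent_str.rpartition("@")
    if not separator:
        return agent_str, None
    return agent_id, namespace


examples = ["TNF", "CHEBI:6801@CHEBI", "6871@HGNC"]
parsed = [split_agent_string(s) for s in examples]
```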
- indra.sources.indra_db_rest.api.get_statements_for_papers(ids, limit=None, ev_limit=10, sort_by='ev_count', persist=True, timeout=None, strict_stop=False, tries=3, filter_ev=True, api_key=None)[source]¶
Get Statements extracted from the papers with the given ref ids.
- Parameters
ids (list[tuple[str, str]]) – A list of (type, id) tuples. For example:
[('pmid', '12345'), ('pmcid', 'PMC12345')]
The type can be any one of ‘pmid’, ‘pmcid’, ‘doi’, ‘pii’, ‘manuscript_id’, or ‘trid’, which is the primary key id of the text references in the database.
limit (Optional[int]) – Select the maximum number of statements to return. When set less than 500 the effect is much the same as setting persist to false, and will guarantee a faster response. Default is None.
ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.
filter_ev (bool) – Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).
sort_by (Optional[str]) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.
persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).
timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.
strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.
tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.
api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
- Returns
processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.
- Return type
DBQueryStatementProcessor
- indra.sources.indra_db_rest.api.get_statements_by_hash(hash_list, limit=None, ev_limit=10, sort_by='ev_count', persist=True, timeout=None, strict_stop=False, tries=3, api_key=None)[source]¶
Get Statements from a list of hashes.
- Parameters
limit (Optional[int]) – Select the maximum number of statements to return. When set less than 500 the effect is much the same as setting persist to false, and will guarantee a faster response. Default is None.
ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.
sort_by (Optional[str]) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.
persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).
timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.
strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.
tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.
api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
- Returns
processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.
- Return type
DBQueryStatementProcessor
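The tries parameter shared by these functions amounts to a retry loop around the request. The sketch below shows that general pattern with a simulated flaky call; `call_with_retries` and `flaky_request` are hypothetical names, and the client's actual retry logic may handle other error types and delays differently.

```python
def call_with_retries(func, tries=3):
    """Call func() up to `tries` times, re-raising the last TimeoutError.

    Hypothetical helper illustrating the retry pattern only.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        try:
            return func()
        except TimeoutError:
            if attempt == tries:
                raise  # Out of attempts: let the caller see the timeout.


attempts = {"count": 0}


def flaky_request():
    """Simulated request that times out twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retries(flaky_request, tries=3)
```

Retrying helps here because, as noted above, a repeated query can often be served from the cache.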
- indra.sources.indra_db_rest.api.get_statements_from_query(query, limit=None, ev_limit=10, sort_by='ev_count', persist=True, timeout=None, strict_stop=False, tries=3, filter_ev=True, use_obtained_counts=False, api_key=None)[source]¶
Get Statements using a Query.
Example
>> from indra.sources.indra_db_rest.query import HasAgent, FromMeshIds
>> query = HasAgent("MEK", "FPLX") & FromMeshIds(["D001943"])
>> p = get_statements_from_query(query, limit=100)
>> stmts = p.statements
- Parameters
query (Query) – The query to be evaluated in return for statements.
limit (Optional[int]) – Select the maximum number of statements to return. When set less than 500 the effect is much the same as setting persist to false, and will guarantee a faster response. Default is None.
ev_limit (Optional[int]) – Limit the amount of evidence returned per Statement. Default is 10.
filter_ev (bool) – Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).
sort_by (Optional[str]) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.
persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).
timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.
strict_stop (bool) – If True, the query will only be given timeout time to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.
use_obtained_counts (Optional[bool]) – If True, evidence counts and source counts are reported based on the actual evidences returned for each statement in this query (as opposed to all existing evidences, even if not all were returned). Default: False
tries (Optional[int]) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again after a timeout will often succeed fast enough to avoid a timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.
api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
- Returns
processor – An instance of the DBQueryStatementProcessor, which has an attribute statements which will be populated when the query/queries are done.
- Return type
DBQueryStatementProcessor
- indra.sources.indra_db_rest.api.submit_curation(hash_val, tag, curator_email, text=None, source='indra_rest_client', ev_hash=None, pa_json=None, ev_json=None, api_key=None, is_test=False)[source]¶
Submit a curation for the given statement at the relevant level.
- Parameters
hash_val (int) – The hash corresponding to the statement.
tag (str) – A very short phrase categorizing the error or type of curation, e.g. “grounding” for a grounding error, or “correct” if you are marking a statement as correct.
curator_email (str) – The email of the curator.
text (str) – A brief description of the problem.
source (str) – The name of the access point through which the curation was performed. The default is ‘indra_rest_client’, meaning this function was used directly. Any higher-level application should identify itself here.
ev_hash (int) – A hash of the sentence and other evidence information. Elsewhere referred to as source_hash.
pa_json (None or dict) – The JSON of a statement you wish to curate. If not given, it may be inferred (best effort) from the given hash.
ev_json (None or dict) – The JSON of an evidence you wish to curate. If not given, it cannot be inferred.
api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
is_test (bool) – Used in testing. If True, no curation will actually be added to the database.
- indra.sources.indra_db_rest.api.get_curations(hash_val=None, source_hash=None, api_key=None)[source]¶
Get the curations for a specific statement and evidence.
If neither hash_val nor source_hash is given, all curations will be retrieved. This will require the user to have extra permissions, as determined by their API key.
- Parameters
hash_val (Optional[int]) – The hash of a statement whose curations you want to retrieve.
source_hash (Optional[int]) – The hash generated for a piece of evidence for which you want curations. The hash_val must be provided to use the source_hash.
api_key (Optional[str]) – Override or use in place of the API key given in the INDRA config file.
- Returns
curations – A list of dictionaries containing the curation data.
- Return type
list
Advanced Query Construction (indra.sources.indra_db_rest.query)¶
The Query architecture allows the construction of arbitrary queries for content from the INDRA Database.
Specifically, queries constructed using this language of classes are converted into optimized SQL by the INDRA Database REST API. Different classes represent different types of constraints and are named, as much as possible, to fit together when spoken aloud in English. For example:
>> HasAgent("MEK") & HasAgent("ERK") & HasType(["Phosphorylation"])
will find any Statement that has an agent MEK and an agent ERK and has the type phosphorylation.
Query Classes (the building blocks)¶
Broadly, query classes can be broken into 3 types: queries on the meaning of a Statement, queries on the provenance of a Statement, and queries that combine groups of queries.
Meaning of a Statement:
Provenance of a Statement:
Combine Queries:
There is also the special class, the EmptyQuery, which is useful when programmatically building a query.
Building Nontrivial Queries (how to put the blocks together)¶
In practice you should not use And or Or very often but instead make use of the overloaded & and | operators to put Queries together into more complex structures. In addition you can invert a query, i.e., essentially ask for Statements that do not meet certain criteria, e.g. “not has readings”. This can be accomplished with the overloaded ~ operator, e.g. ~HasReadings().
The query class works by representing and producing a particular JSON structure which is recognized by the INDRA Database REST service, where it is translated into a similar but more sophisticated Query language used by the Readonly Database client. The Query class implements the basic methods used to communicate with the REST Service in this way.
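To make the mechanism concrete, the sketch below shows how overloaded &, | and ~ operators can assemble a nested JSON structure. The classes `ToyQuery`, `ToyLeaf`, and `ToyGroup` are hypothetical stand-ins, and the JSON layout is modeled on the simple JSON shown in Example 4 below; this is not the implementation of INDRA's Query class.

```python
class ToyQuery:
    """Minimal stand-in for a query object (not INDRA's Query class)."""

    def __init__(self, inverted=False):
        self.inverted = inverted

    def __and__(self, other):
        return ToyGroup("And", [self, other])

    def __or__(self, other):
        return ToyGroup("Or", [self, other])

    def __invert__(self):
        # Return a flipped copy so the original query is left unchanged.
        clone = self.copy()
        clone.inverted = not self.inverted
        return clone


class ToyLeaf(ToyQuery):
    """A single constraint, e.g. "has agent MEK"."""

    def __init__(self, name, constraint, inverted=False):
        super().__init__(inverted)
        self.name = name
        self.constraint = dict(constraint)

    def copy(self):
        return ToyLeaf(self.name, self.constraint, self.inverted)

    def to_simple_json(self):
        return {"class": self.name, "constraint": dict(self.constraint),
                "inverted": self.inverted}


class ToyGroup(ToyQuery):
    """A combination ("And" or "Or") of other queries."""

    def __init__(self, name, queries, inverted=False):
        super().__init__(inverted)
        self.name = name
        self.queries = list(queries)

    def copy(self):
        return ToyGroup(self.name, self.queries, self.inverted)

    def to_simple_json(self):
        children = [q.to_simple_json() for q in self.queries]
        return {"class": self.name, "constraint": {"queries": children},
                "inverted": self.inverted}


has_mek = ToyLeaf("HasAgent", {"agent_id": "MEK", "namespace": "FPLX"})
has_erk = ToyLeaf("HasAgent", {"agent_id": "ERK", "namespace": "FPLX"})
toy_query = has_mek & ~has_erk
toy_json = toy_query.to_simple_json()
```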
Examples
First, a couple of examples of the typical usage of a query object (see the get_statements_from_query documentation for more usage details):
Example 1: Get statements that have database evidence and have either MEK or MAP2K1 as a name for any of their agents.
>> from indra.sources.indra_db_rest.api import get_statements_from_query
>> from indra.sources.indra_db_rest.query import *
>> q = (HasAgent('MEK') | HasAgent('MAP2K1')) & HasDatabases()
>> p = get_statements_from_query(q)
>> p.statements
[Activation(MEK(), ERK()),
Phosphorylation(MEK(), ERK()),
Activation(MAP2K1(), ERK()),
Activation(RAF1(), MEK()),
Phosphorylation(RAF1(), MEK()),
Phosphorylation(MAP2K1(), ERK()),
Activation(BRAF(), MEK()),
Inhibition(2-(2-amino-3-methoxyphenyl)chromen-4-one(), MEK()),
Activation(MAP2K1(), MAPK1()),
Activation(MAP2K1(), MAPK3()),
Phosphorylation(MAP2K1(), MAPK1()),
Phosphorylation(BRAF(), MEK()),
Activation(MEK(), MAPK1()),
Complex(BRAF(), MAP2K1()),
Phosphorylation(MAP2K1(), MAPK3()),
Activation(MEK(), MAPK3()),
Complex(MAP2K1(), RAF1()),
Activation(RAF1(), MAP2K1()),
Inhibition(trametinib(), MEK()),
Phosphorylation(MEK(), MAPK3()),
Complex(MAP2K1(), MAPK1()),
Phosphorylation(MEK(), MAPK1()),
Inhibition(selumetinib(), MEK()),
Phosphorylation(PAK1(), MAP2K1(), S, 298)]
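Because Python's & operator binds more tightly than |, parentheses matter whenever the two are mixed in one query expression. Operator precedence is fixed by the language syntax, so the same grouping can be checked with plain sets; the set contents below are made-up labels standing in for Statements.

```python
# Made-up stand-ins for the result sets of three constraints.
with_mek = {"s1", "s2"}       # Statements with agent MEK
with_map2k1 = {"s3", "s4"}    # Statements with agent MAP2K1
with_db = {"s2", "s3"}        # Statements with database evidence

# Without parentheses, "&" is applied before "|".
unparenthesized = with_mek | with_map2k1 & with_db
grouped_right = with_mek | (with_map2k1 & with_db)
grouped_left = (with_mek | with_map2k1) & with_db

print(unparenthesized == grouped_right)  # Same grouping.
print(unparenthesized == grouped_left)   # Different grouping.
```

Only the explicitly left-grouped form restricts both agent constraints to Statements with database evidence.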
Example 2: Get statements that have an agent MEK and an agent ERK and more than 10 evidence.
>> q = HasAgent('MEK') & HasAgent('ERK') & HasEvidenceBound(["> 10"])
>> p = get_statements_from_query(q)
>> p.statements
[Activation(MEK(), ERK()),
Phosphorylation(MEK(), ERK()),
Complex(ERK(), MEK()),
Inhibition(MEK(), ERK()),
Dephosphorylation(MEK(), ERK()),
Complex(ERK(), MEK(), RAF()),
Phosphorylation(MEK(), ERK(), T),
Phosphorylation(MEK(), ERK(), Y),
Activation(MEK(), ERK(mods: (phosphorylation))),
IncreaseAmount(MEK(), ERK())]
Example 3: An example of using the ~ operator.
>> q = HasAgent('MEK', namespace='FPLX') & ~HasAgent('ERK', namespace='FPLX')
>> p = get_statements_from_query(q)
>> p.statements[:10]
[Phosphorylation(None, MEK()),
Phosphorylation(RAF(), MEK()),
Activation(RAF(), MEK()),
Activation(MEK(), MAPK()),
Inhibition(U0126(), MEK()),
Inhibition(MEK(), apoptotic process()),
Activation(MEK(), cell population proliferation()),
Activation(RAF1(), MEK()),
Phosphorylation(MEK(), MAPK()),
Phosphorylation(RAF1(), MEK())]
And now an example showing the different methods of the Query object:
Example 4: a tour demonstrating key utilities of a query object.
Consider the last query we wrote. You can examine the simple JSON sent to the server:
>> q.to_simple_json()
{'class': 'And',
 'constraint': {'queries': [{'class': 'HasAgent',
                             'constraint': {'agent_id': 'MEK',
                                            'namespace': 'FPLX',
                                            'role': None,
                                            'agent_num': None},
                             'inverted': False},
                            {'class': 'HasAgent',
                             'constraint': {'agent_id': 'ERK',
                                            'namespace': 'FPLX',
                                            'role': None,
                                            'agent_num': None},
                             'inverted': True}]},
 'inverted': False}
Or you can retrieve the more “true” JSON representation that is generated by the server from your simpler query:
>> q.get_query_json()
{'class': 'Intersection',
 'constraint': {'query_list': [{'class': 'HasAgent',
                                'constraint': {'_regularized_id': 'MEK',
                                               'agent_id': 'MEK',
                                               'agent_num': None,
                                               'namespace': 'FPLX',
                                               'role': None},
                                'inverted': False},
                               {'class': 'HasAgent',
                                'constraint': {'_regularized_id': 'ERK',
                                               'agent_id': 'ERK',
                                               'agent_num': None,
                                               'namespace': 'FPLX',
                                               'role': None},
                                'inverted': True}]},
 'inverted': False}
And last of all you can retrieve a human readable English description of the query from the server:
>> query_english = q.get_query_english()
>> print("I am finding statements that", query_english)
I am finding statements that do not have an agent where FPLX=ERK and have an
agent where FPLX=MEK
- class indra.sources.indra_db_rest.query.Query[source]¶
Bases: object
The parent of all query objects.
- get(result_type, limit=None, sort_by=None, offset=None, timeout=None, n_tries=2, api_key=None, **other_params)[source]¶
Get results from the API of the given type.
- Parameters
result_type (str) – The options are ‘statements’, ‘interactions’, ‘relations’, ‘agents’, and ‘hashes’, indicating the type of result you want.
limit (Optional[int]) – The maximum number of statements you want to try and retrieve. The server will by default limit the results, and any value exceeding that limit will be “overruled”.
sort_by (Optional[str]) – The value can be ‘default’, ‘ev_count’, or ‘belief’.
offset (Optional[int]) – The offset of the query to begin at.
timeout (Optional[int]) – The number of seconds to wait for the request to return before giving up. This timeout is applied to each try separately.
n_tries (Optional[int]) – The number of times to retry the request before giving up. Each try will have timeout seconds to complete before it gives up.
api_key (str or None) – Override or use in place of the API key given in the INDRA config file.
filter_ev (bool) – (for result_type='statements') Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).
ev_limit (int) – (for result_type='statements') Limit the number of evidence returned per Statement.
with_hashes (bool) – (for result_type='relations' or result_type='agents') Choose whether the hashes for each Statement be included along with each grouped heading.
complexes_covered (list[int]) – (for result_type='agents') A list (or set) of complexes that have already come up in the agent groups returned. This prevents duplication.
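The limit and offset parameters support retrieving results page by page. The sketch below shows one generic way to drive such a loop; `collect_pages` and the `fetch_page(offset, page_size)` callable are hypothetical, and a local list stands in for the server, so this is an illustration of offset-based paging rather than the client's own paging code.

```python
def collect_pages(fetch_page, page_size, limit=None):
    """Collect results from successive offsets until exhausted or limited.

    Hypothetical helper: fetch_page(offset, page_size) must return a
    list of at most page_size results starting at the given offset.
    """
    results = []
    offset = 0
    while limit is None or len(results) < limit:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # A short page means nothing is left to fetch.
        offset += len(page)
    return results if limit is None else results[:limit]


# A local list stands in for the results held by the server.
fake_results = list(range(7))


def fake_fetch(offset, page_size):
    return fake_results[offset:offset + page_size]


everything = collect_pages(fake_fetch, page_size=3)
first_four = collect_pages(fake_fetch, page_size=3, limit=4)
```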
- class indra.sources.indra_db_rest.query.And(queries)[source]¶
Bases: Query
The intersection of two queries.
These are generally generated from the use of &, for example:
>> q_and = HasAgent('MEK') & HasAgent('ERK')
- class indra.sources.indra_db_rest.query.Or(queries)[source]¶
Bases: Query
The union of two queries.
These are generally generated from the use of |, for example:
>> q_or = HasOnlySource('reach') | HasOnlySource('medscan')
- class indra.sources.indra_db_rest.query.HasAgent(agent_id=None, namespace='NAME', role=None, agent_num=None)[source]¶
Bases: Query
Find Statements with the given agent in the given position.
NOTE: At this time 2 agent queries do NOT necessarily imply that the 2 agents are different. For example:
>> HasAgent("MEK") & HasAgent("MEK")
will get any Statements that have an agent with name MEK, not Statements with two agents called MEK. This may change in the future; in the meantime you can get around this fairly well by specifying the roles:
>> HasAgent("MEK", role="SUBJECT") & HasAgent("MEK", role="OBJECT")
Or for a more complicated case, consider a query for Statements where one agent is MEK and the other has namespace FPLX. Naturally any agent labeled as MEK will also have a namespace FPLX (MEK is a FamPlex identifier), and in general you will not want to constrain which role is MEK and which is the “other” agent. To accomplish this you need to use |:
>> (
>>     HasAgent("MEK", role="SUBJECT")
>>     & HasAgent(namespace="FPLX", role="OBJECT")
>> ) | (
>>     HasAgent("MEK", role="OBJECT")
>>     & HasAgent(namespace="FPLX", role="SUBJECT")
>> )
- Parameters
agent_id (Optional[str]) – The ID string naming the agent, for example ‘ERK’ (FPLX or NAME) or ‘plx’ (TEXT), and so on. If None, the query must then be constrained by the namespace.
namespace (Optional[str]) – By default, this is NAME, indicating the agent’s canonical, grounded name will be used. Other options include, but are not limited to: AUTO (in which case GILDA will be used to guess the proper grounding of the entity), FPLX (FamPlex), CHEBI, CHEMBL, HGNC, UP (UniProt), and TEXT (for raw text mentions). If agent_id is None, namespace must be specified and must not be NAME, TEXT, or AUTO.
role (Optional[str]) – None by default. Options are “SUBJECT”, “OBJECT”, or “OTHER”.
agent_num (Optional[int]) – None by default. The regularized position of the agent in the Statement’s list of agents.
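The rule tying agent_id to namespace can be expressed as a small check. The function below is a hypothetical sketch of that documented rule only; it is not INDRA's own validation code, and the real class may raise a different error or check further conditions.

```python
def check_has_agent_args(agent_id=None, namespace="NAME"):
    """Raise ValueError if the arguments break the documented rule.

    Hypothetical helper: when agent_id is None, a namespace must be
    given and it must not be NAME, TEXT, or AUTO.
    """
    if agent_id is None and namespace in (None, "NAME", "TEXT", "AUTO"):
        raise ValueError(
            "When agent_id is None, namespace must be given and must not "
            "be NAME, TEXT, or AUTO."
        )
    return agent_id, namespace


named_agent = check_has_agent_args("MEK")
any_famplex_agent = check_has_agent_args(namespace="FPLX")
```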
- class indra.sources.indra_db_rest.query.FromMeshIds(mesh_ids)[source]¶
Bases: Query
Get Statements that came from papers annotated with the given MeSH IDs.
- Parameters
mesh_ids (list) – A list of canonical MeSH IDs, of the “C” or “D” variety, e.g. “D000135”.
- class indra.sources.indra_db_rest.query.HasHash(stmt_hashes)[source]¶
Bases:
Query
Find Statements whose hash is contained in the given list.
- class indra.sources.indra_db_rest.query.HasSources(sources)[source]¶
Bases:
Query
Find Statements with support from the given list of sources.
For example, find Statements that have support from both medscan and reach.
- class indra.sources.indra_db_rest.query.HasOnlySource(only_source)[source]¶
Bases:
Query
Find Statements that come exclusively from one source.
For example, find statements that come only from sparser.
- Parameters
only_source (str) – The only source that spawned the statement, e.g. signor, or reach.
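The difference between HasSources and HasOnlySource comes down to a subset test versus an equality test on the set of supporting sources. The following is a minimal sketch over made-up data; the support mapping and the functions has_sources and has_only_source are hypothetical and are not part of the INDRA API.

```python
# Toy data: each Statement is reduced to the set of sources that support it.
support = {
    "stmt_a": {"medscan", "reach"},
    "stmt_b": {"reach"},
    "stmt_c": {"sparser"},
    "stmt_d": {"medscan", "reach", "signor"},
}


def has_sources(sources):
    """Statements supported by every listed source (other sources allowed)."""
    wanted = set(sources)
    return sorted(key for key, found in support.items() if wanted <= found)


def has_only_source(only_source):
    """Statements whose support comes from exactly one source."""
    return sorted(key for key, found in support.items() if found == {only_source})


from_both = has_sources(["medscan", "reach"])
sparser_only = has_only_source("sparser")
reach_only = has_only_source("reach")
```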
- class indra.sources.indra_db_rest.query.HasReadings[source]¶
Bases:
Query
Find Statements with support from readings.
- class indra.sources.indra_db_rest.query.HasDatabases[source]¶
Bases:
Query
Find Statements with support from Databases.
- class indra.sources.indra_db_rest.query.HasType(stmt_types, include_subclasses=False)[source]¶
Bases:
Query
Get Statements with the given type.
For example, you can find Statements that are Phosphorylations or Activations, or you could find all subclasses of RegulateActivity.
- Parameters
stmt_types (set or list or tuple) – A collection of strings, where each string is the class name of a Statement type. Spelling and capitalization must match the class name exactly.
include_subclasses (bool) – (optional) Default is False. If True, each Statement type given in the list will be expanded to include all of its subclasses.
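As a rough illustration of what include_subclasses implies, the sketch below expands type names over a small stand-in class hierarchy. The classes here are placeholders defined locally (the real ones live in indra.statements), and all_subclasses and expand_types are hypothetical helpers, not the library's implementation.

```python
# Stand-in hierarchy for illustration only.
class Statement:
    pass


class RegulateActivity(Statement):
    pass


class Activation(RegulateActivity):
    pass


class Inhibition(RegulateActivity):
    pass


class Phosphorylation(Statement):
    pass


def all_subclasses(cls):
    """Recursively collect every subclass of `cls`."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found


def expand_types(stmt_types, include_subclasses=False):
    """Return the set of class names a type filter would cover."""
    by_name = {c.__name__: c for c in all_subclasses(Statement)}
    names = set()
    for name in stmt_types:
        cls = by_name[name]  # KeyError if spelling or capitalization is off
        names.add(name)
        if include_subclasses:
            names |= {sub.__name__ for sub in all_subclasses(cls)}
    return names


exact = expand_types(["RegulateActivity"])
expanded = expand_types(["RegulateActivity"], include_subclasses=True)
```

Because the lookup is by exact class name, a misspelled or lower-cased name such as "activation" fails in this sketch rather than silently matching nothing.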
- class indra.sources.indra_db_rest.query.HasNumAgents(agent_nums)[source]¶
Bases:
Query
Get Statements with the given number of agents.
For example, HasNumAgents([1,3,4]) will return Statements with either 1, 3, or 4 agents (the latter two mostly being complexes).
- Parameters
agent_nums (tuple) – A collection of integers, each indicating a number of agents.
- class indra.sources.indra_db_rest.query.HasNumEvidence(evidence_nums)[source]¶
Bases:
Query
Get Statements with the given number of evidence.
For example, HasNumEvidence([2,3,4]) will return Statements that have either 2, 3, or 4 evidence.
- class indra.sources.indra_db_rest.query.HasEvidenceBound(evidence_bounds)[source]¶
Bases:
Query
Get Statements with given bounds on their evidence count.
For example, HasEvidenceBound(["< 10", ">= 5"]) will return Statements with fewer than 10 and at least 5 evidence.
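To make the bound strings concrete, here is a small sketch that parses strings of the form used in the example and applies them to evidence counts. The set of comparison symbols in OPS and the helpers parse_bound and within_bounds are assumptions made for illustration; the real class may accept a different set of symbols or formats.

```python
import operator

# Comparison symbols assumed for this sketch.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def parse_bound(bound):
    """Split a string such as '>= 5' into a comparison function and an int."""
    symbol, value = bound.split()
    return OPS[symbol], int(value)


def within_bounds(ev_count, bounds):
    """True if the evidence count satisfies every bound."""
    return all(op(ev_count, value) for op, value in map(parse_bound, bounds))


counts = [3, 5, 9, 10, 42]
kept = [n for n in counts if within_bounds(n, ["< 10", ">= 5"])]
```

All bounds must hold at once, so the two strings in the example describe the half-open range from 5 up to, but not including, 10.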
- class indra.sources.indra_db_rest.query.FromPapers(paper_list)[source]¶
Bases:
Query
Get Statements that came from a given list of papers.
- Parameters
paper_list (list[(<id_type>, <paper_id>)]) – A list of tuples, where each tuple indicates an id-type (e.g. ‘pmid’) and an id value for a particular paper.
INDRA Database REST Processor (indra.sources.indra_db_rest.processor)¶
Retrieving the results of large queries from the INDRA Database REST API
generally involves multiple individual calls. The Processor classes
defined here manage the retrieval process for results of two types, Statements
and Statement hashes. Instances of these Processors are returned by the query
functions in indra.sources.indra_db_rest.api.
- class indra.sources.indra_db_rest.processor.IndraDBQueryProcessor(query, limit=None, sort_by='ev_count', timeout=None, strict_stop=False, persist=True, tries=3, api_key=None)[source]¶
Bases:
object
The parent of all db query processors.
- Parameters
query (Query) – The query to be evaluated in return for statements.
limit (int or None) – Select the maximum number of statements to return. When set to less than 500, the effect is much the same as setting persist to False, and will guarantee a faster response. Default is None.
sort_by (str or None) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.
persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).
timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.
strict_stop (bool) – If True, the query will only be given timeout to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False. NOTE: in practice, due to overhead, the precision of the timeout is only around +/-0.1 seconds.
tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again will often succeed fast enough to avoid a second timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.
api_key (str or None) – Override or use in place of the API key given in the INDRA config file.
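The interaction between limit and persist can be sketched as a paging loop. The example below is a simplified model with a fake in-memory server, not the processor's actual code: PAGE_SIZE is an assumed page size suggested by the note on limit, and fake_server and run_query are hypothetical names.

```python
PAGE_SIZE = 500  # assumed maximum number of results per call
TOTAL = 1200     # number of results the pretend server holds


def fake_server(offset, limit):
    """Pretend endpoint: returns at most PAGE_SIZE results starting at offset."""
    n = min(PAGE_SIZE, TOTAL - offset)
    if limit is not None:
        n = min(n, limit)
    return list(range(offset, offset + max(n, 0)))


def run_query(limit=None, persist=True):
    """Collect results page by page; return (results, number_of_calls)."""
    results = []
    calls = 0
    while True:
        remaining = None if limit is None else limit - len(results)
        if remaining is not None and remaining <= 0:
            break  # the requested limit has been reached
        page = fake_server(len(results), remaining)
        calls += 1
        results.extend(page)
        # A short page means the server has nothing more to send. With
        # persist=False we stop after the first call regardless.
        if len(page) < PAGE_SIZE or not persist:
            break
    return results, calls


everything, calls_all = run_query()
first_page, calls_once = run_query(persist=False)
small, calls_small = run_query(limit=100)
```

In this model a limit below the page size finishes in one call, which is why it behaves much like persist=False.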
- class indra.sources.indra_db_rest.processor.DBQueryStatementProcessor(query, limit=None, sort_by='ev_count', ev_limit=10, filter_ev=True, timeout=None, strict_stop=False, persist=True, use_obtained_counts=False, tries=3, api_key=None)[source]¶
Bases:
IndraDBQueryProcessor
A Processor to get Statements from the server.
For information on thread control and other methods, see the docs for IndraDBQueryProcessor.
- Parameters
query (Query) – The query to be evaluated in return for statements.
limit (int or None) – Select the maximum number of statements to return. When set to less than 500, the effect is much the same as setting persist to False, and will guarantee a faster response. Default is None.
ev_limit (int or None) – Limit the amount of evidence returned per Statement. Default is 10.
filter_ev (bool) – Indicate whether evidence should have the same filters applied as the statements themselves, where appropriate (e.g. in the case of a filter by paper).
sort_by (str or None) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.
persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).
timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.
strict_stop (bool) – If True, the query will only be given timeout to complete before being abandoned entirely. Otherwise the timeout will simply wait for the thread to join for timeout seconds before returning, allowing other work to continue while the query runs in the background. The default is False.
use_obtained_counts (Optional[bool]) – If True, evidence counts and source counts are reported based on the actual evidences returned for each statement in this query (as opposed to all existing evidences, even if not all were returned). Default: False
tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again will often succeed fast enough to avoid a second timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.
api_key (str or None) – Override or use in place of the API key given in the INDRA config file.
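The non-strict timeout behavior described for strict_stop=False (wait for the thread to join for timeout seconds, then return while the query continues in the background) can be sketched with the standard threading module. This is an illustration of the pattern only, not the processor's code; run_with_timeout and slow_query are hypothetical names.

```python
import threading
import time


def run_with_timeout(work, timeout=None):
    """Run `work` in a background thread and wait at most `timeout` seconds."""
    result = {}

    def target():
        result["value"] = work()

    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    thread.join(timeout)  # returns after `timeout` even if work is unfinished
    return result, thread


def slow_query():
    """Stand-in for a query that takes longer than the timeout."""
    time.sleep(0.3)
    return "done"


result, thread = run_with_timeout(slow_query, timeout=0.05)
finished_at_timeout = "value" in result  # the query is still running here

# Other work could happen here; later, wait for the background query to end.
thread.join()
finished_later = result.get("value")
```

A strict stop would instead abandon the query once the timeout elapses, so no result would ever be collected for it.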
- class indra.sources.indra_db_rest.processor.DBQueryHashProcessor(*args, **kwargs)[source]¶
Bases:
IndraDBQueryProcessor
A processor to get hashes from the server.
- Parameters
query (Query) – The query to be evaluated in return for statements.
limit (int or None) – Select the maximum number of statements to return. When set to less than 500, the effect is much the same as setting persist to False, and will guarantee a faster response. Default is None.
sort_by (str or None) – Options are currently ‘ev_count’ or ‘belief’. Results will be returned in order of the given parameter. If None, results will be returned in an arbitrary order.
persist (bool) – Default is True. When False, if a query comes back limited (not all results returned), just give up and pass along what was returned. Otherwise, make further queries to get the rest of the data (which may take some time).
timeout (positive int or None) – If an int, return after timeout seconds, even if query is not done. Default is None.
tries (int > 0) – Set the number of times to try the query. The database often caches results, so if a query times out the first time, trying again will often succeed fast enough to avoid a second timeout. This can also help gracefully handle an unreliable connection, if you’re willing to wait. Default is 3.
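The rationale for tries (a first attempt may time out, but a cached result makes the next attempt fast) can be shown with a small retry loop against a fake server. Everything here is hypothetical: FakeServer, query_with_retries, and the placeholder hash values are made up for illustration and do not reflect the processor's internals.

```python
class FakeServer:
    """Pretend service whose first call times out and warms a cache."""

    def __init__(self):
        self.cache_warm = False
        self.calls = 0

    def query(self):
        self.calls += 1
        if not self.cache_warm:
            self.cache_warm = True
            raise TimeoutError("first call timed out")
        return [12345, 67890]  # placeholder hash values


def query_with_retries(server, tries=3):
    """Try the query up to `tries` times, re-raising the last timeout."""
    if tries < 1:
        raise ValueError("tries must be a positive integer")
    last_error = None
    for _ in range(tries):
        try:
            return server.query()
        except TimeoutError as err:
            last_error = err
    raise last_error


server = FakeServer()
hashes = query_with_retries(server)
```

With tries=1 the same fake server would raise TimeoutError, since the only attempt is the one that warms the cache.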