"""
The Query architecture allows the construction of arbitrary queries for content
from the INDRA Database.
Specifically, queries constructed using this language of classes are converted
into optimized SQL by the INDRA Database REST API. Different classes represent
different types of constraints and are named, as much as possible, to fit
together when spoken aloud in English. For example:
>>>
>> HasAgent("MEK") & HasAgent("ERK") & HasType(["Phosphorylation"])
will find any Statement that has an agent MEK and an agent ERK and has the type
phosphorylation.
Query Classes (the building blocks)
-----------------------------------
Broadly, query classes can be broken into 3 types: queries on the meaning of a
Statement, queries on the provenance of a Statement, and queries that combine
groups of queries.
Meaning of a Statement:
- :py:class:`HasAgent`
- :py:class:`HasType`
- :py:class:`HasNumAgents`
Provenance of a Statement:
- :py:class:`HasReadings`
- :py:class:`HasDatabases`
- :py:class:`HasSources`
- :py:class:`HasOnlySource`
- :py:class:`FromPapers`
- :py:class:`FromMeshIds`
- :py:class:`HasNumEvidence`
- :py:class:`HasEvidenceBound`
Combine Queries:
- :py:class:`And`
- :py:class:`Or`
There is also a special class, :py:class:`EmptyQuery`, which is useful
when programmatically building a query.
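
As a rough illustration of why an empty query helps when building queries in a
loop, the sketch below uses simplified stand-ins. The ``Term`` and ``Empty``
classes and the ``build`` helper are hypothetical and are not the INDRA
classes; they only mimic the idea that the empty element returns the other
operand, so the first constraint needs no special case.

```python
from functools import reduce
import operator


class Term:
    # Hypothetical stand-in for a query; it only records an expression string.
    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # Combining with the empty element leaves this term unchanged.
        if isinstance(other, Empty):
            return Term(self.expr)
        return Term(f"{self.expr} & {other.expr}")


class Empty(Term):
    # Hypothetical identity element, mirroring the role of EmptyQuery.
    def __init__(self):
        super().__init__("")

    def __and__(self, other):
        # The empty element contributes nothing: return the other operand.
        return other


def build(names):
    # Fold a list of agent names into one conjunction, starting from Empty.
    terms = (Term(f"HasAgent('{name}')") for name in names)
    return reduce(operator.and_, terms, Empty())


combined = build(["MEK", "ERK"])
nothing = build([])
print(combined.expr)
```

With the classes in this module, the same pattern would presumably start from
``q = EmptyQuery()`` and apply ``q &= HasAgent(name)`` inside a loop.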
Building Nontrivial Queries (how to put the blocks together)
------------------------------------------------------------
In practice you should not use :py:class:`And` or :py:class:`Or` very often but
instead make use of the overloaded ``&`` and ``|`` operators to put Queries
together into more complex structures. In addition you can invert a query,
i.e., essentially ask for Statements that do *not* meet certain criteria, e.g.
"not has readings". This can be accomplished with the overloaded ``~``
operator, e.g. ``~HasReadings()``.
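
To make the inversion behavior concrete, here is a minimal sketch of how an
overloaded ``~`` can return a flipped copy while leaving the original object
untouched. The ``Flag`` class is a hypothetical stand-in, not the INDRA
implementation; it only models an ``inverted`` attribute and a readable repr.

```python
class Flag:
    # Hypothetical stand-in for a query that carries an "inverted" flag.
    def __init__(self, name, inverted=False):
        self.name = name
        self.inverted = inverted

    def __invert__(self):
        # Return a new object with the flag flipped; the original is untouched.
        return Flag(self.name, not self.inverted)

    def __repr__(self):
        prefix = "~" if self.inverted else ""
        return f"{prefix}{self.name}()"


has_readings = Flag("HasReadings")
no_readings = ~has_readings
restored = ~no_readings

print(repr(has_readings))
print(repr(no_readings))
print(repr(restored))
```

Because inversion produces a copy, a query object can be reused in several
larger expressions without one use affecting another.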
The query class works by representing and producing a particular JSON structure
which is recognized by the INDRA Database REST service, where it is translated
into a similar but more sophisticated Query language used by the Readonly
Database client. The Query class implements the basic methods used to
communicate with the REST Service in this way.
Examples
--------
First a couple of examples of the typical usage of a query object (See the
:py:func:`get_statements_from_query
<indra.sources.indra_db_rest.api.get_statements_from_query>` documentation for
more usage details):
**Example 1**: Get statements that have database evidence and have either MEK
or MAP2K1 as a name for any of their agents.
>>>
>> from indra.sources.indra_db_rest.api import get_statements_from_query
>> from indra.sources.indra_db_rest.query import *
>> q = (HasAgent('MEK') | HasAgent('MAP2K1')) & HasDatabases()
>> p = get_statements_from_query(q)
>> p.statements
[Activation(MEK(), ERK()),
Phosphorylation(MEK(), ERK()),
Activation(MAP2K1(), ERK()),
Activation(RAF1(), MEK()),
Phosphorylation(RAF1(), MEK()),
Phosphorylation(MAP2K1(), ERK()),
Activation(BRAF(), MEK()),
Inhibition(2-(2-amino-3-methoxyphenyl)chromen-4-one(), MEK()),
Activation(MAP2K1(), MAPK1()),
Activation(MAP2K1(), MAPK3()),
Phosphorylation(MAP2K1(), MAPK1()),
Phosphorylation(BRAF(), MEK()),
Activation(MEK(), MAPK1()),
Complex(BRAF(), MAP2K1()),
Phosphorylation(MAP2K1(), MAPK3()),
Activation(MEK(), MAPK3()),
Complex(MAP2K1(), RAF1()),
Activation(RAF1(), MAP2K1()),
Inhibition(trametinib(), MEK()),
Phosphorylation(MEK(), MAPK3()),
Complex(MAP2K1(), MAPK1()),
Phosphorylation(MEK(), MAPK1()),
Inhibition(selumetinib(), MEK()),
Phosphorylation(PAK1(), MAP2K1(), S, 298)]
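
When ``|`` and ``&`` are mixed in one expression, keep in mind that Python
applies ``&`` before ``|``, whatever the operand types are. The sketch below
uses plain sets as stand-ins for the Statements matched by each constraint
(the set contents are made up for illustration) to show how the grouping
changes the result.

```python
# Stand-ins for the sets of Statements matched by each constraint.
has_mek = {"s1", "s2"}
has_map2k1 = {"s3", "s4"}
has_databases = {"s2", "s3"}

# Without parentheses, & is applied before |.
unparenthesized = has_mek | has_map2k1 & has_databases
explicit_and_first = has_mek | (has_map2k1 & has_databases)

# Parentheses are needed to apply the database filter to both alternatives.
explicit_or_first = (has_mek | has_map2k1) & has_databases

print(sorted(unparenthesized))
print(sorted(explicit_or_first))
```

The same precedence applies to query objects, so parenthesizing the ``|``
group is the safe way to express "either agent, and database evidence".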
**Example 2**: Get statements that have an agent MEK and an agent ERK and more
than 10 evidence.
>>>
>> q = HasAgent('MEK') & HasAgent('ERK') & HasEvidenceBound(["> 10"])
>> p = get_statements_from_query(q)
>> p.statements
[Activation(MEK(), ERK()),
Phosphorylation(MEK(), ERK()),
Complex(ERK(), MEK()),
Inhibition(MEK(), ERK()),
Dephosphorylation(MEK(), ERK()),
Complex(ERK(), MEK(), RAF()),
Phosphorylation(MEK(), ERK(), T),
Phosphorylation(MEK(), ERK(), Y),
Activation(MEK(), ERK(mods: (phosphorylation))),
IncreaseAmount(MEK(), ERK())]
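
The bound strings follow a small grammar: one of the six inequality operators
followed by a natural number, with all bounds required to hold together. The
sketch below is a purely local illustration of that reading; ``parse_bound``
and ``satisfies`` are hypothetical helpers, and how the server actually
interprets these strings is not shown in this module.

```python
import operator

# The six inequality operations accepted in an evidence bound.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def parse_bound(bound):
    # Split a string such as ">= 5" into a comparison function and an int.
    text = bound.strip()
    # Try two-character operators before one-character ones.
    for symbol in ("<=", ">=", "==", "!=", "<", ">"):
        if text.startswith(symbol):
            number = text[len(symbol):].strip()
            if number.isdecimal():
                return OPS[symbol], int(number)
            break
    raise ValueError(f"Not a valid evidence bound: {bound!r}")


def satisfies(ev_count, bounds):
    # An evidence count passes only if it meets every bound.
    return all(compare(ev_count, limit)
               for compare, limit in map(parse_bound, bounds))


print(satisfies(11, ["> 10"]))
print(satisfies(7, ["< 10", ">= 5"]))
```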
**Example 3**: An example of using the ``~`` feature.
>>>
>> q = HasAgent('MEK', namespace='FPLX') & ~HasAgent('ERK', namespace='FPLX')
>> p = get_statements_from_query(q)
>> p.statements[:10]
[Phosphorylation(None, MEK()),
Phosphorylation(RAF(), MEK()),
Activation(RAF(), MEK()),
Activation(MEK(), MAPK()),
Inhibition(U0126(), MEK()),
Inhibition(MEK(), apoptotic process()),
Activation(MEK(), cell population proliferation()),
Activation(RAF1(), MEK()),
Phosphorylation(MEK(), MAPK()),
Phosphorylation(RAF1(), MEK())]
And now an example showing the different methods of the :py:class:`Query`
object:
**Example 4**: a tour demonstrating key utilities of a query object.
Consider the last query we wrote. You can examine the simple JSON sent to the
server:
>>>
>> q.to_simple_json()
{'class': 'And',
'constraint': {'queries': [{'class': 'HasAgent',
'constraint': {'agent_id': 'MEK',
'namespace': 'FPLX',
'role': None,
'agent_num': None},
'inverted': False},
{'class': 'HasAgent',
'constraint': {'agent_id': 'ERK',
'namespace': 'FPLX',
'role': None,
'agent_num': None},
'inverted': True}]},
'inverted': False}
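
The simple JSON is a nested structure of ``class``, ``constraint``, and
``inverted`` keys, so it can be walked recursively. As a sketch, the
hypothetical ``render`` function below (not part of this module) turns the
dictionary shown above back into a compact expression string, adding
parentheses around nested or inverted compound queries.

```python
def render(node, nested=False):
    # Render a simple-JSON query dict as a compact expression string.
    cls = node["class"]
    constraint = node["constraint"]
    prefix = "~" if node["inverted"] else ""
    if cls in ("And", "Or"):
        joiner = " & " if cls == "And" else " | "
        text = joiner.join(render(q, nested=True)
                           for q in constraint["queries"])
        # Parenthesize compound queries that are nested or inverted.
        if nested or node["inverted"]:
            text = f"({text})"
        return prefix + text
    # Leaf query: list only the constraint values that are set.
    args = ", ".join(f"{key}={value!r}" for key, value in constraint.items()
                     if value is not None)
    return f"{prefix}{cls}({args})"


simple_json = {
    "class": "And",
    "constraint": {"queries": [
        {"class": "HasAgent",
         "constraint": {"agent_id": "MEK", "namespace": "FPLX",
                        "role": None, "agent_num": None},
         "inverted": False},
        {"class": "HasAgent",
         "constraint": {"agent_id": "ERK", "namespace": "FPLX",
                        "role": None, "agent_num": None},
         "inverted": True},
    ]},
    "inverted": False,
}

rendered = render(simple_json)
print(rendered)
```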
Or you can retrieve the more "true" JSON representation that is generated by
the server from your simpler query:
>>>
>> q.get_query_json()
{'class': 'Intersection',
'constraint': {'query_list': [{'class': 'HasAgent',
'constraint': {'_regularized_id': 'MEK',
'agent_id': 'MEK',
'agent_num': None,
'namespace': 'FPLX',
'role': None},
'inverted': False},
{'class': 'HasAgent',
'constraint': {'_regularized_id': 'ERK',
'agent_id': 'ERK',
'agent_num': None,
'namespace': 'FPLX',
'role': None},
'inverted': True}]},
'inverted': False}
And last of all you can retrieve a human readable English description of the
query from the server:
>>>
>> query_english = q.get_query_english()
>> print("I am finding statements that", query_english)
I am finding statements that do not have an agent where FPLX=ERK and have an
agent where FPLX=MEK
"""

__all__ = ['Query', 'And', 'Or', 'HasAgent', 'FromMeshIds', 'HasHash',
           'HasSources', 'HasOnlySource', 'HasReadings', 'HasDatabases',
           'HasType', 'HasNumAgents', 'HasNumEvidence', 'HasEvidenceBound',
           'FromPapers', 'EmptyQuery']

from typing import Iterable, Tuple, Union

from indra.sources.indra_db_rest.query_results import QueryResult
from indra.sources.indra_db_rest.util import make_db_rest_request, jsonify_args


class Query:
    """The parent of all query objects."""
    def __init__(self):
        self._inverted = False
        self.__compiled_json = None
        self.__compiled_str = None

    # Here are defined some other functions to get info from the server.

    def get(self, result_type, limit=None, sort_by=None, offset=None,
            timeout=None, n_tries=2, api_key=None, **other_params):
        """Get results from the API of the given type.

        Parameters
        ----------
        result_type : str
            The options are 'statements', 'interactions', 'relations',
            'agents', and 'hashes', indicating the type of result you want.
        limit : Optional[int]
            The maximum number of statements you want to try and retrieve. The
            server will by default limit the results, and any value exceeding
            that limit will be "overruled".
        sort_by : Optional[str]
            The value can be 'default', 'ev_count', or 'belief'.
        offset : Optional[int]
            The offset of the query to begin at.
        timeout : Optional[int]
            The number of seconds to wait for the request to return before
            giving up. This timeout is applied to each try separately.
        n_tries : Optional[int]
            The number of times to try the request before giving up. Each try
            will have `timeout` seconds to complete before it gives up.
        api_key : str or None
            Override or use in place of the API key given in the INDRA config
            file.

        Other Parameters
        ----------------
        filter_ev : bool
            (for ``result_type='statements'``) Indicate whether evidence should
            have the same filters applied as the statements themselves, where
            appropriate (e.g. in the case of a filter by paper).
        ev_limit : int
            (for ``result_type='statements'``) Limit the number of evidence
            returned per Statement.
        with_hashes : bool
            (for ``result_type='relations'`` or ``result_type='agents'``)
            Choose whether the hashes for each Statement should be included
            along with each grouped heading.
        complexes_covered : list[int]
            (for ``result_type='agents'``) A list (or set) of complexes that
            have already come up in the agent groups returned. This prevents
            duplication.
        """
        # Send the simple JSON unless a compiled version is already cached.
        simple = self.__compiled_json is None
        if simple:
            query_json = self.to_simple_json()
        else:
            query_json = self.__compiled_json
        resp = make_db_rest_request('post', f'query/{result_type}',
                                    data={'query': query_json,
                                          'kwargs': jsonify_args(other_params)},
                                    params=dict(limit=limit, sort_by=sort_by,
                                                offset=offset, simple=simple),
                                    timeout=timeout, tries=n_tries,
                                    api_key=api_key)
        resp_json = resp.json()
        # Cache the compiled JSON returned by the server for later requests.
        self.__compiled_json = resp_json['query_json']
        self.__compiled_str = None
        return QueryResult.from_json(resp_json)

    def get_query_json(self):
        """Generate a compiled JSON rep of the query on the server."""
        if not self.__compiled_json:
            resp = make_db_rest_request('post', 'compile/json',
                                        data=self.to_simple_json())
            self.__compiled_json = resp.json()
            self.__compiled_str = None
        return self.__compiled_json

    def get_query_english(self, timeout=None):
        """Get the string representation of the query."""
        if self.__compiled_str is None:
            if self.__compiled_json is None:
                query_json = self.to_simple_json()
                simple = True
            else:
                query_json = self.__compiled_json
                simple = False
            resp = make_db_rest_request('post', 'compile/string',
                                        data=query_json,
                                        params=dict(simple=simple),
                                        timeout=timeout)
            self.__compiled_str = resp.content.decode('utf-8')
        return self.__compiled_str

    # Local (and largely internal) tools:

    def copy(self):
        """Make a copy of the query."""
        cp = self._copy()
        cp._inverted = self._inverted
        return cp

    def _copy(self):
        raise NotImplementedError()

    def to_simple_json(self) -> dict:
        """Generate the JSON from the object rep."""
        return {'class': self.__class__.__name__,
                'constraint': self.get_constraint_dict(),
                'inverted': self._inverted}

    def get_constraint_dict(self) -> dict:
        raise NotImplementedError()

    def invert(self):
        return self.__invert__()

    # Define the operator overloads.

    def __and__(self, other):
        if isinstance(other, EmptyQuery):
            return self.copy()
        return And([self.copy(), other.copy()])

    def __or__(self, other):
        if isinstance(other, EmptyQuery):
            return self.copy()
        return Or([self.copy(), other.copy()])

    def __invert__(self):
        inv = self.copy()
        inv._inverted = not self._inverted
        return inv

    def __repr__(self):
        inv = '~' if self._inverted else ''
        args = ', '.join(f'{key}="{value}"' if isinstance(value, str)
                         else f'{key}={value}'
                         for key, value in self.get_constraint_dict().items())
        return f"{inv}{self.__class__.__name__}({args})"


class And(Query):
    """The intersection of two queries.

    These are generally generated from the use of ``&``, for example:

    >>>
    >> q_and = HasAgent('MEK') & HasAgent('ERK')
    """
    def __init__(self, queries: list):
        self.queries = queries
        super(And, self).__init__()

    def _copy(self):
        return And([q.copy() for q in self.queries])

    def get_constraint_dict(self) -> dict:
        return {'queries': [q.to_simple_json() for q in self.queries]}

    def __repr__(self):
        q_strings = [repr(q) for q in self.queries]
        s = ' & '.join(q_strings)
        if self._inverted:
            s = f'~({s})'
        return s

    def __and__(self, other):
        if isinstance(other, And):
            other_queries = other.queries
        else:
            other_queries = [other]
        return And([q.copy() for q in (self.queries + other_queries)])


class Or(Query):
    """The union of two queries.

    These are generally generated from the use of ``|``, for example:

    >>>
    >> q_or = HasOnlySource('reach') | HasOnlySource('medscan')
    """
    def __init__(self, queries: list):
        self.queries = queries
        super(Or, self).__init__()

    def _copy(self):
        return Or([q.copy() for q in self.queries])

    def get_constraint_dict(self) -> dict:
        return {'queries': [q.to_simple_json() for q in self.queries]}

    def __repr__(self):
        q_strings = [repr(q) for q in self.queries]
        s = ' | '.join(q_strings)
        if self._inverted:
            s = f'~({s})'
        return s

    def __or__(self, other):
        if isinstance(other, Or):
            other_queries = other.queries
        else:
            other_queries = [other]
        return Or([q.copy() for q in (self.queries + other_queries)])


class EmptyQuery(Query):
    """A query that is empty."""
    def _copy(self):
        return EmptyQuery()

    def __and__(self, other):
        return other

    def __or__(self, other):
        return other

    def get_constraint_dict(self) -> dict:
        return {}


class HasOnlySource(Query):
    """Find Statements that come exclusively from one source.

    For example, find statements that come only from sparser.

    Parameters
    ----------
    only_source : str
        The only source that spawned the statement, e.g. signor, or reach.
    """
    def __init__(self, only_source):
        self.only_source = only_source
        super(HasOnlySource, self).__init__()

    def _copy(self):
        return HasOnlySource(self.only_source)

    def get_constraint_dict(self) -> dict:
        return {'only_source': self.only_source}


class HasSources(Query):
    """Find Statements with support from the given list of sources.

    For example, find Statements that have support from both medscan and reach.

    Parameters
    ----------
    sources : list or set or tuple
        A collection of strings, each string the canonical name for a source.
        The result will include statements that have evidence from ALL sources
        that you include.
    """
    def __init__(self, sources):
        self.sources = tuple(set(sources))
        super(HasSources, self).__init__()

    def _copy(self):
        return HasSources(self.sources[:])

    def get_constraint_dict(self) -> dict:
        return {'sources': self.sources}


class HasReadings(Query):
    """Find Statements with support from readings."""
    def _copy(self):
        return HasReadings()

    def get_constraint_dict(self) -> dict:
        return {}


class HasDatabases(Query):
    """Find Statements with support from Databases."""
    def _copy(self):
        return HasDatabases()

    def get_constraint_dict(self) -> dict:
        return {}


class HasHash(Query):
    """Find Statements whose hash is contained in the given list.

    Parameters
    ----------
    stmt_hashes : list or set or tuple
        A collection of integers, where each integer is a shallow matches key
        hash of a Statement (frequently simply called "mk_hash" or "hash")
    """
    def __init__(self, stmt_hashes):
        self.stmt_hashes = stmt_hashes
        super(HasHash, self).__init__()

    def _copy(self):
        return HasHash(self.stmt_hashes)

    def get_constraint_dict(self) -> dict:
        return {'stmt_hashes': self.stmt_hashes}


class HasAgent(Query):
    """Find Statements with the given agent in the given position.

    **NOTE:** At this time 2 agent queries do NOT necessarily imply that the 2
    agents are different. For example:

    >>>
    >> HasAgent("MEK") & HasAgent("MEK")

    will get any Statements that have an agent with name MEK, **not**
    Statements with two agents called MEK. This may change in the future;
    in the meantime you can get around this fairly well by specifying the
    roles:

    >>>
    >> HasAgent("MEK", role="SUBJECT") & HasAgent("MEK", role="OBJECT")

    Or for a more complicated case, consider a query for Statements where one
    agent is MEK and the other has namespace FPLX. Naturally any agent labeled
    as MEK will also have a namespace FPLX (MEK is a FamPlex identifier), and
    in general you will not want to constrain which role is MEK and which is
    the "other" agent. To accomplish this you need to use ``|``:

    >>>
    >> (
    >>   HasAgent("MEK", role="SUBJECT")
    >>   & HasAgent(namespace="FPLX", role="OBJECT")
    >> ) | (
    >>   HasAgent("MEK", role="OBJECT")
    >>   & HasAgent(namespace="FPLX", role="SUBJECT")
    >> )

    Parameters
    ----------
    agent_id : Optional[str]
        The ID string naming the agent, for example 'ERK' (FPLX or NAME) or
        'plx' (TEXT), and so on. If None, the query must then be constrained by
        the ``namespace``.
    namespace : Optional[str]
        By default, this is NAME, indicating the agent's canonical,
        grounded name will be used. Other options include, but are not limited
        to: AUTO (in which case GILDA will be used to guess the proper
        grounding of the entity), FPLX (FamPlex), CHEBI, CHEMBL, HGNC,
        UP (UniProt), and TEXT (for raw text mentions). If ``agent_id`` is
        ``None``, namespace must be specified and must **not** be NAME, TEXT,
        or AUTO.
    role : Optional[str]
        None by default. Options are "SUBJECT", "OBJECT", or "OTHER".
    agent_num : Optional[int]
        None by default. The regularized position of the agent in the
        Statement's list of agents.
    """
    def __init__(self, agent_id=None, namespace='NAME', role=None,
                 agent_num=None):
        if agent_id:
            # Escape underscores in the ID before it is sent to the server.
            agent_id = agent_id.replace('_', r'\_')
        self.agent_id = agent_id
        self.namespace = namespace
        self.role = role
        self.agent_num = agent_num
        super(HasAgent, self).__init__()

    def _copy(self):
        return HasAgent(self.agent_id, self.namespace, self.role,
                        self.agent_num)

    def get_constraint_dict(self) -> dict:
        return {'agent_id': self.agent_id, 'namespace': self.namespace,
                'role': self.role, 'agent_num': self.agent_num}


class FromPapers(Query):
    """Get Statements that came from a given list of papers.

    Parameters
    ----------
    paper_list : list[(<id_type>, <paper_id>)]
        A list of tuples, where each tuple indicates an id-type (e.g. 'pmid')
        and an id value for a particular paper.
    """
    def __init__(self, paper_list):
        self.paper_list = paper_list
        super(FromPapers, self).__init__()

    def _copy(self):
        return FromPapers(self.paper_list)

    def get_constraint_dict(self) -> dict:
        return {'paper_list': self.paper_list}


class FromMeshIds(Query):
    """Get Statements that came from papers annotated with the given MeSH IDs.

    Parameters
    ----------
    mesh_ids : list
        A list of canonical MeSH IDs, of the "C" or "D" variety,
        e.g. "D000135".
    """
    def __init__(self, mesh_ids):
        self.mesh_ids = mesh_ids
        super(FromMeshIds, self).__init__()

    def _copy(self):
        return FromMeshIds(self.mesh_ids)

    def get_constraint_dict(self) -> dict:
        return {'mesh_ids': self.mesh_ids}


class HasNumAgents(Query):
    """Get Statements with the given number of agents.

    For example, `HasNumAgents([1,3,4])` will return Statements with either 1,
    3, or 4 agents (the latter two mostly being complexes).

    Parameters
    ----------
    agent_nums : list or tuple
        A collection of integers, each indicating a number of agents.
    """
    def __init__(self, agent_nums):
        self.agent_nums = agent_nums
        super(HasNumAgents, self).__init__()

    def _copy(self):
        return HasNumAgents(self.agent_nums)

    def get_constraint_dict(self) -> dict:
        return {'agent_nums': self.agent_nums}


class HasNumEvidence(Query):
    """Get Statements with the given number of evidence.

    For example, HasNumEvidence([2,3,4]) will return Statements that have
    either 2, 3, or 4 evidence.

    Parameters
    ----------
    evidence_nums :
        A list of numbers greater than 0, each indicating a number of evidence.
    """
    def __init__(self, evidence_nums: Tuple[Union[int, str]]):
        self.evidence_nums = evidence_nums
        super(HasNumEvidence, self).__init__()

    def _copy(self):
        return HasNumEvidence(self.evidence_nums)

    def get_constraint_dict(self) -> dict:
        return {'evidence_nums': self.evidence_nums}


class HasEvidenceBound(Query):
    """Get Statements with given bounds on their evidence count.

    For example, HasEvidenceBound(["< 10", ">= 5"]) will return Statements with
    fewer than 10 and at least 5 evidence.

    Parameters
    ----------
    evidence_bounds :
        An iterable (e.g. list) of strings such as "< 2" or ">= 4". The
        argument of the inequality must be a natural number (0, 1, 2, ...) and
        the inequality operation must be one of: <, >, <=, >=, ==, !=.
    """
    def __init__(self, evidence_bounds: Union[Iterable[str], str]):
        # Accept a single bound string as shorthand for a one-element list.
        if isinstance(evidence_bounds, str):
            evidence_bounds = [evidence_bounds]
        self.evidence_bounds = list(evidence_bounds)
        super(HasEvidenceBound, self).__init__()

    def _copy(self):
        return HasEvidenceBound(self.evidence_bounds)

    def get_constraint_dict(self) -> dict:
        return {'evidence_bounds': self.evidence_bounds}


class HasType(Query):
    """Get Statements with the given type.

    For example, you can find Statements that are Phosphorylations or
    Activations, or you could find all subclasses of RegulateActivity.

    Parameters
    ----------
    stmt_types : set or list or tuple
        A collection of strings, where each string is a class name for a type
        of Statement. Spelling and capitalization must match exactly.
    include_subclasses : bool
        (optional) default is False. If True, each Statement type given in the
        list will be expanded to include all of its subclasses.
    """
    def __init__(self, stmt_types, include_subclasses=False):
        # Accept a single type name as shorthand for a one-element list.
        if isinstance(stmt_types, str):
            stmt_types = [stmt_types]
        self.stmt_types = stmt_types
        self.include_subclasses = include_subclasses
        super(HasType, self).__init__()

    def _copy(self):
        return HasType(self.stmt_types, self.include_subclasses)

    def get_constraint_dict(self) -> dict:
        return {'stmt_types': self.stmt_types,
                'include_subclasses': self.include_subclasses}