Tools (indra.tools)
Run assembly components in a pipeline (indra.tools.assemble_corpus)
- indra.tools.assemble_corpus.align_statements(stmts1, stmts2, keyfun=None)[source]¶
Return alignment of two lists of statements by key.
- Parameters
stmts1 (list[indra.statements.Statement]) – A list of INDRA Statements to align
stmts2 (list[indra.statements.Statement]) – A list of INDRA Statements to align
keyfun (Optional[function]) – A function that takes a Statement as an argument and returns a key to align by. If not given, the default key function returns a tuple of the names of the Agents in the Statement.
- Returns
matches – A list of tuples where each tuple has two elements, the first corresponding to an element of the stmts1 list and the second corresponding to an element of the stmts2 list. If a given element is not matched, its corresponding pair in the tuple is None.
- Return type
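The alignment described above can be illustrated without INDRA installed. This is a minimal sketch, not the library's implementation: Statements are stood in for by plain tuples of agent names, and `align_by_key` is a hypothetical helper name.

```python
def align_by_key(items1, items2, keyfun=None):
    """Pair up elements of two lists that share a key; unmatched get None."""
    if keyfun is None:
        # Stand-in default: the item itself is the key. For INDRA Statements
        # the documented default is a tuple of Agent names.
        keyfun = lambda item: item
    keys1 = [keyfun(item) for item in items1]
    keys2 = [keyfun(item) for item in items2]
    matches = []
    # Every element of the first list appears once, matched or not.
    for item, key in zip(items1, keys1):
        if key in keys2:
            matches.append((item, items2[keys2.index(key)]))
        else:
            matches.append((item, None))
    # Elements of the second list that found no partner are appended last.
    for item, key in zip(items2, keys2):
        if key not in keys1:
            matches.append((None, item))
    return matches


stmts1 = [("MAP2K1", "MAPK1"), ("BRAF", "MAP2K1")]
stmts2 = [("BRAF", "MAP2K1"), ("EGFR", "GRB2")]
matches = align_by_key(stmts1, stmts2)
```

Each tuple in `matches` holds an element of the first list, an element of the second list, or None where no counterpart with the same key exists.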
- indra.tools.assemble_corpus.dump_statements(stmts_in, fname, protocol=4)[source]¶
Dump a list of statements into a pickle file.
- indra.tools.assemble_corpus.dump_stmt_strings(stmts, fname)[source]¶
Save printed statements in a file.
- indra.tools.assemble_corpus.expand_families(stmts_in, **kwargs)[source]¶
Expand FamPlex Agents to individual genes.
- indra.tools.assemble_corpus.filter_belief(stmts_in, belief_cutoff, **kwargs)[source]¶
Filter to statements with belief above a given cutoff.
- Parameters
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_by_curation(stmts_in, curations, incorrect_policy='any', correct_tags=None, update_belief=True)[source]¶
Filter out statements and update beliefs based on curations.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
curations (list[dict]) – A list of curations for evidences. Curation object should have (at least) the following attributes: pa_hash (preassembled statement hash), source_hash (evidence hash) and tag (e.g. ‘correct’, ‘wrong_relation’, etc.)
incorrect_policy (str) – A policy for filtering out statements given incorrect curations. The ‘any’ policy filters out a statement if at least one of its evidences is curated as incorrect and no evidences are curated as correct, while the ‘all’ policy only filters out a statement if all of its evidences are curated as incorrect.
correct_tags (list[str] or None) – A list of tags to be considered correct. If no tags are provided, only the ‘correct’ tag is considered correct.
update_belief (Optional[bool]) – If True, set a belief score to 1 for statements curated as correct. Default: True
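The difference between the ‘any’ and ‘all’ policies can be sketched as a decision for a single statement. This is an illustration under simplifying assumptions, not INDRA's code: `is_filtered_out` is a hypothetical helper, evidences are represented only by their source hashes, and the curations passed in are assumed to already belong to the statement (the pa_hash match is omitted).

```python
def is_filtered_out(evidence_hashes, curations, incorrect_policy="any",
                    correct_tags=None):
    """Decide whether one statement would be dropped given its curations."""
    correct_tags = correct_tags or ["correct"]
    evidence_hashes = set(evidence_hashes)
    correct, incorrect = set(), set()
    for curation in curations:
        source_hash = curation["source_hash"]
        if source_hash not in evidence_hashes:
            continue  # Curation refers to evidence this statement lacks.
        if curation["tag"] in correct_tags:
            correct.add(source_hash)
        else:
            incorrect.add(source_hash)
    if incorrect_policy == "any":
        # Drop if something is curated incorrect and nothing is curated correct.
        return bool(incorrect) and not correct
    if incorrect_policy == "all":
        # Drop only if every evidence is curated incorrect.
        return bool(evidence_hashes) and incorrect == evidence_hashes
    raise ValueError("incorrect_policy must be 'any' or 'all'")


one_wrong = [{"source_hash": 1, "tag": "wrong_relation"}]
mixed = one_wrong + [{"source_hash": 2, "tag": "correct"}]
```

With two evidences (hashes 1 and 2), `one_wrong` removes the statement under ‘any’ but not under ‘all’, while `mixed` keeps it under both policies.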
- indra.tools.assemble_corpus.filter_by_db_refs(stmts_in, namespace, values, policy, invert=False, match_suffix=False, **kwargs)[source]¶
Filter to Statements whose agents are grounded to a matching entry.
Statements are filtered so that the db_refs entry (of the given namespace) of their Agent/Concept arguments take a value in the given list of values.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of Statements to filter.
namespace (str) – The namespace in db_refs to which the filter should apply.
values (list[str]) – A list of values in the given namespace to which the filter should apply.
policy (str) – The policy to apply when filtering for the db_refs. “one”: keep Statements that contain at least one of the list of db_refs and possibly others not in the list; “all”: keep Statements that only contain db_refs given in the list.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
invert (Optional[bool]) – If True, the Statements that do not match according to the policy are returned. Default: False
match_suffix (Optional[bool]) – If True, the suffix of the db_refs entry is matched against the list of entries.
- Returns
stmts_out – A list of filtered Statements.
- Return type
list[indra.statements.Statement]
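The “one” and “all” policies, together with invert, recur in several filters in this module. The sketch below shows that logic on stand-in data and is not the library's implementation: each statement is modeled as a list of db_refs dicts (one per agent), the helper names are hypothetical, and match_suffix is left out.

```python
def matches_policy(agent_values, allowed, policy):
    """Check a statement's per-agent values against the allowed set."""
    hits = [value in allowed for value in agent_values]
    if policy == "one":
        return any(hits)   # At least one agent matches.
    if policy == "all":
        return all(hits)   # Every agent matches.
    raise ValueError("policy must be 'one' or 'all'")


def filter_by_values(stmts, namespace, values, policy, invert=False):
    """Keep statements that match the policy (or do not, if invert is set)."""
    allowed = set(values)
    kept = []
    for agents in stmts:
        # An agent without the namespace contributes None, which never matches.
        agent_values = [db_refs.get(namespace) for db_refs in agents]
        if matches_policy(agent_values, allowed, policy) != invert:
            kept.append(agents)
    return kept


stmts = [
    [{"HGNC": "1097"}, {"HGNC": "6840"}],
    [{"HGNC": "1097"}, {"CHEBI": "CHEBI:15996"}],
]
```

Here `filter_by_values(stmts, "HGNC", ["1097"], "one")` keeps both stand-in statements, whereas policy “all” with values `["1097", "6840"]` keeps only the first.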
- indra.tools.assemble_corpus.filter_by_type(stmts_in, stmt_type, invert=False, **kwargs)[source]¶
Filter to a given statement type.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
stmt_type (str or indra.statements.Statement) – The class of the statement type to filter for. Alternatively, a string matching the name of the statement class, e.g., “Activation” can be used. Example: indra.statements.Modification or “Modification”
invert (Optional[bool]) – If True, the statements that are not of the given type are returned. Default: False
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_complexes_by_size(stmts_in, members_allowed=5)[source]¶
Filter out Complexes if the number of members exceeds specified allowed number.
- Parameters
- Returns
stmts_out – A list of filtered Statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_concept_names(stmts_in, name_list, policy, invert=False, **kwargs)[source]¶
Return Statements that refer to concepts/agents given as a list of names.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of Statements to filter.
name_list (list[str]) – A list of concept/agent names to filter for.
policy (str) – The policy to apply when filtering for the list of names. “one”: keep Statements that contain at least one of the list of names and possibly others not in the list; “all”: keep Statements that only contain names given in the list.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
invert (Optional[bool]) – If True, the Statements that do not match according to the policy are returned. Default: False
- Returns
stmts_out – A list of filtered Statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_direct(stmts_in, **kwargs)[source]¶
Filter to statements that are direct interactions
- indra.tools.assemble_corpus.filter_enzyme_kinase(stmts_in, **kwargs)[source]¶
Filter Phosphorylations to ones where the enzyme is a known kinase.
- indra.tools.assemble_corpus.filter_evidence_source(stmts_in, source_apis, policy='one', **kwargs)[source]¶
Filter to statements that have evidence from a given set of sources.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
source_apis (list[str]) – A list of sources to filter for. Examples: biopax, bel, reach
policy (Optional[str]) – If ‘one’, a statement that has evidence from any of the sources is kept. If ‘all’, only those statements are kept which have evidence from all the input sources specified in source_apis. If ‘none’, only those statements are kept that don’t have evidence from any of the sources specified in source_apis.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
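The three policies map naturally onto set operations. The following is a small sketch of that reading, with a hypothetical helper name and each statement reduced to the collection of source APIs found in its evidence; it is not taken from INDRA's source.

```python
def keep_by_sources(stmt_sources, source_apis, policy="one"):
    """Return True if a statement with these evidence sources is kept."""
    have = set(stmt_sources)
    wanted = set(source_apis)
    if policy == "one":
        return bool(have & wanted)      # Any requested source is present.
    if policy == "all":
        return wanted <= have           # Every requested source is present.
    if policy == "none":
        return not (have & wanted)      # No requested source is present.
    raise ValueError("policy must be 'one', 'all' or 'none'")
```

For a statement with evidence from reach and biopax, filtering for `["reach", "bel"]` keeps it under ‘one’ and rejects it under both ‘all’ and ‘none’.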
- indra.tools.assemble_corpus.filter_gene_list(stmts_in, gene_list, policy, allow_families=False, remove_bound=False, invert=False, **kwargs)[source]¶
Return statements that contain genes given in a list.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
gene_list (list[str]) – A list of gene symbols to filter for.
policy (str) – The policy to apply when filtering for the list of genes. “one”: keep statements that contain at least one of the list of genes and possibly others not in the list; “all”: keep statements that only contain genes given in the list.
allow_families (Optional[bool]) – Will include statements involving FamPlex families containing one of the genes in the gene list. Default: False
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
remove_bound (Optional[bool]) – If True, removes bound conditions that are not genes in the list. If False (default), looks at agents in the bound conditions in addition to those participating in the statement directly when applying the specified policy.
invert (Optional[bool]) – If True, the statements that do not match according to the policy are returned. Default: False
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_genes_only(stmts_in, specific_only=False, remove_bound=False, **kwargs)[source]¶
Filter to statements containing genes only.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
specific_only (Optional[bool]) – If True, only elementary genes/proteins will be kept and families will be filtered out. If False, families are also included in the output. Default: False
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
remove_bound (Optional[bool]) – If True, removes bound conditions that are not genes. If False (default), filters out statements with non-gene bound conditions.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_grounded_only(stmts_in, score_threshold=None, remove_bound=False, **kwargs)[source]¶
Filter to statements that have grounded agents.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
score_threshold (Optional[float]) – If scored groundings are available in a list and the highest score is below this threshold, the Statement is filtered out.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
remove_bound (Optional[bool]) – If true, removes ungrounded bound conditions from a statement. If false (default), filters out statements with ungrounded bound conditions.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_human_only(stmts_in, remove_bound=False, **kwargs)[source]¶
Filter out statements that are grounded, but not to a human gene.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
remove_bound (Optional[bool]) – If True, removes all bound conditions that are grounded but not to human genes. If False (default), filters out statements with bound conditions that are grounded to non-human genes.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_inconsequential(stmts, mods=True, mod_whitelist=None, acts=True, act_whitelist=None)[source]¶
Keep filtering inconsequential modifications and activities until there is nothing else to filter.
- Parameters
stmts (list[indra.statements.Statement]) – A list of INDRA Statements to filter.
mods (Optional[bool]) – If True, inconsequential modifications are filtered out. Default: True
mod_whitelist (Optional[dict]) – A whitelist containing agent modification sites whose modifications should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of tuples of (modification_type, residue, position). Example: whitelist = {‘MAP2K1’: [(‘phosphorylation’, ‘S’, ‘222’)]}
acts (Optional[bool]) – If True, inconsequential activations are filtered out. Default: True
act_whitelist (Optional[dict]) – A whitelist containing agent activity types which should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of activity types. Example: whitelist = {‘MAP2K1’: [‘kinase’]}
- Returns
The filtered list of statements.
- Return type
list[indra.statements.Statement]
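Filtering “until there is nothing else to filter” is a fixed-point iteration: removing one statement can make another one inconsequential. The sketch below illustrates the idea on toy records and is not INDRA's implementation. The record layout (`produces` / `requires`), the helper names, and the use of a flat set as whitelist (instead of the dict format described above) are all simplifying assumptions.

```python
from functools import partial


def drop_unreferenced(records, whitelist=frozenset()):
    """One pass: keep records whose product is required elsewhere."""
    required = {r["requires"] for r in records if r["requires"]}
    return [r for r in records
            if r["produces"] in required or r["produces"] in whitelist]


def filter_until_stable(records, filter_passes):
    """Re-run all passes until a full round removes nothing."""
    current = list(records)
    while True:
        before = len(current)
        for filter_pass in filter_passes:
            current = filter_pass(current)
        if len(current) == before:
            return current


records = [
    {"produces": "B@S1", "requires": None},
    {"produces": "C@S2", "requires": "B@S1"},
    {"produces": "D@act", "requires": "C@S2"},
]
no_whitelist = filter_until_stable(records, [drop_unreferenced])
with_whitelist = filter_until_stable(
    records, [partial(drop_unreferenced, whitelist={"D@act"})])
```

Without a whitelist the chain unravels one record per round until nothing is left; whitelisting the final readout preserves the whole chain, which is the purpose of the whitelist parameters.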
- indra.tools.assemble_corpus.filter_inconsequential_acts(stmts_in, whitelist=None, **kwargs)[source]¶
Filter out Activations that modify inconsequential activities
Inconsequential here means that the activity is not mentioned / tested in any other statement. In some cases specific activity types should be preserved, for instance, to be used as readouts in a model. In this case, the given activities can be passed in a whitelist.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
whitelist (Optional[dict]) – A whitelist containing agent activity types which should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of activity types. Example: whitelist = {‘MAP2K1’: [‘kinase’]}
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_inconsequential_mods(stmts_in, whitelist=None, **kwargs)[source]¶
Filter out Modifications that modify inconsequential sites
Inconsequential here means that the site is not mentioned / tested in any other statement. In some cases specific sites should be preserved, for instance, to be used as readouts in a model. In this case, the given sites can be passed in a whitelist.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
whitelist (Optional[dict]) – A whitelist containing agent modification sites whose modifications should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of tuples of (modification_type, residue, position). Example: whitelist = {‘MAP2K1’: [(‘phosphorylation’, ‘S’, ‘222’)]}
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.filter_mod_nokinase(stmts_in, **kwargs)[source]¶
Filter non-phospho Modifications to ones with a non-kinase enzyme.
- indra.tools.assemble_corpus.filter_mutation_status(stmts_in, mutations, deletions, **kwargs)[source]¶
Filter statements based on existing mutations/deletions
This filter helps to contextualize a set of statements to a given cell type. Given a list of deleted genes, it removes statements that refer to these genes. It also takes a list of mutations and removes statements that refer to mutations not relevant for the given context.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
mutations (dict) – A dictionary whose keys are gene names, and the values are lists of tuples of the form (residue_from, position, residue_to). Example: mutations = {‘BRAF’: [(‘V’, ‘600’, ‘E’)]}
deletions (list) – A list of gene names that are deleted.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
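The contextualization rule can be sketched per statement as follows. This is an illustration on stand-in data rather than the library's code: agents are plain dicts with a `name` and an optional list of `mutations` tuples in the (residue_from, position, residue_to) form described above, and `keep_for_context` is a hypothetical helper.

```python
def keep_for_context(agents, mutations, deletions):
    """Return True if a statement fits the given mutation/deletion context."""
    for agent in agents:
        # Statements that refer to a deleted gene are removed.
        if agent["name"] in deletions:
            return False
        # Statements that refer to a mutation absent from the context are removed.
        known = mutations.get(agent["name"], [])
        for mutation in agent.get("mutations", []):
            if mutation not in known:
                return False
    return True


mutations = {"BRAF": [("V", "600", "E")]}
deletions = ["CDKN2A"]
braf_v600e = [{"name": "BRAF", "mutations": [("V", "600", "E")]},
              {"name": "MAP2K1"}]
kras_g12d = [{"name": "KRAS", "mutations": [("G", "12", "D")]},
             {"name": "BRAF"}]
cdkn2a = [{"name": "CDKN2A"}, {"name": "CDK4"}]
```

Only `braf_v600e` survives in this context: `kras_g12d` mentions a mutation that is not listed, and `cdkn2a` mentions a deleted gene.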
- indra.tools.assemble_corpus.filter_no_hypothesis(stmts_in, **kwargs)[source]¶
Filter to statements that are not marked as hypothesis in epistemics.
- indra.tools.assemble_corpus.filter_no_negated(stmts_in, **kwargs)[source]¶
Filter to statements that are not marked as negated in epistemics.
- indra.tools.assemble_corpus.filter_top_level(stmts_in, **kwargs)[source]¶
Filter to statements that are at the top-level of the hierarchy.
Here top-level statements correspond to most specific ones.
- indra.tools.assemble_corpus.filter_transcription_factor(stmts_in, **kwargs)[source]¶
Filter out RegulateAmounts where subject is not a transcription factor.
- indra.tools.assemble_corpus.filter_uuid_list(stmts_in, uuids, invert=True, **kwargs)[source]¶
Filter to Statements corresponding to given UUIDs
- Parameters
- Returns
stmts_out – A list of filtered statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.fix_invalidities(stmts, in_place=False, print_report_before=False, print_report_after=False, prior_hash_annots=False)[source]¶
Fix invalidities in a list of statements.
- Parameters
stmts (List[Statement]) – A list of statements to fix invalidities in.
in_place (bool) – If True, the statement objects are changed in place if an invalidity is fixed. Otherwise, a deepcopy is done before running fixes.
print_report_before (bool) – Run and print a validation report on the statements before running fixing.
print_report_after (bool) – Run and print a validation report on the statements after running fixing to check if any issues remain that weren’t handled by the fixing module.
prior_hash_annots (bool) – If True, an annotation is added to each evidence of a statement with the hash of the statement prior to any fixes being applied. This is useful if this function is applied as a post-processing step on assembled statements and it is necessary to refer back to the original hash of statements before an invalidity fix here potentially changes it. Default: False
- Return type
- Returns
The list of statements with validation issues fixed and some invalid statements filtered out.
- indra.tools.assemble_corpus.load_statements(fname, as_dict=False)[source]¶
Load statements from a pickle file.
- Parameters
- Returns
stmts – A list or dict of statements that were loaded.
- Return type
- indra.tools.assemble_corpus.map_db_refs(stmts_in, db_refs_map=None)[source]¶
Update entries in db_refs to those provided in db_refs_map.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements to update db_refs in.
db_refs_map (Optional[dict]) – A dictionary where each key is a tuple (db_ns, db_id) representing old db_refs pair that has to be updated and each value is a new db_id to replace the old value with. If not provided, the default db_refs_map will be loaded.
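The shape of db_refs_map is easiest to see on a single db_refs dict. The sketch below is illustrative only: `apply_db_refs_map` is a hypothetical helper, the identifiers are made-up placeholders, and db_refs values are assumed to be plain strings.

```python
def apply_db_refs_map(db_refs, db_refs_map):
    """Return a copy of db_refs with mapped (namespace, id) pairs replaced."""
    return {
        db_ns: db_refs_map.get((db_ns, db_id), db_id)
        for db_ns, db_id in db_refs.items()
    }


# Keys are (db_ns, db_id) pairs to replace; values are the new db_id.
# "OLD_ID" and "NEW_ID" are placeholders, not real identifiers.
example_map = {("UP", "OLD_ID"): "NEW_ID"}
updated = apply_db_refs_map({"UP": "OLD_ID", "TEXT": "some protein"}, example_map)
```

Entries whose (namespace, ID) pair is not in the map are left unchanged.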
- indra.tools.assemble_corpus.map_grounding(stmts_in, do_rename=True, grounding_map=None, misgrounding_map=None, agent_map=None, ignores=None, use_adeft=True, gilda_mode=None, grounding_map_policy='replace', **kwargs)[source]¶
Map grounding using the GroundingMapper.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to map.
do_rename (Optional[bool]) – If True, Agents are renamed based on their mapped grounding.
grounding_map (Optional[dict]) – A user supplied grounding map which maps a string to a dictionary of database IDs (in the format used by Agents’ db_refs).
misgrounding_map (Optional[dict]) – A user supplied misgrounding map which maps a string to a known misgrounding which can be eliminated by the grounding mapper.
ignores (Optional[list]) – A user supplied list of ignorable strings which, if present as an Agent text in a Statement, the Statement is filtered out.
use_adeft (Optional[bool]) – If True, Adeft will be attempted to be used for acronym disambiguation. Default: True
gilda_mode (Optional[str]) – If None, Gilda will not be used for disambiguation. If ‘web’, the address set in the GILDA_URL configuration or environmental variable is used as a Gilda web service. If ‘local’, the gilda package is imported and used locally.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
grounding_map_policy (Optional[str]) – If a grounding map is provided, use the policy to extend or replace a default grounding map. Default: ‘replace’.
- Returns
stmts_out – A list of mapped statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.map_sequence(stmts_in, do_methionine_offset=True, do_orthology_mapping=True, do_isoform_mapping=True, **kwargs)[source]¶
Map sequences using the SiteMapper.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to map.
do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification at 1 site position greater than the given one; if there exists such a site, creates the mapping. Default is True.
do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
use_cache (boolean) – If True, a cache will be created/used from the location specified by SITEMAPPER_CACHE_PATH, defined in your INDRA config or the environment. If False, no cache is used. For more details on the cache, see the SiteMapper class definition.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of mapped statements.
- Return type
list[indra.statements.Statement]
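The methionine-offset check described for do_methionine_offset can be pictured with a toy example. This sketch is a simplification and not the SiteMapper's logic: it tests only whether the expected residue sits at the given position or one position later in a sequence string, whereas the documented check looks for known modification sites. The helper name and the short sequence are hypothetical.

```python
def map_site(sequence, residue, position, do_methionine_offset=True):
    """Return the 1-based position holding `residue`, or None if not found."""
    def residue_at(pos):
        # Positions are 1-based; out-of-range positions yield None.
        return sequence[pos - 1] if 1 <= pos <= len(sequence) else None

    if residue_at(position) == residue:
        return position
    # Numbering from the mature protein (initial methionine cleaved) is
    # shifted by one relative to the full reference sequence.
    if do_methionine_offset and residue_at(position + 1) == residue:
        return position + 1
    return None


reference = "MASTK"  # toy sequence, not a real protein
```

In this toy sequence, a site reported as S2 is resolved to position 3 when the offset check is enabled and is left unmapped when it is disabled.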
- indra.tools.assemble_corpus.merge_groundings(stmts_in)[source]¶
Gather and merge original grounding information from evidences.
Each Statement’s evidences are traversed to find original grounding information. These groundings are then merged into an overall consensus grounding dict with as much detail as possible.
The current implementation is only applicable to Statements whose concept/agent roles are fixed. Complexes, Associations and Conversions cannot be handled correctly.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements whose groundings should be merged. These Statements are meant to have been preassembled and potentially have multiple pieces of evidence.
- Returns
stmts_out – The list of Statements now with groundings merged at the Statement level.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.normalize_active_forms(stmts_in)[source]¶
Run preassembly of ActiveForms only and keep other statements unchanged.
This is specifically useful in the special case of mechanism linking (that is run after preassembly) producing ActiveForm statements that are redundant. Otherwise, general preassembly deduplicates ActiveForms as expected.
- indra.tools.assemble_corpus.reduce_activities(stmts_in, **kwargs)[source]¶
Reduce the activity types in a list of statements
- indra.tools.assemble_corpus.rename_db_ref(stmts_in, ns_from, ns_to, **kwargs)[source]¶
Rename an entry in the db_refs of each Agent.
This is particularly useful when old Statements in pickle files need to be updated after a namespace was changed such as ‘BE’ to ‘FPLX’.
- Parameters
- Returns
stmts_out – A list of Statements with Agents’ db_refs changed.
- Return type
list[indra.statements.Statement]
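On a single Agent's db_refs, renaming a namespace amounts to moving a value from one key to another. The following is a sketch on a plain dict with a hypothetical helper name; it is not the library's implementation, which operates on the Agents of each Statement.

```python
def rename_namespace(db_refs, ns_from, ns_to):
    """Return a copy of db_refs with the ns_from key renamed to ns_to."""
    renamed = dict(db_refs)
    if ns_from in renamed:
        renamed[ns_to] = renamed.pop(ns_from)
    return renamed


old_refs = {"BE": "ERK", "TEXT": "ERK"}
new_refs = rename_namespace(old_refs, "BE", "FPLX")
```

Dicts without the old namespace pass through unchanged, so the operation is safe to apply to every Agent.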
- indra.tools.assemble_corpus.run_mechlinker(stmts_in, reduce_activities=False, reduce_modifications=False, replace_activations=False, require_active_forms=False, implicit=False)[source]¶
Instantiate MechLinker and run its methods in defined order.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements to run mechanism linking on.
reduce_activities (Optional[bool]) – If True, agent activities are reduced to their most specific, unambiguous form. Default: False
reduce_modifications (Optional[bool]) – If True, agent modifications are reduced to their most specific, unambiguous form. Default: False
replace_activations (Optional[bool]) – If True, if there is compatible pair of Modification(X, Y) and ActiveForm(Y) statements, then any Activation(X,Y) statements are filtered out. Default: False
require_active_forms (Optional[bool]) – If True, agents in active positions are rewritten to be in their active forms. Default: False
implicit (Optional[bool]) – If True, active forms of an agent are inferred from multiple statement types implicitly, otherwise only explicit ActiveForm statements are taken into account. Default: False
- Returns
A list of INDRA Statements that have gone through mechanism linking.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.run_preassembly(stmts_in, return_toplevel=True, poolsize=None, size_cutoff=None, belief_scorer=None, ontology=None, matches_fun=None, refinement_fun=None, flatten_evidence=False, flatten_evidence_collect_from=None, normalize_equivalences=False, normalize_opposites=False, normalize_ns='WM', run_refinement=True, filters=None, **kwargs)[source]¶
Run preassembly on a list of statements.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of statements to preassemble.
return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True
poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
belief_scorer (Optional[indra.belief.BeliefScorer]) – Instance of BeliefScorer class to use in calculating Statement probabilities. If None is provided (default), then the default scorer is used.
ontology (Optional[IndraOntology]) – IndraOntology object to use for preassembly
matches_fun (Optional[function]) – A function to override the built-in matches_key function of statements.
refinement_fun (Optional[function]) – A function to override the built-in refinement_of function of statements.
flatten_evidence (Optional[bool]) – If True, evidences are collected and flattened via supports/supported_by links. Default: False
flatten_evidence_collect_from (Optional[str]) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’. Only relevant when flatten_evidence is True.
normalize_equivalences (Optional[bool]) – If True, equivalent groundings are rewritten to a single standard one. Default: False
normalize_opposites (Optional[bool]) – If True, groundings that have opposites in the ontology are rewritten to a single standard one.
normalize_ns (Optional[str]) – The name space with respect to which equivalences and opposites are normalized.
filters (Optional[list[indra.preassembler.refinement.RefinementFilter]]) – A list of RefinementFilter classes that implement filters on possible statement refinements. For details on how to construct such a filter, see the documentation of indra.preassembler.refinement.RefinementFilter. If no user-supplied filters are provided, the default ontology-based filter is applied. If a list of filters is provided here, the indra.preassembler.refinement.OntologyRefinementFilter isn’t appended by default, and should be added by the user, if necessary. Default: None
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
save_unique (Optional[str]) – The name of a pickle file to save the unique statements into.
- Returns
stmts_out – A list of preassembled top-level statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.run_preassembly_duplicate(preassembler, beliefengine, **kwargs)[source]¶
Run deduplication stage of preassembly on a list of statements.
- Parameters
preassembler (indra.preassembler.Preassembler) – A Preassembler instance
beliefengine (indra.belief.BeliefEngine) – A BeliefEngine instance.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of unique statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.run_preassembly_related(preassembler, beliefengine, **kwargs)
Run related stage of preassembly on a list of statements.
- Parameters
preassembler (indra.preassembler.Preassembler) – A Preassembler instance which already has a set of unique statements internally.
beliefengine (indra.belief.BeliefEngine) – A BeliefEngine instance.
return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True
size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
flatten_evidence (Optional[bool]) – If True, evidences are collected and flattened via supports/supported_by links. Default: False
flatten_evidence_collect_from (Optional[str]) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’. Only relevant when flatten_evidence is True.
save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- Returns
stmts_out – A list of preassembled top-level statements.
- Return type
list[indra.statements.Statement]
- indra.tools.assemble_corpus.standardize_names_groundings(stmts)[source]¶
Standardize the names of Concepts with respect to an ontology.
NOTE: this function is currently optimized for Influence Statements obtained from Eidos, Hume, Sofia and CWMS. It will possibly yield unexpected results for biology-specific Statements.
- Parameters
stmts (list[indra.statements.Statement]) – A list of statements whose Concept names should be standardized.
Fix common invalidities in Statements (indra.tools.fix_invalidities)
- indra.tools.fix_invalidities.fix_invalidities(stmts)[source]¶
Fix invalidities in a list of Statements.
Note that in some cases statements can be filtered out if there is a known issue to which there is no fix, e.g., a Translocation statement missing both location parameters.
- indra.tools.fix_invalidities.fix_invalidities_agent(agent)[source]¶
Fix invalidities of a single INDRA Agent in place.
- indra.tools.fix_invalidities.fix_invalidities_context(context)[source]¶
Fix invalidities of a single INDRA BioContext in place.
- indra.tools.fix_invalidities.fix_invalidities_db_refs(db_refs)[source]¶
Return a fixed version of a db_refs grounding dict.
Annotate websites with INDRA through hypothes.is (indra.tools.hypothesis_annotator)
This module exposes functions that annotate websites (including PubMed and PubMedCentral pages, or any other text-based website) with INDRA Statements through hypothes.is. Features include reading the content of the website ‘de-novo’, and generating new INDRA Statements for annotation, and fetching existing statements for a paper from the INDRA DB and using those for annotation.
- indra.tools.hypothesis_annotator.annotate_paper_from_db(text_refs, assembly_pipeline=None)[source]¶
Upload INDRA Statements as annotations for a given paper based on content for that paper in the INDRA DB.
- Parameters
text_refs (dict) – A dict of text references, following the same format as the INDRA Evidence text_refs attribute.
assembly_pipeline (Optional[json]) – A list of pipeline steps (typically filters) that are applied before uploading statements to hypothes.is as annotations.
- indra.tools.hypothesis_annotator.read_and_annotate(text_refs, text_extractor=None, text_reader=None, assembly_pipeline=None)[source]¶
Read a paper/website and upload annotations derived from it to hypothes.is.
- Parameters
text_refs (dict) – A dict of text references, following the same format as the INDRA Evidence text_refs attribute.
text_extractor (Optional[function]) – A function which takes the raw content of a website (e.g., HTML) and extracts clean text from it to prepare for machine reading. This is only used if the text_refs is a URL (e.g., a Wikipedia page); it is not used for PMID or PMCID text_refs where content can be pre-processed and machine read directly. Default: None. Example: html2text.HTML2Text().handle
text_reader (Optional[function]) – A function which takes a single text string argument (the text extracted from a given resource), runs reading on it, and returns a list of INDRA Statement objects. Due to complications with the PMC NXML format, this option only supports URL or PMID resources as input in text_refs. Default: None. In the default case, the INDRA REST API is called with an appropriate endpoint that runs Reach and processes its output into INDRA Statements.
assembly_pipeline (Optional[json]) – A list of assembly pipeline steps that are applied before uploading statements to hypothes.is as annotations. Example: [{‘function’: ‘map_grounding’}]
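As an illustration of the text_extractor and assembly_pipeline arguments, the following is a minimal sketch of an extractor built on the standard library's html.parser, which could stand in for html2text.HTML2Text().handle. The names extract_text and _TextCollector are hypothetical and not part of INDRA; the pipeline list reuses the single step from the example above.

```python
from html.parser import HTMLParser

# Tags after which a space is inserted so adjacent blocks do not run together.
BLOCK_TAGS = {'p', 'div', 'br', 'li', 'h1', 'h2', 'h3', 'tr'}


class _TextCollector(HTMLParser):
    """Collect visible text, skipping script and style content (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(' ')

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return ' '.join(''.join(self._chunks).split())


def extract_text(html):
    """Take raw HTML as a string and return clean text for machine reading."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.text()


# One-step pipeline in the format shown in the parameter description.
assembly_pipeline = [{'function': 'map_grounding'}]

# These would then be passed along as, e.g.,
# read_and_annotate(text_refs, text_extractor=extract_text,
#                   assembly_pipeline=assembly_pipeline)
```

The extractor only needs to satisfy the contract described above: one string of raw content in, one string of clean text out.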
Build a network from a gene list (indra.tools.gene_network
)¶
- class indra.tools.gene_network.GeneNetwork(gene_list, basename=None)[source]¶
Build a set of INDRA statements for a given gene list from databases.
- Parameters
- get_bel_stmts(filter=False)[source]¶
Get relevant statements from the BEL large corpus.
Performs a series of neighborhood queries and then takes the union of all the statements. Because the query process can take a long time for large gene lists, the resulting list of statements is cached in a pickle file with the filename <basename>_bel_stmts.pkl. If the pickle file is present, it is used by default; if not present, the queries are performed and the results are cached.
- Parameters
filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False. Note that the full (unfiltered) set of statements is cached.
- Returns
List of INDRA statements extracted from the BEL large corpus.
- Return type
list of indra.statements.Statement
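The cache-or-query behavior described for get_bel_stmts can be sketched in a few lines. This is not GeneNetwork's actual implementation; load_or_compute and its arguments are hypothetical names, and compute stands in for whatever performs the slow queries.

```python
import os
import pickle


def load_or_compute(basename, suffix, compute):
    """Return cached results if the pickle file exists, else compute and cache.

    The file name follows the <basename>_<suffix>.pkl pattern described above.
    """
    fname = '%s_%s.pkl' % (basename, suffix)
    if os.path.exists(fname):
        # Cache hit: skip the (potentially slow) queries entirely.
        with open(fname, 'rb') as fh:
            return pickle.load(fh)
    # Cache miss: run the queries and store the full result for next time.
    results = compute()
    with open(fname, 'wb') as fh:
        pickle.dump(results, fh)
    return results
```

With this pattern, deleting the pickle file is what forces the queries to run again.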
- get_biopax_stmts(filter=False, query='pathsbetween', database_filter=None)[source]¶
Get relevant statements from Pathway Commons.
Performs a “paths between” query for the genes in gene_list and uses the results to build statements. This function caches two files: the list of statements built from the query, which is cached in <basename>_biopax_stmts.pkl, and the OWL file returned by the Pathway Commons Web API, which is cached in <basename>_pc_pathsbetween.owl. If these cached files are found, then the results are returned based on the cached file and Pathway Commons is not queried again.
- Parameters
filter (Optional[bool]) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False.
query (Optional[str]) – Defines what type of query is executed. The two options are ‘pathsbetween’, which finds paths between the given list of genes and only works if more than 1 gene is given, and ‘neighborhood’, which searches the immediate neighborhood of each given gene. Note that for pathsbetween queries with more than 60 genes, the query will be executed in multiple blocks for scalability.
database_filter (Optional[list[str]]) – A list of PathwayCommons databases to include in the query.
- Returns
List of INDRA statements extracted from Pathway Commons.
- Return type
list of indra.statements.Statement
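The meaning of the filter argument ("exclusively mention genes in gene_list") can be made concrete with a small sketch. Statements are represented here as plain tuples of agent names purely for illustration; the function names are hypothetical, and real INDRA Statements would need their Agent names extracted first.

```python
def mentions_only_listed_genes(agent_names, gene_list):
    """Return True if every agent name appears in gene_list."""
    genes = set(gene_list)
    return all(name in genes for name in agent_names)


def filter_to_gene_list(stmts, gene_list):
    """Keep only the statements whose agents are all in gene_list."""
    return [s for s in stmts if mentions_only_listed_genes(s, gene_list)]


# Each tuple stands in for the agent names of one statement.
stmts = [('BRAF', 'MAP2K1'), ('MAP2K1', 'MAPK1'), ('MAPK1', 'ELK1')]
kept = filter_to_gene_list(stmts, ['BRAF', 'MAP2K1', 'MAPK1'])
```

Here the last statement is dropped because ELK1 is not in the gene list, even though MAPK1 is.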
- get_statements(filter=False)[source]¶
Return the combined list of statements from BEL and Pathway Commons.
Internally calls get_biopax_stmts() and get_bel_stmts().
- run_preassembly(stmts, print_summary=True)[source]¶
Run complete preassembly procedure on the given statements.
Results are returned as a dict and stored in the attribute results. They are also saved in the pickle file <basename>_results.pkl.
- Parameters
stmts (list of indra.statements.Statement) – Statements to preassemble.
print_summary (bool) – If True (default), prints a summary of the preassembly process to the console.
- Returns
A dict containing the following entries:
raw: the starting set of statements before preassembly.
duplicates1: statements after initial de-duplication.
valid: statements found to have valid modification sites.
mapped: mapped statements (list of indra.preassembler.sitemapper.MappedStatement).
mapped_stmts: combined list of valid statements and statements after mapping.
duplicates2: statements resulting from de-duplication of the statements in mapped_stmts.
related2: top-level statements after combining the statements in duplicates2.
- Return type
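A results dict with the entries listed above can be inspected with a short helper such as the following sketch. summarize_results is a hypothetical name, not part of GeneNetwork; with a real instance, the dict would come from run_preassembly or from the results attribute.

```python
# Keys in the order the preassembly stages are listed above.
RESULT_KEYS = ['raw', 'duplicates1', 'valid', 'mapped',
               'mapped_stmts', 'duplicates2', 'related2']


def summarize_results(results):
    """Return (key, count) pairs for the entries present in a results dict."""
    return [(key, len(results[key])) for key in RESULT_KEYS if key in results]
```

Comparing the counts stage by stage shows how many statements each preassembly step removed or combined.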
Build an executable model from a fragment of a large network (indra.tools.executable_subnetwork
)¶
- indra.tools.executable_subnetwork.get_subnetwork(statements, nodes)[source]¶
Return a PySB model based on a subset of given INDRA Statements.
Statements are first filtered for nodes in the given list and other nodes are optionally added based on relevance in a given network. The filtered statements are then assembled into an executable model using INDRA’s PySB Assembler.
- Parameters
- Returns
model – A PySB model object assembled using INDRA’s PySB Assembler from the INDRA Statements corresponding to the subnetwork.
- Return type
pysb.Model
Build a model incrementally over time (indra.tools.incremental_model
)¶
- class indra.tools.incremental_model.IncrementalModel(model_fname=None)[source]¶
Assemble a model incrementally by iteratively adding new Statements.
- Parameters
model_fname (Optional[str]) – The name of the pickle file in which a set of INDRA Statements is stored in a dict keyed by PubMed IDs. This is the state of an IncrementalModel that is loaded upon instantiation.
- stmts¶
A dictionary of INDRA Statements keyed by PMIDs that stores the current state of the IncrementalModel.
- get_model_agents()[source]¶
Return a list of all Agents from all Statements.
- Returns
agents – A list of Agents that are in the model.
- Return type
list[indra.statements.Agent]
- get_statements()[source]¶
Return all Statements in a single list.
- Returns
stmts – A list of all the INDRA Statements in the model.
- Return type
list[indra.statements.Statement]
- get_statements_noprior()[source]¶
Return all non-prior Statements in a single list.
- Returns
stmts – A list of all the INDRA Statements in the model (excluding the prior).
- Return type
list[indra.statements.Statement]
- get_statements_prior()[source]¶
Return all prior Statements in a single list.
- Returns
stmts – A list of all the INDRA Statements in the prior.
- Return type
list[indra.statements.Statement]
- load_prior(prior_fname)[source]¶
Load a set of prior statements from a pickle file.
The prior statements have a special key in the stmts dictionary called “prior”.
- Parameters
prior_fname (str) – The name of the pickle file containing the prior Statements.
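The layout of the stmts dictionary, keyed by PMID with the special “prior” key, can be sketched with plain Python data. The helper names below are hypothetical and only mirror the documented behavior of get_statements(), get_statements_noprior() and get_statements_prior(); strings stand in for INDRA Statements.

```python
def all_statements(stmts):
    """Flatten every list in the dict, prior included."""
    return [s for stmt_list in stmts.values() for s in stmt_list]


def noprior_statements(stmts):
    """Flatten every list except the one stored under the 'prior' key."""
    return [s for key, stmt_list in stmts.items()
            if key != 'prior' for s in stmt_list]


def prior_statements(stmts):
    """Return the statements stored under the 'prior' key, if any."""
    return list(stmts.get('prior', []))


# Placeholder strings instead of Statements; PMIDs are made up for illustration.
stmts = {'prior': ['p1', 'p2'], '12345': ['s1'], '67890': ['s2', 's3']}
```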
- preassemble(filters=None, grounding_map=None)[source]¶
Preassemble the Statements collected in the model.
Use INDRA’s GroundingMapper, Preassembler and BeliefEngine on the IncrementalModel and save the unique statements and the top level statements in class attributes.
Currently the following filter options are implemented:
- grounding: require that all Agents in statements are grounded
- human_only: require that all proteins are human proteins
- prior_one: require that at least one Agent is in the prior model
- prior_all: require that all Agents are in the prior model
- Parameters
filters (Optional[list[str]]) – A list of filter options to apply when choosing the statements. See description above for more details. Default: None
grounding_map (Optional[dict]) – A user supplied grounding map which maps a string to a dictionary of database IDs (in the format used by Agents’ db_refs).
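The difference between the prior_one and prior_all filters, and the shape of a grounding_map, can be illustrated as follows. This is a sketch under stated assumptions, not the IncrementalModel implementation: passes_prior_filter is a hypothetical helper operating on agent names, and the grounding map entries are illustrative examples in the db_refs style.

```python
def passes_prior_filter(agent_names, prior_names, mode):
    """Apply the prior_one or prior_all rule to one statement's agent names."""
    in_prior = [name in prior_names for name in agent_names]
    if mode == 'prior_one':
        # At least one Agent must be in the prior model.
        return any(in_prior)
    if mode == 'prior_all':
        # Every Agent must be in the prior model.
        return all(in_prior)
    raise ValueError('Unknown filter: %s' % mode)


# Illustrative grounding map: text string -> db_refs-style dict of database IDs.
grounding_map = {
    'ERK': {'FPLX': 'ERK'},
    'Akt1': {'HGNC': '391', 'UP': 'P31749'},
}

prior = {'AKT1', 'MAPK3', 'EGFR'}
one = passes_prior_filter(['AKT1', 'TP53'], prior, 'prior_one')
every = passes_prior_filter(['AKT1', 'TP53'], prior, 'prior_all')
```

In this example the statement passes prior_one (AKT1 is in the prior) but fails prior_all (TP53 is not).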
The RAS Machine (indra.tools.machine
)¶
Prerequisites¶
First, install the machine-specific dependencies:
pip install indra[machine]
Starting a New Model¶
To start a new model, run
python -m indra.tools.machine make model_name
Alternatively, the command line interface can be invoked with
indra-machine make model_name
where model_name corresponds to the name of the model to initialize.
This script generates the following folders and files
model_name
model_name/log.txt
model_name/config.yaml
model_name/jsons/
You should then edit model_name/config.yaml to set up the search terms and, optionally, the credentials to use Twitter, Gmail or NDEx bindings.
Setting Up Search Terms¶
The config.yaml file is a standard YAML configuration file. A template is available in model_name/config.yaml after having created the machine.
Two important fields in config.yaml are search_terms and search_genes, both of which are YAML lists. The entries of search_terms are used directly as queries in PubMed search (for more information on PubMed search strings, read https://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.Searching_PubMed).
Example:
search_terms:
- breast cancer
- proteasome
- apoptosis
The search_genes field is a special list in which only standard HGNC gene symbols are allowed. Entries in this list are used to search PubMed and also serve as a list of prior genes that are known to be relevant for the model.
Entries in this list can be used to search PubMed specifically for articles that are tagged with the gene’s unique identifier rather than its string name. This mode of searching for articles on specific genes is much more reliable than searching for them using string names.
Example:
search_genes:
- AKT1
- MAPK3
- EGFR
Extending a Model¶
To extend a model, run
python -m indra.tools.machine run_with_search model_name
Alternatively, the command line interface can be invoked with
indra-machine run_with_search model_name
Extending a model involves extracting PMIDs from emails (if Gmail credentials are given) and searching with INDRA’s PubMed client, using each entry of search_terms in config.yaml as a search term. INDRA’s literature client is then used to find the full text corresponding to each PMID, or its abstract when the full text is not available. The REACH parser is then used to read each new paper. INDRA uses the REACH output to construct Statements corresponding to mechanisms. It then adds them to an incremental model through a process of assembly involving duplicate and overlap resolution and the application of filters.