Tools (indra.tools)

Run assembly components in a pipeline (indra.tools.assemble_corpus)

indra.tools.assemble_corpus.align_statements(stmts1, stmts2, keyfun=None)[source]

Return alignment of two lists of statements by key.

Parameters
  • stmts1 (list[indra.statements.Statement]) – A list of INDRA Statements to align

  • stmts2 (list[indra.statements.Statement]) – A list of INDRA Statements to align

  • keyfun (Optional[function]) – A function that takes a Statement as an argument and returns a key to align by. If not given, the default key function returns a tuple of the names of the Agents in the Statement.

Returns

matches – A list of tuples where each tuple has two elements, the first corresponding to an element of the stmts1 list and the second corresponding to an element of the stmts2 list. If an element has no match, the other element of its tuple is None.

Return type

list(tuple)
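The alignment described above behaves like an outer join on the key. The following is a minimal, self-contained sketch of that idea; it does not import INDRA, and the `align_by_key` helper and the tuple stand-ins for Statements are assumptions made purely for illustration, not the library's implementation.

```python
def align_by_key(items1, items2, keyfun):
    """Pair up elements of two lists that share a key (hypothetical helper)."""
    keys1 = [keyfun(item) for item in items1]
    keys2 = [keyfun(item) for item in items2]
    matches = []
    # Every element of the first list appears once, matched or not.
    for item, key in zip(items1, keys1):
        if key in keys2:
            matches.append((item, items2[keys2.index(key)]))
        else:
            matches.append((item, None))
    # Elements of the second list with no counterpart are appended last.
    for item, key in zip(items2, keys2):
        if key not in keys1:
            matches.append((None, item))
    return matches


# Stand-in "statements": (statement type, tuple of agent names).
stmts1 = [("Phosphorylation", ("MAP2K1", "MAPK1")),
          ("Activation", ("BRAF", "MAP2K1"))]
stmts2 = [("Dephosphorylation", ("MAP2K1", "MAPK1")),
          ("Complex", ("EGFR", "GRB2"))]

# Align by agent names, mirroring the default key described above.
aligned = align_by_key(stmts1, stmts2, keyfun=lambda stmt: stmt[1])
```

Here the first pair shares the agent names, so it is matched; the remaining two statements each appear alongside None.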

indra.tools.assemble_corpus.dump_statements(stmts_in, fname, protocol=4)[source]

Dump a list of statements into a pickle file.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to dump into the pickle file.

  • fname (str) – The name of the pickle file to dump statements into.

  • protocol (Optional[int]) – The pickle protocol to use (use 2 for Python 2 compatibility). Default: 4

indra.tools.assemble_corpus.dump_stmt_strings(stmts, fname)[source]

Save printed statements in a file.

Parameters
  • stmts (list[indra.statements.Statement]) – A list of statements to save in a text file.

  • fname (str) – The name of a text file to save the printed statements into.

indra.tools.assemble_corpus.expand_families(stmts_in, **kwargs)[source]

Expand FamPlex Agents to individual genes.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to expand.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of expanded statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_belief(stmts_in, belief_cutoff, **kwargs)[source]

Filter to statements with belief above a given cutoff.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • belief_cutoff (float) – Only statements with belief above the belief_cutoff will be returned. Here 0 < belief_cutoff < 1.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_by_curation(stmts_in, curations, incorrect_policy='any', correct_tags=None, update_belief=True)[source]

Filter out statements and update beliefs based on curations.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • curations (list[dict]) – A list of curations for evidences. Each curation should have (at least) the following attributes: pa_hash (preassembled statement hash), source_hash (evidence hash) and tag (e.g. ‘correct’, ‘wrong_relation’, etc.)

  • incorrect_policy (str) – A policy for filtering out statements given incorrect curations. The ‘any’ policy filters out a statement if at least one of its evidences is curated as incorrect and no evidences are curated as correct, while the ‘all’ policy only filters out a statement if all of its evidences are curated as incorrect.

  • correct_tags (list[str] or None) – A list of tags to be considered correct. If no tags are provided, only the ‘correct’ tag is considered correct.

  • update_belief (Optional[bool]) – If True, set a belief score to 1 for statements curated as correct. Default: True
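The difference between the ‘any’ and ‘all’ policies can be sketched in plain Python. This is an illustration under stated assumptions, not INDRA's code: `keep_statement` is a hypothetical helper, evidences are represented only by their hashes, and the curations for one statement are given as a mapping from evidence hash to tag.

```python
def keep_statement(evidence_hashes, curation_tags, incorrect_policy="any",
                   correct_tags=("correct",)):
    """Decide whether a statement survives curation-based filtering.

    evidence_hashes: hashes of the statement's evidences (toy stand-in).
    curation_tags: dict mapping a curated evidence hash to its tag.
    """
    curated = [h for h in evidence_hashes if h in curation_tags]
    correct = [h for h in curated if curation_tags[h] in correct_tags]
    incorrect = [h for h in curated if curation_tags[h] not in correct_tags]
    if incorrect_policy == "any":
        # Drop if something is curated incorrect and nothing is curated correct.
        return not (len(incorrect) > 0 and len(correct) == 0)
    if incorrect_policy == "all":
        # Drop only if every evidence is curated incorrect.
        return not (len(incorrect) > 0
                    and len(incorrect) == len(evidence_hashes))
    raise ValueError("Unknown policy: %s" % incorrect_policy)


evidence = ["ev1", "ev2"]
one_wrong = {"ev1": "wrong_relation"}
mixed = {"ev1": "wrong_relation", "ev2": "correct"}
all_wrong = {"ev1": "wrong_relation", "ev2": "wrong_relation"}

any_one_wrong = keep_statement(evidence, one_wrong, "any")   # filtered out
all_one_wrong = keep_statement(evidence, one_wrong, "all")   # kept
any_mixed = keep_statement(evidence, mixed, "any")           # kept
all_all_wrong = keep_statement(evidence, all_wrong, "all")   # filtered out
```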

indra.tools.assemble_corpus.filter_by_db_refs(stmts_in, namespace, values, policy, invert=False, match_suffix=False, **kwargs)[source]

Filter to Statements whose agents are grounded to a matching entry.

Statements are filtered so that the db_refs entry (of the given namespace) of their Agent/Concept arguments takes a value in the given list of values.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of Statements to filter.

  • namespace (str) – The namespace in db_refs to which the filter should apply.

  • values (list[str]) – A list of values in the given namespace to which the filter should apply.

  • policy (str) – The policy to apply when filtering for the db_refs. “one”: keep Statements that contain at least one of the list of db_refs and possibly others not in the list; “all”: keep Statements that only contain db_refs given in the list.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • invert (Optional[bool]) – If True, the Statements that do not match according to the policy are returned. Default: False

  • match_suffix (Optional[bool]) – If True, the suffix of the db_refs entry is matched against the list of entries. Default: False

Returns

stmts_out – A list of filtered Statements.

Return type

list[indra.statements.Statement]
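How policy, invert and match_suffix interact can be sketched without INDRA. In the toy below, each statement is just a list of db_refs dictionaries (one per agent), and the `TOY` namespace, its values and both helper functions are hypothetical names chosen for illustration.

```python
def ref_matches(ref, values, match_suffix=False):
    """Check one db_refs entry against the allowed values."""
    if ref is None:
        return False
    if match_suffix:
        return any(ref.endswith(value) for value in values)
    return ref in values


def filter_refs(stmts, namespace, values, policy, invert=False,
                match_suffix=False):
    """Toy filter: each statement is a list of per-agent db_refs dicts."""
    stmts_out = []
    for agents in stmts:
        hits = [ref_matches(db_refs.get(namespace), values, match_suffix)
                for db_refs in agents]
        if policy == "one":
            keep = any(hits)
        elif policy == "all":
            keep = all(hits)
        else:
            raise ValueError("Unknown policy: %s" % policy)
        # With invert=True the non-matching statements are returned instead.
        if keep != invert:
            stmts_out.append(agents)
    return stmts_out


stmts = [
    [{"TOY": "root/food"}, {"TOY": "root/water"}],
    [{"TOY": "root/food"}, {"TOY": "root/price"}],
    [{"TOY": "root/price"}, {}],
]
wanted = ["root/food", "root/water"]

one = filter_refs(stmts, "TOY", wanted, "one")
strict = filter_refs(stmts, "TOY", wanted, "all")
inverted = filter_refs(stmts, "TOY", wanted, "one", invert=True)
by_suffix = filter_refs(stmts, "TOY", ["food"], "one", match_suffix=True)
```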

indra.tools.assemble_corpus.filter_by_type(stmts_in, stmt_type, invert=False, **kwargs)[source]

Filter to a given statement type.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • stmt_type (str or indra.statements.Statement) – The class of the statement type to filter for. Alternatively, a string matching the name of the statement class, e.g., “Activation” can be used. Example: indra.statements.Modification or “Modification”

  • invert (Optional[bool]) – If True, the statements that are not of the given type are returned. Default: False

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_complexes_by_size(stmts_in, members_allowed=5)[source]

Filter out Complexes if the number of members exceeds the specified allowed number.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements from which large Complexes need to be filtered out

  • members_allowed (Optional[int]) – Allowed number of members to include. Default: 5

Returns

stmts_out – A list of filtered Statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_concept_names(stmts_in, name_list, policy, invert=False, **kwargs)[source]

Return Statements that refer to concepts/agents given as a list of names.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of Statements to filter.

  • name_list (list[str]) – A list of concept/agent names to filter for.

  • policy (str) – The policy to apply when filtering for the list of names. “one”: keep Statements that contain at least one of the list of names and possibly others not in the list; “all”: keep Statements that only contain names given in the list.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • invert (Optional[bool]) – If True, the Statements that do not match according to the policy are returned. Default: False

Returns

stmts_out – A list of filtered Statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_direct(stmts_in, **kwargs)[source]

Filter to statements that are direct interactions.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_enzyme_kinase(stmts_in, **kwargs)[source]

Filter Phosphorylations to ones where the enzyme is a known kinase.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_evidence_source(stmts_in, source_apis, policy='one', **kwargs)[source]

Filter to statements that have evidence from a given set of sources.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • source_apis (list[str]) – A list of sources to filter for. Examples: biopax, bel, reach

  • policy (Optional[str]) – If ‘one’, a statement that has evidence from any of the sources is kept. If ‘all’, only those statements are kept which have evidence from all the input sources specified in source_apis. If ‘none’, only those statements are kept that don’t have evidence from any of the sources specified in source_apis.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]
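The three policies reduce to simple set operations on the sources behind a statement's evidence. A small sketch, where `keep_by_sources` is a hypothetical helper and the source names are toy inputs:

```python
def keep_by_sources(stmt_sources, source_apis, policy="one"):
    """Apply the 'one' / 'all' / 'none' policy to a statement's sources."""
    have = set(stmt_sources)
    wanted = set(source_apis)
    if policy == "one":
        return len(have & wanted) > 0      # any requested source present
    if policy == "all":
        return wanted <= have              # every requested source present
    if policy == "none":
        return len(have & wanted) == 0     # no requested source present
    raise ValueError("Unknown policy: %s" % policy)


sources = {"reach", "biopax"}
kept_one = keep_by_sources(sources, ["reach", "bel"], "one")
kept_all = keep_by_sources(sources, ["reach", "bel"], "all")
kept_none = keep_by_sources(sources, ["reach", "bel"], "none")
```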

indra.tools.assemble_corpus.filter_gene_list(stmts_in, gene_list, policy, allow_families=False, remove_bound=False, invert=False, **kwargs)[source]

Return statements that contain genes given in a list.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • gene_list (list[str]) – A list of gene symbols to filter for.

  • policy (str) – The policy to apply when filtering for the list of genes. “one”: keep statements that contain at least one of the list of genes and possibly others not in the list; “all”: keep statements that only contain genes given in the list.

  • allow_families (Optional[bool]) – Will include statements involving FamPlex families containing one of the genes in the gene list. Default: False

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • remove_bound (Optional[bool]) – If True, removes bound conditions that are not genes in the list. If False (default), looks at agents in the bound conditions in addition to those participating in the statement directly when applying the specified policy.

  • invert (Optional[bool]) – If True, the statements that do not match according to the policy are returned. Default: False

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_genes_only(stmts_in, specific_only=False, remove_bound=False, **kwargs)[source]

Filter to statements containing genes only.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • specific_only (Optional[bool]) – If True, only elementary genes/proteins will be kept and families will be filtered out. If False, families are also included in the output. Default: False

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • remove_bound (Optional[bool]) – If True, removes bound conditions that are not genes. If False (default), filters out statements with non-gene bound conditions.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_grounded_only(stmts_in, score_threshold=None, remove_bound=False, **kwargs)[source]

Filter to statements that have grounded agents.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • score_threshold (Optional[float]) – If scored groundings are available in a list and the highest score is below this threshold, the Statement is filtered out.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • remove_bound (Optional[bool]) – If true, removes ungrounded bound conditions from a statement. If false (default), filters out statements with ungrounded bound conditions.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]
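The score_threshold rule can be illustrated on its own. In this sketch, scored groundings are assumed to be a list of (identifier, score) pairs, and `passes_score_threshold` is a hypothetical helper; how INDRA stores scored groundings internally is not shown here.

```python
def passes_score_threshold(scored_groundings, score_threshold=None):
    """Return True if the best-scored grounding meets the threshold."""
    # Without a threshold, or without scored groundings, nothing is rejected.
    if score_threshold is None or not scored_groundings:
        return True
    best_score = max(score for _, score in scored_groundings)
    return best_score >= score_threshold


confident = passes_score_threshold([("id_a", 0.9), ("id_b", 0.4)], 0.7)
uncertain = passes_score_threshold([("id_a", 0.5)], 0.7)
unchecked = passes_score_threshold([("id_a", 0.5)])
```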

indra.tools.assemble_corpus.filter_human_only(stmts_in, remove_bound=False, **kwargs)[source]

Filter out statements that are grounded, but not to a human gene.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • remove_bound (Optional[bool]) – If True, removes all bound conditions that are grounded but not to human genes. If False (default), filters out statements with bound conditions that are grounded to non-human genes.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_inconsequential(stmts, mods=True, mod_whitelist=None, acts=True, act_whitelist=None)[source]

Keep filtering inconsequential modifications and activities until there is nothing else to filter.

Parameters
  • stmts (list[indra.statements.Statement]) – A list of INDRA Statements to filter.

  • mods (Optional[bool]) – If True, inconsequential modifications are filtered out. Default: True

  • mod_whitelist (Optional[dict]) – A whitelist containing agent modification sites whose modifications should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of tuples of (modification_type, residue, position). Example: whitelist = {‘MAP2K1’: [(‘phosphorylation’, ‘S’, ‘222’)]}

  • acts (Optional[bool]) – If True, inconsequential activations are filtered out. Default: True

  • act_whitelist (Optional[dict]) – A whitelist containing agent activity types which should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of activity types. Example: whitelist = {‘MAP2K1’: [‘kinase’]}

Returns

The filtered list of statements.

Return type

list[indra.statements.Statement]
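Filtering has to be repeated because removing one statement can make another one inconsequential. The toy below sketches that fixed-point loop; it is not INDRA's implementation. Each stand-in statement is a (produces, requires) pair, and a statement counts as inconsequential when no statement requires what it produces and its product is not whitelisted. Both function names are hypothetical.

```python
def filter_once(stmts, whitelist):
    """Drop statements whose product is neither required nor whitelisted."""
    required = {req for _, req in stmts if req is not None}
    return [(prod, req) for prod, req in stmts
            if prod in required or prod in whitelist]


def filter_until_stable(stmts, whitelist=()):
    """Repeat the single-pass filter until it removes nothing more."""
    while True:
        filtered = filter_once(stmts, whitelist)
        if len(filtered) == len(stmts):
            return filtered
        stmts = filtered


# A chain: "z" depends on "y", which depends on "x".
chain = [("x", None), ("y", "x"), ("z", "y")]

# Nothing needs "z", so it goes first; then "y" and "x" follow in later passes.
nothing_left = filter_until_stable(chain)

# Whitelisting "z" (e.g. as a model readout) keeps the whole chain alive.
all_kept = filter_until_stable(chain, whitelist={"z"})
```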

indra.tools.assemble_corpus.filter_inconsequential_acts(stmts_in, whitelist=None, **kwargs)[source]

Filter out Activations that modify inconsequential activities.

Inconsequential here means that the activity is not mentioned / tested in any other statement. In some cases specific activity types should be preserved, for instance, to be used as readouts in a model. In this case, the given activities can be passed in a whitelist.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • whitelist (Optional[dict]) – A whitelist containing agent activity types which should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of activity types. Example: whitelist = {‘MAP2K1’: [‘kinase’]}

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_inconsequential_mods(stmts_in, whitelist=None, **kwargs)[source]

Filter out Modifications that modify inconsequential sites.

Inconsequential here means that the site is not mentioned / tested in any other statement. In some cases specific sites should be preserved, for instance, to be used as readouts in a model. In this case, the given sites can be passed in a whitelist.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • whitelist (Optional[dict]) – A whitelist containing agent modification sites whose modifications should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of tuples of (modification_type, residue, position). Example: whitelist = {‘MAP2K1’: [(‘phosphorylation’, ‘S’, ‘222’)]}

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_mod_nokinase(stmts_in, **kwargs)[source]

Filter non-phospho Modifications to ones with a non-kinase enzyme.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_mutation_status(stmts_in, mutations, deletions, **kwargs)[source]

Filter statements based on existing mutations/deletions.

This filter helps to contextualize a set of statements to a given cell type. Given a list of deleted genes, it removes statements that refer to these genes. It also takes a list of mutations and removes statements that refer to mutations not relevant for the given context.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • mutations (dict) – A dictionary whose keys are gene names, and the values are lists of tuples of the form (residue_from, position, residue_to). Example: mutations = {‘BRAF’: [(‘V’, ‘600’, ‘E’)]}

  • deletions (list) – A list of gene names that are deleted.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]
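The mutations and deletions arguments can be read as a per-agent check. The sketch below is one plausible reading of the description above, not INDRA's code: each stand-in statement is a list of (gene name, list of mutation tuples) pairs, and both helpers are hypothetical.

```python
def agent_ok(gene, agent_mutations, mutations, deletions):
    """An agent passes if its gene is not deleted and all of its
    mutations are listed for that gene in the given context."""
    if gene in deletions:
        return False
    allowed = mutations.get(gene, [])
    return all(mutation in allowed for mutation in agent_mutations)


def filter_by_mutation_status(stmts, mutations, deletions):
    """Keep statements in which every agent passes the check."""
    return [stmt for stmt in stmts
            if all(agent_ok(gene, muts, mutations, deletions)
                   for gene, muts in stmt)]


mutations = {"BRAF": [("V", "600", "E")]}
deletions = ["CDKN2A"]

stmts = [
    [("BRAF", [("V", "600", "E")]), ("MAP2K1", [])],  # mutation in context
    [("BRAF", [("K", "483", "M")]), ("MAP2K1", [])],  # mutation not in context
    [("CDKN2A", []), ("CDK4", [])],                   # refers to a deleted gene
]

contextualized = filter_by_mutation_status(stmts, mutations, deletions)
```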

indra.tools.assemble_corpus.filter_no_hypothesis(stmts_in, **kwargs)[source]

Filter to statements that are not marked as hypothesis in epistemics.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_no_negated(stmts_in, **kwargs)[source]

Filter to statements that are not marked as negated in epistemics.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_top_level(stmts_in, **kwargs)[source]

Filter to statements that are at the top-level of the hierarchy.

Here top-level statements correspond to most specific ones.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_transcription_factor(stmts_in, **kwargs)[source]

Filter out RegulateAmounts where subject is not a transcription factor.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_uuid_list(stmts_in, uuids, invert=True, **kwargs)[source]

Filter to Statements corresponding to given UUIDs.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.

  • uuids (list[str]) – A list of UUIDs to filter for.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • invert (Optional[bool]) – Invert the filter to remove the Statements corresponding to the given UUIDs.

Returns

stmts_out – A list of filtered statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.load_statements(fname, as_dict=False)[source]

Load statements from a pickle file.

Parameters
  • fname (str) – The name of the pickle file to load statements from.

  • as_dict (Optional[bool]) – If True and the pickle file contains a dictionary of statements, it is returned as a dictionary. If False, the statements are always returned in a list. Default: False

Returns

stmts – A list or dict of statements that were loaded.

Return type

list or dict
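The dump/load round trip, including the as_dict behavior, can be sketched with the standard library alone. The helpers below are hypothetical stand-ins that use plain strings instead of Statements, and they assume that a pickled dictionary maps keys to lists.

```python
import os
import pickle
import tempfile


def dump_objects(objs, fname, protocol=4):
    """Pickle a collection to a file with the given protocol."""
    with open(fname, "wb") as fh:
        pickle.dump(objs, fh, protocol=protocol)


def load_objects(fname, as_dict=False):
    """Unpickle a file; flatten a dict of lists unless as_dict is True."""
    with open(fname, "rb") as fh:
        data = pickle.load(fh)
    if isinstance(data, dict) and not as_dict:
        flat = []
        for values in data.values():
            flat.extend(values)
        return flat
    return data


by_paper = {"paper1": ["stmt A", "stmt B"], "paper2": ["stmt C"]}
with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "stmts.pkl")
    dump_objects(by_paper, fname)
    as_list = load_objects(fname)                # flattened into one list
    as_dict = load_objects(fname, as_dict=True)  # dictionary preserved
```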

indra.tools.assemble_corpus.map_db_refs(stmts_in, db_refs_map=None)[source]

Update entries in db_refs to those provided in db_refs_map.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements to update db_refs in.

  • db_refs_map (Optional[dict]) – A dictionary where each key is a tuple (db_ns, db_id) representing an old db_refs pair that has to be updated and each value is a new db_id to replace the old value with. If not provided, the default db_refs_map will be loaded.
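The shape of db_refs_map is easiest to see applied to a single db_refs dictionary. In this sketch the namespaces, identifiers and the `apply_db_refs_map` helper are all made up for illustration, and every db_refs value is assumed to be a string.

```python
def apply_db_refs_map(db_refs, db_refs_map):
    """Return a copy of db_refs with mapped (namespace, id) pairs updated."""
    return {db_ns: db_refs_map.get((db_ns, db_id), db_id)
            for db_ns, db_id in db_refs.items()}


# Keys are (db_ns, db_id) pairs; values are the replacement db_id.
toy_map = {("NS1", "old_id"): "new_id"}

updated = apply_db_refs_map(
    {"NS1": "old_id", "NS2": "old_id", "TEXT": "some text"}, toy_map)
```

Only the entry whose namespace and identifier both match a key is rewritten; the same identifier under another namespace is left alone.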

indra.tools.assemble_corpus.map_grounding(stmts_in, do_rename=True, grounding_map=None, misgrounding_map=None, agent_map=None, ignores=None, use_adeft=True, gilda_mode=None, grounding_map_policy='replace', **kwargs)[source]

Map grounding using the GroundingMapper.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to map.

  • do_rename (Optional[bool]) – If True, Agents are renamed based on their mapped grounding.

  • grounding_map (Optional[dict]) – A user supplied grounding map which maps a string to a dictionary of database IDs (in the format used by Agents’ db_refs).

  • misgrounding_map (Optional[dict]) – A user supplied misgrounding map which maps a string to a known misgrounding which can be eliminated by the grounding mapper.

  • ignores (Optional[list]) – A user supplied list of ignorable strings which, if present as an Agent text in a Statement, the Statement is filtered out.

  • use_adeft (Optional[bool]) – If True, an attempt is made to use Adeft for acronym disambiguation. Default: True

  • gilda_mode (Optional[str]) – If None, Gilda will not be used for disambiguation. If ‘web’, the address set in the GILDA_URL configuration or environment variable is used as a Gilda web service. If ‘local’, the gilda package is imported and used locally.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • grounding_map_policy (Optional[str]) – If a grounding map is provided, use the policy to extend or replace a default grounding map. Default: ‘replace’.

Returns

stmts_out – A list of mapped statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.map_sequence(stmts_in, do_methionine_offset=True, do_orthology_mapping=True, do_isoform_mapping=True, **kwargs)[source]

Map sequences using the SiteMapper.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to map.

  • do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification at 1 site position greater than the given one; if there exists such a site, creates the mapping. Default is True.

  • do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.

  • do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.

  • use_cache (boolean) – If True, a cache will be created/used from the location specified by SITEMAPPER_CACHE_PATH, defined in your INDRA config or the environment. If False, no cache is used. For more details on the cache, see the SiteMapper class definition.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of mapped statements.

Return type

list[indra.statements.Statement]
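The methionine offset check amounts to looking one position to the right in the reference. A toy sketch of that rule, assuming known sites are given as a set of (residue, position) pairs with integer positions; the site values and the `methionine_offset_site` helper are hypothetical and no sequence database is consulted.

```python
def methionine_offset_site(residue, position, known_sites):
    """Return a valid site for (residue, position), trying position + 1."""
    if (residue, position) in known_sites:
        return (residue, position)      # already valid, no mapping needed
    shifted = (residue, position + 1)
    if shifted in known_sites:
        return shifted                  # off-by-one after methionine cleavage
    return None                         # no mapping found by this rule


known_sites = {("S", 218), ("S", 222)}

mapped = methionine_offset_site("S", 217, known_sites)
unchanged = methionine_offset_site("S", 222, known_sites)
unmapped = methionine_offset_site("T", 100, known_sites)
```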

indra.tools.assemble_corpus.merge_groundings(stmts_in)[source]

Gather and merge original grounding information from evidences.

Each Statement’s evidences are traversed to find original grounding information. These groundings are then merged into an overall consensus grounding dict with as much detail as possible.

The current implementation is only applicable to Statements whose concept/agent roles are fixed. Complexes, Associations and Conversions cannot be handled correctly.

Parameters

stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements whose groundings should be merged. These Statements are meant to have been preassembled and potentially have multiple pieces of evidence.

Returns

stmts_out – The list of Statements now with groundings merged at the Statement level.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.normalize_active_forms(stmts_in)[source]

Run preassembly of ActiveForms only and keep other statements unchanged.

This is specifically useful in the special case of mechanism linking (that is run after preassembly) producing ActiveForm statements that are redundant. Otherwise, general preassembly deduplicates ActiveForms as expected.

Parameters

stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements among which ActiveForms should be normalized.

Returns

A list of INDRA Statements in which ActiveForms are normalized.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.reduce_activities(stmts_in, **kwargs)[source]

Reduce the activity types in a list of statements.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to reduce activity types in.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of reduced activity statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.rename_db_ref(stmts_in, ns_from, ns_to, **kwargs)[source]

Rename an entry in the db_refs of each Agent.

This is particularly useful when old Statements in pickle files need to be updated after a namespace was changed such as ‘BE’ to ‘FPLX’.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements whose Agents’ db_refs need to be changed

  • ns_from (str) – The namespace identifier to replace

  • ns_to (str) – The namespace identifier to replace to

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of Statements with Agents’ db_refs changed.

Return type

list[indra.statements.Statement]
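On a single db_refs dictionary, the renaming amounts to swapping one key while keeping its value. A minimal sketch with a hypothetical `rename_namespace` helper and a toy entry, using the ‘BE’ to ‘FPLX’ change mentioned above:

```python
def rename_namespace(db_refs, ns_from, ns_to):
    """Return a copy of db_refs with the ns_from key renamed to ns_to."""
    return {(ns_to if db_ns == ns_from else db_ns): db_id
            for db_ns, db_id in db_refs.items()}


old_refs = {"BE": "RAF", "TEXT": "Raf"}
new_refs = rename_namespace(old_refs, "BE", "FPLX")
```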

indra.tools.assemble_corpus.run_mechlinker(stmts_in, reduce_activities=False, reduce_modifications=False, replace_activations=False, require_active_forms=False, implicit=False)[source]

Instantiate MechLinker and run its methods in defined order.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements to run mechanism linking on.

  • reduce_activities (Optional[bool]) – If True, agent activities are reduced to their most specific, unambiguous form. Default: False

  • reduce_modifications (Optional[bool]) – If True, agent modifications are reduced to their most specific, unambiguous form. Default: False

  • replace_activations (Optional[bool]) – If True, if there is a compatible pair of Modification(X, Y) and ActiveForm(Y) statements, then any Activation(X,Y) statements are filtered out. Default: False

  • require_active_forms (Optional[bool]) – If True, agents in active positions are rewritten to be in their active forms. Default: False

  • implicit (Optional[bool]) – If True, active forms of an agent are inferred from multiple statement types implicitly, otherwise only explicit ActiveForm statements are taken into account. Default: False

Returns

A list of INDRA Statements that have gone through mechanism linking.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.run_preassembly(stmts_in, return_toplevel=True, poolsize=None, size_cutoff=None, belief_scorer=None, ontology=None, matches_fun=None, refinement_fun=None, flatten_evidence=False, flatten_evidence_collect_from=None, normalize_equivalences=False, normalize_opposites=False, normalize_ns='WM', run_refinement=True, filters=None, **kwargs)[source]

Run preassembly on a list of statements.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements to preassemble.

  • return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True

  • poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.

  • size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.

  • belief_scorer (Optional[indra.belief.BeliefScorer]) – Instance of BeliefScorer class to use in calculating Statement probabilities. If None is provided (default), then the default scorer is used.

  • ontology (Optional[IndraOntology]) – IndraOntology object to use for preassembly

  • matches_fun (Optional[function]) – A function to override the built-in matches_key function of statements.

  • refinement_fun (Optional[function]) – A function to override the built-in refinement_of function of statements.

  • flatten_evidence (Optional[bool]) – If True, evidences are collected and flattened via supports/supported_by links. Default: False

  • flatten_evidence_collect_from (Optional[str]) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’. Only relevant when flatten_evidence is True.

  • normalize_equivalences (Optional[bool]) – If True, equivalent groundings are rewritten to a single standard one. Default: False

  • normalize_opposites (Optional[bool]) – If True, groundings that have opposites in the ontology are rewritten to a single standard one.

  • normalize_ns (Optional[str]) – The name space with respect to which equivalences and opposites are normalized.

  • filters (Optional[list[indra.preassembler.refinement.RefinementFilter]]) – A list of RefinementFilter classes that implement filters on possible statement refinements. For details on how to construct such a filter, see the documentation of indra.preassembler.refinement.RefinementFilter. If no user-supplied filters are provided, the default ontology-based filter is applied. If a list of filters is provided here, the indra.preassembler.refinement.OntologyRefinementFilter isn’t appended by default, and should be added by the user, if necessary. Default: None

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

  • save_unique (Optional[str]) – The name of a pickle file to save the unique statements into.

Returns

stmts_out – A list of preassembled top-level statements.

Return type

list[indra.statements.Statement]
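The role of matches_fun in deduplication can be sketched with plain dictionaries: statements that produce the same key are treated as duplicates and their evidence is pooled. This is a simplified illustration only; `combine_duplicates` and the dictionary stand-ins are hypothetical, and refinement finding and belief scoring are not covered.

```python
def combine_duplicates(stmts, matches_fun):
    """Merge stand-in statements that share a matches key, pooling evidence."""
    unique = {}
    for stmt in stmts:
        key = matches_fun(stmt)
        entry = unique.setdefault(key, {"key": key, "evidence": []})
        entry["evidence"].extend(stmt["evidence"])
    return list(unique.values())


raw = [
    {"type": "Phosphorylation", "agents": ("MAP2K1", "MAPK1"),
     "evidence": ["sentence 1"]},
    {"type": "Phosphorylation", "agents": ("MAP2K1", "MAPK1"),
     "evidence": ["sentence 2"]},
    {"type": "Activation", "agents": ("BRAF", "MAP2K1"),
     "evidence": ["sentence 3"]},
]

# A custom key function decides what counts as "the same" statement.
unique = combine_duplicates(raw, lambda stmt: (stmt["type"], stmt["agents"]))
```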

indra.tools.assemble_corpus.run_preassembly_duplicate(preassembler, beliefengine, **kwargs)[source]

Run deduplication stage of preassembly on a list of statements.

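Parameters
  • preassembler (indra.preassembler.Preassembler) – A Preassembler instance.

  • beliefengine (indra.belief.BeliefEngine) – A BeliefEngine instance.

Returns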

stmts_out – A list of unique statements.

Return type

list[indra.statements.Statement]

Run related stage of preassembly on a list of statements.

Parameters
  • preassembler (indra.preassembler.Preassembler) – A Preassembler instance which already has a set of unique statements internally.

  • beliefengine (indra.belief.BeliefEngine) – A BeliefEngine instance.

  • return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True

  • size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.

  • flatten_evidence (Optional[bool]) – If True, evidences are collected and flattened via supports/supported_by links. Default: False

  • flatten_evidence_collect_from (Optional[str]) – String indicating whether to collect and flatten evidence from the supports attribute of each statement or the supported_by attribute. If not set, defaults to ‘supported_by’. Only relevant when flatten_evidence is True.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of preassembled top-level statements.

Return type

list[indra.statements.Statement]
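
As a rough illustration of the size_cutoff rule described above, the sketch below partitions groups into those compared in the parent process and those sent to workers. The function name split_by_size and the plain lists standing in for groups of statements are hypothetical; this is not the Preassembler's actual code, only the documented threshold rule.

```python
def split_by_size(groups, size_cutoff=100):
    """Partition groups by the size_cutoff rule (illustrative only)."""
    in_parent, to_workers = [], []
    for group in groups:
        # "size_cutoff or more" means the comparison is inclusive.
        if len(group) >= size_cutoff:
            to_workers.append(group)
        else:
            in_parent.append(group)
    return in_parent, to_workers


# Placeholder groups of integers standing in for groups of statements.
groups = [list(range(3)), list(range(100)), list(range(250))]
in_parent, to_workers = split_by_size(groups)
print([len(g) for g in in_parent], [len(g) for g in to_workers])
```

A group with exactly size_cutoff members falls on the worker side, which is the boundary case most easily misread.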

indra.tools.assemble_corpus.standardize_names_groundings(stmts)[source]

Standardize the names of Concepts with respect to an ontology.

NOTE: this function is currently optimized for Influence Statements obtained from Eidos, Hume, Sofia and CWMS. It will possibly yield unexpected results for biology-specific Statements.

Parameters

stmts (list[indra.statements.Statement]) – A list of statements whose Concept names should be standardized.

indra.tools.assemble_corpus.strip_agent_context(stmts_in, **kwargs)[source]

Strip any context on agents within each statement.

Parameters
  • stmts_in (list[indra.statements.Statement]) – A list of statements whose agent context should be stripped.

  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.

Returns

stmts_out – A list of stripped statements.

Return type

list[indra.statements.Statement]

indra.tools.assemble_corpus.strip_supports(stmts)[source]

Remove supports and supported_by links from statements.

Annotate websites with INDRA through hypothes.is (indra.tools.hypothesis_annotator)

This module exposes functions that annotate websites (including PubMed and PubMedCentral pages, or any other text-based website) with INDRA Statements through hypothes.is. Two modes are supported: reading the content of the website ‘de-novo’ to generate new INDRA Statements for annotation, and fetching existing Statements for a paper from the INDRA DB and using those for annotation.

indra.tools.hypothesis_annotator.annotate_paper_from_db(text_refs, assembly_pipeline=None)[source]

Upload INDRA Statements as annotations for a given paper based on content for that paper in the INDRA DB.

Parameters
  • text_refs (dict) – A dict of text references, following the same format as the INDRA Evidence text_refs attribute.

  • assembly_pipeline (Optional[json]) – A list of pipeline steps (typically filters) that are applied before uploading statements to hypothes.is as annotations.

indra.tools.hypothesis_annotator.read_and_annotate(text_refs, text_extractor=None, text_reader=None, assembly_pipeline=None)[source]

Read a paper/website and upload annotations derived from it to hypothes.is.

Parameters
  • text_refs (dict) – A dict of text references, following the same format as the INDRA Evidence text_refs attribute.

  • text_extractor (Optional[function]) – A function which takes the raw content of a website (e.g., HTML) and extracts clean text from it to prepare for machine reading. This is only used if the text_refs is a URL (e.g., a Wikipedia page); it is not used for PMID or PMCID text_refs, where content can be pre-processed and machine read directly. Default: None. Example: html2text.HTML2Text().handle

  • text_reader (Optional[function]) – A function which takes a single text string argument (the text extracted from a given resource), runs reading on it, and returns a list of INDRA Statement objects. Due to complications with the PMC NXML format, this option only supports URL or PMID resources as input in text_refs. Default: None. In the default case, the INDRA REST API is called with an appropriate endpoint that runs Reach and processes its output into INDRA Statements.

  • assembly_pipeline (Optional[json]) – A list of assembly pipeline steps that are applied before uploading statements to hypothes.is as annotations. Example: [{‘function’: ‘map_grounding’}]
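
The sketch below shows what the arguments described above might look like in practice. It is a minimal example under stated assumptions: simple_text_extractor is a hypothetical stdlib-only stand-in for html2text.HTML2Text().handle, the 'URL' key in text_refs and the example address are assumed for illustration, and the call to read_and_annotate is left commented out so that nothing is uploaded.

```python
import json
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def simple_text_extractor(html):
    """Hypothetical text_extractor: raw HTML in, plain text out."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks)


# Assumed shape of the inputs; the URL is a placeholder.
text_refs = {'URL': 'https://example.org/some-page'}
assembly_pipeline = [{'function': 'map_grounding'}]

# The pipeline is a plain list of dicts, so it serializes to JSON.
print(json.dumps(assembly_pipeline))
print(simple_text_extractor('<p>MEK phosphorylates ERK.</p>'))

# Not executed here:
# read_and_annotate(text_refs, text_extractor=simple_text_extractor,
#                   assembly_pipeline=assembly_pipeline)
```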

Build a network from a gene list (indra.tools.gene_network)

class indra.tools.gene_network.GeneNetwork(gene_list, basename=None)[source]

Build a set of INDRA statements for a given gene list from databases.

Parameters
  • gene_list (list[str]) – List of gene names.

  • basename (str or None (default)) – Filename prefix to be used for caching of intermediates (Biopax OWL file, pickled statement lists, etc.). If None, no results are cached and no cached files are used.

gene_list

List of gene names

Type

list[str]

basename

Filename prefix for cached intermediates, or None if no cache is used.

Type

str or None

results

List of preassembled statements.

Type

list[indra.statements.Statement]

get_bel_stmts(filter=False)[source]

Get relevant statements from the BEL large corpus.

Performs a series of neighborhood queries and then takes the union of all the statements. Because the query process can take a long time for large gene lists, the resulting list of statements is cached in a pickle file with the filename <basename>_bel_stmts.pkl. If the pickle file is present, it is used by default; if not present, the queries are performed and the results are cached.

Parameters

filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False. Note that the full (unfiltered) set of statements is cached.

Returns

List of INDRA statements extracted from the BEL large corpus.

Return type

list of indra.statements.Statement
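
The caching behavior described above follows a common load-or-compute pattern, which can be sketched as follows. The helper load_or_compute, the slow_query stand-in, and the placeholder strings are all hypothetical; only the <basename>_bel_stmts.pkl naming and the "no basename, no caching" rule come from the documentation.

```python
import os
import pickle
import tempfile


def load_or_compute(basename, suffix, compute):
    """Return cached results if present, else compute and cache them."""
    if basename is None:
        # Mirrors the documented behavior: no basename, no caching.
        return compute()
    fname = f'{basename}_{suffix}.pkl'
    if os.path.exists(fname):
        with open(fname, 'rb') as fh:
            return pickle.load(fh)
    results = compute()
    with open(fname, 'wb') as fh:
        pickle.dump(results, fh)
    return results


calls = []


def slow_query():
    """Stand-in for the expensive neighborhood queries."""
    calls.append(1)
    return ['stmt A', 'stmt B']  # placeholder strings, not Statements


with tempfile.TemporaryDirectory() as tmpdir:
    basename = os.path.join(tmpdir, 'my_genes')
    first = load_or_compute(basename, 'bel_stmts', slow_query)
    second = load_or_compute(basename, 'bel_stmts', slow_query)
    cache_exists = os.path.exists(basename + '_bel_stmts.pkl')

print(first == second, len(calls), cache_exists)
```

The second call reads the pickle instead of repeating the query, which is why a stale cache file has to be deleted by hand to force a fresh query.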

get_biopax_stmts(filter=False, query='pathsbetween', database_filter=None)[source]

Get relevant statements from Pathway Commons.

Performs a “paths between” query for the genes in gene_list and uses the results to build statements. This function caches two files: the list of statements built from the query, which is cached in <basename>_biopax_stmts.pkl, and the OWL file returned by the Pathway Commons Web API, which is cached in <basename>_pc_pathsbetween.owl. If these cached files are found, then the results are returned based on the cached file and Pathway Commons is not queried again.

Parameters
  • filter (Optional[bool]) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False.

  • query (Optional[str]) – Defines what type of query is executed. The two options are ‘pathsbetween’, which finds paths between the given list of genes and only works if more than 1 gene is given, and ‘neighborhood’, which searches the immediate neighborhood of each given gene. Note that for pathsbetween queries with more than 60 genes, the query will be executed in multiple blocks for scalability.

  • database_filter (Optional[list[str]]) – A list of PathwayCommons databases to include in the query.

Returns

List of INDRA statements extracted from Pathway Commons.

Return type

list of indra.statements.Statement
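
How a large gene list is divided into blocks is not specified above. As one possible reading, the sketch below splits a list into consecutive slices of at most 60 genes; the helper gene_blocks, the block size used as a slice length, and the placeholder gene names are assumptions made for illustration.

```python
def gene_blocks(gene_list, block_size=60):
    """Split gene_list into consecutive blocks of at most block_size."""
    return [gene_list[i:i + block_size]
            for i in range(0, len(gene_list), block_size)]


genes = [f'GENE{i}' for i in range(130)]  # placeholder names
blocks = gene_blocks(genes)
print([len(b) for b in blocks])
```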

get_statements(filter=False)[source]

Return the combined list of statements from BEL and Pathway Commons.

Internally calls get_biopax_stmts() and get_bel_stmts().

Parameters

filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False.

Returns

List of INDRA statements extracted from the BEL large corpus and Pathway Commons.

Return type

list of indra.statements.Statement
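
The effect of filter=True can be pictured with a small sketch. Here each statement is reduced to a tuple of agent names, which is a simplification made for this example; the helper mentions_only is hypothetical and does not reflect how GeneNetwork inspects real Statement objects.

```python
def mentions_only(agent_names, gene_list):
    """True if every named agent is one of the genes in gene_list."""
    genes = set(gene_list)
    return all(name in genes for name in agent_names)


gene_list = ['BRAF', 'MAP2K1', 'MAPK1']
# Placeholder statements, each given as a tuple of agent names.
stmts = [('BRAF', 'MAP2K1'), ('MAP2K1', 'MAPK1'), ('MAPK1', 'ELK1')]

filtered = [s for s in stmts if mentions_only(s, gene_list)]
print(filtered)
```

The last tuple is dropped because ELK1 is not in gene_list: a statement that mentions even one outside gene does not "exclusively" mention genes from the list.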

run_preassembly(stmts, print_summary=True)[source]

Run complete preassembly procedure on the given statements.

Results are returned as a dict and stored in the attribute results. They are also saved in the pickle file <basename>_results.pkl.

Parameters
  • stmts (list of indra.statements.Statement) – Statements to preassemble.

  • print_summary (bool) – If True (default), prints a summary of the preassembly process to the console.

Returns

A dict containing the following entries:

  • raw: the starting set of statements before preassembly.

  • duplicates1: statements after initial de-duplication.

  • valid: statements found to have valid modification sites.

  • mapped: mapped statements (list of indra.preassembler.sitemapper.MappedStatement).

  • mapped_stmts: combined list of valid statements and statements after mapping.

  • duplicates2: statements resulting from de-duplication of the statements in mapped_stmts.

  • related2: top-level statements after combining the statements in duplicates2.

Return type

dict
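
A returned dict of this shape lends itself to a per-stage count, similar in spirit to a printed summary. The sketch below uses only the key names listed above; the helper stage_counts and the placeholder contents are hypothetical, and the real values would be lists of Statement or MappedStatement objects.

```python
STAGES = ['raw', 'duplicates1', 'valid', 'mapped',
          'mapped_stmts', 'duplicates2', 'related2']


def stage_counts(results):
    """Return (stage, number of entries) pairs in pipeline order."""
    return [(stage, len(results[stage]))
            for stage in STAGES if stage in results]


# Placeholder results with strings standing in for statements.
results = {
    'raw': ['s1', 's1', 's2', 's3', 's4'],
    'duplicates1': ['s1', 's2', 's3', 's4'],
    'valid': ['s1', 's2', 's3'],
    'mapped': ['s4-mapped'],
    'mapped_stmts': ['s1', 's2', 's3', 's4-mapped'],
    'duplicates2': ['s1', 's2', 's3', 's4-mapped'],
    'related2': ['s1', 's2', 's4-mapped'],
}

for stage, count in stage_counts(results):
    print(f'{stage}: {count}')
```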

Build an executable model from a fragment of a large network (indra.tools.executable_subnetwork)

indra.tools.executable_subnetwork.get_subnetwork(statements, nodes)[source]

Return a PySB model based on a subset of given INDRA Statements.

Statements are first filtered for nodes in the given list and other nodes are optionally added based on relevance in a given network. The filtered statements are then assembled into an executable model using INDRA’s PySB Assembler.

Parameters
  • statements (list[indra.statements.Statement]) – A list of INDRA Statements to extract a subnetwork from.

  • nodes (list[str]) – The names of the nodes to extract the subnetwork for.

Returns

model – A PySB model object assembled using INDRA’s PySB Assembler from the INDRA Statements corresponding to the subnetwork.

Return type

pysb.Model

Build a model incrementally over time (indra.tools.incremental_model)

class indra.tools.incremental_model.IncrementalModel(model_fname=None)[source]

Assemble a model incrementally by iteratively adding new Statements.

Parameters

model_fname (Optional[str]) – The name of the pickle file in which a set of INDRA Statements are stored in a dict keyed by PubMed IDs. This is the state of an IncrementalModel that is loaded upon instantiation.

stmts

A dictionary of INDRA Statements keyed by PMIDs that stores the current state of the IncrementalModel.

Type

dict[str, list[indra.statements.Statement]]

assembled_stmts

A list of INDRA Statements after assembly.

Type

list[indra.statements.Statement]

add_statements(pmid, stmts)[source]

Add INDRA Statements to the incremental model indexed by PMID.

Parameters
  • pmid (str) – The PMID of the paper from which statements were extracted.

  • stmts (list[indra.statements.Statement]) – A list of INDRA Statements to be added to the model.

get_model_agents()[source]

Return a list of all Agents from all Statements.

Returns

agents – A list of Agents that are in the model.

Return type

list[indra.statements.Agent]

get_statements()[source]

Return a list of all Statements in a single list.

Returns

stmts – A list of all the INDRA Statements in the model.

Return type

list[indra.statements.Statement]

get_statements_noprior()[source]

Return a list of all non-prior Statements in a single list.

Returns

stmts – A list of all the INDRA Statements in the model (excluding the prior).

Return type

list[indra.statements.Statement]

get_statements_prior()[source]

Return a list of all prior Statements in a single list.

Returns

stmts – A list of all the INDRA Statements in the prior.

Return type

list[indra.statements.Statement]

load_prior(prior_fname)[source]

Load a set of prior statements from a pickle file.

The prior statements have a special key in the stmts dictionary called “prior”.

Parameters

prior_fname (str) – The name of the pickle file containing the prior Statements.
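
The relationship between the stmts dictionary, the special “prior” key, and the three get_statements variants can be sketched with a plain dict. This is an illustration under assumptions, not the class's code: the helper names, the PMIDs, and the placeholder strings are all hypothetical.

```python
# Placeholder state: strings stand in for INDRA Statements.
stmts = {
    'prior': ['prior stmt 1', 'prior stmt 2'],
    '12345': ['stmt from PMID 12345'],
    '67890': ['stmt A from PMID 67890', 'stmt B from PMID 67890'],
}


def all_statements(stmts):
    """Flatten every list in the dict, prior included."""
    return [s for lst in stmts.values() for s in lst]


def prior_statements(stmts):
    """Only the statements stored under the special 'prior' key."""
    return list(stmts.get('prior', []))


def noprior_statements(stmts):
    """Everything keyed by a PMID, i.e. all keys except 'prior'."""
    return [s for key, lst in stmts.items() if key != 'prior' for s in lst]


print(len(all_statements(stmts)),
      len(prior_statements(stmts)),
      len(noprior_statements(stmts)))
```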

preassemble(filters=None, grounding_map=None)[source]

Preassemble the Statements collected in the model.

Use INDRA’s GroundingMapper, Preassembler and BeliefEngine on the IncrementalModel and save the unique statements and the top level statements in class attributes.

Currently the following filter options are implemented:

  • grounding: require that all Agents in statements are grounded

  • human_only: require that all proteins are human proteins

  • prior_one: require that at least one Agent is in the prior model

  • prior_all: require that all Agents are in the prior model

Parameters
  • filters (Optional[list[str]]) – A list of filter options to apply when choosing the statements. See description above for more details. Default: None

  • grounding_map (Optional[dict]) – A user supplied grounding map which maps a string to a dictionary of database IDs (in the format used by Agents’ db_refs).
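
The difference between the prior_one and prior_all filter options can be shown with a short sketch. Statements are reduced to tuples of agent names and the prior to a set of names, both simplifications for this example; the helper passes_prior_filters is hypothetical. The grounding and human_only options are left out because they depend on Agent db_refs, which this sketch does not model.

```python
def passes_prior_filters(agent_names, prior_agents, filters):
    """Apply the prior_one / prior_all rules to one statement."""
    in_prior = [name in prior_agents for name in agent_names]
    if 'prior_one' in filters and not any(in_prior):
        return False
    if 'prior_all' in filters and not all(in_prior):
        return False
    return True


prior_agents = {'BRAF', 'MAP2K1'}
# Placeholder statements, each given as a tuple of agent names.
stmts = [('BRAF', 'MAP2K1'), ('BRAF', 'ELK1'), ('ELK1', 'FOS')]

kept_one = [s for s in stmts
            if passes_prior_filters(s, prior_agents, ['prior_one'])]
kept_all = [s for s in stmts
            if passes_prior_filters(s, prior_agents, ['prior_all'])]
print(kept_one)
print(kept_all)
```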

save(model_fname='model.pkl')[source]

Save the state of the IncrementalModel in a pickle file.

Parameters

model_fname (Optional[str]) – The name of the pickle file to save the state of the IncrementalModel in. Default: model.pkl

The RAS Machine (indra.tools.machine)

Prerequisites

First, install the machine-specific dependencies:

pip install indra[machine]

Starting a New Model

To start a new model, run

python -m indra.tools.machine make model_name

Alternatively, the command line interface can be invoked with

indra-machine make model_name

where model_name corresponds to the name of the model to initialize.

This script generates the following folders and files

  • model_name

  • model_name/log.txt

  • model_name/config.yaml

  • model_name/jsons/

You should then edit model_name/config.yaml to set up the search terms and optionally the credentials to use Twitter, Gmail or NDEx bindings.

Setting Up Search Terms

The config.yaml file is a standard YAML configuration file. A template is available in model_name/config.yaml once the machine has been created.

Two important fields in config.yaml are search_terms and search_genes, both of which are YAML lists. The entries of search_terms are used _directly_ as queries in PubMed search (for more information on PubMed search strings, read https://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.Searching_PubMed).

Example:

search_terms:
- breast cancer
- proteasome
- apoptosis

search_genes is a special list in which _only_ standard HGNC gene symbols are allowed. Entries in this list are used to search PubMed and also serve as a list of prior genes that are known to be relevant for the model.

Entries in this list can be used to search PubMed specifically for articles that are tagged with the gene’s unique identifier rather than its string name. This mode of searching for articles on specific genes is much more reliable than searching for them using string names.

Example:

search_genes:
- AKT1
- MAPK3
- EGFR

Extending a Model

To extend a model, run

python -m indra.tools.machine run_with_search model_name

Alternatively, the command line interface can be invoked with

indra-machine run_with_search model_name

Extending a model involves extracting PMIDs from emails (if Gmail credentials are given), and searching using INDRA’s PubMed client with each entry of search_terms in config.yaml as a search term. INDRA’s literature client is then used to find the full text corresponding to each PMID or its abstract when the full text is not available. The REACH parser is then used to read each new paper. INDRA uses the REACH output to construct Statements corresponding to mechanisms. It then adds them to an incremental model through a process of assembly involving duplication and overlap resolution and the application of filters.

indra.tools.machine.copy_default_config(destination)[source]

Copy the default configuration to the given destination.

Parameters

destination (str) – The location where the default RAS Machine config file is placed.