Tools (indra.tools)

Run assembly components in a pipeline (indra.tools.assemble_corpus)

indra.tools.assemble_corpus.dump_statements(stmts, fname)[source]

Dump a list of statements into a pickle file.

Parameters:
  • stmts (list[indra.statements.Statement]) – A list of statements to dump.
  • fname (str) – The name of the pickle file to dump statements into.
indra.tools.assemble_corpus.dump_stmt_strings(stmts, fname)[source]

Save printed statements in a file.

Parameters:
  • stmts (list[indra.statements.Statement]) – A list of statements to save in a text file.
  • fname (Optional[str]) – The name of a text file to save the printed statements into.
indra.tools.assemble_corpus.expand_families(stmts_in, **kwargs)[source]

Expand Bioentities Agents to individual genes.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to expand.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of expanded statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_belief(stmts_in, belief_cutoff, **kwargs)[source]

Filter to statements with belief above a given cutoff.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • belief_cutoff (float) – Only statements with belief above the belief_cutoff will be returned. Here 0 < belief_cutoff < 1.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_by_type(stmts_in, stmt_type, **kwargs)[source]

Filter to a given statement type.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • stmt_type (indra.statements.Statement) – The class of the statement type to filter for. Example: indra.statements.Modification
  • invert (Optional[bool]) – If True, the statements that are not of the given type are returned. Default: False
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]
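
The effect of invert can be pictured with a small stand-alone sketch. The classes and the function filter_by_type_sketch below are hypothetical stand-ins, not INDRA code, and the use of isinstance (so that subclasses of the given type also match) is an assumption that the description above does not confirm.

```python
class Statement:
    """Hypothetical stand-in for indra.statements.Statement."""


class Modification(Statement):
    pass


class Phosphorylation(Modification):
    pass


class Complex(Statement):
    pass


def filter_by_type_sketch(stmts_in, stmt_type, invert=False):
    # Assumption: matching uses isinstance, so subclasses also match.
    if invert:
        return [s for s in stmts_in if not isinstance(s, stmt_type)]
    return [s for s in stmts_in if isinstance(s, stmt_type)]


stmts = [Phosphorylation(), Complex(), Modification()]
kept = filter_by_type_sketch(stmts, Modification)
dropped = filter_by_type_sketch(stmts, Modification, invert=True)
```

Under these assumptions, kept holds the Phosphorylation and Modification instances, and dropped holds only the Complex instance.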

indra.tools.assemble_corpus.filter_direct(stmts_in, **kwargs)[source]

Filter to statements that are direct interactions

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_enzyme_kinase(stmts_in, **kwargs)[source]

Filter Phosphorylations to ones where the enzyme is a known kinase.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_evidence_source(stmts_in, source_apis, policy='one', **kwargs)[source]

Filter to statements that have evidence from a given set of sources.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • source_apis (list[str]) – A list of sources to filter for. Examples: biopax, bel, reach
  • policy (Optional[str]) – If ‘one’, a statement that has evidence from any of the sources is kept. If ‘all’, only those statements are kept which have evidence from all the input sources specified in source_apis. If ‘none’, only those statements are kept that don’t have evidence from any of the sources specified in source_apis.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]
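
The three policy values can be read as set operations on the sources of a statement's evidence. The following is a minimal sketch of that reading, not the library's implementation; keep_by_source is a hypothetical helper that takes the evidence sources of one statement as plain strings.

```python
def keep_by_source(evidence_sources, source_apis, policy='one'):
    """Return True if a statement with these evidence sources is kept.

    Hypothetical helper illustrating the documented policies only.
    """
    sources = set(evidence_sources)
    wanted = set(source_apis)
    if policy == 'one':
        # Evidence from at least one of the requested sources.
        return bool(sources & wanted)
    if policy == 'all':
        # Evidence from every requested source.
        return wanted <= sources
    if policy == 'none':
        # No evidence from any of the requested sources.
        return not (sources & wanted)
    raise ValueError('Unknown policy: %s' % policy)


stmt_sources = ['reach', 'bel']
one = keep_by_source(stmt_sources, ['bel', 'biopax'], policy='one')
every = keep_by_source(stmt_sources, ['bel', 'biopax'], policy='all')
none = keep_by_source(stmt_sources, ['bel', 'biopax'], policy='none')
```

Here the statement has evidence from bel but not from biopax, so it is kept under ‘one’ and dropped under both ‘all’ and ‘none’.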

indra.tools.assemble_corpus.filter_gene_list(stmts_in, gene_list, policy, allow_families=False, **kwargs)[source]

Return statements that contain genes given in a list.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • gene_list (list[str]) – A list of gene symbols to filter for.
  • policy (str) – The policy to apply when filtering for the list of genes. “one”: keep statements that contain at least one of the genes in the list, possibly along with others not in the list. “all”: keep statements that contain only genes given in the list.
  • allow_families (Optional[bool]) – Will include statements involving Bioentities families containing one of the genes in the gene list. Default: False
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]
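
The difference between the two policies can be sketched on plain gene names. The helper keep_by_genes below is hypothetical and represents a statement only by the names of its agents; it does not model allow_families.

```python
def keep_by_genes(agent_names, gene_list, policy):
    """Return True if a statement with these agent names is kept.

    Hypothetical helper; family handling (allow_families) is not modelled.
    """
    genes = set(gene_list)
    if policy == 'one':
        # At least one agent is in the gene list; others may be outside it.
        return any(name in genes for name in agent_names)
    if policy == 'all':
        # Every agent must be in the gene list.
        return all(name in genes for name in agent_names)
    raise ValueError('Unknown policy: %s' % policy)


gene_list = ['BRAF', 'MAP2K1']
mixed_one = keep_by_genes(['BRAF', 'TP53'], gene_list, 'one')
mixed_all = keep_by_genes(['BRAF', 'TP53'], gene_list, 'all')
inside_all = keep_by_genes(['BRAF', 'MAP2K1'], gene_list, 'all')
```

A statement involving BRAF and TP53 passes under “one” but not under “all”, while a statement involving only BRAF and MAP2K1 passes under both.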

indra.tools.assemble_corpus.filter_genes_only(stmts_in, **kwargs)[source]

Filter to statements containing genes only.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • specific_only (Optional[bool]) – If True, only elementary genes/proteins will be kept and families will be filtered out. If False, families are also included in the output. Default: False
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_grounded_only(stmts_in, **kwargs)[source]

Filter to statements that have grounded agents.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_human_only(stmts_in, **kwargs)[source]

Filter out statements that are not grounded to human genes.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_inconsequential_acts(stmts_in, whitelist=None, **kwargs)[source]

Filter out Activations that modify inconsequential activities

Inconsequential here means that the activity is not mentioned / tested in any other statement. In some cases specific activity types should be preserved, for instance, to be used as readouts in a model. In this case, the given activities can be passed in a whitelist.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • whitelist (Optional[dict]) – A whitelist containing agent activity types which should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of activity types. Example: whitelist = {‘MAP2K1’: [‘kinase’]}
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_inconsequential_mods(stmts_in, whitelist=None, **kwargs)[source]

Filter out Modifications that modify inconsequential sites

Inconsequential here means that the site is not mentioned / tested in any other statement. In some cases specific sites should be preserved, for instance, to be used as readouts in a model. In this case, the given sites can be passed in a whitelist.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • whitelist (Optional[dict]) – A whitelist containing agent modification sites whose modifications should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of tuples of (modification_type, residue, position). Example: whitelist = {‘MAP2K1’: [(‘phosphorylation’, ‘S’, ‘222’)]}
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]
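
To make the whitelist structure concrete, the sketch below builds the dictionary from the example above and checks a site against it. The helper is_whitelisted is hypothetical and only illustrates the lookup. Note that the position in the example is a string, so an integer position would not match in this sketch.

```python
# Gene name -> list of (modification_type, residue, position) tuples.
whitelist = {'MAP2K1': [('phosphorylation', 'S', '222')]}


def is_whitelisted(whitelist, gene, mod_type, residue, position):
    """Hypothetical lookup of one modification site in the whitelist."""
    return (mod_type, residue, position) in whitelist.get(gene, [])


listed = is_whitelisted(whitelist, 'MAP2K1', 'phosphorylation', 'S', '222')
other_site = is_whitelisted(whitelist, 'MAP2K1', 'phosphorylation', 'S', '218')
int_position = is_whitelisted(whitelist, 'MAP2K1', 'phosphorylation', 'S', 222)
```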

indra.tools.assemble_corpus.filter_mod_nokinase(stmts_in, **kwargs)[source]

Filter non-phospho Modifications to ones with a non-kinase enzyme.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_mutation_status(stmts_in, mutations, deletions, **kwargs)[source]

Filter statements based on existing mutations/deletions

This filter helps to contextualize a set of statements to a given cell type. Given a list of deleted genes, it removes statements that refer to these genes. It also takes a list of mutations and removes statements that refer to mutations not relevant for the given context.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • mutations (dict) – A dictionary whose keys are gene names, and the values are lists of tuples of the form (residue_from, position, residue_to). Example: mutations = {‘BRAF’: [(‘V’, ‘600’, ‘E’)]}
  • deletions (list) – A list of gene names that are deleted.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]
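
One possible reading of this filter, applied to a single agent, is sketched below. The helper agent_is_relevant is hypothetical, and it assumes that an agent is represented by a gene name plus a list of (residue_from, position, residue_to) tuples; the actual handling inside INDRA may differ.

```python
def agent_is_relevant(gene, agent_mutations, mutations, deletions):
    """Hypothetical check of one agent against the cell-type context."""
    # Agents that refer to a deleted gene are not relevant.
    if gene in deletions:
        return False
    # Every mutation on the agent must be listed for this gene in the context.
    context_mutations = mutations.get(gene, [])
    return all(mut in context_mutations for mut in agent_mutations)


mutations = {'BRAF': [('V', '600', 'E')]}
deletions = ['PTEN']  # hypothetical deleted gene, for illustration only

matching = agent_is_relevant('BRAF', [('V', '600', 'E')], mutations, deletions)
other_mut = agent_is_relevant('BRAF', [('V', '600', 'K')], mutations, deletions)
deleted = agent_is_relevant('PTEN', [], mutations, deletions)
wild_type = agent_is_relevant('KRAS', [], mutations, deletions)
```

In this sketch a statement would be kept only if all of its agents are relevant: an unmutated agent and a BRAF V600E agent pass, while a BRAF V600K agent and any PTEN agent do not.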

indra.tools.assemble_corpus.filter_no_hypothesis(stmts_in, **kwargs)[source]

Filter to statements that are not marked as hypothesis in epistemics.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_top_level(stmts_in, **kwargs)[source]

Filter to statements that are at the top-level of the hierarchy.

Here top-level statements correspond to most specific ones.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_transcription_factor(stmts_in, **kwargs)[source]

Filter out RegulateAmounts where subject is not a transcription factor.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.filter_uuid_list(stmts_in, uuids, **kwargs)[source]

Filter to Statements corresponding to given UUIDs

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
  • uuids (list[str]) – A list of UUIDs to filter for.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of filtered statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.load_statements(fname, as_dict=False)[source]

Load statements from a pickle file.

Parameters:
  • fname (str) – The name of the pickle file to load statements from.
  • as_dict (Optional[bool]) – If True and the pickle file contains a dictionary of statements, it is returned as a dictionary. If False, the statements are always returned in a list. Default: False
Returns:

stmts – A list or dict of statements that were loaded.

Return type:

list or dict
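
The as_dict behaviour can be sketched with the standard pickle module. The function load_statements_sketch is a hypothetical stand-in, not the INDRA function, and plain strings are used in place of Statement objects; how INDRA flattens a dictionary internally is an assumption here.

```python
import os
import pickle
import tempfile


def load_statements_sketch(fname, as_dict=False):
    """Hypothetical stand-in illustrating the documented as_dict flag."""
    with open(fname, 'rb') as fh:
        stmts = pickle.load(fh)
    if isinstance(stmts, dict) and not as_dict:
        # Assumption: a dict of lists is flattened into a single list.
        flat = []
        for stmt_list in stmts.values():
            flat.extend(stmt_list)
        return flat
    return stmts


# Round trip through a temporary pickle file.
data = {'12345': ['stmt_a', 'stmt_b'], '67890': ['stmt_c']}
path = os.path.join(tempfile.mkdtemp(), 'stmts.pkl')
with open(path, 'wb') as fh:
    pickle.dump(data, fh)

as_list = load_statements_sketch(path)
as_mapping = load_statements_sketch(path, as_dict=True)
```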

indra.tools.assemble_corpus.map_grounding(stmts_in, **kwargs)[source]

Map grounding using the GroundingMapper.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to map.
  • do_rename (Optional[bool]) – If True, Agents are renamed based on their mapped grounding.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of mapped statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.map_sequence(stmts_in, **kwargs)[source]

Map sequences using the SiteMapper.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to map.
  • do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification at the position one greater than the given one; if such a site exists, creates the mapping. Default is True.
  • do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
  • do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of mapped statements.

Return type:

list[indra.statements.Statement]
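
The methionine offset check described above amounts to looking one position further along the reference sequence. The sketch below illustrates only that idea; methionine_offset_site is a hypothetical helper, and known_sites is a toy set standing in for the known modification sites of one protein.

```python
def methionine_offset_site(known_sites, residue, position):
    """Return the site one position later if it is a known site, else None.

    Hypothetical helper; positions are strings, as in the examples above.
    """
    candidate = (residue, str(int(position) + 1))
    if candidate in known_sites:
        return candidate
    return None


# Toy set of known (residue, position) sites, for illustration only.
known_sites = {('S', '218'), ('S', '222')}

mapped = methionine_offset_site(known_sites, 'S', '221')
unmapped = methionine_offset_site(known_sites, 'S', '100')
```

In this toy case the reported site S221 has a known site at S222, so a mapping to ('S', '222') is proposed, while S100 has no known neighbour and is left unmapped.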

indra.tools.assemble_corpus.reduce_activities(stmts_in, **kwargs)[source]

Reduce the activity types in a list of statements

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to reduce activity types in.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of reduced activity statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.run_preassembly(stmts_in, **kwargs)[source]

Run preassembly on a list of statements.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements to preassemble.
  • return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True
  • poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
  • size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
  • save_unique (Optional[str]) – The name of a pickle file to save the unique statements into.
Returns:

stmts_out – A list of preassembled top-level statements.

Return type:

list[indra.statements.Statement]
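
The role of size_cutoff can be sketched as a partition of statement groups by size. The helper split_by_size is hypothetical and only mirrors the rule stated above (groups with size_cutoff or more statements go to worker processes); it performs no comparisons and starts no processes.

```python
def split_by_size(groups, size_cutoff=100):
    """Hypothetical partition of groups into worker and parent workloads."""
    # Groups at or above the cutoff would be sent to worker processes.
    to_workers = [g for g in groups if len(g) >= size_cutoff]
    # Smaller groups would be compared in the parent process.
    in_parent = [g for g in groups if len(g) < size_cutoff]
    return to_workers, in_parent


groups = [list(range(150)), list(range(100)), list(range(5))]
to_workers, in_parent = split_by_size(groups, size_cutoff=100)
```

With the default cutoff of 100, the groups of 150 and 100 items land in to_workers and the group of 5 items stays in in_parent.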

indra.tools.assemble_corpus.run_preassembly_duplicate(preassembler, beliefengine, **kwargs)[source]

Run deduplication stage of preassembly on a list of statements.

Parameters:
Returns:

stmts_out – A list of unique statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.run_preassembly_related(preassembler, beliefengine, **kwargs)[source]

Run related stage of preassembly on a list of statements.

Parameters:
  • preassembler (indra.preassembler.Preassembler) – A Preassembler instance which already has a set of unique statements internally.
  • beliefengine (indra.belief.BeliefEngine) – A BeliefEngine instance
  • return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True
  • poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
  • size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of preassembled top-level statements.

Return type:

list[indra.statements.Statement]

indra.tools.assemble_corpus.strip_agent_context(stmts_in, **kwargs)[source]

Strip any context on agents within each statement.

Parameters:
  • stmts_in (list[indra.statements.Statement]) – A list of statements whose agent context should be stripped.
  • save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns:

stmts_out – A list of stripped statements.

Return type:

list[indra.statements.Statement]

Build a network from a gene list (indra.tools.gene_network)

class indra.tools.gene_network.GeneNetwork(gene_list, basename=None)[source]

Build a set of INDRA statements for a given gene list from databases.

Parameters:
  • gene_list (list[str]) – List of gene names.
  • basename (string or None (default)) – Filename prefix to be used for caching of intermediates (Biopax OWL file, pickled statement lists, etc.). If None, no results are cached and no cached files are used.
gene_list

list[str] – List of gene names

basename

string or None – Filename prefix for cached intermediates, or None if no caching is used.

results

dict – Dict containing results of preassembly (see return type for run_preassembly()).

get_bel_stmts(filter=False)[source]

Get relevant statements from the BEL large corpus.

Performs a series of neighborhood queries and then takes the union of all the statements. Because the query process can take a long time for large gene lists, the resulting list of statements is cached in a pickle file with the filename <basename>_bel_stmts.pkl. If the pickle file is present, it is used by default; if not present, the queries are performed and the results are cached.

Parameters: filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False. Note that the full (unfiltered) set of statements is cached.
Returns: List of INDRA statements extracted from the BEL large corpus.
Return type: list of indra.statements.Statement
get_biopax_stmts(filter=False, query='pathsbetween')[source]

Get relevant statements from Pathway Commons.

Performs a “paths between” query for the genes in gene_list and uses the results to build statements. This function caches two files: the list of statements built from the query, which is cached in <basename>_biopax_stmts.pkl, and the OWL file returned by the Pathway Commons Web API, which is cached in <basename>_pc_pathsbetween.owl. If these cached files are found, then the results are returned based on the cached file and Pathway Commons is not queried again.

Parameters:
  • filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False.
  • query (str) – Defines what type of query is executed. The two options are ‘pathsbetween’, which finds paths between the given list of genes and only works if more than 1 gene is given, and ‘neighborhood’, which searches the immediate neighborhood of each given gene.
Returns:

List of INDRA statements extracted from Pathway Commons.

Return type:

list of indra.statements.Statement
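
The caching behaviour described for get_bel_stmts() and get_biopax_stmts() follows a common load-or-compute pattern, sketched below with the standard library. The helper cached_call and the function slow_query are hypothetical; only the filename suffix is taken from the description above, and this is not the GeneNetwork implementation.

```python
import os
import pickle
import tempfile


def cached_call(basename, suffix, compute):
    """Hypothetical load-or-compute helper keyed on <basename><suffix>."""
    if basename is None:
        # No basename: nothing is cached and no cached file is used.
        return compute()
    fname = basename + suffix
    if os.path.exists(fname):
        # A cached file is present, so the query is not repeated.
        with open(fname, 'rb') as fh:
            return pickle.load(fh)
    result = compute()
    with open(fname, 'wb') as fh:
        pickle.dump(result, fh)
    return result


calls = []


def slow_query():
    # Stand-in for a long-running database query.
    calls.append(1)
    return ['stmt_1', 'stmt_2']


basename = os.path.join(tempfile.mkdtemp(), 'demo')
first = cached_call(basename, '_biopax_stmts.pkl', slow_query)
second = cached_call(basename, '_biopax_stmts.pkl', slow_query)
```

The second call returns the same list without invoking slow_query again, because the pickle file written by the first call is found.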

get_statements(filter=False)[source]

Return the combined list of statements from BEL and Pathway Commons.

Internally calls get_biopax_stmts() and get_bel_stmts().

Parameters: filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list. Default is False.
Returns: List of INDRA statements extracted from the BEL large corpus and Pathway Commons.
Return type: list of indra.statements.Statement
run_preassembly(stmts, print_summary=True)[source]

Run complete preassembly procedure on the given statements.

Results are returned as a dict and stored in the attribute results. They are also saved in the pickle file <basename>_results.pkl.

Parameters:
  • stmts (list of indra.statements.Statement) – Statements to preassemble.
  • print_summary (bool) – If True (default), prints a summary of the preassembly process to the console.
Returns:

A dict containing the following entries:

  • raw: the starting set of statements before preassembly.
  • duplicates1: statements after initial de-duplication.
  • valid: statements found to have valid modification sites.
  • mapped: mapped statements (list of indra.preassembler.sitemapper.MappedStatement).
  • mapped_stmts: combined list of valid statements and statements after mapping.
  • duplicates2: statements resulting from de-duplication of the statements in mapped_stmts.
  • related2: top-level statements after combining the statements in duplicates2.

Return type:

dict

Build an executable model from a fragment of a large network (indra.tools.executable_subnetwork)

Build a model incrementally over time (indra.tools.incremental_model)

class indra.tools.incremental_model.IncrementalModel(model_fname=None)[source]

Assemble a model incrementally by iteratively adding new Statements.

Parameters: model_fname (Optional[str]) – The name of the pickle file in which a set of INDRA Statements are stored in a dict keyed by PubMed IDs. This is the state of an IncrementalModel that is loaded upon instantiation.
stmts

dict[str, list[indra.statements.Statement]] – A dictionary of INDRA Statements keyed by PMIDs that stores the current state of the IncrementalModel.

assembled_stmts

list[indra.statements.Statement] – A list of INDRA Statements after assembly.

add_statements(pmid, stmts)[source]

Add INDRA Statements to the incremental model indexed by PMID.

Parameters:
  • pmid (str) – The PMID of the paper from which statements were extracted.
  • stmts (list[indra.statements.Statement]) – A list of INDRA Statements to be added to the model.
get_model_agents()[source]

Return a list of all Agents from all Statements.

Returns: agents – A list of Agents that are in the model.
Return type: list[indra.statements.Agent]
get_statements()[source]

Return all Statements in a single list.

Returns: stmts – A list of all the INDRA Statements in the model.
Return type: list[indra.statements.Statement]
get_statements_noprior()[source]

Return all non-prior Statements in a single list.

Returns: stmts – A list of all the INDRA Statements in the model (excluding the prior).
Return type: list[indra.statements.Statement]
get_statements_prior()[source]

Return all prior Statements in a single list.

Returns: stmts – A list of all the INDRA Statements in the prior.
Return type: list[indra.statements.Statement]
load_prior(prior_fname)[source]

Load a set of prior statements from a pickle file.

The prior statements have a special key in the stmts dictionary called “prior”.

Parameters: prior_fname (str) – The name of the pickle file containing the prior Statements.
preassemble(filters=None)[source]

Preassemble the Statements collected in the model.

Use INDRA’s GroundingMapper, Preassembler and BeliefEngine on the IncrementalModel and save the unique statements and the top level statements in class attributes.

Currently the following filter options are implemented:

  • grounding: require that all Agents in statements are grounded
  • human_only: require that all proteins are human proteins
  • prior_one: require that at least one Agent is in the prior model
  • prior_all: require that all Agents are in the prior model

Parameters: filters (Optional[list[str]]) – A list of filter options to apply when choosing the statements. See description above for more details. Default: None
save(model_fname='model.pkl')[source]

Save the state of the IncrementalModel in a pickle file.

Parameters: model_fname (Optional[str]) – The name of the pickle file to save the state of the IncrementalModel in. Default: model.pkl
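
The state held in stmts, a dictionary keyed by PMID with the special key “prior”, can be pictured with plain Python. The functions below are hypothetical sketches of what get_statements(), get_statements_noprior() and get_statements_prior() are described as returning; strings stand in for Statement objects, and this is not the IncrementalModel code.

```python
# Toy model state: PMID -> list of statements, plus the special "prior" key.
stmts = {
    'prior': ['prior_stmt_1'],
    '12345': ['stmt_a', 'stmt_b'],
    '67890': ['stmt_c'],
}


def all_statements(stmts):
    """Every statement in the model, prior included, in one list."""
    return [s for stmt_list in stmts.values() for s in stmt_list]


def noprior_statements(stmts):
    """Every statement except those stored under the "prior" key."""
    return [s for key, stmt_list in stmts.items()
            if key != 'prior' for s in stmt_list]


def prior_statements(stmts):
    """Only the statements stored under the "prior" key."""
    return list(stmts.get('prior', []))


everything = all_statements(stmts)
without_prior = noprior_statements(stmts)
only_prior = prior_statements(stmts)
```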

High-throughput reading tools (indra.tools.reading)

Scoring INDRA Statements manually (indra.tools.stmt_scoring)

Generate English language questions on linked mechanisms (indra.tools.mechlinker_queries)