Assembly Pipeline (indra.pipeline)

class indra.pipeline.pipeline.AssemblyPipeline(steps=None)[source]

Bases: object

An assembly pipeline that runs the specified steps on a given set of statements.

There are three ways to initialize and run the pipeline. The examples assume you have a list of INDRA Statements stored in the stmts variable:

>>> from indra.statements import *
>>> map2k1 = Agent('MAP2K1', db_refs={'HGNC': '6840'})
>>> mapk1 = Agent('MAPK1', db_refs={'HGNC': '6871'})
>>> braf = Agent('BRAF')
>>> stmts = [Phosphorylation(map2k1, mapk1, 'T', '185'),
...          Phosphorylation(braf, map2k1)]

1) Provide a JSON file containing the steps, then use the classmethod from_json_file, and run it with the run method on a list of statements. This option allows storing pipeline versions in a separate file and reproducing the same results. All functions referenced in the JSON file have to be registered with the @register_pipeline decorator.

>>> import os
>>> path_this = os.path.dirname(os.path.abspath(__file__))
>>> filename = os.path.abspath(
... os.path.join(path_this, '..', 'tests', 'pipeline_test.json'))
>>> ap = AssemblyPipeline.from_json_file(filename)
>>> assembled_stmts = ap.run(stmts)

2) Initialize a pipeline with a list of steps and run it with the run method on a list of statements. All functions referenced in steps have to be registered with the @register_pipeline decorator.

>>> steps = [
...    {"function": "filter_no_hypothesis"},
...    {"function": "filter_grounded_only",
...     "kwargs": {"score_threshold": 0.8}}
... ]
>>> ap = AssemblyPipeline(steps)
>>> assembled_stmts = ap.run(stmts)

3) Initialize an empty pipeline and append/insert the steps one by one. Provide a function and its args and kwargs. For arguments that require calling a different function, use the RunnableArgument class. All functions referenced here have to be either imported and passed as function objects or registered with the @register_pipeline decorator and passed as function names (strings). The pipeline built this way can be optionally saved into a JSON file.

>>> from indra.tools.assemble_corpus import *
>>> from indra.ontology.world import load_world_ontology
>>> from indra.belief.wm_scorer import get_eidos_scorer
>>> ap = AssemblyPipeline()
>>> ap.append(filter_no_hypothesis)
>>> ap.append(filter_grounded_only)
>>> ap.append(run_preassembly,
...           belief_scorer=RunnableArgument(get_eidos_scorer),
...           ontology=RunnableArgument(load_world_ontology))
>>> assembled_stmts = ap.run(stmts)
>>> ap.to_json_file('filename.json')

Parameters:
  • steps (list[dict]) – A list of dictionaries representing steps in the pipeline. Each step should have a ‘function’ key and, if appropriate, ‘args’ and ‘kwargs’ keys. Arguments can be simple values (strings, integers, booleans, lists, etc.) or can be functions themselves. If an argument is a function or the result of another function, it should also be represented as a dictionary of a similar structure. If the function itself is the argument (and not its result), the dictionary should contain the key-value pair {‘no_run’: True}. If an argument is a statement type, it should be represented as a dictionary {‘stmt_type’: <name of a statement type>}.

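
As an illustration of the structure described above, the following sketch builds a steps list from plain Python dictionaries and serializes it to JSON. It does not import INDRA. The function names appear only as strings; filter_by_type and location_matches in particular are illustrative assumptions, and all names are assumed to be registered with @register_pipeline in a real setting.

```python
import json

# Illustrative step definitions (names are placeholders assumed to be
# registered with @register_pipeline; nothing is executed here).
steps = [
    # A step with no extra arguments.
    {"function": "filter_no_hypothesis"},
    # A step with a simple keyword argument.
    {"function": "filter_grounded_only",
     "kwargs": {"score_threshold": 0.8}},
    # A step that takes a statement type as a positional argument.
    {"function": "filter_by_type",
     "args": [{"stmt_type": "Phosphorylation"}]},
    {"function": "run_preassembly",
     "kwargs": {
         # The result of calling the named function is passed.
         "belief_scorer": {"function": "get_eidos_scorer"},
         # The function itself is passed, not its result.
         "matches_fun": {"function": "location_matches", "no_run": True},
     }},
]

# The same structure is what a steps JSON file would contain.
steps_json = json.dumps(steps, indent=2)
print(steps_json)
```

Only the nesting of ‘function’, ‘args’, ‘kwargs’, ‘no_run’ and ‘stmt_type’ is taken from the description above; which functions accept which arguments depends on the functions you register.
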
append(func, *args, **kwargs)[source]

Append a step to the end of the pipeline.

Args and kwargs here can be of any type. All functions referenced here have to be either imported and passed as function objects or registered with the @register_pipeline decorator and passed as function names (strings). For arguments that require calling a different function, use the RunnableArgument class.

Parameters:
  • func (str or function) – A function or the string name of a function to add to the pipeline.
  • args (args) – Args that are passed to func when calling it.
  • kwargs (kwargs) – Kwargs that are passed to func when calling it.

create_new_step(func_name, *args, **kwargs)[source]

Create a dictionary representing a new step in the pipeline.

Parameters:
  • func_name (str) – The string name of a function to create as a step.
  • args (args) – Args that are passed to the function when calling it.
  • kwargs (kwargs) – Kwargs that are passed to the function when calling it.
Returns:

A dict structure representing a step in the pipeline.

Return type:

dict

classmethod from_json_file(filename)[source]

Create an instance of AssemblyPipeline from a JSON file with steps.

get_argument_value(arg_json)[source]

Get the value of an argument from its JSON representation.

static get_function_from_name(name)[source]

Return a function object by name if available, or raise an exception.

Parameters:
  • name (str) – The name of the function.
Returns:

The function that was found based on its name. If no function is found, a NotRegisteredFunctionError is raised.

Return type:

function

static get_function_parameters(func_dict)[source]

Retrieve a function name and arguments from a function dictionary.

Parameters:
  • func_dict (dict) – A dict structure representing a function and its args and kwargs.
Returns:

A tuple with the following elements: the name of the function, the args of the function, and the kwargs of the function.

Return type:

tuple of str, list and dict

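
The decomposition described above can be sketched as follows. This is a hypothetical stand-alone helper (split_step is not an INDRA function), and it assumes that missing ‘args’ and ‘kwargs’ keys default to an empty list and an empty dict.

```python
def split_step(func_dict):
    """Return (name, args, kwargs) from a step dictionary.

    Hypothetical helper for illustration; missing 'args'/'kwargs'
    keys are assumed to default to empty containers.
    """
    name = func_dict["function"]
    args = func_dict.get("args", [])
    kwargs = func_dict.get("kwargs", {})
    return name, args, kwargs


step = {"function": "filter_grounded_only",
        "kwargs": {"score_threshold": 0.8}}
name, args, kwargs = split_step(step)
print(name, args, kwargs)
```
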
insert(ix, func, *args, **kwargs)[source]

Insert a step at any position in the pipeline.

Args and kwargs here can be of any type. All functions referenced here have to be either imported and passed as function objects or registered with the @register_pipeline decorator and passed as function names (strings). For arguments that require calling a different function, use the RunnableArgument class.

Parameters:
  • ix (int) – The index at which to insert the step.
  • func (str or function) – A function or the string name of a function to add to the pipeline.
  • args (args) – Args that are passed to func when calling it.
  • kwargs (kwargs) – Kwargs that are passed to func when calling it.

static is_function(argument, keyword='function')[source]

Check if an argument should be converted to a specific object type, e.g. a function or a statement type.

Parameters:
  • argument (dict or other object) – The argument to check. If it is a dict and contains the keyword as a key, True is returned; otherwise False is returned.
  • keyword (Optional[str]) – The key to look for if the argument is a dict. Default: function

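
The check described above amounts to a dictionary membership test. A minimal sketch, using a hypothetical stand-alone function (has_keyword) rather than the INDRA method itself:

```python
def has_keyword(argument, keyword="function"):
    """Return True only if argument is a dict containing the keyword."""
    return isinstance(argument, dict) and keyword in argument


# A dict with a 'function' key marks an argument that refers to a function.
print(has_keyword({"function": "get_eidos_scorer"}))
# Simple values are not treated as function references.
print(has_keyword("get_eidos_scorer"))
# The same check with a different keyword detects statement types.
print(has_keyword({"stmt_type": "Phosphorylation"}, keyword="stmt_type"))
```
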
run(statements, **kwargs)[source]

Run all steps of the pipeline.

Parameters:
  • statements (list[indra.statements.Statement]) – A list of INDRA Statements to run the pipeline on.
  • **kwargs (kwargs) – It is recommended to define all arguments for the step functions in the steps definition, but external objects that cannot be provided as step arguments can be passed here as kwargs to the entire pipeline. Take care to avoid kwargs name clashes between functions: a value passed here is provided to every function that expects an argument with the same name. To override the value for a particular function, provide it explicitly in the kwargs of the corresponding step.
Returns:

The list of INDRA Statements resulting from running the pipeline on the list of input Statements.

Return type:

list[indra.statements.Statement]
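
The handling of pipeline-wide kwargs described above can be sketched as follows. This is a stand-alone illustration under stated assumptions, not INDRA's implementation: call_with_shared_kwargs, keep_above and keep_first are hypothetical names, plain floats stand in for Statements, and signature inspection is assumed as the way to decide which functions expect a given argument.

```python
import inspect


def call_with_shared_kwargs(func, statements, step_kwargs, shared_kwargs):
    """Call func on statements, adding only the shared kwargs it accepts.

    Kwargs defined on the step take precedence over shared ones.
    """
    accepted = inspect.signature(func).parameters
    merged = {k: v for k, v in shared_kwargs.items() if k in accepted}
    merged.update(step_kwargs)  # explicit step kwargs win on a name clash
    return func(statements, **merged)


def keep_above(stmts, threshold=0.5):
    return [s for s in stmts if s > threshold]


def keep_first(stmts, n=2):
    return stmts[:n]


scores = [0.2, 0.6, 0.9, 0.95]   # stand-ins for Statements
shared = {"threshold": 0.5}      # passed to the whole pipeline

a = call_with_shared_kwargs(keep_above, scores, {}, shared)
b = call_with_shared_kwargs(keep_above, scores, {"threshold": 0.9}, shared)
c = call_with_shared_kwargs(keep_first, scores, {}, shared)
print(a, b, c)
```

Here keep_above receives the shared threshold unless its own step defines one, while keep_first never sees it because it has no parameter of that name.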

run_function(func_dict, statements=None, **kwargs)[source]

Run a given function and return the results.

For each argument that requires an extra function call, recursively call the corresponding functions until all arguments are simple values.

Parameters:
  • func_dict (dict) – A dict representing the function to call, its args and kwargs.
  • statements (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements to pass to the function. Default: None
  • kwargs (kwargs) – Kwargs that are passed to the function when calling it.
Returns:

Any value that the given function returns.

Return type:

object
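
The recursive resolution of arguments can be sketched with a small stand-alone model. Everything here is hypothetical (REGISTRY, resolve, run_step and the two toy functions are not part of INDRA), and the sketch assumes that statements, when given, are passed as the first positional argument.

```python
REGISTRY = {}  # hypothetical lookup from function name to function object


def resolve(arg):
    """Turn the dictionary form of an argument into a concrete value."""
    if isinstance(arg, dict) and "function" in arg:
        if arg.get("no_run"):
            return REGISTRY[arg["function"]]  # pass the function itself
        return run_step(arg)                  # pass the function's result
    return arg                                # simple value, use as is


def run_step(func_dict, statements=None):
    """Resolve all arguments recursively, then call the function."""
    func = REGISTRY[func_dict["function"]]
    args = [resolve(a) for a in func_dict.get("args", [])]
    kwargs = {k: resolve(v) for k, v in func_dict.get("kwargs", {}).items()}
    if statements is not None:
        args = [statements] + args  # assumption: statements go first
    return func(*args, **kwargs)


def make_threshold(base, bump=0):
    return base + bump


def filter_above(stmts, threshold):
    return [s for s in stmts if s > threshold]


REGISTRY["make_threshold"] = make_threshold
REGISTRY["filter_above"] = filter_above

step = {"function": "filter_above",
        "kwargs": {"threshold": {"function": "make_threshold",
                                 "args": [1], "kwargs": {"bump": 1}}}}
result = run_step(step, statements=[1, 2, 3, 4])
print(result)
```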

static run_simple_function(func, *args, **kwargs)[source]

Run a simple function and return the result.

Simple here means a function whose arguments are all simple values (that is, they do not require extra function calls).

Parameters:
  • func (function) – The function to call.
  • args (args) – Args that are passed to the function when calling it.
  • kwargs (kwargs) – Kwargs that are passed to the function when calling it.
Returns:

Any value that the given function returns.

Return type:

object

to_json_file(filename)[source]

Save AssemblyPipeline to a JSON file.
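
Saving and reloading a pipeline is what makes a run reproducible from a file. The round trip can be sketched without INDRA, assuming the file simply holds the list of step dictionaries as a JSON array (the file name pipeline.json is a placeholder):

```python
import json
import os
import tempfile

steps = [
    {"function": "filter_no_hypothesis"},
    {"function": "filter_grounded_only",
     "kwargs": {"score_threshold": 0.8}},
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "pipeline.json")
    # Write the step definitions to disk.
    with open(path, "w") as fh:
        json.dump(steps, fh, indent=1)
    # Read them back; the result can be used to rebuild the same pipeline.
    with open(path) as fh:
        loaded = json.load(fh)

print(loaded == steps)
```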

exception indra.pipeline.pipeline.NotRegisteredFunctionError[source]

Bases: Exception

class indra.pipeline.pipeline.RunnableArgument(func, *args, **kwargs)[source]

Bases: object

Class representing arguments generated by calling a function.

RunnableArguments should be used as args or kwargs in the AssemblyPipeline append and insert methods.

Parameters:
  • func (str or function) – A function or the name of a function to be called to generate the argument value.

to_json()[source]

Jsonify to standard AssemblyPipeline step format.

indra.pipeline.pipeline.jsonify_arg_input(arg)[source]

Jsonify user input (in the AssemblyPipeline append and insert methods) into the standard step JSON format.

exception indra.pipeline.decorators.ExistingFunctionError[source]

Bases: Exception

indra.pipeline.decorators.register_pipeline(function)[source]

Decorator to register a function for the assembly pipeline.
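
The registration mechanism can be pictured as a name-to-function registry. The sketch below is a stand-alone model, not INDRA's code: register, lookup and the two exception classes are hypothetical stand-ins for register_pipeline, get_function_from_name, ExistingFunctionError and NotRegisteredFunctionError, and raising on a duplicate name is an assumption suggested by the exception's name.

```python
class DuplicateFunctionError(Exception):
    """Stand-in for ExistingFunctionError."""


class UnknownFunctionError(Exception):
    """Stand-in for NotRegisteredFunctionError."""


registry = {}  # maps function names to function objects


def register(function):
    """Decorator that records a function under its own name."""
    name = function.__name__
    if name in registry:
        raise DuplicateFunctionError(f"{name} is already registered")
    registry[name] = function
    return function  # the decorated function stays usable as before


def lookup(name):
    """Return the registered function with this name, or raise."""
    try:
        return registry[name]
    except KeyError:
        raise UnknownFunctionError(f"{name} is not registered") from None


@register
def drop_empty(stmts):
    return [s for s in stmts if s]


# A step can now refer to the function by its string name.
print(lookup("drop_empty")(["a", "", "b"]))
```

With the real decorator, the equivalent usage is to decorate a function with @register_pipeline and then refer to it by its name in a steps definition.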