Eidos (indra.sources.eidos)

Eidos is an open-domain machine reading system which uses a cascade of grammars to extract causal events from free text. It is ideal for modeling applications that are not specific to a given domain like molecular biology.

To cover a wide range of use cases and scenarios, there are currently 5 different ways in which INDRA can use Eidos.

Whichever mode is used, Eidos must be explicitly configured to provide grounding information before that information can be included in INDRA Statements. Please follow the instructions at https://github.com/clulab/eidos#configuring to download and configure Eidos grounding resources.

1. INDRA communicating with a separately running Eidos webapp (indra.sources.eidos.client)

Setup and usage: Clone and run the Eidos web server.

git clone https://github.com/clulab/eidos.git
cd eidos
sbt webapp/run

Then read text by specifying the webservice parameter when using indra.sources.eidos.process_text.

from indra.sources import eidos
ep = eidos.process_text('rainfall causes floods',
                         webservice='http://localhost:9000')

Advantages:

  • Does not require setting up the pyjnius Python-Java bridge

  • Does not require assembling an Eidos JAR file

Disadvantages:

  • Not all Eidos functionalities are immediately exposed through its webapp.

2. INDRA using an Eidos JAR directly through a Python-Java bridge (indra.sources.eidos.reader)

Setup and usage:

First, the Eidos system and its dependencies need to be packaged as a fat JAR:

git clone https://github.com/clulab/eidos.git
cd eidos
sbt assembly

This creates a JAR file in eidos/target/scala[version]/eidos-[version].jar. Set the EIDOSPATH environment variable to the absolute path of this file, and then append EIDOSPATH to the CLASSPATH environment variable (entries are separated by colons).
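The two environment variables can also be computed in Python. The following is a minimal sketch, not part of INDRA: the helper name with_eidos_on_classpath is hypothetical, and it only builds the values (using the platform's path separator, which is a colon on Linux and macOS) without changing the running process.

```python
import os


def with_eidos_on_classpath(env, jar_path):
    """Return a copy of env with EIDOSPATH set and appended to CLASSPATH.

    env is a mapping of environment variables (for example os.environ)
    and jar_path is the path to the Eidos fat JAR. This helper is
    hypothetical and only illustrates the setup step.
    """
    new_env = dict(env)
    jar_path = os.path.abspath(jar_path)
    new_env['EIDOSPATH'] = jar_path
    # Keep existing CLASSPATH entries and avoid adding the JAR twice
    entries = [e for e in new_env.get('CLASSPATH', '').split(os.pathsep)
               if e]
    if jar_path not in entries:
        entries.append(jar_path)
    new_env['CLASSPATH'] = os.pathsep.join(entries)
    return new_env
```

Calling with_eidos_on_classpath(os.environ, path_to_jar) returns a new dict; the caller decides whether to export the values in a shell profile or apply them to os.environ before the Java bridge is loaded.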

The pyjnius package needs to be set up and be operational. For more details, see Pyjnius setup instructions in the documentation.

Then, reading can be done simply using the indra.sources.eidos.process_text function.

from indra.sources import eidos
ep = eidos.process_text('rainfall causes floods')

Advantages:

  • Doesn’t require running a separate process for Eidos and INDRA

  • Having a single Eidos JAR file makes this solution portable

Disadvantages:

  • Requires configuring pyjnius which is often difficult

  • Requires building a large Eidos JAR file which can be time consuming

  • The EidosReader instance needs to be instantiated every time a new INDRA session is started which is time consuming.

3. INDRA using a Flask server wrapping an Eidos JAR in a separate process (indra.sources.eidos.server)

Setup and usage: Requires building an Eidos JAR and setting up pyjnius – see above.

First, run the server using

python -m indra.sources.eidos.server

Then point to the running server with the webservice parameter when calling indra.sources.eidos.process_text.

from indra.sources import eidos
ep = eidos.process_text('rainfall causes floods',
                         webservice='http://localhost:6666')

Advantages:

  • EidosReader is instantiated by the Flask server in a separate process, therefore it isn’t reloaded each time a new INDRA session is started

  • Having a single Eidos JAR file makes this solution portable

Disadvantages:

  • Currently does not offer any additional functionality compared to running the Eidos webapp directly

  • Requires configuring pyjnius which is often difficult

  • Requires building a large Eidos JAR file which can be time consuming

4. INDRA calling the Eidos CLI using java through the command line (indra.sources.eidos.cli)

Setup and usage: Requires building an Eidos JAR and setting EIDOSPATH but does not require setting up pyjnius – see above. To use, call any of the functions exposed in indra.sources.eidos.cli.

Advantages:

  • Provides a Python interface for running Eidos on “large scale” jobs, e.g., a large number of input files.

  • Does not require setting up pyjnius since it uses Eidos via the command line.

  • Provides a way to use any available entrypoint of Eidos.

Disadvantages:

  • Requires building an Eidos JAR which can be time consuming.

5. Use Eidos separately to produce output files and then process those with INDRA

In this usage mode, Eidos is not directly invoked by INDRA. Rather, Eidos is set up and run independently of INDRA to produce JSON-LD output files for a set of text content. One can then use indra.sources.eidos.api.process_json_file in INDRA to process the JSON-LD output files.
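Processing a whole folder of Eidos output could look roughly like the sketch below. It assumes that the output files end in .jsonld and that the processing function accepts a file path and returns an object with a statements attribute (or None); the helper name process_output_folder is hypothetical. INDRA is imported lazily so that a different function can be passed in for testing.

```python
import glob
import os


def process_output_folder(folder, process_fun=None, pattern='*.jsonld'):
    """Collect INDRA Statements from Eidos JSON-LD files in a folder.

    process_fun takes a file path and returns a processor with a
    statements attribute (or None). If it is not given, INDRA's
    process_json_file is imported and used. The file name pattern
    is an assumption about how the output files are named.
    """
    if process_fun is None:
        # Imported here so that the sketch can be loaded without INDRA
        from indra.sources.eidos.api import process_json_file
        process_fun = process_json_file
    stmts = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        processor = process_fun(path)
        if processor is not None:
            stmts += processor.statements
    return stmts
```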

Eidos API (indra.sources.eidos.api)

indra.sources.eidos.api.initialize_reader()[source]

Instantiate an Eidos reader for fast subsequent reading.

indra.sources.eidos.api.process_json_bio(json_dict, grounder=None)[source]

Return EidosProcessor with grounded Activation/Inhibition statements.

Parameters
  • json_dict (dict) – The JSON-LD dict to be processed.

  • grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.

Returns

ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.

Return type

EidosProcessor
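As an illustration of the grounder parameter, here is a minimal sketch of a function with the expected shape: it takes a text and an optional context and returns a dict of groundings. It assumes that groundings map a name space to an identifier; the lookup table, the name space EXAMPLE_DB and the identifier EX:0001 are placeholders, not real database entries.

```python
def make_lookup_grounder(table):
    """Return a grounder backed by a simple lookup table.

    table maps lower-cased entity texts to dicts of groundings.
    This is an illustrative stand-in for a real grounding service.
    """
    def grounder(text, context=None):
        # context is accepted to match the expected call signature
        # but is ignored by this simple lookup
        return dict(table.get(text.strip().lower(), {}))
    return grounder


# Placeholder name space and identifier, for illustration only
example_grounder = make_lookup_grounder(
    {'example protein': {'EXAMPLE_DB': 'EX:0001'}})
```

Such a function could then be passed along with the JSON-LD dict, for instance as process_json_bio(json_dict, grounder=example_grounder).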

indra.sources.eidos.api.process_json_bio_entities(json_dict, grounder=None, with_coords=False)[source]

Return INDRA Agents grounded to biological ontologies extracted from Eidos JSON-LD.

Parameters
  • json_dict (dict) – The JSON-LD dict to be processed.

  • grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.

  • with_coords (Optional[bool]) – If True, the Agents will have their coordinates returned along with them in a tuple. Default: False

Returns

A list of INDRA Agents which are derived from concepts extracted by Eidos from text.

Return type

list of indra.statements.Agent

indra.sources.eidos.api.process_text_bio(text, save_json='eidos_output.json', webservice=None, grounder=None)[source]

Return an EidosProcessor by processing the given text.

This constructs a reader object via Java and extracts mentions from the text. It then serializes the mentions into JSON and processes the result with process_json.

Parameters
  • text (str) – The text to be processed.

  • save_json (Optional[str]) – The name of a file in which to dump the JSON output of Eidos.

  • webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None

  • grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.

Returns

ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.

Return type

EidosProcessor

indra.sources.eidos.api.process_text_bio_entities(text, webservice=None, grounder=None)[source]

Return INDRA Agents grounded to biological ontologies extracted from text.

Parameters
  • text (str) – Text to be processed.

  • webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None

  • grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.

Returns

A list of INDRA Agents which are derived from concepts extracted by Eidos from text.

Return type

list of indra.statements.Agent

Eidos Processor (indra.sources.eidos.processor)

class indra.sources.eidos.processor.EidosProcessor(json_dict)[source]

This processor extracts INDRA Statements from Eidos JSON-LD output.

Parameters

json_dict (dict) – A JSON dictionary containing the Eidos extractions in JSON-LD format.

statements

A list of INDRA Statements that were extracted by the processor.

Type

list[indra.statements.Statement]

extract_all_events()[source]

Extract all events, including ones that are arguments of other statements.

The goal of this method is to extract events as standalone statements with their own dedicated evidence. This is different from the get_all_events method in that it extracts the event-specific evidence for each Event statement instead of propagating causal relation evidence into the Event after initial extraction.

extract_causal_relations()[source]

Extract causal relations as Statements.

extract_correlations()[source]

Extract correlations as Association statements.

extract_events()[source]

Extract Events that are not arguments of other statements.

get_all_events()[source]

Return a list of all standalone events from the existing list of extracted statements.

Note that this method only operates on statements already extracted into the processor’s statements attribute. Note also that the evidences for events created from Influences and Associations here are propagated from those statements; they are not equivalent to the original evidences for the events themselves (see extract_all_events method).

Returns

events – A list of Events from original Events, and unrolled from Influences and Associations.

Return type

list[indra.statements.Event]

get_concept(entity)[source]

Return Concept from an Eidos entity.

get_evidence(relation)[source]

Return the Evidence object for the INDRA Statement.

get_groundings(entity)[source]

Return groundings as db_refs for an entity.

static get_hedging(event)[source]

Return hedging markers attached to an event.

Example: "states": [{"@type": "State", "type": "HEDGE", "text": "could"}]

static get_negation(event)[source]

Return negation attached to an event.

Example: "states": [{"@type": "State", "type": "NEGATION", "text": "n't"}]
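Based on the example entries shown for get_hedging and get_negation, picking out states of one type from an event can be sketched as follows. This is an illustration of the data layout, not INDRA's implementation; the function name state_texts is hypothetical.

```python
def state_texts(event, state_type):
    """Return the texts of all states of the given type on an event."""
    return [state.get('text') for state in event.get('states', [])
            if state.get('type') == state_type]


# An event dict carrying one hedging and one negation marker
event = {'states': [{'@type': 'State', 'type': 'HEDGE', 'text': 'could'},
                    {'@type': 'State', 'type': 'NEGATION', 'text': "n't"}]}
hedges = state_texts(event, 'HEDGE')
negations = state_texts(event, 'NEGATION')
```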

indra.sources.eidos.processor.find_arg(event, arg_type)[source]

Return ID of the first argument of a given type

indra.sources.eidos.processor.find_args(event, arg_type)[source]

Return IDs of all arguments of a given type
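The relationship between the two helpers can be sketched as below. The sketch assumes, purely for illustration, that an event stores its arguments as a list of dicts with a type entry and a value entry holding an @id; this layout and the argument type names in the example are assumptions that should be checked against actual Eidos JSON-LD output. The function names carry a _sketch suffix to make clear they are not the INDRA functions.

```python
def find_args_sketch(event, arg_type):
    """Return the IDs of all arguments of a given type (assumed layout)."""
    args = event.get('arguments', [])
    return [arg['value']['@id'] for arg in args
            if arg.get('type') == arg_type]


def find_arg_sketch(event, arg_type):
    """Return the ID of the first argument of a given type, or None."""
    ids = find_args_sketch(event, arg_type)
    return ids[0] if ids else None


# Hypothetical event with made-up identifiers
example_event = {'arguments': [
    {'type': 'source', 'value': {'@id': '_:Example_1'}},
    {'type': 'destination', 'value': {'@id': '_:Example_2'}},
    {'type': 'source', 'value': {'@id': '_:Example_3'}},
]}
```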

Eidos Bio Processor (indra.sources.eidos.bio_processor)

class indra.sources.eidos.bio_processor.EidosBioProcessor(json_dict, grounder=None)[source]

Class to extract biology-oriented INDRA statements from Eidos output in a way that agents are grounded to biomedical ontologies.

Eidos Client (indra.sources.eidos.client)

indra.sources.eidos.client.process_text(text, webservice)[source]

Process a given text with an Eidos webservice at the given address.

Note that in most cases this function should not be called directly; instead, it is used indirectly by calling indra.sources.eidos.process_text with the webservice parameter.

Parameters
  • text (str) – The text to be read using Eidos.

  • webservice (str) – The address where the Eidos web service is running, e.g., http://localhost:9000.

Returns

A JSON dict of the results from the Eidos webservice.

Return type

dict

Eidos Reader (indra.sources.eidos.reader)

class indra.sources.eidos.reader.EidosReader[source]

Reader object keeping an instance of the Eidos reader as a singleton.

This way, the Eidos reader only needs to be initialized when the first piece of text is read; subsequent readings are done with the same instance of the reader and are therefore faster.

eidos_reader

A Scala object, an instance of the Eidos reading system. It is instantiated only when first processing text.

Type

org.clulab.wm.eidos.EidosSystem

initialize_reader()[source]

Instantiate the Eidos reader attribute of this reader.

process_text(text)[source]

Return a mentions JSON object given text.

Parameters

text (str) – Text to be processed.

Returns

json_dict – A JSON object of mentions extracted from text.

Return type

dict
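The lazy initialization described for EidosReader follows a common pattern, sketched below in plain Python. This is not the INDRA class: LazyReader and its factory argument are hypothetical stand-ins, with the factory taking the place of the expensive step of starting the Eidos system.

```python
class LazyReader:
    """Keep one reader instance and create it only on first use."""

    def __init__(self, factory):
        # factory is a zero-argument callable that builds the
        # (expensive) underlying reader
        self._factory = factory
        self.reader = None

    def initialize_reader(self):
        """Build the underlying reader explicitly, ahead of first use."""
        self.reader = self._factory()

    def process_text(self, text):
        """Read text, initializing the underlying reader if needed."""
        if self.reader is None:
            self.initialize_reader()
        return self.reader(text)
```

The first call to process_text pays the start-up cost; later calls reuse the stored instance. Calling initialize_reader ahead of time moves that cost to a moment of the caller's choosing.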

Eidos Webserver (indra.sources.eidos.server)

This is a Python-based web server that can be run to read with Eidos. To run the server, do

python -m indra.sources.eidos.server

and then submit POST requests to the localhost:5000/process_text endpoint with JSON content as {'text': 'text to read'}. The response will be the Eidos JSON-LD output. A second endpoint, reground, is available for regrounding entity texts.
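A request to the process_text endpoint can be assembled with the Python standard library, as in the following sketch. The helper names are hypothetical, the base URL is left as a parameter so that it can match wherever the server is actually listening, and parsing the response as JSON assumes the server returns the JSON-LD output as the response body.

```python
import json
import urllib.request


def build_process_text_request(base_url, text):
    """Build a POST request carrying {'text': text} as JSON."""
    payload = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(
        base_url.rstrip('/') + '/process_text',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST')


def read_with_server(base_url, text):
    """Send the request and return the decoded JSON response.

    This requires a running server at base_url.
    """
    request = build_process_text_request(base_url, text)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))
```

For example, read_with_server('http://localhost:5000', 'rainfall causes floods') would send the text to a server running at that address.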

Eidos CLI (indra.sources.eidos.cli)

This is a Python-based command line interface to Eidos that complements the Python-Java bridge based interface. EIDOSPATH (in the INDRA config.ini or as an environment variable) needs to point to a fat JAR of the Eidos system.

indra.sources.eidos.cli.extract_and_process(path_in, path_out, process_fun)[source]

Run Eidos on a set of text files and process output with INDRA.

The output is produced in the specified output folder, and the output files are then processed with the given function to collect INDRA Statements.

Parameters
  • path_in (str) – Path to an input folder with some text files

  • path_out (str) – Path to an output folder in which Eidos places the output JSON-LD files

  • process_fun (function) – A function that takes a JSON dict as argument and returns an EidosProcessor.

Returns

stmts – A list of INDRA Statements

Return type

list[indra.statements.Statement]

indra.sources.eidos.cli.extract_from_directory(path_in, path_out)[source]

Run Eidos on a set of text files in a folder.

The output is produced in the specified output folder but the output files aren’t processed by this function.

Parameters
  • path_in (str) – Path to an input folder with some text files

  • path_out (str) – Path to an output folder in which Eidos places the output JSON-LD files

indra.sources.eidos.cli.run_eidos(endpoint, *args)[source]

Run a given endpoint of Eidos through the command line.

Parameters
  • endpoint (str) – The class within the Eidos package to run, for instance ‘apps.extract.ExtractFromDirectory’ will run ‘org.clulab.wm.eidos.apps.extract.ExtractFromDirectory’

  • *args – Any further arguments to be passed as inputs to the class being run.
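How an endpoint name expands into a full class name and a java command can be sketched as follows. The sketch only builds the argument list; the function name build_eidos_command is hypothetical, and the exact java options used by INDRA (for instance memory settings) are not shown here and may differ.

```python
EIDOS_PACKAGE = 'org.clulab.wm.eidos'


def build_eidos_command(jar_path, endpoint, *args):
    """Return a java command line for running an Eidos class.

    jar_path is the path to the Eidos fat JAR (the value of EIDOSPATH)
    and endpoint is the class name relative to the Eidos package.
    """
    # Expand the short endpoint name into the full class path
    call_class = '%s.%s' % (EIDOS_PACKAGE, endpoint)
    return ['java', '-cp', jar_path, call_class] + [str(a) for a in args]
```

The resulting list could be passed to subprocess.call to run the class, with any further arguments (such as input and output folders) appended in order.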