Eidos (indra.sources.eidos)
Eidos is an open-domain machine reading system which uses a cascade of grammars to extract causal events from free text. It is ideal for modeling applications that are not specific to a given domain like molecular biology.
To cover a wide range of use cases and scenarios, there are currently 5 different ways in which INDRA can use Eidos.
In all cases, Eidos provides grounding information for inclusion in INDRA Statements only if it is explicitly configured to do so. Please follow the instructions at https://github.com/clulab/eidos#configuring to download and configure Eidos grounding resources.
1. INDRA communicating with a separately running Eidos webapp (indra.sources.eidos.client)
Setup and usage: Clone and run the Eidos web server.
git clone https://github.com/clulab/eidos.git
cd eidos
sbt webapp/run
Then read text by specifying the webservice parameter when calling indra.sources.eidos.process_text.
from indra.sources import eidos
ep = eidos.process_text('rainfall causes floods',
                        webservice='http://localhost:9000')
Advantages:
Does not require setting up the pyjnius Python-Java bridge
Does not require assembling an Eidos JAR file
Disadvantages:
Not all Eidos functionalities are immediately exposed through its webapp.
2. INDRA using an Eidos JAR directly through a Python-Java bridge (indra.sources.eidos.reader)
Setup and usage:
First, the Eidos system and its dependencies need to be packaged as a fat JAR:
git clone https://github.com/clulab/eidos.git
cd eidos
sbt assembly
This creates a JAR file at eidos/target/scala[version]/eidos-[version].jar. Set the EIDOSPATH environment variable to the absolute path of this file, and then append EIDOSPATH to the CLASSPATH environment variable (entries are separated by colons).
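The two environment variables can also be set from within Python, as in the minimal sketch below. It assumes the change is made before the Python-Java bridge starts the JVM, and it uses os.pathsep as the separator (a colon on POSIX systems). The helper name append_to_classpath and the JAR path are hypothetical and are not part of INDRA.

```python
import os


def append_to_classpath(classpath, jar_path):
    """Return classpath with jar_path appended, avoiding duplicate entries."""
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    if jar_path not in entries:
        entries.append(jar_path)
    return os.pathsep.join(entries)


# Hypothetical location of the assembled JAR; substitute the real absolute path.
eidos_jar = '/path/to/eidos.jar'

# These changes affect only this Python process and its child processes.
os.environ['EIDOSPATH'] = eidos_jar
os.environ['CLASSPATH'] = append_to_classpath(
    os.environ.get('CLASSPATH', ''), eidos_jar)
```

Setting the variables in the shell profile instead makes them available to every session.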
The pyjnius package needs to be set up and be operational. For more details, see Pyjnius setup instructions in the documentation.
Then, reading can be done simply using the indra.sources.eidos.process_text function.
from indra.sources import eidos
ep = eidos.process_text('rainfall causes floods')
Advantages:
Doesn’t require running a separate process for Eidos and INDRA
Having a single Eidos JAR file makes this solution portable
Disadvantages:
Requires configuring pyjnius, which is often difficult
Requires building a large Eidos JAR file, which can be time consuming
An EidosReader must be instantiated each time a new INDRA session is started, which is time consuming.
3. INDRA using a Flask server wrapping an Eidos JAR in a separate process (indra.sources.eidos.server)
Setup and usage: Requires building an Eidos JAR and setting up pyjnius – see above.
First, run the server using
python -m indra.sources.eidos.server
Then point to the running server with the webservice parameter when calling indra.sources.eidos.process_text.
from indra.sources import eidos
ep = eidos.process_text('rainfall causes floods',
                        webservice='http://localhost:6666')
Advantages:
EidosReader is instantiated by the Flask server in a separate process, therefore it isn’t reloaded each time a new INDRA session is started
Having a single Eidos JAR file makes this solution portable
Disadvantages:
Currently does not offer any additional functionality compared to running the Eidos webapp directly
Requires configuring pyjnius, which is often difficult
Requires building a large Eidos JAR file, which can be time consuming
4. INDRA calling the Eidos CLI using Java through the command line (indra.sources.eidos.cli)
Setup and usage: Requires building an Eidos JAR and setting EIDOSPATH but does not require setting up pyjnius – see above. To use, call any of the functions exposed in indra.sources.eidos.cli.
Advantages:
Provides a Python interface for running Eidos on “large scale” jobs, e.g., a large number of input files.
Does not require setting up pyjnius since it uses Eidos via the command line.
Provides a way to use any available entrypoint of Eidos.
Disadvantages:
Requires building an Eidos JAR which can be time consuming.
5. Use Eidos separately to produce output files and then process those with INDRA
In this usage mode, Eidos is not directly invoked by INDRA. Rather, Eidos is set up and run independently of INDRA to produce JSON-LD output files for a set of text content. One can then use indra.sources.eidos.api.process_json_file in INDRA to process the JSON-LD output files.
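Processing a whole folder of such output files could be sketched as follows. The helper process_output_folder is hypothetical, as is the assumption that the output files carry a .jsonld extension. The process_fun argument stands in for a function such as process_json_file, which is assumed here to take a file path and return a processor with a statements attribute.

```python
import glob
import os


def process_output_folder(folder, process_fun):
    """Apply process_fun to every .jsonld file in folder and pool the statements."""
    statements = []
    # Sorting makes the processing order reproducible across runs.
    for path in sorted(glob.glob(os.path.join(folder, '*.jsonld'))):
        processor = process_fun(path)
        # Skip files for which no processor was returned.
        if processor is not None:
            statements.extend(processor.statements)
    return statements
```

Passing the processing function as an argument keeps the loop independent of INDRA, so it can be tried out with a stand-in function before Eidos output is available.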
Eidos API (indra.sources.eidos.api)
- indra.sources.eidos.api.initialize_reader()
Instantiate an Eidos reader for fast subsequent reading.
- indra.sources.eidos.api.process_json_bio(json_dict, grounder=None)
Return EidosProcessor with grounded Activation/Inhibition statements.
- Parameters
json_dict (dict) – The JSON-LD dict to be processed.
grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.
- Returns
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type
EidosProcessor
- indra.sources.eidos.api.process_json_bio_entities(json_dict, grounder=None)
Return INDRA Agents grounded to biological ontologies extracted from Eidos JSON-LD.
- Parameters
json_dict (dict) – The JSON-LD dict to be processed.
grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.
- Returns
A list of INDRA Agents which are derived from concepts extracted by Eidos from text.
- Return type
list of indra.statements.Agent
- indra.sources.eidos.api.process_text_bio(text, save_json='eidos_output.json', webservice=None, grounder=None)
Return an EidosProcessor by processing the given text.
This constructs a reader object via Java and extracts mentions from the text. It then serializes the mentions into JSON and processes the result with process_json.
- Parameters
text (str) – The text to be processed.
save_json (Optional[str]) – The name of a file in which to dump the JSON output of Eidos.
webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None
grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.
- Returns
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type
EidosProcessor
- indra.sources.eidos.api.process_text_bio_entities(text, webservice=None, grounder=None)
Return INDRA Agents grounded to biological ontologies extracted from text.
- Parameters
text (str) – Text to be processed.
webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None
grounder (Optional[function]) – A function which takes a text and an optional context as argument and returns a dict of groundings.
- Returns
A list of INDRA Agents which are derived from concepts extracted by Eidos from text.
- Return type
list of indra.statements.Agent
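The grounder argument accepted by the functions above is described as a function taking a text and an optional context and returning a dict of groundings. A minimal sketch of such a function is shown below. The factory make_lookup_grounder, the namespace EXAMPLE_NS, and the identifier EX:0001 are all hypothetical and serve only to illustrate the expected call signature and return shape.

```python
def make_lookup_grounder(lookup):
    """Build a grounder from a dict mapping lowercased text to groundings."""
    def grounder(text, context=None):
        # context is accepted to match the documented signature; this
        # simple lookup ignores it.
        return dict(lookup.get(text.strip().lower(), {}))
    return grounder


# Hypothetical namespace and identifier, for illustration only.
example_grounder = make_lookup_grounder({'rainfall': {'EXAMPLE_NS': 'EX:0001'}})
```

A function built this way could then be passed as grounder=example_grounder. Texts without a lookup entry yield an empty dict.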
Eidos Processor (indra.sources.eidos.processor)
- class indra.sources.eidos.processor.EidosProcessor(json_dict)
This processor extracts INDRA Statements from Eidos JSON-LD output.
- Parameters
json_dict (dict) – A JSON dictionary containing the Eidos extractions in JSON-LD format.
- statements
A list of INDRA Statements that were extracted by the processor.
- Type
list[indra.statements.Statement]
- extract_all_events()
Extract all events, including ones that are arguments of other statements.
The goal of this method is to extract events as standalone statements with their own dedicated evidence. This is different from the get_all_events method in that it extracts the event-specific evidence for each Event statement instead of propagating causal relation evidence into the Event after initial extraction.
- get_all_events()
Return a list of all standalone events from the existing list of extracted statements.
Note that this method only operates on statements already extracted into the processor’s statements attribute. Note also that the evidences for events created from Influences and Associations here are propagated from those statements; they are not equivalent to the original evidences for the events themselves (see the extract_all_events method).
- Returns
events – A list of Events from original Events, and unrolled from Influences and Associations.
- Return type
list[indra.statements.Event]
Eidos Bio Processor (indra.sources.eidos.bio_processor)
Eidos Client (indra.sources.eidos.client)
- indra.sources.eidos.client.process_text(text, webservice)
Process a given text with an Eidos webservice at the given address.
Note that in most cases this function should not be called directly; instead, call indra.sources.eidos.process_text with the webservice parameter.
- Parameters
text (str) – The text to be read using Eidos.
webservice (str) – The address where the Eidos web service is running, e.g., http://localhost:9000.
- Returns
A JSON dict of the results from the Eidos webservice.
- Return type
dict
Eidos Reader (indra.sources.eidos.reader)
- class indra.sources.eidos.reader.EidosReader
Reader object keeping an instance of the Eidos reader as a singleton.
The Eidos reader is initialized only when the first piece of text is read; subsequent readings reuse the same instance and are therefore faster.
- eidos_reader
A Scala object, an instance of the Eidos reading system. It is instantiated only when first processing text.
- Type
org.clulab.wm.eidos.EidosSystem
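The deferred instantiation described above follows a common lazy-initialization pattern, sketched below in plain Python. This is an illustration of the pattern only, not the EidosReader implementation. The names LazyReader and build_reader are hypothetical, and a plain object stands in for the Eidos system.

```python
class LazyReader:
    """Hold one shared reader instance, created on first use."""

    def __init__(self, factory):
        # factory is any zero-argument callable that builds the costly reader.
        self._factory = factory
        self.reader = None

    def get_reader(self):
        # Build the reader only if no earlier call has done so already.
        if self.reader is None:
            self.reader = self._factory()
        return self.reader


def build_reader():
    # Stand-in for a slow start-up step; counts how often it is invoked.
    build_reader.calls += 1
    return object()


build_reader.calls = 0
lazy = LazyReader(build_reader)
first = lazy.get_reader()
second = lazy.get_reader()
```

Both calls return the same object, and the factory runs only once, which is why only the first reading pays the start-up cost.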
Eidos Webserver (indra.sources.eidos.server)
This is a Python-based web server that can be run to read with Eidos. To run the server, do
python -m indra.sources.eidos.server
and then submit POST requests to the localhost:5000/process_text endpoint with JSON content of the form {"text": "text to read"}. The response will be the Eidos JSON-LD output. A second endpoint, reground, is available for regrounding entity texts.
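A request of this kind could be assembled with the Python standard library as in the sketch below. The function names are hypothetical, and the example uses the address quoted above, which should be adjusted to the host and port the server actually listens on. The response body is assumed to be JSON-LD text that json.loads can decode.

```python
import json
import urllib.request


def build_process_text_request(text, base_url):
    """Build (without sending) a POST request for the process_text endpoint."""
    payload = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(
        url=base_url.rstrip('/') + '/process_text',
        data=payload,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def read_with_server(text, base_url):
    """Send the request and decode the response body into a dict."""
    request = build_process_text_request(text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))


# Address as quoted in the text above; adjust it to the running server.
request = build_process_text_request('rainfall causes floods',
                                     'http://localhost:5000')
```

Only build_process_text_request runs here; read_with_server needs a live server and is therefore left uncalled.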
Eidos CLI (indra.sources.eidos.cli)
This is a Python-based command line interface to Eidos that complements the interface based on the Python-Java bridge. EIDOSPATH (in the INDRA config.ini or as an environment variable) needs to point to a fat JAR of the Eidos system.
- indra.sources.eidos.cli.extract_and_process(path_in, path_out, process_fun)
Run Eidos on a set of text files and process output with INDRA.
The output is produced in the specified output folder, and the output files are then processed with INDRA to obtain Statements.
- Parameters
- Returns
stmts – A list of INDRA Statements
- Return type
list[indra.statements.Statement]
- indra.sources.eidos.cli.extract_from_directory(path_in, path_out)
Run Eidos on a set of text files in a folder.
The output is produced in the specified output folder but the output files aren’t processed by this function.
- indra.sources.eidos.cli.run_eidos(endpoint, *args)
Run a given endpoint of Eidos through the command line.
- Parameters
endpoint (str) – The class within the Eidos package to run, for instance ‘apps.extract.ExtractFromDirectory’ will run ‘org.clulab.wm.eidos.apps.extract.ExtractFromDirectory’
*args – Any further arguments to be passed as inputs to the class being run.
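The mapping from an endpoint name to a Java invocation can be illustrated with the sketch below. The helper build_eidos_command is hypothetical and only shows how such a command might be assembled; the command that run_eidos actually issues may include further JVM options. The JAR path and folder names are placeholders, and passing an input and an output folder to ExtractFromDirectory is an assumption based on the signature of extract_from_directory.

```python
def build_eidos_command(eidos_jar, endpoint, *args):
    """Assemble a java command list for running an Eidos entry point."""
    # Package prefix taken from the example in the parameter description.
    full_class = 'org.clulab.wm.eidos.' + endpoint
    return ['java', '-cp', eidos_jar, full_class] + [str(arg) for arg in args]


# Hypothetical JAR path and folder names, for illustration only.
command = build_eidos_command('/path/to/eidos.jar',
                              'apps.extract.ExtractFromDirectory',
                              'texts_in', 'jsonld_out')
```

A list of this form could be handed to subprocess.run(command, check=True) to execute it.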