Hypothes.is (indra.sources.hypothesis)

This module implements an API and processor for annotations coming from hypothes.is. Annotations for a given group are obtained and processed either into INDRA Statements or into entity grounding annotations.

Two configuration values are used, each set either in the INDRA config file or as an environment variable. HYPOTHESIS_API_KEY is the API key used to access the hypothes.is API. HYPOTHESIS_GROUP is optional and selects the group of annotations that is used by default.
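As a rough illustration of the environment-variable route, the sketch below sets both values and reads them back. The read_hypothesis_config helper and the placeholder values are assumptions made for this example; INDRA has its own configuration lookup, which is not reproduced here.

```python
import os

# Placeholder values for illustration only; substitute your own key and group ID.
os.environ["HYPOTHESIS_API_KEY"] = "your-api-key"
os.environ["HYPOTHESIS_GROUP"] = "your-group-id"


def read_hypothesis_config(env=os.environ):
    """Hypothetical helper: return (api_key, group) from a mapping.

    The group is optional, so it comes back as None when it is not set.
    """
    api_key = env.get("HYPOTHESIS_API_KEY")
    group = env.get("HYPOTHESIS_GROUP")
    if not api_key:
        raise ValueError("HYPOTHESIS_API_KEY is not set")
    return api_key, group


api_key, group = read_hypothesis_config()
```

Setting the variables in the shell before starting Python has the same effect as assigning to os.environ.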

Curation tutorial

Go to https://web.hypothes.is/ and create an account, and then create a group in which annotations will be collected. Under Settings, click on Developer to find the API key. Set this API key in the INDRA config file under HYPOTHESIS_API_KEY. Optionally, set the group’s ID as HYPOTHESIS_GROUP in the INDRA config file. (Note that both of these values can also be set as environment variables.) Next, install the hypothes.is browser plug-in and log in.

Curating Statements

To curate text from a website with the intention of creating one or more INDRA Statements, select some text and create a new annotation using the hypothes.is browser plug-in. The content of the annotation consists of one or more lines. The first line should contain one or more English sentences describing the mechanism(s) that will be represented as an INDRA Statement (e.g., AMPK activates STAT3) based on the selected text. Each subsequent line of the annotation is assumed to be a context annotation. These lines are of the form “<context type>: <context text>” where <context type> can be one of: Cell type, Cell line, Disease, Organ, Location, Species, and <context text> is the text describing the context, e.g., lysosome, liver, prostate cancer, etc.

The annotation should also be tagged with indra (though by default, if no tags are given, the processor assumes that the given annotation is an INDRA Statement annotation).
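The annotation layout described above (a first line of sentences followed by context lines) can be sketched as a small parser. This is only an illustration of the format: split_annotation and CONTEXT_TYPES are hypothetical names introduced here, and the processor's actual parsing and grounding of context entries may differ.

```python
# Context types listed in the curation tutorial, lowercased for comparison.
CONTEXT_TYPES = {"cell type", "cell line", "disease", "organ", "location", "species"}


def split_annotation(text):
    """Split annotation text into (statement_text, [(context_type, context_text), ...])."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "", []
    # The first line holds the English sentence(s) describing the mechanism.
    statement_text = lines[0]
    contexts = []
    for line in lines[1:]:
        ctx_type, sep, ctx_text = line.partition(":")
        if not sep or ctx_type.strip().lower() not in CONTEXT_TYPES:
            # Skip lines that do not follow "<context type>: <context text>".
            continue
        contexts.append((ctx_type.strip(), ctx_text.strip()))
    return statement_text, contexts


example = "AMPK activates STAT3.\nOrgan: liver\nDisease: prostate cancer"
statement_text, contexts = split_annotation(example)
print(statement_text)
print(contexts)
```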

Curating grounding

Generally, grounding annotations are only needed if INDRA’s current resources (reading systems, grounding mapping, Gilda, etc.) don’t contain a given synonym for an entity of interest.

With the hypothes.is browser plug-in, select some text on a website that contains lexical information about an entity or concept of interest. The content of the new annotation can contain one or more lines, each with the following syntax: [text to ground] -> <db_name1>:<db_id1>|<db_name2>:<db_id2>|… In each case, db_name is a grounding database namespace such as HGNC or CHEBI, and db_id is a value within that namespace such as 1097 or CHEBI:63637. Example: [AMPK] -> FPLX:AMPK.

The annotation needs to be tagged with gilda for the processor to know that it needs to be interpreted as a grounding annotation.
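To make the grounding syntax concrete, here is a minimal sketch of how one such line could be parsed. The function parse_grounding_line and its regular expression are assumptions written for this example; they are not the module's own parse_grounding_entry, whose behavior may differ.

```python
import re

# Matches "[text to ground] -> <db_name1>:<db_id1>|<db_name2>:<db_id2>|..."
ENTRY_PATTERN = re.compile(r"^\[(?P<text>.+?)\]\s*->\s*(?P<refs>.+)$")


def parse_grounding_line(line):
    """Return {entity_text: {db_name: db_id, ...}}, or None if the line does not match."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    refs = {}
    for part in match.group("refs").split("|"):
        # Split on the first colon only, so IDs such as CHEBI:63637 stay intact.
        db_name, sep, db_id = part.strip().partition(":")
        if sep and db_name and db_id:
            refs[db_name] = db_id
    return {match.group("text"): refs}


print(parse_grounding_line("[AMPK] -> FPLX:AMPK"))
print(parse_grounding_line("[some entity] -> HGNC:1097|CHEBI:CHEBI:63637"))
```

Splitting on the first colon is a deliberate choice in this sketch, because some namespaces embed their own prefix in the identifier.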

Hypothes.is API (indra.sources.hypothesis.api)

indra.sources.hypothesis.api.process_annotations(group=None, reader=None, grounder=None)[source]

Process annotations in hypothes.is in a given group.

Parameters:
  • group (Optional[str]) – The hypothes.is key of the group (not its name). If not given, the HYPOTHESIS_GROUP configuration in the config file or an environment variable is used.
  • reader (Optional[function]) – A handle for a function which takes a single str argument (text to process) and returns a processor object with a statements attribute containing INDRA Statements. By default, the REACH reader’s process_text function is used with default parameters. Note that if the function requires extra parameters other than the input text, functools.partial can be used to set those.
  • grounder (Optional[function]) – A handle for a function which takes a positional str argument (entity text to ground) and an optional context keyword argument and returns a list of objects matching the structure of gilda.grounder.ScoredMatch. By default, Gilda’s ground function is used for grounding.
Returns:

A HypothesisProcessor object which contains a list of extracted INDRA Statements in its statements attribute, and a list of extracted grounding curations in its groundings attribute.

Return type:

HypothesisProcessor
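The functools.partial note for the reader parameter can be illustrated with a stand-in reader. In this sketch, toy_reader and its uppercase option are hypothetical and exist only to show the wrapping pattern; a real reader would return INDRA Statements rather than strings. The commented-out call at the end assumes INDRA is installed and HYPOTHESIS_API_KEY is configured, and uses a placeholder group ID.

```python
from functools import partial
from types import SimpleNamespace


def toy_reader(text, uppercase=False):
    """Stand-in reader: returns an object with a `statements` attribute."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if uppercase:
        sentences = [s.upper() for s in sentences]
    return SimpleNamespace(statements=sentences)


# Fix the extra parameter so the result takes a single str argument,
# which is the call shape the reader parameter expects.
reader = partial(toy_reader, uppercase=True)
result = reader("AMPK activates STAT3.")
print(result.statements)

# Not executed here; requires INDRA, network access, and a configured API key:
# from indra.sources.hypothesis.api import process_annotations
# hp = process_annotations(group="your-group-id", reader=reader)
# hp.statements, hp.groundings
```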

Hypothes.is Processor (indra.sources.hypothesis.processor)

class indra.sources.hypothesis.processor.HypothesisProcessor(annotations, reader=None, grounder=None)[source]

Processes hypothes.is annotations into INDRA Statements or groundings.

Parameters:
  • annotations (list[dict]) – A list of annotations fetched from hypothes.is in JSON-deserialized form represented as a list of dicts.
  • reader (Optional[function]) – A handle for a function which takes a single str argument (text to process) and returns a processor object with a statements attribute containing INDRA Statements. By default, the REACH reader’s process_text function is used with default parameters. Note that if the function requires extra parameters other than the input text, functools.partial can be used to set those.
  • grounder (Optional[function]) – A handle for a function which takes a positional str argument (entity text to ground) and an optional context keyword argument and returns a list of objects matching the structure of gilda.grounder.ScoredMatch. By default, Gilda’s ground function is used for grounding.
statements

A list of INDRA Statements extracted from the given annotations.

Type: list[indra.statements.Statement]
groundings

A dict of entity text keys with an associated dict of grounding references.

Type: dict
extract_groundings()[source]

Sets the groundings attribute to the extracted groundings.

extract_statements()[source]

Sets statements attribute to list of extracted INDRA Statements.

static groundings_from_annotation(annotation)[source]

Return a dict of groundings from a single annotation.

stmts_from_annotation(annotation)[source]

Return a list of Statements extracted from a single annotation.

indra.sources.hypothesis.processor.get_text_refs(url)[source]

Return the text reference dict parsed from a URL.

indra.sources.hypothesis.processor.parse_context_entry(entry, grounder, sentence=None)[source]

Return a dict of context type and object processed from an entry.

indra.sources.hypothesis.processor.parse_grounding_entry(entry)[source]

Return a dict representing a single grounding curation entry string.