Sparser (indra.sources.sparser)

Sparser API (indra.sources.sparser.api)

Provides an API used to run and get Statements from the Sparser reading system.

indra.sources.sparser.api.process_text(text, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)

Return processor with Statements extracted by reading text with Sparser.

Parameters:
  • text (str) – The text to be processed
  • output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
  • cleanup (Optional[bool]) – If True, both the temporary file created as input for Sparser and the output file created by Sparser are removed. Default: True
  • key (Optional[str]) – A key that is embedded into the name of the temporary file passed to Sparser for reading. Default: empty string
Returns:

  • SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
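
A minimal usage sketch for process_text follows. It assumes INDRA and a working Sparser installation are available. The wrapper name read_text is hypothetical, and treating a None return value as "no Statements" is an assumption rather than documented behavior. The API function can be passed in explicitly, so the wrapper can be exercised without Sparser.

```python
def read_text(text, process_text=None, output_fmt='json'):
    """Return the Statements Sparser extracts from text as a list.

    read_text is a hypothetical helper, not part of INDRA.
    """
    if output_fmt not in ('json', 'xml'):
        raise ValueError("output_fmt must be 'json' or 'xml'")
    if process_text is None:
        # Imported lazily so the helper can be defined without INDRA installed.
        from indra.sources.sparser.api import process_text
    sp = process_text(text, output_fmt=output_fmt)
    # Assumption: a failed reading may yield None instead of a processor.
    if sp is None:
        return []
    # The processor exposes the extracted Statements as `statements`.
    return list(sp.statements)
```

With INDRA installed, calling read_text('MEK phosphorylates ERK.') would run the reader with the default JSON output format.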

indra.sources.sparser.api.process_nxml_str(nxml_str, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)

Return processor with Statements extracted by reading an NXML string.

Parameters:
  • nxml_str (str) – The string value of the NXML-formatted paper to be read.
  • output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
  • cleanup (Optional[bool]) – If True, both the temporary file created in this function as input for Sparser and the output file created by Sparser are removed. Default: True
  • key (Optional[str]) – A key that is embedded into the name of the temporary file passed to Sparser for reading. Default: empty string
Returns:

  • SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.

indra.sources.sparser.api.process_nxml_file(fname, output_fmt='json', outbuf=None, cleanup=True, **kwargs)

Return processor with Statements extracted by reading an NXML file.

Parameters:
  • fname (str) – The path to the NXML file to be read.
  • output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
  • cleanup (Optional[bool]) – If True, the output file created by Sparser is removed. Default: True
Returns:

  • sp (SparserXMLProcessor or SparserJSONProcessor) – The processor type depends on which output format was chosen.
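
As an illustration of calling process_nxml_file on several papers, the sketch below reads every matching file in a directory. The helper name read_nxml_dir, the PMC*.nxml naming pattern, and the handling of a None return value are assumptions made for this example.

```python
import glob
import os


def read_nxml_dir(dirname, process_nxml_file=None):
    """Map each PMC*.nxml path in dirname to its list of Statements.

    read_nxml_dir is a hypothetical helper, not part of INDRA.
    """
    if process_nxml_file is None:
        # Imported lazily so the helper can be defined without INDRA installed.
        from indra.sources.sparser.api import process_nxml_file
    results = {}
    # Assumption: input files are named like PMC<id>.nxml.
    for path in sorted(glob.glob(os.path.join(dirname, 'PMC*.nxml'))):
        sp = process_nxml_file(path, output_fmt='json', cleanup=True)
        # Assumption: None signals that reading this file failed.
        results[path] = [] if sp is None else list(sp.statements)
    return results
```

Returning a dict keyed by path keeps failed or empty readings visible instead of silently dropping them.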

indra.sources.sparser.api.process_sparser_output(output_fname, output_fmt='json')

Return a processor with Statements extracted from Sparser XML or JSON output.

Parameters:
  • output_fname (str) – The path to the Sparser output file to be processed. The file can be either JSON or XML output from Sparser; the output_fmt parameter determines which format the file is assumed to be in.
  • output_fmt (Optional[str]) – The format of the Sparser output to be processed, either ‘json’ or ‘xml’. Default: ‘json’
Returns:

  • sp (SparserXMLProcessor or SparserJSONProcessor) – The processor type depends on which output format was chosen.
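
Because output_fmt has to match the file's actual format, one option is to derive it from the file name before calling process_sparser_output. The sketch below assumes the output files carry a .json or .xml extension, which this documentation does not guarantee. Both helper names are hypothetical.

```python
import os


def guess_output_fmt(output_fname):
    """Guess the Sparser output format from the file extension."""
    # Assumption: output files end in .json or .xml.
    fmts = {'.json': 'json', '.xml': 'xml'}
    ext = os.path.splitext(output_fname)[1].lower()
    if ext not in fmts:
        raise ValueError('Cannot infer output format from %r' % output_fname)
    return fmts[ext]


def load_sparser_output(output_fname, process_sparser_output=None):
    """Process a Sparser output file using the guessed format."""
    if process_sparser_output is None:
        # Imported lazily so the helper can be defined without INDRA installed.
        from indra.sources.sparser.api import process_sparser_output
    return process_sparser_output(output_fname,
                                  output_fmt=guess_output_fmt(output_fname))
```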

indra.sources.sparser.api.process_json_dict(json_dict)

Return processor with Statements extracted from a Sparser JSON.

Parameters:
  • json_dict (dict) – The JSON object obtained by reading content with Sparser, using the ‘json’ output mode.
Returns:

sp – A SparserJSONProcessor which has extracted Statements as its statements attribute.

Return type:

SparserJSONProcessor

indra.sources.sparser.api.process_xml(xml_str)

Return processor with Statements extracted from a Sparser XML.

Parameters:
  • xml_str (str) – The XML string obtained by reading content with Sparser, using the ‘xml’ output mode.
Returns:

sp – A SparserXMLProcessor which has extracted Statements as its statements attribute.

Return type:

SparserXMLProcessor

indra.sources.sparser.api.run_sparser(fname, output_fmt, outbuf=None, timeout=600)

Return the path to reading output after running Sparser reading.

Parameters:
  • fname (str) – The path to an input file to be processed. Due to the Sparser executable’s assumptions, the file name needs to start with PMC and the file should be NXML-formatted.
  • output_fmt (str) – The format in which Sparser should produce its output, either ‘json’ or ‘xml’.
  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
  • timeout (int) – The number of seconds to wait before giving up on a single reading. The default is 600 seconds (i.e., 10 minutes). Sparser is a fast reader, and reading a single full text typically takes a matter of seconds.
Returns:

output_path – The path to the output file created by Sparser.

Return type:

str
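
Since run_sparser expects a file name starting with PMC, a caller may want to check this before launching the reader. The following is a sketch under that assumption only. The helper names check_sparser_input and run_with_timeout are hypothetical, and the 60-second timeout is an arbitrary illustrative value.

```python
import os


def check_sparser_input(fname):
    """Return fname if its base name starts with 'PMC', else raise."""
    if not os.path.basename(fname).startswith('PMC'):
        raise ValueError('Sparser input file name must start with PMC: %r'
                         % fname)
    return fname


def run_with_timeout(fname, run_sparser=None, timeout=60):
    """Run Sparser in JSON mode on a validated input path."""
    if run_sparser is None:
        # Imported lazily so the helper can be defined without INDRA installed.
        from indra.sources.sparser.api import run_sparser
    # output_fmt is passed positionally, matching the documented signature.
    return run_sparser(check_sparser_input(fname), 'json', timeout=timeout)
```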

indra.sources.sparser.api.get_version()

Return the version of the Sparser executable on the path.

Returns:

version – The version of Sparser that is found on the Sparser path.

Return type:

str

indra.sources.sparser.api.make_nxml_from_text(text)

Return raw text wrapped in NXML structure.

Parameters:
  • text (str) – The raw text content to be wrapped in an NXML structure.
Returns:

nxml_str – The NXML string wrapping the raw text input.

Return type:

str
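
To illustrate the general idea of wrapping raw text in an NXML-like structure, here is a standalone sketch. It is not INDRA's implementation: the function name wrap_text_as_nxml and the article/body/sec/p element layout are assumptions chosen for the example. The main point is that the raw text must be XML-escaped before it is embedded.

```python
from xml.sax.saxutils import escape


def wrap_text_as_nxml(text):
    """Wrap raw text in a minimal article/body/sec/p structure.

    Hypothetical illustration; the element layout is an assumption.
    """
    # Escape &, < and > so the raw text cannot break the XML structure.
    body = escape(text)
    return ('<article><body><sec><p>' + body +
            '</p></sec></body></article>')
```

A string produced this way can be parsed by a standard XML parser, and the original text is recovered unchanged from the p element.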