Sparser (indra.sources.sparser)

Sparser API (indra.sources.sparser.api)

Provides an API used to run and get Statements from the Sparser reading system.

indra.sources.sparser.api.get_version()

Return the version of the Sparser executable on the path.

Returns

version – The version of Sparser that is found on the Sparser path.

Return type

str

indra.sources.sparser.api.make_nxml_from_text(text)

Return raw text wrapped in NXML structure.

Parameters

text (str) – The raw text content to be wrapped in an NXML structure.

Returns

nxml_str – The NXML string wrapping the raw text input.

Return type

str
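
To illustrate what wrapping raw text in an NXML structure can look like, here is a minimal sketch using only the standard library. The helper name wrap_text_as_nxml and the element layout (article/body/sec/p) are assumptions made for illustration; the template that make_nxml_from_text actually emits may differ.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def wrap_text_as_nxml(text):
    """Wrap raw text in a minimal NXML-like envelope (hypothetical layout)."""
    # Escape &, < and > so the raw text cannot break the XML structure.
    body = escape(text)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<article><body><sec id="s1"><p>%s</p></sec></body></article>'
            % body)


nxml_str = wrap_text_as_nxml('MEK phosphorylates ERK & activates it.')
# Parse the result to confirm it is well-formed and keeps the original text.
root = ET.fromstring(nxml_str.encode('utf-8'))
paragraph_text = root.find('./body/sec/p').text
```

Escaping before wrapping matters because characters such as & or < in the raw text would otherwise produce malformed XML.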

indra.sources.sparser.api.process_json_dict(json_dict)

Return processor with Statements extracted from a Sparser JSON.

Parameters

json_dict (dict) – The JSON object obtained by reading content with Sparser, using the ‘json’ output mode.

Returns

sp – A SparserJSONProcessor which has extracted Statements as its statements attribute.

Return type

SparserJSONProcessor

indra.sources.sparser.api.process_nxml_file(fname, output_fmt='json', outbuf=None, cleanup=True, **kwargs)

Return processor with Statements extracted by reading an NXML file.

Parameters
  • fname (str) – The path to the NXML file to be read.

  • output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’

  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.

  • cleanup (Optional[bool]) – If True, the output file created by Sparser is removed. Default: True

Returns

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.

indra.sources.sparser.api.process_nxml_str(nxml_str, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)

Return processor with Statements extracted by reading an NXML string.

Parameters
  • nxml_str (str) – The string value of the NXML-formatted paper to be read.

  • output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’

  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.

  • cleanup (Optional[bool]) – If True, both the temporary input file created by this function for Sparser and the output file created by Sparser are removed. Default: True

  • key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default is empty string.

Returns

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
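
The roles of key and cleanup can be sketched with a temporary file whose name embeds the key and which is removed once reading finishes. This is only an illustration under stated assumptions: read_nxml_str and the reader callable are hypothetical names, and the way INDRA actually names and manages its temporary file may differ. The PMC prefix follows the file-name requirement described for run_sparser.

```python
import os
import tempfile


def read_nxml_str(nxml_str, reader, key='', cleanup=True):
    """Write nxml_str to a temporary PMC-prefixed file and pass it to reader.

    `reader` is a hypothetical callable taking a file path; it stands in
    for the actual Sparser invocation.
    """
    # Embed the key in the file name, after the PMC prefix.
    fd, tmp_path = tempfile.mkstemp(prefix='PMC%s_' % key, suffix='.nxml')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as fh:
            fh.write(nxml_str)
        return reader(tmp_path)
    finally:
        # Remove the temporary input file unless the caller wants to keep it.
        if cleanup and os.path.exists(tmp_path):
            os.remove(tmp_path)


seen = {}


def fake_reader(path):
    """Stand-in reader that records the path and echoes the file content."""
    seen['path'] = path
    with open(path, encoding='utf-8') as fh:
        return fh.read()


result = read_nxml_str('<article/>', fake_reader, key='123')
```

Doing the removal in a finally clause means the temporary file is cleaned up even when the reader raises an exception.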

indra.sources.sparser.api.process_sparser_output(output_fname, output_fmt='json')

Return a processor with Statements extracted from Sparser XML or JSON output.

Parameters
  • output_fname (str) – The path to the Sparser output file to be processed. The file can either be JSON or XML output from Sparser, with the output_fmt parameter defining what format is assumed to be processed.

  • output_fmt (Optional[str]) – The format of the Sparser output to be processed, can either be ‘json’ or ‘xml’. Default: ‘json’

Returns

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
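
The format-dependent handling of an output file can be sketched as a small dispatcher. The function name load_reader_output is hypothetical, and the file content used below is a placeholder rather than real Sparser output; the sketch only mirrors the documented inputs of process_json_dict (a dict) and process_xml (a string).

```python
import json
import os
import tempfile


def load_reader_output(output_fname, output_fmt='json'):
    """Load reader output from a file according to the declared format."""
    if output_fmt not in ('json', 'xml'):
        raise ValueError('Unsupported output format: %s' % output_fmt)
    with open(output_fname, 'r', encoding='utf-8') as fh:
        content = fh.read()
    # JSON output is parsed into Python objects; XML is kept as a string.
    return json.loads(content) if output_fmt == 'json' else content


# Write placeholder JSON to a temporary file and load it back.
fd, path = tempfile.mkstemp(suffix='.json')
with os.fdopen(fd, 'w', encoding='utf-8') as fh:
    fh.write('{"example": 1}')
loaded = load_reader_output(path, output_fmt='json')
os.remove(path)
```

Rejecting unknown formats up front avoids silently treating a mislabeled file as XML.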

indra.sources.sparser.api.process_text(text, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)

Return processor with Statements extracted by reading text with Sparser.

Parameters
  • text (str) – The text to be processed.

  • output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’

  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.

  • cleanup (Optional[bool]) – If True, both the temporary input file created for Sparser and the output file created by Sparser are removed. Default: True

  • key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default is empty string.

Returns

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
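
A possible way to call process_text and collect the resulting Statements is sketched below. The wrapper name extract_statements is hypothetical, and returning an empty list when no processor comes back is a defensive assumption rather than documented behavior. Running it for real requires INDRA and a working Sparser installation.

```python
def extract_statements(text, output_fmt='json'):
    """Read text with Sparser via INDRA and return the extracted Statements."""
    # Imported lazily so this module can be loaded without INDRA installed.
    from indra.sources.sparser import api as sparser_api

    sp = sparser_api.process_text(text, output_fmt=output_fmt)
    # Assumption: treat a missing processor as "no Statements extracted".
    if sp is None:
        return []
    # The processor exposes the extracted Statements as `statements`.
    return sp.statements
```

With both dependencies available, a call such as extract_statements('MEK phosphorylates ERK.') would return the list held in the processor's statements attribute.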

indra.sources.sparser.api.process_xml(xml_str)

Return processor with Statements extracted from a Sparser XML.

Parameters

xml_str (str) – The XML string obtained by reading content with Sparser, using the ‘xml’ output mode.

Returns

sp – A SparserXMLProcessor which has extracted Statements as its statements attribute.

Return type

SparserXMLProcessor

indra.sources.sparser.api.run_sparser(fname, output_fmt, outbuf=None, timeout=600)

Return the path to the reading output after running Sparser.

Parameters
  • fname (str) – The path to an input file to be processed. Due to the Sparser executable’s assumptions, the file name must start with PMC, and the file should be NXML-formatted.

  • output_fmt (str) – The format in which Sparser should produce its output, can either be ‘json’ or ‘xml’.

  • outbuf (Optional[file]) – A file-like object that the Sparser output is written to.

  • timeout (int) – The number of seconds to wait before giving up on a single reading. Default: 600 (i.e., 10 minutes). Sparser is a fast reader, and reading a single full text typically takes a matter of seconds.

Returns

output_path – The path to the output file created by Sparser.

Return type

str
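
The timeout behavior described above can be illustrated with the standard library's subprocess module. This is a generic sketch, not the way run_sparser is necessarily implemented: run_with_timeout is a hypothetical helper, and the Python interpreter is used as a stand-in command because the actual Sparser command line is not given here.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=600):
    """Run cmd and return its stdout, or None if it exceeds the timeout."""
    try:
        completed = subprocess.run(cmd, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, timeout=timeout,
                                   check=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return completed.stdout.decode('utf-8')


# Stand-in commands: a quick one that finishes and a slow one that times out.
quick = run_with_timeout([sys.executable, '-c', 'print("done")'], timeout=30)
slow = run_with_timeout(
    [sys.executable, '-c', 'import time; time.sleep(5)'], timeout=0.5)
```

A generous default such as 600 seconds mainly guards against a hung process, since a typical reading finishes in seconds.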