Sparser (indra.sources.sparser)
Sparser API (indra.sources.sparser.api)
Provides an API used to run and get Statements from the Sparser reading system.
- indra.sources.sparser.api.get_version()[source]
Return the version of the Sparser executable on the path.
- Returns
version – The version of Sparser that is found on the Sparser path.
- Return type
- indra.sources.sparser.api.make_nxml_from_text(text)[source]
Return raw text wrapped in NXML structure.
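As an illustration of what wrapping raw text in an NXML structure can involve, the sketch below escapes XML special characters and places the text inside a minimal article skeleton. The helper name wrap_text_as_nxml and the exact element layout are assumptions made for this example; the markup that make_nxml_from_text actually produces may differ.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_text_as_nxml(text):
    """Hypothetical helper: wrap raw text in a minimal NXML-like skeleton."""
    # Escape &, < and > so the text cannot break the surrounding markup.
    body = escape(text)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<article><body><sec id="s1"><p>' + body +
            '</p></sec></body></article>')


nxml = wrap_text_as_nxml('MEK phosphorylates ERK & p38 at T<185.')
# Parse the result to confirm it is well-formed XML and that the
# original sentence is recovered unchanged from the paragraph element.
paragraph = ET.fromstring(nxml.encode('utf-8')).find('./body/sec/p')
print(paragraph.text)
```

Escaping matters because biomedical text often contains characters such as & and <, which would otherwise make the wrapped document unparseable.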
- indra.sources.sparser.api.process_json_dict(json_dict)[source]
Return processor with Statements extracted from a Sparser JSON.
- Parameters
json_dict (dict) – The JSON object obtained by reading content with Sparser, using the ‘json’ output mode.
- Returns
sp – A SparserJSONProcessor which has extracted Statements as its statements attribute.
- Return type
SparserJSONProcessor
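A minimal sketch of how previously saved Sparser JSON output might be loaded and handed to process_json_dict. The wrapper name statements_from_json_file and its processor_factory parameter are hypothetical, added so the logic can be exercised without INDRA installed; treating a missing processor as an empty result is also an assumption.

```python
import json


def statements_from_json_file(path, processor_factory=None):
    """Hypothetical wrapper: load Sparser JSON output, return Statements."""
    if processor_factory is None:
        # Imported lazily so this module loads without INDRA installed.
        from indra.sources.sparser.api import process_json_dict
        processor_factory = process_json_dict
    with open(path, 'r', encoding='utf-8') as fh:
        json_dict = json.load(fh)
    sp = processor_factory(json_dict)
    # Assumption: if no processor comes back, report no Statements.
    return [] if sp is None else list(sp.statements)
```

Called as statements_from_json_file(path) with INDRA available, this returns the list held in the processor's statements attribute.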
- indra.sources.sparser.api.process_nxml_file(fname, output_fmt='json', outbuf=None, cleanup=True, **kwargs)[source]
Return processor with Statements extracted by reading an NXML file.
- Parameters
fname (str) – The path to the NXML file to be read.
output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
cleanup (Optional[bool]) – If True, the output file created by Sparser is removed. Default: True
- Returns
sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
- indra.sources.sparser.api.process_nxml_str(nxml_str, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)[source]
Return processor with Statements extracted by reading an NXML string.
- Parameters
nxml_str (str) – The string value of the NXML-formatted paper to be read.
output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
cleanup (Optional[bool]) – If True, both the temporary file created in this function (used as the input file for Sparser) and the output file created by Sparser are removed. Default: True
key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default is empty string.
- Returns
A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
- indra.sources.sparser.api.process_sparser_output(output_fname, output_fmt='json')[source]
Return a processor with Statements extracted from Sparser XML or JSON output.
- Parameters
output_fname (str) – The path to the Sparser output file to be processed. The file can either be JSON or XML output from Sparser, with the output_fmt parameter defining what format is assumed to be processed.
output_fmt (Optional[str]) – The format of the Sparser output to be processed, can either be ‘json’ or ‘xml’. Default: ‘json’
- Returns
sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
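Because output_fmt tells process_sparser_output which format to assume, a caller may want to derive it from the file name. The sketch below is one way to do that; the helper guess_output_fmt is hypothetical, and it assumes the output files carry a .json or .xml extension, which the documentation above does not guarantee.

```python
import os


def guess_output_fmt(output_fname):
    """Hypothetical helper: infer 'json' or 'xml' from a file extension."""
    ext = os.path.splitext(output_fname)[1].lower()
    formats = {'.json': 'json', '.xml': 'xml'}
    if ext not in formats:
        # Refuse to guess rather than silently assume a format.
        raise ValueError('Cannot infer Sparser output format from %r'
                         % output_fname)
    return formats[ext]
```

The result can then be passed as the output_fmt argument, for example process_sparser_output(fname, output_fmt=guess_output_fmt(fname)).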
- indra.sources.sparser.api.process_text(text, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)[source]
Return processor with Statements extracted by reading text with Sparser.
- Parameters
text (str) – The text to be processed.
output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
cleanup (Optional[bool]) – If True, both the temporary file created as the input file for Sparser and the output file created by Sparser are removed. Default: True
key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default is empty string.
- Returns
SparserXMLProcessor or SparserJSONProcessor depending on what output
format was chosen.
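The key parameter can be used to tell apart the temporary files of several reading jobs. The sketch below reads a list of texts, giving each one its own key. The function name read_texts, the 'doc%d' naming scheme, and the reader parameter (which lets a stand-in replace process_text) are assumptions for illustration, as is treating a missing processor as an empty result.

```python
def read_texts(texts, reader=None):
    """Hypothetical helper: read each text under its own key.

    Returns a dict mapping each key to the list of extracted Statements.
    """
    if reader is None:
        # Imported lazily so this module loads without INDRA installed.
        from indra.sources.sparser.api import process_text
        reader = process_text
    results = {}
    for idx, text in enumerate(texts):
        key = 'doc%d' % idx  # hypothetical naming scheme
        sp = reader(text, output_fmt='json', key=key)
        # Assumption: if no processor comes back, report no Statements.
        results[key] = [] if sp is None else list(sp.statements)
    return results
```

With Sparser installed and on the path, read_texts(['MEK phosphorylates ERK.']) would return a dict with the single key 'doc0'.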
- indra.sources.sparser.api.process_xml(xml_str)[source]
Return processor with Statements extracted from a Sparser XML.
- Parameters
xml_str (str) – The XML string obtained by reading content with Sparser, using the ‘xml’ output mode.
- Returns
sp – A SparserXMLProcessor which has extracted Statements as its statements attribute.
- Return type
SparserXMLProcessor
- indra.sources.sparser.api.run_sparser(fname, output_fmt, outbuf=None, timeout=600)[source]
Return the path to reading output after running Sparser reading.
- Parameters
fname (str) – The path to an input file to be processed. Due to the Sparser executable’s assumptions, the file name needs to start with PMC and should be an NXML formatted file.
output_fmt (Optional[str]) – The format in which Sparser should produce its output, can either be ‘json’ or ‘xml’.
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
timeout (int) – The number of seconds to wait until giving up on this one reading. The default is 600 seconds (i.e. 10 minutes). Sparser is a fast reader and the typical time to read a single full text is a matter of seconds.
- Returns
output_path – The path to the output file created by Sparser.
- Return type
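Since run_sparser expects the input file name to start with PMC, a caller holding an arbitrarily named NXML file may need to copy it first. The sketch below shows one way to do that with the standard library; the helper copy_with_pmc_prefix and its target_dir parameter are hypothetical and not part of the Sparser API.

```python
import os
import shutil
import tempfile


def copy_with_pmc_prefix(fname, target_dir=None):
    """Hypothetical helper: return a path whose base name starts with PMC.

    If the name already qualifies, the original path is returned;
    otherwise the file is copied under a PMC-prefixed name.
    """
    base = os.path.basename(fname)
    if base.startswith('PMC'):
        return fname
    # Fall back to a fresh temporary directory when none is given.
    target_dir = target_dir or tempfile.mkdtemp()
    new_path = os.path.join(target_dir, 'PMC' + base)
    shutil.copyfile(fname, new_path)
    return new_path
```

The returned path could then be passed as fname, for example run_sparser(copy_with_pmc_prefix('paper.nxml'), 'json', timeout=60), where the shorter timeout is an arbitrary choice for the example.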