Util (indra.util)

Utilities for using AWS (indra.util.aws)

indra.util.aws.dump_logs(job_queue='run_reach_queue', job_status='RUNNING')

Write logs for all jobs with the given status to files.

indra.util.aws.get_batch_command(command_list, project=None, purpose=None)

Get the command appropriate for running something on AWS Batch.

indra.util.aws.get_job_log(job_info, log_group_name='/aws/batch/job', write_file=True, verbose=False)

Get the CloudWatch log associated with the given job.

Parameters:

  • job_info (dict) – dict containing entries for ‘jobName’ and ‘jobId’, e.g., as returned by get_jobs()
  • log_group_name (string) – Name of the log group; defaults to ‘/aws/batch/job’
  • write_file (boolean) – If True, writes the downloaded log to a text file with the filename ‘%s_%s.log’ % (job_name, job_id)

Returns:

The event messages in the log, with the earliest events listed first.

Return type:

list of strings
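To make the lookup concrete, here is a minimal sketch of how a Batch job's CloudWatch stream name might be located and how the output filename described above is formed. It assumes a boto3-style Batch client whose describe_jobs response carries the stream name under container['logStreamName']; the helpers find_log_stream_name and log_filename, and the FakeBatch stand-in, are hypothetical and not part of indra.util.aws.

```python
def find_log_stream_name(batch_client, job_info):
    """Return the CloudWatch log stream name for a Batch job, or None.

    Hypothetical helper: assumes a boto3-style Batch client whose
    describe_jobs() response lists jobs with a 'container' entry.
    """
    resp = batch_client.describe_jobs(jobs=[job_info['jobId']])
    jobs = resp.get('jobs', [])
    if not jobs:
        return None
    # Jobs that never started may not have a log stream yet.
    return jobs[0].get('container', {}).get('logStreamName')


def log_filename(job_info):
    """Build the '%s_%s.log' % (job_name, job_id) filename."""
    return '%s_%s.log' % (job_info['jobName'], job_info['jobId'])


class FakeBatch:
    """Offline stand-in for a boto3 Batch client, for illustration only."""
    def describe_jobs(self, jobs):
        return {'jobs': [{'jobId': jobs[0],
                          'container': {'logStreamName': 'jobdef/default/abc'}}]}


job = {'jobName': 'reach_0', 'jobId': '1234'}
print(find_log_stream_name(FakeBatch(), job))  # jobdef/default/abc
print(log_filename(job))  # reach_0_1234.log
```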

indra.util.aws.get_jobs(job_queue='run_reach_queue', job_status='RUNNING')

Returns a list of dicts with jobName and jobId for each job with the given status.
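A sketch of how such a listing can be collected, assuming a boto3-style Batch client: list_jobs returns one page of summaries plus an optional 'nextToken', which is passed back until no further pages remain. The function name list_job_summaries and the FakeBatch stand-in are hypothetical; this is not the INDRA implementation.

```python
def list_job_summaries(batch_client, job_queue='run_reach_queue',
                       job_status='RUNNING'):
    """Collect {'jobName', 'jobId'} dicts for every job with a given status.

    Sketch only: assumes a boto3-style Batch client and follows the
    'nextToken' field until the listing is exhausted.
    """
    jobs = []
    kwargs = {'jobQueue': job_queue, 'jobStatus': job_status}
    while True:
        resp = batch_client.list_jobs(**kwargs)
        for summary in resp.get('jobSummaryList', []):
            jobs.append({'jobName': summary['jobName'],
                         'jobId': summary['jobId']})
        token = resp.get('nextToken')
        if not token:
            break
        # Only send nextToken once we have one; boto3 rejects None values.
        kwargs['nextToken'] = token
    return jobs


class FakeBatch:
    """Offline stand-in returning two pages of results."""
    def list_jobs(self, **kwargs):
        if 'nextToken' not in kwargs:
            return {'jobSummaryList': [{'jobName': 'a', 'jobId': '1'}],
                    'nextToken': 'page2'}
        return {'jobSummaryList': [{'jobName': 'b', 'jobId': '2'}]}


print(list_job_summaries(FakeBatch()))
# [{'jobName': 'a', 'jobId': '1'}, {'jobName': 'b', 'jobId': '2'}]
```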

indra.util.aws.get_log_by_name(log_group_name, log_stream_name, out_file=None, verbose=True)

Download a log given the log’s group and stream name.

Parameters:

  • log_group_name (str) – The name of the log group, e.g. /aws/batch/job.
  • log_stream_name (str) – The name of the log stream, e.g. run_reach_jobdef/default/<UUID>

Returns:

lines – The lines of the log as a list.

Return type:

list

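Downloading a whole stream generally requires paging. The sketch below assumes a boto3-style CloudWatch Logs client: get_log_events with startFromHead=True returns events oldest first, and the end of the stream is signalled by receiving the same 'nextForwardToken' that was sent. The name fetch_log_lines and the FakeLogs stand-in are hypothetical; writing to out_file is left out.

```python
def fetch_log_lines(logs_client, log_group_name, log_stream_name):
    """Return all messages in a CloudWatch log stream, oldest first.

    Sketch: assumes a boto3-style CloudWatch Logs client. get_log_events
    signals the end of the stream by returning the same forward token
    that was passed in.
    """
    lines = []
    kwargs = {'logGroupName': log_group_name,
              'logStreamName': log_stream_name,
              'startFromHead': True}
    while True:
        resp = logs_client.get_log_events(**kwargs)
        lines.extend(event['message'] for event in resp['events'])
        token = resp.get('nextForwardToken')
        # A repeated token means no newer events are available.
        if token is None or token == kwargs.get('nextToken'):
            break
        kwargs['nextToken'] = token
    return lines


class FakeLogs:
    """Offline stand-in serving two pages, then an empty final page."""
    pages = {None: (['start'], 'f1'), 'f1': (['done'], 'f2'), 'f2': ([], 'f2')}

    def get_log_events(self, **kwargs):
        messages, token = self.pages[kwargs.get('nextToken')]
        return {'events': [{'message': m} for m in messages],
                'nextForwardToken': token}


print(fetch_log_lines(FakeLogs(), '/aws/batch/job', 'jobdef/default/abc'))
# ['start', 'done']
```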
indra.util.aws.get_s3_file_tree(s3, bucket, prefix)

Overcome the S3 response limit (at most 1,000 keys per listing request) and return a NestedDict tree of paths.

The NestedDict object also allows the user to search by the ends of a path.

The tree mimics a file directory structure, with the leaf nodes being the full unbroken key. For example, ‘path/to/file.txt’ would be retrieved by


The NestedDict object returned also has the capability to get paths that lead to a certain value. So if you wanted all paths that lead to something called ‘file.txt’, you could use


For more details, see the NestedDict docs.
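The directory-like layout can be sketched with a recursive defaultdict standing in for NestedDict. This is an illustration under an assumed layout: every path segment becomes one level and the last segment maps to the full key string; the exact leaf layout returned by get_s3_file_tree may differ. The helpers tree and build_key_tree are hypothetical, and a key that is also a prefix of another key (e.g. an object named 'path/to') would collide in this simple version.

```python
from collections import defaultdict


def tree():
    """A recursive defaultdict: missing keys spring into new subtrees."""
    return defaultdict(tree)


def build_key_tree(keys):
    """Arrange flat S3 keys into a directory-like nested dict.

    Assumed layout (hypothetical): every path segment becomes a level
    and the final segment maps to the full, unbroken key.
    """
    root = tree()
    for key in keys:
        *dirs, leaf = key.split('/')
        node = root
        for part in dirs:
            node = node[part]
        node[leaf] = key
    return root


t = build_key_tree(['path/to/file.txt', 'path/other.txt'])
print(t['path']['to']['file.txt'])  # path/to/file.txt
```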

indra.util.aws.iter_s3_keys(s3, bucket, prefix)

Iterate over the keys in an S3 bucket given a prefix.
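One common way to iterate past the per-request limit is to follow list_objects_v2 continuation tokens, as sketched below for a boto3-style S3 client. The generator name iter_keys and the FakeS3 stand-in are hypothetical; boto3 also offers s3.get_paginator('list_objects_v2'), which performs the same loop for you.

```python
def iter_keys(s3_client, bucket, prefix):
    """Yield every key under a prefix, following list_objects_v2 pagination.

    Sketch: a single list_objects_v2 call returns at most 1,000 keys, so
    the ContinuationToken is passed back until IsTruncated is False.
    """
    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        resp = s3_client.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            yield obj['Key']
        if not resp.get('IsTruncated'):
            break
        kwargs['ContinuationToken'] = resp['NextContinuationToken']


class FakeS3:
    """Offline stand-in that splits three keys across two pages."""
    def list_objects_v2(self, **kwargs):
        if 'ContinuationToken' not in kwargs:
            return {'Contents': [{'Key': 'p/a'}, {'Key': 'p/b'}],
                    'IsTruncated': True, 'NextContinuationToken': 't1'}
        return {'Contents': [{'Key': 'p/c'}], 'IsTruncated': False}


print(list(iter_keys(FakeS3(), 'my-bucket', 'p/')))  # ['p/a', 'p/b', 'p/c']
```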

indra.util.aws.kill_all(job_queue, reason='None given', states=None, kill_list=None)

Terminates/cancels all jobs on the specified queue.

Parameters:

  • job_queue (str) – The name of the Batch job queue on which you wish to terminate/cancel jobs.
  • reason (str) – Provide a reason for the kill that will be recorded with the job’s record on AWS.
  • states (None or list[str]) – A list of job states to remove. Possible states are ‘STARTING’, ‘RUNNABLE’, and ‘RUNNING’. If None, all jobs in all states will be ended (modulo the kill_list below).
  • kill_list (None or list[dict]) – A list of job dictionaries (as returned by the submit function) that you specifically wish to kill. All other jobs on the queue will be ignored. If None, all jobs on the queue will be ended (modulo the above).

Returns:

killed_ids – A list of the job ids for jobs that were killed.

Return type:

list

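The selection logic can be sketched as follows, assuming a boto3-style Batch client: not-yet-started jobs are cancelled with cancel_job, started ones are ended with terminate_job, and kill_list restricts the run to specific job ids. The name kill_jobs, the default state list, and the FakeBatch stand-in are assumptions, and pagination of list_jobs is omitted for brevity.

```python
# States in which a job has not started and can simply be cancelled.
CANCELLABLE = {'SUBMITTED', 'PENDING', 'RUNNABLE'}


def kill_jobs(batch_client, job_queue, reason='None given', states=None,
              kill_list=None):
    """Cancel or terminate matching jobs; return the ids that were killed.

    Sketch: assumes a boto3-style Batch client; list_jobs pagination
    ('nextToken') is ignored here.
    """
    if states is None:
        states = ['STARTING', 'RUNNABLE', 'RUNNING']
    wanted = None if kill_list is None else {j['jobId'] for j in kill_list}
    killed_ids = []
    for state in states:
        resp = batch_client.list_jobs(jobQueue=job_queue, jobStatus=state)
        for summary in resp.get('jobSummaryList', []):
            job_id = summary['jobId']
            if wanted is not None and job_id not in wanted:
                continue  # Not on the explicit kill list.
            if state in CANCELLABLE:
                batch_client.cancel_job(jobId=job_id, reason=reason)
            else:
                batch_client.terminate_job(jobId=job_id, reason=reason)
            killed_ids.append(job_id)
    return killed_ids


class FakeBatch:
    """Offline stand-in that records which call handled each job."""
    def __init__(self):
        self.calls = []
        self.jobs = {'RUNNABLE': ['r1'], 'RUNNING': ['x1', 'x2']}

    def list_jobs(self, jobQueue, jobStatus):
        return {'jobSummaryList': [{'jobId': j}
                                   for j in self.jobs.get(jobStatus, [])]}

    def cancel_job(self, jobId, reason):
        self.calls.append(('cancel', jobId))

    def terminate_job(self, jobId, reason):
        self.calls.append(('terminate', jobId))


fake = FakeBatch()
print(kill_jobs(fake, 'my-queue', kill_list=[{'jobId': 'r1'}, {'jobId': 'x2'}]))
# ['r1', 'x2']
print(fake.calls)  # [('cancel', 'r1'), ('terminate', 'x2')]
```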
indra.util.aws.tag_instance(instance_id, **tags)

Tag a single EC2 instance.
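EC2's create_tags call expects tags as a list of Key/Value dicts, so keyword arguments like those accepted by tag_instance need converting. A minimal sketch, assuming a boto3-style EC2 client; to_aws_tags, tag_instance_sketch, and FakeEC2 are hypothetical names, and coercing values with str() is an assumption (EC2 tag values are strings).

```python
def to_aws_tags(**tags):
    """Convert keyword arguments to the Key/Value list EC2 expects."""
    return [{'Key': key, 'Value': str(value)} for key, value in tags.items()]


def tag_instance_sketch(ec2_client, instance_id, **tags):
    """Apply tags to one instance via a boto3-style EC2 client."""
    ec2_client.create_tags(Resources=[instance_id], Tags=to_aws_tags(**tags))


class FakeEC2:
    """Offline stand-in that records create_tags calls."""
    def __init__(self):
        self.calls = []

    def create_tags(self, Resources, Tags):
        self.calls.append((Resources, Tags))


ec2 = FakeEC2()
tag_instance_sketch(ec2, 'i-0123456789abcdef0', project='cwc', owner='me')
print(ec2.calls)
# [(['i-0123456789abcdef0'], [{'Key': 'project', 'Value': 'cwc'}, {'Key': 'owner', 'Value': 'me'}])]
```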

indra.util.aws.tag_myself(project='cwc', **other_tags)

Apply tags to the current EC2 instance; intended to be run when INDRA is used on an EC2 instance.

A utility to get the INDRA version (indra.util.get_version)

This tool provides a uniform way to create a robust INDRA version string, both from within Python and from the command line. If possible, the version includes the git commit hash; otherwise, it is marked with ‘UNHASHED’.


Get a dict with useful git info.

indra.util.get_version.get_version(with_git_hash=True, refresh_hash=False)

Get an indra version string, including a git hash.
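The hash-or-UNHASHED behaviour described above can be sketched with git rev-parse. This is an assumption-laden illustration, not INDRA's code: version_string is a hypothetical name, the '-<hash>' and '-UNHASHED' suffix format is assumed, and git runs in repo_dir (the current directory by default) rather than in INDRA's installation directory.

```python
import subprocess


def version_string(base_version, with_git_hash=True, repo_dir=None):
    """Return base_version, optionally suffixed with the current git hash.

    Sketch: the '-<hash>' / '-UNHASHED' suffix format is an assumption.
    """
    if not with_git_hash:
        return base_version
    try:
        commit = subprocess.run(
            ['git', 'rev-parse', 'HEAD'],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or repo_dir is not a git checkout.
        return base_version + '-UNHASHED'
    return '%s-%s' % (base_version, commit)


print(version_string('1.0.0', with_git_hash=False))  # 1.0.0
print(version_string('1.0.0'))
```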

Define NestedDict (indra.util.nested_dict)

class indra.util.nested_dict.NestedDict

A dict-like object that recursively populates elements of a dict.

More specifically, this acts like a recursive defaultdict, allowing, for example:

>>> nd = NestedDict()
>>> nd['a']['b']['c'] = 'foo'

In addition, useful methods have been defined that allow the user to search the data structure. Note that they are not particularly optimized at this time. However, for convenience, you can, for example, simply call get_path to get both the path to a particular key and the value at that key:

>>> nd.get_path('c')
(('a', 'b', 'c'), 'foo')

Similarly:

>>> nd.get_path('b')
(('a', 'b'), NestedDict(
  'c': 'foo'
))

get, gets, and get_paths operate on similar principles, and are documented below.
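The recursive-defaultdict behaviour and a get_path-style search can be sketched in a few lines. MiniNestedDict is a hypothetical stand-in, not INDRA's NestedDict: it relies on dict.__missing__ to create children on demand, searches depth-first in insertion order, and returns (None, None) when the key is absent; the real class's search order and not-found behaviour may differ.

```python
class MiniNestedDict(dict):
    """A dict that creates a child MiniNestedDict for any missing key."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ for absent keys in subclasses.
        child = self.__class__()
        self[key] = child
        return child

    def get_path(self, key):
        """Depth-first search for key; return (path, value) or (None, None)."""
        for k, v in self.items():
            if k == key:
                return (k,), v
            if isinstance(v, MiniNestedDict):
                sub_path, sub_val = v.get_path(key)
                if sub_path is not None:
                    return (k,) + sub_path, sub_val
        return None, None


nd = MiniNestedDict()
nd['a']['b']['c'] = 'foo'
print(nd.get_path('c'))  # (('a', 'b', 'c'), 'foo')
print(nd.get_path('b'))  # (('a', 'b'), {'c': 'foo'})
```

As with any auto-vivifying dict, merely reading a missing key (nd['x']) creates an empty entry, so lookups meant to be side-effect free should go through a search method rather than indexing.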


Convert this into an ordinary dict (of dicts).


Find the first value within the tree which has the key.


Get the deepest entries as a flat set.


Like get, but also return the path taken to the value.


Like gets, but include the paths, like get_path for all matches.


Like get, but return all matches, not just the first.
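The "all matches" and "deepest entries" searches described above can be sketched as standalone recursive functions over plain nested dicts. These are hypothetical helpers, not NestedDict's methods: gets keeps searching inside a matching subtree (so nested matches are also returned), and get_leaves assumes leaf values are hashable because it returns a set.

```python
def gets(tree, key):
    """Return every value stored under key, anywhere in a nested dict."""
    found = []
    for k, v in tree.items():
        if k == key:
            found.append(v)
        if isinstance(v, dict):
            found.extend(gets(v, key))
    return found


def get_leaves(tree):
    """Return the set of non-dict values at the bottom of a nested dict."""
    leaves = set()
    for v in tree.values():
        if isinstance(v, dict):
            leaves |= get_leaves(v)
        else:
            leaves.add(v)
    return leaves


data = {'a': {'file.txt': 1}, 'b': {'c': {'file.txt': 2}}, 'd': 3}
print(gets(data, 'file.txt'))  # [1, 2]
print(get_leaves(data))  # {1, 2, 3}
```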

Some shorthands for plot formatting (indra.util.plot_formatting)

indra.util.plot_formatting.format_axis(ax, label_padding=2, tick_padding=0, yticks_position='left')

Set standardized axis formatting for figure.


Set standardized font properties for figure.