Util (indra.util)

Utilities for using AWS (indra.util.aws)

class indra.util.aws.JobLog(job_info, log_group_name='/aws/batch/job', verbose=False, append_dumps=True)[source]

Gets the CloudWatch log associated with the given job.

Parameters:
  • job_info (dict) – dict containing entries for ‘jobName’ and ‘jobId’, e.g., as returned by get_jobs()
  • log_group_name (string) – Name of the log group; defaults to ‘/aws/batch/job’
Returns:

The event messages in the log, with the earliest events listed first.

Return type:

list of strings

dump(out_file, append=None)[source]

Dump the logs in their entirety to the specified file.

load(out_file)[source]

Load the log lines from the cached files.

indra.util.aws.dump_logs(job_queue='run_reach_queue', job_status='RUNNING')[source]

Write logs for all jobs with the given status to files.

indra.util.aws.get_batch_command(command_list, project=None, purpose=None)[source]

Get the command appropriate for running something on batch.

indra.util.aws.get_date_from_str(date_str)[source]

Get a UTC datetime object from a string of format %Y-%m-%d-%H-%M-%S.

Parameters:
  • date_str (str) – A string of the format %Y(-%m-%d-%H-%M-%S). The string is assumed to represent a UTC time.
Returns:

The UTC datetime parsed from date_str.

Return type:

datetime.datetime

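
The format %Y(-%m-%d-%H-%M-%S) suggests that components after the year are optional. The following is a minimal sketch of one way to parse such strings, not the INDRA implementation. The helper name parse_utc_date is hypothetical, and the sketch assumes that trailing components may be omitted one at a time and that the result should be timezone-aware.

```python
from datetime import datetime, timezone

# Hypothetical helper for illustration; not part of indra.util.aws.
FULL_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_utc_date(date_str):
    """Parse 1 to 6 dash-separated date components as a UTC datetime."""
    n_parts = len(date_str.split("-"))
    if not 1 <= n_parts <= 6:
        raise ValueError("Expected between 1 and 6 components: %r" % date_str)
    # Keep only as many format fields as the string provides; strptime
    # fills the missing ones with their defaults (January 1st, 00:00:00).
    fmt = "-".join(FULL_FORMAT.split("-")[:n_parts])
    # Assumption: the string represents a UTC time, so attach UTC tzinfo.
    return datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)


print(parse_utc_date("2020-03-15-12-30-45"))
print(parse_utc_date("2020"))
```

Attaching the timezone explicitly means the result can be compared directly with other timezone-aware datetimes.
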
indra.util.aws.get_jobs(job_queue='run_reach_queue', job_status='RUNNING')[source]

Returns a list of dicts with jobName and jobId for each job with the given status.

indra.util.aws.get_s3_client(unsigned=True)[source]

Return a boto3 S3 client with optional unsigned config.

Parameters:
  • unsigned (Optional[bool]) – If True, the client uses unsigned mode, in which public resources can be accessed without credentials. Default: True
Returns:

A client object to AWS S3.

Return type:

botocore.client.S3

indra.util.aws.get_s3_file_tree(s3, bucket, prefix, date_cutoff=None, after=True, with_dt=False)[source]

Overcome the S3 response limit and return a NestedDict tree of paths.

The NestedDict object also allows the user to search by the ends of a path.

The tree mimics a file directory structure, with the leaf nodes being the full unbroken key. For example, ‘path/to/file.txt’ would be retrieved by

ret['path']['to']['file.txt']['key']

The NestedDict object returned also has the capability to get paths that lead to a certain value. So if you wanted all paths that lead to something called ‘file.txt’, you could use

ret.get_paths('file.txt')

For more details, see the NestedDict docs.

Parameters:
  • s3 (boto3.client.S3) – A boto3.client.S3 instance
  • bucket (str) – The name of the bucket to list objects in
  • prefix (str) – The prefix filtering of the objects for list
  • date_cutoff (str|datetime.datetime) – A datestring of format %Y(-%m-%d-%H-%M-%S) or a datetime.datetime object. The date is assumed to be in UTC. By default no filtering is done. Default: None.
  • after (bool) – If True, only return objects after the given date cutoff. Otherwise, return objects before. Default: True
  • with_dt (bool) – If True, yield a tuple (key, datetime.datetime(LastModified)) of the S3 key and the object’s LastModified date as a datetime.datetime object; otherwise, yield only the S3 key. Default: False.
Returns:

A file tree represented as a NestedDict

Return type:

NestedDict
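
To illustrate the tree layout described above, here is a minimal sketch that builds the same shape from a flat list of keys. It uses plain dicts instead of NestedDict, and the function name build_key_tree is hypothetical; it is not how get_s3_file_tree is implemented.

```python
def build_key_tree(keys):
    """Build a nested dict from S3-style keys; each leaf stores the full key."""
    tree = {}
    for key in keys:
        node = tree
        # Walk (and create) one level per path segment.
        for part in key.split("/"):
            node = node.setdefault(part, {})
        # The leaf node holds the full unbroken key under 'key'.
        node["key"] = key
    return tree


tree = build_key_tree(["path/to/file.txt", "path/to/other.txt"])
print(tree["path"]["to"]["file.txt"]["key"])  # prints path/to/file.txt
```
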

indra.util.aws.iter_s3_keys(s3, bucket, prefix, date_cutoff=None, after=True, with_dt=False, do_retry=True)[source]

Iterate over the keys in an S3 bucket given a prefix.

Parameters:
  • s3 (boto3.client.S3) – A boto3.client.S3 instance
  • bucket (str) – The name of the bucket to list objects in
  • prefix (str) – The prefix filtering of the objects for list
  • date_cutoff (str|datetime.datetime) – A datestring of format %Y(-%m-%d-%H-%M-%S) or a datetime.datetime object. The date is assumed to be in UTC. By default no filtering is done. Default: None.
  • after (bool) – If True, only return objects after the given date cutoff. Otherwise, return objects before. Default: True
  • with_dt (bool) – If True, yield a tuple (key, datetime.datetime(LastModified)) of the S3 key and the object’s LastModified date as a datetime.datetime object; otherwise, yield only the S3 key. Default: False.
  • do_retry (bool) – If True, and no contents appear, try again in case there was simply a brief lag. If False, do not retry, and accept that the “directory” is empty.
Returns:

An iterator over s3 keys or (key, LastModified) tuples.

Return type:

iterator[key]|iterator[(key, datetime.datetime)]
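
The interplay of date_cutoff, after, and with_dt can be sketched as a small generator over listing entries. This is an illustration under stated assumptions, not the INDRA implementation: the name filter_keys is hypothetical, the entries are assumed to be dicts with ‘Key’ and ‘LastModified’ fields (as in the ‘Contents’ of an S3 listing response), and objects modified exactly at the cutoff are assumed to be excluded.

```python
from datetime import datetime, timezone


def filter_keys(entries, date_cutoff=None, after=True, with_dt=False):
    """Yield keys (or (key, LastModified) tuples) that pass the date cutoff."""
    for entry in entries:
        key, modified = entry["Key"], entry["LastModified"]
        if date_cutoff is not None:
            # Assumption: both comparisons are strict.
            if after and modified <= date_cutoff:
                continue
            if not after and modified >= date_cutoff:
                continue
        yield (key, modified) if with_dt else key


# Example listing entries with timezone-aware UTC timestamps.
entries = [
    {"Key": "a.json", "LastModified": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"Key": "b.json", "LastModified": datetime(2021, 1, 1, tzinfo=timezone.utc)},
]
cutoff = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(list(filter_keys(entries, cutoff)))               # prints ['b.json']
print(list(filter_keys(entries, cutoff, after=False)))  # prints ['a.json']
```
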

indra.util.aws.kill_all(job_queue, reason='None given', states=None, kill_list=None)[source]

Terminates/cancels all jobs on the specified queue.

Parameters:
  • job_queue (str) – The name of the Batch job queue on which you wish to terminate/cancel jobs.
  • reason (str) – Provide a reason for the kill that will be recorded with the job’s record on AWS.
  • states (None or list[str]) – A list of job states to remove. Possible states are ‘STARTING’, ‘RUNNABLE’, and ‘RUNNING’. If None, all jobs in all states will be ended (modulo the kill_list below).
  • kill_list (None or list[dict]) – A list of job dictionaries (as returned by the submit function) that you specifically wish to kill. All other jobs on the queue will be ignored. If None, all jobs on the queue will be ended (modulo the above).
Returns:

killed_ids – A list of the job ids for jobs that were killed.

Return type:

list[str]

indra.util.aws.rename_s3_prefix(s3, bucket, old_prefix, new_prefix)[source]

Change an s3 prefix within the same bucket.

indra.util.aws.tag_instance(instance_id, **tags)[source]

Tag a single ec2 instance.

indra.util.aws.tag_myself(project='cwc', **other_tags)[source]

Function run when indra is used in an EC2 instance to apply tags.

A utility to get the INDRA version (indra.util.get_version)

This tool provides a uniform method for creating a robust indra version string, both from within Python and from the command line. If possible, the version will include the git commit hash. Otherwise, the version will be marked with ‘UNHASHED’.

indra.util.get_version.get_git_info()[source]

Get a dict with useful git info.

indra.util.get_version.get_version(with_git_hash=True, refresh_hash=False)[source]

Get an indra version string, including a git hash.

Define NestedDict (indra.util.nested_dict)

class indra.util.nested_dict.NestedDict[source]

A dict-like object that recursively populates elements of a dict.

More specifically, this acts like a recursive defaultdict, allowing, for example:

>>> nd = NestedDict()
>>> nd['a']['b']['c'] = 'foo'

In addition, useful methods have been defined that allow the user to search the data structure. Note that these methods are not particularly optimized at this time. However, for convenience, you can, for example, simply call get_path to get the path to a particular key along with the value at that key:

>>> nd.get_path('c')
(('a', 'b', 'c'), 'foo')

Similarly:

>>> nd.get_path('b')
(('a', 'b'), NestedDict(
  'c': 'foo'
))

get, gets, and get_paths operate on similar principles, and are documented below.

export_dict()[source]

Convert this into an ordinary dict (of dicts).

get(key)[source]

Find the first value within the tree which has the key.

get_leaves()[source]

Get the deepest entries as a flat set.

get_path(key)[source]

Like get, but also return the path taken to the value.

get_paths(key)[source]

Like gets, but include the path to each match, as get_path does for a single match.

gets(key)[source]

Like get, but return all matches, not just the first.
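
The recursive-defaultdict behavior and the path search described for NestedDict can be sketched with a small dict subclass. This is a simplified stand-in, not the INDRA class: the name RecursiveDict is hypothetical, the search is assumed to be depth-first, and returning (None, None) for a missing key is an assumption.

```python
class RecursiveDict(dict):
    """Hypothetical stand-in for NestedDict: missing keys create sub-dicts."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = self[key] = type(self)()
        return value

    def get_path(self, key):
        """Return (path, value) for the first match, else (None, None)."""
        if key in self:  # `in` does not trigger __missing__
            return (key,), self[key]
        for name, child in self.items():
            if isinstance(child, RecursiveDict):
                path, value = child.get_path(key)
                if path is not None:
                    return (name,) + path, value
        return None, None


nd = RecursiveDict()
nd["a"]["b"]["c"] = "foo"
print(nd.get_path("c"))  # prints (('a', 'b', 'c'), 'foo')
```

Membership tests are used during the search so that looking for an absent key does not create new empty entries as a side effect.
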

Some shorthands for plot formatting (indra.util.plot_formatting)

indra.util.plot_formatting.format_axis(ax, label_padding=2, tick_padding=0, yticks_position='left')[source]

Set standardized axis formatting for figure.

indra.util.plot_formatting.set_fig_params()[source]

Set standardized font properties for figure.