Welcome to Dis.co’s documentation!

Job

Example:

 import pathlib
 import disco
 # Upload the script, create a job that runs it, then queue the job.
 my_file_id = disco.upload_file('my_file.py', pathlib.Path('/home/bob/my_file.py'))
 job = disco.Job.create(my_file_id)
 job.start()

class disco.Job(job_id)

A job that runs on DISCO machines.

Every Job object has its own Python script and data files, and runs independently on the cloud, until it produces a result.

archive()

Archive the job, making it unusable.

classmethod create(script_file_id=None, input_file_ids: Union[str, list] = None, constants_file_ids: Union[str, list] = None, job_name=None, cluster_instance_type='s', cluster_id=None, instance_cost=None, script_repo_id=None, script_file_path_in_repo=None, auto_start=False, upload_requirements_file=True, docker_image_id=None)

Creates a new job.

Args:

script_file_id (str): The ID of the script file to run.

input_file_ids: A list of IDs of files that will be used as standard input.

constants_file_ids: A list of IDs of constants files.

job_name: A name you can give to your job. Leave empty to use a random string.

cluster_instance_type: The size of the instance used. Choose ‘m’ for a medium instance and ‘l’ for a large instance. Use gpu_s, gpu_m, or gpu_l for GPU jobs (read more about GPU in disco job create -h). The default is ‘s’ for small.

cluster_id: The ID of the cluster on which to run the job. Leave as None to run on DISCO’s cluster.

instance_cost: The instance cost type: guaranteed or lowCost. The default is None.

script_repo_id (str): The ID of the Git repository that contains the script file to run (alternative to script_file_id).

script_file_path_in_repo (str): The path in the Git repository to the script file.

auto_start: Automatically start the job upon creation.

upload_requirements_file (bool): If True, uploads a requirements file when running inside a virtual environment (venv).

docker_image_id (str): The ID of the docker image to run the job in.

Returns:

obj: The created job object.
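Several of the arguments above accept only a few documented values, so it can help to check them before calling create. The following is a minimal sketch and not part of the DISCO SDK: build_create_kwargs is a hypothetical helper, the allowed values are copied from the descriptions above, and two rules are assumptions. The first is that exactly one of script_file_id and script_repo_id is given. The second is that script_repo_id always comes with script_file_path_in_repo.

```python
# Values copied from the argument descriptions above.
INSTANCE_TYPES = ("s", "m", "l", "gpu_s", "gpu_m", "gpu_l")
INSTANCE_COSTS = (None, "guaranteed", "lowCost")


def build_create_kwargs(script_file_id=None, script_repo_id=None,
                        script_file_path_in_repo=None,
                        cluster_instance_type="s", instance_cost=None,
                        **other_kwargs):
    """Hypothetical helper: validate arguments meant for disco.Job.create.

    Returns a dict of keyword arguments; it does not call the SDK itself.
    """
    # Assumption: the script comes from an uploaded file or a repository, not both.
    if (script_file_id is None) == (script_repo_id is None):
        raise ValueError("pass exactly one of script_file_id or script_repo_id")
    # Assumption: a repository script always needs its path inside the repository.
    if script_repo_id is not None and script_file_path_in_repo is None:
        raise ValueError("script_repo_id needs script_file_path_in_repo")
    if cluster_instance_type not in INSTANCE_TYPES:
        raise ValueError(f"unknown cluster_instance_type: {cluster_instance_type!r}")
    if instance_cost not in INSTANCE_COSTS:
        raise ValueError(f"unknown instance_cost: {instance_cost!r}")

    kwargs = dict(other_kwargs,
                  cluster_instance_type=cluster_instance_type,
                  instance_cost=instance_cost)
    if script_file_id is not None:
        kwargs["script_file_id"] = script_file_id
    else:
        kwargs["script_repo_id"] = script_repo_id
        kwargs["script_file_path_in_repo"] = script_file_path_in_repo
    return kwargs
```

With the SDK installed, the result would be passed on as disco.Job.create(**build_create_kwargs(script_file_id=my_file_id, cluster_instance_type='m')).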

classmethod generate_requirements()

Generates a string of requirements for a requirements file.

Returns:

separated list of requirements

get_details()

Get details about the job.

This includes its name, last activity, status and task states.

Returns:

JobDetails

get_results(block=False, block_timeout=600)

Get the job’s result.

Args:

block (bool): Pass block=True to first wait for the job to be completed.

block_timeout (int): Timeout in seconds.

Returns:

get_status()

Get the job’s status.

Returns (JobStatus):

The status of the job.

get_tasks(limit=None, next_=None)

Get the job’s tasks.

Args:

limit (int): pagination limit

next_: pagination next

Returns:

list(Task)

classmethod jobs_summary()

Gets a summary of all job statuses.

Returns:

dict: Dictionary [str, int] of status->count
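Because the summary is a plain status->count dictionary, it is easy to turn into a short report. The sketch below assumes nothing about the SDK beyond that return shape: format_summary is a hypothetical helper, and the sample dictionary is made-up illustration data, not real output.

```python
def format_summary(summary):
    """Render a status->count mapping as aligned lines, largest count first."""
    if not summary:
        return "no jobs"
    width = max(len(status) for status in summary)
    # Sort by descending count, then alphabetically to keep the order stable.
    rows = sorted(summary.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{status.ljust(width)}  {count}" for status, count in rows)


# Illustration only; with the SDK this would be disco.Job.jobs_summary().
sample = {"Running": 2, "Queued": 5}
print(format_summary(sample))
```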

classmethod list_jobs(limit=None, next_=None)

Show a list of all the jobs belonging to this user.

Args:

limit (int): pagination limit

next_: pagination next

Returns:

list(JobDetails)

start()

Start the job.

When you run job.start(), the DISCO server will queue the job for execution.

Returns:

obj: The job object.

stop()

Cancels a running job.

When you run job.stop(), the DISCO server will stop running the job and return any results retrieved so far.

classmethod upload_requirements_file(cluster_id)

Uploads a requirements file.

Args:

cluster_id: ID of the cluster to upload to.

Returns:

ID of requirements file in the DB

wait_for_finish(interval=5, timeout=600)

Wait for a job to finish.

This means waiting until the job is no longer in the “Queued” or “Running” status.

Args:

interval (int): Interval in seconds to check if the job has finished running.

timeout (int): Timeout in seconds.

Returns:

The status of the job.
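The waiting behaviour described above amounts to a polling loop. The following sketch illustrates that idea only and is not the SDK's implementation: poll_until_finished is a hypothetical function, comparing statuses as plain strings is an assumption (get_status is documented to return a JobStatus), and raising TimeoutError when the timeout passes is also an assumption.

```python
import time

# The two statuses that the description above treats as "not finished".
ACTIVE_STATUSES = {"Queued", "Running"}


def poll_until_finished(get_status, interval=5, timeout=600,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every `interval` seconds until the job is finished.

    Returns the first status outside ACTIVE_STATUSES. Raises TimeoutError
    (an assumption, see above) if `timeout` seconds pass first.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status not in ACTIVE_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays. In normal use, job.wait_for_finish(interval=5, timeout=600) already does the waiting.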

wait_for_status(*expected_statuses, interval=5, timeout=600)

Wait for the job to be in one of the given statuses.

Args:

*expected_statuses (str): List of expected job statuses.

interval (int): Interval in seconds to check the job’s status.

timeout (int): Timeout in seconds.

Returns:

The status of the job.
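The Job methods documented in this section are often used together: start the job, wait for it to finish, then fetch the results. A minimal sketch of that sequence follows. run_and_collect is a hypothetical helper, and it relies only on the start(), wait_for_finish() and get_results() signatures listed above, so any object with those methods can stand in for a disco.Job.

```python
def run_and_collect(job, interval=5, timeout=600):
    """Start a job, wait until it finishes, and return (status, results).

    `job` is expected to behave like disco.Job; only start(),
    wait_for_finish() and get_results() are called on it.
    """
    job.start()
    status = job.wait_for_finish(interval=interval, timeout=timeout)
    return status, job.get_results()
```

For example, status, results = run_and_collect(disco.Job.create(my_file_id)) would run the uploaded script from the example at the top of this page. What wait_for_finish does when the timeout passes is not stated here, so the helper does not try to handle that case.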

Repository

class disco.Repository

Repository methods

classmethod list_repositories(limit=None, next_=None)

Show a list of all the repositories of this user.

Args:

limit (int): pagination limit

next_: pagination next

Returns:

list(): List of the repositories of this user.

DockerImage

class disco.DockerImage

Docker image methods

classmethod list_docker_images(limit=None, next_=None)

Show a list of all the docker images of this user.

Args:

limit (int): pagination limit

next_: pagination next

Returns:

list(): List of the docker images of this user.

Cluster

class disco.Cluster

Cluster methods

classmethod fetch_and_validate_by_id(cluster_id)

Validate that the cluster_id belongs to the user

Args:

cluster_id (str):

Returns:

False if invalid or ClusterDetails if valid
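Since the return value is either False or a ClusterDetails object, callers need to test for False explicitly before using the result. One way to do that is sketched below. require_cluster is a hypothetical wrapper, and turning the False result into a ValueError is a design choice of this example, not SDK behaviour.

```python
def require_cluster(validate, cluster_id):
    """Return the cluster details, or raise if validation reports False.

    `validate` is any callable with the contract described above, for
    example disco.Cluster.fetch_and_validate_by_id.
    """
    details = validate(cluster_id)
    if details is False:
        raise ValueError(f"cluster {cluster_id!r} is not available to this user")
    return details
```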

classmethod list_clusters(limit=None, next_=None)

Show a list of all the clusters applicable for this user.

Args:

limit (int): pagination limit

next_: pagination next

Returns:

list(ClusterDetails): List of the clusters belonging to this user.

Asset

class disco.Asset

Provides functionality for uploading and downloading disco files

input_files_from_bucket(bucket_paths, cluster_id)

Args:

bucket_paths (list(str)):

cluster_id (str):

Raises:

BucketPathsException: If a bucket path is invalid, missing, or has no files.

Returns:

list: List of file IDs registered from the given bucket paths.

upload(file_name, file, cluster=None, show_progress_bar=True)

Upload a file to DISCO, so it can later be used to run jobs.

Args:

file_name (str):

file: Either the file contents, in binary or string form, a file object, or a Path object that points to a file.

cluster (ClusterDetails):

show_progress_bar (bool):

Returns:

str: The ID of the uploaded file.
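The file argument accepts four different forms, which the sketch below makes concrete by reducing each of them to bytes. It illustrates the accepted inputs only and is not how the SDK handles them: read_upload_source is a hypothetical function, and encoding string contents as UTF-8 is an assumption.

```python
import pathlib


def read_upload_source(file):
    """Reduce any of the documented `file` forms to a bytes object."""
    if isinstance(file, (bytes, bytearray)):
        return bytes(file)                  # contents in binary form
    if isinstance(file, str):
        return file.encode("utf-8")         # contents in string form (UTF-8 assumed)
    if isinstance(file, pathlib.Path):
        return file.read_bytes()            # Path object that points to a file
    if hasattr(file, "read"):
        data = file.read()                  # file object, text or binary mode
        return data.encode("utf-8") if isinstance(data, str) else data
    raise TypeError(f"unsupported file argument: {type(file).__name__}")
```

Note that a str is treated as file contents here, not as a path, which matches the description above. To upload a file from disk, wrap the location in pathlib.Path as in the example at the top of this page.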
