job run
Run a job in Studio.
Synopsis
usage: datachain job run [-h] [-v] [-q] [--team TEAM] [--env-file ENV_FILE] [--env ENV [ENV ...]] [--cluster CLUSTER] [--workers WORKERS]
                         [--files FILES [FILES ...]] [--python-version PYTHON_VERSION] [--repository REPOSITORY] [--req-file REQ_FILE] [--req REQ [REQ ...]]
                         [--priority PRIORITY] [--start-time START_TIME] [--cron CRON]
                         file
Description
This command runs a job in Studio using the specified query file. You can configure various aspects of the job including environment variables, Python version, dependencies, and more. When using --start-time or --cron, the job is scheduled as a task and will not show logs immediately. The job will be executed according to the schedule.
Arguments
file - Query file to run
Options
--team TEAM - Team to run job for (default: from config)
--env-file ENV_FILE - File with environment variables for the job
--env ENV - Environment variables in KEY=VALUE format
--cluster CLUSTER - Compute cluster to run the job on
--workers WORKERS - Number of workers for the job
--files FILES - Additional files to include in the job
--python-version PYTHON_VERSION - Python version for the job (e.g., 3.9, 3.10, 3.11)
--repository REPOSITORY - Repository URL to clone before running the job
--req-file REQ_FILE - Python requirements file
--req REQ - Python package requirements
--priority PRIORITY - Priority for the job in range 0-5. Lower value is higher priority (default: 5)
--start-time START_TIME - Start time in ISO format or natural language for the cron task.
--cron CRON - Cron expression for the cron task.
-h, --help - Show the help message and exit.
-v, --verbose - Be verbose.
-q, --quiet - Be quiet.
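As an illustration of the --env and --env-file options, here is a minimal sketch of how a query file might read such values. It assumes the variables are exposed to the job as ordinary process environment variables; the names API_KEY and BATCH_SIZE and the helper read_job_settings are hypothetical and not part of datachain.

```python
import os


def read_job_settings(environ=os.environ):
    """Read settings a job might receive through --env or --env-file.

    API_KEY and BATCH_SIZE are hypothetical names used only for illustration.
    """
    api_key = environ.get("API_KEY")
    if api_key is None:
        # Fail early with a hint about how the value could be supplied.
        raise RuntimeError("API_KEY is not set; pass it with --env API_KEY=... or --env-file")
    # Environment values are always strings, so convert numbers explicitly.
    batch_size = int(environ.get("BATCH_SIZE", "100"))
    return {"api_key": api_key, "batch_size": batch_size}


if __name__ == "__main__":
    # Use an explicit mapping here so the sketch runs without any real environment setup.
    print(read_job_settings({"API_KEY": "123", "BATCH_SIZE": "32"}))
```

Taking the mapping as a parameter keeps the function easy to try locally before submitting the query file as a job.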
Examples
- Run a basic job:
- Run a job with specific team and Python version:
- Run a job with environment variables and requirements:
- Run a job with multiple workers and additional files:
- Run a job with inline environment variables and package requirements:
- Run a job with a repository (will be cloned in the job working directory):
    datachain job run --repository https://github.com/iterative/datachain query.py
    # To specify a branch / revision:
    datachain job run --repository https://github.com/iterative/datachain@main query.py
    # Git URLs are also supported:
    datachain job run --repository git@github.com:iterative/datachain.git@main query.py
- Run a job with higher priority:
- Run a job in a specific cluster:
- Schedule a job to run once at a specific time:
    # Run job tomorrow at 3pm
    datachain job run --start-time "tomorrow 3pm" query.py
    # Run job in 2 hours
    datachain job run --start-time "in 2 hours" query.py
    # Run job on Monday at 9am
    datachain job run --start-time "monday 9am" query.py
    # Run job at a specific date and time
    datachain job run --start-time "2024-01-15 14:30:00" query.py
- Schedule a recurring job using a cron expression:
- Schedule a recurring job with a start time:
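As a rough sketch of how the flags in the synopsis combine for cases like those listed above, the Python helper below assembles `datachain job run` command lines and prints them instead of submitting anything. The helper name `job_run_command`, the team name `my-team`, the priority value, and the dates are hypothetical placeholders. Only single-value flags are used, and the query file is placed last as in the examples above.

```python
import shlex
from datetime import datetime, timedelta


def job_run_command(query_file, **options):
    """Build a `datachain job run` command line (hypothetical helper).

    Keyword names map to flags from the synopsis, e.g.
    python_version="3.11" becomes `--python-version 3.11`.
    Only single-value flags are handled in this sketch.
    """
    argv = ["datachain", "job", "run"]
    for name, value in options.items():
        argv.append("--" + name.replace("_", "-"))
        argv.append(str(value))
    argv.append(query_file)
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(argv)


# A fixed base time keeps the output reproducible; a real script might use datetime.now().
start = datetime(2024, 1, 15, 14, 30) + timedelta(hours=2)

commands = [
    job_run_command("query.py"),
    job_run_command("query.py", team="my-team", python_version="3.11"),
    job_run_command("query.py", priority=2),
    job_run_command("query.py", cron="0 0 * * *"),
    job_run_command("query.py", start_time=start.isoformat(), cron="@daily"),
]

for command in commands:
    print(command)
```

The printed lines can be pasted into a shell. Because `shlex.join` quotes the cron expression, it reaches the CLI as a single argument.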
Notes
- Closing the logs command (e.g., with Ctrl+C) will only stop displaying the logs but will not cancel the job execution.
- To cancel a running job, use the datachain job cancel command.
- The job will continue running in Studio even after you stop viewing the logs.
- You can get the list of compute clusters using the datachain job clusters command.
- When using the --start-time or --cron options, the job is scheduled as a task and will not show logs immediately. The job will be executed according to the schedule.
- The --start-time option supports natural language parsing using the dateparser library, allowing flexible time expressions like "tomorrow 3pm", "in 2 hours", "monday 9am", etc.
- Cron expressions follow the standard format: minute hour day-of-month month day-of-week (e.g., "0 0 * * *" for daily at midnight) or Vixie cron-style “@” keyword expressions.
- The following Vixie cron-style keywords are supported:
    - @midnight
    - @hourly
    - @daily
    - @weekly
    - @monthly
    - @yearly
    - @annually
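To make the “@” keywords concrete, the sketch below maps each one to the five-field expression it conventionally stands for in Vixie cron. It assumes Studio interprets the keywords the same way; the table and the helper `expand_cron` are illustrative and not part of datachain.

```python
# Conventional Vixie cron meanings of the "@" keywords (assumed to match Studio).
CRON_KEYWORDS = {
    "@hourly": "0 * * * *",    # minute 0 of every hour
    "@daily": "0 0 * * *",     # every day at midnight
    "@midnight": "0 0 * * *",  # same as @daily
    "@weekly": "0 0 * * 0",    # midnight on Sunday
    "@monthly": "0 0 1 * *",   # midnight on the first day of the month
    "@yearly": "0 0 1 1 *",    # midnight on January 1
    "@annually": "0 0 1 1 *",  # same as @yearly
}


def expand_cron(expr):
    """Return a five-field cron expression for a keyword or a five-field input."""
    expr = expr.strip()
    if expr.startswith("@"):
        if expr not in CRON_KEYWORDS:
            raise ValueError(f"unsupported cron keyword: {expr}")
        return CRON_KEYWORDS[expr]
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    # Normalize whitespace only; field values are not validated in this sketch.
    return " ".join(fields)


if __name__ == "__main__":
    for keyword in CRON_KEYWORDS:
        print(keyword, "->", expand_cron(keyword))
```

A check like this can catch a malformed expression locally before it is passed to --cron.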