Jobs are ideal for running fire-and-forget experiments. If you’re starting a new project, we recommend using a cluster for interactive development instead.
Make sure you’ve installed the TensorPool CLI and configured your API key.

Initialize a Job

Create a job configuration file in your project directory:
tp job init
This prompts for a job name and creates a {job}.tp.toml file that defines your training job.

Configure Your Job

Edit the generated {job}.tp.toml file to specify the commands to run, the output files to keep, and the paths to exclude from upload:
# Commands run in order on the cluster
commands = [
    "pip install -r requirements.txt",
    "python train.py --epochs 100",
]

# Files and directories saved as job outputs (fetched later with tp job pull)
outputs = [
    "checkpoints/",
    "model.pth",
    "results.json",
]

# Paths excluded from the code upload
ignore = [
    ".venv",
    "venv/",
    "__pycache__/",
    ".git",
    "*.pyc",
]

Create or Select a Cluster

Jobs run on existing TensorPool clusters. Create one if you don’t have one already:
tp cluster create 8xB200
Or list your existing clusters:
tp cluster list
Note the cluster ID — you’ll need it in the next step.

Submit Your Job

Push your job to a cluster:
tp job push {job}.tp.toml <cluster_id>
Your code will be uploaded and executed on the cluster, and you’ll receive a job ID to track progress.
Use --teardown to automatically destroy the cluster when the job finishes:
tp job push train.tp.toml <cluster_id> --teardown
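If you submit the same job across experiments, the push command can be built in a small script. A sketch; the helper name build_push_cmd and the cluster id cl-123 are illustrative placeholders, not part of the TensorPool CLI:

```shell
# build_push_cmd: compose a push command for a given config and cluster,
# with teardown enabled by default. Illustrative helper only.
build_push_cmd() {
  config="$1"
  cluster_id="$2"
  echo "tp job push ${config} ${cluster_id} --teardown"
}

build_push_cmd train.tp.toml cl-123
# → tp job push train.tp.toml cl-123 --teardown
```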

Monitor Your Job

Stream real-time logs from your running job:
tp job listen <job_id>
Check job status and details:
tp job info <job_id>

Download Results

Once your job completes, pull the output files:
tp job pull <job_id>
This downloads all files specified in the outputs section of your configuration.
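After pulling, you can verify that everything listed in the outputs section actually arrived. A sketch; check_outputs is an illustrative helper, not part of the TensorPool CLI, and the paths mirror the example configuration:

```shell
# check_outputs: report any expected output paths that are missing locally.
# Returns non-zero if anything is absent. Illustrative helper only.
check_outputs() {
  missing=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_outputs checkpoints/ model.pth results.json
```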

Next Steps