Cluster Management

TensorPool makes it easy to deploy and manage GPU clusters of any size, from single GPUs to large multi-node configurations.

Core Commands

Cluster Management

  • tp cluster create - Deploy a new GPU cluster
  • tp cluster list - View all your clusters
  • tp cluster info <cluster_id> - Get detailed information about a cluster
  • tp cluster destroy <cluster_id> - Terminate a cluster
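
A typical lifecycle chains these commands together. The sketch below is illustrative: <cluster_id> is a placeholder for the ID returned when you create a cluster or shown by tp cluster list.

# Create a single H100 cluster
tp cluster create -i ~/.ssh/id_ed25519.pub -t 1xH100

# Review what's running and inspect the new cluster
tp cluster list
tp cluster info <cluster_id>

# Tear it down when you're finished to stop billing
tp cluster destroy <cluster_id>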

Creating Clusters

Deploy GPU clusters with simple commands. You can create single-node clusters with various GPU configurations, or multi-node clusters for distributed training.

Single Node Examples

# Single H100
tp cluster create -i ~/.ssh/id_ed25519.pub -t 1xH100

# Single node with 8x H200
tp cluster create -i ~/.ssh/id_ed25519.pub -t 8xH200

# Single node with 8x B200
tp cluster create -i ~/.ssh/id_ed25519.pub -t 8xB200

# Single node MI300X
tp cluster create -i ~/.ssh/id_ed25519.pub -t 1xMI300X

Multi-Node Clusters

For distributed training workloads, create multi-node clusters:

# 2-node cluster with 8xH200 each (16 GPUs total)
tp cluster create -i ~/.ssh/id_ed25519.pub -t 8xH200 -n 2

# 4-node cluster with 8xB200 each (32 GPUs total)
tp cluster create -i ~/.ssh/id_ed25519.pub -t 8xB200 -n 4

Multi-node support is currently available for 8xH200 and 8xB200 instance types. More instance types will support multi-node configurations soon.
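
Once the nodes are up, distributed jobs are launched with standard tooling after you SSH in (see Accessing Your Cluster below). As a hedged sketch, assuming PyTorch is installed on every node and train.py is your own training script, a torchrun launch across a 2-node 8xH200 cluster might look like this, run on each node with <node0_ip> replaced by the first node's address:

# NODE_RANK is 0 on the first node and 1 on the second
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=$NODE_RANK \
  --rdzv_backend=c10d \
  --rdzv_endpoint=<node0_ip>:29500 \
  train.py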

Accessing Your Cluster

Once your cluster is ready, use the TensorPool CLI to connect:

tp ssh connect <instance_id>

You'll have full SSH access to your nodes, allowing you to use any tools and frameworks you prefer.
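
For example, after connecting you can sanity-check a node with standard tools. The sketch below assumes an NVIDIA instance type (nvidia-smi ships with the driver); on MI300X nodes you would use rocm-smi instead, and <instance_id> is a placeholder:

# Connect to a node (instance IDs are shown by tp cluster info)
tp ssh connect <instance_id>

# On the node: confirm every GPU is visible to the driver
nvidia-smi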

Best Practices

  • Cluster Naming: Use descriptive names for your clusters to easily identify them
  • Cost Management: Destroy clusters when not in use to avoid unnecessary charges
  • Monitoring: Regularly check tp cluster list to monitor your active resources
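
As a minimal sketch of that monitoring habit, you could schedule a daily snapshot of your active clusters. This assumes a Unix machine with cron and an authenticated tp CLI; the log path is arbitrary:

# Hypothetical crontab entry: record active clusters every morning at 9:00
0 9 * * * tp cluster list >> ~/tensorpool-clusters.log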

Next Steps
