Cluster Management
TensorPool makes it easy to deploy and manage GPU clusters of any size, from single GPUs to large multi-node configurations.
Core Commands
Cluster Management
- tp cluster create - Deploy a new GPU cluster
- tp cluster list - View all your clusters
- tp cluster info <cluster_id> - Get detailed information about a cluster
- tp cluster edit <cluster_id> - Edit cluster settings
- tp cluster destroy <cluster_id> - Terminate a cluster
Creating Clusters
Deploy GPU clusters with simple commands. You can create single-node clusters with various GPU configurations, or multi-node clusters for distributed training.
Single Node Examples
Multi-Node Clusters
For distributed training workloads, create multi-node clusters.
Multi-node support is currently available for 8xH200 and 8xB200 instance types.
Accessing Your Cluster
Once your cluster is ready, use the TensorPool CLI to connect to the instances within the cluster.
Cluster and Instance Statuses
A cluster’s status is derived from the statuses of its individual instances. Each instance within a cluster progresses through its own lifecycle, and the cluster’s displayed status reflects the highest-priority status among all its instances.
Instance Status Lifecycle
Each instance in a cluster follows this lifecycle:
Status Definitions
| Status | Description |
|---|---|
| PENDING | Instance creation request has been submitted and is being queued for provisioning. |
| PROVISIONING | Instance has been allocated and is being provisioned. |
| CONFIGURING | Instance is being configured with software, drivers, networking, and storage. |
| RUNNING | Instance is ready for use. |
| DESTROYING | Instance shutdown in progress, resources are being deallocated. |
| DESTROYED | Instance has been successfully terminated. |
| FAILED | System-level problem (e.g., hardware failure, no capacity). |
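A script that waits for a cluster may want to treat these statuses in groups. The sketch below is one possible grouping in Python, not part of TensorPool itself: the `InstanceStatus` enum and the `is_starting` and `is_ready` helpers are hypothetical names, and the grouping is an assumption read off the descriptions in the table above.

```python
from enum import Enum


class InstanceStatus(Enum):
    """Instance statuses, named exactly as in the table above."""
    PENDING = "PENDING"
    PROVISIONING = "PROVISIONING"
    CONFIGURING = "CONFIGURING"
    RUNNING = "RUNNING"
    DESTROYING = "DESTROYING"
    DESTROYED = "DESTROYED"
    FAILED = "FAILED"


# Assumption: these three statuses describe an instance that is still
# on its way to being usable, based on their descriptions in the table.
STARTING = frozenset({
    InstanceStatus.PENDING,
    InstanceStatus.PROVISIONING,
    InstanceStatus.CONFIGURING,
})


def is_starting(status: str) -> bool:
    """True while an instance is queued, being provisioned, or being configured."""
    return InstanceStatus(status) in STARTING


def is_ready(status: str) -> bool:
    """True only for RUNNING, the status the table describes as ready for use."""
    return InstanceStatus(status) is InstanceStatus.RUNNING
```

Passing a string that is not one of the seven statuses raises `ValueError`, which makes typos visible instead of silently treating them as "not ready".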
Cluster Status Priority
A cluster’s status is determined by the highest-priority status among its instances. Priority order (highest to lowest):
- FAILED - Any failed instance causes the cluster to show as failed
- DESTROYING - Cluster is being torn down
- PENDING - Instances are waiting to be provisioned
- PROVISIONING - Instances are being provisioned
- CONFIGURING - Instances are being configured
- RUNNING - All instances are running
- DESTROYED - All instances have been terminated
For example, if all but one of a cluster’s instances are RUNNING and 1 is CONFIGURING, the cluster status will show as CONFIGURING.
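The priority rule can be illustrated with a short Python sketch. It is not TensorPool's implementation: `STATUS_PRIORITY` and `cluster_status` are hypothetical names, and the only thing taken from this page is the priority order listed above.

```python
# Priority order copied from the list above, highest priority first.
STATUS_PRIORITY = [
    "FAILED",
    "DESTROYING",
    "PENDING",
    "PROVISIONING",
    "CONFIGURING",
    "RUNNING",
    "DESTROYED",
]

# Map each status to its position; a lower number means higher priority.
_RANK = {status: position for position, status in enumerate(STATUS_PRIORITY)}


def cluster_status(instance_statuses):
    """Return the highest-priority status among the given instance statuses."""
    if not instance_statuses:
        raise ValueError("expected at least one instance status")
    # min() over the rank picks the status that appears earliest in the list.
    return min(instance_statuses, key=lambda status: _RANK[status])


if __name__ == "__main__":
    # Two running instances and one still configuring.
    print(cluster_status(["RUNNING", "RUNNING", "CONFIGURING"]))  # CONFIGURING
```

Under this rule a cluster only reports RUNNING when no instance has a higher-priority status, and only reports DESTROYED when every instance is DESTROYED. An unknown status string raises `KeyError` in this sketch.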
Next Steps
- Explore instance types available
- Learn about NFS storage for persistent data
- Read the CLI reference for detailed command options