Storage Types
TensorPool offers two storage volume types:

| Feature | Fast Volumes | Flex Volumes |
|---|---|---|
| Cluster Support | Multi-node only (2+ nodes) | All cluster types |
| POSIX Compliant | Yes | No |
Fast Storage Volumes
Fast storage volumes are high-performance NFS-based volumes designed for distributed training on multi-node clusters:

- Multi-node clusters only: Requires clusters with 2 or more nodes
- High aggregate performance: Up to 300 GB/s aggregate read throughput, 150 GB/s aggregate write throughput, 1.5M read IOPS, 750k write IOPS
- Fixed volume size: Volume size must be defined at creation time; it can be increased at any time. See pricing for details.
- Ideal for: datasets for distributed training, storing model checkpoints
| Metric | Performance |
|---|---|
| Read Throughput | 6,000 MB/s |
| Write Throughput | 2,000 MB/s |
| Read IOPS | 6,000 |
| Write IOPS | 2,000 |
| Avg Read Latency | 5ms |
| Avg Write Latency | 15ms |
| p99 Read Latency | 9ms |
| p99 Write Latency | 30ms |
Fast storage volume performance scales with volume size. Larger volumes provide higher throughput and IOPS.
Flex Storage Volumes
Flex storage volumes are flexible, object-storage-backed volumes designed for TensorPool clusters used as workbenches:

- All cluster types: Works with all cluster types, not just multi-node clusters
- Backed by object storage: Cost-effective for large datasets
- Unlimited volume size: Billed on usage with no size limit. See pricing for details.
- Ideal for: Data archival, researcher collaboration, general persistent storage
- Not Ideal for: performance critical workloads, distributed training
| Metric | Performance |
|---|---|
| Read Throughput | 2,300 MB/s |
| Write Throughput | 3,700 MB/s |
| Read IOPS | 2,300 |
| Write IOPS | 3,600 |
| Avg Read Latency | 10ms |
| Avg Write Latency | 8ms |
| p99 Read Latency | 19ms |
| p99 Write Latency | 16ms |
Flex Storage Volumes do not have the performance characteristics you may expect from traditional shared filesystems (like NFS or block storage). Due to the nature of object storage, every file operation incurs fixed overhead regardless of file size:
- FUSE overhead: User-space/kernel context switches per syscall
- S3 API overhead: HTTP request/response cycle
A simple `touch file.txt` translates to 3 S3 API calls under the hood (HeadObject, PutObject, ListObjectsV2). Traditionally cheap operations like `ls` are also time-intensive: because object storage has no directory hierarchy, listing requires querying all objects with a matching prefix.

Flex storage volumes are not POSIX compliant. Unsupported features:
- Hard links
- Setting file permissions (`chmod`)
- Sticky, set-user-ID (`SUID`), and set-group-ID (`SGID`) bits
- Updating the modification timestamp (`mtime`)
- Creating and using FIFOs (first-in-first-out pipes)
- Creating and using Unix sockets
- Obtaining exclusive file locks
- Unlinking an open file while it is still readable
Moving Data Into Flex Storage Volumes
To maximize performance when copying data into a Flex Storage Volume, we recommend using `rclone` with parallelized transfers to take advantage of TensorPool's optimized FUSE mount:
`rclone copy` transfers multiple files concurrently, significantly improving throughput compared to `cp` or `rsync`, which transfer files sequentially.
rclone is installed by default on all TensorPool clusters with Flex Storage Volumes.
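As a sketch, a parallelized copy into a Flex Storage Volume mount might look like the following. The source and destination paths are placeholders, and the flag values are illustrative starting points to tune for your workload:

```shell
# Copy a local dataset into the Flex volume mount (paths are placeholders).
# --transfers sets how many files are copied concurrently;
# --checkers sets how many files are examined in parallel.
rclone copy /local/dataset /path/to/flex-volume/dataset \
  --transfers 32 \
  --checkers 16 \
  --progress
```

Raising `--transfers` helps most when copying many small files, since each file transfer pays the fixed per-object overhead described above.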
Core Commands
- `tp storage create -t <type> [-s <size_gb>]` - Create a new storage volume
- `tp storage list` - View all your storage volumes
- `tp cluster attach <cluster_id> <storage_id>` - Attach storage to a cluster
- `tp cluster detach <cluster_id> <storage_id>` - Detach storage from a cluster
- `tp storage destroy <storage_id>` - Delete a storage volume
Creating Storage Volumes
Create storage volumes by specifying type (fast or flex) and size:
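For example, following the `tp storage create -t <type> [-s <size_gb>]` syntax (the size value here is illustrative):

```shell
# Create a 500 GB fast storage volume (fast volumes require a size at creation)
tp storage create -t fast -s 500

# Create a flex storage volume (no size needed; billed on usage)
tp storage create -t flex
```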
Attaching and Detaching
Attach storage volumes to a cluster:

Fast storage volumes can only be attached to multi-node clusters (clusters with 2 or more nodes). Flex storage works with all cluster types.
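A minimal sketch using the attach and detach commands; the angle-bracket IDs are placeholders you can look up with `tp storage list`:

```shell
# Attach an existing volume to a cluster
tp cluster attach <cluster_id> <storage_id>

# Later, detach it; the volume and its data persist
tp cluster detach <cluster_id> <storage_id>
```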
Storage Locations
Volume Mount Points
When you attach a storage volume to your cluster, it will be mounted on each instance at:

Example Workflow
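As an end-to-end sketch tying the core commands together (type and IDs are illustrative placeholders):

```shell
# 1. Create a flex storage volume
tp storage create -t flex

# 2. Find the new volume's ID and confirm its status is READY
tp storage list

# 3. Attach it to a cluster
tp cluster attach <cluster_id> <storage_id>

# 4. ...use the mounted volume from the cluster...

# 5. Detach when finished; data persists for reuse on other clusters
tp cluster detach <cluster_id> <storage_id>

# 6. Destroy the volume once the data is no longer needed
tp storage destroy <storage_id>
```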
Storage Statuses
Storage volumes progress through various statuses throughout their lifecycle:

| Status | Description |
|---|---|
| PENDING | Storage creation request has been submitted and is being queued for provisioning. |
| PROVISIONING | Storage has been allocated and is being provisioned. |
| READY | Storage is ready for use. |
| ATTACHING | Storage is being attached to a cluster. |
| DETACHING | Storage is being detached from a cluster. |
| DESTROYING | Storage deletion in progress, resources are being deallocated. |
| DESTROYED | Storage has been successfully deleted. |
| FAILED | System-level problem (e.g., no capacity, hardware failure, etc.). |
Best Practices
- Data Persistence: Use storage volumes for important data that needs to persist across cluster lifecycles
- Shared Data: Attach the same storage volume to multiple clusters to share datasets
- Choose the Right Type: Use fast storage for multi-node distributed training workloads; use flex for cost-effective persistent storage
Next Steps
- Learn about storage management workflows
- See the CLI reference for detailed command options