TensorPool provides high-performance, persistent storage that can be attached to your clusters.

Storage Types

TensorPool offers two storage volume types:
Feature | Fast Volumes | Flex Volumes
Cluster Support | Multi-node only (2+ nodes) | All cluster types
POSIX Compliant | Yes | No

Fast Storage Volumes

Fast storage volumes are high-performance NFS-based volumes designed for distributed training on multi-node clusters:
  • Multi-node clusters only: Requires clusters with 2 or more nodes
  • High aggregate performance: Up to 300 GB/s aggregate read throughput, 150 GB/s aggregate write throughput, 1.5M read IOPS, 750k write IOPS
  • Fixed volume size: Volume size must be specified at creation, but it can be increased at any time. See pricing for details.
  • Ideal for: datasets for distributed training, storing model checkpoints
Expected single client performance on a 100TB Fast Storage Volume:
Metric | Performance
Read Throughput | 6,000 MB/s
Write Throughput | 2,000 MB/s
Read IOPS | 6,000
Write IOPS | 2,000
Avg Read Latency | 5ms
Avg Write Latency | 15ms
p99 Read Latency | 9ms
p99 Write Latency | 30ms
Fast storage volume performance scales with volume size. Larger volumes provide higher throughput and IOPS.

Flex Storage Volumes

Flex storage volumes are object-storage-backed volumes designed for TensorPool clusters used as workbenches:
  • All cluster types: Works with all cluster types, not just multi-node clusters
  • Backed by object storage: Cost-effective for large datasets
  • Unlimited volume size: Billed on usage with no size limit. See pricing for details.
  • Ideal for: Data archival, researcher collaboration, general persistent storage
  • Not ideal for: performance-critical workloads, distributed training
Flex Storage Volumes are mounted object storage buckets exposed through an optimized FUSE mount, trading performance for flexibility. Peak single-client performance (cached reads, large files):
Metric | Performance
Read Throughput | 2,300 MB/s
Write Throughput | 3,700 MB/s
Read IOPS | 2,300
Write IOPS | 3,600
Avg Read Latency | 10ms
Avg Write Latency | 8ms
p99 Read Latency | 19ms
p99 Write Latency | 16ms
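These numbers can be sanity-checked on a live mount with a simple dd write test. The sketch below uses a stand-in path under /tmp; on a real cluster you would point it at the Flex mount point instead:

```shell
# Write 64 MiB of zeros and let dd report throughput on completion.
# TARGET is a stand-in path; on a cluster, use the Flex mount point instead.
TARGET=/tmp/flex-throughput-test
dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fsync

# Clean up the test file.
rm -f "$TARGET"
```

conv=fsync forces the data to be flushed before dd reports, so the measured time includes the actual write to the backing store rather than just the page cache.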
Flex Storage Volumes do not have the performance characteristics you may expect from traditional shared filesystems (like NFS or block storage). Due to the nature of object storage, every file operation incurs a fixed overhead regardless of file size:
  • FUSE overhead: User-space/kernel context switches per syscall
  • S3 API overhead: HTTP request/response cycle
For large files this overhead is negligible. For small files (under 100KB), the overhead dominates the operation time. For example, a simple touch file.txt translates to three S3 API calls (HeadObject, PutObject, ListObjectsV2) under the hood. Traditionally cheap operations like ls are also time-intensive: because object storage has no directory hierarchy, listing requires querying all objects with a matching prefix.
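One common mitigation is to bundle many small files into a single archive before writing them to the volume, so the per-file overhead is paid once rather than thousands of times. A minimal sketch (paths are illustrative, not TensorPool conventions):

```shell
# Create a few small sample files (stand-ins for a real dataset).
mkdir -p /tmp/small-files-demo/src
for i in 1 2 3; do
  echo "sample $i" > "/tmp/small-files-demo/src/file_$i.txt"
done

# Pack them into one archive: a single large object write
# instead of many small ones.
tar -czf /tmp/small-files-demo/dataset.tar.gz -C /tmp/small-files-demo/src .

# Unpack on the consuming side (e.g., to instance-local disk) before use.
mkdir -p /tmp/small-files-demo/unpacked
tar -xzf /tmp/small-files-demo/dataset.tar.gz -C /tmp/small-files-demo/unpacked
```

The archive is also friendlier to rclone-style parallel transfers, since throughput scales better over a few large objects than over many tiny ones.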
Flex storage volumes are not POSIX compliant. Unsupported features:
  • Hard links
  • Setting file permissions (chmod)
  • Sticky, set-user-ID (SUID), and set-group-ID (SGID) bits
  • Updating the modification timestamp (mtime)
  • Creating and using FIFOs (first-in-first-out) pipes
  • Creating and using Unix sockets
  • Obtaining exclusive file locks
  • Unlinking an open file while it is still readable
While symlinks are supported, their use is discouraged. Symlink targets may not exist across all clusters, which can cause unexpected behavior. The use of small files (under 100KB) is discouraged due to the request-based nature of object storage. Setting up Python virtual environments within a Flex volume is not recommended due to a virtual environment's use of symlinks and its large number (~1,000) of small files.
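If you need a Python environment on a workbench cluster, one approach is to create the virtual environment on instance-local disk and keep only datasets and artifacts on the Flex volume. A sketch, with an illustrative local path (not a TensorPool convention):

```shell
# Create the venv on local disk rather than on the Flex mount,
# avoiding symlinks and thousands of small-file writes to object storage.
python3 -m venv /tmp/demo-venv

# Confirm the interpreter works.
/tmp/demo-venv/bin/python -c 'import sys; print(sys.version_info[0])'   # prints: 3
```

The environment is then cheap to recreate on each cluster from a requirements file, while the data it operates on stays on the persistent volume.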

Moving Data Into Flex Storage Volumes

To maximize performance when copying data into a Flex Storage Volume, we recommend using rclone with parallelized transfers to take advantage of TensorPool's optimized FUSE mount:
rclone copy /path/to/source/ /path/to/destination/ \
  --transfers 8 \
  --checkers 8 \
  --ignore-checksum \
  --progress
With these parameters, rclone copy transfers multiple files concurrently, significantly improving throughput compared to cp or rsync, which transfer files sequentially. rclone is installed by default on all TensorPool clusters with Flex Storage Volumes.

Core Commands

  • tp storage create -t <type> [-s <size_gb>] - Create a new storage volume
  • tp storage list - View all your storage volumes
  • tp cluster attach <cluster_id> <storage_id> - Attach storage to a cluster
  • tp cluster detach <cluster_id> <storage_id> - Detach storage from a cluster
  • tp storage destroy <storage_id> - Delete a storage volume

Creating Storage Volumes

Create storage volumes by specifying type (fast or flex) and size:
# Create a 500GB fast volume
tp storage create -t fast -s 500 --name training-data

# Create a flex volume (size not required)
tp storage create -t flex --name models

Attaching and Detaching

Attach storage volumes to a cluster:
tp cluster attach <cluster_id> <storage_id>
Detach when you’re done:
tp cluster detach <cluster_id> <storage_id>
Fast storage volumes can only be attached to multi-node clusters (clusters with 2 or more nodes). Flex storage works with all cluster types.

Storage Locations

Volume Mount Points

When you attach a storage volume to your cluster, it will be mounted on each instance at:
/mnt/<storage-type>-<storage_id>
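For scripting against an attached volume, the mount path can be derived from the volume type and id. A sketch with hypothetical values:

```shell
# Hypothetical values; substitute your real volume type and id.
STORAGE_TYPE=fast
STORAGE_ID=abc123

# Mount paths follow the /mnt/<storage-type>-<storage_id> convention.
MOUNT_POINT="/mnt/${STORAGE_TYPE}-${STORAGE_ID}"
echo "$MOUNT_POINT"   # prints: /mnt/fast-abc123
```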

Example Workflow

# 1. Create a 1TB fast storage volume
tp storage create -t fast -s 1000 --name dataset

# 2. Attach the volume to a cluster
tp cluster attach <cluster_id> <storage_id>

# 3. SSH into your cluster and access the data
tp ssh <instance_id>
cd /mnt/fast-<storage_id>

# 4. When done, detach the volume
tp cluster detach <cluster_id> <storage_id>

# 5. Destroy the volume when no longer needed
tp storage destroy <storage_id>

Storage Statuses

Storage volumes move through the following statuses over their lifecycle:
Status | Description
PENDING | Storage creation request has been submitted and is queued for provisioning.
PROVISIONING | Storage has been allocated and is being provisioned.
READY | Storage is ready for use.
ATTACHING | Storage is being attached to a cluster.
DETACHING | Storage is being detached from a cluster.
DESTROYING | Storage deletion is in progress; resources are being deallocated.
DESTROYED | Storage has been successfully deleted.
FAILED | A system-level problem occurred (e.g., no capacity or hardware failure).
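Scripts that create a volume and then attach it typically wait for the READY status before proceeding. The sketch below polls a stand-in get_status function, since the exact output format of tp storage list is not shown here; in practice you would replace the stub with a parse of that command's output:

```shell
# get_status is a stub standing in for parsing `tp storage list` output;
# it is NOT a real TensorPool command.
get_status() {
  echo "READY"
}

# Poll until the volume reports READY, giving up after 60 attempts.
attempts=0
until [ "$(get_status)" = "READY" ]; do
  attempts=$((attempts + 1))
  [ "$attempts" -ge 60 ] && { echo "timed out waiting for READY" >&2; exit 1; }
  sleep 5
done
echo "volume is READY"
```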

Best Practices

  • Data Persistence: Use storage volumes for important data that needs to persist across cluster lifecycles
  • Shared Data: Attach the same storage volume to multiple clusters to share datasets
  • Choose the Right Type: Use fast storage for multi-node distributed training workloads; use flex for cost-effective persistent storage

Next Steps