Comparison

s3fs vs Training Pipes

s3fs is a fine prototype tool. In a real ML training pipeline, it falls over. Here's what you lose with FUSE and what you get from a proper NFS gateway instead.

Short answer

s3fs-fuse presents an S3 bucket as a filesystem through FUSE: filesystem operations on each node translate into HTTP requests to S3, with limited POSIX semantics and no cache shared between nodes. For exploration and light scripts, it works. For GPU training that re-reads datasets and checkpoints large models, POSIX guarantees fail silently, per-request costs add up on every node and every epoch, and tail latencies drag down GPU utilization. Training Pipes runs an NFS gateway with an NVMe cache between your compute and object storage, so your training code sees a proper filesystem backed by a shared cache.

Feature-by-feature comparison

Feature | s3fs | Training Pipes
Protocol | FUSE → HTTP (S3 API) | NFSv4 / SMB → gateway → S3
POSIX atomic rename | Emulated (unsafe) | Yes
POSIX file locking | No-op | Yes
mmap support | Unreliable | Yes
Shared cache across nodes | No | Yes
Prefetch / preload | No | Yes
Container/Kubernetes support | Requires privileged pods | Standard NFS CSI
Per-read S3 request cost | Every read on cold cache | Only on cache miss
Tail latency under load | 100ms-1s+ | Sub-ms on cache hit
Works with BYO S3/GCS/Azure | S3-compatible only | Yes
Managed service | No | Yes

Note: Shared cache is the biggest difference for multi-node training.
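
To make the request-cost and shared-cache rows concrete, here is a rough back-of-the-envelope sketch in Python. It rests on simplifying assumptions that are ours, not measurements: every node reads every object once per epoch, each read without a shared cache costs one GET, and the shared cache is large enough to hold the whole dataset. The function name and the example numbers are hypothetical.

```python
def s3_get_requests(objects: int, nodes: int, epochs: int, shared_cache: bool) -> int:
    """Estimate GET requests against the bucket over a full training run.

    Assumptions (illustrative only): one GET per object read, every node
    reads every object each epoch, and a shared cache that never evicts.
    """
    if shared_cache:
        # Each object is fetched from the bucket once, then served from cache.
        return objects
    # Without a shared cache, every node re-fetches every object every epoch.
    return objects * nodes * epochs


direct = s3_get_requests(objects=1_000_000, nodes=8, epochs=10, shared_cache=False)
cached = s3_get_requests(objects=1_000_000, nodes=8, epochs=10, shared_cache=True)
print(direct, cached, direct // cached)  # 80000000 1000000 80
```

Under these assumptions the gap scales with nodes × epochs. Real workloads that shard data across nodes or use a per-node local cache will land somewhere in between.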

When to use s3fs

  • Ad-hoc scripts on a dev workstation.
  • Exploring bucket contents interactively.
  • One-off data loads where latency doesn't matter.
  • Single-node jobs with short runtime.

When to use Training Pipes

  • Multi-node GPU training.
  • Workloads that re-read datasets across epochs.
  • Anywhere POSIX semantics (locks, atomic rename, mmap) actually matter.
  • Kubernetes environments where privileged FUSE pods are off-limits.
  • Any time S3 request and egress charges are showing up on your bill.
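
As an example of where atomic rename matters, a common checkpointing pattern writes to a temporary file and then renames it over the final path, so readers never see a half-written checkpoint. The sketch below is a minimal illustration of that pattern using only the Python standard library; the function name is hypothetical. It relies on the filesystem making rename atomic. Where rename is emulated as a copy followed by a delete, as is typical on top of S3, the same code can expose a partial or missing file.

```python
import os
import tempfile


def save_checkpoint_atomically(data: bytes, final_path: str) -> None:
    """Write data to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temp file must live on the same filesystem as the final path.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to storage before renaming
        # Atomic on a POSIX filesystem: readers see the old or the new file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A training loop would call something like `save_checkpoint_atomically(serialized_state, "checkpoints/latest.bin")` at the end of each epoch, where `serialized_state` stands in for whatever bytes your framework produces.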

The verdict

If you're running training workloads on cloud GPUs and your data lives in object storage, s3fs is the wrong tool. A real NFS gateway with caching gives you higher GPU utilization, correct POSIX semantics, and dramatically lower egress/request costs. Keep s3fs for dev boxes and ad-hoc scripts — not for the hot path of training.

Frequently asked questions

Why is s3fs a problem in production?
Three reasons: it implements POSIX poorly (rename isn't atomic, locks don't work, mmap is flaky), it has no shared cache so every node fetches data independently, and it requires per-pod FUSE setup in containers which breaks in many Kubernetes environments.
Is Mountpoint for Amazon S3 any better?
Somewhat. It's faster and more stable than s3fs for read-heavy sequential access, but it still has limited POSIX support (no random writes, no atomic rename in the general case) and no shared cache across nodes. It's still FUSE-based.
Can I use Training Pipes with my existing S3 bucket?
Yes — this is the BYO connections feature. Point Training Pipes at your S3 bucket, and we expose it via NFS/SMB through a regional gateway. Your data stays in your S3 bucket; only the gateway middleware is ours.
What happens on a cache miss?
The gateway fetches from the backing object store, caches the result on local NVMe, and returns it to the client. Subsequent reads of the same data hit the cache. The first read has object-storage latency (~30-50ms); subsequent reads are sub-ms.
Do I still need to manage S3 directly?
For managed buckets: no, we handle it. For BYO connections: you still own the bucket, its lifecycle rules, and its permissions. The gateway reads and (optionally) writes via scoped credentials you provide.
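
The cache-miss behavior described in the FAQ above is a read-through cache. The toy Python sketch below illustrates the idea only and is not the gateway's implementation: a dict stands in for the NVMe cache, and `fake_object_store` is a hypothetical stand-in for a fetch from the backing bucket.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache: a dict stands in for the gateway's NVMe cache."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # hypothetical function that reads from the object store
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._store:
            self.hits += 1  # served from cache, no object-store request
            return self._store[key]
        self.misses += 1
        data = self._fetch(key)  # slow path: go to the backing object store
        self._store[key] = data  # keep it for subsequent reads
        return data


def fake_object_store(key: str) -> bytes:
    # Stand-in for an S3 GET; a real fetch would add network latency.
    return f"contents of {key}".encode()


cache = ReadThroughCache(fake_object_store)
for _ in range(3):
    cache.read("dataset/shard-0000")
print(cache.hits, cache.misses)  # 2 1
```

Only the first read reaches the object store; the next two are served from the cache, which is the pattern that keeps repeated epoch reads off the bucket.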

Ready to try Training Pipes?

Spin up a regional NFS file system in five minutes. Free tier available.