Thursday, April 30, 2026·4 min read

Stop Using s3fs in Production: Better Alternatives for ML Teams

Training Pipes Team

Every ML team eventually discovers s3fs. It seems magical: mount your S3 bucket as a filesystem with a single command, and your Python code can just open() files as if they were local. The GitHub project looks well-maintained. The README looks credible.

Then you put it in a training pipeline and things get weird.

This post is the warning we wish we'd had. We'll cover what s3fs and its cousins (goofys, rclone mount, Mountpoint for Amazon S3) are actually doing, why they fail in production ML workloads, and what to use instead.

What FUSE-Based S3 Mounts Actually Do

FUSE (Filesystem in Userspace) lets a user-space program pretend to be a filesystem. When your app calls open(), the kernel routes the call to the FUSE daemon, which translates it into whatever the daemon wants — usually an S3 API call.
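
To make that concrete, here's a minimal sketch (bucket name, mount point, and object key are all hypothetical) of the same read done through a FUSE mount and directly against the S3 API. The mount just hides the second version behind open():

```python
import boto3

# Through the FUSE mount: looks like an ordinary local file to Python, but the
# FUSE daemon services this read with S3 API calls behind the scenes.
with open("/mnt/my-training-data/shards/shard-00001.tar", "rb") as f:
    via_mount = f.read()

# The same bytes fetched directly from S3 -- roughly what the daemon is doing
# on your behalf for the open()/read() above.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-training-data", Key="shards/shard-00001.tar")
via_api = obj["Body"].read()

assert via_mount == via_api
```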

s3fs-fuse: the classic. Maps files to objects 1:1. Pretends to support POSIX but leaks in obvious ways.

goofys: faster, more honest about not being POSIX. Read-optimized.

rclone mount: bolted on to the rclone sync tool. Functional but not purpose-built.

Mountpoint for Amazon S3: AWS's official answer. Optimized for reads; writes are limited to sequentially creating new objects, and modifying existing files isn't supported.

Why They Fail Under Training Load

1. Every open() Is an HTTP Request

Your DataLoader opens thousands of files per second across many workers. FUSE daemons either hit S3 for each one (latency death) or cache metadata locally (stale-read death).
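
Here's roughly what that access pattern looks like from the DataLoader's side. A minimal sketch, assuming one sample per file under a hypothetical mount path:

```python
import os
from torch.utils.data import Dataset, DataLoader

class MountedS3Dataset(Dataset):
    """One sample per file, read straight off the FUSE mount."""

    def __init__(self, root="/mnt/my-training-data/images"):
        # Listing the directory is itself one or more S3 LIST calls.
        self.paths = sorted(os.path.join(root, name) for name in os.listdir(root))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Every sample is an open() + read() on the mount, i.e. at least one
        # S3 request per item, per worker, per epoch.
        with open(self.paths[idx], "rb") as f:
            return f.read()

# With 8 workers and small files, this fans out into thousands of S3 requests
# per second before the GPU ever sees a batch.
loader = DataLoader(MountedS3Dataset(), batch_size=64, num_workers=8)
```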

2. Small Reads Are Catastrophic

A PyTorch DataLoader might issue millions of reads of 4 KB to 1 MB each per epoch. S3 charges per request. A 10-million-request epoch on S3 Standard is roughly $4 in request fees alone, and that's the good case — the bad case is latency.
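
The back-of-the-envelope math, assuming the public S3 Standard GET price of about $0.0004 per 1,000 requests (check current pricing for your region and storage class):

```python
# Request fees only -- data transfer and storage are extra.
get_price_per_1k = 0.0004       # USD per 1,000 GET requests (assumed)
requests_per_epoch = 10_000_000
epochs = 50

per_epoch = requests_per_epoch / 1_000 * get_price_per_1k
print(f"~${per_epoch:.2f} per epoch, ~${per_epoch * epochs:.2f} over {epochs} epochs")
# ~$4.00 per epoch, ~$200.00 over 50 epochs
```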

3. POSIX Lies

FUSE mounts of object storage advertise POSIX but silently break on:

  • Atomic rename: S3 has no rename. FUSE does copy+delete, which isn't atomic. Checkpointing code that relies on os.rename() for consistency can leave you with corrupt checkpoints (see the sketch after this list).
  • mmap: Good luck. Some daemons emulate it badly; others refuse.
  • File locking: Usually a no-op. If multiple workers think they have a lock, they don't.
  • Directory listings: every listing is an S3 LIST call, paginated at 1,000 keys per request. A huge directory takes many round-trips and can take seconds to enumerate.
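
Here's the rename footgun in code. This is the standard atomic-checkpoint pattern you'd write for a local or NFS filesystem (paths hypothetical); on a FUSE-backed S3 mount the os.rename() quietly stops being atomic:

```python
import os
import torch

def save_checkpoint(state, path="/mnt/my-training-data/ckpt/latest.pt"):
    tmp_path = path + ".tmp"
    torch.save(state, tmp_path)
    # On ext4/XFS/NFS this rename atomically replaces the old checkpoint, so a
    # reader sees either the old file or the new one. On an S3 FUSE mount it
    # becomes a copy followed by a delete, and a crash in between can leave
    # latest.pt corrupt or missing entirely.
    os.rename(tmp_path, path)
```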

4. Kernel-Level Footguns

FUSE deadlocks are a real and miserable thing. A misbehaving daemon can hang any process that touches the mount, requiring a forced unmount or a reboot. This happens more than you'd think under high concurrency.

5. No Real Caching

Most FUSE tools have a tiny cache or no cache. Every read goes back to S3. That's fine for cold data. It's ruinous when a training run iterates over the same dataset 50 times.

6. Mount Fragility in Containers

In Kubernetes, FUSE mounts require privileged containers or special device access (/dev/fuse). Sidecars, init containers, CSI drivers — it's all plumbing you now own.

The Alternatives

Here's an honest tier list of what to use instead, ranked by how well each scales to production ML training.

Tier 1: Purpose-Built Caching Gateway + NFS

A gateway sits near your compute, caches the hot working set on NVMe, and exposes it over NFS (or SMB). The storage of record stays in object storage.
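
From the training side it's just a filesystem path. A rough sketch, assuming the gateway's NFS export is mounted at /mnt/trainpipes (a hypothetical mount point set up by your nodes or a Kubernetes volume):

```python
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder  # any path-based Dataset works as-is

# Plain POSIX reads over NFS; repeat epochs are served from the gateway's
# NVMe cache instead of going back to object storage.
dataset = ImageFolder("/mnt/trainpipes/imagenet/train")
loader = DataLoader(dataset, batch_size=64, num_workers=8)
```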

Pros:

  • Real NFS. No FUSE. No surprises.
  • Caching eliminates per-read S3 round-trips.
  • Works in any container runtime (NFS is a standard mount type).
  • Your training code is unchanged.

Cons:

  • Someone has to run the gateway. (This is what Training Pipes does, so you don't.)

Tier 2: Copy to Local Disk

For small datasets or single-node jobs, just copy the data to local NVMe at job start. Crude but reliable.
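
A minimal sketch of the idea (bucket, prefix, and scratch path are hypothetical; in practice you'd parallelize the downloads or shell out to aws s3 sync):

```python
import os
import boto3

def download_prefix(bucket, prefix, dest="/scratch/data"):
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            local_path = os.path.join(dest, obj["Key"])
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3.download_file(bucket, obj["Key"], local_path)

# Run once at job start; after that, the DataLoader only touches local NVMe.
download_prefix("my-training-data", "images/")
```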

Pros:

  • Zero magic.
  • Local disk is fast.

Cons:

  • Doesn't scale beyond local disk capacity.
  • Every job restart re-copies.
  • No good answer for multi-node.

Tier 3: EFS or FSx Lustre

Managed POSIX filesystems from AWS. Real NFS underneath.

Pros:

  • Real POSIX.
  • Scales.

Cons:

  • Priced for storage, not caching — you pay for the whole dataset even if you only read 5%.
  • Data still has to get into EFS from wherever it originated. Often that's a painful sync.
  • AWS-only.

Tier 4: Roll Your Own

Nginx with a proxy_cache. A Redis-backed read-through. A homegrown distributed cache.
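
For a sense of scale: the toy version of a read-through cache fits in a dozen lines (names and paths hypothetical). Eviction, invalidation, locking across workers, multi-node sharing, and crash recovery are where the other ten thousand lines go:

```python
import os
import boto3

CACHE_DIR = "/scratch/s3-cache"
s3 = boto3.client("s3")

def read_through(bucket, key):
    local_path = os.path.join(CACHE_DIR, bucket, key)
    if not os.path.exists(local_path):
        # Cache miss: fetch from S3 and keep a copy on local disk.
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(bucket, key, local_path)
    # Cache hit (or freshly warmed): a plain local read.
    with open(local_path, "rb") as f:
        return f.read()
```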

Pros:

  • You learn a lot.

Cons:

  • You now maintain a distributed system that isn't your product.

Tier 5: s3fs / goofys / Mountpoint

For dev boxes and one-off scripts, they're fine. For production ML training with GPU dollars on the line, they're not.

The Honest Answer for Most Teams

If you're running training jobs regularly and GPU idle time matters to your budget, use a caching-gateway-over-NFS pattern. Training Pipes provides this as a managed service — you get NFSv4 (or SMB) mounts backed by regional caching gateways, with S3-compatible API access to the same data when you need it.

Your DataLoader doesn't know or care whether the underlying data is Training Pipes–managed or one of your existing S3 buckets connected via BYO (bring your own bucket). It just sees a filesystem that's fast.

Stop fighting FUSE and start training →