Saturday, May 16, 2026 · 4 min read
POSIX Filesystems on Object Storage: The Good, the Bad, the Fast
"POSIX filesystem on S3" is one of those phrases that can mean five different things depending on who's saying it. A FUSE daemon, a distributed filesystem, a protocol gateway, a lift-and-shift, or an outright lie. This post sorts them out.
What POSIX Actually Requires
POSIX is the Unix-y filesystem contract that almost every tool assumes. The parts that matter for real workloads:
- Hierarchical namespace with inodes, permissions, owners
- open()/read()/write()/close() with byte-level offsets
- Atomic rename() within a directory
- mmap() — mapping a file into memory
- Advisory locks (flock, fcntl)
- Directory listing with consistent results
- Hard and symbolic links
- Sparse files
- fsync() guarantees
Most code doesn't use all of these. Most code assumes they exist.
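Two of those contract points can be exercised directly from Python against a local filesystem (a minimal sketch; paths are temporary and illustrative):

```python
import os
import tempfile

# Byte-level offsets: POSIX files are addressable at any byte position.
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"hello world", 0)   # write 11 bytes at offset 0
chunk = os.pread(fd, 5, 6)         # read 5 bytes starting at offset 6
assert chunk == b"world"

# Atomic rename within a directory: the new name appears all at once;
# readers never observe a half-renamed file.
new_path = path + ".renamed"
os.rename(path, new_path)
assert os.path.exists(new_path)

os.close(fd)
os.remove(new_path)
```

Object storage offers no native equivalent for either operation, which is where the trouble starts.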
What Object Storage Gives You
S3 and friends expose a completely different model:
- Flat key-value namespace
- PUT an entire object, or multipart upload in parts
- GET with optional byte range
- LIST with prefix and pagination
- HEAD for metadata
- No rename, no mmap, no links, no locks
Everything above the raw object API has to be emulated if you want POSIX on top.
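The gap is easiest to see in miniature. Here's a toy model of the object-store contract (a dict standing in for a bucket — illustrative, not an SDK): whole-object PUT, ranged GET, prefix LIST with pagination, and notably no rename, no partial in-place write, no locks.

```python
# Toy flat key-value store standing in for a bucket.
store = {}

def put(key, data):                  # PUT: whole object only
    store[key] = bytes(data)

def get(key, start=None, end=None):  # GET with optional byte range
    data = store[key]
    return data if start is None else data[start:end]

def list_prefix(prefix, page_size=2, token=0):  # LIST: prefix + pagination
    keys = sorted(k for k in store if k.startswith(prefix))
    page = keys[token:token + page_size]
    next_token = token + page_size if token + page_size < len(keys) else None
    return page, next_token

put("data/a.txt", b"alpha")
put("data/b.txt", b"bravo")
put("logs/c.txt", b"charlie")

assert get("data/a.txt", 0, 3) == b"alp"
page, tok = list_prefix("data/")
assert page == ["data/a.txt", "data/b.txt"] and tok is None
```

Everything a filesystem needs beyond these four verbs must be built in a layer above.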
The Approaches
Option A: FUSE Emulation
Daemons like s3fs-fuse pretend to be a filesystem in userspace and translate every call to S3 API calls. Covered in depth in our FUSE limitations post. Short version: fine for dev, bad for production.
Option B: Fully Rewritten Filesystems (JuiceFS, Alluxio, CephFS)
Projects like JuiceFS, Alluxio, and Ceph-backed filesystems decouple metadata from data. Metadata lives in a fast database (Redis, etcd, TiKV), data blocks live in object storage. You get real POSIX. You also get:
- A metadata service you have to run and scale
- A consistency model that's all-new to your team
- Client libraries instead of (or alongside) kernel mounts
- Complex failure modes when the metadata server is slow
These are legitimate architectures, but they're not "mount S3 and go."
Option C: NFS Gateway
A gateway server implements NFSv4 and translates to object storage behind the scenes. Clients mount standard NFS with zero custom code. The gateway handles:
- Filename to object-key mapping
- Metadata caching
- Data caching
- Atomic rename (via a short-lived lock + multi-step copy)
- Protocol translation
This is the architecture Training Pipes uses. The gateway sees NFS operations, the object store sees S3 operations, clients see a filesystem.
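The first job on that list, path-to-key mapping, can be sketched in a few lines (a hypothetical simplification — a real gateway also tracks inode metadata separately from the key namespace):

```python
# Hypothetical path -> object-key mapping a gateway might use:
# strip the mount-relative leading slash; "directories" are key prefixes.
def path_to_key(mount_path):
    return mount_path.lstrip("/")

def listing_prefix(dir_path):
    # A directory listing becomes a prefix LIST, delimited at "/".
    p = dir_path.strip("/")
    return p + "/" if p else ""

assert path_to_key("/ckpts/step_100.pt") == "ckpts/step_100.pt"
assert listing_prefix("/ckpts") == "ckpts/"
```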
Option D: Managed Cloud Filesystems (EFS, FSx)
Not actually object-backed — they're their own storage systems, with their own capacity and throughput billing. If you want "POSIX on S3" the answer here is "they're separate products; sync between them."
What Breaks, and How Each Approach Handles It
Atomic Rename
POSIX: rename("a", "b") is atomic.
S3: no rename exists. Copy + delete is not atomic.
- FUSE: often broken under concurrency. Don't checkpoint over FUSE mounts.
- Gateway: can be made atomic via metadata-level rename (cheap) with a backing key rewrite deferred.
- JuiceFS/Alluxio: atomic, because metadata is the source of truth.
- EFS: atomic, it's real NFS.
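The metadata-level trick is the same in the gateway and JuiceFS cases, and it's worth seeing why it's atomic. In this simplified sketch, names map to immutable data-block keys, and rename only moves the pointer inside one critical section — the object store never performs a copy+delete:

```python
import threading

# Names map to backing object keys; rename swaps the pointer under a
# lock. There is no copy+delete window for a reader to fall into.
namespace = {"a": "blk-0001"}
meta_lock = threading.Lock()

def atomic_rename(src, dst):
    with meta_lock:
        namespace[dst] = namespace.pop(src)

atomic_rename("a", "b")
assert namespace == {"b": "blk-0001"}
```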
mmap
POSIX: map a file into virtual memory, page on demand.
S3: no concept of it.
- FUSE: unreliable. Some daemons cache the whole file on first touch.
- Gateway: if the cached copy is on local NVMe, mmap works against the cache.
- EFS: works, but slow over network.
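The gateway case works because mmap only needs a real file descriptor backed by real pages. Against a locally cached copy it behaves exactly as POSIX promises (sketched here with a temp file standing in for the NVMe cache):

```python
import mmap
import os
import tempfile

# mmap against a local cache file, as a gateway with NVMe caching allows.
# The kernel pages data in on demand; no object-store round trip per access.
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"shard-bytes" * 1000, 0)

with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
    header = m[:11]   # touching a page faults it in from the cache file
assert header == b"shard-bytes"

os.close(fd)
os.remove(path)
```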
Directory Listings
POSIX: readdir() returns a consistent snapshot.
S3: LIST is eventually consistent, paginated, and priced per call.
- FUSE: slow for big directories, sometimes inconsistent.
- Gateway: caches listings locally for correctness and speed.
- JuiceFS/Alluxio: instant, metadata is separate.
Locks
POSIX: flock and fcntl let processes coordinate.
S3: no lock primitive.
- FUSE: usually no-op.
- Gateway: can implement NFSv4 state locking server-side.
- EFS: full lock support.
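What "usually no-op" costs you is visible in a small test. On Linux, flock actually excludes a second holder; on a mount that silently no-ops it, the second acquisition would succeed and two workers would both think they own the file:

```python
import fcntl
import os
import tempfile

# Two descriptors on the same file contend for an advisory flock lock.
fd1, path = tempfile.mkstemp()
fd2 = os.open(path, os.O_RDWR)

fcntl.flock(fd1, fcntl.LOCK_EX)            # first holder takes the lock
try:
    fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    blocked = False
except BlockingIOError:
    blocked = True                         # second holder is refused
assert blocked

fcntl.flock(fd1, fcntl.LOCK_UN)
os.close(fd1)
os.close(fd2)
os.remove(path)
```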
Consistency
POSIX: strict within a host, well-defined across NFS mounts.
S3: strongly consistent read-after-write since December 2020, including overwrites — but consistency guarantees vary across S3-compatible implementations.
- All non-trivial architectures above layer in their own consistency logic on top.
What ML Workloads Actually Need
ML training is (mercifully) forgiving of a subset of POSIX:
Required:
- Read files by name
- Stat (for size/exists checks)
- Directory listing
- mmap on large shards (for some frameworks)
- Atomic write-then-rename for checkpoints
Nice to have:
- Hard links (for efficient "copy" of snapshots)
- Locking (multi-worker coordination)
Usually unneeded:
- POSIX ACLs beyond owner/group/other
- Sparse files
- Named pipes, device nodes
A well-designed NFS gateway over object storage hits the "required" list cleanly, and usually the "nice to have" list too. FUSE-based tools often miss parts of the required list silently.
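The checkpoint item on the "required" list is the standard write-then-rename pattern — a minimal sketch of the discipline training code relies on (hypothetical helper; any real checkpoint writer follows the same shape):

```python
import os
import tempfile

# Write to a temp name, fsync, then rename. On an honest filesystem a
# reader sees either the old checkpoint or the new one, never a torn file.
def save_checkpoint(path, payload: bytes):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())   # data durable before the rename publishes it
    os.rename(tmp, path)       # atomic publish (POSIX guarantees this)

ckpt = os.path.join(tempfile.mkdtemp(), "model.ckpt")
save_checkpoint(ckpt, b"step-100-weights")
assert open(ckpt, "rb").read() == b"step-100-weights"
```

This is exactly the pattern that quietly corrupts data on a FUSE mount where rename is copy+delete.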
The Training Pipes Approach
We pick the NFS-gateway model because it's the one that:
- Gives training code an honest filesystem (not a polite lie)
- Doesn't require a metadata service you have to babysit
- Works with unmodified clients (standard mount -t nfs4)
- Lets you use any object storage backend (managed by us, or BYO)
What you get:
- NFSv4.0 and NFSv4.1
- Atomic rename
- Real directory semantics
- POSIX permissions (mapped to per-mount identities)
- Full cache coherency across clients of the same mount
- S3-compatible API to the same data when you want it
You don't get a perfect POSIX experience because nobody does on top of object storage. You get the subset that matters for real workloads, with honest semantics about the rest.