doc/cephfs/posix.rst
CephFS aims to adhere to POSIX semantics wherever possible. For example, in contrast to many other common network file systems like NFS, CephFS maintains strong cache coherency across clients. The goal is for processes communicating via the file system to behave the same when they are on different hosts as when they are on the same host.
However, there are a few places where CephFS diverges from strict POSIX semantics for various reasons:
- CephFS clients present a hidden ``.snap`` directory that is used to
  access, create, delete, and rename snapshots. Although the virtual
  directory is excluded from readdir(2), any process that tries to
  create a file or directory with the same name will get an error
  code. The name of this hidden directory can be changed at mount
  time with ``-o snapdirname=.somethingelse`` (Linux) or the config
  option ``client_snapdir`` (libcephfs, ceph-fuse).
- CephFS does not currently maintain the ``atime`` field. Most applications
  do not care, though this impacts some backup and data tiering
  applications that can move unused data to a secondary storage system.
  You may be able to work around this for some use cases, as CephFS does
  support setting ``atime`` via the ``setattr`` operation.

People talk a lot about "POSIX compliance," but in reality most file system implementations do not strictly adhere to the spec, including local Linux file systems like ext4 and XFS. For example, for performance reasons, the atomicity requirements for reads are relaxed: processes reading from a file that is also being written may see torn results.
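The torn-read relaxation described above can be illustrated with a small, self-contained sketch. This is plain Python against a local temporary file, not CephFS-specific; the record contents and sizes are made up for illustration. The "writer" replaces an 8-byte record in two non-atomic 4-byte steps, and a "read" that lands between the steps observes half new data and half old data:

```python
import os
import tempfile

# Deterministic sketch of a torn read: a non-atomic two-step update
# with a read interleaved between the steps.
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"AAAAAAAA", 0)   # the old 8-byte record
os.pwrite(fd, b"BBBB", 0)       # writer: first half of the new record
torn = os.pread(fd, 8, 0)       # reader: runs mid-update
os.pwrite(fd, b"BBBB", 4)       # writer: second half of the new record
os.close(fd)
os.unlink(path)
print(torn)                     # b'BBBBAAAA': neither the old record nor the new one
```

In a real system the interleaving is driven by scheduling rather than explicit ordering, so torn results appear only occasionally, which is exactly what makes this class of relaxation easy to miss in testing.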
Similarly, NFS has extremely weak consistency semantics when multiple clients are interacting with the same files or directories, opting instead for "close-to-open". In the world of network attached storage, where most environments use NFS, whether or not the server's file system is "fully POSIX" may not be relevant, and whether client applications notice depends on whether data is being shared between clients or not. NFS may also "tear" the results of concurrent writers as client data may not even be flushed to the server until the file is closed (and more generally writes will be significantly more time-shifted than CephFS, leading to less predictable results).
Regardless, these are all similar enough to POSIX, and applications still work most of the time. Many other storage systems (e.g., HDFS) claim to be "POSIX-like" but diverge significantly from the standard by dropping support for things like in-place file modifications, truncate, or directory renames.
CephFS relaxes more than local Linux kernel file systems (for example, writes spanning object boundaries may be torn). It relaxes strictly less than NFS when it comes to multiclient consistency, and generally less than NFS when it comes to write atomicity.
In other words, when it comes to POSIX, ::

  HDFS < NFS < CephFS < {XFS, ext4}
POSIX is somewhat vague about the state of an inode after fsync reports an error. In general, CephFS uses the standard error-reporting mechanisms in the client's kernel, and therefore follows the same conventions as other file systems.
In modern Linux kernels (v4.17 or later), writeback errors are reported once to every file description that is open at the time of the error. In addition, unreported errors that occurred before the file description was opened will also be returned on fsync.
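Because a writeback error is reported only once per open file description, the first fsync() after a failure is the place to catch it. The following sketch shows the general pattern in plain Python (the payload is made up; on a healthy local file system the fsync simply succeeds):

```python
import os
import tempfile

# Check fsync()'s result explicitly: on Linux v4.17+ a writeback error
# is delivered once to each file description open at the time of the
# error, and is not repeated on later fsync() calls.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"important data\n")
    os.fsync(fd)        # raises OSError (e.g. EIO) on writeback failure
    synced = True
except OSError:
    # Durability is now uncertain; retrying fsync() will not resurrect
    # pages the kernel has already dropped from the page cache.
    synced = False
finally:
    os.close(fd)
    os.unlink(path)
print("synced:", synced)
```

Applications that need stronger guarantees after an fsync failure typically rewrite the data from their own buffers rather than retrying fsync(), which is the behavior the PostgreSQL discussion linked below centers on.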
See `PostgreSQL's summary of fsync() error reporting across operating systems <https://wiki.postgresql.org/wiki/Fsync_Errors>`_ and `Matthew Wilcox's presentation on Linux IO error handling <https://www.youtube.com/watch?v=74c19hwY2oE>`_ for more information.