Some modern cluster file systems provide perfect cache coherence
among their clients. Perfect cache coherence among disparate NFS
clients is expensive to achieve, especially on wide area
networks. As such, NFS settles for weaker cache coherence that
satisfies the requirements of most file sharing types.
Close-to-open cache consistency
Typically file sharing is completely sequential. First client A
opens a file, writes something to it, then closes it. Then
client B opens the same file, and reads the changes.
When an application opens a file stored on an NFS version 3
server, the NFS client checks that the file exists on the server
and that the opener is permitted to access it by sending a GETATTR
or ACCESS request. The NFS client sends these requests regardless
of the freshness of the file's cached attributes.
When the application closes the file, the NFS client writes back
any pending changes to the file so that the next opener can view
the changes. This also gives the NFS client an opportunity to
report write errors to the application via the return code from
close(2).
The behavior of checking at open time and flushing at close time
is referred to as close-to-open cache consistency, or CTO. It
can be disabled for an entire mount point using the nocto mount
option.
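Because pending changes are flushed when a file is closed, an application that shares files this way should check for errors at flush and close time. The following is a minimal sketch of that pattern, not a prescribed method; the helper name write_and_close is hypothetical, and it relies on Python raising OSError when the underlying fsync(2) or close(2) call fails.

```python
import os


def write_and_close(path: str, data: bytes) -> None:
    """Write data to path and surface errors reported at flush or close time.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Push pending data out now so a write error is raised here.
        os.fsync(fd)
    finally:
        # os.close raises OSError if close(2) reports an error.
        os.close(fd)
```

A caller would wrap write_and_close in a try/except OSError block and treat any exception as a sign that the data may not have reached the server.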
Weak cache consistency
There are still opportunities for a client's data cache to
contain stale data. The NFS version 3 protocol introduced "weak
cache consistency" (also known as WCC) which provides a way of
efficiently checking a file's attributes before and after a
single request. This allows a client to help identify changes
that could have been made by other clients.
When a client is using many concurrent operations that update the
same file at the same time (for example, during asynchronous
write behind), it is still difficult to tell whether it was that
client's updates or some other client's updates that altered the
file.
Attribute caching
Use the noac mount option to achieve attribute cache coherence
among multiple clients. Almost every file system operation
checks file attribute information. The client keeps this
information cached for a period of time to reduce network and
server load. When noac is in effect, a client's file attribute
cache is disabled, so each operation that needs to check a file's
attributes is forced to go back to the server. This permits a
client to see changes to a file very quickly, at the cost of many
extra network operations.
Be careful not to confuse the noac option with "no data caching."
The noac mount option prevents the client from caching file
metadata, but there are still races that may result in data cache
incoherence between client and server.
The NFS protocol is not designed to support true cluster file
system cache coherence without some type of application
serialization. If absolute cache coherence among clients is
required, applications should use file locking. Alternatively,
applications can also open their files with the O_DIRECT flag to
disable data caching entirely.
File timestamp maintenance
NFS servers are responsible for managing file and directory
timestamps (atime, ctime, and mtime). When a file is accessed or
updated on an NFS server, the file's timestamps are updated just
like they would be on a filesystem local to an application.
NFS clients cache file attributes, including timestamps. A
file's timestamps are updated on NFS clients when its attributes
are retrieved from the NFS server. Thus there may be some delay
before timestamp updates on an NFS server appear to applications
on NFS clients.
To comply with the POSIX filesystem standard, the Linux NFS
client relies on NFS servers to keep a file's mtime and ctime
timestamps properly up to date. It does this by flushing local
data changes to the server before reporting mtime to applications
via system calls such as stat(2).
The Linux client handles atime updates more loosely, however.
NFS clients maintain good performance by caching data, but that
means that application reads, which normally update atime, are
not reflected to the server where a file's atime is actually
maintained.
Because of this caching behavior, the Linux NFS client does not
support generic atime-related mount options. See mount(8) for
details on these options.
In particular, the atime/noatime, diratime/nodiratime,
relatime/norelatime, and strictatime/nostrictatime mount options
have no effect on NFS mounts.
/proc/mounts may report that the relatime mount option is set on
NFS mounts, but in fact the atime semantics are always as
described here, and are not like relatime semantics.
Directory entry caching
The Linux NFS client caches the result of all NFS LOOKUP
requests. If the requested directory entry exists on the server,
the result is referred to as a positive lookup result. If the
requested directory entry does not exist on the server (that is,
the server returned ENOENT), the result is referred to as
a negative lookup result.
To detect when directory entries have been added or removed on
the server, the Linux NFS client watches a directory's mtime. If
the client detects a change in a directory's mtime, the client
drops all cached LOOKUP results for that directory. Since the
directory's mtime is a cached attribute, it may take some time
before a client notices it has changed. See the descriptions of
the acdirmin, acdirmax, and noac mount options for more
information about how long a directory's mtime is cached.
Caching directory entries improves the performance of
applications that do not share files with applications on other
clients. Using cached information about directories can
interfere with applications that run concurrently on multiple
clients and need to detect the creation or removal of files
quickly, however. The lookupcache mount option allows some
tuning of directory entry caching behavior.
Before kernel release 2.6.28, the Linux NFS client tracked only
positive lookup results. This permitted applications to detect
new directory entries created by other clients quickly while
still providing some of the performance benefits of caching. If
an application depends on the previous lookup caching behavior of
the Linux NFS client, you can use lookupcache=positive.
If the client ignores its cache and validates every application
lookup request with the server, that client can immediately
detect when a new directory entry has been either created or
removed by another client. You can specify this behavior using
lookupcache=none. The extra NFS requests needed if the client
does not cache directory entries can exact a performance penalty.
Disabling lookup caching should result in less of a performance
penalty than using noac, and has no effect on how the NFS client
caches the attributes of files.
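The interaction between cached lookup results, the directory's mtime, and the three lookupcache settings can be illustrated with a toy model. This is only a sketch under stated assumptions: the class LookupCache, its revalidate method, and the dictionary standing in for the server are all hypothetical and do not reflect the kernel's actual data structures. The revalidate call stands in for the moment the cached directory attributes expire and the client refetches the mtime.

```python
class LookupCache:
    """Toy model of directory entry caching (not the kernel implementation)."""

    def __init__(self, server_dir: dict, mode: str = "all"):
        if mode not in ("all", "positive", "none"):
            raise ValueError(f"unknown lookupcache mode: {mode}")
        # Hypothetical server state: {"mtime": int, "entries": {name: handle}}
        self.server = server_dir
        self.mode = mode
        self.cached_mtime = server_dir["mtime"]
        self.entries = {}       # name -> handle, or None for a negative result
        self.lookups_sent = 0   # number of simulated LOOKUP requests

    def revalidate(self) -> None:
        """Refetch the directory mtime; drop cached results if it changed."""
        if self.server["mtime"] != self.cached_mtime:
            self.cached_mtime = self.server["mtime"]
            self.entries.clear()

    def lookup(self, name: str):
        if self.mode != "none" and name in self.entries:
            return self.entries[name]          # answered from the cache
        self.lookups_sent += 1                 # simulated LOOKUP request
        handle = self.server["entries"].get(name)
        if self.mode == "all" or (self.mode == "positive" and handle is not None):
            self.entries[name] = handle
        return handle
```

In this model, a client in "all" mode keeps returning a stale negative result for a newly created file until revalidate notices the changed mtime, while "positive" and "none" modes see the new entry on the next lookup at the cost of extra requests.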
The sync mount option
The NFS client treats the sync mount option differently than some
other file systems (refer to mount(8) for a description of the
generic sync and async mount options). If neither sync nor async
is specified (or if the async option is specified), the NFS
client delays sending application writes to the server until any
of these events occurs:
Memory pressure forces reclamation of system memory
resources.
An application flushes file data explicitly with sync(2),
msync(2), or fsync(2).
An application closes a file with close(2).
The file is locked/unlocked via fcntl(2).
In other words, under normal circumstances, data written by an
application may not immediately appear on the server that hosts
the file.
If the sync option is specified on a mount point, any system call
that writes data to files on that mount point causes that data to
be flushed to the server before the system call returns control
to user space. This provides greater data cache coherence among
clients, but at a significant performance cost.
Applications can use the O_SYNC open flag to force application
writes to individual files to go to the server immediately
without the use of the sync mount option.
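As a minimal sketch of the per-file approach, an application can pass O_SYNC when opening the file. The helper name write_synchronously is hypothetical; os.O_SYNC is the standard Python constant for the open(2) flag on Unix systems.

```python
import os


def write_synchronously(path: str, data: bytes) -> int:
    """Write data with O_SYNC so each write is pushed out before returning.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)
```

This confines the performance cost of synchronous writes to the files that need it, rather than to every file on the mount point.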
Using file locks with NFS
The Network Lock Manager protocol is a separate sideband protocol
used to manage file locks in NFS version 2 and version 3. To
support lock recovery after a client or server reboot, a second
sideband protocol -- known as the Network Status Manager protocol
-- is also required. In NFS version 4, file locking is supported
directly in the main NFS protocol, and the NLM and NSM sideband
protocols are not used.
In most cases, NLM and NSM services are started automatically,
and no extra configuration is required. Configure all NFS
clients with fully-qualified domain names to ensure that NFS
servers can find clients to notify them of server reboots.
NLM supports advisory file locks only. To lock NFS files, use
fcntl(2) with the F_GETLK and F_SETLK commands. The NFS client
converts file locks obtained via flock(2) to advisory locks.
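A minimal sketch of taking an advisory lock around an update follows. The helper name append_record is hypothetical. It uses Python's fcntl.lockf, which is documented as a wrapper around the fcntl(2) locking calls; with LOCK_EX and no LOCK_NB flag the call blocks until the lock is granted, which is an assumption about the desired behavior rather than a requirement of NFS.

```python
import fcntl
import os


def append_record(path: str, record: bytes) -> None:
    """Append record to path while holding an exclusive advisory lock.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)        # blocks until the lock is held
        try:
            os.lseek(fd, 0, os.SEEK_END)      # find the end after locking
            os.write(fd, record)
            os.fsync(fd)                      # flush before releasing the lock
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Seeking to the end of the file only after the lock is held means the offset is computed while no other cooperating process can be writing.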
When mounting servers that do not support the NLM protocol, or
when mounting an NFS server through a firewall that blocks the
NLM service port, specify the nolock mount option. NLM locking
must be disabled with the nolock option when using NFS to mount
/var because /var contains files used by the NLM implementation
on Linux.
Specifying the nolock option may also be advisable to improve the
performance of a proprietary application which runs on a single
client and uses file locks extensively.
NFS version 4 caching features
The data and metadata caching behavior of NFS version 4 clients
is similar to that of earlier versions. However, NFS version 4
adds two features that improve cache behavior: change attributes
and file delegation.
The change attribute is a new part of NFS file and directory
metadata which tracks data changes. It replaces the use of a
file's modification and change time stamps as a way for clients
to validate the content of their caches. Change attributes are
independent of the time stamp resolution on either the server or
client.
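The idea of validating cached content against a change attribute can be sketched with a toy model. Everything here is hypothetical and for illustration only: the CachedFile class, the read_cached function, and the fetch callback stand in for client internals, and the server's current change value is simply passed in as an integer.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedFile:
    change: int    # change attribute seen when the data was cached
    data: bytes    # cached file content


def read_cached(cache: Dict[str, CachedFile], name: str,
                server_change: int,
                fetch: Callable[[str], bytes]) -> bytes:
    """Return cached data if the change attribute matches, else refetch."""
    entry = cache.get(name)
    if entry is None or entry.change != server_change:
        # The cache is empty or stale: read the data again and remember
        # the change attribute it corresponds to.
        entry = CachedFile(server_change, fetch(name))
        cache[name] = entry
    return entry.data
```

Because the comparison is on an opaque counter rather than a time stamp, two updates within the same clock tick still produce different values in this model.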
A file delegation is a contract between an NFS version 4 client
and server that allows the client to treat a file temporarily as
if no other client is accessing it. The server promises to
notify the client (via a callback request) if another client
attempts to access that file. Once a file has been delegated to
a client, the client can cache that file's data and metadata
aggressively without contacting the server.
File delegations come in two flavors: read and write. A read
delegation means that the server notifies the client about any
other clients that want to write to the file. A write delegation
means that the client gets notified about either read or write
accessors.
Servers grant file delegations when a file is opened, and can
recall delegations at any time when another client wants access
to the file that conflicts with any delegations already granted.
Delegations on directories are not supported.
In order to support delegation callback, the server checks the
network return path to the client during the client's initial
contact with the server. If contact with the client cannot be
established, the server simply does not grant any delegations to
that client.