open and possibly create a file
Notes
Under Linux, the O_NONBLOCK flag is sometimes used in cases where
one wants to open but does not necessarily have the intention to
read or write. For example, this may be used to open a device in
order to get a file descriptor for use with ioctl(2).
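A minimal sketch of this pattern follows. The helper names
open_for_ioctl() and is_nonblocking() are hypothetical and exist
only for illustration; the descriptor returned would then be
passed to whatever ioctl(2) requests the device supports.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` only to obtain a descriptor for
   later ioctl(2) calls. O_NONBLOCK asks that the open itself not
   block. Returns the descriptor, or -1 with errno set. */
int open_for_ioctl(const char *path)
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Returns 1 if O_NONBLOCK is set in the file status flags of fd,
   0 if it is not, and -1 on error. */
int is_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl == -1)
        return -1;
    return (fl & O_NONBLOCK) != 0;
}
```

The caller is expected to close(2) the descriptor once the control
operations are done.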
The (undefined) effect of O_RDONLY | O_TRUNC varies among
implementations. On many systems the file is actually truncated.
Note that open() can open device special files, but creat()
cannot create them; use mknod(2) instead.
If the file is newly created, its st_atime, st_ctime, st_mtime
fields (respectively, time of last access, time of last status
change, and time of last modification; see stat(2)) are set to
the current time, and so are the st_ctime and st_mtime fields of
the parent directory. Otherwise, if the file is modified because
of the O_TRUNC flag, its st_ctime and st_mtime fields are set to
the current time.
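The timestamp behavior for a newly created file can be observed
with a short check such as the one below. This is only a sketch:
the function name, the /tmp template, and the slack parameter
(which allows for clock granularity) are assumptions made for the
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if t lies within `slack` seconds of `now`. */
static int near_now(time_t t, time_t now, time_t slack)
{
    return t >= now - slack && t <= now + slack;
}

/* Creates a new file under /tmp and checks that its three
   timestamps, and the st_ctime and st_mtime of the parent
   directory, are close to the current time.
   Returns 1 if so, 0 if not, -1 on error. */
int new_file_times_are_current(time_t slack)
{
    char path[] = "/tmp/open_times_XXXXXX";  /* arbitrary name */
    struct stat file_st, dir_st;

    int fd = mkstemp(path);     /* always creates a new file */
    if (fd == -1)
        return -1;

    int rc = fstat(fd, &file_st);
    if (rc == 0)
        rc = stat("/tmp", &dir_st);
    time_t now = time(NULL);

    close(fd);
    unlink(path);
    if (rc == -1)
        return -1;

    return near_now(file_st.st_atime, now, slack)
        && near_now(file_st.st_ctime, now, slack)
        && near_now(file_st.st_mtime, now, slack)
        && near_now(dir_st.st_ctime, now, slack)
        && near_now(dir_st.st_mtime, now, slack);
}
```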
The files in the /proc/[pid]/fd directory show the open file
descriptors of the process with the PID pid. The files in the
/proc/[pid]/fdinfo directory show even more information about
these file descriptors. See proc(5) for further details of both
of these directories.
The Linux header file <asm/fcntl.h> doesn't define O_ASYNC; the
(BSD-derived) FASYNC synonym is defined instead.
Open file descriptions
The term open file description is the one used by POSIX to refer
to the entries in the system-wide table of open files. In other
contexts, this object is variously also called an "open file
object", a "file handle", an "open file table entry", or—in
kernel-developer parlance—a struct file.
When a file descriptor is duplicated (using dup(2) or similar),
the duplicate refers to the same open file description as the
original file descriptor, and the two file descriptors
consequently share the file offset and file status flags. Such
sharing can also occur between processes: a child process created
via fork(2) inherits duplicates of its parent's file descriptors,
and those duplicates refer to the same open file descriptions.
Each open() of a file creates a new open file description; thus,
there may be multiple open file descriptions corresponding to a
file inode.
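The difference between a duplicated descriptor and a second
open() can be made visible through the file offset. The following
sketch (the function name and temporary file name are
illustrative assumptions) writes through one descriptor and then
reads back the offset seen through a dup(2) duplicate and through
an independently opened descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Writes 5 bytes through one descriptor, then reports the offset
   seen via a dup(2) duplicate (same open file description) and
   via a separate open() (new open file description).
   Returns 0 on success, -1 on error. */
int compare_offsets(off_t *dup_offset, off_t *reopen_offset)
{
    char path[] = "/tmp/ofd_demo_XXXXXX";   /* arbitrary name */
    int rc = -1;

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    int fd_dup = dup(fd);               /* shares offset with fd */
    int fd_new = open(path, O_RDONLY);  /* has its own offset */

    if (fd_dup != -1 && fd_new != -1 && write(fd, "hello", 5) == 5) {
        *dup_offset = lseek(fd_dup, 0, SEEK_CUR);
        *reopen_offset = lseek(fd_new, 0, SEEK_CUR);
        rc = 0;
    }

    if (fd_dup != -1)
        close(fd_dup);
    if (fd_new != -1)
        close(fd_new);
    close(fd);
    unlink(path);
    return rc;
}
```

After the write, the duplicate reports the advanced offset, while
the independently opened descriptor still reports offset 0.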
On Linux, one can use the kcmp(2) KCMP_FILE operation to test
whether two file descriptors (in the same process or in two
different processes) refer to the same open file description.
Synchronized I/O
The POSIX.1-2008 "synchronized I/O" option specifies different
variants of synchronized I/O, and specifies the open() flags
O_SYNC, O_DSYNC, and O_RSYNC for controlling the behavior.
Regardless of whether an implementation supports this option, it
must at least support the use of O_SYNC for regular files.
Linux implements O_SYNC and O_DSYNC, but not O_RSYNC. Somewhat
incorrectly, glibc defines O_RSYNC to have the same value as
O_SYNC. (O_RSYNC is defined in the Linux header file
<asm/fcntl.h> on HP PA-RISC, but it is not used.)
O_SYNC provides synchronized I/O file integrity completion,
meaning write operations will flush data and all associated
metadata to the underlying hardware. O_DSYNC provides
synchronized I/O data integrity completion, meaning write
operations will flush data to the underlying hardware, but will
only flush metadata updates that are required to allow a
subsequent read operation to complete successfully. Data
integrity completion can reduce the number of disk operations
that are required for applications that don't need the guarantees
of file integrity completion.
To understand the difference between the two types of completion,
consider two pieces of file metadata: the file last modification
timestamp (st_mtime) and the file length. All write operations
will update the last file modification timestamp, but only writes
that add data to the end of the file will change the file length.
The last modification timestamp is not needed to ensure that a
read completes successfully, but the file length is. Thus,
O_DSYNC would only guarantee to flush updates to the file length
metadata (whereas O_SYNC would also always flush the last
modification timestamp metadata).
Before Linux 2.6.33, Linux implemented only the O_SYNC flag for
open(). However, when that flag was specified, most filesystems
actually provided the equivalent of synchronized I/O data
integrity completion (i.e., O_SYNC was actually implemented as
the equivalent of O_DSYNC).
Since Linux 2.6.33, proper O_SYNC support is provided. However,
to ensure backward binary compatibility, O_DSYNC was defined with
the same value as the historical O_SYNC, and O_SYNC was defined
as a new (two-bit) flag value that includes the O_DSYNC flag
value. This ensures that applications compiled against new
headers get at least O_DSYNC semantics on pre-2.6.33 kernels.
C library/kernel differences
Since version 2.26, the glibc wrapper function for open() employs
the openat() system call, rather than the kernel's open() system
call. For certain architectures, this is also true in glibc
versions before 2.26.
NFS
There are many infelicities in the protocol underlying NFS,
affecting amongst others O_SYNC and O_NDELAY.
On NFS filesystems with UID mapping enabled, open() may return a
file descriptor but, for example, read(2) requests are denied
with EACCES. This is because the client performs open() by
checking the permissions, but UID mapping is performed by the
server upon read and write requests.
FIFOs
Opening the read or write end of a FIFO blocks until the other
end is also opened (by another process or thread). See fifo(7)
for further details.
File access mode
Unlike the other values that can be specified in flags, the
access mode values O_RDONLY, O_WRONLY, and O_RDWR do not specify
individual bits. Rather, they define the low order two bits of
flags, and are defined respectively as 0, 1, and 2. In other
words, the combination O_RDONLY | O_WRONLY is a logical error,
and certainly does not have the same meaning as O_RDWR.
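Because the access mode is a two-bit field rather than a set of
independent bits, it has to be extracted with the O_ACCMODE mask
and compared by value. A small sketch (the helper names are
hypothetical):

```c
#include <fcntl.h>

/* Returns 1 if the access mode in flags permits reading. Note that
   `flags & O_RDONLY` cannot be used for this, since O_RDONLY is 0. */
int is_readable(int flags)
{
    int mode = flags & O_ACCMODE;

    return mode == O_RDONLY || mode == O_RDWR;
}

/* Returns 1 if the access mode in flags permits writing. */
int is_writable(int flags)
{
    int mode = flags & O_ACCMODE;

    return mode == O_WRONLY || mode == O_RDWR;
}
```

The same test applies to the value returned by the fcntl(2)
F_GETFL operation on an open file descriptor.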
Linux reserves the special, nonstandard access mode 3 (binary 11)
in flags to mean: check for read and write permission on the file
and return a file descriptor that can't be used for reading or
writing. This nonstandard access mode is used by some Linux
drivers to return a file descriptor that is to be used only for
device-specific ioctl(2) operations.
Rationale for openat() and other directory file descriptor APIs
openat() and the other system calls and library functions that
take a directory file descriptor argument (i.e., execveat(2),
faccessat(2), fanotify_mark(2), fchmodat(2), fchownat(2),
fspick(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2),
mknodat(2), mount_setattr(2), move_mount(2),
name_to_handle_at(2), open_tree(2), openat2(2), readlinkat(2),
renameat(2), renameat2(2), statx(2), symlinkat(2), unlinkat(2),
utimensat(2), mkfifoat(3), and scandirat(3)) address two problems
with the older interfaces that preceded them. Here, the
explanation is in terms of the openat() call, but the rationale
is analogous for the other interfaces.
First, openat() allows an application to avoid race conditions
that could occur when using open() to open files in directories
other than the current working directory. These race conditions
result from the fact that some component of the directory prefix
given to open() could be changed in parallel with the call to
open(). Suppose, for example, that we wish to create the file
dir1/dir2/xxx.dep if the file dir1/dir2/xxx exists. The problem
is that between the existence check and the file-creation step,
dir1 or dir2 (which might be symbolic links) could be modified to
point to a different location. Such races can be avoided by
opening a file descriptor for the target directory, and then
specifying that file descriptor as the dirfd argument of (say)
fstatat(2) and openat(). The use of the dirfd file descriptor
also has other benefits:
* the file descriptor is a stable reference to the directory,
even if the directory is renamed; and
* the open file descriptor prevents the underlying filesystem
from being dismounted, just as when a process has a current
working directory on a filesystem.
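The xxx.dep scenario described above might be written along the
following lines. This is a sketch, not a reference implementation:
the function name, the 256-byte name buffer, and the 0644 mode are
assumptions chosen for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates "<name>.dep" in the directory `dirpath` if `name` exists
   there. The directory prefix is resolved once, by open(); both
   later steps are relative to the resulting descriptor.
   Returns 0 on success, -1 on error. */
int create_dep_if_exists(const char *dirpath, const char *name)
{
    char dep[256];
    struct stat st;
    int rc = -1;

    if (snprintf(dep, sizeof dep, "%s.dep", name) >= (int)sizeof dep) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;

    /* Existence check and creation both use the same directory,
       even if dirpath is renamed or re-pointed in the meantime. */
    if (fstatat(dfd, name, &st, 0) == 0) {
        int fd = openat(dfd, dep, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd != -1) {
            close(fd);
            rc = 0;
        }
    }

    int saved_errno = errno;
    close(dfd);
    errno = saved_errno;
    return rc;
}
```

For the example in the text, the call would be
create_dep_if_exists("dir1/dir2", "xxx").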
Second, openat() allows the implementation of a per-thread
"current working directory", via file descriptor(s) maintained by
the application. (This functionality can also be obtained by
tricks based on the use of /proc/self/fd/dirfd, but less
efficiently.)
The dirfd argument for these APIs can be obtained by using open()
or openat() to open a directory (with either the O_RDONLY or the
O_PATH flag). Alternatively, such a file descriptor can be
obtained by applying dirfd(3) to a directory stream created using
opendir(3).
When these APIs are given a dirfd argument of AT_FDCWD or the
specified pathname is absolute, then they handle their pathname
argument in the same way as the corresponding conventional APIs.
However, in this case, several of the APIs have a flags argument
that provides access to functionality that is not available with
the corresponding conventional APIs.
O_DIRECT
The O_DIRECT flag may impose alignment restrictions on the length
and address of user-space buffers and the file offset of I/Os.
In Linux, alignment restrictions vary by filesystem and kernel
version and might be absent entirely. However, there is currently
no filesystem-independent interface for an application to
discover these restrictions for a given file or filesystem. Some
filesystems provide their own interfaces for doing so, for
example the XFS_IOC_DIOINFO operation in xfsctl(3).
Under Linux 2.4, transfer sizes, the alignment of the user
buffer, and the file offset must all be multiples of the logical
block size of the filesystem. Since Linux 2.6.0, alignment to
the logical block size of the underlying storage (typically 512
bytes) suffices. The logical block size can be determined using
the ioctl(2) BLKSSZGET operation or from the shell using the
command:
    blockdev --getss
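From C, the same value can be queried roughly as follows. The
function name is hypothetical, and the sketch assumes that the
kernel header <linux/fs.h> (which defines BLKSSZGET) is installed
and that the caller has permission to open the device.

```c
#include <fcntl.h>
#include <linux/fs.h>       /* BLKSSZGET */
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns the logical block size, in bytes, of the block device
   `dev`, or -1 on error (for example, if `dev` cannot be opened
   or is not a block device). */
int logical_block_size(const char *dev)
{
    int size = 0;

    int fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;

    int rc = ioctl(fd, BLKSSZGET, &size);

    close(fd);
    return rc == -1 ? -1 : size;
}
```

A typical call would pass a device node such as /dev/sda; that
path is only an example and varies between systems.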
O_DIRECT I/Os should never be run concurrently with the fork(2)
system call, if the memory buffer is a private mapping (i.e., any
mapping created with the mmap(2) MAP_PRIVATE flag; this includes
memory allocated on the heap and statically allocated buffers).
Any such I/Os, whether submitted via an asynchronous I/O
interface or from another thread in the process, should be
completed before fork(2) is called. Failure to do so can result
in data corruption and undefined behavior in parent and child
processes. This restriction does not apply when the memory
buffer for the O_DIRECT I/Os was created using shmat(2) or
mmap(2) with the MAP_SHARED flag. Nor does this restriction
apply when the memory buffer has been advised as MADV_DONTFORK
with madvise(2), ensuring that it will not be available to the
child after fork(2).
The O_DIRECT flag was introduced in SGI IRIX, where it has
alignment restrictions similar to those of Linux 2.4. IRIX also
has an fcntl(2) call to query appropriate alignments and sizes.
FreeBSD 4.x introduced a flag of the same name, but without
alignment restrictions.
O_DIRECT support was added under Linux in kernel version 2.4.10.
Older Linux kernels simply ignore this flag. Some filesystems
may not implement the flag, in which case open() fails with the
error EINVAL if it is used.
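An application that wants direct I/O where available, but must
also run on filesystems that reject the flag, might handle the
EINVAL case and allocate suitably aligned buffers along these
lines. Both function names are hypothetical, and the alignment
value to pass is an assumption the caller must supply (for
example, the logical block size of the underlying storage).

```c
#define _GNU_SOURCE         /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Tries to open `path` with O_DIRECT; if the filesystem rejects
   the flag with EINVAL, retries without it. On return, *direct is
   1 if the descriptor uses O_DIRECT and 0 otherwise.
   Returns the descriptor, or -1 on error. */
int open_prefer_direct(const char *path, int flags, mode_t mode,
                       int *direct)
{
    int fd = open(path, flags | O_DIRECT, mode);

    *direct = (fd != -1);
    if (fd == -1 && errno == EINVAL)
        fd = open(path, flags, mode);
    return fd;
}

/* Allocates `len` bytes aligned to `align`, which must be a power
   of two and a multiple of sizeof(void *). Returns NULL on
   failure; release the buffer with free(3). */
void *alloc_io_buffer(size_t align, size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, align, len) != 0)
        return NULL;
    return p;
}
```

Transfer sizes and file offsets would need the same alignment as
the buffer when *direct is 1.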
Applications should avoid mixing O_DIRECT and normal I/O to the
same file, and especially to overlapping byte regions in the same
file. Even when the filesystem correctly handles the coherency
issues in this situation, overall I/O throughput is likely to be
slower than using either mode alone. Likewise, applications
should avoid mixing mmap(2) of files with direct I/O to the same
files.
The behavior of O_DIRECT with NFS will differ from local
filesystems. Older kernels, or kernels configured in certain
ways, may not support this combination. The NFS protocol does
not support passing the flag to the server, so O_DIRECT I/O will
bypass the page cache only on the client; the server may still
cache the I/O. The client asks the server to make the I/O
synchronous to preserve the synchronous semantics of O_DIRECT.
Some servers will perform poorly under these circumstances,
especially if the I/O size is small. Some servers may also be
configured to lie to clients about the I/O having reached stable
storage; this will avoid the performance penalty at some risk to
data integrity in the event of server power failure. The Linux
NFS client places no alignment restrictions on O_DIRECT I/O.
In summary, O_DIRECT is a potentially powerful tool that should
be used with caution. It is recommended that applications treat
use of O_DIRECT as a performance option which is disabled by
default.