The AF_UNIX (also known as AF_LOCAL) socket family is used to
communicate between processes on the same machine efficiently.
Traditionally, UNIX domain sockets can be either unnamed, or
bound to a filesystem pathname (marked as being of type socket).
Linux also supports an abstract namespace which is independent of
the filesystem.
Valid socket types in the UNIX domain are: SOCK_STREAM, for a
stream-oriented socket; SOCK_DGRAM, for a datagram-oriented
socket that preserves message boundaries (as on most UNIX
implementations, UNIX domain datagram sockets are always reliable
and don't reorder datagrams); and (since Linux 2.6.4)
SOCK_SEQPACKET, for a sequenced-packet socket that is
connection-oriented, preserves message boundaries, and delivers
messages in the order that they were sent.
UNIX domain sockets support passing file descriptors or process
credentials to other processes using ancillary data.
Address format
A UNIX domain socket address is represented in the following
structure:
struct sockaddr_un {
    sa_family_t sun_family;     /* AF_UNIX */
    char        sun_path[108];  /* Pathname */
};
The sun_family field always contains AF_UNIX. On Linux, sun_path
is 108 bytes in size; see also NOTES, below.
Various system calls (for example, bind(2), connect(2), and
sendto(2)) take a sockaddr_un argument as input. Some other
system calls (for example, getsockname(2), getpeername(2),
recvfrom(2), and accept(2)) return an argument of this type.
Three types of address are distinguished in the sockaddr_un
structure:
* pathname: a UNIX domain socket can be bound to a null-
terminated filesystem pathname using bind(2). When the
address of a pathname socket is returned (by one of the system
calls noted above), its length is
offsetof(struct sockaddr_un, sun_path) + strlen(sun_path) + 1
and sun_path contains the null-terminated pathname. (On
Linux, the above offsetof() expression equates to the same
value as sizeof(sa_family_t), but some other implementations
include other fields before sun_path, so the offsetof()
expression more portably describes the size of the address
structure.)
For further details of pathname sockets, see below.
* unnamed: A stream socket that has not been bound to a pathname
using bind(2) has no name. Likewise, the two sockets created
by socketpair(2) are unnamed. When the address of an unnamed
socket is returned, its length is sizeof(sa_family_t), and
sun_path should not be inspected.
* abstract: an abstract socket address is distinguished (from a
pathname socket) by the fact that sun_path[0] is a null byte
('\0'). The socket's address in this namespace is given by
the additional bytes in sun_path that are covered by the
specified length of the address structure. (Null bytes in the
name have no special significance.) The name has no
connection with filesystem pathnames. When the address of an
abstract socket is returned, the returned addrlen is greater
than sizeof(sa_family_t) (i.e., greater than 2), and the name
of the socket is contained in the first (addrlen -
sizeof(sa_family_t)) bytes of sun_path.
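The three cases above can be told apart from the returned length and the first byte of sun_path alone. The following is a minimal sketch of that check; the enum un_kind and the function classify_un() are hypothetical names introduced here for illustration and are not part of any system header.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical labels for the three address types described above. */
enum un_kind { UN_UNNAMED, UN_PATHNAME, UN_ABSTRACT };

/* Classify an address returned by getsockname(2), getpeername(2),
   recvfrom(2), or accept(2), given the returned addrlen. */
enum un_kind
classify_un(const struct sockaddr_un *addr, socklen_t addrlen)
{
    /* No bytes of sun_path are covered: the socket is unnamed and
       sun_path must not be inspected. */
    if (addrlen <= offsetof(struct sockaddr_un, sun_path))
        return UN_UNNAMED;

    /* A leading null byte marks the abstract namespace; the name is
       the remaining bytes covered by addrlen. */
    if (addr->sun_path[0] == '\0')
        return UN_ABSTRACT;

    /* Otherwise sun_path holds a null-terminated pathname. */
    return UN_PATHNAME;
}
```

For example, calling getsockname(2) on either end of a socketpair(2) and passing the result to classify_un() would be expected to yield UN_UNNAMED.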
Pathname sockets
When binding a socket to a pathname, a few rules should be
observed for maximum portability and ease of coding:
* The pathname in sun_path should be null-terminated.
* The length of the pathname, including the terminating null
byte, should not exceed the size of sun_path.
* The addrlen argument that describes the enclosing sockaddr_un
structure should have a value of at least:
offsetof(struct sockaddr_un, sun_path)+strlen(addr.sun_path)+1
or, more simply, addrlen can be specified as sizeof(struct
sockaddr_un).
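As an illustration of the rules above, the sketch below fills in a sockaddr_un for a pathname socket and computes the minimum addrlen. The helper name make_pathname_addr() and its choice of error codes are assumptions made for this example only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill *addr with a null-terminated pathname and store the matching
   address length in *addrlen.  Returns 0 on success.  Returns -1 with
   errno set to EINVAL for an empty path, or ENAMETOOLONG if the path
   plus its terminating null byte does not fit in sun_path. */
int
make_pathname_addr(const char *path, struct sockaddr_un *addr,
                   socklen_t *addrlen)
{
    size_t len = strlen(path);

    if (len == 0) {
        errno = EINVAL;
        return -1;
    }
    if (len >= sizeof(addr->sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, len + 1);  /* includes the null byte */

    *addrlen = offsetof(struct sockaddr_un, sun_path) + len + 1;
    return 0;
}
```

The resulting addr and addrlen can then be passed to bind(2) or connect(2); passing sizeof(struct sockaddr_un) instead of the computed length is equally valid, as noted above.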
There is some variation in how implementations handle UNIX domain
socket addresses that do not follow the above rules. For
example, some (but not all) implementations append a null
terminator if none is present in the supplied sun_path.
When coding portable applications, keep in mind that some
implementations have sun_path as short as 92 bytes.
Various system calls (accept(2), recvfrom(2), getsockname(2),
getpeername(2)) return socket address structures. When applied
to UNIX domain sockets, the value-result addrlen argument
supplied to the call should be initialized as above. Upon
return, the argument is set to indicate the actual size of the
address structure. The caller should check the value returned in
this argument: if the output value exceeds the input value, then
there is no guarantee that a null terminator is present in
sun_path. (See BUGS.)
Pathname socket ownership and permissions
In the Linux implementation, pathname sockets honor the
permissions of the directory they are in. Creation of a new
socket fails if the process does not have write and search
(execute) permission on the directory in which the socket is
created.
On Linux, connecting to a stream socket object requires write
permission on that socket; sending a datagram to a datagram
socket likewise requires write permission on that socket. POSIX
does not make any statement about the effect of the permissions
on a socket file, and on some systems (e.g., older BSDs), the
socket permissions are ignored. Portable programs should not
rely on this feature for security.
When creating a new socket, the owner and group of the socket
file are set according to the usual rules. The socket file has
all permissions enabled, other than those that are turned off by
the process umask(2).
The owner, group, and permissions of a pathname socket can be
changed (using chown(2) and chmod(2)).
Abstract sockets
Socket permissions have no meaning for abstract sockets: the
process umask(2) has no effect when binding an abstract socket,
and changing the ownership and permissions of the object (via
fchown(2) and fchmod(2)) has no effect on the accessibility of
the socket.
Abstract sockets automatically disappear when all open references
to the socket are closed.
The abstract socket namespace is a nonportable Linux extension.
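A minimal sketch of binding in the abstract namespace follows. The helpers make_abstract_addr() and abstract_listen() are hypothetical names used only for this example; the key point is that addrlen, not a terminating null byte, determines where the name ends.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Build an abstract address from `name` (name_len bytes; no
   terminator is needed).  Returns the addrlen to pass to bind(2) or
   connect(2), or 0 if the name does not fit in sun_path. */
socklen_t
make_abstract_addr(const char *name, size_t name_len,
                   struct sockaddr_un *addr)
{
    if (name_len > sizeof(addr->sun_path) - 1)
        return 0;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    addr->sun_path[0] = '\0';               /* abstract namespace marker */
    memcpy(addr->sun_path + 1, name, name_len);

    return offsetof(struct sockaddr_un, sun_path) + 1 + name_len;
}

/* Create a listening stream socket bound to an abstract name.
   Returns the descriptor, or -1 on error. */
int
abstract_listen(const char *name, size_t name_len)
{
    struct sockaddr_un addr;
    socklen_t len = make_abstract_addr(name, name_len, &addr);
    int fd;

    if (len == 0)
        return -1;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, len) == -1 ||
        listen(fd, 1) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

No unlink(2) is needed for cleanup: once the listening descriptor is closed, the name is released and a later connect(2) to it is expected to fail.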
Socket options
For historical reasons, these socket options are specified with a
SOL_SOCKET type even though they are AF_UNIX specific. They can
be set with setsockopt(2) and read with getsockopt(2) by
specifying SOL_SOCKET as the level argument.
SO_PASSCRED
Enabling this socket option causes receipt of the
credentials of the sending process in an SCM_CREDENTIALS
ancillary message in each subsequently received message.
The returned credentials are those specified by the sender
using SCM_CREDENTIALS, or a default that includes the
sender's PID, real user ID, and real group ID, if the
sender did not specify SCM_CREDENTIALS ancillary data.
When this option is set and the socket is not yet
connected, a unique name in the abstract namespace will be
generated automatically.
The value given as an argument to setsockopt(2) and
returned as the result of getsockopt(2) is an integer
boolean flag.
SO_PASSSEC
Enables receiving of the SELinux security label of the
peer socket in an ancillary message of type SCM_SECURITY
(see below).
The value given as an argument to setsockopt(2) and
returned as the result of getsockopt(2) is an integer
boolean flag.
The SO_PASSSEC option is supported for UNIX domain
datagram sockets since Linux 2.6.18; support for UNIX
domain stream sockets was added in Linux 4.2.
SO_PEEK_OFF
See socket(7).
SO_PEERCRED
This read-only socket option returns the credentials of
the peer process connected to this socket. The returned
credentials are those that were in effect at the time of
the call to connect(2) or socketpair(2).
The argument to getsockopt(2) is a pointer to a ucred
structure; define the _GNU_SOURCE feature test macro to
obtain the definition of that structure from
<sys/socket.h>.
The use of this option is possible only for connected
AF_UNIX stream sockets and for AF_UNIX stream and datagram
socket pairs created using socketpair(2).
SO_PEERSEC
This read-only socket option returns the security context
of the peer socket connected to this socket. By default,
this will be the same as the security context of the
process that created the peer socket unless overridden by
the policy or by a process with the required permissions.
The argument to getsockopt(2) is a pointer to a buffer of
the specified length in bytes into which the security
context string will be copied. If the buffer length is
less than the length of the security context string, then
getsockopt(2) returns -1, sets errno to ERANGE, and
returns the required length via optlen. The caller should
allocate at least NAME_MAX bytes for the buffer initially,
although this is not guaranteed to be sufficient.
Resizing the buffer to the returned length and retrying
may be necessary.
The security context string may include a terminating null
character in the returned length, but is not guaranteed to
do so: a security context "foo" might be represented as
either {'f','o','o'} of length 3 or {'f','o','o','\0'} of
length 4, which are considered to be interchangeable. The
string is printable, does not contain non-terminating null
characters, and is in an unspecified encoding (in
particular, it is not guaranteed to be ASCII or UTF-8).
The use of this option for sockets in the AF_UNIX address
family is supported since Linux 2.6.2 for connected stream
sockets, and since Linux 4.18 also for stream and datagram
socket pairs created using socketpair(2).
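As an illustration of SO_PEERCRED, the sketch below reads the peer credentials of a connected socket. The wrapper name get_peer_cred() is hypothetical; the getsockopt(2) call and struct ucred are as described above.

```c
#define _GNU_SOURCE             /* exposes struct ucred */
#include <sys/socket.h>
#include <sys/types.h>

/* Fetch the credentials recorded for the peer at the time of
   connect(2) or socketpair(2).  Returns 0 on success, or -1 on error
   (errno is set by getsockopt(2) in that case). */
int
get_peer_cred(int fd, struct ucred *cred)
{
    socklen_t len = sizeof(*cred);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, cred, &len) == -1)
        return -1;

    /* A short result would mean the structure was not fully filled. */
    return len == sizeof(*cred) ? 0 : -1;
}
```

A server would typically call this on the descriptor returned by accept(2) and compare cred.uid against the users it is willing to serve.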
Autobind feature
If a bind(2) call specifies addrlen as sizeof(sa_family_t), or
the SO_PASSCRED socket option was specified for a socket that was
not explicitly bound to an address, then the socket is autobound
to an abstract address. The address consists of a null byte
followed by 5 bytes in the character set [0-9a-f]. Thus, there
is a limit of 2^20 autobind addresses. (From Linux 2.1.15, when
the autobind feature was added, 8 bytes were used, and the limit
was thus 2^32 autobind addresses. The change to 5 bytes came in
Linux 2.3.15.)
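The autobind behavior can be observed by binding with addrlen set to sizeof(sa_family_t) and then reading the address back. This is a minimal sketch; autobind_socket() is a hypothetical helper name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a datagram socket, let the kernel autobind it, and copy the
   generated abstract address into *addr and its length into *addrlen.
   Returns the socket descriptor, or -1 on error. */
int
autobind_socket(struct sockaddr_un *addr, socklen_t *addrlen)
{
    struct sockaddr_un req;
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;

    memset(&req, 0, sizeof(req));
    req.sun_family = AF_UNIX;

    /* An addrlen of sizeof(sa_family_t) requests an autobind. */
    if (bind(fd, (struct sockaddr *)&req, sizeof(sa_family_t)) == -1) {
        close(fd);
        return -1;
    }

    memset(addr, 0, sizeof(*addr));
    *addrlen = sizeof(*addr);
    if (getsockname(fd, (struct sockaddr *)addr, addrlen) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On return, the length is expected to cover six bytes of sun_path: the leading null byte followed by the five characters from [0-9a-f].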
Sockets API
The following paragraphs describe domain-specific details and
unsupported features of the sockets API for UNIX domain sockets
on Linux.
UNIX domain sockets do not support the transmission of
out-of-band data (the MSG_OOB flag for send(2) and recv(2)).
The send(2) MSG_MORE flag is not supported by UNIX domain
sockets.
Before Linux 3.4, the use of MSG_TRUNC in the flags argument of
recv(2) was not supported by UNIX domain sockets.
The SO_SNDBUF socket option does have an effect for UNIX domain
sockets, but the SO_RCVBUF option does not. For datagram
sockets, the SO_SNDBUF value imposes an upper limit on the size
of outgoing datagrams. This limit is calculated as the doubled
(see socket(7)) option value less 32 bytes used for overhead.
Ancillary messages
Ancillary data is sent and received using sendmsg(2) and
recvmsg(2). For historical reasons, the ancillary message types
listed below are specified with a SOL_SOCKET type even though
they are AF_UNIX specific. To send them, set the cmsg_level
field of the struct cmsghdr to SOL_SOCKET and the cmsg_type field
to the type. For more information, see cmsg(3).
SCM_RIGHTS
Send or receive a set of open file descriptors from
another process. The data portion contains an integer
array of the file descriptors.
Commonly, this operation is referred to as "passing a file
descriptor" to another process. However, more accurately,
what is being passed is a reference to an open file
description (see open(2)), and in the receiving process it
is likely that a different file descriptor number will be
used. Semantically, this operation is equivalent to
duplicating (dup(2)) a file descriptor into the file
descriptor table of another process.
If the buffer used to receive the ancillary data
containing file descriptors is too small (or is absent),
then the ancillary data is truncated (or discarded) and
the excess file descriptors are automatically closed in
the receiving process.
If the number of file descriptors received in the
ancillary data would cause the process to exceed its
RLIMIT_NOFILE resource limit (see getrlimit(2)), the
excess file descriptors are automatically closed in the
receiving process.
The kernel constant SCM_MAX_FD defines a limit on the
number of file descriptors in the array. Attempting to
send an array larger than this limit causes sendmsg(2) to
fail with the error EINVAL. SCM_MAX_FD has the value 253
(or 255 in kernels before 2.6.38).
SCM_CREDENTIALS
Send or receive UNIX credentials. This can be used for
authentication. The credentials are passed as a struct
ucred ancillary message. This structure is defined in
<sys/socket.h> as follows:
struct ucred {
    pid_t pid;    /* Process ID of the sending process */
    uid_t uid;    /* User ID of the sending process */
    gid_t gid;    /* Group ID of the sending process */
};
Since glibc 2.8, the _GNU_SOURCE feature test macro must
be defined (before including any header files) in order to
obtain the definition of this structure.
The credentials which the sender specifies are checked by
the kernel. A privileged process is allowed to specify
values that do not match its own. The sender must specify
its own process ID (unless it has the capability
CAP_SYS_ADMIN, in which case the PID of any existing
process may be specified), its real user ID, effective
user ID, or saved set-user-ID (unless it has CAP_SETUID),
and its real group ID, effective group ID, or saved
set-group-ID (unless it has CAP_SETGID).
To receive a struct ucred message, the SO_PASSCRED option
must be enabled on the socket.
SCM_SECURITY
Receive the SELinux security context (the security label)
of the peer socket. The received ancillary data is a
null-terminated string containing the security context.
The receiver should allocate at least NAME_MAX bytes in
the data portion of the ancillary message for this data.
To receive the security context, the SO_PASSSEC option
must be enabled on the socket (see above).
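As an illustration of SCM_RIGHTS, the following sketch passes a single file descriptor together with one byte of real data. The helper names send_fd() and recv_fd() are hypothetical; the control buffer layout follows the pattern shown in cmsg(3).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized for one descriptor and suitably aligned. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send descriptor `fd` over `sock` with one byte of real data.
   Returns 0 on success, -1 on error. */
int
send_fd(int sock, int fd)
{
    char data = 'F';
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`.  Returns the new descriptor,
   or -1 if none arrived or the ancillary data was truncated. */
int
recv_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0 || (msg.msg_flags & MSG_CTRUNC))
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

The descriptor number returned by recv_fd() will generally differ from the one passed to send_fd(), but both refer to the same open file description.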
When sending ancillary data with sendmsg(2), only one item of
each of the above types may be included in the sent message.
At least one byte of real data should be sent when sending
ancillary data. On Linux, this is required to successfully send
ancillary data over a UNIX domain stream socket. When sending
ancillary data over a UNIX domain datagram socket, it is not
necessary on Linux to send any accompanying real data. However,
portable applications should also include at least one byte of
real data when sending ancillary data over a datagram socket.
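The receiving side of SCM_CREDENTIALS can be sketched as follows. The helper names enable_passcred() and recv_cred() are hypothetical. The sender in this example is assumed to send at least one byte of ordinary data without attaching explicit credentials, so the kernel supplies the default values described above.

```c
#define _GNU_SOURCE             /* exposes struct ucred */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Enable SO_PASSCRED on `sock`.  Returns 0 on success, -1 on error. */
int
enable_passcred(int sock)
{
    int on = 1;

    return setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));
}

/* Receive one byte of data plus the sender's credentials.
   Returns 0 and fills *cred on success; returns -1 if no
   SCM_CREDENTIALS message accompanied the data. */
int
recv_cred(int sock, struct ucred *cred)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(struct ucred))) {
            memcpy(cred, CMSG_DATA(cmsg), sizeof(*cred));
            return 0;
        }
    }
    return -1;
}
```

SO_PASSCRED should be enabled before the peer sends, since the option applies to subsequently received messages.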
When receiving from a stream socket, ancillary data forms a kind
of barrier for the received data. For example, suppose that the
sender transmits as follows:
1. sendmsg(2) of four bytes, with no ancillary data.
2. sendmsg(2) of one byte, with ancillary data.
3. sendmsg(2) of four bytes, with no ancillary data.
Suppose that the receiver now performs recvmsg(2) calls each with
a buffer size of 20 bytes. The first call will receive five
bytes of data, along with the ancillary data sent by the second
sendmsg(2) call. The next call will receive the remaining four
bytes of data.
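The barrier behavior in this example can be reproduced with a pair of small helpers. This is a sketch only: send_chunk() and recv_chunk() are hypothetical names, and SCM_RIGHTS is used as the ancillary data simply because it needs no socket option on the receiver.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for one descriptor and suitably aligned. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send `len` bytes; if fd >= 0, attach it as SCM_RIGHTS ancillary
   data.  Returns the result of sendmsg(2). */
ssize_t
send_chunk(int sock, const char *data, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
    struct msghdr msg;
    union fd_cmsg u;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (fd >= 0) {
        struct cmsghdr *cmsg;

        memset(&u, 0, sizeof(u));
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    }
    return sendmsg(sock, &msg, 0);
}

/* Receive with a 20-byte data buffer.  Sets *has_ancillary to 1 if
   any ancillary message arrived.  Returns the result of recvmsg(2). */
ssize_t
recv_chunk(int sock, int *has_ancillary)
{
    char data[20];
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;
    union fd_cmsg u;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    n = recvmsg(sock, &msg, 0);
    cmsg = (n >= 0) ? CMSG_FIRSTHDR(&msg) : NULL;
    *has_ancillary = (cmsg != NULL);

    /* Close a received descriptor so that this demo does not leak it. */
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        int fd;

        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        close(fd);
    }
    return n;
}
```

Sending 4 bytes, then 1 byte with a descriptor attached, then 4 bytes on a stream socket pair, and calling recv_chunk() twice, corresponds to the sequence described above.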
If the space allocated for receiving incoming ancillary data is
too small then the ancillary data is truncated to the number of
headers that will fit in the supplied buffer (or, in the case of
an SCM_RIGHTS file descriptor list, the list of file descriptors
may be truncated). If no buffer is provided for incoming
ancillary data (i.e., the msg_control field of the msghdr
structure supplied to recvmsg(2) is NULL), then the incoming
ancillary data is discarded. In both of these cases, the
MSG_CTRUNC flag will be set in the msg.msg_flags value returned
by recvmsg(2).
Ioctls
The following ioctl(2) calls return information in value. The
correct syntax is:
int value;
error = ioctl(unix_socket, ioctl_type, &value);
ioctl_type can be:
SIOCINQ
For SOCK_STREAM sockets, this call returns the number of
unread bytes in the receive buffer. The socket must not
be in LISTEN state, otherwise an error (EINVAL) is
returned. SIOCINQ is defined in <linux/sockios.h>.
Alternatively, you can use the synonymous FIONREAD,
defined in <sys/ioctl.h>. For SOCK_DGRAM sockets, the
returned value is the same as for Internet domain datagram
sockets; see udp(7).