seccomp_unotify(2)

Seccomp user-space notification mechanism


IOCTL OPERATIONS

The following ioctl(2) operations are supported by the seccomp user-space notification file descriptor. For each of these operations, the first (file descriptor) argument of ioctl(2) is the listening file descriptor returned by a call to seccomp(2) with the SECCOMP_FILTER_FLAG_NEW_LISTENER flag.

SECCOMP_IOCTL_NOTIF_RECV

The SECCOMP_IOCTL_NOTIF_RECV operation (available since Linux 5.0) is used to obtain a user-space notification event. If no such event is currently pending, the operation blocks until an event occurs. The third ioctl(2) argument is a pointer to a structure of the following form, which contains information about the event. This structure must be zeroed out before the call.

    struct seccomp_notif {
        __u64  id;                  /* Cookie */
        __u32  pid;                 /* TID of target thread */
        __u32  flags;               /* Currently unused (0) */
        struct seccomp_data data;   /* See seccomp(2) */
    };

The fields in this structure are as follows:

id This is a cookie for the notification. Each such cookie is guaranteed to be unique for the corresponding seccomp filter.

• The cookie can be used with the SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) operation described below.

• When returning a notification response to the kernel, the supervisor must include the cookie value in the seccomp_notif_resp structure that is specified as the argument of the SECCOMP_IOCTL_NOTIF_SEND operation.

pid This is the thread ID of the target thread that triggered the notification event.

flags This is a bit mask of flags providing further information on the event. In the current implementation, this field is always zero.

data This is a seccomp_data structure containing information about the system call that triggered the notification. This is the same structure that is passed to the seccomp filter. See seccomp(2) for details of this structure.

On success, this operation returns 0; on failure, -1 is returned, and errno is set to indicate the cause of the error. This operation can fail with the following errors:

EINVAL (since Linux 5.5) The seccomp_notif structure that was passed to the call contained nonzero fields.

ENOENT The target thread was killed by a signal as the notification information was being generated, or the target's (blocked) system call was interrupted by a signal handler.
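The receive step described above can be sketched as follows. This is a minimal illustration, not code from this page: the helper name recv_notification() and its return convention are hypothetical, and it assumes a fixed-size struct seccomp_notif, whereas seccomp(2) describes SECCOMP_GET_NOTIF_SIZES for sizing the buffer at run time. The EINTR retry reflects ordinary blocking-call behavior rather than anything specific to this operation.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Hypothetical helper: fetch one notification from the listening fd.
   Returns 0 on success, 1 if the event vanished (ENOENT) and the caller
   should simply wait for the next one, or -1 on any other error. */
int
recv_notification(int notify_fd, struct seccomp_notif *req)
{
    assert(req != NULL);

    for (;;) {
        /* The structure must be all zeros before every call
           (nonzero fields yield EINVAL since Linux 5.5). */
        memset(req, 0, sizeof(*req));

        if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_RECV, req) == 0)
            return 0;
        if (errno == EINTR)
            continue;   /* Supervisor interrupted by a signal; retry */
        if (errno == ENOENT)
            return 1;   /* Target killed, or its call was interrupted */
        return -1;      /* Any other error; errno is left set */
    }
}
```

A supervisor loop would typically treat a return value of 1 as "nothing to handle" and call the helper again.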

SECCOMP_IOCTL_NOTIF_ID_VALID

The SECCOMP_IOCTL_NOTIF_ID_VALID operation (available since Linux 5.0) is used to check that a notification ID returned by an earlier SECCOMP_IOCTL_NOTIF_RECV operation is still valid (i.e., that the target still exists and its system call is still blocked waiting for a response).

The third ioctl(2) argument is a pointer to the cookie (id) returned by the SECCOMP_IOCTL_NOTIF_RECV operation.

This operation is necessary to avoid race conditions that can occur when the target whose ID was returned in the pid field by the SECCOMP_IOCTL_NOTIF_RECV operation terminates, and that ID is then reused by another thread or process. An example of this kind of race is the following:

1. A notification is generated on the listening file descriptor. The returned seccomp_notif contains the TID of the target thread (in the pid field of the structure).

2. The target terminates.

3. Another thread or process is created on the system that by chance reuses the TID that was freed when the target terminated.

4. The supervisor open(2)s the /proc/[tid]/mem file for the TID obtained in step 1, with the intention of (say) inspecting the memory location(s) that contain the argument(s) of the system call that triggered the notification in step 1.

In the above scenario, the risk is that the supervisor may try to access the memory of a process other than the target. This race can be avoided by following the call to open(2) with a SECCOMP_IOCTL_NOTIF_ID_VALID operation to verify that the process that generated the notification is still alive. (Note that if the target terminates after the latter step, a subsequent read(2) from the file descriptor may return 0, indicating end of file.)

See NOTES for a discussion of other cases where SECCOMP_IOCTL_NOTIF_ID_VALID checks must be performed.

On success (i.e., the notification ID is still valid), this operation returns 0. On failure (i.e., the notification ID is no longer valid), -1 is returned, and errno is set to ENOENT.
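The open-then-validate pattern described above might look like the sketch below. The helper name open_target_mem() is hypothetical, and the code assumes that /proc is mounted and that the supervisor is permitted to open the target's mem file.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* For O_CLOEXEC under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/seccomp.h>

/* Hypothetical helper: open /proc/[tid]/mem for the target, then confirm
   that the notification is still live. Returns an open file descriptor,
   or -1 if the open failed or the notification is no longer valid. */
int
open_target_mem(int notify_fd, const struct seccomp_notif *req)
{
    char path[64];
    __u64 id;
    int mem_fd;

    assert(req != NULL);
    id = req->id;

    snprintf(path, sizeof(path), "/proc/%u/mem", (unsigned int) req->pid);
    mem_fd = open(path, O_RDONLY | O_CLOEXEC);
    if (mem_fd == -1)
        return -1;

    /* Check the cookie AFTER open(2): if it is still valid now, the
       target was alive when the file was opened, so the TID had not
       been recycled at that point. */
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ID_VALID, &id) == -1) {
        close(mem_fd);  /* The fd may refer to an unrelated process */
        return -1;
    }

    return mem_fd;
}
```

Even with a valid descriptor in hand, a later read(2) can still return 0 if the target terminates afterward, so callers should treat end of file as "target gone".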

SECCOMP_IOCTL_NOTIF_SEND

The SECCOMP_IOCTL_NOTIF_SEND operation (available since Linux 5.0) is used to send a notification response back to the kernel. The third ioctl(2) argument is a pointer to a structure of the following form:

    struct seccomp_notif_resp {
        __u64 id;       /* Cookie value */
        __s64 val;      /* Success return value */
        __s32 error;    /* 0 (success) or negative error number */
        __u32 flags;    /* See below */
    };

The fields of this structure are as follows:

id This is the cookie value that was obtained using the SECCOMP_IOCTL_NOTIF_RECV operation. This cookie value allows the kernel to correctly associate this response with the system call that triggered the user-space notification.

val This is the value that will be used for a spoofed success return for the target's system call; see below.

error This is the value that will be used as the error number (errno) for a spoofed error return for the target's system call; see below.

flags This is a bit mask that includes zero or more of the following flags:

SECCOMP_USER_NOTIF_FLAG_CONTINUE (since Linux 5.5) Tell the kernel to execute the target's system call.

Two kinds of response are possible:

• A response to the kernel telling it to execute the target's system call. In this case, the flags field includes SECCOMP_USER_NOTIF_FLAG_CONTINUE and the error and val fields must be zero.

This kind of response can be useful in cases where the supervisor needs to do deeper analysis of the target's system call than is possible from a seccomp filter (e.g., examining the values of pointer arguments), and, having decided that the system call does not require emulation by the supervisor, the supervisor wants the system call to be executed normally in the target.

The SECCOMP_USER_NOTIF_FLAG_CONTINUE flag should be used with caution; see NOTES.

• A spoofed return value for the target's system call. In this case, the kernel does not execute the target's system call, instead causing the system call to return a spoofed value as specified by fields of the seccomp_notif_resp structure. The supervisor should set the fields of this structure as follows:

+ flags does not contain SECCOMP_USER_NOTIF_FLAG_CONTINUE.

+ error is set either to 0 for a spoofed "success" return or to a negative error number for a spoofed "failure" return. In the former case, the kernel causes the target's system call to return the value specified in the val field. In the latter case, the kernel causes the target's system call to return -1, and errno is assigned the negated error value.

+ val is set to a value that will be used as the return value for a spoofed "success" return for the target's system call. The value in this field is ignored if the error field contains a nonzero value.

On success, this operation returns 0; on failure, -1 is returned, and errno is set to indicate the cause of the error. This operation can fail with the following errors:

EINPROGRESS A response to this notification has already been sent.

EINVAL An invalid value was specified in the flags field.

EINVAL The flags field contained SECCOMP_USER_NOTIF_FLAG_CONTINUE, and the error or val field was not zero.

ENOENT The blocked system call in the target has been interrupted by a signal handler or the target has terminated.
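The two kinds of response, and the field rules attached to each, can be summarized in a short sketch. The three constructor names below are hypothetical; they only fill in a seccomp_notif_resp structure, which the supervisor would then pass to ioctl(2) with SECCOMP_IOCTL_NOTIF_SEND.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/seccomp.h>

/* Kind 1: let the kernel execute the target's system call.
   error and val must stay zero, or the ioctl fails with EINVAL. */
struct seccomp_notif_resp
resp_continue(__u64 id)
{
    struct seccomp_notif_resp resp;

    memset(&resp, 0, sizeof(resp));
    resp.id = id;
    resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
    return resp;
}

/* Kind 2a: spoofed "success"; the target's call returns 'val'. */
struct seccomp_notif_resp
resp_success(__u64 id, __s64 val)
{
    struct seccomp_notif_resp resp;

    memset(&resp, 0, sizeof(resp));
    resp.id = id;
    resp.val = val;
    return resp;
}

/* Kind 2b: spoofed "failure"; 'err' is a positive errno value such as
   EPERM. It is stored negated; the target sees -1 with errno == err. */
struct seccomp_notif_resp
resp_failure(__u64 id, int err)
{
    struct seccomp_notif_resp resp;

    assert(err > 0);
    memset(&resp, 0, sizeof(resp));
    resp.id = id;
    resp.error = -err;  /* val is ignored when error is nonzero */
    return resp;
}
```

For example, a supervisor that wants the target's call to fail with EPERM could build the response with resp_failure(req->id, EPERM) and send it.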

SECCOMP_IOCTL_NOTIF_ADDFD

The SECCOMP_IOCTL_NOTIF_ADDFD operation (available since Linux 5.9) allows the supervisor to install a file descriptor into the target's file descriptor table. Much like the use of SCM_RIGHTS messages described in unix(7), this operation is semantically equivalent to duplicating a file descriptor from the supervisor's file descriptor table into the target's file descriptor table.

The SECCOMP_IOCTL_NOTIF_ADDFD operation permits the supervisor to emulate a target system call (such as socket(2) or openat(2)) that generates a file descriptor. The supervisor can perform the system call that generates the file descriptor (and associated open file description) and then use this operation to allocate a file descriptor that refers to the same open file description in the target. (For an explanation of open file descriptions, see open(2).)

Once this operation has been performed, the supervisor can close its copy of the file descriptor.

In the target, the received file descriptor is subject to the same Linux Security Module (LSM) checks as are applied to a file descriptor that is received in an SCM_RIGHTS ancillary message. If the file descriptor refers to a socket, it inherits the cgroup version 1 network controller settings (classid and netprioidx) of the target.

The third ioctl(2) argument is a pointer to a structure of the following form:

    struct seccomp_notif_addfd {
        __u64 id;           /* Cookie value */
        __u32 flags;        /* Flags */
        __u32 srcfd;        /* Local file descriptor number */
        __u32 newfd;        /* 0 or desired file descriptor
                               number in target */
        __u32 newfd_flags;  /* Flags to set on target file
                               descriptor */
    };

The fields in this structure are as follows:

id This field should be set to the notification ID (cookie value) that was obtained via SECCOMP_IOCTL_NOTIF_RECV.

flags This field is a bit mask of flags that modify the behavior of the operation. The following flags are supported:

SECCOMP_ADDFD_FLAG_SETFD When allocating the file descriptor in the target, use the file descriptor number specified in the newfd field.

SECCOMP_ADDFD_FLAG_SEND (since Linux 5.14) Perform the equivalent of SECCOMP_IOCTL_NOTIF_ADDFD plus SECCOMP_IOCTL_NOTIF_SEND as an atomic operation. On successful invocation, the target process's errno will be 0 and the return value will be the file descriptor number that was allocated in the target. If allocating the file descriptor in the target fails, the target's system call continues to be blocked until a successful response is sent.

srcfd This field should be set to the number of the file descriptor in the supervisor that is to be duplicated.

newfd This field determines which file descriptor number is allocated in the target. If the SECCOMP_ADDFD_FLAG_SETFD flag is set, then this field specifies which file descriptor number should be allocated. If this file descriptor number is already open in the target, it is atomically closed and reused. If the descriptor duplication fails due to an LSM check, or if srcfd is not a valid file descriptor, the file descriptor newfd will not be closed in the target process.

If the SECCOMP_ADDFD_FLAG_SETFD flag is not set, then this field must be 0, and the kernel allocates the lowest unused file descriptor number in the target.

newfd_flags This field is a bit mask specifying flags that should be set on the file descriptor that is received in the target process. Currently, only the following flag is implemented:

O_CLOEXEC Set the close-on-exec flag on the received file descriptor.
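The interplay between flags, newfd, and newfd_flags can be made concrete with a small sketch. The helper fill_addfd() and its convention of passing a negative target_fd to mean "let the kernel choose" are hypothetical, and the code assumes kernel headers from Linux 5.9 or later.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* For O_CLOEXEC under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <linux/seccomp.h>

/* Hypothetical helper. target_fd < 0 means "allocate the lowest unused
   descriptor number in the target"; otherwise that exact number is
   requested via SECCOMP_ADDFD_FLAG_SETFD. */
void
fill_addfd(struct seccomp_notif_addfd *addfd, __u64 id, int src_fd,
           int target_fd, int cloexec)
{
    assert(addfd != NULL && src_fd >= 0);

    memset(addfd, 0, sizeof(*addfd));
    addfd->id = id;                     /* Cookie from NOTIF_RECV */
    addfd->srcfd = (__u32) src_fd;      /* Supervisor's descriptor */

    if (target_fd >= 0) {
        addfd->flags |= SECCOMP_ADDFD_FLAG_SETFD;
        addfd->newfd = (__u32) target_fd;
    }
    /* Otherwise newfd stays 0; a nonzero newfd without
       SECCOMP_ADDFD_FLAG_SETFD makes the ioctl fail with EINVAL. */

    if (cloexec)
        addfd->newfd_flags = O_CLOEXEC;
}
```

Note that requesting descriptor 0 in the target is distinguishable from "no preference" only through the SECCOMP_ADDFD_FLAG_SETFD flag, which is why the sketch keys on the sign of target_fd rather than on zero.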

On success, this ioctl(2) call returns the number of the file descriptor that was allocated in the target. Assuming that the emulated system call is one that returns a file descriptor as its function result (e.g., socket(2)), this value can be used as the return value (resp.val) that is supplied in the response that is subsequently sent with the SECCOMP_IOCTL_NOTIF_SEND operation.

On error, -1 is returned and errno is set to indicate the cause of the error.

This operation can fail with the following errors:

EBADF Allocating the file descriptor in the target would cause the target's RLIMIT_NOFILE limit to be exceeded (see getrlimit(2)).

EBUSY If the flag SECCOMP_ADDFD_FLAG_SEND is used, this means the operation can't proceed until other SECCOMP_IOCTL_NOTIF_ADDFD requests are processed.

EINPROGRESS The user-space notification specified in the id field exists but has not yet been fetched (by a SECCOMP_IOCTL_NOTIF_RECV) or has already been responded to (by a SECCOMP_IOCTL_NOTIF_SEND).

EINVAL An invalid flag was specified in the flags or newfd_flags field, or the newfd field is nonzero and the SECCOMP_ADDFD_FLAG_SETFD flag was not specified in the flags field.

EMFILE The file descriptor number specified in newfd exceeds the limit specified in /proc/sys/fs/nr_open.

ENOENT The blocked system call in the target has been interrupted by a signal handler or the target has terminated.

Here is some sample code (with error handling omitted) that uses the SECCOMP_IOCTL_NOTIF_ADDFD operation (here, to emulate a call to openat(2)):

    int fd, targetFd;

    /* 'path' holds the pathname argument, obtained elsewhere (not shown) */
    fd = openat(req->data.args[0], path, req->data.args[2],
                req->data.args[3]);

    struct seccomp_notif_addfd addfd;
    addfd.id = req->id;     /* Cookie from SECCOMP_IOCTL_NOTIF_RECV */
    addfd.srcfd = fd;
    addfd.newfd = 0;
    addfd.flags = 0;
    addfd.newfd_flags = O_CLOEXEC;

    targetFd = ioctl(notifyFd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);

    close(fd);              /* No longer needed in supervisor */

    struct seccomp_notif_resp *resp;
    /* Code to allocate 'resp' omitted */
    resp->id = req->id;
    resp->error = 0;        /* "Success" */
    resp->val = targetFd;
    resp->flags = 0;
    ioctl(notifyFd, SECCOMP_IOCTL_NOTIF_SEND, resp);
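As a variation on the sample above, the SECCOMP_ADDFD_FLAG_SEND flag (since Linux 5.14) lets the supervisor install the descriptor and answer the notification in a single ioctl(2) call. The following is only a sketch: the helper name addfd_and_respond() is hypothetical, and the fallback definition of the flag is an assumption for builds against older headers, using the value from the kernel's UAPI header.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* For O_CLOEXEC under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_ADDFD_FLAG_SEND
#define SECCOMP_ADDFD_FLAG_SEND (1UL << 1)  /* Assumed: pre-5.14 headers */
#endif

/* Hypothetical helper: duplicate 'fd' into the target and complete the
   target's system call in one atomic step. Returns the descriptor
   number allocated in the target, or -1 with errno set. */
int
addfd_and_respond(int notify_fd, __u64 id, int fd)
{
    struct seccomp_notif_addfd addfd;

    assert(fd >= 0);

    memset(&addfd, 0, sizeof(addfd));
    addfd.id = id;                      /* Cookie from NOTIF_RECV */
    addfd.srcfd = (__u32) fd;
    addfd.flags = SECCOMP_ADDFD_FLAG_SEND;
    addfd.newfd_flags = O_CLOEXEC;

    /* On success the target's call returns the new descriptor number,
       so no separate SECCOMP_IOCTL_NOTIF_SEND is needed. On failure
       the target stays blocked and still needs a response. */
    return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
}
```

The caller keeps ownership of fd and can close it after the call, as in the two-step sample. If the helper returns -1, the supervisor would still need to send a response (for instance, a spoofed error) to unblock the target.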