seccomp_unotify(2)

Seccomp user-space notification mechanism


Description

This page describes the user-space notification mechanism
       provided by the Secure Computing (seccomp) facility.  As well as
       the use of the SECCOMP_FILTER_FLAG_NEW_LISTENER flag, the
       SECCOMP_RET_USER_NOTIF action value, and the
       SECCOMP_GET_NOTIF_SIZES operation described in seccomp(2), this
       mechanism involves the use of a number of related ioctl(2)
       operations (described below).

Overview

In conventional usage of a seccomp filter, the decision about how to treat a system call is made by the filter itself. By contrast, the user-space notification mechanism allows the seccomp filter to delegate the handling of the system call to another user-space process. Note that this mechanism is explicitly not intended as a method of implementing security policy; see NOTES.

In the discussion that follows, the thread(s) on which the seccomp filter is installed is (are) referred to as the target, and the process that is notified by the user-space notification mechanism is referred to as the supervisor.

A suitably privileged supervisor can use the user-space notification mechanism to perform actions on behalf of the target. The advantage of the user-space notification mechanism is that the supervisor will usually be able to retrieve information about the target and the performed system call that the seccomp filter itself cannot. (A seccomp filter is limited in the information it can obtain and the actions that it can perform because it is running on a virtual machine inside the kernel.)

An overview of the steps performed by the target and the supervisor is as follows:

1. The target establishes a seccomp filter in the usual manner, but with two differences:

• The seccomp(2) flags argument includes the flag SECCOMP_FILTER_FLAG_NEW_LISTENER. Consequently, the return value of the (successful) seccomp(2) call is a new "listening" file descriptor that can be used to receive notifications. Only one "listening" seccomp filter can be installed for a thread.

• In cases where it is appropriate, the seccomp filter returns the action value SECCOMP_RET_USER_NOTIF. This return value will trigger a notification event.
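The two differences in step 1 can be sketched as follows. This is a minimal illustration, not a complete program: the names notify_filter, sys_seccomp(), and install_notify_filter() are hypothetical, the choice of mkdirat(2) as the notified system call is arbitrary, and the check of seccomp_data.arch that a real filter should perform first is omitted for brevity.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Classic BPF program: notify the supervisor about mkdirat(2) and allow
   everything else.  Simplification: no seccomp_data.arch check. */
struct sock_filter notify_filter[] = {
    /* Load the system call number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* If it is mkdirat, fall through; otherwise skip one instruction. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_mkdirat, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* glibc has no seccomp(2) wrapper, so go through syscall(2). */
int sys_seccomp(unsigned int op, unsigned int flags, void *args)
{
    return (int) syscall(SYS_seccomp, op, flags, args);
}

/* Install the filter on the calling thread.  Returns the listening
   file descriptor on success, or -1 with errno set. */
int install_notify_filter(void)
{
    struct sock_fprog prog = {
        .len = sizeof(notify_filter) / sizeof(notify_filter[0]),
        .filter = notify_filter,
    };

    /* Required for an unprivileged caller to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;

    return sys_seccomp(SECCOMP_SET_MODE_FILTER,
                       SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}
```

Once install_notify_filter() has succeeded, any mkdirat(2) call made by the thread blocks until a supervisor responds, so the returned descriptor should be handed to the supervisor before the target starts its workload.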

2. In order that the supervisor can obtain notifications using the listening file descriptor, (a duplicate of) that file descriptor must be passed from the target to the supervisor. One way in which this could be done is by passing the file descriptor over a UNIX domain socket connection between the target and the supervisor (using the SCM_RIGHTS ancillary message type described in unix(7)). Another way to do this is through the use of pidfd_getfd(2).
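The UNIX domain socket approach in step 2 might look like the following sketch. The helper names send_fd() and recv_fd() and the union fd_cmsg are hypothetical; the sketch assumes a connected socket pair already exists between target and supervisor, and it sends one dummy data byte alongside the descriptor.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control-message buffer sized and aligned for exactly one descriptor. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Target side: send 'fd' over the connected UNIX domain socket 'sockfd'.
   Returns 0 on success, -1 on error. */
int send_fd(int sockfd, int fd)
{
    char dummy = 0;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sockfd, &msg, 0) == 1 ? 0 : -1;
}

/* Supervisor side: receive a descriptor sent by send_fd().
   Returns the new descriptor, or -1 on error. */
int recv_fd(int sockfd)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_cmsg ctl;
    struct msghdr msg;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sockfd, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The descriptor received by the supervisor is a duplicate that refers to the same listener, so the target can close its own copy after sending.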

3. The supervisor will receive notification events on the listening file descriptor. These events are returned as structures of type seccomp_notif. Because this structure and its size may evolve over kernel versions, the supervisor must first determine the size of this structure using the seccomp(2) SECCOMP_GET_NOTIF_SIZES operation, which returns a structure of type seccomp_notif_sizes. The supervisor allocates a buffer of size seccomp_notif_sizes.seccomp_notif bytes to receive notification events. In addition, the supervisor allocates another buffer of size seccomp_notif_sizes.seccomp_notif_resp bytes for the response (a struct seccomp_notif_resp structure) that it will provide to the kernel (and thus the target).
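The size query and buffer allocation in step 3 can be sketched as below. The struct notif_buffers type and the alloc_notif_buffers() helper are hypothetical names. As a defensive assumption of this sketch, each buffer is made at least as large as the structure definition the program was compiled against, in case the running kernel reports a smaller size.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical container for the sizes and the two buffers. */
struct notif_buffers {
    struct seccomp_notif_sizes sizes;
    struct seccomp_notif *req;
    struct seccomp_notif_resp *resp;
};

static size_t max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Ask the kernel for the structure sizes, then allocate zeroed buffers.
   Returns 0 on success; on failure returns -1 and leaves both buffer
   pointers NULL. */
int alloc_notif_buffers(struct notif_buffers *nb)
{
    nb->req = NULL;
    nb->resp = NULL;

    if (syscall(SYS_seccomp, SECCOMP_GET_NOTIF_SIZES, 0, &nb->sizes) == -1)
        return -1;

    nb->req = calloc(1, max_size(nb->sizes.seccomp_notif,
                                 sizeof(struct seccomp_notif)));
    nb->resp = calloc(1, max_size(nb->sizes.seccomp_notif_resp,
                                  sizeof(struct seccomp_notif_resp)));
    if (nb->req == NULL || nb->resp == NULL) {
        free(nb->req);
        free(nb->resp);
        nb->req = NULL;
        nb->resp = NULL;
        return -1;
    }
    return 0;
}
```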

4. The target then performs its workload, which includes system calls that will be controlled by the seccomp filter. Whenever one of these system calls causes the filter to return the SECCOMP_RET_USER_NOTIF action value, the kernel does not (yet) execute the system call; instead, execution of the target is temporarily blocked inside the kernel (in a sleep state that is interruptible by signals) and a notification event is generated on the listening file descriptor.

5. The supervisor can now repeatedly monitor the listening file descriptor for SECCOMP_RET_USER_NOTIF-triggered events. To do this, the supervisor uses the SECCOMP_IOCTL_NOTIF_RECV ioctl(2) operation to read information about a notification event; this operation blocks until an event is available. The operation returns a seccomp_notif structure containing information about the system call that is being attempted by the target. (As described in NOTES, the file descriptor can also be monitored with select(2), poll(2), or epoll(7).)

6. The seccomp_notif structure returned by the SECCOMP_IOCTL_NOTIF_RECV operation includes the same information (a seccomp_data structure) that was passed to the seccomp filter. This information allows the supervisor to discover the system call number and the arguments for the target's system call. In addition, the notification event contains the ID of the thread that triggered the notification and a unique cookie value that is used in subsequent SECCOMP_IOCTL_NOTIF_ID_VALID and SECCOMP_IOCTL_NOTIF_SEND operations.

The information in the notification can be used to discover the values of pointer arguments for the target's system call. (This is something that can't be done from within a seccomp filter.) One way in which the supervisor can do this is to open the corresponding /proc/[tid]/mem file (see proc(5)) and read bytes from the location that corresponds to one of the pointer arguments whose value is supplied in the notification event. (The supervisor must be careful to avoid a race condition that can occur when doing this; see the description of the SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) operation below.) In addition, the supervisor can access other system information that is visible in user space but which is not accessible from a seccomp filter.
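Reading a pointer argument through /proc/[tid]/mem, as described in step 6, might be sketched as follows. The helper names open_target_mem(), notif_id_valid(), and read_target_arg() are hypothetical. The sketch opens the file first and only then checks the cookie with SECCOMP_IOCTL_NOTIF_ID_VALID, on the assumption that this ordering is what guards against the target having terminated and its ID having been reused.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/seccomp.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /proc/<tid>/mem read-only.  Returns a descriptor or -1. */
int open_target_mem(pid_t tid)
{
    char path[64];

    snprintf(path, sizeof(path), "/proc/%d/mem", (int) tid);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Return 1 if the notification cookie 'id' is still valid, else 0. */
int notif_id_valid(int notify_fd, __u64 id)
{
    return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ID_VALID, &id) == 0;
}

/* Copy up to 'len' bytes from the address held in argument 'argno' of
   the target's system call.  Returns the number of bytes read, or -1. */
ssize_t read_target_arg(int notify_fd, const struct seccomp_notif *req,
                        unsigned int argno, void *buf, size_t len)
{
    ssize_t n = -1;
    int mem_fd = open_target_mem((pid_t) req->pid);

    if (mem_fd == -1)
        return -1;

    /* Check the cookie after opening the file, so that the descriptor
       is known to refer to the thread that triggered the notification. */
    if (notif_id_valid(notify_fd, req->id))
        n = pread(mem_fd, buf, len, (off_t) req->data.args[argno]);

    close(mem_fd);
    return n;
}
```

For a pathname argument, the supervisor would typically read up to PATH_MAX bytes and then verify that a terminating null byte is present in what was read.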

7. Having obtained information as per the previous step, the supervisor may then choose to perform an action in response to the target's system call (which, as noted above, is not executed when the seccomp filter returns the SECCOMP_RET_USER_NOTIF action value).

One example use case here relates to containers. The target may be located inside a container where it does not have sufficient capabilities to mount a filesystem in the container's mount namespace. However, the supervisor may be a more privileged process that does have sufficient capabilities to perform the mount operation.

8. The supervisor then sends a response to the notification. The information in this response is used by the kernel to construct a return value for the target's system call and provide a value that will be assigned to the errno variable of the target.

The response is sent using the SECCOMP_IOCTL_NOTIF_SEND ioctl(2) operation, which is used to transmit a seccomp_notif_resp structure to the kernel. This structure must include the cookie value that the supervisor obtained in the seccomp_notif structure returned by the SECCOMP_IOCTL_NOTIF_RECV operation; the cookie allows the kernel to associate the response with the target.

9. Once the response has been sent, the system call in the target thread unblocks, returning the information that was provided by the supervisor in the notification response.

As a variation on the last two steps, the supervisor can send a response that tells the kernel that it should execute the target thread's system call; see the discussion of SECCOMP_USER_NOTIF_FLAG_CONTINUE, below.
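The receive and respond cycle of steps 5, 8, and 9, together with the variation just mentioned, can be sketched roughly as follows. The function names fill_error_response(), fill_continue_response(), and deny_next_notification() are hypothetical, and the policy shown (fail every notified call with EPERM) is chosen only to keep the example short. The sketch assumes that a spoofed error is passed to the kernel as a negated errno value, and that val and error are left at zero when SECCOMP_USER_NOTIF_FLAG_CONTINUE is set.

```c
#include <errno.h>
#include <linux/seccomp.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build a response that makes the target's system call fail with 'err'
   (a positive errno value such as EPERM). */
void fill_error_response(const struct seccomp_notif *req,
                         struct seccomp_notif_resp *resp, int err)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;        /* cookie ties the response to the event */
    resp->error = -err;        /* negated errno value */
}

/* Build a response that tells the kernel to execute the system call. */
void fill_continue_response(const struct seccomp_notif *req,
                            struct seccomp_notif_resp *resp)
{
    memset(resp, 0, sizeof(*resp));   /* val and error stay 0 */
    resp->id = req->id;
    resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
}

/* Wait for one notification and fail the target's call with EPERM.
   'req_size' is the allocated size of the 'req' buffer.
   Returns 0 on success, -1 with errno set on error. */
int deny_next_notification(int notify_fd, struct seccomp_notif *req,
                           size_t req_size, struct seccomp_notif_resp *resp)
{
    /* The receive buffer must be zeroed before each call. */
    memset(req, 0, req_size);

    /* Blocks until the target triggers SECCOMP_RET_USER_NOTIF. */
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_RECV, req) == -1)
        return -1;

    fill_error_response(req, resp, EPERM);

    /* Unblocks the target; fails if the target has gone away. */
    if (ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_SEND, resp) == -1)
        return -1;

    return 0;
}
```

A supervisor would normally call a function like deny_next_notification() in a loop, inspecting req->data.nr and req->data.args between the receive and send operations to decide which kind of response to build.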