`mount_namespaces` ( 7 )

overview of Linux mount namespaces
Note

       The propagation type assigned to a new mount depends on the
       propagation type of the parent mount.  If the mount has a parent
       (i.e., it is a non-root mount point) and the propagation type of
       the parent is MS_SHARED, then the propagation type of the new
       mount is also MS_SHARED.  Otherwise, the propagation type of the
       new mount is MS_PRIVATE.
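
       The rule above can be written out as a small shell function.
       This is only an illustrative sketch: the function name
       new_mount_propagation and the lowercase type names it accepts
       are assumptions made for this example, not part of any tool.

```shell
# Print the propagation type that a new mount would receive.
# $1: propagation type of the parent mount ("shared", "slave",
#     "private", or "unbindable"); empty or omitted if the new
#     mount has no parent.
new_mount_propagation() {
    if [ "${1:-}" = "shared" ]; then
        # Parent exists and is MS_SHARED: the new mount is MS_SHARED too.
        echo shared
    else
        # No parent, or any other parent type: the new mount is MS_PRIVATE.
        echo private
    fi
}
```

       For example, "new_mount_propagation slave" prints "private".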

       Notwithstanding the fact that the default propagation type for
       a new mount is in many cases MS_PRIVATE, MS_SHARED is typically
       more useful.  For this reason, systemd(1) automatically remounts
       all mounts as MS_SHARED on system startup.  Thus, on most modern
       systems, the default propagation type is in practice MS_SHARED.
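
       One way to see which propagation type each mount currently has
       is to read the optional fields of /proc/self/mountinfo (the
       tags such as shared:N and master:N that appear in the listings
       later in this section).  The following is a minimal sketch; the
       function name propagation_types is hypothetical, and the sketch
       assumes that a mount with no optional-field tags is private.

```shell
# Print "<mount point> <propagation tags>" for each mountinfo line.
# Reads the files given as arguments, or standard input if none.
propagation_types() {
    awk '{
        prop = ""
        # Optional fields start at field 7 and end at the "-" separator
        # (or at end of line if the separator has been trimmed away).
        for (i = 7; i <= NF && $i != "-"; i++)
            prop = prop (prop == "" ? "" : ",") $i
        # Assumption: no tags at all means the mount is private.
        if (prop == "")
            prop = "private"
        print $5, prop
    }' "$@"
}
```

       A typical invocation would be
       "propagation_types /proc/self/mountinfo".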

       Since, when one uses unshare(1) to create a mount namespace, the
       goal is commonly to provide full isolation of the mounts in the
       new namespace, unshare(1) (since util-linux version 2.27) in turn
       reverses the step performed by systemd(1), by making all mounts
       private in the new namespace.  That is, unshare(1) performs the
       equivalent of the following in the new mount namespace:

           mount --make-rprivate /

       To prevent this, one can use the --propagation unchanged option
       to unshare(1).

       An application that creates a new mount namespace directly using
       clone(2) or unshare(2) may desire to prevent propagation of mount
       events to other mount namespaces (as is done by unshare(1)).
       This can be done by changing the propagation type of mounts in
       the new namespace to either MS_SLAVE or MS_PRIVATE, using a call
       such as the following:

           mount(NULL, "/", MS_SLAVE | MS_REC, NULL);

       For a discussion of propagation types when moving mounts
       (MS_MOVE) and creating bind mounts (MS_BIND), see
       Documentation/filesystems/sharedsubtree.txt in the Linux kernel
       source tree.

   Restrictions on mount namespaces
       Note the following points with respect to mount namespaces:

       [1] Each mount namespace has an owner user namespace.  As
           explained above, when a new mount namespace is created, its
           mount list is initialized as a copy of the mount list of
           another mount namespace.  If the new namespace and the
           namespace from which the mount list was copied are owned by
           different user namespaces, then the new mount namespace is
           considered less privileged.

       [2] When creating a less privileged mount namespace, shared
           mounts are reduced to slave mounts.  This ensures that mount
           and unmount events in less privileged mount namespaces will
           not propagate to more privileged mount namespaces.

       [3] Mounts that come as a single unit from a more privileged
           mount namespace are locked together and may not be separated
           in a less privileged mount namespace.  (The unshare(2)
           CLONE_NEWNS operation brings across all of the mounts from
           the original mount namespace as a single unit, and recursive
           mounts that propagate between mount namespaces propagate as a
           single unit.)

           In this context, "may not be separated" means that the mounts
           are locked so that they may not be individually unmounted.
           Consider the following example:

               $ sudo sh
               # mount --bind /dev/null /etc/shadow
               # cat /etc/shadow       # Produces no output

           The above steps, performed in a more privileged mount
           namespace, have created a bind mount that obscures the
           contents of the shadow password file, /etc/shadow.  For
           security reasons, it should not be possible to unmount that
           mount in a less privileged mount namespace, since that would
           reveal the contents of /etc/shadow.

           Suppose we now create a new mount namespace owned by a new
           user namespace.  The new mount namespace will inherit copies
           of all of the mounts from the previous mount namespace.
           However, those mounts will be locked because the new mount
           namespace is less privileged.  Consequently, an attempt to
           unmount the mount fails as shown in the following step:

               # unshare --user --map-root-user --mount \
                              strace -o /tmp/log \
                              umount /etc/shadow
               umount: /etc/shadow: not mounted.
               # grep '^umount' /tmp/log
               umount2("/etc/shadow", 0)     = -1 EINVAL (Invalid argument)

           The error message from umount(8) is a little confusing, but
           the strace(1) output reveals that the underlying umount2(2)
           system call failed with the error EINVAL, which is the error
           that the kernel returns to indicate that the mount is locked.

           Note, however, that it is possible to stack (and unstack) a
           mount on top of one of the inherited locked mounts in a less
           privileged mount namespace:

               # echo 'aaaaa' > /tmp/a    # File to mount onto /etc/shadow
               # unshare --user --map-root-user --mount \
                   sh -c 'mount --bind /tmp/a /etc/shadow; cat /etc/shadow'
               aaaaa
               # umount /etc/shadow

           The final umount(8) command above, which is performed in the
           initial mount namespace, makes the original /etc/shadow file
           once more visible in that namespace.

       [4] Following on from point [3], note that it is possible to
           unmount an entire subtree of mounts that propagated as a unit
           into a less privileged mount namespace, as illustrated in the
           following example.

           First, we create new user and mount namespaces using
           unshare(1).  In the new mount namespace, the propagation type
           of all mounts is set to private.  We then create a shared
           bind mount at /mnt, and a small hierarchy of mounts
           underneath that mount.

               $ PS1='ns1# ' sudo unshare --user --map-root-user \
                                      --mount --propagation private bash
               ns1# echo $$        # We need the PID of this shell later
               778501
               ns1# mount --make-shared --bind /mnt /mnt
               ns1# mkdir /mnt/x
               ns1# mount --make-private -t tmpfs none /mnt/x
               ns1# mkdir /mnt/x/y
               ns1# mount --make-private -t tmpfs none /mnt/x/y
               ns1# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               986 83 8:5 /mnt /mnt rw,relatime shared:344
               989 986 0:56 / /mnt/x rw,relatime
               990 989 0:57 / /mnt/x/y rw,relatime

           Continuing in the same shell session, we then create a second
           shell in a new user namespace and a new (less privileged)
           mount namespace and check the state of the propagated mounts
           rooted at /mnt.

               ns1# PS1='ns2# ' unshare --user --map-root-user \
                                      --mount --propagation unchanged bash
               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime

           Of note in the above output is that the propagation type of
           the mount /mnt has been reduced to slave, as explained in
           point [2].  This means that submount events will propagate
           from the master /mnt in "ns1", but propagation will not occur
           in the opposite direction.

           From a separate terminal window, we then use nsenter(1) to
           enter the mount and user namespaces corresponding to "ns1".
           In that terminal window, we then recursively bind mount
           /mnt/x at the location /mnt/ppp.

               $ PS1='ns3# ' sudo nsenter -t 778501 --user --mount
               ns3# mount --rbind --make-private /mnt/x /mnt/ppp
               ns3# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               986 83 8:5 /mnt /mnt rw,relatime shared:344
               989 986 0:56 / /mnt/x rw,relatime
               990 989 0:57 / /mnt/x/y rw,relatime
               1242 986 0:56 / /mnt/ppp rw,relatime
               1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518

           Because the propagation type of the parent mount, /mnt, was
           shared, the recursive bind mount propagated a small subtree
           of mounts under the slave mount /mnt into "ns2", as can be
           verified by executing the following command in that shell
           session:

               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime
               1244 1239 0:56 / /mnt/ppp rw,relatime
               1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518

           While it is not possible to unmount a part of the propagated
           subtree (/mnt/ppp/y) in "ns2", it is possible to unmount the
           entire subtree, as shown by the following commands:

               ns2# umount /mnt/ppp/y
               umount: /mnt/ppp/y: not mounted.
               ns2# umount -l /mnt/ppp      # Succeeds...
               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime
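
           The set of mounts that make up such a subtree can be worked
           out from the first two fields of each mountinfo line (the
           mount ID and the parent mount ID).  The following is a
           rough sketch, not a definitive tool: the helper name
           mount_subtree is hypothetical, and if several mounts are
           stacked at the given mount point, all of them are treated
           as tops of the subtree.

```shell
# List the mount points in the subtree rooted at mount point $1.
# Reads mountinfo-formatted lines on standard input.
mount_subtree() {
    # Pass the path through the environment so that awk does not
    # interpret backslash escapes such as \040 in it.
    TOP="$1" awk '
        { id[NR] = $1; parent[NR] = $2; point[NR] = $5 }
        END {
            top = ENVIRON["TOP"]
            # Start with every mount located exactly at the top path.
            for (n = 1; n <= NR; n++)
                if (point[n] == top)
                    member[id[n]] = 1
            # Repeatedly add mounts whose parent is already a member,
            # until a full pass adds nothing new.
            grew = 1
            while (grew) {
                grew = 0
                for (n = 1; n <= NR; n++)
                    if ((parent[n] in member) && !(id[n] in member)) {
                        member[id[n]] = 1
                        grew = 1
                    }
            }
            # Print the members in their original order.
            for (n = 1; n <= NR; n++)
                if (id[n] in member)
                    print point[n]
        }'
}
```

           Run in "ns2" before the unmount as
           "mount_subtree /mnt/ppp < /proc/self/mountinfo", this would
           list /mnt/ppp and /mnt/ppp/y, the two mounts that were
           propagated together.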

       [5] The settings of the mount(2) flags MS_RDONLY, MS_NOSUID,
           MS_NOEXEC, and the "atime" flags (MS_NOATIME, MS_NODIRATIME,
           MS_RELATIME) become locked when propagated from a more
           privileged to a less privileged mount namespace, and may not
           be changed in the less privileged mount namespace.

           This point is illustrated in the following example where, in
           a more privileged mount namespace, we create a bind mount
           that is marked as read-only.  For security reasons, it should
           not be possible to make the mount writable in a less
           privileged mount namespace, and indeed the kernel prevents
           this:

               $ sudo mkdir /mnt/dir
               $ sudo mount --bind -o ro /some/path /mnt/dir
               $ sudo unshare --user --map-root-user --mount \
                              mount -o remount,rw /mnt/dir
               mount: /mnt/dir: permission denied.
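
           Which of the flags named in this point are set on a given
           mount can be read from its per-mount options (the sixth
           field of /proc/self/mountinfo), where MS_RDONLY appears as
           "ro", MS_NOSUID as "nosuid", and so on.  The following
           sketch filters an option string down to those options; the
           helper name lockable_options is hypothetical, and the
           option list simply mirrors the flags named above.

```shell
# Print, comma-separated, the options in $1 that correspond to the
# flags named in point [5].
# $1: comma-separated per-mount options, e.g. "ro,nosuid,relatime".
lockable_options() {
    OPTS="$1" awk 'BEGIN {
        n = split(ENVIRON["OPTS"], o, ",")
        out = ""
        for (i = 1; i <= n; i++)
            # Keep only ro, nosuid, noexec, and the "atime" options.
            if (o[i] ~ /^(ro|nosuid|noexec|noatime|nodiratime|relatime)$/)
                out = out (out == "" ? "" : ",") o[i]
        print out
    }'
}
```

           For example, "lockable_options rw,relatime" prints
           "relatime", since "rw" is not one of the listed options.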

       [6] A file or directory that is a mount point in one namespace
           but not in another may be renamed, unlinked, or removed
           (rmdir(2)) in the mount namespace in which it is not a mount
           point (subject to the usual permission checks).
           Consequently, the mount point is removed in the mount
           namespace where it was a mount point.

           Previously (before Linux 3.18), attempting to unlink, rename,
           or remove a file or directory that was a mount point in
           another mount namespace would result in the error EBUSY.
           That behavior had technical problems of enforcement (e.g.,
           for NFS) and permitted denial-of-service attacks against more
           privileged users (i.e., preventing individual files from
           being updated by bind mounting on top of them).
Source text at man7.org