Each directory below /dev/cpuset represents a cpuset and contains
a fixed set of pseudo-files describing the state of that cpuset.
New cpusets are created using the mkdir(2) system call or the
mkdir(1) command. The properties of a cpuset, such as its flags,
allowed CPUs and memory nodes, and attached processes, are
queried and modified by reading or writing to the appropriate
file in that cpuset's directory, as listed below.
The pseudo-files in each cpuset directory are automatically
created when the cpuset is created, as a result of the mkdir(2)
invocation. It is not possible to directly add or remove these
pseudo-files.
A cpuset directory that contains no child cpuset directories, and
has no attached processes, can be removed using rmdir(2) or
rmdir(1). It is not necessary, or possible, to remove the
pseudo-files inside the directory before removing it.
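As a minimal sketch of the directory operations described above, the following Python fragment creates and removes a cpuset with os.mkdir() and os.rmdir(), which wrap mkdir(2) and rmdir(2). The helper names and the root parameter are illustrative assumptions and not part of any cpuset interface. The default root assumes that the cpuset filesystem is mounted at /dev/cpuset.

```python
import os

def create_cpuset(name, root="/dev/cpuset"):
    """Create a child cpuset directory under root and return its path."""
    path = os.path.join(root, name)
    # On a mounted cpuset filesystem, the kernel creates the pseudo-files
    # as a side effect of this call; nothing else has to be created.
    os.mkdir(path)
    return path

def remove_cpuset(path):
    """Remove a cpuset directory without touching its pseudo-files."""
    # Raises OSError if the directory cannot be removed, for example
    # because child cpusets or attached processes remain.
    os.rmdir(path)
```

Both helpers let OSError propagate, so the caller sees the errno value reported by the underlying system call.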
The pseudo-files in each cpuset directory are small text files
that may be read and written using traditional shell utilities
such as cat(1) and echo(1), or from a program by using file I/O
library functions or system calls, such as open(2), read(2),
write(2), and close(2).
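The file I/O approach can be sketched in Python as follows. The function names are hypothetical helpers, and the cpuset directory is passed in by the caller. Each write is issued as one short string followed by a newline, much as echo(1) would do.

```python
import os

def read_cpuset_file(cpuset_dir, name):
    """Return the contents of one pseudo-file, stripped of whitespace."""
    with open(os.path.join(cpuset_dir, name)) as f:
        return f.read().strip()

def write_cpuset_file(cpuset_dir, name, value):
    """Write value, followed by a newline, to one pseudo-file."""
    # The data is flushed when the file is closed, so an error reported
    # by the kernel surfaces as OSError on leaving the with block.
    with open(os.path.join(cpuset_dir, name), "w") as f:
        f.write(f"{value}\n")
```

For example, assuming a cpuset named demo already exists, write_cpuset_file("/dev/cpuset/demo", "notify_on_release", 1) sets that flag, and read_cpuset_file("/dev/cpuset/demo", "notify_on_release") reads it back.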
The pseudo-files in a cpuset directory represent internal kernel
state and do not have any persistent image on disk. Each of
these per-cpuset files is listed and described below.
tasks List of the process IDs (PIDs) of the processes in that
cpuset. The list is formatted as a series of ASCII
decimal numbers, each followed by a newline. A process
may be added to a cpuset (automatically removing it from
the cpuset that previously contained it) by writing its
PID to that cpuset's tasks file (with or without a
trailing newline).
Warning: only one PID may be written to the tasks file at
a time. If a string is written that contains more than
one PID, only the first one will be used.
notify_on_release
Flag (0 or 1). If set (1), that cpuset will receive
special handling after it is released, that is, after all
processes cease using it (i.e., terminate or are moved to
a different cpuset) and all child cpuset directories have
been removed. See the Notify On Release section, below.
cpuset.cpus
List of the physical numbers of the CPUs on which
processes in that cpuset are allowed to execute. See List
Format below for a description of the format of cpus.
The CPUs allowed to a cpuset may be changed by writing a
new list to its cpus file.
cpuset.cpu_exclusive
Flag (0 or 1). If set (1), the cpuset has exclusive use
of its CPUs (no sibling or cousin cpuset may overlap
CPUs). By default, this is off (0). Newly created
cpusets also initially default this to off (0).
Two cpusets are sibling cpusets if they share the same
parent cpuset in the /dev/cpuset hierarchy. Two cpusets
are cousin cpusets if neither is the ancestor of the
other. Regardless of the cpu_exclusive setting, if one
cpuset is the ancestor of another, and if both of these
cpusets have nonempty cpus, then their cpus must overlap,
because the cpus of any cpuset are always a subset of the
cpus of its parent cpuset.
cpuset.mems
List of memory nodes on which processes in this cpuset are
allowed to allocate memory. See List Format below for a
description of the format of mems.
cpuset.mem_exclusive
Flag (0 or 1). If set (1), the cpuset has exclusive use
of its memory nodes (no sibling or cousin may overlap).
Also, if set (1), the cpuset is a Hardwall cpuset (see
below). By default, this is off (0). Newly created
cpusets also initially default this to off (0).
Regardless of the mem_exclusive setting, if one cpuset is
the ancestor of another, then their memory nodes must
overlap, because the memory nodes of any cpuset are always
a subset of the memory nodes of that cpuset's parent
cpuset.
cpuset.mem_hardwall (since Linux 2.6.26)
Flag (0 or 1). If set (1), the cpuset is a Hardwall
cpuset (see below). Unlike mem_exclusive, there is no
constraint on whether cpusets marked mem_hardwall may have
overlapping memory nodes with sibling or cousin cpusets.
By default, this is off (0). Newly created cpusets also
initially default this to off (0).
cpuset.memory_migrate (since Linux 2.6.16)
Flag (0 or 1). If set (1), then memory migration is
enabled. By default, this is off (0). See the Memory
Migration section, below.
cpuset.memory_pressure (since Linux 2.6.16)
A measure of how much memory pressure the processes in
this cpuset are causing. See the Memory Pressure section,
below. Unless memory_pressure_enabled is enabled, this
file always has the value zero (0). This file is
read-only. See the WARNINGS section, below.
cpuset.memory_pressure_enabled (since Linux 2.6.16)
Flag (0 or 1). This file is present only in the root
cpuset, normally /dev/cpuset. If set (1), the
memory_pressure calculations are enabled for all cpusets
in the system. By default, this is off (0). See the
Memory Pressure section, below.
cpuset.memory_spread_page (since Linux 2.6.17)
Flag (0 or 1). If set (1), pages in the kernel page cache
(filesystem buffers) are uniformly spread across the
cpuset. By default, this is off (0) in the top cpuset,
and inherited from the parent cpuset in newly created
cpusets. See the Memory Spread section, below.
cpuset.memory_spread_slab (since Linux 2.6.17)
Flag (0 or 1). If set (1), the kernel slab caches for
file I/O (directory and inode structures) are uniformly
spread across the cpuset. By default, this is off (0) in
the top cpuset, and inherited from the parent cpuset in
newly created cpusets. See the Memory Spread section, below.
cpuset.sched_load_balance (since Linux 2.6.24)
Flag (0 or 1). If set (1, the default), the kernel will
automatically load balance processes in that cpuset over
the allowed CPUs in that cpuset. If cleared (0), the
kernel will avoid load balancing processes in this cpuset,
unless some other cpuset with overlapping CPUs has its
sched_load_balance flag set. See Scheduler Load
Balancing, below, for further details.
cpuset.sched_relax_domain_level (since Linux 2.6.26)
Integer, between -1 and a small positive value. The
sched_relax_domain_level controls the width of the range
of CPUs over which the kernel scheduler performs immediate
rebalancing of runnable tasks across CPUs. If
sched_load_balance is disabled, then the setting of
sched_relax_domain_level does not matter, as no such load
balancing is done. If sched_load_balance is enabled, then
the higher the value of the sched_relax_domain_level, the
wider the range of CPUs over which immediate load
balancing is attempted. See Scheduler Relax Domain Level,
below, for further details.
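The one-PID-per-write rule for the tasks file, described above, can be illustrated with a short Python sketch. The helper names list_tasks and attach_tasks are hypothetical. The sketch assumes only what the tasks entry states: the file holds decimal PIDs, one per line, and each write may carry a single PID.

```python
import os

def list_tasks(cpuset_dir):
    """Return the PIDs listed in the cpuset's tasks file as integers."""
    with open(os.path.join(cpuset_dir, "tasks")) as f:
        return [int(field) for field in f.read().split()]

def attach_tasks(cpuset_dir, pids):
    """Attach each PID to the cpuset, using a separate write per PID."""
    tasks_path = os.path.join(cpuset_dir, "tasks")
    for pid in pids:
        # Opening and closing the file for every PID ensures that no
        # single write ever contains more than one PID.
        with open(tasks_path, "w") as f:
            f.write(f"{pid}\n")
```

Writing all of the PIDs in one string would appear to succeed, but only the first PID would be moved, which is why the loop does not batch them.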
In addition to the above pseudo-files in each directory below
/dev/cpuset, each process has a pseudo-file, /proc/<pid>/cpuset,
that displays the path of the process's cpuset directory relative
to the root of the cpuset filesystem.
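A sketch of how a program might turn that relative path into the location of the cpuset directory is shown below. The function name and both keyword parameters are assumptions made for illustration. The defaults assume that the cpuset filesystem is mounted at /dev/cpuset and that procfs is mounted at /proc.

```python
import os

def cpuset_dir_of(pid, mount_point="/dev/cpuset", proc_root="/proc"):
    """Return the cpuset directory of a process under mount_point."""
    with open(os.path.join(proc_root, str(pid), "cpuset")) as f:
        relative = f.read().strip()
    # The path in the file is relative to the root of the cpuset
    # filesystem, so "/" maps to the mount point itself.
    return os.path.normpath(os.path.join(mount_point, relative.lstrip("/")))
```

The pid argument may also be the string "self", because /proc/self refers to the calling process.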
Also the /proc/<pid>/status file for each process has four added
lines, displaying the process's Cpus_allowed (on which CPUs it
may be scheduled) and Mems_allowed (on which memory nodes it may
obtain memory), in the two formats Mask Format and List Format
(see below), as shown in the following example:
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list: 0-127
Mems_allowed: ffffffff,ffffffff
Mems_allowed_list: 0-63
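The two representations in this example can be cross-checked with a small Python sketch. It assumes that Mask Format is a comma-separated sequence of 32-bit hexadecimal words with the most significant word first, and that List Format is a comma-separated sequence of decimal numbers and inclusive ranges. Other syntax is not handled. All three function names are hypothetical.

```python
def parse_list(text):
    """Expand a List Format string such as '0-4,9' into a set of ints."""
    numbers = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        numbers.update(range(int(low), int(high or low) + 1))
    return numbers

def parse_mask(text):
    """Expand a Mask Format string into the set of bit numbers that are set."""
    value = 0
    for word in text.strip().split(","):
        if word:
            # Each word contributes 32 bits; earlier words are more significant.
            value = (value << 32) | int(word, 16)
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}

def allowed_fields(status_text):
    """Extract the four allowed fields from the text of a status file."""
    wanted = ("Cpus_allowed", "Cpus_allowed_list",
              "Mems_allowed", "Mems_allowed_list")
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            fields[key] = value.strip()
    return fields
```

Applied to the example above, parse_mask() on the Cpus_allowed value and parse_list() on the Cpus_allowed_list value both yield the CPU numbers 0 through 127.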
The "allowed" fields were added in Linux 2.6.24; the
"allowed_list" fields were added in Linux 2.6.26.