Systemtap may be used as a powerful administrative tool. It can
expose kernel internal data structures and potentially private
user information. (In dyninst runtime mode, this is not the
case; see the ALTERNATE RUNTIMES section below.)
The translator asserts many safety constraints during compilation
and more during run-time. It aims to ensure that no handler
routine can run for very long, allocate boundless memory, perform
unsafe operations, or unintentionally interfere with the
system. Uses of script global variables are automatically
read/write locked as appropriate, to protect against manipulation
by concurrent probe handlers. Locks are taken so as to run the
global-variable manipulation portion of probe handlers atomically
(locks are taken all-or-none). Deadlocks are detected with
timeouts. Use the -t flag to receive reports of excessive lock
contention. Experimenting with scripts is therefore generally
safe. The guru-mode -g option allows administrators to bypass
most safety measures, which permits invasive or state-changing
operations and embedded-C code, and increases the risk of upset.
By default, overload prevention is turned on for all modules. If
you would like to disable overload processing, use the
--suppress-time-limits option.
Errors that are caught at run time normally result in a clean
script shutdown and a pass-5 error message. The
--suppress-handler-errors option lets scripts tolerate soft
errors without shutting down.
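The options named above can be combined on the stap command line
roughly as sketched below. This is a dry run: the hypothetical
helper show only prints each command, and example.stp is a
placeholder script name. Pairing --suppress-time-limits with -g is
an assumption, based on the understanding that the option is only
accepted in guru mode.

```shell
#!/bin/sh
# Sketch only: "show" is a hypothetical helper that prints a command
# line instead of running it; example.stp is a placeholder script.
show() {
    printf '%s\n' "$*"
}

# Report lock contention and per-probe timing.
show stap -t example.stp
# Guru mode: bypasses most safety measures.
show stap -g example.stp
# Disable overload processing (assumed to need guru mode).
show stap -g --suppress-time-limits example.stp
# Keep running after soft run-time errors.
show stap --suppress-handler-errors example.stp
```

Dropping the show prefix would execute the commands for real, which
requires the permissions described in the next section.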
PERMISSIONS
For the normal linux-kernel-module runtime, to run the kernel
objects systemtap builds, a user must be one of the following:
• the root user;
• a member of the stapdev and stapusr groups;
• a member of the stapsys and stapusr groups; or
• a member of the stapusr group.
The root user or a user who is a member of both the stapdev and
stapusr groups can build and run any systemtap script.
A user who is a member of both the stapsys and stapusr groups can
only use pre-built modules under the following conditions:
• The module has been signed by a trusted signer. Trusted
signers are normally systemtap compile-servers which sign
modules when the --privilege option is specified by the
client. See the stap-server(8) manual page for more
information.
• The module was built using the --privilege=stapsys or the
--privilege=stapusr options.
Members of only the stapusr group can only use pre-built modules
under the following conditions:
• The module is located in the /lib/modules/VERSION/systemtap
directory. This directory must be owned by root and not be
world writable.
or
• The module has been signed by a trusted signer. Trusted
signers are normally systemtap compile-servers which sign
modules when the --privilege option is specified by the
client. See the stap-server(8) manual page for more
information.
• The module was built using the --privilege=stapusr option.
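The group rules above can be summarized as a small decision
procedure. The following is a minimal sketch, not part of
systemtap: the helper names has_group and stap_privilege are
hypothetical, and the root user is identified by name only for
simplicity.

```shell
#!/bin/sh
# Sketch only: has_group and stap_privilege are hypothetical helpers.

# has_group "space separated group list" group-name
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# stap_privilege user-name "space separated group list"
stap_privilege() {
    user=$1
    groups=$2
    if [ "$user" = "root" ]; then
        echo "build and run any script"
    elif has_group "$groups" stapusr && has_group "$groups" stapdev; then
        echo "build and run any script"
    elif has_group "$groups" stapusr && has_group "$groups" stapsys; then
        echo "run signed modules built with --privilege=stapsys or --privilege=stapusr"
    elif has_group "$groups" stapusr; then
        echo "run modules from /lib/modules/VERSION/systemtap or signed --privilege=stapusr modules"
    else
        echo "cannot run systemtap kernel modules"
    fi
}

# Classify the invoking user.
stap_privilege "$(id -un)" "$(id -Gn)"
```

Note that membership in stapdev or stapsys alone grants nothing in
this sketch, since both are listed above only in combination with
stapusr.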
The kernel modules generated by the stap program are run by the
staprun program. The latter is a part of the Systemtap package,
dedicated to module loading and unloading (but only in the white
zone), and kernel-to-user data transfer. Since staprun does not
perform any additional security checks on the kernel objects it
is given, it would be unwise for a system administrator to add
untrusted users to the stapdev or stapusr groups.
SECUREBOOT
If the current system has SecureBoot turned on in the UEFI
firmware, all kernel modules must be signed. (Some kernels may
allow disabling SecureBoot long after booting with a key sequence
such as SysRq-X, making it unnecessary to sign modules.) The
systemtap compile server can sign modules with a MOK (Machine
Owner Key) that it has in common with a client system. See the
following wiki page for more details:
https://sourceware.org/systemtap/wiki/SecureBoot
Some kernels do not let systemtap guess whether module
signing is in effect. On such machines, set the SYSTEMTAP_SIGN
environment variable to any value while running stap.
RESOURCE LIMITS
Many resource use limits are set by macros in the generated C
code. These may be overridden with -D flags. A selection of
these is as follows:
MAXNESTING
Maximum number of nested function calls. Default
determined by script analysis, with a bonus 10 slots added
for recursive scripts.
MAXSTRINGLEN
Maximum length of strings, default 128.
MAXTRYLOCK
Maximum number of iterations to wait for locks on global
variables before declaring possible deadlock and skipping
the probe, default 1000.
MAXACTION
Maximum number of statements to execute during any single
probe hit (with interrupts disabled), default 1000. Note
that for straight-through probe handlers lacking loops or
recursion, due to optimization, this parameter may be
interpreted too conservatively.
MAXACTION_INTERRUPTIBLE
Maximum number of statements to execute during any single
probe hit which is executed with interrupts enabled (such
as begin/end probes), default (MAXACTION * 10).
MAXBACKTRACE
Maximum number of stack frames that will be processed
by the stap runtime unwinder as produced by the backtrace
functions in the [u]context-unwind.stp tapsets, default
20.
MAXMAPENTRIES
Maximum number of rows in any single global array, default
2048. Individual arrays may be declared with a larger or
smaller limit instead:
global big[10000],little[5]
or denoted with % to make them wrap-around (replace old
entries) automatically, as in
global big%
or both.
MAPHASHBIAS
The number of powers-of-two to add or subtract from the
natural size of the hash table backing each global
associative array. Default is 0. Try small positive
numbers to get extra performance at the cost of more
memory consumption, because that should reduce hash table
collisions. Try small negative numbers for the opposite
tradeoff.
MAXERRORS
Maximum number of soft errors before an exit is triggered,
default 0, which means that the first error will exit the
script. Note that with the --suppress-handler-errors
option, this limit is not enforced.
MAXSKIPPED
Maximum number of skipped probes before an exit is
triggered, default 100. Running systemtap with -t
(timing) mode gives more details about skipped probes.
With the default -DINTERRUPTIBLE=1 setting, probes skipped
due to reentrancy are not accumulated against this limit.
Note that with the --suppress-handler-errors option, this
limit is not enforced.
MINSTACKSPACE
Minimum number of free kernel stack bytes required in
order to run a probe handler, default 1024. This number
should be large enough for the probe handler's own needs,
plus a safety margin.
MAXUPROBES
Maximum number of concurrently armed user-space probes
(uprobes), default somewhat larger than the number of
user-space probe points named in the script. This pool
needs to be potentially large because individual uprobe
objects (about 64 bytes each) are allocated for each
process for each matching script-level probe.
STP_MAXMEMORY
Maximum amount of memory (in kilobytes) that the systemtap
module should use, default unlimited. The memory size
includes the size of the module itself, plus any
additional allocations. This only tracks direct
allocations by the systemtap runtime. This does not track
indirect allocations (as done by kprobes/uprobes/etc.
internals).
STP_OVERLOAD_THRESHOLD, STP_OVERLOAD_INTERVAL
Maximum number of machine cycles spent in probes on any
cpu per given interval, before an overload condition is
declared and the script shut down. The defaults are 500
million and 1 billion, so as to limit stap script cpu
consumption at around 50%.
STP_PROCFS_BUFSIZE
Size of procfs probe read buffers (in bytes). Defaults to
MAXSTRINGLEN. This value can be overridden on a per-
procfs file basis using the procfs read probe
.maxsize(MAXSIZE) parameter.
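As an illustration of the limits above, the sketch below composes
a stap command line with -D overrides and works out the CPU share
implied by the overload defaults. The helpers stap_cmd and
overload_percent are hypothetical, example.stp is a placeholder,
and the command is printed rather than run.

```shell
#!/bin/sh
# Sketch only: stap_cmd and overload_percent are hypothetical helpers.

# Percentage of an interval's cycles that probes may consume before
# an overload is declared: threshold / interval, as a percentage.
overload_percent() {
    threshold=$1
    interval=$2
    echo $(( threshold / (interval / 100) ))
}

# Compose (but do not run) a stap command with NAME=VALUE overrides.
stap_cmd() {
    script=$1
    shift
    cmd="stap"
    for def in "$@"; do
        cmd="$cmd -D$def"
    done
    echo "$cmd $script"
}

# Defaults of STP_OVERLOAD_THRESHOLD and STP_OVERLOAD_INTERVAL.
overload_percent 500000000 1000000000    # prints 50

# Longer strings and a larger per-hit statement budget.
stap_cmd example.stp MAXSTRINGLEN=256 MAXACTION=5000
# prints: stap -DMAXSTRINGLEN=256 -DMAXACTION=5000 example.stp
```

Raising a limit trades safety margin for headroom, so it is usually
better to raise only the one limit a script actually hits.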
With scripts that contain probes on any interrupt path, it is
possible that those interrupts may occur in the middle of another
probe handler. The probe in the interrupt handler would be
skipped in this case to avoid reentrance. To work around this
issue, execute stap with the option -DINTERRUPTIBLE=0 to mask
interrupts throughout the probe handler. This does add some
extra overhead to the probes, but it may prevent reentrance for
common problem cases. However, probes in NMI handlers and in the
callpath of the stap runtime may still be skipped due to
reentrance.
In case something goes wrong with stap or staprun after a probe
has already started running, one may safely kill both user
processes, and remove the active probe kernel module with rmmod.
Any pending trace messages may be lost.
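A cleanup along those lines might look like the sketch below. It
assumes the leftover module kept the default name prefix "stap_",
which will not hold if a different module name was chosen when the
script was built. The helper list_stap_modules is hypothetical; it
only filters lsmod-style text and removes nothing by itself.

```shell
#!/bin/sh
# Sketch only: list_stap_modules is a hypothetical helper. It reads
# lsmod-style text on stdin and prints the names of modules whose
# name starts with "stap_" (assumed default prefix).
list_stap_modules() {
    awk '$1 ~ /^stap_/ { print $1 }'
}

# Possible use, as root, after killing the stap and staprun processes:
#   lsmod | list_stap_modules | while read -r m; do rmmod "$m"; done
```

Reviewing the printed names before passing them to rmmod avoids
unloading a module that belongs to a script still in use.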