systemtap probe points
PROBE POINT FAMILIES
BEGIN/END/ERROR
The probe points begin and end are defined by the translator to
refer to the time of session startup and shutdown. All "begin"
probe handlers are run, in some sequence, during the startup of
the session. All global variables will have been initialized
prior to this point. All "end" probes are run, in some sequence,
during the normal shutdown of a session, such as in the aftermath
of an exit() function call, or an interruption from the user.
In the case of an error-triggered shutdown, "end" probes are not
run. There are no target variables available in either context.
If the order of execution among "begin" or "end" probes is
significant, then an optional sequence number may be provided:
begin(N)
end(N)
The number N may be positive or negative. The probe handlers are
run in increasing order, and the order between handlers with the
same sequence number is unspecified. When "begin" or "end" are
given without a sequence, they are effectively sequence zero.
The error probe point is similar to the end probe, except that
each such probe handler is run when the session ends after errors
have occurred. In such cases, "end" probes are skipped, but each
"error" probe is still attempted. This kind of probe can be used
to clean up or emit a "final gasp". It may also be numerically
parametrized to set a sequence.
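As a minimal sketch of the sequencing rules above, the following
script orders two "begin" handlers and reports the elapsed time at
shutdown. The variable name start_us and the one-second timer are
illustrative choices, not part of the probe point definitions.

```systemtap
global start_us

# Lower sequence numbers run first, so this handler precedes plain "begin".
probe begin(-1) { start_us = gettimeofday_us() }

# No sequence number: effectively begin(0).
probe begin { println("session started") }

# Illustrative only: end the session normally after one second.
probe timer.s(1) { exit() }

# Runs on normal shutdown, for example after exit().
probe end { printf("session ran for %d us\n", gettimeofday_us() - start_us) }

# Runs instead of "end" when the session shuts down because of errors.
probe error { println("session ended after errors") }
```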
NEVER
The probe point never is specially defined by the translator to
mean "never". Its probe handler is never run, though its
statements are analyzed for symbol / type correctness as usual.
This probe point may be useful in conjunction with optional
probes.
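One possible use, sketched here with a made-up function name that
is assumed not to exist on the target kernel: listing never next to
an optional probe point keeps a probe point attached to the handler,
so its statements are still checked even if the optional point does
not resolve.

```systemtap
# "hypothetical_missing_function" is a placeholder name; the "?" suffix
# marks the probe point as optional (see stap(1)).
probe kernel.function("hypothetical_missing_function") ?, never {
    println("optional probe point was hit")
}

# A second probe so the script has something to run.
probe begin { println("script started"); exit() }
```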
SYSCALL and ND_SYSCALL
The syscall.* and nd_syscall.* aliases define several hundred
probes, too many to detail here. They are of the general form:
syscall.NAME
nd_syscall.NAME
syscall.NAME.return
nd_syscall.NAME.return
Generally, a pair of probes are defined for each normal system
call as listed in the syscalls(2) manual page, one for entry and
one for return. Those system calls that never return do not have
a corresponding .return probe. The nd_* family of probes is
about the same, except that it uses non-DWARF based searching
mechanisms, which may result in a lower quality of symbolic
context data (parameters), and may miss some system calls. You
may want to try them first, in case kernel debugging information
is not immediately available.
Each probe alias provides a variety of variables. Looking at the
tapset source code is the most reliable way to find out which.
Generally, each
variable listed in the standard manual page is made available as
a script-level variable, so syscall.open exposes filename, flags,
and mode. In addition, a standard suite of variables is
available at most aliases:
argstr A pretty-printed form of the entire argument list, without
parentheses.
name The name of the system call.
retval For return probes, the raw numeric system-call result.
retstr For return probes, a pretty-printed string form of the
system-call result.
As usual for probe aliases, these variables are all initialized
once from the underlying $context variables, so that later
changes to $context variables are not automatically reflected.
Not all probe aliases obey all of these general guidelines.
Please report any bothersome ones you encounter as a bug. Note
that on some kernel/userspace architecture combinations (e.g.,
32-bit userspace on 64-bit kernel), the underlying $context
variables may need explicit sign extension / masking. When this
is an issue, consider using the tapset-provided variables instead
of raw $context variables.
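A short sketch of the standard variables described above, using the
syscall.open alias mentioned in the text. The five-second timer is
an illustrative choice. On systems where programs mostly call
openat(2), or where open(2) does not exist, syscall.openat may be a
better target.

```systemtap
# Entry probe: name and argstr are provided by the tapset alias.
probe syscall.open {
    printf("%s[%d] %s(%s)\n", execname(), pid(), name, argstr)
}

# Return probe: retstr is the pretty-printed result.
probe syscall.open.return {
    printf("%s[%d] %s -> %s\n", execname(), pid(), name, retstr)
}

# Illustrative only: stop after five seconds.
probe timer.s(5) { exit() }
```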
If debuginfo availability is a problem, you may try using the
non-DWARF syscall probe aliases instead. Use the nd_syscall.
prefix instead of syscall. The same context variables are
available, as far as possible.
nd_syscall probes on kernels that use syscall wrappers to pass
arguments via pt_regs (currently 4.17+ on x86_64 and 4.19+ on
aarch64) support syscall argument writing when guru mode is
enabled. If a probe syscall parameter is modified in the probe
body then immediately before the probe exits the parameter's
current value will be written to pt_regs. This overwrites the
previous value. nd_syscall probes also include two parameters
for each of the syscall's string parameters. One holds a quoted
version of the string passed to the syscall. The other holds an
unquoted version of the string intended to be used when modifying
the parameter. If the probe modifies the unquoted string
variable then as the probe is about to exit the contents of this
variable will be written to the user space buffer passed to the
syscall. It is the user's responsibility to ensure that this
buffer is large enough to hold the modified string and that it is
located in a writable memory segment.
TIMERS
There are two main types of timer probes: "jiffies" timer probes
and time interval timer probes.
Intervals defined by the standard kernel "jiffies" timer may be
used to trigger probe handlers asynchronously. Two probe point
variants are supported by the translator:
timer.jiffies(N)
timer.jiffies(N).randomize(M)
The probe handler is run every N jiffies (a kernel-defined unit
of time, typically between 1 and 60 ms). If the "randomize"
component is given, a linearly distributed random value in the
range [-M..+M] is added to N every time the handler is run. N is
restricted to a reasonable range (1 to around a million), and M
is restricted to be smaller than N. There are no target
variables provided in either context. It is possible for such
probes to be run concurrently on a multi-processor computer.
Alternatively, intervals may be specified in units of time.
There are two probe point variants similar to the jiffies timer:
timer.ms(N)
timer.ms(N).randomize(M)
Here, N and M are specified in milliseconds, but the full options
for units are seconds (s/sec), milliseconds (ms/msec),
microseconds (us/usec), nanoseconds (ns/nsec), and hertz (hz).
Randomization is not supported for hertz timers.
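The timer variants above might be combined as in the following
sketch. The counter names and the chosen intervals are arbitrary
examples.

```systemtap
global ms_ticks, jiffy_ticks

# Fires roughly every 100 ms, with a random offset in [-20..+20] ms.
probe timer.ms(100).randomize(20) { ms_ticks++ }

# Fires every 50 jiffies; the wall-clock period depends on the kernel.
probe timer.jiffies(50) { jiffy_ticks++ }

# Report and stop after two seconds.
probe timer.s(2) {
    printf("ms ticks: %d, jiffies ticks: %d\n", ms_ticks, jiffy_ticks)
    exit()
}
```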
The actual resolution of the timers depends on the target kernel.
For kernels prior to 2.6.17, timers are limited to jiffies
resolution, so intervals are rounded up to the nearest jiffies
interval. After 2.6.17, the implementation uses hrtimers for
tighter precision, though the actual resolution will be arch-
dependent. In either case, if the "randomize" component is
given, then the random value will be added to the interval before
any rounding occurs.
Profiling timers are also available to provide probes that
execute on all CPUs at the rate of the system tick (CONFIG_HZ) or
at a given frequency (hz). On some kernels, this is a one-
concurrent-user-only or disabled facility, resulting in error -16
(EBUSY) during probe registration.
timer.profile.tick
timer.profile.freq.hz(N)
Full context information of the interrupted process is available,
making this probe suitable for a time-based sampling profiler.
It is recommended to use the tapset probe timer.profile rather
than timer.profile.tick. This probe point behaves identically to
timer.profile.tick when the underlying functionality is
available, and falls back to using perf.sw.cpu_clock on some
recent kernels which lack the corresponding profile timer
facility.
Profiling timers with specified frequencies are only accurate up
to around 100 hz. You may need to provide a larger value to
achieve the desired rate.
Note that if a timer probe is set to fire at a very high rate and
if the probe body is complex, succeeding timer probes can get
skipped, since the time for them to run has already passed.
Normally systemtap reports missed probes, but it will not report
these skipped probes.
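A minimal time-based sampling profiler along the lines described
above could look like this. It uses the recommended timer.profile
alias; the array name, the five-second run time and the top-ten
limit are illustrative.

```systemtap
global samples

# One sample per CPU per tick; execname() names the interrupted process.
probe timer.profile { samples[execname()]++ }

# Illustrative only: sample for five seconds.
probe timer.s(5) { exit() }

probe end {
    # Iterate in decreasing order of sample count, top ten only.
    foreach (name in samples- limit 10)
        printf("%-20s %d\n", name, samples[name])
}
```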
DWARF
This family of probe points uses symbolic debugging information
for the target kernel/module/program, as may be found in
unstripped executables, or the separate debuginfo packages. They
allow placement of probes logically into the execution path of
the target program, by specifying a set of points in the source
or object code. When a matching statement executes on any
processor, the probe handler is run in that context.
Probe points in the DWARF family can be identified by the target
kernel module (or user process), source file, line number,
function name, or some combination of these.
Here is a list of DWARF probe points currently supported:
kernel.function(PATTERN)
kernel.function(PATTERN).call
kernel.function(PATTERN).callee(PATTERN)
kernel.function(PATTERN).callee(PATTERN).return
kernel.function(PATTERN).callee(PATTERN).call
kernel.function(PATTERN).callees(DEPTH)
kernel.function(PATTERN).return
kernel.function(PATTERN).inline
kernel.function(PATTERN).label(LPATTERN)
module(MPATTERN).function(PATTERN)
module(MPATTERN).function(PATTERN).call
module(MPATTERN).function(PATTERN).callee(PATTERN)
module(MPATTERN).function(PATTERN).callee(PATTERN).return
module(MPATTERN).function(PATTERN).callee(PATTERN).call
module(MPATTERN).function(PATTERN).callees(DEPTH)
module(MPATTERN).function(PATTERN).return
module(MPATTERN).function(PATTERN).inline
module(MPATTERN).function(PATTERN).label(LPATTERN)
kernel.statement(PATTERN)
kernel.statement(PATTERN).nearest
kernel.statement(ADDRESS).absolute
module(MPATTERN).statement(PATTERN)
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").library("PATH").statement("*@FILE.c:123").nearest
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
process("PATH").function("foo").callee("bar")
process("PATH").function("foo").callee("bar").return
process("PATH").function("foo").callee("bar").call
process("PATH").function("foo").callees(DEPTH)
process(PID).function("NAME")
process(PID).function("myfun").label("foo")
process(PID).plt("NAME")
process(PID).plt("NAME").return
process(PID).statement("*@FILE.c:123")
process(PID).statement("*@FILE.c:123").nearest
process(PID).statement(ADDRESS).absolute
(See the USER-SPACE section below for more information on the
process probes.)
The list above includes multiple variants and modifiers which
provide additional functionality or filters. They are:
.function
Places a probe near the beginning of the named
function, so that parameters are available as
context variables.
.return
Places a probe at the moment after the return from
the named function, so the return value is
available as the "$return" context variable.
.inline
Filters the results to include only instances of
inlined functions. Note that inlined functions do
not have an identifiable return point, so .return
is not supported on .inline probes.
.call
Filters the results to include only non-inlined
functions (the opposite set of .inline).
.exported
Filters the results to include only exported
functions.
.statement
Places a probe at the exact spot, exposing those
local variables that are visible there.
.statement.nearest
Places a probe at the nearest available line number
for each line number given in the statement.
.callee
Places a probe on the callee function given in the
.callee modifier, where the callee must be a
function called by the target function given in
.function. The advantage of doing this over
directly probing the callee function is that this
probe point is run only when the callee is called
from the target function (add the
-DSTAP_CALLEE_MATCHALL directive to override this
when calling stap(1)).
Note that only callees that can be statically
determined are available. For example, calls
through function pointers are not available.
Additionally, calls to functions located in other
objects (e.g. libraries) are not available
(instead use another probe point). This feature
will only work for code compiled with GCC 4.7+.
.callees
Shortcut for .callee("*"), which places a probe on
all callees of the function.
.callees(DEPTH)
Recursively places probes on callees. For example,
.callees(2) will probe both callees of the target
function, as well as callees of those callees. And
.callees(3) goes one level deeper, etc. A callee
probe at depth N is only triggered when the N
callers in the callstack match those that were
statically determined during analysis (this also
may be overridden using -DSTAP_CALLEE_MATCHALL).
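The modifiers above can be sketched on a single kernel function.
This example assumes that kernel debuginfo is installed and that
vfs_read exists on the target kernel. Whether rw_verify_area is a
statically determinable callee of vfs_read there is also an
assumption, so that probe point is marked optional.

```systemtap
global calls, inlines, callee_hits

# Non-inlined entries only.
probe kernel.function("vfs_read").call { calls++ }

# Inlined instances only; optional ("?") because there may be none.
probe kernel.function("vfs_read").inline ? { inlines++ }

# Fires only when the (assumed) callee is called from vfs_read.
probe kernel.function("vfs_read").callee("rw_verify_area") ? { callee_hits++ }

# Return probe: $return holds the function's return value.
probe kernel.function("vfs_read").return {
    if ($return < 0) printf("vfs_read failed: %d\n", $return)
}

# Illustrative only: report and stop after five seconds.
probe timer.s(5) {
    printf("calls=%d inlines=%d callee_hits=%d\n", calls, inlines, callee_hits)
    exit()
}
```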
In the above list of probe points, MPATTERN stands for a string
literal that aims to identify the loaded kernel module of
interest. For in-tree kernel modules, the name suffices (e.g.
"btrfs"). The name may also include the "*", "[]", and "?"
wildcards to match multiple in-tree modules. Out-of-tree modules
are also supported by specifying the full path to the ko file;
wildcards are not supported in that case. The file must follow the
convention of being named <module_name>.ko (characters ',' and '-'
are replaced by '_').
LPATTERN stands for a source program label. It may also contain
"*", "[]", and "?" wildcards. PATTERN stands for a string literal
that aims to identify a point in the program. It is made up of
three parts:
• The first part is the name of a function, as would appear in
the nm program's output. This part may use the "*" and "?"
wildcarding operators to match multiple names.
• The second part is optional and begins with the "@"
character. It is followed by the path to the source file
containing the function, which may include a wildcard
pattern, such as mm/slab*. If it does not match as is, an
implicit "*/" is optionally added before the pattern, so that
a script need only name the last few components of a possibly
long source directory path.
• Finally, the third part is optional if the file name part was
given, and identifies the line number in the source file
preceded by a ":" or a "+". The line number is assumed to be
an absolute line number if preceded by a ":", or relative to
the declaration line of the function if preceded by a "+".
All the lines in the function can be matched with ":*". A
range of lines x through y can be matched with ":x-y". Ranges
and specific lines can be mixed using commas, e.g. ":x,y-z".
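The three parts of PATTERN might be exercised as follows. This is a
sketch that assumes kernel debuginfo is available and that vfs_read
is defined in fs/read_write.c on the target kernel. A function that
matches both of the first two probes is counted twice.

```systemtap
global hits

# Part 1 only: function names matching a wildcard.
probe kernel.function("vfs_read*") { hits[ppfunc()]++ }

# Parts 1 and 2: any function defined in a file whose path ends in
# fs/read_write.c (the implicit "*/" prefix applies if needed).
probe kernel.function("*@fs/read_write.c") { hits[ppfunc()]++ }

# Parts 1, 2 and 3: every line of vfs_read; pp() shows the resolved point.
probe kernel.statement("vfs_read@fs/read_write.c:*") { hits[pp()]++ }

# Illustrative only: collect for two seconds.
probe timer.s(2) { exit() }

probe end {
    foreach (point in hits- limit 20)
        printf("%6d %s\n", hits[point], point)
}
```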
As an alternative, PATTERN may be a numeric constant, indicating
an address. Such an address may be found from symbol tables of
the appropriate kernel / module object file. It is verified
against known statement code boundaries, and will be relocated
for use at run time.
In guru mode only, absolute kernel-space addresses may be
specified with the ".absolute" suffix. Such an address is
considered already relocated, as if it came from /proc/kallsyms,
so it cannot be checked against statement/instruction boundaries.
CONTEXT VARIABLES
Many of the source-level context variables, such as function
parameters, locals, globals visible in the compilation unit, may
be visible to probe handlers. They may refer to these variables
by prefixing their name with "$" within the scripts. In
addition, a special syntax allows limited traversal of
structures, pointers, and arrays. More syntax allows
pretty-printing of individual variables or their groups. See also
@cast. Note that variables may be inaccessible due to them being
paged out, or for a few other reasons. See also man
error::fault(7stap).
Functions called from DWARF class probe points and from
process.mark probes may also refer to context variables.
$var refers to an in-scope variable or thread local storage
variable "var". If it's an integer-like type, it will be
cast to a 64-bit int for systemtap script use. String-
like pointers (char *) may be copied to systemtap string
values using the kernel_string or user_string functions.
@var("varname")
an alternative syntax for $varname
@var("varname","module")
The global variable or global thread local storage
variable in scope of the given module already loaded into
the current probed process. Useful to get an exported
variable in a shared library loaded into the process being
probed, or a global variable in a process while a shared
library probe is being executed. For user-space modules
only. For example: @var("_r_debug","/lib/ld-linux.so.2")
@var("varname@src/file.c")
refers to the global (either file local or external)
variable varname defined when the file src/file.c was
compiled. The CU in which the variable is resolved is the
first CU in the module of the probe point which matches
the given file name at the end and has the shortest file
name path (e.g. given @var("foo@bar/baz.c") and CUs with
file name paths src/sub/module/bar/baz.c and src/bar/baz.c,
the second CU will be chosen to resolve the (file) global
variable foo).
@var("varname@src/file.c","module")
The global variable in scope of the given CU, defined in
the given module, even if the variable is static (so the
name is not unique without the CU name).
$var->field traversal via a structure's or a pointer's field. This
generalized indirection operator may be repeated to follow
more levels. Note that the . operator is not used for
plain structure members, only -> for both purposes. (This
is because "." is reserved for string concatenation.) Also
note that for direct dereferencing of $var pointer
{kernel,user}_{char,int,...}($var) should be used. (Refer
to stapfuncs(5) for more details.)
$return
is available in return probes only for functions that are
declared with a return value, which can be determined
using @defined($return).
$var[N]
indexes into an array. The index may be given as a literal
number or an arbitrary numeric expression.
A number of operators exist for such basic context variable
expressions:
$$vars expands to a character string that is equivalent to
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
parm1, ..., parmN, var1, ..., varN)
for each variable in scope at the probe point. Some
values may be printed as =? if their run-time location
cannot be found.
$$locals
expands to a subset of $$vars for only local variables.
$$parms
expands to a subset of $$vars for only function
parameters.
$$return
is available in return probes only. It expands to a
string that is equivalent to sprintf("return=%x", $return)
if the probed function has a return value, or else an
empty string.
& $EXPR
expands to the address of the given context variable
expression, if it is addressable.
@defined($EXPR)
expands to 1 if the given context variable expression is
resolvable and to 0 otherwise, for use in conditionals such as
@defined($foo->bar) ? $foo->bar : 0
@probewrite($VAR)
see the PROBES section of stap(1).
$EXPR$ expands to a string with all of $EXPR's members,
equivalent to
sprintf("{.a=%i, .b=%u, .c={...}, .d=[...]}",
$EXPR->a, $EXPR->b)
$EXPR$$
expands to a string with all of $EXPR's members and
submembers, equivalent to
sprintf("{.a=%i, .b=%u, .c={.x=%p, .y=%c}, .d=[%i, ...]}",
$EXPR->a, $EXPR->b, $EXPR->c->x, $EXPR->c->y, $EXPR->d[0])
@errno expands to the last value the C library global variable
errno was set to.
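Several of the forms above can be seen together in a sketch like
the following. It assumes kernel debuginfo is available and that
vfs_read has parameters named file and count on the target kernel,
which is why the field access is guarded with @defined.

```systemtap
probe kernel.function("vfs_read") {
    # $$parms: all function parameters as one string.
    printf("%s: %s\n", ppfunc(), $$parms)

    # $var and $var->field; @defined guards a field that may not resolve.
    printf("count=%d flags=0x%x\n", $count,
           @defined($file->f_flags) ? $file->f_flags : 0)

    # $EXPR$ pretty-prints the members of the pointed-to structure.
    printf("file: %s\n", $file$)

    # Illustrative only: stop after the first hit.
    exit()
}
```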
MORE ON RETURN PROBES
For the kernel ".return" probes, only a certain fixed number of
returns may be outstanding. The default is a relatively small
number, on the order of a few times the number of physical CPUs.
If many different threads concurrently call the same blocking
function, such as futex(2) or read(2), this limit could be
exceeded, and skipped "kretprobes" would be reported by "stap
-t". To work around this, specify a
probe FOO.return.maxactive(NNN)
suffix, with a large enough NNN to cover all expected
concurrently blocked threads. Alternately, use the
stap -DKRETACTIVE=NNNN
stap command line macro setting to override the default for all
".return" probes.
For ".return" probes, context variables other than the "$return"
may be accessible, as a convenience for a script programmer
wishing to access function parameters. These values are
snapshots taken at the time of function entry. (Local variables
within the function are not generally accessible, since those
variables did not exist in allocated/initialized form at the
snapshot moment.) These entry-snapshot variables should be
accessed via @entry($var).
In addition, arbitrary entry-time expressions can also be saved
for ".return" probes using the @entry(expr) operator. For
example, one can compute the elapsed time of a function:
probe kernel.function("do_filp_open").return {
println( gettimeofday_us() - @entry(gettimeofday_us()) )
}
The following table summarizes how values related to a function
parameter context variable, a pointer named addr, may be accessed
from a .return probe.
at-entry value    past-exit value
$addr             not available
$addr->x->y       @cast(@entry($addr),"struct zz")->x->y
$addr[0]          {kernel,user}_{char,int,...}(& $addr[0])
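As a sketch of entry snapshots in a .return probe, the script below
compares the requested size with the result of a read. It assumes
kernel debuginfo is available and that vfs_read has a parameter
named count on the target kernel; the local variable names are
illustrative.

```systemtap
probe kernel.function("vfs_read").return {
    # @entry($count) is the parameter value snapshotted at function entry.
    requested = @entry($count)

    # @entry() can also save an arbitrary entry-time expression.
    elapsed = gettimeofday_us() - @entry(gettimeofday_us())

    printf("%s: asked for %d bytes, got %d, took %d us\n",
           execname(), requested, $return, elapsed)
}

# Illustrative only: stop after two seconds.
probe timer.s(2) { exit() }
```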
DWARFLESS
In the absence of debugging information, entry & exit points of
kernel & module functions can be probed using the "kprobe" family
of probes. However, these do not permit looking up the arguments
/ local variables of the function. The following constructs are
supported:
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).call
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).call
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
Probes of type function are recommended for kernel functions,
whereas probes of type module are recommended for probing
functions of the specified module. In case the absolute address
of a kernel or module function is known, statement probes can be
utilized.
Note that FUNCTION and MODULE names must not contain wildcards,
or the probe will not be registered. Also, statement probes can
be run only in guru mode.
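A minimal sketch of the kprobe family, which needs no debuginfo:
the script counts entries to and returns from one kernel function.
The choice of vfs_read and the two-second run time are
illustrative; no arguments or locals are read, since they are not
available here.

```systemtap
global entries, exits

# Exact function name, no wildcards.
probe kprobe.function("vfs_read") { entries++ }
probe kprobe.function("vfs_read").return { exits++ }

# Illustrative only: report and stop after two seconds.
probe timer.s(2) {
    printf("vfs_read: %d entries, %d returns\n", entries, exits)
    exit()
}
```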
USER-SPACE
Support for user-space probing is available for kernels that are
configured with the utrace extensions, or that have the uprobes
facility introduced in Linux 3.5. (Various kernel build configuration
options need to be enabled; systemtap will advise if these are
missing.)
There are several forms. First, a non-symbolic probe point:
process(PID).statement(ADDRESS).absolute
is analogous to kernel.statement(ADDRESS).absolute in that both
use raw (unverified) virtual addresses and provide no $variables.
The target PID parameter must identify a running process, and
ADDRESS should identify a valid instruction address. All threads
of that process will be probed.
Second, non-symbolic user-kernel interface events handled by
utrace may be probed:
process(PID).begin
process("FULLPATH").begin
process.begin
process(PID).thread.begin
process("FULLPATH").thread.begin
process.thread.begin
process(PID).end
process("FULLPATH").end
process.end
process(PID).thread.end
process("FULLPATH").thread.end
process.thread.end
process(PID).syscall
process("FULLPATH").syscall
process.syscall
process(PID).syscall.return
process("FULLPATH").syscall.return
process.syscall.return
process(PID).insn
process("FULLPATH").insn
process(PID).insn.block
process("FULLPATH").insn.block
A process.begin probe gets called when a new process described by
PID or FULLPATH gets created. In addition, it is called once
from the context of each preexisting process, at systemtap script
startup. This is useful to track live processes. A
process.thread.begin probe gets called when a new thread
described by PID or FULLPATH gets created. A process.end probe
gets called when a process described by PID or FULLPATH dies. A
process.thread.end probe gets called when a thread described by
PID or FULLPATH dies. A process.syscall probe gets called when a
thread described by PID or FULLPATH makes a system call. The
system call number is available in the $syscall context variable,
and the first 6 arguments of the system call are available in the
$argN (e.g. $arg1, $arg2, ...) context variables. A
process.syscall.return probe gets called when a thread described
by PID or FULLPATH returns from a system call. The system call
number is available in the $syscall context variable, and the
return value of the system call is available in the $return
context variable. A process.insn probe gets called for every
single-stepped instruction of the process described by PID or
FULLPATH. A process.insn.block probe gets called for every
block-stepped instruction of the process described by PID or
FULLPATH.
If a process probe is specified without a PID or FULLPATH, all
user threads will be probed. However, if systemtap was invoked
with the -c or -x options, then process probes are restricted to
the process hierarchy associated with the target process. If a
process probe is unspecified (i.e. without a PID or FULLPATH),
but with the -c option, the PATH of the -c cmd will be
heuristically filled into the process PATH. In that case, only
command parameters are allowed in the -c command (i.e. no command
substitution allowed and no occurrences of any of these
characters: '|&;<>(){}').
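The process event probes above might be used as in this sketch,
which logs process lifetimes and counts system calls by number.
The array name and the top-ten limit are illustrative.

```systemtap
global syscalls

# Also fires once for each preexisting process at script startup.
probe process.begin { printf("begin: %s[%d]\n", execname(), pid()) }

probe process.end { printf("end:   %s[%d]\n", execname(), pid()) }

# $syscall is the system call number.
probe process.syscall { syscalls[$syscall]++ }

probe end {
    foreach (nr in syscalls- limit 10)
        printf("syscall %d: %d calls\n", nr, syscalls[nr])
}
```

Run with the -c option (for example, stap script.stp -c "ls /") to
restrict the probes to the process hierarchy of that command;
without -c or -x, all user threads are probed until the session is
interrupted.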
Third, symbolic static instrumentation compiled into programs and
shared libraries may be probed:
process("PATH").mark("LABEL")
process("PATH").provider("PROVIDER").mark("LABEL")
process(PID).mark("LABEL")
process(PID).provider("PROVIDER").mark("LABEL")
A .mark probe gets called via a static probe which is defined in
the application by STAP_PROBE1(PROVIDER,LABEL,arg1), one of a
family of macros defined in sys/sdt.h. The PROVIDER is an arbitrary
application identifier, LABEL is the marker site identifier, and
arg1 is the integer-typed argument. STAP_PROBE1 is used for
probes with 1 argument, STAP_PROBE2 is used for probes with 2
arguments, and so on. The arguments of the probe are available
in the context variables $arg1, $arg2, ... An alternative to
using the STAP_PROBE macros is to use the dtrace script to create
custom macros. Additionally, the variables $$name and $$provider
are available as parts of the probe point name. The sys/sdt.h
macro names DTRACE_PROBE* are available as aliases for
STAP_PROBE*.
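On the script side, a marker might be consumed as sketched below.
The program path ./myapp, the provider myprovider and the label
request_start are all hypothetical; the sketch assumes the program
contains a matching STAP_PROBE1(myprovider, request_start, id)
site with an integer argument.

```systemtap
# "./myapp", "myprovider" and "request_start" are hypothetical names.
probe process("./myapp").provider("myprovider").mark("request_start") {
    # $$provider and $$name come from the probe point name; $arg1 is
    # the single argument passed to STAP_PROBE1.
    printf("%s:%s arg1=%d\n", $$provider, $$name, $arg1)
}
```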
Finally, full symbolic source-level probes in user-space programs
and shared libraries are supported. These are exactly analogous
to the symbolic DWARF-based kernel/module probes described above.
They expose the same sorts of context $variables for function
parameters, local variables, and so on.
process("PATH").function("NAME")
process("PATH").statement("*@FILE.c:123")
process("PATH").plt("NAME")
process("PATH").library("PATH").plt("NAME")
process("PATH").library("PATH").function("NAME")
process("PATH").library("PATH").statement("*@FILE.c:123")
process("PATH").function("*").return
process("PATH").function("myfun").label("foo")
process("PATH").function("foo").callee("bar")
process("PATH").plt("NAME").return
process(PID).function("NAME")
process(PID).statement("*@FILE.c:123")
process(PID).plt("NAME")
Note that for all process probes, PATH names refer to executables
that are searched the same way shells do: relative to the working
directory if they contain a "/" character, otherwise in $PATH.
If PATH names refer to scripts, the actual interpreters
(specified in the script in the first line after the #!
characters) are probed.
Tapset process probes placed in the special directory
$prefix/share/systemtap/tapset/PATH/ with relative paths will
have their process parameter prefixed with the location of the
tapset. For example,
process("foo").function("NAME")
expands to
process("/usr/bin/foo").function("NAME")
when placed in $prefix/share/systemtap/tapset/PATH/usr/bin/
If PATH is a process component parameter referring to shared
libraries then all processes that map it at runtime would be
selected for probing. If PATH is a library component parameter
referring to shared libraries then the process specified by the
process component would be selected. Note that the PATH pattern
in a library component will always apply to libraries statically
determined to be in use by the process. However, you may also
specify the full path to any library file even if not statically
needed by the process.
A .plt probe will probe functions in the program linkage table
corresponding to the rest of the probe point. .plt can be
specified as a shorthand for .plt("*"). The symbol name is
available as a $$name context variable; function arguments are
not available, since PLTs are processed without debuginfo. A
.plt.return probe places a probe at the moment after the return
from the named function.
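A sketch of the .plt shorthand, counting calls made through the
program linkage table of one executable. The path /bin/ls is an
assumption about the target system, and binaries built without
lazy PLT entries may yield few or no matches.

```systemtap
global plt_calls

# .plt is shorthand for .plt("*"); $$name is the PLT symbol name.
probe process("/bin/ls").plt { plt_calls[$$name]++ }

probe end {
    foreach (sym in plt_calls- limit 10)
        printf("%-24s %d\n", sym, plt_calls[sym])
}
```

Running it as stap script.stp -c "/bin/ls" ends the session when
the command exits.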
If the PATH string contains wildcards as in the MPATTERN case,
then standard globbing is performed to find all matching paths.
In this case, the $PATH environment variable is not used.
If systemtap was invoked with the -c or -x options, then process
probes are restricted to the process hierarchy associated with
the target process.
JAVA
Support for probing Java methods is available using Byteman as a
backend. Byteman is an instrumentation tool from the JBoss
project which systemtap can use to monitor invocations for a
specific method or line in a Java program.
Systemtap does so by generating a Byteman script listing the
probes to instrument and then invoking the Byteman bminstall
utility.
This Java instrumentation support is currently a prototype
feature with major limitations. Moreover, Java probing currently
does not work across users; the stap script must run (with
appropriate permissions) under the same user as the Java
process being probed. (Thus a stap script under root currently
cannot probe Java methods in a non-root-user Java process.)
The first probe type refers to Java processes by the name of the
Java process:
java("PNAME").class("CLASSNAME").method("PATTERN")
java("PNAME").class("CLASSNAME").method("PATTERN").return
The PNAME argument must be a pre-existing jvm pid, and be
identifiable via a jps listing.
The PATTERN parameter specifies the signature of the Java method
to probe. The signature must consist of the exact name of the
method, followed by a bracketed list of the types of the
arguments, for instance "myMethod(int,double,Foo)". Wildcards are
not supported.
The probe can be set to trigger at a specific line within the
method by appending a line number with colon, just as in other
types of probes: "myMethod(int,double,Foo):245".
The CLASSNAME parameter identifies the Java class the method
belongs to, either with or without the package qualification. By
default, the probe only triggers on descendants of the class that
do not override the method definition of the original class.
However, CLASSNAME can take an optional caret prefix, as in
^org.my.MyClass, which specifies that the probe should also
trigger on all descendants of MyClass that override the original
method. For instance, every method with signature foo(int) in
program org.my.MyApp can be probed at once using
java("org.my.MyApp").class("^java.lang.Object").method("foo(int)")
The second probe type works analogously, but refers to Java
processes by PID:
java(PID).class("CLASSNAME").method("PATTERN")
java(PID).class("CLASSNAME").method("PATTERN").return
(PIDs for an already running process can be obtained using the
jps(1) utility.)
Context variables defined within java probes include $arg1
through $arg10 (for up to the first 10 arguments of a method),
represented as character-pointers for the toString() form of each
actual argument. The arg1 through arg10 script variables provide
access to these as ordinary strings, fetched via
user_string_warn().
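A sketch of a Java method probe that prints its first argument,
reusing the illustrative names from the text above (org.my.MyApp,
org.my.MyClass and foo(int) are placeholders, not a real
application):

```systemtap
# Entry probe: arg1 is the toString() form of the first argument.
probe java("org.my.MyApp").class("^org.my.MyClass").method("foo(int)") {
    printf("foo called with %s\n", arg1)
}

# Return probe for the same method.
probe java("org.my.MyApp").class("^org.my.MyClass").method("foo(int)").return {
    println("foo returned")
}
```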
Prior to systemtap version 3.1, $arg1 through $arg10 could
contain either integers or character pointers, depending on the
types of the objects being passed to each particular java method.
This previous behaviour may be invoked with the stap
--compatible=3.0 flag.
PROCFS
These probe points allow procfs "files" in
/proc/systemtap/MODNAME to be created, read and written using a
permission that may be modified using the proper umask value.
Default permissions are 0400 for read probes, and 0200 for write
probes. If both a read and write probe are being used on the same
file, a default permission of 0600 will be used. Using
procfs.umask(0040).read would result in a 0404 permission set for
the file. (MODNAME is the name of the systemtap module). The
proc filesystem is a pseudo-filesystem which is used as an
interface to kernel data structures. There are several probe
point variants supported by the translator:
procfs("PATH").read
procfs("PATH").umask(UMASK).read
procfs("PATH").read.maxsize(MAXSIZE)
procfs("PATH").umask(UMASK).read.maxsize(MAXSIZE)
procfs("PATH").write
procfs("PATH").umask(UMASK).write
procfs.read
procfs.umask(UMASK).read
procfs.read.maxsize(MAXSIZE)
procfs.umask(UMASK).read.maxsize(MAXSIZE)
procfs.write
procfs.umask(UMASK).write
Note that there are a few differences when procfs probes are used
in the stapbpf runtime. FIFO special files are used instead of
proc filesystem files. These files are created in
/var/tmp/systemtap-USER/MODNAME. (USER is the name of the user).
Additionally, users cannot create both read and write probes on
the same file.
PATH is the file name (relative to /proc/systemtap/MODNAME or
/var/tmp/systemtap-USER/MODNAME) to be created. If no PATH is
specified (as in the last six variants above), PATH defaults to
"command". The file name "__stdin" is used internally by
systemtap for input probes and should not be used as a PATH for
procfs probes; see the input probe section below.
When a user reads /proc/systemtap/MODNAME/PATH (normal runtime)
or /var/tmp/systemtap-USER/MODNAME/PATH (stapbpf runtime), the
corresponding procfs read probe is triggered. The string data to
be read should be assigned to a variable named $value, like this:
probe procfs("PATH").read { $value = "100\n" }
When a user writes into /proc/systemtap/MODNAME/PATH (normal
runtime) or /var/tmp/systemtap-USER/MODNAME/PATH (stapbpf
runtime), the corresponding procfs write probe is triggered. The
data the user wrote is available in the string variable named
$value, like this:
probe procfs("PATH").write { printf("user wrote: %s", $value) }
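Read and write probes can be combined on one file to expose a
tunable value. The following is a minimal sketch under stated
assumptions: the file name "threshold" and the global variable
are purely illustrative, and strtol() is the tapset function for
converting a string to an integer. Because both a read and a
write probe share the file, it should get the default 0600
permission described above; this combination is not available
under the stapbpf runtime.

```systemtap
global threshold = 100   # hypothetical tunable value

# Reading the file reports the current value.
probe procfs("threshold").read {
  $value = sprintf("%d\n", threshold)
}

# Writing a decimal number to the file updates the value.
probe procfs("threshold").write {
  threshold = strtol($value, 10)
}
```

With the module loaded under the normal runtime, reading
/proc/systemtap/MODNAME/threshold should return the current
value, and writing a number to the same file should change it.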
MAXSIZE is the size of the procfs read buffer. Specifying
MAXSIZE allows larger procfs output. If no MAXSIZE is specified,
the procfs read buffer defaults to STP_PROCFS_BUFSIZE (which
defaults to MAXSTRINGLEN, the maximum length of a string). If
setting the procfs read buffers for more than one file is needed,
it may be easiest to override the STP_PROCFS_BUFSIZE definition.
Here's an example of using MAXSIZE:
probe procfs.read.maxsize(1024) {
    $value = "long string..."
    $value .= "another long string..."
    $value .= "another long string..."
    $value .= "another long string..."
}
INPUT
These probe points make input from stdin available to the script
during runtime. The translator currently supports two variants
of this family:
input.char
input.line
input.char is triggered each time a character is read from stdin.
The current character is available in the string variable named
char. There is no newline buffering; the next character is read
from stdin as soon as it becomes available.
input.line causes all characters read from stdin to be buffered
until a newline is read, at which point the probe will be
triggered. The current line of characters (including the newline)
is made available in a string variable named line. Note that no
more than MAXSTRINGLEN characters will be buffered. Any
additional characters will not be included in line.
Input probes are aliases for procfs("__stdin").write. Systemtap
reconfigures stdin if the presence of this procfs probe is
detected, therefore "__stdin" should not be used as a path
argument for procfs probes. Additionally, input probes will not
work with the -F and --remote options.
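A small sketch of line-based input handling follows. It echoes
each line typed on stdin and ends the session when the user types
"quit"; the choice of "quit" as a command is an assumption made
for illustration only.

```systemtap
probe input.line {
  # line still carries its trailing newline, so compare against "quit\n"
  if (line == "quit\n")
    exit()
  else
    printf("read: %s", line)
}
```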
NETFILTER HOOKS
These probe points allow observation of network packets using the
netfilter mechanism. A netfilter probe in systemtap corresponds
to a netfilter hook function in the original netfilter probes
API. It is probably more convenient to use
tapset::netfilter(3stap), which wraps the primitive netfilter
hooks and does the work of extracting useful information from the
context variables.
There are several probe point variants supported by the
translator:
netfilter.hook("HOOKNAME").pf("PROTOCOL_F")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME")
netfilter.hook("HOOKNAME").pf("PROTOCOL_F").priority("PRIORITY")
netfilter.pf("PROTOCOL_F").hook("HOOKNAME").priority("PRIORITY")
PROTOCOL_F is the protocol family to listen for, currently one of
NFPROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, or NFPROTO_BRIDGE.
HOOKNAME is the point, or 'hook', in the protocol stack at which
to intercept the packet. The available hook names for each
protocol family are taken from the kernel header files
<linux/netfilter_ipv4.h>, <linux/netfilter_ipv6.h>,
<linux/netfilter_arp.h> and <linux/netfilter_bridge.h>. For
instance, allowable hook names for NFPROTO_IPV4 are
NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD,
NF_INET_LOCAL_OUT, and NF_INET_POST_ROUTING.
PRIORITY is an integer priority giving the order in which the
probe point should be triggered relative to any other netfilter
hook functions which trigger on the same packet. Hook functions
execute on each packet in order from smallest priority number to
largest priority number. If no PRIORITY is specified (as in the
first two probe point variants above), PRIORITY defaults to "0".
There are a number of predefined priority names of the form
NF_IP_PRI_* and NF_IP6_PRI_* which are defined in the kernel
header files <linux/netfilter_ipv4.h> and
<linux/netfilter_ipv6.h> respectively. The script is permitted to
use these instead of specifying an integer priority. (The probe
points for NFPROTO_ARP and NFPROTO_BRIDGE currently do not expose
any named hook priorities to the script writer.) Thus, allowable
ways to specify the priority include:
priority("255")
priority("NF_IP_PRI_SELINUX_LAST")
A script using guru mode is permitted to specify any identifier
or number as the parameter for hook, pf, and priority. This
feature should be used with caution, as the parameter is inserted
verbatim into the C code generated by systemtap.
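As an illustration of the syntax above, the following sketch
counts IPv4 packets at the pre-routing hook, using one of the
named priorities. The counter variable and the output format are
assumptions made for the example; no guru mode is needed because
only predefined names are used.

```systemtap
global packets   # hypothetical counter

probe netfilter.pf("NFPROTO_IPV4").hook("NF_INET_PRE_ROUTING").priority("NF_IP_PRI_SELINUX_LAST") {
  packets++
}

probe end {
  printf("IPv4 packets seen at PRE_ROUTING: %d\n", packets)
}
```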
The netfilter probe points define the following context
variables:
$hooknum
The hook number.
$skb The address of the sk_buff struct representing the packet.
See <linux/skbuff.h> for details on how to use this
struct, or alternatively use the tapset
tapset::netfilter(3stap) for easy access to key
information.
$in The address of the net_device struct representing the
network device on which the packet was received (if any).
May be 0 if the device is unknown or undefined at that
stage in the protocol stack.
$out The address of the net_device struct representing the
network device on which the packet will be sent (if any).
May be 0 if the device is unknown or undefined at that
stage in the protocol stack.
$verdict
(Guru mode only.) Assigning one of the verdict values
defined in <linux/netfilter.h> to this variable alters the
further progress of the packet through the protocol stack.
For instance, the following guru mode script forces all
ipv6 network packets to be dropped:
probe netfilter.pf("NFPROTO_IPV6").hook("NF_IP6_PRE_ROUTING") {
$verdict = 0 /* nf_drop */
}
For convenience, unlike the primitive probe points
discussed here, the probes defined in
tapset::netfilter(3stap) export the lowercase names of the
verdict constants (e.g. NF_DROP becomes nf_drop) as local
variables.
KERNEL TRACEPOINTS
This family of probe points hooks up to static probing
tracepoints inserted into the kernel or modules. As with
markers, these tracepoints are special macro calls inserted by
kernel developers to make probing faster and more reliable than
with DWARF-based probes, and DWARF debugging information is not
required to probe tracepoints. Tracepoints have an extra
advantage of more strongly-typed parameters than markers.
Tracepoint probes look like: kernel.trace("name"). The
tracepoint name string, which may contain the usual wildcard
characters, is matched against the names defined by the kernel
developers in the tracepoint header files. To restrict the search
to specific subsystems (e.g. sched, ext3, etc.), the following
syntax can be used: kernel.trace("system:name"). The tracepoint
system string may also contain the usual wildcard characters.
The handler associated with a tracepoint-based probe may read the
optional parameters specified at the macro call site. These are
named according to the declaration by the tracepoint author. For
example, the tracepoint probe kernel.trace("sched:sched_switch")
provides the parameters $prev and $next. If the parameter is a
complex type, as in a struct pointer, then a script can access
fields with the same syntax as DWARF $target variables.
Tracepoint parameters themselves cannot be modified, but in guru
mode a script may modify fields of parameters.
The subsystem and name of the tracepoint are available in
$$system and $$name, and a string of name=value pairs for all
parameters of the tracepoint is available in $$vars or $$parms.
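A brief sketch of a tracepoint handler follows. It assumes that
$prev and $next of sched:sched_switch are task_struct pointers
with a pid field, which holds on common kernels but depends on
the tracepoint declaration of the kernel being probed. The
one-second timer is an assumption added only to keep the output
short.

```systemtap
probe kernel.trace("sched:sched_switch") {
  # Field access uses the same syntax as DWARF $target variables.
  printf("%s:%s pid %d -> pid %d\n",
         $$system, $$name, $prev->pid, $next->pid)
}

# Stop after one second; context switches are very frequent.
probe timer.s(1) {
  exit()
}
```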
KERNEL MARKERS (OBSOLETE)
This family of probe points hooks up to an older style of static
probing markers inserted into older kernels or modules. These
markers are special STAP_MARK macro calls inserted by kernel
developers to make probing faster and more reliable than with
DWARF-based probes. Further, DWARF debugging information is not
required to probe markers.
Marker probe points begin with kernel. The next part names the
marker itself: mark("name"). The marker name string, which may
contain the usual wildcard characters, is matched against the
names given to the marker macros when the kernel and/or module
was compiled. Optionally, you can specify format("format").
Specifying the marker format string allows differentiation
between two markers with the same name but different marker
format strings.
The handler associated with a marker-based probe may read the
optional parameters specified at the macro call site. These are
named $arg1 through $argNN, where NN is the number of parameters
supplied by the macro. Number and string parameters are passed
in a type-safe manner.
The marker format string associated with a marker is available in
$format, and the marker name string is available in $name.
HARDWARE BREAKPOINTS
This family of probes is used to set hardware watchpoints for a
given (global) kernel symbol. The probes take three components as
inputs:
1. The virtual address or name of the kernel symbol to be traced
is supplied as an argument to this class of probes. (Only data
segment variables can be probed; local variables of a function
cannot.)
2. The nature of the access to be probed: a .write probe is
triggered when a write happens at the specified address or symbol
name; a .rw probe is triggered when either a read or a write
happens.
3. .length (optional): the address interval to be probed may be
specified using the "length" construct. The user-specified
length is approximated to the closest address length that the
architecture can support. If the specified length exceeds the
limits imposed by the architecture, an error message is flagged
and probe registration fails. Wherever "length" is not
specified, the translator requests a hardware breakpoint probe of
length 1. Note that the "length" construct is not valid with
symbol names.
The following constructs are supported:
probe kernel.data(ADDRESS).write
probe kernel.data(ADDRESS).rw
probe kernel.data(ADDRESS).length(LEN).write
probe kernel.data(ADDRESS).length(LEN).rw
probe kernel.data("SYMBOL_NAME").write
probe kernel.data("SYMBOL_NAME").rw
This set of probes makes use of the debug registers of the
processor, which are a scarce resource (4 on x86, 1 on powerpc).
The script translation flags a warning if a user requests more
hardware breakpoint probes than the limits set by the
architecture. For example, a pass-2 warning is issued when an
input script requests 5 hardware breakpoint probes on an x86
system, since the x86 architecture supports a maximum of 4
breakpoints. Users are cautioned to set probes judiciously.
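The sketch below shows what a symbol-based watchpoint might look
like. It assumes that the running kernel exports a global data
symbol named pid_max; that name is only an illustration, may not
exist on every kernel version, and should be replaced by a symbol
known to be present.

```systemtap
# Hypothetical symbol: pid_max must be a global data symbol on this kernel.
probe kernel.data("pid_max").write {
  printf("%s (pid %d) wrote pid_max\n", execname(), pid())
}
```

This single probe consumes one debug register, so at most a few
such probes can be active at a time.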
PERF
This family of probe points interfaces to the kernel "perf event"
infrastructure for controlling hardware performance counters.
The events being attached to are described by the "type" and
"config" fields of the perf_event_attr structure, and are sampled
at an interval governed by the "sample_period" and "sample_freq"
fields.
These fields are made available to systemtap scripts using the
following syntax:
probe perf.type(NN).config(MM).sample(XX)
probe perf.type(NN).config(MM).hz(XX)
probe perf.type(NN).config(MM)
probe perf.type(NN).config(MM).process("PROC")
probe perf.type(NN).config(MM).counter("COUNTER")
probe perf.type(NN).config(MM).process("PROC").counter("NAME")
The systemtap probe handler is called once per XX increments of
the underlying performance counter when using the .sample field
or at a frequency in hertz when using the .hz field. When not
specified, the default behavior is to sample at a count of
1000000. The range of valid type/config is described by the
perf_event_open(2) system call, and/or the linux/perf_event.h
file. Invalid combinations or exhausted hardware counter
resources result in errors during systemtap script startup.
Systemtap does not sanity-check the values: it merely passes them
through to the kernel for error- and safety-checking. By default
the perf event probe is systemwide unless .process is specified,
which will bind the probe to a specific task. If the name is
omitted then it is inferred from the stap -c argument. A perf
event can be read on demand using .counter. The body of the perf
probe handler will not be invoked for a .counter probe; instead,
the counter is read in a user space probe via:
probe process("PROC").statement("func@file") { stat <<< @perf("NAME") }
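To illustrate the .sample form described above, here is a minimal
sketch that samples CPU cycles systemwide and reports which
programs were running most often. It relies on type 0 being
PERF_TYPE_HARDWARE and config 0 being PERF_COUNT_HW_CPU_CYCLES in
linux/perf_event.h; the five-second run time and the output
format are assumptions made for the example.

```systemtap
global samples   # hypothetical per-program sample counts

# type 0 = PERF_TYPE_HARDWARE, config 0 = PERF_COUNT_HW_CPU_CYCLES
probe perf.type(0).config(0).sample(1000000) {
  samples[execname()]++
}

# After five seconds, print the ten most frequently sampled programs.
probe timer.s(5) {
  foreach (name in samples- limit 10)
    printf("%-16s %d\n", name, samples[name])
  exit()
}
```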
PYTHON
Support for probing python 2 and python 3 functions is available
with the help of an extra python support module. Note that the
debuginfo for the version of python being probed is required. To
run a python script with the extra python support module, add
the '-m HelperSDT' option to your python command, like this:
stap foo.stp -c "python -m HelperSDT foo.py"
Python probes look like the following:
python2.module("MPATTERN").function("PATTERN")
python2.module("MPATTERN").function("PATTERN").call
python2.module("MPATTERN").function("PATTERN").return
python3.module("MPATTERN").function("PATTERN")
python3.module("MPATTERN").function("PATTERN").call
python3.module("MPATTERN").function("PATTERN").return
The list above includes multiple variants and modifiers which
provide additional functionality or filters. They are:
.function
Places a probe at the beginning of the named
function by default, unless modified by PATTERN.
Parameters are available as context variables.
.call
Places a probe at the beginning of the named
function. Parameters are available as context
variables.
.return
Places a probe at the moment before the return from
the named function. Parameters and local/global
python variables are available as context
variables.
PATTERN stands for a string literal that aims to identify a point
in the python program. It is made up of three parts:
• The first part is the name of a function (e.g. "foo") or
class method (e.g. "bar.baz"). This part may use the "*" and
"?" wildcarding operators to match multiple names.
• The second part is optional and begins with the "@"
character. It is followed by the path to the source file
containing the function, which may include a wildcard
pattern. The python path is searched for a matching filename.
• Finally, the third part is optional if the file name part was
given, and identifies the line number in the source file
preceded by a ":" or a "+". The line number is assumed to be
an absolute line number if preceded by a ":", or relative to
the declaration line of the function if preceded by a "+".
All the lines in the function can be matched with ":*". A
range of lines x through y can be matched with ":x-y". Ranges
and specific lines can be mixed using commas, e.g. ":x,y-z".
In the above list of probe points, MPATTERN stands for a python
module or script name that names the python module of interest.
This part may use the "*" and "?" wildcarding operators to match
multiple names. The python path is searched for a matching
filename.
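The pattern forms above can be sketched as follows. The module
name foo, the function bar, the file foo.py and the line numbers
are all hypothetical placeholders, and the handlers only print
fixed text.

```systemtap
# Entry to and return from the hypothetical function bar in module foo
probe python3.module("foo").function("bar").call   { println("enter bar") }
probe python3.module("foo").function("bar").return { println("leave bar") }

# Absolute lines 10 through 12 of foo.py, within bar
probe python3.module("foo").function("bar@foo.py:10-12") { println("in bar") }
```

Such a script would be run against a python program started with
the '-m HelperSDT' option, as shown at the start of this section.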