In the simplest case strace
runs the specified command until it
exits. It intercepts and records the system calls which are
called by a process and the signals which are received by a
process. The name of each system call, its arguments and its
return value are printed on standard error or to the file
specified with the -o
option.
strace
is a useful diagnostic, instructional, and debugging tool.
System administrators, diagnosticians and trouble-shooters will
find it invaluable for solving problems with programs for which
the source is not readily available since they do not need to be
recompiled in order to trace them. Students, hackers and the
overly-curious will find that a great deal can be learned about a
system and its system calls by tracing even ordinary programs.
And programmers will find that since system calls and signals are
events that happen at the user/kernel interface, a close
examination of this boundary is very useful for bug isolation,
sanity checking and attempting to capture race conditions.
Each line in the trace contains the system call name, followed by
its arguments in parentheses and its return value. An example
from stracing the command "cat /dev/null" is:
open("/dev/null", O_RDONLY) = 3
Errors (typically a return value of -1) have the errno symbol and
error string appended.
open("/foo/bar", O_RDONLY) = -1 ENOENT (No such file or directory)
Signals are printed as signal symbol and decoded siginfo
structure. An excerpt from stracing and interrupting the command
"sleep 666" is:
sigsuspend([] <unfinished ...>
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=...} ---
+++ killed by SIGINT +++
If a system call is being executed and meanwhile another one is
being called from a different thread/process then strace
will try
to preserve the order of those events and mark the ongoing call
as being unfinished. When the call returns it will be marked as
resumed.
[pid 28772] select(4, [3], NULL, NULL, NULL <unfinished ...>
[pid 28779] clock_gettime(CLOCK_REALTIME, {1130322148, 939977000}) = 0
[pid 28772] <... select resumed> ) = 1 (in [3])
Interruption of a (restartable) system call by a signal delivery
is processed differently as kernel terminates the system call and
also arranges its immediate reexecution after the signal handler
completes.
read(0, 0x7ffff72cf5cf, 1) = ? ERESTARTSYS (To be restarted)
--- SIGALRM ... ---
rt_sigreturn(0xe) = 0
read(0, "", 1) = 0
Arguments are printed in symbolic form with passion. This
example shows the shell performing ">>xyzzy" output redirection:
open("xyzzy", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
Here, the second and the third argument of open(2) are decoded by
breaking down the flag argument into its three bitwise-OR
constituents and printing the mode value in octal by tradition.
Where the traditional or native usage differs from ANSI or POSIX,
the latter forms are preferred. In some cases, strace
output is
proven to be more readable than the source.
Structure pointers are dereferenced and the members are displayed
as appropriate. In most cases, arguments are formatted in the
most C-like fashion possible. For example, the essence of the
command "ls -l /dev/null" is captured as:
lstat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}) = 0
Notice how the 'struct stat' argument is dereferenced and how
each member is displayed symbolically. In particular, observe
how the st_mode
member is carefully decoded into a bitwise-OR of
symbolic and numeric values. Also notice in this example that
the first argument to lstat(2) is an input to the system call and
the second argument is an output. Since output arguments are not
modified if the system call fails, arguments may not always be
dereferenced. For example, retrying the "ls -l" example with a
non-existent file produces the following line:
lstat("/foo/bar", 0xb004) = -1 ENOENT (No such file or directory)
In this case the porch light is on but nobody is home.
Syscalls unknown to strace
are printed raw, with the unknown
system call number printed in hexadecimal form and prefixed with
"syscall_":
syscall_0xbad(0x1, 0x2, 0x3, 0x4, 0x5, 0x6) = -1 ENOSYS (Function not implemented)
Character pointers are dereferenced and printed as C strings.
Non-printing characters in strings are normally represented by
ordinary C escape codes. Only the first strsize (32 by default)
bytes of strings are printed; longer strings have an ellipsis
appended following the closing quote. Here is a line from "ls
-l" where the getpwuid(3) library routine is reading the password
file:
read(3, "root::0:0:System Administrator:/"..., 1024) = 422
While structures are annotated using curly braces, simple
pointers and arrays are printed using square brackets with commas
separating elements. Here is an example from the command id(1)
on a system with supplementary group ids:
getgroups(32, [100, 0]) = 2
On the other hand, bit-sets are also shown using square brackets,
but set elements are separated only by a space. Here is the
shell, preparing to execute an external command:
sigprocmask(SIG_BLOCK, [CHLD TTOU], []) = 0
Here, the second argument is a bit-set of two signals, SIGCHLD
and SIGTTOU
. In some cases, the bit-set is so full that printing
out the unset elements is more valuable. In that case, the bit-
set is prefixed by a tilde like this:
sigprocmask(SIG_UNBLOCK, ~[], NULL) = 0
Here, the second argument represents the full set of all signals.