These options apply to all tools, as they affect certain obscure
workings of the Valgrind core. Most people won't need to use
them.
--smc-check=<none|stack|all|all-non-file> [default: all-non-file
for x86/amd64/s390x, stack for other archs]
This option controls Valgrind's detection of self-modifying
code. If no checking is done, when a program executes some
code, then overwrites it with new code, and executes the new
code, Valgrind will continue to execute the translations it
made for the old code. This will likely lead to incorrect
behaviour and/or crashes.
For "modern" architectures -- anything that's not x86, amd64
or s390x -- the default is stack. This is because a correct
program must take explicit action to reestablish D-I cache
coherence following code modification. Valgrind observes and
honours such actions, with the result that self-modifying
code is transparently handled with zero extra cost.
For x86, amd64 and s390x, the program is not required to
notify the hardware of required D-I coherence syncing. Hence
the default is all-non-file, which covers the normal case of
generating code into an anonymous (non-file-backed) mmap'd
area.
The meanings of the four available settings are as follows: none (no detection), stack (detect self-modifying code on the stack, which is used by GCC to implement nested functions), all (detect self-modifying code everywhere), and all-non-file (detect self-modifying code everywhere except in file-backed mappings).
Running with all will slow Valgrind down noticeably. Running
with none will rarely speed things up, since very little code
gets dynamically generated in most programs. The
VALGRIND_DISCARD_TRANSLATIONS client request is an alternative to
--smc-check=all and --smc-check=all-non-file that requires more
programmer effort but allows Valgrind to run your program faster, by
telling it precisely when translations need to be re-made.
--smc-check=all-non-file provides a cheaper but more limited version
of --smc-check=all. It adds checks to any translations that do not
originate from file-backed memory
mappings. Typical applications that generate code, for
example JITs in web browsers, generate code into anonymous
mmap'd areas, whereas the "fixed" code of the browser always
lives in file-backed mappings. --smc-check=all-non-file
takes advantage of this observation, limiting the overhead of
checking to code which is likely to be JIT generated.
--read-inline-info=<yes|no> [default: see below]
When enabled, Valgrind will read information about inlined
function calls from DWARF3 debug info. This slows Valgrind
startup and makes it use more memory (typically for each
inlined piece of code, 6 words and space for the function
name), but it results in more descriptive stacktraces.
Currently, this functionality is enabled by default only for
Linux, Android and Solaris targets and only for the tools
Memcheck, Massif, Helgrind and DRD. Here is an example of
some stacktraces with --read-inline-info=no:
==15380== Conditional jump or move depends on uninitialised value(s)
==15380== at 0x80484EA: main (inlinfo.c:6)
==15380==
==15380== Conditional jump or move depends on uninitialised value(s)
==15380== at 0x8048550: fun_noninline (inlinfo.c:6)
==15380== by 0x804850E: main (inlinfo.c:34)
==15380==
==15380== Conditional jump or move depends on uninitialised value(s)
==15380== at 0x8048520: main (inlinfo.c:6)
And here are the same errors with --read-inline-info=yes:
==15377== Conditional jump or move depends on uninitialised value(s)
==15377== at 0x80484EA: fun_d (inlinfo.c:6)
==15377== by 0x80484EA: fun_c (inlinfo.c:14)
==15377== by 0x80484EA: fun_b (inlinfo.c:20)
==15377== by 0x80484EA: fun_a (inlinfo.c:26)
==15377== by 0x80484EA: main (inlinfo.c:33)
==15377==
==15377== Conditional jump or move depends on uninitialised value(s)
==15377== at 0x8048550: fun_d (inlinfo.c:6)
==15377== by 0x8048550: fun_noninline (inlinfo.c:41)
==15377== by 0x804850E: main (inlinfo.c:34)
==15377==
==15377== Conditional jump or move depends on uninitialised value(s)
==15377== at 0x8048520: fun_d (inlinfo.c:6)
==15377== by 0x8048520: main (inlinfo.c:35)
--read-var-info=<yes|no> [default: no]
When enabled, Valgrind will read information about variable
types and locations from DWARF3 debug info. This slows
Valgrind startup significantly and makes it use significantly
more memory, but for the tools that can take advantage of it
(Memcheck, Helgrind, DRD) it can result in more precise error
messages. For example, here are some standard errors issued
by Memcheck:
==15363== Uninitialised byte(s) found during client check request
==15363== at 0x80484A9: croak (varinfo1.c:28)
==15363== by 0x8048544: main (varinfo1.c:55)
==15363== Address 0x80497f7 is 7 bytes inside data symbol "global_i2"
==15363==
==15363== Uninitialised byte(s) found during client check request
==15363== at 0x80484A9: croak (varinfo1.c:28)
==15363== by 0x8048550: main (varinfo1.c:56)
==15363== Address 0xbea0d0cc is on thread 1's stack
==15363== in frame #1, created by main (varinfo1.c:45)
And here are the same errors with --read-var-info=yes:
==15370== Uninitialised byte(s) found during client check request
==15370== at 0x80484A9: croak (varinfo1.c:28)
==15370== by 0x8048544: main (varinfo1.c:55)
==15370== Location 0x80497f7 is 0 bytes inside global_i2[7],
==15370== a global variable declared at varinfo1.c:41
==15370==
==15370== Uninitialised byte(s) found during client check request
==15370== at 0x80484A9: croak (varinfo1.c:28)
==15370== by 0x8048550: main (varinfo1.c:56)
==15370== Location 0xbeb4a0cc is 0 bytes inside local var "local"
==15370== declared at varinfo1.c:46, in frame #1 of thread 1
--vgdb-poll=<number> [default: 5000]
As part of its main loop, the Valgrind scheduler will poll to
check if some activity (such as an external command or some
input from a gdb) has to be handled by gdbserver. This
activity poll will be done after having run the given number
of basic blocks (or slightly more than the given number of
basic blocks). This poll is quite cheap so the default value
is set relatively low. You might further decrease this value if vgdb
cannot use the ptrace system call to interrupt Valgrind when all
threads are (most of the time) blocked in a system call.
--vgdb-shadow-registers=no|yes [default: no]
When activated, gdbserver will expose the Valgrind shadow
registers to GDB. With this, the value of the Valgrind shadow
registers can be examined or changed using GDB. Exposing
shadow registers only works with GDB version 7.1 or later.
--vgdb-prefix=<prefix> [default: /tmp/vgdb-pipe]
To communicate with gdb/vgdb, the Valgrind gdbserver creates 3 files
(2 named FIFOs and an mmap'd shared memory file). The prefix option
controls the directory and prefix for the creation of these files.
--run-libc-freeres=<yes|no> [default: yes]
This option is only relevant when running Valgrind on Linux.
The GNU C library (libc.so), which is used by all programs,
may allocate memory for its own uses. Usually it doesn't
bother to free that memory when the program ends—there would
be no point, since the Linux kernel reclaims all process
resources when a process exits anyway, so it would just slow
things down.
The glibc authors realised that this behaviour causes leak
checkers, such as Valgrind, to falsely report leaks in glibc,
when a leak check is done at exit. In order to avoid this, they
provided a routine called __libc_freeres specifically to make glibc
release all memory it has allocated. Memcheck therefore tries to run
__libc_freeres at exit.
Unfortunately, in some very old versions of glibc, __libc_freeres is
sufficiently buggy to cause segmentation faults. This was
particularly noticeable on Red Hat 7.1. So this option is provided in
order to inhibit the run of __libc_freeres. If your program seems to
run fine on Valgrind, but segfaults at exit, you may find that
--run-libc-freeres=no fixes that, although at the cost of possibly
falsely reporting space leaks in libc.so.
--run-cxx-freeres=<yes|no> [default: yes]
This option is only relevant when running Valgrind on Linux
or Solaris C++ programs.
The GNU Standard C++ library (libstdc++.so), which is used by
all C++ programs compiled with g++, may allocate memory for
its own uses. Usually it doesn't bother to free that memory
when the program ends—there would be no point, since the
kernel reclaims all process resources when a process exits
anyway, so it would just slow things down.
The gcc authors realised that this behaviour causes leak
checkers, such as Valgrind, to falsely report leaks in
libstdc++, when a leak check is done at exit. In order to avoid this,
they provided a routine called __gnu_cxx::__freeres specifically to
make libstdc++ release all memory it has allocated. Memcheck
therefore tries to run __gnu_cxx::__freeres at exit.
For the sake of flexibility, and to work around unforeseen problems
with __gnu_cxx::__freeres, the option --run-cxx-freeres=no exists,
although at the cost of possibly falsely reporting space leaks in
libstdc++.so.
--sim-hints=hint1,hint2,...
Pass miscellaneous hints to Valgrind which slightly modify
the simulated behaviour in nonstandard or dangerous ways,
possibly to help the simulation of strange features. By
default no hints are enabled. Use with caution! Currently
known hints are:
• lax-ioctls:
Be very lax about ioctl handling; the only
assumption is that the size is correct. Doesn't require
the full buffer to be initialised when writing. Without
this, using some device drivers with a large number of
strange ioctl commands becomes very tiresome.
• fuse-compatible:
Enable special handling for certain
system calls that may block in a FUSE file-system. This
may be necessary when running Valgrind on a
multi-threaded program that uses one thread to manage a
FUSE file-system and another thread to access that
file-system.
• enable-outer:
Enable some special magic needed when the
program being run is itself Valgrind.
• no-inner-prefix:
Disable printing a prefix > in front of each stdout or stderr
output line in an inner Valgrind being run by an outer Valgrind.
This is useful when running Valgrind regression tests in an
outer/inner setup. Note that the prefix > will always be printed
in front of the inner debug logging lines.
• no-nptl-pthread-stackcache:
This hint is only relevant
when running Valgrind on Linux; it is ignored on Solaris
and Mac OS X.
The GNU glibc pthread library (libpthread.so), which is
used by pthread programs, maintains a cache of pthread
stacks. When a pthread terminates, the memory used for the pthread
stack and some thread local storage related data structures is not
always directly released. This memory is kept in a cache (up to a
certain size), and is re-used if a new thread is started.
This cache causes the Helgrind tool to report some false positive
race condition errors on this cached memory, as Helgrind does not
understand the internal glibc cache synchronisation primitives. So,
when using Helgrind, disabling the cache helps to avoid false
positive race conditions, in particular when using thread local
storage variables (e.g. variables using the __thread qualifier).
When using the Memcheck tool, disabling the cache ensures the memory
used by glibc to handle __thread variables is directly released when
a thread terminates.
Note: Valgrind disables the cache using some internal
knowledge of the glibc stack cache implementation and by
examining the debug information of the pthread library.
This technique is thus somewhat fragile and might not
work for all glibc versions. This has been successfully
tested with various glibc versions (e.g. 2.11, 2.16,
2.18) on various platforms.
• lax-doors:
(Solaris only) Be very lax about door syscall
handling over unrecognised door file descriptors. Does not require
that the full buffer is initialised when writing. Without this,
programs using libdoor(3LIB) functionality with completely
proprietary semantics may report a large number of false positives.
• fallback-llsc:
(MIPS and ARM64 only): Enables an
alternative implementation of Load-Linked (LL) and
Store-Conditional (SC) instructions. The standard
implementation gives more correct behaviour, but can
cause indefinite looping on certain processor
implementations that are intolerant of extra memory
references between LL and SC. So far this is known only
to happen on Cavium 3 cores. You should not need to use
this flag, since the relevant cores are detected at
startup and the alternative implementation is
automatically enabled if necessary. There is no
equivalent anti-flag: you cannot force-disable the
alternative implementation, if it is automatically
enabled. The underlying problem exists because the
"standard" implementation of LL and SC is done by copying
through LL and SC instructions into the instrumented
code. However, tools may insert extra instrumentation
memory references in between the LL and SC instructions.
These memory references are not present in the original
uninstrumented code, and their presence in the
instrumented code can cause the SC instructions to
persistently fail, leading to indefinite looping in LL-SC
blocks. The alternative implementation gives correct
behaviour of LL and SC instructions between threads in a
process, up to and including the ABA scenario. It also
gives correct behaviour between a Valgrinded thread and a
non-Valgrinded thread running in a different process,
that communicate via shared memory, but only up to and
including correct CAS behaviour -- in this case the ABA
scenario may not be correctly handled.
--fair-sched=<no|yes|try> [default: no]
The --fair-sched option controls the locking mechanism used
by Valgrind to serialise thread execution. The locking
mechanism controls the way the threads are scheduled, and
different settings give different trade-offs between fairness
and performance. For more details about the Valgrind thread
serialisation scheme and its impact on performance and thread
scheduling, see Scheduling and Multi-Thread Performance.
• The value --fair-sched=yes activates a fair scheduler. In
short, if multiple threads are ready to run, the threads will be
scheduled in a round robin fashion. This mechanism is not
available on all platforms or Linux versions. If not available,
using --fair-sched=yes will cause Valgrind to terminate with an
error.
You may find this setting improves overall responsiveness
if you are running an interactive multithreaded program,
for example a web browser, on Valgrind.
• The value --fair-sched=try activates fair scheduling if
available on the platform. Otherwise, it will automatically fall
back to --fair-sched=no.
• The value --fair-sched=no activates a scheduler which
does not guarantee fairness between threads ready to run,
but which in general gives the highest performance.
--kernel-variant=variant1,variant2,...
Handle system calls and ioctls arising from minor variants of
the default kernel for this platform. This is useful for
running on hacked kernels or with kernel modules which
support nonstandard ioctls, for example. Use with caution. If
you don't understand what this option does then you almost
certainly don't need it. Currently known variants are:
• bproc: support the sys_bproc system call on x86. This is for
running on BProc, which is a minor variant of standard Linux
which is sometimes used for building clusters.
• android-no-hw-tls: some versions of the Android emulator for
ARM do not provide a hardware TLS (thread-local storage)
register, and Valgrind crashes at startup. Use this variant to
select software support for TLS.
• android-gpu-sgx5xx: use this to support handling of proprietary
ioctls for the PowerVR SGX 5XX series of GPUs on Android
devices. Failure to select this does not cause stability
problems, but may cause Memcheck to report false errors after
the program performs GPU-specific ioctls.
• android-gpu-adreno3xx: similarly, use this to support handling
of proprietary ioctls for the Qualcomm Adreno 3XX series of GPUs
on Android devices.
--merge-recursive-frames=<number> [default: 0]
Some recursive algorithms, for example balanced binary tree
implementations, create many different stack traces, each
containing cycles of calls. A cycle is defined as two
identical program counter values separated by zero or more
other program counter values. Valgrind may then use a lot of
memory to store all these stack traces. This is a poor use of
memory considering that such stack traces contain repeated
uninteresting recursive calls instead of more interesting
information such as the function that has initiated the
recursive call.
The option --merge-recursive-frames=<number> instructs Valgrind to
detect and merge recursive call cycles having a size of up to
<number> frames. When such a cycle is detected, Valgrind records the
cycle in the stack trace as a unique program counter.
The value 0 (the default) causes no recursive call merging. A
value of 1 will cause stack traces of simple recursive
algorithms (for example, a factorial implementation) to be
collapsed. A value of 2 will usually be needed to collapse
stack traces produced by recursive algorithms such as binary
trees, quick sort, etc. Higher values might be needed for
more complex recursive algorithms.
Note: recursive calls are detected by analysis of program
counter values. They are not detected by looking at function
names.
--num-transtab-sectors=<number> [default: 6 for Android
platforms, 16 for all others]
Valgrind translates and instruments your program's machine
code in small fragments (basic blocks). The translations are
stored in a translation cache that is divided into a number
of sections (sectors). If the cache is full, the sector
containing the oldest translations is emptied and reused. If
these old translations are needed again, Valgrind must
re-translate and re-instrument the corresponding machine
code, which is expensive. If the "executed instructions"
working set of a program is big, increasing the number of
sectors may improve performance by reducing the number of
re-translations needed. Sectors are allocated on demand. Once
allocated, a sector can never be freed, and occupies
considerable space, depending on the tool and the value of
--avg-transtab-entry-size (about 40 MB per sector for Memcheck). Use
the option --stats=yes to obtain precise information about the memory
used by a sector and the allocation and recycling of sectors.
--avg-transtab-entry-size=<number> [default: 0, meaning use tool
provided default]
Average size of translated basic block. This average size is
used to dimension the size of a sector. Each tool provides a
default value to be used. If this default value is too small,
the translation sectors will become full too quickly. If this
default value is too big, a significant part of the
translation sector memory will be unused. Note that the
average size of a basic block translation depends on the
tool, and might depend on tool options. For example, the Memcheck
option --track-origins=yes increases the size of the basic block
translations. Use --avg-transtab-entry-size to tune the size of the
sectors, either to gain memory or to avoid too many retranslations.
--aspace-minaddr=<address> [default: depends on the platform]
To avoid potential conflicts with some system libraries, Valgrind
does not use the address space below the --aspace-minaddr value,
keeping it reserved in case a library specifically requests memory in
this region. So, some "pessimistic" value is guessed by Valgrind
depending on the platform. On Linux, by default, Valgrind avoids
using the first 64MB even if typically there is no conflict in this
complete zone. You can use the option --aspace-minaddr to let your
memory-hungry application benefit from more of this lower memory. On
the other hand, if you encounter a conflict, increasing the
aspace-minaddr value might solve it. Conflicts will typically
manifest themselves as mmap failures in the low range of the address
space. The provided address must be page aligned and must be equal to
or bigger than 0x1000 (4KB). To find the default value on your
platform, do something such as valgrind -d -d date 2>&1 | grep -i
minaddr. Values lower than 0x10000 (64KB) are known to create
problems on some distributions.
--valgrind-stacksize=<number> [default: 1MB]
For each thread, Valgrind needs its own 'private' stack. The default
size for these stacks is generously dimensioned, and so should be
sufficient in most cases. In case the size is too small, Valgrind
will segfault. Before segfaulting, a warning might be produced by
Valgrind when approaching the limit.
Use the option --valgrind-stacksize if such an (unlikely) warning is
produced, or Valgrind dies due to a segmentation violation. Such
segmentation violations have been seen when demangling huge C++
symbols.
If your application uses many threads and needs a lot of memory, you
can gain some memory by reducing the size of these Valgrind stacks
using the option --valgrind-stacksize.
--show-emwarns=<yes|no> [default: no]
When enabled, Valgrind will emit warnings about its CPU
emulation in certain cases. These are usually not
interesting.
--require-text-symbol=:sonamepatt:fnnamepatt
When a shared object whose soname matches sonamepatt is
loaded into the process, examine all the text symbols it
exports. If none of those match fnnamepatt, print an error
message and abandon the run. This makes it possible to ensure
that the run does not continue unless a given shared object
contains a particular function name.
Both sonamepatt and fnnamepatt can be written using the usual
? and * wildcards. For example: ":*libc.so*:foo?bar". You
may use characters other than a colon to separate the two
patterns. It is only important that the first character and
the separator character are the same. For example, the above
example could also be written "Q*libc.so*Qfoo?bar". Multiple
--require-text-symbol flags are allowed, in which case
shared objects that are loaded into the process will be
checked against all of them.
The purpose of this is to support reliable usage of marked-up
libraries. For example, suppose we have a version of GCC's
libgomp.so which has been marked up with annotations to
support Helgrind. It is only too easy and confusing to load
the wrong, un-annotated libgomp.so into the application. So
the idea is: add a text symbol in the marked-up library, for
example annotated_for_helgrind_3_6, and then give the flag
--require-text-symbol=:*libgomp*so*:annotated_for_helgrind_3_6
so that when libgomp.so is loaded, Valgrind scans its symbol
table, and if the symbol isn't present the run is aborted,
rather than continuing silently with the un-marked-up
library. Note that you should put the entire flag in quotes to stop
shells expanding the * and ? wildcards.
--soname-synonyms=syn1=pattern1,syn2=pattern2,...
When a shared library is loaded, Valgrind checks for
functions in the library that must be replaced or wrapped.
For example, Memcheck replaces some string and memory
functions (strchr, strlen, strcpy, memchr, memcpy, memmove,
etc.) with its own versions. Such replacements are normally
done only in shared libraries whose soname matches a
predefined soname pattern (e.g. libc.so* on linux). By
default, no replacement is done for a statically linked
binary or for alternative libraries, except for the
allocation functions (malloc, free, calloc, memalign,
realloc, operator new, operator delete, etc.) Such allocation
functions are intercepted by default in any shared library or
in the executable if they are exported as global symbols.
This means that if a replacement allocation library such as
tcmalloc is found, its functions are also intercepted by
default. In some cases, the replacements allow --soname-synonyms to
specify one additional synonym pattern, giving flexibility in the
replacement, or to prevent interception of all public allocation
symbols.
Currently, this flexibility is only allowed for the malloc
related functions, using the synonym somalloc. This synonym
is usable for all tools doing standard replacement of malloc
related functions (e.g. memcheck, helgrind, drd, massif,
dhat).
• Alternate malloc library: to replace the malloc related
functions in a specific alternate library with soname
mymalloclib.so (and not in any others), give the option
--soname-synonyms=somalloc=mymalloclib.so. A pattern can be used
to match multiple library sonames. For example,
--soname-synonyms=somalloc=*tcmalloc* will match the soname of
all variants of the tcmalloc library (native, debug, profiled,
... tcmalloc variants).
Note: the soname of an ELF shared library can be retrieved using
the readelf utility.
• Replacements in a statically linked library are done by
using the NONE pattern. For example, if you link with
libtcmalloc.a, and only want to intercept the malloc
related functions in the executable (and standard
libraries) themselves, but not any other shared
libraries, you can give the option
--soname-synonyms=somalloc=NONE. Note that a NONE pattern will
match the main executable and any shared library having no
soname.
• To run a "default" Firefox build for Linux, in which
JEMalloc is linked in to the main executable, use
--soname-synonyms=somalloc=NONE.
• To only intercept allocation symbols in the default system
libraries, but not in any other shared library or the executable
defining public malloc or operator new related functions, use a
non-existing library name like
--soname-synonyms=somalloc=nouserintercepts (where
nouserintercepts can be any non-existing library name).
• The shared library of the dynamic (runtime) linker is excluded
from the search for global public symbols, such as those for the
malloc related functions (identified by the somalloc synonym).
--progress-interval=<number> [default: 0, meaning 'disabled']
This is an enhancement to Valgrind's debugging output. It is
unlikely to be of interest to end users.
When number is set to a non-zero value, Valgrind will print a
one-line progress summary every number seconds. Valid
settings for number are between 0 and 3600 inclusive. Here's
some example output with number set to 10:
PROGRESS: U 110s, W 113s, 97.3% CPU, EvC 414.79M, TIn 616.7k, TOut 0.5k, #thr 67
PROGRESS: U 120s, W 124s, 96.8% CPU, EvC 505.27M, TIn 636.6k, TOut 3.0k, #thr 64
PROGRESS: U 130s, W 134s, 97.0% CPU, EvC 574.90M, TIn 657.5k, TOut 3.0k, #thr 63
Each line shows:
• U: total user time
• W: total wallclock time
• CPU: overall average cpu use
• EvC: number of event checks. An event check is a
backwards branch in the simulated program, so this is a
measure of forward progress of the program
• TIn: number of code blocks instrumented by the JIT
• TOut: number of instrumented code blocks that have been
thrown away
• #thr: number of threads in the program
From the progress of these, it is possible to observe:
• when the program is compute bound (TIn rises slowly, EvC
rises rapidly)
• when the program is in a spinloop (TIn/TOut fixed, EvC
rises rapidly)
• when the program is JIT-bound (TIn rises rapidly)
• when the program is rapidly discarding code (TOut rises
rapidly)
• when the program is about to achieve some expected state
(EvC arrives at some value you expect)
• when the program is idling (U rises more slowly than W)