valgrind(1)

a suite of tools for debugging and profiling programs

UNCOMMON OPTIONS

These options apply to all tools, as they affect certain obscure workings of the Valgrind core. Most people won't need to use them.

--smc-check=<none|stack|all|all-non-file> [default: all-non-file for x86/amd64/s390x, stack for other archs] This option controls Valgrind's detection of self-modifying code. If no checking is done, when a program executes some code, then overwrites it with new code, and executes the new code, Valgrind will continue to execute the translations it made for the old code. This will likely lead to incorrect behaviour and/or crashes.

For "modern" architectures -- anything that's not x86, amd64 or s390x -- the default is stack. This is because a correct program must take explicit action to reestablish D-I cache coherence following code modification. Valgrind observes and honours such actions, with the result that self-modifying code is transparently handled with zero extra cost.

For x86, amd64 and s390x, the program is not required to notify the hardware of required D-I coherence syncing. Hence the default is all-non-file, which covers the normal case of generating code into an anonymous (non-file-backed) mmap'd area.

The four available settings are: none (no detection); stack (detect self-modifying code on the stack, which GCC uses to implement nested functions); all (detect self-modifying code everywhere); and all-non-file (detect self-modifying code everywhere except in file-backed mappings).

Running with all will slow Valgrind down noticeably. Running with none will rarely speed things up, since very little code gets dynamically generated in most programs. The VALGRIND_DISCARD_TRANSLATIONS client request is an alternative to --smc-check=all and --smc-check=all-non-file that requires more programmer effort but allows Valgrind to run your program faster, by telling it precisely when translations need to be re-made.

--smc-check=all-non-file provides a cheaper but more limited version of --smc-check=all. It adds checks to any translations that do not originate from file-backed memory mappings. Typical applications that generate code, for example JITs in web browsers, generate code into anonymous mmap'd areas, whereas the "fixed" code of the browser always lives in file-backed mappings. --smc-check=all-non-file takes advantage of this observation, limiting the overhead of checking to code which is likely to be JIT generated.

--read-inline-info=<yes|no> [default: see below] When enabled, Valgrind will read information about inlined function calls from DWARF3 debug info. This slows Valgrind startup and makes it use more memory (typically for each inlined piece of code, 6 words and space for the function name), but it results in more descriptive stacktraces. Currently, this functionality is enabled by default only for Linux, Android and Solaris targets and only for the tools Memcheck, Massif, Helgrind and DRD. Here is an example of some stacktraces with --read-inline-info=no:

    ==15380== Conditional jump or move depends on uninitialised value(s)
    ==15380==    at 0x80484EA: main (inlinfo.c:6)
    ==15380==
    ==15380== Conditional jump or move depends on uninitialised value(s)
    ==15380==    at 0x8048550: fun_noninline (inlinfo.c:6)
    ==15380==    by 0x804850E: main (inlinfo.c:34)
    ==15380==
    ==15380== Conditional jump or move depends on uninitialised value(s)
    ==15380==    at 0x8048520: main (inlinfo.c:6)

And here are the same errors with --read-inline-info=yes:

    ==15377== Conditional jump or move depends on uninitialised value(s)
    ==15377==    at 0x80484EA: fun_d (inlinfo.c:6)
    ==15377==    by 0x80484EA: fun_c (inlinfo.c:14)
    ==15377==    by 0x80484EA: fun_b (inlinfo.c:20)
    ==15377==    by 0x80484EA: fun_a (inlinfo.c:26)
    ==15377==    by 0x80484EA: main (inlinfo.c:33)
    ==15377==
    ==15377== Conditional jump or move depends on uninitialised value(s)
    ==15377==    at 0x8048550: fun_d (inlinfo.c:6)
    ==15377==    by 0x8048550: fun_noninline (inlinfo.c:41)
    ==15377==    by 0x804850E: main (inlinfo.c:34)
    ==15377==
    ==15377== Conditional jump or move depends on uninitialised value(s)
    ==15377==    at 0x8048520: fun_d (inlinfo.c:6)
    ==15377==    by 0x8048520: main (inlinfo.c:35)

--read-var-info=<yes|no> [default: no] When enabled, Valgrind will read information about variable types and locations from DWARF3 debug info. This slows Valgrind startup significantly and makes it use significantly more memory, but for the tools that can take advantage of it (Memcheck, Helgrind, DRD) it can result in more precise error messages. For example, here are some standard errors issued by Memcheck:

    ==15363== Uninitialised byte(s) found during client check request
    ==15363==    at 0x80484A9: croak (varinfo1.c:28)
    ==15363==    by 0x8048544: main (varinfo1.c:55)
    ==15363==  Address 0x80497f7 is 7 bytes inside data symbol "global_i2"
    ==15363==
    ==15363== Uninitialised byte(s) found during client check request
    ==15363==    at 0x80484A9: croak (varinfo1.c:28)
    ==15363==    by 0x8048550: main (varinfo1.c:56)
    ==15363==  Address 0xbea0d0cc is on thread 1's stack
    ==15363==  in frame #1, created by main (varinfo1.c:45)

And here are the same errors with --read-var-info=yes:

    ==15370== Uninitialised byte(s) found during client check request
    ==15370==    at 0x80484A9: croak (varinfo1.c:28)
    ==15370==    by 0x8048544: main (varinfo1.c:55)
    ==15370==  Location 0x80497f7 is 0 bytes inside global_i2[7],
    ==15370==  a global variable declared at varinfo1.c:41
    ==15370==
    ==15370== Uninitialised byte(s) found during client check request
    ==15370==    at 0x80484A9: croak (varinfo1.c:28)
    ==15370==    by 0x8048550: main (varinfo1.c:56)
    ==15370==  Location 0xbeb4a0cc is 0 bytes inside local var "local"
    ==15370==  declared at varinfo1.c:46, in frame #1 of thread 1

--vgdb-poll=<number> [default: 5000] As part of its main loop, the Valgrind scheduler polls to check whether some activity (such as an external command or some input from a gdb) has to be handled by gdbserver. This activity poll is done after running the given number of basic blocks (or slightly more). The poll is quite cheap, so the default value is set relatively low. You might decrease this value further if vgdb cannot use the ptrace system call to interrupt Valgrind when all threads are (most of the time) blocked in a system call.

--vgdb-shadow-registers=<no|yes> [default: no] When activated, gdbserver will expose the Valgrind shadow registers to GDB. With this, the value of the Valgrind shadow registers can be examined or changed using GDB. Exposing shadow registers only works with GDB version 7.1 or later.

--vgdb-prefix=<prefix> [default: /tmp/vgdb-pipe] To communicate with gdb/vgdb, the Valgrind gdbserver creates 3 files (2 named FIFOs and an mmap'd shared memory file). The prefix option controls the directory and prefix for the creation of these files.

--run-libc-freeres=<yes|no> [default: yes] This option is only relevant when running Valgrind on Linux.

The GNU C library (libc.so), which is used by all programs, may allocate memory for its own uses. Usually it doesn't bother to free that memory when the program ends—there would be no point, since the Linux kernel reclaims all process resources when a process exits anyway, so it would just slow things down.

The glibc authors realised that this behaviour causes leak checkers, such as Valgrind, to falsely report leaks in glibc, when a leak check is done at exit. In order to avoid this, they provided a routine called __libc_freeres specifically to make glibc release all memory it has allocated. Memcheck therefore tries to run __libc_freeres at exit.

Unfortunately, in some very old versions of glibc, __libc_freeres is sufficiently buggy to cause segmentation faults. This was particularly noticeable on Red Hat 7.1. So this option is provided in order to inhibit the run of __libc_freeres. If your program seems to run fine on Valgrind, but segfaults at exit, you may find that --run-libc-freeres=no fixes that, although at the cost of possibly falsely reporting space leaks in libc.so.

--run-cxx-freeres=<yes|no> [default: yes] This option is only relevant when running Valgrind on Linux or Solaris C++ programs.

The GNU Standard C++ library (libstdc++.so), which is used by all C++ programs compiled with g++, may allocate memory for its own uses. Usually it doesn't bother to free that memory when the program ends—there would be no point, since the kernel reclaims all process resources when a process exits anyway, so it would just slow things down.

The gcc authors realised that this behaviour causes leak checkers, such as Valgrind, to falsely report leaks in libstdc++, when a leak check is done at exit. In order to avoid this, they provided a routine called __gnu_cxx::__freeres specifically to make libstdc++ release all memory it has allocated. Memcheck therefore tries to run __gnu_cxx::__freeres at exit.

To allow for flexibility and for unforeseen problems with __gnu_cxx::__freeres, the option --run-cxx-freeres=no is provided, although at the cost of possibly falsely reporting space leaks in libstdc++.so.

--sim-hints=hint1,hint2,... Pass miscellaneous hints to Valgrind which slightly modify the simulated behaviour in nonstandard or dangerous ways, possibly to help the simulation of strange features. By default no hints are enabled. Use with caution! Currently known hints are:

lax-ioctls: Be very lax about ioctl handling; the only assumption is that the size is correct. Doesn't require the full buffer to be initialised when writing. Without this, using some device drivers with a large number of strange ioctl commands becomes very tiresome.

fuse-compatible: Enable special handling for certain system calls that may block in a FUSE file-system. This may be necessary when running Valgrind on a multi-threaded program that uses one thread to manage a FUSE file-system and another thread to access that file-system.

enable-outer: Enable some special magic needed when the program being run is itself Valgrind.

no-inner-prefix: Disable printing a prefix > in front of each stdout or stderr output line in an inner Valgrind being run by an outer Valgrind. This is useful when running Valgrind regression tests in an outer/inner setup. Note that the prefix > will always be printed in front of the inner debug logging lines.

no-nptl-pthread-stackcache: This hint is only relevant when running Valgrind on Linux; it is ignored on Solaris and Mac OS X.

The GNU glibc pthread library (libpthread.so), which is used by pthread programs, maintains a cache of pthread stacks. When a pthread terminates, the memory used for the pthread stack and some data structures related to thread-local storage are not always released directly. This memory is kept in a cache (up to a certain size), and is re-used if a new thread is started.

This cache causes the Helgrind tool to report some false positive race condition errors on this cached memory, as Helgrind does not understand the internal glibc cache synchronisation primitives. So, when using Helgrind, disabling the cache helps to avoid false positive race conditions, in particular when using thread-local storage variables (e.g. variables using the __thread qualifier).

When using the Memcheck tool, disabling the cache ensures that the memory used by glibc to handle __thread variables is released directly when a thread terminates.

Note: Valgrind disables the cache using some internal knowledge of the glibc stack cache implementation and by examining the debug information of the pthread library. This technique is thus somewhat fragile and might not work for all glibc versions. This has been successfully tested with various glibc versions (e.g. 2.11, 2.16, 2.18) on various platforms.

lax-doors: (Solaris only) Be very lax about door syscall handling over unrecognised door file descriptors. Does not require that the full buffer is initialised when writing. Without this, programs using libdoor(3LIB) functionality with completely proprietary semantics may report a large number of false positives.

fallback-llsc: (MIPS and ARM64 only) Enables an alternative implementation of Load-Linked (LL) and Store-Conditional (SC) instructions. The standard implementation gives more correct behaviour, but can cause indefinite looping on certain processor implementations that are intolerant of extra memory references between LL and SC. So far this is known to happen only on Cavium 3 cores. You should not need to use this flag, since the relevant cores are detected at startup and the alternative implementation is automatically enabled if necessary. There is no equivalent anti-flag: you cannot force-disable the alternative implementation if it is automatically enabled.

The underlying problem exists because the "standard" implementation of LL and SC is done by copying the LL and SC instructions through into the instrumented code. However, tools may insert extra instrumentation memory references in between the LL and SC instructions. These memory references are not present in the original uninstrumented code, and their presence in the instrumented code can cause the SC instructions to persistently fail, leading to indefinite looping in LL-SC blocks.

The alternative implementation gives correct behaviour of LL and SC instructions between threads in a process, up to and including the ABA scenario. It also gives correct behaviour between a Valgrinded thread and a non-Valgrinded thread running in a different process that communicate via shared memory, but only up to and including correct CAS behaviour -- in this case the ABA scenario may not be correctly handled.

--fair-sched=<no|yes|try> [default: no] The --fair-sched option controls the locking mechanism used by Valgrind to serialise thread execution. The locking mechanism controls the way the threads are scheduled, and different settings give different trade-offs between fairness and performance. For more details about the Valgrind thread serialisation scheme and its impact on performance and thread scheduling, see Scheduling and Multi-Thread Performance.

• The value --fair-sched=yes activates a fair scheduler. In short, if multiple threads are ready to run, the threads will be scheduled in a round robin fashion. This mechanism is not available on all platforms or Linux versions. If not available, using --fair-sched=yes will cause Valgrind to terminate with an error.

You may find this setting improves overall responsiveness if you are running an interactive multithreaded program, for example a web browser, on Valgrind.

• The value --fair-sched=try activates fair scheduling if available on the platform. Otherwise, it will automatically fall back to --fair-sched=no.

• The value --fair-sched=no activates a scheduler which does not guarantee fairness between threads ready to run, but which in general gives the highest performance.

--kernel-variant=variant1,variant2,... Handle system calls and ioctls arising from minor variants of the default kernel for this platform. This is useful for running on hacked kernels or with kernel modules which support nonstandard ioctls, for example. Use with caution. If you don't understand what this option does then you almost certainly don't need it. Currently known variants are:

bproc: support the sys_bproc system call on x86. This is for running on BProc, which is a minor variant of standard Linux that is sometimes used for building clusters.

android-no-hw-tls: some versions of the Android emulator for ARM do not provide a hardware TLS (thread-local state) register, and Valgrind crashes at startup. Use this variant to select software support for TLS.

android-gpu-sgx5xx: use this to support handling of proprietary ioctls for the PowerVR SGX 5XX series of GPUs on Android devices. Failure to select this does not cause stability problems, but may cause Memcheck to report false errors after the program performs GPU-specific ioctls.

android-gpu-adreno3xx: similarly, use this to support handling of proprietary ioctls for the Qualcomm Adreno 3XX series of GPUs on Android devices.

--merge-recursive-frames=<number> [default: 0] Some recursive algorithms, for example balanced binary tree implementations, create many different stack traces, each containing cycles of calls. A cycle is defined as two identical program counter values separated by zero or more other program counter values. Valgrind may then use a lot of memory to store all these stack traces. This is a poor use of memory considering that such stack traces contain repeated uninteresting recursive calls instead of more interesting information such as the function that has initiated the recursive call.

The option --merge-recursive-frames=<number> instructs Valgrind to detect and merge recursive call cycles having a size of up to <number> frames. When such a cycle is detected, Valgrind records the cycle in the stack trace as a unique program counter.

The value 0 (the default) causes no recursive call merging. A value of 1 will cause stack traces of simple recursive algorithms (for example, a factorial implementation) to be collapsed. A value of 2 will usually be needed to collapse stack traces produced by recursive algorithms such as binary trees, quick sort, etc. Higher values might be needed for more complex recursive algorithms.

Note: recursive calls are detected by analysis of program counter values. They are not detected by looking at function names.
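The merging described above can be illustrated with a small Python sketch. It is only a model of the idea, not Valgrind's implementation: the function name merge_recursive_frames and the representation of a stack trace as a list of program counter values are assumptions made for this example.

```python
def merge_recursive_frames(pcs, max_cycle):
    """Collapse call cycles of up to max_cycle frames in a list of PCs.

    Illustrative model only: each time a program counter value is
    appended, look back up to max_cycle entries; if the same value is
    found, drop the frames recorded since then, so the cycle is kept
    only once.
    """
    kept = []
    for pc in pcs:
        kept.append(pc)
        for dist in range(1, max_cycle + 1):
            if dist < len(kept) and kept[-1] == kept[-1 - dist]:
                del kept[-dist:]  # forget the repeated cycle
                break
    return kept


# Simple recursion (e.g. a factorial): one PC repeated many times.
simple = [0x100, 0x200, 0x200, 0x200, 0x200]
# Two-frame cycle (e.g. a tree walk alternating between two call sites).
mutual = [0x100, 0x200, 0x300, 0x200, 0x300, 0x200, 0x300]

print([hex(pc) for pc in merge_recursive_frames(simple, 1)])
print([hex(pc) for pc in merge_recursive_frames(mutual, 2)])
```

In this model a limit of 1 collapses the simple recursion but leaves the two-frame cycle untouched, which mirrors the advice above that a value of 2 is usually needed for algorithms such as binary trees.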

--num-transtab-sectors=<number> [default: 6 for Android platforms, 16 for all others] Valgrind translates and instruments your program's machine code in small fragments (basic blocks). The translations are stored in a translation cache that is divided into a number of sections (sectors). If the cache is full, the sector containing the oldest translations is emptied and reused. If these old translations are needed again, Valgrind must re-translate and re-instrument the corresponding machine code, which is expensive. If the "executed instructions" working set of a program is big, increasing the number of sectors may improve performance by reducing the number of re-translations needed. Sectors are allocated on demand. Once allocated, a sector can never be freed, and occupies considerable space, depending on the tool and the value of --avg-transtab-entry-size (about 40 MB per sector for Memcheck). Use the option --stats=yes to obtain precise information about the memory used by a sector and the allocation and recycling of sectors.
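To see why more sectors can help a program with a large working set, consider the following toy model in Python. It is a deliberately simplified sketch, not Valgrind's data structure: the class ToyTranslationCache, the fixed per-sector capacity and the use of integers as basic-block addresses are all assumptions made for illustration.

```python
from collections import deque


class ToyTranslationCache:
    """Toy model of a sectored translation cache (illustration only)."""

    def __init__(self, num_sectors, sector_capacity):
        self.num_sectors = num_sectors
        self.sector_capacity = sector_capacity
        self.sectors = deque()  # oldest sector on the left
        self.translations = 0   # how many (re-)translations were made

    def execute(self, block):
        """Run one basic block, translating it first if it is not cached."""
        if any(block in sector for sector in self.sectors):
            return
        self.translations += 1
        # Start a new sector when there is none or the newest one is full.
        if not self.sectors or len(self.sectors[-1]) >= self.sector_capacity:
            if len(self.sectors) == self.num_sectors:
                self.sectors.popleft()  # empty and reuse the oldest sector
            self.sectors.append(set())
        self.sectors[-1].add(block)


def count_translations(num_sectors, sector_capacity, trace):
    cache = ToyTranslationCache(num_sectors, sector_capacity)
    for block in trace:
        cache.execute(block)
    return cache.translations


# A loop over 30 distinct basic blocks, executed three times.
trace = list(range(30)) * 3
print(count_translations(2, 10, trace))  # working set does not fit
print(count_translations(4, 10, trace))  # working set fits
```

When the working set exceeds the total capacity, every pass over the loop evicts translations that are about to be needed again, so each block is re-translated on every pass; with enough sectors each block is translated only once.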

--avg-transtab-entry-size=<number> [default: 0, meaning use tool provided default] Average size of translated basic block. This average size is used to dimension the size of a sector. Each tool provides a default value to be used. If this default value is too small, the translation sectors will become full too quickly. If this default value is too big, a significant part of the translation sector memory will be unused. Note that the average size of a basic block translation depends on the tool, and might depend on tool options. For example, the memcheck option --track-origins=yes increases the size of the basic block translations. Use --avg-transtab-entry-size to tune the size of the sectors, either to gain memory or to avoid too many retranslations.

--aspace-minaddr=<address> [default: depends on the platform] To avoid potential conflicts with some system libraries, Valgrind does not use the address space below the --aspace-minaddr value, keeping it reserved in case a library specifically requests memory in this region. So, some "pessimistic" value is guessed by Valgrind depending on the platform. On Linux, by default, Valgrind avoids using the first 64MB even if typically there is no conflict in this complete zone. You can use the option --aspace-minaddr to let a memory-hungry application benefit from more of this lower memory. On the other hand, if you encounter a conflict, increasing the --aspace-minaddr value might solve it. Conflicts will typically manifest themselves as mmap failures in the low range of the address space. The provided address must be page aligned and must be greater than or equal to 0x1000 (4KB). To find the default value on your platform, do something such as valgrind -d -d date 2>&1 | grep -i minaddr. Values lower than 0x10000 (64KB) are known to create problems on some distributions.
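The constraints on the address can be checked before launching Valgrind. The following Python sketch encodes only the rules stated above; the function name check_aspace_minaddr, the returned strings and the 4KB default page size are assumptions for this example and do not reproduce Valgrind's own diagnostics.

```python
def check_aspace_minaddr(addr, page_size=0x1000):
    """Classify a candidate --aspace-minaddr value (illustration only).

    page_size is assumed to be 4KB; pass the real page size of the
    platform if it differs.
    """
    if addr < 0x1000:
        return "rejected: below 0x1000 (4KB)"
    if addr % page_size != 0:
        return "rejected: not page aligned"
    if addr < 0x10000:
        return "risky: values below 0x10000 (64KB) cause problems on some distributions"
    return "ok"


for candidate in (0x800, 0x1234, 0x1000, 0x10000, 0x4000000):
    print(hex(candidate), "->", check_aspace_minaddr(candidate))
```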

--valgrind-stacksize=<number> [default: 1MB] For each thread, Valgrind needs its own 'private' stack. The default size for these stacks is generous, and so should be sufficient in most cases. If the size is too small, Valgrind will segfault. Before segfaulting, Valgrind might produce a warning when approaching the limit.

Use the option --valgrind-stacksize if such an (unlikely) warning is produced, or Valgrind dies due to a segmentation violation. Such segmentation violations have been seen when demangling huge C++ symbols.

If your application uses many threads and needs a lot of memory, you can gain some memory by reducing the size of these Valgrind stacks using the option --valgrind-stacksize.

--show-emwarns=<yes|no> [default: no] When enabled, Valgrind will emit warnings about its CPU emulation in certain cases. These are usually not interesting.

--require-text-symbol=:sonamepatt:fnnamepatt When a shared object whose soname matches sonamepatt is loaded into the process, examine all the text symbols it exports. If none of those match fnnamepatt, print an error message and abandon the run. This makes it possible to ensure that the run does not continue unless a given shared object contains a particular function name.

Both sonamepatt and fnnamepatt can be written using the usual ? and * wildcards. For example: ":*libc.so*:foo?bar". You may use characters other than a colon to separate the two patterns. It is only important that the first character and the separator character are the same. For example, the above example could also be written "Q*libc.so*Qfoo?bar". Multiple --require-text-symbol flags are allowed, in which case shared objects that are loaded into the process will be checked against all of them.

The purpose of this is to support reliable usage of marked-up libraries. For example, suppose we have a version of GCC's libgomp.so which has been marked up with annotations to support Helgrind. It is only too easy and confusing to load the wrong, un-annotated libgomp.so into the application. So the idea is: add a text symbol in the marked-up library, for example annotated_for_helgrind_3_6, and then give the flag --require-text-symbol=:*libgomp*so*:annotated_for_helgrind_3_6 so that when libgomp.so is loaded, Valgrind scans its symbol table, and if the symbol isn't present the run is aborted, rather than continuing silently with the un-marked-up library. Note that you should put the entire flag in quotes to stop shells expanding the * and ? wildcards.
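The separator rule and the pattern check can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not Valgrind's code: the function names are invented for the example, and Python's fnmatchcase is used as a stand-in for Valgrind's matcher (it also gives special meaning to square brackets, which the ? and * wildcards described here do not).

```python
from fnmatch import fnmatchcase


def parse_require_text_symbol(spec):
    """Split '<sep>sonamepatt<sep>fnnamepatt' on its first character."""
    if len(spec) < 2:
        raise ValueError("empty specification")
    separator = spec[0]
    parts = spec[1:].split(separator)
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected <sep>sonamepatt<sep>fnnamepatt")
    return parts[0], parts[1]


def run_may_continue(spec, soname, exported_symbols):
    """Return False if the loaded object should abort the run."""
    soname_patt, fnname_patt = parse_require_text_symbol(spec)
    if not fnmatchcase(soname, soname_patt):
        return True  # the requirement does not apply to this object
    return any(fnmatchcase(sym, fnname_patt) for sym in exported_symbols)


spec = ":*libgomp*so*:annotated_for_helgrind_3_6"
print(parse_require_text_symbol("Q*libc.so*Qfoo?bar"))
print(run_may_continue(spec, "libgomp.so.1", ["GOMP_parallel"]))
print(run_may_continue(spec, "libgomp.so.1",
                       ["GOMP_parallel", "annotated_for_helgrind_3_6"]))
```

The soname libgomp.so.1 and the symbol GOMP_parallel are sample values chosen for the demonstration.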

--soname-synonyms=syn1=pattern1,syn2=pattern2,... When a shared library is loaded, Valgrind checks for functions in the library that must be replaced or wrapped. For example, Memcheck replaces some string and memory functions (strchr, strlen, strcpy, memchr, memcpy, memmove, etc.) with its own versions. Such replacements are normally done only in shared libraries whose soname matches a predefined soname pattern (e.g. libc.so* on Linux). By default, no replacement is done for a statically linked binary or for alternative libraries, except for the allocation functions (malloc, free, calloc, memalign, realloc, operator new, operator delete, etc.). Such allocation functions are intercepted by default in any shared library or in the executable if they are exported as global symbols. This means that if a replacement allocation library such as tcmalloc is found, its functions are also intercepted by default. In some cases, --soname-synonyms can be used to specify one additional synonym pattern, giving flexibility in the replacement, or to prevent interception of all public allocation symbols.

Currently, this flexibility is only allowed for the malloc related functions, using the synonym somalloc. This synonym is usable for all tools doing standard replacement of malloc related functions (e.g. memcheck, helgrind, drd, massif, dhat).

• Alternate malloc library: to replace the malloc related functions in a specific alternate library with soname mymalloclib.so (and not in any others), give the option --soname-synonyms=somalloc=mymalloclib.so. A pattern can be used to match multiple libraries sonames. For example, --soname-synonyms=somalloc=*tcmalloc* will match the soname of all variants of the tcmalloc library (native, debug, profiled, ... tcmalloc variants).

Note: the soname of an ELF shared library can be retrieved using the readelf utility.

• Replacements in a statically linked library are done by using the NONE pattern. For example, if you link with libtcmalloc.a, and only want to intercept the malloc related functions in the executable (and standard libraries) themselves, but not any other shared libraries, you can give the option --soname-synonyms=somalloc=NONE. Note that a NONE pattern will match the main executable and any shared library having no soname.

• To run a "default" Firefox build for Linux, in which JEMalloc is linked in to the main executable, use --soname-synonyms=somalloc=NONE.

• To intercept allocation symbols only in the default system libraries, and not in any other shared library or in an executable that defines public malloc or operator new related functions, give a nonexistent library name, as in --soname-synonyms=somalloc=nouserintercepts (where nouserintercepts can be any nonexistent library name).

• The shared library of the dynamic (runtime) linker is excluded from the search for global public symbols, such as those for the malloc related functions (identified by the somalloc synonym).

--progress-interval=<number> [default: 0, meaning 'disabled'] This is an enhancement to Valgrind's debugging output. It is unlikely to be of interest to end users.

When number is set to a non-zero value, Valgrind will print a one-line progress summary every number seconds. Valid settings for number are between 0 and 3600 inclusive. Here's some example output with number set to 10:

    PROGRESS: U 110s, W 113s, 97.3% CPU, EvC 414.79M, TIn 616.7k, TOut 0.5k, #thr 67
    PROGRESS: U 120s, W 124s, 96.8% CPU, EvC 505.27M, TIn 636.6k, TOut 3.0k, #thr 64
    PROGRESS: U 130s, W 134s, 97.0% CPU, EvC 574.90M, TIn 657.5k, TOut 3.0k, #thr 63

Each line shows:

U: total user time

W: total wallclock time

CPU: overall average CPU use

EvC: number of event checks. An event check is a backwards branch in the simulated program, so this is a measure of forward progress of the program

TIn: number of code blocks instrumented by the JIT

TOut: number of instrumented code blocks that have been thrown away

#thr: number of threads in the program

From the progress of these, it is possible to observe:

• when the program is compute bound (TIn rises slowly, EvC rises rapidly)

• when the program is in a spinloop (TIn/TOut fixed, EvC rises rapidly)

• when the program is JIT-bound (TIn rises rapidly)

• when the program is rapidly discarding code (TOut rises rapidly)

• when the program is about to achieve some expected state (EvC arrives at some value you expect)

• when the program is idling (U rises more slowly than W)
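Since these observations depend on how the counters change from one line to the next, it can be convenient to parse the output with a script. The Python sketch below is one possible approach and is based only on the sample lines shown above: the function names are invented for the example, and reading the k and M suffixes as thousands and millions is an assumption.

```python
import re

# Assumed meaning of the count suffixes seen in the sample output.
_SUFFIX = {"": 1, "k": 1_000, "M": 1_000_000, "G": 1_000_000_000}

_LINE = re.compile(
    r"PROGRESS: U (?P<U>\d+)s, W (?P<W>\d+)s, (?P<CPU>[\d.]+)% CPU, "
    r"EvC (?P<EvC>[\d.]+)(?P<EvC_s>[kMG]?), "
    r"TIn (?P<TIn>[\d.]+)(?P<TIn_s>[kMG]?), "
    r"TOut (?P<TOut>[\d.]+)(?P<TOut_s>[kMG]?), "
    r"#thr (?P<thr>\d+)"
)


def parse_progress(line):
    """Parse one PROGRESS line into a dict, or return None if it does not match."""
    m = _LINE.search(line)
    if m is None:
        return None

    def count(name):
        return round(float(m[name]) * _SUFFIX[m[name + "_s"]])

    return {
        "U": int(m["U"]), "W": int(m["W"]), "CPU": float(m["CPU"]),
        "EvC": count("EvC"), "TIn": count("TIn"), "TOut": count("TOut"),
        "thr": int(m["thr"]),
    }


def progress_delta(previous, current):
    """Change in each counter between two parsed PROGRESS lines."""
    return {key: current[key] - previous[key]
            for key in ("U", "W", "EvC", "TIn", "TOut")}


first = parse_progress("PROGRESS: U 110s, W 113s, 97.3% CPU, EvC 414.79M, "
                       "TIn 616.7k, TOut 0.5k, #thr 67")
second = parse_progress("PROGRESS: U 120s, W 124s, 96.8% CPU, EvC 505.27M, "
                        "TIn 636.6k, TOut 3.0k, #thr 64")
print(progress_delta(first, second))
```

Comparing the deltas of EvC, TIn and TOut against each other is then enough to apply the rules of thumb listed above, for example a large EvC delta with a TIn delta of zero for a spinloop.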