By default, perf script will decode trace data found in the
perf.data file. This can be further controlled by the new
--itrace option.
New --itrace option
Having no option is the same as
--itrace
which, in turn, is the same as
--itrace=cepwx
The letters are:
i synthesize "instructions" events
b synthesize "branches" events
x synthesize "transactions" events
w synthesize "ptwrite" events
p synthesize "power" events (incl. PSB events)
c synthesize branches events (calls only)
r synthesize branches events (returns only)
e synthesize tracing error events
d create a debug log
g synthesize a call chain (use with i or x)
G synthesize a call chain on existing event records
l synthesize last branch entries (use with i or x)
L synthesize last branch entries on existing event records
s skip initial number of events
q quicker (less detailed) decoding
Z prefer to ignore timestamps (so-called "timeless" decoding)
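To make the letter meanings concrete, here is a minimal Python sketch with a hypothetical helper, describe_itrace, that maps a letters-only --itrace value to the descriptions above and treats an empty value as the default "cepwx". It illustrates the documented option letters only. It is not perf's own option parser, and it does not handle periods, sizes, skip counts or +/- flags.

```python
# Hypothetical helper: maps the --itrace letters to their documented
# meanings. Not perf's own parser; numeric suffixes (periods, sizes,
# skip counts) and +/- flags are deliberately not handled.
ITRACE_LETTERS = {
    "i": "synthesize instructions events",
    "b": "synthesize branches events",
    "x": "synthesize transactions events",
    "w": "synthesize ptwrite events",
    "p": "synthesize power events (incl. PSB events)",
    "c": "synthesize branches events (calls only)",
    "r": "synthesize branches events (returns only)",
    "e": "synthesize tracing error events",
    "d": "create a debug log",
    "g": "synthesize a call chain (use with i or x)",
    "G": "synthesize a call chain on existing event records",
    "l": "synthesize last branch entries (use with i or x)",
    "L": "synthesize last branch entries on existing event records",
    "s": "skip initial number of events",
    "q": "quicker (less detailed) decoding",
    "Z": "prefer to ignore timestamps (timeless decoding)",
}

# No option, or a bare --itrace, behaves like --itrace=cepwx.
DEFAULT_ITRACE = "cepwx"


def describe_itrace(letters=""):
    """Describe a letters-only --itrace value; "" means the default."""
    letters = letters or DEFAULT_ITRACE
    unknown = [ch for ch in letters if ch not in ITRACE_LETTERS]
    if unknown:
        raise ValueError("unknown --itrace letters: " + "".join(unknown))
    return [ITRACE_LETTERS[ch] for ch in letters]
```

For example, describe_itrace("ib") returns the descriptions for instructions and branches events, while describe_itrace() lists the five default letters in order.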
"Instructions" events look like they were recorded by "perf
record -e instructions".
"Branches" events look like they were recorded by "perf record -e
branches". "c" and "r" can be combined to get calls and returns.
"Transactions" events correspond to the start or end of
transactions. The flags field can be used in perf script to
determine whether the event is a transaction start, commit or
abort.
Note that "instructions", "branches" and "transactions" events
depend on code flow packets which can be disabled by using the
config term "branch=0". Refer to the config terms section above.
"ptwrite" events record the payload of the ptwrite instruction
and whether "fup_on_ptw" was used. "ptwrite" events depend on
PTWRITE packets which are recorded only if the "ptw" config term
was used. Refer to the config terms section above. The perf script
"synth" field displays "ptwrite" information like this:
ip: 0 payload: 0x123456789abcdef0
where "ip" is 1 if "fup_on_ptw" was used.
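When post-processing perf script output, the "synth" text of a ptwrite event can be split back into its two parts. This Python sketch assumes the exact "ip: N payload: 0x..." layout shown above; the regular expression and the parse_ptwrite name are assumptions for illustration, not part of perf.

```python
import re

# Assumed layout of the "synth" text for a ptwrite event, e.g.
#   ip: 0 payload: 0x123456789abcdef0
# Runs of whitespace are tolerated in case spacing differs.
PTWRITE_RE = re.compile(r"ip:\s*([01])\s+payload:\s*(0x[0-9a-fA-F]+)")


def parse_ptwrite(synth):
    """Return (fup_on_ptw_used, payload), or None if the text does not match."""
    m = PTWRITE_RE.search(synth)
    if m is None:
        return None
    fup_on_ptw = m.group(1) == "1"  # "ip" is 1 if fup_on_ptw was used
    payload = int(m.group(2), 16)
    return fup_on_ptw, payload
```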
"Power" events correspond to power event packets and CBR
(core-to-bus ratio) packets. While CBR packets are always
recorded when tracing is enabled, power event packets are
recorded only if the "pwr_evt" config term was used. Refer to the
config terms section above. The power events record information
about C-state changes, whereas CBR is indicative of CPU
frequency. perf script "event,synth" fields display information
like this:
cbr: cbr: 22 freq: 2189 MHz (200%)
mwait: hints: 0x60 extensions: 0x1
pwre: hw: 0 cstate: 2 sub-cstate: 0
exstop: ip: 1
pwrx: deepest cstate: 2 last cstate: 2 wake reason: 0x4
Where:
• "cbr" includes the frequency and the percentage of maximum non-turbo frequency
• "mwait" shows mwait hints and extensions
• "pwre" shows C-state transitions (to a C-state deeper than C0) and whether initiated by hardware
• "exstop" indicates execution stopped and whether the IP was recorded exactly
• "pwrx" indicates return to C0
For more details refer to the Intel 64 and IA-32 Architectures
Software Developer Manuals.
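The cbr and pwrx lines above can be picked apart in the same way when scripting over perf script output. This Python sketch assumes the field layout shown in the example output; parse_cbr and parse_pwrx are hypothetical helpers, and other perf versions may format these fields differently.

```python
import re

# Assumed layouts, taken from the example output above.
CBR_RE = re.compile(r"cbr:\s*(\d+)\s+freq:\s*(\d+)\s*MHz\s*\((\d+)%\)")
PWRX_RE = re.compile(
    r"deepest cstate:\s*(\d+)\s+last cstate:\s*(\d+)\s+"
    r"wake reason:\s*(0x[0-9a-fA-F]+)"
)


def parse_cbr(text):
    """Extract the core-to-bus ratio, frequency and percentage, or None."""
    m = CBR_RE.search(text)
    if m is None:
        return None
    cbr, freq_mhz, percent = (int(g) for g in m.groups())
    return {"cbr": cbr, "freq_mhz": freq_mhz, "percent_of_max_non_turbo": percent}


def parse_pwrx(text):
    """Extract the C-state fields of a pwrx (return to C0) event, or None."""
    m = PWRX_RE.search(text)
    if m is None:
        return None
    return {
        "deepest_cstate": int(m.group(1)),
        "last_cstate": int(m.group(2)),
        "wake_reason": int(m.group(3), 16),
    }
```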
PSB events show when a PSB+ occurred and also the byte-offset in
the trace. Emitting a PSB+ can cause a slight delay on the CPU.
When doing timing analysis of code with Intel PT, it is useful to
know whether a timing bubble was caused by Intel PT or not.
Error events show where the decoder lost the trace. Error events
are quite important. Users must know if what they are seeing is a
complete picture or not. The "e" option may be followed by flags
which affect what errors will or will not be reported. Each flag
must be preceded by either + or -. The flags supported by Intel
PT are:
• -o Suppress overflow errors
• -l Suppress trace data lost errors
For example, to report errors but not overflow or data lost
errors:
--itrace=e-o-l
The "d" option will cause the creation of a file "intel_pt.log"
containing all decoded packets and instructions. Note that this
option slows down the decoder and that the resulting file may be
very large. The "d" option may be followed by flags which affect
what debug messages will or will not be logged. Each flag must be
preceded by either + or -. The flags supported by Intel PT are:
• -a Suppress logging of perf events
• +a Log all perf events
By default, logged perf events are filtered by any specified time
ranges, but flag +a overrides that.
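Both the "e" and "d" options use the same +/- flag syntax. A minimal Python sketch of that syntax, using a hypothetical parse_itrace_flags helper that maps each flag letter to True for "+" and False for "-" (the helper and its error messages are assumptions, not perf code):

```python
def parse_itrace_flags(spec, allowed):
    """Parse the flags that follow an option letter, e.g. "-o-l" or "+a".

    Returns a dict mapping each flag letter to True ("+") or False ("-").
    `allowed` is the set of flag letters the option accepts.
    """
    flags = {}
    i = 0
    while i < len(spec):
        sign = spec[i]
        if sign not in "+-":
            raise ValueError(f"flag must be preceded by + or -: {spec[i:]!r}")
        if i + 1 >= len(spec):
            raise ValueError(f"missing flag letter after {sign!r}")
        letter = spec[i + 1]
        if letter not in allowed:
            raise ValueError(f"unsupported flag: {letter!r}")
        flags[letter] = sign == "+"
        i += 2
    return flags


# Flags documented above for Intel PT.
ERROR_FLAGS = {"o", "l"}  # for "e": overflow, trace data lost
DEBUG_FLAGS = {"a"}       # for "d": perf events logging
```

For --itrace=e-o-l, the text after "e" is "-o-l", so parse_itrace_flags("-o-l", ERROR_FLAGS) yields {"o": False, "l": False}, meaning both error kinds are suppressed.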
In addition, the period of the "instructions" event can be
specified. e.g.
--itrace=i10us
sets the period to 10us i.e. one instruction sample is
synthesized for each 10 microseconds of trace. Alternatives to
"us" are "ms" (milliseconds), "ns" (nanoseconds), "t" (TSC ticks)
or "i" (instructions).
"ms", "us" and "ns" are converted to TSC ticks.
The timing information included with Intel PT does not give the
time of every instruction. Consequently, for the purpose of
sampling, the decoder estimates the time since the last timing
packet based on 1 tick per instruction. The time on the sample is
not adjusted and reflects the last known value of TSC.
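The sampling rule above can be modelled with a simplified Python sketch. It assumes the TSC value of the last timing packet is known and that the period is already expressed in TSC ticks; due_samples is a hypothetical helper that only illustrates the 1-tick-per-instruction estimate and the unadjusted sample time, not the decoder's actual code.

```python
def due_samples(last_tsc, n_insns, next_sample_at, period_ticks):
    """Decide which of n_insns instructions (decoded after a timing packet
    whose TSC was last_tsc) produce an "instructions" sample.

    Returns (samples, next_sample_at), where samples is a list of
    (instruction_index, sample_time) pairs.
    """
    samples = []
    for k in range(1, n_insns + 1):
        # Estimated time of the k-th instruction: 1 tick per instruction.
        estimated = last_tsc + k
        if estimated >= next_sample_at:
            # The estimate only decides *when* to sample; the sample's
            # timestamp is not adjusted and stays at the last known TSC.
            samples.append((k, last_tsc))
            next_sample_at += period_ticks
    return samples, next_sample_at
```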
For Intel PT, the default period is 100us.
Setting it to a zero period means "as often as possible".
In the case of Intel PT that is the same as a period of 1 and a
unit of instructions (i.e. --itrace=i1i).
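Putting the period rules together (units, conversion of "ms", "us" and "ns" to TSC ticks, the 100us default and the zero-period special case), a Python sketch might look like this. The TSC frequency is an assumed input (tsc_hz), itrace_period is a hypothetical name, and unlike perf the sketch requires an explicit unit suffix.

```python
import re

NS_PER_UNIT = {"ms": 1_000_000, "us": 1_000, "ns": 1}
DEFAULT_PERIOD = "100us"  # default period for Intel PT


def itrace_period(spec, tsc_hz):
    """Return (value, kind) for an "instructions" period such as "10us".

    kind is "tsc" for time-based periods (converted to TSC ticks) or
    "instructions" for the "i" unit. tsc_hz is an assumed, known TSC rate.
    """
    spec = spec or DEFAULT_PERIOD
    m = re.fullmatch(r"(\d+)(ms|us|ns|t|i)", spec)
    if m is None:
        raise ValueError(f"unsupported period: {spec!r}")
    value, unit = int(m.group(1)), m.group(2)
    if value == 0:
        # "As often as possible": same as period 1 in instructions (i1i).
        return 1, "instructions"
    if unit == "i":
        return value, "instructions"
    if unit == "t":
        return value, "tsc"
    # ms, us and ns are converted to TSC ticks (integer arithmetic).
    return value * NS_PER_UNIT[unit] * tsc_hz // 1_000_000_000, "tsc"
```

With an assumed 2 GHz TSC, itrace_period("10us", 2_000_000_000) gives (20000, "tsc"): 10 microseconds is 10,000 ns, and 2 ticks elapse per nanosecond.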
Also the call chain size (default 16, max. 1024) for instructions
or transactions events can be specified. e.g.
--itrace=ig32
--itrace=xg32
Also the number of last branch entries (default 64, max. 1024)
for instructions or transactions events can be specified. e.g.
--itrace=il10
--itrace=xl10
Note that last branch entries are cleared for each sample, so
there is no overlap from one sample to the next.
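The per-sample clearing of last branch entries can be pictured as a bounded buffer that is emptied each time it is read. This Python sketch uses a hypothetical LastBranchBuffer class; the newest-first ordering of the snapshot and the lower bound of 1 entry are assumptions made for illustration.

```python
from collections import deque


class LastBranchBuffer:
    """Keep the most recent `size` branches (default 64, max. 1024).

    Each sample takes a snapshot and then clears the buffer, so
    consecutive samples never share entries.
    """

    def __init__(self, size=64):
        if not 1 <= size <= 1024:  # lower bound of 1 is an assumption
            raise ValueError("size must be between 1 and 1024")
        self._branches = deque(maxlen=size)  # oldest entries fall off

    def record(self, from_ip, to_ip):
        self._branches.append((from_ip, to_ip))

    def take_sample(self):
        entries = list(reversed(self._branches))  # newest first (assumed)
        self._branches.clear()  # no overlap with the next sample
        return entries
```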
The G and L options are designed in particular for sample mode,
and work much like g and l but add call chain and branch stack to
the other selected events instead of synthesized events. For
example, to record branch-misses events for ls and then add a
call chain derived from the Intel PT trace:
perf record --aux-sample -e '{intel_pt//u,branch-misses:u}' -- ls
perf report --itrace=Ge
In fact, G is a default for perf report, so that is the
same as just:
perf report
One caveat with the G and L options is that they work poorly with
"Large PEBS". Large PEBS means PEBS records will be accumulated
by hardware and then written into the event buffer in one go. That
reduces interrupts, but can give very late timestamps. Because
the Intel PT trace is synchronized by timestamps, the PEBS events
do not match the trace. Currently, Large PEBS is used only in
certain circumstances:
• hardware supports it
• PEBS is used
• event period is specified, instead of frequency
• the sample type is limited to the following flags:
PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR |
PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID |
PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER |
PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR |
PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER |
PERF_SAMPLE_PERIOD, and sometimes PERF_SAMPLE_TIME
Because Intel PT sample mode uses a different sample type from the
list above, Large PEBS is not used with Intel PT sample mode. To
avoid Large PEBS in other cases, avoid specifying the event
period, i.e. avoid the perf record -c option, --count option, or
period config term.
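The sample-type condition for Large PEBS is a bitmask subset test. The Python sketch below uses the PERF_SAMPLE_* bit values from the Linux perf_event ABI header (include/uapi/linux/perf_event.h). may_use_large_pebs is a hypothetical helper, and because PERF_SAMPLE_TIME is allowed only sometimes, that case is left to a caller-supplied parameter rather than guessed.

```python
# Bit values from the Linux perf_event ABI.
PERF_SAMPLE_IP = 1 << 0
PERF_SAMPLE_TID = 1 << 1
PERF_SAMPLE_TIME = 1 << 2
PERF_SAMPLE_ADDR = 1 << 3
PERF_SAMPLE_CALLCHAIN = 1 << 5
PERF_SAMPLE_ID = 1 << 6
PERF_SAMPLE_CPU = 1 << 7
PERF_SAMPLE_PERIOD = 1 << 8
PERF_SAMPLE_STREAM_ID = 1 << 9
PERF_SAMPLE_REGS_USER = 1 << 12
PERF_SAMPLE_DATA_SRC = 1 << 15
PERF_SAMPLE_IDENTIFIER = 1 << 16
PERF_SAMPLE_TRANSACTION = 1 << 17
PERF_SAMPLE_REGS_INTR = 1 << 18
PERF_SAMPLE_PHYS_ADDR = 1 << 19

# Flags listed above as compatible with Large PEBS (TIME handled separately).
LARGE_PEBS_FLAGS = (
    PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR | PERF_SAMPLE_ID
    | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | PERF_SAMPLE_DATA_SRC
    | PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR
    | PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER | PERF_SAMPLE_PERIOD
)


def may_use_large_pebs(hw_supports, uses_pebs, period_specified,
                       sample_type, time_allowed=True):
    """Return True if all the listed Large PEBS conditions hold."""
    allowed = LARGE_PEBS_FLAGS | (PERF_SAMPLE_TIME if time_allowed else 0)
    subset = (sample_type & ~allowed) == 0  # no bits outside the list
    return hw_supports and uses_pebs and period_specified and subset
```

Any bit outside the list, such as PERF_SAMPLE_CALLCHAIN, makes the subset test fail, which is the same reasoning that keeps Large PEBS away from Intel PT sample mode.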
To disable trace decoding entirely, use the option --no-itrace.
It is also possible to skip events generated (instructions,
branches, transactions) at the beginning. This is useful to
ignore initialization code.
--itrace=i0nss1000000
skips the first million instructions.
The q option changes the way the trace is decoded. The decoding
is much faster but much less detailed. Specifically, with the q
option, the decoder does not decode TNT packets, and does not
walk object code, but gets the ip from FUP and TIP packets. The q
option can be used with the b and i options but the period is not
used. The q option decodes more quickly, but is useful only if
the control flow of interest is represented or indicated by FUP,
TIP, TIP.PGE, or TIP.PGD packets (refer below). However, the q
option can be used to find time ranges that can then be decoded
fully using the --time option.
What will not be decoded with the (single) q option:
• direct calls and jmps
• conditional branches
• non-branch instructions
What will be decoded with the (single) q option:
• asynchronous branches such as interrupts
• indirect branches
• function return target address if the noretcomp config term (refer config terms section) was used
• start of (control-flow) tracing
• end of (control-flow) tracing, if it is not out of context
• power events, ptwrite, transaction start and abort
• instruction pointer associated with PSB packets
Note the q option does not specify what events will be
synthesized e.g. the p option must be used also to show power
events.
Repeating the q option (double-q i.e. qq) results in even faster
decoding and even less detail. The decoder decodes only extended
PSB (PSB+) packets, getting the instruction pointer if there is a
FUP packet within PSB+ (i.e. between PSB and PSBEND). Note PSB
packets occur regularly in the trace based on the psb_period
config term (refer config terms section). There will be a FUP
packet if the PSB+ occurs while control flow is being traced.
What will not be decoded with the qq option:
• everything except instruction pointer associated with PSB packets
What will be decoded with the qq option:
• instruction pointer associated with PSB packets
The Z option is equivalent to having recorded a trace without TSC
(i.e. config term tsc=0). It can be useful to avoid timestamp
issues when decoding a trace of a virtual machine.
dump option
perf script has an option (-D) to "dump" the events i.e. display
the binary data.
When -D is used, Intel PT packets are displayed. The packet
decoder does not pay attention to PSB packets, but just decodes
the bytes - so the packets seen by the actual decoder may not be
identical in places where the data is corrupt. One example of
that would be when the buffer-switching interrupt has been too
slow, and the buffer has been filled completely. In that case,
the last packet in the buffer might be truncated and immediately
followed by a PSB as the trace continues in the next buffer.
To disable the display of Intel PT packets, combine the -D option
with --no-itrace.