perf-intel-pt(1)

Support for Intel Processor Trace within perf tools

Description

Intel Processor Trace (Intel PT) is an extension of Intel Architecture that collects information about software execution, such as control flow, execution modes and timings, and formats it into highly compressed binary packets. Technical details are documented in the Intel 64 and IA-32 Architectures Software Developer Manuals, Chapter 36, "Intel Processor Trace".

Intel PT was first supported in Intel Core M and 5th generation Intel Core processors, which are based on the Intel micro-architecture code-named Broadwell.

Trace data is collected by perf record and stored within the perf.data file. See below for options to perf record.

Trace data must be decoded, which involves walking the object code and matching the trace data packets against it. For example, a TNT (taken/not-taken) packet only tells whether a conditional branch was taken or not taken, so to make use of that packet the decoder must know precisely which instruction was being executed.
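The dependence on object code can be illustrated with a toy model. The following is only a sketch under stated assumptions, not the perf decoder: the `Insn` type, the `walk` function and the four-instruction `CODE` table are hypothetical, and real Intel PT packets carry far more than single taken/not-taken bits. It shows that a TNT bit has no meaning until the walker reaches a conditional branch in the code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Insn:
    """Toy instruction (hypothetical model, not a real ISA)."""
    addr: int
    next_addr: Optional[int]              # fall-through address, None = end
    branch_target: Optional[int] = None   # set only for conditional branches


def walk(code: Dict[int, Insn], start: int, tnt_bits: Iterable[bool]) -> List[int]:
    """Walk the object code, consuming one TNT bit per conditional branch.

    Returns the list of executed instruction addresses. The walk stops at
    the end of the program or when the trace has no more TNT bits.
    """
    bits = iter(tnt_bits)
    path: List[int] = []
    addr: Optional[int] = start
    while addr is not None:
        insn = code[addr]
        path.append(addr)
        if insn.branch_target is None:
            addr = insn.next_addr          # no packet needed: flow is implied
            continue
        try:
            taken = next(bits)             # the trace only says taken / not taken
        except StopIteration:
            break                          # trace exhausted at this branch
        addr = insn.branch_target if taken else insn.next_addr
    return path


# Hypothetical object code: 0x14 conditionally jumps over 0x18.
CODE = {
    0x10: Insn(0x10, 0x14),
    0x14: Insn(0x14, 0x18, branch_target=0x20),
    0x18: Insn(0x18, 0x20),
    0x20: Insn(0x20, None),
}

print([hex(a) for a in walk(CODE, 0x10, [True])])   # ['0x10', '0x14', '0x20']
print([hex(a) for a in walk(CODE, 0x10, [False])])  # ['0x10', '0x14', '0x18', '0x20']
```

Without the `CODE` table, the single bit in each trace could not be mapped to an address, which is why the decoder needs the executed images.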

Decoding is done on-the-fly. The decoder outputs samples in the same format as samples output by perf hardware events, for example as though the "instructions" or "branches" events had been recorded. Presently, three tools support this: perf script, perf report and perf inject. See below for more information on using those tools.

The main distinguishing feature of Intel PT is that the decoder can determine the exact flow of software execution. Intel PT can be used to understand how and why software reached a certain point or behaved a certain way. The software does not have to be recompiled, so Intel PT works with debug or release builds. However, the executed images are needed, which makes use in JIT-compiled environments, or with self-modifying code, a challenge. Symbols also need to be provided to make sense of addresses.

A limitation of Intel PT is that it produces huge amounts of trace data (hundreds of megabytes per second per core), which takes a long time to decode: for example, two or three orders of magnitude longer than it took to collect. Another limitation is the performance impact of tracing, which varies with the use case and architecture.
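To get a feel for these magnitudes, a back-of-the-envelope estimate can help. The sketch below is illustrative only: the `estimate` helper is hypothetical, and the input figures (200 MB/s per core, 4 cores, 10 seconds of tracing) are assumed values chosen to fall within the ranges quoted above, not measurements.

```python
def estimate(rate_mb_per_s: int, cores: int, seconds: int, slowdown: int):
    """Return (trace size in MB, decode time in seconds).

    slowdown is the assumed ratio of decode time to collection time,
    e.g. 100 or 1000 for two or three orders of magnitude.
    """
    trace_mb = rate_mb_per_s * cores * seconds
    decode_s = seconds * slowdown
    return trace_mb, decode_s


# Assumed scenario: 200 MB/s per core, 4 cores, 10 s of tracing.
for slowdown in (100, 1000):
    size_mb, decode_s = estimate(200, 4, 10, slowdown)
    print(f"slowdown x{slowdown}: {size_mb} MB of trace, "
          f"about {decode_s / 60:.0f} min to decode")
```

Under these assumptions, ten seconds of tracing yields about 8 GB of data and a decode time somewhere between roughly a quarter of an hour and a few hours, which is why limiting what is traced and for how long matters in practice.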