Reads and reports stat data from a perf data file.
-i file, --input file
Input file name.
--per-socket
Aggregate counts per processor socket for system-wide mode
measurements.
--per-die
Aggregate counts per processor die for system-wide mode
measurements.
--per-core
Aggregate counts per physical processor for system-wide mode
measurements.
-M, --metrics
Print metrics or metricgroups specified in a comma-separated
list. For a group, all metrics from the group are added. The
events from the metrics are automatically measured. See perf
list output for the possible metrics and metricgroups.
-A, --no-aggr
Do not aggregate counts across all monitored CPUs.
--topdown
Print complete top-down metrics supported by the CPU. This
helps identify bottlenecks in the CPU pipeline for CPU-bound
workloads by breaking the cycles consumed down into frontend
bound, backend bound, bad speculation and retiring.
Frontend bound means that the CPU cannot fetch and decode
instructions fast enough. Backend bound means that computation or
memory access is the bottleneck. Bad speculation means that the
CPU wasted cycles due to branch mispredictions and similar
issues. Retiring means that the CPU computed without an
apparent bottleneck. The reported bottleneck is only the real
bottleneck if the workload is actually bound by the CPU and not
by something else.
For best results it is usually a good idea to use it with
interval mode like -I 1000, as the bottleneck of workloads can
change often.
This enables --metric-only, unless overridden with
--no-metric-only.
The following restrictions only apply to older Intel CPUs and
Atom; on newer CPUs (Ice Lake and later) top-down metrics can be
collected for any thread:
The top-down metrics are collected per core instead of per CPU
thread. Per-core mode is automatically enabled and -a (global
monitoring) is needed, requiring root rights or
kernel.perf_event_paranoid=-1.
Topdown uses the full Performance Monitoring Unit, so for best
results the NMI watchdog should be disabled (as root): echo 0 >
/proc/sys/kernel/nmi_watchdog. Otherwise the bottlenecks may be
inconsistent on workloads with changing phases.
To interpret the results it is usually necessary to know which
CPUs the workload runs on. If needed, the CPUs can be forced using
taskset.
--td-level
Print the top-down statistics that are equal to or lower than the
input level. This allows users to print only the top-down metrics
levels of interest instead of the complete top-down metrics.
The availability of the top-down metrics levels depends on the
hardware. For example, Ice Lake only supports L1 top-down
metrics. Sapphire Rapids supports both L1 and L2 top-down
metrics.
Default: 0, which means the maximum level that the current
hardware supports. An error is reported if the input is higher
than the supported maximum level.
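The level selection described above can be sketched in a few lines of Python. This is only an illustration of the documented rules (0 selects the hardware maximum, a higher-than-supported value is an error); the function name levels_to_print and the hw_max parameter are hypothetical and are not part of perf.

```python
def levels_to_print(requested: int, hw_max: int) -> list:
    """Return the top-down levels printed for a given --td-level value.

    Hypothetical helper: 'hw_max' stands for the highest top-down level
    the hardware supports (for example 1 on Ice Lake, 2 on Sapphire Rapids).
    """
    if requested < 0:
        raise ValueError("level must be non-negative")
    if requested > hw_max:
        # Mirrors the documented behavior: error out above the supported max.
        raise ValueError(
            f"level {requested} exceeds supported max level {hw_max}"
        )
    # 0 is the default and means "use the maximum supported level".
    effective = hw_max if requested == 0 else requested
    # Levels equal to or lower than the effective level are printed.
    return list(range(1, effective + 1))


if __name__ == "__main__":
    print(levels_to_print(0, 2))
    print(levels_to_print(1, 2))
```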
--no-merge
Do not merge results from the same PMUs.
When multiple events are created from a single event
specification, stat will, by default, aggregate the event counts
and show the result in a single row. This option disables that
behavior and shows the individual events and counts.
Multiple events are created from a single event specification
when:
1. Prefix or glob matching is used for the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU
events by perf list, are used.
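As a rough illustration of the difference between the default merged output and --no-merge, the following Python sketch sums counts from several PMU instances into one row, or keeps them separate. The function merge_counts, the row layout, and the sample counts are assumptions made for this example; they do not reflect perf's internal data structures.

```python
from collections import defaultdict


def merge_counts(rows, merge=True):
    """Aggregate (pmu_instance, event, count) rows the way the text describes.

    Hypothetical helper: with merge=True, counts for the same event are
    summed into a single row; with merge=False (the --no-merge case) each
    PMU instance keeps its own row.
    """
    if not merge:
        return [(f"{pmu}/{event}/", count) for pmu, event, count in rows]
    totals = defaultdict(int)
    for _pmu, event, count in rows:
        totals[event] += count
    return list(totals.items())


if __name__ == "__main__":
    # Made-up counts for one event expanded over two PMU instances.
    sample = [
        ("uncore_imc_0", "cas_count_read", 1200),
        ("uncore_imc_1", "cas_count_read", 800),
    ]
    print(merge_counts(sample))
    print(merge_counts(sample, merge=False))
```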
--smi-cost
Measure SMI cost if the msr/aperf/ and msr/smi/ events are
supported.
During the measurement, /sys/devices/cpu/freeze_on_smi will be
set to freeze core counters on SMI. The aperf counter will not be
affected by the setting. The cost of SMI can be measured by
(aperf - unhalted core cycles).
In practice, the percentage of SMI cycles is very useful for
performance-oriented analysis. --metric-only will be applied by
default. The output is SMI cycles%, which equals (aperf - unhalted
core cycles) / aperf.
Users who want to get the actual values can apply
--no-metric-only.
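The SMI cycles% formula above can be worked through with a short Python sketch. The function name smi_cycles_percent and the sample counter values are assumptions for illustration; the result is scaled by 100 here to express the ratio as a percentage.

```python
def smi_cycles_percent(aperf: int, unhalted_core_cycles: int) -> float:
    """Compute SMI cycles% as (aperf - unhalted core cycles) / aperf.

    Hypothetical helper: the core cycle counter is frozen during SMI
    while aperf keeps counting, so the difference approximates the
    cycles spent in SMI.
    """
    if aperf <= 0:
        raise ValueError("aperf must be positive")
    return 100.0 * (aperf - unhalted_core_cycles) / aperf


if __name__ == "__main__":
    # Made-up counter values: 10,000 of 1,000,000 cycles spent in SMI.
    print(smi_cycles_percent(1_000_000, 990_000))
```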
--all-kernel
Configure all used events to run in kernel space.
--all-user
Configure all used events to run in user space.
--percore-show-thread
The event modifier "percore" sums up the event counts for all
hardware threads in a core and shows the counts per core.
With the "percore" event modifier enabled, this option also sums
up the event counts for all hardware threads in a core, but shows
the summed counts per hardware thread. This is essentially a
replacement for the any bit and is convenient for post-processing.
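The per-thread display of per-core sums can be pictured with the following Python sketch. The function percore_show_thread, the CPU-to-core mapping, and the counts are all hypothetical; on a real system the topology comes from the kernel, not from a hand-written dictionary.

```python
def percore_show_thread(counts, cpu_to_core):
    """Report, for every hardware thread, the summed count of its core.

    Hypothetical helper: 'counts' maps a CPU (hardware thread) number to
    its event count, and 'cpu_to_core' maps the CPU number to a core id.
    """
    core_totals = {}
    for cpu, count in counts.items():
        core = cpu_to_core[cpu]
        core_totals[core] = core_totals.get(core, 0) + count
    # Each thread is listed separately but carries its core's total.
    return {cpu: core_totals[cpu_to_core[cpu]] for cpu in counts}


if __name__ == "__main__":
    # Assumed topology: CPUs 0 and 2 share core 0, CPUs 1 and 3 share core 1.
    counts = {0: 100, 1: 50, 2: 30, 3: 20}
    topology = {0: 0, 1: 1, 2: 0, 3: 1}
    print(percore_show_thread(counts, topology))
```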
--summary
Print summary for interval mode (-I).
--no-csv-summary
Don't print "summary" in the first column of the CSV summary
output. This option must be used with -x and --summary.
This option can be enabled in perf config by setting the variable
stat.no-csv-summary.
$ perf config stat.no-csv-summary=true