`perf-stat` ( 1 )

Run a command and gather performance counter statistics

STAT REPORT

Reads and reports stat data from a perf data file.

-i file, --input file Input file name.

--per-socket Aggregate counts per processor socket for system-wide mode measurements.

--per-die Aggregate counts per processor die for system-wide mode measurements.

--per-core Aggregate counts per physical processor for system-wide mode measurements.
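As a rough sketch of the report workflow, counts can be recorded once and then reported with different aggregation modes. The function name, the file name stat.data and the sleep 1 workload are placeholders chosen for this example, not part of perf. System-wide recording (-a) normally needs root rights or a permissive perf_event_paranoid setting.

```shell
# Hypothetical helper: record system-wide counts, then report them twice.
# "stat.data" and "sleep 1" are placeholders chosen for illustration.
stat_record_and_report() {
    data=${1:-stat.data}
    # -a records system-wide, which the per-socket/per-core modes expect.
    perf stat record -a -o "$data" -- sleep 1 || return 1
    # The same recorded data can be aggregated in different ways.
    perf stat report -i "$data" --per-socket
    perf stat report -i "$data" --per-core
}
```

Calling stat_record_and_report with no argument writes to stat.data in the current directory. Passing a path as the first argument uses that file instead.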

-M, --metrics Print metrics or metricgroups specified in a comma-separated list. For a group, all metrics from the group are added. The events from the metrics are automatically measured. See perf list output for the possible metrics and metricgroups.

-A, --no-aggr Do not aggregate counts across all monitored CPUs.

--topdown Print complete top-down metrics supported by the CPU. This helps identify bottlenecks in the CPU pipeline for CPU-bound workloads by breaking the cycles consumed down into frontend bound, backend bound, bad speculation and retiring.

Frontend bound means that the CPU cannot fetch and decode instructions fast enough. Backend bound means that computation or memory access is the bottleneck. Bad speculation means that the CPU wasted cycles due to branch mispredictions and similar issues. Retiring means that the CPU computed without an apparent bottleneck. The reported bottleneck is the real one only if the workload is actually bound by the CPU and not by something else.

For best results it is usually a good idea to use it with interval mode like -I 1000, as the bottleneck of workloads can change often.

This enables --metric-only, unless overridden with --no-metric-only.

The following restrictions only apply to older Intel CPUs and Atom; on newer CPUs (Ice Lake and later) TopDown can be collected for any thread:

The top-down metrics are collected per core instead of per CPU thread. Per-core mode is automatically enabled and -a (global monitoring) is needed, requiring root rights or kernel.perf_event_paranoid=-1.

Topdown uses the full Performance Monitoring Unit and, for best results, needs the NMI watchdog disabled (as root): echo 0 > /proc/sys/kernel/nmi_watchdog. Otherwise the bottlenecks may be inconsistent on workloads with changing phases.

To interpret the results it is usually necessary to know which CPUs the workload runs on. If needed, the CPUs can be forced using taskset.
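The advice above (interval mode, global monitoring, pinning with taskset) might be combined as in the following sketch. The function name and its arguments are hypothetical. The -a flag is included because older Intel CPUs and Atom require it; on Ice Lake and later it may be unnecessary.

```shell
# Hypothetical helper: run a command pinned to one CPU under --topdown.
# Usage: topdown_pinned CPU COMMAND [ARGS...]
topdown_pinned() {
    cpu=$1
    shift
    # -a: global monitoring, required on older Intel CPUs and Atom.
    # -I 1000: print the metrics every 1000 ms, since bottlenecks can shift.
    # taskset -c pins the workload so the interesting CPU is known.
    perf stat --topdown -a -I 1000 -- taskset -c "$cpu" "$@"
}
```

For example, topdown_pinned 0 ./workload would pin a program named ./workload (a placeholder) to CPU 0 and print top-down metrics once per second.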

--td-level Print the top-down statistics at or below the given level. This lets users print only the top-down metric levels of interest instead of the complete top-down metrics.

The availability of the top-down metrics level depends on the hardware. For example, Ice Lake only supports L1 top-down metrics, while Sapphire Rapids supports both L1 and L2 top-down metrics.

Default: 0, which means the maximum level that the current hardware supports. An error is reported if the input is higher than the supported maximum level.

--no-merge Do not merge results from the same PMUs.

When multiple events are created from a single event specification, stat will, by default, aggregate the event counts and show the result in a single row. This option disables that behavior and shows the individual events and counts.

Multiple events are created from a single event specification when:

1. Prefix or glob matching is used for the PMU name.

2. Aliases, which are listed immediately after the Kernel PMU events by perf list, are used.
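A minimal sketch of the difference, assuming a machine whose memory controllers are exposed as several PMUs named uncore_imc_0, uncore_imc_1 and so on. The event uncore_imc/cas_count_read/ is an assumption that only holds on some Intel server CPUs, and the function name is hypothetical. Pick an event that perf list shows on the system at hand.

```shell
# Hypothetical helper: count one event with and without merging.
# The default event is an assumption; it exists only on some Intel servers.
compare_no_merge() {
    event=${1:-uncore_imc/cas_count_read/}
    # Default: "uncore_imc" prefix-matches every uncore_imc_N PMU and the
    # counts are summed into a single row.
    perf stat -a -e "$event" -- sleep 1
    # --no-merge: each matching PMU gets its own row.
    perf stat -a -e "$event" --no-merge -- sleep 1
}
```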

--smi-cost Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

During the measurement, /sys/devices/cpu/freeze_on_smi is set so that core counters freeze on SMI. The aperf counter is not affected by the setting. The cost of SMI can be measured as (aperf - unhalted core cycles).

In practice, the percentage of SMI cycles is very useful for performance-oriented analysis. --metric-only is applied by default. The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf.

Users who want the actual values can apply --no-metric-only.
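To make the formula concrete, the percentage can be recomputed by hand from the raw counts that --no-metric-only prints. The helper below is a hypothetical sketch, not part of perf. It takes the aperf count and the unhalted core cycle count as plain numbers.

```shell
# Hypothetical helper: SMI cycles% = (aperf - unhalted core cycles) / aperf.
# Usage: smi_cycles_pct APERF UNHALTED_CYCLES
smi_cycles_pct() {
    awk -v aperf="$1" -v cycles="$2" 'BEGIN {
        # Guard against division by zero when no aperf count was read.
        if (aperf <= 0) exit 1
        printf "%.2f\n", 100 * (aperf - cycles) / aperf
    }'
}

# Made-up counts for illustration: 100000 of 2000000 cycles spent in SMI.
smi_cycles_pct 2000000 1900000   # prints 5.00
```

The two input numbers would come from a run such as perf stat -a --smi-cost --no-metric-only -- sleep 1, where sleep 1 is a placeholder workload.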

--all-kernel Configure all used events to run in kernel space.

--all-user Configure all used events to run in user space.

--percore-show-thread The event modifier "percore" sums up the event counts for all hardware threads in a core and shows the counts per core.

With the "percore" event modifier enabled, this option also sums up the event counts for all hardware threads in a core, but shows the summed counts per hardware thread. This is essentially a replacement for the any bit and is convenient for post-processing.
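The two output shapes can be compared side by side with a sketch like the one below. The event string cpu/event=cpu-cycles,percore/ is an assumption about how the modifier is spelled on a typical x86 system, and the function name is hypothetical.

```shell
# Hypothetical helper: compare per-core rows with per-thread rows.
# The event string is an assumption; adjust it to the local PMU.
percore_demo() {
    # percore alone: one row per core, holding the sum over its threads.
    perf stat -a -A -e cpu/event=cpu-cycles,percore/ -- sleep 1
    # With --percore-show-thread: one row per hardware thread, each
    # carrying the sum for the core that thread belongs to.
    perf stat -a -A -e cpu/event=cpu-cycles,percore/ --percore-show-thread -- sleep 1
}
```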

--summary Print summary for interval mode (-I).

--no-csv-summary Don't print "summary" in the first column of CSV summary output. This option must be used with -x and --summary.

This option can be enabled in perf config by setting the variable stat.no-csv-summary.

$ perf config stat.no-csv-summary=true

Original text at man7.org
