Advanced System and Process Monitor
OUTPUT DESCRIPTION - SYSTEM LEVEL
The system level information consists of the following output
lines:
PRC
Process and thread level totals.
This line contains the total cpu time consumed in system
mode (`sys') and in user mode (`user'), the total number of
processes present at this moment (`#proc'), the total number
of threads present at this moment in state `running'
(`#trun'), `sleeping interruptible' (`#tslpi') and `sleeping
uninterruptible' (`#tslpu'), the number of zombie processes
(`#zombie'), the number of clone system calls (`clones'),
and the number of processes that ended during the interval
(`#exit') when process accounting is used. Instead of
`#exit' the last column may indicate that process accounting
could not be activated (`no procacct').
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
CPU
CPU utilization.
At least one line is shown for the total occupation of all
CPUs together.
In case of a multi-processor system, an additional line is
shown for every individual processor (with `cpu' in lower
case), sorted on activity. Inactive CPUs will not be shown
by default.  The lines showing the per-cpu occupation combine
the cpu number with the wait percentage in the first field.
Every line contains the percentage of cpu time spent in
kernel mode by all active processes (`sys'), the percentage
of cpu time consumed in user mode (`user') for all active
processes (including processes running with a nice value
larger than zero), the percentage of cpu time spent for
interrupt handling (`irq') including softirq, the percentage
of unused cpu time while no processes were waiting for disk
I/O (`idle'), and the percentage of unused cpu time while at
least one process was waiting for disk I/O (`wait').
For the per-cpu lines, this first field thus shows the cpu
number and the wait percentage (`w') for that cpu.  The number
of lines showing the per-cpu occupation can be limited.
For virtual machines, the steal-percentage (`steal') shows
the percentage of cpu time stolen by other virtual machines
running on the same hardware.
For physical machines hosting one or more virtual machines,
the guest-percentage (`guest') shows the percentage of cpu
time used by the virtual machines. Notice that this
percentage overlaps the user-percentage!
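Because the guest time is included in the user time, the cpu
time spent by non-guest processes in user mode can only be
derived by subtraction.  A minimal sketch based on the
documented /proc/stat field layout (an illustration only, not
pcp-atop's internal code, which uses PCP metrics):

    # first line of /proc/stat: "cpu  user nice system idle
    #     iowait irq softirq steal guest guest_nice" (in USER_HZ)
    with open('/proc/stat') as f:
        fields = [int(v) for v in f.readline().split()[1:]]
    user, nice = fields[0], fields[1]
    guest, guest_nice = fields[8], fields[9]
    # guest time is already accounted within user time (and
    # guest_nice within nice); subtract to isolate non-guest time
    host_user = (user - guest) + (nice - guest_nice)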
When performance monitoring counters (PMCs) are supported by
the CPU and the kernel (and pmdaperfevent(1) runs with root
privileges), the number of instructions per CPU cycle
(`ipc') is shown.  The first sample always shows the value
`initial', because the counters are only activated at the
moment that pcp-atop is started.
When the CPU busy percentage is high and the IPC is less
than 1.0, it is likely that the CPU is frequently waiting
for memory access during instruction execution (larger CPU
caches or faster memory might be helpful to improve
performance). When the CPU busy percentage is high and the
IPC is greater than 1.0, it is likely that the CPU is
instruction-bound (more/faster cores might be helpful to
improve performance).
Furthermore, the effective number of cycles (`cycl') is shown
per CPU.  This value can reach the current CPU frequency if
that CPU is 100% busy.  When an idle CPU is halted, the
number of effective cycles can be (considerably) lower than
the current frequency.
Notice that the CPU line shows the average instructions per
cycle and the average number of cycles over all CPUs.
See also:
http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html
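The rule of thumb above can be summarized in a short sketch;
the inputs are assumed to be taken from a CPU line of one
sample, and the 80% threshold for `high' is an illustrative
choice, not a value used by pcp-atop:

    def classify_cpu(busy_pct, ipc):
        # heuristic interpretation of one CPU sample (see text above)
        if busy_pct < 80:            # illustrative 'high busy' threshold
            return 'CPU not saturated'
        if ipc < 1.0:
            # many stalled cycles: probably waiting for memory access
            return 'likely memory-bound (bigger caches/faster memory may help)'
        return 'likely instruction-bound (more/faster cores may help)'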
In case of frequency scaling, all previously mentioned CPU
percentages are relative to the used scaling of the CPU
during the interval.  If a CPU was, for example, active in
user mode for 50% of the interval while the frequency scaling
of that CPU was 40%, only 20% of the full capacity of the
CPU was used in user mode.
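Expressed as a formula, the utilization relative to the full
capacity is the measured percentage multiplied by the scaling
percentage; a one-line sketch:

    def full_capacity_pct(measured_pct, scaling_pct):
        # e.g. 50% user mode at 40% frequency scaling -> 20.0
        return measured_pct * scaling_pct / 100.0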
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
CPL
CPU load information.
This line contains the load average figures reflecting the
number of threads that are available to run on a CPU (i.e.
part of the runqueue) or that are waiting for disk I/O.
These figures are averaged over 1 (`avg1'), 5 (`avg5') and
15 (`avg15') minutes.
Furthermore the number of context switches (`csw'), the
number of serviced interrupts (`intr') and the number of
available CPUs are shown.
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
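pcp-atop obtains these figures through PCP metrics; on Linux
the same load averages originate from /proc/loadavg, which can
be read directly as in this sketch:

    # /proc/loadavg: "avg1 avg5 avg15 runnable/total last_pid"
    with open('/proc/loadavg') as f:
        parts = f.read().split()
    avg1, avg5, avg15 = (float(p) for p in parts[:3])
    runnable, total = (int(n) for n in parts[3].split('/'))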
GPU
GPU utilization (Nvidia).
Read the section GPU STATISTICS GATHERING in this document
for details about the activation of the pmdanvidia daemon.
In the first column of every line, the bus-id (last nine
characters) and the GPU number are shown. The subsequent
columns show the percentage of time that one or more kernels
were executing on the GPU (`gpubusy'), the percentage of
time that global (device) memory was being read or written
(`membusy'), the occupation percentage of memory (`memocc'),
the total memory (`total'), the memory being in use at the
moment of the sample (`used'), the average memory being in
use during the sample time (`usavg'), the number of
processes being active on the GPU at the moment of the
sample (`#proc'), and the type of GPU.
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
The number of lines showing the GPUs can be limited.
MEM
Memory occupation.
This line contains the total amount of physical memory
(`tot'), the amount of memory which is currently free
(`free'), the amount of memory in use as page cache
including the total resident shared memory (`cache'), the
amount of memory within the page cache that has to be
flushed to disk (`dirty'), the amount of memory used for
filesystem meta data (`buff'), the amount of memory being
used for kernel mallocs (`slab'), the amount of slab memory
that is reclaimable (`slrec'), the resident size of shared
memory including tmpfs (`shmem'), the resident size of
shared memory (`shrss'), the amount of shared memory that is
currently swapped (`shswp'), the amount of memory that is
currently claimed by VMware's balloon driver (`vmbal'), the
amount of memory that is currently claimed by the ARC
(cache) of ZFSonLinux (`zfarc'), the amount of memory that
is claimed for huge pages (`hptot'), and the amount of huge
page memory that is really in use (`hpuse').
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
SWP
Swap occupation and overcommit info.
This line contains the total amount of swap space on disk
(`tot'), the amount of free swap space (`free'), the size
of the swap cache (`swcac'), the total size of compressed
storage in zswap (`zpool'), the total size of the compressed
pages stored in zswap (`zstor'), the total size of the
memory used for KSM (`ksuse', i.e. shared), and the total
size of the memory saved (deduped) by KSM (`kssav', i.e.
sharing).
Furthermore the committed virtual memory space (`vmcom') and
the maximum limit of the committed space (`vmlim', which is
by default swap size plus 50% of memory size) are shown.  The
committed space is the reserved virtual space for all
allocations of private memory space for processes. The
kernel only verifies whether the committed space exceeds the
limit if strict overcommit handling is configured
(vm.overcommit_memory is 2).
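The limit corresponds to the kernel's CommitLimit value in
/proc/meminfo, derived from the swap size and the
vm.overcommit_ratio sysctl (50 by default, hence the 50%
mentioned above).  A simplified sketch that ignores
refinements such as huge page reservations:

    def vmlim_kb(swap_kb, ram_kb, overcommit_ratio=50):
        # vmlim = swap size + memory size * vm.overcommit_ratio / 100
        return swap_kb + ram_kb * overcommit_ratio // 100

    # 4 GiB swap and 8 GiB RAM: 4 GiB + 50% of 8 GiB = 8 GiB
    assert vmlim_kb(4 * 1024**2, 8 * 1024**2) == 8 * 1024**2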
PAG
Paging frequency.
This line contains the number of pages scanned (`scan')
because free memory dropped below a particular threshold,
and the number of times that the kernel tried to reclaim
pages due to an urgent need (`stall').
Also shown are the number of memory pages the system read
from swap space (`swin'), the number of memory pages the
system wrote to swap space (`swout'), and the number of OOM
(out-of-memory) kills (`oomkill').
PSI
Pressure Stall Information.
This line contains percentages about resource pressure
related to CPU, memory and I/O.  Certain percentages refer to
`some', meaning that some processes/threads were delayed due
to resource overload.  Other percentages refer to `full',
meaning a loss of overall throughput due to resource
overload.
The values `cpusome', `memsome', `memfull', `iosome' and
`iofull' show the pressure percentage during the entire
interval.
The values `cs' (cpu some), `ms' (memory some), `mf' (memory
full), `is' (I/O some) and `if' (I/O full) each show three
percentages separated by slashes: pressure percentage over
the last 10, 60 and 300 seconds.
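On Linux these figures originate from the PSI files under
/proc/pressure.  A minimal sketch extracting the same
10/60/300-second averages for one resource (error handling
omitted; recent kernels also expose a `full' line for cpu):

    def read_psi(resource):              # 'cpu', 'memory' or 'io'
        # each line looks like:
        # "some avg10=1.23 avg60=0.87 avg300=0.30 total=12345"
        result = {}
        with open('/proc/pressure/' + resource) as f:
            for line in f:
                kind, *pairs = line.split()
                avgs = dict(p.split('=') for p in pairs)
                result[kind] = tuple(float(avgs['avg%d' % s])
                                     for s in (10, 60, 300))
        return result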
LVM/MDD/DSK
Logical volume/multiple device/disk utilization.
One line is produced per active unit, sorted on unit
activity.  Each line shows the name (e.g. VolGroup00-lvtmp
for a logical volume or sda for a hard disk), the busy
percentage i.e. the portion of time that the unit was busy
handling requests (`busy'), the number of read requests
issued (`read'), the number of write requests issued
(`write'), the number of KiBytes per read (`KiB/r'), the
number of KiBytes per write (`KiB/w'), the number of MiBytes
per second throughput for reads (`MBr/s'), the number of
MiBytes per second throughput for writes (`MBw/s'), the
average queue depth (`avq') and the average number of
milliseconds needed by a request (`avio') for seek, latency
and data transfer.
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
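The throughput columns follow directly from the request
counters and the average request size over the interval;
roughly (a sketch of the relation, not pcp-atop's exact
computation):

    def mb_per_second(requests, kib_per_request, interval_s):
        # MBr/s ~= read * KiB/r / 1024 / interval (analogous for writes)
        return requests * kib_per_request / 1024.0 / interval_s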
The number of lines showing the units can be limited per
class (LVM, MDD or DSK) with the `l' key or statically (see
the separate man page pcp-atoprc(5)).  By specifying the
value 0 for a particular class, no lines will be shown for
that class anymore.
NFM
Network Filesystem (NFS) mount at the client side.
For each NFS-mounted filesystem, a line is shown that
contains the mounted server directory, the name of the
server (`srv'), the total number of bytes physically read
from the server (`read') and the total number of bytes
physically written to the server (`write').  Data transfer
is subdivided into the number of bytes read via normal read()
system calls (`nread'), the number of bytes written via
normal write() system calls (`nwrit'), the number of bytes
read via direct I/O (`dread'), the number of bytes written
via direct I/O (`dwrit'), the number of bytes read via
memory mapped I/O pages (`mread'), and the number of bytes
written via memory mapped I/O pages (`mwrit').
NFC
Network Filesystem (NFS) client side counters.
This line contains the number of RPC calls issued by local
processes (`rpc'), the number of read RPC calls (`read') and
write RPC calls (`write') issued to the NFS server, the
number of RPC calls being retransmitted (`retxmit') and the
number of authorization refreshes (`autref').
NFS
Network Filesystem (NFS) server side counters.
This line contains the number of RPC calls received from NFS
clients (`rpc'), the number of read RPC calls received
(`cread'), the number of write RPC calls received (`cwrit'),
the number of Megabytes/second returned to read requests by
clients (`MBcr/s'), the number of Megabytes/second passed in
write requests by clients (`MBcw/s'), the number of network
requests handled via TCP (`nettcp'), the number of network
requests handled via UDP (`netudp'), the number of reply
cache hits (`rchits'), the number of reply cache misses
(`rcmiss') and the number of uncached requests (`rcnoca').
Furthermore some error counters are shown, indicating the
number of requests with a bad format (`badfmt') or a bad
authorization (`badaut'), as well as a counter indicating
the number of bad clients (`badcln').
NET
Network utilization (TCP/IP).
One line is shown for activity of the transport layer (TCP
and UDP), one line for the IP layer and one line per active
interface.
For the transport layer, counters are shown concerning the
number of received TCP segments including those received in
error (`tcpi'), the number of transmitted TCP segments
excluding those containing only retransmitted octets
(`tcpo'), the number of UDP datagrams received (`udpi'), the
number of UDP datagrams transmitted (`udpo'), the number of
active TCP opens (`tcpao'), the number of passive TCP opens
(`tcppo'), the number of TCP output retransmissions
(`tcprs'), the number of TCP input errors (`tcpie'), the
number of TCP output resets (`tcpor'), the number of UDP no
ports (`udpnp'), and the number of UDP input errors
(`udpie').
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
These counters are related to IPv4 and IPv6 combined.
For the IP layer, counters are shown concerning the number
of IP datagrams received from interfaces, including those
received in error (`ipi'), the number of IP datagrams that
local higher-layer protocols offered for transmission
(`ipo'), the number of received IP datagrams which were
forwarded to other interfaces (`ipfrw'), the number of IP
datagrams which were delivered to local higher-layer
protocols (`deliv'), the number of received ICMP datagrams
(`icmpi'), and the number of transmitted ICMP datagrams
(`icmpo').
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
These counters are related to IPv4 and IPv6 combined.
For every active network interface one line is shown, sorted
on interface activity.  Each line shows the name of the
interface and its busy percentage in the first column.  The
busy percentage for half duplex is determined by comparing
the interface speed with the number of bits transmitted and
received per second; for full duplex the interface speed is
compared with the highest of either the transmitted or the
received bits.  When the interface speed cannot be
determined (e.g. for the loopback interface), `---' is shown
instead of the percentage.
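A sketch of that busy-percentage calculation, with the bit
rates and the link speed assumed to be measured over the
interval:

    def iface_busy_pct(rx_bps, tx_bps, speed_bps, full_duplex):
        if not speed_bps:                # speed unknown: shown as '---'
            return None
        if full_duplex:
            # each direction has the full line speed available
            return 100.0 * max(rx_bps, tx_bps) / speed_bps
        # half duplex: both directions share the medium
        return 100.0 * (rx_bps + tx_bps) / speed_bps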
Furthermore the number of received packets (`pcki'), the
number of transmitted packets (`pcko'), the line speed of
the interface (`sp'), the effective amount of bits received
per second (`si'), the effective amount of bits transmitted
per second (`so'), the number of collisions (`coll'), the
number of received multicast packets (`mlti'), the number of
errors while receiving a packet (`erri'), the number of
errors while transmitting a packet (`erro'), the number of
received packets dropped (`drpi'), and the number of
transmitted packets dropped (`drpo').
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
The number of lines showing the network interfaces can be
limited.
IFB
Infiniband utilization.
For every active Infiniband port one line is shown, sorted
on activity.  Each line shows the name of the port and its
busy percentage in the first column.  The busy percentage is
determined by taking the highest of either the transmitted
or the received bits during the interval, multiplying that
value by the number of lanes and comparing it against the
maximum port speed.
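A sketch of that calculation, assuming the bit rates are
counted per lane over the interval:

    def ib_busy_pct(rx_bps, tx_bps, lanes, port_speed_bps):
        # highest of received/transmitted bits per second, multiplied
        # by the number of lanes, relative to the maximum port speed
        return 100.0 * max(rx_bps, tx_bps) * lanes / port_speed_bps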
Furthermore the number of received packets divided by the
number of lanes (`pcki'), the number of transmitted packets
divided by the number of lanes (`pcko'), the maximum line
speed (`sp'), the effective amount of bits received per
second (`si'), the effective amount of bits transmitted per
second (`so'), and the number of lanes (`lanes').
If the screen-width does not allow all of these counters,
only a relevant subset is shown.
The number of lines showing the Infiniband ports can be
limited.