Linux Manual Guide




pmie(1)

inference engine for performance metrics

Examples

The following example expressions demonstrate some of the capabilities of the inference engine.

The directory $PCP_DEMOS_DIR/pmie contains a number of other annotated examples of pmie expressions.

The variable delta controls expression evaluation frequency. Specify that subsequent expressions be evaluated once a second, until further notice:

delta = 1 sec;

If the total context switch rate exceeds 10000 per second per CPU, then display an alarm notifier:

kernel.all.pswitch / hinv.ncpu > 10000 count/sec -> alarm "high context switch rate %v";

If the high context switch rate is sustained for 10 consecutive samples, then launch top(1) in an xterm(1) window to monitor processes, but do this at most once every 5 minutes:

all_sample ( kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu ) -> shell 5 min "xterm -e 'top'";
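The sliding-window behaviour of `all_sample` over `@0..9` can be sketched in Python. This is a minimal model built on stated assumptions, not pmie's implementation. The function name `all_sample_true_at` is hypothetical. The input is assumed to be a list of successive context switch rates, one per 1-second evaluation. The model reports the predicate as false until ten samples have been seen.

```python
from collections import deque


def all_sample_true_at(rates, ncpu, per_cpu_limit=10_000, window=10):
    """Return the sample indices at which the predicate is true.

    Hypothetical model of
        all_sample ( kernel.all.pswitch @0..9 > 10 Kcount/sec * hinv.ncpu )
    where `rates` holds successive context switch rates (count/sec).
    """
    recent = deque(maxlen=window)  # the @0..@9 samples; the oldest drops out first
    true_at = []
    for t, rate in enumerate(rates):
        recent.append(rate)
        # True only when every one of the last `window` samples is over the limit.
        if len(recent) == window and all(r > per_cpu_limit * ncpu for r in recent):
            true_at.append(t)
    return true_at
```

On a 2-CPU machine, a single quiet sample at index 9 resets the count. The next index at which all ten samples exceed 20000 is 19. The 5-minute holdoff on the `shell` action is a separate mechanism and is not modelled here.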

The following rules are evaluated once every 20 seconds:

delta = 20 sec;

If any disk is performing more than 60 I/Os per second, then print a message identifying the busy disk to standard output and launch dkvis(1):

some_inst ( disk.dev.total > 60 count/sec ) -> print "busy disks:" " %i" & shell 5 min "dkvis";
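As a rough Python sketch, and not a description of pmie internals, `some_inst` acts as an existential test over the disk instances, and `%i` names the instances that satisfied it. The sketch assumes that the `" %i"` argument is repeated once per matching instance, which is what splitting the message into two strings suggests. The function `busy_disk_message` and its dictionary input are hypothetical.

```python
def busy_disk_message(io_rates, threshold=60.0):
    """Hypothetical model of some_inst ( disk.dev.total > 60 count/sec ).

    io_rates maps a disk instance name to its I/O rate in count/sec.
    Returns (predicate, message). The predicate is true when at least one
    instance exceeds the threshold. The message mimics the action arguments
    "busy disks:" " %i", with the second argument repeated per busy disk.
    """
    busy = [name for name, rate in io_rates.items() if rate > threshold]
    if not busy:
        return False, None
    return True, "busy disks:" + "".join(f" {name}" for name in busy)
```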

Refine the preceding rule to apply only between the hours of 9am and 5pm, and to require 3 of 4 consecutive samples to exceed the threshold before executing the action:

$hour >= 9 && $hour <= 17 && some_inst ( 75 %_sample ( disk.dev.total @0..3 > 60 count/sec ) ) -> print "disks busy for 20 sec:" " [%h]%i";
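The "3 of 4 samples" requirement can be expressed as a Python model of the percentage quantifier. This is a minimal sketch that assumes `75 %_sample` means "at least 75% of the samples @0..3". It also assumes that `hour` stands for whatever value the `$hour` macro expands to. The name `busy_3_of_4` and the input layout are hypothetical.

```python
def busy_3_of_4(history, hour, threshold=60.0, pct=75):
    """Hypothetical model of the refined busy-disk rule.

    history maps a disk instance to its recent I/O rates, most recent
    first (@0, @1, @2, @3). Returns the instances that satisfy the rule.
    """
    if not (9 <= hour <= 17):  # mirrors $hour >= 9 && $hour <= 17
        return []
    hits = []
    for name, samples in history.items():
        window = samples[:4]  # @0..3
        above = sum(1 for s in window if s > threshold)
        # At least pct percent of the samples must exceed the threshold.
        if window and 100 * above >= pct * len(window):
            hits.append(name)
    return hits
```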

The following two rules are evaluated once every 10 minutes:

delta = 10 min;

If either the / or the /usr filesystem is more than 95% full, display an alarm popup, but not if it has already been displayed during the last 4 hours:

filesys.free #'/dev/root' / filesys.capacity #'/dev/root' < 0.05 -> alarm 4 hour "root filesystem (almost) full";

filesys.free #'/dev/usr' / filesys.capacity #'/dev/usr' < 0.05 -> alarm 4 hour "/usr filesystem (almost) full";
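The interaction between the predicate and the 4-hour holdoff can be modelled in Python. This sketch rests on two assumptions: a hypothetical `alarm_times` function, and a holdoff that suppresses the action until at least four hours have passed since it last fired. It is not pmie's own scheduling code.

```python
def alarm_times(samples, limit=0.05, holdoff_sec=4 * 3600):
    """Return the evaluation times (seconds) at which the alarm pops up.

    samples is a list of (time_sec, free, capacity) tuples for one
    filesystem, one per 10-minute evaluation.
    """
    fired = []
    for t, free, capacity in samples:
        nearly_full = free / capacity < limit  # under 5% free, i.e. over 95% full
        if nearly_full and (not fired or t - fired[-1] >= holdoff_sec):
            fired.append(t)
    return fired
```

If a filesystem stays 99% full for five hours, this model pops up the alarm twice: once at the first evaluation and again four hours later.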

The following rule requires a machine that supports the lmsensors metrics. If the machine environment temperature rises by more than 2 degrees over a 10-minute interval, display an alarm and write an entry in the system log:

lmsensors.coretemp_isa.temp1 @0 - lmsensors.coretemp_isa.temp1 @1 > 2 -> alarm "temperature rising fast" & syslog "machine room temperature rise alarm";
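With `delta = 10 min` still in effect, `@0` and `@1` are consecutive readings ten minutes apart. The following minimal Python sketch of that comparison uses a hypothetical function `rising_fast` and assumes the readings are supplied oldest first.

```python
def rising_fast(temps, limit=2.0):
    """Hypothetical model of temp1 @0 - temp1 @1 > 2 with delta = 10 min.

    temps holds one reading per 10-minute evaluation, oldest first. At
    evaluation i, @0 is temps[i] and @1 is temps[i - 1]. Returns the
    evaluation indices at which the rule would fire.
    """
    return [i for i in range(1, len(temps)) if temps[i] - temps[i - 1] > limit]
```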

And something interesting if you have performance problems with your Oracle database:

// back to 30sec evaluations
delta = 30 sec;
sid = "ptg1";    # $ORACLE_SID setting
lid = "223";     # latch ID from v$latch
lru = "#'$sid/$lid cache buffers lru chain'";
host = ":moomba.melbourne.sgi.com";
gets = "oracle.latch.gets $host $lru";
total = "oracle.latch.gets $host $lru + oracle.latch.misses $host $lru + oracle.latch.immisses $host $lru";

$total > 100 && $gets / $total < 0.2 -> alarm "high lru latch contention in database $sid";
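Once the macros are expanded, the rule reduces to a small piece of arithmetic, modelled below in Python. The function `lru_contention` is hypothetical. The sketch assumes that gets, misses and immediate misses arrive as rates, as pmie presumably delivers them for counter metrics.

```python
def lru_contention(gets, misses, immisses, min_total=100, max_ratio=0.2):
    """Hypothetical model of $total > 100 && $gets / $total < 0.2.

    total = gets + misses + immisses for the cache buffers LRU chain latch.
    """
    total = gets + misses + immisses
    # The total > 100 guard is evaluated first, so a near-idle latch cannot
    # trigger the alarm and gets / total is never computed for total == 0.
    return total > min_total and gets / total < max_ratio
```

The `$total > 100` clause plays the same role as the guard in the model. Without it, a latch with almost no activity could produce a low, noisy ratio and a spurious alarm.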

The following ruleset will emit exactly one message depending on the availability and value of the 1-minute load average.

delta = 1 minute;
ruleset
    kernel.all.load #'1 minute' > 10 * hinv.ncpu ->
        print "extreme load average %v"
else kernel.all.load #'1 minute' > 2 * hinv.ncpu ->
        print "moderate load average %v"
unknown ->
        print "load average unavailable"
otherwise ->
        print "load average OK"
;
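The "exactly one message" behaviour resembles an if/elif chain, sketched in Python below. This is a simplified model, not pmie's evaluator. The function name `load_ruleset` is hypothetical. An unavailable metric is represented as `None`, which makes every predicate unknown. The message formatting only approximates `%v`.

```python
def load_ruleset(load1, ncpu):
    """Hypothetical model of the load-average ruleset.

    Rules are tried in order and only the first true one fires. If the
    metric is unavailable (None), the unknown branch is taken. If no
    rule is true, the otherwise branch is taken.
    """
    if load1 is None or ncpu is None:
        return "load average unavailable"
    if load1 > 10 * ncpu:
        return f"extreme load average {load1}"
    if load1 > 2 * ncpu:
        return f"moderate load average {load1}"
    return "load average OK"
```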

The following rule will emit a message when any filesystem is more than 75% full and is filling at a rate that, if sustained, would fill the filesystem to 100% in less than 30 minutes.

some_inst ( 100 * filesys.used / filesys.capacity > 75 && filesys.used + 30min * (rate filesys.used) > filesys.capacity ) -> print "filesystem will be full within 30 mins:" " %i";
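For a single filesystem instance, the rule performs a linear projection, modelled below in Python. This is a minimal sketch with a hypothetical function name. It assumes that `used` and `capacity` share a unit and that `rate_per_sec` is the growth of `used` per second, as `rate filesys.used` would report it.

```python
def will_fill_soon(used, capacity, rate_per_sec, horizon_sec=30 * 60, pct=75):
    """Hypothetical single-instance model of the filesystem-filling rule."""
    over_threshold = 100 * used / capacity > pct
    # Linear extrapolation: where `used` would be after horizon_sec at the current rate.
    projected = used + horizon_sec * rate_per_sec
    return over_threshold and projected > capacity
```

For example, a 1,000,000-unit filesystem with 800,000 units used (80%) and growing at 200 units per second projects to 1,160,000 units in 30 minutes. That exceeds the capacity, so the rule fires. At 50 units per second the projection is only 890,000 units, so the rule does not fire.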

If the metric mypmda.errors counts errors, then the following rule will emit a message if the rate of errors exceeds 1 per second, provided the raw error count (the counter value taken with instant) is less than 100.

mypmda.errors > 1 && instant mypmda.errors < 100 -> print "high error rate: %v";
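The difference between the bare counter and `instant` can be modelled with two consecutive samples. pmie converts counter metrics to rates automatically, which is why `kernel.all.pswitch` can be compared with `count/sec` in an earlier example. `instant` instead yields the raw value. The Python function below is a hypothetical sketch of that distinction, not pmie code.

```python
def error_rule(prev_count, curr_count, interval_sec):
    """Hypothetical model of mypmda.errors > 1 && instant mypmda.errors < 100.

    The bare metric is compared as a rate (count/sec) between two samples
    taken interval_sec apart; `instant` uses the latest raw counter value.
    """
    rate = (curr_count - prev_count) / interval_sec
    return rate > 1 and curr_count < 100
```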