Hierarchical Fair Service Curve (HFSC)
LINUX AND TIMER RESOLUTION
In certain situations, the scheduler can throttle itself and set
up a so-called watchdog to wake up the dequeue function at some
later time. In the case of HFSC, this happens when, for example,
no packet is eligible for scheduling and a UL service curve is
used to limit the speed at which the LS criterion is allowed to
dequeue packets. This is called throttling, and its accuracy
depends on how the kernel is compiled.
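To make the effect of timer granularity concrete, here is a toy model (not kernel code) of a throttled dequeue. The watchdog is asked to fire when the next packet becomes eligible, but a purely tick-driven timer can only fire on a tick boundary. The function name is hypothetical, and the model assumes that ticks fall on exact multiples of 1/HZ counted from time zero.

```python
def watchdog_fire_ns(now_ns: int, eligible_ns: int, hz: int) -> int:
    """Toy model: when a tick-driven watchdog would actually wake the
    dequeue function, given when the next packet becomes eligible.

    Hypothetical helper; assumes ticks occur at exact multiples of
    1/hz seconds, counted from t = 0.
    """
    if eligible_ns <= now_ns:
        return now_ns  # packet is already eligible: no throttling needed
    period_ns = 1_000_000_000 // hz
    # Round the requested wakeup up to the next tick boundary.
    ticks = -(-eligible_ns // period_ns)  # ceiling division on integers
    return ticks * period_ns


if __name__ == "__main__":
    # A packet becomes eligible 0.8 ms from now (now = 0).
    for hz in (100, 1000):
        fire = watchdog_fire_ns(0, 800_000, hz)
        late_us = (fire - 800_000) // 1000
        print(f"HZ={hz}: wakeup at {fire // 1000} us, {late_us} us late")
```

Under these assumptions, a packet that is eligible after 0.8ms is released at 10ms with HZ=100 (9.2ms late) and at 1ms with HZ=1000 (0.2ms late).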
As far as timer resolution goes, there are three important
options in modern kernels: 'tickless system', 'high resolution
timer support' and 'timer frequency'.
If 'tickless system' is enabled, the timer interrupt will trigger
as infrequently as possible, but each time a scheduler throttles
itself (or any other part of the kernel needs better accuracy),
the rate will be increased as needed and as far as possible. The
ceiling is either the 'timer frequency', if 'high resolution
timer support' is not available or not compiled in, or it is
hardware dependent and can go far beyond the highest available
'timer frequency' setting.
If 'tickless system' is not enabled, the timer will trigger at a
fixed rate specified by 'timer frequency', regardless of whether
high resolution timers are available.
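The rules above can be summarized as a small decision function. This only restates the three options as this section describes them; it does not model the kernel's timer code. The function and parameter names are hypothetical, and `hw_max_hz` stands for whatever rate the timer hardware can sustain.

```python
def timer_rate_ceiling_hz(tickless: bool, hr_timers: bool, hz: int,
                          hw_max_hz: float) -> float:
    """Highest rate at which the timer can fire for a throttled qdisc,
    following the rules stated in this section (hypothetical helper)."""
    if not tickless:
        # Periodic tick: fixed at 'timer frequency', HR timers or not.
        return float(hz)
    if not hr_timers:
        # Tickless, but without HR timers the ceiling is 'timer frequency'.
        return float(hz)
    # Tickless with HR timers: limited by the hardware, not by HZ.
    return hw_max_hz


def throttle_granularity_ms(rate_hz: float) -> float:
    """Worst-case throttling resolution for a given timer rate."""
    return 1000.0 / rate_hz
```

For example, `throttle_granularity_ms(timer_rate_ceiling_hz(False, True, 100, 1e6))` gives 10.0. Without a tickless system, the HR timer flag does not change the outcome under these rules.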
It is important to keep these settings in mind: in a scenario
with no tickless system, no HR timers and the frequency set to
100Hz, throttling accuracy would be 10ms. That doesn't
automatically mean you would be limited to ~0.8Mbit/s (assuming
packets of ~1KB, i.e. one packet per 10ms tick), as long as your
queues are prepared to cover for the timer inaccuracy. Of course,
for e.g. locally generated UDP traffic, an appropriately sized
socket buffer is needed as well. A short example to make this
clearer (assume hardcore anti-scheduling settings: HZ=100, no HR
timers, no tickless):
tc qdisc add dev eth0 root handle 1:0 hfsc default 1
tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 10Mbit
Assuming packets of ~1KB and HZ=100, dequeuing one packet per
tick averages to ~0.8Mbit/s. Anything beyond that (e.g. the above
example, whose specified rate is over 10x larger) will require
appropriate queuing and will cause bursts every ~10ms. As you can
imagine, any of HFSC's RT guarantees will be seriously
invalidated by that. The aforementioned example mainly matters if
you deal with old hardware, as is particularly popular for home
server chores. Even then, you can easily set HZ=1000 and get very
accurate scheduling for typical ADSL speeds.
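The numbers in this example follow from simple per-tick arithmetic. The sketch below assumes 1KB = 1000 bytes and that, without queuing, at most one packet leaves per timer tick. The helper names are hypothetical.

```python
def one_packet_per_tick_bps(hz: int, packet_bytes: int) -> int:
    """Throughput (bit/s) if exactly one packet is dequeued per tick."""
    return hz * packet_bytes * 8


def bytes_per_tick(rate_bps: int, hz: int) -> int:
    """Bytes that must be released in one burst per tick to sustain
    rate_bps when the scheduler can only act once per tick."""
    return rate_bps // (8 * hz)


if __name__ == "__main__":
    for hz in (100, 1000):
        mbit = one_packet_per_tick_bps(hz, 1000) / 1e6
        burst = bytes_per_tick(10_000_000, hz)
        print(f"HZ={hz}: tick={1000 // hz} ms, "
              f"one 1KB packet/tick = {mbit} Mbit/s, "
              f"10Mbit curve needs {burst} bytes/tick")
```

At HZ=100, the 10Mbit class from the example has to release about 12500 bytes (a dozen 1KB packets) in each 10ms burst. At HZ=1000, a single 1KB packet per tick already sustains 8Mbit/s.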
Anything modern (APIC or even HPET MSI-based timers plus
'tickless system') will provide enough accuracy for superb 1Gbit
scheduling. For example, on one of my cheap dual-core AMD boards
I have the following settings:
tc qdisc add dev eth0 parent root handle 1:0 hfsc default 1
tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 300mbit
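And a simple test (run the first command on the sending host and
the second on dst.host.com; both sides must use UDP):
nc -u dst.host.com 54321 </dev/zero
nc -u -l -p 54321 >/dev/null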
...will yield the following effects over a period of ~10 seconds
(taken from /proc/interrupts):
319: 42124229 0 HPET_MSI-edge hpet2 (before)
319: 42436214 0 HPET_MSI-edge hpet2 (after 10s.)
That's roughly 31000 interrupts/s. Compare that with the 1000/s
of an HZ=1000 setting. The obvious drawback is that CPU load can
be rather high from servicing that many timer interrupts. The
example with a 300Mbit RT service curve on a 1Gbit link is
particularly ugly, as it requires a lot of throttling with
minuscule delays.
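Both figures in this paragraph can be checked with a little arithmetic. The sketch below derives the interrupt rate from the two /proc/interrupts samples above by summing the per-CPU counters. It also estimates how long the scheduler must wait between packets when a 300Mbit RT curve shapes traffic on a 1Gbit link. The 1500-byte packet size is an assumption (a typical Ethernet MTU), not something measured in the example, and the helper names are hypothetical.

```python
def irq_total(line: str) -> int:
    """Sum the per-CPU counters of one /proc/interrupts line,
    e.g. '319: 42124229 0 HPET_MSI-edge hpet2'."""
    total = 0
    for field in line.split()[1:]:  # skip the 'NNN:' IRQ label
        if not field.isdigit():
            break  # reached the chip / device name columns
        total += int(field)
    return total


def irq_rate(before: str, after: str, seconds: float) -> float:
    """Average interrupts per second between two samples."""
    return (irq_total(after) - irq_total(before)) / seconds


def throttle_gap_ns(packet_bytes: int, shaped_bps: int, link_bps: int) -> int:
    """Idle time between packets: the spacing imposed by the service
    curve minus the time the packet itself needs on the wire."""
    bits = packet_bytes * 8
    spacing_ns = bits * 1_000_000_000 // shaped_bps
    wire_ns = bits * 1_000_000_000 // link_bps
    return spacing_ns - wire_ns


if __name__ == "__main__":
    before = "319: 42124229 0 HPET_MSI-edge hpet2"
    after = "319: 42436214 0 HPET_MSI-edge hpet2"
    rate = irq_rate(before, after, 10.0)
    print(f"{rate:.0f} interrupts/s, {rate / 1000:.1f}x HZ=1000")
    print(f"gap per 1500-byte packet: "
          f"{throttle_gap_ns(1500, 300_000_000, 1_000_000_000)} ns")
```

With these assumptions, each packet is spaced 40 microseconds apart but needs only 12 on the wire. That leaves about 28 microseconds of forced idle time per packet, repeated roughly 25000 times per second.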
Also note that this is just an example showing the capabilities of
current hardware. The above example (essentially a 300Mbit TBF
emulator) is pointless on an internal interface to begin with:
you will pretty much always want a regular LS service curve
there, and in such a scenario HFSC simply doesn't throttle at
all.
300Mbit RT service curve (selected columns from mpstat -P ALL 1):
10:56:43 PM  CPU   %sys   %irq  %soft  %idle
10:56:44 PM  all  20.10   6.53  34.67  37.19
10:56:44 PM    0  35.00   0.00  63.00   0.00
10:56:44 PM    1   4.95  12.87   6.93  73.27
So, in the rare case that you need such speeds with only an RT
service curve, or with a UL service curve, remember the
drawbacks.