Introducing bpftune for lightweight, always-on auto-tuning of system behaviour
TCP rmem, wmem, congestion control, oh my!
The Linux kernel contains more than 1,500 tunables – and setting these parameters correctly can significantly improve system performance and utilization! For years, we’ve tried to provide the right suggestions for these tunables, via software release notes and improved default values, but many system loads will benefit from dynamic tuning of these values.
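To make "tunable" concrete: on Linux, sysctl parameters are exposed as files under /proc/sys, so a dotted name such as net.ipv4.tcp_rmem maps to a path in that tree. The following is a minimal sketch of reading one and counting how many files the tree exposes. The helper names read_sysctl and count_tunables are made up for this illustration and are not part of bpftune. The count will vary by kernel version and configuration.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the raw value of a sysctl given its dotted name.

    Assumes no path component itself contains a dot (some interface
    names do), which is good enough for a sketch.
    """
    path = Path(root, *name.split("."))
    return path.read_text().strip()


def count_tunables(root: str = "/proc/sys") -> int:
    """Count the regular files exposed under the sysctl tree."""
    return sum(1 for p in Path(root).rglob("*") if p.is_file())
```

On a Linux system, read_sysctl("net.ipv4.tcp_rmem") returns the three whitespace-separated values (minimum, default and maximum receive buffer size in bytes), and count_tunables() gives a rough sense of how many knobs that machine exposes.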
Introducing bpftune, an automatic configurator that monitors your workloads and sets the correct kernel parameter values! bpftune is an open source project available via dnf install in the Oracle Linux ol_developer repos, and at https://github.com/oracle-samples/bpftune.
bpftune aims to provide lightweight, always-on auto-tuning of system behaviour. The key benefits it provides are:
Continuously monitoring and adjusting system behaviour using BPF (Berkeley Packet Filter) observability features.
Tuning system behaviour at a fine-grained level, which is possible because BPF lets us observe more details of system state.
bpftune currently focuses on some of the most common issues with tunables we have run into at Oracle, and its pluggable infrastructure is open to contributions. We hope you find it useful too!
What can bpftune tune?
Congestion tuner: auto-tune choice of congestion control algorithm. See bpftune-tcp-cong (8).
Neighbour table tuner: auto-tune neighbour table sizes by growing tables when approaching full. See bpftune-neigh (8).
Route table tuner: auto-tune route table size by growing tables when approaching full. See bpftune-route (8).
sysctl tuner: monitor sysctl settings and, if a change made outside bpftune collides with an auto-tuned sysctl value, disable the associated tuner. See bpftune-sysctl (8).
TCP buffer tuner: auto-tune max and initial buffer sizes. See bpftune-tcp-buffer (8).
net buffer tuner: auto-tune tunables related to core networking. See bpftune-net-buffer (8).
netns tuner: notices addition and removal of network namespaces, which helps power namespace awareness for bpftune as a whole. Namespace awareness is important as we want to be able to auto-tune containers also. See bpftune-netns (8).
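Two ideas from the list above lend themselves to a short sketch: growing a table limit when usage approaches it (the neighbour and route table tuners), and backing off when someone else sets the same tunable (the sysctl tuner). The Python below only illustrates that decision logic. The TableTuner class, the 90% threshold and the 25% growth step are all hypothetical choices for this example, not bpftune's actual API or values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableTuner:
    limit: int                      # current table size limit (for example, a gc_thresh3-style value)
    near_full: float = 0.9          # hypothetical: act once the table is 90% full
    growth: float = 1.25            # hypothetical: grow the limit by 25%
    enabled: bool = True
    last_set: Optional[int] = None  # last value this tuner wrote, if any

    def observe(self, entries: int) -> int:
        """Return the (possibly increased) limit after seeing `entries` in use."""
        if self.enabled and entries >= self.near_full * self.limit:
            self.limit = int(self.limit * self.growth)
            self.last_set = self.limit
        return self.limit

    def on_sysctl_write(self, new_value: int) -> None:
        """Back off if the tunable is set to a value this tuner did not write."""
        if new_value != self.last_set:
            self.limit = new_value
            self.enabled = False
```

With limit=1024, observing 1000 entries raises the limit to 1280. A later external write of, say, 4096 is adopted as the new limit and disables the tuner, so the externally chosen value is left alone from then on.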
The problem with tunables
Even as the number of sysctls in the kernel grows, individual systems get far less care and administrator attention than they used to; phrases like “cattle not pets” exemplify this. Given the modern cloud architectures used for most deployments, most systems never have any human administrator interaction after initial provisioning; in fact, given the scale requirements, this is often an explicit design goal: “no ssh’ing in!”.
These two observations are not unrelated; in an earlier era of fewer, larger systems, tuning by administrators was more feasible.
These trends, system complexity combined with minimal admin interaction, suggest a rethink of how tunables are managed.
A lot of lore accumulates around these tunables, and to help clarify why we developed bpftune, we will use a straw-man version of the approach taken with tunables:
“find the set of magic numbers that will work for the system forever”
This is obviously a caricature of how administrators approach the problem, but it does highlight a critical implicit assumption — that systems are static.
And that gets to the “BPF” in bpftune; BPF provides the means to carry out low-overhead observations of a system. So not only can we observe the system and tune appropriately, we can also observe the effect of that tuning and re-tune if necessary. This is a key feature of bpftune which we will return to.
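The observe, tune, observe-again cycle can be sketched as a small feedback step. This is only an illustration of the control flow in Python, under the assumption that a single "higher is better" metric (throughput, for instance) is available. The tune_with_feedback function and its parameters are hypothetical and do not reflect how bpftune's BPF programs are actually structured.

```python
from typing import Callable


def tune_with_feedback(
    measure: Callable[[], float],
    apply: Callable[[int], None],
    current: int,
    candidate: int,
) -> int:
    """Try `candidate`; keep it only if the measured metric does not get worse."""
    before = measure()      # observe the system under the current setting
    apply(candidate)        # tune
    after = measure()       # observe the effect of that tuning
    if after < before:      # higher is better in this sketch
        apply(current)      # re-tune: roll back a change that hurt
        return current
    return candidate
```

The point of the sketch is the second measurement: a static "magic number" approach stops after apply(), whereas a feedback-driven tuner checks the result and can undo or revise its own change.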