Loading vmlinux BTF is currently very expensive:

```
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 12th Gen Intel(R) Core(TM) i7-1260P
                │      -      │
                │   sec/op    │
ParseVmlinux-16   50.47m ± ∞ ¹
¹ need >= 6 samples for confidence interval at level 0.95

                │      -      │
                │     B/op    │
ParseVmlinux-16   38.46Mi ± ∞ ¹
¹ need >= 6 samples for confidence interval at level 0.95
```
So ~50ms and 40MiB at current master, and this is after doing a bunch of optimization work. For this reason we cache vmlinux BTF in a global variable and try to reuse the same parsed Spec as much as possible. Otherwise features requiring vmlinux BTF like kconfig or ksyms are unbearably slow. I've made several attempts at improving the situation:
Parsing is expensive, but the parsed representation is also pretty large. Go doesn't have great tools for this, but an educated guess is that parsed vmlinux takes about 25MiB of RSS. Users have of course noticed this, and we've added an escape hatch to flush the cached BTF by calling linux.FlushCaches. Even that causes problems and confusion however, as shown by #1063. Tetragon was also impacted and had to chase down copies of vmlinux.
I've come to the conclusion that we're using the wrong approach. Instead of parsing the whole BTF upfront we should only "inflate" types that are needed by the user via Spec.TypeByID(), Spec.TypeByName() and so on. The BTF wire format is already optimized for size, so we should refer to it as much as possible. @ti-mo suggested this a long time ago, but for reasons I can't remember I wasn't a fan. I think it's now time to do this, but getting there is going to take several refactors.
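For reference, here is the query side that lazy decoding needs to keep working. The calls below are the existing public API (LoadKernelSpec, TypeByName); only the inflation strategy behind them would change. A minimal sketch:

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf/btf"
)

func main() {
	// Today this call inflates all ~130k vmlinux types up front. With lazy
	// decoding, only the types the caller actually asks for would be inflated.
	spec, err := btf.LoadKernelSpec()
	if err != nil {
		panic(err)
	}

	// Pull out a single named type; under the lazy scheme this is the point
	// where task_struct (and the types it references) gets decoded.
	var task *btf.Struct
	if err := spec.TypeByName("task_struct", &task); err != nil {
		panic(err)
	}
	fmt.Println(task.TypeName())
}
```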
Split writing from reading BTF
Spec currently serves a dual purpose. You can use it to query for types you are interested in (the "vmlinux types" use case) and you can build BTF from scratch using NewSpec() and Spec.Add() (the "build map and program BTF" use case). Combining both use cases in a single type complicates the implementation. Crucially, it makes the changes I want to make for the vmlinux use case really difficult.
So as a first step, reading and writing BTF will be split into two separate types. Spec is used for the read case, and a new btf.Writer, btf.Builder, ... is used for writing. I've already prototyped this code and it's overall a net benefit. I'll propose the change even if we decide against the rest of the changes.
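To make the split concrete, here is a rough sketch of the write-side type, using Builder as one of the candidate names above. The field layout and the Marshal method are assumptions, not a final design:

```go
package btf

import "errors"

// Builder takes over the write-side role of NewSpec() + Spec.Add(): it
// accumulates types and eventually serializes them to wire-format BTF.
type Builder struct {
	types []Type
	// plus a deduplicated string table, a Type -> TypeID map, etc.
}

// Add appends a type and returns the ID assigned to it. ID 0 is reserved
// for Void in the wire format, so user-added types start at 1.
func (b *Builder) Add(typ Type) (TypeID, error) {
	b.types = append(b.types, typ)
	return TypeID(len(b.types)), nil
}

// Marshal would encode the accumulated types as wire-format BTF; the
// encoding itself is out of scope for this sketch.
func (b *Builder) Marshal() ([]byte, error) {
	return nil, errors.New("not implemented in this sketch")
}
```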
Split reading from querying BTF
The constructor for Spec currently does a full parse of the vmlinux and turns all BTF wire format types into btf.Type. This is the really expensive part, since vmlinux contains around 130k types.
Instead, we will refactor the parsing logic so that we can pull Types out of wire format BTF one by one. This works by doing a single pass over the BTF and remembering at which offset each TypeID lives.
```go
type Spec struct {
	// Reader pointing at the start of the BTF type section
	btf io.ReaderAt
	// To find the offset of a type we index into typeOffsets[ID-1]
	typeOffsets []uint32
	// other things like the string table
}
```
This change is purely internal and doesn't change the exported API.
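A sketch of what the lazy lookup path could look like on top of that struct. inflateType is a hypothetical helper that decodes a single wire-format type (plus the types it references) at a given offset; handling of ID 0 (Void) is omitted and "fmt" is assumed to be imported:

```go
// TypeByID returns the type with the given ID, decoding it on demand.
func (s *Spec) TypeByID(id TypeID) (Type, error) {
	idx := int(id) - 1
	if idx < 0 || idx >= len(s.typeOffsets) {
		return nil, fmt.Errorf("type ID %d: not found", id)
	}

	// typeOffsets was filled during the single pass described above, so we
	// can seek straight to this type's encoding in the wire-format blob.
	return s.inflateType(s.typeOffsets[idx])
}
```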
Memory cost of lazy decoding
Some napkin math to gauge how much memory we need in addition to the wire format BTF:
~130k types in vmlinux
~130k distinct strings, avg length 17
~70k distinct names
Not accounting for slice, string and map header overhead a reader will use 130000 * sizeof(uint32) + 70000 * (17 + sizeof(TypeID)) = 2451kb. So roughly 3MiB, which is 1/10th of what we currently pin.
Can we avoid copying wire-format BTF into memory?
On my machine vmlinux BTF is ~6MB. On a buggy Ubuntu kernel this goes up to 15MB or so. Copying this into heap memory would double the memory used by vmlinux BTF. It would be nice if we could instead only read in the bits we are interested in, and otherwise rely on the page cache for fast access. We can do this either via regular os.File.Read or by mmaping the file. It's not clear which solution is better, but both approaches have a huge drawback: we need to manage the lifecycle of the os.File or the mmap somehow.
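The two access patterns in question, sketched below. Whether /sys/kernel/btf/vmlinux itself can be mmaped depends on the kernel, so treat the path and the error handling as illustrative only:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("/sys/kernel/btf/vmlinux")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Option 1: pread(2) via ReadAt. Only the pages we actually touch are
	// read, served from the page cache; no long-lived heap copy.
	hdr := make([]byte, 24) // the fixed-size BTF header
	if _, err := f.ReadAt(hdr, 0); err != nil {
		panic(err)
	}
	fmt.Printf("BTF header: % x\n", hdr)

	// Option 2: mmap the file (if this particular sysfs file supports it).
	// Access becomes a plain slice index, but the mapping has to be torn
	// down explicitly, which is exactly the lifecycle problem mentioned above.
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}
	data, err := unix.Mmap(int(f.Fd()), 0, int(fi.Size()), unix.PROT_READ, unix.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(data)
	fmt.Printf("mapped %d bytes\n", len(data))
}
```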
The easiest solution is to add Spec.Close() which forwards to the underlying reader. This is problematic since there is plenty of existing code which, upon updating the library, won't call Close(). We'll of course have a finalizer, but that only closes the fd when there is memory pressure. This will cause problems in cases where an application runs out of fds before triggering GC, for example if ulimits have been set very low.
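For illustration, a minimal sketch of that finalizer fallback, assuming a hypothetical file field on Spec and "os" and "runtime" imports. It guarantees the fd is closed eventually, but only once the GC actually runs:

```go
// newSpecFromFile is hypothetical; it shows where the finalizer would hook in.
func newSpecFromFile(f *os.File) *Spec {
	s := &Spec{file: f} // file is an assumed field holding the backing reader
	runtime.SetFinalizer(s, func(s *Spec) {
		// Runs at some GC cycle after s becomes unreachable. Until then the
		// fd stays open, which is how an fd-limited process can run out of
		// descriptors before the GC ever kicks in.
		s.file.Close()
	})
	return s
}
```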
For this reason I want to come back to an idea @ti-mo and I were discussing a while ago: replace btf.LoadKernelSpec with a type that allows sharing a single BTF between multiple callers in a safe way. Let's call it btf.Cache:
```go
type Cache struct{}

// LoadCache parses the given file.
func LoadCache(*os.File) (*Cache, error)

// Acquire returns a new Spec.
//
// Callers must ensure that [CachedSpec.Release] is called on the returned value.
func (*Cache) Acquire() *CachedSpec

// Close waits until all [CachedSpec] have been released and then closes the
// underlying file descriptor.
func (c *Cache) Close() error

type CachedSpec struct {
	*Spec
	cache *Cache
}

// Release the cached Spec.
//
// Must be called once the Spec is not required anymore.
func (*CachedSpec) Release()
```
We'd then replace LoadKernelSpec with a function which returns CachedSpec and rewrite FlushKernelSpec to call Cache.Close + a bunch of atomics.
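A minimal sketch of that bookkeeping, assuming hypothetical file and spec fields on Cache; a sync.WaitGroup stands in for the "bunch of atomics", and a real version would additionally have to guard against Acquire racing with Close:

```go
type Cache struct {
	file *os.File
	spec *Spec          // lazily-decoding Spec backed by file (assumed field)
	refs sync.WaitGroup // one outstanding entry per CachedSpec
}

func (c *Cache) Acquire() *CachedSpec {
	c.refs.Add(1)
	return &CachedSpec{Spec: c.spec, cache: c}
}

func (cs *CachedSpec) Release() {
	cs.cache.refs.Done()
}

// Close blocks until every CachedSpec has been released, then closes the
// backing file descriptor.
func (c *Cache) Close() error {
	c.refs.Wait()
	return c.file.Close()
}
```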
"Regular" BTF parsing from ELF would not use this mechanism. Instead we simply copy the whole into memory, as we do now. This means that map and program loading doesn't have to change.
Maybe introducing a Spec.Close isn't so bad after all. We could use this as an opportunity to rename Spec to Types or similar and then tell users that the API is the same, except that they need to call Close() as appropriate. Breaking the API helps to highlight the new semantics.
P.S. This isn't as easy after all, since CollectionSpec.Types would mean that CollectionSpec has to be closed.
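What calling code would look like under that variant, purely hypothetical since neither the rename nor Close() exist today (assuming the usual btf import):

```go
// useKernelTypes shows the one new obligation compared to today: the
// deferred Close on the value returned by LoadKernelSpec.
func useKernelTypes() error {
	types, err := btf.LoadKernelSpec()
	if err != nil {
		return err
	}
	defer types.Close() // hypothetical

	var task *btf.Struct
	return types.TypeByName("task_struct", &task)
}
```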