Introduction to the Kernel-Space Memory Profiler: memprofiling

Needone App

This profiler, developed by Kent Overstreet and Suren Baghdasaryan (the latter at Google), provides a clear view of memory usage within kernel space. It shows which caller allocated each memory block and how many objects each caller has allocated:

Image

However, all profilers of this type face a problem: how to remember who allocated what. The recording cost should be as low as possible, ideally so low that it can be used not only during debugging but also in production environments.

The memory allocation profiling patchset implements a code tagging library whose data structure records the module name, file name, function name, and line number of each memory allocation call site:

Image
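
To make the shape of such a record concrete, here is a minimal user-space sketch in C. The struct name mirrors the patchset's codetag, but the exact field set and order, and the helper codetag_format, are assumptions made for illustration, not the kernel's definition.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-in for a code tag: one record per call site. */
struct codetag {
    const char *modname;   /* module that contains the call site */
    const char *filename;  /* source file, e.g. from __FILE__ */
    const char *function;  /* enclosing function, e.g. from __func__ */
    unsigned int lineno;   /* source line, e.g. from __LINE__ */
};

/*
 * Hypothetical helper: render a tag as "file:line [module] func:name".
 * Returns the number of characters snprintf would have written.
 */
static int codetag_format(const struct codetag *ct, char *buf, size_t len)
{
    assert(ct != NULL && buf != NULL);
    return snprintf(buf, len, "%s:%u [%s] func:%s",
                    ct->filename, ct->lineno, ct->modname, ct->function);
}
```

Because every field is either a string literal or an integer constant, a record like this can live in read-mostly static data and costs nothing to create at runtime.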

Since code tags are laid out as a linear array, iterating over them is simple:

Image
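
The iteration can be pictured as an ordinary pointer walk between a start and a stop address. The sketch below assumes a simplified codetag struct and a hypothetical helper, codetag_count_in_file, which is not part of the patchset; it only shows why a linear layout makes traversal cheap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified code tag, assumed for this sketch. */
struct codetag {
    const char *modname;
    const char *filename;
    const char *function;
    unsigned int lineno;
};

/*
 * Walk the half-open range [start, stop) like a plain array and count
 * the tags whose call site lives in the given file.
 */
static size_t codetag_count_in_file(const struct codetag *start,
                                    const struct codetag *stop,
                                    const char *filename)
{
    size_t n = 0;

    assert(start <= stop);
    for (const struct codetag *ct = start; ct < stop; ct++) {
        if (strcmp(ct->filename, filename) == 0)
            n++;
    }
    return n;
}
```

No hashing or tree lookup is involved: the only state needed to visit every tag is the pair of boundary pointers.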

Building on this, an alloc_tag embeds the code tag:

Image

An alloc_tag is simply the code tag plus counters: how many bytes were allocated at this call site, and how many times the call site was executed. Each time the same code is hit again, the counters increase:

Image
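
A rough user-space picture of "code tag plus counters" might look like the following. The plain size_t fields and the function body are assumptions for illustration; this sketch ignores concurrency entirely, whereas a kernel implementation has to keep its counters safe and cheap across CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified code tag, assumed for this sketch. */
struct codetag {
    const char *modname;
    const char *filename;
    const char *function;
    unsigned int lineno;
};

/* A code tag extended with per-call-site accounting. */
struct alloc_tag {
    struct codetag ct;  /* where the allocation happens */
    size_t bytes;       /* bytes attributed to this call site */
    size_t calls;       /* number of allocations made from it */
};

/* Account one allocation of the given size against a call site. */
static void alloc_tag_add(struct alloc_tag *tag, size_t bytes)
{
    assert(tag != NULL);
    tag->bytes += bytes;
    tag->calls++;
}
```

Hitting the same call site twice, with 64 and then 32 bytes, leaves its tag at 96 bytes and 2 calls.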

The key part of this is DEFINE_ALLOC_TAG:

Image

Further expansion…

Image

So the information contained in these tags is filled in at build time, mostly during preprocessing, and not at runtime. Nothing has to be resolved back into file names, function names, or line numbers from debugging information.
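
The build-time filling can be sketched with a much simplified macro. DEFINE_ALLOC_TAG_SKETCH, DEMO_MODNAME, and tagged_site are hypothetical names introduced here, and the real DEFINE_ALLOC_TAG does more than this; the point is only that __FILE__, __func__, and __LINE__ populate a static object before the program ever runs.

```c
#include <assert.h>
#include <stddef.h>

struct codetag {
    const char *modname;
    const char *filename;
    const char *function;
    unsigned int lineno;
};

struct alloc_tag {
    struct codetag ct;
    size_t bytes;
    size_t calls;
};

/* Stand-in for the module name; an assumption of this sketch. */
#define DEMO_MODNAME "demo"

/* Define a static tag whose location fields are fixed at build time. */
#define DEFINE_ALLOC_TAG_SKETCH(_name)       \
    static struct alloc_tag _name = {        \
        .ct = {                              \
            .modname  = DEMO_MODNAME,        \
            .filename = __FILE__,            \
            .function = __func__,            \
            .lineno   = __LINE__,            \
        },                                   \
    }

/* One call site: it owns exactly one tag and updates it on every hit. */
static struct alloc_tag *tagged_site(size_t bytes)
{
    DEFINE_ALLOC_TAG_SKETCH(tag);

    assert(tag.ct.lineno != 0);  /* already set before the first call */
    tag.bytes += bytes;
    tag.calls++;
    return &tag;
}
```

Because the tag is a static object, every call to tagged_site returns the same address, so the call site never has to search for its own record.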

In the memory allocation and deallocation paths, the alloc_tag add and sub operations are performed respectively:

Image
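
The pairing of add on allocation and sub on free can be imitated in user space by wrapping malloc and free. Everything below is a sketch under assumptions: tagged_malloc, tagged_free, and the inline header that remembers which tag an allocation belongs to are invented for this example and say nothing about where the kernel keeps that reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Reduced tag: call-site identity plus live counters. */
struct alloc_tag {
    const char *function;
    unsigned int lineno;
    size_t bytes;  /* bytes currently allocated from this site */
    size_t calls;  /* allocations currently live from this site */
};

/* Header placed in front of each payload so free can find the tag. */
struct alloc_hdr {
    _Alignas(max_align_t) struct alloc_tag *tag;
    size_t size;
};

/* Allocation path: allocate, remember the tag, then add to it. */
static void *tagged_malloc(struct alloc_tag *tag, size_t size)
{
    struct alloc_hdr *hdr = malloc(sizeof(*hdr) + size);

    if (!hdr)
        return NULL;
    hdr->tag = tag;
    hdr->size = size;
    tag->bytes += size;
    tag->calls++;
    return hdr + 1;  /* payload starts right after the header */
}

/* Deallocation path: recover the tag, subtract, then release. */
static void tagged_free(void *ptr)
{
    if (!ptr)
        return;

    struct alloc_hdr *hdr = (struct alloc_hdr *)ptr - 1;

    assert(hdr->tag->bytes >= hdr->size && hdr->tag->calls > 0);
    hdr->tag->bytes -= hdr->size;
    hdr->tag->calls--;
    free(hdr);
}
```

With this pairing, the counters describe what is live right now: once every allocation from a call site has been freed, its tag reads zero bytes and zero calls again.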

Readers may wonder how a call site quickly finds the offset of its alloc_tag within the tag area. Does it have to iterate over the codetag area, comparing function name, line number, and so on against every tag until it finds its own position in the array, and only then add or subtract at that position?

This would be quite expensive. However, a closer look at DEFINE_ALLOC_TAG shows that it defines a static struct alloc_tag _alloc_tag, and the key point is that the variable is static and placed in a specific section: __section(ALLOC_TAG_SECTION_NAME). Therefore, subtracting the section start address from the tag’s own address directly yields the tag’s offset in the linear area, with no search at all.

Image
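
The same trick can be tried in user space on an ELF system. This sketch assumes GCC or Clang with GNU ld or lld, which define __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier. The section name alloc_tags, the macro DEFINE_ALLOC_TAG_SKETCH, and the helper functions are all made up for this example.

```c
#include <assert.h>
#include <stddef.h>

struct alloc_tag {
    const char *filename;
    const char *function;
    unsigned long lineno;
    unsigned long calls;
};

/* Section boundaries, provided by the linker (ELF toolchains only). */
extern struct alloc_tag __start_alloc_tags[];
extern struct alloc_tag __stop_alloc_tags[];

/* Place one static tag per call site into the shared section. */
#define DEFINE_ALLOC_TAG_SKETCH(_name)                              \
    static struct alloc_tag _name                                   \
    __attribute__((used, section("alloc_tags"), aligned(8))) = {    \
        .filename = __FILE__,                                       \
        .function = __func__,                                       \
        .lineno   = __LINE__,                                       \
    }

/* Position of a tag in the section: pure pointer arithmetic. */
static size_t alloc_tag_index(const struct alloc_tag *tag)
{
    assert(tag >= __start_alloc_tags && tag < __stop_alloc_tags);
    return (size_t)(tag - __start_alloc_tags);
}

/* Number of tags the linker collected into the section. */
static size_t alloc_tag_count(void)
{
    return (size_t)(__stop_alloc_tags - __start_alloc_tags);
}

/* Two independent call sites, each with its own tag. */
static struct alloc_tag *site_a(void)
{
    DEFINE_ALLOC_TAG_SKETCH(tag);
    tag.calls++;
    return &tag;
}

static struct alloc_tag *site_b(void)
{
    DEFINE_ALLOC_TAG_SKETCH(tag);
    tag.calls++;
    return &tag;
}
```

The linker, not the program, gathers the tags into one contiguous array, so locating a tag costs a single subtraction. The explicit alignment on each variable is there so the compiler does not pad between entries, which would break the array view of the section.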

In short, the implementation of memory allocation profiling relies mainly on preprocessing and link-time tricks to keep CPU overhead low.

Image
