Introduction to the Kernel Space Memory Profiler: memprofiling
This profiler, developed by Kent Overstreet and Google's Suren Baghdasaryan, gives a clear view of memory usage in kernel space. It shows which caller allocated each block of memory and, for each caller, how much memory it allocated and how many times.
All profilers of this kind face the same problem: how to record who allocated what. The recording cost must be as low as possible, ideally low enough that the profiler can run not only during debugging but also in production.
The Memory allocation profiling patchset implements a code tagging library, whose data structures record the module name, file name, function name, and line number of each memory allocation call site.
Because the code tags are laid out as a linear array, iterating over them is simple.
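The sketch below shows the idea in user space with GCC or Clang on Linux. It is not the kernel's code. The simplified struct codetag, the CODETAG_HERE() macro, and the section name codetag_demo are hypothetical stand-ins. Each expansion of the macro places one static tag in a dedicated ELF section. The linker defines __start_codetag_demo and __stop_codetag_demo for any section whose name is a valid C identifier, so the code can walk all tags as one array.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct codetag (assumption). */
struct codetag {
    unsigned int lineno;
    const char *modname;
    const char *function;
    const char *filename;
} __attribute__((aligned(8)));

/* Hypothetical macro: one static tag per expansion, placed in "codetag_demo".
 * "used" keeps the compiler from discarding the otherwise unreferenced tag. */
#define CODETAG_HERE()                                               \
    do {                                                             \
        static struct codetag _tag                                   \
            __attribute__((used, section("codetag_demo"))) = {       \
                .lineno = __LINE__, .modname = "demo",               \
                .function = __func__, .filename = __FILE__ };        \
        (void)_tag;                                                  \
    } while (0)

/* Provided by GNU ld / lld: the bounds of the codetag_demo section. */
extern struct codetag __start_codetag_demo[];
extern struct codetag __stop_codetag_demo[];

/* Two hypothetical call sites, each contributing one tag. */
void site_a(void) { CODETAG_HERE(); }
void site_b(void) { CODETAG_HERE(); }

/* Walk the section like an ordinary array and count the tags. */
size_t codetag_count(void)
{
    size_t n = 0;
    for (struct codetag *ct = __start_codetag_demo; ct < __stop_codetag_demo; ct++)
        n++;
    return n;
}

/* Linear search by function name; returns NULL if no tag matches. */
const struct codetag *codetag_find(const char *function)
{
    for (struct codetag *ct = __start_codetag_demo; ct < __stop_codetag_demo; ct++)
        if (strcmp(ct->function, function) == 0)
            return ct;
    return NULL;
}
```

Here codetag_count() returns 2, one tag per call site. Every tag has the same size and alignment, so the linker places them back to back and the section can be treated as a plain array.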
Building on the code tag, the patchset adds an alloc_tag.
This is simply the code tag from before plus counters: how many bytes were allocated at this location and how many times this code was called. Each time the same call site is hit again, the counters are incremented.
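A minimal sketch of the alloc_tag idea follows. It assumes a plain counter pair; the real implementation keeps per-CPU counters, which this sketch omits. The field names bytes and calls follow the description above, and demo_site() is a hypothetical call site. The tag is a static object, so every call through the same site updates the same counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified code tag (assumption): where the allocation happens. */
struct codetag {
    unsigned int lineno;
    const char *function;
    const char *filename;
};

/* Plain counters; the kernel version is per-CPU, this one is not. */
struct alloc_tag_counters {
    uint64_t bytes; /* bytes attributed to this call site */
    uint64_t calls; /* allocations attributed to this call site */
};

/* alloc_tag = code tag + counters. */
struct alloc_tag {
    struct codetag ct;
    struct alloc_tag_counters counters;
};

/* Hypothetical call site: each call bumps the counters of its own static tag. */
struct alloc_tag *demo_site(size_t size)
{
    static struct alloc_tag tag = {
        .ct = { .lineno = __LINE__, .function = __func__, .filename = __FILE__ },
    };
    tag.counters.bytes += size;
    tag.counters.calls += 1;
    return &tag;
}
```

Calling demo_site(64) and then demo_site(32) leaves that site's tag with calls == 2 and bytes == 96.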
The key piece is the DEFINE_ALLOC_TAG macro. Expanding it further shows how each field of the tag gets its value.
So the information in these tags is filled in at compile time, during preprocessing, not at runtime. It does not depend on debug information being parsed to recover file names, function names, or line numbers.
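This hypothetical analogue of DEFINE_ALLOC_TAG shows why nothing has to be resolved at runtime. The names DEMO_CODE_TAG_INIT, DEFINE_DEMO_ALLOC_TAG, DEMO_MODNAME, and the section alloc_tags_demo are illustrative assumptions, not the kernel's definitions. The file and line come from the preprocessor (__FILE__, __LINE__), and the function name comes from the compiler's predefined __func__. Every field is therefore a constant, and the whole tag is emitted as initialized data.

```c
#include <stdint.h>
#include <string.h>

struct codetag {
    unsigned int lineno;
    const char *modname;
    const char *function;
    const char *filename;
} __attribute__((aligned(8)));

struct alloc_tag {
    struct codetag ct;
    uint64_t bytes, calls;
} __attribute__((aligned(8)));

#ifndef DEMO_MODNAME
#define DEMO_MODNAME "demo" /* stand-in for a build-supplied module name (assumption) */
#endif

/* All constant expressions: the compiler stores them, nothing runs at startup. */
#define DEMO_CODE_TAG_INIT {                       \
    .lineno = __LINE__, .modname = DEMO_MODNAME,   \
    .function = __func__, .filename = __FILE__ }

/* Hypothetical DEFINE_ALLOC_TAG analogue: a static tag in a dedicated section. */
#define DEFINE_DEMO_ALLOC_TAG(_tag)                              \
    static struct alloc_tag _tag                                 \
        __attribute__((used, section("alloc_tags_demo"))) = {    \
            .ct = DEMO_CODE_TAG_INIT }

const struct alloc_tag *tag_of_open_file(void)
{
    DEFINE_DEMO_ALLOC_TAG(_alloc_tag);
    return &_alloc_tag;
}

const struct alloc_tag *tag_of_read_page(void)
{
    DEFINE_DEMO_ALLOC_TAG(_alloc_tag);
    return &_alloc_tag;
}

/* Convenience check used in the usage example. */
int tag_is_for(const struct alloc_tag *t, const char *function)
{
    return strcmp(t->ct.function, function) == 0;
}
```

Each call to tag_of_open_file() returns the same static tag. tag_of_read_page() gets a different tag with its own line number.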
Later, on the allocation and deallocation paths, the alloc_tag add and sub operations are performed, respectively.
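The add/sub pairing can be sketched as follows. The helper names (tagged_malloc, tagged_free, alloc_tag_add_demo, alloc_tag_sub_demo, demo_site_alloc) are hypothetical. Storing the tag pointer in a header in front of each block is an assumption made to keep the example short; the kernel instead keeps a reference to the tag in separate per-page or per-object metadata. The point is that the free path must find the tag charged at allocation time and subtract the same amount.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct alloc_tag {
    const char *function;
    unsigned int lineno;
    uint64_t bytes;
    uint64_t calls;
};

/* Hypothetical per-block header: which tag paid for this block, and how much. */
struct tagged_hdr {
    struct alloc_tag *tag;
    size_t size;
};

static void alloc_tag_add_demo(struct alloc_tag *tag, size_t bytes)
{
    tag->bytes += bytes;
    tag->calls += 1;
}

static void alloc_tag_sub_demo(struct alloc_tag *tag, size_t bytes)
{
    tag->bytes -= bytes;
    tag->calls -= 1;
}

/* Allocation path: charge the caller's tag. */
void *tagged_malloc(struct alloc_tag *tag, size_t size)
{
    struct tagged_hdr *h = malloc(sizeof(*h) + size);
    if (!h)
        return NULL;
    h->tag = tag;
    h->size = size;
    alloc_tag_add_demo(tag, size);
    return h + 1;
}

/* Deallocation path: refund the tag recorded at allocation time. */
void tagged_free(void *p)
{
    if (!p)
        return;
    struct tagged_hdr *h = (struct tagged_hdr *)p - 1;
    alloc_tag_sub_demo(h->tag, h->size);
    free(h);
}

/* Hypothetical call site with its own tag. */
struct alloc_tag demo_site_tag = { "demo_site_alloc", __LINE__, 0, 0 };

void *demo_site_alloc(size_t size)
{
    return tagged_malloc(&demo_site_tag, size);
}
```

After two allocations of 100 and 28 bytes and one free of the first block, the tag shows one live allocation of 28 bytes. Freeing the second block brings it back to zero.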
Readers may wonder how a call site quickly finds the offset of its alloc_tag within the tag area. Does it have to scan the codetag area, comparing its function name, line number, and so on against each tag until it finds a match, and only then update the counters at that position?
That would be far too expensive. A closer look at DEFINE_ALLOC_TAG, however, shows that it defines static struct alloc_tag _alloc_tag. The key point is that the tag is static and placed in a specific section: __section(ALLOC_TAG_SECTION_NAME). Each call site therefore already holds the address of its own tag. Subtracting the section's start address from that address gives the tag's exact index in the linear array, with no search at all.
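A minimal sketch of that lookup uses hypothetical names again (DEFINE_DEMO_ALLOC_TAG, the section alloc_tags_idx, alloc_tag_index()). All tags have the same size and sit back to back in one section, and each call site knows its own tag's address. Plain pointer subtraction therefore yields the index.

```c
#include <stddef.h>
#include <stdint.h>

struct alloc_tag {
    unsigned int lineno;
    const char *function;
    uint64_t bytes, calls;
} __attribute__((aligned(8)));

/* Hypothetical DEFINE_ALLOC_TAG analogue: static tag pinned to one section. */
#define DEFINE_DEMO_ALLOC_TAG(_tag)                                      \
    static struct alloc_tag _tag                                         \
        __attribute__((used, section("alloc_tags_idx"))) =               \
        { .lineno = __LINE__, .function = __func__ }

/* Section bounds supplied by the linker. */
extern struct alloc_tag __start_alloc_tags_idx[];
extern struct alloc_tag __stop_alloc_tags_idx[];

/* Index of a tag: (address - section start) / sizeof(struct alloc_tag),
 * which C pointer subtraction computes directly. */
size_t alloc_tag_index(const struct alloc_tag *tag)
{
    return (size_t)(tag - __start_alloc_tags_idx);
}

/* Number of tags in the section. */
size_t alloc_tag_total(void)
{
    return (size_t)(__stop_alloc_tags_idx - __start_alloc_tags_idx);
}

/* Hypothetical call sites; each returns the address of its own tag. */
struct alloc_tag *site_one(void)
{
    DEFINE_DEMO_ALLOC_TAG(_alloc_tag);
    return &_alloc_tag;
}

struct alloc_tag *site_two(void)
{
    DEFINE_DEMO_ALLOC_TAG(_alloc_tag);
    return &_alloc_tag;
}
```

Indexing back with __start_alloc_tags_idx[alloc_tag_index(t)] returns t itself. Like the kernel's approach, this relies on the linker's layout. ISO C on its own does not define subtraction between pointers to separately declared objects.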
In short, Memory allocation profiling keeps CPU overhead low mainly through compile-time and link-time tricks rather than runtime bookkeeping.