SMDK allocator allocates more memory than intended (compatible path) #31
Comments
Thank you for using SMDK. The SMDK allocator is an extension of jemalloc. For this reason, if you start the application through the SMDK allocator and check the amount of memory used by the application, it may look like the application consumes more memory than it actually uses. In addition, jemalloc serves only small objects (up to about 2 MB, if I remember correctly; please verify this yourself) from its own memory chunks, and calls the mmap syscall immediately for large objects (allocations of 5 GB at once, like your application's). Therefore, in this case, it may appear that about 1 GB of memory, which jemalloc has retained as cache, remains in use.
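As an aside (not SMDK-specific), the gap between what an application has requested and what the allocator keeps resident can be observed with stock jemalloc's mallctl statistics interface. A minimal sketch, assuming a build linked against -ljemalloc:

```c
/*
 * Minimal sketch (stock jemalloc, not SMDK-specific): compare what the
 * application requested ("stats.allocated") with what the allocator keeps
 * resident ("stats.resident"). Build with: cc demo.c -ljemalloc
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <jemalloc/jemalloc.h>

static void print_stats(const char *tag)
{
    uint64_t epoch = 1;
    size_t esz = sizeof(epoch);
    mallctl("epoch", &epoch, &esz, &epoch, esz);   /* refresh cached statistics */

    size_t allocated = 0, resident = 0, len = sizeof(size_t);
    mallctl("stats.allocated", &allocated, &len, NULL, 0);
    mallctl("stats.resident", &resident, &len, NULL, 0);
    printf("%s: allocated=%zu bytes, resident=%zu bytes\n", tag, allocated, resident);
}

int main(void)
{
    size_t sz = (size_t)5 << 30;   /* 5 GiB: large objects go straight to mmap */
    char *p = malloc(sz);
    if (!p) return 1;
    memset(p, 1, sz);              /* touch the pages so they are really mapped */
    print_stats("after malloc");

    free(p);
    print_stats("after free");     /* resident can stay above allocated: retained cache */
    return 0;
}
```

The memory the allocator retains for reuse shows up in resident/RSS figures but not in the allocated counter, which is the effect described above.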
Could I get your test code and the check script that printed out the figures for Huge, Heap, Stack, Private, and so on (does it use /proc/&lt;pid&gt;/smaps)? I ran the code in my system as you described, but I got a different result.
Thank you for testing the code. My code is almost the same as yours. I have also run the code you provided and found that exporting CXLMALLOC_CONF triggers the issue: if I export only LD_PRELOAD, the memory (jemalloc cache) accumulation does not occur, but if I also export CXLMALLOC_CONF, it does. For example, the following script does not result in the 1 GB memory accumulation.
However, the following script does result in the 1 GB memory accumulation:
If I'm using CXLMALLOC_CONF incorrectly, please let me know!
I ran it in the same way with the script you sent me, but in my case it does not accumulate 1 GB...
I am using the 6.9.0-smdk kernel and Ubuntu 24.04 LTS. My server has two NUMA nodes, each equipped with DRAM, and one CXL device (sorry, but I'm unable to provide detailed hardware information about my system). Meanwhile, I suspect this issue is not related to the memory system itself but is somehow connected to the Docker container environment: when I run the script inside a Docker container, the aforementioned problem arises. You can run the test by executing the following commands:
(inside the container)
If the additional allocation does not occur in that environment, then this behavior might be an issue specific to my setup...
Sorry for my late response. Because of another company task, I am only now starting to debug this problem; I will try to fix it as soon as possible and then reply again.
Thank you for your response and help!
The key problem lies in the use_auto_arena_scaling config. Specifically, depending on this config, either the function at line 69 (get_auto_scale_target_arena) or the function at line 77 (get_normal_target_arena) of https://github.com/OpenMPDK/SMDK/blob/main/lib/smdk_allocator/core/init.c is applied (if true, get_auto_scale_target_arena; otherwise, get_normal_target_arena). These two are abstracted behind a function called get_target_arena, which determines which arena memory is allocated from, or returned to, during malloc and free. The problem is that when the use_auto_arena_scaling config is false, get_normal_target_arena is used, which selects the arena in round-robin fashion (pool->arena_index++; at line 82). For example, assuming node 0 has 10 cores, if an application that repeats malloc-free 20 times is executed through numactl --cpunodebind=0, the first 10 iterations appear to accumulate memory, but the next 10 do not, because the round robin wraps around and they reuse the memory already cached starting from arena 0. For this problem, we recommend setting the use_auto_arena_scaling config to true. Thank you for discovering this problem; we need to discuss internally how to fix it. (See the sketch of the two policies below.)
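For readers following along, here is a rough, hypothetical C sketch of the two selection policies described above. The names loosely mirror lib/smdk_allocator/core/init.c, but the bodies are simplified and are not the actual SMDK source:

```c
/*
 * Hypothetical, simplified sketch of the two arena-selection policies.
 * NOT the actual SMDK code; names only loosely mirror init.c.
 */
#define _GNU_SOURCE
#include <sched.h>      /* sched_getcpu() */
#include <stddef.h>
#include <stdio.h>

#define NR_ARENA 10     /* e.g. one arena per core on a 10-core node */

struct pool {
    unsigned arena[NR_ARENA];   /* jemalloc arena indices owned by this pool */
    size_t   nr_arena;
    size_t   arena_index;       /* shared round-robin cursor (use_auto_arena_scaling=false) */
};

/* use_auto_arena_scaling = true: the calling CPU always maps to the same
 * arena, so repeated malloc/free from one CPU reuses that arena's cache. */
static unsigned get_auto_scale_target_arena(struct pool *pool)
{
    return pool->arena[(unsigned)sched_getcpu() % pool->nr_arena];
}

/* use_auto_arena_scaling = false: a shared cursor walks the arenas in
 * round-robin order, so each allocation may land in a "fresh" arena and
 * the memory cached in previously used arenas is not reused until the
 * cursor wraps around -- which looks like accumulating memory. */
static unsigned get_normal_target_arena(struct pool *pool)
{
    unsigned a = pool->arena[pool->arena_index % pool->nr_arena];
    pool->arena_index++;
    return a;
}

int main(void)
{
    struct pool pool = { .nr_arena = NR_ARENA };
    for (size_t i = 0; i < NR_ARENA; i++)
        pool.arena[i] = (unsigned)i;

    for (int i = 0; i < 3; i++) {
        printf("cpu-based   -> arena %u\n", get_auto_scale_target_arena(&pool));
        printf("round-robin -> arena %u\n", get_normal_target_arena(&pool));
    }
    return 0;
}
```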
Thank you for your clear explanation. The figure is really helpful for understanding the issue.
Hello Sangun-Choi! As SeungjunHa said, thanks to your report my team was able to find an issue, and we are discussing internally how to fix it. Can I close your issue now?
Hi junhyeok-im!
1. CXL Hotness Monitoring Unit Support (CXL 3.2.8.2.8)
   - The CXL Hotness Monitoring Unit (CHMU) is an interface that allows software running on CXL hosts to identify the 'hot' memory ranges in CXL memory devices in terms of memory access counts.
   - Added a CXL device driver to support CHMU and an emulation function based on QEMU.
2. Update NDCTL (CXL_CLI): v79 → v80
3. Bug Fix
   - Fix for the memory leak issue occurring when the 'use_auto_arena_scaling' allocator option is set to 'false'.
   - When 'use_auto_arena_scaling' is set to 'false', the number of arenas is 1 per pool. (When 'true', arenas are created in proportion to the number of CPUs, as before.)
   - Reference: OpenMPDK#31

Signed-off-by: JunhyeokIm <junhyeok.im@samsung.com>
Signed-off-by: WonjaeLee <wj28.lee@samsung.com>
Signed-off-by: SeungjunHa <seungjun.ha@samsung.com>
Signed-off-by: JehoonPark <jehoon.park@samsung.com>
Signed-off-by: HojinNam <hj96.nam@samsung.com>
Signed-off-by: YoungshinPark <yshin0.park@samsung.com>
Signed-off-by: HeesooKim <habil.kim@samsung.com>
Dear SMDK contributors,
I run a very simple program that allocates 5 GB of memory using malloc to create an array, and then frees the array.
However, I notice an additional 1 GB memory allocation occurring with SMDK's compatible path.
I’m curious why this additional allocation occurs.
The C code is as follows:
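The code itself is not preserved in this capture of the thread; a minimal sketch consistent with the description (the 5 GB size comes from the report, while the loop count and sleep durations are placeholders added here for observing numastat) might look like this:

```c
/*
 * Minimal reconstruction of the test described above (the original code is
 * not shown in this thread). Allocate 5 GB, touch it, free it; optionally
 * repeat to observe the accumulation. Sleeps only leave time for checking
 * numastat from another terminal.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIZE ((size_t)5 * 1024 * 1024 * 1024)   /* 5 GB */
#define ITERATIONS 1                            /* >1 to reproduce the accumulation */

int main(void)
{
    for (int i = 0; i < ITERATIONS; i++) {
        char *arr = malloc(SIZE);
        if (arr == NULL) {
            perror("malloc");
            return 1;
        }
        memset(arr, 1, SIZE);   /* touch every page so it is really allocated */
        printf("iteration %d: allocated, check numastat now\n", i);
        sleep(30);

        free(arr);
        printf("iteration %d: freed, check numastat now\n", i);
        sleep(30);
    }
    return 0;
}
```

In the compatible path, the reporter describes launching the binary with LD_PRELOAD pointing at the SMDK allocator library and, in the failing case, with CXLMALLOC_CONF exported as well; the exact variable values from the original scripts are not preserved here.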
Without loading the SMDK allocator library, the program behaves as expected: it allocates 5 GB of memory and then frees it. I monitor the program's memory usage with numastat.
(5 GB malloc)
(after freeing)
After loading the SMDK allocator library, the program allocates 6 GB of memory.
Even after freeing the array, the program continues to use 1 GB of memory.
Also, if the program repeatedly performs a 5 GB malloc and free, the additional 1 GB allocations accumulate, and the program's memory usage continues to grow.