I tried two approaches to utilize multiple channels.

(1) The first was simply setting controllers=4 in the cfg file. Compared to controllers=1, there was almost no significant difference. The metric I focus on is the IPC reported for the CPU in the output file zsim.out.
(2) The second approach draws inspiration from the implementation of banshee[https://github.com/yxymit/banshee]. It involves creating four DDR channels in an array-like structure to form a multi-channel DDR (mcdram). Memory requests are then distributed across channels by performing a modulo operation based on the number of channels to determine which channel handles each request.
_mcdram = (MemObject **) gm_malloc(sizeof(MemObject *) * _mcdram_per_mc);
for (uint32_t i = 0; i < _mcdram_per_mc; i++) {
    g_string mcdram_name = _name + g_string("-mc-") + g_string(to_string(i).c_str());
    // ...
    } else if (_mcdram_type == "DDR") {
        // XXX HACK: tBL for mcdram is 1, so data accesses should be multiplied by 2, and tad accesses by 3.
        _mcdram[i] = BuildDDRMemory(config, frequency, domain, mcdram_name, "sys.mem.mcdram.", 1, timing_scale);
    }
    // ...
}
Unfortunately, I still observed almost identical performance (IPC) compared to the pure DDR setup with controllers=1.
To gain a deeper understanding of this issue, I referred to several past issues. For instance, I experimented with modifying tCK to increase bandwidth and adjusting tBL. While these changes had some effect, the improvements were not significant. I also examined the zsim-ndp[https://github.com/CriusT/zsim-ndp] implementation of MemChannel[https://github.com/CriusT/zsim-ndp/blob/master/src/mem_channel.cpp], but encountered similar performance challenges. I have also tried modifying the memory interleaving approach, but the results were still not good.
I added debugging information in the **trySchedule** function of **ddr_mem.cpp**. By comparing the debug output, I found that the two aforementioned methods for constructing multi-channel DDR systems exhibited almost identical r->arrivalCycle sequences. When timing parameters such as tBL were modified, only numerical differences appeared, but the pattern remained largely consistent.
I encountered a similar issue with the gem5 simulator. This raises the question: are these discrete-event-driven simulators inherently limited in accurately simulating the parallelism achievable with multi-channel memory systems, particularly their ability to exploit high bandwidth through concurrency?
Thank you for any useful suggestions!