-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance measurements: Add Global Performance Measurements #9186
Performance measurements: Add Global Performance Measurements #9186
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick look at a high level, nothing major. Could do with more comments around the larger code blocks.
96082f2
to
7542233
Compare
f56b3a1
to
a775f3f
Compare
src/debug/telemetry/telemetry.c
Outdated
int ret = sys_bitarray_alloc(bitmap->array, 1, offset); | ||
|
||
if (!ret) | ||
bitmap->occupied++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this bitmap have multiple client users ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. This is used in comp_new and it seems that those can be executed on other cores so I should just add a spinlock for good measure. To be fair, as this code is based on Zephyr's bitarray, it would be a better idea to just add occupied bit count to bitarray in Zephyr, as it already has spinlocks and everything. Getting this accepted may take a while though. Might need to update this later.
b3ec702
to
d117581
Compare
src/audio/component.c
Outdated
|
||
static uint32_t get_one_ms_in_bytes(const struct ipc4_audio_format fmt) | ||
{ | ||
#ifdef ROUND_UP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in which cases is ROUND_UP
not defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some builds that don't recognize this. It is defined in Zephyr so I 'm guessing those builds don't use Zephyr (not sure which one it was). This makes it awkward, as I'd rather just use the macro , but it seems I need to expand it anyway because builds will fail otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you actually need SOF_DIV_ROUND_UP()
@@ -40,6 +40,7 @@ CONFIG_MM_DRV_INTEL_ADSP_TLB_REMAP_UNUSED_RAM=y | |||
CONFIG_AMS=y | |||
CONFIG_COUNTER=y | |||
CONFIG_SOF_TELEMETRY=y | |||
CONFIG_SOF_TELEMETRY_PERFORMANCE_MEASUREMENTS=y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with these changes performance measurements would be unconditionally enabled on MTL and LNL? Is this desirable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the performance measurement itself is enabled by IPC, so there's little to no overhead. But now that you mention it, disabling only CONFIG_SOF_TELEMETRY may break the build. It would be good if performance measurements could be enabled regardless of telemetry, I'll think about it.
if (perf_meas_get_state() == IPC4_PERF_MEASUREMENTS_STARTED) { | ||
/* we divide by ibs so we need to check if its set */ | ||
if (item && dev->ibs != 0) { | ||
item->total_iteration_count++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how often does this run? If this runs every millisecond, then in a couple of days this will overflow and the division below will divide by zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it may as well be every ms. Doubt it will be used for a couple of days , but still I should add some check here, true.
d117581
to
a299c75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very brief review (as we are middle of 2.10 release work), but code in general looks good, but I think the name should somehow reflect this is built on top of the the ADSP_DW infrastructure that is currently very specific to Intel/Zephyr SOF platforms. We have other such things (like mtrace),
Part of this is already in the original telemetry PR.
I think this could be improved by just naming it something less generic than "SOF telemetry".
#define ADSP_PMW ((volatile uint32_t *) \ | ||
(sys_cache_uncached_ptr_get((__sparse_force void __sparse_cache *) \ | ||
(WIN3_MBASE + WIN3_OFFSET)))) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but this is now very Intel specific, should this be called something else than debug/telemtry? debug/intel-telemetry.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved new functionality to separate files. I'm guessing we still want those to be called intel-performance-monitor, and intel-telemetry? Do we want to change the CONFIG_SOF_TELEMETRY and CONFIG_SOF_TELEMETRY_PERFORMANCE_MEASUREMENTS names too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you can keep the current names. This is not strictly Intel specific either as other platforms could support this if similar memory windows to host can be exposed. We can do the rename when the first alternative telemetry implementation comes (if that uses other transport than the IPC4 memory window layout used here).
src/debug/telemetry/telemetry.c
Outdated
size_t slots_count; | ||
size_t slot_idx = 0; | ||
struct telemetry_wnd_data *wnd_data = | ||
(struct telemetry_wnd_data *)ADSP_DW->slots[SOF_DW_TELEMETRY_SLOT]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the file is pretty general otherwise, but this use of the memory window is Intel specific. This itself is ok, but I think it's a problem to call this "SOF debug telemetry" which suggests a generic mechanism, while in practise this only works on platforms that provide a very specific memory window structure (described in Zephyr soc/xtensa/intel_adsp/common/include/adsp-debug-window.h ).
7000d28
to
fd79ef0
Compare
src/audio/component.c
Outdated
|
||
static uint32_t get_one_ms_in_bytes(const struct ipc4_audio_format fmt) | ||
{ | ||
#ifdef ROUND_UP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you actually need SOF_DIV_ROUND_UP()
item->total_iteration_count = 1; | ||
tr_err(&ipc_tr, | ||
"overflow for module %#x, performance measurement incorrect", | ||
dev_comp_id(dev)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but like this after such an overflow they'll stay incorrect? Can we reset total_cycles_consumed
to start producing valid data again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, we can either do that, or zero everything and set a flag to stop the measurement to further indicate that something went wrong.
fd79ef0
to
d43392f
Compare
src/audio/component.c
Outdated
{ | ||
/* TODO Reference Firmware also has systick multiplier and divider in this equation */ | ||
return get_sample_group_size_in_bytes(fmt) * | ||
(SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000) / 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think you need to divide by 1000 now any more, SOF_DIV_ROUND_UP()
already does it. And after you've removed it please also drop parentheses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The divide is part of the equation, it's correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the difference to your previous version:
i.e. basically you replaced (fmt.sampling_frequency + 999) / 1000
with SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000) / 1000
and that is wrong, because SOF_DIV_ROUND_UP()
is defined as
#define SOF_DIV_ROUND_UP(val, div) (((val) + (div) - 1) / (div))
and therefore already performs the division by 1000 for you. So, either your original formula was wrong and you actually wanted to divide by 1000000 and not 1000, or this your update is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original formula was wrong, yes. I should've mentioned it, sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @tobonex the formula is not correct now:
- 16bit stereo 48000Hz example -> sample group size in bytes 4
- this function would return now: 4 * (48000 / 1000) / 1000 = 0.912 = 0
So I think this should be:
return get_sample_group_size_in_bytes(fmt) *
SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000);
Which gives you 4 * 48000 / 1000 = 192 bytes (which is correct. I think you missed SOF_DIV_ROUND_UP does actually do the dvision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok i get it now, ROUND_UP_DIV adds division COMPARED to ROUND_UP. I'll fix.
d43392f
to
bde8bfb
Compare
UPDATE: |
8f46fbc
to
d7c48fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This starts to be ready for merge. I think we could have a better tree-wide mechanism to identify features built on top of the ADSP_MW/DW (memory window or debug window) interface. But upon second thought, I don't want start renaming the files yet, let's wait whether we can make this generic for other platforms, and if not, make the names more specific later.
There is one issue w.r.t. 1ms-in-bytes calculation, can you @tobonex check that? Otherwise I'm good with this.
#define ADSP_PMW ((volatile uint32_t *) \ | ||
(sys_cache_uncached_ptr_get((__sparse_force void __sparse_cache *) \ | ||
(WIN3_MBASE + WIN3_OFFSET)))) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you can keep the current names. This is not strictly Intel specific either as other platforms could support this if similar memory windows to host can be exposed. We can do the rename when the first alternative telemetry implementation comes (if that uses other transport than the IPC4 memory window layout used here).
src/audio/component.c
Outdated
{ | ||
/* TODO Reference Firmware also has systick multiplier and divider in this equation */ | ||
return get_sample_group_size_in_bytes(fmt) * | ||
(SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000) / 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @tobonex the formula is not correct now:
- 16bit stereo 48000Hz example -> sample group size in bytes 4
- this function would return now: 4 * (48000 / 1000) / 1000 = 0.912 = 0
So I think this should be:
return get_sample_group_size_in_bytes(fmt) *
SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000);
Which gives you 4 * 48000 / 1000 = 192 bytes (which is correct. I think you missed SOF_DIV_ROUND_UP does actually do the dvision.
Implement wrapper to extend Zephyr's bitarray by adding a counter for allocated bits. This bitmap will be used to allocate performance data entries in memory window 3. Also adds new performance_monitor files. Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Add data to component struct for computing performance. Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Add temporary macro for accessing memory window 3 data. Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Add config to enable global performance measurements. Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Enable global performance measutements. Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Implement global performance measurement which measure performance of .copy functions of multiple components. Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Implement global performance data get ipc which extracts performance data from MW3 Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Implement extended global performance data get ipc which extracts performance data from MW3 Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Implement actual functionality for perf meas state ipc handling. This enables changing the state of global performance measurements. Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
d7c48fb
to
229033b
Compare
@marcinszkudlinski @abonislawski @tmleman @softwarecki @lgirdwood @lyakh ... this one is waiting for another review, otherwise ready to go. |
Adds global performance measurements feature. It measures performance of a limited number of module instances. This feature is disabled by default and enabled by an IPC message.