Performance measurements: Add Global Performance Measurements #9186

Merged
merged 9 commits into from
Jul 10, 2024

@tobonex tobonex commented Jun 3, 2024

Adds a global performance measurements feature. It measures the performance of a limited number of module instances. The feature is disabled by default and is enabled by an IPC message.

Member

@lgirdwood lgirdwood left a comment

Quick look at a high level, nothing major. Could do with more comments around the larger code blocks.

@tobonex tobonex force-pushed the telemetry_global_perf_final branch from 96082f2 to 7542233 Compare June 3, 2024 16:24
@tobonex tobonex force-pushed the telemetry_global_perf_final branch 7 times, most recently from f56b3a1 to a775f3f Compare June 6, 2024 10:11
int ret = sys_bitarray_alloc(bitmap->array, 1, offset);

if (!ret)
	bitmap->occupied++;
Member

Will this bitmap have multiple client users?

Contributor Author

Ah, I see. This is used in comp_new and it seems that those can be executed on other cores so I should just add a spinlock for good measure. To be fair, as this code is based on Zephyr's bitarray, it would be a better idea to just add occupied bit count to bitarray in Zephyr, as it already has spinlocks and everything. Getting this accepted may take a while though. Might need to update this later.
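
The idea in this thread (a bitmap that also tracks how many bits are allocated) can be illustrated with a minimal, platform-independent sketch. It does not use Zephyr's `sys_bitarray` API: the `perf_bitmap` type, its 64-slot capacity, and both function names are hypothetical, and the locking mentioned above is left out and only marked in comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PERF_BITMAP_BITS 64u /* hypothetical capacity, one machine word */

/* Hypothetical stand-in for the bitarray wrapper discussed above. */
struct perf_bitmap {
	uint64_t bits;   /* one bit per slot, 1 = allocated */
	size_t occupied; /* number of bits currently set */
};

/*
 * Allocate the first free slot. Returns 0 and stores the slot index in
 * *offset, or -1 when every slot is taken. In firmware the whole body
 * would need to run under a lock if several cores can call it.
 */
static int perf_bitmap_alloc(struct perf_bitmap *bm, size_t *offset)
{
	assert(bm && offset);

	for (size_t i = 0; i < PERF_BITMAP_BITS; i++) {
		uint64_t mask = (uint64_t)1 << i;

		if (!(bm->bits & mask)) {
			bm->bits |= mask;
			bm->occupied++; /* counter changes only on success */
			*offset = i;
			return 0;
		}
	}
	return -1;
}

/* Release a slot. Returns -1 for an out-of-range or already free slot. */
static int perf_bitmap_free(struct perf_bitmap *bm, size_t offset)
{
	assert(bm);

	if (offset >= PERF_BITMAP_BITS)
		return -1;

	uint64_t mask = (uint64_t)1 << offset;

	if (!(bm->bits & mask))
		return -1;

	bm->bits &= ~mask;
	bm->occupied--;
	return 0;
}
```

Updating the counter in the same critical section as the bit itself is what keeps `occupied` consistent with the bitmap, which is why a lock around both operations matters once more than one core can create components.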

@tobonex tobonex force-pushed the telemetry_global_perf_final branch 5 times, most recently from b3ec702 to d117581 Compare June 18, 2024 08:12
@tobonex tobonex marked this pull request as ready for review June 18, 2024 08:28

static uint32_t get_one_ms_in_bytes(const struct ipc4_audio_format fmt)
{
#ifdef ROUND_UP
Collaborator

in which cases is ROUND_UP not defined?

Contributor Author

There are some builds that don't recognize this. It is defined in Zephyr, so I'm guessing those builds don't use Zephyr (not sure which one it was). This makes it awkward, as I'd rather just use the macro, but it seems I need to expand it anyway because builds will fail otherwise.

Collaborator

you actually need SOF_DIV_ROUND_UP()

@@ -40,6 +40,7 @@ CONFIG_MM_DRV_INTEL_ADSP_TLB_REMAP_UNUSED_RAM=y
CONFIG_AMS=y
CONFIG_COUNTER=y
CONFIG_SOF_TELEMETRY=y
CONFIG_SOF_TELEMETRY_PERFORMANCE_MEASUREMENTS=y
Collaborator

with these changes performance measurements would be unconditionally enabled on MTL and LNL? Is this desirable?

Contributor Author

Yes, the performance measurement itself is enabled by IPC, so there's little to no overhead. But now that you mention it, disabling only CONFIG_SOF_TELEMETRY may break the build. It would be good if performance measurements could be enabled regardless of telemetry, I'll think about it.

if (perf_meas_get_state() == IPC4_PERF_MEASUREMENTS_STARTED) {
	/* we divide by ibs so we need to check if its set */
	if (item && dev->ibs != 0) {
		item->total_iteration_count++;
Collaborator

how often does this run? If this runs every millisecond, then in a couple of days this will overflow and the division below will divide by zero

Contributor Author

Yeah, it may as well be every ms. I doubt it will be used for a couple of days, but still, I should add some check here, true.

@tobonex tobonex force-pushed the telemetry_global_perf_final branch from d117581 to a299c75 Compare June 18, 2024 09:09
Collaborator

@kv2019i kv2019i left a comment

A very brief review (as we are in the middle of 2.10 release work). The code in general looks good, but I think the name should somehow reflect that this is built on top of the ADSP_DW infrastructure, which is currently very specific to Intel/Zephyr SOF platforms. We have other such things (like mtrace).

Part of this is already in the original telemetry PR.

I think this could be improved by just naming it something less generic than "SOF telemetry".

#define ADSP_PMW ((volatile uint32_t *) \
(sys_cache_uncached_ptr_get((__sparse_force void __sparse_cache *) \
(WIN3_MBASE + WIN3_OFFSET))))

Collaborator

Hmm, but this is now very Intel specific. Should this be called something other than debug/telemetry? debug/intel-telemetry.h?

Contributor Author

Moved new functionality to separate files. I'm guessing we still want those to be called intel-performance-monitor, and intel-telemetry? Do we want to change the CONFIG_SOF_TELEMETRY and CONFIG_SOF_TELEMETRY_PERFORMANCE_MEASUREMENTS names too?

Collaborator

I guess you can keep the current names. This is not strictly Intel specific either as other platforms could support this if similar memory windows to host can be exposed. We can do the rename when the first alternative telemetry implementation comes (if that uses other transport than the IPC4 memory window layout used here).

size_t slots_count;
size_t slot_idx = 0;
struct telemetry_wnd_data *wnd_data =
(struct telemetry_wnd_data *)ADSP_DW->slots[SOF_DW_TELEMETRY_SLOT];
Collaborator

I think the file is pretty general otherwise, but this use of the memory window is Intel specific. This itself is ok, but I think it's a problem to call this "SOF debug telemetry" which suggests a generic mechanism, while in practise this only works on platforms that provide a very specific memory window structure (described in Zephyr soc/xtensa/intel_adsp/common/include/adsp-debug-window.h ).

@tobonex tobonex force-pushed the telemetry_global_perf_final branch 2 times, most recently from 7000d28 to fd79ef0 Compare June 24, 2024 15:54
item->total_iteration_count = 1;
tr_err(&ipc_tr,
"overflow for module %#x, performance measurement incorrect",
dev_comp_id(dev));
Collaborator

but like this after such an overflow they'll stay incorrect? Can we reset total_cycles_consumed to start producing valid data again?

Contributor Author

Hm, we can either do that, or zero everything and set a flag to stop the measurement to further indicate that something went wrong.
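
The first option from this thread (reset the accumulators so later data is valid again) can be sketched as follows. This is only an illustration under assumptions: the `perf_item` layout, the field widths, and the `restarted` flag are hypothetical and are not taken from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-module record; field widths are assumptions. */
struct perf_item {
	uint32_t total_iteration_count;
	uint64_t total_cycles_consumed;
	uint32_t average_cycles;
	bool restarted; /* set once the counters had to be restarted */
};

/*
 * Account for one processing iteration that took `cycles` cycles.
 * If the iteration counter is about to wrap, both accumulators are
 * reset together, so the average stays meaningful and the divisor
 * below can never be zero.
 */
static void perf_item_update(struct perf_item *item, uint32_t cycles)
{
	assert(item);

	if (item->total_iteration_count == UINT32_MAX) {
		item->total_iteration_count = 0;
		item->total_cycles_consumed = 0;
		item->restarted = true;
	}

	item->total_iteration_count++;
	item->total_cycles_consumed += cycles;
	item->average_cycles =
		(uint32_t)(item->total_cycles_consumed /
			   item->total_iteration_count);
}
```

Checking before the increment rather than after it avoids ever storing a zero count. The second option mentioned above (zero everything and stop the measurement) would replace the reset branch with a state change instead.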

@lgirdwood lgirdwood added this to the v2.11 milestone Jun 26, 2024
@tobonex tobonex force-pushed the telemetry_global_perf_final branch from fd79ef0 to d43392f Compare June 27, 2024 17:50
lyakh previously requested changes Jun 28, 2024
{
/* TODO Reference Firmware also has systick multiplier and divider in this equation */
return get_sample_group_size_in_bytes(fmt) *
(SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000) / 1000);
Collaborator

don't think you need to divide by 1000 now any more, SOF_DIV_ROUND_UP() already does it. And after you've removed it please also drop parentheses

Contributor Author

The divide is part of the equation, it's correct

Collaborator

Here's the difference to your previous version:
[image: Screenshot_20240704_083619]
i.e. basically you replaced (fmt.sampling_frequency + 999) / 1000 with SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000) / 1000, and that is wrong, because SOF_DIV_ROUND_UP() is defined as

#define SOF_DIV_ROUND_UP(val, div) (((val) + (div) - 1) / (div))

and therefore already performs the division by 1000 for you. So either your original formula was wrong and you actually wanted to divide by 1000000 and not 1000, or your update is wrong.

Contributor Author

The original formula was wrong, yes. I should've mentioned it, sorry.

Collaborator

I think @tobonex the formula is not correct now:

  • 16bit stereo 48000Hz example -> sample group size in bytes 4
  • this function would now return: 4 * (48000 / 1000) / 1000 = 0.192, which truncates to 0

So I think this should be:

return get_sample_group_size_in_bytes(fmt) *
			SOF_DIV_ROUND_UP(fmt.sampling_frequency, 1000);

Which gives you 4 * 48000 / 1000 = 192 bytes (which is correct). I think you missed that SOF_DIV_ROUND_UP does actually do the division.

Contributor Author

Ah, OK, I get it now: SOF_DIV_ROUND_UP adds the division compared to ROUND_UP. I'll fix.
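
The arithmetic in this thread can be checked with a small standalone sketch. The `SOF_DIV_ROUND_UP` definition is the one quoted above; `struct audio_fmt` and the helper names are hypothetical stand-ins for `struct ipc4_audio_format` and the PR's functions, reduced to the fields the formula needs.

```c
#include <assert.h>
#include <stdint.h>

/* Definition as quoted in the review thread. */
#define SOF_DIV_ROUND_UP(val, div) (((val) + (div) - 1) / (div))

/* Hypothetical, reduced stand-in for struct ipc4_audio_format. */
struct audio_fmt {
	uint32_t sampling_frequency; /* Hz */
	uint32_t channels;
	uint32_t bytes_per_sample;
};

/* Size of one sample for all channels, in bytes. */
static uint32_t sample_group_size_in_bytes(const struct audio_fmt *fmt)
{
	return fmt->channels * fmt->bytes_per_sample;
}

/* Suggested form: the macro already divides by 1000. */
static uint32_t one_ms_in_bytes(const struct audio_fmt *fmt)
{
	assert(fmt);
	return sample_group_size_in_bytes(fmt) *
		SOF_DIV_ROUND_UP(fmt->sampling_frequency, 1000);
}

/* Form under review: the extra "/ 1000" truncates the result to 0. */
static uint32_t one_ms_in_bytes_double_div(const struct audio_fmt *fmt)
{
	assert(fmt);
	return sample_group_size_in_bytes(fmt) *
		(SOF_DIV_ROUND_UP(fmt->sampling_frequency, 1000) / 1000);
}
```

For 16-bit stereo at 48000 Hz the first function yields 4 * 48 = 192 bytes, while the second yields 4 * (48 / 1000) = 0 in integer arithmetic. For rates that are not a multiple of 1000, such as 44100 Hz, the rounding makes the result 4 * 45 = 180 bytes.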

@tobonex tobonex force-pushed the telemetry_global_perf_final branch from d43392f to bde8bfb Compare July 3, 2024 13:23

tobonex commented Jul 3, 2024

UPDATE:
In the reference FW, this functionality was in a separate file, so I moved all of this to new files. Now this functionality can be disabled regardless of telemetry, as the .c files are included conditionally in CMakeLists. Also, telemetry.c was getting a bit bloated.

@tobonex tobonex force-pushed the telemetry_global_perf_final branch 2 times, most recently from 8f46fbc to d7c48fb Compare July 3, 2024 16:36
Collaborator

@kv2019i kv2019i left a comment

This starts to be ready for merge. I think we could have a better tree-wide mechanism to identify features built on top of the ADSP_MW/DW (memory window or debug window) interface. But on second thought, I don't want to start renaming the files yet. Let's wait to see whether we can make this generic for other platforms and, if not, make the names more specific later.

There is one issue w.r.t. 1ms-in-bytes calculation, can you @tobonex check that? Otherwise I'm good with this.

tobonex added 9 commits July 4, 2024 17:38
Implement wrapper to extend Zephyr's bitarray by adding a counter for
allocated bits. This bitmap will be used to allocate performance data
entries in memory window 3. Also adds new performance_monitor files.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Add data to component struct for computing performance.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Add temporary macro for accessing memory window 3 data.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Add config to enable global performance measurements.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Enable global performance measurements.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Implement global performance measurement, which measures performance of .copy
functions of multiple components.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Implement global performance data get ipc which extracts performance data
from MW3

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Implement extended global performance data get ipc which extracts
performance data from MW3

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>

Implement actual functionality for perf meas state ipc handling. This
enables changing the state of global performance measurements.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
@tobonex tobonex force-pushed the telemetry_global_perf_final branch from d7c48fb to 229033b Compare July 5, 2024 16:03
@lyakh lyakh dismissed their stale review July 8, 2024 06:34

comment addressed

kv2019i commented Jul 10, 2024

@marcinszkudlinski @abonislawski @tmleman @softwarecki @lgirdwood @lyakh ... this one is waiting for another review, otherwise ready to go.

@lgirdwood lgirdwood merged commit b58df6d into thesofproject:main Jul 10, 2024
42 of 47 checks passed