From 25401428e6a7b853bfd0896b306f1a9346c83469 Mon Sep 17 00:00:00 2001 From: Levente Meszaros Date: Fri, 29 Nov 2024 17:27:31 +0100 Subject: [PATCH] Tests: Updated speed tests README file. --- tests/speed/README.txt | 136 +++++++++++++++++++++++++++++++++-------- 1 file changed, 110 insertions(+), 26 deletions(-) diff --git a/tests/speed/README.txt b/tests/speed/README.txt index 04daa5ed863..1b931dd6f68 100644 --- a/tests/speed/README.txt +++ b/tests/speed/README.txt @@ -1,17 +1,15 @@ SPEED TESTS FOR INET ==================== -This folder contains speed tests for the INET framework using the provided -examples. +This folder contains speed tests for the INET Framework using the examples. In general, it must be noted that it's notoriously difficult to do repeatable -speed measurements. This file describes several common pitfalls that must be +speed measurements. This file describes several common pitfalls that should be avoided. Before optimizing, it's essential to have a speed test that produces -repeatable results, otherwise it's unclear if any particular optimization is +repeatable results. Otherwise it's unclear if any particular optimization is right or wrong. -The intended audience of this document are the users and the maintainers of -the INET framework and its speed tests. +The intended audience of this document are the maintainers of the INET Framework. 1. Fix the Environment @@ -37,11 +35,11 @@ many interesting things done by the operating system. Either way, if the goal is measuring how long the simulation takes, then the best thing to do is to put in a few lines of code around that part such as this: -clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start); -... -clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end); -int64_t time = ((int64_t)end.tv_sec * 1000000000L + end.tv_nsec) - ((int64_t)start.tv_sec * 1000000000L + start.tv_nsec); -out << "Runtime: " << time / 1.0E+9 << "\n"; + clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start); + ... + clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end); + int64_t time = ((int64_t)end.tv_sec * 1000000000L + end.tv_nsec) - ((int64_t)start.tv_sec * 1000000000L + start.tv_nsec); + out << "Runtime: " << time / 1.0E+9 << "\n"; 4. Calculate Mean @@ -87,14 +85,103 @@ measurements less repeatable. Making sure the cpu's frequency doesn't scale during the speedtest: -sudo pstate-frequency -S --turbo off # turn off frequency scaling -sudo pstate-frequency -S --min 100 # set frequency of cpu cores to 100% (base frequency) -sudo pstate-frequency -S --max 100 +$ sudo pstate-frequency -S --turbo off # turn off frequency scaling +$ sudo pstate-frequency -S --min 100 # set frequency of cpu cores to 100% (base frequency) +$ sudo pstate-frequency -S --max 100 -pstate-frequency -G # to check if turbo is off and the percentages are properly set to around 100% (the cpu frequencies reported here can be misleading) -cpufreq-info # check the actual cpu core frequencies, should be around the base frequency +$ pstate-frequency -G # to check if turbo is off and the percentages are properly set to around 100% (the cpu frequencies reported here can be misleading) +$ cpufreq-info # check the actual cpu core frequencies, should be around the base frequency -10. Configure OMNeT++ Features +10. Disable Thermal Throttling + +Ensure thermal throttling does not alter CPU performance. Monitor CPU temperatures +and use sufficient cooling. + +11. Minimize Interrupts + +Minimize hardware and software interrupts during measurements to reduce unexpected +performance impacts. + +12. Change CPU Affinity + +Set CPU affinity to bind the process to a specific CPU. This ensures the process runs only on designated CPUs. Use the taskset command: + +$ taskset -c 0 ./inet_profile -u Cmdenv + +13. Isolate CPU(s) + +Isolate specific CPUs using kernel parameters to dedicate them exclusively to +the measurement process. Modify the GRUB configuration: + + GRUB_CMDLINE_LINUX="isolcpus=14,15 nohz_full=14,15 rcu_nocbs=14,15" + +14. Disable Power Management + +Disable all CPU power management features to ensure consistent performance. + +15. Use Real-Time Kernel + +Consider using a real-time kernel for higher priority and more deterministic task scheduling. + +16. Be aware of clock granularity + +Understand the resolution of the timing functions used, as insufficient granularity +can lead to inaccurate measurements. + +17. User time vs system time + +Differentiate between user time (time spent in user-mode processes) and system +time (time spent in kernel-mode processes). This distinction helps interpret +performance results. + +18. Wall-clock time vs CPU time + +Wall-clock time includes all elapsed time, whereas CPU time only measures the +time actively used by the process. + +19. Know What is Measured + +Measure CPU time, CPU cycles or CPU instructions to analyze performance at the +hardware level. Example APIs include: + + _rdtsc + clock() + std::chrono + clock_gettime(CLOCK_PROCESS_CPUTIME_ID + struct rusage r; + getrusage(RUSAGE_SELF, &r); + +20. CPU Cache + +Be mindful of cache effects on performance. Warm-up runs can help mitigate +cache-related variability. A context switch, to handle an interrupt or to run +another task, clears the CPU caches, so most likely increases CPU time. + +21. Useful Command-Line Tools + +$ time +$ cpufreq-info +$ lscpu +$ perf # this can display context switches, CPU cycles, CPU instructions, CPU migrations +$ taskset +$ mpstat + +22. Important mapped files + +Use the following for relevant configuration and status information: + +$ cat /proc/cmdline +$ cat /proc/interrupts +$ cat /proc/self/status | grep Cpus_allowed_list +$ cat /proc/sys/kernel/perf_event_paranoid +$ cat /sys/devices/system/cpu/smt/control +$ cat /sys/devices/system/cpu/intel_pstate/no_turbo +$ cat /sys/devices/system/cpu/cpu14/cpufreq/scaling_max_freq +$ cat /sys/devices/system/cpu/cpu14/cpufreq/scaling_cur_freq +$ cat /sys/devices/system/cpu/cpu15/cpufreq/scaling_governor +$ cat /proc/irq/*/smp_affinity + +23. Configure OMNeT++ Features Disable Qtenv, Tkenv, Osg, OsgEarth, Parsim and other unnecessary optional OMNeT++ features. Some of these features link with several additional libraries that @@ -102,28 +189,25 @@ may affect performance. For example, linking with pthread library might decrease the performance of std::shared_ptr by using locking to provide atomic operations for the reference counter. -11. Configure INET Features +24. Configure INET Features Disable VOIP, visualization, and other unnecessary optional INET features. Some of these features link with several additional libraries that may affect performance. -12. Disable Self Checking +25. Disable Self Checking Disable all self checking code that would unnecessarily affect the measurement results. Even if these would be included in the final product, they don't change the output of the program during the measurement. For example, the INET packet API contains two sets of self checking operations, see Chunk.h for more details. -13. Running Tests +26. Running Tests -In the simplest case you can run all speed tests with the following command: -$./speedtest +In the simplest case, you can run all speed tests with the following command: -For a more complicated example, you could run some tests from examples.csv -using the compiled debug version 10 times with the following command: -$./speedtest examples.csv -m TwoHosts -r 10 -e /home/levy/workspace/omnetpp/bin/opp_run_dbg +$ inet_run_speed_tests For more details on command line options, run: -$./speedtest -h +$ inet_run_speed_tests -h