by Fernando Trias
An implementation of the gprof profiler for the Teensy 3 and Teensy 4 platforms from PJRC.
This document is written for Linux/Mac. It is possible to follow a similar procedure for Windows.
See license.txt for licenses used by code.
-
Build gprof cross-compiled for ARM, or use the binaries in the "binaries" directory. Building it is beyond the scope of this document; look up how to install your own cross-compiler toolchain, including arm-none-eabi-gprof. gprof is normally part of a GNU toolchain build, but Teensyduino does not include it.
-
Python 2.7 or higher for automatic processing.
-
Modify the Teensyduino files as described in the Patches section below.
GProf is a sampling profiler that does two things:
-
Samples the current instruction every millisecond: at each tick it checks which code is running and increments a counter for it. Over time, these counts approximate how much time each function consumes.
-
Keeps track of which functions call which. gprof uses this call graph to estimate the cumulative time spent in each function together with all the functions it calls.
After running for a period you choose, the library writes a file named "gmon.out", containing the sample counters and call data, to a serial port. A Python script listens for this file on the serial port, and gprof then cross-references it with the original executable to generate a table of execution times.
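The sampling half can be pictured as a histogram indexed by program counter. The desktop-runnable sketch below illustrates the idea only and is not the library's code: the class `ProfileHistogram`, its bin size, and the address range used in the example are hypothetical. Each 1 ms tick hands the interrupted program counter to `sample()`, which increments the bin covering that address; gprof later maps bins back to functions using the symbols in the elf file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical PC-sampling histogram: one 16-bit counter per binBytes of code.
class ProfileHistogram {
public:
    ProfileHistogram(uint32_t lowpc, uint32_t highpc, uint32_t binBytes)
        : lowpc_(lowpc), highpc_(highpc), binBytes_(binBytes),
          counts_(binCount(lowpc, highpc, binBytes), 0) {}

    // Called once per timer tick with the address that was executing.
    void sample(uint32_t pc) {
        if (pc < lowpc_ || pc >= highpc_) return;        // outside profiled code
        uint32_t bin = (pc - lowpc_) / binBytes_;
        if (counts_[bin] != UINT16_MAX) ++counts_[bin];  // saturate, never wrap
    }

    uint16_t count(std::size_t bin) const { return counts_[bin]; }
    std::size_t bins() const { return counts_.size(); }

private:
    static std::size_t binCount(uint32_t lowpc, uint32_t highpc, uint32_t binBytes) {
        assert(highpc > lowpc && binBytes > 0);
        return (highpc - lowpc + binBytes - 1) / binBytes;  // round up
    }

    uint32_t lowpc_, highpc_, binBytes_;
    std::vector<uint16_t> counts_;
};
```

With a 1 ms tick, a bin count of 250 corresponds to roughly a quarter of a second spent in the code that bin covers.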
-
Install the ZIP file as an Arduino library.
-
Modify the files as described in the Patches section.
-
Build or copy arm-none-eabi-gprof and modify readfile.py to point to it.
-
Start Arduino and open the TeensyGProf example.
-
Select menu Tools / Profile / On.
-
Select a USB Type that includes Serial (preferably Dual Serial USB).
-
Compile and upload.
-
Run
python readfile.py --serial /dev/cu.usb.usbmodem123456
(substitute your actual USB serial device). readfile.py will open the serial port and print everything it receives, filtering out and processing the special gmon.out data. It will write out the gmon.out file and then run gprof to show the output.
The library also supports writing the gmon.out file over MIDI, as hex text, or to an SD card. See TeensyGProf.h.
Compile the following sketch with Dual Serial USB support and Profile / On.
#include "TeensyGProf.h"

volatile int sum = 0;

void do_something() {
  for (int i = 0; i < 1000; i++) {
    sum += 1;
  }
}

void setup() {
  // collect for 5000 milliseconds and send on second USB Serial
  gprof.begin(&SerialUSB1, 5000);
}

void loop() {
  do_something();
}
You can view the sketch output with the Serial Monitor and collect the profile data with the readfile.py script:
python readfile.py --serial /dev/cu.usb.usbmodem123456
The script loops, processing every run it detects, until you stop it with Ctrl-C. This way, you can recompile and restart the Teensy without restarting readfile.py.
The files below must be added (or modified, if they already exist). They are located in the Teensyduino install directory, which is part of the Arduino installation. On the Mac, they are in /Applications/Arduino.app/Contents/Java/hardware/teensy/avr.
-
In the same directory as boards.txt, place a copy of boards.local.txt. If that file already exists, append the contents to its end. This adds the menu options.
-
In the same directory, place a copy of platform.local.txt (or append its contents to the existing file). This modifies the compile stage for cpp files to add the profiling option. It also adds a build step that copies the elf file (renaming a section; see below) to a standard location where readfile.py can find it. The value of profelf must match the value in readfile.py.
-
TeensyGProf takes over the systick_isr handler. It does its own profiling work and then hands control to the original systick_isr handler, so EventResponder and all other timing code should be unaffected. The sampling data is stored in RAM.
-
The build adds the -pg compiler flag to cpp files. This makes the compiler insert a call to _gnu_mcount_nc at the start of every function, which is how the library keeps track of which function called which. Each caller/callee pair (called an arc) is also stored in RAM.
-
You can configure how much RAM is used by the sampler (the first task described above) and by the call tracker (the second). The more memory you allocate, the more accurate the results. To do so, edit HASHFRACTION and ARCDENSITY in gmon.h.
-
If you pass a duration in milliseconds to gprof.begin(), it starts a timer that calls gprof.end() when it expires. Otherwise, you must call gprof.end() yourself. gprof.end() processes all the data and writes the contents of gmon.out to the chosen port in the requested format. gprof then uses this file, along with a copy of the elf file, to generate a report. You can customize the output method by subclassing GProfOutput; for example, you could send the file over a network or via HTTP.
-
Teensy 4 puts its code in a section called .text.itcm (the code runs from the chip's ITCM, its tightly coupled instruction memory). gprof expects the code in a section called .text, the usual name in Linux toolchains. Teensy 3 already uses .text. So platform.local.txt tells Arduino to run objcopy to rename the section.
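The systick_isr takeover is a handler-chaining pattern: do the profiling work, then call the handler that was installed before. The desktop sketch below simulates it with a plain function pointer standing in for the vector-table slot; `g_systick_vector`, `profiling_isr`, `install_profiling_isr`, and the counters are hypothetical names, and how TeensyGProf actually installs its handler is not shown here.

```cpp
#include <cassert>
#include <cstdint>

using IsrFn = void (*)();

// Hypothetical stand-ins: the slot the "hardware" calls every millisecond,
// and the handler that was there before profiling started.
static IsrFn g_systick_vector = nullptr;
static IsrFn g_original_isr = nullptr;
static uint32_t g_millis = 0;   // maintained by the original handler
static uint32_t g_samples = 0;  // maintained by the profiler

// Plays the role of the core's original tick handler (millisecond counter).
static void core_systick_isr() { ++g_millis; }

// Profiling wrapper: record a sample first, then chain to the original
// handler so millis()-style timekeeping keeps working.
static void profiling_isr() {
    ++g_samples;  // real code would record the interrupted program counter
    if (g_original_isr) g_original_isr();
}

static void install_profiling_isr() {
    assert(g_systick_vector != profiling_isr);  // never chain to ourselves
    g_original_isr = g_systick_vector;
    g_systick_vector = profiling_isr;
}
```

Calling the saved handler last keeps the original tick behaviour intact, which is why timing code should not notice the profiler.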
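With -pg, each instrumented function's prologue calls _gnu_mcount_nc, which can see two addresses: the call site in the caller (frompc) and the entry of the callee (selfpc). Recording the call graph then amounts to counting each distinct (frompc, selfpc) pair in a fixed-size table, since there is no heap to grow into. The sketch below is a simplified desktop illustration with hypothetical names (`ArcTable`, `record`); the classic BSD mcount hashes frompc into chains of "tostruct" records, whereas this sketch uses open addressing for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One call-graph arc: "code at frompc called the function at selfpc, count times".
struct Arc {
    uint32_t frompc;
    uint32_t selfpc;
    uint32_t count;  // 0 marks an unused slot
};

// Fixed-capacity arc table with linear probing (no heap allocation).
template <std::size_t N>
class ArcTable {
public:
    // Returns false when the table is full and the arc had to be dropped.
    bool record(uint32_t frompc, uint32_t selfpc) {
        std::size_t start = hash(frompc, selfpc) % N;
        for (std::size_t i = 0; i < N; ++i) {
            Arc& a = arcs_[(start + i) % N];
            if (a.count == 0) {  // empty slot: first time we see this arc
                a = Arc{frompc, selfpc, 1};
                return true;
            }
            if (a.frompc == frompc && a.selfpc == selfpc) {
                ++a.count;       // known arc: bump its counter
                return true;
            }
        }
        ++dropped_;              // no room left: the arc is lost
        return false;
    }

    uint32_t count(uint32_t frompc, uint32_t selfpc) const {
        for (const Arc& a : arcs_)
            if (a.count && a.frompc == frompc && a.selfpc == selfpc) return a.count;
        return 0;
    }

    uint32_t dropped() const { return dropped_; }

private:
    static std::size_t hash(uint32_t f, uint32_t s) {
        return (f >> 1) ^ (s * 2654435761u);  // cheap multiplicative mix
    }

    Arc arcs_[N] = {};
    uint32_t dropped_ = 0;
};
```

The `dropped()` counter makes the memory trade-off visible: a larger table costs RAM but loses fewer arcs.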
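To get a feel for how these gmon.h constants trade RAM for accuracy, the arithmetic below follows the sizing rules of the classic 4.4BSD gmon code, which separates a histogram fraction (HISTFRACTION) from the caller-hash fraction (HASHFRACTION) and sizes the arc table from ARCDENSITY, clamped between MINARCS and MAXARCS. This is an assumption: TeensyGProf's own gmon.h may use different formulas or defaults, and the inputs in the example are made up.

```cpp
#include <cassert>
#include <cstdint>

// Modelled on the 4.4BSD gmon sizing rules (assumption for illustration).
constexpr uint32_t kMinArcs = 50;         // BSD MINARCS
constexpr uint32_t kMaxArcs = 65534;      // BSD MAXARCS (2^16 - 2)
constexpr uint32_t kArcRecordBytes = 12;  // selfpc + count + link + pad on 32-bit

struct GmonSizes {
    uint32_t histBytes;   // PC-sample counters (the sampler)
    uint32_t fromsBytes;  // hash heads indexed by caller address (call tracker)
    uint32_t tosBytes;    // arc records (call tracker)
    uint32_t total() const { return histBytes + fromsBytes + tosBytes; }
};

// textBytes: size of the profiled code. Larger fractions give smaller,
// coarser tables; a larger arcDensityPercent reserves room for more arcs.
GmonSizes gmon_sizes(uint32_t textBytes, uint32_t histFraction,
                     uint32_t hashFraction, uint32_t arcDensityPercent) {
    assert(histFraction > 0 && hashFraction > 0);
    uint64_t arcs = uint64_t(textBytes) * arcDensityPercent / 100;
    if (arcs < kMinArcs) arcs = kMinArcs;
    if (arcs > kMaxArcs) arcs = kMaxArcs;
    return GmonSizes{textBytes / histFraction, textBytes / hashFraction,
                     static_cast<uint32_t>(arcs) * kArcRecordBytes};
}
```

Under these assumed rules, 100 KB of code with both fractions set to 2 and a 2% arc density needs 124 KB of RAM, which is why tuning the constants matters on boards with less memory.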
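The timed mode of gprof.begin() behaves like a one-shot countdown: once the requested number of milliseconds has elapsed, the end-of-run routine fires exactly once. The desktop sketch below models only that behaviour, driven by a simulated 1 ms tick; `OneShotStop`, `tick()`, and the callback are hypothetical names, not the library's API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical one-shot stop timer: call tick() once per millisecond.
class OneShotStop {
public:
    OneShotStop(uint32_t durationMs, std::function<void()> onExpire)
        : remaining_(durationMs), onExpire_(std::move(onExpire)) {}

    void tick() {
        if (fired_) return;  // already stopped: ignore later ticks
        if (remaining_ > 0) --remaining_;
        if (remaining_ == 0) {
            fired_ = true;
            onExpire_();     // stands in for gprof.end(): stop and write gmon.out
        }
    }

    bool fired() const { return fired_; }

private:
    uint32_t remaining_;
    std::function<void()> onExpire_;
    bool fired_ = false;
};
```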
-
Clean up the source code. It was ported from a desktop implementation and could use more comments and some optimization for ARM.
-
Create a script that will make all necessary modifications to Teensyduino files.
-
Right now, only C++ code is profiled; enabling profiling for C files does not work. I've tried adding __attribute__((no_instrument_section)) to many of the basic C functions, such as ResetHandler(), but I haven't found the right combination of functions to change. With more time, I'm sure this could be solved. However, since C++ is the default language for Arduino sketches and almost all libraries are written in C++, this may not be a big problem.
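For reference, the attribute GCC documents for this purpose is spelled no_instrument_function; it exempts a function from the entry instrumentation that -pg (and -finstrument-functions) would otherwise add. The sketch below only shows how it is applied; which startup functions actually need it remains the open question above, and `startup_helper` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper that must run before the profiler is ready, so it is
// excluded from -pg instrumentation (no profiling hook call is emitted;
// on ARM that hook is _gnu_mcount_nc).
__attribute__((no_instrument_function))
static int startup_helper(int x) {
    return x * 2;
}

// Ordinary function: instrumented as usual when compiled with -pg.
static int regular_function(int x) {
    return startup_helper(x) + 1;
}
```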
For the ARM solution this project is based on, see: https://mcuoneclipse.com/2015/08/23/tutorial-using-gnu-profiling-gprof-with-arm-cortex-m/
For an interesting overview of gprof: http://wwwcdf.pd.infn.it/localdoc/gprof.pdf