callgrind profiling of embedded targets
When optimizing or micro-optimizing desktop applications, callgrind paired with kcachegrind is one of my favorite tool combinations. While slow to run, it gives you precise information about where instructions are spent and has at least a decent way of moving up and down the call graph or digging into the disassembly. The downside is that it largely only works if you can run the application on your host processor, which isn’t much help when working with embedded targets like STM32 (or other) microcontrollers. Recently, I got fed up trying to find more cycles to shave off the moteus firmware, and decided to take a stab at making at least a minimal solution.