A simple, quick and semi-automatic way to measure run-time speed of functions

A typical task for a programmer is to find out whether a function is efficient, both from the algorithmic point of view and in actual execution time. It is worth clarifying that both values are important: the first tells us how the algorithm behaves as the data set grows, while the second is a more tangible value, close to what the end user will perceive.

On the other hand, there is no need to struggle to optimize our code everywhere, since most of the time the real execution time is good enough for the bulk of our code, and the bottleneck lies in small portions of it: searches over huge sets of unsorted data, unnecessary repetitions, functions called tens of thousands of times per second, disk accesses, cache misses… This is summarized in the famous 90-10 rule: 90% of the execution time is spent in 10% of the code.

Time to measure

One of the first phases of optimization is measurement: optimizing something (time, space, etc.) without measuring it first can cause us to waste our effort. This has happened to all of us:

– 3 hours of work to make a function run 85% faster, but that function took 1 second and ran in the background once every 5 hours… come on, we basically wasted our time.
– 4 hours to improve the performance of a code snippet by only 25%, but that code takes 20 ms on average and is called 50 times per second during the video recording process: 20 ms × 50 = 1 second! That 25% lightens the CPU load considerably at a critical moment, most likely avoiding dropped frames.

Measure, right, but how? A profiling tool is usually one of the best alternatives, since it presents grouped and sorted information about the performance of critical functions. But sometimes we are simply interested in measuring a small handful of specific functions, or measuring them on a client's machine, where no profiling tools are available.

TicToc

This article presents a small class, TicToc (can you tell I’ve used Matlab?), to measure the run time of a function automatically and easily. It relies on one of the great idioms of C++ (and one of my favorites), RAII (https://en.wikipedia.org/wiki/RAII), to automate the measurement and print the duration to the console.

And although I know that the use of macros should be limited, this is one of those situations in which they are useful: automating actions. The TICTOC() macro automatically creates a measurement point whose result is printed at the end of the scope in which it appears.

Code

#include <string>
#include <chrono>
#include <iostream>

class TicToc {
public:
    explicit TicToc(std::string label = {})
        : m_label(std::move(label)),
          m_begin(std::chrono::high_resolution_clock::now()) {}

    ~TicToc() {
        print();
    }

    void print() const {
        using namespace std::chrono;
        const auto end = high_resolution_clock::now();
        const auto duration = duration_cast<microseconds>(end - m_begin).count();
        const auto duration_in_ms = duration / 1000.0;

        std::cout << m_label << duration_in_ms << " ms\n";
    }

private:
    const std::string m_label;
    const std::chrono::high_resolution_clock::time_point m_begin;
};

// Two levels of indirection so that __LINE__ is expanded before ## pastes it
// More info: https://stackoverflow.com/q/1489932/1485885
#define HELPER_JOIN(a, b) HELPER_JOIN2(a, b)
#define HELPER_JOIN2(a, b) a ## b

// Generates a unique variable name within the current file
// Measurement tag: name of the calling function and line number
#define TICTOC() TicToc HELPER_JOIN(tic_toc_, __LINE__)(std::string(__FUNCTION__) + "@" + std::to_string(__LINE__) + " = ")

volatile int sink = 0;
void do_something() { ++sink; } // dummy workload so the example compiles

void run() {
    TICTOC(); // that's all folks!

    for (int i = 0; i < 1000000; ++i) {
        do_something(); // ahem...
    }
}

int main() {
    run();
}

And we’re done! A sample output for the above code:

::run@33 = 154.2 ms
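To make the macro mechanics explicit, here is a hand-expanded illustration, assuming the TICTOC() call sits at line 33 of run() (the exact function-name text, run vs. ::run, depends on the compiler's __FUNCTION__):

TicToc tic_toc_33(std::string("run") + "@" + std::to_string(33) + " = ");

The two-level HELPER_JOIN is what lets __LINE__ expand to 33 before ## pastes the tokens; pasting directly would produce a variable literally named tic_toc___LINE__.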

Boost (update)

If our application depends on Boost, a possible improvement is to use boost::timer::auto_cpu_timer, which behaves like the Linux time command. This class, similar to TicToc, reports the execution time between the declaration of the object and its destruction:

#include <boost/timer/timer.hpp>

void run() {
    boost::timer::auto_cpu_timer t;

    for (int i = 0; i < 1000000; ++i) {
        do_something();
    }
}

int main() {
    run();
}

Output:

 0.148997s wall, 0.078125s user + 0.062500s system = 0.140625s CPU (94.4%)
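As an aside, auto_cpu_timer can also take a precision and a format string in its constructor, which makes it easy to trim the report. A minimal sketch, assuming Boost.Timer's documented placeholders (%w is wall-clock time):

#include <boost/timer/timer.hpp>

void run() {
    // 3 decimal places, wall-clock time only; prints e.g. "0.149s wall"
    boost::timer::auto_cpu_timer t(3, "%ws wall\n");

    // ... work to be measured ...
}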

Of course, TicToc could be modified to use this class while keeping TicToc's automatic labeling:

class TicToc2 {
public:
    explicit TicToc2(std::string label = {}) : m_label(std::move(label)) {}

    ~TicToc2() {
        // The label is printed here; m_timer is destroyed right after this
        // body runs, printing its timing report next to the label
        std::cout << m_label;
    }

private:
    const std::string m_label;
    boost::timer::auto_cpu_timer m_timer;
};
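A minimal usage sketch (the TICTOC2() macro below is hypothetical, simply reusing the HELPER_JOIN machinery from the first listing):

#define TICTOC2() TicToc2 HELPER_JOIN(tic_toc_, __LINE__)(std::string(__FUNCTION__) + "@" + std::to_string(__LINE__) + " = ")

void run() {
    TICTOC2();

    for (int i = 0; i < 1000000; ++i) {
        do_something();
    }
}

Since the destructor prints the label just before m_timer is destroyed, the output would look something like:

run@3 =  0.148997s wall, 0.078125s user + 0.062500s system = 0.140625s CPU (94.4%)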

Other possible improvements

– Use clocks with higher resolution / precision.
– Add a switch to disable the measurement completely, or by levels, so that the code does not have to be removed in production (see the sketch below).
– Allow redirecting the output to, for example, a log file.
– Anything else?
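As a sketch of the disable switch mentioned above (ENABLE_TICTOC is a made-up flag, not part of the code above): when the flag is absent, the macro expands to nothing and the measurement vanishes from the build at zero cost.

// Define ENABLE_TICTOC (e.g. -DENABLE_TICTOC) only in builds that measure
#ifdef ENABLE_TICTOC
#define TICTOC() TicToc HELPER_JOIN(tic_toc_, __LINE__)(std::string(__FUNCTION__) + "@" + std::to_string(__LINE__) + " = ")
#else
#define TICTOC() // expands to nothing in production builds
#endif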