Performance Counter Super-Resolution

by syoyo

This idea seems good.
Commodity CPUs have a precise HW performance counter facility,
but using it has a side effect on the measured program.

Increasing the sampling rate also increases
system bus accesses, memory accesses, etc. to transfer sampled data,
which affects the behavior of the application being measured,
resulting in poor and inaccurate profiling.

Intel’s VTune has a recommended lower bound of one millisecond as the minimum interval in between taking counter measurements to constrain the impact of these two types of error.

I had wondered why VTune’s sampling interval is so sparse (on the order of milliseconds).
At such a coarse sampling rate, short functions are easily missed in the resulting sample data.
But I now understand why it has to be so, according to the quotes and the linked info.
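
To make the "short functions are missed" point concrete, here is a minimal toy simulation. Everything in it is an assumption for illustration: `active_function` is a hypothetical deterministic program, and the timings are picked so that the 1 msec sampling interval lines up badly with the short function (a worst case, not a measurement of any real profiler).

```python
def active_function(t_us):
    """Hypothetical program: which function is running at time t_us (microseconds)."""
    # Assumption: short_fn runs for 100 us, starting 250 us into every 1000 us period.
    phase = t_us % 1000
    return "short_fn" if 250 <= phase < 350 else "long_fn"


def sample(interval_us, total_us, offset_us=0):
    """Take one sample every interval_us and count hits per function."""
    counts = {}
    for t in range(offset_us, total_us, interval_us):
        name = active_function(t)
        counts[name] = counts.get(name, 0) + 1
    return counts


# Coarse sampling (1 msec interval) over 100 msec of simulated run time.
coarse = sample(1000, 100_000)
# Fine sampling (10 usec interval) over the same run, for comparison.
fine = sample(10, 100_000)

print("coarse:", coarse)
print("fine:  ", fine)
```

In this toy setup the coarse profile never sees `short_fn` at all, even though it takes 10% of the run time, while the fine profile does. On real hardware the fine profile is the one that disturbs the measured program.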

Usually we (the performance-eager) want to profile a program’s behavior
at a resolution on the order of 100 to 1000 cycles.

The idea of using super-resolution techniques may solve this problem.

Super-resolution profiling,
i.e. running your app (the measured function) multiple times at a low sampling frequency, with a unique jitter assigned to each run,
then assembling the runs into one high-frequency profiling result,
may give more accurate sampling using the HW performance monitor facility.
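
A rough sketch of the assembling step, under strong assumptions: the program behaves identically on every run, and each run can be started with a known jitter. `active_function`, `run_once` and `super_resolution_profile` are hypothetical names of mine, not from the paper or from any profiler, and a simulated program stands in for real HW counters.

```python
def active_function(t_us):
    """Hypothetical program: which function is running at time t_us (microseconds)."""
    # Assumption: short_fn runs for 100 us, starting 250 us into every 1000 us period.
    phase = t_us % 1000
    return "short_fn" if 250 <= phase < 350 else "long_fn"


def run_once(interval_us, total_us, jitter_us):
    """One low-frequency profiling run whose first sample is delayed by jitter_us."""
    return [(t, active_function(t)) for t in range(jitter_us, total_us, interval_us)]


def super_resolution_profile(interval_us, total_us, step_us):
    """Merge many coarse runs, each with a unique jitter, into one fine timeline."""
    merged = []
    for jitter_us in range(0, interval_us, step_us):
        merged.extend(run_once(interval_us, total_us, jitter_us))
    merged.sort()  # order samples by timestamp across all runs
    return merged


# 100 runs at a 1 msec interval, jittered in 10 usec steps.
profile = super_resolution_profile(1000, 10_000, 10)
short_hits = sum(1 for _, name in profile if name == "short_fn")
print(len(profile), "samples,", short_hits, "in short_fn")
```

Each individual run still samples only once per millisecond, so the per-run overhead stays low, but the merged timeline has an effective spacing of 10 usec. The price is running the program 100 times, and the result is only meaningful if the runs are repeatable.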

I’m considering supporting such a HW sampling technique in the MUDA optimization platform.
For example, by running the oprofile profiler multiple times, each with a different start-time jitter.