Roofline Components

Namespace: tim::component

Roofline is a visually intuitive performance model used to bound the performance of various numerical methods and operations running on multicore, manycore, or accelerator processor architectures. Rather than simply using percent-of-peak estimates, the model can be used to assess the quality of attained performance by combining locality, bandwidth, and different parallelization paradigms into a single performance figure. One can examine the resultant Roofline figure in order to determine both the implementation and inherent performance limitations.

More information on roofline can be found here.
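The bound the model describes can be sketched in a few lines. This is a minimal illustration of the basic roofline formula, attainable FLOP/s = min(peak FLOP/s, peak bandwidth × arithmetic intensity), and not code from timemory; the helper names `roofline_bound` and `ridge_point` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helpers for illustration only (not part of timemory).
//
// Attainable performance under the basic roofline model:
//   peak_flops     : peak compute rate, in FLOP/s
//   peak_bandwidth : peak memory bandwidth, in bytes/s
//   intensity      : arithmetic intensity of the kernel, in FLOP/byte
inline double roofline_bound(double peak_flops, double peak_bandwidth,
                             double intensity)
{
    assert(peak_flops > 0.0 && peak_bandwidth > 0.0 && intensity >= 0.0);
    // Below the ridge point the kernel is bandwidth-bound; above it,
    // the kernel is compute-bound.
    return std::min(peak_flops, peak_bandwidth * intensity);
}

// Arithmetic intensity at which the two limits meet (the "ridge point").
inline double ridge_point(double peak_flops, double peak_bandwidth)
{
    assert(peak_flops > 0.0 && peak_bandwidth > 0.0);
    return peak_flops / peak_bandwidth;
}
```

For example, with a peak of 100 FLOP/s and 10 bytes/s of bandwidth, a kernel with an intensity of 2 FLOP/byte is limited to 20 FLOP/s by memory traffic, while any kernel with an intensity above 10 FLOP/byte is limited by the compute peak.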

| Component Name | Category | Template Specification | Dependencies | Description |
| --- | --- | --- | --- | --- |
| `cpu_roofline` | CPU | `cpu_roofline<Types...>` | PAPI | Records the rate at which the hardware counters are accumulated |
| `gpu_roofline` | GPU | `gpu_roofline<Types...>` | CUDA, CUPTI | Records the rate at which the hardware counters are accumulated |

The roofline components provided by timemory execute a workflow during application termination that calculates the theoretical peak for the roofline. A pre-defined set of algorithms for calculating the theoretical peak is provided, but these can be customized. Examples can be found in timemory/examples/ex-cpu-roofline/test_cpu_roofline.cpp and timemory/examples/ex-gpu-roofline/test_gpu_roofline.cpp.
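To give a sense of what a peak-estimation step might involve, the sketch below times a kernel with a known FLOP count. It is an assumption-laden illustration and not timemory's algorithm: the names `flop_measurement` and `measure_fma_rate` are hypothetical, and a single-threaded scalar loop like this yields a rate well below the true hardware peak.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical result type for illustration only (not part of timemory).
struct flop_measurement
{
    double flops;    // floating-point operations performed
    double seconds;  // elapsed wall-clock time

    // FLOP/s, or 0 if the timer could not resolve the interval.
    double rate() const { return seconds > 0.0 ? flops / seconds : 0.0; }
};

// Times a loop of multiply-adds on four independent accumulators.
// Each accumulator update is counted as 2 FLOP (one multiply, one add).
inline flop_measurement measure_fma_rate(std::size_t iterations)
{
    assert(iterations > 0);

    double a0 = 1.0, a1 = 1.0, a2 = 1.0, a3 = 1.0;
    const double b = 1.0000001;
    const double c = 1.0e-9;

    const auto start = std::chrono::steady_clock::now();
    for(std::size_t i = 0; i < iterations; ++i)
    {
        a0 = a0 * b + c;
        a1 = a1 * b + c;
        a2 = a2 * b + c;
        a3 = a3 * b + c;
    }
    const auto stop = std::chrono::steady_clock::now();

    // Store the result through a volatile so the loop is not optimized away.
    volatile double sink = a0 + a1 + a2 + a3;
    (void) sink;

    flop_measurement result;
    result.flops   = 8.0 * static_cast<double>(iterations);  // 4 updates x 2 FLOP
    result.seconds = std::chrono::duration<double>(stop - start).count();
    return result;
}
```

A realistic peak calculation would also need to account for vector width, fused multiply-add units, and all available cores, which is why a customizable algorithm is useful.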

Pre-defined Types

Namespace: tim::component

| Component Name | Underlying Template Specification | Description |
| --- | --- | --- |
| `cpu_roofline_flops` | `cpu_roofline<float, double>` | Rate of single- and double-precision FLOP/s |
| `cpu_roofline_dp_flops` | `cpu_roofline<double>` | Rate of double-precision FLOP/s |
| `cpu_roofline_sp_flops` | `cpu_roofline<float>` | Rate of single-precision FLOP/s |
| `gpu_roofline_flops` | `gpu_roofline<fp16_t, float, double>` | Rate of half-, single- and double-precision FLOP/s |
| `gpu_roofline_dp_flops` | `gpu_roofline<double>` | Rate of double-precision FLOP/s |
| `gpu_roofline_sp_flops` | `gpu_roofline<float>` | Rate of single-precision FLOP/s |
| `gpu_roofline_hp_flops` | `gpu_roofline<fp16_t>` | Rate of half-precision FLOP/s |
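The relationship between the pre-defined names and the underlying variadic templates can be pictured as plain type aliases. The sketch below uses stand-in templates in a hypothetical `sketch` namespace, including a placeholder `fp16_t`; it does not include or reproduce timemory's headers and only mirrors the specializations listed in the table.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch  // stand-in namespace, NOT tim::component
{
struct fp16_t {};  // placeholder for a half-precision type

// Stand-in variadic templates: one template argument per precision tracked.
template <typename... Types>
struct cpu_roofline
{
    static constexpr std::size_t precisions = sizeof...(Types);
};

template <typename... Types>
struct gpu_roofline
{
    static constexpr std::size_t precisions = sizeof...(Types);
};

// Aliases mirroring the pre-defined types listed in the table.
using cpu_roofline_flops    = cpu_roofline<float, double>;
using cpu_roofline_dp_flops = cpu_roofline<double>;
using cpu_roofline_sp_flops = cpu_roofline<float>;
using gpu_roofline_flops    = gpu_roofline<fp16_t, float, double>;
using gpu_roofline_dp_flops = gpu_roofline<double>;
using gpu_roofline_sp_flops = gpu_roofline<float>;
using gpu_roofline_hp_flops = gpu_roofline<fp16_t>;

// An alias and its underlying specialization are the same type.
static_assert(std::is_same<cpu_roofline_dp_flops, cpu_roofline<double>>::value,
              "alias names the same type as the explicit specialization");
}  // namespace sketch
```

Because an alias introduces no new type, code written against `cpu_roofline<double>` and code written against `cpu_roofline_dp_flops` refer to the same specialization in this sketch.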