Roofline Components
Namespace: `tim::component`
Roofline is a visually intuitive performance model used to bound the performance of numerical methods and operations running on multicore, manycore, or accelerator processor architectures. Rather than relying on percent-of-peak estimates alone, the model assesses the quality of attained performance by combining locality, bandwidth, and different parallelization paradigms into a single performance figure. Examining the resulting Roofline figure shows which performance limitations come from the implementation and which are inherent to the algorithm and hardware.
More information on roofline can be found here.
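The bound that a Roofline figure draws can be written down directly: attainable performance is the smaller of the compute ceiling and the memory bandwidth multiplied by the arithmetic intensity (FLOPs per byte moved). The following is a minimal sketch of that relationship only. The function names `roofline_bound`, `ridge_point`, and `is_memory_bound` are hypothetical and are not part of timemory.

```cpp
#include <algorithm>

// Attainable performance under the basic roofline model:
//   min(peak compute rate, peak memory bandwidth * arithmetic intensity)
// Units must be consistent, e.g. GFLOP/s, GB/s, and FLOP/byte.
// (Hypothetical helper, not a timemory function.)
inline double
roofline_bound(double peak_flops, double peak_bandwidth, double arithmetic_intensity)
{
    return std::min(peak_flops, peak_bandwidth * arithmetic_intensity);
}

// Arithmetic intensity at which the bandwidth slope meets the compute ceiling.
inline double
ridge_point(double peak_flops, double peak_bandwidth)
{
    return peak_flops / peak_bandwidth;
}

// A kernel left of the ridge point is limited by memory bandwidth,
// a kernel at or right of it is limited by the compute ceiling.
inline bool
is_memory_bound(double peak_flops, double peak_bandwidth, double arithmetic_intensity)
{
    return arithmetic_intensity < ridge_point(peak_flops, peak_bandwidth);
}
```

With made-up ceilings of 1000 GFLOP/s and 100 GB/s, the ridge point is 10 FLOP/byte. A kernel at 0.25 FLOP/byte is bounded at 25 GFLOP/s (memory-bound), while a kernel at 40 FLOP/byte is bounded by the 1000 GFLOP/s compute ceiling.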
| Component Name | Category | Template Specification | Dependencies | Description |
|---|---|---|---|---|
| `cpu_roofline` | CPU | `cpu_roofline<Types...>` | PAPI | Records the rate at which the hardware counters are accumulated |
| `gpu_roofline` | GPU | `gpu_roofline<Types...>` | CUDA, CUPTI | Records the rate at which the hardware counters are accumulated |
The roofline components provided by timemory execute a workflow during application termination that calculates the theoretical peak for the roofline.
A pre-defined set of algorithms for calculating the theoretical peak is provided, but these can be customized.
Examples can be found in `timemory/examples/ex-cpu-roofline/test_cpu_roofline.cpp` and `timemory/examples/ex-gpu-roofline/test_gpu_roofline.cpp`.
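To give a feel for what estimating a compute ceiling can involve, the sketch below times a loop with a known number of floating-point operations and divides by the elapsed time. This is an illustration under simplifying assumptions, not the algorithm timemory uses. `measure_flop_rate` is a hypothetical helper, and a scalar loop like this one will typically fall well short of the true hardware peak.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not a timemory function): runs a multiply-add loop
// with a known FLOP count and returns the measured rate in FLOP/s.
template <typename Tp>
double
measure_flop_rate(std::size_t iterations)
{
    using clock_type = std::chrono::steady_clock;

    volatile Tp seed = Tp(1.0);

    const auto start = clock_type::now();

    // reading the volatile seed after starting the clock keeps the
    // computation from being evaluated ahead of the timed region
    const Tp init = seed;

    // four independent accumulators so that consecutive operations
    // do not all depend on one another
    Tp acc[4] = { init, init + Tp(1.0), init + Tp(2.0), init + Tp(3.0) };
    const Tp scale  = Tp(0.5);
    const Tp offset = Tp(1.0);

    for(std::size_t i = 0; i < iterations; ++i)
    {
        for(auto& v : acc)
            v = v * scale + offset;  // one multiply + one add = 2 FLOPs
    }

    // storing the result to a volatile before stopping the clock keeps the
    // loop from being discarded or moved out of the timed region
    volatile Tp sink = acc[0] + acc[1] + acc[2] + acc[3];
    (void) sink;

    const auto stop = clock_type::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double flops   = 2.0 * 4.0 * static_cast<double>(iterations);
    return (seconds > 0.0) ? flops / seconds : 0.0;
}
```

Calling `measure_flop_rate<double>(n)` and `measure_flop_rate<float>(n)` with the same `n` gives separate double- and single-precision estimates, which mirrors the way the roofline components are specialized per precision.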
Pre-defined Types
Namespace: `tim::component`
| Component Name | Underlying Template Specification | Description |
|---|---|---|
| `cpu_roofline_flops` | `cpu_roofline<float, double>` | Rate of single- and double-precision FLOP/s |
| `cpu_roofline_dp_flops` | `cpu_roofline<double>` | Rate of double-precision FLOP/s |
| `cpu_roofline_sp_flops` | `cpu_roofline<float>` | Rate of single-precision FLOP/s |
| `gpu_roofline_flops` | `gpu_roofline<fp16_t, float, double>` | Rate of half-, single- and double-precision FLOP/s |
| `gpu_roofline_dp_flops` | `gpu_roofline<double>` | Rate of double-precision FLOP/s |
| `gpu_roofline_sp_flops` | `gpu_roofline<float>` | Rate of single-precision FLOP/s |
| `gpu_roofline_hp_flops` | `gpu_roofline<fp16_t>` | Rate of half-precision FLOP/s |
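The relationship in the table above, where each pre-defined name stands for one specialization of a variadic template, can be sketched with stand-in types. This assumes the pre-defined names are plain type aliases. Everything in the `sketch` namespace is a mock written for this illustration, including the empty `fp16_t` placeholder and the `num_types` member, and none of it is timemory's actual definition.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch
{
// placeholder for a half-precision type (assumption: the real fp16_t
// comes from the library or the CUDA toolchain)
struct fp16_t
{};

// mock variadic templates: one template parameter per tracked precision
template <typename... Types>
struct cpu_roofline
{
    static constexpr std::size_t num_types = sizeof...(Types);
};

template <typename... Types>
struct gpu_roofline
{
    static constexpr std::size_t num_types = sizeof...(Types);
};

// aliases mirroring the rows of the table
using cpu_roofline_flops    = cpu_roofline<float, double>;
using cpu_roofline_dp_flops = cpu_roofline<double>;
using cpu_roofline_sp_flops = cpu_roofline<float>;
using gpu_roofline_flops    = gpu_roofline<fp16_t, float, double>;
using gpu_roofline_dp_flops = gpu_roofline<double>;
using gpu_roofline_sp_flops = gpu_roofline<float>;
using gpu_roofline_hp_flops = gpu_roofline<fp16_t>;
}  // namespace sketch

// an alias and its underlying specialization are the same type
static_assert(std::is_same<sketch::cpu_roofline_dp_flops,
                           sketch::cpu_roofline<double>>::value,
              "alias names the double-precision specialization");
static_assert(std::is_same<sketch::gpu_roofline_hp_flops,
                           sketch::gpu_roofline<sketch::fp16_t>>::value,
              "alias names the half-precision specialization");
```

Under that assumption, choosing `cpu_roofline_flops` over `cpu_roofline_dp_flops` only changes how many precisions are tracked (two versus one in this mock), not which template is used.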