Components¶
Overview¶
This is an overview of the components available in timemory. For detailed info on the member functions, etc. please refer to the Doxygen.
The component documentation below is categorized into some general subsections and then sorted alphabetically.
In general, which member functions are present is not that important as long as you use the variadic component bundlers. The bundlers ignore an attempt to call start() on a component that does not have a start() member function when it is bundled alongside other components that do (and for which the start() call was intended).
Component Basics¶
Timemory components are C++ structs (classes whose members default to public instead of private) which define a single collection instance, e.g. the wall_clock component is written as a simple class with two 64-bit integers and start() and stop() member functions.
// This "component" is for conceptual demonstration only
// It is not intended to be copy+pasted
struct wall_clock
{
    int64_t m_value = 0;
    int64_t m_accum = 0;

    void start();
    void stop();
};
The start() member function records a timestamp and assigns it to one of the integers temporarily. The stop() member function records another timestamp, computes the difference, assigns the difference to the first integer, and adds the difference to the second integer.
void wall_clock::start()
{
    m_value = get_timestamp();
}

void wall_clock::stop()
{
    // compute difference b/t when start and stop were called
    m_value = (get_timestamp() - m_value);
    // accumulate the difference
    m_accum += m_value;
}
Thus, after start() and stop() are invoked twice on the object:
wall_clock foo;
foo.start();
sleep(1); // sleep for 1 second
foo.stop();
foo.start();
sleep(1); // sleep for 1 second
foo.stop();
The first integer (m_value) represents the most recent timing interval of 1 second and the second integer (m_accum) represents the accumulated timing interval totaling 2 seconds.
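The conceptual wall_clock above can be turned into a small compilable sketch. This is not timemory's implementation: get_timestamp() is assumed here to return nanoseconds from std::chrono::steady_clock, and measure_twice() is a hypothetical helper that repeats the two start/stop cycles shown above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper (an assumption, not timemory's API):
// nanoseconds read from a monotonic clock.
inline std::int64_t get_timestamp()
{
    using clock_type = std::chrono::steady_clock;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               clock_type::now().time_since_epoch())
        .count();
}

// Conceptual component: for demonstration only.
struct wall_clock
{
    std::int64_t m_value = 0;  // start timestamp while running, then last interval
    std::int64_t m_accum = 0;  // sum of all completed intervals

    void start() { m_value = get_timestamp(); }

    void stop()
    {
        // difference between when start and stop were called
        m_value = get_timestamp() - m_value;
        // accumulate the difference
        m_accum += m_value;
    }
};

// Runs two start/stop cycles, sleeping `ms` milliseconds inside each one.
inline wall_clock measure_twice(int ms)
{
    wall_clock foo;
    for(int i = 0; i < 2; ++i)
    {
        foo.start();
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
        foo.stop();
    }
    return foo;
}
```

Calling measure_twice(1000) reproduces the scenario above: m_value holds roughly one second (in nanoseconds) and m_accum roughly two.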
This design not only encapsulates how to take the measurement, but also provides its own data storage model. With this design, timemory measurements naturally support asynchronous data collection. Additionally, as part of the design for generating the call-graph, call-graphs are accumulated locally on each thread and on each process and merged at the termination of the thread or process. This allows parallel data to be collected free from synchronization overheads. On the worker threads, there is a concept of being at “sea-level”: the call-graph’s relative position based on the baseline of the primary thread in the application. When a worker thread is at sea-level, it reads the position of the call-graph on the primary thread and creates a copy of that entry in its call-graph, ensuring that when merged into the primary thread at the end, the accumulated call-graph across all threads is inserted into the appropriate location. This approach has been found to produce the fewest number of artifacts.
In general, components do not need to conform to a specific interface. This is a relatively unique approach. Most performance analysis tools which allow user extensions use callbacks and dynamic polymorphism to integrate the user extensions into their workflow. It should be noted that there is nothing preventing a component from creating a similar system, but timemory is designed to query the presence of member function names for feature detection and adapts accordingly to the overloads of that function name and its return type. This is all possible due to the template-based design, which makes extensive use of variadic functions to accept any arguments at a high level and SFINAE to decide at compile-time which function to invoke (if a function is invoked at all). For example:
- component A can contain these member functions:
  - void start()
  - int get()
  - void set_prefix(const char*)
- component B can contain these member functions:
  - void start()
  - void start(cudaStream_t)
  - double get()
- component C can contain these member functions:
  - void start()
  - void set_prefix(const std::string&)
And for a given bundle component_tuple<A, B, C> obj:
- When obj is created, a string identifier, an instance of a source_location struct, or a hash is required
  - This is the label for the measurement
  - If a string is passed, obj generates the hash and adds the hash and the string to a hash-map if it didn’t previously exist
  - A::set_prefix(const char*) will be invoked with the underlying const char* from the string that the hash maps to in the hash-map
  - C::set_prefix(const std::string&) will be invoked with the string that the hash maps to in the hash-map
  - It will be detected that B does not have a member function named set_prefix and no member function will be invoked
- Invoking obj.start() calls the following member functions on instances of A, B, and C:
  - A::start()
  - B::start()
  - C::start()
- Invoking obj.start(cudaStream_t) calls the following member functions on instances of A, B, and C:
  - A::start()
  - B::start(cudaStream_t)
  - C::start()
- Invoking obj.get():
  - Returns std::tuple<int, double> because it detects the two return types from A and B and the lack of a get() member function in component C.
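The dispatch rules listed above can be approximated in a few dozen lines of C++14. The sketch below is not timemory's component_tuple: bundle, priority, try_start, and try_get are hypothetical names, stream_t is a stand-in for cudaStream_t so that no CUDA headers are needed, and set_prefix is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Overload-priority tag: priority<2> is tried first, then <1>, then <0>.
template <int N> struct priority : priority<N - 1> {};
template <> struct priority<0> {};

// 1st choice: start(args...) exists for this component.
template <typename T, typename... Args>
auto try_start(priority<2>, T& obj, Args&&... args)
    -> decltype(obj.start(std::forward<Args>(args)...), void())
{
    obj.start(std::forward<Args>(args)...);
}
// 2nd choice: drop the arguments and call start().
template <typename T, typename... Args>
auto try_start(priority<1>, T& obj, Args&&...) -> decltype(obj.start(), void())
{
    obj.start();
}
// Last resort: the component has no start() at all, so nothing is called.
template <typename T, typename... Args>
void try_start(priority<0>, T&, Args&&...) {}

// get(): contributes a one-element tuple if get() exists, else an empty tuple.
template <typename T>
auto try_get(priority<1>, const T& obj) -> decltype(std::make_tuple(obj.get()))
{
    return std::make_tuple(obj.get());
}
template <typename T>
std::tuple<> try_get(priority<0>, const T&) { return std::tuple<>{}; }

template <typename... Types>
struct bundle
{
    std::tuple<Types...> data;

    template <std::size_t... Idx, typename... Args>
    void start_impl(std::index_sequence<Idx...>, Args&... args)
    {
        // C++14 substitute for a fold expression: one call per component
        int unused[] = { 0, (try_start(priority<2>{}, std::get<Idx>(data), args...), 0)... };
        (void) unused;
    }
    template <std::size_t... Idx>
    auto get_impl(std::index_sequence<Idx...>) const
    {
        return std::tuple_cat(try_get(priority<1>{}, std::get<Idx>(data))...);
    }

    template <typename... Args>
    void start(Args&&... args) { start_impl(std::index_sequence_for<Types...>{}, args...); }
    auto get() const { return get_impl(std::index_sequence_for<Types...>{}); }
};

struct stream_t {};  // hypothetical stand-in for cudaStream_t

struct A { int calls = 0; void start() { ++calls; } int get() const { return calls; } };
struct B
{
    int plain = 0, streamed = 0;
    void   start() { ++plain; }
    void   start(stream_t) { ++streamed; }
    double get() const { return 0.5 * streamed; }
};
struct C { int calls = 0; void start() { ++calls; } };
```

With bundle<A, B, C> obj, calling obj.start(stream_t{}) reaches B::start(stream_t) but falls back to A::start() and C::start(), and obj.get() has type std::tuple<int, double>. Every decision is made during overload resolution, so no runtime feature check remains in the generated code.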
This design has several benefits and one downside in particular. The benefits are that timemory: (1) makes it extremely easy to create a unified interface between two or more components with different interfaces/capabilities, (2) invokes the different interfaces efficiently since no feature detection logic is required at runtime, and (3) lets components define their own interface.
With respect to #2, consider the two more traditional implementations. If callbacks are used, a function pointer exists and a component which does not implement a feature will either have a null function pointer (requiring a check at runtime) or the tool will implement an array of function pointers with an unknown size at compile-time. In the latter case, this will require heap allocations (which are expensive operations) and in both cases, the loop over the function pointers will likely be quite inefficient since function pointers have a very high probability of thrashing the instruction cache. If dynamic polymorphism is used, then virtual table look-ups are required during every iteration. In the timemory approach, none of these additional overheads are present and there isn’t even a loop: the bundle either expands into a direct call to the member function without any abstractions or into nothing.
With respect to #1 and #3, this has some interesting implications with regard to a universal instrumentation interface and is discussed in the following section and the CONTRIBUTING.md documentation.
The aforementioned downside is a byproduct of all this flexibility and adaptation to custom interfaces by each component: directly using the template interface can take quite a long time to compile.
Component Metadata¶
-
template<int Idx>
struct tim::component::enumerator : public tim::component::properties<placeholder<nothing>>¶ This is a critical specialization for mapping strings and integers to component types at runtime (should always be specialized alongside tim::component::properties) and it is also critical for performing template metaprogramming “loops” over all the components. E.g.:
template <size_t Idx>
using Enumerator_t = typename tim::component::enumerator<Idx>::type;

template <size_t... Idx>
auto init(std::index_sequence<Idx...>)
{
    // expand for [0, TIMEMORY_COMPONENTS_END)
    TIMEMORY_FOLD_EXPRESSION(tim::storage_initializer::get<Enumerator_t<Idx>>());
}

void init()
{
    init(std::make_index_sequence<TIMEMORY_COMPONENTS_END>{});
}
- tparam Idx
Enumeration value
Public Functions
-
inline bool operator==(int) const¶
-
inline bool operator==(const char*) const¶
-
inline bool operator==(const std::string&) const¶
-
inline void serialize(Archive&, const unsigned int)¶
-
inline TIMEMORY_COMPONENT operator()()¶
-
inline constexpr operator TIMEMORY_COMPONENT() const¶
Public Static Functions
-
static inline constexpr bool specialized()¶
-
static inline constexpr const char *enum_string()¶
-
static inline constexpr const char *id()¶
-
static inline idset_t ids()¶
Public Static Attributes
-
static constexpr bool value = false¶
-
template<typename Tp>
struct tim::component::metadata¶ Provides forward declaration support for assigning static metadata properties. This is most useful for specialization of template components. If this class is specialized for a component, then the component does not need to provide the static member functions label() and description().
Public Static Functions
-
static std::string name()¶
-
static std::string label()¶
-
static std::string description()¶
-
static inline std::string extra_description()¶
-
static inline constexpr bool specialized()¶
Public Static Attributes
-
static constexpr TIMEMORY_COMPONENT value = TIMEMORY_COMPONENTS_END¶
-
template<typename Tp>
struct tim::component::properties : public tim::component::static_properties<Tp>¶ This is a critical specialization for mapping strings and integers to component types at runtime. The enum_string() function is the enum id as a string. The id() function is (typically) the name of the C++ component as a string. The ids() function returns a set of strings which are alternative string identifiers to the enum string or the string ID. Additionally, it provides serialization of these values.
A macro is provided to simplify this specialization:
TIMEMORY_PROPERTY_SPECIALIZATION(wall_clock, TIMEMORY_WALL_CLOCK, "wall_clock", "real_clock", "virtual_clock")
In the above, the first parameter is the C++ type, the second is the enumeration id, the enum string is automatically generated via the preprocessor # operator on the second parameter, the third parameter is the string ID, and the remaining values are placed in ids(). Additionally, this macro specializes tim::component::enumerator.
- tparam Tp
Component type
Public Functions
-
inline TIMEMORY_COMPONENT operator()()¶
-
inline constexpr operator TIMEMORY_COMPONENT() const¶
Public Static Functions
-
static inline constexpr bool specialized()¶
-
static inline constexpr const char *enum_string()¶
-
static inline constexpr const char *id()¶
-
static inline idset_t ids()¶
Public Static Attributes
-
static constexpr TIMEMORY_COMPONENT value = TIMEMORY_COMPONENTS_END¶
-
template<typename Tp, bool PlaceHolder = concepts::is_placeholder<Tp>::value>
struct static_properties¶ Provides three variants of a matches function for determining if a component is identified by a given string or enumeration value.
- tparam Tp
Component type
- tparam Placeholder
Whether or not the component type is a placeholder type that should be ignored during runtime initialization.
Subclassed by tim::component::properties< placeholder< nothing > >, tim::component::properties< Tp >
Timing Components¶
-
struct cpu_clock : public tim::component::base<cpu_clock>¶
This component extracts only the CPU time spent in both user- and kernel-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct cpu_util : public tim::component::base<cpu_util, std::pair<int64_t, int64_t>>¶
This computes the CPU utilization percentage for the calling process and child processes. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct kernel_mode_time : public tim::component::base<kernel_mode_time, int64_t>¶
This is the total amount of time spent executing in kernel mode.
-
struct monotonic_clock : public tim::component::base<monotonic_clock>¶
clock that increments monotonically, tracking the time since an arbitrary point, and will continue to increment while the system is asleep.
-
struct monotonic_raw_clock : public tim::component::base<monotonic_raw_clock>¶
clock that increments monotonically, tracking the time since an arbitrary point like CLOCK_MONOTONIC. However, this clock is unaffected by frequency or time adjustments. It should not be compared to other system time sources.
-
struct process_cpu_clock : public tim::component::base<process_cpu_clock>¶
This clock measures the CPU time within the current process (excludes child processes). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct process_cpu_util : public tim::component::base<process_cpu_util, std::pair<int64_t, int64_t>>¶
This computes the CPU utilization percentage for ONLY the calling process (excludes child processes). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct system_clock : public tim::component::base<system_clock>¶
This component extracts only the CPU time spent in kernel-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct thread_cpu_clock : public tim::component::base<thread_cpu_clock>¶
This clock measures the CPU time within the current thread (excludes sibling/child threads). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct thread_cpu_util : public tim::component::base<thread_cpu_util, std::pair<int64_t, int64_t>>¶
This computes the CPU utilization percentage for ONLY the calling thread (excludes sibling and child threads). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct user_clock : public tim::component::base<user_clock>¶
This component extracts only the CPU time spent in user-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn’t work that way.
-
struct user_mode_time : public tim::component::base<user_mode_time, int64_t>¶
This is the total amount of time spent executing in user mode.
-
struct wall_clock : public tim::component::base<wall_clock, int64_t>¶
Resource Usage Components¶
-
struct current_peak_rss : public tim::component::base<current_peak_rss, std::pair<int64_t, int64_t>>¶
This struct extracts the absolute value of the high-water mark of the resident set size (RSS) at start and stop points. RSS is the current amount of memory in RAM.
-
struct num_io_in : public tim::component::base<num_io_in>¶
the number of times the file system had to perform input.
-
struct num_io_out : public tim::component::base<num_io_out>¶
the number of times the file system had to perform output.
-
struct num_major_page_faults : public tim::component::base<num_major_page_faults>¶
the number of page faults serviced that required I/O activity.
-
struct num_minor_page_faults : public tim::component::base<num_minor_page_faults>¶
the number of page faults serviced without any I/O activity; here I/O activity is avoided by reclaiming a page frame from the list of pages awaiting reallocation.
-
struct page_rss : public tim::component::base<page_rss, int64_t>¶
This struct measures the resident set size (RSS) currently allocated in pages of memory. Unlike peak_rss, this value will fluctuate as memory gets freed and allocated.
-
struct peak_rss : public tim::component::base<peak_rss>¶
This struct extracts the high-water mark (or a change in the high-water mark) of the resident set size (RSS), which is the current amount of memory in RAM. When used on a system with swap enabled, this value may fluctuate but should not on an HPC system.
-
struct priority_context_switch : public tim::component::base<priority_context_switch>¶
the number of times a context switch resulted due to a higher priority process becoming runnable or because the current process exceeded its time slice
-
struct virtual_memory : public tim::component::base<virtual_memory>¶
this struct extracts the virtual memory usage
-
struct voluntary_context_switch : public tim::component::base<voluntary_context_switch>¶
the number of times a context switch resulted due to a process voluntarily giving up the processor before its time slice was completed (usually to await availability of a resource).
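Most of the quantities in this section correspond to fields that POSIX systems report through getrusage(). The sketch below assumes a POSIX platform and is only an illustration of taking a start/stop difference of those counters; usage_snapshot, take_snapshot, and difference are hypothetical names, and timemory's own implementation may differ.

```cpp
#include <sys/resource.h>

#include <cassert>

// Subset of the rusage fields relevant to the components above.
struct usage_snapshot
{
    long peak_rss     = 0;  // ru_maxrss (kilobytes on Linux, bytes on macOS)
    long minor_faults = 0;  // ru_minflt: page faults serviced without I/O
    long major_faults = 0;  // ru_majflt: page faults that required I/O
    long io_in        = 0;  // ru_inblock: file system input operations
    long io_out       = 0;  // ru_oublock: file system output operations
    long voluntary_cs = 0;  // ru_nvcsw: voluntary context switches
    long priority_cs  = 0;  // ru_nivcsw: involuntary context switches
};

// Reads the counters for the calling process.
inline usage_snapshot take_snapshot()
{
    struct rusage ru {};
    getrusage(RUSAGE_SELF, &ru);

    usage_snapshot snap;
    snap.peak_rss     = ru.ru_maxrss;
    snap.minor_faults = ru.ru_minflt;
    snap.major_faults = ru.ru_majflt;
    snap.io_in        = ru.ru_inblock;
    snap.io_out       = ru.ru_oublock;
    snap.voluntary_cs = ru.ru_nvcsw;
    snap.priority_cs  = ru.ru_nivcsw;
    return snap;
}

// Change in each counter between two snapshots (the "stop minus start" value).
inline usage_snapshot difference(const usage_snapshot& begin, const usage_snapshot& end)
{
    usage_snapshot diff;
    diff.peak_rss     = end.peak_rss - begin.peak_rss;
    diff.minor_faults = end.minor_faults - begin.minor_faults;
    diff.major_faults = end.major_faults - begin.major_faults;
    diff.io_in        = end.io_in - begin.io_in;
    diff.io_out       = end.io_out - begin.io_out;
    diff.voluntary_cs = end.voluntary_cs - begin.voluntary_cs;
    diff.priority_cs  = end.priority_cs - begin.priority_cs;
    return diff;
}
```

Taking one snapshot before a region and one after it, then calling difference(), gives the per-region change in the same way a start()/stop() pair would.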
I/O Components¶
-
struct read_bytes : public tim::component::base<read_bytes, std::pair<int64_t, int64_t>>¶
I/O counter for bytes read. Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. Done at the submit_bio() level, so it is accurate for block-backed filesystems.
-
struct read_char : public tim::component::base<read_char, std::pair<int64_t, int64_t>>¶
I/O counter for chars read. The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read() and pread(). It includes things like tty IO and it is unaffected by whether or not actual physical disk IO was required (the read might have been satisfied from pagecache)
-
struct written_bytes : public tim::component::base<written_bytes, std::array<int64_t, 2>>¶
I/O counter for bytes written. Attempt to count the number of bytes which this process caused to be sent to the storage layer. This is done at page-dirtying time.
-
struct written_char : public tim::component::base<written_char, std::array<int64_t, 2>>¶
I/O counter for chars written. The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with tim::component::read_char (rchar).
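On Linux, counters with these meanings are exposed per process in /proc/self/io as "key: value" lines (rchar, wchar, read_bytes, write_bytes, among others). The sketch below shows one way to read them; it assumes that file format, the function names are hypothetical, and the mapping of read_char to rchar, written_char to wchar, and written_bytes to write_bytes is an assumption rather than something stated in this reference.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Parses "key: value" lines such as "rchar: 1024" into a map.
inline std::map<std::string, std::int64_t> parse_io_counters(const std::string& text)
{
    std::map<std::string, std::int64_t> counters;
    std::istringstream                  in(text);
    std::string                         key;
    std::int64_t                        value = 0;
    while(in >> key >> value)
    {
        // strip the trailing ':' from the key
        if(!key.empty() && key.back() == ':')
            key.pop_back();
        counters[key] = value;
    }
    return counters;
}

// Reads the counters of the calling process.
// Returns an empty map if /proc/self/io is unavailable (e.g. not on Linux).
inline std::map<std::string, std::int64_t> read_self_io()
{
    std::ifstream      file("/proc/self/io");
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return parse_io_counters(buffer.str());
}
```

As with the other counters, a single reading is a running total; the per-region value is the difference between a reading at stop and a reading at start.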
User Bundle Components¶
Timemory provides the user_bundle component as a generic component bundler that the user can use to insert components at runtime. This component is heavily used when mapping timemory to languages other than C++. Timemory implements many specializations of this template class for various tools. For example, user_mpip_bundle is the bundle used by the MPI wrappers, user_profiler_bundle is used by the Python function profiler, user_trace_bundle is used by the dynamic instrumentation tool timemory-run and the Python line tracing profiler, etc. These specializations are all individually configurable and it is recommended that applications create their own specialization specific to their project. This ensures that the set of components configured by your application will not be affected by a third-party library configuring its own set of components.
The general design is that each user-bundle:
- Has its own unique environment variable for exclusive configuration, usually "TIMEMORY_<LABEL>_COMPONENTS", e.g.:
  - "TIMEMORY_TRACE_COMPONENTS" for user_trace_bundle
  - "TIMEMORY_MPIP_COMPONENTS" for user_mpip_bundle
- If the unique environment variable is set, only the components in the variable are used
  - Thus making the bundle uniquely configurable
- If the unique environment variable is not set, it searches one or more backup environment variables, the last of which being "TIMEMORY_GLOBAL_COMPONENTS"
  - Thus, if no specific environment variables are set, all user bundles collect the components specified in "TIMEMORY_GLOBAL_COMPONENTS"
- If the unique environment variable is set to "none", it terminates searching the backup environment variables
  - Thus, "TIMEMORY_GLOBAL_COMPONENTS" can be set but the user can suppress a specific bundle from being affected by this configuration
- If the unique environment variable contains "fallthrough", it will continue adding the components specified by the backup environment variables
  - Thus, the components specified in "TIMEMORY_GLOBAL_COMPONENTS" and "TIMEMORY_<LABEL>_COMPONENTS" will be added
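The lookup rules above can be sketched as a small function. This is not timemory's code: resolve_components, split_list, and env_lookup_t are hypothetical names, values are assumed to be comma-separated, an empty value is treated like an unset variable, and the same "none"/"fallthrough" handling is assumed to apply to backup variables as well.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Returns nullptr when the variable is not set (same contract as std::getenv).
using env_lookup_t = std::function<const char*(const std::string&)>;

// Default lookup backed by the process environment.
inline const char* lookup_environment(const std::string& name)
{
    return std::getenv(name.c_str());
}

// Splits "a, b,c" into {"a", "b", "c"}; the comma delimiter is an assumption.
inline std::vector<std::string> split_list(const std::string& text)
{
    std::vector<std::string> tokens;
    std::istringstream       stream(text);
    std::string              token;
    while(std::getline(stream, token, ','))
    {
        auto first = token.find_first_not_of(" \t");
        if(first == std::string::npos)
            continue;  // skip empty entries
        auto last = token.find_last_not_of(" \t");
        tokens.push_back(token.substr(first, last - first + 1));
    }
    return tokens;
}

// `variables` is ordered: the unique variable first, then the backups,
// ending with "TIMEMORY_GLOBAL_COMPONENTS".
inline std::vector<std::string>
resolve_components(const std::vector<std::string>& variables, const env_lookup_t& lookup)
{
    std::vector<std::string> components;
    for(const auto& name : variables)
    {
        const char* raw = lookup(name);
        if(raw == nullptr || *raw == '\0')
            continue;  // not set: try the next (backup) variable
        if(std::string(raw) == "none")
            break;  // explicit opt-out: stop searching
        bool fallthrough = false;
        for(const auto& token : split_list(raw))
        {
            if(token == "fallthrough")
                fallthrough = true;
            else
                components.push_back(token);
        }
        if(!fallthrough)
            break;  // this variable configures the bundle exclusively
    }
    return components;
}
```

For example, with TIMEMORY_TRACE_COMPONENTS="cpu_clock,fallthrough" and TIMEMORY_GLOBAL_COMPONENTS="wall_clock,peak_rss", resolving the list {"TIMEMORY_TRACE_COMPONENTS", "TIMEMORY_GLOBAL_COMPONENTS"} with lookup_environment yields cpu_clock, wall_clock, and peak_rss. Passing the lookup as a parameter keeps the logic testable without modifying the real environment.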
-
template<size_t Idx, typename Tag>
struct user_bundle : public tim::component::base<user_bundle<Idx, Tag>, void>, public tim::concepts::runtime_configurable¶
Third-Party Interface Components¶
-
struct allinea_map : public tim::component::base<allinea_map, void>, private tim::policy::instance_tracker<allinea_map, false>¶
Controls the AllineaMap sampling profiler.
-
struct caliper_marker : public tim::component::base<caliper_marker, void>, public tim::component::caliper_common¶
Standard marker for the Caliper Performance Analysis Toolbox.
-
struct caliper_config : public tim::component::base<caliper_config, void>, private tim::policy::instance_tracker<caliper_config, false>¶
Component which provides Caliper cali::ConfigManager.
-
struct caliper_loop_marker : public tim::component::base<caliper_loop_marker, void>, public tim::component::caliper_common¶
Loop marker for the Caliper Performance Analysis Toolbox.
-
struct craypat_counters : public tim::component::base<craypat_counters, std::vector<unsigned long>>¶
Retrieves the names and value of any counter events that have been set to count on the hardware category.
-
struct craypat_flush_buffer : public tim::component::base<craypat_flush_buffer, unsigned long>¶
Writes all the recorded contents in the data buffer. Returns the number of bytes flushed.
-
struct craypat_heap_stats : public tim::component::base<craypat_heap_stats, void>¶
Dumps the craypat heap statistics.
-
struct craypat_record : public tim::component::base<craypat_record, void>, private tim::policy::instance_tracker<craypat_record>¶
Provides scoping for the CrayPAT profiler. Global initialization stops the profiler; the first call to start() starts the profiler again on the calling thread. Instance counting is enabled per-thread and each call to start() increments the counter. All calls to stop() have no effect until the counter reaches zero, at which point the profiler is turned off again.
-
struct craypat_region : public tim::component::base<craypat_region, void>, private tim::policy::instance_tracker<craypat_region, false>¶
Adds a region label to the CrayPAT profiling output.
-
struct gperftools_cpu_profiler : public tim::component::base<gperftools_cpu_profiler, void>¶
-
struct gperftools_heap_profiler : public tim::component::base<gperftools_heap_profiler, void>¶
-
struct likwid_marker : public tim::component::base<likwid_marker, void>¶
Provides likwid perfmon marker forwarding. Requires.
-
struct likwid_nvmarker : public tim::component::base<likwid_nvmarker, void>¶
Provides likwid nvmon marker forwarding. Requires.
-
template<typename Api>
struct ompt_handle : public tim::component::base<ompt_handle<Api>, void>, private tim::policy::instance_tracker<ompt_handle<Api>>¶
-
struct tau_marker : public tim::component::base<tau_marker, void>¶
Forwards timemory labels to the TAU (Tuning and Analysis Utilities)
-
struct vtune_event : public tim::component::base<vtune_event, void>¶
Implements
__itt_event
-
struct vtune_frame : public tim::component::base<vtune_frame, void>¶
Implements
__itt_domain
-
struct vtune_profiler : public tim::component::base<vtune_profiler, void>, private tim::policy::instance_tracker<vtune_profiler, false>¶
Implements __itt_pause() and __itt_resume() to control where the vtune profiler is active.
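The per-thread instance counting described for craypat_record above can be sketched without any third-party profiler. In the sketch below, fake_profiler is a hypothetical stand-in for an external profiler with on/off control and scoped_record is a hypothetical component; neither is part of timemory, whose instance_tracker policy may work differently.

```cpp
#include <cassert>

// Hypothetical stand-in for a third-party profiler that can be switched
// on and off for the calling thread.
struct fake_profiler
{
    static bool& active()
    {
        thread_local bool value = false;  // off after "global initialization"
        return value;
    }
};

// Hypothetical component: the first start() on a thread turns the profiler on
// and only the stop() that brings the counter back to zero turns it off.
struct scoped_record
{
    static int& counter()
    {
        thread_local int value = 0;  // number of active instances on this thread
        return value;
    }

    void start()
    {
        if(counter()++ == 0)
            fake_profiler::active() = true;
    }

    void stop()
    {
        if(counter() > 0 && --counter() == 0)
            fake_profiler::active() = false;
    }
};
```

With two nested instances, the inner stop() leaves the profiler running and the outer stop() switches it off, which is what allows such components to be nested safely in a call-graph.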
Hardware Counter Components¶
-
template<int... EventTypes>
struct papi_tuple : public tim::component::base<papi_tuple<EventTypes...>, std::array<long long, sizeof...(EventTypes)>>, private tim::policy::instance_tracker<papi_tuple<EventTypes...>>, private tim::component::papi_common¶ This component is useful for bundling together a fixed set of hardware counter identifiers which require no runtime configuration.
// the "Instructions" alias below explicitly collects the total instructions,
// the number of load instructions, the number of store instructions
using Instructions = papi_tuple<PAPI_TOT_INS, PAPI_LD_INS, PAPI_SR_INS>;

Instructions inst{};
inst.start();
...
inst.stop();
std::vector<double> data = inst.get();
- tparam EventTypes
Compile-time constant list of PAPI event identifiers
-
template<typename RateT, int... EventTypes>
struct papi_rate_tuple : public tim::component::base<papi_rate_tuple<RateT, EventTypes...>, std::pair<papi_tuple<EventTypes...>, RateT>>, private tim::component::papi_common¶ This component pairs a tim::component::papi_tuple with a component which will provide an interval over which the hardware counters will be reported, e.g. if RateT is tim::component::wall_clock, the reported values will be the hardware counters w.r.t. the wall-clock time. If RateT is tim::component::cpu_clock, the reported values will be the hardware counters w.r.t. the cpu time.
// the "Instructions" alias below explicitly collects the total instructions per second,
// the number of load instructions per second, the number of store instructions per second
using Instructions = papi_rate_tuple<wall_clock, PAPI_TOT_INS, PAPI_LD_INS, PAPI_SR_INS>;

Instructions inst{};
inst.start();
...
inst.stop();
std::vector<double> data = inst.get();
- tparam RateT
Component whose value will be the divisor for all the hardware counters
- tparam EventTypes
Compile-time constant list of PAPI event identifiers
-
template<size_t MaxNumEvents>
struct papi_array : public tim::component::base<papi_array<MaxNumEvents>, std::array<long long, MaxNumEvents>>, private tim::policy::instance_tracker<papi_array<MaxNumEvents>>, private tim::component::papi_common¶
-
struct papi_vector : public tim::component::base<papi_vector, std::vector<long long>>, private tim::policy::instance_tracker<papi_vector>, private tim::component::papi_common¶
Miscellaneous Components¶
-
template<typename ...Types>
struct cpu_roofline : public tim::component::base<cpu_roofline<Types...>, std::pair<std::vector<long long>, double>>¶ Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
- tparam Types
Variadic list of data types for roofline analysis
-
typedef cpu_roofline<double> tim::component::cpu_roofline_dp_flops¶
A specialization of tim::component::cpu_roofline for 64-bit floating point operations.
-
using tim::component::cpu_roofline_flops = cpu_roofline<float, double>¶
-
typedef cpu_roofline<float> tim::component::cpu_roofline_sp_flops¶
A specialization of tim::component::cpu_roofline for 32-bit floating point operations.
GPU Components¶
-
struct cuda_event : public tim::component::base<cuda_event, float>¶
Records the time interval between two points in a CUDA stream. Less accurate than ‘cupti_activity’ for kernel timing but does not require linking to the CUDA driver.
-
struct cupti_activity : public tim::component::base<cupti_activity, intmax_t>¶
CUPTI activity tracing component for high-precision kernel timing. For low-precision kernel timing, use tim::component::cuda_event component.
-
struct cupti_counters : public tim::component::base<cupti_counters, cupti::profiler::results_t>¶
NVprof-style hardware counters via the CUpti callback API. Collecting these hardware counters has a higher overhead than the new CUpti Profiling API (tim::component::cupti_profiler). However, there are currently some issues with nesting the Profiling API and it is currently recommended to use this component for NVIDIA hardware counters in timemory. The callback API / NVprof is quite specific about the distinction between an “event” and a “metric”. For your convenience, timemory removes this distinction and events can be specified arbitrarily as metrics and vice-versa and this component will sort them into their appropriate category. For the full list of the available events/metrics, use
timemory-avail -H
from the command-line.
-
template<typename ...Types>
struct gpu_roofline : public tim::component::base<gpu_roofline<Types...>, std::tuple<cupti_activity::value_type, cupti_counters::value_type>>¶ Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
- tparam Types
Variadic list of data types for roofline analysis
-
typedef gpu_roofline<double> tim::component::gpu_roofline_dp_flops¶
A specialization of tim::component::gpu_roofline for 64-bit floating point operations.
-
using tim::component::gpu_roofline_flops = gpu_roofline<float, double>¶
-
typedef gpu_roofline<cuda::fp16_t> tim::component::gpu_roofline_hp_flops¶
A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability).
-
typedef gpu_roofline<float> tim::component::gpu_roofline_sp_flops¶
A specialization of tim::component::gpu_roofline for 32-bit floating point operations.
-
struct tim::component::nvtx_marker : public tim::component::base<nvtx_marker, void>¶
Inserts NVTX markers with the current timemory prefix. The default color scheme is a round-robin of red, blue, green, yellow, purple, cyan, pink, and light_green. These colors.
Public Functions
-
inline explicit nvtx_marker(const nvtx::color::color_t &_color)¶
construct with a specific color
-
inline explicit nvtx_marker(cuda::stream_t _stream)¶
construct with a specific CUDA stream
-
inline nvtx_marker(const nvtx::color::color_t &_color, cuda::stream_t _stream)¶
construct with a specific color and CUDA stream
-
inline void start()¶
start an nvtx range. Equivalent to
nvtxRangeStartEx
-
inline void stop()¶
stop the nvtx range. Equivalent to nvtxRangeEnd. Depending on settings::nvtx_marker_device_sync() this will either call cudaDeviceSynchronize() or cudaStreamSynchronize(m_stream) before stopping the range.
-
inline void mark_begin()¶
asynchronously add a marker. Equivalent to
nvtxMarkA
-
inline void mark_end()¶
asynchronously add a marker. Equivalent to
nvtxMarkA
-
inline void mark_begin(cuda::stream_t _stream)¶
asynchronously add a marker for a specific stream. Equivalent to
nvtxMarkA
-
inline void mark_end(cuda::stream_t _stream)¶
asynchronously add a marker for a specific stream. Equivalent to
nvtxMarkA
-
inline void set_stream(cuda::stream_t _stream)¶
set the current CUDA stream
-
inline void set_color(nvtx::color::color_t _color)¶
set the current color
Data Tracking Components¶
-
template<typename InpT, typename Tag>
struct tim::component::data_tracker : public tim::component::base<data_tracker<InpT, Tag>, InpT>¶ This component is provided to facilitate data tracking. The first template parameter is the type of data to be tracked, the second is a custom tag for differentiating trackers which handle the same data types but record different high-level data.
Usage:
// declarations
struct myproject {};
using itr_tracker_type = data_tracker<uint64_t, myproject>;
using err_tracker_type = data_tracker<double, myproject>;

// add statistics capabilities
TIMEMORY_STATISTICS_TYPE(itr_tracker_type, int64_t)
TIMEMORY_STATISTICS_TYPE(err_tracker_type, double)

// set the label and descriptions
TIMEMORY_METADATA_SPECIALIZATION(
    itr_tracker_type, "myproject_iterations", "short desc", "long description")
TIMEMORY_METADATA_SPECIALIZATION(
    err_tracker_type, "myproject_convergence", "short desc", "long description")

// this is the generic bundle pairing a timer with an iteration tracker
// using this and not updating the iteration tracker will create entries
// in the call-graph with zero iterations.
using bundle_t = tim::auto_tuple<wall_clock, itr_tracker_type>;

// this is a dedicated bundle for adding data-tracker entries. This style
// can also be used with the iteration tracker or you can bundle
// both trackers together. The auto_tuple will call start on construction
// and stop on destruction so one can construct a nameless temporary of
// this bundle type and call store(...) on the nameless tmp. This will
// ensure that the statistics are updated for each entry
//
using err_bundle_t = tim::auto_tuple<err_tracker_type>;

// usage in a function is implied below
double err = std::numeric_limits<double>::max();
const double tolerance = 1.0e-6;

bundle_t t("iteration_time");

while(err > tolerance)
{
    // store the starting error
    double initial_err = err;

    // add 1 for each iteration. Stats only updated when t is destroyed or t.stop() is
    // called
    t.store(std::plus<uint64_t>{}, 1);

    // ... do something ...

    // construct a nameless temporary which records the change in the error and
    // update the statistics <-- "foo" will have mean/min/max/stddev of the
    // error
    err_bundle_t{ "foo" }.store(err - initial_err);

    // NOTE: std::plus is used with t above bc it has multiple components so std::plus
    // helps ensure 1 doesn't get added to some other component with `store(int)`
    // In above err_bundle_t, there is only one component so there is no concern.
}
When creating new data trackers, it is recommended to have this in a header:
TIMEMORY_DECLARE_EXTERN_COMPONENT(custom_data_tracker_t, true, data_type)
And this in one source file (preferably one that is not re-compiled often):
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(custom_data_tracker_t, true, data_type)
TIMEMORY_INITIALIZE_STORAGE(custom_data_tracker_t)
where custom_data_tracker_t is the custom data tracker type (or an alias to the type) and data_type is the data type being tracked.
Public Functions
-
inline auto get() const¶
get the data in the final form after unit conversion
-
inline auto get_display() const¶
get the data in a form suitable for display
-
inline auto get_secondary() const¶
map of the secondary entries. When TIMEMORY_ADD_SECONDARY is enabled contents of this map will be added as direct children of the current node in the call-graph.
-
template<typename T>
void store(T &&val, enable_if_acceptable_t<T, int> = 0)¶ store some data. Uses tim::data::handler for the type.
-
template<typename T>
void store(handler_type&&, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which takes a handler to ensure proper overload resolution
-
template<typename FuncT, typename T>
auto store(FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0) -> decltype(std::declval<handler_type>().store(*this, std::forward<FuncT>(f), std::forward<T>(val)), void())¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values
-
template<typename FuncT, typename T>
auto store(handler_type&&, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0) -> decltype(std::declval<handler_type>().store(*this, std::forward<FuncT>(f), std::forward<T>(val)), void())¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
-
template<typename T>
void mark_begin(T &&val, enable_if_acceptable_t<T, int> = 0)¶
The combination of mark_begin(...) and mark_end(...) can be used to store some initial data which may be needed later. When mark_end(...) is called, the value is updated with the difference of the value provided to mark_end and the temporary stored during mark_begin.
-
template<typename T>
void mark_begin(handler_type&&, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which takes a handler to ensure proper overload resolution
-
template<typename FuncT, typename T>
void mark_begin(FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values
-
template<typename FuncT, typename T>
void mark_begin(handler_type&&, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
-
template<typename T>
void mark_end(T &&val, enable_if_acceptable_t<T, int> = 0)¶
The combination of mark_begin(...) and mark_end(...) can be used to store some initial data which may be needed later. When mark_end(...) is called, the value is updated with the difference of the value provided to mark_end and the temporary stored during mark_begin. It may be valid to call mark_end without calling mark_begin but the result will effectively be a more expensive version of calling store.
-
template<typename T>
void mark_end(handler_type&&, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which takes a handler to ensure proper overload resolution
-
template<typename FuncT, typename T>
void mark_end(FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values
-
template<typename FuncT, typename T>
void mark_end(handler_type&&, FuncT &&f, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
-
template<typename T>
this_type *add_secondary(const std::string &_key, T &&val, enable_if_acceptable_t<T, int> = 0)¶ add a secondary value to the current node in the call-graph. When TIMEMORY_ADD_SECONDARY is enabled contents of this map will be added as direct children of the current node in the call-graph. This is useful for finer-grained details that might not always be desirable to display
-
template<typename T>
this_type *add_secondary(const std::string &_key, handler_type &&h, T &&val, enable_if_acceptable_t<T, int> = 0)¶ overload which takes a handler to ensure proper overload resolution
-
template<typename FuncT, typename T>
this_type *add_secondary(const std::string &_key, FuncT &&f, T &&val, enable_if_acceptable_and_func_t<FuncT, T, int> = 0)¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values
-
template<typename FuncT, typename T>
this_type *add_secondary(const std::string &_key, handler_type &&h, FuncT &&f, T &&val, enable_if_acceptable_and_func_t<FuncT, T, int> = 0)¶ overload which uses a lambda to bypass the default behavior of how the handler updates the values and takes a handler to ensure proper overload resolution
-
inline void set_value(const value_type &v)¶
set the current value
-
inline void set_value(value_type &&v)¶
set the current value via move
Public Static Functions
-
static inline std::string &label()¶
a reference is returned here so that it can be easily updated
-
static std::string &description()¶
a reference is returned here so that it can be easily updated
-
static inline auto &get_unit()¶
this returns a reference so that it can be easily modified
-
typedef data_tracker<intmax_t, TIMEMORY_API> tim::component::data_tracker_integer¶
-
typedef data_tracker<size_t, TIMEMORY_API> tim::component::data_tracker_unsigned¶
-
using tim::component::data_tracker_floating = data_tracker<double, TIMEMORY_API>¶
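The store and mark_begin/mark_end semantics described for data_tracker can be illustrated with a small stand-alone sketch. This is a conceptual demonstration only and is not timemory's implementation: the toy_tracker type, its members, and toy_tracker_demo are hypothetical names, and the choice that the default store simply overwrites the value is an assumption (the real behavior is decided by tim::data::handler).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for a data tracker; NOT timemory's implementation.
template <typename T>
struct toy_tracker
{
    T value{};  // current value
    T cache{};  // temporary stored by mark_begin

    // default store: overwrite the current value (assumption)
    void store(const T& v) { value = v; }

    // store with a binary function: value = f(value, v)
    template <typename FuncT>
    void store(FuncT&& f, const T& v)
    {
        value = std::forward<FuncT>(f)(value, v);
    }

    // remember an initial value for a later mark_end
    void mark_begin(const T& v) { cache = v; }

    // record the difference between v and the value given to mark_begin
    void mark_end(const T& v) { value = v - cache; }
};

// Hypothetical usage: count iterations, then record a change in a quantity.
inline std::pair<std::uint64_t, double> toy_tracker_demo()
{
    toy_tracker<std::uint64_t> iterations;
    for(int i = 0; i < 3; ++i)
        iterations.store(std::plus<std::uint64_t>{}, 1);  // 0 -> 1 -> 2 -> 3

    toy_tracker<double> delta;
    delta.mark_begin(10.0);
    delta.mark_end(12.5);  // value becomes 12.5 - 10.0

    assert(iterations.value == 3);
    return { iterations.value, delta.value };
}
```

In this sketch, calling mark_end without a prior mark_begin subtracts a zero-initialized cache, which matches the note above that doing so behaves like a more expensive store.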
Function Wrapping Components¶
-
template<size_t Nt, typename BundleT, typename DiffT>
struct tim::component::gotcha : public tim::component::base<gotcha<Nt, BundleT, DiffT>, void>, public tim::concepts::external_function_wrapper¶ The gotcha component rewrites the global offset table such that calling the wrapped function actually invokes either a function which is wrapped by timemory instrumentation or is replaced by a timemory component with a function call operator (operator()) whose return value and arguments exactly match the original function. This component is only available on Linux and can only be applied to external, dynamically-linked functions (i.e. functions defined in a shared library). If the BundleT template parameter is a non-empty component bundle, this component will surround the original function call with:
bundle_type _obj{ "<NAME-OF-ORIGINAL-FUNCTION>" };
_obj.construct(_args...);
_obj.start();
_obj.audit("<NAME-OF-ORIGINAL-FUNCTION>", _args...);
Ret _ret = <CALL-ORIGINAL-FUNCTION>
_obj.audit("<NAME-OF-ORIGINAL-FUNCTION>", _ret);
_obj.stop();
- tparam Nt
Maximum number of functions which will be wrapped by this component
- tparam BundleT
Component bundle to wrap around the function(s)
- tparam DiffT
Differentiator type to distinguish different sets of wrappers with identical values of Nt and BundleT (or to provide the function call operator if replacing functions instead of wrapping functions)
If the BundleT template parameter is an empty variadic class, e.g. std::tuple<>, tim::component_tuple<>, etc., and the DiffT template parameter is a timemory component, the assumption is that the DiffT component has a function call operator which should replace the original function call, e.g. void* malloc(size_t) can be replaced with a component with void* operator()(size_t), e.g.:
// replace 'double exp(double)'
struct exp_replace : base<exp_replace, void>
{
    double operator()(double value)
    {
        float result = expf(static_cast<float>(value));
        return static_cast<double>(result);
    }
};
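A replacement like exp_replace changes the numerical result as well as the call path, because the computation is done in single precision. The stand-alone sketch below reproduces the body of that call operator as a free function so the size of the difference can be inspected; exp_single_precision and exp_replacement_error are hypothetical helper names and do not come from timemory.

```cpp
#include <cassert>
#include <cmath>

// Same arithmetic as the exp_replace call operator, as a free function.
inline double exp_single_precision(double value)
{
    float result = std::exp(static_cast<float>(value));  // float overload
    return static_cast<double>(result);
}

// Absolute difference between the double and single precision results.
inline double exp_replacement_error(double value)
{
    return std::fabs(std::exp(value) - exp_single_precision(value));
}

// Hypothetical check: the replacement stays close to the original for a
// moderate argument, but is not guaranteed to be bit-identical.
inline bool exp_replacement_is_close(double value, double tolerance)
{
    return exp_replacement_error(value) < tolerance;
}
```

For moderate arguments the difference is on the order of single-precision rounding, which is worth keeping in mind when replacing a function rather than wrapping it.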
Example usage:
#include <timemory/timemory.hpp>

#include <cassert>
#include <cmath>
#include <cstdio>
#include <tuple>

using empty_tuple_t = std::tuple<>;
using base_bundle_t = tim::component_tuple<wall_clock, cpu_clock>;
using gotcha_wrap_t = tim::component::gotcha<2, base_bundle_t, void>;
using gotcha_repl_t = tim::component::gotcha<2, empty_tuple_t, exp_replace>;
using impl_bundle_t = tim::mpl::append_type_t<base_bundle_t,
                                              tim::type_list<gotcha_wrap_t, gotcha_repl_t>>;

void init_wrappers()
{
    // wraps the sin and cos math functions
    gotcha_wrap_t::get_initializer() = []() {
        TIMEMORY_C_GOTCHA(gotcha_wrap_t, 0, sin);  // index 0 wraps sin
        TIMEMORY_C_GOTCHA(gotcha_wrap_t, 1, cos);  // index 1 wraps cos
    };

    // replaces the 'exp' function which may be 'exp' in the symbol table
    // or '__exp_finite' in the symbol table (use `nm <binary>` to determine)
    gotcha_repl_t::get_initializer() = []() {
        TIMEMORY_C_GOTCHA(gotcha_repl_t, 0, exp);
        TIMEMORY_DERIVED_GOTCHA(gotcha_repl_t, 1, exp, "__exp_finite");
    };
}

// the following is useful to avoid having to call 'init_wrappers()' explicitly:
// use comma operator to call 'init_wrappers' and return true
static auto called_init_at_load = (init_wrappers(), true);

int main()
{
    assert(called_init_at_load == true);

    double angle = 45.0 * (M_PI / 180.0);

    impl_bundle_t _obj{ "main" };

    // gotcha wrappers not activated yet
    printf("cos(%f) = %f\n", angle, cos(angle));
    printf("sin(%f) = %f\n", angle, sin(angle));
    printf("exp(%f) = %f\n", angle, exp(angle));

    // gotcha wrappers are reference counted according to start/stop
    _obj.start();

    printf("cos(%f) = %f\n", angle, cos(angle));
    printf("sin(%f) = %f\n", angle, sin(angle));
    printf("exp(%f) = %f\n", angle, exp(angle));

    _obj.stop();

    // gotcha wrappers will be deactivated
    printf("cos(%f) = %f\n", angle, cos(angle));
    printf("sin(%f) = %f\n", angle, sin(angle));
    printf("exp(%f) = %f\n", angle, exp(angle));

    return 0;
}
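The idea of redirecting a call through a table entry and reference counting the activation can be sketched without GOTCHA or timemory. The following is a conceptual demonstration only: a plain function pointer stands in for a global offset table entry, a log of strings stands in for a component bundle, and every name in it (cos_slot, wrapped_cos, toy_wrapper, and so on) is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one table entry: callers go through this
// pointer, so re-pointing it redirects every call.
using unary_fn = double (*)(double);

inline double call_original_cos(double x) { return std::cos(x); }

inline unary_fn& cos_slot()
{
    static unary_fn slot = &call_original_cos;
    return slot;
}

// Records what the wrapper did, in place of a real component bundle.
inline std::vector<std::string>& event_log()
{
    static std::vector<std::string> log;
    return log;
}

// Wrapper with the same signature: start, call the original, stop.
inline double wrapped_cos(double x)
{
    event_log().push_back("start cos");
    double ret = call_original_cos(x);
    event_log().push_back("stop cos");
    return ret;
}

// Reference-counted activation: the first start installs the wrapper and
// the last stop restores the original.
struct toy_wrapper
{
    static int& count()
    {
        static int n = 0;
        return n;
    }
    void start() { if(count()++ == 0) cos_slot() = &wrapped_cos; }
    void stop() { if(--count() == 0) cos_slot() = &call_original_cos; }
};

// Returns the number of log entries produced by one wrapped call.
inline std::size_t toy_wrapper_demo()
{
    event_log().clear();
    toy_wrapper w;
    cos_slot()(0.0);  // not wrapped: nothing logged
    w.start();
    cos_slot()(0.0);  // wrapped: "start cos" and "stop cos" are logged
    w.stop();
    cos_slot()(0.0);  // unwrapped again: nothing logged
    assert(cos_slot() == &call_original_cos);
    return event_log().size();
}
```

Only the middle call is instrumented, which mirrors the three groups of printf calls in the example above.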
Public Static Functions
-
static inline get_select_list_t &get_permit_list()¶
when a permit list is provided, only these functions are wrapped by GOTCHA
-
static inline get_select_list_t &get_reject_list()¶
reject listed functions are never wrapped by GOTCHA
-
static inline void add_global_suppression(const std::string &func)¶
add function names at runtime to suppress wrappers
-
static inline auto get_ready()¶
get an array of whether the wrappers are filled and ready
-
static inline auto set_ready(bool val)¶
set filled wrappers to array of ready values
-
template<size_t N, typename Ret, typename ...Args>
struct instrument¶
-
struct tim::component::malloc_gotcha : public tim::component::base<malloc_gotcha, double>, public tim::concepts::external_function_wrapper¶
Public Functions
-
struct memory_allocations : public tim::component::base<memory_allocations, void>, public tim::concepts::external_function_wrapper, private tim::policy::instance_tracker<memory_allocations, true>¶
This component wraps malloc, calloc, free, cudaMalloc, cudaFree via GOTCHA and tracks the number of bytes requested/freed in each call. This component is useful for detecting the locations where memory re-use would provide a performance benefit.
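Tracking the bytes requested and freed per call can be sketched with a small wrapper around std::malloc and std::free. This is a conceptual demonstration only, not how memory_allocations is implemented: toy_allocation_tracker and its members are hypothetical, and the calls are made explicitly here rather than intercepted via GOTCHA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical byte counter; NOT timemory's implementation.
struct toy_allocation_tracker
{
    std::size_t bytes_requested = 0;
    std::size_t bytes_freed     = 0;
    std::unordered_map<void*, std::size_t> live;  // size of each live block

    void* allocate(std::size_t n)
    {
        void* ptr = std::malloc(n);
        if(ptr != nullptr)
        {
            bytes_requested += n;
            live[ptr] = n;
        }
        return ptr;
    }

    void release(void* ptr)
    {
        auto itr = live.find(ptr);
        if(itr != live.end())
        {
            bytes_freed += itr->second;
            live.erase(itr);
        }
        std::free(ptr);
    }
};

// Hypothetical usage: a loop that repeatedly allocates and frees a buffer of
// the same size, the kind of pattern where re-use could help.
inline toy_allocation_tracker toy_allocation_demo()
{
    toy_allocation_tracker tracker;
    for(int i = 0; i < 4; ++i)
    {
        void* buffer = tracker.allocate(256);
        tracker.release(buffer);
    }
    assert(tracker.live.empty());
    return tracker;
}
```

Equal requested and freed totals inside a loop, as in this demo, are the signature of a buffer that could be allocated once and re-used.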
Base Components¶
-
template<typename Tp, typename Value>
struct tim::component::base : public tim::component::empty_base, private tim::component::base_state, private base_data_t<Tp, Value>, public tim::concepts::component¶
Public Types
-
using EmptyT = std::tuple<>¶
-
using dynamic_type = typename trait::dynamic_base<Tp>::type¶
-
using statistics_policy = policy::record_statistics<Tp, Value>¶
-
using fmtflags = std::ios_base::fmtflags¶
Public Functions
-
void set_started()¶
store that start has been called
-
void set_stopped()¶
store that stop has been called
-
void reset()¶
reset the values
-
void get(void *&ptr, size_t _typeid_hash) const¶
assign type to a pointer
-
inline auto get() const¶
retrieve the current measurement value in the units for the type
-
inline auto get_display() const¶
retrieve the current measurement value in the units for the type in a format that can be piped to the output stream operator (‘<<’)
-
template<typename Up = Tp>
void print(std::ostream&, enable_if_t<trait::uses_value_storage<Up, Value>::value, int> = 0) const¶
-
template<typename Up = Tp>
void print(std::ostream&, enable_if_t<!trait::uses_value_storage<Up, Value>::value, long> = 0) const¶
-
template<typename Archive, typename Up = Type, enable_if_t<!trait::custom_serialization<Up>::value, int> = 0>
void load(Archive &ar, unsigned int)¶ serialization load (input)
-
template<typename Archive, typename Up = Type, enable_if_t<!trait::custom_serialization<Up>::value, int> = 0>
void save(Archive &ar, unsigned int version) const¶ serialization store (output)
-
inline int64_t get_laps() const¶
get the number of measurements
-
inline auto get_iterator() const¶
-
inline void set_laps(int64_t v)¶
-
inline void set_iterator(graph_iterator itr)¶
-
inline decltype(auto) load()¶
-
inline decltype(auto) load() const¶
-
inline bool get_depth_change() const¶
-
inline bool get_is_flat() const¶
-
inline bool get_is_invalid() const¶
-
inline bool get_is_on_stack() const¶
-
inline bool get_is_running() const¶
-
inline bool get_is_transient() const¶
-
inline void set_depth_change(bool v)¶
-
inline void set_is_flat(bool v)¶
-
inline void set_is_invalid(bool v)¶
-
inline void set_is_on_stack(bool v)¶
-
inline void set_is_running(bool v)¶
-
inline void set_is_transient(bool v)¶
Public Static Functions
-
template<typename Vp, typename Up = Tp, enable_if_t<trait::sampler<Up>::value, int> = 0>
static void add_sample(Vp&&)¶
-
static base_storage_type *get_storage()¶
-
template<typename Up = Type, typename UnitT = typename trait::units<Up>::type, enable_if_t<std::is_same<UnitT, int64_t>::value, int> = 0>
static int64_t unit()¶
-
template<typename Up = Type, typename UnitT = typename trait::units<Up>::display_type, enable_if_t<std::is_same<UnitT, std::string>::value, int> = 0>
static std::string display_unit()¶
-
template<typename Up = Type, typename UnitT = typename trait::units<Up>::type, enable_if_t<std::is_same<UnitT, int64_t>::value, int> = 0>
static int64_t get_unit()¶
-
template<typename Up = Type, typename UnitT = typename trait::units<Up>::display_type, enable_if_t<std::is_same<UnitT, std::string>::value, int> = 0>
static std::string get_display_unit()¶
-
static short get_width()¶
-
static short get_precision()¶
-
static std::string label()¶
-
static std::string description()¶
-
static std::string get_label()¶
-
static std::string get_description()¶
Public Static Attributes
-
static constexpr bool is_component = true¶
-
static constexpr bool timing_category_v = trait::is_timing_category<Type>::value¶
-
static constexpr bool memory_category_v = trait::is_memory_category<Type>::value¶
-
static constexpr bool timing_units_v = trait::uses_timing_units<Type>::value¶
-
static constexpr bool memory_units_v = trait::uses_memory_units<Type>::value¶
-
static constexpr bool percent_units_v = trait::uses_percent_units<Type>::value¶
-
static constexpr auto ios_fixed = std::ios_base::fixed¶
-
static constexpr auto ios_decimal = std::ios_base::dec¶
-
static constexpr auto ios_showpoint = std::ios_base::showpoint¶
-
static const fmtflags format_flags = ios_fixed | ios_decimal | ios_showpoint¶
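The effect of the format_flags combination listed above (fixed, decimal, showpoint) can be seen by applying the same standard flags to an output stream. The helper below is a hypothetical illustration using only the standard library; how timemory applies these flags internally is not shown in this section.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Formats a value with fixed | dec | showpoint and the given precision.
inline std::string format_with_flags(double value, int precision)
{
    const std::ios_base::fmtflags flags =
        std::ios_base::fixed | std::ios_base::dec | std::ios_base::showpoint;

    std::ostringstream oss;
    oss.setf(flags);           // enable the three flags
    oss.precision(precision);  // digits after the decimal point in fixed mode
    oss << value;
    return oss.str();
}

// Hypothetical usage: trailing zeros are kept, so columns line up.
inline std::string format_with_flags_demo()
{
    std::string text = format_with_flags(2.0, 3);
    assert(text == "2.000");
    return text;
}
```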
Private Functions
-
inline bool get_is_running() const
-
inline bool get_is_on_stack() const
-
inline bool get_is_transient() const
-
inline bool get_is_flat() const
-
inline bool get_depth_change() const
-
inline bool get_is_invalid() const
-
inline void set_is_running(bool v)
-
inline void set_is_on_stack(bool v)
-
inline void set_is_transient(bool v)
-
inline void set_is_flat(bool v)
-
inline void set_depth_change(bool v)
-
inline void set_is_invalid(bool v)
Friends
- friend struct node::graph< Tp >
- friend struct operation::init_storage< Tp >
- friend struct operation::fini_storage< Tp >
- friend struct operation::cache< Tp >
- friend struct operation::construct< Tp >
- friend struct operation::set_prefix< Tp >
- friend struct operation::push_node< Tp >
- friend struct operation::pop_node< Tp >
- friend struct operation::record< Tp >
- friend struct operation::reset< Tp >
- friend struct operation::measure< Tp >
- friend struct operation::start< Tp >
- friend struct operation::stop< Tp >
- friend struct operation::set_started< Tp >
- friend struct operation::set_stopped< Tp >
- friend struct operation::minus< Tp >
- friend struct operation::plus< Tp >
- friend struct operation::multiply< Tp >
- friend struct operation::divide< Tp >
- friend struct operation::base_printer< Tp >
- friend struct operation::print< Tp >
- friend struct operation::print_storage< Tp >
- friend struct operation::copy< Tp >
- friend struct operation::sample< Tp >
- friend struct operation::serialization< Tp >
- friend struct operation::finalize::get< Tp, true >
- friend struct operation::finalize::get< Tp, false >
- friend struct operation::finalize::merge< Tp, true >
- friend struct operation::finalize::merge< Tp, false >
- friend struct operation::finalize::print< Tp, true >
- friend struct operation::finalize::print< Tp, false >
- friend struct operation::compose
-
struct empty_base¶
The default base class for timemory components.
Subclassed by tim::component::base< vtune_profiler, void >, tim::component::base< thread_cpu_clock >, tim::component::base< virtual_memory >, tim::component::base< read_char, std::pair< int64_t, int64_t > >, tim::component::base< current_peak_rss, std::pair< int64_t, int64_t > >, tim::component::base< allinea_map, void >, tim::component::base< cupti_pcsampling, cupti::pcsample >, tim::component::base< cuda_profiler, void >, tim::component::base< cpu_clock >, tim::component::base< mpi_trace_gotcha, void >, tim::component::base< malloc_gotcha, double >, tim::component::base< cpu_roofline< Types… >, std::pair< std::vector< long long >, double > >, tim::component::base< user_clock >, tim::component::base< placeholder< Types… >, void >, tim::component::base< page_rss, int64_t >, tim::component::base< nvtx_marker, void >, tim::component::base< num_minor_page_faults >, tim::component::base< mpip_handle< Toolset, Tag >, void >, tim::component::base< cupti_activity, intmax_t >, tim::component::base< papi_rate_tuple< RateT, EventTypes… >, std::pair< papi_tuple< EventTypes… >, RateT > >, tim::component::base< num_io_in >, tim::component::base< voluntary_context_switch >, tim::component::base< trip_count >, tim::component::base< ompt_handle< Api >, void >, tim::component::base< nothing, skeleton::base >, tim::component::base< user_bundle< Idx, Tag >, void >, tim::component::base< network_stats, cache::network_stats >, tim::component::base< data_tracker< InpT, Tag >, InpT >, tim::component::base< wall_clock, int64_t >, tim::component::base< vtune_frame, void >, tim::component::base< num_major_page_faults >, tim::component::base< memory_allocations, void >, tim::component::base< craypat_region, void >, tim::component::base< craypat_record, void >, tim::component::base< system_clock >, tim::component::base< papi_vector, std::vector< long long > >, tim::component::base< papi_tuple< EventTypes… >, std::array< long long, sizeof…(EventTypes)> >, tim::component::base< ncclp_handle< 
Toolset, Tag >, void >, tim::component::base< cupti_counters, cupti::profiler::results_t >, tim::component::base< cpu_util, std::pair< int64_t, int64_t > >, tim::component::base< user_mode_time, int64_t >, tim::component::base< process_cpu_clock >, tim::component::base< num_io_out >, tim::component::base< monotonic_raw_clock >, tim::component::base< kernel_mode_time, int64_t >, tim::component::base< gotcha< Nt, BundleT, DiffT >, void >, tim::component::base< cuda_event, float >, tim::component::base< craypat_counters, std::vector< unsigned long > >, tim::component::base< thread_cpu_util, std::pair< int64_t, int64_t > >, tim::component::base< gperftools_cpu_profiler, void >, tim::component::base< papi_array< MaxNumEvents >, std::array< long long, MaxNumEvents > >, tim::component::base< monotonic_clock >, tim::component::base< gpu_roofline< Types… >, std::tuple< cupti_activity::value_type, cupti_counters::value_type > >, tim::component::base< gperftools_heap_profiler, void >, tim::component::base< craypat_heap_stats, void >, tim::component::base< caliper_marker, void >, tim::component::base< caliper_loop_marker, void >, tim::component::base< sampler< CompT< Types… >, N, SigIds… >, void >, tim::component::base< written_char, std::array< int64_t, 2 > >, tim::component::base< priority_context_switch >, tim::component::base< ompt_data_tracker< Api >, void >, tim::component::base< craypat_flush_buffer, unsigned long >, tim::component::base< caliper_config, void >, tim::component::base< pthread_gotcha, void >, tim::component::base< vtune_event, void >, tim::component::base< tau_marker, void >, tim::component::base< read_bytes, std::pair< int64_t, int64_t > >, tim::component::base< process_cpu_util, std::pair< int64_t, int64_t > >, tim::component::base< peak_rss >, tim::component::base< kernel_logger, void >, tim::component::base< written_bytes, std::array< int64_t, 2 > >, tim::component::base< likwid_nvmarker, void >, tim::component::base< likwid_marker, void >, 
tim::component::base< Tp, Value >, tim::component::base< Tp, void >
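The base<Tp, Value> template follows the curiously recurring template pattern: each component passes its own type as Tp and the type of its stored data as Value. The sketch below is a heavily reduced, hypothetical illustration of that pattern and is not timemory's base class: toy_base, toy_counter, name(), and tick() are invented for the example, and incrementing the lap count inside set_stopped() is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, heavily reduced CRTP base; NOT timemory's implementation.
template <typename Tp, typename Value>
struct toy_base
{
    Value        value{};
    Value        accum{};
    std::int64_t laps       = 0;
    bool         is_running = false;

    // the base reaches into the derived type at compile time
    static std::string label() { return Tp::name(); }

    void set_started() { is_running = true; }
    void set_stopped()
    {
        if(is_running)
            ++laps;  // assumption: one lap per completed start/stop pair
        is_running = false;
    }
    void reset()
    {
        value      = Value{};
        accum      = Value{};
        laps       = 0;
        is_running = false;
    }
    std::int64_t get_laps() const { return laps; }
    bool         get_is_running() const { return is_running; }
};

// Hypothetical component: counts ticks between start() and stop().
struct toy_counter : toy_base<toy_counter, std::int64_t>
{
    static const char* name() { return "toy_counter"; }

    void start() { set_started(); value = 0; }
    void tick() { ++value; }
    void stop() { accum += value; set_stopped(); }
    std::int64_t get() const { return accum; }
};

inline toy_counter toy_counter_demo()
{
    toy_counter counter;
    counter.start();
    counter.tick();
    counter.tick();
    counter.stop();  // value = 2, accum = 2, laps = 1
    counter.start();
    counter.tick();
    counter.tick();
    counter.tick();
    counter.stop();  // value = 3, accum = 5, laps = 2
    assert(!counter.get_is_running());
    return counter;
}
```

As with the wall_clock example earlier on this page, value holds the most recent interval and accum holds the running total.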