timemory 3.3.0
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
tim::component Namespace Reference

Namespaces

namespace  factory
 
namespace  operators
 
namespace  skeleton
 

Classes

struct  allinea_map
 Controls the AllineaMap sampling profiler.
 
struct  base
 
struct  base_data
 
struct  base_data< Tp, 0 >
 
struct  base_data< Tp, 1 >
 
struct  base_data< Tp, 2 >
 
struct  base_data< Tp, 3 >
 
struct  base_state
 Provides state configuration options for a component instance. The current states are listed in the full class documentation.
 
struct  caliper_common
 
struct  caliper_config
 Component that provides the Caliper cali::ConfigManager.
 
struct  caliper_loop_marker
 Loop marker for the Caliper Performance Analysis Toolbox.
 
struct  caliper_marker
 Standard marker for the Caliper Performance Analysis Toolbox.
 
struct  cpu_clock
 This component extracts only the CPU time spent in both user and kernel mode. Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  cpu_roofline
 Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
 
struct  cpu_util
 This computes the CPU utilization percentage for the calling process and its child processes. Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  craypat_counters
 
struct  craypat_flush_buffer
 Writes out all the recorded contents of the data buffer. Returns the number of bytes flushed.
 
struct  craypat_heap_stats
 Dumps the CrayPAT heap statistics.
 
struct  craypat_record
 Provides scoping for the CrayPAT profiler. Global initialization stops the profiler, and the first call to start() starts the profiler again on the calling thread. Instance counting is enabled per-thread: each call to start() increments the counter and each call to stop() decrements it. Calls to stop() have no effect on the profiler until the counter reaches zero, at which point the profiler is turned off again.
 
struct  craypat_region
 Adds a region label to the CrayPAT profiling output.
 
struct  cuda_event
 Records the time interval between two points in a CUDA stream. Less accurate than 'cupti_activity' for kernel timing but does not require linking to the CUDA driver.
 
struct  cuda_profiler
 Control switch for a CUDA profiler attached to the application. When component instances are nested, only the first call to start() and the last call to stop() actually toggle the state of the external CUDA profiler.
 
struct  cupti_activity
 CUPTI activity tracing component for high-precision kernel timing. For low-precision kernel timing, use the tim::component::cuda_event component.
 
struct  cupti_counters
 NVprof-style hardware counters via the CUPTI callback API. Collecting these hardware counters has a higher overhead than the new CUPTI Profiling API (tim::component::cupti_profiler). However, there are currently some issues with nesting the Profiling API, so this component is currently the recommended way to collect NVIDIA hardware counters in timemory. The callback API / NVprof is quite specific about the distinction between an "event" and a "metric". For convenience, timemory removes this distinction: events can be specified as metrics and vice versa, and this component sorts them into the appropriate category. For the full list of the available events/metrics, run timemory-avail -H from the command line.
 
struct  cupti_pcsampling
 PC sampling gives the number of samples for each source and assembly line, along with the various stall reasons. With this information you can pinpoint the portions of your kernel that introduce latencies and the reason for each latency.
 
struct  current_peak_rss
 This struct extracts the absolute value of the high-water mark of the resident set size (RSS) at the start and stop points. RSS is the current amount of memory in RAM.
 
struct  data_tracker
 This component is provided to facilitate data tracking. The first template parameter is the type of data to be tracked; the second is a custom tag for differentiating trackers that handle the same data type but record different high-level data.
 
struct  empty_base
 A very lightweight base which provides no storage.
 
struct  empty_storage
 A very lightweight storage class which provides nothing.
 
struct  enumerator
 This is a critical specialization for mapping strings and integers to component types at runtime (it should always be specialized alongside tim::component::properties), and it is also critical for performing template metaprogramming "loops" over all the components. An example is given in the full class documentation.
 
struct  ert_timer
 
struct  gotcha
 The gotcha component rewrites the global offset table such that calling the wrapped function actually invokes either a function which is wrapped by timemory instrumentation or a timemory component with a function call operator (operator()) whose return value and arguments exactly match the original function. This component is only available on Linux and can only be applied to external, dynamically-linked functions (i.e. functions defined in a shared library). If the BundleT template parameter is a non-empty component bundle, this component surrounds the original function call with that bundle's instrumentation; the full class documentation gives the exact sequence.
 
class  gotcha_suppression
 
struct  gperftools_cpu_profiler
 
struct  gperftools_heap_profiler
 
struct  gpu_roofline
 Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
 
struct  hip_event
 Records the time interval between two points in a HIP stream. Less accurate than 'cupti_activity' for kernel timing but does not require linking to the HIP driver.
 
struct  kernel_mode_time
 This is the total amount of time spent executing in kernel mode.
 
struct  likwid_marker
 Provides LIKWID perfmon marker forwarding. Its requirements are listed in the full class documentation.
 
struct  likwid_nvmarker
 Provides LIKWID nvmon marker forwarding. Its requirements are listed in the full class documentation.
 
struct  malloc_gotcha
 
struct  memory_allocations
 This component wraps malloc, calloc, free, and CUDA/HIP malloc/free via GOTCHA and tracks the number of bytes requested/freed in each call. This component is useful for detecting the locations where memory re-use would provide a performance benefit.
 
struct  metadata
 Provides forward declaration support for assigning static metadata properties. This is most useful for specialization of template components. If this class is specialized for a component, then the component does not need to provide the static member functions label() and description().
 
struct  monotonic_clock
 A clock that increments monotonically, tracking the time since an arbitrary point, and that continues to increment while the system is asleep.
 
struct  monotonic_raw_clock
 A clock that increments monotonically, tracking the time since an arbitrary point, like CLOCK_MONOTONIC. However, this clock is unaffected by frequency or time adjustments. It should not be compared to other system time sources.
 
struct  mpip_handle
 
struct  ncclp_handle
 
struct  network_stats
 
struct  nothing
 
struct  num_io_in
 The number of times the file system had to perform input.
 
struct  num_io_out
 The number of times the file system had to perform output.
 
struct  num_major_page_faults
 The number of page faults serviced that required I/O activity.
 
struct  num_minor_page_faults
 The number of page faults serviced without any I/O activity; here I/O activity is avoided by reclaiming a page frame from the list of pages awaiting reallocation.
 
struct  nvtx_marker
 Inserts NVTX markers with the current timemory prefix. The default color scheme is a round-robin of red, blue, green, yellow, purple, cyan, pink, and light_green. See the full class documentation for details on these colors.
 
struct  ompt_data_tracker
 
struct  ompt_handle
 
struct  opaque
 
struct  page_rss
 This struct measures the resident set size (RSS) currently allocated in pages of memory. Unlike peak_rss, this value fluctuates as memory is freed and allocated.
 
struct  papi_array
 
struct  papi_common
 
struct  papi_rate_tuple
 This component pairs a tim::component::papi_tuple with a component which provides the interval over which the hardware counters are reported. For example, if RateT is tim::component::wall_clock, the reported values are the hardware counters w.r.t. the wall-clock time; if RateT is tim::component::cpu_clock, the reported values are the hardware counters w.r.t. the CPU time.
 
struct  papi_tuple
 This component is useful for bundling together a fixed set of hardware counter identifiers which require no runtime configuration.
 
struct  papi_vector
 
struct  peak_rss
 This struct extracts the high-water mark (or a change in the high-water mark) of the resident set size (RSS), which is the current amount of memory in RAM. On a system with swap enabled, this value may fluctuate, but it should not on an HPC system.
 
struct  perfetto_trace
 Component providing a Perfetto implementation.
 
struct  placeholder
 Provides nothing; used for dummy types in an enum.
 
struct  printer
 A diagnostic component that prints messages via start(...) and stores messages via store(...). The stored messages are returned via the get() member function. If bundled alongside the timestamp component, the timestamp is added to the stored message.
 
struct  priority_context_switch
 The number of times a context switch resulted from a higher-priority process becoming runnable or from the current process exceeding its time slice.
 
struct  process_cpu_clock
 This clock measures the CPU time within the current process (excludes child processes). Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  process_cpu_util
 This computes the CPU utilization percentage for ONLY the calling process (excludes child processes). Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  properties
 This is a critical specialization for mapping strings and integers to component types at runtime. The enum_string() function returns the enum id as a string. The id() function returns (typically) the name of the C++ component as a string. The ids() function returns a set of strings which are alternative string identifiers to the enum string or the string ID. Additionally, it provides serialization of these values.
 
struct  read_bytes
 I/O counter for bytes read. Attempts to count the number of bytes which this process really did cause to be fetched from the storage layer. This is done at the submit_bio() level, so it is accurate for block-backed filesystems.
 
struct  read_char
 I/O counter for chars read. The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read() and pread(). It includes things like tty I/O and is unaffected by whether or not actual physical disk I/O was required (the read might have been satisfied from the page cache).
 
struct  roctx_marker
 Inserts ROCTX markers with the current timemory prefix.
 
struct  state
 
struct  static_properties
 Provides three variants of a matches function for determining whether a component is identified by a given string or enumeration value.
 
struct  static_properties< Tp, false >
 
struct  static_properties< Tp, true >
 
struct  static_properties< void, false >
 
struct  system_clock
 This component extracts only the CPU time spent in kernel mode. Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  tau_marker
 Forwards timemory labels to TAU (Tuning and Analysis Utilities).
 
struct  thread_cpu_clock
 This clock measures the CPU time within the current thread (excludes sibling/child threads). Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  thread_cpu_util
 This computes the CPU utilization percentage for ONLY the calling thread (excludes sibling and child threads). Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  timestamp
 This component stores the timestamp of when a bundle was started and is specialized such that the "timeline_storage" type-trait is true. This means that every entry in the call-graph for this output is unique (see the timestamp.txt output file).
 
struct  trip_count
 Records the number of invocations. This is the most lightweight metric available since it only increments an integer and never records any statistics. If dynamic instrumentation is used and the overhead is significant, it is recommended to set this as the only component (-d trip_count) and then use the regex exclude option (-E) to remove any non-critical function calls which have very high trip counts.
 
struct  user_bundle
 
struct  user_clock
 This component extracts only the CPU time spent in user mode. Only meaningful when a difference between two readings is computed; do not treat a single CPU-time reading as an amount of time.
 
struct  user_mode_time
 This is the total amount of time spent executing in user mode.
 
struct  virtual_memory
 This struct extracts the virtual memory usage.
 
struct  voluntary_context_switch
 The number of times a context switch resulted from a process voluntarily giving up the processor before its time slice was completed (usually to await availability of a resource).
 
struct  vtune_event
 Implements __itt_event.
 
struct  vtune_frame
 Implements __itt_domain.
 
struct  vtune_profiler
 Implements __itt_pause() and __itt_resume() to control where the VTune profiler is active.
 
struct  wall_clock
 
struct  written_bytes
 I/O counter for bytes written. Attempts to count the number of bytes which this process caused to be sent to the storage layer. This is done at page-dirtying time.
 
struct  written_char
 I/O counter for chars written. The number of bytes which this task has caused, or shall cause, to be written to disk. Similar caveats apply here as with tim::component::read_char (rchar).
 
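The nested start()/stop() behavior described above for craypat_record and cuda_profiler is essentially per-thread reference counting. The following is a minimal sketch of that idea in standard C++, not timemory's implementation: external_profiler and scoped_profiler_toggle are hypothetical names, and the boolean flag only stands in for the real CrayPAT or CUDA profiler switch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an external profiler's on/off switch
// (not a timemory, CrayPAT, or CUDA API).
struct external_profiler
{
    static bool& active()
    {
        static bool _active = false;
        return _active;
    }
};

// Hypothetical scoped toggle: each thread keeps its own nesting depth.
// Only the outermost start() turns the profiler on, and only the stop()
// that brings the depth back to zero turns it off again.
struct scoped_profiler_toggle
{
    static std::int64_t& depth()
    {
        static thread_local std::int64_t _depth = 0;
        return _depth;
    }

    void start()
    {
        if(depth()++ == 0)
            external_profiler::active() = true;
    }

    void stop()
    {
        // Unbalanced stop() calls are ignored instead of driving depth negative.
        if(depth() > 0 && --depth() == 0)
            external_profiler::active() = false;
    }
};
```

With this design, inner instances can be started and stopped freely; the external profiler only changes state at the outermost boundaries, so nested instrumentation does not repeatedly toggle it.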

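Several of the clock components above (cpu_clock, process_cpu_clock, thread_cpu_clock, system_clock, user_clock) are only meaningful as differences. The sketch below illustrates why using the standard std::clock() function rather than any timemory API: std::clock() counts processor time from an implementation-defined starting point, so one reading on its own is not a duration. The helper name cpu_seconds is hypothetical.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: returns the processor time (in seconds) consumed by
// the process while running func. std::clock() counts from an
// implementation-defined era, so a single reading is not a duration; only
// the difference between two readings is.
template <typename Func>
double cpu_seconds(Func&& func)
{
    const std::clock_t begin = std::clock();
    func();
    const std::clock_t end = std::clock();
    return static_cast<double>(end - begin) / CLOCKS_PER_SEC;
}
```

For example, cpu_seconds([] { /* work */ }) returns the CPU seconds spent in the callable. A utilization percentage is typically built the same way, by dividing a CPU-time difference by a wall-clock difference over the same interval.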
Typedefs

template<typename Tp , typename ValueT >
using base_data_t = typename internal::base_data< Tp, ValueT >::type
 
template<typename Tp >
using graph_iterator_t = typename graph< node::graph< Tp > >::iterator
 
template<typename Tp >
using graph_const_iterator_t = typename graph< node::graph< Tp > >::const_iterator
 
using cupti_event = cupti_counters
 
template<typename T >
using data_handler_t = typename T::handler_type
 An alias for getting the handler_type of a data tracker.
 
using data_tracker_integer = data_tracker< intmax_t, TIMEMORY_API >
 
using data_tracker_unsigned = data_tracker< size_t, TIMEMORY_API >
 
using data_tracker_floating = data_tracker< double, TIMEMORY_API >
 
using idset_t = std::set< std::string >
 
template<int Idx>
using enumerator_t = typename enumerator< Idx >::type
 
template<int Idx>
using properties_t = typename enumerator< Idx >::type
 
using cpu_roofline_sp_flops = cpu_roofline< float >
 A specialization of tim::component::cpu_roofline for 32-bit floating point operations.
 
using cpu_roofline_dp_flops = cpu_roofline< double >
 A specialization of tim::component::cpu_roofline for 64-bit floating point operations.
 
using cpu_roofline_flops = cpu_roofline< float, double >
 
using gpu_roofline_hp_flops = gpu_roofline< cuda::fp16_t >
 A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability).
 
using gpu_roofline_sp_flops = gpu_roofline< float >
 A specialization of tim::component::gpu_roofline for 32-bit floating point operations.
 
using gpu_roofline_dp_flops = gpu_roofline< double >
 A specialization of tim::component::gpu_roofline for 64-bit floating point operations.
 
using gpu_roofline_flops = gpu_roofline< float, double >
 
using vol_cxt_switch = voluntary_context_switch
 
using prio_cxt_switch = priority_context_switch
 
using timestamp_entry_t = std::chrono::time_point< std::chrono::system_clock >
 
using real_clock = wall_clock
 
using virtual_clock = wall_clock
 
template<bool B, typename T = int>
using enable_if_t = typename std::enable_if< B, T >::type
 

Functions

template<typename Tp , typename Value >
Tp operator+ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)
 
template<typename Tp , typename Value >
Tp operator- (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)
 
template<typename Tp , typename Value >
Tp operator* (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)
 
template<typename Tp , typename Value >
Tp operator/ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)
 
template<typename Toolset , typename Tag >
void configure_mpip (std::set< std::string > permit={}, std::set< std::string > reject={})
 
template<typename Toolset , typename Tag >
uint64_t activate_mpip ()
 The thread that first activates mpip will be the thread that turns it off. Returns the number of new mpip handles.
 
template<typename Toolset , typename Tag >
uint64_t deactivate_mpip (uint64_t)
 The thread that created the initial mpip handle will turn it off. Returns the number of handles active.
 
template<typename Toolset , typename Tag >
void configure_ncclp (std::set< std::string > permit={}, std::set< std::string > reject={})
 
template<typename Toolset >
opaque get_opaque ()
 
template<typename Toolset >
opaque get_opaque (scope::config _scope)
 
template<typename Toolset , typename Arg , typename... Args>
opaque get_opaque (Arg &&arg, Args &&... args)
 
template<typename Toolset >
std::set< size_t > get_typeids ()
 

Class Documentation

◆ tim::component::base_data

struct tim::component::base_data
template<typename Tp, size_t Sz>
struct tim::component::base_data< Tp, Sz >

Definition at line 41 of file data.hpp.


Typedef Documentation

◆ base_data_t

template<typename Tp , typename ValueT >
using tim::component::base_data_t = typedef typename internal::base_data<Tp, ValueT>::type

Definition at line 519 of file data.hpp.

◆ cpu_roofline_dp_flops

A specialization of tim::component::cpu_roofline for 64-bit floating point operations.

Definition at line 51 of file types.hpp.

◆ cpu_roofline_flops

using tim::component::cpu_roofline_flops = typedef cpu_roofline<float, double>

Definition at line 58 of file types.hpp.

◆ cpu_roofline_sp_flops

A specialization of tim::component::cpu_roofline for 32-bit floating point operations.

Definition at line 45 of file types.hpp.

◆ cupti_event

Definition at line 790 of file cupti_counters.hpp.

◆ data_handler_t

template<typename T >
typename T::handler_type tim::component::data_handler_t

An alias for getting the handler_type of a data tracker.

Definition at line 440 of file components.hpp.
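As a small illustration of data_handler_t, the sketch below copies the alias and applies it to a hypothetical tracker type. my_tracker, my_handler, and transform() are invented for this example; timemory's data_tracker supplies its own handler_type.

```cpp
#include <cassert>
#include <type_traits>

// Local copy of the alias documented above: it names the nested
// handler_type of a tracker type T.
template <typename T>
using data_handler_t = typename T::handler_type;

// Hypothetical handler and tracker, for illustration only.
struct my_handler
{
    static long transform(long v) { return 2 * v; }
};

struct my_tracker
{
    using handler_type = my_handler;
};

static_assert(std::is_same<data_handler_t<my_tracker>, my_handler>::value,
              "data_handler_t should resolve to the nested handler_type");
```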

◆ data_tracker_floating

Specialization of tim::component::data_tracker for storing floating point data.

Definition at line 461 of file components.hpp.

◆ data_tracker_integer

Specialization of tim::component::data_tracker for storing signed integer data.

Definition at line 447 of file components.hpp.

◆ data_tracker_unsigned

Specialization of tim::component::data_tracker for storing unsigned integer data.

Definition at line 454 of file components.hpp.

◆ enable_if_t

template<bool B, typename T = int>
using tim::component::enable_if_t = typedef typename std::enable_if<B, T>::type

Definition at line 58 of file types.hpp.
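enable_if_t is the usual SFINAE helper. Because T defaults to int, it can be used as a defaulted non-type template parameter to select between overloads. A minimal sketch, assuming only the alias shown above; describe() is a hypothetical function.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Local copy of the alias documented above.
template <bool B, typename T = int>
using enable_if_t = typename std::enable_if<B, T>::type;

// Participates in overload resolution only for arithmetic types.
template <typename Tp, enable_if_t<std::is_arithmetic<Tp>::value> = 0>
std::string describe(const Tp&)
{
    return "arithmetic";
}

// Participates in overload resolution only for non-arithmetic types.
template <typename Tp, enable_if_t<!std::is_arithmetic<Tp>::value> = 0>
std::string describe(const Tp&)
{
    return "other";
}
```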

◆ enumerator_t

template<int Idx>
using tim::component::enumerator_t = typedef typename enumerator<Idx>::type

Definition at line 273 of file properties.hpp.
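The enumerator/enumerator_t pattern maps an integer id to a type through explicit specializations, which is also what makes compile-time "loops" over all ids possible. The sketch below imitates that shape with hypothetical ids and component types (wall_timer, cpu_timer, placeholder_type, all_labels); it is not timemory's actual enumeration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical component types, for illustration only.
struct wall_timer { static std::string label() { return "wall"; } };
struct cpu_timer { static std::string label() { return "cpu"; } };
struct placeholder_type { static std::string label() { return "none"; } };

// Hypothetical ids.
enum example_id : int
{
    WALL_TIMER_IDX = 0,
    CPU_TIMER_IDX  = 1,
    TOTAL_IDX      = 2
};

// Primary template: an id without a specialization maps to a placeholder.
template <int Idx>
struct enumerator
{
    using type = placeholder_type;
};

// One specialization per component maps its integer id to its type.
template <>
struct enumerator<WALL_TIMER_IDX>
{
    using type = wall_timer;
};

template <>
struct enumerator<CPU_TIMER_IDX>
{
    using type = cpu_timer;
};

// Same shape as the alias documented above.
template <int Idx>
using enumerator_t = typename enumerator<Idx>::type;

// Compile-time "loop": expand over every id in [0, TOTAL_IDX).
template <int... Idx>
std::vector<std::string> labels_impl(std::integer_sequence<int, Idx...>)
{
    return { enumerator_t<Idx>::label()... };
}

inline std::vector<std::string> all_labels()
{
    return labels_impl(std::make_integer_sequence<int, TOTAL_IDX>{});
}
```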

◆ gpu_roofline_dp_flops

A specialization of tim::component::gpu_roofline for 64-bit floating point operations.

Definition at line 76 of file types.hpp.

◆ gpu_roofline_flops

using tim::component::gpu_roofline_flops = typedef gpu_roofline<float, double>

Definition at line 86 of file types.hpp.

◆ gpu_roofline_hp_flops

A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability).

Definition at line 64 of file types.hpp.

◆ gpu_roofline_sp_flops

A specialization of tim::component::gpu_roofline for 32-bit floating point operations.

Definition at line 70 of file types.hpp.

◆ graph_const_iterator_t

template<typename Tp >
using tim::component::graph_const_iterator_t = typedef typename graph<node::graph<Tp> >::const_iterator

Definition at line 46 of file declaration.hpp.

◆ graph_iterator_t

template<typename Tp >
using tim::component::graph_iterator_t = typedef typename graph<node::graph<Tp> >::iterator

Definition at line 43 of file declaration.hpp.

◆ idset_t

using tim::component::idset_t = typedef std::set<std::string>

Definition at line 54 of file properties.hpp.
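idset_t holds the alternative string identifiers returned by properties::ids(). One plausible way to use such a set is sketched below with a hypothetical example_properties type and made-up identifiers: a matches() check that accepts the enum string, the id string, or any alternative id, similar in spirit to the matches functions described for static_properties.

```cpp
#include <cassert>
#include <set>
#include <string>

// Local copy of the alias documented above.
using idset_t = std::set<std::string>;

// Hypothetical properties type; all identifiers are illustrative only.
struct example_properties
{
    static const char* enum_string() { return "EXAMPLE_CLOCK"; }
    static const char* id() { return "example_clock"; }
    static const idset_t& ids()
    {
        static const idset_t _ids = { "example", "ex_clock" };
        return _ids;
    }

    // True if key names this component by any of its identifiers.
    static bool matches(const std::string& key)
    {
        return key == enum_string() || key == id() || ids().count(key) > 0;
    }
};
```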

◆ prio_cxt_switch

Definition at line 425 of file components.hpp.

◆ properties_t

template<int Idx>
using tim::component::properties_t = typedef typename enumerator<Idx>::type

Definition at line 276 of file properties.hpp.

◆ real_clock

Definition at line 72 of file wall_clock.hpp.

◆ timestamp_entry_t

using tim::component::timestamp_entry_t = typedef std::chrono::time_point<std::chrono::system_clock>

Definition at line 51 of file types.hpp.
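timestamp_entry_t is a std::chrono::system_clock time point. The hypothetical helper below shows one way to render such a value as a UTC string with the standard library; it is not the formatting timemory uses for its timestamp output.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <string>

// Same alias as documented above.
using timestamp_entry_t = std::chrono::time_point<std::chrono::system_clock>;

// Hypothetical helper: format a timestamp as "YYYY-MM-DD HH:MM:SS" (UTC).
// std::gmtime returns a pointer to shared static storage, so this helper is
// not thread-safe; POSIX gmtime_r is the reentrant alternative.
inline std::string format_utc(const timestamp_entry_t& tp)
{
    const std::time_t seconds = std::chrono::system_clock::to_time_t(tp);
    const std::tm*    parts   = std::gmtime(&seconds);
    if(parts == nullptr)
        return std::string{};
    char              buffer[32];
    const std::size_t n =
        std::strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", parts);
    return std::string(buffer, n);
}
```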

◆ virtual_clock

Definition at line 74 of file wall_clock.hpp.

◆ vol_cxt_switch

Definition at line 373 of file components.hpp.

Function Documentation

◆ activate_mpip()

template<typename Toolset , typename Tag >
uint64_t tim::component::activate_mpip ( )

The thread that first activates mpip will be the thread that turns it off. Returns the number of new mpip handles (1 if this call created the handle, otherwise 0).

Definition at line 161 of file mpip.hpp.

{
    using handle_t = tim::component::mpip_handle<Toolset, Tag>;

    static std::shared_ptr<handle_t> _handle;

    if(!_handle.get())
    {
        _handle = std::make_shared<handle_t>();
        _handle->start();

        auto cleanup_functor = [=]() {
            if(_handle)
            {
                _handle->stop();
                _handle.reset();
            }
        };

        static std::string _label = []() {
            std::stringstream ss;
            ss << "timemory-mpip-" << demangle<Toolset>() << "-" << demangle<Tag>();
            return ss.str();
        }();
        DEBUG_PRINT_HERE("Adding cleanup for %s", _label.c_str());
        tim::manager::instance()->add_cleanup(_label, cleanup_functor);
        return 1;
    }
    return 0;
}

References DEBUG_PRINT_HERE, and tim::manager::instance().

◆ configure_mpip()

template<typename Toolset , typename Tag >
void tim::component::configure_mpip (std::set< std::string > permit = {}, std::set< std::string > reject = {})

Definition at line 222 of file mpip.hpp.

{}

◆ configure_ncclp()

template<typename Toolset , typename Tag >
void tim::component::configure_ncclp (std::set< std::string > permit = {}, std::set< std::string > reject = {})

◆ deactivate_mpip()

template<typename Toolset , typename Tag >
uint64_t tim::component::deactivate_mpip ( uint64_t  id)

The thread that created the initial mpip handle will turn it off. Returns the number of handles still active (0 after a deactivation with a non-zero id, otherwise 1).

Definition at line 200 of file mpip.hpp.

{
    if(id > 0)
    {
        static std::string _label = []() {
            std::stringstream ss;
            ss << "timemory-mpip-" << demangle<Toolset>() << "-" << demangle<Tag>();
            return ss.str();
        }();
        DEBUG_PRINT_HERE("Removing cleanup for %s", _label.c_str());
        tim::manager::instance()->cleanup(_label);
        return 0;
    }
    return 1;
}

References DEBUG_PRINT_HERE, and tim::manager::instance().
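activate_mpip() and deactivate_mpip() form a simple protocol: only the first activation creates the shared handle and returns a non-zero id, and only a caller holding that non-zero id triggers the cleanup. The sketch below reproduces that protocol in standalone C++. cleanup_registry, wrapper_handle, activate_wrappers, and deactivate_wrappers are hypothetical stand-ins for tim::manager's cleanup registry and mpip_handle; they are not timemory APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a manager's named cleanup callbacks.
struct cleanup_registry
{
    static std::map<std::string, std::function<void()>>& entries()
    {
        static std::map<std::string, std::function<void()>> _entries;
        return _entries;
    }
    static void add(const std::string& label, std::function<void()> fn)
    {
        entries()[label] = std::move(fn);
    }
    // Removes the callback before running it, so it runs at most once.
    static void run(const std::string& label)
    {
        auto itr = entries().find(label);
        if(itr == entries().end())
            return;
        auto fn = std::move(itr->second);
        entries().erase(itr);
        fn();
    }
};

// Hypothetical handle standing in for mpip_handle<Toolset, Tag>.
struct wrapper_handle
{
    static int& active_count()
    {
        static int _n = 0;
        return _n;
    }
    void start() { ++active_count(); }
    void stop() { --active_count(); }
};

// First call creates and starts the shared handle and returns 1; later calls return 0.
inline std::uint64_t activate_wrappers()
{
    static std::shared_ptr<wrapper_handle> _handle;
    if(!_handle)
    {
        _handle = std::make_shared<wrapper_handle>();
        _handle->start();
        // _handle has static storage duration, so no capture is needed.
        cleanup_registry::add("example-wrappers", [] {
            if(_handle)
            {
                _handle->stop();
                _handle.reset();
            }
        });
        return 1;
    }
    return 0;
}

// Only the non-zero id returned to the first caller triggers the cleanup.
inline std::uint64_t deactivate_wrappers(std::uint64_t id)
{
    if(id > 0)
    {
        cleanup_registry::run("example-wrappers");
        return 0;
    }
    return 1;
}
```

Because later callers receive 0, passing their id back to deactivate_wrappers() is harmless, so every caller can use the same activate/deactivate pairing without knowing whether it was first.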

◆ get_opaque() [1/3]

template<typename Toolset >
opaque tim::component::get_opaque ( )

◆ get_opaque() [2/3]

template<typename Toolset , typename Arg , typename... Args>
opaque tim::component::get_opaque (Arg &&arg, Args &&... args)

◆ get_opaque() [3/3]

template<typename Toolset >
opaque tim::component::get_opaque ( scope::config  _scope)

◆ get_typeids()

template<typename Toolset >
std::set< size_t > tim::component::get_typeids ( )

◆ operator*()

template<typename Tp , typename Value >
Tp tim::component::operator* (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)

Definition at line 314 of file definition.hpp.

{
    return Tp(static_cast<const Tp&>(lhs)) *= static_cast<const Tp&>(rhs);
}

◆ operator+()

template<typename Tp , typename Value >
Tp tim::component::operator+ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)

Definition at line 297 of file definition.hpp.

{
    return Tp(static_cast<const Tp&>(lhs)) += static_cast<const Tp&>(rhs);
}

◆ operator-()

template<typename Tp , typename Value >
Tp tim::component::operator- (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)

Definition at line 306 of file definition.hpp.

{
    return Tp(static_cast<const Tp&>(lhs)) -= static_cast<const Tp&>(rhs);
}

◆ operator/()

template<typename Tp , typename Value >
Tp tim::component::operator/ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs)

Definition at line 323 of file definition.hpp.

{
    return Tp(static_cast<const Tp&>(lhs)) /= static_cast<const Tp&>(rhs);
}
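The four operators above all follow the same idiom: copy the left operand as the derived type Tp and apply the matching compound assignment, so a component only needs to define +=, -=, *=, and /=. The sketch below shows the idiom for + and - with a hypothetical empty CRTP base and a hypothetical counter component; it mirrors the definitions above but does not use timemory's real base.

```cpp
#include <cassert>

// Hypothetical empty CRTP base standing in for tim::component::base<Tp, Value>.
template <typename Tp, typename Value>
struct base
{};

// Copy the left operand as the derived type, then apply the compound operator.
template <typename Tp, typename Value>
Tp operator+(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
{
    return Tp(static_cast<const Tp&>(lhs)) += static_cast<const Tp&>(rhs);
}

template <typename Tp, typename Value>
Tp operator-(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
{
    return Tp(static_cast<const Tp&>(lhs)) -= static_cast<const Tp&>(rhs);
}

// Hypothetical component that only defines the compound operators.
struct counter : base<counter, long>
{
    long value = 0;
    explicit counter(long v = 0)
    : value(v)
    {}
    counter& operator+=(const counter& rhs)
    {
        value += rhs.value;
        return *this;
    }
    counter& operator-=(const counter& rhs)
    {
        value -= rhs.value;
        return *this;
    }
};
```

Both operands are left untouched: the result is a fresh copy of the left operand, which is why the free operators can take their arguments by const reference.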