timemory
3.2.1
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
Namespaces | |
factory | |
operators | |
skeleton | |
Classes | |
struct | allinea_map |
Controls the AllineaMap sampling profiler. More... | |
struct | base_data |
struct | base_data< Tp, 0 > |
struct | base_data< Tp, 1 > |
struct | base_data< Tp, 2 > |
struct | base_data< Tp, 3 > |
struct | base_state |
Provide state configuration options for a component instance. The current states are: More... | |
struct | empty_base |
The default base class for timemory components. More... | |
struct | base |
struct | caliper_marker |
Standard marker for the Caliper Performance Analysis Toolbox. More... | |
struct | caliper_loop_marker |
Loop marker for the Caliper Performance Analysis Toolbox. More... | |
struct | caliper_config |
Component which provides Caliper cali::ConfigManager . More... | |
struct | caliper_common |
struct | craypat_record |
Provides scoping for the CrayPAT profiler. Global initialization stops the profiler; the first call to start() starts the profiler again on the calling thread. Instance counting is enabled per-thread and each call to start() increments the counter. Calls to stop() have no effect until the counter reaches zero, at which point the profiler is turned off again. More... | |
struct | craypat_region |
Adds a region label to the CrayPAT profiling output. More... | |
struct | craypat_counters |
struct | craypat_heap_stats |
Dumps the craypat heap statistics. More... | |
struct | craypat_flush_buffer |
Writes all the recorded contents in the data buffer. Returns the number of bytes flushed. More... | |
struct | cuda_event |
Records the time interval between two points in a CUDA stream. Less accurate than 'cupti_activity' for kernel timing but does not require linking to the CUDA driver. More... | |
struct | cuda_profiler |
Control switch for a CUDA profiler running on the application. Only the first call to start() and the last call to stop() actually toggle the state of the external CUDA profiler when component instances are nested. More... | |
struct | nvtx_marker |
Inserts NVTX markers with the current timemory prefix. The default color scheme is a round-robin of red, blue, green, yellow, purple, cyan, pink, and light_green. These colors. More... | |
struct | cupti_activity |
CUPTI activity tracing component for high-precision kernel timing. For low-precision kernel timing, use tim::component::cuda_event component. More... | |
struct | cupti_counters |
NVprof-style hardware counters via the CUpti callback API. Collecting these hardware counters has a higher overhead than the new CUpti Profiling API (tim::component::cupti_profiler). However, there are currently some issues with nesting the Profiling API and it is currently recommended to use this component for NVIDIA hardware counters in timemory. The callback API / NVprof is quite specific about the distinction between an "event" and a "metric". For your convenience, timemory removes this distinction and events can be specified arbitrarily as metrics and vice-versa and this component will sort them into their appropriate category. For the full list of the available events/metrics, use timemory-avail -H from the command-line. More... | |
struct | cupti_pcsampling |
The PC Sampling gives the number of samples for each source and assembly line with various stall reasons. Using this information, you can pinpoint portions of your kernel that are introducing latencies and the reason for the latency. More... | |
struct | data_tracker |
This component is provided to facilitate data tracking. The first template parameter is the type of data to be tracked, the second is a custom tag for differentiating trackers which handle the same data types but record different high-level data. More... | |
struct | gotcha |
The gotcha component rewrites the global offset table such that calling the wrapped function actually invokes either a function which is wrapped by timemory instrumentation or is replaced by a timemory component with a function call operator (operator() ) whose return value and arguments exactly match the original function. This component is only available on Linux and can only be applied to external, dynamically-linked functions (i.e. functions defined in a shared library). If the BundleT template parameter is a non-empty component bundle, this component will surround the original function call with: More... | |
struct | malloc_gotcha |
struct | memory_allocations |
This component wraps malloc, calloc, free, cudaMalloc, cudaFree via GOTCHA and tracks the number of bytes requested/freed in each call. This component is useful for detecting the locations where memory re-use would provide a performance benefit. More... | |
struct | mpip_handle |
struct | ncclp_handle |
struct | gperftools_cpu_profiler |
struct | gperftools_heap_profiler |
struct | read_char |
I/O counter for chars read. The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read() and pread(). It includes things like tty IO and it is unaffected by whether or not actual physical disk IO was required (the read might have been satisfied from pagecache) More... | |
struct | written_char |
I/O counter for chars written. The number of bytes which this task has caused, or shall cause to be written to disk. Similar caveats apply here as with tim::component::read_char (rchar). More... | |
struct | read_bytes |
I/O counter for bytes read. Attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. Done at the submit_bio() level, so it is accurate for block-backed filesystems. More... | |
struct | written_bytes |
I/O counter for bytes written. Attempt to count the number of bytes which this process caused to be sent to the storage layer. This is done at page-dirtying time. More... | |
struct | likwid_marker |
Provides likwid perfmon marker forwarding. Requires. More... | |
struct | likwid_nvmarker |
Provides likwid nvmon marker forwarding. Requires. More... | |
struct | metadata |
Provides forward declaration support for assigning static metadata properties. This is most useful for specialization of template components. If this class is specialized for a component, then the component does not need to provide the static member functions label() and description() . More... | |
struct | network_stats |
struct | ompt_handle |
struct | ompt_data_tracker |
struct | opaque |
struct | papi_array |
struct | papi_common |
struct | papi_rate_tuple |
This component pairs a tim::component::papi_tuple with a component which will provide an interval over which the hardware counters will be reported, e.g. if RateT is tim::component::wall_clock, the reported values will be the hardware-counters w.r.t. the wall-clock time. If RateT is tim::component::cpu_clock, the reported values will be the hardware counters w.r.t. the cpu time. More... | |
struct | papi_tuple |
This component is useful for bundling together a fixed set of hardware counter identifiers which require no runtime configuration. More... | |
struct | papi_vector |
struct | placeholder |
provides nothing, used for dummy types in enum More... | |
struct | static_properties |
Provides three variants of a matches function for determining if a component is identified by a given string or enumeration value. More... | |
struct | properties |
This is a critical specialization for mapping strings and integers to component types at runtime. The enum_string() function is the enum id as a string. The id() function is (typically) the name of the C++ component as a string. The ids() function returns a set of strings which are alternative string identifiers to the enum string or the string ID. Additionally, it provides serialization of these values. More... | |
struct | state |
struct | static_properties< void, false > |
struct | static_properties< Tp, false > |
struct | static_properties< Tp, true > |
struct | enumerator |
This is a critical specialization for mapping strings and integers to component types at runtime (should always be specialized alongside tim::component::properties) and it is also critical for performing template metaprogramming "loops" over all the components. E.g.: More... | |
struct | cpu_roofline |
Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine. More... | |
struct | gpu_roofline |
Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine. More... | |
struct | peak_rss |
this struct extracts the high-water mark (or a change in the high-water mark) of the resident set size (RSS), which is the amount of memory currently held in RAM. When used on a system with swap enabled, this value may fluctuate, but it should not on an HPC system. More... | |
struct | page_rss |
this struct measures the resident set size (RSS) currently allocated in pages of memory. Unlike the peak_rss, this value will fluctuate as memory gets freed and allocated More... | |
struct | num_io_in |
the number of times the file system had to perform input. More... | |
struct | num_io_out |
the number of times the file system had to perform output. More... | |
struct | num_minor_page_faults |
the number of page faults serviced without any I/O activity; here I/O activity is avoided by reclaiming a page frame from the list of pages awaiting reallocation. More... | |
struct | num_major_page_faults |
the number of page faults serviced that required I/O activity. More... | |
struct | voluntary_context_switch |
the number of times a context switch resulted due to a process voluntarily giving up the processor before its time slice was completed (usually to await availability of a resource). More... | |
struct | priority_context_switch |
the number of times a context switch resulted due to a higher priority process becoming runnable or because the current process exceeded its time slice More... | |
struct | virtual_memory |
this struct extracts the virtual memory usage More... | |
struct | user_mode_time |
This is the total amount of time spent executing in user mode. More... | |
struct | kernel_mode_time |
This is the total amount of time spent executing in kernel mode. More... | |
struct | current_peak_rss |
this struct extracts the absolute value of the high-water mark of the resident set size (RSS) at start and stop points. RSS is the amount of memory currently held in RAM. More... | |
struct | tau_marker |
Forwards timemory labels to the TAU (Tuning and Analysis Utilities) More... | |
struct | system_clock |
this component extracts only the CPU time spent in kernel-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | user_clock |
this component extracts only the CPU time spent in user-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | cpu_clock |
this component extracts the CPU time spent in both user- and kernel-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | monotonic_clock |
clock that increments monotonically, tracking the time since an arbitrary point, and will continue to increment while the system is asleep. More... | |
struct | monotonic_raw_clock |
clock that increments monotonically, tracking the time since an arbitrary point like CLOCK_MONOTONIC. However, this clock is unaffected by frequency or time adjustments. It should not be compared to other system time sources. More... | |
struct | thread_cpu_clock |
this clock measures the CPU time within the current thread (excludes sibling/child threads). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | process_cpu_clock |
this clock measures the CPU time within the current process (excludes child processes). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | cpu_util |
this computes the CPU utilization percentage for the calling process and child processes. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | process_cpu_util |
this computes the CPU utilization percentage for ONLY the calling process (excludes child processes). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | thread_cpu_util |
this computes the CPU utilization percentage for ONLY the calling thread (excludes sibling and child threads). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way. More... | |
struct | ert_timer |
struct | wall_clock |
struct | trip_count |
Records the number of invocations. This is the most lightweight metric available since it only increments an integer and never records any statistics. If dynamic instrumentation is used and the overhead is significant, it is recommended to set this as the only component (-d trip_count) and then use the regex exclude option (-E) to remove any non-critical function calls which have very high trip-counts. More... | |
struct | nothing |
struct | user_bundle |
struct | vtune_event |
Implements __itt_event More... | |
struct | vtune_frame |
Implements __itt_domain More... | |
struct | vtune_profiler |
Implements __itt_pause() and __itt_resume() to control where the vtune profiler is active. More... | |
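The nested start()/stop() behaviour described above for craypat_record and cuda_profiler (only the outermost calls change the state of the external tool) can be illustrated with a small reference-counted guard. This is a minimal sketch under stated assumptions and not timemory's implementation: `fake_profiler`, `nested_toggle`, and `demo_nested_toggle` are hypothetical names, and the counter is a plain member rather than a per-thread counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an external profiler's on/off switch.
struct fake_profiler
{
    bool enabled = false;
    int  toggles = 0;  // number of times the state actually changed

    void set(bool v)
    {
        enabled = v;
        ++toggles;
    }
};

// Reference-counted guard: only the first start() and the matching
// last stop() change the state of the external profiler.
class nested_toggle
{
public:
    explicit nested_toggle(fake_profiler& p)
    : m_profiler(p)
    {}

    void start()
    {
        // Depth goes 0 -> 1 only on the outermost start().
        if(m_depth++ == 0)
            m_profiler.set(true);
    }

    void stop()
    {
        if(m_depth == 0)
            return;  // unmatched stop(): nothing to do
        // Depth returns to 0 only on the outermost stop().
        if(--m_depth == 0)
            m_profiler.set(false);
    }

    std::int64_t depth() const { return m_depth; }

private:
    fake_profiler& m_profiler;
    std::int64_t   m_depth = 0;
};

// Runs a nested start/stop sequence and returns the number of real
// state changes, or -1 if the profiler state was not as expected.
inline int demo_nested_toggle()
{
    fake_profiler prof;
    nested_toggle guard(prof);
    guard.start();  // outermost start: profiler switched on
    guard.start();  // nested start: no state change
    guard.stop();   // nested stop: profiler stays on
    bool on_while_nested = prof.enabled;
    guard.stop();   // outermost stop: profiler switched off
    return (on_while_nested && !prof.enabled) ? prof.toggles : -1;
}
```

A per-thread variant of this sketch could keep the depth in a `thread_local` variable so that each thread counts its own instances.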
Typedefs | |
template<typename Tp , typename ValueT > | |
using | base_data_t = typename internal::base_data< Tp, ValueT >::type |
template<typename Tp > | |
using | graph_iterator_t = typename graph< node::graph< Tp > >::iterator |
template<typename Tp > | |
using | graph_const_iterator_t = typename graph< node::graph< Tp > >::const_iterator |
using | cupti_event = cupti_counters |
template<typename T > | |
using | data_handler_t = typename T::handler_type |
an alias for getting the handler_type of a data tracker More... | |
using | data_tracker_integer = data_tracker< intmax_t, TIMEMORY_API > |
using | data_tracker_unsigned = data_tracker< size_t, TIMEMORY_API > |
using | data_tracker_floating = data_tracker< double, TIMEMORY_API > |
using | idset_t = std::set< std::string > |
template<int Idx> | |
using | enumerator_t = typename enumerator< Idx >::type |
template<int Idx> | |
using | properties_t = typename enumerator< Idx >::type |
using | cpu_roofline_sp_flops = cpu_roofline< float > |
A specialization of tim::component::cpu_roofline for 32-bit floating point operations. More... | |
using | cpu_roofline_dp_flops = cpu_roofline< double > |
A specialization of tim::component::cpu_roofline for 64-bit floating point operations. More... | |
using | cpu_roofline_flops = cpu_roofline< float, double > |
using | gpu_roofline_hp_flops = gpu_roofline< cuda::fp16_t > |
A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability). More... | |
using | gpu_roofline_sp_flops = gpu_roofline< float > |
A specialization of tim::component::gpu_roofline for 32-bit floating point operations. More... | |
using | gpu_roofline_dp_flops = gpu_roofline< double > |
A specialization of tim::component::gpu_roofline for 64-bit floating point operations. More... | |
using | gpu_roofline_flops = gpu_roofline< float, double > |
using | vol_cxt_switch = voluntary_context_switch |
using | prio_cxt_switch = priority_context_switch |
using | real_clock = wall_clock |
using | virtual_clock = wall_clock |
template<bool B, typename T = int> | |
using | enable_if_t = typename std::enable_if< B, T >::type |
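The data_tracker description above says that its second template parameter is a tag for differentiating trackers which handle the same data type. The sketch below illustrates that idea only; it is not the data_tracker API. `simple_tracker`, `bytes_sent_tag`, `iterations_tag`, and `demo_tracker_tags` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical tag types: they carry no data and exist only to make
// otherwise identical tracker types distinct.
struct bytes_sent_tag {};
struct iterations_tag {};

// Simplified tracker: Tp is the stored data type, Tag separates trackers
// that store the same Tp but record different high-level quantities.
template <typename Tp, typename Tag>
struct simple_tracker
{
    static std::vector<Tp>& storage()
    {
        static std::vector<Tp> s;  // one storage per (Tp, Tag) instantiation
        return s;
    }

    static void store(Tp v) { storage().push_back(v); }
};

using bytes_sent_tracker = simple_tracker<std::intmax_t, bytes_sent_tag>;
using iterations_tracker = simple_tracker<std::intmax_t, iterations_tag>;

// Same data type, different tag: the two aliases are different types.
static_assert(!std::is_same<bytes_sent_tracker, iterations_tracker>::value,
              "tags should produce distinct tracker types");

// Stores values in both trackers and checks that the data stays separate.
inline bool demo_tracker_tags()
{
    bytes_sent_tracker::storage().clear();
    iterations_tracker::storage().clear();
    bytes_sent_tracker::store(4096);
    iterations_tracker::store(10);
    iterations_tracker::store(20);
    return bytes_sent_tracker::storage().size() == 1 &&
           iterations_tracker::storage().size() == 2;
}
```

Because each (type, tag) pair is a separate instantiation, values recorded through one alias never mix with values recorded through the other, even though both store `std::intmax_t`.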
Functions | |
template<typename Tp , typename Value > | |
Tp | operator+ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Tp , typename Value > | |
Tp | operator- (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Tp , typename Value > | |
Tp | operator* (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Tp , typename Value > | |
Tp | operator/ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Toolset , typename Tag > | |
void | configure_mpip (std::set< std::string > permit={}, std::set< std::string > reject={}) |
template<typename Toolset , typename Tag > | |
uint64_t | activate_mpip () |
The thread that first activates mpip will be the thread that turns it off. Function returns the number of new mpip handles. More... | |
template<typename Toolset , typename Tag > | |
uint64_t | deactivate_mpip (uint64_t) |
The thread that created the initial mpip handle will turn it off. Returns the number of handles active. More... | |
template<typename Toolset , typename Tag > | |
void | configure_ncclp (std::set< std::string > permit={}, std::set< std::string > reject={}) |
template<typename Toolset > | |
opaque | get_opaque () |
template<typename Toolset > | |
opaque | get_opaque (scope::config _scope) |
template<typename Toolset , typename Arg , typename... Args> | |
opaque | get_opaque (Arg &&arg, Args &&... args) |
template<typename Toolset > | |
std::set< size_t > | get_typeids () |
struct tim::component::base_data |
using tim::component::base_data_t = typedef typename internal::base_data<Tp, ValueT>::type |
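using tim::component::cpu_roofline_dp_flops = typedef cpu_roofline<double> |
A specialization of tim::component::cpu_roofline for 64-bit floating point operations.
using tim::component::cpu_roofline_flops = typedef cpu_roofline<float, double> |
using tim::component::cpu_roofline_sp_flops = typedef cpu_roofline<float> |
A specialization of tim::component::cpu_roofline for 32-bit floating point operations.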
using tim::component::cupti_event = typedef cupti_counters |
Definition at line 790 of file cupti_counters.hpp.
using tim::component::data_handler_t = typedef typename T::handler_type |
an alias for getting the handler_type of a data tracker
Definition at line 394 of file components.hpp.
using tim::component::data_tracker_floating = typedef data_tracker<double, TIMEMORY_API> |
Definition at line 415 of file components.hpp.
Specialization of tim::component::data_tracker for storing unsigned integer data
Definition at line 401 of file components.hpp.
Specialization of tim::component::data_tracker for storing floating point data
Definition at line 408 of file components.hpp.
using tim::component::enable_if_t = typedef typename std::enable_if<B, T>::type |
using tim::component::enumerator_t = typedef typename enumerator<Idx>::type |
Definition at line 273 of file properties.hpp.
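The enumerator description refers to template metaprogramming "loops" over all components, driven by an integer-to-type mapping such as enumerator_t. The following is a minimal sketch of that technique, assuming C++14 and using hypothetical stand-ins (`comp_a`, `comp_b`, `comp_c`, `index_to_type`, `collect_labels`, `all_labels`) rather than timemory's own types.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical component types, each providing a static label().
struct comp_a { static std::string label() { return "comp_a"; } };
struct comp_b { static std::string label() { return "comp_b"; } };
struct comp_c { static std::string label() { return "comp_c"; } };

// Integer-to-type mapping: the primary template is left undefined and
// every valid index gets an explicit specialization.
template <int Idx>
struct index_to_type;

template <> struct index_to_type<0> { using type = comp_a; };
template <> struct index_to_type<1> { using type = comp_b; };
template <> struct index_to_type<2> { using type = comp_c; };

template <int Idx>
using index_to_type_t = typename index_to_type<Idx>::type;

constexpr int num_components = 3;

// The "loop": expanding the index pack visits every mapped type once,
// in index order, at compile time.
template <int... Idx>
std::vector<std::string> collect_labels(std::integer_sequence<int, Idx...>)
{
    return { index_to_type_t<Idx>::label()... };
}

// Collects the label of every component in the mapping.
inline std::vector<std::string> all_labels()
{
    return collect_labels(std::make_integer_sequence<int, num_components>{});
}
```

Adding a component in this sketch means adding one specialization and increasing `num_components`; every function written against the index sequence then picks it up without further changes.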
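using tim::component::gpu_roofline_dp_flops = typedef gpu_roofline<double> |
A specialization of tim::component::gpu_roofline for 64-bit floating point operations.
using tim::component::gpu_roofline_flops = typedef gpu_roofline<float, double> |
using tim::component::gpu_roofline_hp_flops = typedef gpu_roofline<cuda::fp16_t> |
A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability).
using tim::component::gpu_roofline_sp_flops = typedef gpu_roofline<float> |
A specialization of tim::component::gpu_roofline for 32-bit floating point operations.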
using tim::component::graph_const_iterator_t = typedef typename graph<node::graph<Tp> >::const_iterator |
Definition at line 45 of file declaration.hpp.
using tim::component::graph_iterator_t = typedef typename graph<node::graph<Tp> >::iterator |
Definition at line 42 of file declaration.hpp.
using tim::component::idset_t = typedef std::set<std::string> |
Definition at line 54 of file properties.hpp.
using tim::component::prio_cxt_switch = typedef priority_context_switch |
Definition at line 425 of file components.hpp.
using tim::component::properties_t = typedef typename enumerator<Idx>::type |
Definition at line 276 of file properties.hpp.
using tim::component::real_clock = typedef wall_clock |
Definition at line 72 of file wall_clock.hpp.
using tim::component::virtual_clock = typedef wall_clock |
Definition at line 74 of file wall_clock.hpp.
using tim::component::vol_cxt_switch = typedef voluntary_context_switch |
Definition at line 373 of file components.hpp.
uint64_t tim::component::activate_mpip()
The thread that first activates mpip will be the thread that turns it off. Function returns the number of new mpip handles.
Definition at line 162 of file mpip.hpp.
References DEBUG_PRINT_HERE, and tim::manager::instance().
void tim::component::configure_mpip(std::set<std::string> permit = {}, std::set<std::string> reject = {})
void tim::component::configure_ncclp(std::set<std::string> permit = {}, std::set<std::string> reject = {})
uint64_t tim::component::deactivate_mpip(uint64_t id)
The thread that created the initial mpip handle will turn it off. Returns the number of handles active.
Definition at line 201 of file mpip.hpp.
References DEBUG_PRINT_HERE, and tim::manager::instance().
opaque tim::component::get_opaque()
opaque tim::component::get_opaque(Arg&& arg, Args&&... args)
opaque tim::component::get_opaque(scope::config _scope)
std::set<size_t> tim::component::get_typeids()
Tp tim::component::operator*(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
Definition at line 314 of file definition.hpp.
Tp tim::component::operator+(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
Definition at line 297 of file definition.hpp.
Tp tim::component::operator-(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
Definition at line 306 of file definition.hpp.
Tp tim::component::operator/(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
Definition at line 323 of file definition.hpp.
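The four arithmetic operators above take two base<Tp, Value> references and return the derived type Tp by value. The sketch below shows how that signature pattern works with a CRTP-style base. It is an illustration under assumptions, not the code in definition.hpp: `simple_base`, `elapsed_ticks`, and `demo_sum` are hypothetical, and combining a single `value` member is an assumption about what the operators do.

```cpp
#include <cassert>

// Simplified CRTP-style base: Tp is the derived component type and
// Value is the type of the data it stores.
template <typename Tp, typename Value>
struct simple_base
{
    Value value{};
};

// Free operators accept the base by const reference, so any derived
// component binds to them, and return the derived type by value.
template <typename Tp, typename Value>
Tp operator+(const simple_base<Tp, Value>& lhs, const simple_base<Tp, Value>& rhs)
{
    Tp result{};
    result.value = lhs.value + rhs.value;
    return result;
}

template <typename Tp, typename Value>
Tp operator-(const simple_base<Tp, Value>& lhs, const simple_base<Tp, Value>& rhs)
{
    Tp result{};
    result.value = lhs.value - rhs.value;
    return result;
}

// Hypothetical component: Tp and Value are deduced from its base class
// when it is passed to the operators above.
struct elapsed_ticks : simple_base<elapsed_ticks, long>
{};

// Adds two components and returns the combined stored value.
inline long demo_sum(long a, long b)
{
    elapsed_ticks lhs;
    elapsed_ticks rhs;
    lhs.value = a;
    rhs.value = b;
    elapsed_ticks total = lhs + rhs;  // result is the derived type, not the base
    return total.value;
}
```

Returning `Tp` rather than the base means the result of an expression such as `lhs + rhs` can be used wherever the concrete component type is expected.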