timemory 3.3.0
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.
Namespaces
namespace | factory |
namespace | operators |
namespace | skeleton |
Classes
struct | allinea_map |
Controls the AllineaMap sampling profiler.
struct | base |
struct | base_data |
struct | base_data< Tp, 0 > |
struct | base_data< Tp, 1 > |
struct | base_data< Tp, 2 > |
struct | base_data< Tp, 3 > |
struct | base_state |
Provides state configuration options for a component instance. The current states are:
struct | caliper_common |
struct | caliper_config |
Component which provides Caliper cali::ConfigManager.
struct | caliper_loop_marker |
Loop marker for the Caliper Performance Analysis Toolbox.
struct | caliper_marker |
Standard marker for the Caliper Performance Analysis Toolbox.
struct | cpu_clock |
This component extracts only the CPU time spent in both user- and kernel-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | cpu_roofline |
Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
struct | cpu_util |
This computes the CPU utilization percentage for the calling process and child processes. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | craypat_counters |
struct | craypat_flush_buffer |
Writes all the recorded contents in the data buffer. Returns the number of bytes flushed.
struct | craypat_heap_stats |
Dumps the craypat heap statistics.
struct | craypat_record |
Provides scoping for the CrayPAT profiler. Global initialization stops the profiler; the first call to start() starts the profiler again on the calling thread. Instance counting is enabled per-thread and each call to start() increments the counter. All calls to stop() have no effect until the counter reaches zero, at which point the profiler is turned off again.
struct | craypat_region |
Adds a region label to the CrayPAT profiling output.
struct | cuda_event |
Records the time interval between two points in a CUDA stream. Less accurate than 'cupti_activity' for kernel timing but does not require linking to the CUDA driver.
struct | cuda_profiler |
Control switch for a CUDA profiler running on the application. Only the first call to start() and the last call to stop() actually toggle the state of the external CUDA profiler when component instances are nested.
struct | cupti_activity |
CUPTI activity tracing component for high-precision kernel timing. For low-precision kernel timing, use tim::component::cuda_event component.
struct | cupti_counters |
NVprof-style hardware counters via the CUPTI callback API. Collecting these hardware counters has a higher overhead than the new CUPTI Profiling API (tim::component::cupti_profiler). However, there are currently some issues with nesting the Profiling API, and it is currently recommended to use this component for NVIDIA hardware counters in timemory. The callback API / NVprof is quite specific about the distinction between an "event" and a "metric". For your convenience, timemory removes this distinction: events can be specified arbitrarily as metrics and vice versa, and this component will sort them into their appropriate category. For the full list of the available events/metrics, use timemory-avail -H from the command-line.
struct | cupti_pcsampling |
PC sampling gives the number of samples for each source and assembly line with various stall reasons. Using this information, you can pinpoint portions of your kernel that are introducing latencies and the reason for the latency.
struct | current_peak_rss |
This struct extracts the absolute value of the high-water mark of the resident set size (RSS) at start and stop points. RSS is the current amount of memory in RAM.
struct | data_tracker |
This component is provided to facilitate data tracking. The first template parameter is the type of data to be tracked, the second is a custom tag for differentiating trackers which handle the same data types but record different high-level data.
struct | empty_base |
A very lightweight base which provides no storage.
struct | empty_storage |
A very lightweight storage class which provides nothing.
struct | enumerator |
This is a critical specialization for mapping strings and integers to component types at runtime (should always be specialized alongside tim::component::properties) and it is also critical for performing template metaprogramming "loops" over all the components. E.g.:
struct | ert_timer |
struct | gotcha |
The gotcha component rewrites the global offset table such that calling the wrapped function actually invokes either a function which is wrapped by timemory instrumentation or a replacement timemory component whose function call operator (operator()) has a return value and arguments exactly matching the original function. This component is only available on Linux and can only be applied to external, dynamically-linked functions (i.e. functions defined in a shared library). If the BundleT template parameter is a non-empty component bundle, this component will surround the original function call with:
class | gotcha_suppression |
struct | gperftools_cpu_profiler |
struct | gperftools_heap_profiler |
struct | gpu_roofline |
Combines hardware counters and timers and executes the empirical roofline toolkit during application termination to estimate the peak possible performance for the machine.
struct | hip_event |
Records the time interval between two points in a HIP stream. Less accurate than 'cupti_activity' for kernel timing but does not require linking to the HIP driver.
struct | kernel_mode_time |
This is the total amount of time spent executing in kernel mode.
struct | likwid_marker |
Provides likwid perfmon marker forwarding. Requires.
struct | likwid_nvmarker |
Provides likwid nvmon marker forwarding. Requires.
struct | malloc_gotcha |
struct | memory_allocations |
This component wraps malloc, calloc, free, and CUDA/HIP malloc/free via GOTCHA and tracks the number of bytes requested/freed in each call. This component is useful for detecting the locations where memory re-use would provide a performance benefit.
struct | metadata |
Provides forward declaration support for assigning static metadata properties. This is most useful for specialization of template components. If this class is specialized for a component, then the component does not need to provide the static member functions label() and description().
struct | monotonic_clock |
Clock that increments monotonically, tracking the time since an arbitrary point, and will continue to increment while the system is asleep.
struct | monotonic_raw_clock |
Clock that increments monotonically, tracking the time since an arbitrary point, like CLOCK_MONOTONIC. However, this clock is unaffected by frequency or time adjustments. It should not be compared to other system time sources.
struct | mpip_handle |
struct | ncclp_handle |
struct | network_stats |
struct | nothing |
struct | num_io_in |
The number of times the file system had to perform input.
struct | num_io_out |
The number of times the file system had to perform output.
struct | num_major_page_faults |
The number of page faults serviced that required I/O activity.
struct | num_minor_page_faults |
The number of page faults serviced without any I/O activity; here I/O activity is avoided by reclaiming a page frame from the list of pages awaiting reallocation.
struct | nvtx_marker |
Inserts NVTX markers with the current timemory prefix. The default color scheme is a round-robin of red, blue, green, yellow, purple, cyan, pink, and light_green. These colors.
struct | ompt_data_tracker |
struct | ompt_handle |
struct | opaque |
struct | page_rss |
This struct measures the resident set size (RSS) currently allocated in pages of memory. Unlike peak_rss, this value will fluctuate as memory gets freed and allocated.
struct | papi_array |
struct | papi_common |
struct | papi_rate_tuple |
This component pairs a tim::component::papi_tuple with a component which will provide an interval over which the hardware counters will be reported, e.g. if RateT is tim::component::wall_clock, the reported values will be the hardware counters w.r.t. the wall-clock time. If RateT is tim::component::cpu_clock, the reported values will be the hardware counters w.r.t. the CPU time.
struct | papi_tuple |
This component is useful for bundling together a fixed set of hardware counter identifiers which require no runtime configuration.
struct | papi_vector |
struct | peak_rss |
This struct extracts the high-water mark (or a change in the high-water mark) of the resident set size (RSS), which is the current amount of memory in RAM. When used on a system with swap enabled, this value may fluctuate but should not on an HPC system.
struct | perfetto_trace |
Component providing perfetto implementation.
struct | placeholder |
Provides nothing; used for dummy types in enum.
struct | printer |
A diagnostic component that prints messages via start(...) and stores messages via store(...). The stored messages are returned via the get() member function. If bundled alongside the timestamp component, the timestamp will be added to the stored message.
struct | priority_context_switch |
The number of times a context switch resulted because a higher-priority process became runnable or because the current process exceeded its time slice.
struct | process_cpu_clock |
This clock measures the CPU time within the current process (excludes child processes). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | process_cpu_util |
This computes the CPU utilization percentage for ONLY the calling process (excludes child processes). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | properties |
This is a critical specialization for mapping strings and integers to component types at runtime. The enum_string() function is the enum id as a string. The id() function is (typically) the name of the C++ component as a string. The ids() function returns a set of strings which are alternative string identifiers to the enum string or the string ID. Additionally, it provides serialization of these values.
struct | read_bytes |
I/O counter for bytes read. Attempts to count the number of bytes which this process really did cause to be fetched from the storage layer. Done at the submit_bio() level, so it is accurate for block-backed filesystems.
struct | read_char |
I/O counter for chars read. The number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to read() and pread(). It includes things like tty I/O and it is unaffected by whether or not actual physical disk I/O was required (the read might have been satisfied from pagecache).
struct | roctx_marker |
Inserts ROCTX markers with the current timemory prefix.
struct | state |
struct | static_properties |
Provides three variants of a matches function for determining if a component is identified by a given string or enumeration value.
struct | static_properties< Tp, false > |
struct | static_properties< Tp, true > |
struct | static_properties< void, false > |
struct | system_clock |
This component extracts only the CPU time spent in kernel-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | tau_marker |
Forwards timemory labels to TAU (Tuning and Analysis Utilities).
struct | thread_cpu_clock |
This clock measures the CPU time within the current thread (excludes sibling/child threads). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | thread_cpu_util |
This computes the CPU utilization percentage for ONLY the calling thread (excludes sibling and child threads). Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | timestamp |
This component stores the timestamp of when a bundle was started and is specialized such that the "timeline_storage" type-trait is true. This means that every entry in the call-graph for this output will be unique (look in the timestamp.txt output file).
struct | trip_count |
Records the number of invocations. This is the most lightweight metric available since it only increments an integer and never records any statistics. If dynamic instrumentation is used and the overhead is significant, it is recommended to set this as the only component (-d trip_count) and then use the regex exclude option (-E) to remove any non-critical function calls which have very high trip-counts.
struct | user_bundle |
struct | user_clock |
This component extracts only the CPU time spent in user-mode. Only relevant as a time when a difference is computed. Do not use a single CPU time as an amount of time; it doesn't work that way.
struct | user_mode_time |
This is the total amount of time spent executing in user mode.
struct | virtual_memory |
This struct extracts the virtual memory usage.
struct | voluntary_context_switch |
The number of times a context switch resulted because a process voluntarily gave up the processor before its time slice was completed (usually to await availability of a resource).
struct | vtune_event |
Implements __itt_event.
struct | vtune_frame |
Implements __itt_domain.
struct | vtune_profiler |
Implements __itt_pause() and __itt_resume() to control where the VTune profiler is active.
struct | wall_clock |
struct | written_bytes |
I/O counter for bytes written. Attempts to count the number of bytes which this process caused to be sent to the storage layer. This is done at page-dirtying time.
struct | written_char |
I/O counter for chars written. The number of bytes which this task has caused, or shall cause, to be written to disk. Similar caveats apply here as with tim::component::read_char (rchar).
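Several of the CPU-clock components above warn that a single CPU-time reading is meaningless and that only a difference is useful. The sketch below illustrates that idea with standard C++ only (std::clock from <ctime>); it is not how timemory implements these components, and the helper names cpu_seconds and spin are hypothetical.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper (not part of timemory): runs `work` and returns the
// CPU time it consumed, in seconds, as the difference of two std::clock()
// readings. A single std::clock() value only counts CPU time since an
// implementation-defined starting point, so it is not a duration by itself.
template <typename Func>
double cpu_seconds(Func&& work)
{
    std::clock_t start = std::clock();
    work();
    std::clock_t stop = std::clock();
    assert(start != static_cast<std::clock_t>(-1) &&
           stop != static_cast<std::clock_t>(-1) && "CPU time unavailable");
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}

// Busy loop used as a stand-in workload; `volatile` keeps the compiler
// from optimizing the loop away.
inline void spin(long iterations)
{
    volatile double x = 0.0;
    for (long i = 0; i < iterations; ++i) x = x + 1.0;
}
```

std::clock() reports CPU time for the program as a whole, so on its own it cannot separate user from kernel mode or one thread from another; those distinctions need platform-specific APIs.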
Typedefs
template<typename Tp , typename ValueT > | |
using | base_data_t = typename internal::base_data< Tp, ValueT >::type |
template<typename Tp > | |
using | graph_iterator_t = typename graph< node::graph< Tp > >::iterator |
template<typename Tp > | |
using | graph_const_iterator_t = typename graph< node::graph< Tp > >::const_iterator |
using | cupti_event = cupti_counters |
template<typename T > | |
using | data_handler_t = typename T::handler_type |
An alias for getting the handler_type of a data tracker.
using | data_tracker_integer = data_tracker< intmax_t, TIMEMORY_API > |
using | data_tracker_unsigned = data_tracker< size_t, TIMEMORY_API > |
using | data_tracker_floating = data_tracker< double, TIMEMORY_API > |
using | idset_t = std::set< std::string > |
template<int Idx> | |
using | enumerator_t = typename enumerator< Idx >::type |
template<int Idx> | |
using | properties_t = typename enumerator< Idx >::type |
using | cpu_roofline_sp_flops = cpu_roofline< float > |
A specialization of tim::component::cpu_roofline for 32-bit floating point operations.
using | cpu_roofline_dp_flops = cpu_roofline< double > |
A specialization of tim::component::cpu_roofline for 64-bit floating point operations.
using | cpu_roofline_flops = cpu_roofline< float, double > |
using | gpu_roofline_hp_flops = gpu_roofline< cuda::fp16_t > |
A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability).
using | gpu_roofline_sp_flops = gpu_roofline< float > |
A specialization of tim::component::gpu_roofline for 32-bit floating point operations.
using | gpu_roofline_dp_flops = gpu_roofline< double > |
A specialization of tim::component::gpu_roofline for 64-bit floating point operations.
using | gpu_roofline_flops = gpu_roofline< float, double > |
using | vol_cxt_switch = voluntary_context_switch |
using | prio_cxt_switch = priority_context_switch |
using | timestamp_entry_t = std::chrono::time_point< std::chrono::system_clock > |
using | real_clock = wall_clock |
using | virtual_clock = wall_clock |
template<bool B, typename T = int> | |
using | enable_if_t = typename std::enable_if< B, T >::type |
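The enable_if_t alias above defaults its second parameter to int, which supports the common `enable_if_t<condition> = 0` SFINAE idiom as a non-type template parameter. A self-contained sketch of that idiom; the kind() overloads are hypothetical and only illustrate how the alias selects an overload at compile time.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Same shape as the alias listed above: T defaults to int.
template <bool B, typename T = int>
using enable_if_t = typename std::enable_if<B, T>::type;

// Hypothetical overloads: exactly one is viable for any given Tp, because the
// other one's non-type template parameter type fails to form (SFINAE).
template <typename Tp, enable_if_t<std::is_floating_point<Tp>::value> = 0>
std::string kind(Tp)
{
    return "floating";
}

template <typename Tp, enable_if_t<!std::is_floating_point<Tp>::value> = 0>
std::string kind(Tp)
{
    return "other";
}
```

Defaulting T to int is what makes `= 0` a valid default argument for the generated non-type parameter.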
Functions
template<typename Tp , typename Value > | |
Tp | operator+ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Tp , typename Value > | |
Tp | operator- (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Tp , typename Value > | |
Tp | operator* (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Tp , typename Value > | |
Tp | operator/ (const base< Tp, Value > &lhs, const base< Tp, Value > &rhs) |
template<typename Toolset , typename Tag > | |
void | configure_mpip (std::set< std::string > permit={}, std::set< std::string > reject={}) |
template<typename Toolset , typename Tag > | |
uint64_t | activate_mpip () |
The thread that first activates mpip will be the thread that turns it off. Returns the number of new mpip handles.
template<typename Toolset , typename Tag > | |
uint64_t | deactivate_mpip (uint64_t) |
The thread that created the initial mpip handle will turn it off. Returns the number of active handles.
template<typename Toolset , typename Tag > | |
void | configure_ncclp (std::set< std::string > permit={}, std::set< std::string > reject={}) |
template<typename Toolset > | |
opaque | get_opaque () |
template<typename Toolset > | |
opaque | get_opaque (scope::config _scope) |
template<typename Toolset , typename Arg , typename... Args> | |
opaque | get_opaque (Arg &&arg, Args &&... args) |
template<typename Toolset > | |
std::set< size_t > | get_typeids () |
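The free operators listed above take two base<Tp, Value> references and return the derived type Tp, which suggests the curiously recurring template pattern (CRTP). A heavily simplified, hypothetical sketch of that shape follows; this base and the elapsed_ms component are illustrative stand-ins, not timemory's types, and operator* and operator/ would follow the same pattern.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a CRTP component base:
// Tp is the derived component type, Value is the stored measurement type.
template <typename Tp, typename Value>
struct base
{
    Value value{};

    // CRTP downcast to the derived component.
    const Tp& derived() const { return static_cast<const Tp&>(*this); }
};

// Same signatures as the operators listed above: combine two components'
// values and return a copy of the *derived* type.
template <typename Tp, typename Value>
Tp operator+(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
{
    Tp result = lhs.derived();
    result.value += rhs.value;
    return result;
}

template <typename Tp, typename Value>
Tp operator-(const base<Tp, Value>& lhs, const base<Tp, Value>& rhs)
{
    Tp result = lhs.derived();
    result.value -= rhs.value;
    return result;
}

// Hypothetical component deriving from the base.
struct elapsed_ms : base<elapsed_ms, double>
{};
```

Returning Tp rather than the base keeps the result usable as a full component, including any members the derived type adds.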
struct tim::component::base_data |
using tim::component::base_data_t = typedef typename internal::base_data<Tp, ValueT>::type
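using tim::component::cpu_roofline_dp_flops = typedef cpu_roofline<double>
A specialization of tim::component::cpu_roofline for 64-bit floating point operations.
using tim::component::cpu_roofline_flops = typedef cpu_roofline<float, double>
using tim::component::cpu_roofline_sp_flops = typedef cpu_roofline<float>
A specialization of tim::component::cpu_roofline for 32-bit floating point operations.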
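The roofline components, including the single- and double-precision specializations above, estimate the machine's peak performance. As a minimal sketch of how such peaks are commonly used (the standard roofline bound, not code from timemory), a kernel's attainable FLOP rate is the smaller of the peak compute rate and memory bandwidth times arithmetic intensity. The function name and the numbers in the usage are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Standard roofline bound: attainable performance is capped either by the
// peak compute rate or by bandwidth * arithmetic intensity (FLOPs per byte).
// Inputs are illustrative; timemory's roofline components measure the peaks
// empirically rather than taking them as arguments.
inline double attainable_gflops(double peak_gflops,
                                double bandwidth_gb_per_s,
                                double flops_per_byte)
{
    assert(peak_gflops > 0.0 && bandwidth_gb_per_s > 0.0 && flops_per_byte >= 0.0);
    return std::min(peak_gflops, bandwidth_gb_per_s * flops_per_byte);
}
```

With a hypothetical 1000 GFLOP/s peak and 100 GB/s bandwidth, a kernel at 0.5 FLOP/byte is memory-bound at 50 GFLOP/s, while one at 20 FLOP/byte hits the compute ceiling.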
using tim::component::cupti_event = typedef cupti_counters
Definition at line 790 of file cupti_counters.hpp.
using tim::component::data_handler_t = typedef typename T::handler_type
An alias for getting the handler_type of a data tracker.
Definition at line 440 of file components.hpp.
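data_handler_t simply names the nested handler_type of a tracker type. A self-contained sketch of the same alias applied to a hypothetical tracker; my_tracker and sum_handler are illustrative names, not timemory types.

```cpp
#include <cassert>
#include <type_traits>

// Same shape as the alias above: extracts T's nested handler_type.
template <typename T>
using data_handler_t = typename T::handler_type;

// Hypothetical handler and tracker types, only for illustrating the alias.
struct sum_handler
{
    static double combine(double a, double b) { return a + b; }
};

struct my_tracker
{
    using handler_type = sum_handler;
};

static_assert(std::is_same<data_handler_t<my_tracker>, sum_handler>::value,
              "data_handler_t<T> names T::handler_type");
```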
using tim::component::data_tracker_floating = typedef data_tracker<double, TIMEMORY_API>
Specialization of tim::component::data_tracker for storing floating point data.
Definition at line 461 of file components.hpp.
using tim::component::data_tracker_integer = typedef data_tracker<intmax_t, TIMEMORY_API>
Definition at line 447 of file components.hpp.
using tim::component::data_tracker_unsigned = typedef data_tracker<size_t, TIMEMORY_API>
Specialization of tim::component::data_tracker for storing unsigned integer data.
Definition at line 454 of file components.hpp.
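The data_tracker class description explains that the second template parameter is a tag that keeps trackers of the same data type apart; the three aliases above all use TIMEMORY_API as that tag. A simplified, hypothetical sketch of why a tag works: each distinct (T, Tag) instantiation gets its own static storage. simple_tracker, residual_tag and step_size_tag are illustrative names only, not timemory's implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified tracker: every distinct (T, Tag) instantiation
// owns a separate static vector, so trackers of the same data type with
// different tags never mix their records.
template <typename T, typename Tag>
struct simple_tracker
{
    static std::vector<T>& records()
    {
        static std::vector<T> storage;  // one instance per (T, Tag) pair
        return storage;
    }
    static void store(T value) { records().push_back(value); }
};

// Hypothetical tags distinguishing two kinds of floating-point data.
struct residual_tag
{};
struct step_size_tag
{};

using residual_tracker  = simple_tracker<double, residual_tag>;
using step_size_tracker = simple_tracker<double, step_size_tag>;
```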
using tim::component::enable_if_t = typedef typename std::enable_if<B, T>::type
using tim::component::enumerator_t = typedef typename enumerator<Idx>::type
Definition at line 273 of file properties.hpp.
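using tim::component::gpu_roofline_dp_flops = typedef gpu_roofline<double>
A specialization of tim::component::gpu_roofline for 64-bit floating point operations.
using tim::component::gpu_roofline_flops = typedef gpu_roofline<float, double>
using tim::component::gpu_roofline_hp_flops = typedef gpu_roofline<cuda::fp16_t>
A specialization of tim::component::gpu_roofline for 16-bit floating point operations (depending on availability).
using tim::component::gpu_roofline_sp_flops = typedef gpu_roofline<float>
A specialization of tim::component::gpu_roofline for 32-bit floating point operations.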
using tim::component::graph_const_iterator_t = typedef typename graph<node::graph<Tp> >::const_iterator
Definition at line 46 of file declaration.hpp.
using tim::component::graph_iterator_t = typedef typename graph<node::graph<Tp> >::iterator
Definition at line 43 of file declaration.hpp.
using tim::component::idset_t = typedef std::set<std::string>
Definition at line 54 of file properties.hpp.
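The properties class description says ids() returns a set of alternative string identifiers (an idset_t) alongside the enum string and the string id. A hypothetical sketch of matching a user-supplied string against all three; the component_ids record and the WALL_CLOCK / alias values are illustrative, chosen because real_clock and virtual_clock are listed above as aliases of wall_clock.

```cpp
#include <cassert>
#include <set>
#include <string>

// Same shape as the alias above.
using idset_t = std::set<std::string>;

// Hypothetical property record: enum string, primary string id, and a set of
// alternative ids. Not timemory's properties type.
struct component_ids
{
    std::string enum_string;
    std::string id;
    idset_t     ids;

    // True if `key` names this component by any of its identifiers.
    bool matches(const std::string& key) const
    {
        return key == enum_string || key == id || ids.count(key) > 0;
    }
};
```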
using tim::component::prio_cxt_switch = typedef priority_context_switch
Definition at line 425 of file components.hpp.
using tim::component::properties_t = typedef typename enumerator<Idx>::type
Definition at line 276 of file properties.hpp.
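The enumerator description mentions using it for template metaprogramming "loops" over all components. A minimal sketch of that technique with std::make_integer_sequence (C++14) over three hypothetical component types; the types, specializations and labels are invented for illustration and are not timemory's enumerator.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical component types.
struct wall_clock_t { static std::string label() { return "wall"; } };
struct cpu_clock_t  { static std::string label() { return "cpu"; } };
struct peak_rss_t   { static std::string label() { return "peak_rss"; } };

// Primary template plus one specialization per index: maps an int to a type.
template <int Idx>
struct enumerator;
template <> struct enumerator<0> { using type = wall_clock_t; };
template <> struct enumerator<1> { using type = cpu_clock_t; };
template <> struct enumerator<2> { using type = peak_rss_t; };

template <int Idx>
using enumerator_t = typename enumerator<Idx>::type;

constexpr int num_components = 3;

// Compile-time "loop": the pack expansion visits enumerator_t<0>,
// enumerator_t<1>, ... in index order.
template <int... Idx>
std::vector<std::string> labels_impl(std::integer_sequence<int, Idx...>)
{
    return { enumerator_t<Idx>::label()... };
}

inline std::vector<std::string> all_labels()
{
    return labels_impl(std::make_integer_sequence<int, num_components>{});
}
```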
using tim::component::real_clock = typedef wall_clock
Definition at line 72 of file wall_clock.hpp.
using tim::component::timestamp_entry_t = typedef std::chrono::time_point<std::chrono::system_clock>
using tim::component::virtual_clock = typedef wall_clock
Definition at line 74 of file wall_clock.hpp.
using tim::component::vol_cxt_switch = typedef voluntary_context_switch
Definition at line 373 of file components.hpp.
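The voluntary_context_switch and priority_context_switch descriptions in the class list match the documented meaning of the ru_nvcsw and ru_nivcsw fields that getrusage() reports on Linux and BSD-derived systems. Assuming such a system, the sketch below reads those counters directly; it is a standalone illustration, not timemory's implementation, and cxt_switches / read_cxt_switches are hypothetical names. As with the CPU clocks, the meaningful quantity is the difference between two snapshots.

```cpp
#include <cassert>
#include <sys/resource.h>

// Snapshot of the context-switch counters for the calling process.
struct cxt_switches
{
    long voluntary   = 0;  // ru_nvcsw: gave up the CPU before its slice ended
    long involuntary = 0;  // ru_nivcsw: preempted or exceeded its time slice
};

inline cxt_switches read_cxt_switches()
{
    rusage usage{};
    int rc = getrusage(RUSAGE_SELF, &usage);  // RUSAGE_SELF excludes children
    assert(rc == 0 && "getrusage failed");
    (void) rc;
    return { usage.ru_nvcsw, usage.ru_nivcsw };
}
```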
uint64_t tim::component::activate_mpip ()
The thread that first activates mpip will be the thread that turns it off. Returns the number of new mpip handles.
Definition at line 161 of file mpip.hpp.
References DEBUG_PRINT_HERE, and tim::manager::instance().
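Like cuda_profiler and craypat_record in the class list, activate_mpip/deactivate_mpip count nested activations so that the underlying tool is only toggled at the outermost level. A minimal, hypothetical sketch of that counting idea follows. It models only the nesting count: it does not model mpip's rule that the first-activating thread is the one that turns the tool off, and its on/off flag is not synchronized against concurrent callers, so a real implementation would need more care.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reference-counted switch: the first activate() turns the tool
// on, nested activations only bump the count, and the tool is turned off only
// when the matching outermost deactivate() brings the count back to zero.
class ref_counted_switch
{
public:
    // Returns the number of active handles after this call.
    std::uint64_t activate()
    {
        std::uint64_t prev = m_count.fetch_add(1);
        if (prev == 0) m_enabled = true;  // first activation turns it on
        return prev + 1;
    }

    // Returns the number of handles still active after this call.
    std::uint64_t deactivate()
    {
        std::uint64_t prev = m_count.fetch_sub(1);
        assert(prev > 0 && "unbalanced deactivate()");
        if (prev == 1) m_enabled = false;  // last one out turns it off
        return prev - 1;
    }

    bool enabled() const { return m_enabled; }

private:
    std::atomic<std::uint64_t> m_count{ 0 };
    std::atomic<bool>          m_enabled{ false };
};
```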
void tim::component::configure_mpip (std::set< std::string > permit = {}, std::set< std::string > reject = {})
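configure_mpip (and configure_ncclp) take a permit set and a reject set of function names. The semantics in the sketch below are an assumption for illustration, not taken from timemory: an empty permit set allows every name, a non-empty one allows only its members, and the reject set always takes precedence. The function name is_wrapped is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical filter mirroring the shape of configure_mpip's arguments.
// ASSUMED semantics: reject always wins; an empty permit set allows all
// names; a non-empty permit set allows only the names it contains.
inline bool is_wrapped(const std::string& name,
                       const std::set<std::string>& permit,
                       const std::set<std::string>& reject)
{
    if (reject.count(name) > 0) return false;
    return permit.empty() || permit.count(name) > 0;
}
```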
void tim::component::configure_ncclp (std::set< std::string > permit = {}, std::set< std::string > reject = {})
uint64_t tim::component::deactivate_mpip (uint64_t)
The thread that created the initial mpip handle will turn it off. Returns the number of active handles.
Definition at line 200 of file mpip.hpp.
References DEBUG_PRINT_HERE, and tim::manager::instance().
opaque tim::component::get_opaque (Arg && arg, Args &&... args)
Definition at line 314 of file definition.hpp.
opaque tim::component::get_opaque ()
Definition at line 297 of file definition.hpp.
opaque tim::component::get_opaque (scope::config _scope)
Definition at line 306 of file definition.hpp.