Memory Management
Detailed Doxygen Documentation.
Manager Class
The tim::manager class is a thread-local singleton that handles memory management for each thread.
It is stored as a std::shared_ptr, so it is automatically deleted when the thread exits.
When the thread-local singleton for each component storage class is created, it increments the reference
count of the manager to ensure that the manager exists while any storage classes are allocated. When the
storage singletons are created, they also register functors with the manager for destroying themselves.
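The lifetime scheme described above can be sketched with the standard library alone. Everything in this sketch is hypothetical. The names mini_manager and mini_storage, and this add_cleanup/cleanup pair, are simplified stand-ins loosely modeled on the documented tim::manager members, not timemory's implementation. The sketch shows three things:

- the manager lives in a thread_local std::shared_ptr;
- each storage object keeps a copy of that pointer so the manager outlives it;
- cleanup functors are registered under a string key.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical per-thread manager (not timemory's code).
class mini_manager
{
public:
    using pointer_t = std::shared_ptr<mini_manager>;

    // One instance per thread. The thread_local shared_ptr is released at
    // thread exit; the manager is deleted once no storage holds a copy.
    static pointer_t instance()
    {
        static thread_local pointer_t _instance = std::make_shared<mini_manager>();
        return _instance;
    }

    // Register a functor that destroys something identified by a string key.
    template <typename Func>
    void add_cleanup(const std::string& key, Func&& func)
    {
        m_cleanup[key] = std::forward<Func>(func);
    }

    // Run and unregister the functor stored under key, if any.
    void cleanup(const std::string& key)
    {
        auto itr = m_cleanup.find(key);
        if(itr == m_cleanup.end()) return;
        auto func = std::move(itr->second);
        m_cleanup.erase(itr);
        func();
    }

    // Functors that are still registered run when the manager is destroyed.
    ~mini_manager()
    {
        for(auto& itr : m_cleanup)
            itr.second();
    }

private:
    std::map<std::string, std::function<void()>> m_cleanup{};
};

// Hypothetical storage singleton: holding a copy of the manager pointer
// increments its reference count, so the manager outlives this storage.
struct mini_storage
{
    explicit mini_storage(std::string label)
    : m_label{ std::move(label) }
    , m_manager{ mini_manager::instance() }
    {}

    const std::string& label() const { return m_label; }

private:
    std::string             m_label;
    mini_manager::pointer_t m_manager;
};
```

Because every storage object holds a reference, the manager is destroyed only after two things have happened: the thread_local pointer has been released at thread exit, and the last storage object on that thread is gone. Any cleanup functors still registered are destroyed with it.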
- class tim::manager
Public Functions
- template<typename Func> void add_cleanup(void*, Func&&)
  Add functors to destroy instances based on a pointer.
- template<typename Func> void add_cleanup(const std::string&, Func&&)
  Add functors to destroy instances based on a string key.
- template<typename InitFuncT> void add_initializer(InitFuncT&&)
  Used by storage classes to register an initialization functor.
- template<typename StackFuncT, typename FinalFuncT> void add_finalizer(const std::string&, StackFuncT&&, FinalFuncT&&, bool, int32_t = 0)
  Used by storage classes for finalization.
- void remove_cleanup(void*)
  Remove a cleanup functor.
- void remove_cleanup(const std::string&)
  Remove a cleanup functor.
- void remove_finalizer(const std::string&)
  Remove a finalizer functor.
- void cleanup(const std::string&)
  Execute a cleanup based on a key.
- inline void set_write_metadata(short v)
  Controls metadata output: 0 writes metadata only if there is other output, -1 never writes it, and 1 always writes it.
- void write_metadata(const std::string&, const char* = "")
  Print metadata to the given file name.
- std::ostream& write_metadata(std::ostream&)
  Write metadata to an ostream.
- void update_metadata_prefix()
  Updates the settings, rank, output prefix, etc.
- inline int32_t get_rank() const
  Get the distributed memory parallelism (DMP) rank. This is stored to avoid having to perform an MPI/UPC++ query after finalization has been called.
- inline bool is_finalizing() const
  Query whether finalization is currently occurring.
- inline void is_finalizing(bool v)
  Set whether finalization is currently occurring.
- inline void add_entries(uint64_t n)
  Add the number of component output data entries. If this value is zero, metadata output is suppressed unless tim::manager::set_write_metadata was assigned a value of 1.
- void add_synchronization(const std::string&, int64_t, std::function<void()>)
  Add a function for synchronizing thread data.
- void remove_synchronization(const std::string&, int64_t)
  Remove a function for synchronizing thread data.
- void synchronize()
  Synchronizes thread data for storage.
- inline int32_t instance_count() const
  Get the instance ID for this manager instance.
- inline int64_t get_tid() const
  Get the thread index for this manager instance.
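The interaction between set_write_metadata and add_entries described above reduces to a small decision rule. The helper below is a hypothetical restatement of that rule as documented here. The function name should_write_metadata is not part of timemory, and the real implementation may consult additional settings.

```cpp
#include <cstdint>

// Hypothetical helper restating the documented rule:
//   write_flag == -1 : never write metadata
//   write_flag ==  1 : always write metadata
//   write_flag ==  0 : write metadata only if output entries were added
// (values other than -1/0/1 are treated by their sign here; an assumption)
inline bool should_write_metadata(short write_flag, std::uint64_t num_entries)
{
    if(write_flag < 0) return false;
    if(write_flag > 0) return true;
    return num_entries > 0;
}
```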
Public Static Functions
- static pointer_t instance()
  Get a shared pointer to the instance for the current thread.
- static pointer_t master_instance()
  Get a shared pointer to the instance on the primary thread.
- static inline int32_t total_instance_count()
  Get the number of instances that are currently allocated. Unlike tim::manager::get_thread_count(), this is decremented in the destructor of each manager instance.
- static inline void use_exit_hook(bool val)
  Enable setting the std::exit callback.
- static void exit_hook()
  The exit hook function.
- static inline int32_t get_thread_count()
  This effectively provides the total number of threads that collected data. It is only “decremented” when the last manager instance has been deleted, at which point it is set to zero.
- static bool get_is_main_thread()
  Return whether this is the main thread.
- template<typename Tp> static void add_metadata(const std::string&, const Tp&)
  Add a metadata entry of a non-string type. If this fails to serialize, either include the appropriate header from timemory/tpls/cereal/cereal/types or provide a serialization function for the type.
- static void add_metadata(const std::string&, const char*)
  Add a metadata entry of a const character array. This overload exists only to prevent the template overload from serializing the pointer.
- static void add_metadata(const std::string&, const std::string&)
  Add a metadata entry of a string.
- static inline void set_persistent_master(pointer_t _pinst)
  This function stores the primary manager instance for the application.
- static inline void update_settings(const settings& _settings)
  Updates the settings instance used by the manager instance.
- template<typename ...Types> struct get_storage : public tim::manager::filtered_get_storage<mpl::implemented_t<Types...>>
  This is used to apply/query storage data for multiple component types.
  using manager_t = tim::manager;
  using types = tim::available_types_t;   // type-list of all enumerated types
  manager_t::get_storage<types>::clear(); // clear storage for all enumerated types
Memory Buffers
- struct tim::base::ring_buffer
  Ring buffer implementation, with support for mmap as a backend (Linux only).
  Subclassed by tim::data_storage::ring_buffer< Tp >
Public Functions
- ring_buffer() = default
- inline explicit ring_buffer(bool _use_mmap)
- inline explicit ring_buffer(size_t _size)
- ring_buffer(size_t _size, bool _use_mmap)
- ~ring_buffer()
- ring_buffer(const ring_buffer&)
- ring_buffer(ring_buffer&&) noexcept = delete
- ring_buffer& operator=(const ring_buffer&)
- ring_buffer& operator=(ring_buffer&&) noexcept = delete
- inline bool is_initialized() const
  Returns whether the buffer has been allocated.
- inline size_t capacity() const
  Get the total number of bytes supported.
- void init(size_t size)
  Creates a new ring buffer.
- void destroy()
  Destroys the ring buffer.
- template<typename Tp> std::pair<size_t, Tp*> write(Tp* in, std::enable_if_t<std::is_class<Tp>::value, int> = 0)
  Write class-type data to the buffer (uses placement new).
- template<typename Tp> std::pair<size_t, Tp*> write(Tp* in, std::enable_if_t<!std::is_class<Tp>::value, int> = 0)
  Write non-class-type data to the buffer (uses memcpy).
- template<typename Tp> Tp* request()
  Request a pointer to an allocation. This is similar to a “write” except that the memory is uninitialized. Typically used by allocators. If Tp is a class type, be sure to use a placement new instead of a memcpy.
- void* request(size_t n)
  Request a pointer to an allocation of at least n bytes.
- template<typename Tp> std::pair<size_t, Tp*> read(Tp* out, std::enable_if_t<std::is_class<Tp>::value, int> = 0) const
  Read class-type data from the buffer (uses placement new).
- template<typename Tp> std::pair<size_t, Tp*> read(Tp* out, std::enable_if_t<!std::is_class<Tp>::value, int> = 0) const
  Read non-class-type data from the buffer (uses memcpy).
- void* retrieve(size_t n)
  Retrieve a pointer to the head allocation of at least n bytes (read).
- inline size_t count() const
  Returns the number of bytes currently held by the buffer.
- inline size_t free() const
  Returns how many bytes are available in the buffer.
- inline bool is_empty() const
  Returns whether the buffer is empty.
- inline bool is_full() const
  Returns whether the buffer is full.
- size_t rewind(size_t n) const
  Rewind the read position n bytes.
- void set_use_mmap(bool)
  Explicitly configure the buffer to use mmap, if available.
- inline bool get_use_mmap() const
  Query whether mmap is being used.
- std::string as_string() const
Friends
- friend struct data_storage::ring_buffer
- inline friend std::ostream& operator<<(std::ostream& os, const ring_buffer& obj)
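The sketch below makes the byte-level semantics concrete: count(), free(), is_empty(), is_full(), and a memcpy-based write/read pair. It is a minimal, hypothetical fixed-capacity byte ring buffer; the class name byte_ring is not from timemory. Compared with the documented class, it makes these simplifications:

- it only accepts trivially copyable types;
- it is backed by a std::vector<char> rather than mmap;
- it rejects a write that does not fit instead of overwriting older data.

tim::base::ring_buffer may behave differently in each of these respects.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical fixed-capacity byte ring buffer (not timemory's implementation).
class byte_ring
{
public:
    explicit byte_ring(std::size_t capacity)
    : m_data(capacity)
    {}

    std::size_t capacity() const { return m_data.size(); }
    std::size_t count() const { return m_count; }              // bytes held
    std::size_t free() const { return capacity() - m_count; }  // bytes available
    bool        is_empty() const { return m_count == 0; }
    bool        is_full() const { return m_count == capacity(); }

    // Copy sizeof(Tp) bytes in; returns the bytes written (0 if there is no room).
    template <typename Tp>
    std::size_t write(const Tp& in)
    {
        static_assert(std::is_trivially_copyable<Tp>::value, "memcpy needs trivially copyable data");
        if(sizeof(Tp) > free()) return 0;
        copy_in(reinterpret_cast<const char*>(&in), sizeof(Tp));
        return sizeof(Tp);
    }

    // Copy sizeof(Tp) bytes out from the read position; returns the bytes read.
    template <typename Tp>
    std::size_t read(Tp& out)
    {
        static_assert(std::is_trivially_copyable<Tp>::value, "memcpy needs trivially copyable data");
        if(sizeof(Tp) > m_count) return 0;
        copy_out(reinterpret_cast<char*>(&out), sizeof(Tp));
        return sizeof(Tp);
    }

private:
    // Two memcpy calls: up to the end of storage, then the wrapped remainder.
    void copy_in(const char* src, std::size_t n)
    {
        std::size_t first = std::min(n, capacity() - m_write);
        std::memcpy(m_data.data() + m_write, src, first);
        std::memcpy(m_data.data(), src + first, n - first);
        m_write = (m_write + n) % capacity();
        m_count += n;
    }

    void copy_out(char* dst, std::size_t n)
    {
        std::size_t first = std::min(n, capacity() - m_read);
        std::memcpy(dst, m_data.data() + m_read, first);
        std::memcpy(dst + first, m_data.data(), n - first);
        m_read = (m_read + n) % capacity();
        m_count -= n;
    }

    std::vector<char> m_data;
    std::size_t       m_read  = 0;
    std::size_t       m_write = 0;
    std::size_t       m_count = 0;
};
```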
- template<typename Tp> struct tim::data_storage::ring_buffer : private tim::base::ring_buffer
  Ring buffer wrapper around tim::base::ring_buffer for data of type Tp. If the data object size is larger than the page size (typically 4 KB), the behavior is undefined. During initialization, one requests a minimum number of objects, and the buffer will support that number of objects plus the remainder of the page. For example, if a page is 1000 bytes, the object is 1 byte, and the buffer is requested to support 1500 objects, then an allocation supporting 2000 objects (i.e., 2 pages) will be created.
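The sizing rule in the example above amounts to rounding the requested byte count up to a whole number of pages and then dividing by the object size. The function below is a hypothetical sketch of that arithmetic; the name rounded_object_capacity is not from timemory. It takes the page size as a parameter instead of querying the system.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch: the number of objects an allocation can hold when a
// request for n_objects is rounded up to whole pages.
inline std::size_t rounded_object_capacity(std::size_t n_objects, std::size_t obj_size,
                                           std::size_t page_size)
{
    // the documentation states behavior is undefined when an object exceeds a page
    if(obj_size == 0 || page_size == 0 || obj_size > page_size)
        throw std::invalid_argument("object size must be in (0, page_size]");
    std::size_t bytes = n_objects * obj_size;
    std::size_t pages = (bytes + page_size - 1) / page_size;  // ceil(bytes / page_size)
    return (pages * page_size) / obj_size;
}
```

For example, on 4096-byte pages a request for 1000 objects of 8 bytes needs 8000 bytes. That rounds up to 2 pages (8192 bytes), which holds 1024 objects.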
Public Types
- using base_type = base::ring_buffer
Public Functions
- ring_buffer() = default
- ~ring_buffer() = default
- inline explicit ring_buffer(bool _use_mmap)
- inline explicit ring_buffer(size_t _size)
- inline ring_buffer(size_t _size, bool _use_mmap)
- ring_buffer(const ring_buffer&)
- ring_buffer(ring_buffer&&) noexcept = default
- ring_buffer& operator=(const ring_buffer&)
- ring_buffer& operator=(ring_buffer&&) noexcept = default
- inline bool is_initialized() const
  Returns whether the buffer has been allocated.
- inline size_t capacity() const
  Get the total number of Tp instances supported.
- inline void init(size_t _size)
  Creates a new ring buffer.
- inline void destroy()
  Destroys the ring buffer.
- inline size_t data_size() const
  Returns the size of the stored data type.
- inline size_t count() const
  Returns the number of Tp instances currently held by the buffer.
- inline size_t free() const
  Returns how many Tp instances are available in the buffer.
- inline bool is_empty() const
  Returns whether the buffer is empty.
- inline bool is_full() const
  Returns whether the buffer is full.
- inline size_t rewind(size_t n) const
  Rewinds the read pointer.
- inline std::string as_string() const
- inline bool get_use_mmap() const
  Query whether mmap is being used.
- void set_use_mmap(bool)
  Explicitly configure the buffer to use mmap, if available.
Private Functions
- template<typename Tp> std::pair<size_t, Tp*> write(Tp* in, std::enable_if_t<std::is_class<Tp>::value, int> = 0)
  Write class-type data to the buffer (uses placement new).
- template<typename Tp> std::pair<size_t, Tp*> write(Tp* in, std::enable_if_t<!std::is_class<Tp>::value, int> = 0)
  Write non-class-type data to the buffer (uses memcpy).
- void* request(size_t n)
  Request a pointer to an allocation of at least n bytes.
- template<typename Tp> std::pair<size_t, Tp*> read(Tp* out, std::enable_if_t<std::is_class<Tp>::value, int> = 0) const
  Read class-type data from the buffer (uses placement new).
- template<typename Tp> std::pair<size_t, Tp*> read(Tp* out, std::enable_if_t<!std::is_class<Tp>::value, int> = 0) const
  Read non-class-type data from the buffer (uses memcpy).
- void* retrieve(size_t n)
  Retrieve a pointer to the head allocation of at least n bytes (read).
Friends
- inline friend std::ostream& operator<<(std::ostream& os, const ring_buffer& obj)
Allocators
- template<typename Tp, bool MMapV, size_t BuffCntV> class tim::data::ring_buffer_allocator : public std::allocator<Tp>
  An allocator that uses an array of (ring) buffers to coalesce memory. This allocator propagates on container swap and container move assignment. Use the TIMEMORY_RING_BUFFER_ALLOCATOR_BUFFER_COUNT environment variable to specify the default number of allocations, or use set_buffer_count / set_buffer_count_cb. When a reserve is requested and the request is greater than the free space in the buffer, the free spaces are stored in a “dangling” array of spaces, which are used when single allocations are requested.
  - tparam Tp: The data type for the allocator
  - tparam MMapV: Whether to use mmap (if available)
  - tparam BuffCntV: The default buffer count (will be rounded up to a multiple of the page size)
Public Types
- using size_type = size_t
- using difference_type = ptrdiff_t
- using buffer_type = data_storage::ring_buffer<Tp>
- using propagate_on_container_move_assignment = std::true_type
- using propagate_on_container_swap = std::true_type
Public Functions
- ring_buffer_allocator() = default
- ~ring_buffer_allocator() = default
- ring_buffer_allocator(const ring_buffer_allocator&) = default
- ring_buffer_allocator(ring_buffer_allocator&&) noexcept = default
- ring_buffer_allocator& operator=(const ring_buffer_allocator&) = default
- ring_buffer_allocator& operator=(ring_buffer_allocator&&) noexcept = default
- inline bool operator==(const ring_buffer_allocator& rhs) const
- inline bool operator!=(const ring_buffer_allocator& rhs) const
- inline size_t max_size() const
- inline void reserve(const size_t n)
- inline void steal_resources(ring_buffer_allocator& rhs)
  Transfers the buffers to another allocator.
Public Static Functions
- template<typename FuncT> static inline void set_buffer_count_cb(FuncT&& _f)
  Define a callback function for initializing the buffer size. Will throw if a request for the buffer size has already occurred.
- static inline void set_buffer_count(size_t _buff_sz)
  Set the minimum number of objects for the ring buffer. Will throw if a request for the buffer size has already occurred.
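The “dangling” strategy described for ring_buffer_allocator can be illustrated with a much simpler slot pool. This is a hypothetical sketch (slot_pool is not timemory code) that uses heap-allocated chunks instead of ring buffers. When a multi-object request does not fit in the current chunk, the chunk's leftover slots are saved, and later single-object requests reuse them before touching the new chunk. Deallocation is omitted; all memory is released when the pool is destroyed.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical slot pool illustrating the documented "dangling" free-slot reuse.
template <typename Tp>
class slot_pool
{
public:
    explicit slot_pool(std::size_t slots_per_chunk)
    : m_slots_per_chunk{ slots_per_chunk }
    {}

    // Return a pointer to n contiguous Tp slots. For simplicity the chunks are
    // value-initialized Tp arrays, so Tp must be default-constructible.
    Tp* allocate(std::size_t n)
    {
        // single allocations are served from dangling slots first
        if(n == 1 && !m_dangling.empty())
        {
            Tp* ptr = m_dangling.back();
            m_dangling.pop_back();
            return ptr;
        }
        if(n > m_capacity - m_used)
        {
            // keep the current chunk's leftover slots before starting a new chunk
            for(std::size_t i = m_used; i < m_capacity; ++i)
                m_dangling.push_back(m_chunks.back().get() + i);
            new_chunk(std::max(n, m_slots_per_chunk));
        }
        Tp* ptr = m_chunks.back().get() + m_used;
        m_used += n;
        return ptr;
    }

    std::size_t dangling_count() const { return m_dangling.size(); }
    std::size_t chunk_count() const { return m_chunks.size(); }

private:
    void new_chunk(std::size_t n)
    {
        m_chunks.push_back(std::make_unique<Tp[]>(n));
        m_capacity = n;
        m_used     = 0;
    }

    std::size_t                        m_slots_per_chunk;
    std::size_t                        m_capacity = 0;
    std::size_t                        m_used     = 0;
    std::vector<std::unique_ptr<Tp[]>> m_chunks{};
    std::vector<Tp*>                   m_dangling{};
};
```

The sketch trades a little bookkeeping for not wasting the tail of each chunk when large requests arrive.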
- template<typename Tp, std::size_t AlignV = 8 * sizeof(Tp)> class tim::ert::aligned_allocator
Public Functions
- aligned_allocator() = default
- aligned_allocator(const aligned_allocator&) = default
- inline aligned_allocator(aligned_allocator&&) noexcept
- template<typename U> inline aligned_allocator(const aligned_allocator<U, AlignV>&)
- ~aligned_allocator() = default
- aligned_allocator& operator=(const aligned_allocator&) = delete
- aligned_allocator& operator==(aligned_allocator&&) = delete
- inline bool operator!=(const aligned_allocator& other) const
- inline bool operator==(const aligned_allocator&) const
- inline std::size_t max_size() const
- inline std::size_t get_alignment()
- template<typename U> struct rebind
  Public Types
  - typedef aligned_allocator<U, AlignV> other
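A minimal C++17 allocator with the same overall shape can be written with the aligned ::operator new / ::operator delete overloads. It has a non-type alignment parameter that defaults to 8 * sizeof(Tp) (64 bytes for an 8-byte double), plus an explicit rebind. The rebind is required because std::allocator_traits cannot rebind a template that has a non-type parameter. The name aligned_alloc_sketch and the choice of allocation functions are assumptions for illustration, not timemory's implementation. The sketch also requires AlignV to be a power of two, so the default does not work for types whose size makes 8 * sizeof(Tp) a non-power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal aligned allocator (not timemory's implementation).
template <typename Tp, std::size_t AlignV = 8 * sizeof(Tp)>
struct aligned_alloc_sketch
{
    static_assert(AlignV >= alignof(Tp), "alignment too small for Tp");
    static_assert((AlignV & (AlignV - 1)) == 0, "alignment must be a power of two");

    using value_type = Tp;

    // same alignment, different value type
    template <typename U>
    struct rebind
    {
        using other = aligned_alloc_sketch<U, AlignV>;
    };

    aligned_alloc_sketch() noexcept = default;
    template <typename U>
    aligned_alloc_sketch(const aligned_alloc_sketch<U, AlignV>&) noexcept
    {}

    Tp* allocate(std::size_t n)
    {
        return static_cast<Tp*>(::operator new(n * sizeof(Tp), std::align_val_t{ AlignV }));
    }

    void deallocate(Tp* ptr, std::size_t) noexcept
    {
        ::operator delete(ptr, std::align_val_t{ AlignV });
    }

    static constexpr std::size_t get_alignment() { return AlignV; }
};

// stateless: all instances with the same alignment compare equal
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_alloc_sketch<T, A>&, const aligned_alloc_sketch<U, A>&)
{
    return true;
}

template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_alloc_sketch<T, A>&, const aligned_alloc_sketch<U, A>&)
{
    return false;
}

// helper for checking the address alignment of a pointer
inline bool is_aligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Usage follows the standard container pattern, e.g. std::vector<double, aligned_alloc_sketch<double>> v(10, 1.0); whose data() pointer is then aligned to 8 * sizeof(double) bytes.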
Graph Classes
The graph classes are responsible for maintaining the hierarchy of the calling-context tree.
tim::graph, tim::graph_data, and tim::node::graph<T> are rarely interacted with directly.
Storage results are reported within nested std::vectors of tim::node::result<T> and
tim::node::tree<T>. The former provides the data in a flat hierarchy, where the calling context
is represented through indentation and a depth value; the latter represents the
calling context through a recursive structure.
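The relationship between the two representations can be sketched by rebuilding a recursive tree from a flat, depth-annotated list. The sketch assumes the flat list is in depth-first calling order, as in the sample output further down this page, where foo at depth 0 is followed by bar at depth 1. Under that assumption, each entry's parent is the closest preceding entry whose depth is one less. The types flat_entry and tree_node and the function build_tree are hypothetical stand-ins, not tim::node::result, tim::node::tree, or tim::basic_tree.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical flat entry (cf. the depth() and prefix() accessors of tim::node::result).
struct flat_entry
{
    std::int64_t depth;
    std::string  name;
    double       value;
};

// Hypothetical recursive node (cf. tim::node::tree inside tim::basic_tree).
struct tree_node
{
    std::string            name;
    double                 value = 0.0;
    std::vector<tree_node> children{};
};

// Rebuild the recursive structure from a depth-first flat list.
inline std::vector<tree_node> build_tree(const std::vector<flat_entry>& flat)
{
    std::vector<tree_node>  roots{};
    std::vector<tree_node*> stack{};  // stack[d] points at the latest node at depth d
    for(const auto& entry : flat)
    {
        if(entry.depth < 0 || static_cast<std::size_t>(entry.depth) > stack.size())
            throw std::invalid_argument("depth must start at 0 and grow by at most one");
        auto depth = static_cast<std::size_t>(entry.depth);
        stack.resize(depth);  // close deeper branches that have finished
        auto& siblings = (depth == 0) ? roots : stack.back()->children;
        siblings.push_back(tree_node{ entry.name, entry.value, {} });
        stack.push_back(&siblings.back());
    }
    return roots;
}
```

The pointer stack stays valid because a push only reallocates the sibling vector at the current depth. Pointers into that vector were already dropped by the resize.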
- template<typename T, typename AllocatorT> class graph
  Arbitrary graph/tree (i.e., like a binary tree, but not limited to two children per node). It is unlikely that this class will be interacted with directly.
- template<typename NodeT> class graph_data
  A tim::graph instance + current node + head node + sea-level. Sea-level is defined as the node depth after a fork from another graph instance and is only relevant for worker threads.
- template<typename Tp> struct tim::node::graph : private data::node_type
  This is the compact representation of a measurement in the call-graph.
  - tparam Tp: Component type
Public Functions
- inline bool& is_dummy()
  Denotes that this is a placeholder for synchronization.
- inline uint32_t& tid()
  Thread identifier.
- inline uint32_t& pid()
  Process identifier.
- inline uint64_t& id()
  Hash identifier.
- inline int64_t& depth()
  Depth in the call-graph.
- inline stats_type& stats()
  Statistics data for the entry in the call-graph.
- template<typename Tp> struct tim::node::result : public data::result_type
  This data type is used when rendering the flat (i.e., loop-iterable) representation of the calling context. The prefix here is identical to the prefix in the text output.
  - tparam Tp: Component type
Public Functions
- inline uint32_t& tid()
  The measurement thread. May be std::numeric_limits<uint16_t>::max() (i.e., 65535) if this entry is a combination of multiple threads.
- inline uint32_t& pid()
  The process identifier of the reporting process if data from multiple processes is combined, otherwise the process identifier of the collecting process.
- inline int64_t& depth()
  Depth of the node in the calling context.
- inline uint64_t& hash()
  Hash identifier of the node.
- inline uint64_t& rolling_hash()
  The summation of this hash and its parent hashes.
- inline string_t& prefix()
  The string associated with the hash, plus indentation and other decoration.
- inline uintvector_t& hierarchy()
  An array of the hash value plus each parent hash (not serialized).
- inline stats_type& stats()
  Reference to the associated statistical accumulation of the data (if any).
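Per the rolling_hash() and hierarchy() descriptions above, a node's rolling hash is the sum of the values in its hierarchy. The sketch below assumes this is plain unsigned 64-bit addition, which wraps modulo 2^64; the helper name rolling_hash_of is hypothetical. Under that assumption it reproduces the rolling hash reported for bar in the Storage Result sample on this page.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: sum a node's hash and its parent hashes using
// unsigned 64-bit arithmetic (overflow wraps modulo 2^64).
inline std::uint64_t rolling_hash_of(const std::vector<std::uint64_t>& hierarchy)
{
    return std::accumulate(hierarchy.begin(), hierarchy.end(), std::uint64_t{ 0 });
}
```

For bar, 9631199822919835227 + 11474628671133349553 = 21105828494053184780. This exceeds 2^64 = 18446744073709551616 and wraps to 2659084420343633164, which matches the printed rolling hash.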
- template<typename Tp> struct tim::basic_tree
  Basic hierarchical tree implementation. Expects population from tim::graph.
  - tparam Tp: Component type
- template<typename Tp, typename StatT> struct tim::node::entry : public std::tuple<Tp, StatT>
  This data type is used in tim::node::tree for inclusive and exclusive values.
  - tparam Tp: Component type
  - tparam StatT: Statistics type
- template<typename Tp> struct tim::node::tree : private data::tree_type
  This data type is used when rendering the hierarchical (i.e., recursive) representation of the calling context. The prefix here has no decoration.
  - tparam Tp: Generally tim::basic_tree<ComponentT>
Public Functions
- inline bool& is_dummy()
  Returns whether this node is a synchronization point and, if so, should be ignored.
- inline uint64_t& hash()
  Returns the hash identifier for the associated string identifier.
- inline int64_t& depth()
  Returns the depth of the node in the tree. NOTE: this value may be relative to dummy nodes.
- inline idset_type& tid()
  The set of thread IDs this data was collected from.
- inline idset_type& pid()
  The set of process IDs this data was collected from.
- inline entry_type& inclusive()
  The inclusive data + statistics.
- inline entry_type& exclusive()
  The exclusive data + statistics.
Graph Result and Tree Sample
#include <timemory/timemory.hpp>
using namespace tim;
using wall_clock = component::wall_clock;
using node_type = node::result<wall_clock>;
using tree_type = basic_tree<node::tree<wall_clock>>;
// the flat data for the process
std::vector<node_type> foo = storage<wall_clock>::instance()->get();
// aggregated flat data from distributed memory process parallelism:
// depending on the settings, this may contain all the data (on rank 0),
// partial data, or no data (on non-zero ranks)
std::vector<std::vector<node_type>> bar = storage<wall_clock>::instance()->dmp_get();
// the tree data for the process
std::vector<tree_type> baz{};
baz = storage<wall_clock>::instance()->get(baz);
// aggregated tree data from distributed memory process parallelism:
// depending on the settings, this may contain all the data (on rank 0),
// partial data, or no data (on non-zero ranks)
std::vector<std::vector<tree_type>> spam{};
spam = storage<wall_clock>::instance()->dmp_get(spam);
Graph Result and Tree Comparison
#----------------------------------------#
# Storage Result
#----------------------------------------#
Thread id : 0
Process id : 4385
Depth : 0
Hash : 9631199822919835227
Rolling hash : 9631199822919835227
Prefix : >>> foo
Hierarchy : [9631199822919835227]
Data object : 6.534 sec wall
Statistics : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
#----------------------------------------#
Thread id : 0
Process id : 4385
Depth : 1
Hash : 11474628671133349553
Rolling hash : 2659084420343633164
Prefix : >>> |_bar
Hierarchy : [9631199822919835227, 11474628671133349553]
Data object : 5.531 sec wall
Statistics : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
#----------------------------------------#
# Storage Tree
#----------------------------------------#
Thread id : {0}
Process id : {4385}
Depth : -1
Hash : 0
Prefix : unknown-hash=0
Inclusive data : 0.000 sec wall
Inclusive stat : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
Exclusive data : -6.534 sec wall
Exclusive stat : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
#----------------------------------------#
Thread id : {0}
Process id : {4385}
Depth : 0
Hash : 9631199822919835227
Prefix : foo
Inclusive data : 6.534 sec wall
Inclusive stat : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
Exclusive data : 1.002 sec wall
Exclusive stat : [sum: 1.00246] [min: 6.53361] [max: 6.53361] [sqr: 34.9765] [count: 1]
#----------------------------------------#
Thread id : {0}
Process id : {4385}
Depth : 1
Hash : 11474628671133349553
Prefix : bar
Inclusive data : 5.531 sec wall
Inclusive stat : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
Exclusive data : 5.531 sec wall
Exclusive stat : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
Note that the first entry of the storage tree has a negative depth and a hash of zero. Nodes such as these are “dummy” nodes, which timemory keeps internally as bookmarks for root nodes and thread forks (the parent call-graph location when a child thread was initialized or returned to “sea-level”). These may be removed in future versions of timemory.
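The exclusive values in the tree output can be reconciled with the inclusive values by assuming that a node's exclusive value is its inclusive value minus the sum of its children's inclusive values. This rule is inferred from the numbers above rather than stated elsewhere on this page. Under that assumption, the dummy node's negative exclusive time follows directly from its zero inclusive time and its single child, foo:

```latex
\begin{aligned}
\mathrm{excl}(n) &= \mathrm{incl}(n) - \sum_{c \,\in\, \mathrm{children}(n)} \mathrm{incl}(c) \\
\mathrm{excl}(\text{dummy}) &= 0 - 6.53361 = -6.53361 \approx -6.534\ \text{s} \\
\mathrm{excl}(\text{foo}) &= 6.53361 - 5.53115 = 1.00246 \approx 1.002\ \text{s} \\
\mathrm{excl}(\text{bar}) &= 5.53115 - 0 = 5.53115 \approx 5.531\ \text{s}
\end{aligned}
```

The sum-of-squares entry in foo's exclusive statistics appears to follow the same subtraction: 42.6881 - 7.71154 ≈ 34.9765.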
Storage Class
The tim::storage class is a thread-local singleton that handles the call-graph and persistent
data accumulation for each component. It is stored as a std::unique_ptr, so it is automatically
deleted when the thread exits. On a non-primary thread, destruction of the singleton merges its
call-graph data into the storage singleton on the primary thread. Initialization and finalization
of the storage class are the ONLY times that thread synchronization and inter-process communication
occur. Because measurements themselves never synchronize, timemory storage can scale to large numbers
of threads and/or processes without synchronization overhead during collection. If you want information
on the state of the call-graph, tim::storage<T> is the structure to query, e.g. the current size of the
call-graph, a serialization of the current process- and thread-specific data, etc.

Invoking the get() member function on a worker thread returns the data for the current thread, whereas
invoking get() on the primary thread returns the data for all the threads. Invoking mpi_get() aggregates
the results across all MPI processes, upc_get() aggregates the results across all UPC++ processes, and
dmp_get() (dmp == distributed memory parallelism) aggregates all the results across MPI and UPC++
processes.
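The threading model described above uses lock-free accumulation per thread and a single synchronization point when a worker's storage is destroyed. It can be sketched with standard C++ threads. The names primary_storage, thread_storage, and run_workers are hypothetical. The sketch also differs from timemory's storage in two ways:

- it merges a flat map of counters, whereas timemory merges call-graphs;
- it uses a function-local thread_local object, whereas timemory uses std::unique_ptr singletons.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical primary-thread storage: the only place a lock is taken.
struct primary_storage
{
    static primary_storage& instance()
    {
        static primary_storage inst;
        return inst;
    }

    void merge(const std::map<std::string, std::uint64_t>& other)
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        for(const auto& itr : other)
            m_data[itr.first] += itr.second;
    }

    std::uint64_t total(const std::string& key)
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        auto itr = m_data.find(key);
        return (itr == m_data.end()) ? 0 : itr->second;
    }

private:
    std::mutex                           m_mutex{};
    std::map<std::string, std::uint64_t> m_data{};
};

// Hypothetical per-thread storage: records without locking, merges at thread exit.
struct thread_storage
{
    static thread_storage& instance()
    {
        thread_local thread_storage inst;
        return inst;
    }

    // only the owning thread touches this map, so no lock is needed
    void record(const std::string& key) { ++m_data[key]; }

    // runs when the owning thread exits: the single synchronization point
    ~thread_storage() { primary_storage::instance().merge(m_data); }

private:
    std::map<std::string, std::uint64_t> m_data{};
};

// Example driver: each worker records `per_thread` events under `key`.
inline void run_workers(int nthreads, int per_thread, const std::string& key)
{
    std::vector<std::thread> workers{};
    for(int i = 0; i < nthreads; ++i)
        workers.emplace_back([per_thread, key]() {
            for(int j = 0; j < per_thread; ++j)
                thread_storage::instance().record(key);
        });
    // join() returns only after each worker's thread_local destructor has run
    for(auto& itr : workers)
        itr.join();
}
```

Because the merge happens in the thread_local destructor, the primary storage holds every worker's data as soon as the joins return, without any locking during the recording loop.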
- class tim::base::storage
  Subclassed by tim::impl::storage< Type, false >, tim::impl::storage< Type, true >
Public Functions
- storage(bool _is_master, int64_t _instance_id, std::string _label)
- virtual ~storage()
- inline virtual void print()
- inline virtual void cleanup()
- inline virtual void stack_clear()
- inline virtual void disable()
- inline virtual void initialize()
- inline virtual void finalize()
- inline virtual bool global_init()
- inline virtual bool thread_init()
- inline virtual bool data_init()
- inline const hash_map_ptr_t& get_hash_ids() const
- inline const hash_alias_ptr_t& get_hash_aliases() const
- hash_value_t add_hash_id(const std::string& _prefix)
- void add_hash_id(uint64_t _lhs, uint64_t _rhs)
- inline bool is_initialized() const
- inline int64_t instance_id() const