Memory Management

Detailed Doxygen Documentation.

Manager Class

The tim::manager class is a thread-local singleton which handles the memory management for each thread. It is stored as a std::shared_ptr which automatically deletes itself when the thread exits. When the thread-local singleton for each component storage class is created, it increments the reference count for this class to ensure that the manager exists while any storage classes are allocated. When the storage singletons are created, they register functors with the manager for destroying themselves.
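Below is a minimal sketch of interacting with the manager from application code, using only the members documented in this section; the cleanup key, lambda body, and register_cleanup function are illustrative.

#include "timemory/timemory.hpp"

void register_cleanup()
{
    auto _manager = tim::manager::instance();  // thread-local shared_ptr
    if(!_manager)
        return;

    // register a functor, keyed by a string, to run during cleanup
    _manager->add_cleanup("my-resource", []() { /* release resources */ });

    // ... later: execute the functor for that key, then deregister it
    _manager->cleanup("my-resource");
    _manager->remove_cleanup("my-resource");
}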

class tim::manager

Public Functions

template<typename Func>
void add_cleanup(void*, Func&&)

add functors to destroy instances based on a pointer

template<typename Func>
void add_cleanup(const std::string&, Func&&)

add functors to destroy instances based on a string key

template<typename InitFuncT>
void add_initializer(InitFuncT&&)

this is used by storage classes for initialization.

template<typename StackFuncT, typename FinalFuncT>
void add_finalizer(const std::string&, StackFuncT&&, FinalFuncT&&, bool, int32_t = 0)

this is used by storage classes for finalization.

void remove_cleanup(void*)

remove a cleanup functor

void remove_cleanup(const std::string&)

remove a cleanup functor

void remove_finalizer(const std::string&)

remove a finalizer functor

void cleanup(const std::string&)

execute a cleanup based on a key

inline void set_write_metadata(short v)

Set the metadata output mode: 0 writes metadata only when there is other output, -1 never writes it, and 1 always writes it.

void write_metadata(const std::string&, const char* = "")

Print metadata to filename.

std::ostream &write_metadata(std::ostream&)

Write metadata to ostream.

void update_metadata_prefix()

Updates settings, rank, output prefix, etc.

inline int32_t get_rank() const

Get the dmp rank. This is stored to avoid having to query MPI/UPC++ after finalization has been called.

inline bool is_finalizing() const

Query whether finalization is currently occurring.

inline void is_finalizing(bool v)

Sets whether finalization is currently occurring.

inline void add_entries(uint64_t n)

Add number of component output data entries. If this value is zero, metadata output is suppressed unless tim::manager::set_write_metadata was assigned a value of 1.

void add_synchronization(const std::string&, int64_t, std::function<void()>)

Add function for synchronizing data in threads.

void remove_synchronization(const std::string&, int64_t)

Remove function for synchronizing data in threads.

void synchronize()

Synchronizes thread-data for storage.

inline int32_t instance_count() const

Get the instance ID for this manager instance.

inline int64_t get_tid() const

Get the thread-index for this manager instance.

Public Static Functions

static pointer_t instance()

Get a shared pointer to the instance for the current thread.

static pointer_t master_instance()

Get a shared pointer to the instance on the primary thread.

static inline int32_t total_instance_count()

Get the number of instances that are currently allocated. This is decremented during the destructor of each manager instance, unlike tim::manager::get_thread_count().

static inline void use_exit_hook(bool val)

Enable setting std::exit callback.

static void exit_hook()

The exit hook function.

static inline int32_t get_thread_count()

This effectively provides the total number of threads which collected data. It is only “decremented” when the last manager instance has been deleted, at which point it is set to zero.

static bool get_is_main_thread()

Return whether this is the main thread.

template<typename Tp>
static void add_metadata(const std::string&, const Tp&)

Add a metadata entry of a non-string type. If this fails to serialize, either include the appropriate header from timemory/tpls/cereal/cereal/types or provide a serialization function for the type.
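As a sketch of the note above (the keys, values, and record_metadata function are illustrative), a non-string type such as std::vector needs the matching cereal type header from the directory named above:

#include <vector>

#include "timemory/timemory.hpp"
// vector serialization support, per the note above
#include "timemory/tpls/cereal/cereal/types/vector.hpp"

void record_metadata()
{
    tim::manager::add_metadata("app-version", "1.2.3");  // const char* overload
    tim::manager::add_metadata("num-iterations", 100);   // non-string template overload
    tim::manager::add_metadata("thresholds", std::vector<double>{ 0.1, 0.5 });
}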

static void add_metadata(const std::string&, const char*)

Add a metadata entry of a const character array. This only exists to avoid the template function from serializing the pointer.

static void add_metadata(const std::string&, const std::string&)

Add a metadata entry of a string.

static inline void set_persistent_master(pointer_t _pinst)

This function stores the primary manager instance for the application.

static inline void update_settings(const settings &_settings)

Updates the settings instance used by the manager instance.

template<typename ...Types>
struct get_storage : public tim::manager::filtered_get_storage<mpl::implemented_t<Types...>>

This is used to apply/query storage data for multiple component types.

using types = tim::available_types_t;   // type-list of all enumerated types
manager_t::get_storage<types>::clear(); // clear storage for all enumerated types

template<template<typename...> class Tuple, typename ...Types>
struct get_storage<Tuple<Types...>> : public tim::manager::filtered_get_storage<mpl::implemented_t<Types...>>

Overload for a tuple/type-list.

Memory Buffers

struct tim::base::ring_buffer

Ring buffer implementation, with support for mmap as backend (Linux only).

Subclassed by tim::data_storage::ring_buffer< Tp >

Public Functions

ring_buffer() = default
inline explicit ring_buffer(bool _use_mmap)
inline explicit ring_buffer(size_t _size)
ring_buffer(size_t _size, bool _use_mmap)
~ring_buffer()
ring_buffer(const ring_buffer&)
ring_buffer(ring_buffer&&) noexcept = delete
ring_buffer &operator=(const ring_buffer&)
ring_buffer &operator=(ring_buffer&&) noexcept = delete
inline bool is_initialized() const

Returns whether the buffer has been allocated.

inline size_t capacity() const

Get the total number of bytes supported.

void init(size_t size)

Creates a new ring buffer.

void destroy()

Destroy ring buffer.

template<typename Tp>
std::pair<size_t, Tp*> write(Tp *in, std::enable_if_t<std::is_class<Tp>::value, int> = 0)

Write class-type data to buffer (uses placement new).

template<typename Tp>
std::pair<size_t, Tp*> write(Tp *in, std::enable_if_t<!std::is_class<Tp>::value, int> = 0)

Write non-class-type data to buffer (uses memcpy).

template<typename Tp>
Tp *request()

Request a pointer to an allocation. This is similar to a “write” except the memory is uninitialized. Typically used by allocators. If Tp is a class type, be sure to use a placement new instead of a memcpy.
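A sketch of the placement-new requirement (the widget type and fill_one function are illustrative):

#include <new>

struct widget { int value = 0; };  // illustrative class type

void fill_one(tim::base::ring_buffer& buf)
{
    // request() hands back uninitialized memory, so a class type must be
    // constructed in-place with placement new rather than memcpy'd
    if(widget* w = buf.request<widget>())
        new(w) widget{};
}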

void *request(size_t n)

Request a pointer to an allocation for at least.

Parameters

n – bytes.

template<typename Tp>
std::pair<size_t, Tp*> read(Tp *out, std::enable_if_t<std::is_class<Tp>::value, int> = 0) const

Read class-type data from buffer (uses placement new).

template<typename Tp>
std::pair<size_t, Tp*> read(Tp *out, std::enable_if_t<!std::is_class<Tp>::value, int> = 0) const

Read non-class-type data from buffer (uses memcpy).

template<typename Tp>
Tp *retrieve()

Retrieve a pointer to the head allocation (read).

void *retrieve(size_t n)

Retrieve a pointer to the head allocation of at least.

Parameters

n – bytes (read).

inline size_t count() const

Returns number of bytes currently held by the buffer.

inline size_t free() const

Returns how many bytes are available in the buffer.

inline bool is_empty() const

Returns whether the buffer is empty.

inline bool is_full() const

Returns whether the buffer is full.

size_t rewind(size_t n) const

Rewind the read position n bytes.

void set_use_mmap(bool)

explicitly configure to use mmap if available

inline bool get_use_mmap() const

query whether using mmap

std::string as_string() const

Friends

friend struct data_storage::ring_buffer
inline friend std::ostream &operator<<(std::ostream &os, const ring_buffer &obj)
template<typename Tp>
struct tim::data_storage::ring_buffer : private tim::base::ring_buffer

Ring buffer wrapper around tim::base::ring_buffer for data of type Tp. If the data object size is larger than the page size (typically 4KB), behavior is undefined. During initialization, one requests a minimum number of objects and the buffer will support that number of objects plus the remainder of the page, e.g. if a page is 1000 bytes, the object is 1 byte, and the buffer is requested to support 1500 objects, then an allocation supporting 2000 objects (i.e. 2 pages) will be created.
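A minimal sketch of the typed buffer (the element type, counts, and demo_buffer function are illustrative); per the sizing rule above, capacity() may exceed the requested count after rounding up to whole pages:

#include <cassert>
#include <cstdint>

void demo_buffer()
{
    tim::data_storage::ring_buffer<int64_t> buf{};
    buf.init(50);                  // request support for at least 50 entries

    int64_t in = 42;
    buf.write(&in);                // copy one entry into the buffer
    assert(buf.count() == 1);

    int64_t out = 0;
    buf.read(&out);                // copy the oldest entry back out
    assert(out == 42);
    assert(buf.capacity() >= 50);  // page rounding may add extra slots
}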

Public Types

using base_type = base::ring_buffer

Public Functions

ring_buffer() = default
~ring_buffer() = default
inline explicit ring_buffer(bool _use_mmap)
inline explicit ring_buffer(size_t _size)
inline ring_buffer(size_t _size, bool _use_mmap)
ring_buffer(const ring_buffer&)
ring_buffer(ring_buffer&&) noexcept = default
ring_buffer &operator=(const ring_buffer&)
ring_buffer &operator=(ring_buffer&&) noexcept = default
inline bool is_initialized() const

Returns whether the buffer has been allocated.

inline size_t capacity() const

Get the total number of Tp instances supported.

inline void init(size_t _size)

Creates a new ring buffer.

inline void destroy()

Destroy ring buffer.

inline size_t data_size() const

Returns the size of an individual entry (i.e. sizeof(Tp)).

inline Tp *write(Tp *in)

Write data to buffer. Return pointer to location of write.

inline Tp *read(Tp *out) const

Read data from buffer. Return pointer to location of read.

inline Tp *request()

Get an uninitialized address at tail of buffer.

inline Tp *retrieve()

Read data from head of buffer.

inline size_t count() const

Returns number of Tp instances currently held by the buffer.

inline size_t free() const

Returns how many Tp instances are available in the buffer.

inline bool is_empty() const

Returns whether the buffer is empty.

inline bool is_full() const

Returns whether the buffer is full.

inline size_t rewind(size_t n) const

Rewinds the read pointer.

template<typename ...Args>
inline auto emplace(Args&&... args)
inline std::string as_string() const
inline bool get_use_mmap() const

query whether using mmap

void set_use_mmap(bool)

explicitly configure to use mmap if available

Private Functions

template<typename Tp>
std::pair<size_t, Tp*> write(Tp *in, std::enable_if_t<std::is_class<Tp>::value, int> = 0)

Write class-type data to buffer (uses placement new).

template<typename Tp>
std::pair<size_t, Tp*> write(Tp *in, std::enable_if_t<!std::is_class<Tp>::value, int> = 0)

Write non-class-type data to buffer (uses memcpy).

void *request(size_t n)

Request a pointer to an allocation for at least.

Parameters

n – bytes.

template<typename Tp>
std::pair<size_t, Tp*> read(Tp *out, std::enable_if_t<std::is_class<Tp>::value, int> = 0) const

Read class-type data from buffer (uses placement new).

template<typename Tp>
std::pair<size_t, Tp*> read(Tp *out, std::enable_if_t<!std::is_class<Tp>::value, int> = 0) const

Read non-class-type data from buffer (uses memcpy).

void *retrieve(size_t n)

Retrieve a pointer to the head allocation of at least.

Parameters

n – bytes (read).

Friends

inline friend std::ostream &operator<<(std::ostream &os, const ring_buffer &obj)

Allocators

template<typename Tp, bool MMapV, size_t BuffCntV>
class tim::data::ring_buffer_allocator : public std::allocator<Tp>

Allocator that uses an array of (ring) buffers to coalesce memory allocations. This allocator propagates on container swap and container move assignment. Use the TIMEMORY_RING_BUFFER_ALLOCATOR_BUFFER_COUNT environment variable to specify the default number of allocations, or use set_buffer_count / set_buffer_count_cb. When a reservation is requested and the request is greater than the free space in the buffer, the free spaces are stored in a “dangling” array of spaces which are used when single allocations are requested. A usage sketch follows the template parameters below.

tparam Tp

The data type for the allocator

tparam MMapV

Whether to use mmap (if available)

tparam BuffCntV

The default buffer count (will be rounded up to multiple of page size)
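A sketch of using the allocator with a standard container (the template arguments, buffer count, and demo_alloc function are illustrative choices):

#include <vector>

using alloc_t = tim::data::ring_buffer_allocator<double, false, 512>;

void demo_alloc()
{
    // must precede the first buffer-size request, otherwise it throws
    alloc_t::set_buffer_count(1024);

    std::vector<double, alloc_t> data{};
    data.reserve(256);  // serviced from the ring buffer(s)
    data.push_back(3.14);
}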

Public Types

using value_type = Tp
using pointer = Tp*
using reference = Tp&
using const_pointer = const Tp*
using const_reference = const Tp&
using size_type = size_t
using difference_type = ptrdiff_t
using base_type = std::allocator<Tp>
using buffer_type = data_storage::ring_buffer<Tp>
using propagate_on_container_move_assignment = std::true_type
using propagate_on_container_swap = std::true_type

Public Functions

ring_buffer_allocator() = default
~ring_buffer_allocator() = default
ring_buffer_allocator(const ring_buffer_allocator&) = default
ring_buffer_allocator(ring_buffer_allocator&&) noexcept = default
ring_buffer_allocator &operator=(const ring_buffer_allocator&) = default
ring_buffer_allocator &operator=(ring_buffer_allocator&&) noexcept = default
inline bool operator==(const ring_buffer_allocator &rhs) const
inline bool operator!=(const ring_buffer_allocator &rhs) const
inline Tp *address(Tp &_r) const
inline const Tp *address(const Tp &_r) const
inline size_t max_size() const
inline void construct(Tp *const _p, const Tp &_v) const
inline void construct(Tp *const _p, Tp &&_v) const
template<typename ...ArgsT>
inline void construct(Tp *const _p, ArgsT&&... _args) const
inline void destroy(Tp *const _p) const
inline Tp *allocate(const size_t n) const
inline void deallocate(Tp *const ptr, const size_t n) const
inline Tp *allocate(const size_t n, const void*const) const
inline void reserve(const size_t n)
inline void steal_resources(ring_buffer_allocator &rhs)

transfers the buffers to another allocator

Public Static Functions

template<typename FuncT>
static inline void set_buffer_count_cb(FuncT &&_f)

define a callback function for initializing the buffer size. Will throw if a request for the buffer size has already occurred.

static inline void set_buffer_count(size_t _buff_sz)

set the minimum number of objects for the ring buffer. Will throw if a request for the buffer size has already occurred.

template<typename U>
struct rebind

Public Types

using other = ring_buffer_allocator<U, MMapV, BuffCntV>
template<typename Tp, std::size_t AlignV = 8 * sizeof(Tp)>
class tim::ert::aligned_allocator

Public Types

using value_type = Tp
using pointer = Tp*
using reference = Tp&
using const_pointer = const Tp*
using const_reference = const Tp&
using size_type = std::size_t
using difference_type = ptrdiff_t

Public Functions

aligned_allocator() = default
aligned_allocator(const aligned_allocator&) = default
inline aligned_allocator(aligned_allocator&&) noexcept
template<typename U>
inline aligned_allocator(const aligned_allocator<U, AlignV>&)
~aligned_allocator() = default
aligned_allocator &operator=(const aligned_allocator&) = delete
aligned_allocator &operator=(aligned_allocator&&) = delete
inline bool operator!=(const aligned_allocator &other) const
inline bool operator==(const aligned_allocator&) const
inline Tp *address(Tp &r) const
inline const Tp *address(const Tp &s) const
inline std::size_t max_size() const
inline void construct(Tp *const p, const Tp &t) const
template<typename ...ArgsT>
inline void construct(Tp *const p, ArgsT&&... args) const
inline void destroy(Tp *const p) const
inline Tp *allocate(const std::size_t n) const
inline void deallocate(Tp *const ptr, const std::size_t) const
template<typename U>
inline Tp *allocate(const std::size_t n, const U*) const
inline std::size_t get_alignment()
template<typename U>
struct rebind

Public Types

typedef aligned_allocator<U, AlignV> other

Graph Classes

The graph classes are responsible for maintaining the hierarchy of the calling-context tree. tim::graph, tim::graph_data, and tim::node::graph<T> are rarely interacted with directly. Storage results are reported within nested std::vectors of tim::node::result<T> and tim::node::tree<T>. The former provides the data in a flat hierarchy where the calling-context is represented through indentation and a depth value; the latter represents the calling-context through a recursive structure.
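For example, a minimal sketch of rendering the flat form (print_flat and the output layout are illustrative; the accessors are non-const in the documented interface, so the vector is taken by non-const reference). The tree form would instead be walked recursively through each node's children:

#include <iostream>
#include <vector>

using wall_clock = tim::component::wall_clock;

void print_flat(std::vector<tim::node::result<wall_clock>>& results)
{
    // depth() carries the call-graph level; prefix() already includes
    // the indentation and decoration used by the text output
    for(auto& itr : results)
        std::cout << "[" << itr.depth() << "] " << itr.prefix() << '\n';
}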

template<typename T, typename AllocatorT>
class graph

Arbitrary Graph / Tree (i.e. structured like a binary tree but not limited to two children per node). It is unlikely that this class will be interacted with directly.

template<typename NodeT>
class graph_data

tim::graph instance + current node + head node + sea-level. Sea-level is defined as the node depth after a fork from another graph instance and is only relevant for worker threads.

template<typename Tp>
struct tim::node::graph : private data::node_type

This is the compact representation of a measurement in the call-graph.

tparam Tp

Component type

Public Functions

inline bool &is_dummy()

denotes this is a placeholder for synchronization

inline uint32_t &tid()

thread identifier

inline uint32_t &pid()

process identifier

inline uint64_t &id()

hash identifier

inline int64_t &depth()

depth in call-graph

inline Tp &obj()

this is the instance that gets updated in call-graph

inline stats_type &stats()

statistics data for entry in call-graph

template<typename Tp>
struct tim::node::result : public data::result_type

This data type is used when rendering the flat representation (i.e. loop-iterable) representation of the calling-context. The prefix here will be identical to the prefix in the text output.

tparam Tp

Component type

Public Functions

inline uint32_t &tid()

measurement thread. May be std::numeric_limits<uint16_t>::max() (i.e. 65535) if this entry is a combination of multiple threads

inline uint32_t &pid()

the process identifier of the reporting process or, when data from multiple processes is combined, the process identifier of the collecting process

inline int64_t &depth()

depth of the node in the calling-context

inline uint64_t &hash()

hash identifier of the node

inline uint64_t &rolling_hash()

the summation of this hash and its parent hashes

inline string_t &prefix()

the string associated with the hash, plus indentation and other decoration

inline uintvector_t &hierarchy()

an array of the hash value + each parent hash (not serialized)

inline Tp &data()

reference to the component

inline stats_type &stats()

reference to the associated statistical accumulation of the data (if any)

inline uint64_t &id()

alias for hash()

inline Tp &obj()

alias for data()

template<typename Tp>
struct tim::basic_tree

Basic hierarchical tree implementation. Expects population from tim::graph.

tparam Tp

Component type

Public Functions

template<typename GraphT, typename ItrT>
this_type &operator()(const GraphT &g, ItrT root)

construction from tim::graph<Tp>

inline auto &get_value()

return the current tree node

inline auto &get_children()

return the array of child nodes

template<typename Tp, typename StatT>
struct tim::node::entry : public std::tuple<Tp, StatT>

This data type is used in tim::node::tree for inclusive and exclusive values.

tparam Tp

Component type

tparam StatT

Statistics type

Public Functions

inline Tp &data()

component object with either inclusive or exclusive values

inline StatT &stats()

statistics data with either inclusive or exclusive values

inline const Tp &data() const

component object with either inclusive or exclusive values

inline const StatT &stats() const

statistics data with either inclusive or exclusive values

template<typename Tp>
struct tim::node::tree : private data::tree_type

This data type is used when rendering the hierarchical (i.e. recursive) representation of the calling-context. The prefix here has no decoration.

tparam Tp

Generally tim::basic_tree<ComponentT>

Public Functions

inline bool &is_dummy()

returns whether or not this node is a synchronization point and, if so, should be ignored

inline uint64_t &hash()

returns the hash identifier for the associated string identifier

inline int64_t &depth()

returns the depth of the node in the tree. NOTE: this value may be relative to dummy nodes

inline idset_type &tid()

the set of thread ids this data was collected from

inline idset_type &pid()

the set of process ids this data was collected from

inline entry_type &inclusive()

the inclusive data + statistics

inline entry_type &exclusive()

the exclusive data + statistics

Graph Result and Tree Sample

using namespace tim;
using wall_clock = component::wall_clock;
using node_type  = node::result<wall_clock>;
using tree_type  = basic_tree<node::tree<wall_clock>>;

// the flat data for the process
std::vector<node_type> foo = storage<wall_clock>::instance()->get();

// aggregated flat data from distributed memory process parallelism
// depending on settings, may contain all data on rank 0, partial data, or no data
// on non-zero ranks
std::vector<std::vector<node_type>> bar = storage<wall_clock>::instance()->dmp_get();

// the tree data for the process
std::vector<tree_type> baz{};
baz = storage<wall_clock>::instance()->get(baz);

// aggregated tree data from distributed memory process parallelism
// depending on settings, may contain all data on rank 0, partial data, or no data
// on non-zero ranks
std::vector<std::vector<tree_type>> spam{};
spam = storage<wall_clock>::instance()->dmp_get(spam);

Graph Result and Tree Comparison

#----------------------------------------#
# Storage Result
#----------------------------------------#
  Thread id            : 0
  Process id           : 4385
  Depth                : 0
  Hash                 : 9631199822919835227
  Rolling hash         : 9631199822919835227
  Prefix               : >>> foo
  Hierarchy            : [9631199822919835227]
  Data object          :    6.534 sec wall
  Statistics           : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
#----------------------------------------#
  Thread id            : 0
  Process id           : 4385
  Depth                : 1
  Hash                 : 11474628671133349553
  Rolling hash         : 2659084420343633164
  Prefix               : >>> |_bar
  Hierarchy            : [9631199822919835227, 11474628671133349553]
  Data object          :    5.531 sec wall
  Statistics           : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]

#----------------------------------------#
# Storage Tree
#----------------------------------------#
Thread id            : {0}
Process id           : {4385}
Depth                : -1
Hash                 : 0
Prefix               : unknown-hash=0
Inclusive data       :    0.000 sec wall
Inclusive stat       : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
Exclusive data       :   -6.534 sec wall
Exclusive stat       : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
  #----------------------------------------#
  Thread id            : {0}
  Process id           : {4385}
  Depth                : 0
  Hash                 : 9631199822919835227
  Prefix               : foo
  Inclusive data       :    6.534 sec wall
  Inclusive stat       : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
  Exclusive data       :    1.002 sec wall
  Exclusive stat       : [sum: 1.00246] [min: 6.53361] [max: 6.53361] [sqr: 34.9765] [count: 1]
    #----------------------------------------#
    Thread id            : {0}
    Process id           : {4385}
    Depth                : 1
    Hash                 : 11474628671133349553
    Prefix               : bar
    Inclusive data       :    5.531 sec wall
    Inclusive stat       : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
    Exclusive data       :    5.531 sec wall
    Exclusive stat       : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]

Note the first entry of the storage tree has a negative depth and a hash of zero. Nodes such as these are “dummy” nodes which timemory keeps internally as bookmarks for root nodes and thread-forks (the parent call-graph location when a child thread was initialized or returned to “sea-level”). These may be removed in future versions of timemory.

Storage Class

The tim::storage class is a thread-local singleton which handles the call-graph and persistent data accumulation for each component. It is stored as a std::unique_ptr which automatically deletes itself when the thread exits. On a non-primary thread, destruction of the singleton merges its call-graph data into the storage singleton on the primary thread. Initialization and finalization of the storage class is the ONLY time that thread synchronization and inter-process communication occurs. This characteristic enables timemory storage to scale to an arbitrary number of threads and/or processes without performance degradation. If you want information about the state of the call-graph, tim::storage<T> is the structure to query, e.g. the current size of the call-graph, a serialization of the current process- and thread-specific data, etc. Invoking the get() member function on a worker thread returns the data for that thread only; invoking get() on the primary thread returns the data for all threads. Invoking mpi_get() aggregates the results across all MPI processes, upc_get() aggregates the results across all UPC++ processes, and dmp_get() (dmp == distributed memory parallelism) aggregates the results across both MPI and UPC++ processes.
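A sketch of the per-thread behavior described above (wall_clock and the query function are illustrative; only the get() call documented here and in the sample above is used):

#include <thread>
#include <vector>

using wall_clock = tim::component::wall_clock;
using result_vec = std::vector<tim::node::result<wall_clock>>;

void query()
{
    std::thread worker{ []() {
        // on a worker thread: only this thread's call-graph data
        result_vec local = tim::storage<wall_clock>::instance()->get();
    } };
    worker.join();  // worker storage merges into the primary thread at exit

    // on the primary thread: data for all (merged) threads
    result_vec all = tim::storage<wall_clock>::instance()->get();
}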

class tim::base::storage

Subclassed by tim::impl::storage< Type, false >, tim::impl::storage< Type, true >

Public Types

using string_t = std::string
using this_type = storage

Public Functions

storage(bool _is_master, int64_t _instance_id, std::string _label)
virtual ~storage()
explicit storage(const this_type&) = delete
explicit storage(this_type&&) = delete
this_type &operator=(const this_type&) = delete
this_type &operator=(this_type &&rhs) = delete
inline virtual void print()
inline virtual void cleanup()
inline virtual void stack_clear()
inline virtual void disable()
inline virtual void initialize()
inline virtual void finalize()
inline virtual bool global_init()
inline virtual bool thread_init()
inline virtual bool data_init()
inline const hash_map_ptr_t &get_hash_ids() const
inline const hash_alias_ptr_t &get_hash_aliases() const
hash_value_t add_hash_id(const std::string &_prefix)
void add_hash_id(uint64_t _lhs, uint64_t _rhs)
inline bool is_initialized() const
inline int64_t instance_id() const
void free_shared_manager()
template<typename Tp, typename Vp>
inline base::storage *base_instance()

Public Static Functions

template<typename Tp, typename Vp>
static this_type *base_instance()
template<typename Tp, typename Vp>
class storage : public tim::impl::storage<Tp, trait::uses_value_storage<Tp, Vp>::value>