Memory Management

Detailed Doxygen Documentation.

Manager Class

The tim::manager class is a thread-local singleton which handles the memory management for each thread. It is stored as a std::shared_ptr which automatically deletes itself when the thread exits. When the thread-local singleton for each component storage class is created, it increments the reference count for this class to ensure that the manager exists for as long as any storage classes are allocated. When each storage singleton is created, it registers a functor with the manager for destroying itself.
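
A minimal sketch of interacting with the manager directly (the "my-cleanup" key and the lambda body are illustrative, not part of the API):

#include "timemory/manager.hpp"  // convenience header (path may vary by version)

void manager_example()
{
    // shared pointer to the manager for the current thread
    auto _manager = tim::manager::instance();

    // register a functor under a string key
    _manager->add_cleanup("my-cleanup", []() { /* release resources */ });

    // either execute the functor via its key...
    _manager->cleanup("my-cleanup");

    // ...or unregister it
    _manager->remove_cleanup("my-cleanup");
}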

class tim::manager

Public Functions

template<typename Func>
void add_cleanup(void*, Func&&)

add functors to destroy instances based on a pointer

template<typename Func>
void add_cleanup(const std::string&, Func&&)

add functors to destroy instances based on a string key

template<typename StackFuncT, typename FinalFuncT>
void add_finalizer(const std::string&, StackFuncT&&, FinalFuncT&&, bool, int32_t = 0)

this is used by storage classes for finalization.

void remove_cleanup(void*)

remove a cleanup functor

void remove_cleanup(const std::string&)

remove a cleanup functor

void remove_finalizer(const std::string&)

remove a finalizer functor

void cleanup(const std::string&)

execute a cleanup based on a key

inline void set_write_metadata(short v)

Set to 0 to write metadata only when other output is generated, -1 to never write it, or 1 to always write it.

void write_metadata(const std::string&, const char* = "")

Print metadata to filename.

std::ostream &write_metadata(std::ostream&)

Write metadata to ostream.

void update_metadata_prefix()

Updates settings, rank, output prefix, etc.

inline int32_t get_rank() const

Get the dmp rank. This is stored to avoid having to perform an MPI/UPC++ query after finalization has been called.

inline bool is_finalizing() const

Query whether finalization is currently occurring.

inline void is_finalizing(bool v)

Sets whether finalization is currently occurring.

inline void add_entries(uint64_t n)

Add number of component output data entries. If this value is zero, metadata output is suppressed unless tim::manager::set_write_metadata was assigned a value of 1.

void add_synchronization(const std::string&, int64_t, std::function<void()>)

Add function for synchronizing data in threads.

void remove_synchronization(const std::string&, int64_t)

Remove function for synchronizing data in threads.

void synchronize()

Synchronizes thread-data for storage.
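
A hedged sketch of the synchronization interface (the role of the int64_t parameter is not specified here; it is assumed to be an identifier that, together with the string key, selects the entry to remove):

tim::manager::instance()->add_synchronization(
    "my-sync", 0, []() { /* synchronize thread-local data */ });

// ... later, using the same key and identifier
tim::manager::instance()->remove_synchronization("my-sync", 0);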

inline int32_t instance_count() const

Get the instance ID for this manager instance.

inline int64_t get_tid() const

Get the thread-index for this manager instance.

Public Static Functions

static pointer_t instance()

Get a shared pointer to the instance for the current thread.

static pointer_t master_instance()

Get a shared pointer to the instance on the primary thread.

static inline int32_t total_instance_count()

Get the number of instances that are currently allocated. This is decremented during the destructor of each manager instance, unlike tim::manager::get_thread_count().

static inline void use_exit_hook(bool val)

Enable setting std::exit callback.

static void exit_hook()

The exit hook function.

static inline int32_t get_thread_count()

This effectively provides the total number of threads which collected data. It is only “decremented” when the last manager instance has been deleted, at which point it is set to zero.

static bool get_is_main_thread()

Return whether this is the main thread.

template<typename Tp>
static void add_metadata(const std::string&, const Tp&)

Add a metadata entry of a non-string type. If this fails to serialize, either include the appropriate header from timemory/tpls/cereal/cereal/types or provide a serialization function for the type.

static void add_metadata(const std::string&, const char*)

Add a metadata entry of a const character array. This only exists to avoid the template function from serializing the pointer.

static void add_metadata(const std::string&, const std::string&)

Add a metadata entry of a string.
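
For example (a sketch; the keys, values, and filename are arbitrary):

tim::manager::add_metadata("APP_VERSION", "1.2.3"); // const char* overload
tim::manager::add_metadata("NUM_ITERATIONS", 100);  // template overload
tim::manager::instance()->set_write_metadata(1);    // always write metadata
tim::manager::instance()->write_metadata("metadata.json");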

static inline void set_persistent_master(pointer_t _pinst)

This function stores the primary manager instance for the application.

static inline void update_settings(const settings &_settings)

Updates the settings instance used by the manager instance.

static inline settings &&swap_settings(settings _settings)

swaps out the actual settings instance

template<typename ...Types>
struct get_storage : public tim::manager::filtered_get_storage<mpl::implemented_t<Types...>>

This is used to apply/query storage data for multiple component types.

using types = tim::available_types_t;   // type-list of all enumerated types
manager_t::get_storage<types>::clear(); // clear storage for all enumerated types

template<template<typename...> class Tuple, typename ...Types>
struct get_storage<Tuple<Types...>> : public tim::manager::filtered_get_storage<mpl::implemented_t<Types...>>

Overload for a tuple/type-list.

Graph Classes

The graph classes are responsible for maintaining the hierarchy of the calling context tree. tim::graph, tim::graph_data, and tim::node::graph<T> are rarely interacted with directly. Storage results are reported within nested std::vectors of tim::node::result<T> and tim::node::tree<T>. The former provides the data in a flat hierarchy, where the calling-context is represented through indentation and a depth value; the latter represents the calling-context through a recursive structure (see the Graph Result and Tree Sample below).

template<typename T, typename AllocatorT>
class graph

Arbitrary graph / tree (i.e. like a binary tree, but not limited to two children per node). It is unlikely that this class will be interacted with directly.

template<typename NodeT>
class graph_data

tim::graph instance + current node + head node + sea-level. Sea-level is defined as the node depth after a fork from another graph instance and is only relevant for worker threads.

template<typename Tp>
struct tim::node::graph : private data::node_type

This is the compact representation of a measurement in the call-graph.

tparam Tp

Component type

Public Functions

inline bool &is_dummy()

denotes this is a placeholder for synchronization

inline uint32_t &tid()

thread identifier

inline uint32_t &pid()

process identifier

inline uint64_t &id()

hash identifier

inline int64_t &depth()

depth in call-graph

inline Tp &obj()

this is the instance that gets updated in call-graph

inline stats_type &stats()

statistics data for entry in call-graph

template<typename Tp>
struct tim::node::result : public data::result_type

This data type is used when rendering the flat (i.e. loop-iterable) representation of the calling-context. The prefix here will be identical to the prefix in the text output.

tparam Tp

Component type

Public Functions

inline uint32_t &tid()

measurement thread. May be std::numeric_limits<uint16_t>::max() (i.e. 65535) if this entry is a combination of multiple threads

inline uint32_t &pid()

the process identifier of the reporting process, if multiple process data is combined, or the process identifier of the collecting process

inline int64_t &depth()

depth of the node in the calling-context

inline uint64_t &hash()

hash identifier of the node

inline uint64_t &rolling_hash()

the summation of this hash and its parent hashes

inline string_t &prefix()

the associated string with the hash + indentation and other decoration

inline uintvector_t &hierarchy()

an array of the hash value + each parent hash (not serialized)

inline Tp &data()

reference to the component

inline stats_type &stats()

reference to the associated statistical accumulation of the data (if any)

inline uint64_t &id()

alias for hash()

inline Tp &obj()

alias for data()
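
A sketch of iterating the flat results (assuming wall-clock data has been collected; see the sample at the end of this section for the full set of queries):

#include <iostream>

void print_flat()
{
    using wall_clock = tim::component::wall_clock;
    auto results = tim::storage<wall_clock>::instance()->get();
    for(auto& itr : results)
    {
        // prefix() already encodes the call-graph depth via indentation
        std::cout << itr.prefix() << " (depth " << itr.depth()
                  << "): " << itr.data() << std::endl;
    }
}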

template<typename Tp>
struct tim::basic_tree

Basic hierarchical tree implementation. Expects population from tim::graph.

tparam Tp

Component type

Public Functions

template<typename GraphT, typename ItrT>
this_type &operator()(const GraphT &g, ItrT root)

construction from tim::graph<Tp>

inline auto &get_value()

return the current tree node

inline auto &get_children()

return the array of child nodes
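
A recursive traversal sketch (assuming get_children() yields nested basic_tree instances; in some timemory versions the children may be smart pointers that require dereferencing before recursing):

#include <cstdint>
#include <iostream>
#include <string>

template <typename TreeT>
void walk_tree(TreeT& _tree, int64_t _level = 0)
{
    auto& _node = _tree.get_value();  // e.g. a tim::node::tree<T> (see below)
    if(!_node.is_dummy())
        std::cout << std::string(2 * _level, ' ') << "hash=" << _node.hash()
                  << ", inclusive=" << _node.inclusive().data() << '\n';
    for(auto& itr : _tree.get_children())
        walk_tree(itr, _level + 1);
}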

template<typename Tp, typename StatT>
struct tim::node::entry : public std::tuple<Tp, StatT>

This data type is used in tim::node::tree for inclusive and exclusive values.

tparam Tp

Component type

tparam StatT

Statistics type

Public Functions

inline Tp &data()

component object with either inclusive or exclusive values

inline StatT &stats()

statistics data with either inclusive or exclusive values

inline const Tp &data() const

component object with either inclusive or exclusive values

inline const StatT &stats() const

statistics data with either inclusive or exclusive values

template<typename Tp>
struct tim::node::tree : private data::tree_type

This data type is used when rendering the hierarchical (i.e. recursive) representation of the calling-context. The prefix here has no decoration.

tparam Tp

Generally tim::basic_tree<ComponentT>

Public Functions

inline bool &is_dummy()

returns whether or not this node is a synchronization point and, if so, should be ignored

inline uint64_t &hash()

returns the hash identifier for the associated string identifier

inline int64_t &depth()

returns the depth of the node in the tree. NOTE: this value may be relative to dummy nodes

inline idset_type &tid()

the set of thread ids this data was collected from

inline idset_type &pid()

the set of process ids this data was collected from

inline entry_type &inclusive()

the inclusive data + statistics

inline entry_type &exclusive()

the exclusive data + statistics

Graph Result and Tree Sample

using namespace tim;
using wall_clock = component::wall_clock;
using node_type  = node::result<wall_clock>;
using tree_type  = basic_tree<node::tree<wall_clock>>;

// the flat data for the process
std::vector<node_type> foo = storage<wall_clock>::instance()->get();

// aggregated flat data from distributed memory process parallelism
// depending on settings, may contain all data on rank 0, partial data, or no
// data on non-zero ranks
std::vector<std::vector<node_type>> bar = storage<wall_clock>::instance()->dmp_get();

// the tree data for the process
std::vector<tree_type> baz{};
baz = storage<wall_clock>::instance()->get(baz);

// aggregated tree data from distributed memory process parallelism
// depending on settings, may contain all data on rank 0, partial data, or no
// data on non-zero ranks
std::vector<std::vector<tree_type>> spam{};
spam = storage<wall_clock>::instance()->dmp_get(spam);

Graph Result and Tree Comparison

#----------------------------------------#
# Storage Result
#----------------------------------------#
  Thread id            : 0
  Process id           : 4385
  Depth                : 0
  Hash                 : 9631199822919835227
  Rolling hash         : 9631199822919835227
  Prefix               : >>> foo
  Hierarchy            : [9631199822919835227]
  Data object          :    6.534 sec wall
  Statistics           : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
#----------------------------------------#
  Thread id            : 0
  Process id           : 4385
  Depth                : 1
  Hash                 : 11474628671133349553
  Rolling hash         : 2659084420343633164
  Prefix               : >>> |_bar
  Hierarchy            : [9631199822919835227, 11474628671133349553]
  Data object          :    5.531 sec wall
  Statistics           : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]

#----------------------------------------#
# Storage Tree
#----------------------------------------#
Thread id            : {0}
Process id           : {4385}
Depth                : -1
Hash                 : 0
Prefix               : unknown-hash=0
Inclusive data       :    0.000 sec wall
Inclusive stat       : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
Exclusive data       :   -6.534 sec wall
Exclusive stat       : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
  #----------------------------------------#
  Thread id            : {0}
  Process id           : {4385}
  Depth                : 0
  Hash                 : 9631199822919835227
  Prefix               : foo
  Inclusive data       :    6.534 sec wall
  Inclusive stat       : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
  Exclusive data       :    1.002 sec wall
  Exclusive stat       : [sum: 1.00246] [min: 6.53361] [max: 6.53361] [sqr: 34.9765] [count: 1]
    #----------------------------------------#
    Thread id            : {0}
    Process id           : {4385}
    Depth                : 1
    Hash                 : 11474628671133349553
    Prefix               : bar
    Inclusive data       :    5.531 sec wall
    Inclusive stat       : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
    Exclusive data       :    5.531 sec wall
    Exclusive stat       : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]

Note that the first entry of the storage tree has a negative depth and a hash of zero. Nodes such as these are “dummy” nodes which timemory keeps internally as bookmarks for root nodes and thread-forks (the parent call-graph location when a child thread was initialized or returned to “sea-level”). These may be removed in future versions of timemory.

Storage Class

The tim::storage class is a thread-local singleton which handles the call-graph and persistent data accumulation for each component. It is stored as a std::unique_ptr which automatically deletes itself when the thread exits. On non-primary threads, destruction of the singleton merges its call-graph data into the storage singleton on the primary thread. Initialization and finalization of the storage class is the ONLY time that thread synchronization and inter-process communication occurs. This characteristic enables timemory storage to scale to an arbitrary number of threads and/or processes without performance degradation. If you want information about the state of the call-graph, tim::storage<T> is the structure to query, e.g. for the current size of the call-graph, a serialization of the current process- and thread-specific data, etc. Invoking the get() member function will return the data for the current thread on worker threads, while invoking get() on the primary thread will return the data for all threads. Invoking mpi_get() will aggregate the results across all MPI processes, upc_get() will aggregate the results across all UPC++ processes, and dmp_get() (dmp == distributed memory parallelism) will aggregate the results across both MPI and UPC++ processes.
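
A sketch of these queries (mpi_get() and upc_get() are assumed to return the same nested-vector layout as dmp_get() in the sample above):

using wall_clock = tim::component::wall_clock;

// data for the current thread (all threads when invoked on the primary thread)
auto thread_data = tim::storage<wall_clock>::instance()->get();

// aggregation across MPI ranks only
auto mpi_data = tim::storage<wall_clock>::instance()->mpi_get();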

class tim::base::storage

Subclassed by tim::impl::storage< Type, false >, tim::impl::storage< Type, true >

Public Types

using string_t = std::string
using this_type = storage

Public Functions

storage(bool _is_master, int64_t _instance_id, std::string _label)
virtual ~storage()
explicit storage(const this_type&) = delete
explicit storage(this_type&&) = delete
this_type &operator=(const this_type&) = delete
this_type &operator=(this_type &&rhs) = delete
inline virtual void print()
inline virtual void cleanup()
inline virtual void stack_clear()
inline virtual void disable()
inline virtual void initialize()
inline virtual void finalize()
inline virtual bool global_init()
inline virtual bool thread_init()
inline virtual bool data_init()
inline const hash_map_ptr_t &get_hash_ids() const
inline const hash_alias_ptr_t &get_hash_aliases() const
hash_value_t add_hash_id(const std::string &_prefix)
void add_hash_id(uint64_t _lhs, uint64_t _rhs)
inline bool is_initialized() const
inline int64_t instance_id() const
void free_shared_manager()
template<typename Tp, typename Vp>
inline base::storage *base_instance()

Public Static Functions

template<typename Tp, typename Vp>
static this_type *base_instance()
template<typename Tp, typename Vp>
class storage : public tim::impl::storage<Tp, trait::uses_value_storage<Tp, Vp>::value>