Memory Management¶
Detailed Doxygen Documentation.
Manager Class¶
The tim::manager class is a thread-local singleton which handles the memory management for each thread. It is stored as a std::shared_ptr, so it is deleted automatically once the thread exits and no other references remain. When the thread-local singleton for a component storage class is created, it increments the reference count of the manager to ensure that the manager exists while any storage classes are allocated. Each storage singleton also registers functors with the manager for destroying itself.
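The ownership pattern described above can be illustrated with a minimal, self-contained sketch. It does not use timemory: `toy_manager` and `toy_component_storage` are hypothetical stand-ins, and only the names `instance`, `add_cleanup`, `remove_cleanup`, and `cleanup` mirror the member functions listed in this section. The real implementation may differ.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for tim::manager: one instance per thread, held by shared_ptr.
class toy_manager
{
public:
    using pointer_t = std::shared_ptr<toy_manager>;

    // each thread lazily creates its own instance on first use
    static pointer_t instance()
    {
        thread_local pointer_t inst{ new toy_manager() };
        return inst;
    }

    // register a functor under a string key (replaces any existing one)
    template <typename Func>
    void add_cleanup(const std::string& key, Func&& func)
    {
        m_cleanup[key] = std::forward<Func>(func);
    }

    void remove_cleanup(const std::string& key) { m_cleanup.erase(key); }

    // run and discard the functor stored under key; returns false if there is none
    bool cleanup(const std::string& key)
    {
        auto itr = m_cleanup.find(key);
        if(itr == m_cleanup.end())
            return false;
        auto func = std::move(itr->second);
        m_cleanup.erase(itr);
        func();
        return true;
    }

private:
    toy_manager() = default;
    std::map<std::string, std::function<void()>> m_cleanup;
};

// Hypothetical stand-in for a component storage singleton.
struct toy_component_storage
{
    explicit toy_component_storage(std::string _label)
    : label(std::move(_label))
    , manager(toy_manager::instance())  // holding the pointer bumps the reference count
    {
        assert(manager && "the manager must exist before any storage");
        manager->add_cleanup(label, [this]() { alive = false; });
    }

    // unregister so the manager never calls into a destroyed object
    ~toy_component_storage() { manager->remove_cleanup(label); }

    std::string            label;
    toy_manager::pointer_t manager;
    bool                   alive = true;
};
```

Because the storage object keeps its own copy of the shared pointer, the manager cannot be destroyed while the storage is alive, which is the guarantee the paragraph above describes.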
-
class tim::manager¶
Public Functions
-
template<typename Func>
void add_cleanup(void*, Func&&)¶ add functors to destroy instances based on a pointer
-
template<typename Func>
void add_cleanup(const std::string&, Func&&)¶ add functors to destroy instances based on a string key
-
template<typename StackFuncT, typename FinalFuncT>
void add_finalizer(const std::string&, StackFuncT&&, FinalFuncT&&, bool, int32_t = 0)¶ this is used by storage classes for finalization.
-
void remove_cleanup(void*)¶
remove a cleanup functor
-
void remove_cleanup(const std::string&)¶
remove a cleanup functor
-
void remove_finalizer(const std::string&)¶
remove a finalizer functor
-
void cleanup(const std::string&)¶
execute a cleanup based on a key
-
inline void set_write_metadata(short v)¶
Set to 0 to write metadata only if other output is produced, -1 to never write it, or 1 to always write it.
-
void write_metadata(const std::string&, const char* = "")¶
Print metadata to filename.
-
std::ostream &write_metadata(std::ostream&)¶
Write metadata to ostream.
-
void update_metadata_prefix()¶
Updates settings, rank, output prefix, etc.
-
inline int32_t get_rank() const¶
Get the dmp rank. This is stored to avoid having to do MPI/UPC++ query after finalization has been called.
-
inline bool is_finalizing() const¶
Query whether finalization is currently occurring.
-
inline void is_finalizing(bool v)¶
Sets whether finalization is currently occurring.
-
inline void add_entries(uint64_t n)¶
Add number of component output data entries. If this value is zero, metadata output is suppressed unless tim::manager::set_write_metadata was assigned a value of 1.
-
void add_synchronization(const std::string&, int64_t, std::function<void()>)¶
Add function for synchronizing data in threads.
-
void remove_synchronization(const std::string&, int64_t)¶
Remove function for synchronizing data in threads.
-
void synchronize()¶
Synchronizes thread-data for storage.
-
inline int32_t instance_count() const¶
Get the instance ID for this manager instance.
-
inline int64_t get_tid() const¶
Get the thread-index for this manager instance.
Public Static Functions
-
static pointer_t instance()¶
Get a shared pointer to the instance for the current thread.
-
static pointer_t master_instance()¶
Get a shared pointer to the instance on the primary thread.
-
static inline int32_t total_instance_count()¶
Get the number of instances that are currently allocated. Unlike tim::manager::get_thread_count(), this is decremented during the destructor of each manager instance.
-
static inline void use_exit_hook(bool val)¶
Enable setting std::exit callback.
-
static void exit_hook()¶
The exit hook function.
-
static inline int32_t get_thread_count()¶
This effectively provides the total number of threads which collected data. It is only “decremented” when the last manager instance has been deleted, at which point it is set to zero.
-
static bool get_is_main_thread()¶
Return whether this is the main thread.
-
template<typename Tp>
static void add_metadata(const std::string&, const Tp&)¶ Add a metadata entry of a non-string type. If this fails to serialize, either include the appropriate header from timemory/tpls/cereal/cereal/types or provide a serialization function for the type.
-
static void add_metadata(const std::string&, const char*)¶
Add a metadata entry of a const character array. This only exists to avoid the template function from serializing the pointer.
-
static void add_metadata(const std::string&, const std::string&)¶
Add a metadata entry of a string.
-
static inline void set_persistent_master(pointer_t _pinst)¶
This function stores the primary manager instance for the application.
-
static inline void update_settings(const settings &_settings)¶
Updates the settings instance used by the manager instance.
-
static inline settings &&swap_settings(settings _settings)¶
swaps out the actual settings instance
-
template<typename ...Types>
struct get_storage : public tim::manager::filtered_get_storage<mpl::implemented_t<Types...>>¶ This is used to apply/query storage data for multiple component types.
using types = tim::available_types_t;    // type-list of all enumerated types
manager_t::get_storage<types>::clear();  // clear storage for all enumerated types
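The descriptions of set_write_metadata and add_entries above imply a three-way decision about whether metadata is written. A minimal sketch of that decision follows. The helper `should_write_metadata` is hypothetical and is not part of timemory; it only encodes the rules as documented here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a timemory function).
//   mode    : the value passed to set_write_metadata (-1, 0, or 1)
//   entries : the accumulated count passed to add_entries
inline bool
should_write_metadata(short mode, std::uint64_t entries)
{
    assert(mode >= -1 && mode <= 1);
    if(mode < 0)
        return false;    // -1: never write
    if(mode > 0)
        return true;     //  1: always write
    return entries > 0;  //  0: write only if other output exists
}
```

Under these assumptions, a run which produces no component output writes metadata only when the mode was explicitly set to 1.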
Graph Classes¶
The graph classes are responsible for maintaining the hierarchy of the calling context tree. tim::graph, tim::graph_data, and tim::node::graph<T> are rarely interacted with directly. Storage results are reported within nested std::vectors of tim::node::result<T> and tim::node::tree<T>. The former provides the data in a flat hierarchy where the calling-context is represented through indentation and a depth value; the latter represents the calling-context through a recursive structure.
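To make the difference between the two layouts concrete, the sketch below converts a flat, depth-annotated list into a recursive structure. The types `flat_entry` and `tree_entry` are hypothetical simplifications, not the timemory types, and the conversion assumes the flat list is in depth-first order, starts at depth 0, and never skips a depth level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// flat layout: the calling-context is encoded in the depth value
struct flat_entry
{
    std::string  name;
    std::int64_t depth;
    double       value;
};

// recursive layout: the calling-context is encoded in the nesting
struct tree_entry
{
    std::string             name;
    double                  value;
    std::vector<tree_entry> children;
};

// consume consecutive entries at `depth`, recursing for any deeper entries
inline std::vector<tree_entry>
build_level(const std::vector<flat_entry>& flat, std::size_t& idx, std::int64_t depth)
{
    std::vector<tree_entry> nodes;
    while(idx < flat.size() && flat[idx].depth == depth)
    {
        tree_entry node{ flat[idx].name, flat[idx].value, {} };
        ++idx;
        node.children = build_level(flat, idx, depth + 1);
        nodes.push_back(std::move(node));
    }
    return nodes;
}

inline std::vector<tree_entry>
to_tree(const std::vector<flat_entry>& flat)
{
    std::size_t idx   = 0;
    auto        roots = build_level(flat, idx, 0);
    assert(idx == flat.size() && "expects depth-first order starting at depth 0");
    return roots;
}
```

The flat form can be processed with a single loop, while the recursive form makes parent/child relationships explicit at the cost of requiring recursion to traverse.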
-
template<typename T, typename AllocatorT>
class graph¶ Arbitrary graph / tree (i.e. structured like a binary tree, but nodes are not limited to two children). It is unlikely that this class will be interacted with directly.
-
template<typename NodeT>
class graph_data¶ tim::graph instance + current node + head node + sea-level. Sea-level is defined as the node depth after a fork from another graph instance and is only relevant for worker threads.
-
template<typename Tp>
struct tim::node::graph : private data::node_type¶ This is the compact representation of a measurement in the call-graph.
- tparam Tp
Component type
Public Functions
-
inline bool &is_dummy()¶
denotes this is a placeholder for synchronization
-
inline uint32_t &tid()¶
thread identifier
-
inline uint32_t &pid()¶
process identifier
-
inline uint64_t &id()¶
hash identifier
-
inline int64_t &depth()¶
depth in call-graph
-
inline stats_type &stats()¶
statistics data for entry in call-graph
-
template<typename Tp>
struct tim::node::result : public data::result_type¶ This data type is used when rendering the flat (i.e. loop-iterable) representation of the calling-context. The prefix here will be identical to the prefix in the text output.
- tparam Tp
Component type
Public Functions
-
inline uint32_t &tid()¶
measurement thread. May be std::numeric_limits<uint16_t>::max() (i.e. 65535) if this entry is a combination of multiple threads
-
inline uint32_t &pid()¶
the process identifier of the reporting process, if multiple process data is combined, or the process identifier of the collecting process
-
inline int64_t &depth()¶
depth of the node in the calling-context
-
inline uint64_t &hash()¶
hash identifier of the node
-
inline uint64_t &rolling_hash()¶
the summation of this hash and its parent hashes
-
inline string_t &prefix()¶
the associated string with the hash + indentation and other decoration
-
inline uintvector_t &hierarchy()¶
an array of the hash value + each parent hash (not serialized)
-
inline stats_type &stats()¶
reference to the associated statistical accumulation of the data (if any)
-
template<typename Tp>
struct tim::basic_tree¶ Basic hierarchical tree implementation. Expects population from tim::graph.
- tparam Tp
Component type
-
template<typename Tp, typename StatT>
struct tim::node::entry : public std::tuple<Tp, StatT>¶ This data type is used in tim::node::tree for inclusive and exclusive values.
- tparam Tp
Component type
- tparam StatT
Statistics type
Public Functions
-
template<typename Tp>
struct tim::node::tree : private data::tree_type¶ This data type is used when rendering the hierarchical (i.e. requires recursion) representation of the calling-context. The prefix here has no decoration.
- tparam Tp
Generally
tim::basic_tree<ComponentT>
Public Functions
-
inline bool &is_dummy()¶
returns whether or not this node is a synchronization point and, if so, should be ignored
-
inline uint64_t &hash()¶
returns the hash identifier for the associated string identifier
-
inline int64_t &depth()¶
returns the depth of the node in the tree. NOTE: this value may be relative to dummy nodes
-
inline idset_type &tid()¶
the set of thread ids this data was collected from
-
inline idset_type &pid()¶
the set of process ids this data was collected from
-
inline entry_type &inclusive()¶
the inclusive data + statistics
-
inline entry_type &exclusive()¶
the exclusive data + statistics
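The relationship between inclusive and exclusive values can be sketched as follows, assuming the common profiling convention that the exclusive value of a node is its inclusive value minus the inclusive values of its direct children. `toy_node` is a hypothetical type for illustration and carries a plain `double` instead of a component and its statistics.

```cpp
#include <vector>

// Hypothetical node: only the inclusive measurement and the children are stored.
struct toy_node
{
    double                inclusive;
    std::vector<toy_node> children;
};

// exclusive = inclusive minus the inclusive values of the direct children
inline double
exclusive(const toy_node& node)
{
    double value = node.inclusive;
    for(const auto& child : node.children)
        value -= child.inclusive;
    return value;
}
```

With a parent measuring 6.53361 seconds and a single child measuring 5.53115 seconds, this yields approximately 1.00246 seconds for the parent, and a leaf node's exclusive value equals its inclusive value.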
Graph Result and Tree Sample¶
using namespace tim;
using wall_clock = component::wall_clock;
using node_type = node::result<wall_clock>;
using tree_type = basic_tree<node::tree<wall_clock>>;
// the flat data for the process
std::vector<node_type> foo = storage<wall_clock>::instance()->get();
// aggregated flat data from distributed memory process parallelism
// depending on settings, may contain all data on rank 0, partial data, or no data
// on non-zero ranks
std::vector<std::vector<node_type>> bar = storage<wall_clock>::instance()->dmp_get();
// the tree data for the process
std::vector<tree_type> baz{};
baz = storage<wall_clock>::instance()->get(baz);
// aggregated tree data from distributed memory process parallelism
// depending on settings, may contain all data on rank 0, partial data, or no data
// on non-zero ranks
std::vector<std::vector<tree_type>> spam{};
spam = storage<wall_clock>::instance()->dmp_get(spam);
Graph Result and Tree Comparison¶
#----------------------------------------#
# Storage Result
#----------------------------------------#
Thread id : 0
Process id : 4385
Depth : 0
Hash : 9631199822919835227
Rolling hash : 9631199822919835227
Prefix : >>> foo
Hierarchy : [9631199822919835227]
Data object : 6.534 sec wall
Statistics : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
#----------------------------------------#
Thread id : 0
Process id : 4385
Depth : 1
Hash : 11474628671133349553
Rolling hash : 2659084420343633164
Prefix : >>> |_bar
Hierarchy : [9631199822919835227, 11474628671133349553]
Data object : 5.531 sec wall
Statistics : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
#----------------------------------------#
# Storage Tree
#----------------------------------------#
Thread id : {0}
Process id : {4385}
Depth : -1
Hash : 0
Prefix : unknown-hash=0
Inclusive data : 0.000 sec wall
Inclusive stat : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
Exclusive data : -6.534 sec wall
Exclusive stat : [sum: 0] [min: 0] [max: 0] [sqr: 0] [count: 0]
#----------------------------------------#
Thread id : {0}
Process id : {4385}
Depth : 0
Hash : 9631199822919835227
Prefix : foo
Inclusive data : 6.534 sec wall
Inclusive stat : [sum: 6.53361] [min: 6.53361] [max: 6.53361] [sqr: 42.6881] [count: 1]
Exclusive data : 1.002 sec wall
Exclusive stat : [sum: 1.00246] [min: 6.53361] [max: 6.53361] [sqr: 34.9765] [count: 1]
#----------------------------------------#
Thread id : {0}
Process id : {4385}
Depth : 1
Hash : 11474628671133349553
Prefix : bar
Inclusive data : 5.531 sec wall
Inclusive stat : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
Exclusive data : 5.531 sec wall
Exclusive stat : [sum: 5.53115] [min: 0.307581] [max: 0.307581] [sqr: 7.71154] [count: 5]
Note that the first entry of the storage tree has a negative depth and a hash of zero. Nodes such as these are “dummy” nodes which timemory keeps internally as bookmarks for root nodes and thread-forks (the parent call-graph location when a child thread was initialized or returned to “sea-level”). These may be removed in future versions of timemory.
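The rolling hash values in the storage result above are consistent with an unsigned 64-bit sum over the hierarchy: 9631199822919835227 + 11474628671133349553 exceeds the 64-bit range and wraps around to 2659084420343633164, which is the value reported for bar. The helper below is a hypothetical sketch of that arithmetic, not timemory code.

```cpp
#include <cstdint>
#include <vector>

// Sum a node's hash with its parent hashes. Unsigned overflow is well-defined
// in C++ and wraps modulo 2^64, so no special handling is needed.
inline std::uint64_t
rolling_hash(const std::vector<std::uint64_t>& hierarchy)
{
    std::uint64_t sum = 0;
    for(auto hash : hierarchy)
        sum += hash;
    return sum;
}
```

Passing the hierarchy of foo returns its own hash unchanged, while passing the two-element hierarchy of bar reproduces the wrapped value shown in the output.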
Storage Class¶
The tim::storage class is a thread-local singleton which handles the call-graph and persistent data accumulation for each component. It is stored as a std::unique_ptr which automatically deletes itself when the thread exits. On non-primary threads, destruction of the singleton merges its call-graph data into the storage singleton on the primary thread. Initialization and finalization of the storage class are the ONLY times that thread synchronization and inter-process communication occur. This characteristic enables timemory storage to scale to any number of threads and/or processes without performance degradation.
If you want information on the state of the call-graph, tim::storage<T> is the structure to query, e.g. for the current size of the call-graph, a serialization of the current process- and thread-specific data, etc. Invoking the get() member function on a worker thread returns the data for the current thread, and invoking get() on the primary thread returns the data for all the threads. Invoking mpi_get() aggregates the results across all MPI processes, upc_get() aggregates the results across all UPC++ processes, and dmp_get() (dmp == distributed memory parallelism) aggregates the results across both MPI and UPC++ processes.
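The merge-on-destruction behavior can be sketched without timemory. In the hypothetical `sketch_storage` class below, recording touches only the local instance and takes no lock, and the only synchronized step is the merge into the primary instance when a worker instance is destroyed. To keep the sketch short and deterministic, it uses ordinary scoped objects instead of thread-local singletons, and a map of accumulated values instead of a call-graph.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a per-thread storage instance.
class sketch_storage
{
public:
    // the primary instance has no parent; worker instances merge into the primary
    explicit sketch_storage(sketch_storage* primary = nullptr)
    : m_primary(primary)
    {}

    // finalization: the only point where a worker synchronizes with the primary
    ~sketch_storage()
    {
        if(m_primary)
            m_primary->merge(*this);
    }

    // hot path: no locking, only this instance's data is modified
    void record(const std::string& label, double value) { m_data[label] += value; }

    // returns the accumulated value for a label, or zero if it was never recorded
    double get(const std::string& label) const
    {
        auto itr = m_data.find(label);
        return (itr == m_data.end()) ? 0.0 : itr->second;
    }

private:
    void merge(const sketch_storage& rhs)
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        for(const auto& itr : rhs.m_data)
            m_data[itr.first] += itr.second;
    }

    sketch_storage*               m_primary = nullptr;
    std::mutex                    m_mutex;
    std::map<std::string, double> m_data;
};
```

Keeping the lock out of `record` is the design choice which the paragraph above credits for scaling: contention is limited to the brief merge at the end of each worker's lifetime.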
-
class tim::base::storage¶
Subclassed by tim::impl::storage< Type, false >, tim::impl::storage< Type, true >
Public Functions
-
storage(bool _is_master, int64_t _instance_id, std::string _label)¶
-
virtual ~storage()¶
-
inline virtual void print()¶
-
inline virtual void cleanup()¶
-
inline virtual void stack_clear()¶
-
inline virtual void disable()¶
-
inline virtual void initialize()¶
-
inline virtual void finalize()¶
-
inline virtual bool global_init()¶
-
inline virtual bool thread_init()¶
-
inline virtual bool data_init()¶
-
inline const hash_map_ptr_t &get_hash_ids() const¶
-
inline const hash_alias_ptr_t &get_hash_aliases() const¶
-
hash_value_t add_hash_id(const std::string &_prefix)¶
-
void add_hash_id(uint64_t _lhs, uint64_t _rhs)¶
-
inline bool is_initialized() const¶
-
inline int64_t instance_id() const¶