Project Integration

timemory as a Submodule

Timemory has a permissive MIT license and can be directly included within another project. C++ projects can take advantage of the header-only feature of timemory and simply include the folders source/timemory and external/cereal/include.

Using CMake

Timemory uses modern CMake INTERFACE targets to include the components you want without forcing you to include everything – this means that compiler flags, preprocessor definitions, include paths, link options, and link libraries are bundled into separate “library” targets that only need to be “linked” to in CMake.

Available CMake Targets

These are the full target names available within CMake. Every target is always defined, but a target may be empty if its underlying requirements (such as a library and include path) were not available when timemory was installed.

Target Description
COMPILED LIBRARIES
timemory::timemory-cxx-shared C/C++/Fortran library
timemory::timemory-cxx-static C/C++/Fortran library
timemory::timemory-c-shared Minimal C enum interface library (requires TIMEMORY_BUILD_C=ON)
timemory::timemory-c-static Minimal C enum interface library (requires TIMEMORY_BUILD_C=ON)
timemory::timemory-stubs-shared C/C++/Fortran stubs library
timemory::timemory-stubs-static C/C++/Fortran stubs library
timemory::timemory-jump-shared C/C++/Fortran jump library
timemory::timemory-jump-static C/C++/Fortran jump library
INTERFACE LIBRARIES
timemory::timemory-address-sanitizer Adds compiler flags to enable address sanitizer (-fsanitize=address)
timemory::timemory-alignment-sanitizer Adds compiler flags to enable alignment sanitizer (-fsanitize=alignment)
timemory::timemory-allinea-map Enables Allinea-MAP support
timemory::timemory-analysis-tools Internal. Provides sanitizer, gperftools-cpu, coverage, xray
timemory::timemory-arch Adds architecture-specific compiler flags
timemory::timemory-bounds-sanitizer Adds compiler flags to enable bounds sanitizer (-fsanitize=bounds)
timemory::timemory-caliper Enables Caliper support
timemory::timemory-compile-debuginfo Attempts to set best flags for more expressive profiling information in debug or optimized binaries
timemory::timemory-compile-extra Extra optimization flags
timemory::timemory-compile-options Adds the standard set of compiler flags used by timemory
timemory::timemory-compile-timing Adds compiler flags which report compilation timing metrics
timemory::timemory-compiler-instrument-compile-options INTERFACE
timemory::timemory-compiler-instrument Provides library for compiler instrumentation
timemory::timemory-coverage Enables code-coverage flags
timemory::timemory-cpu-roofline Enables flags and libraries for proper CPU roofline generation
timemory::timemory-craypat Enables CrayPAT support
timemory::timemory-cuda-compiler Enables some CUDA compiler flags
timemory::timemory-cuda Enables CUDA support
timemory::timemory-cudart-device Link to CUDA device runtime
timemory::timemory-cudart-static Link to CUDA runtime (static library)
timemory::timemory-cudart Link to CUDA runtime (shared library)
timemory::timemory-cupti Enables CUPTI support (requires linking to libcuda)
timemory::timemory-default-disabled Enables pre-processor directive for disabling timemory by default at runtime
timemory::timemory-default-visibility Adds -fvisibility=default compiler flag
timemory::timemory-develop-options Adds developer compiler flags
timemory::timemory-disable Enables pre-processor directive for disabling timemory completely
timemory::timemory-dmp Enables the default distributed memory parallelism library (e.g. MPI, UPC++)
timemory::timemory-dyninst Provides flags and libraries for Dyninst (dynamic instrumentation)
timemory::timemory-extensions Provides a single target for all the timemory extensions which were found
timemory::timemory-extern Enables pre-processor directive to ensure all extern templates are used
timemory::timemory-external-shared Provides a single target for all the timemory extensions (shared libraries)
timemory::timemory-external-static Provides a single target for all the timemory extensions (static libraries)
timemory::timemory-gotcha Enables Gotcha support
timemory::timemory-gperftools Enables user-selected gperftools component ()
timemory::timemory-gpu-roofline Enables flags and libraries for proper GPU roofline generation
timemory::timemory-headers Provides minimal set of include flags to compile with timemory
timemory::timemory-hidden-visibility Adds -fvisibility=hidden compiler flag
timemory::timemory-hip-device Enables HIP support (device code)
timemory::timemory-hip Enables HIP support
timemory::timemory-instrument-functions Adds compiler flags to enable compile-time instrumentation
timemory::timemory-leak-sanitizer Adds compiler flags to enable leak sanitizer (-fsanitize=leak)
timemory::timemory-libunwind Enables libunwind support
timemory::timemory-likwid Enables LIKWID support
timemory::timemory-lto Adds link-time-optimization flags
timemory::timemory-mallocp-library Provides MALLOCP library for tracking memory allocations
timemory::timemory-memory-sanitizer Adds compiler flags to enable memory sanitizer (-fsanitize=memory)
timemory::timemory-mpi Enables MPI support
timemory::timemory-mpip-library Provides MPIP library for MPI performance analysis
timemory::timemory-nccl Enables CUDA NCCL support
timemory::timemory-ncclp-library Provides NCCLP library for NCCL performance analysis
timemory::timemory-no-mpi-init Disables the generation of MPI_Init and MPI_Init_thread symbols
timemory::timemory-null-sanitizer Adds compiler flags to enable null sanitizer (-fsanitize=null)
timemory::timemory-nvml Enables NVML support (NVIDIA)
timemory::timemory-ompt-library Provides OMPT library for OpenMP performance analysis
timemory::timemory-ompt Enables OpenMP-tools support
timemory::timemory-papi-static Enables PAPI support + links to static library
timemory::timemory-papi Enables PAPI support
timemory::timemory-perfetto Enables perfetto support
timemory::timemory-plotting Enables python plotting support (system call)
timemory::timemory-precompiled-headers Provides timemory-headers + precompiles headers if CMAKE_VERSION >= 3.16
timemory::timemory-python Enables python support (embedded interpreter)
timemory::timemory-roofline-options Compiler flags for roofline generation
timemory::timemory-roofline Enables flags and libraries for proper roofline generation
timemory::timemory-sanitizer-compile-options Adds compiler flags for sanitizers
timemory::timemory-sanitizer Adds compiler flags to enable leak sanitizer (-fsanitize=leak)
timemory::timemory-statistics Enables statistics for all components which define TIMEMORY_STATISTICS_TYPE(...)
timemory::timemory-tau Enables TAU support
timemory::timemory-thread-sanitizer Adds compiler flags to enable thread sanitizer (-fsanitize=thread)
timemory::timemory-threading Enables multithreading support
timemory::timemory-undefined-sanitizer Adds compiler flags to enable undefined sanitizer (-fsanitize=undefined)
timemory::timemory-unreachable-sanitizer Adds compiler flags to enable unreachable sanitizer (-fsanitize=unreachable)
timemory::timemory-upcxx Enables UPC++ support
timemory::timemory-vector Adds pre-processor definition of the max vectorization width in bytes
timemory::timemory-vtune Enables VTune support (ittnotify)
timemory::timemory-xml Enables XML serialization support
timemory::timemory-xray Adds compiler flags to enable xray-instrumentation (Clang only)

find_package Approach with COMPONENTS

These libraries can be included in a downstream project via the COMPONENTS or OPTIONAL_COMPONENTS arguments to the CMake find_package command. When the COMPONENTS option is used, the default interface target will be named timemory. Alternatively, one can set the timemory_FIND_COMPONENTS_INTERFACE variable to define a custom interface library name.

When targets are listed after the COMPONENTS argument to find_package, the timemory- prefix can be omitted. Additionally, the link type (shared or static) and language suffixes (c, cxx, cuda) can be listed once and dropped from subsequent items in the list.

timemory will bundle the targets specified after COMPONENTS into one interface library.

# create interface target w/ the components
find_package(timemory REQUIRED COMPONENTS cxx shared compile-options)

# create some library
add_library(foo SHARED foo.cpp)

# import all the compiler defs, flags, linked libs, include paths, etc. from above components
target_link_libraries(foo timemory)

# override the name of INTERFACE library w/ the components
set(timemory_FIND_COMPONENTS_INTERFACE timemory-cuda-extern)

# creates interface library target: timemory-cuda-extern
find_package(timemory REQUIRED COMPONENTS cxx static compile-options cuda cupti)

# create another library
add_library(bar STATIC bar.cpp)

# import all the compiler defs, flags, linked libs, include paths, etc. from above components
target_link_libraries(bar timemory-cuda-extern)

find_package(timemory REQUIRED COMPONENTS headers cxx-shared)

add_executable(foo foo.cpp)
target_link_libraries(foo PRIVATE timemory)

set(timemory_FIND_COMPONENTS_INTERFACE timemory::foo-interface)
find_package(timemory REQUIRED COMPONENTS headers OPTIONAL_COMPONENTS arch papi cuda cupti)

add_executable(foo foo.cpp)
target_link_libraries(foo PRIVATE timemory::foo-interface)

Using Makefiles

Timemory generates a Makefile.timemory.inc during installation. This file is intended for projects which rely on Makefiles. In general, each of the above CMake targets generates LIBS, DEFS, INCLUDE, CFLAGS, CPPFLAGS, CUFLAGS, and DEPENDS variables. For example, timemory::timemory-cxx-shared generates TIMEMORY_CXX_SHARED_LIBS, TIMEMORY_CXX_SHARED_DEFS, TIMEMORY_CXX_SHARED_INCLUDE, TIMEMORY_CXX_SHARED_CFLAGS, TIMEMORY_CXX_SHARED_CPPFLAGS, TIMEMORY_CXX_SHARED_CUFLAGS, and TIMEMORY_CXX_SHARED_DEPENDS. The *_DEPENDS variable lists the targets which the library depends on.

Variable Description
*_DEFS Pre-processor definition flags
*_INCLUDE Include flags
*_LIBS Linker flags
*_CFLAGS C compiler flags
*_CPPFLAGS C++ compiler flags
*_CUFLAGS CUDA compiler flags
*_DEPENDS Internal target dependencies

g++ $(TIMEMORY_PAPI_DEFS) $(TIMEMORY_PAPI_INCLUDE) $(TIMEMORY_PAPI_CPPFLAGS) foo.cpp -o foo $(TIMEMORY_PAPI_LIBS)

Compilation with the Template Interface

It has been noted elsewhere that direct use of the template interface can introduce long compile times. However, this interface is extremely powerful and one might be tempted to use it directly. The 2011 standard of C++ introduced the concept of an extern template, and using this feature is highly recommended whenever the template interface is used. In general, a project using the template interface should have a header which declares the component bundle as an extern template at the end. Here is an example of what this might look like:

#include <timemory/variadic/component_bundle.hpp>
#include <timemory/variadic/auto_bundle.hpp>
#include <timemory/components/types.hpp>
#include <timemory/macros.hpp>

// create an API for your project
TIMEMORY_DEFINE_API(FooBenchmarking)

#if defined(DISABLE_BENCHMARKING)
// this will eliminate all components from the component_bundle or auto_bundle
// with 'api::FooBenchmarking' as the first template parameter
// e.g. bundle<Foo, ...> turns into bundle<Foo> (no components)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, api::FooBenchmarking, false_type)
#endif

// this structure will:
//  - Always record:
//      - wall-clock timer
//      - cpu-clock timer
//      - cpu utilization
//      - Any tools which downstream users inject into the user_global_bundle
//          - E.g. 'user_global_bundle::configure<peak_rss>()'
//  - Optionally enable activating (at runtime):
//      - PAPI hardware counters
//      - GPU kernel tracing
//      - GPU hardware counters
//      - The '*' at the end is what designates the component as optional
#if !defined(FOO_TOOLSET)
#define FOO_TOOLSET                             \
    tim::component_bundle<                      \
        tim::api::FooBenchmarking,              \
        tim::component::wall_clock,             \
        tim::component::cpu_clock,              \
        tim::component::cpu_util,               \
        tim::component::user_global_bundle,     \
        tim::component::papi_vector*,           \
        tim::component::cupti_activity*,        \
        tim::component::cupti_counters*>
#endif

namespace foo
{
namespace benchmark
{
using bundle_t = FOO_TOOLSET;
using auto_bundle_t = typename FOO_TOOLSET::auto_type;
}
}

//  THIS PREVENTS IMPLICIT INSTANTIATION OF THE TEMPLATE IN EVERY
//  TRANSLATION UNIT THAT INCLUDES THIS HEADER, LEADING TO
//  SIGNIFICANTLY REDUCED COMPILE TIMES
#if !defined(FOO_BENCHMARKING_SOURCE)
extern template class FOO_TOOLSET;
#endif
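
The effect of the is_available trait described in the comments above can be illustrated without timemory. The following is a minimal sketch, not timemory's implementation: the sketch namespace, the EnabledApi and DisabledApi tags, and this bundle type are all hypothetical names introduced for illustration. It only shows how a trait on the first template parameter can collapse a variadic bundle to zero components at compile time.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace sketch
{
// Hypothetical API tags and components; none of these are timemory types.
struct EnabledApi {};
struct DisabledApi {};
struct wall_clock {};
struct cpu_clock {};

// Trait that reports every type as available unless specialized otherwise.
template <typename T>
struct is_available : std::true_type {};

// Opt one API tag out, similar in spirit to defining the is_available
// trait as false_type for the project's API type.
template <>
struct is_available<DisabledApi> : std::false_type {};

// The bundle keeps its components only when the API tag is available;
// otherwise its storage collapses to an empty tuple.
template <typename Api, typename... Components>
struct bundle
{
    using data_type =
        typename std::conditional<is_available<Api>::value,
                                  std::tuple<Components...>,
                                  std::tuple<>>::type;

    // Number of components that are actually stored.
    static constexpr std::size_t size()
    {
        return std::tuple_size<data_type>::value;
    }
};
}  // namespace sketch
```

With these definitions, a bundle tagged with EnabledApi and two components stores two components, while the same bundle tagged with DisabledApi stores none, so code using it carries no measurement storage.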

Then, in exactly one source file, define FOO_BENCHMARKING_SOURCE before including the header above and provide the instantiation:

// avoid the extern template declaration
// make sure this is defined before including the header
#define FOO_BENCHMARKING_SOURCE

// include the header with the code from the previous block
#include "/path/to/header/file"

// pull in all the definitions required to instantiate the template
#include <timemory/timemory.hpp>

// provide an instantiation
template class FOO_TOOLSET;
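
The declare-then-instantiate pattern can also be seen in isolation. The sketch below is a single-file illustration under stated assumptions: fixed_sum and the sketch_extern namespace are hypothetical, and in a real project the first part would live in a shared header while the final line would live in exactly one source file. The C++ standard allows an explicit instantiation declaration to be followed by the explicit instantiation definition in the same translation unit, which is what makes this compile as one file.

```cpp
#include <cstddef>

namespace sketch_extern
{
// ---- this part would live in a shared header ----
template <typename T, std::size_t N>
struct fixed_sum
{
    // Defined out of class so that it is not an inline member function.
    static T total(const T (&values)[N]);

    static constexpr std::size_t size() { return N; }
};

template <typename T, std::size_t N>
T fixed_sum<T, N>::total(const T (&values)[N])
{
    T result = T{};
    for(std::size_t i = 0; i < N; ++i)
        result += values[i];
    return result;
}

// Explicit instantiation declaration: translation units that see this
// line do not implicitly instantiate the non-inline members of
// fixed_sum<int, 3> and expect the definition to be provided elsewhere.
extern template struct fixed_sum<int, 3>;

// ---- this part would live in exactly one source file ----
// Explicit instantiation definition: emits the members once.
template struct fixed_sum<int, 3>;
}  // namespace sketch_extern
```

Calling sketch_extern::fixed_sum<int, 3>::total on a three-element int array then links against the single instantiation instead of regenerating it in every translation unit.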

A similar scheme to the above is used extensively inside timemory itself: the source code contains many nearly empty .cpp files whose only line of code is #include "timemory/<some-path>/extern.hpp". These source files are not junk files that were accidentally committed; they are part of the scheme for pre-compiling many of the expensive template instantiations (the templated storage class, in particular). When the .cpp file is compiled, a macro transforms the statement in the header into a template instantiation; when the header is included from anywhere else, the same macro transforms the statement into an extern template declaration. In general, this is how it is implemented:

#
# source/timemory/components/foo/CMakeLists.txt
#
add_library(foo SHARED <OTHER_FILES> extern.cpp)
target_compile_definitions(foo
    #  extern.cpp will be compiled with -DTIMEMORY_FOO_SOURCE
    PRIVATE     TIMEMORY_FOO_SOURCE
    #  When the "foo" target is part of a 'target_link_libraries(...)'
    #  command by another target downstream, CMake will add
    #  -DTIMEMORY_USE_FOO_EXTERN to the compile definitions
    INTERFACE   TIMEMORY_USE_FOO_EXTERN)

//
// source/timemory/components/foo/extern.hpp
//
#if defined(TIMEMORY_FOO_SOURCE)
#   define FOO_EXTERN_TEMPLATE(...) template __VA_ARGS__;
#elif defined(TIMEMORY_USE_FOO_EXTERN)
#   define FOO_EXTERN_TEMPLATE(...) extern template __VA_ARGS__;
#else
#   define FOO_EXTERN_TEMPLATE(...)
#endif

// in header-only mode, the macro makes the code disappear
// (an explicit instantiation needs the class-key, i.e. struct or class)
FOO_EXTERN_TEMPLATE(struct tim::component::base<Foo>)
FOO_EXTERN_TEMPLATE(struct tim::operation::start<Foo>)
FOO_EXTERN_TEMPLATE(struct tim::operation::stop<Foo>)
FOO_EXTERN_TEMPLATE(class tim::storage<Foo>)

//
// source/timemory/components/foo/extern.cpp
//
#include "timemory/components/foo/extern.hpp"
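
The three-way macro switch above can be tried out in a single file. This is a hedged sketch rather than timemory code: the SKETCH_* macros, the sketch_macro namespace, and this storage type are hypothetical stand-ins. The file pretends to be the translation unit that the build compiles with the "source" definition, which is the role extern.cpp plays in the scheme above.

```cpp
#include <cstddef>

// Pretend this file is compiled with -DSKETCH_FOO_SOURCE unless the
// build already selected one of the two modes.
#if !defined(SKETCH_FOO_SOURCE) && !defined(SKETCH_USE_FOO_EXTERN)
#    define SKETCH_FOO_SOURCE
#endif

#if defined(SKETCH_FOO_SOURCE)
// building the library: emit the instantiation
#    define SKETCH_EXTERN_TEMPLATE(...) template __VA_ARGS__;
#elif defined(SKETCH_USE_FOO_EXTERN)
// linking to the library: only declare the instantiation
#    define SKETCH_EXTERN_TEMPLATE(...) extern template __VA_ARGS__;
#else
// header-only mode: the statement disappears
#    define SKETCH_EXTERN_TEMPLATE(...)
#endif

namespace sketch_macro
{
struct Foo {};  // hypothetical component type

template <typename T>
struct storage
{
    // Defined out of class so the extern declaration can suppress it.
    static std::size_t entries();

    static constexpr bool is_precompiled_candidate() { return true; }
};

template <typename T>
std::size_t storage<T>::entries()
{
    return 0;
}

// Expands to 'template struct storage<Foo>;' in this file because
// SKETCH_FOO_SOURCE is defined above.
SKETCH_EXTERN_TEMPLATE(struct storage<Foo>)
}  // namespace sketch_macro
```

The same header text therefore serves all three build modes, and only the compile definitions chosen by the build system decide whether the statement instantiates, declares, or vanishes.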