timem¶
Command-line tool which provides the same capabilities of the UNIX command-line tool time
but extends it to several additional metrics:
Percent CPU utilization
Memory usage
Page faults
Context switches
Read/write bytes
Read/write bytes per second
Hardware counters (when built with PAPI support)
Aggregation of data for multiple MPI processes (when using
timem-mpi
)
Installation¶
General¶
timem
is automatically built whenever TIMEMORY_BUILD_TIMEM=ON
. This option defaults to the value of TIMEMORY_BUILD_TOOLS=ON
on non-Windows systems.
cmake -B /path/to/build -DTIMEMORY_BUILD_TIMEM=ON /path/to/source
Standalone¶
If you want to build and install timem
exclusively without building or installing anything else from timemory,
this is possible by setting TIMEMORY_INSTALL_ALL=OFF
, which removes the CMake target dependency on all
for the install
target (i.e. make install
does not run make all
first). In this situation, since you will have
an “incomplete” build, it is advisable to also suppress installation of the CMake configuration files
(TIMEMORY_INSTALL_CONFIG=OFF
) and the installation of the timemory headers (TIMEMORY_INSTALL_HEADERS=OFF
).
$ cmake -B /path/to/build \
-DCMAKE_INSTALL_PREFIX=/path/to/install
-DTIMEMORY_INSTALL_HEADERS=OFF \
-DTIMEMORY_INSTALL_CONFIG=OFF \
-DTIMEMORY_INSTALL_ALL=OFF \
-DTIMEMORY_BUILD_TIMEM=ON \
/path/to/source
$ cmake --build /path/to/build --target timem
$ cmake --build /path/to/build --target install
Output Examples¶
$ timem ls /
bin boot dev etc home lib lib32 lib64 libx32 media mnt opt proc root run sbin srv sys tmp usr var
[ls]> Measurement totals:
0.001380 sec wall
0.000000 sec user
0.000000 sec sys
0.000000 sec cpu
0.000000 % cpu_util
2.152000 MB peak_rss
0.004096 MB page_rss
0.004096 MB virtual_memory
0 major_page_flts
122 minor_page_flts
0 prio_cxt_swch
1 vol_cxt_swch
0.001116 MB char read
0.912436 MB/sec char read
0.000000 MB bytes read
0.000000 MB/sec bytes read
0.000000 MB char written
0.000000 MB/sec char written
0.000000 MB bytes written
0.000000 MB/sec bytes written
JSON Output¶
$ timem -o sleep -- sleep 2
[sleep]> Outputting 'sleep.json'...
[sleep]> Measurement totals:
1.997019 sec wall
0.000000 sec user
0.000000 sec sys
0.000000 sec cpu
0.000000 % cpu_util
1.912000 MB peak_rss
0.798720 MB page_rss
4.648960 MB virtual_memory
0 major_page_flts
87 minor_page_flts
0 prio_cxt_swch
2 vol_cxt_swch
0.003896 MB char read
0.021460 MB/sec char read
0.000000 MB bytes read
0.000000 MB/sec bytes read
0.000000 MB char written
0.000000 MB/sec char written
0.000000 MB bytes written
0.000000 MB/sec bytes written
Aggregated MPI Output¶
mpirun -n 2 timem-mpi -- sleep 2
[sleep]> Measurement totals (# ranks = 2):
0|> 3.998239 sec wall
0|> 0.000000 sec user
0|> 0.000000 sec sys
0|> 0.000000 sec cpu
0|> 0.000000 % cpu_util
0|> 13.040000 MB peak_rss
0|> 9.879552 MB page_rss
0|> 117.964800 MB virtual_memory
0|> 0 major_page_flts
0|> 187 minor_page_flts
0|> 2 prio_cxt_swch
0|> 4 vol_cxt_swch
0|> 0.007792 MB char read
0|> 0.021438 MB/sec char read
0|> 0.000000 MB bytes read
0|> 0.000000 MB/sec bytes read
0|> 0.000000 MB char written
0|> 0.000000 MB/sec char written
0|> 0.000000 MB bytes written
0|> 0.000000 MB/sec bytes written
Individual MPI Output¶
$ mpirun -n 2 timem-mpi -i -- sleep 2
[sleep]> Measurement totals (# ranks = 2):
0|> 1.997152 sec wall
0|> 0.000000 sec user
0|> 0.000000 sec sys
0|> 0.000000 sec cpu
0|> 0.000000 % cpu_util
0|> 6.488000 MB peak_rss
0|> 4.947968 MB page_rss
0|> 58.994688 MB virtual_memory
0|> 0 major_page_flts
0|> 90 minor_page_flts
0|> 1 prio_cxt_swch
0|> 2 vol_cxt_swch
0|> 0.003896 MB char read
0|> 0.021459 MB/sec char read
0|> 0.000000 MB bytes read
0|> 0.000000 MB/sec bytes read
0|> 0.000000 MB char written
0|> 0.000000 MB/sec char written
0|> 0.000000 MB bytes written
0|> 0.000000 MB/sec bytes written
[sleep]> Measurement totals (# ranks = 2):
1|> 1.996507 sec wall
1|> 0.000000 sec user
1|> 0.000000 sec sys
1|> 0.000000 sec cpu
1|> 0.000000 % cpu_util
1|> 6.484000 MB peak_rss
1|> 0.856064 MB page_rss
1|> 4.648960 MB virtual_memory
1|> 0 major_page_flts
1|> 91 minor_page_flts
1|> 0 prio_cxt_swch
1|> 2 vol_cxt_swch
1|> 0.003896 MB char read
1|> 0.021467 MB/sec char read
1|> 0.000000 MB bytes read
1|> 0.000000 MB/sec bytes read
1|> 0.000000 MB char written
1|> 0.000000 MB/sec char written
1|> 0.000000 MB bytes written
1|> 0.000000 MB/sec bytes written
Options¶
Command Line¶
Standard Executable¶
The timem
and timem-mpi
executation wrappers support simple prefixing of the executable and any arguments
are assumed to belong to the file being executed unless "--"
(two dashes) surrounded by spaces is detected.
If the two stand-alone dashes are detected, these executables assume every argument preceding the
"--"
is an argument to timem
/timem-mpi
and every argument following the two dashes is the command
to execute, e.g. timem <timem-arguments> -- <exe> <exe-arguments>
.
$ timem --help
Usage: ./timem [tim::argparse::argument_parser arguments...] -- <CMD> <ARGS>
Options:
-h, -?, --help Shows this page
--debug Debug output
-v, --verbose Verbose output
-q, --quiet Suppress as much reporting as possible
-d, --sample-delay Set the delay before the sampler starts (seconds)
-f, --sample-freq Set the frequency of the sampler (number of interrupts per second)
--disable-sample Disable UNIX signal-based sampling.
Sampling is the most common culprit for timem hanging (i.e. failing to exit after the child process exits)
-e, --events, --papi-events Set the hardware counter events to record (ref: `timemory-avail -H | grep PAPI`)
--disable-papi Disable hardware counters
-o, --output Write results to JSON output file.
Use:
- '%p' to encode the process ID
- '%j' to encode the slurm job ID
- '%r' to encode the MPI comm rank
- '%s' to encode the MPI comm size
E.g. '-o timem-output-%p'.
If verbosity >= 2 or debugging is enabled, will also write sampling data to log file.
-s, --shell Enable launching command via a shell command (if no arguments, $SHELL is used)
--shell-flags Set the shell flags to use (pass as single string as leading dashes can confuse parser) [default: -i]
MPI-Compatible Executable¶
$ ./timem-mpi --help
Usage: ./timem-mpi [tim::argparse::argument_parser arguments...] -- <CMD> <ARGS>
Options:
-h, -?, --help Shows this page
--debug Debug output
-v, --verbose Verbose output
-q, --quiet Suppress as much reporting as possible
-d, --sample-delay Set the delay before the sampler starts (seconds)
-f, --sample-freq Set the frequency of the sampler (number of interrupts per second)
--disable-sample Disable UNIX signal-based sampling.
Sampling is the most common culprit for timem hanging (i.e. failing to exit after the child process exits)
-e, --events, --papi-events Set the hardware counter events to record (ref: `timemory-avail -H | grep PAPI`)
--disable-papi Disable hardware counters
-o, --output Write results to JSON output file.
Use:
- '%p' to encode the process ID
- '%j' to encode the slurm job ID
- '%r' to encode the MPI comm rank
- '%s' to encode the MPI comm size
E.g. '-o timem-output-%p'.
If verbosity >= 2 or debugging is enabled, will also write sampling data to log file.
-s, --shell Enable launching command via a shell command (if no arguments, $SHELL is used)
--shell-flags Set the shell flags to use (pass as single string as leading dashes can confuse parser) [default: -i]
--mpi Launch processes via MPI_Comm_spawn_multiple (reduced functionality)
--disable-mpi Disable MPI_Finalize
-i, --indiv Output individual results for each process (i.e. rank) instead of reporting the aggregation
The --mpi
option is only recommended if forking MPI processes causes issues.
The processes spawn through this method are not children of the timem-mpi
process and
thus timem-mpi
is more restricted with respect to the data that it can collect for the
spawned processes. The --disable-mpi
option is generally not recommended unless timem-mpi
is hanging and using --disable-sample
does not resolve the issue.
Environment¶
All the standard environment variables provided by timemory-avail -S
are supported (when applicable).
However, these environment variables use TIMEM_
as the prefix to the environment variable instead
of TIMEMORY_
, e.g. instead of using TIMEMORY_PRECISION=4
to set the output precision to 4, use TIMEM_PRECISION=4
.
In addition to the standard environment variables, many of the command-line options are configurable via the following environment variables:
Environment Variable | Default value | Equivalent Command-line Argument |
---|---|---|
TIMEM_OUTPUT |
"" |
-o , --output (with argument) |
TIMEM_USE_SHELL |
"OFF" |
-s , --shell (with no argument) |
TIMEM_SHELL_FLAGS |
"" |
--shell-flags |
TIMEM_SHELL |
$SHELL |
-s , --shell (with argument) |
TIMEM_SAMPLE |
"ON" |
--disable-sample (if value is "OFF" ) |
TIMEM_SAMPLE_FREQ |
5.0 |
-f , --sample-freq |
TIMEM_SAMPLE_DELAY |
1.0e-6 |
-d , --sample-delay |
TIMEM_USE_MPI |
false |
--mpi |
TIMEM_USE_PAPI |
true (when available) |
--disable-papi (if value is "OFF" ) |
Command-line arguments override environment variables, e.g.
TIMEM_OUTPUT=foo timem -o bar -- <CMD>
will outputbar.json
, notfoo.json
.
Customization Demonstration¶
The ability to customize the behavior of several components without altering the components themselves in demonstrated in
timem.hpp
. In this tool, additional type-traits are added, the printing output of a bundle of components is customized,
and new capabilities are introduced:
Introduction of
custom_print
andtimem_sample
operationscustom_print
replaces comma delimiters with new line delimiterstimem_sample
invokes a
Specialization of the
trait::custom_label_printing
type-trait forpapi_array_t
This type-trait communicates to timemory that data labels for the components should not be included when invoking
operator<<
on this component
Specialization of the
sample
,start
, andstop
operations forpapi_array_t
componentThe default behavior of
operation::sample<T>
pushes a measurement to an array in thestorage<T>
However, in this tool, we modify the operation to just update the value of the current object
Specialization of the
operation::base_printer<T>
forread_bytes
andwritten_bytes
componentsThe default behavior of
read_bytes
andwritten_bytes
is to print the bytes read/written + the bytes read/written per second on the same line. This is customized to print on separate lines
Introduction of
timem_tuple
– an extension tocomponent_tuple
which:Reimplements the
component_tuple<T...>::sample()
member function to include thetimem_sample
operationReplaces the
operator<<
to utilize thecustom_print
operation