add warning for DFTRACER_DATA_DIR
- added warning in `example.rst`
- added warning in `api.rst`
rayandrew committed Oct 14, 2024
1 parent 5af14fc commit bb92b6e
Showing 2 changed files with 27 additions and 55 deletions.
35 changes: 7 additions & 28 deletions docs/api.rst
@@ -63,6 +63,10 @@ ENV Variables supported
DFTRACER_LOG_FILE STRING Path to the log file. The process id and app name are appended to the file name.
DFTRACER_DATA_DIR STRING Colon separated paths that will be traced for I/O accesses by profiler.
For tracing all directories use the string "all" (not recommended).
Note: DFTRACER_DATA_DIR acts as a prefix. If both ``/local/scratch`` and
``/local/scratch/data`` are in the list, the order matters—
the last one will override the first. As a result, the first path won’t be traced.
To avoid this, only use ``/local/scratch``.
DFTRACER_INC_METADATA INT Include or exclude metadata (default 0)
DFTRACER_SET_CORE_AFFINITY INT Include or exclude core affinity (default 0).
``DFTRACER_INC_METADATA`` needs to be enabled.
@@ -73,7 +77,7 @@ ENV Variables supported
DFTRACER_DISABLE_STDIO INT Disable automatic binding of STDIO I/O calls (default: 0).
DFTRACER_TRACE_COMPRESSION INT Enable trace compression (default 0).
DFTRACER_DISABLE_TIDS INT Disable tracing of thread ids (default 0).
DFTRACER_WRITE_BUFFER_SIZE INT Set the buffer size for write optimization (default 0). Note: Disabled, as
this does not work for AI workloads that use ``fork`` and ``spawn`` without a clear ``exit``.
Also, it does not work for workloads that use ``exec`` and rewrite the process buffer state.
================================ ====== ===========================================================================
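The prefix behaviour of ``DFTRACER_DATA_DIR`` described in the table can be guarded against before the variable is exported. The following is a minimal sketch, not part of DFTracer: ``minimal_prefixes`` is a hypothetical helper, and it assumes that one path covers another only on a directory boundary (so ``/a`` covers ``/a/b`` but not ``/ab``). It drops every entry that is already covered by a shorter entry in the list.

```python
def _norm(path: str) -> str:
    """Strip trailing slashes so '/a/b/' and '/a/b' compare equal ('/' stays '/')."""
    return path.rstrip("/") or "/"


def _covers(parent: str, child: str) -> bool:
    """True if `child` lies strictly inside the directory `parent`."""
    parent, child = _norm(parent), _norm(child)
    return child != parent and child.startswith(parent.rstrip("/") + "/")


def minimal_prefixes(data_dir: str) -> str:
    """Return a colon-separated list without nested or duplicate entries.

    Hypothetical helper for illustration; the original order of the
    surviving entries is preserved.
    """
    paths = []
    for entry in data_dir.split(":"):
        if entry and _norm(entry) not in paths:
            paths.append(_norm(entry))
    kept = [p for p in paths if not any(_covers(q, p) for q in paths)]
    return ":".join(kept)


print(minimal_prefixes("/local/scratch:/local/scratch/data:/tmp/run"))
# prints /local/scratch:/tmp/run
```

The result can then be exported as ``DFTRACER_DATA_DIR``, which makes the outcome independent of the order in which the paths were listed.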
@@ -86,7 +90,6 @@ This section describes how to use DFTracer for profiling C++ application using C

-----


Include the DFTracer Header for C++
****************************************

@@ -96,8 +99,6 @@ In C or C++ applications, include ``dftracer/dftracer.h``.
#include <dftracer/dftracer.h>
Initialization of DFTracer
****************************************

@@ -111,7 +112,6 @@ Additionally, if users pass nullptr to process_id, then getpid() function would
DFTRACER_CPP_INIT(log_file, data_dirs, process_id);
Finalization of DFTracer
****************************************

@@ -121,8 +121,6 @@ Finalization call to clean DFTracer entries (Optional). If users do not call thi
DFTRACER_CPP_FINI();
Function Profiling
****************************************

@@ -135,7 +133,6 @@ To profile a function, add the wrapper ``DFTRACER_CPP_FUNCTION`` at the start of
sleep(1);
} // DFTRACER_CPP_FUNCTION ends here.
Region Level Profiling for Code blocks
****************************************

@@ -154,7 +151,6 @@ The name of the region should unique within the scope of the function/code block
} // DFTRACER_CPP_REGION ends here implicitly
} // DFTRACER_CPP_FUNCTION ends here.
Region Level Profiling for lines of code
****************************************

@@ -175,7 +171,6 @@ The ``START`` and ``END`` calls should be in the same scope of the function.
} // DFTRACER_CPP_REGION ends here implicitly
} // DFTRACER_CPP_FUNCTION ends here.
---------------------
DFTracer C APIs
---------------------
@@ -184,7 +179,6 @@ This section describes how to use DFTracer for profiling C application using C A

-----


Include the DFTracer Header for C
****************************************

@@ -194,8 +188,6 @@ In C application, include ``dftracer/dftracer.h``.
#include <dftracer/dftracer.h>
Initialization of DFTracer
****************************************

@@ -209,7 +201,6 @@ Additionally, if users pass NULL to process_id, then getpid() function would be
DFTRACER_C_INIT(log_file, data_dirs, process_id);
Finalization of DFTracer
****************************************

@@ -219,7 +210,6 @@ Finalization call to clean DFTracer entries (Optional). If users do not call thi
DFTRACER_C_FINI();
Function Profiling
****************************************

@@ -242,7 +232,6 @@ To profile a function, add the wrapper ``DFTRACER_C_FUNCTION_START`` at the star

For capturing all code branches, every return statement should have a corresponding ``DFTRACER_C_FUNCTION_END`` block within the function.


Region Level Profiling for lines of code
****************************************

@@ -268,9 +257,9 @@ DFTracer C/C++ Function Profiling using GCC
GCC supports function level tracing using ``-finstrument-functions``.
DFTracer supports applications compiled with ``-g -finstrument-functions -Wl,-E -fvisibility=default``.
If the application uses CMake, it can call ``find_package`` and then use the CMake variable ``DFTRACER_FUNCTION_FLAGS`` for compile flags.
This can be applied globally or on a target.

Internally, DFTracer uses ``dladdr`` to resolve symbol names, which works for shared libraries.
For executables or binaries, we store the address and the name which can be used to derive the function name at analysis time.
This can be done using ``nm -D`` or ``readelf -S`` utilities.
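
The analysis-time lookup mentioned above can be sketched as follows. This is only an illustration under stated assumptions: ``parse_nm_output`` and ``resolve`` are hypothetical helpers, the sample text is made up in the three-column layout that ``nm`` prints for defined symbols, and the traced address is assumed to be already relative to the binary (for position-independent executables the load base would have to be subtracted first).

```python
import bisect

# Made-up sample in the "address type name" layout printed by `nm`.
SAMPLE_NM_OUTPUT = """\
                 U printf
0000000000001139 T main
0000000000001180 T compute
"""


def parse_nm_output(text: str) -> dict:
    """Map start address -> symbol name for every defined symbol."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # undefined symbols have no address column
        address, _kind, name = fields
        try:
            table[int(address, 16)] = name
        except ValueError:
            continue  # first column was not a hexadecimal address
    return table


def resolve(address: int, table: dict):
    """Return the symbol with the greatest start address <= `address`, or None."""
    starts = sorted(table)
    index = bisect.bisect_right(starts, address) - 1
    return table[starts[index]] if index >= 0 else None


symbols = parse_nm_output(SAMPLE_NM_OUTPUT)
print(resolve(0x1139, symbols))  # prints main
```

In practice the text would come from running ``nm`` on the traced binary rather than from a literal string.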

@@ -282,7 +271,6 @@ This section describes how to use DFTracer for profiling python applications.

-----


Include the DFTracer module
****************************************

@@ -292,8 +280,6 @@ In C application, include ``dftracer/dftracer.h``.
from dftracer.logger import dftracer
Initialization of DFTracer
****************************************

@@ -307,8 +293,6 @@ Additionally, if users pass -1 to process_id, then getpid() function would be us
dft_logger = dftracer.initialize_log(logfile, data_dir, process_id)
Finalization of DFTracer
****************************************

@@ -318,8 +302,6 @@ Finalization call to clean DFTracer entries (Optional). If users do not call thi
dft_logger.finalize()
Function decorator style profiling
****************************************

@@ -356,7 +338,6 @@ For logging ``__init__`` function within a class, applications can use ``log_ini
For logging ``@staticmethod`` function within a class, applications can use ``log_static`` function.


Iteration/Loop Profiling
****************************************

@@ -370,7 +351,6 @@ For logging every block within a loop, we have an ``dft_fn.iter`` which takes a
for batch in dft_fn.iter(loader.next()):
sleep(1)
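
The general idea of wrapping an iterable so that every loop step is observed can be illustrated with a plain-Python stand-in. This is not DFTracer's implementation, and what ``dft_fn.iter`` actually records per iteration is not shown in this diff; ``timed_iter`` and ``record`` are hypothetical names. The sketch times how long each item takes to be produced and passes the item through unchanged.

```python
import time


def timed_iter(iterable, record):
    """Yield items from `iterable`, appending each fetch duration to `record`."""
    iterator = iter(iterable)
    while True:
        start = time.perf_counter()
        try:
            item = next(iterator)  # time spent producing the next item
        except StopIteration:
            return
        record.append(time.perf_counter() - start)
        yield item


durations = []
batches = [batch for batch in timed_iter(range(3), durations)]
print(batches, len(durations))
```

Because the wrapper is a generator, the loop body stays untouched; only the ``for`` line changes, which matches the usage pattern shown above.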
Context style Profiling
****************************************

@@ -383,7 +363,6 @@ We can also profile a block of code using Python's context managers using ``dft_
sleep(1)
dft.update(step=1)
Custom Profiling
****************************************

47 changes: 20 additions & 27 deletions docs/examples.rst
@@ -70,6 +70,12 @@ Example of running this configurations are:
# Enable profiler
DFTRACER_ENABLE=1
.. warning::

   Note: DFTRACER_DATA_DIR acts as a prefix. If both ``/local/scratch`` and
   ``/local/scratch/data`` are in the list, the order matters:
   the last one will override the first. As a result, the first path won't be traced.
   To avoid this, only use ``/local/scratch``.

LD_PRELOAD Example:
**************************
@@ -109,7 +115,6 @@ Example of running this configurations are:
# Enable profiler
export DFTRACER_ENABLE=1
Hybrid Example:
**************************

@@ -247,7 +252,6 @@ Example of running this configurations are:
# Enable profiler
DFTRACER_ENABLE=1
LD_PRELOAD Example:
**************************

@@ -286,7 +290,6 @@ Example of running this configurations are:
# Enable profiler
export DFTRACER_ENABLE=1
Hybrid Example:
**************************

@@ -356,8 +359,6 @@ Example of running this configurations are:
# Enable profiler
DFTRACER_ENABLE=1
----------------
Python Example
----------------
@@ -407,7 +408,6 @@ Application Level Example:
pool.map(posix_calls, ((2, True),))
log_inst.finalize()
if __name__ == "__main__":
main()
@@ -426,7 +426,6 @@ Example of running this configurations are:
# Enable profiler
DFTRACER_ENABLE=1
LD_PRELOAD Example:
*******************

@@ -480,7 +479,6 @@ Example of running this configurations are:
# Enable profiler
export DFTRACER_ENABLE=1
.. _python-hybrid-mode:

Hybrid Example:
@@ -528,7 +526,6 @@ Hybrid Example:
pool.map(posix_calls, ((2, True),))
log_inst.finalize()
if __name__ == "__main__":
main()
@@ -550,7 +547,6 @@ Example of running this configurations are:
# Enable profiler
DFTRACER_ENABLE=1
----------------------------------------------------------------
Resnet50 with pytorch and torchvision example from ALCF Polaris:
----------------------------------------------------------------
@@ -559,29 +555,29 @@ Create a separate conda environment for the application and install dftracer

.. code-block:: bash
:linenos:
#!/bin/bash +x
set -e
set -x
export MODULEPATH=/soft/modulefiles/conda/:$MODULEPATH
module load 2023-10-04 # This is the latest conda module on Polaris
export ML_ENV=$PWD/PolarisAT/conda-envs/ml_workload_latest_conda_2 # Please change this path accordingly
if [[ -e $ML_ENV ]]; then
conda activate $ML_ENV
else
conda create -p $ML_ENV --clone /soft/datascience/conda/2023-10-04/mconda3/
conda activate $ML_ENV
yes | MPICC="cc -shared -target-accel=nvidia80" pip install --force-reinstall --no-cache-dir --no-binary=mpi4py mpi4py
yes | pip install --no-cache-dir git+https://github.com/hariharan-devarajan/dftracer.git
pip uninstall -y torch horovod
yes | pip install --no-cache-dir horovod
#INSTALL OTHER MISSING FILES
fi
Since ``torchvision.datasets.ImageFolder`` spawns separate Python processes to parallelize data loading in torch, we use the ``HYBRID MODE`` of DFTracer (see
:ref:`Python Hybrid mode <python-hybrid-mode>`), so that the application can use both APP and PRELOAD mode to log I/O from all dynamically spawned processes and function profiling from the application.

The following dftracer code is added to profile the application at the function level.
Note: the log file location for Python-level dftracer events is set inside the Python code, in the ``dftracer.initialize_log()`` call, while the log file location for POSIX or STDIO calls is set in the job script through the environment variable ``DFTRACER_LOG_FILE``.
@@ -615,27 +611,26 @@ Note: dftracer python level log file location is provided inside the python code
# At the end of main function
log_inst.finalize()
Job submission script

.. code-block:: bash
:linenos:
export MODULEPATH=/soft/modulefiles/conda/:$MODULEPATH
module load 2023-10-04
conda activate ./dlio_ml_workloads/PolarisAT/conda-envs/ml_workload_latest_conda
export LD_LIBRARY_PATH=$env_path/lib/:$LD_LIBRARY_PATH
export DFTRACER_LOG_LEVEL=ERROR
export DFTRACER_ENABLE=1
export DFTRACER_INC_METADATA=1
export DFTRACER_INIT=PRELOAD
export DFTRACER_DATA_DIR=./resnet_original_data # Path to the original ResNet-50 dataset
export DFTRACER_LOG_FILE=./dft_fn_posix_level.pfw
LD_PRELOAD=./dlio_ml_workloads/PolarisAT/conda-envs/ml_workload_latest_conda/lib/python*/site-packages/dftracer/lib/libdftracer_preload.so aprun -n 4 -N 4 python resnet_hvd_dlio.py --batch-size 64 --epochs 1 > dft_fn 2>&1
cat *.pfw > combined_logs.pfw # To combine to a single pfw file.
-----------------------
Integrated Applications
@@ -657,5 +652,3 @@ Here, we can see that we can get application level calls (e.g., ``train`` and ``
.. image:: images/tracing/trace.png
:width: 400
:alt: Unet3D applications

