Add option for device sync at beginning and end of TinyProfiler region (#1505)

## Summary

Setting `tiny_profiler.device_synchronize_around_region = 1` in the inputs file now makes TinyProfiler synchronize the device before calling `nvtxRangePush()` at the start of a region and before calling `nvtxRangePop()` at the end (CUDA builds only). With the option enabled, TINY_PROFILE regions cover the full kernel execution time rather than just the kernel launch time. The option defaults to `0` (off).
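To illustrate why the synchronization matters, here is a minimal, deterministic sketch. It is a toy model, not AMReX or CUDA code: `ToyTimeline`, `region_length`, and the cost values are hypothetical names and numbers introduced only for this example. The host "clock" advances by the launch cost alone, unless a synchronize forces it to wait for the device.

```cpp
#include <algorithm>

// Toy model of a host and an asynchronous device on one timeline
// (arbitrary time units). Hypothetical; not the AMReX or CUDA API.
struct ToyTimeline {
    double host_now = 0.0;           // current host time
    double device_busy_until = 0.0;  // time at which queued kernels finish

    // A launch costs the host only launch_cost. The kernel then runs on
    // the device for kernel_cost, after any previously queued work.
    void launch(double launch_cost, double kernel_cost) {
        host_now += launch_cost;
        device_busy_until = std::max(device_busy_until, host_now) + kernel_cost;
    }

    // Blocks the host until the device has drained its queue.
    void synchronize() {
        host_now = std::max(host_now, device_busy_until);
    }
};

// Returns the host-side length of a region containing one kernel launch.
// With sync_around_region set, the device is synchronized before each
// region boundary is recorded, analogous to synchronizing before
// nvtxRangePush() and nvtxRangePop().
double region_length(ToyTimeline& tl, bool sync_around_region,
                     double launch_cost, double kernel_cost) {
    if (sync_around_region) { tl.synchronize(); }
    const double begin = tl.host_now;  // stands in for nvtxRangePush()

    tl.launch(launch_cost, kernel_cost);

    if (sync_around_region) { tl.synchronize(); }
    const double end = tl.host_now;    // stands in for nvtxRangePop()

    return end - begin;
}
```

In this model, a launch cost of 1 and a kernel cost of 100 give a region length of 1 with the flag off and 101 with it on. The synchronize at the region start also keeps earlier, still-running kernels from being charged to the region.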

## Additional background

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] are described in the proposed changes to the AMReX documentation, if appropriate
maximumcats authored Nov 4, 2020
1 parent 81fdb9a commit db41d3d
Showing 2 changed files with 20 additions and 0 deletions.
1 change: 1 addition & 0 deletions Src/Base/AMReX_TinyProfiler.H
@@ -88,6 +88,7 @@ private:
     static std::deque<std::tuple<double,double,std::string*> > ttstack;
     static std::map<std::string,std::map<std::string, Stats> > statsmap;
     static double t_init;
+    static int device_synchronize_around_region;
 
     static void PrintStats (std::map<std::string,Stats>& regstats, double dt_max);
 };
19 changes: 19 additions & 0 deletions Src/Base/AMReX_TinyProfiler.cpp
@@ -11,6 +11,10 @@
 #include <AMReX_ParallelDescriptor.H>
 #include <AMReX_ParallelReduce.H>
 #include <AMReX_Utility.H>
+#include <AMReX_ParmParse.H>
+#ifdef AMREX_USE_GPU
+#include <AMReX_GpuDevice.H>
+#endif
 #include <AMReX_Print.H>
 
 #ifdef AMREX_USE_CUPTI
@@ -29,6 +33,7 @@ std::vector<std::string> TinyProfiler::regionstack;
 std::deque<std::tuple<double,double,std::string*> > TinyProfiler::ttstack;
 std::map<std::string,std::map<std::string, TinyProfiler::Stats> > TinyProfiler::statsmap;
 double TinyProfiler::t_init = std::numeric_limits<double>::max();
+int TinyProfiler::device_synchronize_around_region = 0;
 
 namespace {
     std::set<std::string> improperly_nested_timers;
@@ -88,6 +93,9 @@ TinyProfiler::start () noexcept
         global_depth = ttstack.size();
 
 #ifdef AMREX_USE_CUDA
+        if (device_synchronize_around_region) {
+            amrex::Gpu::Device::synchronize();
+        }
         nvtxRangePush(fname.c_str());
 #endif

@@ -163,6 +171,9 @@ TinyProfiler::stop () noexcept
             }
 
 #ifdef AMREX_USE_CUDA
+            if (device_synchronize_around_region) {
+                amrex::Gpu::Device::synchronize();
+            }
             nvtxRangePop();
 #endif
         } else {
@@ -231,6 +242,9 @@ TinyProfiler::stop (unsigned boxUintID) noexcept
             }
 
 #ifdef AMREX_USE_CUDA
+            if (device_synchronize_around_region) {
+                amrex::Gpu::Device::synchronize();
+            }
             nvtxRangePop();
 #endif
         } else
@@ -248,6 +262,11 @@ TinyProfiler::Initialize () noexcept
 {
     regionstack.push_back(mainregion);
     t_init = amrex::second();
+
+    {
+        amrex::ParmParse pp("tiny_profiler");
+        pp.query("device_synchronize_around_region", device_synchronize_around_region);
+    }
 }
 
 void
