Allow process signals to trigger checkpoint or break behavior #2896

Merged
merged 46 commits into from
Apr 3, 2022
46 commits
23a5724
Beginnings of signal handling machinery
PhilMiller Feb 26, 2022
97da48f
Add tentative logic to make checkpoint call
PhilMiller Feb 28, 2022
aa96c6d
Adapt formatting slightly
PhilMiller Feb 28, 2022
0fd0bb9
Add calls to read signals and set up signal handlers
PhilMiller Mar 3, 2022
65bc0ec
Initialize signal flag array
PhilMiller Mar 3, 2022
3eb85b8
Add parsing of signal names, and fix some whitespace issues
PhilMiller Mar 7, 2022
ef160d7
Skip signal setup on Windows
PhilMiller Mar 7, 2022
d9bf8e5
added checkpoint and break signal inputs to picmi.py
roelof-groenewald Mar 7, 2022
707a3a8
Address initial review requests
PhilMiller Mar 12, 2022
bd27779
Correct comment to match changed code
PhilMiller Mar 12, 2022
bf24eca
Convert maximum signal number to a symbolic name
PhilMiller Mar 12, 2022
6802c00
Always parse signal input, and error out on Windows or wherever it ma…
PhilMiller Mar 12, 2022
98e22e2
Typo fix
PhilMiller Mar 12, 2022
7dde101
Add missing reset of checkpoint signal flag
PhilMiller Mar 12, 2022
ba52cbc
Add reset of break signal, in support of Python or library usage
PhilMiller Mar 12, 2022
827dc23
Test for a configured checkpoint diag when asked to checkpoint on a s…
PhilMiller Mar 14, 2022
7fa6ec4
Fix typo in Linux code path
PhilMiller Mar 15, 2022
ca41283
Clean up MPI support
PhilMiller Mar 15, 2022
9c76199
Use symbolic name for maximum signal number
PhilMiller Mar 15, 2022
64a8b52
Fix unused variable in the no-MPI case
PhilMiller Mar 15, 2022
6c83790
Add missing header inclusions
PhilMiller Mar 15, 2022
9c82613
Switch signal parsing to an enumerated table
PhilMiller Mar 22, 2022
1fd000d
Test signal handling for Linux, not GNU C library
PhilMiller Mar 22, 2022
bf93ea3
Avoid another magic number
PhilMiller Mar 22, 2022
d0ceffc
Update MPI_Ibcast call to match symbolic array length
PhilMiller Mar 22, 2022
2efa4b6
Update loop over signal flags to use symbolic limit
PhilMiller Mar 22, 2022
c464dd4
Match #includes to usage
PhilMiller Mar 22, 2022
1efee65
Add omitted C++ std <atomic> header include
PhilMiller Mar 22, 2022
dc5379c
Guard entire set of signal definitions as *nix-only, not for Windows
PhilMiller Mar 22, 2022
95ce13d
Broaden Windows exclusion to avoid zero-length array that displeases …
PhilMiller Mar 22, 2022
d9befb1
Check return value from sigaction()
PhilMiller Mar 22, 2022
77483d0
Convert conditional calls to Abort() to assertions
PhilMiller Mar 24, 2022
99f2c6d
Move check for platform support to input parsing
PhilMiller Mar 24, 2022
fe6556c
Shift signal handling code over toward ABLASTR to share with ImpactX …
PhilMiller Mar 24, 2022
6c17d43
Minor cleanup
PhilMiller Mar 24, 2022
a4bb270
A bit more cleanup
PhilMiller Mar 24, 2022
a739857
Fix formatting nits
PhilMiller Mar 24, 2022
b98a1ef
Add AMReX error handling on MPI calls
PhilMiller Mar 24, 2022
6d0e282
Add ABLASTR signal handling code to GNU makefile too
PhilMiller Mar 25, 2022
493aa8c
Document new input parameters
PhilMiller Mar 25, 2022
74edd72
Use ABLASTR assertion macros in ABLASTR code
PhilMiller Mar 29, 2022
cfb49de
Convert requests limit value to a requests array size
PhilMiller Mar 29, 2022
37f076f
Generalize signal handling to an arbitrary set of potential actions
PhilMiller Mar 29, 2022
23a58f5
Rename class to match usage and file name
PhilMiller Mar 29, 2022
d408a5d
Stick stuff in ABLASTR namespace
PhilMiller Mar 29, 2022
5a4c568
Indent conditional includes as requested
PhilMiller Mar 29, 2022
9 changes: 9 additions & 0 deletions Docs/source/usage/parameters.rst
@@ -64,6 +64,15 @@ Overall simulation parameters
It is mainly intended for debug purposes, and is best used with
``warpx.always_warn_immediately=1``.

* ``warpx.break_signals`` (array of `string`, separated by spaces) optional
A list of signal names or numbers that the simulation should
handle by cleanly terminating at the next timestep.

* ``warpx.checkpoint_signals`` (array of `string`, separated by spaces) optional
A list of signal names or numbers that the simulation should
handle by outputting a checkpoint at the next timestep. A
diagnostic of type `checkpoint` must be configured.

* ``warpx.random_seed`` (`string` or `int` > 0) optional
If provided ``warpx.random_seed = random``, the random seed will be determined
using `std::random_device` and `std::clock()`,
6 changes: 6 additions & 0 deletions Python/pywarpx/picmi.py
@@ -981,6 +981,9 @@ def init(self, kw):
self.collisions = kw.pop('warpx_collisions', None)
self.embedded_boundary = kw.pop('warpx_embedded_boundary', None)

self.break_signals = kw.pop('warpx_break_signals', None)
self.checkpoint_signals = kw.pop('warpx_checkpoint_signals', None)

self.inputs_initialized = False
self.warpx_initialized = False

@@ -1019,6 +1022,9 @@ def initialize_inputs(self):

pywarpx.amr.check_input = self.amr_check_input

pywarpx.warpx.break_signals = self.break_signals
pywarpx.warpx.checkpoint_signals = self.checkpoint_signals

particle_shape = self.particle_shape
for s in self.species:
if s.particle_shape is not None:
27 changes: 26 additions & 1 deletion Source/Evolve/WarpXEvolve.cpp
@@ -33,6 +33,8 @@
#include "Utils/WarpXProfilerWrapper.H"
#include "Utils/WarpXUtil.H"

#include <ablastr/utils/SignalHandling.H>

#include <AMReX.H>
#include <AMReX_Array.H>
#include <AMReX_BLassert.H>
@@ -53,6 +55,7 @@
#include <vector>

using namespace amrex;
using ablastr::utils::SignalHandling;

void
WarpX::Evolve (int numsteps)
@@ -79,6 +82,8 @@ WarpX::Evolve (int numsteps)
WARPX_PROFILE("WarpX::Evolve::step");
Real evolve_time_beg_step = amrex::second();

CheckSignals();

multi_diags->NewIteration();

// Start loop on time steps
@@ -342,6 +347,8 @@ WarpX::Evolve (int numsteps)
Real evolve_time_end_step = amrex::second();
evolve_time += evolve_time_end_step - evolve_time_beg_step;

HandleSignals();

if (verbose) {
amrex::Print()<< "STEP " << step+1 << " ends." << " TIME = " << cur_time
<< " DT = " << dt[0] << "\n";
@@ -350,7 +357,7 @@ WarpX::Evolve (int numsteps)
<< " s; Avg. per step = " << evolve_time/(step-step_begin+1) << " s\n";
}

- if (cur_time >= stop_time - 1.e-3*dt[0]) {
+ if (cur_time >= stop_time - 1.e-3*dt[0] || SignalHandling::TestAndResetActionRequestFlag(SignalHandling::SIGNAL_REQUESTS_BREAK)) {
break;
}

@@ -929,3 +936,21 @@ WarpX::applyMirrors(Real time){
}
}
}

void
WarpX::CheckSignals()
{
SignalHandling::CheckSignals();
}

void
WarpX::HandleSignals()
{
SignalHandling::WaitSignals();

// SIGNAL_REQUESTS_BREAK is handled directly in WarpX::Evolve

if (SignalHandling::TestAndResetActionRequestFlag(SignalHandling::SIGNAL_REQUESTS_CHECKPOINT)) {
multi_diags->FilterComputePackFlushLastTimestep( istep[0] );
}
}
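
The control flow added to `WarpX::Evolve` above (a flag set asynchronously, then tested and cleared only at a step boundary) can be tried outside WarpX with a minimal single-process sketch. Every name here (`g_break_requested`, `on_signal`, `test_and_reset`, `run_steps`) is a hypothetical stand-in rather than ABLASTR code, the MPI broadcast is left out, and a POSIX-like platform that defines `SIGUSR1` is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Hypothetical stand-in for a per-action request flag; not WarpX code.
static std::atomic<bool> g_break_requested{false};

// The handler only stores to an atomic flag (lock-free on mainstream
// platforms), which keeps it safe to run at an arbitrary point in the program.
extern "C" void on_signal (int /*signal_number*/)
{
    g_break_requested.store(true);
}

// Return the current flag value and clear it, so a later run starts clean.
bool test_and_reset ()
{
    return g_break_requested.exchange(false);
}

// Run up to max_steps steps and return how many completed. A signal is
// raised during step raise_at (pass -1 for "never") to simulate `kill -USR1`.
int run_steps (int max_steps, int raise_at)
{
    assert(max_steps >= 0);
    std::signal(SIGUSR1, on_signal);

    int completed = 0;
    for (int step = 0; step < max_steps; ++step) {
        if (step == raise_at) { std::raise(SIGUSR1); }
        ++completed;                      // the work of this step
        if (test_and_reset()) { break; }  // tested only at the step boundary
    }
    return completed;
}
```

With this sketch, `run_steps(10, 3)` finishes the step in which the signal arrived and then stops, which is the "terminate at the next timestep" behavior the documentation describes. Clearing the flag on read matters for Python or library usage, where `Evolve` may be called again in the same process.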
5 changes: 5 additions & 0 deletions Source/WarpX.H
@@ -951,6 +951,11 @@ private:
// Singleton is used when the code is run from python
static WarpX* m_instance;

//! Check and clear signal flags and asynchronously broadcast them from process 0
static void CheckSignals ();
//! Complete the asynchronous broadcast of signal flags, and initiate a checkpoint if requested
void HandleSignals ();

///
/// Advance the simulation by numsteps steps, electromagnetic case.
///
48 changes: 48 additions & 0 deletions Source/WarpX.cpp
@@ -38,6 +38,8 @@
#include "Utils/WarpXProfilerWrapper.H"
#include "Utils/WarpXUtil.H"

#include <ablastr/utils/SignalHandling.H>

#ifdef AMREX_USE_SENSEI_INSITU
# include <AMReX_AmrMeshInSituBridge.H>
#endif
@@ -233,6 +235,8 @@ WarpX::WarpX ()

InitEB();

ablastr::utils::SignalHandling::InitSignalHandling();

// Geometry on all levels has been defined already.
// No valid BoxArray and DistributionMapping have been defined.
// But the arrays for them have been resized.
@@ -555,6 +559,50 @@ WarpX::ReadParameters ()
}
}

using ablastr::utils::SignalHandling;
std::vector<std::string> signals_in;
pp_warpx.queryarr("break_signals", signals_in);

#if defined(__linux__) || defined(__APPLE__)
for (const std::string &str : signals_in) {
int sig = SignalHandling::parseSignalNameToNumber(str);
SignalHandling::signal_conf_requests[SignalHandling::SIGNAL_REQUESTS_BREAK][sig] = true;
}
signals_in.clear();
#else
WARPX_ALWAYS_ASSERT_WITH_MESSAGE(signals_in.empty(),
"Signal handling requested in input, but is not supported on this platform");
#endif

bool have_checkpoint_diagnostic = false;

ParmParse pp("diagnostics");
std::vector<std::string> diags_names;
pp.queryarr("diags_names", diags_names);

for (const auto &diag : diags_names) {
ParmParse dd(diag);
std::string format;
dd.query("format", format);
if (format == "checkpoint") {
have_checkpoint_diagnostic = true;
break;
}
}

pp_warpx.queryarr("checkpoint_signals", signals_in);
#if defined(__linux__) || defined(__APPLE__)
for (const std::string &str : signals_in) {
int sig = SignalHandling::parseSignalNameToNumber(str);
SignalHandling::signal_conf_requests[SignalHandling::SIGNAL_REQUESTS_CHECKPOINT][sig] = true;
WARPX_ALWAYS_ASSERT_WITH_MESSAGE(have_checkpoint_diagnostic,
"Signal handling was requested to checkpoint, but no checkpoint diagnostic is configured");
}
#else
WARPX_ALWAYS_ASSERT_WITH_MESSAGE(signals_in.empty(),
"Signal handling requested in input, but is not supported on this platform");
#endif

// set random seed
std::string random_seed = "default";
pp_warpx.query("random_seed", random_seed);
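
`parseSignalNameToNumber` is defined in `SignalHandling.cpp`, which this diff view did not load. As a rough illustration of what accepting "signal names or numbers" can involve, the sketch below uses a deliberately short hand-written table. The names `parse_signal`, `request_action`, `kSignalTable` and `kNumSignals` are hypothetical, and accepting both `USR1` and `SIGUSR1` spellings is an assumption, not something this PR confirms.

```cpp
#include <cassert>
#include <csignal>
#include <stdexcept>
#include <string>

constexpr int kNumSignals = 32; // mirrors NUM_SIGNALS in SignalHandling.H

struct SignalName { const char* name; int number; };

// Illustrative subset only; POSIX defines many more signals.
static const SignalName kSignalTable[] = {
    {"HUP", SIGHUP}, {"INT", SIGINT}, {"TERM", SIGTERM},
    {"USR1", SIGUSR1}, {"USR2", SIGUSR2},
};

// Accepts "USR1", "SIGUSR1", or a decimal number such as "10".
// Throws std::invalid_argument for anything else.
int parse_signal (const std::string& str)
{
    if (str.empty()) { throw std::invalid_argument("empty signal name"); }

    const bool all_digits = str.find_first_not_of("0123456789") == std::string::npos;
    if (all_digits) {
        // Valid numbers have at most two digits, so longer strings are rejected
        // without calling std::stoi (which could overflow).
        const int n = (str.size() <= 2) ? std::stoi(str) : kNumSignals;
        if (n < 1 || n >= kNumSignals) {
            throw std::invalid_argument("signal number out of range: " + str);
        }
        return n;
    }

    // Strip an optional "SIG" prefix, then look the name up in the table.
    const std::string bare = (str.rfind("SIG", 0) == 0) ? str.substr(3) : str;
    for (const SignalName& entry : kSignalTable) {
        if (bare == entry.name) { return entry.number; }
    }
    throw std::invalid_argument("unknown signal: " + str);
}

// Mark one signal as requesting an action, as the loops in ReadParameters do.
void request_action (bool (&conf)[kNumSignals], const std::string& str)
{
    const int sig = parse_signal(str);
    assert(sig >= 1 && sig < kNumSignals);
    conf[sig] = true;
}
```

Range-checking the parsed number before it is used as an array index is the important part: the configuration table in the header is a fixed-size array, so an unchecked value from the input file would write out of bounds.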
1 change: 1 addition & 0 deletions Source/ablastr/utils/CMakeLists.txt
@@ -1,4 +1,5 @@
target_sources(ablastr
PRIVATE
TextMsg.cpp
SignalHandling.cpp
)
2 changes: 1 addition & 1 deletion Source/ablastr/utils/Make.package
@@ -1,3 +1,3 @@
- CEXE_sources += TextMsg.cpp
+ CEXE_sources += TextMsg.cpp SignalHandling.cpp

VPATH_LOCATIONS += $(WARPX_HOME)/Source/ablastr/utils
88 changes: 88 additions & 0 deletions Source/ablastr/utils/SignalHandling.H
@@ -0,0 +1,88 @@
/* Copyright 2022 Philip Miller
*
* This file is part of WarpX.
*
* License: BSD-3-Clause-LBNL
*/

#ifndef ABLASTR_SIGNAL_HANDLING_H_
#define ABLASTR_SIGNAL_HANDLING_H_

#include <AMReX_Config.H>

#if defined(AMREX_USE_MPI)
# include <mpi.h>
#endif

#include <atomic>
#include <string>

namespace ablastr::utils {

/**
* \brief
* Signal handling
*
* Rank 0 will accept signals and asynchronously broadcast the
* configured response; other processes will ignore them and
* follow the lead of rank 0 to avoid potential for deadlocks or
* timestep-skewed response.
*
* Variables and functions are static rather than per-instance
* because signal handlers are configured at the process level.
*/
class SignalHandling
{
public:
//! The range of signal values to accept
static constexpr int NUM_SIGNALS = 32;

//! Labels for indexed positions in signal_conf_requests and signal_actions_requested
enum signal_action_requested_labels {
//! Cleanly stop execution, as if the simulation reached its configured end
SIGNAL_REQUESTS_BREAK = 0,
//! Produce a checkpoint
SIGNAL_REQUESTS_CHECKPOINT = 1,
SIGNAL_REQUESTS_SIZE = 2 // This should always be 1 greater than the last valid value
};

//! Whether configuration requests the code take a particular action on a particular signal
static bool signal_conf_requests[SIGNAL_REQUESTS_SIZE][NUM_SIGNALS];

//! Take a string and convert it to a corresponding signal number if possible
static int parseSignalNameToNumber (const std::string &str);

//! Set up signal handlers based on input configuration provided in `signal_conf_requests`
static void InitSignalHandling ();

//! Check and clear signal flags and asynchronously broadcast them from process 0
static void CheckSignals ();
//! Complete the asynchronous broadcast of signal flags
static void WaitSignals ();

//! Check whether a given action has been requested, and reset the associated flag
static bool TestAndResetActionRequestFlag (int action_to_test);

private:
//! On process 0, whether a given signal has been received since the last check
static std::atomic<bool> signal_received_flags[NUM_SIGNALS];

#if defined(AMREX_USE_MPI)
//! MPI request for the asynchronous broadcast of the signal-requested actions
static MPI_Request signal_mpi_ibcast_request;
#endif

//! Signal handler to set flags on process 0 (other processes ignore configured signals)
static void SignalSetFlag (int signal_number);

//! Boolean flags transmitted between CheckSignals() and
//! WaitSignals() to indicate actions requested by signals
static bool signal_actions_requested[SIGNAL_REQUESTS_SIZE];

// Don't allow clients to incorrectly try to construct and use an instance of this type
SignalHandling() = delete;
};

} // namespace ablastr::utils

#endif // ABLASTR_SIGNAL_HANDLING_H_
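
How the members declared in this header might fit together in a single process is sketched below under stated assumptions: per-signal `std::atomic<bool>` flags are set by a handler installed with POSIX `sigaction`, and a later check folds them through the configuration table into per-action requests. Everything in the `sketch` namespace is hypothetical, the MPI broadcast from rank 0 is omitted, and this is not the content of `SignalHandling.cpp`.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h> // POSIX sigaction, sigemptyset, raise

namespace sketch {

constexpr int NUM_SIGNALS = 32;
enum Action { BREAK = 0, CHECKPOINT = 1, NUM_ACTIONS = 2 };

bool conf_requests[NUM_ACTIONS][NUM_SIGNALS] = {}; // filled while parsing input
std::atomic<bool> received[NUM_SIGNALS];            // static storage, starts false
bool actions_requested[NUM_ACTIONS] = {};

// Signal handler: record the signal and return immediately.
extern "C" void set_flag (int sig) { received[sig] = true; }

// Install set_flag for every signal that at least one action is configured for.
void install_handlers ()
{
    for (int sig = 1; sig < NUM_SIGNALS; ++sig) {
        bool wanted = false;
        for (int a = 0; a < NUM_ACTIONS; ++a) { wanted = wanted || conf_requests[a][sig]; }
        if (!wanted) { continue; }

        struct sigaction sa = {};
        sa.sa_handler = set_flag;
        sigemptyset(&sa.sa_mask);
        const int result = sigaction(sig, &sa, nullptr);
        assert(result == 0); // fails for signals that cannot be caught, e.g. SIGKILL
        (void)result;
    }
}

// Clear the received-signal flags and record which actions they ask for.
void check_signals ()
{
    for (int sig = 0; sig < NUM_SIGNALS; ++sig) {
        if (!received[sig].exchange(false)) { continue; }
        for (int a = 0; a < NUM_ACTIONS; ++a) {
            actions_requested[a] = actions_requested[a] || conf_requests[a][sig];
        }
    }
}

// Report whether an action was requested, and reset its flag.
bool test_and_reset (int action)
{
    assert(0 <= action && action < NUM_ACTIONS);
    const bool requested = actions_requested[action];
    actions_requested[action] = false;
    return requested;
}

} // namespace sketch
```

Keeping the handler down to a single atomic store and doing the table lookup later, on the normal control path, is what lets the step loop decide when to act. In the real class the per-action flags are additionally broadcast from rank 0 so that all processes act on the same timestep.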