Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce TritonService and workflows #32576

Merged
merged 31 commits into from
Feb 4, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
974cf5f
numerous improvements to triton script: in path, container retry, tem…
kpedro88 Dec 8, 2020
59c5ac2
move client construction into beginJob/beginStream
kpedro88 Dec 8, 2020
f4ffb8e
introduce TritonService to keep track of servers and models
kpedro88 Dec 10, 2020
3983d32
keep list of unserved models and launch fallback server if enabled
kpedro88 Dec 10, 2020
22040d3
code format
kpedro88 Dec 10, 2020
6374f35
add special workflow for SonicTriton w/ TritonService enabled automat…
kpedro88 Dec 10, 2020
0232dc0
update docs
kpedro88 Dec 10, 2020
3d75f3e
add unit tests for all Triton module types
kpedro88 Dec 22, 2020
4cd5c91
code format
kpedro88 Dec 22, 2020
ac3d194
minor doc update
kpedro88 Dec 22, 2020
5c7b0a1
improve dryrun
kpedro88 Dec 29, 2020
51bbe5f
fix default
kpedro88 Dec 29, 2020
c2e5109
use unordered_map instead of unordered_set
kpedro88 Dec 30, 2020
0557185
handle module construction and destruction, assume serial
kpedro88 Dec 30, 2020
a2addb9
use popen and uuid
kpedro88 Dec 30, 2020
4f6345d
rename triton -> cmsTriton
kpedro88 Dec 30, 2020
7a73dca
automate fallback server shutdown, handle more error codes
kpedro88 Jan 11, 2021
d540930
simplify and improve shutdown automation
kpedro88 Jan 12, 2021
1a3abbc
assume gat model always available
kpedro88 Jan 13, 2021
18222ba
port checking to enable multiple servers per node, bug fixes (handle …
kpedro88 Jan 13, 2021
e5825fa
use GlobalIdentifier
kpedro88 Jan 14, 2021
c366b15
retry if port in use to avoid race condition, fix some bugs in stopping
kpedro88 Jan 14, 2021
c57021a
increase verbosity of unit test
kpedro88 Jan 18, 2021
b289164
prevent segfault
kpedro88 Jan 19, 2021
c5832fc
look in correct directory
kpedro88 Jan 19, 2021
fbeacab
improvements to unit test
kpedro88 Jan 20, 2021
daa7238
test unprivileged user namespace
kpedro88 Jan 20, 2021
dbe6b0f
bug fix
kpedro88 Jan 20, 2021
c3d4085
add another unprivileged check
kpedro88 Jan 21, 2021
4ad2933
mitigate instability in auto_stop, detect Socket closed in unit test,…
kpedro88 Jan 28, 2021
b2f4702
fix unit test script
kpedro88 Jan 29, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions Configuration/ProcessModifiers/python/allSonicTriton_cff.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
import FWCore.ParameterSet.Config as cms

from Configuration.ProcessModifiers.enableSonicTriton_cff import enableSonicTriton

# collect all SonicTriton-related process modifiers here
allSonicTriton = cms.ModifierChain(enableSonicTriton)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #32272 a customize function was preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not prevent the use of customization functions where desired. However, there may be other modifiers required as well (e.g. the mlpf modifier in #32048). The goal here is to facilitate a single Triton-enabled special workflow, with all available producers included. (Customization functions can also be included in the workflow definition directly, or via makeProcessModifier().)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would the Modifier be used then? Within production configuration fragments, non-production configurations fragments, customize functions, something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, this allSonicTriton modifier is just intended to simplify the special workflow setup in Configuration/PyReleaseValidation/python/upgradeWorkflowComponents.py.

I could envision a similar aggregating customize function like customizeAllSonicTriton to aggregate any and all customize functions, and then the special workflow would just need that one function. That isn't introduced here because there's nothing to test it with (yet).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused by the purpose of these two Modifiers (now). In #32272 the motivation to favor customize functions was to avoid putting customization for "R&D and testing" into production configuration fragments. If the behavior is customized with a function, I don't see the added value of centrally defined Modifiers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:
#32048 introduces a modifier mlpf† to activate the ML ParticleFlow. There is also a SonicTriton version of the MLPF producer that will be introduced. Even if that producer is activated via a customize function, the mlpf modifier will still be needed to include it in a workflow. Therefore, it is worthwhile to be able to aggregate multiple relevant modifiers for a SonicTriton-enabled workflow.

† because this "policy" of using customize functions for R&D was just invented last month to placate certain partisans and is rarely applied anywhere else, but I digress.

Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
import FWCore.ParameterSet.Config as cms

enableSonicTriton = cms.Modifier()
Original file line number Diff line number Diff line change
Expand Up @@ -919,6 +919,41 @@ def condition(self, fragment, stepList, key, hasHarvest):
)
upgradeWFs['DD4hep'].allowReuse = False

class UpgradeWorkflow_SonicTriton(UpgradeWorkflow):
def setup_(self, step, stepName, stepDict, k, properties):
stepDict[stepName][k] = merge([{'--procModifiers': 'allSonicTriton'}, stepDict[step][k]])
def condition(self, fragment, stepList, key, hasHarvest):
return (fragment=='TTbar_13' and '2021' in key) \
or (fragment=='TTbar_14TeV' and '2026' in key)
upgradeWFs['SonicTriton'] = UpgradeWorkflow_SonicTriton(
steps = [
'GenSim',
'GenSimHLBeamSpot',
'GenSimHLBeamSpot14',
'Digi',
'DigiTrigger',
'Reco',
'RecoGlobal',
'HARVEST',
'HARVESTGlobal',
'ALCA',
],
PU = [
'GenSim',
'GenSimHLBeamSpot',
'GenSimHLBeamSpot14',
'Digi',
'DigiTrigger',
'Reco',
'RecoGlobal',
'HARVEST',
'HARVESTGlobal',
'ALCA',
],
suffix = '_SonicTriton',
offset = 0.9001,
)

# check for duplicate offsets
offsets = [specialWF.offset for specialType,specialWF in six.iteritems(upgradeWFs)]
seen = set()
Expand Down
6 changes: 6 additions & 0 deletions Configuration/StandardSequences/python/Services_cff.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,9 @@ def _addCUDAServices(process):

from Configuration.ProcessModifiers.gpu_cff import gpu
modifyConfigurationStandardSequencesServicesAddCUDAServices_ = gpu.makeProcessModifier(_addCUDAServices)

# load TritonService when SONIC workflow is enabled
def _addTritonService(process):
process.load("HeterogeneousCore.SonicTriton.TritonService_cff")
from Configuration.ProcessModifiers.enableSonicTriton_cff import enableSonicTriton
modifyConfigurationStandardSequencesServicesAddTritonService_ = enableSonicTriton.makeProcessModifier(_addTritonService)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle loading of TrigonService_cff could be done in the customize function, but to me doing it here with a Modifier does not make a big difference. (and now it demonstrates how "adding more Services like these" would look like)

(this is my last comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep it here if that's acceptable.

17 changes: 12 additions & 5 deletions HeterogeneousCore/SonicCore/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,8 @@ To implement a concrete derived producer class, the following skeleton can be us
class MyProducer : public SonicEDProducer<Client>
{
public:
explicit MyProducer(edm::ParameterSet const& cfg) : SonicEDProducer<Client>(cfg) {
//for debugging
setDebugName("MyProducer");
explicit MyProducer(edm::ParameterSet const& cfg) : SonicEDProducer<Client>(cfg, "MyProducer") {
//do any necessary operations
}
void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, Input& iInput) override {
//convert event data to client input format
Expand Down Expand Up @@ -65,7 +64,7 @@ To add a new communication protocol for SONIC, follow these steps:
2. Set up the concrete client(s) that use the communication protocol in a new package in the `HeterogeneousCore` subsystem
3. Add a test producer (see above) to make sure it works

To implement a concrete client, the following skeleton can be used for the `.h` file, with the function implementations in an associated `.cc` file:
To implement a concrete client, the following skeleton can be used for the `.h` file:
```cpp
#ifndef HeterogeneousCore_MyPackage_MyClient
#define HeterogeneousCore_MyPackage_MyClient
Expand All @@ -75,7 +74,7 @@ To implement a concrete client, the following skeleton can be used for the `.h`

class MyClient : public SonicClient<Input,Output> {
public:
MyClient(const edm::ParameterSet& params);
MyClient(const edm::ParameterSet& params, const std::string& debugName);

static void fillPSetDescription(edm::ParameterSetDescription& iDesc);

Expand All @@ -86,6 +85,14 @@ protected:
#endif
```

The concrete client member function implementations, in an associated `.cc` file, should include the following:
```cpp
MyClient::MyClient(const edm::ParameterSet& params, const std::string& debugName)
: SonicClient(params, debugName, "MyClient") {
//do any necessary operations
}
```

The `SonicClient` has three available modes:
* `Sync`: synchronous call, blocks until the result is returned.
* `Async`: asynchronous, non-blocking call.
Expand Down
26 changes: 18 additions & 8 deletions HeterogeneousCore/SonicCore/interface/SonicAcquirer.h
Original file line number Diff line number Diff line change
Expand Up @@ -8,33 +8,43 @@
#include "HeterogeneousCore/SonicCore/interface/sonic_utils.h"

#include <chrono>
#include <memory>
#include <string>

template <typename Client, typename Module>
class SonicAcquirer : public Module {
public:
//typedef to simplify usage
typedef typename Client::Input Input;
//constructor
SonicAcquirer(edm::ParameterSet const& cfg) : client_(cfg.getParameter<edm::ParameterSet>("Client")) {}
SonicAcquirer(edm::ParameterSet const& cfg, const std::string& debugName = "")
: clientPset_(cfg.getParameterSet("Client")), debugName_(debugName) {}
//destructor
~SonicAcquirer() override = default;

//derived classes use a dedicated acquire() interface that incorporates client_.input()
//construct client at beginning of job
//in case client constructor depends on operations happening in derived module constructors
void beginStream(edm::StreamID) override { makeClient(); }

//derived classes use a dedicated acquire() interface that incorporates client_->input()
//(no need to interact with callback holder)
void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, edm::WaitingTaskWithArenaHolder holder) final {
auto t0 = std::chrono::high_resolution_clock::now();
acquire(iEvent, iSetup, client_.input());
sonic_utils::printDebugTime(client_.debugName(), "acquire() time: ", t0);
acquire(iEvent, iSetup, client_->input());
sonic_utils::printDebugTime(debugName_, "acquire() time: ", t0);
t_dispatch_ = std::chrono::high_resolution_clock::now();
client_.dispatch(holder);
client_->dispatch(holder);
}
virtual void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, Input& iInput) = 0;

protected:
//for debugging
void setDebugName(const std::string& debugName) { client_.setDebugName(debugName); }
//helper
void makeClient() { client_ = std::make_unique<Client>(clientPset_, debugName_); }

//members
Client client_;
edm::ParameterSet clientPset_;
std::unique_ptr<Client> client_;
std::string debugName_;
std::chrono::time_point<std::chrono::high_resolution_clock> t_dispatch_;
};

Expand Down
3 changes: 2 additions & 1 deletion HeterogeneousCore/SonicCore/interface/SonicClient.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@ template <typename InputT, typename OutputT = InputT>
class SonicClient : public SonicClientBase, public SonicClientTypes<InputT, OutputT> {
public:
//constructor
SonicClient(const edm::ParameterSet& params) : SonicClientBase(params), SonicClientTypes<InputT, OutputT>() {}
SonicClient(const edm::ParameterSet& params, const std::string& debugName, const std::string& clientName)
: SonicClientBase(params, debugName, clientName), SonicClientTypes<InputT, OutputT>() {}
};

#endif
7 changes: 4 additions & 3 deletions HeterogeneousCore/SonicCore/interface/SonicClientBase.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,12 +19,11 @@ enum class SonicMode { Sync = 1, Async = 2, PseudoAsync = 3 };
class SonicClientBase {
public:
//constructor
SonicClientBase(const edm::ParameterSet& params);
SonicClientBase(const edm::ParameterSet& params, const std::string& debugName, const std::string& clientName);

//destructor
virtual ~SonicClientBase() = default;

void setDebugName(const std::string& debugName);
const std::string& debugName() const { return debugName_; }
const std::string& clientName() const { return clientName_; }
SonicMode mode() const { return mode_; }
Expand All @@ -42,6 +41,8 @@ class SonicClientBase {
static void fillBasePSetDescription(edm::ParameterSetDescription& desc, bool allowRetry = true);

protected:
void setMode(SonicMode mode);

virtual void evaluate() = 0;

void start(edm::WaitingTaskWithArenaHolder holder);
Expand All @@ -57,7 +58,7 @@ class SonicClientBase {
std::optional<edm::WaitingTaskWithArenaHolder> holder_;

//for logging/debugging
std::string clientName_, debugName_, fullDebugName_;
std::string debugName_, clientName_, fullDebugName_;
std::chrono::time_point<std::chrono::high_resolution_clock> t0_;

friend class SonicDispatcher;
Expand Down
14 changes: 7 additions & 7 deletions HeterogeneousCore/SonicCore/interface/SonicEDFilter.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,22 +16,22 @@ class SonicEDFilter : public SonicAcquirer<Client, edm::stream::EDFilter<edm::Ex
//typedef to simplify usage
typedef typename Client::Output Output;
//constructor
SonicEDFilter(edm::ParameterSet const& cfg)
: SonicAcquirer<Client, edm::stream::EDFilter<edm::ExternalWork, Capabilities...>>(cfg) {}
SonicEDFilter(edm::ParameterSet const& cfg, const std::string& debugName)
: SonicAcquirer<Client, edm::stream::EDFilter<edm::ExternalWork, Capabilities...>>(cfg, debugName) {}
//destructor
~SonicEDFilter() override = default;

//derived classes use a dedicated produce() interface that incorporates client_.output()
//derived classes use a dedicated produce() interface that incorporates client_->output()
bool filter(edm::Event& iEvent, edm::EventSetup const& iSetup) final {
//measure time between acquire and produce
sonic_utils::printDebugTime(this->client_.debugName(), "dispatch() time: ", this->t_dispatch_);
sonic_utils::printDebugTime(this->debugName_, "dispatch() time: ", this->t_dispatch_);

auto t0 = std::chrono::high_resolution_clock::now();
bool result = filter(iEvent, iSetup, this->client_.output());
sonic_utils::printDebugTime(this->client_.debugName(), "filter() time: ", t0);
bool result = filter(iEvent, iSetup, this->client_->output());
sonic_utils::printDebugTime(this->debugName_, "filter() time: ", t0);

//reset client data
this->client_.reset();
this->client_->reset();

return result;
}
Expand Down
14 changes: 7 additions & 7 deletions HeterogeneousCore/SonicCore/interface/SonicEDProducer.h
Original file line number Diff line number Diff line change
Expand Up @@ -16,22 +16,22 @@ class SonicEDProducer : public SonicAcquirer<Client, edm::stream::EDProducer<edm
//typedef to simplify usage
typedef typename Client::Output Output;
//constructor
SonicEDProducer(edm::ParameterSet const& cfg)
: SonicAcquirer<Client, edm::stream::EDProducer<edm::ExternalWork, Capabilities...>>(cfg) {}
SonicEDProducer(edm::ParameterSet const& cfg, const std::string& debugName)
: SonicAcquirer<Client, edm::stream::EDProducer<edm::ExternalWork, Capabilities...>>(cfg, debugName) {}
//destructor
~SonicEDProducer() override = default;

//derived classes use a dedicated produce() interface that incorporates client_.output()
//derived classes use a dedicated produce() interface that incorporates client_->output()
void produce(edm::Event& iEvent, edm::EventSetup const& iSetup) final {
//measure time between acquire and produce
sonic_utils::printDebugTime(this->client_.debugName(), "dispatch() time: ", this->t_dispatch_);
sonic_utils::printDebugTime(this->debugName_, "dispatch() time: ", this->t_dispatch_);

auto t0 = std::chrono::high_resolution_clock::now();
produce(iEvent, iSetup, this->client_.output());
sonic_utils::printDebugTime(this->client_.debugName(), "produce() time: ", t0);
produce(iEvent, iSetup, this->client_->output());
sonic_utils::printDebugTime(this->debugName_, "produce() time: ", t0);

//reset client data
this->client_.reset();
this->client_->reset();
}
virtual void produce(edm::Event& iEvent, edm::EventSetup const& iSetup, Output const& iOutput) = 0;
};
Expand Down
40 changes: 25 additions & 15 deletions HeterogeneousCore/SonicCore/interface/SonicOneEDAnalyzer.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,43 +21,53 @@ class SonicOneEDAnalyzer : public edm::one::EDAnalyzer<Capabilities...> {
typedef typename Client::Input Input;
typedef typename Client::Output Output;
//constructor
SonicOneEDAnalyzer(edm::ParameterSet const& cfg) : client_(cfg.getParameter<edm::ParameterSet>("Client")) {
SonicOneEDAnalyzer(edm::ParameterSet const& cfg, const std::string& debugName)
: clientPset_(cfg.getParameterSet("Client")), debugName_(debugName) {
//ExternalWork is not compatible with one modules, so Sync mode is enforced
if (client_.mode() != SonicMode::Sync)
throw cms::Exception("UnsupportedMode") << "SonicOneEDAnalyzer can only use Sync mode for clients";
if (clientPset_.getParameter<std::string>("mode") != "Sync") {
edm::LogWarning("ResetClientMode") << "Resetting client mode to Sync for SonicOneEDAnalyzer";
clientPset_.addParameter<std::string>("mode", "Sync");
}
}
//destructor
~SonicOneEDAnalyzer() override = default;

//derived classes still use a dedicated acquire() interface that incorporates client_.input() for consistency
//construct client at beginning of job
//in case client constructor depends on operations happening in derived module constructors
void beginJob() override { makeClient(); }

//derived classes still use a dedicated acquire() interface that incorporates client_->input() for consistency
virtual void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, Input& iInput) = 0;
//derived classes use a dedicated analyze() interface that incorporates client_.output()
//derived classes use a dedicated analyze() interface that incorporates client_->output()
void analyze(edm::Event const& iEvent, edm::EventSetup const& iSetup) final {
auto t0 = std::chrono::high_resolution_clock::now();
acquire(iEvent, iSetup, client_.input());
sonic_utils::printDebugTime(client_.debugName(), "acquire() time: ", t0);
acquire(iEvent, iSetup, client_->input());
sonic_utils::printDebugTime(debugName_, "acquire() time: ", t0);

//pattern similar to ExternalWork, but blocking
auto t1 = std::chrono::high_resolution_clock::now();
client_.dispatch();
client_->dispatch();

//measure time between acquire and produce
sonic_utils::printDebugTime(client_.debugName(), "dispatch() time: ", t1);
sonic_utils::printDebugTime(debugName_, "dispatch() time: ", t1);

auto t2 = std::chrono::high_resolution_clock::now();
analyze(iEvent, iSetup, client_.output());
sonic_utils::printDebugTime(client_.debugName(), "analyze() time: ", t2);
analyze(iEvent, iSetup, client_->output());
sonic_utils::printDebugTime(debugName_, "analyze() time: ", t2);

//reset client data
client_.reset();
client_->reset();
}
virtual void analyze(edm::Event const& iEvent, edm::EventSetup const& iSetup, Output const& iOutput) = 0;

protected:
//for debugging
void setDebugName(const std::string& debugName) { client_.setDebugName(debugName); }
//helper
void makeClient() { client_ = std::make_unique<Client>(clientPset_, debugName_); }

//members
Client client_;
edm::ParameterSet clientPset_;
std::unique_ptr<Client> client_;
std::string debugName_;
};

#endif
31 changes: 19 additions & 12 deletions HeterogeneousCore/SonicCore/src/SonicClientBase.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,31 @@
#include "FWCore/Utilities/interface/Exception.h"
#include "FWCore/ParameterSet/interface/allowedValues.h"

SonicClientBase::SonicClientBase(const edm::ParameterSet& params)
: allowedTries_(params.getUntrackedParameter<unsigned>("allowedTries", 0)) {
SonicClientBase::SonicClientBase(const edm::ParameterSet& params,
const std::string& debugName,
const std::string& clientName)
: allowedTries_(params.getUntrackedParameter<unsigned>("allowedTries", 0)),
debugName_(debugName),
clientName_(clientName),
fullDebugName_(debugName_) {
if (!clientName_.empty())
fullDebugName_ += ":" + clientName_;

std::string modeName(params.getParameter<std::string>("mode"));
if (modeName == "Sync")
mode_ = SonicMode::Sync;
setMode(SonicMode::Sync);
else if (modeName == "Async")
mode_ = SonicMode::Async;
setMode(SonicMode::Async);
else if (modeName == "PseudoAsync")
mode_ = SonicMode::PseudoAsync;
setMode(SonicMode::PseudoAsync);
else
throw cms::Exception("Configuration") << "Unknown mode for SonicClient: " << modeName;
}

void SonicClientBase::setMode(SonicMode mode) {
if (mode_ == mode)
return;
mode_ = mode;

//get correct dispatcher for mode
if (mode_ == SonicMode::Sync or mode_ == SonicMode::Async)
Expand All @@ -21,13 +35,6 @@ SonicClientBase::SonicClientBase(const edm::ParameterSet& params)
dispatcher_ = std::make_unique<SonicDispatcherPseudoAsync>(this);
}

void SonicClientBase::setDebugName(const std::string& debugName) {
debugName_ = debugName;
fullDebugName_ = debugName_;
if (!clientName_.empty())
fullDebugName_ += ":" + clientName_;
}

void SonicClientBase::start(edm::WaitingTaskWithArenaHolder holder) {
start();
holder_ = std::move(holder);
Expand Down
4 changes: 2 additions & 2 deletions HeterogeneousCore/SonicCore/test/DummyClient.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@
class DummyClient : public SonicClient<int> {
public:
//constructor
DummyClient(const edm::ParameterSet& params)
: SonicClient<int>(params),
DummyClient(const edm::ParameterSet& params, const std::string& debugName)
: SonicClient<int>(params, debugName, "DummyClient"),
factor_(params.getParameter<int>("factor")),
wait_(params.getParameter<int>("wait")),
fails_(params.getParameter<unsigned>("fails")) {}
Expand Down
Loading