Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce SONIC core in CMSSW #29141

Merged
merged 8 commits into from
Mar 13, 2020
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions HeterogeneousCore/SonicCore/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<use name="FWCore/Framework"/>
<use name="FWCore/Concurrency"/>
<use name="FWCore/MessageLogger"/>
<use name="FWCore/ParameterSet"/>
<export>
<lib name="1"/>
</export>

103 changes: 103 additions & 0 deletions HeterogeneousCore/SonicCore/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# SONIC core infrastructure

SONIC: Services for Optimized Network Inference on Coprocessors

## For analyzers

The `SonicEDProducer` class template extends the basic Stream producer module in CMSSW.

To implement a concrete derived producer class, the following skeleton can be used:
```cpp
#include "HeterogeneousCore/SonicCore/interface/SonicEDProducer.h"
#include "FWCore/Framework/interface/MakerMacros.h"

class MyProducer : public SonicEDProducer<Client>
{
public:
explicit MyProducer(edm::ParameterSet const& cfg) : SonicEDProducer<Client>(cfg) {
//for debugging
setDebugName("MyProducer");
}
void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, Input& iInput) override {
//convert event data to client input format
}
void produce(edm::Event& iEvent, edm::EventSetup const& iSetup, Output const& iOutput) override {
//convert client output to event data format
}
static void fillDescriptions(edm::ConfigurationDescriptions & descriptions) {
edm::ParameterSetDescription desc;
Client::fillPSetDescription(desc);
//add producer-specific parameters
descriptions.add("MyProducer",desc);
}
};

DEFINE_FWK_MODULE(MyProducer);
```

The generic `Client` must be replaced with a concrete client (see next section), which has specific input and output types.

The python configuration for the producer should include a dedicated `PSet` for the client parameters:
```python
process.MyProducer = cms.EDProducer("MyProducer",
Client = cms.PSet(
# necessary client options go here
)
)
```
These parameters can be prepopulated and validated by the client using `fillDescriptions` (see below).

An example producer can be found in the [test](./test) folder.

## For developers

To add a new communication protocol for SONIC, follow these steps:
1. Submit the communication protocol software and any new dependencies to [cmsdist](https://github.com/cms-sw/cmsdist) as externals
2. Set up the concrete client(s) that use the communication protocol in a new package in the `HeterogeneousCore` subsystem
3. Add a test producer (see above) to make sure it works

To implement a concrete client, the following skeleton can be used for the `.h` file, with the function implementations in an associated `.cc` file:
```cpp
#ifndef HeterogeneousCore_MyPackage_MyClient
#define HeterogeneousCore_MyPackage_MyClient

#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClient*.h"

class MyClient : public SonicClient*<Input,Output> {
public:
MyClient(const edm::ParameterSet& params);

static void fillPSetDescription(edm::ParameterSetDescription& iDesc);

protected:
void evaluate() override;
};

#endif
```

The generic `SonicClient*` should be replaced with one of the available modes:
* `SonicClientSync`: synchronous call, blocks until the result is returned.
* `SonicClientAsync`: asynchronous, non-blocking call.
* `SonicClientPseudoAsync`: turns a synchronous, blocking call into an asynchronous, non-blocking call, by waiting for the result in a separate `std::thread`.

`SonicClientAsync` is the most efficient, but can only be used if asynchronous, non-blocking calls are supported by the communication protocol in use.

In addition, as indicated, the input and output data types must be specified.
(If both types are the same, only the input type needs to be specified.)

In all cases, the implementation of `evaluate()` must call `finish()`.
For the `Sync` and `PseudoAsync` modes, `finish()` should be called at the end of `evaluate()`.
For the `Async` mode, `finish()` should be called inside the communication protocol callback function (implementations may vary).

The client must also provide a static method `fillPSetDescription` to populate its parameters in the `fillDescriptions` for the producers that use the client:
```cpp
void MyClient::fillPSetDescription(edm::ParameterSetDescription& iDesc) {
edm::ParameterSetDescription descClient;
//add parameters
iDesc.add<edm::ParameterSetDescription>("Client",descClient);
}
```

Example client code can be found in the `interface` and `src` directories of the other Sonic packages in this repository.
22 changes: 22 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientAsync.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientAsync
#define HeterogeneousCore_SonicCore_SonicClientAsync

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"

#include "HeterogeneousCore/SonicCore/interface/SonicClientBase.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientTypes.h"

template <typename InputT, typename OutputT = InputT>
class SonicClientAsync : public SonicClientBase, public SonicClientTypes<InputT, OutputT> {
public:
virtual ~SonicClientAsync() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
virtual ~SonicClientAsync() {}
~SonicClientAsync() override = default;

or just leave out because both base classes declare virtual destructors?


//main operation
void dispatch(edm::WaitingTaskWithArenaHolder holder) override final {
holder_ = std::move(holder);
setStartTime();
evaluate();
}
};

#endif
49 changes: 49 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientBase.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientBase
#define HeterogeneousCore_SonicCore_SonicClientBase

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"

#include <string>
#include <chrono>
#include <exception>

class SonicClientBase {
public:
//destructor
virtual ~SonicClientBase() = default;

void setDebugName(const std::string& debugName) { debugName_ = debugName; }

//main operation
virtual void dispatch(edm::WaitingTaskWithArenaHolder holder) = 0;

protected:
virtual void evaluate() = 0;

void setStartTime() {
if (debugName_.empty())
return;
t0_ = std::chrono::high_resolution_clock::now();
setTime_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does setTime_ have any other functionality than setTime_ = not debugName_.empty()? It is not re-set to false in finish().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this can be simplified.

}

void finish(std::exception_ptr eptr = std::exception_ptr{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood the logic correctly, one thing that I find confusing (and thus do not like much) is that the Sync and PseudoAsync clients will automatically call finish(), while the Async client delegates the user to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is unfortunate but unavoidable. For a real-world example, see:
https://github.com/hls-fpga-machine-learning/SonicCMS/blob/v5.0.0/TensorRT/src/TRTClient.cc#L112

finish() has to be executed in the server's callback. There's no generic way to do it on the client side when using a truly asynchronous call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand.

What about making the user call finish() also for the Sync and PseudoAsync cases ?

This would eliminate the confusion (and the specialisation in the dummy test).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be more consistent, but less user-friendly... to some extent I want to denote that true Async is a special case (some protocols don't support it, requires a callback, etc.). I'll think about it a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this change now, and added a note about it to the README.

if (setTime_) {
auto t1 = std::chrono::high_resolution_clock::now();
edm::LogInfo(debugName_) << "Client time: "
<< std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0_).count();
}
holder_.doneWaiting(eptr);
}

//members
edm::WaitingTaskWithArenaHolder holder_;

//for logging/debugging
std::string debugName_;
std::chrono::time_point<std::chrono::high_resolution_clock> t0_;
bool setTime_ = false;
};

#endif
76 changes: 76 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientPseudoAsync.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientPseudoAsync
#define HeterogeneousCore_SonicCore_SonicClientPseudoAsync

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientBase.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientTypes.h"

#include <memory>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <atomic>
#include <exception>

//pretend to be async + non-blocking by waiting for blocking calls to return in separate std::thread
template <typename InputT, typename OutputT = InputT>
class SonicClientPseudoAsync : public SonicClientBase, public SonicClientTypes<InputT, OutputT> {
public:
//constructor
SonicClientPseudoAsync() : SonicClientBase(), SonicClientTypes<InputT, OutputT>(), hasCall_(false), stop_(false) {
thread_ = std::make_unique<std::thread>([this]() { waitForNext(); });
Copy link
Contributor

@makortel makortel Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thread per module x stream is fine for now, but I created an issue (#29188) for a more general mechanism in case more use cases would surface.

}
//destructor
virtual ~SonicClientPseudoAsync() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
virtual ~SonicClientPseudoAsync() {
~SonicClientPseudoAsync() override {

? Or does our clang-tidy insist on virtual instead of override?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidyshould replace virtual with override

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidyshould replace virtual with override

Right, but this PR is passing code checks so clang-tidy seems to be happy with the virtual?

{
std::lock_guard<std::mutex> guard(mutex_);
stop_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop_ is atomic, why lock the mutex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point long ago, I was encountering a data race, so I moved every read and write inside a lock. This one shouldn't need it, though, so I'll remove it.

}
cond_.notify_one();
if (thread_) {
thread_->join();
Copy link
Contributor

@makortel makortel Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes the destructor to possibly throw (practical implication being the process calling an std::terminate() instead of framework catching the exception in case the destructor is called during stack unwinding caused by an earlier exception being thrown).

Given that these objects have the life time of EDModules, we are not really interested to stop processing because of std::thread::join() failing, and therefore silently ignoring the exception could be sufficient (we are shutting down the application anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean not calling thread_join(), or wrapping it in a try/catch block ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant try-catch (sorry for being imprecise).

thread_.reset();
}
}
//accessor
void dispatch(edm::WaitingTaskWithArenaHolder holder) override final {
//do all read/writes inside lock to ensure cache synchronization
{
std::lock_guard<std::mutex> guard(mutex_);
holder_ = std::move(holder);
setStartTime();

//activate thread to wait for response, and return
hasCall_ = true;
}
cond_.notify_one();
}

protected:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
protected:
private:

?

void waitForNext() {
while (true) {
//wait for condition
{
std::unique_lock<std::mutex> lk(mutex_);
cond_.wait(lk, [this]() { return (hasCall_ or stop_); });
if (stop_)
break;

//do everything inside lock
evaluate();

//reset condition
hasCall_ = false;
}
}
}

//members
bool hasCall_;
std::mutex mutex_;
std::condition_variable cond_;
std::atomic<bool> stop_;
std::unique_ptr<std::thread> thread_;
};

#endif
25 changes: 25 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientSync.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientSync
#define HeterogeneousCore_SonicCore_SonicClientSync

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"

#include "HeterogeneousCore/SonicCore/interface/SonicClientBase.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientTypes.h"

#include <exception>

template <typename InputT, typename OutputT = InputT>
class SonicClientSync : public SonicClientBase, public SonicClientTypes<InputT, OutputT> {
public:
virtual ~SonicClientSync() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
virtual ~SonicClientSync() {}
~SonicClientSync() override = default;

or just remove?


//main operation
void dispatch(edm::WaitingTaskWithArenaHolder holder) override final {
holder_ = std::move(holder);
setStartTime();

evaluate();
}
};

#endif
25 changes: 25 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientTypes.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientTypes
#define HeterogeneousCore_SonicCore_SonicClientTypes

#include "FWCore/ParameterSet/interface/ParameterSet.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking this #include is not needed, right?


//this base class exists to limit the impact of dependent scope in derived classes
template <typename InputT, typename OutputT = InputT>
class SonicClientTypes {
public:
//typedefs for outside accessibility
typedef InputT Input;
typedef OutputT Output;
//destructor
virtual ~SonicClientTypes() = default;

//accessors
Input& input() { return input_; }
const Output& output() const { return output_; }

protected:
Input input_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that these cannot be pointers because this class acts kind of like a buffer between the possibly asynchronous back-end, and the user code that produces the input (and could go out of scope before it is sent to the backend) and consumes the output (that it may not have allocated space for yet). Is it correct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. With the interface of SonicEDProducer, the user code just sets/gets these entries directly (passed by reference). In addition, in some cases these values might need to be accessed in other threads (PseudoAsync case).

Output output_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note that the Input and Output should not become "too large" in the sense that they should not keep "too much" memory when the module (stream instance) is not running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, we can consider if something like swap() or shrink_to_fit() would be worthwhile to free memory for certain cases.

};

#endif
59 changes: 59 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicEDProducer.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
#ifndef HeterogeneousCore_SonicCore_SonicEDProducer
#define HeterogeneousCore_SonicCore_SonicEDProducer

#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/stream/EDProducer.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/Framework/interface/EventSetup.h"
#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"

#include <string>
#include <chrono>

//this is a stream producer because client operations are not multithread-safe in general
//it is designed such that the user never has to interact with the client or the acquire() callback directly
template <typename Client, typename... Capabilities>
class SonicEDProducer : public edm::stream::EDProducer<edm::ExternalWork, Capabilities...> {
public:
//typedefs to simplify usage
typedef typename Client::Input Input;
typedef typename Client::Output Output;
//constructor
SonicEDProducer(edm::ParameterSet const& cfg) : client_(cfg.getParameter<edm::ParameterSet>("Client")) {}
//destructor
virtual ~SonicEDProducer() = default;

//derived classes use a dedicated acquire() interface that incorporates client_.input()
//(no need to interact with callback holder)
void acquire(edm::Event const& iEvent,
edm::EventSetup const& iSetup,
edm::WaitingTaskWithArenaHolder holder) override final {
auto t0 = std::chrono::high_resolution_clock::now();
acquire(iEvent, iSetup, client_.input());
auto t1 = std::chrono::high_resolution_clock::now();
if (!debugName_.empty())
edm::LogInfo(debugName_) << "Load time: "
<< std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
client_.dispatch(holder);
}
virtual void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, Input& iInput) = 0;
//derived classes use a dedicated produce() interface that incorporates client_.output()
void produce(edm::Event& iEvent, edm::EventSetup const& iSetup) override final {
//todo: measure time between acquire and produce
produce(iEvent, iSetup, client_.output());
}
virtual void produce(edm::Event& iEvent, edm::EventSetup const& iSetup, Output const& iOutput) = 0;

protected:
//for debugging
void setDebugName(const std::string& debugName) {
debugName_ = debugName;
client_.setDebugName(debugName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same string is stored twice (ok, they may be COW). Maybe the client_ could have an accessor to its debug name?

}
//members
Client client_;
std::string debugName_;
};

#endif
11 changes: 11 additions & 0 deletions HeterogeneousCore/SonicCore/test/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
<environment>
<bin file="TestIntegration.cpp" name="TestIntegrationSonic">
<flags TEST_RUNNER_ARGS=" /bin/bash HeterogeneousCore/SonicCore/test run_sonic_test.sh"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test script can also be invoked with something along

  <test name="TestHeterogeneousCoreSonicCore" command="run_sonic_test.sh"/>

IIUC in that case the script is executed from the $CMSSW_BASE, i.e. minimal change in run_sonic_test.sh would be to set LOCAL_TEST_DIR=src/HeterogeneousCore/SonicCore/test.

Continuing with TestIntegration is also fine.

<use name="FWCore/Utilities"/>
</bin>
<library file="SonicDummyProducer.cc">
<flags EDM_PLUGIN="1"/>
<use name="HeterogeneousCore/SonicCore"/>
<use name="DataFormats/TestObjects"/>
</library>
</environment>
Loading