Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce SONIC core in CMSSW #29141

Merged
merged 8 commits into from
Mar 13, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions HeterogeneousCore/SonicCore/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<use name="FWCore/Framework"/>
<use name="FWCore/Concurrency"/>
<use name="FWCore/MessageLogger"/>
<use name="FWCore/ParameterSet"/>
<export>
<lib name="1"/>
</export>

103 changes: 103 additions & 0 deletions HeterogeneousCore/SonicCore/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
# SONIC core infrastructure

SONIC: Services for Optimized Network Inference on Coprocessors

## For analyzers

The `SonicEDProducer` class template extends the basic Stream producer module in CMSSW.

To implement a concrete derived producer class, the following skeleton can be used:
```cpp
#include "HeterogeneousCore/SonicCore/interface/SonicEDProducer.h"
#include "FWCore/Framework/interface/MakerMacros.h"

class MyProducer : public SonicEDProducer<Client>
{
public:
explicit MyProducer(edm::ParameterSet const& cfg) : SonicEDProducer<Client>(cfg) {
//for debugging
setDebugName("MyProducer");
}
void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, Input& iInput) override {
//convert event data to client input format
}
void produce(edm::Event& iEvent, edm::EventSetup const& iSetup, Output const& iOutput) override {
//convert client output to event data format
}
static void fillDescriptions(edm::ConfigurationDescriptions & descriptions) {
edm::ParameterSetDescription desc;
Client::fillPSetDescription(desc);
//add producer-specific parameters
descriptions.add("MyProducer",desc);
}
};

DEFINE_FWK_MODULE(MyProducer);
```

The generic `Client` must be replaced with a concrete client (see next section), which has specific input and output types.

The python configuration for the producer should include a dedicated `PSet` for the client parameters:
```python
process.MyProducer = cms.EDProducer("MyProducer",
Client = cms.PSet(
# necessary client options go here
)
)
```
These parameters can be prepopulated and validated by the client using `fillDescriptions` (see below).

An example producer can be found in the [test](./test) folder.

## For developers

To add a new communication protocol for SONIC, follow these steps:
1. Submit the communication protocol software and any new dependencies to [cmsdist](https://github.com/cms-sw/cmsdist) as externals
2. Set up the concrete client(s) that use the communication protocol in a new package in the `HeterogeneousCore` subsystem
3. Add a test producer (see above) to make sure it works

To implement a concrete client, the following skeleton can be used for the `.h` file, with the function implementations in an associated `.cc` file:
```cpp
#ifndef HeterogeneousCore_MyPackage_MyClient
#define HeterogeneousCore_MyPackage_MyClient

#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClient*.h"

class MyClient : public SonicClient*<Input,Output> {
public:
MyClient(const edm::ParameterSet& params);

static void fillPSetDescription(edm::ParameterSetDescription& iDesc);

protected:
void evaluate() override;
};

#endif
```

The generic `SonicClient*` should be replaced with one of the available modes:
* `SonicClientSync`: synchronous call, blocks until the result is returned.
* `SonicClientAsync`: asynchronous, non-blocking call.
* `SonicClientPseudoAsync`: turns a synchronous, blocking call into an asynchronous, non-blocking call, by waiting for the result in a separate `std::thread`.

`SonicClientAsync` is the most efficient, but can only be used if asynchronous, non-blocking calls are supported by the communication protocol in use.

In addition, as indicated, the input and output data types must be specified.
(If both types are the same, only the input type needs to be specified.)

In all cases, the implementation of `evaluate()` must call `finish()`.
For the `Sync` and `PseudoAsync` modes, `finish()` should be called at the end of `evaluate()`.
For the `Async` mode, `finish()` should be called inside the communication protocol callback function (implementations may vary).

The client must also provide a static method `fillPSetDescription` to populate its parameters in the `fillDescriptions` for the producers that use the client:
```cpp
void MyClient::fillPSetDescription(edm::ParameterSetDescription& iDesc) {
edm::ParameterSetDescription descClient;
//add parameters
iDesc.add<edm::ParameterSetDescription>("Client",descClient);
}
```

Example client code can be found in the `interface` and `src` directories of the other Sonic packages in this repository.
20 changes: 20 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientAsync.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientAsync
#define HeterogeneousCore_SonicCore_SonicClientAsync

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"

#include "HeterogeneousCore/SonicCore/interface/SonicClientBase.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientTypes.h"

template <typename InputT, typename OutputT = InputT>
class SonicClientAsync : public SonicClientBase, public SonicClientTypes<InputT, OutputT> {
public:
//main operation
void dispatch(edm::WaitingTaskWithArenaHolder holder) override final {
holder_ = std::move(holder);
setStartTime();
evaluate();
}
};

#endif
48 changes: 48 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientBase.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientBase
#define HeterogeneousCore_SonicCore_SonicClientBase

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"

#include <string>
#include <chrono>
#include <exception>

class SonicClientBase {
public:
//destructor
virtual ~SonicClientBase() = default;

void setDebugName(const std::string& debugName) { debugName_ = debugName; }
const std::string& debugName() const { return debugName_; }

//main operation
virtual void dispatch(edm::WaitingTaskWithArenaHolder holder) = 0;

protected:
virtual void evaluate() = 0;

void setStartTime() {
if (debugName_.empty())
return;
t0_ = std::chrono::high_resolution_clock::now();
}

void finish(std::exception_ptr eptr = std::exception_ptr{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood the logic correctly, one thing that I find confusing (and thus do not like much) is that the Sync and PseudoAsync clients will automatically call finish(), while the Async client delegates the user to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is unfortunate but unavoidable. For a real-world example, see:
https://github.com/hls-fpga-machine-learning/SonicCMS/blob/v5.0.0/TensorRT/src/TRTClient.cc#L112

finish() has to be executed in the server's callback. There's no generic way to do it on the client side when using a truly asynchronous call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand.

What about making the user call finish() also for the Sync and PseudoAsync cases ?

This would eliminate the confusion (and the specialisation in the dummy test).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be more consistent, but less user-friendly... to some extent I want to denote that true Async is a special case (some protocols don't support it, requires a callback, etc.). I'll think about it a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this change now, and added a note about it to the README.

if (!debugName_.empty()) {
auto t1 = std::chrono::high_resolution_clock::now();
edm::LogInfo(debugName_) << "Client time: "
<< std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0_).count();
}
holder_.doneWaiting(eptr);
}

//members
edm::WaitingTaskWithArenaHolder holder_;

//for logging/debugging
std::string debugName_;
std::chrono::time_point<std::chrono::high_resolution_clock> t0_;
};

#endif
76 changes: 76 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientPseudoAsync.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientPseudoAsync
#define HeterogeneousCore_SonicCore_SonicClientPseudoAsync

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientBase.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientTypes.h"

#include <memory>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <atomic>
#include <exception>

//pretend to be async + non-blocking by waiting for blocking calls to return in separate std::thread
template <typename InputT, typename OutputT = InputT>
class SonicClientPseudoAsync : public SonicClientBase, public SonicClientTypes<InputT, OutputT> {
public:
//constructor
SonicClientPseudoAsync() : SonicClientBase(), SonicClientTypes<InputT, OutputT>(), hasCall_(false), stop_(false) {
thread_ = std::make_unique<std::thread>([this]() { waitForNext(); });
Copy link
Contributor

@makortel makortel Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thread per module x stream is fine for now, but I created an issue (#29188) for a more general mechanism in case more use cases would surface.

}
//destructor
~SonicClientPseudoAsync() override {
stop_ = true;
cond_.notify_one();
if (thread_) {
try {
thread_->join();
thread_.reset();
} catch (...) {
}
}
}
//accessor
void dispatch(edm::WaitingTaskWithArenaHolder holder) override final {
//do all read/writes inside lock to ensure cache synchronization
{
std::lock_guard<std::mutex> guard(mutex_);
holder_ = std::move(holder);
setStartTime();

//activate thread to wait for response, and return
hasCall_ = true;
}
cond_.notify_one();
}

private:
void waitForNext() {
while (true) {
//wait for condition
{
std::unique_lock<std::mutex> lk(mutex_);
cond_.wait(lk, [this]() { return (hasCall_ or stop_); });
if (stop_)
break;

//do everything inside lock
evaluate();

//reset condition
hasCall_ = false;
}
}
}

//members
bool hasCall_;
std::mutex mutex_;
std::condition_variable cond_;
std::atomic<bool> stop_;
std::unique_ptr<std::thread> thread_;
};

#endif
23 changes: 23 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientSync.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientSync
#define HeterogeneousCore_SonicCore_SonicClientSync

#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"

#include "HeterogeneousCore/SonicCore/interface/SonicClientBase.h"
#include "HeterogeneousCore/SonicCore/interface/SonicClientTypes.h"

#include <exception>

template <typename InputT, typename OutputT = InputT>
class SonicClientSync : public SonicClientBase, public SonicClientTypes<InputT, OutputT> {
public:
//main operation
void dispatch(edm::WaitingTaskWithArenaHolder holder) override final {
holder_ = std::move(holder);
setStartTime();

evaluate();
}
};

#endif
23 changes: 23 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicClientTypes.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#ifndef HeterogeneousCore_SonicCore_SonicClientTypes
#define HeterogeneousCore_SonicCore_SonicClientTypes

//this base class exists to limit the impact of dependent scope in derived classes
template <typename InputT, typename OutputT = InputT>
class SonicClientTypes {
public:
//typedefs for outside accessibility
typedef InputT Input;
typedef OutputT Output;
//destructor
virtual ~SonicClientTypes() = default;

//accessors
Input& input() { return input_; }
const Output& output() const { return output_; }

protected:
Input input_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that these cannot be pointers because this class acts kind of like a buffer between the possibly asynchronous back-end, and the user code that produces the input (and could go out of scope before it is sent to the backend) and consumes the output (that it may not have allocated space for yet). Is it correct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. With the interface of SonicEDProducer, the user code just sets/gets these entries directly (passed by reference). In addition, in some cases these values might need to be accessed in other threads (PseudoAsync case).

Output output_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note that the Input and Output should not become "too large" in the sense that they should not keep "too much" memory when the module (stream instance) is not running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, we can consider if something like swap() or shrink_to_fit() would be worthwhile to free memory for certain cases.

};

#endif
55 changes: 55 additions & 0 deletions HeterogeneousCore/SonicCore/interface/SonicEDProducer.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
#ifndef HeterogeneousCore_SonicCore_SonicEDProducer
#define HeterogeneousCore_SonicCore_SonicEDProducer

#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/stream/EDProducer.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/Framework/interface/EventSetup.h"
#include "FWCore/Concurrency/interface/WaitingTaskWithArenaHolder.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"

#include <string>
#include <chrono>

//this is a stream producer because client operations are not multithread-safe in general
//it is designed such that the user never has to interact with the client or the acquire() callback directly
template <typename Client, typename... Capabilities>
class SonicEDProducer : public edm::stream::EDProducer<edm::ExternalWork, Capabilities...> {
public:
//typedefs to simplify usage
typedef typename Client::Input Input;
typedef typename Client::Output Output;
//constructor
SonicEDProducer(edm::ParameterSet const& cfg) : client_(cfg.getParameter<edm::ParameterSet>("Client")) {}
//destructor
virtual ~SonicEDProducer() = default;

//derived classes use a dedicated acquire() interface that incorporates client_.input()
//(no need to interact with callback holder)
void acquire(edm::Event const& iEvent,
edm::EventSetup const& iSetup,
edm::WaitingTaskWithArenaHolder holder) override final {
auto t0 = std::chrono::high_resolution_clock::now();
acquire(iEvent, iSetup, client_.input());
auto t1 = std::chrono::high_resolution_clock::now();
if (!client_.debugName().empty())
edm::LogInfo(client_.debugName()) << "Load time: "
<< std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
client_.dispatch(holder);
}
virtual void acquire(edm::Event const& iEvent, edm::EventSetup const& iSetup, Input& iInput) = 0;
//derived classes use a dedicated produce() interface that incorporates client_.output()
void produce(edm::Event& iEvent, edm::EventSetup const& iSetup) override final {
//todo: measure time between acquire and produce
produce(iEvent, iSetup, client_.output());
}
virtual void produce(edm::Event& iEvent, edm::EventSetup const& iSetup, Output const& iOutput) = 0;

protected:
//for debugging
void setDebugName(const std::string& debugName) { client_.setDebugName(debugName); }
//members
Client client_;
};

#endif
8 changes: 8 additions & 0 deletions HeterogeneousCore/SonicCore/test/BuildFile.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<environment>
<test name="TestHeterogeneousCoreSonicCore" command="cmsRun ${LOCALTOP}/src/HeterogeneousCore/SonicCore/test/sonicTest_cfg.py"/>
<library file="SonicDummyProducer.cc">
<flags EDM_PLUGIN="1"/>
<use name="HeterogeneousCore/SonicCore"/>
<use name="DataFormats/TestObjects"/>
</library>
</environment>
Loading