Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECAL Phase 2 weights method amplitude reconstruction on GPU #37695

Merged
merged 23 commits into from
Aug 4, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
7979baa
Added amplitudeError
ChrisSandever Apr 26, 2022
2cf6995
Added GPU weights reconstruction module for Phase2EcalRecoGPU
ChrisSandever Apr 26, 2022
c9ab238
Modified EcalCPUUncalibRecHitProducer to be Phase 1 & 2 compatible
ChrisSandever Apr 26, 2022
491e688
Modified EcalUncalibRecHitConvertGPU2CPUFormat to be Phase 1 & 2 comp…
ChrisSandever Apr 26, 2022
5fd429d
Added configuration for testing Phase2EcalRecoGPU
ChrisSandever Apr 26, 2022
e35ae4d
Modified ecalUncalibRecHitPhase2_cff to use a switch producer
ChrisSandever Apr 26, 2022
b6eb69a
Added test configuration for Phase2EcalRecoGPU
ChrisSandever Apr 26, 2022
724935f
Added isPhase1 for amplitudeError issue
ChrisSandever Apr 29, 2022
d8c9212
re-added testEcalUncalibRecHitPhase2WeightsProducerGPU_harvesting.py
ChrisSandever Apr 29, 2022
e9f49d8
Changed isPhase1 to not(isPhase2)
ChrisSandever May 3, 2022
5a3e720
Added DeclsForKernelsPhase2.h
ChrisSandever May 30, 2022
67d3466
Changed converter to use switch description
ChrisSandever May 30, 2022
f9eddde
Added Phase2 CPU to GPU converter
ChrisSandever May 30, 2022
b59b8c1
Changed converter to use switch description
ChrisSandever May 30, 2022
487051f
Code checks
ChrisSandever May 30, 2022
c51e579
Changed Ph2 to Phase2
ChrisSandever May 30, 2022
fcd98f8
Changed Ph2 to Phase2
ChrisSandever May 30, 2022
1dcac8f
Changed Ph2 to Phase2
ChrisSandever May 30, 2022
722ef69
Changed weights_ to use cms::cuda::HostAllocator
ChrisSandever May 30, 2022
b1ba084
Chanhed Ph2 to Phase2 and removes EE objects
ChrisSandever May 30, 2022
e48b02d
Code checks
ChrisSandever May 30, 2022
be0268c
Made else statement in converter
ChrisSandever May 31, 2022
0820f19
Reverted kernel changes and added comments
ChrisSandever Jul 5, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ namespace ecal {

typename StoragePolicy::template StorageSelector<reco::ComputationScalarType>::type amplitudesAll;
typename StoragePolicy::template StorageSelector<reco::StorageScalarType>::type amplitude;
typename StoragePolicy::template StorageSelector<reco::StorageScalarType>::type amplitudeError;
typename StoragePolicy::template StorageSelector<reco::StorageScalarType>::type chi2;
typename StoragePolicy::template StorageSelector<reco::StorageScalarType>::type pedestal;
typename StoragePolicy::template StorageSelector<reco::StorageScalarType>::type jitter;
Expand All @@ -32,6 +33,7 @@ namespace ecal {
typename std::enable_if<std::is_same<U, ::calo::common::tags::Vec>::value, void>::type resize(size_t size) {
amplitudesAll.resize(size * EcalDataFrame::MAXSAMPLES);
amplitude.resize(size);
amplitudeError.resize(size);
pedestal.resize(size);
chi2.resize(size);
did.resize(size);
Expand Down
28 changes: 28 additions & 0 deletions RecoLocalCalo/EcalRecProducers/plugins/DeclsForKernelsPhase2.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
#ifndef RecoLocalCalo_EcalRecProducers_plugins_DeclsForKernelsPhase2_h
#define RecoLocalCalo_EcalRecProducers_plugins_DeclsForKernelsPhase2_h

#include "CUDADataFormats/EcalRecHitSoA/interface/EcalUncalibratedRecHit.h"
#include "DataFormats/EcalDigi/interface/EcalDataFrame_Ph2.h"

namespace ecal {
namespace weights {

struct EventOutputDataGPU {
UncalibratedRecHit<::calo::common::DevStoragePolicy> recHits;

void allocate(uint32_t digi_size, cudaStream_t cudaStream) {
auto const size = digi_size;
recHits.amplitudesAll =
cms::cuda::make_device_unique<reco::ComputationScalarType[]>(size * EcalDataFrame::MAXSAMPLES, cudaStream);
recHits.amplitude = cms::cuda::make_device_unique<reco::StorageScalarType[]>(size, cudaStream);
recHits.amplitudeError = cms::cuda::make_device_unique<reco::StorageScalarType[]>(size, cudaStream);
recHits.chi2 = cms::cuda::make_device_unique<reco::StorageScalarType[]>(size, cudaStream);
recHits.pedestal = cms::cuda::make_device_unique<reco::StorageScalarType[]>(size, cudaStream);
recHits.did = cms::cuda::make_device_unique<uint32_t[]>(size, cudaStream);
recHits.flags = cms::cuda::make_device_unique<uint32_t[]>(size, cudaStream);
}
};
} //namespace weights
} //namespace ecal

#endif // RecoLocalCalo_EcalRecProducers_plugins_DeclsForKernelsPhase2_h
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/EventSetup.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/ParameterSet/interface/EmptyGroupDescription.h"

// algorithm specific

Expand All @@ -27,32 +28,40 @@ class EcalCPUUncalibRecHitProducer : public edm::stream::EDProducer<edm::Externa
void produce(edm::Event&, edm::EventSetup const&) override;

private:
const bool isPhase2_;
using InputProduct = cms::cuda::Product<ecal::UncalibratedRecHit<calo::common::DevStoragePolicy>>;
edm::EDGetTokenT<InputProduct> recHitsInEBToken_, recHitsInEEToken_;
const edm::EDGetTokenT<InputProduct> recHitsInEBToken_, recHitsInEEToken_;
using OutputProduct = ecal::UncalibratedRecHit<calo::common::VecStoragePolicy<calo::common::CUDAHostAllocatorAlias>>;
edm::EDPutTokenT<OutputProduct> recHitsOutEBToken_, recHitsOutEEToken_;
const edm::EDPutTokenT<OutputProduct> recHitsOutEBToken_, recHitsOutEEToken_;

OutputProduct recHitsEB_, recHitsEE_;
bool containsTimingInformation_;
const bool containsTimingInformation_;
};

void EcalCPUUncalibRecHitProducer::fillDescriptions(edm::ConfigurationDescriptions& confDesc) {
edm::ParameterSetDescription desc;

desc.add<edm::InputTag>("recHitsInLabelEB", edm::InputTag{"ecalUncalibRecHitProducerGPU", "EcalUncalibRecHitsEB"});
desc.add<edm::InputTag>("recHitsInLabelEE", edm::InputTag{"ecalUncalibRecHitProducerGPU", "EcalUncalibRecHitsEE"});
desc.add<std::string>("recHitsOutLabelEB", "EcalUncalibRecHitsEB");
desc.add<std::string>("recHitsOutLabelEE", "EcalUncalibRecHitsEE");
desc.add<bool>("containsTimingInformation", false);
desc.ifValue(
edm::ParameterDescription<bool>("isPhase2", false, true),
false >> (edm::ParameterDescription<edm::InputTag>(
"recHitsInLabelEE", edm::InputTag{"ecalUncalibRecHitProducerGPU", "EcalUncalibRecHitsEE"}, true) and
edm::ParameterDescription<std::string>("recHitsOutLabelEE", "EcalUncalibRecHitsEE", true)) or
true >> edm::EmptyGroupDescription());

confDesc.add("ecalCPUUncalibRecHitProducer", desc);
}

EcalCPUUncalibRecHitProducer::EcalCPUUncalibRecHitProducer(const edm::ParameterSet& ps)
: recHitsInEBToken_{consumes<InputProduct>(ps.getParameter<edm::InputTag>("recHitsInLabelEB"))},
recHitsInEEToken_{consumes<InputProduct>(ps.getParameter<edm::InputTag>("recHitsInLabelEE"))},
: isPhase2_{ps.getParameter<bool>("isPhase2")},
recHitsInEBToken_{consumes<InputProduct>(ps.getParameter<edm::InputTag>("recHitsInLabelEB"))},
recHitsInEEToken_{isPhase2_ ? edm::EDGetTokenT<InputProduct>{}
: consumes<InputProduct>(ps.getParameter<edm::InputTag>("recHitsInLabelEE"))},
recHitsOutEBToken_{produces<OutputProduct>(ps.getParameter<std::string>("recHitsOutLabelEB"))},
recHitsOutEEToken_{produces<OutputProduct>(ps.getParameter<std::string>("recHitsOutLabelEE"))},
recHitsOutEEToken_{isPhase2_ ? edm::EDPutTokenT<OutputProduct>{}
: produces<OutputProduct>(ps.getParameter<std::string>("recHitsOutLabelEE"))},
containsTimingInformation_{ps.getParameter<bool>("containsTimingInformation")} {}

EcalCPUUncalibRecHitProducer::~EcalCPUUncalibRecHitProducer() {}
Expand All @@ -62,14 +71,11 @@ void EcalCPUUncalibRecHitProducer::acquire(edm::Event const& event,
edm::WaitingTaskWithArenaHolder taskHolder) {
// retrieve data/ctx
auto const& ebRecHitsProduct = event.get(recHitsInEBToken_);
auto const& eeRecHitsProduct = event.get(recHitsInEEToken_);
cms::cuda::ScopedContextAcquire ctx{ebRecHitsProduct, std::move(taskHolder)};
auto const& ebRecHits = ctx.get(ebRecHitsProduct);
auto const& eeRecHits = ctx.get(eeRecHitsProduct);

// resize the output buffers
recHitsEB_.resize(ebRecHits.size);
recHitsEE_.resize(eeRecHits.size);

auto lambdaToTransfer = [&ctx](auto& dest, auto* src) {
using vector_type = typename std::remove_reference<decltype(dest)>::type;
Expand All @@ -81,40 +87,45 @@ void EcalCPUUncalibRecHitProducer::acquire(edm::Event const& event,

// enqeue transfers
lambdaToTransfer(recHitsEB_.did, ebRecHits.did.get());
lambdaToTransfer(recHitsEE_.did, eeRecHits.did.get());

lambdaToTransfer(recHitsEB_.amplitudesAll, ebRecHits.amplitudesAll.get());
lambdaToTransfer(recHitsEE_.amplitudesAll, eeRecHits.amplitudesAll.get());

lambdaToTransfer(recHitsEB_.amplitude, ebRecHits.amplitude.get());
lambdaToTransfer(recHitsEE_.amplitude, eeRecHits.amplitude.get());

lambdaToTransfer(recHitsEB_.chi2, ebRecHits.chi2.get());
lambdaToTransfer(recHitsEE_.chi2, eeRecHits.chi2.get());

lambdaToTransfer(recHitsEB_.pedestal, ebRecHits.pedestal.get());
lambdaToTransfer(recHitsEE_.pedestal, eeRecHits.pedestal.get());

lambdaToTransfer(recHitsEB_.flags, ebRecHits.flags.get());
lambdaToTransfer(recHitsEE_.flags, eeRecHits.flags.get());

if (containsTimingInformation_) {
lambdaToTransfer(recHitsEB_.jitter, ebRecHits.jitter.get());
lambdaToTransfer(recHitsEE_.jitter, eeRecHits.jitter.get());

lambdaToTransfer(recHitsEB_.jitterError, ebRecHits.jitterError.get());
lambdaToTransfer(recHitsEE_.jitterError, eeRecHits.jitterError.get());
}
if (isPhase2_) {
lambdaToTransfer(recHitsEB_.amplitudeError, ebRecHits.amplitudeError.get());

} else {
auto const& eeRecHitsProduct = event.get(recHitsInEEToken_);
auto const& eeRecHits = ctx.get(eeRecHitsProduct);
recHitsEE_.resize(eeRecHits.size);
lambdaToTransfer(recHitsEE_.did, eeRecHits.did.get());
lambdaToTransfer(recHitsEE_.amplitudesAll, eeRecHits.amplitudesAll.get());
lambdaToTransfer(recHitsEE_.amplitude, eeRecHits.amplitude.get());
lambdaToTransfer(recHitsEE_.chi2, eeRecHits.chi2.get());
lambdaToTransfer(recHitsEE_.pedestal, eeRecHits.pedestal.get());
lambdaToTransfer(recHitsEE_.flags, eeRecHits.flags.get());
if (containsTimingInformation_) {
lambdaToTransfer(recHitsEE_.jitter, eeRecHits.jitter.get());
lambdaToTransfer(recHitsEE_.jitterError, eeRecHits.jitterError.get());
}
}
}

void EcalCPUUncalibRecHitProducer::produce(edm::Event& event, edm::EventSetup const& setup) {
// tmp vectors
auto recHitsOutEB = std::make_unique<OutputProduct>(std::move(recHitsEB_));
auto recHitsOutEE = std::make_unique<OutputProduct>(std::move(recHitsEE_));

// put into event
event.put(recHitsOutEBToken_, std::move(recHitsOutEB));
event.put(recHitsOutEEToken_, std::move(recHitsOutEE));

if (!isPhase2_) {
auto recHitsOutEE = std::make_unique<OutputProduct>(std::move(recHitsEE_));
event.put(recHitsOutEEToken_, std::move(recHitsOutEE));
}
}

DEFINE_FWK_MODULE(EcalCPUUncalibRecHitProducer);
103 changes: 103 additions & 0 deletions RecoLocalCalo/EcalRecProducers/plugins/EcalPhase2DigiToGPUProducer.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
#include "FWCore/Framework/interface/stream/EDProducer.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/EventSetup.h"
#include "FWCore/Framework/interface/MakerMacros.h"

#include "HeterogeneousCore/CUDACore/interface/ScopedContext.h"
#include "HeterogeneousCore/CUDAUtilities/interface/host_unique_ptr.h"
#include "CUDADataFormats/EcalDigi/interface/DigisCollection.h"

#include "DataFormats/EcalDigi/interface/EcalDigiCollections.h"
#include "DataFormats/EcalDigi/interface/EcalDataFrame_Ph2.h"

#include "DeclsForKernelsPhase2.h"

class EcalPhase2DigiToGPUProducer : public edm::stream::EDProducer<edm::ExternalWork> {
public:
explicit EcalPhase2DigiToGPUProducer(const edm::ParameterSet& ps);
~EcalPhase2DigiToGPUProducer() override = default;
static void fillDescriptions(edm::ConfigurationDescriptions& descriptions);

void acquire(edm::Event const& event, edm::EventSetup const& setup, edm::WaitingTaskWithArenaHolder holder) override;
void produce(edm::Event& evt, edm::EventSetup const& setup) override;

private:
const edm::EDGetTokenT<EBDigiCollectionPh2> digiCollectionToken_;
const edm::EDPutTokenT<cms::cuda::Product<ecal::DigisCollection<calo::common::DevStoragePolicy>>>
digisCollectionToken_;
uint32_t size_;

ecal::DigisCollection<::calo::common::DevStoragePolicy> digis_;

cms::cuda::ContextState cudaState_;
};

void EcalPhase2DigiToGPUProducer::fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
edm::ParameterSetDescription desc;

desc.add<edm::InputTag>("BarrelDigis", edm::InputTag("simEcalUnsuppressedDigis", ""));
desc.add<std::string>("digisLabelEB", "ebDigis");

descriptions.addWithDefaultLabel(desc);
}

EcalPhase2DigiToGPUProducer::EcalPhase2DigiToGPUProducer(const edm::ParameterSet& ps)
: digiCollectionToken_(consumes<EBDigiCollectionPh2>(ps.getParameter<edm::InputTag>("BarrelDigis"))),
digisCollectionToken_(produces<cms::cuda::Product<ecal::DigisCollection<calo::common::DevStoragePolicy>>>(
ps.getParameter<std::string>("digisLabelEB"))) {}

void EcalPhase2DigiToGPUProducer::acquire(edm::Event const& event,
edm::EventSetup const& setup,
edm::WaitingTaskWithArenaHolder holder) {
cms::cuda::ScopedContextAcquire ctx{event.streamID(), std::move(holder), cudaState_};

//input data from event
const auto& pdigis = event.get(digiCollectionToken_);

size_ = pdigis.size();

digis_.size = size_;
//allocate device pointers for output
digis_.ids = cms::cuda::make_device_unique<uint32_t[]>(size_, ctx.stream());
digis_.data = cms::cuda::make_device_unique<uint16_t[]>(size_ * EcalDataFrame_Ph2::MAXSAMPLES, ctx.stream());

//allocate host pointers for holding product data and id vectors
auto idstmp = cms::cuda::make_host_unique<uint32_t[]>(size_, ctx.stream());
auto datatmp = cms::cuda::make_host_unique<uint16_t[]>(size_ * EcalDataFrame_Ph2::MAXSAMPLES, ctx.stream());

//iterate over digis
uint32_t i = 0;
for (const auto& pdigi : pdigis) {
const int nSamples = pdigi.size();
//assign id to output vector
idstmp.get()[i] = pdigi.id();
//iterate over sample in digi
for (int sample = 0; sample < nSamples; ++sample) {
//get samples from input digi
EcalLiteDTUSample thisSample = pdigi[sample];
//assign adc data to output
datatmp.get()[i * nSamples + sample] = thisSample.raw();
}
++i;
}

//copy output vectors into member variable device pointers for the output struct

cudaCheck(
cudaMemcpyAsync(digis_.ids.get(), idstmp.get(), size_ * sizeof(uint32_t), cudaMemcpyHostToDevice, ctx.stream()));
cudaCheck(cudaMemcpyAsync(digis_.data.get(),
datatmp.get(),
size_ * EcalDataFrame_Ph2::MAXSAMPLES * sizeof(uint16_t),
cudaMemcpyHostToDevice,
ctx.stream()));
}

void EcalPhase2DigiToGPUProducer::produce(edm::Event& event, edm::EventSetup const& setup) {
//get cuda context state for producer
cms::cuda::ScopedContextProduce ctx{cudaState_};

//emplace output in the context
ctx.emplace(event, digisCollectionToken_, std::move(digis_));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think this could also be done asynchronously, without the acquire()/produce() split.

@makortel is the expert here, but could you try to make the change and see if it works ?

  • change the inheritance to simply edm::stream::EDProducer<>
  • move the code from acquire() to the beginning of produce()
  • change
    -cms::cuda::ScopedContextAcquire ctx{event.streamID(), std::move(holder), cudaState_};
    +cms::cuda::ScopedContextProduce ctx{event.streamID()};
  • make size_ and digis_ local variables
  • remove the cudaState_ variable

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The possibility of dropping the ExternalWork was something that has been mentioned already. However, since holder was used to construct ctx we thought that it does not work in the end since holder is not available in produce(). But using ctx with type cms::cuda::ScopedContextProduce could make that possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think this could also be done asynchronously, without the acquire()/produce() split.

@makortel is the expert here, but could you try to make the change and see if it works ?

For what it's worth, I agree.

But using ctx with type cms::cuda::ScopedContextProduce could make that possible.

Right, without ExternalWork the ScopedContextProduce should be used.

}

DEFINE_FWK_MODULE(EcalPhase2DigiToGPUProducer);
Loading