
Exceptions not propagated through Transformer #40868

Closed
makortel opened this issue Feb 24, 2023 · 9 comments


@makortel
Contributor

When module A uses the edm::Transformer ability and module B consumes the transformed product, an exception thrown upstream in the processing chain that A consumes is lost, and module B instead fails with a missing-product exception. The job still fails, but losing the original exception makes it difficult to understand why.

I have a demonstrator in makortel@0280750 (commit on top of CMSSW_13_0_0_pre4) that makes the first module in transformTest_cfg.py throw an exception. If the Transformer is disabled, the job fails and the proper exception is reported:

$ cmsRun FWCore/Integration/test/transformTest_cfg.py -- --exception --noTransform
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 24-Feb-2023 09:16:54.601 CET
----- Begin Fatal Exception 24-Feb-2023 09:16:54 CET-----------------------
An exception of category 'NotFound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'p'
   [2] Prefetching for module IntTestAnalyzer/'tester'
   [3] Prefetching for module TransformIntProducer/'t'
   [4] Calling method for module FailingProducer/'start'
Exception Message:
Intentional 'NotFound' exception for testing purposes

With the Transformer enabled, the job instead reports a ProductNotFound exception:

$ cmsRun FWCore/Integration/test/transformTest_cfg.py -- --exception 
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 24-Feb-2023 09:17:45.654 CET
----- Begin Fatal Exception 24-Feb-2023 09:17:45 CET-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'p'
   [2] Calling method for module IntTestAnalyzer/'tester'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: edmtest::IntProduct
Looking for module label: t
Looking for productInstanceName: transform

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------
@makortel
Contributor Author

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Feb 24, 2023

I think I see the origin of the problem. The exception from earlier in the chain is supposed to be propagated in TransformingProductResolver::prefetchAsync_:

//Have to create a new task which will make sure the state for TransformingProductResolver
// is properly set after the module has run
auto t = make_waiting_task([this](std::exception_ptr const* iPtr) {
  //The exception is being rethrown because resolveProductImpl sets the ProductResolver to a failed
  // state for the case where an exception occurs during the call to the function.
  // Caught exception is propagated via WaitingTaskList
  CMS_SA_ALLOW try {
    resolveProductImpl<true>([iPtr]() {
      if (iPtr) {
        std::rethrow_exception(*iPtr);
      }
    });
  } catch (...) {
    waitingTasks_.doneWaiting(std::current_exception());
    return;
  }
  waitingTasks_.doneWaiting(nullptr);
});

resolveProductImpl() calls the lambda that rethrows *iPtr (when iPtr is non-null) only inside this block:
auto presentStatus = status();
if (callResolver && presentStatus == ProductStatus::ResolveNotRun) {
  //if resolver fails because of exception or not setting product
  // make sure the status goes to failed
  auto failedStatusSetter = [this](ProductStatus* iPresentStatus) {
    if (this->status() == ProductStatus::ResolveNotRun) {
      this->setFailedStatus();
    }
    *iPresentStatus = this->status();
  };
  std::unique_ptr<ProductStatus, decltype(failedStatusSetter)> failedStatusGuard(&presentStatus, failedStatusSetter);
  //If successful, this will call setProduct
  resolver();
}
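The guard idiom in this excerpt (a std::unique_ptr whose custom deleter runs on every exit path) can be tried in isolation. The following is a minimal sketch assuming a simplified three-value ProductStatus; ResolverModel and its methods are hypothetical stand-ins, not the CMSSW classes.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical, simplified status enum; the names mirror those in the issue.
enum class ProductStatus { ResolveNotRun, ProductSet, ResolveFailed };

// Hypothetical model of the status handling in resolveProductImpl().
class ResolverModel {
public:
  explicit ResolverModel(ProductStatus initial) : status_(initial) {}

  ProductStatus status() const { return status_; }
  void setProduct() { status_ = ProductStatus::ProductSet; }
  void setFailedStatus() { status_ = ProductStatus::ResolveFailed; }

  // Runs `resolver` only if resolution has not been attempted yet. The
  // unique_ptr's custom deleter runs on normal exit and during stack
  // unwinding, and marks the product as failed if it was not set.
  template <typename F>
  ProductStatus resolve(F&& resolver) {
    auto presentStatus = status();
    if (presentStatus == ProductStatus::ResolveNotRun) {
      auto failedStatusSetter = [this](ProductStatus* iPresentStatus) {
        if (this->status() == ProductStatus::ResolveNotRun) {
          this->setFailedStatus();
        }
        *iPresentStatus = this->status();
      };
      std::unique_ptr<ProductStatus, decltype(failedStatusSetter)> guard(&presentStatus, failedStatusSetter);
      resolver();  // If this throws, the guard still updates the status.
    }  // Guard is destroyed here, before presentStatus is returned.
    return presentStatus;
  }

private:
  ProductStatus status_;
};
```

Because the deleter also runs during unwinding, the status ends up ResolveFailed both when the resolver throws and when it returns without setting a product. If the initial status is anything other than ResolveNotRun, the resolver is not called at all.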

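The problem is that for the Transformer, presentStatus is ResolveFailed instead of the expected ResolveNotRun, so the resolver lambda (and with it the rethrow) is never called and the upstream exception is dropped. The initial status is set in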
explicit TransformingProductResolver(std::shared_ptr<BranchDescription const> bd)
    : ProducedProductResolver(bd, ProductStatus::ResolveFailed), mcc_(nullptr) {}

The rationale for ResolveFailed is described in a comment on #38454.
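The lost exception follows from combining the two excerpts: the rethrowing lambda is the resolver, so when the status is not ResolveNotRun it never runs and prefetchAsync_ ends up calling doneWaiting(nullptr). Below is a heavily simplified model of that control flow, assuming nothing beyond the excerpts; resolveIfNotRun and forwardedException are hypothetical names, and an empty std::exception_ptr stands for doneWaiting(nullptr).

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical, simplified status enum; the names mirror those in the issue.
enum class ProductStatus { ResolveNotRun, ProductSet, ResolveFailed };

// Hypothetical stand-in for resolveProductImpl(): the resolver runs only
// when resolution has not been attempted yet.
template <typename F>
void resolveIfNotRun(ProductStatus& status, F&& resolver) {
  if (status != ProductStatus::ResolveNotRun) {
    return;  // Already attempted or pre-set to failed: resolver is skipped.
  }
  try {
    resolver();  // May throw; on success it would set the product.
  } catch (...) {
    status = ProductStatus::ResolveFailed;
    throw;  // Let the caller see the exception.
  }
  if (status == ProductStatus::ResolveNotRun) {
    status = ProductStatus::ResolveFailed;  // Returned without a product.
  }
}

// Hypothetical model of the prefetchAsync_ callback: returns the exception
// that would be forwarded to waiting tasks (empty means "no error").
std::exception_ptr forwardedException(ProductStatus initialStatus, std::exception_ptr upstream) {
  ProductStatus status = initialStatus;
  try {
    resolveIfNotRun(status, [&upstream] {
      if (upstream) {
        std::rethrow_exception(upstream);
      }
    });
  } catch (...) {
    return std::current_exception();
  }
  return std::exception_ptr{};
}
```

With an initial status of ResolveNotRun the upstream exception is forwarded; with ResolveFailed, as in the TransformingProductResolver constructor, the same upstream exception yields an empty pointer, and the downstream consumer only sees a missing product.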

@makortel
Contributor Author

I checked that simply replacing ResolveFailed with ResolveNotRun in

explicit TransformingProductResolver(std::shared_ptr<BranchDescription const> bd)
    : ProducedProductResolver(bd, ProductStatus::ResolveFailed), mcc_(nullptr) {}

indeed does not work.

@makortel
Contributor Author

I have a possible fix in #40879

@makortel
Contributor Author

Fixed in #40879 and backported to 13_0_X in #41078

@makortel
Contributor Author

+core

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.
