DeepTauId failures in RelVals (Incompatible shapes) #44333
Comments
cms-bot internal usage |
A new Issue was created by @AdrianoDee. @Dr15Jones, @antoniovilela, @smuzaffar, @makortel, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign hlt |
assign pdmv |
New categories assigned: hlt,pdmv @Martin-Grunewald,@mmusich,@AdrianoDee,@sunilUIET,@miquork you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@cms-sw/tau-pog-l2 FYI |
type tau |
Just as an observation, this path is not new (it was first included in the GRun menu in 2022, https://its.cern.ch/jira/browse/CMSHLT-2289). EDIT: but it was touched recently in https://its.cern.ch/jira/browse/CMSHLT-3052 |
@cms-sw/pdmv-l2
Please help filling in some information:
|
I can't find it in the Dashboard. Since it is labelled HLTDR_2023, and the path in question is not in the Fake* menus, it must be in some 13_X release running the actual 2023 HLT with the 2023 version of that path. |
Quick answers:
For the reproducibility and the CPU pattern I'll need a moment to check those. |
Hmm well, in 14_X, HLTDR_2023 should (now) run the Fake* menus, while the real HLT menus should be within HLTDR_2024. |
Indeed the configuration linked above has |
I see the same (similar) error
in |
This is a different path, so it points to a general problem with |
For context, it appears the exception comes from here: cmssw/PhysicsTools/TensorFlow/src/TensorFlow.cc Lines 272 to 275 in ff51428
|
assign ml |
assign reconstruction |
New categories assigned: ml,reconstruction @jfernan2,@mandrenguyen,@valsdav,@wpmccormack you have been requested to review this Pull request/Issue and eventually sign? Thanks |
This failure has now also been seen in Tier0 PromptReco: https://cms-talk.web.cern.ch/t/update-t0-skim-config-for-2024-pp-collision/36794/5 |
urgent
@valsdav, we have established that this issue can affect Prompt Reconstruction and (potentially, when the new nodes for the HLT farm arrive) also online trigger operations. Please prepare PRs with guards to avoid the execution of the model with empty inputs. Marco (as ORM) |
for record, the proposed fixes are: |
+1 |
+ml Basic guards to solve the empty input problem in DeepTauId are in place, but the reason for the empty grid needs to be investigated with Tau experts. A more general guard for empty inputs will be added (see #44481)
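To illustrate the idea of a guard against empty inputs, here is a minimal sketch in plain Python. It is not the code from the actual PRs: the names `is_empty` and `guarded_predict`, the nested-list input layout, and the choice to return an empty list when inference is skipped are all assumptions made for this example.

```python
from typing import Callable, List, Sequence

# Hypothetical layout: one block per model input, one row of features per candidate.
Batch = Sequence[Sequence[float]]


def is_empty(batch: Batch) -> bool:
    """True if the block has no rows, or if every row has no features."""
    return len(batch) == 0 or all(len(row) == 0 for row in batch)


def guarded_predict(
    model: Callable[[List[Batch]], List[List[float]]],
    inputs: List[Batch],
) -> List[List[float]]:
    """Call `model` only when every input block is non-empty.

    If any block is empty, the model is not executed and an empty list
    is returned (an assumption of this sketch; a real producer may
    prefer to fill default scores instead).
    """
    if not inputs or any(is_empty(block) for block in inputs):
        return []
    return model(inputs)
```

The point of the design is that the check happens before the model is called, so a shape mismatch caused by an empty block never reaches the inference backend. What to store for the skipped candidates is a separate decision left to the caller.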
+pdmv |
... hlt will sign once the 14.0.X PR is merged and tested in IBs. |
@cms-sw/reconstruction-l2 this looks like it needs a separate issue. Can you open one? |
+hlt
|
This issue is fully signed and ready to be closed. |
@cmsbuild, please close |
Running RelVals we are observing some failures due to a TensorFlow exception coming from the DeepTauId module. Some examples are listed here.

1) 2023 Data reHLT + reRECO
In the HLTDR3_2023 step, in path HLT_VBF_DoubleMediumDeepTauPFTauHPS20_eta2p1_v7, in 14_0_0_pre3 RelVals with the config here, that is what we get from wf 141.035 running L1REPACK:Full,HLT:@relval2024 (HLT pointing at GRun here). The error here. The wf on Stats2.

Also in the same step in 13_3_0_pre5 RunDisplacedJet2023C, in a different path (HLT_DoubleMediumDeepTauPFTauHPS30_L2NN_eta2p1_PFJet60_v6), run in HLT:@relval2023. The error here. The wf on Stats2.

2) 2022 Data reHLT + reRECO
Much rarer, in the AODNANORUN3_reHLT_2022 step, in deepTau2017v2p1ForMini, in RunJetMET2022D with 14_0_0. The error here. The wf on Stats2.

3) MC 2023
In the DigiPU_2023PU step, in hltHpsPFTauDeepTauProducer, in RelValTenTau_15_500 with 13_3_0_pre1 (at the moment the first occurrence I found). The error here. The wf on Stats2.

CPU
At the moment it appears that in all cases the jobs were running on Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz (or on a Gold one), Cascade Lake (see #44333 (comment)).