-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
large number of tracking-related warning/error messages in HLT jobs #39234
Comments
A new Issue was created by @missirol Marino Missiroli. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign reconstruction |
New categories assigned: reconstruction @jpata,@clacaputo,@mandrenguyen you have been requested to review this Pull request/Issue and eventually sign? Thanks |
FYI @cms-sw/egamma-pog-l2 @cms-sw/muon-pog-l2 @cms-sw/tracking-pog-l2 |
error in [1] is the same thing I've already reported in #39026 (comment). |
about this message:
and similar common messages, raised from cmssw/TrackingTools/TrajectoryState/src/BasicTrajectoryState.cc Lines 87 to 97 in 2a2ca2e
.... error not pos-def , and remove the details like err.matrix() , state.position() , state.momentum() , state.parameters().magneticFieldInTesla() .Will that be acceptable? (let me also tag @VinInn in case he has suggestions about this topic) |
No. If the warning is something that needs to be followed up, then people should look into it within one-two weeks at most, and solve the underlying issue. If there is nothing to fix, i.e. code is behaving as expected and the physics results are correct, then the warning should be downgraded to |
That's more of a wishful thinking, not how warnings were treated in the last 10+ years. |
I cannot change how RECO and others handled things in the past.
I think I _can_ improve how things are being handled now.
For example, let's pull from the online and offline reconstruction the
offending modules, and see how long it takes to remove the warnings ?
|
It might be useful to keep the |
If I read the code correctly, #39256 is making the check more stringent, from if (updatedTSOS.isValid() && updatedTSOS.localError().valid()) { to if (updatedTSOS.isValid() && updatedTSOS.localError().valid() && updatedTSOS.localError().posDef() && updatedTSOS.curvilinearError().matrix().Det2(det) && det > 0) { Since the if (...) {
result.addState(...);
} else {
edm::LogError("GsfMultiStateUpdator") << "KF updated state " << i << " is invalid. skipping.";
} the change should increase the amount of errors being reported, not reduce them. |
That is what |
case in point no, because the |
I see, so we get a reduction in -%MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-curv error not pos-def
-[ 3.7613e+07 294.713-2.89415e+06 -6909.75 -2081.74
- 294.713 0.00433287 -22.6764 -0.0489485 0.0109365
- -2.89415e+06 -22.6764 222691 531.393 160.185
- -6909.75 -0.0489485 531.393 -2.46602 0.454109
- -2081.74 0.0109365 160.185 0.454109 0.482128 ]
-pos/mom/mf (7.8834,-14.3362,-6.56777) (10.0654,-18.2823,2.70992) (-0.000146921,0.00026718,3.81131)
-%MSG
-%MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-local error not pos-def
-[ 3.7613e+07 2.89715e+06 -15369 6915.44 2063.28
- 2.89715e+06 223154 -1183.8 532.384 158.932
- -15369 -1183.8 6.28202 -2.81894 -0.815183
- 6915.44 532.384 -2.81894 -2.46759 0.471503
- 2063.28 158.932 -0.815183 0.471503 0.485435 ]
-pos/mom/mf (7.8834,-14.3362,-6.56777) (10.0654,-18.2823,2.70992) (-0.000146921,0.00026718,3.81131)
-%MSG
-%MSG-e PosteriorWeightsCalculator: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-PosteriorWeightsCalculator: sumWeight < DBL_MIN
-%MSG
-%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
- no weights could be retreived. invalid updated state !.
-%MSG
-%MSG-e InvalidState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-inside hit
-%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:29:18 CEST Run: 357329 Event: 65599505
+KF updated state 0 is invalid. skipping.
+%MSG
+%MSG-e InvalidState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:29:18 CEST Run: 357329 Event: 65599505
+first hit
+%MSG
-%MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-curv error not pos-def
-[ 3.7613e+07 294.713-2.89415e+06 -6909.75 -2081.74
- 294.713 0.00433287 -22.6764 -0.0489485 0.0109365
- -2.89415e+06 -22.6764 222691 531.393 160.185
- -6909.75 -0.0489485 531.393 -2.46602 0.454109
- -2081.74 0.0109365 160.185 0.454109 0.482128 ]
-pos/mom/mf (7.8834,-14.3362,-6.56777) (10.0654,-18.2823,2.70992) (-0.000146921,0.00026718,3.81131)
-%MSG
-%MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-local error not pos-def
-[ 3.7613e+07 2.89715e+06 -15369 6915.44 2063.28
- 2.89715e+06 223154 -1183.8 532.384 158.932
- -15369 -1183.8 6.28202 -2.81894 -0.815183
- 6915.44 532.384 -2.81894 -2.46759 0.471503
- 2063.28 158.932 -0.815183 0.471503 0.485435 ]
-pos/mom/mf (7.8834,-14.3362,-6.56777) (10.0654,-18.2823,2.70992) (-0.000146921,0.00026718,3.81131)
-%MSG
-%MSG-e PosteriorWeightsCalculator: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-PosteriorWeightsCalculator: sumWeight < DBL_MIN
-%MSG
-%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
- no weights could be retreived. invalid updated state !.
-%MSG
-%MSG-e InvalidState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65599505
-inside hit
-%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:29:18 CEST Run: 357329 Event: 65599505
+KF updated state 0 is invalid. skipping.
+%MSG
+%MSG-e InvalidState: GsfTrackProducer:hltEgammaGsfTracksForBParking 31-Aug-2022 08:29:18 CEST Run: 357329 Event: 65599505
+first hit
+%MSG -%MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65033984
-curv error not pos-def
-[ 4.50868e+07 -492.822-1.58079e+06 -168.574 -232.861
- -492.822 0.00583096 17.2786 0.000881737 0.00526915
- -1.58079e+06 17.2786 55424 5.89368 8.16263
- -168.574 0.000881737 5.89368 -0.0576909 -0.0051147
- -232.861 0.00526915 8.16263 -0.0051147 0.0179101 ]
-pos/mom/mf (5.49633,-3.94086,-6.28259) (8.86537,-6.21352,-15.619) (-9.90109e-05,7.09906e-05,3.81112)
-%MSG
-%MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65033984
-local error not pos-def
-[ 4.50868e+07 1.58727e+06 -160089 -168.434 425.45
- 1.58727e+06 55879.4 -5635.89 -5.9129 14.9731
- -160089 -5635.89 568.43 0.59339 -1.52462
- -168.434 -5.9129 0.59339 -0.057925 0.0148514
- 425.45 14.9731 -1.52462 0.0148514 0.0527617 ]
-pos/mom/mf (5.49633,-3.94086,-6.28259) (8.86537,-6.21352,-15.619) (-9.90109e-05,7.09906e-05,3.81112)
-%MSG
-%MSG-e PosteriorWeightsCalculator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65033984
-PosteriorWeightsCalculator: sumWeight < DBL_MIN
-%MSG
-%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65033984
- no weights could be retreived. invalid updated state !.
-%MSG
-%MSG-e InvalidState: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:18:49 CEST Run: 357329 Event: 65033984
-inside hit
-%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+KF updated state 0 is invalid. skipping.
+%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+KF updated state 1 is invalid. skipping.
+%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+KF updated state 2 is invalid. skipping.
+%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+KF updated state 3 is invalid. skipping.
+%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+KF updated state 5 is invalid. skipping.
+%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+KF updated state 11 is invalid. skipping.
+%MSG
+%MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+KF updated state 0 is invalid. skipping.
+%MSG
+%MSG-e InvalidState: GsfTrackProducer:hltEgammaGsfTracks 31-Aug-2022 08:29:19 CEST Run: 357329 Event: 65033984
+first hit
+%MSG |
Running over 10k events, we go from 4883 warnings and errors
to only 2424:
the differences being a large reduction in warnings and only a small increase in errors: - 33 %MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks
+ 40 %MSG-e GsfMultiStateUpdator: GsfTrackProducer:hltEgammaGsfTracks
- 16 %MSG-e PosteriorWeightsCalculator: GsfTrackProducer:hltEgammaGsfTracksForBParking
- 33 %MSG-e PosteriorWeightsCalculator: GsfTrackProducer:hltEgammaGsfTracks
- 2232 %MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracks
- 186 %MSG-w BasicTrajectoryState: GsfTrackProducer:hltEgammaGsfTracksForBParking |
I think in a nutshell #39256 is exchanging a more verbose error for multiple less verbose ones. One might wonder why these pathological states even happen to exist (which is where I think the investigation should be moved to) |
@swagata87 , after your PR(s) , do you think we can remove @JanFSchulte , what would you suggest to do about the many warnings like [2]? [2]
|
Hello Marino, |
Dear expert, |
@wonpoint4 , okay, can you please prepare this PR so experts can review it? Thanks. |
The MUO POG has opened the corresponding PR in #39596. |
I think the original issue has been addressed by EGM and MUO POGs, by tweaking or demoting the warning messages. At least now HLT is not forced anymore to mask the warnings of those producers. It's not clear to me if there are still reconstruction issues to follow up on (see #39234 (comment)). I leave it to others to continue that discussion in this issue, or to do so in a separate one if needed. |
Trying to follow-up on a seemingly unrelated issue, I stumbled on the fact that in the alca FullTrack validation produced for one of the Tracker alignment updates, the number of e/gamma @ HLT related messages was drastically reduced when using the more updated conditions. Perhaps it's a lead to the actual problem. |
CMSSW jobs with recent HLT menus [0] show a very large number of
LogWarning
s andLogError
s coming mainly from the HLT e/gamma [1] and muon [2] reconstruction. The warnings/errors seem to be mainly related to tracking. A recipe to reproduce some of these warnings/errors is in [3].In CMSHLT-2449, it was noted that this creates significant size increase of HLT log files, and the short-term 'solution' was to silence these warnings (not the errors) via the
MessageLogger
module of the HLT menu.This issue is about understanding from RECO and POG experts what the best solution is.
Are these messages a sign of issues that should be addressed with some urgency?
If not, should they be downgraded to "info" or "debug" messages, rather than being silenced by HLT?
FYI: @fwyzard @Martin-Grunewald @silviodonato @swagata87 @JanFSchulte
[0] I currently don't know when this started, and if it is different with earlier releases or menus.
[1]
[2]
[3]
The text was updated successfully, but these errors were encountered: