Out of range exception from RPCAMCRawToDigi #38939

makortel · 2022-08-02T13:18:09Z

Workflow 136.8561 step 3 has been failing since CMSSW_12_5_X_2022-07-28-1100 with

----- Begin Fatal Exception 02-Aug-2022 14:38:05 CEST-----------------------
An exception of category 'OutOfRange' occurred while
   [0] Processing  Event run: 314890 lumi: 591 event: 497483740 stream: 2
   [1] Running path 'dqmoffline_step'
   [2] Prefetching for module L1TdeStage2CPPF/'l1tdeStage2Cppf'
   [3] Calling method for module RPCAMCRawToDigi/'rpcCPPFRawToDigi'
Exception Message:
Out-of-range input for RPCAMCLink::bf_set, position 0: 100
----- End Fatal Exception -------------------------------------------------

https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc10/CMSSW_12_5_X_2022-08-02-1100/pyRelValMatrixLogs/run/136.8561_RunZeroBias_hBStarTk+RunZeroBias_hBStarTk+HLTDR2_2018_hBStar+RECODR2_2018reHLT_Offline_hBStar+HARVEST2018_hBStar/step3_RunZeroBias_hBStarTk+RunZeroBias_hBStarTk+HLTDR2_2018_hBStar+RECODR2_2018reHLT_Offline_hBStar+HARVEST2018_hBStar.log#/
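
For readers unfamiliar with this kind of failure: the message is typical of a bit-field setter that refuses a value too large for its field. The following is a minimal Python sketch of that pattern, not the CMSSW implementation. The name `bf_set` is borrowed from the message above, while the signature, the 4-bit mask used in the example, and the `OutOfRange` class are assumptions made for illustration.

```python
class OutOfRange(ValueError):
    """Raised when a value does not fit in its bit field (hypothetical class)."""


def bf_set(word, minimum, mask, pos, value):
    """Return `word` with `value - minimum` stored in the field at bit `pos`.

    `mask` is the unshifted field mask, e.g. 0xF for a 4-bit field.
    """
    offset = value - minimum
    if offset < 0 or offset > mask:
        # Same wording as the log message, but raised by this sketch only.
        raise OutOfRange(
            f"Out-of-range input for bf_set, position {pos}: {value}"
        )
    cleared = word & ~(mask << pos)  # wipe the old field content
    return cleared | (offset << pos)  # write the new content
```

With a hypothetical 4-bit field at position 0, a value of 100 cannot be stored, so the setter raises instead of silently truncating it.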


cmsbuild · 2022-08-02T13:18:28Z

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel · 2022-08-02T13:19:25Z

assign reconstruction

cmsbuild · 2022-08-02T13:19:43Z

New categories assigned: reconstruction

@jpata,@clacaputo you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel · 2022-08-02T13:21:01Z

Seems that this was already reported in #38564 (comment)

@zhangcg123

zhangcg123 · 2022-08-02T16:17:22Z

The same error can be reproduced simply by

cmsRun DQM/Integration/python/clients/l1tstage2emulator_dqm_sourceclient-live_cfg.py unitTest=True dataset=/ZeroBias/Commissioning2018-v1/RAW runNumber=314890 eventsPerLumi=-1

It looks like the error only occurs when dataset=/ZeroBias/Commissioning2018-v1/RAW is used as input.

davidlange6 · 2022-08-02T16:20:29Z

L1 would seem to be the more appropriate group to assign this to.


makortel · 2022-08-02T17:48:00Z

assign l1

L1 would seem to be the more appropriate group to assign this to.

Thanks. I followed RPCAMCRawToDigi module being defined in EventFilter/RPCRawToDigi, and that package being assigned to reconstruction.

cmsbuild · 2022-08-02T17:48:16Z

New categories assigned: l1

@epalencia,@rekovic,@cecilecaillol you have been requested to review this Pull request/Issue and eventually sign? Thanks

mileva · 2022-08-10T08:51:17Z

Hi @makortel all,
The reason for the crash is that during run 314890 (used by the workflow 136.8561_RunZeroBias) the CPPF data were considered corrupted. The cause was a firmware update, which led to some problems. Good CPPF data were restored after run 315764.

My personal advice is to change the input data with some recent zerobias run in order to test the workflow.

And from the other side - probably some sanity checks (if there are data, or if they are valid...) need to be added to the analyzer in order to avoid further crash of the code in such cases.
Best!
Roumyana (for RPCs)
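
A sanity check of the kind suggested above could look roughly like the following Python sketch. It is only an illustration of the idea (validate first, skip and count what is corrupted), not the code that went into CMSSW. The `Digi` class, the `unpack` function, and the bound `MAX_LINK` are all hypothetical.

```python
from dataclasses import dataclass

MAX_LINK = 15  # hypothetical upper bound for a link index


@dataclass(frozen=True)
class Digi:
    """Hypothetical unpacked item: a link index plus its payload."""
    link: int
    payload: int


def unpack(records, max_link=MAX_LINK):
    """Build digis from (link, payload) pairs, skipping out-of-range links.

    Returns the list of good digis and the number of skipped records.
    """
    digis, skipped = [], 0
    for link, payload in records:
        if not 0 <= link <= max_link:
            skipped += 1  # corrupted record: count it and move on
            continue
        digis.append(Digi(link, payload))
    return digis, skipped
```

Returning the number of skipped records lets the caller report the corruption (for example in a DQM summary) without aborting the job.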

qliphy · 2022-08-10T11:40:06Z

@mileva Is #38974 supposed to fix this issue?

mileva · 2022-08-10T12:10:57Z

Hi @qliphy
No, #38974 is not supposed to fix this issue here.

#38974 is intended to fix the CPPF DAQ delay and the unpacked RPC digis, while the current issue relates to a comparison between the unpacked cppf digis vs emulated ones.

The RPCCPPF unpacker processes two different records:
TXRecord: when processing it, the unpacker fills the CPPFDigi collection (clusters that are sent to L1-EMTF), used by the colleagues for their #38564.
RXRecord: contains the initial RPC detector data and is used to fill the RPCDigi collection, which serves as input for local reconstruction.
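
The split between the two record types can be pictured with a small Python sketch. It is purely schematic: the `("TX", payload)` / `("RX", payload)` tuples and the function name `unpack_cppf` are assumptions for illustration and do not mirror the real data format.

```python
def unpack_cppf(records):
    """Route each (kind, payload) pair to the collection it feeds.

    "TX" payloads stand in for CPPF clusters (CPPFDigi collection),
    "RX" payloads stand in for the initial RPC data (RPCDigi collection).
    """
    cppf_digis, rpc_digis = [], []
    for kind, payload in records:
        if kind == "TX":
            cppf_digis.append(payload)
        elif kind == "RX":
            rpc_digis.append(payload)
        # any other record kind is ignored in this sketch
    return cppf_digis, rpc_digis
```

The point of the picture is that the two output collections are filled independently, so a problem on the TX side does not by itself say anything about the RPCDigi side.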

In the particular case with the test of the ZeroBias workflow, the input run was bad for CPPF - the cppf data were corrupted and thus led to a crash of the L1 CPPF DQM module.

Best!
Roumyana

makortel · 2022-08-10T13:13:56Z

My personal advice is to change the input data with some recent zerobias run in order to test the workflow.

Thanks, adding @cms-sw/pdmv-l2 for that

makortel · 2022-08-10T13:14:00Z

assign pdmv

cmsbuild · 2022-08-10T13:14:16Z

New categories assigned: pdmv

@bbilin,@jordan-martins,@kskovpen you have been requested to review this Pull request/Issue and eventually sign? Thanks

kskovpen · 2022-08-10T14:47:52Z

My personal advice is to change the input data with some recent zerobias run in order to test the workflow.

Thanks, adding @cms-sw/pdmv-l2 for that

One can use for example this input: https://github.com/cms-sw/cmssw/blob/master/Configuration/PyReleaseValidation/python/relval_steps.py#L483

perrotta · 2022-08-18T07:49:32Z

My personal advice is to change the input data with some recent zerobias run in order to test the workflow.

Thanks, adding @cms-sw/pdmv-l2 for that

One can use for example this input: https://github.com/cms-sw/cmssw/blob/master/Configuration/PyReleaseValidation/python/relval_steps.py#L483

@kskovpen can this be done, then? Having continuously failing workflows in the IBs is far from optimal when checking the effect of newly integrated PRs on them.

kskovpen · 2022-08-18T12:38:15Z

@mileva @makortel if we want to move it to Run 3, which era should be used for this specific wf? I see that Run2_2018_highBetaStar was used in Run 2.

mmusich · 2022-08-23T15:40:15Z

if we want to move it to Run 3, which era should be used for this specific wf? I see that Run2_2018_highBetaStar was used in Run 2.

there's no equivalent (yet) for Run3, as we haven't yet had a high beta star run.
https://github.com/cms-sw/cmssw/tree/master/Configuration/Eras/python

I don't quite understand the point of changing the input of this wf, since I think it was expressly designed to test the reconstruction with the high beta star (tracking) setup

mmusich · 2022-08-23T15:49:15Z

@mileva

And from the other side - probably some sanity checks (if there are data, or if they are valid...) need to be added to the analyzer in order to avoid further crash of the code in such cases.

is there any downstream consumer of CPPF digis? if we know that the data is corrupt exactly in the run range in which we have the high beta star can the unpacker be removed in the sequence run in that era?

mileva · 2022-08-23T15:54:54Z

I don't quite understand the point of changing the input of this wf

Hi @mmusich ,
In fact I tried to explain the reason for the crash with this workflow - namely the cppf data were corrupted in the input run, and the reason for the crash is not in the proposed pull request, but the data.
And I tried the same workflow with one of the recent runs with available data on eos to see that the software runs, nothing more.
I guess that the L1/CPPF/DQM code just needs some sanity checks to avoid such cases with corrupted data.

Best!
Roumyana

mmusich · 2022-08-23T15:56:44Z

@mileva

In fact I tried to explain the reason for the crash with this workflow - namely the cppf data were corrupted in the input run, and the reason for the crash is not in the proposed pull request, but the data.

yes, I understand, but changing the input data is NOT an option, unless we want to give up testing the high beta* reco...

mileva · 2022-08-23T16:09:27Z

is there any downstream consumer of CPPF digis? if we know that the data is corrupt exactly in the run range in which we have the high beta star can the unpacker be removed in the sequence run in that era?

In fact all the 2018A data before run 315764 are affected. The issue happened somewhere in March, before the start of data taking.
The CPPF digis are used by the L1TStage2Emulator. But I think at that time (2018A) L1 didn't use them and just produced the CPPF clusters on the fly from the RPCDigis in the emulation step. (CPPF concentrates RPC digis in the endcap and clusterizes them.)
So, I guess there will be no problem if the CPPFRPCunpacker is removed for this particular workflow.

I just ping @efeyazgan for EMTF, in case I am missing something.
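
As a rough illustration of the affected range, a run-based guard could be sketched in Python as follows. The run number comes from this thread; the constant and function names are hypothetical, and treating run 315764 itself as good is an assumption, since the thread does not state on which side of the boundary it falls.

```python
# Boundary quoted in this thread. Whether this run itself is good is an
# assumption of the sketch.
FIRST_GOOD_CPPF_RUN = 315764


def cppf_data_usable(run):
    """Return True if the run is outside the range reported as corrupted."""
    return run >= FIRST_GOOD_CPPF_RUN
```

Under this assumption, run 314890 used by workflow 136.8561 falls in the affected range.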

kskovpen · 2022-08-23T16:14:56Z

is there any downstream consumer of CPPF digis? if we know that the data is corrupt exactly in the run range in which we have the high beta star can the unpacker be removed in the sequence run in that era?

In fact all the 2018A data before run 315764 are affected. The issue happened somewhere in March, before the start of data taking. The CPPF digis are used by the L1TStage2Emulator. But I think at that time (2018A) L1 didn't use them and just produced the CPPF clusters on the fly from the RPCDigis in the emulation step. (CPPF concentrates RPC digis in the endcap and clusterizes them.) So, I guess there will be no problem if the CPPFRPCunpacker is removed for this particular workflow.

I just ping @efeyazgan for EMTF, in case I am missing something.

Does this imply modifying a specific step of this workflow, or deactivating it in IBs? In the former case, how should it be modified?

mileva · 2022-08-23T16:27:59Z

Just to be clear! I don't think the problem is in the workflow. The workflow shows that there is a problem with the particular pull request.
It might happen that the CPPF data will be corrupted again. So some checks to avoid running over corrupted data in the L1/CPPF/DQM code need to be implemented.
But I am not an expert, and I could be wrong.

kskovpen · 2022-08-23T16:33:37Z

As @makortel has mentioned above, the problem is at step3 of 136.8561. Anyhow, if experts could comment on how relevant this wf is for Run3 (as it stands now), it would help to decide on a proper action.

mmusich · 2022-08-23T16:50:44Z

@kskovpen

Anyhow, if experts could comment on how relevant this wf is for Run3 (as it stands now), it would help to decide on a proper action.

this wf has no relevance whatsoever for run-3, but it is there to ensure we can still properly reconstruct the run2 high beta star data. I think someone with a higher pay grade than me should decide if this is something that CMS wants to keep being able to do, but I don't see why that would not be the case.
Having said that, it seems to me that the right course of action is to provide these checks in the CPPF / RPC code in order to avoid crashing on bad input data. Such checks are customarily included in DPG / POG code to avoid failures at run time.

mmusich · 2022-08-23T16:59:04Z

The CPPF digis are used by the L1TStage2Emulator. But I think at that time (2018A) L1 didn't use them and just produced the CPPF clusters on the fly from the RPCDigis in the emulation step. (CPPF concentrates RPC digis in the endcap and clusterizes them.)
So, I guess there will be no problem if the CPPFRPCunpacker is removed for this particular workflow.

by the way removing the CPPF unpacker results in

----- Begin Fatal Exception 23-Aug-2022 18:56:12 CEST-----------------------
An exception of category 'ProductNotFound' occurred while
   [0] Processing  Event run: 314890 lumi: 591 event: 497757635 stream: 0
   [1] Running path 'dqmoffline_step'
   [2] Calling method for module L1TStage2CPPF/'l1tStage2Cppf'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: std::vector<l1t::CPPFDigi>
Looking for module label: rpcCPPFRawToDigi
Looking for productInstanceName: 

   Additional Info:
      [a] If you wish to continue processing events after a ProductNotFound exception,
add "SkipEvent = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

----- End Fatal Exception -------------------------------------------------

so, there are downstream consumers.
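
The failure above is what a consumer produces when it reads a product unconditionally. A defensive consumer, analogous to testing a handle's validity before dereferencing it, can be sketched in Python as below. The event is modelled as a plain dictionary and `analyze_cppf` is a hypothetical function; only the module label `rpcCPPFRawToDigi` is taken from the log.

```python
def analyze_cppf(event, label="rpcCPPFRawToDigi"):
    """Return the number of CPPF digis, or 0 if the product is absent.

    `event` is a plain dict mapping module labels to collections; this is
    a stand-in for the real event interface, used only for illustration.
    """
    digis = event.get(label)
    if digis is None:
        # Product missing (e.g. unpacker removed): skip instead of failing.
        return 0
    return len(digis)
```

Whether silently skipping is acceptable depends on the module: for monitoring code it hides nothing critical, but it also means a missing unpacker would go unnoticed unless it is reported elsewhere.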

kskovpen · 2022-08-25T06:37:21Z

Shall we disable 136.8561 in IBs and wait for further indications from the relevant groups?

perrotta · 2022-08-25T07:21:39Z

I guess that the L1/CPPF/DQM code just needs some sanity checks to avoid such cases with corrupted data.

Trying to find a solution for this longstanding issue: @mileva, could you or someone in your group please commit to providing those sanity checks in the code? If not for pre5 (today-ish), they should be made available before we build the final 12_5_0, so that this particular workflow can continue to be tested in the cycle

mileva · 2022-08-25T14:22:34Z

Hi @perrotta
I can try to have a look or ask some of the RPC colleagues. However, today I am not able to help, as I am in the mountains. When is the final 12_5_0 build scheduled - 20.09 or earlier?
Roumyana

perrotta · 2022-08-25T14:39:29Z

When is the final 12_5_0 build scheduled - 20.09 or earlier?

Sep 20, see https://twiki.cern.ch/twiki/bin/viewauth/CMS/CMSSW_12_5_0
However, I would aim for a fix well before that date, so that a few IBs are still available in which the wf can be tested

perrotta · 2022-09-01T04:29:55Z

urgent
(To make it visible in the list of issues: it keeps breaking the IBs)

perrotta · 2022-09-06T07:21:12Z

Fixed by #39307

cmsbuild added the pending-assignment label Aug 2, 2022

cmsbuild added pending-signatures reconstruction-pending and removed pending-assignment labels Aug 2, 2022

cmsbuild added the l1-pending label Aug 2, 2022

cmsbuild added the pdmv-pending label Aug 10, 2022

cmsbuild added the urgent label Sep 1, 2022

zhangcg123 mentioned this issue Sep 5, 2022

A possible solution for RPCAMCLink out-of-range issue #39307

Merged

makortel closed this as completed Sep 6, 2022

zhangcg123 mentioned this issue Sep 6, 2022

A possible solution for RPCAMCLink out-of-range issue[12_5_X] #39320

Merged

zhangcg123 mentioned this issue Dec 7, 2022

CPPF DQM modules backport to 12_4_X #40258

Closed
