DQM/Integration unit tests are failing in all releases but 12_6_X #39669
A new Issue was created by @perrotta Andrea Perrotta. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
assign dqm,externals
New categories assigned: dqm,externals @jfernan2,@ahmad3213,@micsucmed,@iarspider,@rvenditti,@smuzaffar,@emanueleusai,@syuvivida,@aandvalenzuela,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks
I have reproduced the issue, also with CMSSW_12_6_X_2022-10-04-1100 (no idea why it didn't fail in the IBs). However, I don't know how to fix it; we need to wait until @smuzaffar is back.
For the time being, I have only reproduced the error in CMSSW_12_3_X_2022-09-30-1100 (after changing the input dataset in https://github.com/cms-sw/cmssw/blob/master/DQM/Integration/python/config/unittestinputsource_cfi.py#L41 to avoid the xrootd error), but we have no idea why it happens. I tried running a couple of DQM clients outside the unit tests, and they work properly.
Could it be that dataset
Or another solution is to backport the
@makortel, @nhduongvn, @stlammel: during the Core SW meeting we decided to backport the #37278 changes to older release cycles too. Do you see any issues with doing this? I am not sure whether all sites are ready and already have the new data catalogs from Rucio.
Yes, that is the plan (see #37278 (comment)).
We need to be sure that the backports won't cause trouble in the old release cycles. I had earlier collected the list of fixes that need to be included in the backport in #37278 (comment), and this week a new issue on the subsite treatment in the
That was actually my precondition for signing #37278, which @stlammel confirmed in #37278 (comment) (although with 12_6_0_pre2 reality turned out to be more complicated).
So, there was a campaign earlier this year to get storage.json files in place for all sites. Two sites had held out, and their files were put in place when this was discovered several weeks ago, as Matti wrote.
Hi all, this still needs attention. Is it still the case that @nhduongvn is preparing a fix here?
Hi Sal, all,
Thanks @nhduongvn, but we still need backports to 12_5 and 12_4. @makortel, is there an update there? Otherwise, can we just switch the DQM checks to a more recent run that is still available and bypass this entirely? @cms-sw/dqm-l2?
@stlammel, we won't release 12_6 until December; we can't really leave the IBs broken for 2 months.
Hello Sal, @rappoccio
Given the trouble we've had with #37278, I'm not comfortable backporting it (and all the necessary fixes) to 12_4_X or 12_5_X until data taking is over (to avoid any risk for Tier0). That said, I think the unit tests would be fixed by just dropping the
succeeds in CMSSW_12_5_X_2022-10-21-1100.
Right, dropping the
this indeed works.
Let me know if any other cycles need an update.
I still don't understand why just dropping the
# this is what the test used before
$ edmFileUtil -d --catalog file:/cvmfs/cms-ib.cern.ch/SITECONF/local/PhEDEx/storage.xml?protocol=xrootd /store/express/Run2022B/ExpressPhysics/FEVT/Express-v1/000/355/380/00000/b8a57fc4-5656-42b4-9b7b-2e647baf65e8.root
root://cms-xrd-global.cern.ch//store/express/Run2022B/ExpressPhysics/FEVT/Express-v1/000/355/380/00000/b8a57fc4-5656-42b4-9b7b-2e647baf65e8.root
# with explicit ibeos
$ edmFileUtil -d --catalog file:/cvmfs/cms-ib.cern.ch/SITECONF/local/PhEDEx/storage.xml?protocol=ibeos /store/express/Run2022B/ExpressPhysics/FEVT/Express-v1/000/355/380/00000/b8a57fc4-5656-42b4-9b7b-2e647baf65e8.root
root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/express/Run2022B/ExpressPhysics/FEVT/Express-v1/000/355/380/00000/b8a57fc4-5656-42b4-9b7b-2e647baf65e8.root
# dropping --catalog, setting CMS_PATH
$ CMS_PATH=/cvmfs/cms-ib.cern.ch edmFileUtil -d /store/express/Run2022B/ExpressPhysics/FEVT/Express-v1/000/355/380/00000/b8a57fc4-5656-42b4-9b7b-2e647baf65e8.root
root://eoscms.cern.ch//eos/cms/store/user/cmsbuild/store/express/Run2022B/ExpressPhysics/FEVT/Express-v1/000/355/380/00000/b8a57fc4-5656-42b4-9b7b-2e647baf65e8.root
The last two cases resolve to exactly the same PFN. Also running
Anyway, given that #39829 and #39830 are already merged, there probably isn't any practical need to continue the discussion (except perhaps to understand why merging #39829 did not close this issue).
This didn't work for me, see #39669 (comment)
I guess that is because the recipe in #39669 (comment) did not include overriding the
Hmm, yes, dropping
That's interesting, because when I first tried to drop
Thanks a lot for the effort here! I think we can now close the issue, as the IBs are completing correctly. Thanks, everyone!
DQM/Integration unit tests are failing in large numbers in all releases but 12_6_X, in all cases apparently independently of the PRs merged in the meantime.
I observed it starting in:
CMSSW_12_5_X_2022-10-04-1100
CMSSW_12_4_X_2022-10-03-2300
CMSSW_12_3_X_2022-09-30-1100
CMSSW_12_2_X_2022-10-03-2300
No such issue (yet?) in the master release.
In all cases no PRs had been merged into the IB in which the failures first appeared; in particular, we have not merged anything into 12_2_X and 12_3_X for a while.
A typical log: