
HLT Validation tests fail in all (12_3/4/5/6) IBs since 2022-09-07-1100 IBs #39345

Closed
Martin-Grunewald opened this issue Sep 8, 2022 · 17 comments

Comments

@Martin-Grunewald
Contributor

Martin-Grunewald commented Sep 8, 2022

HLT validation tests have been failing in all 12_3, 12_4, 12_5 and 12_6 IBs since the 2022-09-07-1100 IBs.

(For the other CMSSW release cycles, no IBs have been built after 2022-09-06-2300 so far.)

Has anything changed in the environment in which these jobs are executed?

The error message is:

Traceback (most recent call last):
  File "/pool/condor/dir_9286/jenkins/workspace/ib-run-HLT/CMSSW_12_6_X_2022-09-07-2300/bin/el8_amd64_gcc10/hltListPaths", line 190, in <module>
    paths = getPathList(config)
  File "/pool/condor/dir_9286/jenkins/workspace/ib-run-HLT/CMSSW_12_6_X_2022-09-07-2300/bin/el8_amd64_gcc10/hltListPaths", line 32, in getPathList
    raise Exception(f'query did not return a valid HLT menu:\n query="{cmdline}"')
Exception: query did not return a valid HLT menu:
 query="hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_4_0/GRun/V110 --noedsources --noes --noservices"

i.e., the following command, run in a CMSSW environment, did not return a valid HLT menu:

hltConfigFromDB --run3 --v3 --configName /dev/CMSSW_12_4_0/GRun/V110 --noedsources --noes --noservices

I cannot reproduce this failure offline in a standard CMSSW developer environment.
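The failing step can be sketched as follows. This is an illustrative reconstruction, not the actual hltListPaths code, and the helper names `exec_menu_text` and `load_menu` are hypothetical. It assumes the command prints a complete Python configuration that defines a top-level `process`, and shows why an empty reply and a reply missing its header both end in the same exception.

```python
import subprocess


def exec_menu_text(text: str, cmdline: str):
    """Execute a dumped configuration and return its top-level `process`."""
    error = f'query did not return a valid HLT menu:\n query="{cmdline}"'
    namespace = {}
    try:
        exec(text, namespace)
    except Exception as err:
        # e.g. NameError: name 'cms' is not defined, if the dump lost its header
        raise Exception(error) from err
    if "process" not in namespace:
        # an empty dump executes without error but defines no `process`
        raise Exception(error)
    return namespace["process"]


def load_menu(cmdline: str):
    """Run a dump command (such as hltConfigFromDB) and load its output."""
    result = subprocess.run(cmdline, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        raise Exception(f'query failed (exit code {result.returncode}):\n query="{cmdline}"')
    return exec_menu_text(result.stdout, cmdline)
```

Separating the "could not execute" and "executed but defined nothing" cases makes it easier to tell a truncated download from an empty one when reading the log.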

@Martin-Grunewald
Contributor Author

@smuzaffar

@cmsbuild
Contributor

cmsbuild commented Sep 8, 2022

A new Issue was created by @Martin-Grunewald Martin Grunewald.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Martin-Grunewald Martin-Grunewald changed the title HLT Validation tests fail in all (12_2) releases since 2022-09-07-1100 IBs HLT Validation tests fail in all (12_X) releases since 2022-09-07-1100 IBs Sep 8, 2022
@Martin-Grunewald Martin-Grunewald changed the title HLT Validation tests fail in all (12_X) releases since 2022-09-07-1100 IBs HLT Validation tests fail in all (12_4/5/6) IBs since 2022-09-07-1100 IBs Sep 8, 2022
@Martin-Grunewald
Contributor Author

assign hlt

@cmsbuild
Contributor

cmsbuild commented Sep 8, 2022

New categories assigned: hlt

@missirol,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Martin-Grunewald
Contributor Author

This is similar to issue #37598, except that the failure no longer hits a random subset of these tests; it now affects all of them.

@Martin-Grunewald Martin-Grunewald changed the title HLT Validation tests fail in all (12_4/5/6) IBs since 2022-09-07-1100 IBs HLT Validation tests fail in all (12_3/4/5/6) IBs since 2022-09-07-1100 IBs Sep 8, 2022
@missirol
Contributor

missirol commented Sep 8, 2022

Probably related: since yesterday, I have seen erratic behaviour in downloading HLT configurations from the DB via hltConfigFromDB and hltGetConfiguration, but only outside the CERN network (where the --dbproxy option is needed to access the DB).

hltConfigFromDB --dbproxy --configName /dev/CMSSW_12_4_0/GRun/V110 --noedsources --noes --noservices

returns nothing in such a setup (reproduced just now). As a by-product, hltGetConfiguration --dbproxy /dev/CMSSW_12_4_0/GRun/V110 > b.py writes an invalid Python config, leading to

> python3 b.py
Traceback (most recent call last):
  File "/work/missiroli_m/test/tsg/storm/issue_egm_39344/CMSSW_12_4_7/src/b.py", line 5, in <module>
    process.source = cms.Source( "PoolSource",
NameError: name 'cms' is not defined

which is the typical error in #37598.
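A broken dump like this b.py could be caught before it is run with a quick text check. This is a hypothetical helper, and the two markers it looks for are assumptions about what a complete hltGetConfiguration dump contains (the `cms` import and the `cms.Process(` construction); the NameError above suggests both were missing from the truncated file.

```python
from pathlib import Path

# Assumed markers of a complete dump; a truncated dump such as the b.py above
# reaches `process.source = cms.Source(...)` without either of them.
REQUIRED_MARKERS = (
    "import FWCore.ParameterSet.Config as cms",
    "cms.Process(",
)


def looks_like_complete_dump(path: str) -> bool:
    """Return True if the config file at `path` contains every required marker."""
    text = Path(path).read_text()
    return all(marker in text for marker in REQUIRED_MARKERS)
```

A text check is cheaper than executing the file, but it only detects truncation, not every possible corruption.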

When this happens, the confdb installation in ${CMSSW_BASE}/tmp/confdb appears to be broken:

> ls -l $CMSSW_BASE/tmp/confdb/v3/
 2503 Sep  8 09:23 cmssw-evf-confdb-converter.jar
 2503 Sep  8 09:23 confdb.version
 2503 Sep  8 09:23 ojdbc8.jar

> tail -25 $CMSSW_BASE/tmp/confdb/v3/cmssw-evf-confdb-converter.jar
      <h1>Application is not available</h1>
      <p>The application is currently not serving requests at this endpoint. It may not have been started or is still starting.</p>

      <div class="alert alert-info">
        <p class="info">
          Possible reasons you are seeing this page:
        </p>
        <ul>
          <li>
            <strong>The host doesn't exist.</strong>
            Make sure the hostname was typed correctly and that a route matching this hostname exists.
          </li>
          <li>
            <strong>The host exists, but doesn't have a matching path.</strong>
            Check if the URL path was typed correctly and that the route was created using the desired path.
          </li>
          <li>
            <strong>Route and path matches, but all pods are down.</strong>
            Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running.
          </li>
        </ul>
      </div>
    </div>
  </body>
</html>
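Since a .jar file is a ZIP archive, an HTML error page saved under a .jar name can be detected with the standard library's `zipfile.is_zipfile`. A minimal sketch; the function name `broken_jars` is hypothetical, and only the .jar files are checked (confdb.version is ignored).

```python
import zipfile
from pathlib import Path


def broken_jars(confdb_dir: str) -> list:
    """Return the sorted names of .jar files in `confdb_dir` that are not valid ZIP archives."""
    return sorted(
        jar.name
        for jar in Path(confdb_dir).glob("*.jar")
        if not zipfile.is_zipfile(jar)
    )
```

Called on `os.path.expandvars("$CMSSW_BASE/tmp/confdb/v3")` in the broken state shown above, it would be expected to flag both .jar files, since each holds the same 2503-byte page.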

Right now, the issue is reproducible outside CERN in both 12_4_X and 12_6_X (the only releases I have tried so far).

On lxplus, the issue does not occur, even if I use the --dbproxy option (in IB tests, we do not use the --dbproxy option).

@Martin-Grunewald
Contributor Author

Hmm, I do not see this confdb directory on vocms006:

 cd ${CMSSW_BASE}/tmp/confdb
/data/user/gruen/126/CMSSW_12_6_X_2022-09-07-2300/tmp/confdb: No such file or directory.

?

@missirol
Contributor

missirol commented Sep 8, 2022

I also don't see it in a normal local CMSSW installation on lxplus. I don't know by heart how and when it is generated; I'll have to dig a bit to find out. Maybe #39345 (comment) does not explain the IB issues, but it seems unlikely that the two things are unrelated (getting configs via --dbproxy from outside CERN worked well in the recent past).

@missirol
Contributor

missirol commented Sep 8, 2022

Right now, what was described in #39345 (comment) (downloading an HLT config from outside CERN with the --dbproxy option) works normally again, and indeed the tmp/confdb directory looks healthy.

ls -l ${CMSSW_BASE}/tmp/confdb/v3/
  576603 Sep  8 17:57 cmssw-evf-confdb-converter.jar
     168 Sep  8 17:57 confdb.version
 5257174 Sep  8 17:57 ojdbc8.jar

The logic that leads to ${CMSSW_BASE}/tmp/confdb/v3/ is in

# try to read the .jar files from AFS, or download them

We read the ConfDB .jar files from a directory on AFS when it is available, and otherwise fall back to downloading them locally.

If this download fails (and in the last 24 hours it seems to have failed often), the resulting configurations can produce the error messages discussed here and in #37598.
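One way to make the fallback more robust is to validate each downloaded file and retry before giving up, writing to a temporary file and renaming it only once it is known to be a valid archive, so that an error page never ends up in tmp/confdb. This is a hypothetical sketch, not the code in CMSSW: `download_jar`, its parameters, and the `fetch` hook (injectable so the logic can be tested without network access) are illustrative, and the actual ConfDB download URLs are not shown here.

```python
import os
import tempfile
import time
import urllib.request
import zipfile


def _fetch(url: str) -> bytes:
    """Default fetcher: plain HTTP(S) GET with a timeout."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def download_jar(url, dest, retries=3, delay=10.0, fetch=_fetch):
    """Download `url` to `dest`, keeping it only if it is a valid ZIP/.jar archive.

    Each attempt writes to a temporary file in the destination directory and
    renames it over `dest` only after validation, so an HTML error page or a
    partial download never replaces the target file.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dest_dir, exist_ok=True)
    last_error = None
    for attempt in range(retries):
        tmp = None
        try:
            data = fetch(url)
            fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".part")
            with os.fdopen(fd, "wb") as out:
                out.write(data)
            if zipfile.is_zipfile(tmp):
                os.replace(tmp, dest)  # atomic rename within one filesystem
                return
            last_error = ValueError(f"{url} returned {len(data)} bytes that are not a .jar")
        except OSError as err:  # includes urllib.error.URLError and HTTPError
            last_error = err
        if tmp is not None and os.path.exists(tmp):
            os.remove(tmp)
        if attempt + 1 < retries:
            time.sleep(delay)
    raise RuntimeError(
        f"could not download a valid {os.path.basename(dest)} after {retries} attempts"
    ) from last_error
```

Failing loudly after the last retry turns a silently broken tmp/confdb into an explicit download error, which would be easier to diagnose than the downstream "invalid HLT menu" message.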

Do the machines running IB tests have access to /afs/cern.ch/user?

Would anybody have suggestions on how to make this more robust?

@missirol
Contributor

@smuzaffar

Do the machines running IB tests have access to /afs/cern.ch/user?

Could you please answer this question? (it's just for my understanding)

@smuzaffar
Contributor

smuzaffar commented Sep 13, 2022

@missirol, we do not mount AFS on the build nodes, so no, we do not have access to /afs/cern.ch/user. AFS is available on some build nodes (e.g. the ppc, arm and grid nodes), but we discourage relying on it.

@missirol
Contributor

Would anybody have suggestions on how to make this more robust?

To answer my own question with one suggestion: in principle, we do not need any of these downloads to test the menus in IBs, since the menus in question are already part of the CMSSW release. In practice, the problem is that some HLT utilities (e.g. hltIntegrationTests) only work by downloading the menu from the DB; it might be worth extending these scripts to (also) read Python configs directly, which could be a step towards solving both this issue and #37598.
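The suggestion above could take the form of a hypothetical helper that lets a script accept either a local Python config or a ConfDB configuration name. `dump_command` is an illustrative name, not an existing CMSSW function; edmConfigDump is the standard CMSSW tool for dumping a Python configuration, and the hltConfigFromDB options are copied from the failing query at the top of this issue.

```python
import os


def dump_command(menu: str) -> list:
    """Return the command that prints the Python configuration of `menu`.

    A path to an existing .py file is dumped locally with edmConfigDump, so no
    database access (and no ConfDB .jar files) is needed; anything else is
    treated as a ConfDB configuration name and downloaded with hltConfigFromDB.
    """
    if menu.endswith(".py") and os.path.isfile(menu):
        return ["edmConfigDump", menu]
    return [
        "hltConfigFromDB", "--run3", "--v3", "--configName", menu,
        "--noedsources", "--noes", "--noservices",
    ]
```

A caller would then run something like `subprocess.run(dump_command(menu), capture_output=True, text=True, check=True).stdout`; for menus already shipped in the release, the local branch avoids the download entirely.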

@missirol
Contributor

missirol commented Oct 21, 2022

Regarding #39345 (comment), I realised only today, thanks to others, that the path /afs/cern.ch/user/c/confdb/www/ is actually a symlink to /eos/home-c/confdb/www.

@smuzaffar , do the machines running IB tests have access to /eos/home-c/confdb/www?

(If they do, we could maybe avoid some extra downloads during IBs by changing /afs/cern.ch/user/c/confdb/www/ to /eos/home-c/confdb/www in confdb.py of CMSSW.)
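A sketch of the lookup under discussion: try a list of candidate directories in order and use the first one that holds all the required files, falling back to the download only if none does. `find_local_jars` and `CONFDB_ROOTS` are hypothetical names, and because the exact subdirectory below www/ that holds the .jar files is not shown in this thread, the candidate directories are passed in by the caller. On machines that mount neither AFS nor EOS it returns None, so the download is still needed there.

```python
import os
from typing import Iterable, Optional

# The two roots discussed in this thread; which subdirectory holds the .jar
# files is left to the caller (assumption).
CONFDB_ROOTS = ("/afs/cern.ch/user/c/confdb/www", "/eos/home-c/confdb/www")


def find_local_jars(names: Iterable[str], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory containing every file in `names`, else None."""
    names = list(names)
    for directory in candidates:
        if all(os.path.isfile(os.path.join(directory, name)) for name in names):
            return directory
    return None
```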

@smuzaffar
Contributor

@missirol, no, we do not have access to AFS or EOS.

@missirol
Contributor

+hlt

See #37598 (comment) (I'm assuming that the root cause behind this issue and #37598 is the same).

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

@makortel
Contributor

@cmsbuild, please close
