Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up filesFromList and filesFromDASQuery in ConfigBuilder.py #29454

Merged
merged 2 commits into from
Apr 16, 2020

Conversation

kpedro88
Copy link
Contributor

@kpedro88 kpedro88 commented Apr 10, 2020

PR description:

Thanks to @DryRun for pointing out that the function filesFromList() was unnecessarily slow, especially when parsing an entire premixed pileup sample.

Loading a text file list generated by this command:

dasgoclient -query="file dataset=/Neutrino_E-10_gun/RunIISummer17PrePremix-PUAutumn18_102X_upgrade2018_realistic_v15-v1/GEN-SIM-DIGI-RAW" > Neutrino_E-10_gun_RunIISummer17PrePremix-PUAutumn18_102X_upgrade2018_realistic_v15-v1_GEN-SIM-DIGI-RAW.txt

Before the change: 20.75 minutes
After the change: 2.8 seconds

The same fix is propagated to filesFromDASQuery().

PR validation:

Confirmed same output from function.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@kpedro88
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29454/14642

  • This PR adds an extra 36KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 10, 2020

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/5658/console Started: 2020/04/10 17:36

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for master.

It involves the following packages:

Configuration/Applications

@cmsbuild, @silviodonato, @franzoni, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@makortel, @Martin-Grunewald, @fabiocos this is something you requested to watch as well.
@silviodonato, @dpiparo you are the release manager for this.

cms-bot commands are listed here

@kpedro88
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 10, 2020

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/5660/console Started: 2020/04/10 17:53

@kpedro88
Copy link
Contributor Author

code-checks

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29454/14644

  • This PR adds an extra 36KB to repository

@cmsbuild
Copy link
Contributor

+1
Tested at: 2985833
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-40255d/5660/summary.html
CMSSW: CMSSW_11_1_X_2020-04-10-1100
SCRAM_ARCH: slc7_amd64_gcc820

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-40255d/5660/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2695166
  • DQMHistoTests: Total failures: 2
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2694845
  • DQMHistoTests: Total skipped: 319
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 147 log files, 16 edm output root files, 34 DQM output files

@DryRun
Copy link
Contributor

DryRun commented Apr 14, 2020

Thanks for making the PR, @kpedro88 . Should this be backported to the DR campaign releases, 8_0_X/9_4_X/10_2_X?

@cmsbuild cmsbuild mentioned this pull request Apr 16, 2020
@kpedro88
Copy link
Contributor Author

10_6_X is probably the most impactful backport, but I can also backport to the other suggested release cycles.

I'd like to get the master PR approved first, though, to streamline the backporting process.

@silviodonato please take a look

@silviodonato
Copy link
Contributor

+operations

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

Yes, 10_6_X is the most important backport (Run-2 ultra legacy). The improvement is impressive ( 20.75 minutes --> 2.8 seconds) so I would backport also to 10_2_X, that is used for Run-2 analysis. Also a backport to 8_0_X and 9_4_X can be useful for the DR campaigns.

@silviodonato silviodonato changed the title speed up filesFromList Speed up filesFromList and filesFromDASQuery in ConfigBuilder.py Apr 16, 2020
@silviodonato
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants