[XrdAdaptor] Delay additional source acquisition when redirect limit error is returned. #35498

Merged


osschar
Contributor

@osschar osschar commented Sep 30, 2021

When a file open returns the error status XrdCl::errRedirectLimit, the limit of retries on a redirector has been reached, and any further requests to the same redirector will result in the same error.

This PR introduces a progressive scaling variable that increases the delay before the next attempt to acquire an additional source.

In a redirector hierarchy, the reject decision is made at each site redirector, based on its configuration. This is why the scaling is progressive: a job might still succeed later in opening another replica of the file from another site.
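
As an illustration of the idea only, the progressive scaling could be sketched as below. This is not the code from this PR: the class name `SourceAcquisitionBackoff`, the doubling rule, the base delay, and the cap are all hypothetical choices made for the example.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper (not the actual XrdAdaptor implementation): tracks how
// long to wait before the next attempt to acquire an additional source.
class SourceAcquisitionBackoff {
public:
  SourceAcquisitionBackoff(std::chrono::seconds baseDelay, unsigned maxScale)
      : baseDelay_(baseDelay), maxScale_(maxScale) {}

  // Call when a file open fails with the redirect-limit error.
  // Assumption: the scale doubles each time, up to maxScale.
  void onRedirectLimit() { scale_ = std::min(scale_ * 2, maxScale_); }

  // Delay to apply before the next additional-source attempt.
  std::chrono::seconds nextDelay() const { return baseDelay_ * scale_; }

private:
  std::chrono::seconds baseDelay_;
  unsigned maxScale_;
  unsigned scale_ = 1;  // grows progressively, never resets in this sketch
};
```

With a base delay of 5 s and a cap of 8, successive redirect-limit errors would give delays of 5, 10, 20, 40, and then 40 s again. The delay is stretched rather than the retries being stopped outright, because a later attempt may still reach a replica at another site.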

Together with the triedrc=resel change in the previous PR, this allows tuning how many
a) reopen requests due to errors, and
b) reopen requests to get an additional source
are allowed at each point in a redirector hierarchy, and it avoids constant pinging of redirectors for the same file.

This is really important for XCache, where additional open requests can introduce unwanted file replicas in the cache. It might also be relevant for EOS and other installations where data is served from a single set of disks by several servers, and where additional open requests only burden the storage system.

@cmsbuild
Contributor

A new Pull Request was created by @osschar (Matevž Tadel) for CMSSW_12_1_DEVEL_X.

It involves the following packages:

  • Utilities/XrdAdaptor (core)

@makortel, @smuzaffar, @cmsbuild, @Dr15Jones can you please review it and eventually sign? Thanks.
@wddgit this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor

code-checks

@cmsbuild
Contributor

cmsbuild commented Oct 1, 2021

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35498/25687

ERROR: Build errors found during clang-tidy run.

                   IOSize
--
Utilities/XrdAdaptor/src/XrdRequestManager.h:47:25: error: no type named 'IOPosBuffer' in namespace 'edm::storage'; did you mean simply 'IOPosBuffer'? [clang-diagnostic-error]
    using IOPosBuffer = edm::storage::IOPosBuffer;
                        ^~~~~~~~~~~~~~~~~~~~~~~~~
                        IOPosBuffer
--
Utilities/XrdAdaptor/src/XrdRequestManager.h:48:22: error: no type named 'IOOffset' in namespace 'edm::storage'; did you mean simply 'IOOffset'? [clang-diagnostic-error]
    using IOOffset = edm::storage::IOOffset;
                     ^~~~~~~~~~~~~~~~~~~~~~
                     IOOffset
--
Utilities/XrdAdaptor/src/XrdStatistics.h:112:66: error: no type named 'IOSize' in namespace 'edm::storage'; did you mean simply 'IOSize'? [clang-diagnostic-error]
    XrdReadStatistics(std::shared_ptr<XrdSiteStatistics> parent, edm::storage::IOSize size, size_t count);
                                                                 ^~~~~~~~~~~~~~~~~~~~
                                                                 IOSize
--
Utilities/XrdAdaptor/src/XrdStatistics.h:119:5: error: no type named 'IOSize' in namespace 'edm::storage'; did you mean simply 'IOSize'? [clang-diagnostic-error]
    edm::storage::IOSize m_count;
    ^~~~~~~~~~~~~~~~~~~~
    IOSize
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:129: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@smuzaffar
Contributor

Let's wait for a newer IB with #35435.

@qliphy qliphy modified the milestone: CMSSW_12_1_X Oct 2, 2021
@qliphy
Contributor

qliphy commented Oct 2, 2021

(There is an issue with CMSSWAgendaMaker caused by the fact that this PR enters the 12_1_0 milestone there: https://api.github.com/repos/cms-sw/cmssw/pulls?state=open&milestone=86&per_page=100&page=1 (No. 11 there). That is why I did some tests above. Please ignore.) @smuzaffar By chance, do you know why it happens? Adding also @silviodonato @davidlange6 @perrotta

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

Maybe CMSSWAgendaMaker does not work properly when there is no milestone assigned to it? Note that this is for the DEVEL branch and there is no milestone for DEVEL. Should I update the bot to assign the default CMSSW_N_M_X milestone for CMSSW_N_M_*_X branches too?

@cmsbuild
Contributor

cmsbuild commented Oct 2, 2021

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e872b6/19333/summary.html
COMMIT: 5aa9ad6
CMSSW: CMSSW_12_1_DEVEL_X_2021-10-01-2300/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/35498/19333/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 40
  • DQMHistoTests: Total histograms compared: 3219394
  • DQMHistoTests: Total failures: 11
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3219360
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 39 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 169 log files, 37 edm output root files, 40 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor

Code-checks

@smuzaffar
Contributor

code-checks

@cmsbuild
Contributor

cmsbuild commented Oct 3, 2021

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35498/25721

  • This PR adds an extra 24KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Contributor

cmsbuild commented Oct 3, 2021

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-35498/25723

  • This PR adds an extra 24KB to repository

@smuzaffar
Contributor

+core
this is for DEVEL IBs

@cmsbuild
Contributor

cmsbuild commented Oct 3, 2021

This pull request is fully signed and it will be integrated in one of the next CMSSW_12_1_DEVEL_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_12_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

perrotta commented Oct 3, 2021

+1

  • For DEVEL IBs

6 participants