New function to get dataset locations from MSPileup (addressing #11620) #11879
Conversation
Jenkins results:
Andrea, this is the right location where we need to update the pileup data location in (global) workqueue; however, I made some comments that need to be followed up. Thanks
#datasets = [d for prec in pileupDatasets.values() for d in prec]
#self.pileupData = self.getDatasetLocations(datasets)
### flattening the pileupDatasets to have (dbsurl, dataset) pairs to loop over in the getDatasetLocationsFromMSPileup
datasetsWithDbsURL = [(dbsUrl, dataset) for dbsUrl, datasets in pileupDatasets.items() for dataset in datasets]
As I look into this, I would even say that we don't have to reorganize the data that comes out of self.wmspec.listPileupDatasets(). That structure would be in a format like:
{"cmsweb production DBSReader": [pileupA, pileupB, pileupC],
 "cmsweb-testbed DBSReader": [pileupD]}
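For illustration, the flattening done in the PR turns that structure into (dbsUrl, dataset) pairs; a minimal sketch with the placeholder names from the example above:

```python
# Structure as returned by listPileupDatasets(): DBS URL -> list of pileup datasets
pileupDatasets = {
    "cmsweb production DBSReader": ["pileupA", "pileupB", "pileupC"],
    "cmsweb-testbed DBSReader": ["pileupD"],
}

# Flattened into (dbsUrl, dataset) pairs, as in the PR's list comprehension
datasetsWithDbsURL = [(dbsUrl, dataset)
                      for dbsUrl, datasets in pileupDatasets.items()
                      for dataset in datasets]
print(datasetsWithDbsURL[0])  # ('cmsweb production DBSReader', 'pileupA')
```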
I've thought about this for a while; maybe for now it's better to keep it as it is. The alternative would be to have a for loop over pileupDatasets, and send to getDatasetLocationsFromMSPileup the key as the dbsUrl and the list [pileupA, pileupB, pileupC]. This would imply updating self.pileupData at each iteration of the outer loop. What do you think?
Sorry for missing your reply. Yes, I would be in favor of constructing code like:
for dbsUrl, datasets in pileupDatasets.items():
    # note that each workflow will be either configured to production or to testbed url, hence single iteration
    self.pileupData = self.getDatasetLocationsFromMSPileup(dbsUrl, datasets)
This way we:
a) don't need to refactor the data structure above (named pileupDatasets)
b) don't need to map the mspileup url for each pileup dataset
c) don't need to set the mspileup url for each pileup dataset, as in general the majority (or all) will share the same url
if isinstance(datasetsWithDbsURL, str):
    datasetsWithDbsURL = [datasetsWithDbsURL]
result = {}
for dataset, dbsUrl in zip(datasetsWithDbsURL):
And this line gives me a ValueError exception.
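For context, a quick sketch of why that line blows up (URL and dataset are placeholders): zip() over a single iterable wraps each element in a 1-tuple, so unpacking two names from each element raises ValueError.

```python
# Minimal reproduction of the reported ValueError (placeholder values)
datasetsWithDbsURL = [("https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader",
                       "/HighPileUp/Run2011A-v1/RAW")]

# zip() over one iterable yields 1-tuples, one per list element
wrapped = list(zip(datasetsWithDbsURL))
print(wrapped[0])  # ((url, dataset),)

# ...so unpacking two names from each 1-tuple fails
unpackFailed = False
try:
    for dataset, dbsUrl in zip(datasetsWithDbsURL):
        pass
except ValueError:
    unpackFailed = True
print(unpackFailed)  # True

# Iterating the list of pairs directly is what the later fix amounts to
for dbsUrl, dataset in datasetsWithDbsURL:
    print(dbsUrl, dataset)
```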
I'm going to commit a new version, where L292 and L293 are deleted, as well as the zip in L295.
src/python/WMCore/WorkQueue/Policy/Start/StartPolicyInterface.py
From the second Jenkins test, I get this traceback: 403 Forbidden: You are not authorized to access this resource. (Powered by CherryPy 18.8.0) for WMCore_t.WorkQueue_t.Policy_t.Start_t.Block_t.BlockTestCase.testPileupData. Is this relevant?
Dennis and I discussed the same issue yesterday. It turns out the query to MSPileup is incorrect. Before you make any further changes in this PR, I think it would be better to converge on Dennis' developments in #11870
Yes @anpicci. Please let me know if you need any advice on that. If you prefer, you can also create a new development branch/PR.
@amaltaro from Jenkins logs, I read: In addition, I still get this error from the second test: 403 Forbidden: You are not authorized to access this resource. (Powered by CherryPy 18.8.0)
@anpicci I suspect you didn't rebase this PR properly. You can rebase your development branch with the same instructions provided in this link: https://steveklabnik.com/writing/how-to-squash-commits-in-a-github-pull-request
@amaltaro thanks to your correction, the error coming from MSPileup in the default test disappeared. Should I worry about the other errors in the default test?
I see that this unit test is still failing:
While this one is likely unstable:
Giving it a second look, the first unit test is failing with an error reported in the middle of the hundreds of log lines in this unit test: https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-PR-test/14809/testReport/junit/WMCore_t.WorkQueue_t.WorkQueue_t/WorkQueueTest/testPileupOnProduction/
With the last commit, the message went from root: INFO: pileupDatasets: {'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader': {'/HighPileUp/Run2011A-v1/RAW'}} to root: INFO: pileupDatasets: {'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader': {'/HighPileUp/Run2011A-v1/RAW'}}, according to the modification I have made. This poses another question: why do we get a 2011 file from here that is not present in the document returned by MSPileup, neither in prod nor in testbed? Is it correct to ignore the pileup datasets from here that are not found in the MSPileup document? In any case, the only error reported now is: …
I see. In general, data location is resolved through Rucio, so far. And in many cases we are actually mocking calls to Rucio, such that unit tests can reliably define the location for some of the DIDs we use in the unit tests without actually making an HTTP request to a Rucio server. I don't have an easy answer on how to mock that single MSPileup HTTP request. So we might have to change the pileup being used in that workqueue unit test to one of those that we actually have in MSPileup (we need to check whether the workflow is using DBS/Rucio testbed or production). This might cause other issues with (un)mocked DBS data, but we will have to test that.
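Not the actual fix, but a sketch of the mocking idea with unittest.mock: the document fields follow the filters used in this PR, while the values and the suggested patch target are assumptions.

```python
from unittest import mock

# Fake MSPileup document; field names follow the filters used in this PR,
# values are invented for the test
fakeDoc = {"pileupName": "/HighPileUp/Run2011A-v1/RAW",
           "currentRSEs": ["T1_US_FNAL_Disk"],
           "expectedRSEs": ["T1_US_FNAL_Disk", "T2_CH_CERN"],
           "containerFraction": 1.0,
           "ruleIds": []}

# In a real unit test, one would patch the name where the policy imports it, e.g.
#   mock.patch("WMCore.WorkQueue.Policy.Start.StartPolicyInterface.getPileupDocs",
#              return_value=[fakeDoc])
# (assumed target path). Here we only show the stand-in behaving like the real call:
getPileupDocs = mock.Mock(return_value=[fakeDoc])

doc = getPileupDocs("https://cmsweb-testbed.cern.ch/ms-pileup/data/pileup",
                    {"query": {"pileupName": fakeDoc["pileupName"]}},
                    method="POST")[0]
print(doc["currentRSEs"])  # ['T1_US_FNAL_Disk']
```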
I see, is this something I can help with?
Yes, we have to adapt/fix that unit test according to the changes provided here. If you are running unit tests over the docker image, you could venture into that code yourself. Otherwise I will try to pull your branch and get that fixed later in my day.
    result[dataset] = doc['currentRSEs']
except IndexError:
    self.logger.warning('Did not find any pileup document for query: %s', queryDict['query'])
    continue
I am really glad I had a careful look at these unit tests, otherwise a nasty bug would sneak through.
Please add result[dataset] = []  # equal to an empty currentRSEs before the continue statement. Otherwise, workqueue elements would be completely missing the pileup information. This fixes one of the unit tests.
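To illustrate why the default matters, here is a toy version of the lookup loop (the fetch function is a stand-in and dataset names are made up): without result[dataset] = [], a dataset missing from MSPileup would silently drop out of the result.

```python
def getLocations(datasets, fetchDocs, log=print):
    """Toy version of the MSPileup lookup loop with the suggested empty-list default."""
    result = {}
    for dataset in datasets:
        try:
            doc = fetchDocs(dataset)[0]
            result[dataset] = doc['currentRSEs']
        except IndexError:
            log(f'Did not find any pileup document for query: {dataset}')
            result[dataset] = []  # equal to an empty currentRSEs
    return result

# Stub: a document exists for one dataset, none for the other
docs = {'/Known/PU-v1/RAW': [{'currentRSEs': ['T2_CH_CERN']}]}
out = getLocations(['/Known/PU-v1/RAW', '/Missing/PU-v1/RAW'],
                   lambda name: docs.get(name, []))
print(out)  # {'/Known/PU-v1/RAW': ['T2_CH_CERN'], '/Missing/PU-v1/RAW': []}
```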
@anpicci Andrea, the 2nd unit test can be fixed with the changes I provided in this file: It is a sub-optimal fix though, given that it will only work inside this unit test module. Ideally we should have a way to mock the module/function in https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/MSPileup/MSPileupUtils.py, but AFAIU we would have to convert it to a class to have it properly emulated under: https://github.com/dmwm/WMCore/tree/master/src/python/WMQuality/Emulators. Please request another review once you have applied these modifications (and try to keep src/* changes separated from test/*).
@anpicci I left further comments for your consideration.
I am also in favor of moving forward with this PR, even if that means leaving a broken unit test behind:
WMCore_t.WorkQueue_t.WorkQueue_t.WorkQueueTest:testProcessingWithPileup changed from success to failure
For that, I think we could pull in @d-ylee to look at mocking MSPileup interaction.
@@ -12,6 +12,8 @@
from WMCore.WorkQueue.DataStructs.WorkQueueElement import WorkQueueElement
from WMCore.DataStructs.LumiList import LumiList
from WMCore.WorkQueue.WorkQueueExceptions import WorkQueueWMSpecError, WorkQueueNoWorkError
#from WMCore.MicroService.Tools.Common import getPileupDocs
Feel free to delete this line.
@@ -159,10 +161,15 @@ def __call__(self, wmspec, task, data=None, mask=None, team=None, continuous=Fal
        self.validate()
        try:
            pileupDatasets = self.wmspec.listPileupDatasets()
            self.logger.info(f'pileupDatasets: {pileupDatasets}')
Given that we already have some logging a few lines below, I would suggest either removing this or setting it to debug level.
@anpicci Andrea, please see my comments along the code. Once you provide those changes, feel free to have them already squashed with the first commit. Thanks!
datasets = [d for prec in pileupDatasets.values() for d in prec]
self.pileupData = self.getDatasetLocations(datasets)
#datasets = [d for prec in pileupDatasets.values() for d in prec]
#self.pileupData = self.getDatasetLocations(datasets)
Feel free to completely delete these 2 lines.
Returns a dictionary with the location of the datasets according to MSPileup
The definition of "location" here is a union of all sites holding at least
part of the dataset (defined by the DATASET grouping).
:param datasets: list of datasets
Please update the list of parameters in the docstring.
I think the definition of "location" is now outdated as well, so you might want to delete it and perhaps rephrase the first sentence to something like "Returns a dictionary with the current location of the datasets according to MSPileup".
self.logger.debug(f'doc: {doc}')
if len(currentRSEs) == 0:
    self.logger.warning(f'No RSE has a copy of the desired pileup dataset. Expected RSEs: {doc["expectedRSEs"]}')
self.logger.info(f'locationsFromPileup - name: {dataset}, currentRSEs: {doc["currentRSEs"]}, containerFraction: {doc["containerFraction"]}')
Maybe delete this logging line in favor of the one above, on which I will leave a comment.
self.logger.debug(f'msPileupUrl: {msPileupUrl}')
doc = getPileupDocs(msPileupUrl, queryDict, method='POST')[0]
currentRSEs = doc['currentRSEs']
self.logger.debug(f'doc: {doc}')
I would suggest setting this one to info level and printing something like Retrieved MSPileup document: {doc}
except IndexError:
    self.logger.warning('Did not find any pileup document for query: %s', queryDict['query'])
    result[dataset] = []
    continue
The continue statement can actually be removed here, given that it will anyhow move to the next iteration.
             'filters': ['expectedRSEs', 'currentRSEs', 'pileupName', 'containerFraction', 'ruleIds']}
pileUpinstance = '-testbed' if 'cmsweb-testbed' in dbsUrl else '-prod'
msPileupUrl = f"https://cmsweb{pileUpinstance}.cern.ch/ms-pileup/data/pileup"
self.logger.debug(f'dbsUrl: {dbsUrl}')
Perhaps merge these 2 debugging records into a single one, e.g.: "Using DbsUrl: {dbsUrl} and MSPileup URL: {msPileupUrl}". Also move it under the first for loop, given that it will not change between datasets of the same url instance.
try:
    queryDict = {'query': {'pileupName': dataset},
                 'filters': ['expectedRSEs', 'currentRSEs', 'pileupName', 'containerFraction', 'ruleIds']}
    pileUpinstance = '-testbed' if 'cmsweb-testbed' in dbsUrl else '-prod'
These 2 lines with the url logic don't need to be executed for each dataset, so they can move under the first for loop instead of the inner one.
Ok, this would imply replacing dataset with datasets in these lines. I am wondering whether this further modification would work or not. My original idea relates to the implementation made by @d-ylee here of the MSPileupUtils
My suggestion is to implement something like:
for dbsUrl, datasets in datasetsWithDbsURL.items():
    pileUpinstance = '-testbed' if 'cmsweb-testbed' in dbsUrl else '-prod'
    msPileupUrl = f"https://cmsweb{pileUpinstance}.cern.ch/ms-pileup/data/pileup"
    self.logger.info(f'Will fetch {len(datasets)} from MSPileup url: {dbsUrl}')
    for dataset in datasets:
        queryDict = {'query': {'pileupName': dataset},
                     'filters': ['expectedRSEs', 'currentRSEs', 'pileupName', 'containerFraction', 'ruleIds']}
        try:
            doc = getPileupDocs(msPileupUrl, queryDict, method='POST')[0]
            ....
does it make sense to you?
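For what it's worth, folding the review comments above together (url logic in the outer loop, merged log records, empty-list default, no trailing continue), the helper might end up looking roughly like this sketch. getPileupDocs is injected here so the demo stays self-contained; this is an illustration, not the final PR code.

```python
import logging

def getDatasetLocationsFromMSPileup(pileupDatasets, getPileupDocs, logger):
    """Sketch: resolve the current pileup dataset locations via MSPileup.

    :param pileupDatasets: dict of DBS URL -> iterable of pileup dataset names
    :param getPileupDocs: callable performing the MSPileup query (injected for testing)
    :param logger: logger-like object with info/warning methods
    :return: dict of dataset name -> list of current RSEs
    """
    result = {}
    for dbsUrl, datasets in pileupDatasets.items():
        # url logic is loop-invariant per DBS instance, hence in the outer loop
        pileUpInstance = '-testbed' if 'cmsweb-testbed' in dbsUrl else '-prod'
        msPileupUrl = f"https://cmsweb{pileUpInstance}.cern.ch/ms-pileup/data/pileup"
        logger.info('Using DbsUrl: %s and MSPileup URL: %s', dbsUrl, msPileupUrl)
        for dataset in datasets:
            queryDict = {'query': {'pileupName': dataset},
                         'filters': ['expectedRSEs', 'currentRSEs', 'pileupName',
                                     'containerFraction', 'ruleIds']}
            try:
                doc = getPileupDocs(msPileupUrl, queryDict, method='POST')[0]
                logger.info('Retrieved MSPileup document: %s', doc)
                if not doc['currentRSEs']:
                    logger.warning('No RSE has a copy of the desired pileup dataset. '
                                   'Expected RSEs: %s', doc['expectedRSEs'])
                result[dataset] = doc['currentRSEs']
            except IndexError:
                logger.warning('Did not find any pileup document for query: %s',
                               queryDict['query'])
                result[dataset] = []  # empty default; no continue needed at block end
    return result

# Demo with a fake getPileupDocs: one dataset known to MSPileup, one missing
_docs = {'/PU/A-v1/RAW': [{'currentRSEs': ['T1_US_FNAL_Disk'],
                           'expectedRSEs': ['T1_US_FNAL_Disk']}]}

def _fakeGetPileupDocs(url, queryDict, method=None):
    return _docs.get(queryDict['query']['pileupName'], [])

locations = getDatasetLocationsFromMSPileup(
    {'https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader':
         ['/PU/A-v1/RAW', '/PU/B-v1/RAW']},
    _fakeGetPileupDocs, logging.getLogger(__name__))
print(locations)  # {'/PU/A-v1/RAW': ['T1_US_FNAL_Disk'], '/PU/B-v1/RAW': []}
```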
…ation in StartPolicyInterface dmwm#11620
Modifying the implementation of MSPileup in StartPolicyInterface dmwm#11620
Implementation of MSPileupUtils in StartPolicyInterface dmwm#11620
Fixing a bug in getDatasetLocationsFromMSPileup dmwm#11620
Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620
Fixing again bugs in getDatasetLocationsFromMSPileup dmwm#11620
Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620
Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620
Trying to debug Jenkins for getDatasetLocationsFromMSPileup dmwm#11620
Fixing an important bug in StartPolicyInterface dmwm#11620
Addressing Alan's comments for StartPolicyInterface dmwm#11620
Optimization of the new function in StartPolicyInterface dmwm#11620
Thanks Andrea, it looks good to me.
Fix variable name in log record; complement to #11879
Fixes #11620
Status
In development
Description
A new method is introduced to get the pileup dataset location leveraging MSPileup
Is it backward compatible (if not, which system it affects?)
NO (It's a new feature for fetching information)
Related PRs
None
External dependencies / deployment changes
No