Blocks failing to be inserted with Error: concurrency error #98
Comments
@d-ylee Hi Dennis, this is still a problem for central production, and we have to keep a service running only because these 2 blocks are not going through. Would you be able to check the server logs and help me understand why these blocks fail to get injected?
@amaltaro Are these from global prod?
@d-ylee Yes, against the cmsweb-prod cluster.
@amaltaro I checked the logs around that time. It shows that the concurrency error was happening in Line 1002 in 28e02bd
For For
Line 1018 in 28e02bd
Within the function, there are three locations where
For 2, I do not see It is hard to tell if it is 1 or 3 because in production, the configuration sets verbosity to 0. In this case, I do not see the messages printed corresponding to 1 or 3. For the 3rd location, it could also be occurring in the Line 268 in 28e02bd
This could be happening when the sql statement is being executed: Lines 292 to 314 in 28e02bd
Beyond this, it is hard to see from the production logs why this error occurred. @amaltaro Do you know the 13 and 7 files that could potentially have this issue? Side note: I think we might need to move some of the messages out of the verbosity-gated statements, especially for the
From my previous experience debugging this kind of error, it comes from inconsistent JSON configuration: file/block injection requires the proper configuration to be in place, and in a concurrent environment it is not there. I have already reported that the bulkblock DBS API is not concurrency-safe by design, i.e. it has a race condition; see my report in this ticket: dmwm/WMCore#11106. Unless the DBS data injection procedure is properly addressed, I think we may still experience this kind of error from time to time. To resolve it, someone should FIRST inject into DBS all the configuration for dataset/block/files and only then start injecting blocks/files.
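To illustrate the ordering suggested above, here is a minimal sketch of a two-phase injection: shared configuration is inserted once, sequentially, and only then are blocks injected concurrently. Everything in it is hypothetical (`ToyStore`, `inject`, the dictionary keys and the block names); it is an in-memory toy, not the DBS schema, the dbs2go server code, or the WMCore client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyStore:
    """In-memory stand-in for a database (hypothetical, NOT the DBS schema)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.configs = set()
        self.blocks = {}

    def insert_config(self, config_key):
        # Serialised insert; a repeated key is simply ignored.
        with self._lock:
            self.configs.add(config_key)

    def insert_block(self, name, config_key, files):
        # Block injection only references configuration, it never creates it.
        with self._lock:
            if config_key not in self.configs:
                raise RuntimeError(f"missing configuration for {name}")
            self.blocks[name] = list(files)


def inject(store, block_dumps, workers=4):
    # Phase 1: insert every distinct configuration once, sequentially.
    for key in sorted({dump["config"] for dump in block_dumps}):
        store.insert_config(key)
    # Phase 2: inject blocks concurrently; the shared rows already exist.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(store.insert_block, d["block"], d["config"], d["files"])
            for d in block_dumps
        ]
        for future in futures:
            future.result()  # re-raise any injection error


# Hypothetical block dumps that all share one configuration.
dumps = [
    {
        "block": f"/Primary/Processed/TIER#block{i}",
        "config": "release-A",
        "files": [f"file{i}_{j}.root" for j in range(3)],
    }
    for i in range(8)
]
store = ToyStore()
inject(store, dumps)
```

Because the concurrent phase never has to create a shared row, two workers cannot race on a "check, then insert" of the same configuration, which is the failure mode described above.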
This workflow is in rejected-archived status. Don't worry, blow it away.
@d-ylee Dennis, I am going to shut down the agent that tries to inject those 2 blocks into DBS, as the workflow has been rejected and the output data is no longer needed. Feel free to close this issue at your convenience, and thank you for helping out with this debugging!
Hello,
I found 2 central production blocks that have been failing to be inserted into DBS since June. They are:
and they belong to the following workflow: cmsunified_task_TRK-Run3Winter23wmLHEGS-00005__v1_T_230615_091700_9237
I see each of them has the maximum number of files defined in WMAgent, 500 files, and the error message we get back in the client is:
These blocks are inserted with the following dbsApi call:
as can be seen in this source code: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py
I am also attaching both block dumps (what WMAgent is posting to the DBS server) to make debugging easier. Please find them here: https://amaltaro.web.cern.ch/amaltaro/forWMCore/dbs2go_98/
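For anyone inspecting the dumps, a small helper along these lines may help spot inconsistencies between the file list and its configuration. This is only a sketch: the helper name `summarize_block_dump` is hypothetical, and the keys it reads (`block.block_name`, `files[].logical_file_name`, `file_conf_list[].lfn`, `dataset_conf_list`) are assumptions about the block dump layout that should be checked against the actual JSON.

```python
import json


def summarize_block_dump(dump):
    """Count entries in a block dump and list files lacking a configuration.

    The key names used here are assumptions about the dump layout.
    """
    files = dump.get("files", [])
    file_confs = dump.get("file_conf_list", [])
    conf_lfns = {conf.get("lfn") for conf in file_confs}
    file_lfns = [entry.get("logical_file_name") for entry in files]
    return {
        "block_name": dump.get("block", {}).get("block_name"),
        "n_files": len(files),
        "n_file_confs": len(file_confs),
        "n_dataset_confs": len(dump.get("dataset_conf_list", [])),
        "files_without_conf": [lfn for lfn in file_lfns if lfn not in conf_lfns],
    }


# Made-up example; a real dump would be loaded with json.load() from a file.
sample = json.loads("""
{
  "block": {"block_name": "/Primary/Processed/TIER#abc"},
  "files": [
    {"logical_file_name": "/store/a.root"},
    {"logical_file_name": "/store/b.root"}
  ],
  "file_conf_list": [{"lfn": "/store/a.root"}],
  "dataset_conf_list": [{}]
}
""")
summary = summarize_block_dump(sample)
print(summary)
```

A non-empty `files_without_conf` list, or a file count above the expected 500, would be a reasonable first thing to compare between the two failing blocks.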