JobAccountant workaround for StepChain jobs with duplicate files - wmagent 1.5.7 branch #10971

Closed
wants to merge 9 commits into from
36 changes: 20 additions & 16 deletions CONTRIBUTING.rst
Original file line number Diff line number Diff line change
@@ -1,15 +1,16 @@
===========================
How to contribute to WMCore
===========================
Thank you for participating to WMCore!
Thank you for participating in WMCore!

* Please ensure that an `issue <https://github.com/dmwm/WMCore/issues/new/choose>`_ exists before submitting your contribution as a pull request.
* There are two templates available to create a new issue, select the one matching your issue type.
* Please ensure that a GitHub `issue <https://github.com/dmwm/WMCore/issues/new/choose>`_ exists before submitting your contribution through a pull request.
* There are two templates available to create a new GitHub issue; select the one matching your issue type.
* Pull request will only be merged if there is an associated issue (different solutions/implementations can be discussed on the issue).
* And at least one approval through the GitHub review process.

A contribution can be either a **patch** or a **feature**:
* **patch**: includes a bugfixes or an outstanding enhancement; besides going to the **master** branch, we also backport the same contribution to the latest **wmagent** branch.
* **feature**: includes major developments or potentially disruptive changes and are included in feature releases made multiple times a year.
* **patch**: includes bug fixes or an outstanding enhancement; besides going to the **master** branch, we also backport the same contribution to the latest **wmagent** branch.
* **feature**: includes major developments or potentially disruptive changes and is included in feature releases following a monthly cycle.

From the contribution types, we can also define at least two different branches:
* **master**: it includes both features and patches contributions and it only reaches production when there is a CMSWEB/WMAgent upgrade.
@@ -34,8 +35,8 @@ Setting up the development environment
--------------------------------------

There is no real recipe here, people use different operating systems and different IDE (Integrated Development Environment).
However, please make sure you implement using the python 2.7 interpreter (having a virtualenv to switch between python2 and python3 is practically a must for the near future).
A non-exhaustive list of libraries which WMCore depend on can be found on the `requirements <https://github.com/dmwm/WMCore/blob/master/requirements.txt>`_ file.
However, please make sure your dev environment defaults to the Python 3.8 interpreter.
A non-exhaustive list of libraries which WMCore depends on can be found in the `requirements_py3 <https://github.com/dmwm/WMCore/blob/master/requirements_py3.txt>`_ file.
Last but not least, please also have a look at the `Coding Style and checks` section below

Setting up the testing environment
@@ -47,7 +48,7 @@ You can find an extensive documentation on this `wiki_page <https://github.com/d
Contributing
------------

**Step 1**: Make sure there is already an `issue <https://github.com/dmwm/WMCore/issues/new/choose>`_ created, if not, then create one following one of the templates and providing all the necessary information.
**Step 1**: Make sure there is already an `issue <https://github.com/dmwm/WMCore/issues/new/choose>`_ created; if not, create one according to the templates, providing all the necessary information. Note that the templates contain placeholder text that you must replace with your own description.

**Step 2**: Create a local branch to start working on a proposal for that issue, branching off the "master" branch::

@@ -68,23 +69,26 @@ Contributing
**Step 6**: repeat Step 4 to add and create a new commit. We **highly recommend** a separate commit for test-related changes like unit tests, emulation, JSON data, templates and so on.
In addition to unit tests, we ask that any code refactoring **not changing any logical blocks** (such as pylint fixes, pep8 conventions, typo fixes, etc.) be added to the same test commit.

**Step 7**: At this point you should have 2 commits in your branch: where the 1st commit contains real changes the proposed fix and; the 2nd commit contains aesthetic and unit tests changes.
Check the commits you have on your branch and then push your them to your forked repository::
**Step 7**: At this point you should have 2 commits in your branch: the 1st commit contains the real logic for your feature and/or bug fix, and the 2nd commit contains aesthetic and unit test changes.
Check the commits you have on your branch and then push them to your forked repository (amend commit messages if needed)::

git log -10 --pretty=oneline --decorate
git push origin your-branch-name

**Step 8**: then create a pull request either from your fork, or from the official github repository. There is a pull request template that you need to edit/update before confirming the pull request creation.
If you're proposing a **patch** that needs to be backported to a specific branch, please make sure to mention it in your pull request, such that the project responsible can properly label it.
If you're proposing a **patch** that needs to be backported to a specific branch, please make sure to mention it in your pull request, such that the project responsible can properly label it. If your pull request requires further effort, please use the labels "Do not merge yet" and "Work in progress".
The pull request title has to be meaningful as well, even though it's not used for the release notes. You might want to describe your changes and the reason behind that, it's quite helpful when we need to check a module's history.

**Step 9**: watch the pull request for comments and; if your pull request is ready to be reviewed, use the `Reviewers` option to ask a specific person(s) to review it.
If further changes are required to your pull request, please make sure to squash your commits in order to keep a clean commit history (remember, if you need to update both src/ and test/ files, then you need to squash them into the correct commits).
**Step 9**: once your pull request is ready to be reviewed, use the `Reviewers` option to ask a specific person(s) to review it. Watch your pull request for comments and feedback.
If further changes are required to your pull request, you might want to provide them in a separate commit, making it easier to review only the latest difference.

**Step 10**: when your pull request gets approved by at least one reviewer, you must squash your commits accordingly in order to keep a clean commit history (remember, if you need to update both src/ and test/ files, then you need to squash them into the correct commits).
If you need further instructions for squashing your commits, please check `this <https://steveklabnik.com/writing/how-to-squash-commits-in-a-github-pull-request>`_ quick and simple document.

Automatic Tests
----------------

Every pull request - and further updates to that - trigger an automatic evaluation of your changes through our DMWM Jenkins infrastructure (only pull requests made against the **master** branch) and results are expected to come back within 30min.
Every pull request, and every further update made to it, triggers an automatic evaluation of your changes through our DMWM Jenkins infrastructure (only pull requests made against the **master** branch) and results are expected to come back within 20 minutes.
This infrastructure is thoroughly described in this `wiki_section <https://github.com/dmwm/WMCore/wiki/Understanding-Jenkins>`_. However, in short, there are 4 types of checks done by Jenkins:

1. **unit tests**: all the WMCore unit tests are executed on top of your changes and compared against a master/HEAD baseline (which gets created twice a day). Besides unstable unit tests, your pull request will only be accepted once **all** unit tests succeed.
@@ -96,7 +100,7 @@ This infrastructure is thoroughly described in this `wiki_section <https://githu

3. **pycodestyle**: it corresponds to the pep8 checks and it should usually not report anything, these issues can be easily fixed by an IDE.

4. **python3 compatibility**: runs the futurize check to make pre-python 2.7 idions aren't reinserted in the code. We're currently using python 2.7 and trying to be as compatible as possible with python 3.
4. **python3 compatibility**: runs the futurize check to ensure that pre-python 2.7 idioms aren't reinserted in the code.

Human Review
------------
@@ -115,5 +119,5 @@ Extra documentation
-------------------

In case you're having issues with git and working through a branch feature, you might want to have a look at this old'ish `wiki <https://github.com/dmwm/WMCore/wiki/Developing-against-WMCore>`_ in our WMCore wiki documentation.
In addition to that, we've also compiled a long list of important git commands `here <https://github.com/dmwm/WMCore/wiki/git-commands>`_. If none of those work for you, google and stackoverflow will be your best friend.
In addition to that, we've also compiled a long list of important git `commands <https://github.com/dmwm/WMCore/wiki/git-commands>`_. If none of those work for you, Google and Stack Overflow will be your best friends.

6 changes: 2 additions & 4 deletions deploy/deploy-wmagent.sh
@@ -22,11 +22,10 @@
### Usage: -r <repository> Comp repository to look for the RPMs (defaults to comp=comp)
### Usage: -p <patches> List of PR numbers in double quotes and space separated (e.g., "5906 5934 5922")
### Usage: -n <agent_number> Agent number to be set when more than 1 agent connected to the same team (defaults to 0)
### Usage: -2|--py2 Uses the python2 stack WMAgent package (soon to be deprecated)
### Usage:
### Usage: deploy-wmagent.sh -w <wma_version> -d <deployment_tag> -t <team_name> [-s <scram_arch>] [-r <repository>] [-n <agent_number>]
### Usage: Example: sh deploy-wmagent.sh -w 1.5.4.patch3 -d HG2110b -t production -n 30
### Usage: Example: sh deploy-wmagent.sh -w 1.5.4.patch3 -d HG2110b -t testbed-vocms001 -p "10853" -r comp=comp.amaltaro --py2
### Usage: Example: sh deploy-wmagent.sh -w 1.5.7.patch1 -d HG2201e -t production -n 30
### Usage: Example: sh deploy-wmagent.sh -w 1.5.7.patch1 -d HG2201e -t testbed-vocms001 -p "10853" -r comp=comp.amaltaro
### Usage:

IAM=`whoami`
@@ -189,7 +188,6 @@ for arg; do
-r) REPO=$2; shift; shift ;;
-p) PATCHES=$2; shift; shift ;;
-n) AG_NUM=$2; shift; shift ;;
-2|--py2) RPM_NAME=wmagent; shift;;
-*) usage ;;
esac
done
2 changes: 1 addition & 1 deletion doc/wmcore/WMCore_Jenkins.drawio
@@ -1 +1 @@
<mxfile host="Electron" modified="2021-06-01T06:40:12.509Z" agent="5.0 (Macintosh; Intel Mac OS X 11_4_0) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/14.6.13 Chrome/89.0.4389.128 Electron/12.0.7 Safari/537.36" etag="ZGyN96sX83I5_nhPPdRd" version="14.6.13" type="device"><diagram id="C5RBs43oDa-KdzZeNtuy" name="Page-1">7Vttd6I4FP41frQHCKB+rLXzstvO6c501vZjhAhskTgQWp1fv4mE94DUBXGrnXPmkEvIy32e3NzcGwfgZrX57MO1fY9N5A4UydwMwGygKBNVp/8zwTYS6PIkEli+Y0YiORX8cH4jLpS4NHRMFOQqEoxd4qzzQgN7HjJITgZ9H7/lqy2xm+91DS1UEvwwoFuWzh2T2JF0rIxS+RfkWHbcs6zz+a1gXJnPJLChid8yInA7ADc+xiR6Wm1ukMt0F+tl/nU7d+9e9M9//BX8gj+nfz5++3sYNfbpPZ8kU/CRRw5uOlRmv7/Ovy0060l+hXffbefBHcogavsVuiFXGJ8s2cYatHwcrgdgymsin6CNCDq4iL+QGg5YTrRI2YfwChF/S7/jrQPeDCeexotvKYpA4jI7g2BSEXLmWEnLqXboA1fQO3DQBbrSXdrtdEEfLPYwu5/fD+f3N9hHw0doTWGAXMdDcT3abVK1pGffxqtFSAc9fbMdgn6socHevNFlSWU2WdHBzmT6mBBRooUl9sgnuHJcpqYvyH1FxDEgf8GXo0y1PIWuY3m0YFBMkM+aIT5+SRbFrmHapeNZtKSnpUdMu58NVdak47o32MX+bsDA1NDYVJOWMm/GygLoejLJLBHqaVgkVyU5ZCnPDlBmhyxix7gFcginoJTIwblwbdG5Dx9RQK5ZK0XQceiZyOQA7AG+CKnlwiDgPBCgmaNJHrrlcqkYhgg6U1/oml633BuiWYaOQyWLsAICrNSusCobvSxWD1twgSuz0kb5ldY/fGoVfDuz+9NzCIMvOGvwOFiJG9YbWFodWGypXfCqWWz941fr9GTQ0n+FzBHe4TAMdkBc0wqyut6kL2PfZwUD5oM08YqohknB/cnB4mHqXuUx5KKSw8Pwor6Re81frBzTZN0IqZWST+oS8LEAX1WAL+gK35EA3wIEyKQnKl7EPrGxhT3o3qbSgrLSOnd45zwy1P5BhGz5AoUhwXlMoz5ZR4dol44Wh76B9rtnBPoWqmtQFaPlIxcS5zU/utaxkOUGYHjmNTsWM2IzE+cYeU2ijUOeOBDs+Zk9X2m8NNtkXs22ccGjo39KK7Lic/Zd+tmuFH/XJmp1J669qCli1DKLShOsqVjWGFzewwN26PTSc6peOIoohbUaTZN/lT2hFxpS5YLxHxUaivRQaojyAW4z1dasQlA9YF3LdTPKxQ3oQ9ReSuNEo4cze3weVgY05KvWq5UpH1UPtjLyOVmZip38YmWE/Sjg+GZGLp82StSOw5lFr3+yHDHbUvb6gQ4mwOw/AArUsq8IxmqZcZOuAqAC/yR78l5vXcoA8FJSeV8HOY39E0Gq7/54Z1mod3/viFtqFZyoPuAVQFW1EqayIvL/tY52A6kBpOst+DCYHnSGq9ivY0unq3sxlfQjYtpkmV4ArUsu7MVTPuYalcsRzxaCMAGBngldFixRJBOyDBLLekaRGcmwkfESnEeIpoINuX1WhDe4GnUFORBBLvRkjuyUqIUIpSgtK1wcnUWwQOMc9n9RVQ1KldF3kcMm0k0rHpuY2qLw7ZF0U5+ZaKybNtL5Yt2IQp+RuWP2y7NESf2H70OCApKxi/m65W31w2b1DzK1+p6N99Sy+mWOFNhwcoeeTrNX9fCN8yf+/jPFSjnuKITvrMGL0Zr0jtakGVqndB
ztcbXJ47yx7B/AuLNW8ldJNPk58+ZEIsvcq9qbD4hdjP0JrJEY6uPElodFJhV99aax5WEhulXclytCy22Ff0H/+VNV1bMMHEpX0kjbQ8Nd6QH5Dp0/8+FOjZvjc+bme9Me2jjXyzGyHkBpgfb/fxM6udD0Hdk5tYfsXIU3Th270HMIOZtLgFWhkf487wbRxmNePJI7sjqg6UWOxlZH7nVzLF0K0A+0OoUrQUWidXPxKE6idGt0ag+V7ALyxfxUnCv7M0cNotYd31BSFZAzSFfSUb34DgxVr+5Ra4aqeHspyRl0fHvpKLYqpnm7ed4lgiT0WZJ34UPPsM8ip1uVmTuVa/dAdFOt8GPSWMBQPhz+aKMb7DJ5xPFCHAZscBQcy6dLGnusaURYCjDhRdRjfhQfhS4H5LwnhbNdmTz6qB3y0GL6g/LIrKS/yge3/wI=</diagram></mxfile>
<mxfile host="Electron" modified="2022-01-31T01:46:14.354Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/16.4.0 Chrome/96.0.4664.110 Electron/16.0.7 Safari/537.36" etag="SBNWknA6dAyJ5uM_irf9" version="16.4.0" type="device"><diagram id="C5RBs43oDa-KdzZeNtuy" name="Page-1">7Vrbdps4FP2aPJoFEtfHJE4vM0lXpk0n7aNsZGCCkQsisfP1IxlxFxgn0DozNWt5oYOQ4Oytc3S2fQYv19v3Mdr4N8TF4RlQ3e0ZnJ8B4Ogm++aGXWYwNSczeHHgZiatNHwJnrEwqsKaBi5Oah0pISENNnXjkkQRXtKaDcUxeap3W5GwPusGebhl+LJEYdt6H7jUz6w2sEr7Bxx4fj6zZor3W6O8s3iTxEcueaqY4NUZvIwJodnZenuJQ+673C/3H3f34fWD+f6Pv5If6OvFn3ef/p5lg7075pbiFWIc0RcPnYL588f7TwvD+6Y9ouvPfnAbzjSYjf2IwlQ4TLws3eUe9GKSbs7gxcCnEE/7iGOKtzKI0SIfufQiYx8ma0zjHeuX35U7XjDPEM2nEkYAhc2vQFh0RII6XjF06R52Ijwk91b0cHU+N27QzbNKn/x3qZ3a/gwMddZxHqg7tQenTleBtms0XVdsOJF3pMCbEu+YIZv2YsFOPH4yv7m/md3fXJIYz+6Qd4ESHAYRzvuxaYuuLc/GPlkvUvbQF09+QPGXDVryK08sajGbT9fsYecaOy3WqcoaKxLRd2gdhJxEH3D4iGmwROKCiFYa8+wFCgMvYo0lAwLHfBgak4ciZuwHZlMGkcdaZtm6I2z6+UznQwZheElCEu8fGLoGtl29GKlyxQYLaJrSJdXLuyah2nwQSwXK+CBZKvYIXJDytR1WBPTnHnvV2x2c3eGEnvOBmjCTNHKxK1x+AOomiF6IkkQgL8GvRow6WKvVCiyXMrBcc2EaU4JVIDNg9epTAWZ0AcbXKsfraxRQjlnyG7Fyc/MrEesNtxV8zB8p36HsPT9L9q4/Zx00fbMtL+ZRd40SHv2GxGPmK9oIvDUgIsICex01YWqFWu55FpXDc3FhHbgun0ZKppJu6hQQ51dtxdJh+dHbgKuWAmG7SxV8OBX49uGdCHbZPlg0SUx94pEIhVelteHJss812ec0Duk/mNKdWK8opaQO+HHOT0gaL/HhpEFR7OG+zZCIVfwFe6GMcYho8FgvBEaHQhuwK8SRe85rGU56HvCCZd2RzH/x7hsHQjHy5neBy74x39ZauxexvwpA3w7uIABwIACV1WJIdh+5bTBOYoZbErDXLZaqI/iQp1Otseiy1xY3VQukxjis1q0vaKs2LrAa42Zuao3LgEa7SrcN75C0aFa46RXMayftFvPy4q2ZPJ2VxVd+O3lCEzrQPYFyD0o2scCU0MiZqqDRtJZ/qwXMZsdqFwofTmZDZPBDhqm5/4jJqljvP1Ks+wk3PI3WMdWNdhrVJJBCY6JgrQ5AdLODvyHtgVQfgqligJ8Iq6ZLIuGrN8MJRZGLQr5pBaqLuIbAZcFsh6wufbx8SN70Vvm1jKhGZsNSgKOZ+ceSoA8VayoCQBkBuqW5n5fV9Jya3SImlCkzRaYb31WDJd9hfClc/3Jf5RlfUwytepgDfQVsRbcrhzVVmJHV3K9xHRzLdTMAFRNWD2uY7zTHVByzckzmO6szRPNgGHkyjfj284zihFaCbL1vO0//L0TifnIOzuS2pVi65kBDt6FtqY5Zi1QDGKUZUClutthYdptgJlRUWD0m4hdo86vBpJPbr79EwBwJegcqLBfZqqmzKKsBp4a8pgJFB9WjjbwOlMrtqkT47Irm4yPvDEP
+lPb1vxB6zbGUHNHiFwhHkaxuyDGWAwum2py0C7QWZgfFtG1AK1oaa32vXCmVNN7IhbTxBLhelA5KoHmWPCjB5eHuRDS4WX8u6Yoox0p0M0ZJFmwqoas2zYFc0yHYjSXHQTACd98cD53fPByJh93CcYdebTZKS0PRQPUwGizPJhhXgu7Nvfx3Y5Z+0yig9M3/dDxW+q2rZ6eTesGA4nZw6q0k3jINy1PvVKHrRCISdPj/AooKyclr/rzKMkBj1b4sIpmqAnt29IxlPUX+SAGp/xl4eKoVBNNHp5zB48rCK4xoGnNNeBGjaOn/9yTgTknvWCnhSK2gS3IYv8yQ/XLa+K9ibuCceDlZskSYCXo0iFKSJvzhGJRezMIKifjQmHLJsGBRNmP9Kd4kuY6S1ru1ikbGbCvopiX7DfF4AZ01y79zZ0Go/E88vPoX</diagram></mxfile>
Binary file modified doc/wmcore/WMCore_Jenkins.png
2 changes: 1 addition & 1 deletion src/python/WMComponent/DBS3Buffer/DBSUploadPoller.py
@@ -85,7 +85,7 @@ def uploadWorker(workInput, results, dbsUrl):
dbsApi.insertBulkBlock(blockDump=block)
results.put({'name': name, 'success': "uploaded"})
except Exception as ex:
exString = str(ex)
exString = str(getattr(ex, "body", ex))
if 'Block %s already exists' % name in exString:
# Then this is probably a duplicate
# Ignore this for now
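
The effect of switching from `str(ex)` to `str(getattr(ex, "body", ex))` can be illustrated with a minimal sketch. It assumes a hypothetical exception class, `FakeHTTPError`, that carries the server reply in a `body` attribute; the actual exception type raised by the DBS client is not shown in this diff. Exceptions without a `body` attribute fall back to the exception itself, so the previous behaviour is preserved for them.

```python
class FakeHTTPError(Exception):
    """Hypothetical stand-in for a client error that stores the server reply in `body`."""

    def __init__(self, message, body):
        super().__init__(message)
        self.body = body


def exception_text(ex):
    """Return the text to inspect: `ex.body` when present, otherwise `ex` itself."""
    return str(getattr(ex, "body", ex))


def is_duplicate_block(ex, name):
    """Same substring check as in the hunk, applied to the extracted text."""
    return 'Block %s already exists' % name in exception_text(ex)


# Illustrative block name, not taken from the source
block = "/Primary/Processed/TIER#abc"

# The interesting message lives in `body`, not in str(ex)
with_body = FakeHTTPError("HTTP 400", body="Block %s already exists" % block)

# A plain exception has no `body`, so str(ex) is used as before
without_body = ValueError("Block %s already exists" % block)
```

With the old `str(ex)` call, `with_body` would yield only `"HTTP 400"` and the duplicate check would miss it; with the `getattr` fallback both cases are detected.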
9 changes: 9 additions & 0 deletions src/python/WMComponent/JobAccountant/AccountantWorker.py
@@ -454,6 +454,8 @@ def handleJob(self, jobID, fwkJobReport):
conn=self.getDBConn(),
transaction=self.existingTransaction())

# FIXME: temporary workaround for: https://github.com/dmwm/WMCore/issues/9633
skipOutputFiles = False
if jobSuccess:
fileList = fwkJobReport.getAllFiles()

@@ -504,6 +506,7 @@ def handleJob(self, jobID, fwkJobReport):
if not fwjrFile.get("locations") and fwjrFile.get("lfn", "").endswith(".root"):
logging.warning("The following file doesn't have any location: %s", fwjrFile)
jobSuccess = False
skipOutputFiles = True
break
else:
fileList = fwkJobReport.getAllFilesFromStep(step='logArch1')
@@ -548,6 +551,12 @@ def handleJob(self, jobID, fwkJobReport):
else:
wmbsJob["outcome"] = "failure"

# FIXME: BAD HACK to avoid crashing the component
if skipOutputFiles:
logging.warning("Skipping output file registration for failed job: %d", jobID)
self.listOfJobsToFail.append(wmbsJob)
return jobSuccess

for fwjrFile in fileList:

logging.debug("Job %d , register output %s", jobID, fwjrFile["lfn"])
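
The three hunks above add a flag that is set when a `.root` file has no location and is later used to return early, before any output file is registered. A simplified sketch of that control flow follows. The function `handle_job_sketch`, its arguments and its return value are hypothetical and only mirror the shape of the change; the real `handleJob` appends the WMBS job object to `self.listOfJobsToFail` and does considerably more work.

```python
import logging


def handle_job_sketch(job_id, files, jobs_to_fail):
    """Hypothetical, reduced version of the flow added to handleJob."""
    job_success = True
    skip_output_files = False

    for fwjr_file in files:
        # A .root file without any location marks the whole job as failed
        if not fwjr_file.get("locations") and fwjr_file.get("lfn", "").endswith(".root"):
            logging.warning("The following file doesn't have any location: %s", fwjr_file)
            job_success = False
            skip_output_files = True
            break

    if skip_output_files:
        # Early return: queue the job for failure and register no output files
        logging.warning("Skipping output file registration for failed job: %d", job_id)
        jobs_to_fail.append(job_id)
        return job_success, []

    # Normal path: every file in the report gets registered
    registered = [fwjr_file["lfn"] for fwjr_file in files]
    return job_success, registered
```

The point of the early return is that a job already known to be failed never reaches the registration loop, which is where the component would otherwise crash on the incomplete file record.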
6 changes: 3 additions & 3 deletions src/python/WMCore/WMExceptions.py
@@ -192,9 +192,9 @@
71104: "JobSubmitter component could not find a job pickle object.",
71105: "JobSubmitter component loaded an empty job pickle object.",
71300: "The job was killed by the WMAgent, reason is unknown (WMAgent).",
71301: "The job was killed by the WMAgent because the site it was running at was set to Aborted (WMAgent).",
71302: "The job was killed by the WMAgent because the site it was running at was set to Draining (WMAgent).",
71303: "The job was killed by the WMAgent because the site it was running at was set to Down (WMAgent).",
71301: "The job was killed by WMAgent because the site it was supposed to run at was set to Aborted (WMAgent).",
71302: "The job was killed by WMAgent because the site it was supposed to run at was set to Draining (WMAgent).",
71303: "The job was killed by WMAgent because the site it was supposed to run at was set to Down (WMAgent).",
71304: "The job was killed by the WMAgent for using too much wallclock time (WMAgent) Job status was Running.",
71305: "The job was killed by the WMAgent for using too much wallclock time (WMAgent) Job status was Pending.",
71306: "The job was killed by the WMAgent for using too much wallclock time (WMAgent) Job status was Error.",
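
The entries in this hunk are plain exit-code-to-message mappings. As a rough sketch of how such a mapping might be consulted, the snippet below copies the three reworded messages into a local dictionary. The names `JOB_ERROR_CODES` and `describe_exit_code` are hypothetical and chosen for illustration only; the name of the real dictionary in `WMExceptions.py` is not visible in this diff.

```python
# Hypothetical excerpt holding only the three messages reworded in this hunk
JOB_ERROR_CODES = {
    71301: "The job was killed by WMAgent because the site it was supposed to run at was set to Aborted (WMAgent).",
    71302: "The job was killed by WMAgent because the site it was supposed to run at was set to Draining (WMAgent).",
    71303: "The job was killed by WMAgent because the site it was supposed to run at was set to Down (WMAgent).",
}


def describe_exit_code(code):
    """Return the message for a known exit code, or a generic fallback."""
    return JOB_ERROR_CODES.get(code, "Unknown exit code: %d" % code)
```

The rewording from "was running at" to "was supposed to run at" matters for anyone reading these messages in a job report, since these kills can apply to jobs that never started running at the site.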