Build failure masked as a RUN_ERROR #713
Comments
Yeah, that's annoying. I'll look into it.
Hey Dan, I think the cause is the lack of atomic file creation on Chicoma. So presumably the only issue on Pavilion's part is that the error was misreported.
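For anyone following along, here is a minimal sketch of what "atomic file creation" means for a lock file. It assumes the lock relies on an exclusive create (`O_CREAT | O_EXCL`); the helper name `try_acquire` is hypothetical and is not Pavilion's actual lockfile code. On a filesystem that doesn't honor exclusive create atomically, two processes could both believe they hold the lock.

```python
import os


def try_acquire(lock_path):
    """Return True if this caller created the lock file, False if it existed.

    Hypothetical helper for illustration only; not Pavilion's lockfile API.
    """
    try:
        # O_EXCL makes the create fail if the file already exists, so only
        # one caller can succeed (provided the filesystem honors it).
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        # Record the owner so a stale lock can be diagnosed later.
        lock_file.write(str(os.getpid()))
    return True
```

The second call on the same path returns False because the file is already there.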
This comes from line 427 in the build method, in `if not self._build(self.path, cancel_event, test_id, tracker):`

In the _build method, from line 516 in builder.py:

    try:
        self._setup_build_dir(build_dir, tracker)
    except TestBuilderError as err:
        tracker.error(
            note=("Error setting up build directory '{}': {}"
                  .format(build_dir, err)))
        return False

This fails, returns False, and writes error messages to status from error messages at
That it doesn't trigger a cancel is almost certainly a bug.
Right. That's because Cray Shasta systems are the only systems where
OK, actually, looking closer at it, I think it's fixed already. See the passage below from builder.py:TestBuilder.build:

    with lockfile.LockFilePoker(lock):
        # Attempt to perform the actual build, this shouldn't
        # raise an exception unless something goes terribly
        # wrong.
        # This will also set the test status for
        # non-catastrophic cases.
        if not self._build(self.path, cancel_event, test_id, tracker):
            try:
                self.path.rename(self.fail_path)
            except FileNotFoundError as err:
                tracker.error(
                    "Failed to move build {} from {} to "
                    "failure path {}"
                    .format(self.name, self.path,
                            self.fail_path), err)
                try:
                    self.fail_path.mkdir()
                except OSError as err2:
                    tracker.error(
                        "Could not create fail directory for "
                        "build {} at {}"
                        .format(self.name, self.fail_path, err2))
            if cancel_event is not None:
                cancel_event.set()
            return False

If self._build returns False, which it does in the original case (where the status file shows 'Error setting up build directory'), and the cancel_event is not None (it's a threading.Event type), then cancel_event should get set. Perhaps the version you were using had something missing there, but it should work as far as I can tell. If you can recreate it with the current master, let me know and I'll poke at it.
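To make the expected behavior concrete, here is a stripped-down sketch of the same pattern: a failed build step sets a shared `threading.Event`, and anything downstream checks that event before kicking off. The names `run_build` and `kickoff_allowed` are made up for illustration and are not Pavilion functions.

```python
import threading


def run_build(build_step, cancel_event=None):
    """Run build_step(); on failure, signal cancellation and return False.

    Hypothetical stand-in for the build wrapper quoted above.
    """
    if not build_step():
        if cancel_event is not None:
            # Tell any other builds/tests sharing this event to stop.
            cancel_event.set()
        return False
    return True


def kickoff_allowed(cancel_event):
    """A test should only be scheduled if no build asked for cancellation."""
    return not cancel_event.is_set()


cancel = threading.Event()
build_ok = run_build(lambda: False, cancel)  # simulate a failed build
```

After the simulated failure, `cancel.is_set()` is True and `kickoff_allowed(cancel)` is False, which is the state in which the test in the original report should never have reached SCHEDULED.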
tgoetsch@ch-fe1:/usr/projects/hpctools/tgoetsch/repos/pav2-lanl2-> cat /usr/projects/hpctest/pavilion/2.4/working_dir/test_runs/914/job/info
{"id": "7647116", "sys_name": "chicoma"}tgoetsch@ch-fe1:/usr/projects/hpctools/tgoetsch/repos/pav2-lanl2-> cat /usr/projects/hpctest/pavilion/2.4/working_dir/test_runs/914/status
1698862272.972506 STATUS_CREATED Created status file.
1698862272.984902 CREATED Test directory and status file created.
1698862272.990294 BUILD_CREATED Builder created.
1698862272.995562 CREATED Test directory setup complete.
1698862278.854854 BUILD_WAIT Waiting on lock for build 4804c9b55cc8e944.
1698862278.859590 BUILDING Starting build 4804c9b55cc8e944.
1698862278.930399 BUILDING Extracting tarfile /usr/projects/hpctest/test_src/ior.tgz for build /usr/projects/hpctest/pavilion/2.4/working_dir/builds/4804c9b55cc8e944
1698862279.017952 BUILD_ERROR Error setting up build directory '/usr/projects/hpctest/pavilion/2.4/working_dir/builds/4804c9b55cc8e944': Error extracting file '/usr/projects/hpctest/test_src/ior.tgz'\n Could not extract tarfile '/usr/projects/hpctest/test_src/ior.tgz' into '/usr/projects/hpctest/pavilion/2.4/working_dir/builds/4804c9b55cc8e944': [Errno 2] No such file or directory: '/usr/projects/hpctest/pavilion/2.4/working_dir/builds/4804c9b55cc8e944/./doc/sphinx/userDoc/tutorial.rst'
1698862369.929424 SCHEDULED Test kicked off (individually) under slurm scheduler with 500 nodes.
1698862386.513605 PREPPING_RUN Converting run template into run script.
1698862386.514956 RUNNING Starting the run script.
1698862386.518336 RUN_ERROR Unknown error while running test. Refer to the kickoff log.
tgoetsch@ch-fe1:/usr/projects/hpctools/tgoetsch/repos/pav2-lanl2-> cat /usr/projects/hpctest/pavilion/2.4/working_dir/test_runs/914/job/kickoff.log
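As a workaround when triaging a status file like the one above, a small script can report the first error state rather than the last one. This is only a sketch: it assumes each status line has the form `<timestamp> <STATE> <note>` as in the output shown here, and `first_error` is a hypothetical helper, not a Pavilion command.

```python
def first_error(status_text):
    """Return (state, note) for the first *_ERROR line, or None if none.

    Assumes lines look like '<float timestamp> <STATE> <note>'.
    """
    for line in status_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        try:
            float(parts[0])  # skip shell prompts and other non-status lines
        except ValueError:
            continue
        state = parts[1]
        if state.endswith("_ERROR"):
            note = parts[2] if len(parts) > 2 else ""
            return state, note
    return None


sample = (
    "1698862278.859590 BUILDING Starting build 4804c9b55cc8e944.\n"
    "1698862279.017952 BUILD_ERROR Error setting up build directory\n"
    "1698862386.518336 RUN_ERROR Unknown error while running test.\n"
)
result = first_error(sample)
```

With the sample lines above, `result[0]` is `'BUILD_ERROR'`, so the build failure is surfaced even though RUN_ERROR is the final state.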