Add slurm job id to the logs and ideally the job db #6914
Comments
robnagler added a commit that referenced this issue Oct 30, 2024
- does not cancel the sbatch job when terminating
- job_agent _SBATCH_ID_FILE write
- job_supervisor concept of verify_status but doesn't change semantics yet
- pkcli.elegant-schema better approach to updating schema
- const.DEV_SRC_RADIASOFT_DIR
robnagler added a commit that referenced this issue Nov 13, 2024
* Partial #6914 job_agent separated terminating vs destroying
- does not cancel the sbatch job when terminating
- job_agent _SBATCH_ID_FILE write
- job_supervisor concept of verify_status but doesn't change semantics yet
- pkcli.elegant-schema better approach to updating schema
- const.DEV_SRC_RADIASOFT_DIR
- _must_verify_status needs to be global
- _ComputeJob.is_destroyed unused
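The "_SBATCH_ID_FILE write" item can be pictured with a minimal sketch like the one below. It is not the code from the commit: the file name `sbatch_job_id` and the three function names are hypothetical, chosen only for illustration. The one fact relied on is that `sbatch` prints a line of the form `Submitted batch job <id>` on a successful submission.

```python
import re
from pathlib import Path

# Hypothetical file name. The commit mentions _SBATCH_ID_FILE but not its value.
SBATCH_ID_FILE = "sbatch_job_id"

# sbatch reports a successful submission as "Submitted batch job <id>".
_SUBMITTED_RE = re.compile(r"Submitted batch job (\d+)")


def parse_sbatch_job_id(sbatch_stdout):
    """Extract the Slurm job id from the text sbatch printed."""
    m = _SUBMITTED_RE.search(sbatch_stdout)
    if not m:
        raise ValueError(f"no job id in sbatch output: {sbatch_stdout!r}")
    return int(m.group(1))


def save_sbatch_job_id(run_dir, job_id):
    """Write the job id into the run directory so it survives an agent restart."""
    p = Path(run_dir) / SBATCH_ID_FILE
    p.write_text(f"{job_id}\n")
    return p


def load_sbatch_job_id(run_dir):
    """Return the saved job id, or None if no job was recorded for this run."""
    p = Path(run_dir) / SBATCH_ID_FILE
    if not p.exists():
        return None
    return int(p.read_text().strip())
```

Keeping the id in the run directory means it can also be included in log messages, which is what the issue title asks for.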
This was referenced Nov 21, 2024
robnagler added a commit that referenced this issue Dec 20, 2024
- Fix #7308 ui_websocket default is True and removed False case from test.sh
- job_supervisor run returns immediately and is not a task
- job_supervisor run_status_op pends until run or status watcher complete
- run_status_update is new op that is sent asynchronously from agent to supervisor
- job_agent separate out logic for run/state; reconnects to sbatch job
- job_cmd restructured and more error handling
- job_cmd centralized dispatch in _process_msg
- job_cmd._do_compute more robust and supports separate run/status
- job documents more ops and statuses
- Added max_procs=4 to test.sh to parallelize tests
- Fixed global state checks (mpiexec) to allow parallel test execution
- Increased timeouts to allow for delays during parallel test execution
- Improve arg validation in simulation_db.json_filename
- sbatchLoginService commented out invalid state transitions
- SIREPO.srlog includes time
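One way to think about "reconnects to sbatch job" is as a decision based on whether a job id was saved earlier and what state Slurm reports for it. The sketch below is an assumption about the general idea, not the logic in job_agent: the function name `reconnect_action` and the action names `submit`, `watch`, and `collect` are hypothetical. The state names are standard Slurm job states.

```python
# Slurm job states that mean the job is still queued or executing.
_ACTIVE_STATES = frozenset(("PENDING", "CONFIGURING", "RUNNING", "COMPLETING"))


def reconnect_action(saved_job_id, slurm_state):
    """Decide what to do after an agent restart (hypothetical action names).

    saved_job_id: id recorded at submission time, or None if nothing was submitted
    slurm_state: state string Slurm reports for that id, or None if unknown
    """
    if saved_job_id is None:
        # Nothing was submitted for this run, so start a new job.
        return "submit"
    if slurm_state in _ACTIVE_STATES:
        # The job is still alive, so resume watching it instead of resubmitting.
        return "watch"
    # The job has ended (or Slurm no longer knows it), so gather its results.
    return "collect"
```

For example, `reconnect_action(12345, "RUNNING")` selects `watch`, so a restarted agent would not submit a duplicate job.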
This was referenced Jan 9, 2025
No description provided.