Add slurm job id to the logs and ideally the job db #6914

Closed
robnagler opened this issue Mar 19, 2024 · 0 comments
Comments

@robnagler
Member

No description provided.

robnagler added a commit that referenced this issue Oct 30, 2024
- does not cancel the sbatch job when terminating
- job_agent _SBATCH_ID_FILE write
- job_supervisor concept of verify_status but doesn't change semantics yet
- pkcli.elegant-schema better approach to updating schema
- const.DEV_SRC_RADIASOFT_DIR
robnagler added a commit that referenced this issue Nov 13, 2024
* Partial #6914 job_agent separated terminating vs destroying
- does not cancel the sbatch job when terminating
- job_agent _SBATCH_ID_FILE write
- job_supervisor concept of verify_status but doesn't change semantics yet
- pkcli.elegant-schema better approach to updating schema
- const.DEV_SRC_RADIASOFT_DIR
- _must_verify_status needs to be global
- _ComputeJob.is_destroyed unused
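The commits above mention a job_agent `_SBATCH_ID_FILE` write, which lets the agent log the slurm job id and reconnect to the sbatch job later. The following is a minimal sketch of that idea, not Sirepo's implementation. It assumes the job was submitted with `sbatch --parsable`, which prints `jobid` or `jobid;cluster`. The file name `SBATCH_ID_FILENAME` and the helpers `parse_sbatch_parsable`, `record_sbatch_id`, and `read_sbatch_id` are hypothetical.

```python
import logging
import pathlib
from typing import Optional

# Hypothetical file name; the real _SBATCH_ID_FILE value is not shown in this issue.
SBATCH_ID_FILENAME = "sbatch_job_id"

_log = logging.getLogger("job_agent_sketch")


def parse_sbatch_parsable(stdout: str) -> str:
    """Extract the job id from `sbatch --parsable` output ("jobid" or "jobid;cluster")."""
    line = stdout.strip()
    if not line:
        raise ValueError("empty sbatch output")
    job_id = line.split(";", 1)[0]
    # Slurm job ids are numeric; reject anything else (e.g. non-parsable output).
    if not job_id.isdigit():
        raise ValueError(f"unexpected sbatch output: {stdout!r}")
    return job_id


def record_sbatch_id(run_dir: pathlib.Path, stdout: str) -> str:
    """Log the slurm job id and persist it so a restarted agent can find the job."""
    job_id = parse_sbatch_parsable(stdout)
    _log.info("sbatch job id=%s run_dir=%s", job_id, run_dir)
    (run_dir / SBATCH_ID_FILENAME).write_text(job_id + "\n")
    return job_id


def read_sbatch_id(run_dir: pathlib.Path) -> Optional[str]:
    """Return the recorded job id, or None if no sbatch job was recorded."""
    p = run_dir / SBATCH_ID_FILENAME
    if not p.exists():
        return None
    return p.read_text().strip() or None
```

Writing the id to the run directory, instead of keeping it only in memory, is what would let a new agent process reconnect to a running sbatch job, and logging it at submission time puts the slurm job id in the logs as the issue title asks.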
robnagler added a commit that referenced this issue Dec 20, 2024
- Fix #7308 ui_websocket default is True and removed False case from test.sh
- job_supervisor run returns immediately and is not a task
- job_supervisor run_status_op pends until run or status watcher complete
- run_status_update is new op that is sent asynchronously from agent to supervisor
- job_agent separate out logic for run/state; reconnects to sbatch job
- job_cmd restructured and more error handling
- job_cmd centralized dispatch in _process_msg
- job_cmd._do_compute more robust and supports separate run/status
- job documents more ops and statuses
- Added max_procs=4 to test.sh to parallelize tests
- Fixed global state checks (mpiexec) to allow parallel test execution
- Increased timeouts to allow for delays during parallel test execution
- Improve arg validation in simulation_db.json_filename
- sbatchLoginService commented out invalid state transitions
- SIREPO.srlog includes time