Only next job in queue has a sched.t_estimate #1015

Open
grondo opened this issue Mar 14, 2023 · 3 comments

Comments

@grondo
Contributor

grondo commented Mar 14, 2023

The Fluxion scheduler provides a t_estimate job annotation, which flux jobs displays by default in the generic INFO column for jobs in the SCHED state. This is very useful, but typically I have seen only the highest-priority job get a t_estimate, e.g. on fluke:

# flux jobs -Af sched,running
       JOBID QUEUE    USER     NAME       ST NTASKS NNODES     TIME INFO
 foSn5bejtUP batch    testqd   ./buildan+  S     54     54     2.1h eta:1.635h
 foSn252y1c3 batch    testqb   ./buildan+  S     54     54     2.1h 
 foSn7LoAtas batch    testqe   ./buildan+  S     54     54     2.1h 
 foSn3pk1jv7 batch    testqc   ./buildan+  R     54     54   27.86m fluke[6-16,18-23,25-60,62]
# flux queue status -vvvv
batch: Job submission is enabled
batch: Scheduling is started
debug: Job submission is enabled
debug: Scheduling is started
0 alloc requests queued
3 alloc requests pending to scheduler
0 free requests pending to scheduler
1 running jobs

I have not looked into how Fluxion provides the t_estimate nor investigated if this can be reproduced in a standalone test, but I'm opening the issue as a reminder to look into it.
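
As a starting point for a reproducer, the observation above can be checked mechanically. The following is only a minimal sketch that works on the text output pasted in this issue, not on any Flux API: it assumes the default column layout shown above, that the NAME column contains no whitespace, and that an estimate appears in INFO as a field starting with `eta:`. The helper name `pending_without_eta` is hypothetical.

```python
# Sample copied from the `flux jobs -Af sched,running` output in this issue.
SAMPLE = """\
       JOBID QUEUE    USER     NAME       ST NTASKS NNODES     TIME INFO
 foSn5bejtUP batch    testqd   ./buildan+  S     54     54     2.1h eta:1.635h
 foSn252y1c3 batch    testqb   ./buildan+  S     54     54     2.1h
 foSn7LoAtas batch    testqe   ./buildan+  S     54     54     2.1h
 foSn3pk1jv7 batch    testqc   ./buildan+  R     54     54   27.86m fluke[6-16,18-23,25-60,62]
"""


def pending_without_eta(text):
    """Return IDs of jobs in state S whose INFO column has no eta: entry.

    Assumes whitespace-separated columns in the order
    JOBID QUEUE USER NAME ST NTASKS NNODES TIME [INFO].
    """
    missing = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:  # not a job row
            continue
        jobid, state = fields[0], fields[4]
        info = fields[8] if len(fields) > 8 else ""
        if state == "S" and not info.startswith("eta:"):
            missing.append(jobid)
    return missing


print(pending_without_eta(SAMPLE))
```

On the sample above this reports the two pending jobs that have no estimate, which matches what is visible by eye in the listing.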

@alecbcs

alecbcs commented Dec 1, 2023

Hi all! @grondo I think it'd definitely be helpful to get ETAs for all of the jobs in the queue and not just the top entry.

I was recently talking with @white238, and if we could get the ETA for every job and aggregate or average them, it would help us detect when a cluster is under heavy utilization by long-running jobs and alert someone to possible system abuse.
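
To make the aggregation idea concrete, here is a rough sketch of averaging the estimates. It only parses `eta:` values out of text like the `flux jobs` listing earlier in this issue; the unit suffixes `s`, `m`, `h`, `d` are an assumption based on the `1.635h` and `27.86m` values shown there, and any other form of the field is silently ignored. The names `UNIT_SECONDS`, `eta_seconds`, and `mean_eta_hours` are hypothetical.

```python
import re

# Seconds per unit suffix (assumed from values such as "1.635h" and "27.86m").
UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}

# Matches fields such as "eta:1.635h"; anything else is skipped.
ETA_RE = re.compile(r"eta:(\d+(?:\.\d+)?)([smhd])")


def eta_seconds(text):
    """Return every eta: value found in the text, converted to seconds."""
    return [float(value) * UNIT_SECONDS[unit]
            for value, unit in ETA_RE.findall(text)]


def mean_eta_hours(text):
    """Return the mean ETA in hours, or None if no estimates are present."""
    etas = eta_seconds(text)
    if not etas:
        return None
    return sum(etas) / len(etas) / 3600.0


listing = " foSn5bejtUP batch testqd ./buildan+ S 54 54 2.1h eta:1.635h"
print(mean_eta_hours(listing))
```

With only the top job carrying an estimate, as reported in this issue, the mean is just that one job's ETA, which is why estimates for the rest of the queue would be needed before an alert threshold on this value means much.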

@grondo
Contributor Author

grondo commented Dec 1, 2023

Thanks @alecbcs! @milroy, @trws, or @jameshcorbett - any idea what's going on with these estimates? I assume Fluxion should be able to provide estimates for all pending jobs up to its queue depth, but maybe I'm mistaken?

@trws
Member

trws commented Dec 22, 2023

I don’t see why not. There are some trade-offs between how many estimates we try to provide and scheduling turnaround, but I’ll add this to my list to investigate.
