Less intelligent mode for GPU allocation #11
Hi @kouyk. Let me clarify my understanding a bit.
So you have 2 GPUs and there might be some other people using them, and that's why you want this. It is possible to determine free GPUs based on processes (I have to check again whether it is possible to do so based on the processes of a certain user, though). Are you sure you want to do this, given the risk of crashing not only your own jobs but also other people's?
The idea of a configuration file is cool. Thanks for the suggestion.
The GPU wait time is removed in the next release due to an internal change in the way GPUs are assigned to jobs. Specifically, the server will now be in charge of assigning GPUs to clients instead of letting clients choose, as in the current version.
Yup, I understand the risk. My view on this is that since ts already has the …
Oh, so then it won't be possible to choose a specific GPU anymore?
IMO, when a user uses … Anyway, we can have an option that tells ts to choose GPUs based on processes. Would you like to make a PR? The idea of retrying a process is cool, btw! This is not difficult as far as I can tell now.
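In case it helps a future PR, here is a minimal sketch of what process-based selection could look like outside of ts, using only standard `nvidia-smi` queries. The script and its behaviour are an assumption for illustration, not existing ts code:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ts): print the index of the first GPU
# that has no running compute processes, regardless of memory usage.

# UUIDs of GPUs that currently host at least one compute process
busy_uuids=$(nvidia-smi --query-compute-apps=gpu_uuid --format=csv,noheader | sort -u)

# Walk over all GPUs (index, uuid) and pick the first one not in the busy set
while IFS=', ' read -r idx uuid; do
    if ! grep -q "$uuid" <<< "$busy_uuids"; then
        echo "$idx"
        exit 0
    fi
done < <(nvidia-smi --query-gpu=index,uuid --format=csv,noheader)

echo "no process-free GPU found" >&2
exit 1
```

The printed index could then be handed to ts manually until such an option exists.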
Yes, you still can use …
I would love to make a PR; however, since I don't have an Nvidia card on my personal device, it is rather challenging for me to test and debug issues on a remote node where I don't have root privileges. After my current commitments are over, I might look into contributing as well :)
Hi, I would like to know the current status of …
Hey @yanggthomas. Due to some internal changes, …
Hi, could we please have a way for …? I understand you didn't like the "free memory" heuristic because of job init time. May I suggest a "set_max_jobs_per_gpu" kind of flag? If I know that my GPU can fit 3 of my jobs, then I can just set this and things will work, and I'll use my GPUs better.
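A rough sketch of how such a cap could be approximated today with a wrapper script, counting only the current user's compute processes per GPU. Both `MAX_JOBS_PER_GPU` and the script are hypothetical, not an existing ts flag:

```bash
#!/usr/bin/env bash
# Hypothetical "max jobs per GPU" heuristic: print the index of a GPU on which
# the current user runs fewer than MAX_JOBS_PER_GPU compute processes.
MAX_JOBS_PER_GPU=${MAX_JOBS_PER_GPU:-3}

# All (gpu_uuid, pid) pairs of running compute processes
apps=$(nvidia-smi --query-compute-apps=gpu_uuid,pid --format=csv,noheader)

while IFS=', ' read -r idx uuid; do
    count=0
    while IFS=', ' read -r app_uuid pid; do
        [ "$app_uuid" = "$uuid" ] || continue
        # Only count processes owned by the current user
        [ "$(ps -o user= -p "$pid" 2>/dev/null | tr -d ' ')" = "$USER" ] && count=$((count + 1))
    done <<< "$apps"
    if [ "$count" -lt "$MAX_JOBS_PER_GPU" ]; then
        echo "$idx"
        exit 0
    fi
done < <(nvidia-smi --query-gpu=index,uuid --format=csv,noheader)

echo "every GPU already has $MAX_JOBS_PER_GPU of your jobs" >&2
exit 1
```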
Let's say I have 2 GPUs that are shared with others, and I would like to allocate a single job to a single GPU.

Using the `--gpus` option requires that a GPU is considered free, but setting the right free percentage might be tricky. The `-g` flag ignores the free requirement, but consecutive jobs assigned to the same GPU will start as long as there are available slots. The high-level view is that there would be a single slot for each GPU, and jobs would run on a GPU as long as the current user does not have a process running on it.

Essentially, I want to be able to just specify the number of GPUs needed by a job, and task-spooler will allocate the GPUs based on whether there are any running jobs on the GPU, regardless of memory usage. It is a hybrid mode between the automatic allocation and the manual allocation.

What I am currently doing is creating two different ts servers that use different `TMPDIR`s and using the `-g` flag to force a single GPU for jobs submitted to a given server, which isn't ideal and kinda defeats the purpose of ts.

BTW, could there be a configuration file that permanently sets the env vars? It would be great if things like the GPU wait could be set permanently as well.
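For readers hitting the same limitation, the workaround described above looks roughly like this. The paths and `train.py` are placeholders, and the exact `-g` syntax is assumed from this thread, so check `ts -h` on your build:

```bash
# One ts server per GPU: a different TMPDIR gives each client its own socket,
# hence its own server, and -g pins that server's jobs to a fixed GPU.
mkdir -p /tmp/ts-gpu0 /tmp/ts-gpu1

TMPDIR=/tmp/ts-gpu0 ts -g 0 python train.py   # queue a job on the "GPU 0" server
TMPDIR=/tmp/ts-gpu1 ts -g 1 python train.py   # queue a job on the "GPU 1" server

TMPDIR=/tmp/ts-gpu0 ts -l                     # list only the GPU 0 queue
```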