Add job run-time limit flag to neuro run
#1325
You can use the timeout command.
@serhiy-storchaka sorry, I didn't understand you: for what?
I meant that the user can use the
If add such feature in
I think
Please update ./docs/jobs_reference.rst as well, at least.
@asvetlov, a few questions if you don't mind:
A free format is not good for usage in CLI parameters:
Please use
The philosophy of user-provided configuration files is that they are never modified by the application itself. Even a command for changing the user config file is redundant: every software engineer knows how to use a text editor. Please use the hard-coded default value if it is absent. P.S. I wonder why you use
Re-worked this PR considering the comments.
My bad, a typo-like mistake.
New question: What should be the short option for
neuromation/cli/job.py
Outdated
    elif seconds < 0:
        raise click.UsageError(f"Job life span must be non-negative, got: '{result}'")
Technically, our input parser does not allow negative values, so they can come only from the config. I understand it would be better to check this during user config validation, but it's quicker to do it here, after parsing.
- Should I add more validation code to `_validate_user_config()`, similar to `_validate_alias()`?
- Why don't we use `trafaret`? Doesn't it have enough expressive power to validate aliases?
> Technically, our input parser does not allow negative values, so they can come only from the config. I understand it would be better to check this during user config validation, but it's quicker to do it here, after parsing.
> - Should I add more validation code to `_validate_user_config()`, similar to `_validate_alias()`?

Please keep as is until we implement something like a `neuro config validate` command.

> - Why don't we use `trafaret`? Doesn't it have enough expressive power to validate aliases?

Trafaret provides machine-readable error messages, which must be translated to human-readable text before printing. That makes `trafaret` virtually useless for our needs.
Why not use `extract_error`? It might be easier to use a package that was designed for writing custom validators than to implement our own from scratch.
If you are in doubt, don't use a short name; that's fine.
neuromation/cli/job.py
Outdated
@@ -889,6 +927,13 @@ def format_fail(job: str, reason: Exception) -> str:
    if not wait_start:
        detach = True

    job_life_span = await calc_life_span(root.client, life_span)
    if not root.quiet:
I would prefer to see this printout in explicit verbose mode only. In both normal and quiet modes the message should be absent.
Isn't the fact that your job will die automatically after a certain (in some cases, implicit) amount of time more important than, say, Shortcuts? I think that if the job has a timeout, it should be printed in non-quiet mode. Another question: perhaps a better place for it is `formatters/jobs.py`?
There is a lot of important information about the job. We need a balance between insufficient and overly verbose output.
Moving reports about remaining time is a very cool idea.
I see a REST API inconsistency here though: the deadline timestamp is not returned in the `JobDescription` data structure. It seems the server doesn't provide this info, which is sad. In my mind, it should be an absolute UNIX timestamp; relative times are not reliable. The error is one least-significant digit due to network latency: 1 minute in your case.
It raises another question: should the absolute timestamp be used as input in the REST API? Passing a relative time in the Python API is OK.
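To make the relative-versus-absolute point concrete, here is a minimal sketch of how a relative life span could be turned into an absolute UNIX deadline, and how the remaining time could then be derived from it. The function names `deadline_from_life_span` and `remaining` are hypothetical; nothing like them is shown in this PR or confirmed to exist in the REST API.

```python
import time
from datetime import timedelta
from typing import Optional


def deadline_from_life_span(life_span: timedelta, now: Optional[float] = None) -> float:
    """Return the absolute UNIX timestamp at which the job would expire.

    `now` is injectable so the calculation can be tested deterministically.
    """
    started = time.time() if now is None else now
    return started + life_span.total_seconds()


def remaining(deadline: float, now: Optional[float] = None) -> timedelta:
    """Return the time left until `deadline`, never a negative duration."""
    current = time.time() if now is None else now
    return timedelta(seconds=max(0.0, deadline - current))
```

An absolute deadline computed once stays the same no matter when it is read back, while a relative value drifts by however long the request took to travel.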
Please document
docs/jobs_reference.rst
Outdated
@@ -134,6 +135,9 @@ Jobs
   cannot be scheduled because the lack of computation
   cluster resources (memory, CPU/GPU etc).

   :param float life_span: job run-time limit in the format '1d2h3m4s'
The '1d2h3m4s' is not a valid float.
What is `None` for? I assume it is for the server default; please correct me if I'm wrong.
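For illustration, a string such as '1d2h3m4s' could be converted into a duration (and from there into a float number of seconds) roughly as follows. This is a sketch only: the regular expression and the name `parse_life_span` are assumptions, not the parser used in this PR.

```python
import re
from datetime import timedelta

# Hypothetical pattern: optional days, hours, minutes and seconds, in that order.
_LIFE_SPAN_RE = re.compile(
    r"^(?:(?P<d>\d+)d)?(?:(?P<h>\d+)h)?(?:(?P<m>\d+)m)?(?:(?P<s>\d+)s)?$"
)


def parse_life_span(text: str) -> timedelta:
    """Parse strings like '1d2h3m4s' or '90m' into a timedelta."""
    match = _LIFE_SPAN_RE.match(text)
    # The pattern also matches the empty string, so reject that explicitly.
    if not text or match is None:
        raise ValueError(f"Could not parse life span: '{text}'")
    parts = {name: int(value) for name, value in match.groupdict(default="0").items()}
    return timedelta(
        days=parts["d"], hours=parts["h"], minutes=parts["m"], seconds=parts["s"]
    )
```

Because the pattern accepts only digits, a negative value such as '-1d' fails to parse, which matches the remark that negative values cannot come from the input parser.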
oops
`None` disables the limit. I'd actually make the default API value `1d`, not `None` as it is now.
Why not enforce the default on the server side?
It is more reliable.
I foresee potential problems if we eventually want to change 1 day to another default.
A long time ago, when I was implementing this feature on the backend, my very first version enforced the default server-side, as you suggest. However, after several reviews we agreed to pass the default value from the client, so that:
- there are no problems with DB migration, or, if we decide not to write the default value to the database, no implicit logic;
- there is much less code;
- the protocol is not over-complicated (`0` means `limit=0`);
- it is easier to change the default value per user: no need to put it into the database.
Last nits.
neuromation/cli/job.py
Outdated
    return seconds


async def calc_default_life_span(client: Client) -> timedelta:
Please inline the function in `calc_life_span()`.
Why? It's harder to test.
The function is never used alone; it is always called from `calc_life_span()`.
You can test `calc_life_span(client, None)` instead of `calc_default_life_span(client)`; it should produce the same result.
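The suggestion can be sketched as below: with the default lookup inlined, passing `None` exercises exactly the path that a separate helper would have covered. Everything here is a simplified assumption for illustration: `FakeClient`, its `default` attribute, the `DEFAULT_LIFE_SPAN` constant (taken from the "1d" default mentioned in this thread), and the `timedelta`-based signature are not the actual code of the PR.

```python
import asyncio
from datetime import timedelta
from typing import Optional

# Assumed hard-coded fallback, based on the "1d" default discussed in this thread.
DEFAULT_LIFE_SPAN = timedelta(days=1)


class FakeClient:
    """Hypothetical stand-in for the real Client; it only carries a config default."""

    def __init__(self, default: Optional[timedelta] = None) -> None:
        self.default = default


async def calc_life_span(client: FakeClient, value: Optional[timedelta]) -> timedelta:
    """Return the explicit value, else the configured default, else the fallback."""
    if value is not None:
        return value
    # Inlined default lookup: this is what a separate helper would have done.
    if client.default is not None:
        return client.default
    return DEFAULT_LIFE_SPAN


# Testing the default path needs no separate helper: pass None.
assert asyncio.run(calc_life_span(FakeClient(), None)) == DEFAULT_LIFE_SPAN
```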
A matter of taste. Changed.
neuromation/cli/job.py
Outdated
    seconds = delta.total_seconds()
    if seconds == 0:
        return None
    assert seconds > 0, f"life-span cannot be negative, got: {seconds}"
Please raise `ValueError` instead of `AssertionError` here.
It's an impossible situation; we need this check only to protect ourselves from sending negative values to the server one day if we change something in the future (for example, if we change TOML parsers). So I cannot test this check properly. Here's what we can do:
- leave the assertion
- change to ValueError but still keep it untested
- remove the check completely
I read `assert expr, msg` as something that you want to expose to a user.
If the situation is impossible, I expect a bare `assert expr` at least. For example, for the sake of `mypy` correctness, we very often have `assert var is not None` without a custom error message.
I always thought that a bare `assert expr` is bad practice, since if anything ever goes wrong, the user gets no idea of the reason, just an `AssertionError`.
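For comparison, the `ValueError` option from the list above might look like the sketch below. The function name `life_span_to_seconds` is hypothetical; the zero-means-`None` branch mirrors the diff under review. One fact relevant to the choice: Python skips `assert` statements entirely when run with the `-O` flag, whereas an explicit `raise` always executes.

```python
from datetime import timedelta
from typing import Optional


def life_span_to_seconds(delta: timedelta) -> Optional[float]:
    """Convert a life span to seconds; zero means 'no limit' and maps to None."""
    seconds = delta.total_seconds()
    if seconds == 0:
        return None
    if seconds < 0:
        # Unlike an assert, this check still runs under `python -O`.
        raise ValueError(f"life-span cannot be negative, got: {seconds}")
    return seconds
```

The negative branch remains testable by passing a negative `timedelta` directly, even if no parser can currently produce one.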
Changed to a bare `assert expr`.
Support run-time limit enforcement functionality: neuro-inc/platform-api#1044