-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add validation of worker memory/cpu request/limit #20889
Add validation of worker memory/cpu request/limit #20889
Conversation
def workers(data) | ||
valid, errors = true, [] | ||
# worker_settings expects a hash-like structure with nested keys: :config and :workers | ||
data = {:config => {:workers => data}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be data = {:config => {:workers => data.to_hash}}
? Only asking because I think data is an OpenStruct here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 I'm glad you noticed this. I had data.to_hash
before, then with testing, it worked as a Config::Options
since it acts like a hash, so I removed it.
lib/vmdb/settings/validator.rb
Outdated
|
||
if cpu_request > cpu_limit | ||
valid = false | ||
errors << [:invalid_value, "#{worker_class.settings_name}: cpu_request_percent: #{cpu_request} cannot exceed cpu_threshold_percent: #{cpu_limit}"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key here should technically be the specific key that is invalid. We can probably pick one? Or pick the first one present in the incoming data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good 👁️ . I did this only because it looks awful when the UI presents it (see the screenshots for the current invalid_value
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, the code in validate
does a similar thing as activerecord errors.add
does for providing a section prefix + whatever message we put here. Unfortunately, the UI tries to humanize the cpu_threshold_percent
value into Cpu Threshold Percent
and it looks bad so I'd rather put the actual problem in the body of the message and not in the prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only problem there is i18n - I think all of the existing keys should be in the dictionary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll use the ugly format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is fixed, see updated screenshots and tests.
@@ -167,6 +167,42 @@ def smtp(data) | |||
|
|||
return valid, errors | |||
end | |||
|
|||
def workers(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall, the validator is only presented with the keys that were changed, so we have deal with the data not having both keys. You might have to fetch the other key from the existing ::Settings section or do some other merge with the default settings before passing that off to worker_class.worker_settings(data)
.
You should be able to verify that with specs passing in only partial data here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall, the validator is only presented with the keys that were changed
?
manageiq/lib/vmdb/settings/validator.rb
Lines 17 to 22 in a94268f
@config.each_key do |k| | |
next unless respond_to?(k.to_s, true) | |
_log.debug("Validating #{k}") | |
ost = OpenStruct.new(@config[k].stringify_keys) | |
section_valid, errors = send(k.to_s, ost) |
It loops through all keys in the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but that incoming @config
there might be sparse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at
Lines 62 to 69 in a94268f
new_settings = build_without_local(resource).load!.merge!(hash.deep_symbolize_keys).to_hash | |
replace_magic_values!(new_settings, resource) | |
valid, errors = validate(new_settings) | |
raise ConfigurationInvalid.new(errors) unless valid # rubocop:disable Style/RaiseArgs | |
parent_settings = parent_settings_without_local(resource).load!.to_hash | |
diff = HashDiffer.diff(parent_settings, new_settings) |
validate
is called using the new settings. This is what I tested manually and it always had both values even if only 1 of them was changed.
I'm not sure where it would be partial. I'll keep looking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe validate was designed to support sparse, but doesn't actually call it that way? I'll have to dig in with you, because there might be other callers than .save!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fryguy I couldn't find how @config
could be a partial hash. Can you double check if you think that's a possibility? It would be tricky code to deal with only differences. Plus, the code referenced above calls validate
before the HashDiffer is used to determine the diffs so I think it's really the complete settings at validate
time but can't be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can review it together. I'd rather not support something unless it makes sense.
lib/vmdb/settings/validator.rb
Outdated
cpu_request = worker_settings.fetch(:cpu_request_percent, 0) | ||
cpu_limit = worker_settings.fetch(:cpu_threshold_percent, Float::INFINITY) | ||
memory_request = worker_settings.fetch(:memory_request, 0.bytes) | ||
memory_limit = worker_settings.fetch(:memory_threshold, Float::INFINITY.megabytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these defaults are correct. The user could specify cpu_threshold_percent
, but cpu_request_percent
would be inherited and not passed to the data. The test below ensure there's a value, but it doesn't ensure there's an inherited value being checked against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worker_class.worker_settings(data)
handles inheritance. It doesn't matter where the value came from, right? We just default a value if not found for itself or inherited. This avoids ugly nil checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed up the tests to properly test inherited and overridden values and discarding of nils and usage of these sensible "valid" values just so the <
>
will be valid. Note, the code that takes these values and determines if they should patch the deployment also ignores any nil values so this code follows that expectation.
This is great @jrafanie ... this should make the experience for the user much better. |
a3e06aa
to
b0f62c7
Compare
@jrafanie As discussed, the following 2 things are needed
|
Kubernetes doesn't allow request > limit. Follow up work seen in ManageIQ#20865
This allows us to call the extracted method directly with our options hash without changing the existing behavior of preferring the options hash before the server settings.
b0f62c7
to
a2756ad
Compare
Skip validation of limits/request unless one is specified: mostly for testing. If you specify one of them for a worker, all four need to be specified or inherited from ancestor worker classes.
a2756ad
to
4775c0b
Compare
Only the last two commits are new. The rest were rebased and left as is.. I'm testing this now in the UI to verify it works as desired. |
Ok, I've tested with this change and ManageIQ/manageiq-ui-classic#7725 and have updated the screenshots. The only thing left is to support EDIT. This is now fixed. If you try to change this valid value: To this, where the parent memory_request would make the remaining memory_threshold on the generic_worker invalid: |
spec/models/miq_worker_spec.rb
Outdated
|
||
it "RESET_VALUE at concrete subclass is discarded" do | ||
settings.store_path(:workers, :worker_base, :queue_worker_base, :ems_refresh_worker, :ems_refresh_worker_amazon, {:memory_threshold => Vmdb::Settings::RESET_VALUE}) | ||
stub_settings(settings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is kind of weird here because we don't actually ever store a RESET_VALUE in the settings itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Let me see if there's a better way to test this. It was more to see if it would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just dropped these tests. They were only really there to verify what I did in the UI and don't really make sense here.
Without this, if a user puts <<reset>> in the advanced settings in the UI, when we try to validate worker_settings, this code will merge! <<reset>>, overwriting inherited values from ancestor classes.
ea3dd25
to
cf0f04b
Compare
Checked commits jrafanie/manageiq@e19b535~...cf0f04b with ruby 2.6.3, rubocop 1.13.0, haml-lint 0.35.0, and yamllint app/models/miq_worker.rb
|
@jrafanie Does this have to be merged with ManageIQ/manageiq-ui-classic#7725 or is it independent? I think the bugs in ManageIQ/manageiq-ui-classic#7725 will cause problems if we merged this now, right? |
Good question. I didn't think they're dependent on each other at first thought. But on second thought, I feel like this needs to go in now as it protects the bug fixed in ManageIQ/manageiq-ui-classic#7725 from causing worse problems. |
@jrafanie Is this lasker/yes? |
Backported to
|
Add validation of worker memory/cpu request/limit (cherry picked from commit 97bc537)
Kubernetes doesn't allow request > limit.
Follow up work seen in #20865
Here are some examples of what it looks like when you try to saving an invalid high request value:
In the workers tab:
![image](https://user-images.githubusercontent.com/19339/116471660-eea48780-a842-11eb-96e6-63b4f8bc89d6.png)
In the advanced tab:
![image](https://user-images.githubusercontent.com/19339/116471757-0f6cdd00-a843-11eb-9062-4646674a5747.png)
In CLI using configure_worker_settings.rb: