Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation of worker memory/cpu request/limit #20889

Merged
merged 11 commits into from
Apr 30, 2021

Conversation

jrafanie
Copy link
Member

@jrafanie jrafanie commented Dec 10, 2020

Kubernetes doesn't allow request > limit.

Follow up work seen in #20865

Here are some examples of what it looks like when you try to saving an invalid high request value:

  1. In the workers tab:
    image

  2. In the advanced tab:
    image

  3. In CLI using configure_worker_settings.rb:

% ./tools/configure_server_settings.rb -t string -s 1 -p workers/worker_base/defaults/memory_request --value "1.2.gigabytes"
{:dry_run=>false, :serverid=>1, :path=>"workers/worker_base/defaults/memory_request", :value=>"1.2.gigabytes", :force=>false, :type=>"string", :all_servers=>false, :help=>false, :type_given=>true, :serverid_given=>true, :path_given=>true, :value_given=>true}
** ManageIQ master, codename: Morphy
Setting [workers/worker_base/defaults/memory_request], old value: [800.megabytes], new value: [1.2.gigabytes]
ERROR: Configuration is invalid:
	workers-memory_request: agent_coordinator_worker: memory_request: 1288490188 cannot exceed memory_threshold: 1073741824

def workers(data)
valid, errors = true, []
# worker_settings expects a hash-like structure with nested keys: :config and :workers
data = {:config => {:workers => data}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be data = {:config => {:workers => data.to_hash}} ? Only asking because I think data is an OpenStruct here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 I'm glad you noticed this. I had data.to_hash before, then with testing, it worked as a Config::Options since it acts like a hash, so I removed it.


if cpu_request > cpu_limit
valid = false
errors << [:invalid_value, "#{worker_class.settings_name}: cpu_request_percent: #{cpu_request} cannot exceed cpu_threshold_percent: #{cpu_limit}"]
Copy link
Member

@Fryguy Fryguy Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key here should technically be the specific key that is invalid. We can probably pick one? Or pick the first one present in the incoming data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good 👁️ . I did this only because it looks awful when the UI presents it (see the screenshots for the current invalid_value).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, the code in validate does a similar thing as activerecord errors.add does for providing a section prefix + whatever message we put here. Unfortunately, the UI tries to humanize the cpu_threshold_percent value into Cpu Threshold Percent and it looks bad so I'd rather put the actual problem in the body of the message and not in the prefix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only problem there is i18n - I think all of the existing keys should be in the dictionary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll use the ugly format

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is fixed, see updated screenshots and tests.

@@ -167,6 +167,42 @@ def smtp(data)

return valid, errors
end

def workers(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall, the validator is only presented with the keys that were changed, so we have deal with the data not having both keys. You might have to fetch the other key from the existing ::Settings section or do some other merge with the default settings before passing that off to worker_class.worker_settings(data).

You should be able to verify that with specs passing in only partial data here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall, the validator is only presented with the keys that were changed

?

@config.each_key do |k|
next unless respond_to?(k.to_s, true)
_log.debug("Validating #{k}")
ost = OpenStruct.new(@config[k].stringify_keys)
section_valid, errors = send(k.to_s, ost)

It loops through all keys in the data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but that incoming @config there might be sparse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at

new_settings = build_without_local(resource).load!.merge!(hash.deep_symbolize_keys).to_hash
replace_magic_values!(new_settings, resource)
valid, errors = validate(new_settings)
raise ConfigurationInvalid.new(errors) unless valid # rubocop:disable Style/RaiseArgs
parent_settings = parent_settings_without_local(resource).load!.to_hash
diff = HashDiffer.diff(parent_settings, new_settings)
HashDiffer is given two settings on line 69 that must be full otherwise it can't easily diff them. I'm targeting line 65, where validate is called using the new settings. This is what I tested manually and it always had both values even if only 1 of them was changed.

I'm not sure where it would be partial. I'll keep looking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe validate was designed to support sparse, but doesn't actually call it that way? I'll have to dig in with you, because there might be other callers than .save!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy I couldn't find how @config could be a partial hash. Can you double check if you think that's a possibility? It would be tricky code to deal with only differences. Plus, the code referenced above calls validate before the HashDiffer is used to determine the diffs so I think it's really the complete settings at validate time but can't be sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can review it together. I'd rather not support something unless it makes sense.

cpu_request = worker_settings.fetch(:cpu_request_percent, 0)
cpu_limit = worker_settings.fetch(:cpu_threshold_percent, Float::INFINITY)
memory_request = worker_settings.fetch(:memory_request, 0.bytes)
memory_limit = worker_settings.fetch(:memory_threshold, Float::INFINITY.megabytes)
Copy link
Member

@Fryguy Fryguy Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these defaults are correct. The user could specify cpu_threshold_percent, but cpu_request_percent would be inherited and not passed to the data. The test below ensure there's a value, but it doesn't ensure there's an inherited value being checked against.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worker_class.worker_settings(data) handles inheritance. It doesn't matter where the value came from, right? We just default a value if not found for itself or inherited. This avoids ugly nil checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed up the tests to properly test inherited and overridden values and discarding of nils and usage of these sensible "valid" values just so the < > will be valid. Note, the code that takes these values and determines if they should patch the deployment also ignores any nil values so this code follows that expectation.

@Fryguy
Copy link
Member

Fryguy commented Dec 10, 2020

This is great @jrafanie ... this should make the experience for the user much better.

@Fryguy Fryguy self-assigned this Jan 4, 2021
@jrafanie jrafanie force-pushed the validate_request_limit_settings branch from a3e06aa to b0f62c7 Compare January 6, 2021 17:12
@Fryguy
Copy link
Member

Fryguy commented Jan 20, 2021

@jrafanie As discussed, the following 2 things are needed

  • Verifying the keys exist before moving on (in case the user, for example, erases all settings in the advanced settings form, and submits it)
  • extracting the method that flattens worker settings into a dedicated method so you don't have to call it through that weird server-based method passing true

@jrafanie jrafanie force-pushed the validate_request_limit_settings branch from b0f62c7 to a2756ad Compare April 27, 2021 19:32
Skip validation of limits/request unless one is specified: mostly for testing.
If you specify one of them for a worker, all four need to be specified or
inherited from ancestor worker classes.
@jrafanie jrafanie force-pushed the validate_request_limit_settings branch from a2756ad to 4775c0b Compare April 27, 2021 19:39
@jrafanie
Copy link
Member Author

Only the last two commits are new. The rest were rebased and left as is.. I'm testing this now in the UI to verify it works as desired.

@jrafanie
Copy link
Member Author

jrafanie commented Apr 28, 2021

Ok, I've tested with this change and ManageIQ/manageiq-ui-classic#7725 and have updated the screenshots.

The only thing left is to support <<reset>> in advanced settings as the keyword to "remove" values from child worker classes:

image

EDIT.

This is now fixed.

If you try to change this valid value:
workers_tab_1a

To this, where the parent memory_request would make the remaining memory_threshold on the generic_worker invalid:
advanced_settings_reset1

You'll now see this correct validation failure:
advanced_settings_reset2


it "RESET_VALUE at concrete subclass is discarded" do
settings.store_path(:workers, :worker_base, :queue_worker_base, :ems_refresh_worker, :ems_refresh_worker_amazon, {:memory_threshold => Vmdb::Settings::RESET_VALUE})
stub_settings(settings)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is kind of weird here because we don't actually ever store a RESET_VALUE in the settings itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Let me see if there's a better way to test this. It was more to see if it would work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just dropped these tests. They were only really there to verify what I did in the UI and don't really make sense here.

Without this, if a user puts <<reset>> in the advanced settings in the UI,
when we try to validate worker_settings, this code  will merge! <<reset>>,
overwriting inherited values from ancestor classes.
@jrafanie jrafanie force-pushed the validate_request_limit_settings branch from ea3dd25 to cf0f04b Compare April 30, 2021 17:10
@miq-bot
Copy link
Member

miq-bot commented Apr 30, 2021

Checked commits jrafanie/manageiq@e19b535~...cf0f04b with ruby 2.6.3, rubocop 1.13.0, haml-lint 0.35.0, and yamllint
4 files checked, 8 offenses detected

app/models/miq_worker.rb

@Fryguy
Copy link
Member

Fryguy commented Apr 30, 2021

@jrafanie Does this have to be merged with ManageIQ/manageiq-ui-classic#7725 or is it independent? I think the bugs in ManageIQ/manageiq-ui-classic#7725 will cause problems if we merged this now, right?

@jrafanie
Copy link
Member Author

@jrafanie Does this have to be merged with ManageIQ/manageiq-ui-classic#7725 or is it independent? I think the bugs in ManageIQ/manageiq-ui-classic#7725 will cause problems if we merged this now, right?

Good question. I didn't think they're dependent on each other at first thought. But on second thought, I feel like this needs to go in now as it protects the bug fixed in ManageIQ/manageiq-ui-classic#7725 from causing worse problems.

@Fryguy Fryguy merged commit 97bc537 into ManageIQ:master Apr 30, 2021
@Fryguy
Copy link
Member

Fryguy commented Apr 30, 2021

@jrafanie Is this lasker/yes?

@jrafanie jrafanie deleted the validate_request_limit_settings branch May 3, 2021 15:00
@Fryguy
Copy link
Member

Fryguy commented May 6, 2021

Backported to lasker in commit 684d1f9.

commit 684d1f9e434a49ebf6b547785f42d23a3a292c6f
Author: Jason Frey <[email protected]>
Date:   Fri Apr 30 17:47:19 2021 -0400

    Merge pull request #20889 from jrafanie/validate_request_limit_settings
    
    Add validation of worker memory/cpu request/limit
    
    (cherry picked from commit 97bc5379adf0b921d63bb06a2ffa76f77b3b0c4d)

Fryguy added a commit that referenced this pull request May 6, 2021
Add validation of worker memory/cpu request/limit

(cherry picked from commit 97bc537)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants