-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle queue workers inheriting values from worker_base: fix incorrect count and memory threshold values on workers tab #7725
Handle queue workers inheriting values from worker_base: fix incorrect count and memory threshold values on workers tab #7725
Conversation
const baseMemDefault = toBytes(wb.queue_worker_base.defaults.memory_threshold); | ||
const monitorDefault = toBytes(wb.event_catcher.defaults.memory_threshold); | ||
|
||
const selectCount = value => (typeof value === 'number' ? value : countDefault); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only thing I'm curious about is what this === 'number'
means... @kavyanekkalapu do you know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's to handle 0
which is a number but is also falsey?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, we also don't check for countDefault to be a number but theoretically, that could be empty/"". We might need validation on all worker values + all *Default
values here. I can't tell.
defaults: { | ||
memory_threshold: '500.megabytes', | ||
}, | ||
defaults: {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the 500.megabytes example for the queue_worker_base defaults to recreate the situation we have on master where some queue based workers have a value specified and others inherit from worker_base, such as reporting worker in this example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - would also like @kavyanekkalapu to review, since I had a hand in helping @jrafanie .
The existing code was checking only the worker's value and falling back to queue_worker_base but not falling back to the worker_base if the other two locations had no value. This causes the select's value to be the first item in the select, which is 200 MB. If you as a user try to change some other unrelated value and then save, all of these "incorrect" 200 MB values get saved instead of the correct value inherited from worker_base. For example, smart_proxy_worker was correct because that worker class specified a memory_threshold. Reporting, C & U processor and collector, Generic, and Priority workers were all showing 200 MB in the select because that's the first entry in the select and because the default of 1.gigabyte was coming from worker_base and no value was coming from queue_worker_base or the worker class itself. For more information, review https://github.com/ManageIQ/manageiq/blob/19b57d3c88679b12718e301f4a19d4b1fee11705/config/settings.yml#L1063-L1113 and see that the reporting worker specifies no memory_threshold and the queue_worker_base also specifies no memory_threshold but should inherit the worker_base value of 1.gigabytes. This is the same problem for the other workers mentioned above.
b44e2cc
to
8474293
Compare
Checked commit jrafanie@8474293 with ruby 2.6.3, rubocop 0.82.0, haml-lint 0.35.0, and yamllint |
yes, I blame @Fryguy for recently being in this code and knowing enough to make me dangerous in changing this code. 🤣 |
|
||
const parsedValues = { | ||
generic_worker: { | ||
memory_threshold: parseWorker(wb.queue_worker_base.generic_worker).bytes || baseMemDefault, | ||
count: selectCount(wb.queue_worker_base.generic_worker.count), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this UI code knows way too much about the worker class hierarchy. In other words, if we ever change it, we need to be mindful of this area of code. Also, wondering if changing same via our API is possible and, if so, how safe that code is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is exactly the problem here. We're reimplementing the backend logic in the frontend.
Apparently, we don't have an api for getting worker settings yet. Unfortunately, the current code is really bad, it's the source of a user accidentally changing lots of workers to use 200 MB memory threshold, so we need to fix this. I'm not sure when this was broken.
See below...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 agree - unfortunately, we don't have an API that normalizes the worker settings, and that would be the "real" fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, thanks... I should have clarified, the UI is calling API.get('/api/servers/${id}/settings')
, so we have the API for getting server settings but this does not walk the workers subhash to resolve each subclass's settings based on inherited values.
This isn't the first time that this special inheritance structure of the workers subhash has caused confusion and complicated code. Even if we flatten worker settings, we still have this inheritance that isn't always obvious for most users.
Is it time to consider if it makes sense doing inheritance at all in the workers hash? I believe the original idea was to enable setting more general defaults that get inherited but if this is not easy, is it buying anything for us other than creating complicated code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spoke with @jrafanie and I'm leaning towards removing the inheritance as well (not for this PR). If we did so, we'd need to update this UI to present all of the workers and allow them to edit it. In its current presentation it is not feasible but perhaps in a tabular form it would make more sense.
Technically we'd lose "bulk" changing of all workers of a particular class, but it's arguable if that is useful. I'd think users would only change values where there are having problems, not necessarily across the board. Even so, it's not hard to go in and change, at most, 20 workers to the same value. @chessbyte Thoughts?
The flatten workers code would still be necessary for upgrade, but it would be simpler as well, because it would remove the inheritance entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @agrare Would like your perspective too, because this overlaps with providers a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues on my side re: removing the inheritance. The only thing I can see being more complex (and this is just on the developer side so we can manage it) is if we want to add a setting that is used on e.g. the base MiqWorker class we have to blast it out to every worker config section instead of just adding in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather take on that developer inconvenience than push it onto users. In fact, that makes me lean even more to removing the inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah and something we can do a shared provider spec test for, e.g. make sure all provider queue workers have :dequeue_method
, :poll_method
, and :queue_timeout
settings because core will fail without them.
I'm fine with doing a different change. We can keep the I found this yesterday by accident while doing UI testing for ManageIQ/manageiq#20889. I saw the 200 MB values in the various memory thresholds and realized that was the reason we had the exact same problem with a user the other day where they changed something else on the worker tab and it set 200 MB memory threshold on a bunch of workers. |
Actually, I just tested it and even if it shows 200 MB like in my screenshots and you change something unrelated, such as the generic worker count to 1 and save, only the worker count of 1 is shown in advanced tab as changed. So, the real problem is it doesn't show the current value correctly, so if you decide to increase the value from 200 MB to 400 MB for the generic worker, you're actually changing it from the default of 1 GB to 400 MB. |
No, it's fine as is.
While we are cutting downstream soon, upstream release will still happen. Additionally, we will ideally have more downstream fixpacks, so we want this to land in the future. This bug is pretty bad, so I'd like to get it back as soon as possible. |
@kavyanekkalapu If you are good with this change can you merge? I was involved in writing it, so don't feel comfortable doing so. |
Backported to
|
…ings Handle queue workers inheriting values from worker_base: fix incorrect count and memory threshold values on workers tab (cherry picked from commit b242799)
The existing code was checking only the worker's value and falling back to
queue_worker_base but not falling back to the worker_base if the other two
locations had no value.
This causes the select's value to be the first item in the select, which is 200 MB.
If you as a user try to change some other unrelated value and then save, all
of these "incorrect" 200 MB values get saved instead of the correct value
inherited from worker_base.
For example, smart_proxy_worker was correct because that worker class specified a
memory_threshold. Reporting, C & U processor and collector, Generic, and
Priority workers were all showing 200 MB in the select because that's the first
entry in the select and because the default of 1.gigabyte was coming from
worker_base and no value was coming from queue_worker_base or the worker class
itself.
For more information, review https://github.com/ManageIQ/manageiq/blob/19b57d3c88679b12718e301f4a19d4b1fee11705/config/settings.yml#L1063-L1113 and see that the reporting worker specifies no memory_threshold and the queue_worker_base also specifies no memory_threshold but should inherit the worker_base value of 1.gigabytes. This is the same problem for the other workers mentioned above.
Before
After