Fix rounding error in htex block scale in #3721
Conversation
@jrueb not sure if you're still interested, but this PR affects code you introduced so you might want to look at it.
- excess_slots = math.ceil(active_slots - (active_tasks * parallelism))
- excess_blocks = math.ceil(float(excess_slots) / (tasks_per_node * nodes_per_block))
+ excess_slots = math.floor(active_slots - (active_tasks * parallelism))
+ excess_blocks = math.floor(float(excess_slots) / (tasks_per_node * nodes_per_block))
  excess_blocks = min(excess_blocks, active_blocks - min_blocks)
I'm sure it's accounted for prior to this if-branch (and it exists prior to this hunk), but the first thing that comes to mind is "is it possible for excess_blocks to become negative?"
The if-condition guarantees that active_blocks - min_blocks is at least 1, but I'm not clear on the guarantees of active_slots - (active_tasks * parallelism).
Probably many of these values are assumed to be "sensible", and this code will go wrong if they are not. See, for example, the recent issue #3726.
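To make the question above concrete, here is a minimal sketch of the three lines from the diff, wrapped in a hypothetical helper (`excess_blocks_for` is not a name from the codebase) and fed hypothetical values where there are more tasks than slots. Whether such values can actually reach this branch depends on the surrounding conditions, which are not shown in this hunk.

```python
import math


def excess_blocks_for(active_slots, active_tasks, parallelism,
                      tasks_per_node, nodes_per_block,
                      active_blocks, min_blocks):
    """Hypothetical wrapper around the three lines in the diff (floor variant)."""
    excess_slots = math.floor(active_slots - (active_tasks * parallelism))
    excess_blocks = math.floor(float(excess_slots) / (tasks_per_node * nodes_per_block))
    # min() only caps the value from above; it does not stop it going below zero.
    return min(excess_blocks, active_blocks - min_blocks)


# Assumed inputs: 48 slots but 60 tasks, so excess_slots is negative.
negative_case = excess_blocks_for(
    active_slots=48, active_tasks=60, parallelism=1,
    tasks_per_node=48, nodes_per_block=1,
    active_blocks=1, min_blocks=0,
)
print(negative_case)
```

With these assumed inputs the result is negative, because `min(...)` bounds only the upper end. If negative inputs are possible, a `max(0, ...)` clamp would be one way to guard against them.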
Description
PR #2196 calculates a number of blocks to scale in, in the htex strategy, rather than scaling in one block per strategy iteration. However, it rounds the wrong way: it scales in a rounded up, rather than rounded down, number of blocks.
Issue #3696 shows this resulting in oscillating behaviour: with 14 tasks and 48 workers per block, on alternating strategy runs, the code will either scale up to the rounded-up number of needed blocks (14/48 => 1), or scale down to the rounded-down number of needed blocks (14/48 => 0).
This PR changes the rounding introduced in #2196 to be consistent: rounding up the number of blocks to scale up, and rounding down the number of blocks to scale down.
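The arithmetic behind the oscillation can be sketched as follows. This is an illustration built from the lines in the diff, not the strategy code itself: `blocks_to_scale_in` and its `round_fn` parameter are hypothetical, and the scenario assumes one active block of 48 workers on a single node, a parallelism of 1, and `min_blocks` of 0.

```python
import math


def blocks_to_scale_in(active_slots, active_tasks, parallelism,
                       tasks_per_node, nodes_per_block,
                       active_blocks, min_blocks, round_fn):
    """Hypothetical helper: the scale-in calculation with a pluggable rounding function."""
    excess_slots = round_fn(active_slots - (active_tasks * parallelism))
    excess_blocks = round_fn(float(excess_slots) / (tasks_per_node * nodes_per_block))
    return min(excess_blocks, active_blocks - min_blocks)


# Assumed scenario from issue #3696: 14 tasks, 48 workers per block, 1 block running.
scenario = dict(active_slots=48, active_tasks=14, parallelism=1,
                tasks_per_node=48, nodes_per_block=1,
                active_blocks=1, min_blocks=0)

# Rounding up (before this PR): 34 excess slots / 48 rounds up to 1 block,
# so the only block is removed and must be started again on the next run.
old_scale_in = blocks_to_scale_in(round_fn=math.ceil, **scenario)

# Rounding down (this PR): 34 / 48 rounds down to 0, so the block is kept.
new_scale_in = blocks_to_scale_in(round_fn=math.floor, **scenario)

# Blocks needed for 14 tasks, rounded up as on the scale-up path.
needed_blocks = math.ceil(14 / 48)

print(old_scale_in, new_scale_in, needed_blocks)
```

Under these assumptions, rounding up removes the one block that is still needed, while rounding down leaves it in place, which matches the number of blocks the scale-up rounding asks for.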
Changed Behaviour
HTEX scale down should oscillate less
Fixes
Fixes #3696
Type of change