This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Fix cache warming rapid loop #928

Merged
merged 1 commit into from
Feb 5, 2018

Conversation

stefanprodan
Member

@stefanprodan stefanprodan commented Feb 3, 2018

Fix #926

Tested with stefanprodan/flux:cpufix-090b064-wip3 on GKE:

[Screenshot: screen shot 2018-02-03 at 16 46 05]

@stefanprodan stefanprodan changed the title switch to blocking select to avoid rapid loop [WIP] Try fix cache warming rapid loop Feb 3, 2018
Contributor

@rade rade left a comment


The correct way to fix this is to

  1. keep the fall-through in the first select
  2. ditch the fall-through in the 2nd select
  3. expand the 2nd select with branches for stop and w.Priority
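
A minimal standalone sketch of the suggested shape (the channel names `stop`, `priority`, and the refresh tick are taken from the surrounding discussion; everything else is assumed for illustration and is not the actual flux code). The first select keeps its `default` so a non-empty backlog is drained without blocking; the second select has no `default`, so with an empty backlog the goroutine sleeps instead of spinning:

```go
package main

import (
	"fmt"
	"time"
)

// loop returns how many iterations ran before stop fired, to show the
// empty-backlog path blocks rather than spinning.
func loop(stop <-chan struct{}, priority <-chan string, refresh <-chan time.Time) int {
	var backlog []string
	iterations := 0
	for {
		iterations++
		if len(backlog) > 0 {
			select {
			case <-stop:
				return iterations
			default: // fall through: keep draining the backlog
			}
			fmt.Println("warming", backlog[0])
			backlog = backlog[1:]
			continue
		}
		// Empty backlog: no default case, so this select blocks.
		select {
		case <-stop:
			return iterations
		case name := <-priority:
			fmt.Println("warming priority image", name)
		case <-refresh:
			backlog = []string{"img-a", "img-b"} // pretend the tick refilled it
		}
	}
}

func main() {
	stop := make(chan struct{})
	go func() {
		time.Sleep(20 * time.Millisecond)
		close(stop)
	}()
	// nil channels never fire, so only stop can wake the blocked select.
	fmt.Println("iterations before stop:", loop(stop, nil, nil))
}
```

Because the second select blocks, the idle loop body runs exactly once before `stop` wakes it, instead of millions of times.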

@rade
Contributor

rade commented Feb 3, 2018

FWIW, the spin loop was introduced in #835.

@stefanprodan stefanprodan changed the title [WIP] Try fix cache warming rapid loop Fix cache warming rapid loop Feb 3, 2018
@@ -91,10 +91,18 @@ func (w *Warmer) Loop(logger log.Logger, stop <-chan struct{}, wg *sync.WaitGrou
w.warm(ctx, logger, im.Name, im.Credentials)
} else {
select {
case <-stop:
logger.Log("stopping", "true")


w.warm(ctx, logger, name, creds)
} else {
logger.Log("priority", name.String(), "err", "no creds available")
}


@rade
Contributor

rade commented Feb 3, 2018

Why was the original code not consuming an entire core all the time? Is that because every askForNewImagesInterval (1 min) it did some real work, which involves I/O and hence released the CPU?

@stefanprodan
Member Author

Yes, every time the refresh timer fires it does some work for a couple of seconds; the rest of the time it just sits in the default loop.

[Screenshot: screen shot 2018-02-03 at 20 10 45]

Contributor

@rade rade left a comment


LGTM, though @Sambooo or @squaremo should take a look since they are more familiar with this code than me.

@rade rade requested review from squaremo and samb1729 February 4, 2018 09:33
@squaremo
Member

squaremo commented Feb 5, 2018

Looks good in sum @stefanprodan, thank you. Thanks @rade for helping knock it into shape.

Please rebase into a single commit -- or two, if you want to do the factoring of priorityWarm separately -- and write a short commit message explaining what the problem was and how this fixes it.

@stefanprodan
Member Author

@squaremo I've squashed the commits into one.
@rade thanks for guiding me through this.

@stefanprodan
Member Author

@squaremo let me know if it's ok for me to merge this PR.

@squaremo
Member

squaremo commented Feb 5, 2018

I've squashed the commits into one.

Thank you. I am a bit baffled by the commit message though -- did you really add those things? (I don't think you did.) Something like this would describe what's being fixed:

To implement the loop as described in the comment above it, when the backlog is empty we need to block until the refresh tick fires (or a priority image comes in). By having a default case, it's possible to spin.

(the commit summary can be a bit shorter too -- just "Avoid spin in cache refresh loop" would be fine IMO).

Member

@squaremo squaremo left a comment


LGTM, thank you for your patience Stefan.

@squaremo squaremo removed the request for review from samb1729 February 5, 2018 13:51
@stefanprodan stefanprodan merged commit fc799ca into master Feb 5, 2018
@stefanprodan stefanprodan deleted the cpufix branch February 5, 2018 13:54
Development

Successfully merging this pull request may close these issues.

Flux using >50% of one CPU
3 participants