-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
solver: prevent edge merge to inactive states #4887
Conversation
Before this, it was possible for an edge merge to happen to a target edge/state that is no longer in the actives map, e.g. 1. Job A solves some edges 2. Job B solves some edges that get merged with job A's edges 3. Job A is discarded 4. Job C solves some edges that get merged with job B's edges, which recursively end up merging to job A's, which no longer exist in the actives map. While this doesn't always result in an error, given the right state and order of operations this can result in `getState` or `getEdge` being called on the "stale" edges that are inactive and an error. E.g. if a stale dep transitions from desired state `cache-fast` to `cache-slow`, the slow cache logic will call `getState` on it and return a `compute cache` error. I also *suspect* this is the same root cause behind `inconsistent graph state` errors still seen occasionally, but have not repro'd that error locally so can't be 100% sure yet. The fix here updates `state.setEdge` to register all the jobs in the source edge's state with the target edge's state. This works out because: 1. edges that are the target of a merge will not have their state removed from the actives map if only the original job creating them is discarded 1. those edges are still removed eventually, but only when all jobs referencing them (now including jobs referencing them via edge merges) are discarded Signed-off-by: Erik Sipsma <[email protected]>
While debugging solver bugs with scheduler debug logs I ended up with so many logs that I needed to write a program that parsed them and extracted relevant information so it could be summarized more comprehensibly. That was greatly simplified by updating some of the old scheduler debug logs to use logrus fields rather than one-off `fmt.Sprintf`s. All the same information as before is present, just formatted more consistently for easier parsability. I also ended up removing one of the logs I recently added that printed each job for a vertex in `loadUnlocked`. I realized that information was parsable from the rest of the logs and thus was mostly just extra noise. Signed-off-by: Erik Sipsma <[email protected]>
In the case where job is released and it releases all the |
Oh yeah, good catch, that sounds highly plausible. Will fix by just adding jobs recursively to the state and it's ancestors + cover in unit test |
Signed-off-by: Erik Sipsma <[email protected]>
@tonistiigi Pushed a fix for that: ffdbd86 I realized in practice I'm not sure how that case could actually get hit since it would require that one edge is mergeable but it has ancestors that aren't mergeable. But I left the fix in anyways since there's a high probability I'm either not thinking of a case currently possible or future modifications will make that possible somehow, in which case we don't want to be on shaky ground. |
If the merge happens based on content checksum (cache-slow case) then the parent chains can be completely different. |
Although, in this case if the cache-slow has been computed that should mean the parent is already in completed state and likely will not be called into again. But I don't think there is a guarantee the parent has merged even without cache-slow, even if the practical case is hard to think of for current LLB ops. |
I might be misunderstanding the recursive part of this here - but essentially we're taking all ancestors of a node and adding the job to each state here? Isn't Not quite sure about this, so take it with a grain of salt 😆 |
Good question, on one hand I agree that attaching session resources to those ancestor states feels bad, on the other hand if we don't then you can get into a situation where those vertices execute and no longer have any sessions attached and thus are guaranteed to fail... And even if we do go with the current approach it's reasonably unlikely the client of the source of the edge merge has all the right attachables, so still could fail. Same idea for progress, but rather than failing you would end up being blocked on execution that you get no progress updates about. I'm not seeing any good immediate answers here. It might feel reasonable to just never do an edge merge into an edge that has uncompleted vertices with session attachables in itself or in an ancestor, but I don't think there's any plumbing right now to identify vertices with session attachables on this level of the solver (yet). Wdyt @tonistiigi ? Does tracking vertices with session attachables and updating the edge merge logic as described above feel reasonable to you? I would personally also be okay with just going back to not including ancestors for now and thinking about this in followups. This issue is causing tons of problems for us now so getting the fix in ASAP is preferable even if it's not theoretically 100% comprehensive. It seems to solve the immediate problems for us as is, so it's still a step forward. But if we agree on an approach to deal with this and it's not enormously time consuming I can squash it now. |
Hm, I hadn't thought of that side - I think I'd rather have something that "leaks" in a maybe slightly unexpected way, instead of erroring out, or not edge merging where we might expect. I think as implemented, this looks right to me then - already, when edge merging, we merge together sessions in this way, this doesn't feel like a significantly larger issue to me (though in general, how we handle this does feel like it has implications for how we're using/planning to use sessions in dagger - but that's more a downstream kind of question 😆) |
Generally agree, honestly if there was a "userspace" option on LLB vertexes to say "never merge this" that would solve a lot of these cases for us. |
That's already how all the requests are combined in the graph, it is is connected directly via LLB digest on load or later via |
It should not connect progress from the parents, but yes it would get tricky if there are parents that are not fully loaded yet. Based on #4887 (comment) maybe we can assume that this can't happen in LLB atm.
Any estimation how many merged edges would behave like this. Most of the cases come from multiple local sources tracking the same files. So these do have sessions associated with them but all of them should be completed already because the cache key based on content has been computed. Another approach: should be the in merged edge set the |
ExecOps with secrets also come to mind, but I think they are in the same boat in that they should already have ran if they are part of an edge merge (sure there's some case that's not true somehow but probably small subset, and the only downside is that the slight optimization of edge merge doesn't happen).
To confirm I'm following: if edge transitions to a state where it would actually execute the sharedOp but the original job is gone, "unmerge" and use a different src edge with a job still around? That makes conceptual sense if so. TBH my only concern with it is that it seems like it would involve tricky intricate changes to the scheduler, which is hard to modify without breaking something else unintentionally. Maybe it's not as bad as I'm imagining though?
|
I don't think we would need "full unmerge". The possible issues could come For something like mounting secrets during exec I'm not sure if this is even possible. We just need to make sure that the job/session from JobB is added to the merged edge and when it runs it uses If it is possible that the parent gets called then it would be different, but then we don't really have a way to "unmerge" anyway because of the step failing is not the merged step, but it just happens to have a child step later that is merged. Not sure if this is possible in LLB though. |
} | ||
inputState.addJobs(srcState, memo) | ||
|
||
// tricky case: if the inputState's edge was *already* merged we should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this case actually produces inputState != mergedInputState
, is there even a need to bother with the inputState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided that the whole codebase strictly uses getEdge
/getState
(and thus always uses the merge target if a merge happened), then yeah you could probably skip inputState.addJobs
and just unconditionally call inputState.getEdge
and use the return of that.
I'm not sure if that's true (haven't audited everything), do you happen to know if it is?
@tonistiigi I just looked into the details of this:
So AFAICT the fix would require that Is that what you were imagining? |
Until something like moby/buildkit#4887 lands upstream, looks like this is can cause the cache to not build further
Until something like moby/buildkit#4887 lands upstream, looks like this is can cause the cache to not build further
ping @tonistiigi on the questions asked here #4887 (comment) |
I think we first need to figure out if this case is even possible. Otherwise, we might be complicating things unnecessarily. So are there cases where 1) edge is merged with target from jobA, 2) that target still needs to resolve its own inputs or call other sharedOp methods 3) when the original jobA cancels it would cause a sharedOp method to error as well (normally convered by sessionmanager.Any). For 3 to appear it would mean that different tasks from different sessions resolved in the same cache key (it needs to be tasks that only works on one session and not the other for the If we have a case then we can look what are the constraints for "unmerge" for that case. It could be possible for example that it can exist but only for roots, to the |
One quick update, it does appear that some form of the more complex change (where we add jobs to all ancestor states) is needed due to the fact that Still looking through what cases are possible in practice and whether adding jobs to all ancestors is sufficient or if more is needed. |
That |
Before this, it was possible for an edge merge to happen to a target
edge/state that is no longer in the actives map, e.g.
recursively end up merging to job A's, which no longer exist in the
actives map.
While this doesn't always result in an error, given the right state and
order of operations this can result in
getState
orgetEdge
beingcalled on the "stale" edges that are inactive and an error. E.g. if a
stale dep transitions from desired state
cache-fast
tocache-slow
,the slow cache logic will call
getState
on it and return acompute cache
error.I also suspect this is the same root cause behind
inconsistent graph state
errors still seen occasionally, but have not repro'd that errorlocally so can't be 100% sure yet.
The fix here updates
state.setEdge
to register all the jobs in thesource edge's state with the target edge's state. This works out
because:
removed from the actives map if only the original job creating them
is discarded
referencing them (now including jobs referencing them via edge
merges) are discarded
I previously was able to repro a
failed to compute cache key: failed to get state for index 0
error very consistently locally but no longer can with this change. It's hard to integ test since it involves a very specific set of state's and ordering, but added a unit test for the change made here to not discard jobs that are still referenced via edge merges.Also have some updates to logs that were useful for parsing them: eab04dc
In case it's ever useful to anyone reading this, the code I wrote for parsing, filtering and displaying the scheduler debugs logs is here: https://gist.github.com/sipsma/e67119d96cebe06e69e69f9ba9be2e78
Draft Until: