history() performance regression due to #4568 #5199
A sample scene where enabling the crop window tool takes 5 seconds (and that's with the loop controlling the number of diamond connections on the globals set to just 17 - with it set to 30, it would for all practical purposes be hung).
That's an interesting idea. We already kind of have a cache-within-a-cache with … I suppose the other way of doing it is to store a set of already-monitored-things in the monitor, and use that to turn caching back on after something has been monitored once?
The GAFFER_HASHCACHE_MODE approach doesn't make it easy to clear just the monitored caches when starting a new monitored process - we don't want to share caches with other monitors, only ourselves. After thinking about it a bit more, I was starting to lean towards the "set of already-monitored-things". Should this set be held by CapturingMonitor or by ValuePlug::ComputeProcess? Do we want to allow the possibility that someone might want some kind of monitor that can visit a node more than once? I think this would always be a bad idea in a production graph.
CapturingMonitor I think - it's the only monitor with this requirement, and it will involve some overhead we wouldn't want to pay at any other time.
I've just found a very similar pattern in another production setup. I think we're going to have to do this "set of already-monitored-things".
This gives us the following pleasing performance improvement:

```
testHistoryDiamondPerformance (GafferSceneTest.SceneAlgoTest.SceneAlgoTest) : was 9.11s now 0.00s (100% reduction)
```

As noted in the comment, this does mean that we're now returning a reduced history, but we don't anticipate this being a problem in practice. Clients such as `attributeHistory()` aim to whittle the graph down to a single path relevant to a particular attribute, and our HistoryWindow only shows a single linear history anyway. The histories we're returning now are a closer match for the ones we returned prior to GafferHQ#4568, while retaining the main benefit of that PR - the evaluations of things we're not trying to track are still pulled from the cache if possible.

Fixes GafferHQ#5199
In #4568, we switched from using a unique integer in the context to force a history() monitor to re-evaluate upstream nodes, to adding a feature that lets us temporarily disable caching.
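As a rough illustration of that earlier approach, here's a minimal sketch in plain Python (not the Gaffer API); the `__historyID` variable name is my own invention, not the one actually used:

```python
# Sketch of the pre-#4568 trick: add a never-repeating value to the
# context so its hash differs on every evaluation, defeating the cache
# and forcing upstream nodes to recompute while the monitor watches.

import itertools

_counter = itertools.count()

def uniquified_context(context):
    """Return a copy of `context` with a unique extra variable."""
    unique = dict(context)
    unique["__historyID"] = next(_counter)  # changes the hash every call
    return unique

print(uniquified_context({"frame": 1}))  # {'frame': 1, '__historyID': 0}
print(uniquified_context({"frame": 1}))  # {'frame': 1, '__historyID': 1}
```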
This seems cleaner, and allows us to fix an issue where evaluations that snuck upstream outside the plug type we were tracking could cache the upstream scene, and cause history() to miss some upstream nodes.
However ... it creates the potential for completely disastrous performance. If there are diamond shapes in the upstream network which actually involve the plug we are interested in, then we can evaluate upstream nodes multiple times. Chaining multiple diamond shapes causes an exponential increase in the number of times nodes are evaluated.
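To make the blow-up concrete, here's a minimal sketch in plain Python (not the Gaffer API) that counts evaluations of a chain of diamonds, with and without a cache:

```python
# Each diamond node pulls on two branches that both lead back to the same
# upstream node. Without a cache, the evaluation count doubles per diamond.

evaluations = 0

def evaluate(depth, cache=None):
    """Evaluate a chain of `depth` diamonds feeding one shared source."""
    global evaluations
    if cache is not None and depth in cache:
        return cache[depth]
    evaluations += 1
    if depth == 0:
        result = 1  # the shared source node
    else:
        # Both branches of the diamond pull the same upstream result.
        result = evaluate(depth - 1, cache) + evaluate(depth - 1, cache)
    if cache is not None:
        cache[depth] = result
    return result

evaluations = 0
evaluate(17)            # uncached, like the 17-diamond scene above
print(evaluations)      # 262143 evaluations (2^18 - 1)

evaluations = 0
evaluate(17, cache={})  # cached
print(evaluations)      # 18 evaluations
```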
I'll try to include a simple example script below that illustrates where this was reported in production - a scene where enabling the crop window tool takes 5 seconds (or 10 or 20 seconds if you add another Options node or two).
Conceptually, I suppose the correct thing would be a second cache for the plugs where forceMonitoring is true. We don't want to skip monitoring something because it has been evaluated outside of the monitored call, but once a plug has been monitored, we probably don't want to monitor it multiple times in the same context?
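For what it's worth, a hedged sketch of what the "monitor each plug only once" idea could look like, in plain Python with illustrative names (this is not CapturingMonitor's real interface):

```python
# Caching is bypassed only the first time the monitor sees a
# (plug, context) key, so each upstream node is captured once and
# chained diamonds no longer fan out exponentially.

class CapturingMonitor:
    def __init__(self):
        self._captured = set()  # (plug, context) keys already monitored
        self.records = []       # what we captured, in evaluation order

    def force_monitoring(self, key):
        # Only force an uncached evaluation on the first visit.
        return key not in self._captured

    def capture(self, key):
        self._captured.add(key)
        self.records.append(key)

def compute(plug, context, cache, compute_fn, monitor=None):
    key = (plug, context)
    if monitor is not None and monitor.force_monitoring(key):
        monitor.capture(key)
        cache[key] = compute_fn(plug, context)  # fresh, monitored evaluation
        return cache[key]
    if key not in cache:
        cache[key] = compute_fn(plug, context)
    return cache[key]  # later visits come from the cache, unmonitored
```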
Post-script:
While I think we probably do need to fix this, in terms of work-arounds it's worth mentioning that the large number of diamond connections in the IE render pass settings was due to a kinda specific thing that can be worked around: there's a default IE render pass settings node which is commonly used. One of the things this node can do is delete a selection of IE-specific render options by building a glob based on some user inputs. It does this using DeleteOptions, but DeleteOptions does not have the ability to whitelist some options for never being deleted - so to achieve a whitelist, it does a DeleteOptions followed by a CopyOptions to restore the options that should never be deleted. The CopyOptions is the source of the diamond-shaped connections.

It would be nice if there was a cleaner way to implement a whitelist on DeleteOptions - I guess maybe instead of using a glob on DeleteOptions, we should be evaluating the options and performing the matching ourselves in Python to produce a baked list of options that match the glob but not the whitelist (see the sketch below). For the moment, I can hack around it by just disabling the DeleteOptions and CopyOptions if the user selection is empty, which it often is.
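For reference, a minimal sketch of that baked-list matching in plain Python, using fnmatch for the glob; the option names are made up for illustration:

```python
# Instead of handing DeleteOptions a glob and then restoring whitelisted
# options with CopyOptions, evaluate the option names up front and bake
# a list that matches the glob but excludes the whitelist.

import fnmatch

def options_to_delete(option_names, glob, whitelist):
    """Return option names matching `glob`, minus any in `whitelist`."""
    matched = fnmatch.filter(option_names, glob)
    return [name for name in matched if name not in whitelist]

names = ["ie:farm", "ie:priority", "ie:notify", "render:camera"]
print(options_to_delete(names, "ie:*", whitelist={"ie:priority"}))
# ['ie:farm', 'ie:notify']
```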