Do not always check if `__main__ in result` when pickling #8443
This logic evolved over time, but the docstring already suggests that we perform the type checks first, before we do the "is main in result" check. Some refactoring along the way changed this. For large results in particular this can make a big difference, to the point that the `thing in result` check can be more expensive than the actual serialization. This is most strongly observed when pickling bytes directly (not sure if we're actually doing that) or, more generally, for everything that we're blocklisting in `_always_use_pickle_for` (I think we should expand this, e.g. to include arrow tables for p2p).

I ended up rewriting the logic into something that is easier to understand imo. This includes a minor functional change. Previously, it would have been possible for an object to be classified as eligible for cloudpickle by the `main in result or pickle_by_value` guard even though it is blocklisted by `always_use_pickle_for`, but only if the object was very small. This is a bit of an odd logic. In fact, it cannot even occur, since `always_use_pickle_for` concerns instances while the `pickle_by_value` and `main in result` checks concern functions and classes. Still, imo this made the logic less readable. The new logic is subjectively easier to read and short-circuits much more quickly in the happy path, or when `always_use_pickle_for == True`.
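To illustrate the ordering described above, here is a minimal sketch using only the standard library. It is not the code from this PR: `ALWAYS_USE_PICKLE_FOR`, `needs_by_value_fallback`, `dumps_sketch`, and the `fallback` parameter are hypothetical names chosen for the example, and the blocklist contents are an assumption. The point is only that the cheap `isinstance` check runs before the substring scan, which is linear in the payload size.

```python
import pickle

# Hypothetical stand-in for the blocklist discussed above (contents assumed).
ALWAYS_USE_PICKLE_FOR = (bytes, bytearray, str, int, float)


def needs_by_value_fallback(obj, payload):
    """Return True if the plain-pickle payload should be redone by a fallback."""
    # Cheap type check first: blocklisted instances never need the fallback,
    # so the scan over the payload below is skipped entirely for them.
    if isinstance(obj, ALWAYS_USE_PICKLE_FOR):
        return False
    # Only now pay for the substring scan, whose cost grows with len(payload).
    return b"__main__" in payload


def dumps_sketch(obj, fallback=None):
    """Pickle obj, delegating to `fallback` (e.g. a by-value pickler) if needed."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    if fallback is not None and needs_by_value_fallback(obj, payload):
        return fallback(obj)
    return payload
```

With this ordering, a large `bytes` object returns right after the `isinstance` check, however often the byte sequence `__main__` happens to occur in it.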