Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when trying to pickle numpy.dtype.__class__ object for numpy >= 1.21 #434

Closed
hernot opened this issue Oct 15, 2021 · 6 comments
Closed

Comments

@hernot
Copy link

hernot commented Oct 15, 2021

Just came across an error causes by breaking changes in numpy 1.20 (may be related to #405 ) which persists in dill with numpy 1.21 while the it is resolved in this version for plain vanilla pickle.

The following code causes the issue:

dtype = numpy.dtype('int64')
dill.dumps(dtype.__class__)

Will result in the following error
_pickle.PicklingError: Can't pickle <class 'numpy.dtype[int64]'>: it's not found as numpy.dtype[int64]

while

dtype = numpy.dtype('int64')
pickle.dumps(dtype.__class__)

returns the resulting pickle string as expected.

python version: >= 3.7
numpy: 1.21
dill: 0.3.4 (fresh download from pypi in tox managed virtual env)

This error seems to be dill side residue of numpy issue #18325 (details) which was fixed for plain vanilla pickle in numpy >=1.21.

The only hint i have is that dill tries to use the __qualname__ or __name__ of the dtypes class to restore the dtype. At least it passes that as name to StockPickler.save_type on line python3.7/site-packages/dill/_dill.py(1439)save_type()

@hernot
Copy link
Author

hernot commented Oct 15, 2021

a hint how this could be solved or better why plain vanilla pickle handles without any issue was given by @seberg on the aforementeioned numpy issue/pr #18325.
Pickle seems to allow for classes which doe not have a globally accessible __qualname__ to be pickled if they do have a custom metaclass for which dedicated pickling and unpickling methods have been registered, which is the case for numpy.dtype classes in numpy >= 1.21. So when dill before resorting to StockPickler.save_global tries the pickling by metaclass route.

@jakirkham
Copy link

The error actually looks similar to issue ( #56 )

@hernot
Copy link
Author

hernot commented Oct 15, 2021

Hope some of my thoughts are somehow helpful in finding solution for all similar.

For example make _dill.save_type awareof hat there could exists copyreg.pickle table entries for type(type_obj) to be used for serializing type_obj

@mmckerns
Copy link
Member

it should be handled by: 5da2543
reopen if this is not the case

@hernot
Copy link
Author

hernot commented Oct 16, 2021

Hm it at least looks like.
To continue my thoughts from discussion on numpy issue #18325 it is dill which claims to be a drop replacement for vanilla pickle for use-cases where knowing the internal state of python matters too, but is not behaving in sync with vanilla pickle for example with respect to copyreg and custom pickler dispatch_table.
So I do suggest that beyond the already long list of workarounds think of starting doing a major code refactor of dill with the goal to catch up again with vanilla pickle. The side effect would be that all these hacky workarounds would become obsolete. In other words no need to maintain them and related unit and behavioural tests.

EDIT
By the way what is the reason why dill in pickling resorts to the python only pickle._Pickler class which seems to be a bit outdated instead of relying on the public pickle.Pickler class. Is it just cause pickle._Pickler has allways a dispatch dict while the pickle.Pickler.dispatch_table is only present if declared by subclass of pickle._Pickler. Is it cause pickle._Pickler is more python 2 style. If the <Pickler>.dispatch or <Pickler>.dispatch_table is essential for dill.Pickler working properly and some other Python 2-ish things, how about the following approach to fix all the issues currently manually fixed by workarounds like the one you just did for numpy.dtype:

# do the following for both python 2 and three as handled differently in suggested approach
from pickle import Pickler as StockPickler, Unpickler as StockUnpickler
import copyreg

class Pickler(StockPickler): 
    """python's Pickler extended to interpreter sessions"""
    # the followowing is handled by __new__ 
    # dispatch = MetaCatchingDict(StockPickler.dispatch.copy())
    _session = False
    from .settings import settings

    def __new__(cls,*args,**kwds):

        if sys.version_info[0] < 3:
              stock_dispatch = StockPickler.dispatch.copy()
        else:
             # in python 3 StockPickler does not have a dispatch_table per default
             # so check cls whether it has defined a dispatch_table
             stock_disptatch = getattr(cls,disptach_table',None)
             if stock_dispatch is None:
                 stock_disptach = copreg.dispatch_table.copy()
             else:
                  stock_dispatch =  stock_dispatch.copy()
             # in addition check if cls expects a python 2 Pickler dispatch
             # attribute if merge it into stock_dispatch.
             stock_dispatch.update(getattr(cls,'dispatch',{}))
             # add anything expected form Python 2 Pickler dispatch
             # not present in Python3 and either not handled implicitly by Python 3 Pickler
             # or explicitly needs to be addressed by dill.Pickler for proper interpreter state
             # transmission
        stock_dispatch = MetaCatchingDict(stock_dispatch)
         
        class Pickler(cls):
               dispatch = stock_dispatch
               if sys.version_info[0] >= 3:
                    dispatch_table = stock_dispatch
               # if necessary also move _session and other parts to here
               # or overlad them.

        pickler = super().__new__(Pickler)
        ## uncomment next line if __init__ happens not to be implicitly called for subclasses of cls
        ## a quick test in another project where we use __new__ to create customized subclasses
        ## indicates that it is called implicitly for subclassess of cls.
        # pickler.__init__(*args,**kwds)
        return pickler

    # Here follows the rest of dill.Pickler class as is

@mmckerns
Copy link
Member

mmckerns commented Oct 16, 2021

@hernot: ok, why don't you raise this in a new ticket, or better yet, submit it as a PR? Note that the above is similar to #148.

@mmckerns mmckerns added this to the dill-0.3.5 milestone Apr 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants