Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix image cachefile serializtion #437

Closed
wants to merge 1 commit into from
Closed

Fix image cachefile serializtion #437

wants to merge 1 commit into from

Conversation

ron8mcr
Copy link
Contributor

@ron8mcr ron8mcr commented Oct 19, 2017

Fix pickle serialization for ImageCacheFile

When Celery CachedFileBackend used with filesystem storage (django.core.files.storage.FileSystemStorage), everything works fine.
But there are issues with storages.backends.s3boto3.S3Boto3Storage (and it's fix from #391), as well as with django_s3_storage.storage.S3Storage.

I didn't debug it to the real core reason, but seem like pickle serializer used for celery task can't serialize ImageCacheFile.storage. So passing instance of
ImageCacheFile was replaced with passing its generator without storage attrs, which serializes without any errors

It seems like a hacky solution, but it works

Exception was:

Traceback (most recent call last):
  ...

  File "/src/django-imagekit/imagekit/cachefiles/__init__.py", line 131, in __bool__
    existence_required.send(sender=self, file=self)
  ...
  File "/libs/utils.py", line 380, in on_existence_required
    file.generate()
  File "/src/django-imagekit/imagekit/cachefiles/__init__.py", line 94, in generate
    self.cachefile_backend.generate(self, force)
  File "/src/django-imagekit/imagekit/cachefiles/backends.py", line 136, in generate
    self.schedule_generation(file, force=force)
  File "/src/django-imagekit/imagekit/cachefiles/backends.py", line 165, in schedule_generation
    _celery_task.delay(self, file.generator, force=force)
  ...
  File "/lib/python3.6/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/lib/python3.6/site-packages/kombu/serialization.py", line 350, in pickle_dumps
    return dumper(obj, protocol=pickle_protocol)
kombu.exceptions.EncodeError: can't pickle _thread._local objects

@vstoykov
Copy link
Collaborator

From the traceback I see that the problem is that _thread._local can't be pickled. This is probably due to this https://github.com/jschneier/django-storages/blob/1.6.5/storages/backends/s3boto3.py#L240

In django_s3_storage is probably due to this https://github.com/etianen/django-s3-storage/blob/0.12.0/django_s3_storage/storage.py#L137-L140 because probably s3_connection (result from boto3.client) is not pickleable.

From one point of view every custom storage need to behave as close to Django builtin storage in order to be used interchangeably, but the reality is that only the basic operations are the same.

At second look in order to provide smaller pickle representations we need to tune __getstate__ metod of objects in ImageKit wich will be pickled, to include only attributes which are important and can't be recreated.

In this case a better approach can be changing ImageCacheFile.__getstate__ (https://github.com/matthewwithanm/django-imagekit/blob/4.0.1/imagekit/cachefiles/__init__.py#L140-L146) to not include the storage and backend if they are the same as these in the settings and can be recreated from there.

What you think about that?

@ron8mcr
Copy link
Contributor Author

ron8mcr commented Oct 20, 2017

@vstoykov thanks, your solution is much better and this really explains the issue.

But I'm not sure how it should act in case of custom storage (i.e. passed to __init__)

@vstoykov
Copy link
Collaborator

Yep that will be a problem and probably separate issues need to be made for django-storages and django-s3-storage in order to pickleable.

Probably we also need to add in documentation that wen non default storage is used (storage passed to the field different than configured one) it's required the storage to be pickleable if celery or rq backend are used (which I hope that is not so common scenario).

Can you try to change the PR (or create a new one) with modified ImageCacheFile.__reduce__ method and see if it really works for your case?

When Celery CachedFileBackend used with filesystem storage (django.core.files.storage.FileSystemStorage), everything works fine.
But there are issues with storages.backends.s3boto3.S3Boto3Storage (and it's fix from #391), as well as with django_s3_storage.storage.S3Storage.

Exception was:

```
Traceback (most recent call last):
  ...

  File "/src/django-imagekit/imagekit/cachefiles/__init__.py", line 131, in __bool__
    existence_required.send(sender=self, file=self)
  ...
  File "/libs/utils.py", line 380, in on_existence_required
    file.generate()
  File "/src/django-imagekit/imagekit/cachefiles/__init__.py", line 94, in generate
    self.cachefile_backend.generate(self, force)
  File "/src/django-imagekit/imagekit/cachefiles/backends.py", line 136, in generate
    self.schedule_generation(file, force=force)
  File "/src/django-imagekit/imagekit/cachefiles/backends.py", line 165, in schedule_generation
    _celery_task.delay(self, file.generator, force=force)
  ...
  File "/lib/python3.6/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/lib/python3.6/site-packages/kombu/serialization.py", line 350, in pickle_dumps
    return dumper(obj, protocol=pickle_protocol)
kombu.exceptions.EncodeError: can't pickle _thread._local objects
```
@ron8mcr
Copy link
Contributor Author

ron8mcr commented Oct 20, 2017

Thanks, now it looks better and working

@vstoykov
Copy link
Collaborator

@ron8mcr thank you very much for your work on this issue.

One last thing. Can you try to add a test case that will pickle (see if there are test that will test pickleability) of ImageCacheFile?

@ron8mcr
Copy link
Contributor Author

ron8mcr commented Oct 30, 2017

@vstoykov, there are few test cases already

@vstoykov
Copy link
Collaborator

Yes there are test cases for cache files but there is no test that will ensure that if ImageCacheFile is pickled and then restored will use the same storage (same instance and not similar instance).

The change in this merge request will ensure that if the storage is default storage than there is no need to pickle it, and if there is no storage to restore then use default storage (which is singleton). Then if checked with is before pickle and after restore it should be the same instance. Something like:

assert before.storage is after.storage

Without your change this assertion should fail.

@vstoykov
Copy link
Collaborator

@ron8mcr are you still interested this to be merged?

If not I can make tests for it when I have some time available. Still I will be very glad if you can make the tests for this issue?

Thanks in advance.

@ron8mcr
Copy link
Contributor Author

ron8mcr commented Dec 1, 2017

@vstoykov sorry for late response. Yes, I'm interested in, but have not enough time to finish it. I'll do my best to finish on this weekend

@MikeVL
Copy link

MikeVL commented Jan 23, 2018

This pr closes #451

@bkvirendra
Copy link

@ron8mcr any updates on this PR? Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants