Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

autoclose files opened in load_img #261

Merged
merged 12 commits into from
Nov 26, 2019

Conversation

duhaime
Copy link
Contributor

@duhaime duhaime commented Nov 18, 2019

Summary

The current load_img function opens files but never closes them, which can cause OSError: [Errno 24] Too many open files if one reads in too many images in a process.

Related Issues

#260

PR Overview

  • [n] This PR requires new unit tests [y/n] (make sure tests are included)
  • [n] This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
  • [y] This PR is backwards compatible [y/n]
  • [n] This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@duhaime
Copy link
Contributor Author

duhaime commented Nov 19, 2019

Thanks for your quick followup @rragundez! I had thought the same, but it seems closing the filehandler before returning the object forbids a number of operations on the returned object. Minimal example:

import numpy as np
from PIL import Image as pil_image

im = pil_image.open('cats.jpg')
im.close()
im.getdata()

That will throw ValueError: Operation on closed image.

For what it's worth, the approach here is straight from the Pillow docs on file handling. That said, I'm happy to make any changes you see fit!

@duhaime duhaime changed the title use with block when opening images to autoclose filehandlers autoclose files opened in load_img Nov 22, 2019
@rragundez
Copy link
Contributor

rragundez commented Nov 25, 2019

Thanks for your explanations. After having a closer look I think the root of the problem is the inconsistency load_img can return only a pointer to the file if no data augmentation is done (keeps file open) and an actual image in memory if data augmentation is done (no open file). I think this inconsistency is gone by simply changing img = pil_image.open(path) to img = pil_image.load(path) in
https://github.com/keras-team/keras-preprocessing/blob/master/keras_preprocessing/image/utils.py#L112

The drawback is of course that for no augmentation the image object occupies more memory but since this function is used in the Iterator class in a loop where it gets immediately converted to an array then is OK.

I would do the change above and simply add a unit-test that load_img is actually returning an image in memory and not a pointer to the file, with and without augmentation. What do you think?

@duhaime
Copy link
Contributor Author

duhaime commented Nov 25, 2019

@rragundez Your approach (with pil_image.load(path) as img) is much cleaner, nice research. I just replaced the io.BytesIO business with that approach, and added a test to validate that load_img does indeed return an image in memory.

How does the updated PR look?

@duhaime
Copy link
Contributor Author

duhaime commented Nov 25, 2019

@rragundez ah I was testing against a cached version of the module--it appears the load method is missing from PIL.Image--where did you see that method?

@rragundez
Copy link
Contributor

rragundez commented Nov 25, 2019

It appears it acts over the Image class objects

https://pillow.readthedocs.io/en/3.1.x/reference/Image.html?highlight=load#PIL.Image.Image.load

then something like pil_image.open().load() I think is appropriate.

@duhaime
Copy link
Contributor Author

duhaime commented Nov 25, 2019

Hmm, these two return different objects:

pil_image.open('a.png')        # returns `PIL.PngImagePlugin.PngImageFile` object
pil_image.open('a.png').load() # returns `PixelAccess` object 

I understand the Pillow docs on the open method to indicate that PIL.Image.open() opens the file (and never closes it) but waits to read the pixel data into RAM. load() evidently is a method that one can call to manually force the PIL.Image object to yield up its pixel data.

That said, combining open and load don't seem to close the file handlers. Returning PixelAccess data rather than a PIL Image object also breaks some methods in keras_preprocessing; e.g. you can't call img_to_array on the PixelAccess data:

>>> from keras.preprocessing.image import img_to_array
Using TensorFlow backend.
>>> from PIL import Image as pil_image
>>> a = pil_image.open('a.png').load()
>>> img_to_array(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/doug/anaconda/envs/3.5/lib/python3.5/site-packages/keras/preprocessing/image.py", line 75, in img_to_array
    return image.img_to_array(img, data_format=data_format, dtype=dtype)
  File "/Users/doug/anaconda/envs/3.5/lib/python3.5/site-packages/keras_preprocessing/image/utils.py", line 299, in img_to_array
    x = np.asarray(img, dtype=dtype)
  File "/Users/doug/anaconda/envs/3.5/lib/python3.5/site-packages/numpy/core/numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'PixelAccess'

It's entirely possible I misunderstood you or am missing something! From my perspective, using the io stream method outlined in the Pillow docs seems the best way forward...

keras_preprocessing/text.py Outdated Show resolved Hide resolved
tests/image/utils_test.py Outdated Show resolved Hide resolved
tests/image/utils_test.py Outdated Show resolved Hide resolved
@duhaime
Copy link
Contributor Author

duhaime commented Nov 26, 2019

@rragundez sounds good! I just submitted some updates with the changes you requested. The flake8 checks are of course going to fail because of the undefined unicode in the text module. I factored the changes that rectify that problem into a separate PR per your request above.

@rragundez
Copy link
Contributor

I think is good now, I'll have a closer look tomorrow :)

@duhaime
Copy link
Contributor Author

duhaime commented Nov 26, 2019

thanks for your help with this!

@rragundez
Copy link
Contributor

No problem, I don't understand why I still see changes in keras_preprocessing/text.py

@rragundez rragundez merged commit 9a836c2 into keras-team:master Nov 26, 2019
@rragundez
Copy link
Contributor

thanks for the work @duhaime!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants