load_image - decode b64encode and encodebytes strings #30192
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@@ -320,7 +320,7 @@ def load_image(image: Union[str, "PIL.Image.Image"], timeout: Optional[float] =

     # Try to load as base64
     try:
         b64 = base64.b64decode(image, validate=True)
Could you explain a bit why `b64decode` here can't handle the data encoded by `encodebytes`?
src/transformers/image_utils.py
@@ -320,7 +320,7 @@ def load_image(image: Union[str, "PIL.Image.Image"], timeout: Optional[float] =

     # Try to load as base64
     try:
-        b64 = base64.b64decode(image, validate=True)
+        b64 = base64.decodebytes(image.encode() if isinstance(image, str) else image)
So the fix is to make it handle string?
But `image` is always `str` inside this `if` branch (`if isinstance(image, str):`), so I feel something is strange here.
Yep, sorry, this is a hangover from when I was testing with my own scripts. I've changed it so it's just image.encode()
I still see we have `image.encode() if isinstance(image, str) else image`??
Other than this, LGTM.
Hi @amyeroberts, let me know your thoughts on my comments whenever you get some bandwidth.
@ydshieh OK, so the solution I put in here was a hacky one I found worked without digging too much into it - sorry for the rushed solution. So, the following fails:
import base64
from io import BytesIO
from PIL import Image
import torch
buffered = BytesIO()
im = torch.rand((256, 256, 3))
image = Image.fromarray(im.numpy().astype('uint8'), 'RGB')
image.save(buffered, format="JPEG")
# Raises binascii.Error: the encodebytes output contains newline characters
base64.b64decode(base64.encodebytes(buffered.getvalue()), validate=True)
This is because `encodebytes` inserts a newline character into the encoded bytes string after every 76 bytes of output and ensures there's a trailing newline. This fails validation with … The solution to enable these images would be to remove …
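The difference between the two encoders can be reproduced without PIL or torch. The snippet below is a minimal sketch using only the standard library; `payload` is an assumed stand-in for the JPEG bytes in the comment above, not data from the PR.

```python
import base64
import binascii

# Stand-in for image bytes (an assumption for illustration); anything longer
# than 57 bytes makes encodebytes wrap its output onto several lines.
payload = bytes(range(256))

one_line = base64.b64encode(payload)    # single line, no newlines
wrapped = base64.encodebytes(payload)   # lines of at most 76 characters, each ending in b"\n"

# Strict decoding rejects the newline characters in the wrapped form.
try:
    base64.b64decode(wrapped, validate=True)
    strict_ok = True
except binascii.Error:
    strict_ok = False

# decodebytes skips the newlines, so it round-trips both encodings.
from_wrapped = base64.decodebytes(wrapped)
from_one_line = base64.decodebytes(one_line)
```

Here `strict_ok` ends up `False`, while both `from_wrapped` and `from_one_line` equal `payload`.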
This is a good question. I tried to feed some bytes string without … Without adding extra arguments to …
I tried to decode some arbitrary bytes and …
import base64
s = b'data to be encoded'
print(type(s))
data = base64.decodebytes(s)
print(data)
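As a rough illustration of how lenient `decodebytes` is, the sketch below feeds it two small inputs. `try_decode` is a hypothetical helper written for this example only; it is not part of `transformers` or the standard library.

```python
import base64
import binascii
from typing import Optional


def try_decode(data: bytes) -> Optional[bytes]:
    """Hypothetical helper: return the decoded bytes, or None if decodebytes rejects the input."""
    try:
        return base64.decodebytes(data)
    except binascii.Error:
        return None


# Valid base64 for b"hello", split across two lines the way encodebytes would wrap it.
with_newlines = try_decode(b"aGVs\nbG8=\n")

# Three base64 characters with no padding: not a complete quantum, so decoding fails.
bad_padding = try_decode(b"abc")
```

Newlines are skipped, so `with_newlines` is `b"hello"`, while the incorrectly padded input yields `None`. In other words, `decodebytes` tolerates line breaks but still rejects input that cannot be decoded at all.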
Awesome. Thanks for looking into this! I'll tidy up the code and let you know when it's ready for re-review.
abe8906 to e21dd17
@ydshieh Ready for re-review 🤗
Thank you for the fix, the iterations and the explanations!
* Decode b64encode and encodebytes strings
* Remove conditional encode -- image is always a string
What does this PR do?
Updates the base64 decoding logic to decode base64 strings encoded with either `base64.b64encode` or `base64.encodebytes`.
Fixes #30114
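A minimal sketch of the decoding step described above, assuming the input has already been identified as a `str` as in the reviewed branch. The function name `decode_base64_string` is hypothetical and is not the actual `load_image` code; the real function goes on to open the decoded bytes as an image, which is omitted here.

```python
import base64
import binascii
from typing import Optional


def decode_base64_string(image: str) -> Optional[bytes]:
    """Hypothetical helper: decode a base64 string produced by b64encode or encodebytes.

    Returns None when the string is not decodable base64.
    """
    try:
        # decodebytes expects bytes, so the str is encoded first. It ignores
        # the newlines that encodebytes inserts into its output.
        return base64.decodebytes(image.encode())
    except binascii.Error:
        return None


raw = b"\xff\xd8\xff" + bytes(200)  # made-up bytes standing in for image data
from_b64encode = decode_base64_string(base64.b64encode(raw).decode())
from_encodebytes = decode_base64_string(base64.encodebytes(raw).decode())
```

Both calls return the original `raw` bytes, which is the behaviour the PR description asks for.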