-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GIF decoder #8406
Add GIF decoder #8406
Changes from 24 commits
beda2dc
e5a541a
f14fbf7
7be488e
4680e5a
13c40f3
1889fcc
3330565
c3eda8a
07294a5
00e3e75
ea24ae2
6487940
1b159f6
ebcd1bf
b0222b5
2cc1041
2295b10
6bcbbb1
bcb62e1
35ff4c9
8f39a37
71c3660
2d066de
da0512b
e26a3ff
5809e40
ce3c2b2
7db3e31
9b19352
f13c7ae
b8e9427
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ Images | |
encode_jpeg | ||
decode_jpeg | ||
write_jpeg | ||
decode_gif | ||
encode_png | ||
decode_png | ||
write_png | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -332,7 +332,11 @@ def get_extensions(): | |
image_macros += [("NVJPEG_FOUND", str(int(use_nvjpeg)))] | ||
|
||
image_path = os.path.join(extensions_dir, "io", "image") | ||
image_src = glob.glob(os.path.join(image_path, "*.cpp")) + glob.glob(os.path.join(image_path, "cpu", "*.cpp")) | ||
image_src = ( | ||
glob.glob(os.path.join(image_path, "*.cpp")) | ||
+ glob.glob(os.path.join(image_path, "cpu", "*.cpp")) | ||
+ glob.glob(os.path.join(image_path, "cpu", "giflib", "*.c")) | ||
) | ||
|
||
if is_rocm_pytorch: | ||
image_src += glob.glob(os.path.join(image_path, "hip", "*.cpp")) | ||
|
@@ -341,18 +345,17 @@ def get_extensions(): | |
else: | ||
image_src += glob.glob(os.path.join(image_path, "cuda", "*.cpp")) | ||
|
||
if use_png or use_jpeg: | ||
ext_modules.append( | ||
extension( | ||
"torchvision.image", | ||
image_src, | ||
include_dirs=image_include + include_dirs + [image_path], | ||
library_dirs=image_library + library_dirs, | ||
define_macros=image_macros, | ||
libraries=image_link_flags, | ||
extra_compile_args=extra_compile_args, | ||
) | ||
ext_modules.append( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are now always building the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would be nice to add a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Curious why you'd need it @adamjstewart ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm coming from the perspective of a maintainer of a from-source package manager: Spack. We need to be able to build packages like torchvision from source on all kinds of weird systems (Linux aarch64, ppc64le, etc.) and with compilers (OneAPI, AOCC, Fujitsu, etc.) for which they are not widely tested. This requires patching these packages to fix compilation errors. For example, our giflib recipe already has multiple patches required for certain systems. Vendoring results in us having to patch multiple locations instead of one. I'm not opposed to it being optionally vendored, but would love support for externally-installed copies too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it would be useful to have an option to use the shared library. We maintain things at conda-forge for shared linking which would make this easier to integrated. One bug i'm running into (with the current startegy) is that clang for example likes to complain that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adam, you can likely take my small patches to unvendor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would love to see these merged upstream so we don't need to maintain a separate set of patches. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. they would definitely need more work to be submitted here |
||
extension( | ||
"torchvision.image", | ||
image_src, | ||
include_dirs=image_include + include_dirs + [image_path], | ||
library_dirs=image_library + library_dirs, | ||
define_macros=image_macros, | ||
libraries=image_link_flags, | ||
extra_compile_args=extra_compile_args, | ||
) | ||
) | ||
|
||
# Locating ffmpeg | ||
ffmpeg_exe = shutil.which("ffmpeg") | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,10 +6,11 @@ | |
|
||
import numpy as np | ||
import pytest | ||
import requests | ||
import torch | ||
import torchvision.transforms.functional as F | ||
from common_utils import assert_equal, needs_cuda | ||
from PIL import __version__ as PILLOW_VERSION, Image, ImageOps | ||
from PIL import __version__ as PILLOW_VERSION, Image, ImageOps, ImageSequence | ||
from torchvision.io.image import ( | ||
_read_png_16, | ||
decode_image, | ||
|
@@ -548,5 +549,34 @@ def test_pathlib_support(tmpdir): | |
write_png(img, write_path) | ||
|
||
|
||
@pytest.mark.parametrize("name", ("gifgrid", "fire", "porsche", "treescap", "treescap-interlaced", "solid2", "x-trans")) | ||
def test_decode_gif(tmpdir, name): | ||
# Using test images from GIFLIB | ||
# https://sourceforge.net/p/giflib/code/ci/master/tree/pic/, we assert PIL | ||
# and torchvision decoded outputs are equal. | ||
# We're not testing against "welcome2" because PIL and GIFLIB disagee on what | ||
# the background color should be (likely a difference in the way they handle | ||
# transparency?) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the curious, top row is what giflib / torchvision is decoding, bottom row is what PIL is decoding (PIL doesn't use giflib and has its own decoder). There seem to be a disagreement in what the background color should be. When I open the file on a file viewer, that background is marked as "transparent" anyways. I don't think it's worth worrying about that, at least for now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is PIL using internally to decode gif files? I wonder whether it would be more interesting to (re-)implement PIL's version of gif decoding instead of vendoring giflib here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Part of the PIL GIF decoder is here: https://github.com/python-pillow/Pillow/blob/58a47978af9f34851ce926303d05fa677010ce2a/src/libImaging/GifDecode.c But I think there are other parts elsewhere because I suspect this part only decodes the file into a "colormap" image, i.e. the values aren't RGB, only indices pointing to values to the image's colormap. I'm not sure where the other part is, I didn't look.
I wouldn't want to re-implement a GIF decoder on our own, there is simply too much room for us to make mistakes. It is a lot safer and simpler to rely on GIFLIB which has been around for decades, and is still maintained to this day (https://sourceforge.net/p/giflib/code/ci/master/tree/NEWS). |
||
|
||
path = tmpdir / f"{name}.gif" | ||
url = f"https://sourceforge.net/p/giflib/code/ci/master/tree/pic/{name}.gif?format=raw" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We have avoided downloading test stuff from the internet in the past because it was tricky to make those work on the internal CI, but I'll just skip those tests internally and add something else. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can also create random gif files using PIL for the internal CI? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We might eventually do that, but this won't be a problem for a while. We just need to focus on the GitHub CI for this PR, I'll handle the internal stuff later |
||
with open(path, "wb") as f: | ||
f.write(requests.get(url).content) | ||
|
||
tv_out = read_image(path) | ||
if tv_out.ndim == 3: | ||
tv_out = tv_out[None] | ||
|
||
assert tv_out.is_contiguous(memory_format=torch.channels_last) | ||
|
||
# For some reason, not using Image.open() as a CM causes "ResourceWarning: unclosed file" | ||
with Image.open(path) as pil_img: | ||
pil_seq = ImageSequence.Iterator(pil_img) | ||
|
||
for pil_frame, tv_frame in zip(pil_seq, tv_out): | ||
pil_frame = F.pil_to_tensor(pil_frame.convert("RGB")) | ||
torch.testing.assert_close(tv_frame, pil_frame, atol=0, rtol=0) | ||
|
||
|
||
if __name__ == "__main__": | ||
pytest.main([__file__]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
#include "decode_gif.h" | ||
#include "giflib/gif_lib.h" | ||
|
||
namespace vision { | ||
namespace image { | ||
|
||
typedef struct reader_helper_t { | ||
uint8_t const* encoded_data; // input tensor data pointer | ||
size_t encoded_data_size; // size of input tensor in bytes | ||
int num_bytes_read; // number of bytes read so far in the tensor | ||
} reader_helper_t; | ||
|
||
// That function is used by GIFLIB routines to read the encoded bytes. | ||
// This reads `len` bytes and writes them into `buf`. The data is read from the | ||
// input tensor passed to decode_gif() starting at the `num_bytes_read` | ||
// position. | ||
int read_from_tensor(GifFileType* gifFile, GifByteType* buf, int len) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sizes and lengths should ideally be size_t not ints There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately this function's signature is enforced by GIFLIB, so I'm afraid we can't change it? https://sourceforge.net/p/giflib/code/ci/master/tree/dgif_lib.c#l33 (it forces us to cast below in |
||
// the UserData field was set in DGifOpen() | ||
reader_helper_t* reader_helper = (reader_helper_t*)gifFile->UserData; | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
auto i = 0; | ||
auto num_bytes_to_read = std::min( | ||
len, | ||
(int)(reader_helper->encoded_data_size - reader_helper->num_bytes_read)); | ||
while (i < num_bytes_to_read) { | ||
buf[i] = reader_helper->encoded_data[reader_helper->num_bytes_read + i]; | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
i++; | ||
} | ||
reader_helper->num_bytes_read += i; | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return i; | ||
} | ||
|
||
torch::Tensor decode_gif(const torch::Tensor& encoded_data) { | ||
// LibGif docs: https://giflib.sourceforge.net/intro.html | ||
// Refer over there for more details on the libgif API, API ref, and a | ||
// detailed description of the GIF format. | ||
|
||
int error = D_GIF_SUCCEEDED; | ||
|
||
// We're using DGidOpen. The other entrypoints of libgif are | ||
// DGifOpenFileName and DGifOpenFileHandle but we don't want to use those, | ||
// since we need to read the encoded bytes from a tensor of encoded bytes, not | ||
// from a file (for consistency with existing jpeg and png decoders). Using | ||
// DGifOpen is the only way to read from a custom source. | ||
// For that we need to provide a reader function `read_from_tensor` that | ||
// reads from the tensor, and we have to keep track of the number of bytes | ||
// read so far: this is why we need the reader_helper struct. | ||
|
||
// TODO: We are potentially doing an unnecessary copy of the encoded bytes: | ||
// - 1 copy in from file to tensor (in read_file()) | ||
// - 1 copy from tensor to GIFLIB buffers (in read_from_tensor()) | ||
// Since we're vendoring GIFLIB we can potentially modify the calls to | ||
// InternalRead() and just set the `buf` pointer to the tensor data directly. | ||
// That might even save allocation of those buffers. | ||
// If we do that, we'd have to make sure the buffers are never written to by | ||
// GIFLIB, otherwise we'd be overridding the tensor data. | ||
Comment on lines
+59
to
+66
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Leaving this out for now, we can try to address it if gif decoding ever becomes a bottleneck... |
||
reader_helper_t reader_helper; | ||
reader_helper.encoded_data = encoded_data.data_ptr<uint8_t>(); | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
reader_helper.encoded_data_size = encoded_data.numel(); | ||
reader_helper.num_bytes_read = 0; | ||
GifFileType* gifFile = | ||
DGifOpen((void*)&reader_helper, read_from_tensor, &error); | ||
|
||
TORCH_CHECK( | ||
(gifFile != nullptr) && (error == D_GIF_SUCCEEDED), | ||
"DGifOpenFileName() failed - ", | ||
error); | ||
|
||
if (DGifSlurp(gifFile) == GIF_ERROR) { | ||
auto gifFileError = gifFile->Error; | ||
DGifCloseFile(gifFile, &error); | ||
TORCH_CHECK(false, "DGifSlurp() failed - ", gifFileError); | ||
} | ||
|
||
// Note: | ||
// The GIF format has this notion of "canvas" and "canvas size", where each | ||
// image could be displayed on the canvas at different offsets, forming a | ||
// mosaic/picture wall like so: | ||
// | ||
// <--- canvas W ---> | ||
// ------------------------ ^ | ||
// | | | | | ||
// | img1 | img3 | | | ||
// | |------------| canvas H | ||
// |---------- | | | ||
// | img2 | img4 | | | ||
// | | | | | ||
// ------------------------ v | ||
// The GifLib docs indicate that this is mostly vestigial, and modern viewers | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
// ignore the canvas size as well as image offsets. Hence, we're ignoring that | ||
// too: | ||
// - We're ignoring the canvas width and height and assume that the shape of | ||
// the canvas and of all images is the shape of the first image. | ||
// - We're enforcing that all images have the same shape. | ||
// - Left and Top offsets of each image are ignored as well and assumed to be | ||
// 0. | ||
|
||
auto out_h = gifFile->SavedImages[0].ImageDesc.Height; | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
auto out_w = gifFile->SavedImages[0].ImageDesc.Width; | ||
auto num_images = gifFile->ImageCount; | ||
|
||
auto options = torch::TensorOptions() | ||
.dtype(torch::kU8) | ||
.memory_format(torch::MemoryFormat::ChannelsLast); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand why this is ChannelsLast while below and in the loop the channels are before height and width? I am likely missing something. Could you explain in a code comment so other readers can also understand better? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As discussed offline the shape of the tensor is NCHW, but its format/layout is channels-last i.e. NHWC. Below we are using an accessor on the tensor data, whose indexing follows that of the shape i.e. @vfdev-5 can you confirm that the accessor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, |
||
auto out = torch::empty( | ||
{int64_t(num_images), 3, int64_t(out_h), int64_t(out_w)}, options); | ||
auto out_a = out.accessor<uint8_t, 4>(); | ||
|
||
for (int i = 0; i < num_images; i++) { | ||
const SavedImage& img = gifFile->SavedImages[i]; | ||
const GifImageDesc& desc = img.ImageDesc; | ||
TORCH_CHECK( | ||
desc.Width == out_w && desc.Height == out_h, | ||
"All images in the gif should have the same dimensions."); | ||
|
||
const ColorMapObject* cmap = | ||
desc.ColorMap ? desc.ColorMap : gifFile->SColorMap; | ||
TORCH_CHECK( | ||
cmap != nullptr, | ||
"Global and local color maps are missing. This should never happen!"); | ||
|
||
for (int h = 0; h < desc.Height; h++) { | ||
for (int w = 0; w < desc.Width; w++) { | ||
auto c = img.RasterBits[h * desc.Width + w]; | ||
GifColorType rgb = cmap->Colors[c]; | ||
vfdev-5 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
out_a[i][0][h][w] = rgb.Red; | ||
out_a[i][1][h][w] = rgb.Green; | ||
out_a[i][2][h][w] = rgb.Blue; | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
} | ||
} | ||
} | ||
out = out.squeeze(0); // remove batch dim if there's only one image | ||
vfdev-5 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
DGifCloseFile(gifFile, &error); | ||
TORCH_CHECK(error == D_GIF_SUCCEEDED, "DGifCloseFile() failed - ", error); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use TORCH_CHECK_EQ here so when it fails it prints both the expected and actual value in the error message? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
TORCH_CHECK_EQ lead to a segfault. According to https://github.com/pytorch/pytorch/blob/a89177936cc46c476f73a555f499a89ee4c3aa01/c10/util/Exception.h#L421-L439, TORCH_CHECK is the right util to use here. |
||
|
||
return out; | ||
} | ||
|
||
} // namespace image | ||
} // namespace vision |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
#pragma once | ||
|
||
#include <torch/types.h> | ||
|
||
namespace vision { | ||
namespace image { | ||
|
||
C10_EXPORT torch::Tensor decode_gif(const torch::Tensor& encoded_data); | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
} // namespace image | ||
} // namespace vision |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No need to review any file in |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
These files come from the GIFLIB project (https://giflib.sourceforge.net/) and | ||
are licensed under the MIT license. | ||
|
||
Some modifications have been made to the original files: | ||
- Remove use of "register" keyword in gifalloc.c for C++17 compatibility. | ||
- Declare loop variable i in DGifGetImageHeader as int instead of unsigned int. | ||
|
||
Below is the original license text from the COPYING file of the GIFLIB project: | ||
|
||
= MIT LICENSE | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in | ||
all copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
THE SOFTWARE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed for newly added test.