
Add GIF decoder #8406

Merged
merged 32 commits into pytorch:main from gif_fun on May 8, 2024

Conversation

NicolasHug
Member

@NicolasHug NicolasHug commented May 3, 2024

Reviewers: DON'T PANIC, this PR is a lot simpler than it looks. It's not 2.6k lines of code to review, only ~300.

This PR adds the ability to decode GIF files via the decode_gif() function.

We are vendoring GIFLIB, i.e. I copy/pasted the relevant giflib files into torchvision/csrc/io/image/cpu/giflib. This is why the PR looks big, but there's no need to review these files. GIFLIB is MIT licensed, so we can do that. Vendoring avoids the dependency nightmare that libjpeg[-turbo] and libpng have been. I don't think giflib even exists on conda (except on conda-forge), and even if it did, vendoring would still be easier.
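For context, GIF files are easy to sniff before handing them to a decoder: per the GIF specification, every valid file starts with the 6-byte ASCII signature GIF87a or GIF89a. A minimal, hedged Python sketch (not part of this PR's C++ code):

```python
def looks_like_gif(data: bytes) -> bool:
    # A GIF file begins with the ASCII signature "GIF" followed by the
    # version, "87a" or "89a".
    return data[:6] in (b"GIF87a", b"GIF89a")
```

This is the kind of cheap check a dispatcher can use to route encoded bytes to the right decoder.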

Benchmark

TL;DR Sometimes it's faster than PIL (2x at most), sometimes it's slower (2x at most).

Benchmark script:

import torch
from time import perf_counter_ns


def bench(f, *args, num_exp=100, warmup=0, **kwargs):

    for _ in range(warmup):
        f(*args, **kwargs)

    times = []
    for _ in range(num_exp):
        start = perf_counter_ns()
        f(*args, **kwargs)
        end = perf_counter_ns()
        times.append(end - start)
    return torch.tensor(times).float()

def report_stats(times, unit="ms"):
    mul = {
        "ns": 1,
        "µs": 1e-3,
        "ms": 1e-6,
        "s": 1e-9,
    }[unit]
    times = times * mul
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}{unit} +- {std:.2f}")
    return med


from PIL import Image, ImageSequence
from torchvision import io
from pathlib import Path

# files from https://sourceforge.net/p/giflib/code/ci/master/tree/pic/
paths = list(Path("/home/nicolashug/dev/giflib-code/pic/").glob("*.gif"))
paths += [Path("/home/nicolashug/grace.gif")]  # grace_hopper from torchvision/test/assets

def read_image_pil(path):
    for img in ImageSequence.Iterator(Image.open(path)):
        img.convert("RGB")  # force decoding; the result is intentionally discarded

for path in paths:
    print()
    print(f"{path.name:<26} size={list(io.read_image(path).shape)}")
    print("tv : ", end="")
    times = bench(io.read_image, path, warmup=10)
    tv_med = report_stats(times)
    print("pil: ", end="")
    times = bench(read_image_pil, path, warmup=10)
    pil_med = report_stats(times)
    print(f"tv is {pil_med / tv_med:.2f}x faster")

Benchmark results:

fire.gif                  size=[33, 3, 60, 30]
tv : med = 0.62ms +- 0.03
pil: med = 1.24ms +- 0.11
tv is 2.01x faster

gifgrid.gif               size=[3, 100, 100]
tv : med = 0.07ms +- 0.01
pil: med = 0.09ms +- 0.01
tv is 1.25x faster

porsche.gif               size=[3, 200, 320]
tv : med = 0.43ms +- 0.04
pil: med = 0.28ms +- 0.04
tv is 0.65x faster

solid2.gif                size=[3, 400, 640]
tv : med = 1.62ms +- 0.15
pil: med = 0.90ms +- 0.09
tv is 0.55x faster

treescap-interlaced.gif   size=[3, 40, 40]
tv : med = 0.03ms +- 0.00
pil: med = 0.07ms +- 0.01
tv is 2.53x faster

treescap.gif              size=[3, 40, 40]
tv : med = 0.03ms +- 0.01
pil: med = 0.07ms +- 0.01
tv is 2.55x faster

welcome2.gif              size=[6, 3, 48, 290]
tv : med = 0.76ms +- 0.08
pil: med = 1.07ms +- 0.10
tv is 1.41x faster

x-trans.gif               size=[3, 100, 100]
tv : med = 0.08ms +- 0.01
pil: med = 0.11ms +- 0.01
tv is 1.33x faster

grace.gif                 size=[3, 606, 517]
tv : med = 3.41ms +- 0.17
pil: med = 2.64ms +- 0.11
tv is 0.77x faster


pytorch-bot bot commented May 3, 2024

✅ No Failures

As of commit b8e9427 with merge base 1644fff (image):
💚 Looks good so far! There are no failures yet. 💚


@@ -9,7 +9,7 @@ eval "$($(which conda) shell.bash hook)" && conda deactivate && conda activate c

echo '::group::Install testing utilities'
# TODO: remove the <8 constraint on pytest when https://github.com/pytorch/vision/issues/8238 is closed
-pip install --progress-bar=off "pytest<8" pytest-mock pytest-cov expecttest!=0.2.0
+pip install --progress-bar=off "pytest<8" pytest-mock pytest-cov expecttest!=0.2.0 requests
Member Author

Needed for newly added test.

libraries=image_link_flags,
extra_compile_args=extra_compile_args,
)
ext_modules.append(
Member Author

We are now always building the image extension, even if png and jpeg haven't been found, since we can always build giflib anyway. Maybe we'll add a WITH_LIBGIF flag or something similar, but that can come later.

Contributor

@adamjstewart Jul 25, 2024

Would be nice to add a TORCHVISION_USE_GIF env var similar to all other image backends for consistency.

Member Author

Curious why you'd need it, @adamjstewart?
Those are useful for the other libraries because we dynamically link against them and they introduce dependencies, but since we vendor giflib we don't have the same issues with the GIF decoder.

Contributor

I'm coming from the perspective of a maintainer of a from-source package manager: Spack. We need to be able to build packages like torchvision from source on all kinds of weird systems (Linux aarch64, ppc64le, etc.) and with compilers (OneAPI, AOCC, Fujitsu, etc.) for which they are not widely tested. This requires patching these packages to fix compilation errors. For example, our giflib recipe already has multiple patches required for certain systems. Vendoring results in us having to patch multiple locations instead of one. I'm not opposed to it being optionally vendored, but would love support for externally-installed copies too.

Contributor

It would be useful to have an option to use the shared library. We maintain things at conda-forge for shared linking, which would make this easier to integrate.

One bug I'm running into (with the current strategy) is that clang, for example, likes to complain that -std=c++17 is added to the compilation of a C file.

Contributor

Adam, you can likely take my small patches to unvendor
conda-forge/torchvision-feedstock#94

Contributor

Would love to see these merged upstream so we don't need to maintain a separate set of patches.

Contributor

They would definitely need more work before being submitted here.

# and torchvision decoded outputs are equal.
# We're not testing against "welcome2" because PIL and GIFLIB disagree on what
# the background color should be (likely a difference in the way they handle
# transparency?)
Member Author

For the curious: the top row is what giflib / torchvision decodes, the bottom row is what PIL decodes (PIL doesn't use giflib and has its own decoder). There seems to be a disagreement about what the background color should be. When I open the file in an image viewer, that background is marked as "transparent" anyway. I don't think it's worth worrying about, at least for now.

[image: top row: giflib/torchvision output; bottom row: PIL output]

Collaborator

What is PIL using internally to decode gif files? I wonder whether it would be more interesting to (re-)implement PIL's version of gif decoding instead of vendoring giflib here?

Member Author

Part of the PIL GIF decoder is here: https://github.com/python-pillow/Pillow/blob/58a47978af9f34851ce926303d05fa677010ce2a/src/libImaging/GifDecode.c

But I think there are other parts elsewhere, because I suspect this part only decodes the file into a "colormap" image, i.e. the values aren't RGB, only indices pointing into the image's colormap. I'm not sure where the other part is; I didn't look.

I wonder whether it would be more interesting to (re-)implement PIL's version of gif decoding instead of vendoring giflib here?

I wouldn't want to re-implement a GIF decoder on our own, there is simply too much room for us to make mistakes. It is a lot safer and simpler to rely on GIFLIB which has been around for decades, and is still maintained to this day (https://sourceforge.net/p/giflib/code/ci/master/tree/NEWS).
We could vendor the PIL decoder instead of vendoring GIFLIB but I don't see how this would simplify anything.
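The colormap indirection mentioned above is simple to illustrate: a decoded GIF frame holds palette indices, and RGB values come from looking each index up in the colormap. A toy Python sketch with made-up data (not PIL's or GIFLIB's actual internals):

```python
# Hypothetical 3-entry palette: index -> (R, G, B)
colormap = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]

# A 2x2 "colormap image": each pixel is an index into the palette.
indexed_frame = [
    [0, 1],
    [2, 1],
]

# Expanding indices into RGB triples is the second half of decoding.
rgb_frame = [[colormap[i] for i in row] for row in indexed_frame]
```

A real decoder also has to handle per-frame local colormaps, the transparency index, and disposal methods, which is where PIL and GIFLIB can diverge.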

# transparency?)

path = tmpdir / f"{name}.gif"
url = f"https://sourceforge.net/p/giflib/code/ci/master/tree/pic/{name}.gif?format=raw"
Member Author

We have avoided downloading test data from the internet in the past because it was tricky to make that work on the internal CI, but I'll just skip those tests internally and add something else.
This avoids weak tests and avoids polluting the repo with image test files.

Collaborator

We could also create random GIF files using PIL for the internal CI?
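Generating such files is straightforward. A hedged sketch using Pillow; make_random_gif and its parameters are illustrative, not part of this PR:

```python
import random
import tempfile
from pathlib import Path

from PIL import Image


def make_random_gif(path, num_frames=3, size=(16, 16), seed=0):
    # Build num_frames random grayscale RGB frames and save them as a
    # single animated GIF. Seeded so the file is reproducible.
    rng = random.Random(seed)
    frames = []
    for _ in range(num_frames):
        img = Image.new("RGB", size)
        img.putdata([(rng.randrange(256),) * 3 for _ in range(size[0] * size[1])])
        frames.append(img)
    frames[0].save(path, save_all=True, append_images=frames[1:])


path = Path(tempfile.mkdtemp()) / "random.gif"
make_random_gif(path)
```

Pillow quantizes the RGB frames to a palette on save, so the resulting file exercises the colormap path of a decoder.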

Member Author

We might eventually do that, but this won't be a problem for a while. We just need to focus on the GitHub CI for this PR; I'll handle the internal stuff later.

Member Author

No need to review any file in cpu/giflib, except maybe for this one.

@NicolasHug NicolasHug marked this pull request as ready for review May 6, 2024 12:18
@NicolasHug NicolasHug changed the title WIP Add GIF decoder Add GIF decoder May 6, 2024
Comment on lines +48 to +55
// TODO: We are potentially doing an unnecessary copy of the encoded bytes:
// - 1 copy in from file to tensor (in read_file())
// - 1 copy from tensor to GIFLIB buffers (in read_from_tensor())
// Since we're vendoring GIFLIB we can potentially modify the calls to
// InternalRead() and just set the `buf` pointer to the tensor data directly.
// That might even save allocation of those buffers.
// If we do that, we'd have to make sure the buffers are never written to by
// GIFLIB, otherwise we'd be overriding the tensor data.
Member Author

Leaving this out for now; we can try to address it if GIF decoding ever becomes a bottleneck...
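The trade-off in the TODO above (a zero-copy view versus an extra copy) can be illustrated in Python with memoryview. This is only an analogy for the deferred C++ change, not the actual fix:

```python
# A buffer of encoded bytes, standing in for the input tensor's data.
data = bytearray(b"GIF89a" + b"\x00" * 100)

copied = bytes(data[:6])       # slicing bytes copies the 6 bytes
window = memoryview(data)[:6]  # a zero-copy view into the same buffer

# Mutating the source is visible through the view but not in the copy,
# which is exactly why the TODO warns that GIFLIB must never write to
# buffers that alias the tensor data.
data[0] = ord("X")
```

After the mutation, `bytes(window)` reflects the change while `copied` does not.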

Collaborator

@vfdev-5 left a comment

@NicolasHug the code looks OK to me, except maybe making reader_helper_t use simple datatypes... As for the CL output, this may be done in a follow-up PR...

Contributor

@ahmadsharif1 left a comment

Apologies for the noob comments.

torchvision/csrc/io/image/cpu/decode_gif.cpp (resolved review thread)
// This reads `len` bytes and writes them into `buf`. The data is read from the
// input tensor passed to decode_gif() starting at the `num_bytes_read`
// position.
int read_from_tensor(GifFileType* gifFile, GifByteType* buf, int len) {
Contributor

Sizes and lengths should ideally be size_t, not int.

Member Author

Unfortunately this function's signature is enforced by GIFLIB, so I'm afraid we can't change it? https://sourceforge.net/p/giflib/code/ci/master/tree/dgif_lib.c#l33

(it forces us to cast below in std::min, which is annoying!)
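The logic of read_from_tensor() can be sketched in Python: keep an offset into the encoded buffer and hand back at most `length` bytes per call. The names here (ReaderHelper, read_from_buffer) are illustrative, not the PR's actual C++ symbols:

```python
class ReaderHelper:
    """Tracks the encoded data and how many bytes have been consumed."""

    def __init__(self, data: bytes):
        self.data = data
        self.num_bytes_read = 0


def read_from_buffer(helper: ReaderHelper, length: int) -> bytes:
    # Mirror of a GIFLIB-style read callback: copy up to `length` bytes
    # from the current position and advance the offset. Returning fewer
    # bytes than requested signals end-of-data.
    n = min(length, len(helper.data) - helper.num_bytes_read)
    chunk = helper.data[helper.num_bytes_read : helper.num_bytes_read + n]
    helper.num_bytes_read += n
    return chunk
```

The min() clamp is the same place the C++ code needs its cast, since the callback's `len` is an int while buffer sizes are wider types.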

torchvision/csrc/io/image/cpu/decode_gif.cpp (additional resolved review threads)

auto options = torch::TensorOptions()
.dtype(torch::kU8)
.memory_format(torch::MemoryFormat::ChannelsLast);
Contributor

I don't understand why this is ChannelsLast while below and in the loop the channels are before height and width? I am likely missing something. Could you explain in a code comment so other readers can also understand better?

Member Author

As discussed offline, the shape of the tensor is NCHW, but its format/layout is channels-last, i.e. NHWC. Below we are using an accessor on the tensor data, whose indexing follows the shape, i.e. [n][c][h][w], but the underlying operation still respects the channels-last format. If it didn't, I think our correctness tests would fail hard (and maybe we'd even get a segfault, since W >> C?).

@vfdev-5 can you confirm that the accessor out_a properly respects the channels-last attribute?
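The shape-versus-layout distinction can be made concrete with strides. For the same NCHW shape, the contiguous and channels-last layouts differ only in the element strides; a pure-Python sketch (written without torch, with example sizes):

```python
N, C, H, W = 1, 3, 4, 5  # example sizes

# Contiguous (NCHW) strides, in elements: W varies fastest.
nchw_strides = (C * H * W, H * W, W, 1)

# Channels-last strides for the *same* NCHW shape: C varies fastest,
# so the three channel values of one pixel sit next to each other.
nhwc_strides = (H * W * C, 1, W * C, C)


def offset(strides, n, c, h, w):
    # Flat element offset of index [n][c][h][w] under the given strides.
    sn, sc, sh, sw = strides
    return n * sn + c * sc + h * sh + w * sw
```

An accessor always indexes [n][c][h][w]; the strides decide where that lands in memory, which is how a channels-last tensor keeps an NCHW shape.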

Collaborator

@vfdev-5 May 7, 2024

out = out.squeeze(0); // remove batch dim if there's only one image

DGifCloseFile(gifFile, &error);
TORCH_CHECK(error == D_GIF_SUCCEEDED, "DGifCloseFile() failed - ", error);
Contributor

Use TORCH_CHECK_EQ here so when it fails it prints both the expected and actual value in the error message?

Member Author

@NicolasHug May 7, 2024

I wasn't able to use TORCH_CHECK_EQ while still providing a custom error message 🤔, not sure that is supported.

TORCH_CHECK_EQ led to a segfault. According to https://github.com/pytorch/pytorch/blob/a89177936cc46c476f73a555f499a89ee4c3aa01/c10/util/Exception.h#L421-L439, TORCH_CHECK is the right util to use here.

torchvision/csrc/io/image/cpu/decode_gif.h (resolved review thread)
reader_helper.encoded_data_size = encoded_data.numel();
reader_helper.num_bytes_read = 0;
GifFileType* gifFile =
DGifOpen(static_cast<void*>(&reader_helper), read_from_tensor, &error);
Member Author

@ahmadsharif1 I ended up using a static cast here as well - is that OK?

Collaborator

@vfdev-5 left a comment

LGTM, thanks @NicolasHug !

@NicolasHug NicolasHug merged commit e4d2d1a into pytorch:main May 8, 2024
80 of 81 checks passed
@NicolasHug NicolasHug deleted the gif_fun branch May 8, 2024 09:36

github-actions bot commented May 8, 2024

Hey @NicolasHug!

You merged this PR, but no labels were added.
The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request May 8, 2024
Reviewed By: vmoens

Differential Revision: D57099453

fbshipit-source-id: 0e85ec0c92cc4e2ee06b5d4183fedc639f38dec0