Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress canonicalization for lazy IO [DNM] #51

Merged
merged 1 commit into from
Jun 23, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ OpenEXR = "0.3"
PNGFiles = "0.3"
QOI = "1"
Sixel = "0.1.2"
TiffImages = "0.3, 0.4, 0.5"
TiffImages = "0.3, 0.4, 0.5, 0.6"
julia = "1.6"

[extras]
Expand Down
7 changes: 7 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,13 @@ save("test.tiff", rand(RGB, 100, 100))
load("test.tiff")
```

## Canonicalization

Some image loaders may return custom AbstractArray types. By default, this package "canonicalizes" the returned type to be either `Array` or [`IndirectArray`](https://github.com/JuliaArrays/IndirectArrays.jl).
An exception is for calls like `load(filename; mmap=true)` where the image data will be "lazily" loaded using [memory-mapped IO](https://en.wikipedia.org/wiki/Memory-mapped_I/O), in which case the default is to allow the lower-level I/O package to return whatever AbstractArray type it chooses.

You can manually control canonicalization with `load(filename; canonicalize=tf)` where `tf` is `true` or `false`.

## Compatibility

If you're using old Julia versions (`VERSION < v"1.3"`), a dummy ImageIO version v0.0.1 with no real function will be installed.
Expand Down
16 changes: 12 additions & 4 deletions src/ImageIO.jl
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,12 @@ function enforce_canonical_type(f, data)
end
end

# For lazy loads by either of
# load(io; mmap=true)
# load(io; lazyio=true)
# do not canonicalize by default. A package that supports both is TiffImages v0.6+.
uses_lazy(kwargs) = get(kwargs, :mmap, false) || get(kwargs, :lazyio, false)
Copy link
Member

@johnnychen94 johnnychen94 May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be aware of the world-age issue

With this branch and tlnagy/TiffImages.jl#79:

julia> using TestImages, FileIO

julia> function f()
           file = testimage("cameraman", download_only=true)
           img = load(file; mmap=true)
           return length(img)
       end
f (generic function with 1 method)

julia> f()
ERROR: MethodError: no method matching size(::TiffImages.MmappedTIFF{ColorTypes.Gray{FixedPointNumbers.N0f8}, 2, UInt32, Base.ReinterpretArray{ColorTypes.Gray{FixedPointNumbers.N0f8}, 2, UInt8, Base.ReshapedArray{UInt8, 2, SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, true}})
The applicable method may be too new: running in world age 31358, while current world is 31367.
Closest candidates are:
  size(::TiffImages.MmappedTIFF{T, 2, O, A} where {O<:Unsigned, A<:AbstractMatrix{T}}) where T at ~/Documents/Julia/TiffImages.jl/src/types/mmapped.jl:60 (method too new to be called from this world context.)
  size(::AbstractArray{T, N}, ::Any) where {T, N} at ~/packages/julias/julia-1.7/share/julia/base/abstractarray.jl:42
  size(::Union{LinearAlgebra.Adjoint{T, var"#s859"}, LinearAlgebra.Transpose{T, var"#s859"}} where {T, var"#s859"<:(AbstractVector)}) at ~/packages/julias/julia-1.7/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:172
  ...
Stacktrace:
 [1] length(t::TiffImages.MmappedTIFF{ColorTypes.Gray{FixedPointNumbers.N0f8}, 2, UInt32, Base.ReinterpretArray{ColorTypes.Gray{FixedPointNumbers.N0f8}, 2, UInt8, Base.ReshapedArray{UInt8, 2, SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{Int64}}, true}, Tuple{}}, true}})
   @ Base ./abstractarray.jl:273
 [2] f()
   @ Main ./REPL[2]:4
 [3] top-level scope
   @ REPL[3]:1

julia> f()
262144

We should probably eagerly check if TiffImages is already loaded when canonicalize=false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably eagerly check if TiffImages is already loaded when canonicalize=false.

This can be done via the new helper: johnnychen94/LazyModules.jl#6

Copy link
Member Author

@timholy timholy May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(EDIT: I should have refreshed the browser before responding, so this doesn't reflect some of your later comments.)

But this would also be a problem:

img = load(file; mmap=true)
# now wait 2.5 hours for that 4TB file to be converted to an Array, and it will fail because it exhausts memory on your machine

So canonicalization just turns it into a different, harder to circumvent problem. The world age issue is circumvented merely by trying again.

This does seem to be a good case for an error hint, and it seems feasible to do since MethodError gets a lot of detail.

Copy link
Member

@johnnychen94 johnnychen94 May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably eagerly check if TiffImages is already loaded when canonicalize=false.

I mean, how about

 function load(f::File{DataFormat{:TIFF}}; canonicalize::Union{Bool,Nothing}=nothing, kwargs...)
     canonicalize = something(canonicalize, !uses_lazy(kwargs))
+    # If we skip canonicalization, we're open to world age issues unless TiffImages
+    # is already loaded.
+    canonicalize || LazyModules.require(TiffImages)
     data = TiffImages.load(f.filename; kwargs...)
     return canonicalize ? enforce_canonical_type(f, data) : data
 end
 function load(s::Stream{DataFormat{:TIFF}}; canonicalize::Union{Bool,Nothing}=nothing, kwargs...)
     canonicalize = something(canonicalize, !uses_lazy(kwargs))
+    canonicalize || LazyModules.require(TiffImages)
     data = TiffImages.load(stream(s); kwargs...)
     return canonicalize ? enforce_canonical_type(s, data) : data
 end

and this gives a perhaps useful enough error message already -- if we ignore the FileIO's complicated error stack.

julia> using TestImages, FileIO

julia> function f()
           file = testimage("cameraman", download_only=true)
           img = load(file; mmap=true)
           return length(img)
       end
f (generic function with 1 method)

julia> f()
[ Info: Precompiling ImageIO [82e4d734-157c-48bb-816b-45c225c6df19]
[ Info: Precompiling ImageMagick [6218d12a-5da1-5696-b52f-db25d2ecc6d1]
Errors encountered while load File{DataFormat{:TIFF}, String}("/home/jc/.julia/artifacts/27a4c26bcdd47eb717bee089ec231a899cb8ef69/cameraman.tif").
All errors:
===========================================
TiffImages is required to be loaded first, maybe `using TiffImages` or `import TiffImages` and try again.
===========================================
MethodError: no method matching load_(::String, ::Bool; mmap=true)
Closest candidates are:
  load_(::Union{AbstractString, IO, Vector{UInt8}}, ::Any; ImageType, extraprop, extrapropertynames, view) at ~/.julia/packages/ImageMagick/Fh2BX/src/ImageMagick.jl:137 got unsupported keyword argument "mmap"
  load_(::Union{AbstractString, IO, Vector{UInt8}}) at ~/.julia/packages/ImageMagick/Fh2BX/src/ImageMagick.jl:137 got unsupported keyword argument "mmap"
===========================================

Fatal error:
ERROR: TiffImages is required to be loaded first, maybe `using TiffImages` or `import TiffImages` and try again.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] require
   @ ~/Documents/Julia/LazyModules.jl/src/LazyModules.jl:170 [inlined]
 [3] load(f::File{DataFormat{:TIFF}, String}; canonicalize::Nothing, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:mmap,), Tuple{Bool}}})
   @ ImageIO ~/Documents/Julia/ImageIO.jl/src/ImageIO.jl:113
 [4] invokelatest(f::Any, args::Any; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:mmap,), Tuple{Bool}}})
   @ Base ./essentials.jl:718
 [5] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Formatted; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:mmap,), Tuple{Bool}}})
   @ FileIO ~/.julia/packages/FileIO/Nl9Lh/src/loadsave.jl:219
 [6] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:mmap,), Tuple{Bool}}})
   @ FileIO ~/.julia/packages/FileIO/Nl9Lh/src/loadsave.jl:185
 [7] load(::String; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:mmap,), Tuple{Bool}}})
   @ FileIO ~/.julia/packages/FileIO/Nl9Lh/src/loadsave.jl:113
 [8] f()
   @ Main ./REPL[3]:3
 [9] top-level scope
   @ REPL[4]:1

julia> using TiffImages

julia> f()
262144

Copy link
Member Author

@timholy timholy May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the problem would be that calling LazyModules.require(TiffImages) would always force users to explicitly load TiffImages---at that point, FileIO.load is pretty useless and we might as well suggest that people use TiffImages.load directly.

Now consider what happens if we handle this with an error hint instead:

  • the world-age error only happens if you load images from inside a function and then perform operations on them before returning from that function. A hint will at least tell users how to cope with the error, much like your LazyModules.require strategy
  • users who load image(s) from the REPL and then return to the REPL before doing anything with them (probably the most common usage in interactive data analysis, I'd guess) would not get an error.

To me that seems strictly better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh... I knew there must be something I missed when I opened PR #48 now I get it:

julia> f() = collect(load("tmp.png"))
f (generic function with 1 method)

julia> f()
ERROR: MethodError: no method matching size(::IndirectArrays.IndirectArray{ColorTypes.RGB{FixedPointNumbers.N0f8}, 2, UInt8, Matrix{UInt8}, OffsetArrays.OffsetVector{ColorTypes.RGB{FixedPointNumbers.N0f8}, Vector{ColorTypes.RGB{FixedPointNumbers.N0f8}}}})
The applicable method may be too new: running in world age 31349, while current world is 31382.
Closest candidates are:
  size(::IndirectArrays.IndirectArray) at ~/.julia/packages/IndirectArrays/BUQO3/src/IndirectArrays.jl:52 (method too new to be called from this world context.)
  size(::AbstractArray{T, N}, ::Any) where {T, N} at ~/packages/julias/julia-1.7/share/julia/base/abstractarray.jl:42
  size(::Union{LinearAlgebra.Adjoint{T, var"#s859"}, LinearAlgebra.Transpose{T, var"#s859"}} where {T, var"#s859"<:(AbstractVector)}) at ~/packages/julias/julia-1.7/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:172

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there is a real solution to battle the world-age issue with all the lazy loading benefits.

But maybe for now we'll just let this happen, i.e., merge this PR, and document the issue somewhere JuliaIO/FileIO.jl#362


## PNGs


Expand Down Expand Up @@ -100,13 +106,15 @@ end

## TIFFs

function load(f::File{DataFormat{:TIFF}}; kwargs...)
function load(f::File{DataFormat{:TIFF}}; canonicalize::Union{Bool,Nothing}=nothing, kwargs...)
canonicalize = something(canonicalize, !uses_lazy(kwargs))
data = TiffImages.load(f.filename; kwargs...)
return enforce_canonical_type(f, data)
return canonicalize ? enforce_canonical_type(f, data) : data
end
function load(s::Stream{DataFormat{:TIFF}}; kwargs...)
function load(s::Stream{DataFormat{:TIFF}}; canonicalize::Union{Bool,Nothing}=nothing, kwargs...)
canonicalize = something(canonicalize, !uses_lazy(kwargs))
data = TiffImages.load(stream(s); kwargs...)
return enforce_canonical_type(s, data)
return canonicalize ? enforce_canonical_type(s, data) : data
end

function save(f::File{DataFormat{:TIFF}}, image::S) where {T, S<:Union{AbstractMatrix, AbstractArray{T,3}}}
Expand Down
10 changes: 10 additions & 0 deletions test/runtests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -135,6 +135,16 @@ Threads.nthreads() <= 1 && @info "Threads.nthreads() = $(Threads.nthreads()), mu
img_saveload = open(io->ImageIO.load(Stream{format"TIFF"}(io)), joinpath(tmpdir, "test_io.tiff"))
@test img == img_saveload
@test typeof(img_saveload) == ImageIO.canonical_type(f, img_saveload)

# mmapped images should not canonicalize by default, and can be controlled manually
img_saveload = ImageIO.load(f; mmap=true)
@test typeof(img_saveload) != ImageIO.canonical_type(f, img_saveload)
img_saveload = ImageIO.load(f; canonicalize=false)
@test typeof(img_saveload) != ImageIO.canonical_type(f, img_saveload)
img_saveload = open(io->ImageIO.load(Stream{format"TIFF"}(io); mmap=true), joinpath(tmpdir, "test_io.tiff"))
@test typeof(img_saveload) != ImageIO.canonical_type(f, img_saveload)
img_saveload = open(io->ImageIO.load(Stream{format"TIFF"}(io); canonicalize=false), joinpath(tmpdir, "test_io.tiff"))
@test typeof(img_saveload) != ImageIO.canonical_type(f, img_saveload)
end
end
end
Expand Down