Merge pull request reallyasi9#37 from reallyasi9/12-allow-passing-of-keyword-arugments-from-zip_files-convenience-method-to-zipfilesink-for-control-over-compression-etc

12 allow passing of keyword arguments from zip files convenience method to zipfilesink for control over compression etc
reallyasi9 authored Nov 5, 2024
2 parents 7bee843 + 551b47d commit 475018e
Showing 4 changed files with 87 additions and 33 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -19,7 +19,7 @@ CodecZlib = "0.7"
StringEncodings = "0.3"
TranscodingStreams = "0.11"
TruncatedStreams = "2.0"
julia = "1.10.5"
julia = "1.11"

[extras]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
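The compat change above drops support for Julia 1.10. Pkg treats a bare version in `[compat]` as a caret specifier, so `julia = "1.11"` admits 1.11.0 up to but not including 2.0.0. The function below is a hypothetical model of that rule for major versions of at least 1, written only to make the bounds concrete; `in_caret_range` is not part of Pkg.

```julia
# Hypothetical model of Pkg's default (caret) compat rule; valid only for major
# versions >= 1 (Pkg applies stricter rules to 0.x versions).
in_caret_range(v::VersionNumber, lower::VersionNumber) =
    lower <= v < VersionNumber(lower.major + 1)

old_compat = v"1.10.5"   # bound before this commit
new_compat = v"1.11"     # bound after this commit

in_caret_range(v"1.10.5", old_compat)  # true
in_caret_range(v"1.10.5", new_compat)  # false: Julia 1.10 no longer satisfies the bound
in_caret_range(v"1.11.1", new_compat)  # true
```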
41 changes: 23 additions & 18 deletions docs/src/sources.md
@@ -122,7 +122,7 @@ using ZipStreams

zipsource("archive.zip") do zs
f = next_file(zs)
validate(f) # throws if there is an inconsistency
validate(f) # throws if there is an inconsistency in the file's checksums
end
```

@@ -133,44 +133,49 @@ using ZipStreams

io = open("archive.zip")
zipsource(io) do zs
validate(zs) # validate all files and the archive itself
validate(zs) # validate all local checksums and the central directory
end

seekstart(io)
zipsource(io) do zs
f = next_file(zs) # read the first file
validate(zs) # validate all files except the first!
validate(zs) # validate all remaining local checksums and the central directory
end

close(io)
```
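One reasonable reading of validating "the central directory" is comparing what was learned from each local file header while streaming against the summary records stored at the end of the archive. This commit does not show how ZipStreams does that, so the following is only a plausible sketch: each entry is assumed to be summarized as a hypothetical `Entry` named tuple, and the hypothetical `check_central_directory` throws on the first disagreement.

```julia
# Hypothetical entry summary; ZipStreams' internal record types are not shown in this commit.
const Entry = NamedTuple{(:name, :crc32, :uncompressed_size), Tuple{String, UInt32, UInt64}}

# Throw unless the entries seen while streaming agree, in order, with the central directory.
function check_central_directory(streamed::Vector{Entry}, central::Vector{Entry})
    length(streamed) == length(central) || error(
        "streamed $(length(streamed)) entries, but the central directory lists $(length(central))")
    for (i, (s, c)) in enumerate(zip(streamed, central))
        s == c || error("entry $i ($(repr(c.name))) disagrees with its central directory record")
    end
    return nothing
end

# Made-up values, for illustration only
a = Entry(("hello.txt", 0x12345678, UInt64(5)))
b = Entry(("subdir/hello.txt", 0x9abcdef0, UInt64(11)))
check_central_directory([a, b], [a, b])   # consistent, so this returns nothing
```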

The `validate` methods consume the data in the source and return vectors of
raw bytes. When called on an archived file, it returns a single `Vector{UInt8}`.
When called on the archive itself, it returns a `Vector{Vector{UInt8}}` containing
the remaining unread file data in archive order, _excluding any files that have already
been read by iterating or with `next_file`_.
The `validate` methods consume the remaining data in the source. When called on an archived
file, `validate` reads the remainder of the file's data that has not yet been read from the
source (if any) and discards it. When called on the archive itself, it invalidates any
currently referenced file: reading from a file opened within the archive after running
`validate` on the archive results in undefined behavior.
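Because a streaming reader cannot rewind, a checksum check has to cover the bytes the caller already consumed as well as the bytes that are read and discarded at the end. A minimal sketch of that idea, assuming the entry's expected CRC-32 is already known and using a hypothetical `drain_and_check!` helper rather than ZipStreams' own code:

```julia
# Standard CRC-32 (IEEE 802.3 polynomial, bit-reflected as 0xEDB88320), the checksum ZIP uses.
# Pass the previous result as `crc` to checksum data that arrives in pieces.
function crc32_update(crc::UInt32, bytes::AbstractVector{UInt8})
    c = ~crc
    for b in bytes
        c ⊻= UInt32(b)
        for _ in 1:8
            c = isodd(c) ? (c >> 1) ⊻ 0xedb88320 : c >> 1
        end
    end
    return ~c
end

# Hypothetical helper, not part of ZipStreams: read whatever is left in `io`, discard it,
# and throw if the CRC-32 over all of the entry's bytes does not equal `expected`.
# `crc_so_far` must cover any bytes the caller already read.
function drain_and_check!(io::IO, expected::UInt32; crc_so_far::UInt32=UInt32(0), chunk::Int=4096)
    crc = crc_so_far
    buf = Vector{UInt8}(undef, chunk)
    while !eof(io)
        n = readbytes!(io, buf, chunk)
        crc = crc32_update(crc, view(buf, 1:n))
    end
    crc == expected || error("CRC-32 mismatch: expected $(repr(expected)), got $(repr(crc))")
    return nothing
end

io = IOBuffer("123456789")
head = read(io, 4)                                 # the caller has already read "1234"
crc = crc32_update(UInt32(0), head)                # so that part is checksummed as it is read
drain_and_check!(io, 0xcbf43926; crc_so_far=crc)   # discards "56789"; 0xCBF43926 is the CRC-32 of "123456789"
```

Threading the running checksum through every read is what lets the final check succeed even though part of the data was consumed before the check was requested.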

!!! warning "Reading from two places in an archive at once"

    Do not attempt to read from two places in an open archive at once, or jump between one
    open file and another, as this will result in undefined behavior!
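The warning follows from the streaming design. Assuming, as the warning implies, that every file in an open archive is read through the same forward-only stream, there is only one read position to share. In the sketch below a plain `IOBuffer` stands in for that stream (these are not ZipStreams' types), with two back-to-back entries named "A" and "B" for illustration:

```julia
# Stand-in for an archive stream: entry "A" is bytes 1-4 and entry "B" is bytes 5-8,
# and both entries are read through the same `io`.
io = IOBuffer("AAAABBBB")

a_part = String(read(io, 2))   # start reading A: "AA"
skip(io, 2)                    # to reach B, a forward-only reader must pass A's remaining bytes
b_all  = String(read(io, 4))   # read all of B: "BBBB"
a_rest = String(read(io, 2))   # resume A: "" because the shared position is already past A
```

Here resuming A merely yields nothing; in a real archive the bytes at the shared position would typically belong to another header or another entry's compressed data, which is why the behavior is undefined rather than simply empty.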

```julia
using ZipStreams

zs = zipsource("archive.zip")
f1 = next_file(zs)
data1 = validate(f1) # contains all the file data as raw bytes
@assert typeof(data1) == Vector{UInt8}
validate(f1) # reads all data in f1 and discards it
@assert eof(f1) == true
@assert isempty(read(f1)) == true
close(zs)

zs = zipsource("archive.zip")
f2 = next_file(zs)
println(readline(f2)) # read a line off the file first
data2 = validate(f2) # contains the remaining file data excluding the first line!
@assert typeof(data2) == Vector{UInt8}
@assert sizeof(data2) < sizeof(data1)
close(zs)
readline(f2) # reads a line off the file first
validate(zs) # reads everything, including the remainder of the first file

# DO NOT READ FROM f2 HERE AFTER CALLING validate ON THE ARCHIVE!
# THIS RESULTS IN UNDEFINED BEHAVIOR!
# @assert eof(f2) == false # THIS IS A LIE!
# read(f2) # THIS MAY CRASH YOUR COMPUTER!

zs = zipsource("archive.zip")
all_data = validate(zs) # returns a Vector{Vector{UInt8}} of all remaining files
@assert all_data[1] == data1
close(zs)
```

52 changes: 38 additions & 14 deletions src/convenience.jl
@@ -4,25 +4,49 @@
Create an archive from files on disk.
The archive `out_filename` will be created using the `zipsink` method with the given keyword
arguments. `in_filename` can be a single path or a vector of multiple paths on disk. The
files will be written in the archive with paths matching the closest common relative path
between the current directory (`"."`) and the full path of the file, so if `archive_filename`
is "/a/b/archive.zip" and one of `files` is "/a/c/file", then the file will be written
with the path "c/file".
The archive `out_filename` will be created using the `zipsink` method with the keyword
arguments split as listed below. `in_filename` can be a single path or a vector of multiple
paths on disk. The files will be written in the archive with paths matching the closest
common relative path between the current directory (`"."`) and the full path of the file, so
if `archive_filename` is "/a/b/archive.zip" and one of `files` is "/a/c/file", then the file
will be written with the path "c/file".
If `dir` is a directory and `recurse_directories` is `true`, then all files and directories found when
traversing the directory will be added to the archive. If `recurse_directories` is `false` (the
default), then subdirectories of `dir` will not be traversed.
If `dir` is a directory and `recurse_directories` is `true`, then all files and directories
found when traversing the directory will be added to the archive. If `recurse_directories`
is `false` (the default), then subdirectories of `dir` will not be traversed.
All files are written to the archive using the default arguments specified by
`open(zipsink, fn)`. See [`open(::ZipArchiveSink, ::AbstractString)`](@ref) for more information.
`open(zipsink, fn; keyword_args...)`, with special keyword arguments split as described
below.
See [`zipsink`](@ref) for more information about the optional keyword arguments.
# Arguments
- `out_filename::AbstractString`: the output archive filename to create.
- `files::AbstractVector{<:AbstractString}`: a list of file paths to add to the newly
created archive.
- `dir::AbstractString`: a path to a directory to add to the newly created archive.
# Keyword arguments
- `utf8::Bool = true`: use UTF-8 encoding for file names (if `false`, use IBM437).
- `archive_comment::AbstractString = ""`: archive comment string to add to the central
directory, equivalent to passing the `comment` keyword to `zipsink`.
- `file_options::Dict{String, Any} = nothing`: if a file name added to the archive _exactly_
matches (`==`) a key in `file_options`, then the value corresponding to that key will be
splatted as keyword arguments for that file only, overriding keyword arguments passed as
described below.
- All other keyword arguments: passed unmodified to the `open(sink, filename)` method.
See [`open(::ZipArchiveSink, ::AbstractString)`](@ref) and [`zipsink`](@ref) for more
information about the optional keyword arguments available for each method.
"""
function zip_files(archive_filename::AbstractString, input_filenames::AbstractVector{<:AbstractString}; kwargs...)
zipsink(archive_filename; kwargs...) do sink
function zip_files(archive_filename::AbstractString, input_filenames::AbstractVector{<:AbstractString}; utf8::Bool=true, archive_comment::AbstractString="", kwargs...)
file_options, global_kwargs = TranscodingStreams.splitkwargs(kwargs, (:file_options,))
zipsink(archive_filename; utf8=utf8, comment=archive_comment) do sink
for filename in input_filenames
# pull out file options and override global_kwargs, if possible
file_kwargs = Dict{Symbol, Any}(pairs(global_kwargs))
if !isempty(file_options) && filename in keys(file_options)
push!(file_kwargs, pairs(file_options[filename])...)
end
# note: relpath treats path elements with different casing as different, even on case-insensitive filesystems
# this can be a problem if, e.g., tempdir() and pwd() return path elements with different cases
# so we have to make sure to normalize the paths
@@ -32,7 +56,7 @@ function zip_files(archive_filename::AbstractString, input_filenames::AbstractVe
mkpath(sink, clean_path)
else
open(filename, "r") do io
open(sink, clean_path; make_path=true) do fsink
open(sink, clean_path; make_path=true, file_kwargs...) do fsink
write(fsink, io)
end
end
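The comments in the diff indicate that entry names are derived with `relpath` after normalizing the paths' case. A minimal sketch of the path part only, assuming names are computed relative to a base directory and omitting the case normalization; `entry_name` is a hypothetical helper, not ZipStreams' code. The ZIP format (PKWARE APPNOTE.TXT) requires `/` as the separator in entry names, so the sketch rejoins the components with `/` on every platform:

```julia
# Hypothetical helper: archive entry name for `path` relative to `base`, '/'-separated.
function entry_name(path::AbstractString, base::AbstractString=pwd())
    rel = relpath(abspath(path), abspath(base))   # purely lexical; the paths need not exist
    return join(splitpath(rel), '/')              # normalize separators for the ZIP entry name
end

entry_name("/a/c/file", "/a")     # "c/file"
entry_name("/a/c/file", "/a/b")   # "../c/file"
```

The second call shows why the choice of base matters: a name with a leading `..` would point outside the extraction directory, so a real implementation has to choose a common ancestor as the base or reject such names.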
25 changes: 25 additions & 0 deletions test/runtests.jl
@@ -329,6 +329,31 @@ end
end
end

@testitem "zip_files with file options" tags = [:utils] begin
    include("common.jl")

    multi_file = test_file_name(true, true, false, false, false, false, "multi")

    mktempdir(pwd()) do tdir
        unzip_files(multi_file; output_path=tdir, make_path=true)
        mktemp(pwd()) do path, io
            filenames = ["hello.txt", "subdir/hello.txt"]
            test_filename = joinpath(tdir, "hello.txt")
            file_kwargs = Dict(test_filename => (compression=:deflate, level=1))
            zip_files(path, joinpath.(Ref(tdir), filenames); compression=:store)
            zipsource(path) do source
                for (f, filename) in zip(source, filenames)
                    if filename == test_filename
                        @test info(f).compression_method == 0x0008 # deflate
                    else
                        @test info(f).compression_method == 0x0000 # store
                    end
                end
            end
        end
    end
end
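The `file_options` behavior described in the docstring (options looked up by exact file name, overriding the keyword arguments shared by every file) can be sketched with plain `NamedTuple`s. This is a hypothetical illustration, not ZipStreams' code: `kwargs_for` and `fake_open` are invented names, and `fake_open` only stands in for `open(sink, name; kwargs...)`, with made-up defaults. The `compression` and `level` keywords are the ones the test above uses.

```julia
# Hypothetical helper: per-file options, looked up by exact file name, override the
# keyword arguments shared by every file.
function kwargs_for(filename::AbstractString, global_kwargs::NamedTuple,
                    file_options::AbstractDict)
    overrides = get(file_options, filename, NamedTuple())
    return merge(global_kwargs, overrides)  # for duplicate keys, `merge` keeps the right-hand value
end

# Stand-in for `open(sink, name; kwargs...)` that just reports the options it received.
fake_open(name; compression=:deflate, level=-1) = (name, compression, level)

global_kwargs = (compression=:store,)
file_options = Dict("hello.txt" => (compression=:deflate, level=1))

for name in ("hello.txt", "subdir/hello.txt")
    println(fake_open(name; kwargs_for(name, global_kwargs, file_options)...))
end
# ("hello.txt", :deflate, 1)
# ("subdir/hello.txt", :store, -1)
```

Because the lookup uses exact string equality, a key such as `"hello.txt"` does not match a path passed as `"/tmp/x/hello.txt"`; keys have to be spelled exactly as the file names are given.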

@testitem "zip_files entire directory, no recurse (default)" tags = [:utils] begin
include("common.jl")
