Make choosing filename the job of the fetch_method #54

Merged
merged 3 commits into from
Jul 16, 2018
14 changes: 8 additions & 6 deletions README.md
@@ -159,7 +159,7 @@ register(DataDep(
remote_path::Union{String,Vector{String}...},
[checksum::Union{String,Vector{String}...},]; # Optional, if not provided will generate
# keyword args (Optional):
fetch_method=download # (remote_filepath, local_filepath)->Any
fetch_method=fetch_http # (remote_filepath, local_directory_path)->local_filepath
post_fetch_method=identity # (local_filepath)->Any
))
```
@@ -187,12 +187,14 @@ register(DataDep(
- Can take a vector of checksums, being one for each file, or a single checksum in which case the per file hashes are `xor`ed to get the target hash. (See [Recursive Structure](Recursive Structure) below)


- `fetch_method=download` a function to run to download the files.
- Function should take 2 parameters (remote_filepath, local_filepath), and can return anything
- Defaults to `Base.download` which invokes commandline download tools.
- `fetch_method=fetch_http` a function to run to download the files.
- Function should take 2 parameters `(remote_filepath, local_directorypath)`, and must return the local filepath of the downloaded file
- Can take a vector of methods, being one for each file, or a single method, in which case that method is used to download all of them. (See [Recursive Structure](Recursive Structure) below)
- Very few people will need to override this, but potentially it can be used to deal with things like authorisation (let me know if you try)

- Overriding this lets you change how the download is done, i.e. the transport protocol.
- The default is suitable for HTTP[/S] without auth. Replacing it can add authentication or an entirely different protocol (e.g. git, Google Drive, etc.)
- This function is also responsible for working out what the local file should be called (as this is protocol dependent)


- `post_fetch_method` a function to run after the files have been downloaded
- Should take the local filepath as its first and only argument. Can return anything.
- Default is to do nothing.
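
As a rough illustration of the new contract, a custom `fetch_method` receives the remote path and a local directory, chooses the filename itself, and returns the full local path. The sketch below assumes the "remote" is just another path on a mounted filesystem and targets Julia 1.x; `fetch_localcopy` is a hypothetical name, not something DataDeps provides.

```julia
# Hypothetical fetch method for sources that already live on a local or
# mounted filesystem. Not part of DataDeps; shown only to illustrate the
# (remotepath, localdir) -> localpath contract.
function fetch_localcopy(remotepath, localdir)
    isdir(localdir) || error("not a directory: $localdir")
    # The fetch method, not the caller, decides the local filename.
    localpath = joinpath(localdir, basename(remotepath))
    cp(remotepath, localpath; force=true)
    return localpath
end

# Usage: copy a temporary file into a temporary directory.
src = tempname()
write(src, "example data")
dest = fetch_localcopy(src, mktempdir())
```

A function like this could then be passed as `fetch_method=fetch_localcopy` when constructing the `DataDep`.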
5 changes: 3 additions & 2 deletions src/DataDeps.jl
@@ -15,15 +15,16 @@ include("types.jl")
include("util.jl")
include("registration.jl")

include("filename_solving.jl")

include("locations.jl")
include("verification.jl")

include("resolution.jl")
include("resolution_automatic.jl")
include("resolution_manual.jl")

include("helpers.jl")
include("fetch_helpers.jl")
include("post_fetch_helpers.jl")
include("deprecations.jl")

end # module
22 changes: 0 additions & 22 deletions src/deprecations.jl
@@ -1,23 +1 @@
# This file is a part of DataDeps.jl. License is MIT.

@deprecate(
RegisterDataDep(name::String,
message::String,
remotepath,
hash=nothing;
fetch_method=download,
post_fetch_method=identity,
),
register(DataDep(name::String,
message::String,
remotepath,
hash;
fetch_method=fetch_method,
post_fetch_method=post_fetch_method)
)
)

@deprecate(
RegisterDataDep(name::String, message::String),
register(ManualDataDep(name, message))
)
43 changes: 38 additions & 5 deletions src/filename_solving.jl → src/fetch_helpers.jl
@@ -1,5 +1,40 @@
# This file is a part of DataDeps.jl. License is MIT.

# TODO Remove this whole thing once https://github.com/JuliaWeb/HTTP.jl/pull/273

"""
fetch_http(remotepath, localdir)

Pass in an HTTP[/S] URL and a directory to save it to,
and it downloads that file, returning the local path.
This uses the HTTP protocol's method of defining filenames in headers,
if that information is present.
"""
function fetch_http(remotepath, localdir)
@assert(localdir |> isdir)
filename = get_filename(remotepath)
localpath = safer_joinpath(localdir, filename)
Base.download(remotepath, localpath)
end


"""
safer_joinpath(basepart, parts...)

A variation on `joinpath` that is more resistant to directory traversal attacks.
The parts to be joined (excluding the `basepart`)
are not allowed to contain `..` or begin with a `/`.
If they do, this throws a `DomainError`.
"""
function safer_joinpath(basepart, parts...)
explain = "Possible Directory Traversal Attack detected."
for part in parts
contains(part, "..") && throw(DomainError(part, "contains illegal string \"..\". $explain"))
startswith(part, '/') && throw(DomainError(part, "begins with \"/\". $explain"))
end
joinpath(basepart, parts...)
end


"""
get_filename(remotepath)
@@ -17,13 +52,11 @@ function get_filename(remotepath)
filename = nothing
end

ret = if filename == nothing
if filename == nothing
# couldn't get it from the headers
basename(remotepath)
else
filename
filename = basename(remotepath)
end
ret
filename
end
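
The `safer_joinpath` check in this file can be exercised with a small standalone sketch. The version below is an assumption-laden port to Julia 1.x (where `occursin` replaces `contains`), under the hypothetical name `safer_joinpath_sketch`; it is not the code from this PR.

```julia
# Sketch of the same check for Julia 1.x. The name is illustrative only.
function safer_joinpath_sketch(basepart, parts...)
    explain = "Possible Directory Traversal Attack detected."
    for part in parts
        occursin("..", part) && throw(DomainError(part, "contains illegal string \"..\". $explain"))
        startswith(part, "/") && throw(DomainError(part, "begins with \"/\". $explain"))
    end
    joinpath(basepart, parts...)
end

# An ordinary filename is joined as usual.
ok = safer_joinpath_sketch("data", "mnist.csv")

# A filename that tries to climb out of the directory is rejected.
rejected_dotdot = try
    safer_joinpath_sketch("data", "../secrets.txt")
    false
catch err
    err isa DomainError
end

# So is one that starts with a slash.
rejected_absolute = try
    safer_joinpath_sketch("data", "/etc/passwd")
    false
catch err
    err isa DomainError
end
```

Note that a substring test on `..` is deliberately coarse: a harmless name such as `a..b` would also be rejected.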


File renamed without changes.
13 changes: 5 additions & 8 deletions src/resolution_automatic.jl
@@ -65,18 +65,15 @@ Performs in (async) parallel if multiple paths are given
"""
function run_fetch(fetch_method, remotepath, localdir)
mkpath(localdir)
filename = get_filename(remotepath)
localpath = joinpath(localdir, filename)
#use the local folder and the remote filename
fetch_method(remotepath, localpath)
localpath = fetch_method(remotepath, localdir)
localpath
end

function run_fetch(fetch_method, remotepaths::Vector, localdir)
function run_fetch(fetch_method, remotepaths::AbstractVector, localdir)
asyncmap(rp->run_fetch(fetch_method, rp, localdir), remotepaths)
end

function run_fetch(fetch_methods::Vector, remotepaths::Vector, localdir)
function run_fetch(fetch_methods::AbstractVector, remotepaths::AbstractVector, localdir)
asyncmap((meth, rp)->run_fetch(meth, rp, localdir), fetch_methods, remotepaths)
end

@@ -95,11 +92,11 @@ function run_post_fetch(post_fetch_method, fetched_path)
end
end

function run_post_fetch(post_fetch_method, fetched_paths::Vector)
function run_post_fetch(post_fetch_method, fetched_paths::AbstractVector)
asyncmap(fp->run_post_fetch(post_fetch_method, fp), fetched_paths)
end

function run_post_fetch(post_fetch_methods::Vector, fetched_paths::Vector)
function run_post_fetch(post_fetch_methods::AbstractVector, fetched_paths::AbstractVector)
asyncmap((meth, fp)->run_post_fetch(meth, fp), post_fetch_methods, fetched_paths)
end
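
The vector forms above pair each method with its own path. A minimal sketch of that pairing, using two made-up fetch methods that only compute a local path (nothing is downloaded; the names and URLs are placeholders, and Julia 1.x is assumed):

```julia
# Hypothetical fetch methods: they only build a local path.
fetch_a(remotepath, localdir) = joinpath(localdir, "a_" * basename(remotepath))
fetch_b(remotepath, localdir) = joinpath(localdir, "b_" * basename(remotepath))

methods_per_file = [fetch_a, fetch_b]
remotepaths = ["http://example.com/x/one.csv", "http://example.com/y/two.csv"]

# asyncmap over two equal-length collections calls the function with one
# element from each, and returns results in input order.
localpaths = asyncmap((meth, rp) -> meth(rp, "data"), methods_per_file, remotepaths)
```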

17 changes: 12 additions & 5 deletions src/types.jl
@@ -62,11 +62,13 @@ DataDep(
- Can take a vector of checksums, being one for each file, or a single checksum in which case the per file hashes are `xor`ed to get the target hash. (See [Recursive Structure](Recursive Structure) below)


- `fetch_method=download` a function to run to download the files.
- Function should take 2 parameters (remote_filepath, local_filepath), and can return anything
- Defaults to `Base.download` which invokes commandline download tools.
- `fetch_method=fetch_http` a function to run to download the files.
- Function should take 2 parameters (remotepath, local_directory), and must return a local filepath
- It is responsible for determining what the local filename should be
- Change this to change the transfer protocol, for example to use an auth'ed connection.
- Default `fetch_http` is a wrapper around `Base.download` which invokes commandline download tools.
- Can take a vector of methods, being one for each file, or a single method, in which case that method is used to download all of them. (See [Recursive Structure](Recursive Structure) below)
- Very few people will need to override this, but potentially it can be used to deal with things like authorisation (let me know if you try)
- Very few people will need to override this if they are just downloading public HTTP files.

- `post_fetch_method` a function to run after the files have been downloaded
- Should take the local filepath as its first and only argument. Can return anything.
@@ -78,6 +80,11 @@ DataDep(
- You can call `pwd()` to get the data directory for your own functions. (Or `dirname(local_filepath)`)
- Can take a vector of methods, being one for each file, or a single method, in which case that same method is applied to all of the files. (See **Recursive Structure** in the README.md)
"""
function DataDep(name::String, message::String, remotepath, hash=nothing; fetch_method=download, post_fetch_method=identity)
function DataDep(name::String,
message::String,
remotepath, hash=nothing;
fetch_method=fetch_http,
post_fetch_method=identity)

DataDep(name, remotepath, hash, fetch_method, post_fetch_method, message)
end
4 changes: 2 additions & 2 deletions src/verification.jl
@@ -67,7 +67,7 @@ If a vector of paths is provided
and a vector of hashing methods (of any form)
then they are all required to match.
"""
function run_checksum(hash::Vector, path::Vector)
function run_checksum(hash::AbstractVector, path::AbstractVector)
all(run_checksum.(hash, path))
end

@@ -84,6 +84,6 @@ and returns a UInt8 array of the hash.
xored if there are multiple files
"""
checksum(hasher, filename) = open(hasher, filename, "r")
checksum(hasher, filenames::Vector) = xor.(checksum.(hasher, filenames)...)
checksum(hasher, filenames::AbstractVector) = xor.(checksum.(hasher, filenames)...)

hexchecksum(hasher, filename) = bytes2hex(checksum(hasher, filename))
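
To make the multi-file case concrete, here is a small sketch of how per-file hashes combine under element-wise `xor`. The byte vectors are made-up stand-ins for real hash output, and the snippet assumes Julia 1.x.

```julia
# Two made-up 3-byte "hashes"; real ones would come from a hasher such as SHA-256.
h1 = UInt8[0x0f, 0xf0, 0xaa]
h2 = UInt8[0xff, 0x0f, 0xaa]

# Element-wise xor over all per-file hashes, mirroring `xor.(hashes...)`.
combined = xor.([h1, h2]...)

# xor is commutative, so the order of the files does not change the result.
swapped = xor.([h2, h1]...)

println(bytes2hex(combined))  # prints f0ff00
```

One consequence of this design is that equal bytes cancel to zero (the last byte above), so two identical files would contribute nothing to the combined hash.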