Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace HTTP.download #273

Merged
merged 12 commits into from
Sep 17, 2018
Merged

Replace HTTP.download #273

merged 12 commits into from
Sep 17, 2018

Conversation

oxinabox
Copy link
Member

@oxinabox oxinabox commented Jul 6, 2018

rather than deleting it as in #271
this replace it with a new and better version.

It is very verbose, but I think the 0.7 way of handling that is to tell your logger to be quieter.
Also to tell your logger to display a progress bar instead, that should be possible with the new system.

Locally this was working when I had it as a separate package but it is screwing up now that I've merge it in and I'm at a lose as to why.

ERROR: DNSError: , unknown node or service (EAI_NONAME)

@oxinabox
Copy link
Member Author

oxinabox commented Jul 8, 2018

Fixed it, I think
Needed to be using Base.open

@oxinabox
Copy link
Member Author

oxinabox commented Jul 8, 2018

So this is failing on 0.6, because of usign the new @info.
with lots of kwargs.
I think using @info that way is good, because it will play nice with a Logger that implements progress bars.

@codecov-io
Copy link

codecov-io commented Jul 8, 2018

Codecov Report

Merging #273 into master will increase coverage by 19.17%.
The diff coverage is 95.91%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #273       +/-   ##
===========================================
+ Coverage   71.84%   91.01%   +19.17%     
===========================================
  Files          35       30        -5     
  Lines        1964     1258      -706     
===========================================
- Hits         1411     1145      -266     
+ Misses        553      113      -440
Impacted Files Coverage Δ
src/compat.jl 80% <ø> (+1.05%) ⬆️
src/HTTP.jl 100% <ø> (+22.72%) ⬆️
src/client.jl 61.36% <ø> (+5.97%) ⬆️
src/download.jl 95.91% <95.91%> (ø)
src/TimeoutRequest.jl 50% <0%> (-20%) ⬇️
src/StreamRequest.jl 91.17% <0%> (-2.95%) ⬇️
src/debug.jl 20% <0%> (-1.43%) ⬇️
src/Parsers.jl 96.15% <0%> (-0.57%) ⬇️
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5630bf4...a6b4dba. Read the comment docs.

@oxinabox
Copy link
Member Author

oxinabox commented Jul 8, 2018

Demo of use (with OhMyLog.jl)
HTTP Demo

@samoconnor
Copy link
Contributor

Regarding 0.6 compatibility. I would just hold of merging this until we update REQUIRE to say julia 0..7.

See comment here: #264 (comment)

"I would say that at this point, with 0.7 beta out, feature addition PRs for HTTP.jl could(should?) be 0.7-only. i.e. we should put julia 0.7 in REQUIRE soon anyway."

@oxinabox
Copy link
Member Author

oxinabox commented Jul 10, 2018

Fair, I am curious what it would take for 0.6 compat so I just made a commit (also I want to retrigger 0.7 CI which is being flaky).

@oxinabox
Copy link
Member Author

While I remember: this should take a headers argument.
because that is required to work with OATH.

@quinnj
Copy link
Member

quinnj commented Jul 10, 2018

Very nice demo; It'd be nice if the numbers could be pretty-printed so they were easier to glance at and grok. W/ regards to headers, it'd be nice if this basically matched some of the convenience methods like HTTP.get/HTTP.post in terms of what you can pass it (headers, body, etc).

Otherwise, I don't have a super strong preference for 0.6 vs. 0.7; if it works on 0.6, I'm happy to merge. I think we'll do one more release supporting both and then will go 0.7 only.

@oxinabox oxinabox closed this Jul 11, 2018
@oxinabox oxinabox reopened this Jul 11, 2018
@oxinabox
Copy link
Member Author

Ahhhh,
this has a Directory Traversal vulnerability I think.
Or at least needs to be checked carefully for one.

Consider that it lets the file set its own filepath.
And that

julia> joinpath("alocaldir", "/etc/passwd")
"/etc/passwd"

So I think we need to blacklist any filepath that starts with a /, or contains any ..
and probably some other things.
Does anyone know of a set of blacklist rules for this?

@samoconnor
Copy link
Contributor

As a double-check, how about adding an assertion that the resulting path is actually under the download directory.

path = joinpath(download_dir, file_path)
@assert startswith(normpath(path), normpath(download_dir))

@oxinabox
Copy link
Member Author

Ah, I didn't realize that normpath existed.
I was thinking about doing this using realpath,
but that has a variety of problems since symlinks make it not nesc true.

On the other hand, normpath literally just looked for .. and removes them.
Given we have banned .. outright (because you never want this in a downloaded file)
idk, it would be an assertion, and it isn't like it would take long per check.

@EricForgy
Copy link
Contributor

I'm a little confused. I would expect something like HTTP.download to just download a specified file to some specified location.

@oxinabox
Copy link
Member Author

oxinabox commented Jul 11, 2018

I would expect something like HTTP.download to just download a specified file to some specified location.

It does. But the question is what does a specific location mean?
HTTP.download should work a lot like clicking download in a browser.
Only instead of a dialog appearing, it uses the second argument to give the specific location

  • if a specific location is path to a file* then yes, send it there. Easy Done.
    - (* or technicall to a nonexistant location, which this PR currently assumes to be a file)
  • if a specific location is a directory, then we need to workout the filename to put in that directory.
    - We need to get the filename right, because
    - it is part of the file being downloaded -- intent of the file creator
    - FileIO.jl (etc) need the file extension to know how to open it.
    - If it is coming from a very simple HTTP server, then the file name is generally just the last part of the URL
    - But HTTP lets it be specified in the header.
    - A surfiently evil header on a compromised server might say that the filename of this is say ../../.shh/accept/evilkey.pub, and then break into your system

@EricForgy
Copy link
Contributor

Why allow the case where the filename is located in the header? It would be reasonable, I think, to reject the download if the filename is not in the url :thinking_face:

@oxinabox
Copy link
Member Author

oxinabox commented Jul 11, 2018

There are a lot of files I want to download (millions)
That do not have it in the URL.
Anything on any modern filehosting service like Dropbox or google drive, anything on data.gov etc etc
I want to download them and I want their original filenames

@KristofferC
Copy link
Contributor

Should this also accept headers?

@oxinabox
Copy link
Member Author

oxinabox commented Aug 1, 2018

Yes, it should.
And it will before being merged.

See also #273 (comment)

@quinnj
Copy link
Member

quinnj commented Aug 4, 2018

@oxinabox, what's the status here? ready to review? still things you want to cleanup? just let me know. I'm going to target one final 0.6 release in the next few days and then we can rip out all the 0.6 compat stuff.

@oxinabox
Copy link
Member Author

oxinabox commented Aug 4, 2018

I want to add everything you suggested in
#273 (comment)
And I was kinda not rushing on it since can wait for 0.7

It won't take much to finish that off

@oxinabox
Copy link
Member Author

oxinabox commented Aug 7, 2018

Right, added headers and kwargs.
I didn't add body, because that doesn't seem reasonable.
I could be wrong, maybe someone uses it for auth.
But a GET request with a body; particularly for a download doesn't make sense to me.

I've not added human units, as I am of many minds and bikesheds over that.
So it seems better to add that later in another PR.

So this is ready for review, I hope.

The compat stuff can be stripped out, though I am not sure the best way to do that right now.
Can I just delete the compat.jl file?

@quinnj quinnj mentioned this pull request Sep 7, 2018
@oxinabox
Copy link
Member Author

oxinabox commented Sep 13, 2018

Ok, this is all rebased for 0.7/1.0
and working
and working in 0.7

@oxinabox
Copy link
Member Author

Assuming windows passes, I am done with this now.

src/download.jl Outdated Show resolved Hide resolved
@oxinabox
Copy link
Member Author

Bump

@oxinabox
Copy link
Member Author

oxinabox commented Sep 17, 2018

New video, with OhMyLog.

asciicast

- any additional keyword args (`kw...`) are passed on to the HTTP request.
"""
function download(url::AbstractString, local_path=nothing, headers=Header[]; update_period=1, kw...)
format_progress(x) = "$(round(100x, digits=2))%"
Copy link
Member Author

@oxinabox oxinabox Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly this should not be formatted.
idk what Juno is doing with progress.
OhMyLog doesn't mind if it is given a string or a float.
But something else doing progress bars might like it to be a float.

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @oxinabox ! Thanks for doing this.

@quinnj quinnj merged commit 51fe562 into JuliaWeb:master Sep 17, 2018
@oxinabox
Copy link
Member Author

Can we have a release tagged?
I want to Switch over to using this in my packages

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants