Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternate Downloads.jl-based backend (alternative to HTTP.jl) #396

Merged
merged 24 commits into from
Aug 13, 2021
Merged

Alternate Downloads.jl-based backend (alternative to HTTP.jl) #396

merged 24 commits into from
Aug 13, 2021

Conversation

ericphanson
Copy link
Member

@ericphanson ericphanson commented Jul 10, 2021

I was running into issues with HTTP when trying to make many concurrent requests to s3, so I thought I'd try libcurl through the Downloads.jl stdlib. Turned out to be pretty easy to at least get something basic working, e.g. I can pull down a test file from s3 with no problems with this little bit of code-- and you can even pass a Downloads.Downloader along to have more control over the client. I haven't had a chance to do more thorough testing, but I thought I'd put the PR up so maybe we can see what CI says to test out the Downloads-based backend. (Right now Downloads.jl is set to the default in this PR for testing purposes, but I think we should keep HTTP.jl as the default unless it really is a big upgrade).

@ericphanson
Copy link
Member Author

I tried a simple asyncmap loop to pull down many small objects (380k evenly sized objects totalling ~7GB), just

using Serialization, AWS, AWSS3
@time results = @showprogress asyncmap(keys) do key
    value = deserialize(IOBuffer(s3_get(bucket, key; raw=true))) # these were serialized Julia objects
    return key => value
end

With the HTTP.jl backend I left it overnight and it was stuck at 0% in the morning-- I've seen this in other experiments too, and calling

empty!(HTTP.ConnectionPool.POOL.conns)

tends to un-stick it for a little bit, but it gets stuck again when using many concurrent connections. @ararslan has said he had similar issues awhile ago and isolated it to MbedTLS issues.

But with the Downloads.jl backend it completed in 12 minutes with no issues-- and probably can be optimized further, this was just the simplest approach to start with (using the global Downloader with 100 concurrent tasks in an asyncmap loop).

src/utilities/request.jl Outdated Show resolved Hide resolved
src/utilities/request.jl Outdated Show resolved Hide resolved
src/utilities/request.jl Outdated Show resolved Hide resolved
src/utilities/request.jl Outdated Show resolved Hide resolved
src/utilities/request.jl Outdated Show resolved Hide resolved
src/utilities/request.jl Outdated Show resolved Hide resolved
Copy link
Member

@mattBrzezinski mattBrzezinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I'll kick bors to give it a shot!

src/utilities/request.jl Outdated Show resolved Hide resolved
src/utilities/request.jl Outdated Show resolved Hide resolved
@mattBrzezinski
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Jul 12, 2021
@bors
Copy link
Contributor

bors bot commented Jul 12, 2021

try

Build failed:

@ericphanson
Copy link
Member Author

Oh no: https://github.com/JuliaLang/Downloads.jl/blob/cd002c3c6936d144ae668d70e18337931706c63a/Project.toml#L15. That might be a big problem since we support 1.0 here. Would AWS.jl be up for going to 1.3+? Maybe at least once 1.6 is the LTS?

@mattBrzezinski
Copy link
Member

Oh no: https://github.com/JuliaLang/Downloads.jl/blob/cd002c3c6936d144ae668d70e18337931706c63a/Project.toml#L15. That might be a big problem since we support 1.0 here. Would AWS.jl be up for going to 1.3+? Maybe at least once 1.6 is the LTS?

I'd like to keep support for all LTS builds. It seems that with 1.6 and your Preferences.jl comment that this would be the best version going forward.

@ericphanson
Copy link
Member Author

Just as an update on this-- I'm finding various edge cases when trying to use it internally, leading to fixes like a481a46 and 16419cd. I am still having issues using it with credentials_from_webtoken(); for some reason I get a redirect (which led to #397) instead of the credentials.

I'm now using 57141e4 on my internal project in order to use the HTTPBackend for configuration and the Downloads backend for s3_get, to at least make progress there.

I'm thinking this might be a bigger job than I anticipated in order to sort out the edge cases and match the current implementation's handling of various things, so with that and the lack of 1.0 support in Downloads.jl, I'm thinking this might be a slow work-in-progress for awhile. But I plan to actively use this branch at least for s3_get because in my use it takes 12 minutes with asyncmap with Downloads.jl instead of ~ 4hr for serial access with the HTTP.jl backend, and hopefully I'll be able to find and fix edge cases over time so that when 1.6 LTS lands this might be reliable enough.

bors bot added a commit that referenced this pull request Jul 19, 2021
401: provide hooks for custom backends r=omus a=ericphanson

Instead of adding a whole new backend as in #396, I thought it would be better to just start with an `AbstractBackend` to provide hooks to use alternate backends. This way we don't need to bump the Julia requirements to start trying out alternate backends downstream. It also provides a smaller incremental step than having AbstractBackends + a specific alternate backend in the same PR.

This also provides what is hopefully a useful feature: you could do
```julia
using AWS
AWS.DEFAULT_BACKEND[] = HTTPBackend(custom_http_options)
```
to specify custom HTTP options globally in your application (which can still be overwritten per-request); I would have found this useful when trying experiments like tweaking the pipelining and connection pooling settings.

In the same way, a Downloads.jl-based backend could pass a `Downloads.Downloader` along inside the backend object instead of requiring adding more fields to Request for the `downloader` like in #396.

Co-authored-by: Eric Hanson <[email protected]>
@ericphanson
Copy link
Member Author

I've redone this on top of #401. It requires Julia 1.3+ so it'll need to wait until the new LTS, but we can at least start to get it ready for then.

bors try

bors bot added a commit that referenced this pull request Jul 19, 2021
bors bot added a commit that referenced this pull request Jul 19, 2021
@ericphanson
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 19, 2021
@bors
Copy link
Contributor

bors bot commented Jul 19, 2021

try

Build failed:

@c42f
Copy link
Contributor

c42f commented Jul 20, 2021

Thanks for this @ericphanson, we're in a similar position of having trouble with HTTP.jl and S3 in production. It will be great to have the option to use Downloads.jl. I've been making some progress on the HTTP.jl side over at JuliaWeb/HTTP.jl#517 and I think I may have identified the problem and a potential solution.

Did you consider whether it's possible to add an alternative Downloads-based backend to HTTP.jl instead of here in AWS.jl? That would fix the problem for all libraries using HTTP rather than AWS.jl only. One idea would be to allow an alternative backend to replace the connection and stream layers (maybe also other layers) in the HTTP default stack (as returned by HTTP.stack()).

@JuliaCloud JuliaCloud deleted a comment from bors bot Jul 20, 2021
@JuliaCloud JuliaCloud deleted a comment from bors bot Jul 20, 2021
@ericphanson
Copy link
Member Author

Thanks for this @ericphanson, we're in a similar position of having trouble with HTTP.jl and S3 in production. It will be great to have the option to use Downloads.jl. I've been making some progress on the HTTP.jl side over at JuliaWeb/HTTP.jl#517 and I think I may have identified the problem and a potential solution.

Woah! That's great. I think I saw that issue awhile ago but somehow forgot about it. That looks like great progress. Perhaps an alternate backend isn't really needed then, if HTTP's client issues instead can be fixed. I found it a bit too intimidating to try to fix myself which is why I switched to just swapping over to Downloads.jl.

Did you consider whether it's possible to add an alternative Downloads-based backend to HTTP.jl instead of here in AWS.jl? That would fix the problem for all libraries using HTTP rather than AWS.jl only. One idea would be to allow an alternative backend to replace the connection and stream layers (maybe also other layers) in the HTTP default stack (as returned by HTTP.stack()).

I did-- I actually remember that was once floated as a plan for HTTP.jl back when there was a bit of talk about making it a stdlib to replace Base.download. I believe that effort led to Downloads.jl instead. I asked a week or two ago on #web on Slack about using libcurl for the clientside functionality of HTTP.jl and Avik suggested using Downloads.jl directly instead, and that's what I've done here. But I think doing it on the HTTP.jl side could make sense too. #401 could be used here to swap between HTTP.jl-based-backends here too, instead of between direct-Downloads.jl and HTTP.jl backends.

@c42f
Copy link
Contributor

c42f commented Jul 21, 2021

I found it a bit too intimidating to try to fix myself

Agreed, the code is kind of intimidating to dive into! Especially the connection pool which is where I believe the issues are.

I believe that effort led to Downloads.jl instead. I asked a week or two ago on #web on Slack about using libcurl for the clientside functionality of HTTP.jl and Avik suggested using Downloads.jl directly instead, and that's what I've done here.

Yes, I think this is great. Even if we have a larger plan that shouldn't block making progress here in AWS.jl. What you've got here is heading in the right direction IMO and would be extremely useful for me (I just need to port all our code from AWSSDK to AWS.jl 😬 )

Jacob isn't very available right now but I managed to catch him on a DM on slack. He's of the opinion that HTTP and Downloads should come together in some way so there's definitely appetite for change if we can pull together some sensible PRs and get things moving on that front. Perhaps I should open an issue in HTTP.jl with thoughts about how to proceed.

@ericphanson
Copy link
Member Author

ericphanson commented Jul 21, 2021

Cool! I unfortunately don't have a lot of time for this (/ HTTP.jl) right now-- this draft of a PR is enough that I can performantly access cached data on s3 which unblocks me to do the actual thing I'm working on (working on an ML model which happens to train on k8s in AWS) but I hope to at least be able to push this PR through in the next few weeks.

@mattBrzezinski
Copy link
Member

Cool! I unfortunately don't have a lot of time for this (/ HTTP.jl) right now-- this draft of a PR is enough that I can performantly access cached data on s3 which unblocks me to do the actual thing I'm working on (working on an ML model which happens to train on k8s in AWS) but I hope to at least be able to push this PR through in the next few weeks.

I think we can definitely get this done and unblock everyone here! I'll be gone post-JuliaCon until August 9th (feel free to push through if I'm gone). I'm hoping we announce a new LTS at JuliaCon this year, which should make this PR go through easily enough. If not, I'm sure we can create something that works as an alternative.

.github/workflows/CI.yml Outdated Show resolved Hide resolved
@ericphanson
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Jul 27, 2021
@bors
Copy link
Contributor

bors bot commented Jul 27, 2021

try

Build failed:

@mattBrzezinski
Copy link
Member

If there's nothing left to do on this, can we merge it before the giant formatting PR goes in? I'd prefer not to deal with the merge conflicts.

Yep! Makes sense to me!

Copy link
Member Author

@ericphanson ericphanson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! (Modulo one small test fix)

test/AWS.jl Outdated Show resolved Hide resolved
@christopher-dG
Copy link
Member

christopher-dG commented Aug 13, 2021

bors r+

:)

edit: :(

@bors
Copy link
Contributor

bors bot commented Aug 13, 2021

🔒 Permission denied

Existing reviewers: click here to make christopher-dG a reviewer

@mattBrzezinski
Copy link
Member

bors r+

bors bot added a commit that referenced this pull request Aug 13, 2021
396: Alternate Downloads.jl-based backend (alternative to HTTP.jl) r=mattBrzezinski a=ericphanson

I was running into issues with HTTP when trying to make many concurrent requests to s3, so I thought I'd try libcurl through the Downloads.jl stdlib. Turned out to be pretty easy to at least get something basic working, e.g. I can pull down a test file from s3 with no problems with this little bit of code-- and you can even pass a `Downloads.Downloader` along to have more control over the client. I haven't had a chance to do more thorough testing, but I thought I'd put the PR up so maybe we can see what CI says to test out the Downloads-based backend. (Right now Downloads.jl is set to the default in this PR for testing purposes, but I think we should keep HTTP.jl as the default unless it really is a big upgrade).

Co-authored-by: Eric Hanson <[email protected]>
Co-authored-by: Chris de Graaf <[email protected]>
@christopher-dG
Copy link
Member

Isn't this supposed to be merged now?

@ericphanson
Copy link
Member Author

ericphanson commented Aug 13, 2021

Bors runs CI and then merges when CI passes

edit: oh it passed. Uhh idk

@bors
Copy link
Contributor

bors bot commented Aug 13, 2021

try

Timed out.

@ericphanson
Copy link
Member Author

Maybe it’s waiting for another check and timing out while waiting? (The format check?). Can this just be manually merged?

@bors
Copy link
Contributor

bors bot commented Aug 13, 2021

Canceled.

@christopher-dG
Copy link
Member

I think the bors config just needed updating to look for the new 1.3 status. Not sure if it'll apply to this PR though, I bet it gets its value from the main branch.

@mattBrzezinski do you have the power to manually merge?

@ericphanson
Copy link
Member Author

I think the bors config just needed updating to look for the new 1.3 status. Not sure if it'll apply to this PR though, I bet it gets its value from the main branch.

It actually sounds like the other way:

Note: that bors reads bors.toml from the pull requests it’s merging, not the one in master, so changes to the file get checked before they land. You can therefore verify whether bors is working by using bors r+ to try and land the PR that enables it!

https://bors.tech/documentation/getting-started/

bors merge

@bors
Copy link
Contributor

bors bot commented Aug 13, 2021

🔒 Permission denied

Existing reviewers: click here to make ericphanson a reviewer

@bors
Copy link
Contributor

bors bot commented Aug 13, 2021

try

Timed out.

@mattBrzezinski
Copy link
Member

You need to modify the bors.toml file to remove 1.0 from it with 1.3

@mattBrzezinski
Copy link
Member

Sorry I’m running around doing some errands currently.

Copy link
Member

@mattBrzezinski mattBrzezinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

@bors
Copy link
Contributor

bors bot commented Aug 13, 2021

Build succeeded:

@bors bors bot merged commit 380d4cf into JuliaCloud:master Aug 13, 2021
@ericphanson ericphanson deleted the eph/libcurl branch August 13, 2021 21:37
@mattBrzezinski
Copy link
Member

Just needed the magic touch! ✨

read_body(x::IOBuffer) = take!(x)
function read_body(x::IO)
close(x)
return read(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just reading through this, I don't understand close(x); read(x) — for normal files read(x) will return zero bytes after it's closed. (For IOBuffer it throws an exception).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a workaround specifically for Base.BufferStream, without closeing it beforehand, read hangs. readavailable works as well and that was my first choice, but @ericphanson pointed out this warning:

  │ Warning
  │
  │  The amount of data returned is implementation-dependent; for example it can depend on the internal choice of buffer size. Other
  │  functions such as read should generally be used instead.

I'm no expert when it comes to the various IO types and how to properly handle them all, so any feedback is good feedback!

Copy link
Contributor

@c42f c42f Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think perhaps we just need to use read() here and hope for the best? It turns out there's a lot of inconsistency in the blocking behavior of streams in Julia — see the issue here JuliaLang/julia#24526

Perhaps for this case, the newly available — but strangely named — Base.shutdown() is what you want JuliaLang/julia#40783

But even so, I'm not sure that AWS should be unilaterally closing or shutting down the write side of the the stream at all — to me that seems to be the business of the code which passes in the stream. For example, if reading from a pipe, presumably you want to keep reading until the writing side decides it's done?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, seems like whoever was filling the buffer needs to call shutdown (or close) to mark it as being done

Copy link
Member Author

@ericphanson ericphanson Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that AWS should be unilaterally closing or shutting down the write side of the the stream at all — to me that seems to be the business of the code which passes in the stream.

Well, we only do it if return_stream is false, in which case you are supposed to get back some bytes as far as I understand it, and it seems like we need to close it to get those bytes.

Agreed, seems like whoever was filling the buffer needs to call shutdown (or close) to mark it as being done

I’m not sure who that is- Downloads.jl fills it but I don’t think they should be closing it (because of exactly what @c42f just said).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, somebody was wrong before, and now AWS.jl is now possibly also very bad at stream EOF handling here too. Whoever created the stream is generally responsible for marking when it is done, or passing it to another function which will take care of that. I suspect Downloads.jl absolutely should be copying the EOF marker from the underlying stream also. It is out-of-band data, but likely just as relevant as the in-band bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure who that is- Downloads.jl fills it but I don’t think they should be closing it

Perhaps I've misunderstood what's going on here, but is the problem in Downloads.jl then? Shouldn't they close the write side when writing is done?

Copy link
Member Author

@ericphanson ericphanson Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I 100% don't understand what's going on. But let me try to summarize the current behavior (including this PR).

Summary

The HTTPBackend only sets a response_stream if both the user has not passed one and they ask for return_stream=true, whereas with the DownloadsBackend we create one whenever the user has not passed one. This is because Downloads.jl wants you to pass an IO object to get the response always, unlike HTTP.jl which will either populate a stream you pass or return a vector of bytes directly.

  • Case 1: user passes response_stream, sets return_stream=true
    • neither backend creates a stream, nor closes it ✔️
  • Case 2: user passes response_stream, sets return_stream=false
    • HTTP backend passes the stream along; HTTP.jl does not return a body, so the caller does not get back a body but does have their stream populated. No one closes the stream. Seems correct ✔️
    • Downloads backend passes the stream along, but closes the stream and collects the bytes to return. Seems wrong ❌
  • Case 3: user does not pass response_stream, sets return_stream=true
    • Both backends create a stream and return it; neither closes it. Not sure if this is correct or not ❓
  • Case 4: user does not pass response_stream, and sets return_stream=false
    • HTTPBackend does not create a stream, just gets the bytes from the body HTTP.jl returns ✔️
    • DownloadsBackend internally creates a stream, then closes it and collects the bytes to return ✔️

If I'm understanding then, I think Case 2 is probematic for the DownloadsBackend, and we should just not try to return bytes in this case, matching the behavior of the HTTP backend. I.e. we should not be closing the stream in this case.

Case 3: I don't really know if we should close the stream or not. I think ideally this case wouldn't exist, because I don't think there really should be a return_stream option-- I think a simpler API is "if you want a stream you should pass one and it will get populated (and not closed); if you don't want a stream, don't pass one and you will get a vector of bytes back".

So: if that analysis is right, Downloads.jl is fine, and we have at least one problematic case here. But I'll admit to definitely not fully understanding the semantics of how streams should work (and the manual isn't very helpful in this regard), though the discussion here helps me a bit.

Though possibly, HTTP.jl and Downloads.jl should be closing the streams whenever they are passed and the backend is done populating them? I think maybe that's what @vtjnash is saying in #396 (comment). In which case most/all of these cases are wrong for both backends.

Appendix: Following the path of get_object

To see these cases, play yout, we let's say we are doing an s3 get_object.

  • We start

    AWS.jl/src/services/s3.jl

    Lines 2239 to 2246 in 181c340

    function get_object(
    Bucket,
    Key,
    params::AbstractDict{String};
    aws_config::AbstractAWSConfig=global_aws_config(),
    )
    return s3("GET", "/$(Bucket)/$(Key)", params; aws_config=aws_config)
    end
    and possibly submit params which is a dictionary with additional parameters
  • we use those params to build Request, changing the name from params to args:

    AWS.jl/src/AWS.jl

    Lines 213 to 217 in 181c340

    request = Request(;
    _extract_common_kw_args(service, args)...,
    request_method=request_method,
    content=_pop!(args, "body", ""),
    )
  • we populate with default values:
    function _extract_common_kw_args(service, args)
    return (
    service=service.signing_name,
    api_version=service.api_version,
    return_stream=_pop!(args, "return_stream", false),
    return_raw=_pop!(args, "return_raw", false),
    response_stream=_pop!(args, "response_stream", nothing),
    headers=LittleDict{String,String}(_pop!(args, "headers", [])),
    http_options=_pop!(args, "http_options", LittleDict{Symbol,String}()),
    response_dict_type=_pop!(args, "response_dict_type", LittleDict),
    backend=_pop!(args, "backend", DEFAULT_BACKEND[]),
    )
    end
  • we end up with our Request object:
    Base.@kwdef mutable struct Request
    service::String
    api_version::String
    request_method::String
    headers::AbstractDict{String,String} = LittleDict{String,String}()
    content::Union{String,Vector{UInt8}} = ""
    resource::String = ""
    url::String = ""
    return_stream::Bool = false
    response_stream::Union{<:IO,Nothing} = nothing
    http_options::AbstractDict{Symbol,<:Any} = LittleDict{Symbol,String}()
    return_raw::Bool = false
    response_dict_type::Type{<:AbstractDict} = LittleDict
    backend::AbstractBackend = DEFAULT_BACKEND[]
    end
  • We pass it to submit_request:
    function submit_request(
    aws::AbstractAWSConfig, request::Request; return_headers::Bool=false
    )

Here, what happens depends on which backend we are using. This PR is about the new DownloadsBackend.

Using the DownloadsBackend

  • submit_request calls _http_request and if we are using a DownloadsBackend, we end up at the method in question,
    function _http_request(backend::DownloadsBackend, request)
  • Here, if user has not passed a response_stream, we create a default one:
    if request.response_stream === nothing
    request.response_stream = IOBuffer()
    end
  • And if they have not specified return_stream==true, we arrange to collect the bytes from the response stream, including possibly closing the response_stream:
    body_arg = if request.request_method == "HEAD" || request.return_stream
    () -> NamedTuple()
    else
    () -> (; body=read_body(request.response_stream))
    end

Using the HTTPBackend

If the user is using the existant HTTP.jl backend, what happens?

  • submit_request calls _http_request with an HTTPBackend, leading us to this method:
    function _http_request(http_backend::HTTPBackend, request::Request)
    http_options = merge(http_backend.http_options, request.http_options)
    @repeat 4 try
    http_stack = HTTP.stack(; redirect=false, retry=false, aws_authorization=false)
    if request.return_stream && request.response_stream === nothing
    request.response_stream = Base.BufferStream()
    end
    return @mock HTTP.request(
    http_stack,
    request.request_method,
    HTTP.URI(request.url),
    HTTP.mkheaders(request.headers),
    request.content;
    require_ssl_verification=false,
    response_stream=request.response_stream,
    http_options...,
    )
    catch e
    # Base.IOError is needed because HTTP.jl can often have errors that aren't
    # caught and wrapped in an HTTP.IOError
    # https://github.com/JuliaWeb/HTTP.jl/issues/382
    @delay_retry if isa(e, Sockets.DNSError) ||
    isa(e, HTTP.ParseError) ||
    isa(e, HTTP.IOError) ||
    isa(e, Base.IOError) ||
    (isa(e, HTTP.StatusError) && _http_status(e) >= 500)
    end
    end
    end
  • There we create a response_stream if both the user has not passed one and they ask for return_stream=true:
    if request.return_stream && request.response_stream === nothing
    request.response_stream = Base.BufferStream()
    end
  • We pass the possibly-nothing request.response_stream to HTTP.request:
    response_stream=request.response_stream,
  • HTTP.jl will give us back a body if they were not given a response_stream, otherwise they put the bytes in the stream: https://github.com/JuliaWeb/HTTP.jl/blob/a2b467e24c9bbd45691f9d0f57b57ee7463bd15a/src/HTTP.jl#L77-L78

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed #433 so we don't forget about this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants