s3_get_file allocates the whole file in memory before writing to disk #272

Open
mortenpi opened this issue Oct 27, 2022 · 1 comment

Looking at how s3_get_file is implemented (AWSS3.jl/src/AWSS3.jl, lines 176 to 191 in f3989bc):

function s3_get_file(
    aws::AbstractAWSConfig,
    bucket,
    path,
    filename;
    version::AbstractS3Version=nothing,
    kwargs...,
)
    stream = s3_get(aws, bucket, path; version=version, return_stream=true, kwargs...)
    open(filename, "w") do file
        while !eof(stream)
            write(file, readavailable(stream))
        end
    end
end

I think the intent is to stream the download to the file as the data arrives, but from what I can tell that is not what happens. I added a print statement to check how many times the loop runs and how many bytes each readavailable(stream) call returns. In every case it returned the whole file in one go at the end of the download, even when the file was 1 GiB.
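For reference, the check described above can be sketched roughly as follows. `copy_with_chunk_log` is a hypothetical helper, not part of AWSS3.jl, and an in-memory `IOBuffer` stands in for the stream that `s3_get` returns. It therefore only illustrates the instrumentation, not the actual S3 behavior.

```julia
# Hypothetical helper (not part of AWSS3.jl): copy `stream` to `filename`
# and record the size of every chunk that `readavailable` returns.
function copy_with_chunk_log(stream::IO, filename::AbstractString)
    chunk_sizes = Int[]
    open(filename, "w") do file
        while !eof(stream)
            chunk = readavailable(stream)
            push!(chunk_sizes, length(chunk))
            write(file, chunk)
        end
    end
    return chunk_sizes
end

# Assumption: an IOBuffer stands in for the S3 response stream.
data = rand(UInt8, 1024 * 1024)
path = tempname()
sizes = copy_with_chunk_log(IOBuffer(data), path)
println("chunks: ", length(sizes), ", total bytes: ", sum(sizes))
```

If the stream really delivered data incrementally, one would expect many small chunks for a large download rather than a single chunk the size of the file.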

Unless the issue is on my end, this is a problem when downloading large, multi-gigabyte files, because memory use blows up unnecessarily. It also looks like multiple copies of the data are held in memory at some point, although I haven't looked into it closely: there is a memory spike where a ~1 GiB file needs several GiB.

Package versions
  [fbe9abb3] AWS v1.79.0
  [1c724243] AWSS3 v0.10.2
  [cd3eb016] HTTP v1.5.1

evetion commented Jan 5, 2023

I've dug into this a bit. It's not due to this code but to AWS.jl itself: it creates its own buffer, which it closes before writing to the buffer passed by AWSS3 or the user: https://github.com/JuliaCloud/AWS.jl/blob/master/src/utilities/request.jl#L218.
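As a rough sketch of what a bounded-memory copy could look like once the upstream stream delivers data incrementally: the loop below reuses one fixed-size buffer instead of allocating whatever `readavailable` returns. `copy_in_chunks` and its `chunk_size` keyword are hypothetical names, not part of AWS.jl or AWSS3.jl, and an `IOBuffer` again stands in for the response stream. This would not fix the problem by itself while the response is buffered in full before it reaches the caller.

```julia
# Hypothetical helper (not part of AWS.jl or AWSS3.jl): copy `src` to `dst`
# through a single reusable buffer of `chunk_size` bytes.
function copy_in_chunks(src::IO, dst::IO; chunk_size::Integer=64 * 1024)
    buf = Vector{UInt8}(undef, chunk_size)
    total = 0
    while !eof(src)
        n = readbytes!(src, buf)      # reads at most length(buf) bytes
        write(dst, view(buf, 1:n))    # write only the bytes actually read
        total += n
    end
    return total
end

# Assumption: an IOBuffer stands in for an incrementally delivered stream.
payload = rand(UInt8, 200_000)
path = tempname()
written = open(path, "w") do file
    copy_in_chunks(IOBuffer(payload), file)
end
println("bytes written: ", written)
```

With this shape, memory use of the copy loop itself is limited to `chunk_size` bytes, regardless of the file size.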
