`s3_get_file` allocates the whole file in memory before writing to disk #272

mortenpi · 2022-10-27T05:35:34Z

Looking at how `s3_get_file` is implemented:

Lines 176 to 191 in f3989bc

    
    function s3_get_file(
        aws::AbstractAWSConfig,
        bucket,
        path,
        filename;
        version::AbstractS3Version=nothing,
        kwargs...,
    )
        stream = s3_get(aws, bucket, path; version=version, return_stream=true, kwargs...)
        open(filename, "w") do file
            while !eof(stream)
                write(file, readavailable(stream))
            end
        end
    end

I think the intent is to stream the output continuously to a file, but from what I can tell, that is not what actually happens. I added a print statement to check how many times the loop runs and how many bytes `readavailable(stream)` returns, and in all cases it returns the whole file in one go at the end of the download (even if the file is 1 GiB).
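For anyone who wants to repeat that check, here is a minimal sketch of the kind of instrumentation described above. `copy_with_chunk_sizes` is a hypothetical helper, not part of AWSS3.jl, and an `IOBuffer` stands in for the stream returned by `s3_get`, so this only shows how to count loop iterations and chunk sizes, not how S3 behaves.

```julia
# Hypothetical helper (not part of AWSS3.jl): copy `stream` to `filename`
# with the same loop as s3_get_file, recording the size of every chunk
# that readavailable returns.
function copy_with_chunk_sizes(stream::IO, filename::AbstractString)
    chunk_sizes = Int[]
    open(filename, "w") do file
        while !eof(stream)
            chunk = readavailable(stream)
            push!(chunk_sizes, length(chunk))
            write(file, chunk)
        end
    end
    return chunk_sizes
end

# Stand-in for the S3 stream (assumption): 1 MiB of random bytes in memory.
data = rand(UInt8, 1024 * 1024)
path = tempname()
sizes = copy_with_chunk_sizes(IOBuffer(data), path)
println("iterations: ", length(sizes), ", total bytes: ", sum(sizes))
```

If the stream really delivered data incrementally, `sizes` would contain many small entries; a single entry equal to the file size would match the behaviour reported here.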

Unless the issue is on my end, this can be a problem when downloading large, multi-gigabyte files, as memory use blows up unnecessarily. It also looks like at some point there are multiple copies of the data in memory, but I haven't looked into it very closely (there is a memory spike where a ~1 GiB file needs several GiB).
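As a rough illustration of what a bounded-memory copy loop could look like, the sketch below reads into one reusable fixed-size buffer with `readbytes!` instead of calling `readavailable`. `copy_in_chunks` and its `chunk_size` keyword are hypothetical names, not AWSS3.jl API, and an `IOBuffer` again stands in for the S3 stream. This only caps what the copy loop itself allocates; if the stream has already buffered the full body before the loop starts, it will not reduce peak memory.

```julia
# Hypothetical helper (not part of AWSS3.jl): copy `stream` to `filename`
# through a single reusable buffer of `chunk_size` bytes.
function copy_in_chunks(stream::IO, filename::AbstractString;
                        chunk_size::Integer=8 * 1024 * 1024)
    buf = Vector{UInt8}(undef, chunk_size)
    total = 0
    open(filename, "w") do file
        while !eof(stream)
            # readbytes! fills `buf` with at most `chunk_size` bytes
            # and returns how many bytes were actually read.
            n = readbytes!(stream, buf, chunk_size)
            write(file, view(buf, 1:n))
            total += n
        end
    end
    return total
end

# Stand-in for the S3 stream (assumption): 10,000 random bytes in memory.
payload = rand(UInt8, 10_000)
out_path = tempname()
written = copy_in_chunks(IOBuffer(payload), out_path; chunk_size=4096)
println("bytes written: ", written)
```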

Package versions

  [fbe9abb3] AWS v1.79.0
  [1c724243] AWSS3 v0.10.2
  [cd3eb016] HTTP v1.5.1

evetion · 2023-01-05T14:26:57Z

I've dug into this a bit. It's not due to this code but to AWS.jl itself: it creates its own buffer, which it closes before writing to the buffer passed in by AWSS3 or the user: https://github.com/JuliaCloud/AWS.jl/blob/master/src/utilities/request.jl#L218.
