cp(file,file-copy) broken for >2GB files #14574

Closed
samtkaplan opened this issue Jan 6, 2016 · 8 comments
Labels
bug Indicates an unexpected problem or unintended behavior io Involving the I/O subsystem: libuv, read, write, etc. priority This should be addressed urgently

Comments

@samtkaplan

Hello,

I'm not sure whether this is specific to my installation, but the cp function does not work for files larger than 2GB: the copy comes out truncated.

The following code:

io = open("test.bin","w")
write(io,rand(UInt8,3_000_000_000))
close(io)

cp("test.bin","test-copy.bin", remove_destination=true)

@show filesize("test.bin")
@show filesize("test-copy.bin")

produces:

filesize("test.bin") = 3000000000
filesize("test-copy.bin") = 2147479552

I guess it might be a 32-bit file pointer issue, so I should also note that on my install sizeof(Int)=8.
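A quick check of the numbers reported above (an editorial sketch, not part of the original report): the truncated size is exactly one 4 KiB page short of 2^31, i.e. 0x7ffff000. That is the per-call transfer cap the Linux sendfile(2) man page documents, which suggests a single short transfer rather than a corrupted copy. The variable names are made up for illustration.

```julia
# Sizes taken from the @show output in the report.
source_size = 3_000_000_000
copied = 2147479552

# The copy stops exactly 4096 bytes (one 4 KiB page) below 2^31.
# Int64 is used explicitly so the check also holds on 32-bit builds.
one_page_below_2_31 = Int64(2)^31 - 4096
@show copied == one_page_below_2_31

# The same value in hexadecimal.
copied_hex = string(copied, base=16)
@show copied_hex

# Bytes that never reached the destination file.
missing_bytes = source_size - copied
@show missing_bytes
```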

Here's the rest of versioninfo():

Julia Version 0.4.2
Commit bb73f34* (2015-12-06 21:47 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Thanks!

Sam

@eschnett
Contributor

eschnett commented Jan 6, 2016

Yes, there is a 32-bit problem. The return value (indicating the number of bytes written) is restricted to Int32 (should probably rather be Cint, but that doesn't make a difference). This is a limitation of libuv.

function sendfile(dst::File, src::File, src_offset::Int64, bytes::Int)
    check_open(dst)
    check_open(src)
    err = ccall(:jl_fs_sendfile, Int32, (Int32, Int32, Int64, Csize_t),
                src.handle, dst.handle, src_offset, bytes)
    uv_error("sendfile", err)
    nothing
end

@samtkaplan
Author

OK... I guess that's unfortunate. Do you think we should modify cp to return the number of bytes written, or should we throw an exception if the number of bytes written doesn't match the src filesize?
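Until cp itself is fixed, the second option (throwing when the sizes disagree) can be approximated in user code. Below is a minimal sketch, assuming a hypothetical helper named checked_cp that is not part of Base. It uses the `force=true` keyword, which replaced `remove_destination=true` in Julia 0.7 and later, and it only compares file sizes, not contents.

```julia
# Hypothetical helper (not in Base): copy a file, then verify that the
# destination has the same size as the source.
function checked_cp(src::AbstractString, dst::AbstractString)
    # `force=true` is the Julia >= 0.7 spelling of `remove_destination=true`.
    cp(src, dst; force=true)
    expected = filesize(src)
    actual = filesize(dst)
    expected == actual ||
        error("short copy: wrote $actual of $expected bytes to $dst")
    return dst
end

# Usage on a small temporary file (a 3 GB file is not needed to show the idea).
src_path = tempname()
write(src_path, rand(UInt8, 1024))
dst_path = checked_cp(src_path, src_path * ".copy")
copied_size = filesize(dst_path)
@show copied_size
```

On an affected system, the same call on a file larger than 2GB would raise the error instead of silently leaving a truncated copy.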

@vtjnash
Member

vtjnash commented Jan 6, 2016

We should probably have uv_fs_sendfile retry copying the file.

@ksmcreynolds

Has there been any progress on this issue? I am running into this problem when running on Linux. However, cp() works fine on macOS.

@StefanKarpinski
Member

@vtjnash: did we ever find/file an upstream libuv bug here? Any chance that you can use your newly earned libuv commit bit to find and fix this issue there?

@vtjnash
Member

vtjnash commented Oct 4, 2018

The upstream issues are fixed (recently), but that usage looks possibly wrong too (doesn't handle partial writes). We can also just switch to using the new API (uv_fs_copyfile) which handles that, and also handles doing it fast on platforms where it is possible (macOS and Windows).

> Yes, there is a 32-bit problem. The return value (indicating the number of bytes written) is restricted to Int32 (should probably rather be Cint, but that doesn't make a difference). This is a limitation of libuv.

Theoretically yes, but in practice, most kernels won't accept writes larger than 31 bits either.

@JeffBezanson JeffBezanson added io Involving the I/O subsystem: libuv, read, write, etc. bug Indicates an unexpected problem or unintended behavior labels Oct 4, 2018
@JeffBezanson
Member

It seems to me we should do both: call uv_fs_sendfile repeatedly in our wrapper in case of short writes, and also eventually use uv_fs_copyfile where appropriate. Libuv may be intending to expose system sendfile semantics, allowing short writes.
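The "call it repeatedly in case of short writes" idea can be sketched without touching libuv. In the example below, send_some is a made-up stand-in for a sendfile-like primitive that moves at most `limit` bytes per call and reports how many it moved; send_all is the retry loop around it. Both names are assumptions for illustration only, and this is not the wrapper that was eventually committed.

```julia
# Stand-in for a sendfile-like primitive (an assumption for illustration):
# transfers at most `limit` bytes per call and returns the number moved.
function send_some(dst::IO, src::IO, nbytes::Integer; limit::Integer=4096)
    buf = read(src, min(nbytes, limit))   # may be shorter at end of input
    write(dst, buf)
    return length(buf)
end

# Retry loop: keep calling the primitive until every requested byte
# has been transferred, instead of trusting a single call.
function send_all(dst::IO, src::IO, nbytes::Integer; limit::Integer=4096)
    remaining = nbytes
    while remaining > 0
        n = send_some(dst, src, remaining; limit=limit)
        n == 0 && error("unexpected end of input with $remaining bytes left")
        remaining -= n
    end
    return nbytes
end

# Usage: 10_000 bytes need three calls when each call moves at most 4096.
data = rand(UInt8, 10_000)
out = IOBuffer()
sent = send_all(out, IOBuffer(data), length(data); limit=4096)
result = take!(out)
@show sent
@show result == data
```

A single call to send_some here would stop after 4096 bytes, which mirrors the truncated copy in the report; the loop is what turns a short transfer into a complete one.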

gdkrmr added a commit to JuliaDataCubes/EarthDataLab.jl that referenced this issue Nov 6, 2018
@gdkrmr
Contributor

gdkrmr commented Nov 6, 2018

Please fix!

@StefanKarpinski StefanKarpinski added the priority This should be addressed urgently label Jan 25, 2019
@StefanKarpinski StefanKarpinski added this to the 1.2 milestone Jan 25, 2019
JeffBezanson added a commit that referenced this issue Feb 7, 2019
JeffBezanson added a commit that referenced this issue Feb 7, 2019
JeffBezanson added a commit that referenced this issue Feb 8, 2019
JeffBezanson added a commit that referenced this issue Jun 6, 2019
(cherry picked from commit 05785f9)
KristofferC pushed a commit that referenced this issue Aug 26, 2019
(cherry picked from commit 05785f9)
KristofferC pushed a commit that referenced this issue Aug 26, 2019
(cherry picked from commit 05785f9)
KristofferC pushed a commit that referenced this issue Feb 20, 2020
(cherry picked from commit 05785f9)