client upload of 5.7 GB fails with internal server error 500 #128

Closed
jnweiger opened this issue Jul 23, 2020 · 25 comments

Comments

@jnweiger

Reproduced on ocis 1.0.0-beta8 with eos, installed via https://github.com/owncloud-docker/compose-playground/blob/master/examples/hetzner-deploy/make_ocis_eos_compose_test.sh

On a second Hetzner cloud machine, start an Ubuntu desktop and install the desktop client,
ownCloud version 2.7.0daily20200708 (build 2764)

  • connect the client to the first machine, and wait until the initial sync succeeds.
  • in the sync folder of the client, create a file of 5.7 GB size.
  • the upload runs for ca. 5 minutes, then fails with an internal server error.
  • the ocis docker container on the server has a /reva mountpoint of 38 GB, initially empty. It fills up while the client uploads, then stops filling when the client prints the error:
[root@ocis /]# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          38G   35G  1.1G  98% /
tmpfs            64M     0   64M   0% /dev
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda1        38G   35G  1.1G  98% /reva
shm              64M     0   64M   0% /dev/shm
  • the server seems to require more than four times the disk space of the uploaded file.

Expected behaviour:

  • the server only uses roughly the same amount of disk space as the uploaded file; maybe twice, temporarily.
  • uploads should not result in 'internal server error 500'; we should send a more specific message to the client.

Please advise further debugging.

  • No files can be seen inside /reva; the mountpoint always appears empty.
  • A temporary upload location cannot be seen with eos newfind / either.
  • It is unclear how the lost disk space can be reclaimed. A reboot does not help.
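A sketch of what could be monitored during the next repro (assuming shell access to the ocis container as in the compose setup, and that watch is available in the image):

# watch the mountpoints fill while the client uploads
docker-compose exec ocis watch -n 10 df -h

# list temp files, in case the upload is staged under /tmp first
docker-compose exec ocis ls -lh /tmp

# search the EOS namespace for the upload target
docker-compose exec ocis eos newfind /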
@micbar micbar transferred this issue from owncloud/ocis Jul 28, 2020
@micbar micbar added bug Something isn't working C1-oCIS p2-high qa labels Jul 28, 2020
@exalate-issue-sync

Michael Barz commented: Scope: Debug the problem, find out what is wrong.

@exalate-issue-sync

Vincent Petry commented: Is that version of the client using TUS?

A snippet of the OCIS log would have been nice (and could also reveal whether TUS was used or not on the Webdav layer)

@exalate-issue-sync

Vincent Petry commented: The UI shows a POST request, so it's likely TUS with the "creation with upload" extension, since a regular upload would use PUT.

So this is a single-request upload, and it fails with 500.
If it failed with some other error code, the client would be able to continue with the next chunk.
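For reference, a minimal sketch of the difference (TUS 1.0 semantics, not an exact capture from this client):

# classic TUS: create the upload, then send the bytes in one or more PATCH requests
POST /remote.php/dav/files/<user>/      Tus-Resumable: 1.0.0, Upload-Length: <N>, Upload-Metadata: ...
PATCH <Location returned by the POST>   Upload-Offset: 0, Content-Type: application/offset+octet-stream, plus the body

# creation-with-upload: a single POST that already carries the body
POST /remote.php/dav/files/<user>/      same creation headers, plus Content-Type: application/offset+octet-stream and the body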

@PVince81

Reproducible against EOS with the following curl command:

UPLOAD_FILE=~/temp/test-uploads/huge.dat; UPLOAD_NAME=huge.dat; UPLOAD_SIZE=$(filesize $UPLOAD_FILE); curl 'https://localhost:9200/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51/' \
  -X 'POST' \
  -H 'Tus-Resumable: 1.0.0' \
  -H "Upload-Length: $UPLOAD_SIZE" \
  -H "Upload-Metadata: filename `echo -n $UPLOAD_NAME | base64`,filetype ,size `echo -n $UPLOAD_SIZE | base64`" \
  -H 'Content-Type: application/offset+octet-stream' \
  -H 'Expect: ' \
  --insecure \
  --data-binary "@${UPLOAD_FILE}" \
  -D - \
  -o temp-upload.txt \
  -u einstein:relativity -v
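(filesize is not a standard command; presumably a local shell helper that prints the size in bytes, something like:)

filesize() { stat --format=%s "$1"; }    # GNU stat; on BSD/macOS use: stat -f %z "$1"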

output:

Note: Unnecessary use of -X or --request, POST is already inferred.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1:9200...
* Connected to localhost (127.0.0.1) port 9200 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [6 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [819 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: O=Acme Corp; CN=OCIS
*  start date: Aug 17 12:53:37 2020 GMT
*  expire date: Aug 17 12:53:37 2021 GMT
*  issuer: O=Acme Corp; CN=OCIS
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Server auth using Basic with user 'einstein'
} [5 bytes data]
> POST /remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51/ HTTP/1.1
> Host: localhost:9200
> Authorization: Basic ZWluc3RlaW46cmVsYXRpdml0eQ==
> User-Agent: curl/7.71.1
> Accept: */*
> Tus-Resumable: 1.0.0
> Upload-Length: 5354536448
> Upload-Metadata: filename aHVnZS5kYXQ=,filetype ,size NTM1NDUzNjQ0OA==
> Content-Type: application/offset+octet-stream
> Content-Length: 5354536448
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [146 bytes data]
 97 5106M    0     0   97 4995M      0   108M  0:00:47  0:00:46  0:00:01  178M* We are completely uploaded and fine
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
HTTP/1.1 500 Internal Server Error
< Access-Control-Allow-Headers: Tus-Resumable, Upload-Length, Upload-Metadata, If-Match
Access-Control-Allow-Headers: Tus-Resumable, Upload-Length, Upload-Metadata, If-Match
< Access-Control-Allow-Origin: *
Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Tus-Resumable, Location
Access-Control-Expose-Headers: Tus-Resumable, Location
< Content-Length: 0
Content-Length: 0
< Content-Security-Policy: default-src 'none';
Content-Security-Policy: default-src 'none';
< Date: Thu, 20 Aug 2020 10:34:19 GMT
Date: Thu, 20 Aug 2020 10:34:19 GMT
< Location: https://localhost:9200/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDA2MDE3LCJpYXQiOjE1OTc5MTk2MTcsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2YyODM0YmM0LTE5NzQtNGNkMi1hZWE1LTNmMWFkYWEwMDhjNSJ9.p1jTPysRlEdIXPQoZj04Ae_oYtKytr3MIkZ9KUN04kk
Location: https://localhost:9200/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDA2MDE3LCJpYXQiOjE1OTc5MTk2MTcsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2YyODM0YmM0LTE5NzQtNGNkMi1hZWE1LTNmMWFkYWEwMDhjNSJ9.p1jTPysRlEdIXPQoZj04Ae_oYtKytr3MIkZ9KUN04kk
< Tus-Resumable: 1.0.0
Tus-Resumable: 1.0.0
< Upload-Offset: 5354536448
Upload-Offset: 5354536448
< Vary: Origin
Vary: Origin
< X-Access-Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk3OTIzMjE2LCJpYXQiOjE1OTc5MTk2MTYsImlzcyI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJ1c2VyIjp7ImlkIjp7ImlkcCI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJvcGFxdWVfaWQiOiI0YzUxMGFkYS1jODZiLTQ4MTUtODgyMC00MmNkZjgyYzNkNTEifSwidXNlcm5hbWUiOiJlaW5zdGVpbiIsIm1haWwiOiJlaW5zdGVpbkBleGFtcGxlLm9yZyIsImRpc3BsYXlfbmFtZSI6IkFsYmVydCBFaW5zdGVpbiJ9fQ.qvW50NJnmpAlgDLyzBwPupH-Y7HDjrcrnvsmdNl2Nek
X-Access-Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk3OTIzMjE2LCJpYXQiOjE1OTc5MTk2MTYsImlzcyI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJ1c2VyIjp7ImlkIjp7ImlkcCI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJvcGFxdWVfaWQiOiI0YzUxMGFkYS1jODZiLTQ4MTUtODgyMC00MmNkZjgyYzNkNTEifSwidXNlcm5hbWUiOiJlaW5zdGVpbiIsIm1haWwiOiJlaW5zdGVpbkBleGFtcGxlLm9yZyIsImRpc3BsYXlfbmFtZSI6IkFsYmVydCBFaW5zdGVpbiJ9fQ.qvW50NJnmpAlgDLyzBwPupH-Y7HDjrcrnvsmdNl2Nek
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Download-Options: noopen
X-Download-Options: noopen
100 5106M    0     0  100 5106M      0   106M  0:00:47  0:00:47 --:--:--  142M{ [5 bytes data]
< X-Frame-Options: SAMEORIGIN
X-Frame-Options: SAMEORIGIN
< X-Permitted-Cross-Domain-Policies: none
X-Permitted-Cross-Domain-Policies: none
< X-Robots-Tag: none
X-Robots-Tag: none
< X-Xss-Protection: 1; mode=block
X-Xss-Protection: 1; mode=block

< 
100 5106M    0     0  100 5106M      0   106M  0:00:47  0:00:47 --:--:--  136M
* Connection #0 to host localhost left intact

Server log:

ocis          | 2020-08-20T10:33:37Z DBG http routing: head=data tail=/f2834bc4-1974-4cd2-aea5-3f1adaa008c5 svc=data pkg=rhttp service=reva
ocis          | [tusd] 2020/08/20 10:33:37 event="RequestIncoming" method="PATCH" path="/f2834bc4-1974-4cd2-aea5-3f1adaa008c5" requestId="" 
ocis          | 2020-08-20T10:33:37Z INF tusd routing: path=/f2834bc4-1974-4cd2-aea5-3f1adaa008c5 pkg=rhttp service=reva traceid=7ac60433db8aa79fe8af9c44767db817
ocis          | [tusd] 2020/08/20 10:33:37 event="ChunkWriteStart" id="f2834bc4-1974-4cd2-aea5-3f1adaa008c5" maxSize="5354536448" offset="0" 
ocis          | 2020-08-20T10:34:01Z DBG Refreshing external service-registration service={"endpoints":[],"metadata":null,"name":"com.owncloud.reva","nodes":[{"address":"0.0.0.0:9142","id":"com.owncloud.reva-a089a0e7-2911-4ab9-a2ca-06d8254727c8","metadata":{"broker":"http","protocol":"grpc","registry":"mdns","server":"grpc","transport":"grpc"}}],"version":""}
ocis          | [tusd] 2020/08/20 10:34:19 event="ChunkWriteComplete" id="f2834bc4-1974-4cd2-aea5-3f1adaa008c5" bytesWritten="5354536448" 
ocis          | [tusd] 2020/08/20 10:34:19 event="ResponseOutgoing" status="500" method="PATCH" path="/f2834bc4-1974-4cd2-aea5-3f1adaa008c5" error="eosclient: error while executing command: exit status 54" requestId="" 
ocis          | 2020-08-20T10:34:19Z ERR http end="20/Aug/2020:10:34:19 +0000" host=127.0.0.1 method=PATCH pkg=rhttp proto=HTTP/1.1 service=reva size=57 start="20/Aug/2020:10:33:37 +0000" status=500 time_ns=42011382389 traceid=7ac60433db8aa79fe8af9c44767db817 uri=/data/f2834bc4-1974-4cd2-aea5-3f1adaa008c5 url=/data/f2834bc4-1974-4cd2-aea5-3f1adaa008c5
ocis          | 2020-08-20T10:34:19Z ERR http end="20/Aug/2020:10:34:19 +0000" host=127.0.0.1 method=PATCH pkg=rhttp proto=HTTP/1.1 service=reva size=0 start="20/Aug/2020:10:33:37 +0000" status=500 time_ns=42138098330 traceid=7ac60433db8aa79fe8af9c44767db817 uri=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDA2MDE3LCJpYXQiOjE1OTc5MTk2MTcsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2YyODM0YmM0LTE5NzQtNGNkMi1hZWE1LTNmMWFkYWEwMDhjNSJ9.p1jTPysRlEdIXPQoZj04Ae_oYtKytr3MIkZ9KUN04kk url=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDA2MDE3LCJpYXQiOjE1OTc5MTk2MTcsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2YyODM0YmM0LTE5NzQtNGNkMi1hZWE1LTNmMWFkYWEwMDhjNSJ9.p1jTPysRlEdIXPQoZj04Ae_oYtKytr3MIkZ9KUN04kk
ocis          | 2020-08-20 10:34:19.820033 I | httputil: ReverseProxy read error during body copy: unexpected EOF
ocis          | 2020-08-20 10:34:19.820803 I | suppressing panic for copyResponse error in test; copy error: unexpected EOF
ocis          | 2020-08-20T10:34:19Z ERR http end="20/Aug/2020:10:34:19 +0000" host=127.0.0.1 method=POST pkg=rhttp proto=HTTP/1.1 service=reva size=0 start="20/Aug/2020:10:33:34 +0000" status=500 time_ns=45526218478 traceid=7ac60433db8aa79fe8af9c44767db817 uri=/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51/ url=/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51/

That "read error" might be a timeout or lost connection between the services. We've seen this message before for downloads but likely unrelated: https://github.com/owncloud/ocis-proxy/issues/86

@PVince81

one theory so far is that the connection we are using between the services might time out after a minute or so,
but I thought we had already increased some of the default timeout values?

even if there's a connection drop, we should make the internal TUS request (not the one on the Webdav level) resume the upload to the internal service.
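One way to test the timeout theory would be to stretch a small upload past the suspected one-minute mark, e.g. by rate-limiting the curl command from above (sketch; --limit-rate takes bytes per second, so 100K with a ~10 MB file keeps the connection busy for roughly 100 s):

curl --limit-rate 100K ...   # same TUS POST as above, just rate-limited

If that fails with the same 500, a timeout is the likely trigger; if it succeeds, the file size rather than the duration is.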

cc @butonic in case you have an idea about internal timeouts

@PVince81

PVince81 commented Aug 20, 2020

here is the same test against the ownCloud storage which uploads fine:

Note: Unnecessary use of -X or --request, POST is already inferred.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1:9200...
* Connected to localhost (127.0.0.1) port 9200 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [6 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [819 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: O=Acme Corp; CN=OCIS
*  start date: Aug 17 12:53:37 2020 GMT
*  expire date: Aug 17 12:53:37 2021 GMT
*  issuer: O=Acme Corp; CN=OCIS
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Server auth using Basic with user 'einstein'
} [5 bytes data]
> POST /remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51/ HTTP/1.1
> Host: localhost:9200
> Authorization: Basic ZWluc3RlaW46cmVsYXRpdml0eQ==
> User-Agent: curl/7.71.1
> Accept: */*
> Tus-Resumable: 1.0.0
> Upload-Length: 5354536448
> Upload-Metadata: filename aHVnZS5kYXQ=,filetype ,size NTM1NDUzNjQ0OA==
> Content-Type: application/offset+octet-stream
> Content-Length: 5354536448
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [146 bytes data]
 97 5106M    0     0   97 4985M      0   131M  0:00:38  0:00:37  0:00:01  155M* We are completely uploaded and fine
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 201 Created
HTTP/1.1 201 Created
< Access-Control-Allow-Headers: Tus-Resumable, Upload-Length, Upload-Metadata, If-Match
Access-Control-Allow-Headers: Tus-Resumable, Upload-Length, Upload-Metadata, If-Match
< Access-Control-Allow-Origin: *
Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Tus-Resumable, Location
Access-Control-Expose-Headers: Tus-Resumable, Location
< Content-Length: 0
Content-Length: 0
< Content-Security-Policy: default-src 'none';
Content-Security-Policy: default-src 'none';
< Content-Type: video/mpeg
Content-Type: video/mpeg
< Date: Thu, 20 Aug 2020 11:24:14 GMT
Date: Thu, 20 Aug 2020 11:24:14 GMT
< Etag: "ead60057192d0e1cc96f77b5919ce929"
Etag: "ead60057192d0e1cc96f77b5919ce929"
< Last-Modified: Thu, 20 Aug 2020 13:24:13 +0200
Last-Modified: Thu, 20 Aug 2020 13:24:13 +0200
< Location: https://localhost:9200/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDA5MDE4LCJpYXQiOjE1OTc5MjI2MTgsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2NC9kYXRhLzdmNWVmMTYzLTlmOGQtNGE3OS1iM2Q3LWU5OGM0YzFjYzhhNSJ9.r463GBDxNuxGrPGVP0NMPTVTovv2QNysRq1HC6pRLMM
Location: https://localhost:9200/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDA5MDE4LCJpYXQiOjE1OTc5MjI2MTgsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2NC9kYXRhLzdmNWVmMTYzLTlmOGQtNGE3OS1iM2Q3LWU5OGM0YzFjYzhhNSJ9.r463GBDxNuxGrPGVP0NMPTVTovv2QNysRq1HC6pRLMM
< Oc-Etag: "ead60057192d0e1cc96f77b5919ce929"
Oc-Etag: "ead60057192d0e1cc96f77b5919ce929"
< Oc-Fileid: MTI4NGQyMzgtYWE5Mi00MmNlLWJkYzQtMGIwMDAwMDA5MTYyOjQxMzEyZTQzLWM2NmYtNDMwZS1hMzY2LTY0YjU5ZWI2NDczOQ==
Oc-Fileid: MTI4NGQyMzgtYWE5Mi00MmNlLWJkYzQtMGIwMDAwMDA5MTYyOjQxMzEyZTQzLWM2NmYtNDMwZS1hMzY2LTY0YjU5ZWI2NDczOQ==
< Tus-Resumable: 1.0.0
Tus-Resumable: 1.0.0
< Upload-Offset: 5354536448
Upload-Offset: 5354536448
< Vary: Origin
Vary: Origin
100 5106M    0     0  100 5106M      0   129M  0:00:39  0:00:39 --:--:--  134M{ [5 bytes data]
< X-Access-Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk3OTI2MjE4LCJpYXQiOjE1OTc5MjI2MTgsImlzcyI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJ1c2VyIjp7ImlkIjp7ImlkcCI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJvcGFxdWVfaWQiOiI0YzUxMGFkYS1jODZiLTQ4MTUtODgyMC00MmNkZjgyYzNkNTEifSwidXNlcm5hbWUiOiJlaW5zdGVpbiIsIm1haWwiOiJlaW5zdGVpbkBleGFtcGxlLm9yZyIsImRpc3BsYXlfbmFtZSI6IkFsYmVydCBFaW5zdGVpbiIsIm9wYXF1ZSI6eyJtYXAiOnsiZ2lkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTXpBd01EQT0ifSwidWlkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTWpBd01EQT0ifX19fX0.g6JmEaS_7KNips7JHNB8EHmeSeqJE3JVQRzTVi3lhbc
X-Access-Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk3OTI2MjE4LCJpYXQiOjE1OTc5MjI2MTgsImlzcyI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJ1c2VyIjp7ImlkIjp7ImlkcCI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJvcGFxdWVfaWQiOiI0YzUxMGFkYS1jODZiLTQ4MTUtODgyMC00MmNkZjgyYzNkNTEifSwidXNlcm5hbWUiOiJlaW5zdGVpbiIsIm1haWwiOiJlaW5zdGVpbkBleGFtcGxlLm9yZyIsImRpc3BsYXlfbmFtZSI6IkFsYmVydCBFaW5zdGVpbiIsIm9wYXF1ZSI6eyJtYXAiOnsiZ2lkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTXpBd01EQT0ifSwidWlkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTWpBd01EQT0ifX19fX0.g6JmEaS_7KNips7JHNB8EHmeSeqJE3JVQRzTVi3lhbc
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Download-Options: noopen
X-Download-Options: noopen
< X-Frame-Options: SAMEORIGIN
X-Frame-Options: SAMEORIGIN
< X-Permitted-Cross-Domain-Policies: none
X-Permitted-Cross-Domain-Policies: none
< X-Robots-Tag: none
X-Robots-Tag: none
< X-Xss-Protection: 1; mode=block
X-Xss-Protection: 1; mode=block

< 
100 5106M    0     0  100 5106M      0   129M  0:00:39  0:00:39 --:--:--  129M
* Connection #0 to host localhost left intact

as we can see, it completes a few seconds faster

@PVince81 PVince81 self-assigned this Aug 20, 2020
@PVince81

need to rerun the command, as I discovered that --data-binary stores the whole file in memory, which might have slowed down the test. Need to use "-T" instead and remove the trailing slash in the URL.

Now owncloud storage with a 10 GB file:

UPLOAD_FILE=~/temp/test-uploads/huge.dat; UPLOAD_NAME=huge10g.dat; UPLOAD_SIZE=$(filesize $UPLOAD_FILE); curl 'https://localhost:9200/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51' \ 
  -X 'POST' \
  -H 'Tus-Resumable: 1.0.0' \
  -H "Upload-Length: $UPLOAD_SIZE" \
  -H "Upload-Metadata: filename `echo -n $UPLOAD_NAME | base64`,filetype ,size `echo -n $UPLOAD_SIZE | base64`" \
  -H 'Content-Type: application/offset+octet-stream' \
  -H 'Expect: ' \
  --insecure \
  -T "${UPLOAD_FILE}" \
  -D - \
  -o temp-upload.txt \
  -u einstein:relativity -v
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1:9200...
* Connected to localhost (127.0.0.1) port 9200 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [6 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [819 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: O=Acme Corp; CN=OCIS
*  start date: Aug 17 12:53:37 2020 GMT
*  expire date: Aug 17 12:53:37 2021 GMT
*  issuer: O=Acme Corp; CN=OCIS
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Server auth using Basic with user 'einstein'
} [5 bytes data]
> POST /remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51 HTTP/1.1
> Host: localhost:9200
> Authorization: Basic ZWluc3RlaW46cmVsYXRpdml0eQ==
> User-Agent: curl/7.71.1
> Accept: */*
> Tus-Resumable: 1.0.0
> Upload-Length: 10709072896
> Upload-Metadata: filename aHVnZTEwZy5kYXQ=,filetype ,size MTA3MDkwNzI4OTY=
> Content-Type: application/offset+octet-stream
> Content-Length: 10709072896
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [146 bytes data]
 98  9.9G    0     0   98  9.8G      0   149M  0:01:08  0:01:07  0:00:01  144M* We are completely uploaded and fine
100  9.9G    0     0  100  9.9G      0   147M  0:01:09  0:01:09 --:--:--  118M{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 201 Created
HTTP/1.1 201 Created
< Access-Control-Allow-Headers: Tus-Resumable, Upload-Length, Upload-Metadata, If-Match
Access-Control-Allow-Headers: Tus-Resumable, Upload-Length, Upload-Metadata, If-Match
< Access-Control-Allow-Origin: *
Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Tus-Resumable, Location
Access-Control-Expose-Headers: Tus-Resumable, Location
< Content-Length: 0
Content-Length: 0
< Content-Security-Policy: default-src 'none';
Content-Security-Policy: default-src 'none';
< Content-Type: video/mpeg
Content-Type: video/mpeg
< Date: Thu, 20 Aug 2020 12:36:42 GMT
Date: Thu, 20 Aug 2020 12:36:42 GMT
< Etag: "cd18eb876b158b5cac14861992eb14d2"
Etag: "cd18eb876b158b5cac14861992eb14d2"
< Last-Modified: Thu, 20 Aug 2020 14:36:41 +0200
Last-Modified: Thu, 20 Aug 2020 14:36:41 +0200
< Location: https://localhost:9200/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDEzMzMzLCJpYXQiOjE1OTc5MjY5MzMsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2NC9kYXRhL2Q1NDU2MTdkLWYzYTktNGYwNS05NmNjLTdjN2Q0NjQxYzNiYiJ9.ocRnuUob3Q46FZy17a7I7TZOgpB7favZ4w-qQ_Ft-7M
Location: https://localhost:9200/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDEzMzMzLCJpYXQiOjE1OTc5MjY5MzMsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2NC9kYXRhL2Q1NDU2MTdkLWYzYTktNGYwNS05NmNjLTdjN2Q0NjQxYzNiYiJ9.ocRnuUob3Q46FZy17a7I7TZOgpB7favZ4w-qQ_Ft-7M
< Oc-Etag: "cd18eb876b158b5cac14861992eb14d2"
Oc-Etag: "cd18eb876b158b5cac14861992eb14d2"
< Oc-Fileid: MTI4NGQyMzgtYWE5Mi00MmNlLWJkYzQtMGIwMDAwMDA5MTYyOjNiODc3ZmQ1LTY4YWQtNGJhZC1hYTFlLTFkYmRkMDdkMjk4Ng==
Oc-Fileid: MTI4NGQyMzgtYWE5Mi00MmNlLWJkYzQtMGIwMDAwMDA5MTYyOjNiODc3ZmQ1LTY4YWQtNGJhZC1hYTFlLTFkYmRkMDdkMjk4Ng==
< Tus-Resumable: 1.0.0
Tus-Resumable: 1.0.0
< Upload-Offset: 10709072896
Upload-Offset: 10709072896
< Vary: Origin
Vary: Origin
{ [5 bytes data]
< X-Access-Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk3OTMwNTMzLCJpYXQiOjE1OTc5MjY5MzMsImlzcyI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJ1c2VyIjp7ImlkIjp7ImlkcCI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJvcGFxdWVfaWQiOiI0YzUxMGFkYS1jODZiLTQ4MTUtODgyMC00MmNkZjgyYzNkNTEifSwidXNlcm5hbWUiOiJlaW5zdGVpbiIsIm1haWwiOiJlaW5zdGVpbkBleGFtcGxlLm9yZyIsImRpc3BsYXlfbmFtZSI6IkFsYmVydCBFaW5zdGVpbiIsIm9wYXF1ZSI6eyJtYXAiOnsiZ2lkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTXpBd01EQT0ifSwidWlkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTWpBd01EQT0ifX19fX0.FX74bD2iQu5ntC3C6sH86Yb1nhNNOUZwT8AHTIBm_cA
X-Access-Token: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk3OTMwNTMzLCJpYXQiOjE1OTc5MjY5MzMsImlzcyI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJ1c2VyIjp7ImlkIjp7ImlkcCI6Imh0dHBzOi8vbG9jYWxob3N0OjkyMDAiLCJvcGFxdWVfaWQiOiI0YzUxMGFkYS1jODZiLTQ4MTUtODgyMC00MmNkZjgyYzNkNTEifSwidXNlcm5hbWUiOiJlaW5zdGVpbiIsIm1haWwiOiJlaW5zdGVpbkBleGFtcGxlLm9yZyIsImRpc3BsYXlfbmFtZSI6IkFsYmVydCBFaW5zdGVpbiIsIm9wYXF1ZSI6eyJtYXAiOnsiZ2lkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTXpBd01EQT0ifSwidWlkIjp7ImRlY29kZXIiOiJwbGFpbiIsInZhbHVlIjoiTWpBd01EQT0ifX19fX0.FX74bD2iQu5ntC3C6sH86Yb1nhNNOUZwT8AHTIBm_cA
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Download-Options: noopen
X-Download-Options: noopen
< X-Frame-Options: SAMEORIGIN
X-Frame-Options: SAMEORIGIN
< X-Permitted-Cross-Domain-Policies: none
X-Permitted-Cross-Domain-Policies: none
< X-Robots-Tag: none
X-Robots-Tag: none
< X-Xss-Protection: 1; mode=block
X-Xss-Protection: 1; mode=block

< 
100  9.9G    0     0  100  9.9G      0   147M  0:01:09  0:01:09 --:--:--  111M
* Connection #0 to host localhost left intact

Now it takes about 1 minute, but there is no trace of timeouts or errors.

@PVince81

next up, digging more into the Reva code and finding out what "eosclient: error while executing command: exit status 54" means.
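One way to narrow that down would be to re-run the underlying eos/xrdcopy command manually inside the container and capture the raw exit code (sketch; <mgm-url> and <eos-path> are placeholders for the deployment's actual values):

docker-compose exec ocis sh -c 'xrdcopy --nopbar -f /tmp/testfile "root://<mgm-url>//<eos-path>/testfile"; echo "exit: $?"'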

@PVince81

as far as I can see in the code, there should be log entries with the text "eos cmd", some of which might include the output of stderr, which might contain more information.

I'll try to filter for those in my next run

@PVince81

I added more log messages in reva:

diff --git a/pkg/eosclient/eosclient.go b/pkg/eosclient/eosclient.go
index 2ea4891..29f3a5d 100644
--- a/pkg/eosclient/eosclient.go
+++ b/pkg/eosclient/eosclient.go
@@ -179,6 +179,7 @@ func (c *Client) execute(ctx context.Context, cmd *exec.Cmd) (string, string, er
                cmd.Env = append(cmd.Env, "XrdSecSSSKT="+c.opt.Keytab)
        }
 
+       print("### EOS: execute() running command " + cmd.String())
        err := cmd.Run()
 
        var exitStatus int
@@ -231,6 +232,7 @@ func (c *Client) executeEOS(ctx context.Context, cmd *exec.Cmd) (string, string,
        trace := trace.FromContext(ctx).SpanContext().TraceID.String()
        cmd.Args = append(cmd.Args, "--comment", trace)
 
+       print("### EOS: execute() running command " + cmd.String())
        err := cmd.Run()
 
        var exitStatus int
@@ -554,7 +556,11 @@ func (c *Client) Write(ctx context.Context, uid, gid, path string, stream io.Rea
 func (c *Client) WriteFile(ctx context.Context, uid, gid, path, source string) error {
        xrdPath := fmt.Sprintf("%s//%s", c.opt.URL, path)
        cmd := exec.CommandContext(ctx, c.opt.XrdcopyBinary, "--nopbar", "--silent", "-f", source, xrdPath, fmt.Sprintf("-ODeos.ruid=%s&eos.rgid=%s", uid, gid))
-       _, _, err := c.execute(ctx, cmd)
+       stdout, stderr, err := c.execute(ctx, cmd)
+       print("### EOS: WriteFile done\n")
+       print("stdout: " + stdout + "\n")
+       print("stderr: " + stderr + "\n")
+       print("######\n")
        return err
 }
 
diff --git a/pkg/eosclientgrpc/eosclientgrpc.go b/pkg/eosclientgrpc/eosclientgrpc.go
index def76ba..7831b1d 100644
--- a/pkg/eosclientgrpc/eosclientgrpc.go
+++ b/pkg/eosclientgrpc/eosclientgrpc.go
@@ -1285,6 +1285,7 @@ func (c *Client) execute(ctx context.Context, cmd *exec.Cmd) (string, string, er
                cmd.Env = append(cmd.Env, "XrdSecSSSKT="+c.opt.Keytab)
        }
 
+       print("### EOS: execute() running command " + cmd.String())
        err := cmd.Run()
 
        var exitStatus int
diff --git a/pkg/storage/utils/eosfs/upload.go b/pkg/storage/utils/eosfs/upload.go
index ce24e60..5b89162 100644
--- a/pkg/storage/utils/eosfs/upload.go
+++ b/pkg/storage/utils/eosfs/upload.go
@@ -242,6 +242,7 @@ func (upload *fileUpload) GetReader(ctx context.Context) (io.Reader, error) {
 // WriteChunk writes the stream from the reader to the given offset of the upload
 // TODO use the grpc api to directly stream to a temporary uploads location in the eos shadow tree
 func (upload *fileUpload) WriteChunk(ctx context.Context, offset int64, src io.Reader) (int64, error) {
+       print("### EOS WriteChunk: " + upload.binPath + "\n")
        file, err := os.OpenFile(upload.binPath, os.O_WRONLY|os.O_APPEND, defaultFilePerm)
        if err != nil {
                return 0, err
@@ -303,6 +304,7 @@ func (upload *fileUpload) FinishUpload(ctx context.Context) error {
        // eos creates revisions internally
        //}
 
+       print("### EOS: WriteFile\n")
        err := upload.fs.c.WriteFile(ctx, upload.info.Storage["UID"], upload.info.Storage["GID"], np, upload.binPath)
 
        // only delete the upload if it was successfully written to eos

and the matching logs for the huge upload of 10 GB:

ocis          | 2020-08-20T14:31:01Z DBG skipping auth method=/cs3.gateway.v1beta1.GatewayAPI/CreateHome pkg=rgrpc service=reva traceid=507addf7708502ea6eb88b6fd574aed3
ocis          | 2020-08-20T14:31:01Z INF unary code=OK end="20/Aug/2020:14:31:01 +0000" from=tcp://127.0.0.1:38880 pkg=rgrpc service=reva start="20/Aug/2020:14:31:01 +0000" time_ns=2024324 traceid=507addf7708502ea6eb88b6fd574aed3 uri=/cs3.storage.registry.v1beta1.RegistryAPI/GetStorageProvider user-agent=grpc-go/1.26.0
ocis          | 2020-08-20T14:31:01Z DBG eos: wrap external=/ internal=/eos/dockertest/reva/users/e/einstein pkg=rgrpc service=reva traceid=507addf7708502ea6eb88b6fd574aed3
ocis          | ### EOS: execute() running command /usr/bin/eos -r 0 0 file info /eos/dockertest/reva/users/e/einstein -m --comment 507addf7708502ea6eb88b6fd574aed32020-08-20T14:31:01Z INF eos cmd args="[/usr/bin/eos -r 0 0 file info /eos/dockertest/reva/users/e/einstein -m --comment 507addf7708502ea6eb88b6fd574aed3]" env=[EOS_MGM_URL=root://mgm-master.testnet:1094] err= exit=0 pkg=rgrpc service=reva traceid=507addf7708502ea6eb88b6fd574aed3
ocis          | ### EOS: execute() running command /usr/bin/eos -r 0 0 file info /eos/dockertest/reva/users/.shadow/e/einstein -m --comment 507addf7708502ea6eb88b6fd574aed32020-08-20T14:31:01Z INF eos cmd args="[/usr/bin/eos -r 0 0 file info /eos/dockertest/reva/users/.shadow/e/einstein -m --comment 507addf7708502ea6eb88b6fd574aed3]" env=[EOS_MGM_URL=root://mgm-master.testnet:1094] err= exit=0 pkg=rgrpc service=reva traceid=507addf7708502ea6eb88b6fd574aed3
ocis          | 2020-08-20T14:31:01Z INF unary code=OK end="20/Aug/2020:14:31:01 +0000" from=tcp://127.0.0.1:35902 pkg=rgrpc service=reva start="20/Aug/2020:14:31:01 +0000" time_ns=62466794 traceid=507addf7708502ea6eb88b6fd574aed3 uri=/cs3.storage.provider.v1beta1.ProviderAPI/CreateHome user-agent=grpc-go/1.26.0
ocis          | 2020-08-20T14:31:01Z INF unary code=OK end="20/Aug/2020:14:31:01 +0000" from=tcp://127.0.0.1:37956 pkg=rgrpc service=reva start="20/Aug/2020:14:31:01 +0000" time_ns=67946096 traceid=507addf7708502ea6eb88b6fd574aed3 uri=/cs3.gateway.v1beta1.GatewayAPI/CreateHome user-agent=grpc-go/1.26.0
ocis          | 2020-08-20T14:31:01Z DBG director found path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIwMjYxLCJpYXQiOjE1OTc5MzM4NjEsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhLzVmMTYyYWIzLWNkOTctNGY1YS1iNjQ3LWNkNDIzZTJiZTZiMiJ9.szRgrTFrP3Phnfu_IjUanxZN24eDb33rSnsJRZdNEnY policy=reva prefix=/data routeType=prefix service=proxy
ocis          | 2020-08-20T14:31:01Z INF access token is already provided pkg=rhttp service=reva traceid=0ac0033c9d97573de85f4fa80795a97f
ocis          | 2020-08-20T14:31:01Z DBG http routing: head=data tail=/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIwMjYxLCJpYXQiOjE1OTc5MzM4NjEsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhLzVmMTYyYWIzLWNkOTctNGY1YS1iNjQ3LWNkNDIzZTJiZTZiMiJ9.szRgrTFrP3Phnfu_IjUanxZN24eDb33rSnsJRZdNEnY svc=data pkg=rhttp service=reva
ocis          | 2020-08-20T14:31:01Z DBG sending request to internal data server pkg=rhttp service=reva target=http://localhost:9160/data/5f162ab3-cd97-4f5a-b647-cd423e2be6b2 traceid=0ac0033c9d97573de85f4fa80795a97f
ocis          | 2020-08-20T14:31:01Z INF access token is already provided pkg=rhttp service=reva traceid=0ac0033c9d97573de85f4fa80795a97f
ocis          | 2020-08-20T14:31:01Z DBG http routing: head=data tail=/5f162ab3-cd97-4f5a-b647-cd423e2be6b2 svc=data pkg=rhttp service=reva
ocis          | [tusd] 2020/08/20 14:31:01 event="RequestIncoming" method="PATCH" path="/5f162ab3-cd97-4f5a-b647-cd423e2be6b2" requestId="" 
ocis          | 2020-08-20T14:31:01Z INF tusd routing: path=/5f162ab3-cd97-4f5a-b647-cd423e2be6b2 pkg=rhttp service=reva traceid=0ac0033c9d97573de85f4fa80795a97f
ocis          | [tusd] 2020/08/20 14:31:01 event="ChunkWriteStart" id="5f162ab3-cd97-4f5a-b647-cd423e2be6b2" maxSize="10709072896" offset="0" 
ocis          | 2020-08-20T14:31:08Z DBG Refreshing external service-registration service={"endpoints":[],"metadata":null,"name":"com.owncloud.reva","nodes":[{"address":"0.0.0.0:9142","id":"com.owncloud.reva-9f551eef-b5ab-4cdf-9ca5-338d5ed277c8","metadata":{"broker":"http","protocol":"grpc","registry":"mdns","server":"grpc","transport":"grpc"}}],"version":""}
ocis          | 2020-08-20T14:31:38Z DBG Refreshing external service-registration service={"endpoints":[],"metadata":null,"name":"com.owncloud.reva","nodes":[{"address":"0.0.0.0:9142","id":"com.owncloud.reva-9f551eef-b5ab-4cdf-9ca5-338d5ed277c8","metadata":{"broker":"http","protocol":"grpc","registry":"mdns","server":"grpc","transport":"grpc"}}],"version":""}
ocis          | [tusd] 2020/08/20 14:32:04 event="ChunkWriteComplete" id="5f162ab3-cd97-4f5a-b647-cd423e2be6b2" bytesWritten="10709072896" 
ocis          | [tusd] 2020/08/20 14:32:05 event="ResponseOutgoing" status="500" method="PATCH" path="/5f162ab3-cd97-4f5a-b647-cd423e2be6b2" error="eosclient: error while executing command: exit status 54" requestId="" 
ocis          | 2020-08-20T14:32:05Z ERR http end="20/Aug/2020:14:32:05 +0000" host=127.0.0.1 method=PATCH pkg=rhttp proto=HTTP/1.1 service=reva size=57 start="20/Aug/2020:14:31:01 +0000" status=500 time_ns=63748869689 traceid=0ac0033c9d97573de85f4fa80795a97f uri=/data/5f162ab3-cd97-4f5a-b647-cd423e2be6b2 url=/data/5f162ab3-cd97-4f5a-b647-cd423e2be6b2
ocis          | 2020-08-20T14:32:05Z ERR http end="20/Aug/2020:14:32:05 +0000" host=127.0.0.1 method=PATCH pkg=rhttp proto=HTTP/1.1 service=reva size=0 start="20/Aug/2020:14:31:01 +0000" status=500 time_ns=63803901748 traceid=0ac0033c9d97573de85f4fa80795a97f uri=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIwMjYxLCJpYXQiOjE1OTc5MzM4NjEsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhLzVmMTYyYWIzLWNkOTctNGY1YS1iNjQ3LWNkNDIzZTJiZTZiMiJ9.szRgrTFrP3Phnfu_IjUanxZN24eDb33rSnsJRZdNEnY url=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIwMjYxLCJpYXQiOjE1OTc5MzM4NjEsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhLzVmMTYyYWIzLWNkOTctNGY1YS1iNjQ3LWNkNDIzZTJiZTZiMiJ9.szRgrTFrP3Phnfu_IjUanxZN24eDb33rSnsJRZdNEnY
ocis          | 2020-08-20 14:32:05.279823 I | httputil: ReverseProxy read error during body copy: unexpected EOF
ocis          | 2020-08-20 14:32:05.290176 I | suppressing panic for copyResponse error in test; copy error: unexpected EOF
ocis          | 2020-08-20T14:32:05Z ERR http end="20/Aug/2020:14:32:05 +0000" host=127.0.0.1 method=POST pkg=rhttp proto=HTTP/1.1 service=reva size=0 start="20/Aug/2020:14:31:00 +0000" status=500 time_ns=64547697754 traceid=0ac0033c9d97573de85f4fa80795a97f uri=/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51 url=/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51

however, I never see any of the "WriteFile" or "WriteChunk" messages.
Also, none of the EOS execution code paths are executed?!

It's as if the TUS code path is using yet another method to talk to EOS.
I grepped the code for "error while executing command" and added print statements there, but they are not visible.

So this means there is another hidden code path somewhere, maybe in a library, that also triggers this message.
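If the duplicated message lives in a vendored or module-cached copy of the eosclient, a grep limited to the checkout will miss it; widening the search should reveal it (sketch):

grep -rn --include='*.go' "error while executing command" .
grep -rn --include='*.go' "error while executing command" "$(go env GOPATH)/pkg/mod"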

@PVince81

strange... after restarting and recompiling, I now get more messages for that huge upload:

ocis          | ### EOS: execute() running command /usr/bin/eos -r 0 0 file info /eos/dockertest/reva/users/.shadow/e/einstein -m --comment 4be3505b623b4e66bad4c0f9866157fd2020-08-20T15:11:34Z INF eos cmd args="[/usr/bin/eos -r 0 0 file info /eos/dockertest/reva/users/.shadow/e/einstein -m --comment 4be3505b623b4e66bad4c0f9866157fd]" env=[EOS_MGM_URL=root://mgm-master.testnet:1094] err= exit=0 pkg=rgrpc service=reva traceid=4be3505b623b4e66bad4c0f9866157fd
ocis          | 2020-08-20T15:11:34Z INF unary code=OK end="20/Aug/2020:15:11:34 +0000" from=tcp://127.0.0.1:34986 pkg=rgrpc service=reva start="20/Aug/2020:15:11:34 +0000" time_ns=68167235 traceid=4be3505b623b4e66bad4c0f9866157fd uri=/cs3.storage.provider.v1beta1.ProviderAPI/CreateHome user-agent=grpc-go/1.26.0
ocis          | 2020-08-20T15:11:34Z INF unary code=OK end="20/Aug/2020:15:11:34 +0000" from=tcp://127.0.0.1:37618 pkg=rgrpc service=reva start="20/Aug/2020:15:11:34 +0000" time_ns=70472247 traceid=4be3505b623b4e66bad4c0f9866157fd uri=/cs3.gateway.v1beta1.GatewayAPI/CreateHome user-agent=grpc-go/1.26.0
ocis          | 2020-08-20T15:11:34Z DBG director found path=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIyNjk0LCJpYXQiOjE1OTc5MzYyOTQsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2RiOGM3YWNkLWU1N2MtNDRkMy1iYTA4LWE1NmZjZDgzNWZlMSJ9.DpBNkFs6n6V0kEqr-OFOClq2RqKa3TjyQ-LoMSQhz3c policy=reva prefix=/data routeType=prefix service=proxy
ocis          | 2020-08-20T15:11:34Z INF access token is already provided pkg=rhttp service=reva traceid=76acbb88bda1f8741d6a7b5da7e10156
ocis          | 2020-08-20T15:11:34Z DBG http routing: head=data tail=/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIyNjk0LCJpYXQiOjE1OTc5MzYyOTQsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2RiOGM3YWNkLWU1N2MtNDRkMy1iYTA4LWE1NmZjZDgzNWZlMSJ9.DpBNkFs6n6V0kEqr-OFOClq2RqKa3TjyQ-LoMSQhz3c svc=data pkg=rhttp service=reva
ocis          | 2020-08-20T15:11:34Z DBG sending request to internal data server pkg=rhttp service=reva target=http://localhost:9160/data/db8c7acd-e57c-44d3-ba08-a56fcd835fe1 traceid=76acbb88bda1f8741d6a7b5da7e10156
ocis          | 2020-08-20T15:11:34Z INF access token is already provided pkg=rhttp service=reva traceid=76acbb88bda1f8741d6a7b5da7e10156
ocis          | 2020-08-20T15:11:34Z DBG http routing: head=data tail=/db8c7acd-e57c-44d3-ba08-a56fcd835fe1 svc=data pkg=rhttp service=reva
ocis          | [tusd] 2020/08/20 15:11:34 event="RequestIncoming" method="PATCH" path="/db8c7acd-e57c-44d3-ba08-a56fcd835fe1" requestId="" 
ocis          | 2020-08-20T15:11:34Z INF tusd routing: path=/db8c7acd-e57c-44d3-ba08-a56fcd835fe1 pkg=rhttp service=reva traceid=76acbb88bda1f8741d6a7b5da7e10156
ocis          | [tusd] 2020/08/20 15:11:34 event="ChunkWriteStart" id="db8c7acd-e57c-44d3-ba08-a56fcd835fe1" maxSize="10709072896" offset="0" 
ocis          | ### EOS WriteChunk: /tmp/db8c7acd-e57c-44d3-ba08-a56fcd835fe1
ocis          | 2020-08-20T15:11:43Z DBG Refreshing external service-registration service={"endpoints":[],"metadata":null,"name":"com.owncloud.reva","nodes":[{"address":"0.0.0.0:9142","id":"com.owncloud.reva-204e54ed-95e8-49eb-8f7b-2a0adc954e5b","metadata":{"broker":"http","protocol":"grpc","registry":"mdns","server":"grpc","transport":"grpc"}}],"version":""}
ocis          | 2020-08-20T15:12:13Z DBG Refreshing external service-registration service={"endpoints":[],"metadata":null,"name":"com.owncloud.reva","nodes":[{"address":"0.0.0.0:9142","id":"com.owncloud.reva-204e54ed-95e8-49eb-8f7b-2a0adc954e5b","metadata":{"broker":"http","protocol":"grpc","registry":"mdns","server":"grpc","transport":"grpc"}}],"version":""}
ocis          | [tusd] 2020/08/20 15:12:39 event="ChunkWriteComplete" id="db8c7acd-e57c-44d3-ba08-a56fcd835fe1" bytesWritten="10709072896" 
ocis          | ### EOS: WriteFile
ocis          | ### EOS: execute() running command /usr/bin/xrdcopy --nopbar --silent -f /tmp/db8c7acd-e57c-44d3-ba08-a56fcd835fe1 root://mgm-master.testnet:1094///eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/huge10g.dat -ODeos.ruid=20000&eos.rgid=30000### EOS: WriteFile done
ocis          | stdout: 
ocis          | stderr: Run: [ERROR] Server responded with an error: [3010] Unable to open file /eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/huge10g.dat; Operation not permitted
ocis          | 
ocis          | 
ocis          | ######
ocis          | [tusd] 2020/08/20 15:12:40 event="ResponseOutgoing" status="500" method="PATCH" path="/db8c7acd-e57c-44d3-ba08-a56fcd835fe1" error="eosclient: error while executing command: exit status 54" requestId="" 
ocis          | 2020-08-20T15:12:40Z ERR http end="20/Aug/2020:15:12:40 +0000" host=127.0.0.1 method=PATCH pkg=rhttp proto=HTTP/1.1 service=reva size=57 start="20/Aug/2020:15:11:34 +0000" status=500 time_ns=66138764101 traceid=76acbb88bda1f8741d6a7b5da7e10156 uri=/data/db8c7acd-e57c-44d3-ba08-a56fcd835fe1 url=/data/db8c7acd-e57c-44d3-ba08-a56fcd835fe1
ocis          | 2020-08-20T15:12:40Z ERR http end="20/Aug/2020:15:12:40 +0000" host=127.0.0.1 method=PATCH pkg=rhttp proto=HTTP/1.1 service=reva size=0 start="20/Aug/2020:15:11:34 +0000" status=500 time_ns=66433779909 traceid=76acbb88bda1f8741d6a7b5da7e10156 uri=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIyNjk0LCJpYXQiOjE1OTc5MzYyOTQsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2RiOGM3YWNkLWU1N2MtNDRkMy1iYTA4LWE1NmZjZDgzNWZlMSJ9.DpBNkFs6n6V0kEqr-OFOClq2RqKa3TjyQ-LoMSQhz3c url=/data/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJyZXZhIiwiZXhwIjoxNTk4MDIyNjk0LCJpYXQiOjE1OTc5MzYyOTQsInRhcmdldCI6Imh0dHA6Ly9sb2NhbGhvc3Q6OTE2MC9kYXRhL2RiOGM3YWNkLWU1N2MtNDRkMy1iYTA4LWE1NmZjZDgzNWZlMSJ9.DpBNkFs6n6V0kEqr-OFOClq2RqKa3TjyQ-LoMSQhz3c
ocis          | 2020-08-20 15:12:41.001995 I | httputil: ReverseProxy read error during body copy: unexpected EOF
ocis          | 2020-08-20 15:12:41.002214 I | suppressing panic for copyResponse error in test; copy error: unexpected EOF
ocis          | 2020-08-20T15:12:41Z ERR http end="20/Aug/2020:15:12:41 +0000" host=127.0.0.1 method=POST pkg=rhttp proto=HTTP/1.1 service=reva size=0 start="20/Aug/2020:15:11:33 +0000" status=500 time_ns=67190015908 traceid=76acbb88bda1f8741d6a7b5da7e10156 uri=/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51 url=/remote.php/dav/files/4c510ada-c86b-4815-8820-42cdf82c3d51

and it's strange that I get a permission error here, which is likely the reason for the 500:

ocis          | ### EOS: WriteFile
ocis          | ### EOS: execute() running command /usr/bin/xrdcopy --nopbar --silent -f /tmp/db8c7acd-e57c-44d3-ba08-a56fcd835fe1 root://mgm-master.testnet:1094///eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/huge10g.dat -ODeos.ruid=20000&eos.rgid=30000### EOS: WriteFile done
ocis          | stdout: 
ocis          | stderr: Run: [ERROR] Server responded with an error: [3010] Unable to open file /eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/huge10g.dat; Operation not permitted

@PVince81

damn... I also receive the same error for small files... so it's likely not the same issue @jnweiger saw.

And now I see that upload does not work in the web UI either... I thought I saw it working earlier today...

I'll need to find out what's different in my env to make uploads work first...

@jnweiger did you have any ocis log entries from the failure?

@PVince81

I had an issue in my env related to storage layout. I've adjusted it according to https://owncloud.github.io/ocis/eos to use the opaque id.

But still:

ocis          | ### EOS: execute() running command /usr/bin/xrdcopy --nopbar --silent -f /tmp/baa4f1ef-7b4d-4940-baf2-8696818a576b root://mgm-master.testnet:1094///eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/small.dat -ODeos.ruid=20000&eos.rgid=30000### EOS: WriteFile done
ocis          | stdout: 
ocis          | stderr: Run: [ERROR] Server responded with an error: [3010] Unable to open file /eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/small.dat; Operation not permitted
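A quick way to check whether uid 20000 / gid 30000 may write to the target directory would be to inspect its ownership and attributes directly (sketch, using the eos CLI the same way as elsewhere in this thread):

docker-compose exec ocis eos -r 0 0 ls -l /eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51
docker-compose exec ocis eos -r 0 0 attr ls /eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51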

@PVince81

I managed to fix my env by starting completely from scratch.

And now small file uploads work, but the 10 GB file fails with "no space left":

ocis          | [tusd] 2020/08/21 08:45:46 event="ChunkWriteComplete" id="66533c00-98e3-47a5-93ca-106b93213177" bytesWritten="10709072896" 
ocis          | ### EOS FinishUpload()
ocis          | ### EOS: WriteFile
ocis          | ### EOS: WriteFile
ocis          | ### EOS: execute() running command /usr/bin/xrdcopy --nopbar --silent -f /tmp/66533c00-98e3-47a5-93ca-106b93213177 root://mgm-master.testnet:1094///eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/huge10g.dat -ODeos.ruid=20000&eos.rgid=30000### EOS: WriteFile done
ocis          | stdout: 
ocis          | stderr: Run: [ERROR] Server responded with an error: [3009] Unable to get free physical space /eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/huge10g.dat; No space left on device
ocis          | 
ocis          | 
ocis          | ######
ocis          | [tusd] 2020/08/21 08:45:47 event="ResponseOutgoing" status="500" method="PATCH" path="/66533c00-98e3-47a5-93ca-106b93213177" error="eosclient: error while executing command: exit status 54" requestId="" 

I do have 64 GB free, but if there's indeed a bug where the file is replicated several times, I might be bumping against that limit...

@PVince81

@jnweiger I talked with @butonic and he said the extra used space is due to EOS doing replication.
So while testing we might need at least 5x to 6x more free space.

(we're still looking into this on my setup as I only have ~50 GB free)

@jnweiger
Author

Okay. I am running my tests on an 80 GB Hetzner box, and we need to test 6 GB uploads for the MVP. That should cover it then.

Good to know about the replication!
That might also contribute to the upload being slower than expected.
Oh, when we do 100 GB uploads some day, oh, oooh... then we need terabytes of storage to test that. :-)

My setup has 4 FST instances running, so if they do replication, I figure it is plausible to have one temporary copy and four final copies, giving us a total factor of 5.

@PVince81

what actually happens is that the bytes first get transferred to a temp space; that operation is rather quick.
Then the network connection "hangs" while the file is being copied to EOS (and I guess EOS replicates it on the fly)

@butonic
Member

butonic commented Aug 21, 2020

the current eoshome and eos storage drivers need a temp file for uploads. It is written using the tus protocol. When it is complete, the eos driver uses xrdcopy to 'upload' the file into eos. The latter step is something yet to be taken care of; tracked in cs3org/reva#700.

eos is a software-defined storage. It can do replication as well as erasure coding. We should either try to use a single replica, or set up a proper set of fst containers using volumes, so the size calculations all make sense.
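For testing with a single replica, the layout can presumably be forced per directory via extended attributes (a sketch based on documented EOS attribute names; verify against the deployed version):

docker-compose exec ocis eos -r 0 0 attr set sys.forced.layout=plain /eos/dockertest/reva
docker-compose exec ocis eos -r 0 0 attr set sys.forced.nstripes=1 /eos/dockertest/reva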

Other interesting eos commands:

eos status
eos health -a
eos space ls
eos space ls --io
eos fs ls
eos fs ls --io

and in case there is a limit, maybe

eos space config default space.headroom=5G

helps

in case the space is offline:

eos -r 0 0 space set default on

@jnweiger
Author

jnweiger commented Aug 21, 2020

On my 80 GB box, eos space ls --io reports 320 GB total space.
That means I could have a misconfiguration:

  • Each of the four fst has 80 GB, but that is the same 80 GB seen by all four of them.
  • It is in reality a total of 80 GB, and each should eat 20 GB max.
  • The headroom for a 6 GB upload should probably be set to 24 GB then??

for fsid in 1 2 3 4; do docker-compose exec ocis eos -r 0 0 fs dumpmd $fsid; echo --------; done
shows me different files on each fst. So it can also scatter the files, which makes it faster! (instead of replicating, which would make it slower?) 😕
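To check where a given file actually landed and how many replicas it has, the file info subcommand (visible in the logs above) can be pointed at the uploaded file:

docker-compose exec ocis eos -r 0 0 file info /eos/dockertest/reva/users/4/4c510ada-c86b-4815-8820-42cdf82c3d51/huge10g.dat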

@PVince81

in any case, I think we don't have a problem with uploads, so we should close this ticket.

@jnweiger if you have any setup issues or concerns, or think we should change the setup to work differently, please raise a different ticket.

@jnweiger
Author

I don't have a DUT anyway. Waiting for the next beta or RC.
Until then, I'd need advice on whether the docker-compose from the ocis repo + patch stack from github.io/ocis/eos is still the correct setup.

(What I tried to say above: each fst should have its own disk volume.)

@PVince81

@jnweiger both @butonic and I used the docker-compose setup from the ocis repo, without any patch, and were able to upload a 10 GB file. We had enough free space.

@jnweiger
Author

Thanks for confirming the docker-compose from the ocis repo! But as it is, it is without eos. We want the MVP for an eos customer.

@PVince81

indeed, but I see this as a discussion for another context: how to provide a deployment setup.

maybe we could reuse #165

@jnweiger
Author

Retested with ocis RC1 and desktop client beta4: upload of 5.5 GB works fine.
