client upload of 5.7 GB fails with internal server error 500 #128
Michael Barz commented: Scope: debug the problem and find out what is wrong.
Vincent Petry commented: Is that version of the client using TUS? A snippet of the OCIS log would have been nice (and could also reveal whether TUS was used or not on the WebDAV layer).
Vincent Petry commented: The UI shows a POST request, so it's likely TUS with the "creation with upload" extension, since a regular upload would use PUT. So this is a single-request upload and it fails with 500.
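For reference, a minimal sketch of what such a single-request TUS "creation with upload" POST looks like on the wire. The endpoint URL, credentials and file name are placeholders, not values from this issue:

```go
// Hypothetical sketch of a single-request TUS "creation with upload" POST.
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
	"os"
	"strconv"
)

func main() {
	f, err := os.Open("big.bin") // placeholder: a large local test file
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, _ := f.Stat()

	// One POST both creates the upload resource and carries the whole body
	// (TUS "creation-with-upload"); a plain WebDAV upload would use PUT instead.
	req, _ := http.NewRequest(http.MethodPost,
		"https://ocis.example.com/remote.php/dav/files/demo/", f) // placeholder URL
	req.ContentLength = fi.Size()
	req.Header.Set("Tus-Resumable", "1.0.0")
	req.Header.Set("Upload-Length", strconv.FormatInt(fi.Size(), 10))
	req.Header.Set("Upload-Metadata",
		"filename "+base64.StdEncoding.EncodeToString([]byte("big.bin")))
	req.Header.Set("Content-Type", "application/offset+octet-stream")
	req.SetBasicAuth("demo", "secret") // placeholder credentials

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A 201 Created with a Location header means the server accepted the upload;
	// a 500 at this point is what this issue is about.
	fmt.Println(resp.Status, "Location:", resp.Header.Get("Location"))
}
```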
Reproducible against EOS with the following curl command:
output:
Server log:
That "read error" might be a timeout or lost connection between the services. We've seen this message before for downloads but likely unrelated: https://github.com/owncloud/ocis-proxy/issues/86 |
One theory so far is that the connection we are using between the services might time out after a minute or so. Even if there's a connection drop, we should make the internal TUS request (not the one on the WebDAV level) resume the upload to the internal service. cc @butonic in case you have an idea about internal timeouts.
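To illustrate the resume idea: with TUS, a client can recover from a dropped connection by asking the server for the current offset (HEAD) and then sending the remaining bytes from that offset (PATCH). A rough sketch under those protocol assumptions, not reva's actual internal client; uploadURL and the file path are placeholders:

```go
// Rough sketch of resuming an interrupted TUS upload (not reva's internal client).
package tusresume

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strconv"
)

// Resume asks the server how much it already received (HEAD -> Upload-Offset)
// and then PATCHes the remaining bytes from that offset.
// uploadURL and path are placeholders for the upload resource and the local file.
func Resume(uploadURL, path string) error {
	head, err := http.NewRequest(http.MethodHead, uploadURL, nil)
	if err != nil {
		return err
	}
	head.Header.Set("Tus-Resumable", "1.0.0")
	resp, err := http.DefaultClient.Do(head)
	if err != nil {
		return err
	}
	resp.Body.Close()
	offset, err := strconv.ParseInt(resp.Header.Get("Upload-Offset"), 10, 64)
	if err != nil {
		return err
	}

	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	// Skip the bytes the server already has and stream the rest.
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		return err
	}

	patch, err := http.NewRequest(http.MethodPatch, uploadURL, f)
	if err != nil {
		return err
	}
	patch.Header.Set("Tus-Resumable", "1.0.0")
	patch.Header.Set("Upload-Offset", strconv.FormatInt(offset, 10))
	patch.Header.Set("Content-Type", "application/offset+octet-stream")
	res, err := http.DefaultClient.Do(patch)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusNoContent {
		return fmt.Errorf("resume failed: %s", res.Status)
	}
	return nil
}
```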
Here is the same test against the ownCloud storage, which uploads fine:
As we can see, it takes fewer seconds to complete.
Need to rerun the command, as I discovered that …

Now the ownCloud storage with a 10 GB file:
Now 1 minute, but no trace of timeouts or errors.
Next up: digging more into the Reva code and finding out what …
As far as I can see in the code, there should be log entries with the text "eos cmd", some of which might include the output of stderr, which might contain more information. I'll try to filter those in my next run.
I added more log messages in reva:

```diff
diff --git a/pkg/eosclient/eosclient.go b/pkg/eosclient/eosclient.go
index 2ea4891..29f3a5d 100644
--- a/pkg/eosclient/eosclient.go
+++ b/pkg/eosclient/eosclient.go
@@ -179,6 +179,7 @@ func (c *Client) execute(ctx context.Context, cmd *exec.Cmd) (string, string, er
         cmd.Env = append(cmd.Env, "XrdSecSSSKT="+c.opt.Keytab)
     }
+    print("### EOS: execute() running command " + cmd.String())
     err := cmd.Run()
     var exitStatus int
@@ -231,6 +232,7 @@ func (c *Client) executeEOS(ctx context.Context, cmd *exec.Cmd) (string, string,
     trace := trace.FromContext(ctx).SpanContext().TraceID.String()
     cmd.Args = append(cmd.Args, "--comment", trace)
+    print("### EOS: execute() running command " + cmd.String())
     err := cmd.Run()
     var exitStatus int
@@ -554,7 +556,11 @@ func (c *Client) Write(ctx context.Context, uid, gid, path string, stream io.Rea
 func (c *Client) WriteFile(ctx context.Context, uid, gid, path, source string) error {
     xrdPath := fmt.Sprintf("%s//%s", c.opt.URL, path)
     cmd := exec.CommandContext(ctx, c.opt.XrdcopyBinary, "--nopbar", "--silent", "-f", source, xrdPath, fmt.Sprintf("-ODeos.ruid=%s&eos.rgid=%s", uid, gid))
-    _, _, err := c.execute(ctx, cmd)
+    stdout, stderr, err := c.execute(ctx, cmd)
+    print("### EOS: WriteFile done\n")
+    print("stdout: " + stdout + "\n")
+    print("stderr: " + stderr + "\n")
+    print("######\n")
     return err
 }
diff --git a/pkg/eosclientgrpc/eosclientgrpc.go b/pkg/eosclientgrpc/eosclientgrpc.go
index def76ba..7831b1d 100644
--- a/pkg/eosclientgrpc/eosclientgrpc.go
+++ b/pkg/eosclientgrpc/eosclientgrpc.go
@@ -1285,6 +1285,7 @@ func (c *Client) execute(ctx context.Context, cmd *exec.Cmd) (string, string, er
         cmd.Env = append(cmd.Env, "XrdSecSSSKT="+c.opt.Keytab)
     }
+    print("### EOS: execute() running command " + cmd.String())
     err := cmd.Run()
     var exitStatus int
diff --git a/pkg/storage/utils/eosfs/upload.go b/pkg/storage/utils/eosfs/upload.go
index ce24e60..5b89162 100644
--- a/pkg/storage/utils/eosfs/upload.go
+++ b/pkg/storage/utils/eosfs/upload.go
@@ -242,6 +242,7 @@ func (upload *fileUpload) GetReader(ctx context.Context) (io.Reader, error) {
 // WriteChunk writes the stream from the reader to the given offset of the upload
 // TODO use the grpc api to directly stream to a temporary uploads location in the eos shadow tree
 func (upload *fileUpload) WriteChunk(ctx context.Context, offset int64, src io.Reader) (int64, error) {
+    print("### EOS WriteChunk: " + upload.binPath + "\n")
     file, err := os.OpenFile(upload.binPath, os.O_WRONLY|os.O_APPEND, defaultFilePerm)
     if err != nil {
         return 0, err
     }
@@ -303,6 +304,7 @@ func (upload *fileUpload) FinishUpload(ctx context.Context) error {
     // eos creates revisions internally
     //}
+    print("### EOS: WriteFile\n")
     err := upload.fs.c.WriteFile(ctx, upload.info.Storage["UID"], upload.info.Storage["GID"], np, upload.binPath)
     // only delete the upload if it was successfully written to eos
```

And the matching logs for the huge upload of 10 GB:
However, I never see any of the "WriteFile" or "WriteChunk" messages. It's as if the TUS code path is using yet another method to talk to EOS. So this means there is another hidden code path somewhere, maybe in a library, that also triggers this message.
Strange... after restarting and recompiling, I now get more messages for that huge upload:
And strange that I get a permission error here, which is likely the reason for the 500:
Damn... I also receive the same error for small files... so it's likely not the same as what @jnweiger saw. And now I see that upload does not work in the web UI either... I thought I saw it working earlier today... I'll need to find out what's different in my env to make uploads work first... @jnweiger, did you have any OCIS log entries from the failure?
I had an issue in my env related to the storage layout. I've adjusted it according to https://owncloud.github.io/ocis/eos to use the opaque id. But still:
I managed to fix my env by starting completely from scratch. Now small file uploads work, but the 10 GB file fails with "no space left":
I do have 64 GB free, but if there's indeed a bug where the file is replicated several times, I might be bumping against that limit...
Okay. I am running my tests on an 80 GB Hetzner box, and we need to test 6 GB uploads for the MVP. That should cover it then. Good to know about the replication! My setup has 4 FST instances running, so if they do replication, I figure it is plausible to have one temporary copy and four final copies, giving us a total factor of 5.
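As a back-of-the-envelope check of that assumed factor of 5 (the replica count and file sizes are taken from the comments above; nothing here is measured):

```go
// Back-of-the-envelope space estimate, assuming 1 temp copy + 4 FST replicas.
package main

import "fmt"

func main() {
	const copies = 5 // assumption from the thread: 1 temporary + 4 final copies

	for _, fileGB := range []float64{6, 10} {
		fmt.Printf("%2.0f GB upload -> up to %3.0f GB of raw space\n", fileGB, fileGB*copies)
	}
	// 6 GB  -> ~30 GB, comfortably below the 80 GB box
	// 10 GB -> ~50 GB, close to the 64 GB reported free above, which could
	//          explain the "no space left" error once other data is on disk
}
```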
What happens is that the bytes actually get transferred to a temp space first; that operation is rather quick.
The current eoshome and eos storage drivers need a temp file for uploads. It is written using the TUS protocol. When it is complete, the eos driver copies it into EOS (via xrdcopy in WriteFile, as seen in the diff above); a simplified sketch of this two-step flow follows after this comment. EOS is a software-defined storage; it can do replication as well as erasure coding. We should either try to use a single replica, or set up a proper set of FST containers using volumes, so the size calculations all make sense. EOS also has other interesting commands:
and in case there is a limit maybe
helps in case the space is offline:
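To make that two-phase flow concrete, here is a minimal sketch modeled on the reva eosfs code in the diff above; the function names, flags and error handling are simplified and not reva's exact API:

```go
// Simplified sketch of the two-phase eosfs upload flow described above,
// modeled on the diff earlier in this thread. Names and flags are
// illustrative, not reva's exact API.
package eosupload

import (
	"fmt"
	"io"
	"os"
	"os/exec"
)

// writeChunk appends a TUS chunk to the local staging file (fast, local disk).
func writeChunk(binPath string, src io.Reader) (int64, error) {
	f, err := os.OpenFile(binPath, os.O_WRONLY|os.O_APPEND|os.O_CREATE, 0o644)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(f, src)
}

// finishUpload pushes the completed staging file into EOS via xrdcopy; this is
// the step where replication (or erasure coding) multiplies the raw space used.
func finishUpload(binPath, eosURL, eosPath, uid, gid string) error {
	xrdPath := fmt.Sprintf("%s//%s", eosURL, eosPath)
	cmd := exec.Command("xrdcopy", "--nopbar", "--silent", "-f", binPath, xrdPath,
		fmt.Sprintf("-ODeos.ruid=%s&eos.rgid=%s", uid, gid))
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("xrdcopy failed: %v: %s", err, out)
	}
	// The temp copy is only removed after a successful write to EOS.
	return os.Remove(binPath)
}
```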
On my 80 GB box,
In any case, I think we don't have a problem with uploads, so we should close this ticket. @jnweiger, if you have any setup issues or concerns, or think we should change the setup to work differently, please raise a different ticket.
I don't have a DUT anyway. Waiting for the next beta or RC. (What I tried to say above: each FST should have its own disk volume.)
Thanks for confirming the docker-compose from the ocis repo! But as it is, it is without EOS. We want the MVP for an EOS customer.
Indeed, but I see this as a discussion for another context: how to provide a deployment setup. Maybe we could reuse #165.
Retested with ocis RC1 and desktop client beta4: upload with 5.5 GB works fine.
Reproduced on ocis 1.0.0-beta8 with EOS installed via https://github.com/owncloud-docker/compose-playground/blob/master/examples/hetzner-deploy/make_ocis_eos_compose_test.sh
On a second Hetzner cloud machine, start an Ubuntu desktop and install the desktop client:
ownCloud version 2.7.0daily20200708 (build 2764)
Expected behaviour:
Please advise further debugging.
eos newfind /