storage: multipart: stream error: stream ID 1; INTERNAL_ERROR #987
The internal error ultimately comes from the server. We don't know why it's sending that. I see the upload type is multipart. That either means you're sending data that is smaller than your chunk size, or you've set ChunkSize to zero. If you find that just creating a new writer helps, let us know. Canceling the context and retrying should work and is a reasonable workaround. Anything you can provide that will let us reproduce this issue would be really helpful.
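(Editorial note: a minimal sketch of that cancel-and-retry workaround, assuming a 2-minute budget per attempt; the helper name, timeout, and retry count are ours, not from this thread or the client API. `Writer.ChunkSize` is the field referred to above: content smaller than it, or any upload with it set to 0, is sent in a single multipart request, otherwise a resumable upload is used.)

```go
package example

import (
    "context"
    "fmt"
    "time"

    "cloud.google.com/go/storage"
)

// uploadWithRetry bounds each attempt with its own context, cancels it on
// failure, and retries with a brand-new Writer. Names and limits are illustrative.
func uploadWithRetry(ctx context.Context, obj *storage.ObjectHandle, data []byte, attempts int) error {
    var lastErr error
    for i := 0; i < attempts; i++ {
        attemptCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)

        w := obj.NewWriter(attemptCtx)
        // Content smaller than ChunkSize (or any upload with ChunkSize set to 0)
        // goes to GCS in a single request; otherwise a resumable upload is used.
        w.ChunkSize = 8 << 20 // 8 MiB, for illustration

        _, err := w.Write(data)
        if err == nil {
            err = w.Close()
        }
        // Canceling the attempt's context ensures no partial object is kept if
        // the attempt failed; it is harmless after a successful Close.
        cancel()
        if err == nil {
            return nil
        }
        lastErr = fmt.Errorf("attempt %d: %v", i+1, err)
    }
    return lastErr
}
```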
Thank you so much for the quick answer @jba, this is strongly impacting us and we were about to open a ticket in the console. In our case we didn't override the ChunkSize, so it defaults to 8 MB; however, the files are mostly smaller than that.
We deployed that fix an hour ago; unfortunately it doesn't help. The context consistently reaches the 2-minute timeout. Note that this has only been affecting us since 4am UTC today, and not during the 3 weeks this piece of code has been deployed. I see my options as:
I'd be very open to sharing more information about our project and more specifics if needed. I feel like this really is related to the server side of GCS, since on our side neither the data sizes nor the code changed. Let me know what the best way to move forward would be. I'm pretty much dedicated to fixing this issue, given that it currently delays the processing of our production services.
I just tried again to reproduce this issue by writing a thousand 1 MB files concurrently. They all succeeded. Can you suggest another pattern that may match what you're doing better? If you didn't change your code, then it does sound like a server-side issue. I would recommend opening a ticket. If the context timeout isn't helping, then I don't think the long delay is due to retries. But you can add logging around here to make sure.
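(Editorial note: the logging suggestion above points at the library's internal retry loop. One way to get similar visibility without patching the library is to wrap the HTTP transport the client uses, so every request, its duration, and its outcome are logged; this shows whether time is lost in retries or in a single hung call. The wrapper type and helper below are our own sketch, not part of the storage package.)

```go
package example

import (
    "context"
    "log"
    "net/http"
    "time"

    "cloud.google.com/go/storage"
    "google.golang.org/api/option"
    htransport "google.golang.org/api/transport/http"
)

// loggingTransport prints every request the storage client makes, with its
// duration and outcome, so retries and hung calls become visible.
type loggingTransport struct {
    base http.RoundTripper
}

func (t *loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
    start := time.Now()
    resp, err := t.base.RoundTrip(req)
    if err != nil {
        log.Printf("%s %s failed after %s: %v", req.Method, req.URL, time.Since(start), err)
        return nil, err
    }
    log.Printf("%s %s -> %s in %s", req.Method, req.URL, resp.Status, time.Since(start))
    return resp, nil
}

// newLoggingClient builds an authenticated transport and wraps it with logging.
func newLoggingClient(ctx context.Context) (*storage.Client, error) {
    base, err := htransport.NewTransport(ctx, http.DefaultTransport)
    if err != nil {
        return nil, err
    }
    hc := &http.Client{Transport: &loggingTransport{base: base}}
    return storage.NewClient(ctx, option.WithHTTPClient(hc))
}
```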
In my current use case, I don't even write files concurrently. Here is our current implementation (only relevant part):

```go
// Store stores gzip compressed file into GCS and publishes a message in a Pubsub topic according to the log kind.
func (s *LegacyStore) Store(ctx context.Context, kind LogKind, r io.Reader, ts time.Time) (written int64, err error) {
    ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
    // When the context is canceled, GCS writers will abort their job and ensure no object is stored.
    // It is important NOT to Close the GCS writer if something went wrong.
    // See: https://godoc.org/cloud.google.com/go/storage#Writer.CloseWithError
    defer cancel()

    w := s.writer.Resolve(ctx, s.bucket, s.StoragePath(kind, ts))
    gzipWriter := gzip.NewWriter(w)

    written, err = io.Copy(gzipWriter, r)
    if err != nil {
        gzipWriter.Close()
        return
    }
    err = gzipWriter.Close()
    if err != nil {
        return
    }
    err = w.Close()
    if err != nil {
        return
    }
    err = s.notifier.Notify(ctx, kind, s.hostname, ts)
    return
}

// Resolve resolves a writer that writes an object in Google Cloud Storage.
func (w *GCSWriterResolver) Resolve(ctx context.Context, bucket string, name string) io.WriteCloser {
    b := w.client.Bucket(bucket)
    o := b.Object(name)
    writer := o.NewWriter(ctx)
    writer.ContentType = "application/x-gzip" // We know this is wrong, and we will adapt in the future.
    return writer
}
```

Is there anything that looks wrong to you?
I don't see anything wrong with your code.
I took another look at this, to see if the information in golang/go#26338 was helpful. There, the problem was that writing took too long and tripped a timeout in the Go HTTPS server. Although GCS doesn't use Go, I tried something similar anyway: I slowed down my writes so that it took almost 12 hours to write a 20 MB file to GCS. The write succeeded. Do you have any further updates from your end? Are you still encountering the problem? I'm inclined to close this issue, because whatever the problem is, it's not in the storage client.
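(Editorial note: the slow-write experiment described above can be approximated with something like the sketch below; the throttling rate and helper type are illustrative, not the actual test code from this thread. At 512 B/s, a 20 MB object takes roughly 11 hours.)

```go
package example

import (
    "context"
    "io"
    "time"

    "cloud.google.com/go/storage"
)

// slowReader hands out data in small pieces with a pause before each read,
// stretching an upload out over many hours.
type slowReader struct {
    src   io.Reader
    chunk int
    pause time.Duration
}

func (r *slowReader) Read(p []byte) (int, error) {
    if len(p) > r.chunk {
        p = p[:r.chunk]
    }
    time.Sleep(r.pause)
    return r.src.Read(p)
}

// slowUpload copies src into the object through the throttled reader.
func slowUpload(ctx context.Context, obj *storage.ObjectHandle, src io.Reader) error {
    w := obj.NewWriter(ctx)
    if _, err := io.Copy(w, &slowReader{src: src, chunk: 512, pause: time.Second}); err != nil {
        return err
    }
    return w.Close()
}
```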
@jba thanks for the investigation. I indeed think the issue doesn't come from this client. I'll close the issue.
We have a serious issue on our production machines.
We recorded errors like the following while sending objects: storage: multipart: stream error: stream ID 1; INTERNAL_ERROR
The main issue is that when this arises, it takes 47 minutes for the operation to fail. As a result, our operations are all very late.
After this error, something seems to be left in a weird state: we are unable to keep using the client properly and need to manually restart our services.
We are using version 0.20.
Any help would be much appreciated as this is currently affecting a lot of our production servers.