Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azblob.Client.UploadStream allows incomplete uploads #21837

Closed
sam701 opened this issue Oct 26, 2023 · 11 comments · Fixed by #22109
Closed

azblob.Client.UploadStream allows incomplete uploads #21837

sam701 opened this issue Oct 26, 2023 · 11 comments · Fixed by #22109
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Storage Storage Service (Queues, Blobs, Files)

Comments

@sam701
Copy link

sam701 commented Oct 26, 2023

Bug Report

  • import path of package in question: github.com/Azure/azure-sdk-for-go/sdk/storage/azblob
  • SDK version: 1.2.0
  • output of go version: go version go1.21.1 darwin/arm64

What happened?

Call

  _, err = client.UploadStream(context.Background(), "container1", "test1", theReader, nil)

returns err == nil when theReader.Read(..) returns 0, io.ErrUnexpectedEOF

What did you expect or want to happen?

The err == io.ErrUnexpectedEOF

How can we reproduce it?

package main

import (
	"context"
	"io"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob"
)

type fakeReader struct {
	cnt int
}

func (a *fakeReader) Read(bytes []byte) (count int, err error) {
	if a.cnt < 2 {
		var buf [1024]byte
		n := copy(bytes, buf[:])
		a.cnt++
		return n, nil
	}
	return 0, io.ErrUnexpectedEOF
}

func main() {
	credential, err := azidentity.NewDefaultAzureCredential(nil)
	if err != nil {
		panic(err)
	}

	client, err := azblob.NewClient("https://mystorageaccount.blob.core.windows.net", credential, nil)
	if err != nil {
		panic(err)
	}

	r := &fakeReader{}
	_, err = client.UploadStream(context.Background(), "container1", "test/test1", r, nil)
	log.Println("error:", err)
}

This line in azblob treats io.ErrUnexpectedEOF as an expected error.

As an example http.Request.Body returns io.ErrUnexpectedEOF on connection lost. When I call

  _, err := client.UploadStream(context.Background(), "container1", "test1", req.Body, nil)

the returned err is nil and the container contains a partially uploaded file.

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-triage Workflow: This issue needs the team to triage. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files) labels Oct 26, 2023
@jhendrixMSFT jhendrixMSFT removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that needs-team-triage Workflow: This issue needs the team to triage. labels Oct 26, 2023
@github-actions github-actions bot added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Oct 26, 2023
@souravgupta-msft
Copy link
Member

Hi @sam701. Thanks for reaching out. io.ErrUnexpectedEOF is an expected error and this is by design.
Consider a case where the reader has closed, and the size of the last chunk being uploaded is less than the BlockSize. This part of the code will then return io.ErrUnexpectedEOF, which is an expected error.

@souravgupta-msft
Copy link
Member

For your use case where you are using req.Body as the reader, you can copy the data to a buffer or file and use UploadBuffer or UploadFile method instead.

@souravgupta-msft souravgupta-msft self-assigned this Nov 27, 2023
@sam701
Copy link
Author

sam701 commented Nov 27, 2023

@souravgupta-msft thanks for the heads up.

Consider a case where the reader has closed, and the size of the last chunk being uploaded is less than the BlockSize. This part of the code will then return io.ErrUnexpectedEOF, which is an expected error.

This is a good example. According to the Reader docs, the io.EOF is the only expected error. I'd say, it is not correct to use io.ReadFull in the code line you mentioned exactly because the last chunk might be smaller than the buffer. The underlying source Reader successfully returns io.EOF but the wrapping io.ReadFull returns io.UnexpectedEOF. Maybe using io.Reader.Read is enough here, i.e. n, err = src.Read(buffer)?

For your use case where you are using req.Body as the reader, you can copy the data to a buffer or file and use UploadBuffer or UploadFile method instead.

It is not always an option. Consider large uploads where the entire file being uploaded does not fit into the available memory.

@souravgupta-msft
Copy link
Member

souravgupta-msft commented Dec 1, 2023

Even if io.Reader.Read is used here, we will get io.EOF error when the connection is lost, which will be treated as an expected error here and the blob will be partially uploaded. So, I think you might have to check this at the client side.

@sam701
Copy link
Author

sam701 commented Dec 1, 2023

we will get io.EOF error when the connection is lost

What exactly implementation do you mean? http.Request.Body.Read will return io.UnexpectedEOF in case of connection lost, while net.TCPConn.Read will probably return a timeout error. As the documentation says, the only expected error is io.EOF. It looks like using io.ReadFull here and accepting io.UnexpectedEOF here is wrong because it hides potential io.UnexpectedEOF from the source reader. Maybe the fix will be just to replace io.ReafFull with src.Read(..) and remove io.UnexpectedEOF from the list of expected errors.

@sam701
Copy link
Author

sam701 commented Dec 1, 2023

My current fix is to wrap the http.Request.Body reader into a custom reader that replaces io.UnexpectedEOF with a custom error that is is not overwritten by io.ReadFull call as discussed above. But it is rather a hack.

@souravgupta-msft
Copy link
Member

One issue I see using src.Read(...) is that if users implement their own Reader interface like the fakeReader mentioned in the description, the src.Read(...) method will just read 1024 bytes (less than the block size, say 4MB) to the buffer. Then it will call the PutBlock API to stage it where the size of the block being staged is 1024 bytes. What we would rather want is to stage a block where its size is same as the block size (4MB, or any value provided in the options) unless it is the last block.

io.ReadFull helps here as it iterates and reads the entire block and then stage it.

@sam701
Copy link
Author

sam701 commented Dec 4, 2023

It is an absolutely fair point that you want to fill the buffer as much as possible. io.ReadFull calls io.ReadAtLeast that in turn returns io.ErrUnexpectedEOF if the buffer is not full. The behavior of io.ReadAtLeast is correct, since the documentation states ErrUnexpectedEOF means that EOF was encountered in the middle of reading a fixed-size block or data structure. But io.ReadAtLeast does not fit into the current situation when the final block might be less than the buffer.

Maybe you can implement your own readFull function where you copy io.ReadAtLeast but remove the part that returns io.ErrUnexpectedEOF?

@souravgupta-msft
Copy link
Member

This is a good point. Let me discuss this with my team as well.

@souravgupta-msft
Copy link
Member

The fix has been merged and will be part of the next release.

@sam701
Copy link
Author

sam701 commented Dec 8, 2023

@souravgupta-msft Thank you for implementing!

@github-actions github-actions bot locked and limited conversation to collaborators Mar 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants