Data race when downloading blob item from storage container #22156

Closed
Todorov99 opened this issue Dec 18, 2023 · 11 comments · Fixed by #22334
Labels: AzBlob, bug, Client, customer-reported, needs-team-attention, Storage

Comments

@Todorov99


  • Imported package: github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/container
  • SDK version: 1.2.0
  • Go version: 1.21.5

Running one of our tests with the race detector flag (ginkgo --race ...) reported this data race issue:

==================
WARNING: DATA RACE
Read at 0x00c001c719c8 by goroutine 1206:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.(*mmbPool).Grow()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/buffer_manager.go:49 +0x3c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:474 +0x7c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func4()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:524 +0xb0
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:65 +0x9c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func1()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:50 +0x4b

Previous write at 0x00c001c719c8 by goroutine 1208:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.(*mmbPool).Grow()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/buffer_manager.go:55 +0xfc
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:474 +0x7c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func4()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:524 +0xb0
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:65 +0x9c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func1()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:50 +0x4b

Goroutine 1206 (running) created at:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:48 +0x2dc
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:517 +0x1164
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:624 +0x35e
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob/client.go:588 +0x10e

Goroutine 1208 (running) created at:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:48 +0x2dc
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:517 +0x1164
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:624 +0x35e
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob/client.go:588 +0x10e
==================
==================
WARNING: DATA RACE
Write at 0x00c001c719c8 by goroutine 1207:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.(*mmbPool).Grow()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/buffer_manager.go:55 +0xfc
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:474 +0x7c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func4()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:524 +0xb0
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:65 +0x9c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func1()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:50 +0x4b

Previous read at 0x00c001c719c8 by goroutine 1204:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.(*mmbPool).Grow()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/buffer_manager.go:57 +0x117
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:474 +0x7c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func4()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:524 +0xb0
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func2()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:65 +0x9c
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer.func1()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:50 +0x4b

Goroutine 1207 (running) created at:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:48 +0x2dc
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:517 +0x1164
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:624 +0x35e
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob/client.go:588 +0x10e


Goroutine 1204 (running) created at:
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.DoBatchTransfer()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared/batch_transfer.go:48 +0x2dc
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:517 +0x1164
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob/client.go:624 +0x35e
  github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob.(*Client).DownloadFile()
      vendor/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob/client.go:588 +0x10e
==================

github-actions bot added the Client, customer-reported, needs-team-triage, question, and Storage labels on Dec 18, 2023
@vibhansa-msft
Member

Does this create any actual issue or data corruption problem for you, or is the tool just pointing out that there is a race condition here?

jhendrixMSFT removed the question and needs-team-triage labels on Jan 2, 2024
github-actions bot added the needs-team-attention label on Jan 2, 2024
@jhendrixMSFT
Member

jhendrixMSFT commented Jan 2, 2024

I don't think data corruption is a risk here. Since this happens when growing the buffer pool, it could cause us to over-allocate buffers, which would cause Grow() to hang if the pool is already full. This appears to be a problem only for downloading, which can call Grow() on multiple goroutines; uploading doesn't have this issue.

Note that we document that Grow() must not be called concurrently. The way download is implemented violates this invariant.
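
To illustrate the failure mode, here is a minimal sketch of the check-then-act pattern (bufferPool is a hypothetical stand-in for the SDK's mmbPool, not its actual buffer_manager.go source): two goroutines can both pass the size check, over-allocating the pool so that a later channel send blocks forever.

type bufferPool struct {
	buffers chan []byte // capacity == max outstanding buffers
	count   int         // read and written without synchronization
	max     int
}

func (p *bufferPool) Grow() bool {
	if p.count == p.max { // racy read: two goroutines can both pass this check
		return false
	}
	p.count++ // racy write: this is what -race reports
	// Once the pool is over-allocated, this send (or a later Release)
	// finds the channel full and blocks forever.
	p.buffers <- make([]byte, 1<<20)
	return true
}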

@Todorov99
Author

Todorov99 commented Jan 8, 2024

Does this create any actual issue or data corruption problem for you, or is the tool just pointing out that there is a race condition here?

Hi @vibhansa-msft,
In my use case there was no data corruption. Running the test with the race detector flag just pointed out that there is a race condition when downloading an artifact from an Azure storage container.

@roelarents

We seem to experience this issue (or at least hangs/freezes for some reason) when using Argo Workflows. The call to the SDK is here. It's hard to reproduce because it only happens sporadically. We download about 10 to 100 blobs/files in a loop (the DownloadDirectory func in Argo Workflows) in that specific step of our workflow. It hangs an estimated 1 in 1,000 times.

@souravgupta-msft
Member

Running one of our tests with the race detector flag (ginkgo --race ...) reported this data race issue

@Todorov99, can you please share the steps to replicate this?

@kgadams

kgadams commented Jan 11, 2024

Hi, I work in the same team as Todor.
The call that caused the data race was blockblob.Client.DownloadFile() with a Concurrency of 5.
Since we observed this, we set the concurrency to 1 and it has not occurred since.
Hope this helps.
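
For reference, the workaround looks roughly like this (a sketch, not our exact code; downloadSerially is a hypothetical helper, and the import paths assume the standard non-vendored module path):

import (
	"context"
	"os"

	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob"
	"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob"
)

// downloadSerially forces Concurrency to 1 so the download path never
// calls the buffer pool's Grow() from more than one goroutine.
func downloadSerially(ctx context.Context, blobClt *blockblob.Client, f *os.File) error {
	_, err := blobClt.DownloadFile(ctx, f, &blob.DownloadFileOptions{
		Concurrency: 1,
	})
	return err
}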

@Todorov99
Author

Todorov99 commented Jan 11, 2024

Running one of our tests with the race detector flag (ginkgo --race ...) reported this data race issue

@Todorov99, can you please share the steps to replicate this?

Here is a code sample of a function that, when included in a Ginkgo It container and executed with the race detector flag (ginkgo --race), may report the stack trace I mentioned above.

func (s *StorageAccount) DownloadBlobItem(ctx context.Context, containerClient *container.Client, containerName, blobName, destination string) error {
	blobItemNames, err := s.containerBlobItemsNames(ctx, containerClient)
	if err != nil {
		return err
	}

	for _, blobItemName := range blobItemNames {
		fileBasePath := filepath.Base(*blobItemName)
		if fileBasePath == blobName {
			blobClt := containerClient.NewBlockBlobClient(*blobItemName)
			f, err := os.Create(destination)
			if err != nil {
				return err
			}

			// nil options: DownloadFile falls back to the default
			// Concurrency of 5, which is what triggers the race.
			_, err = blobClt.DownloadFile(ctx, f, nil)
			if err != nil {
				f.Close()
				return err
			}
			if err := f.Close(); err != nil {
				return err
			}
		}
	}
	return nil
}

I think the issue is that when nil is provided for the DownloadFileOptions (the third argument of DownloadFile), the default value of 5 is used for the Concurrency property, which reproduces the situation described by @jhendrixMSFT.

Note: the stack trace report from the race detector is not consistently reproducible.

@souravgupta-msft
Member

Thanks for the info. This bug was introduced in commit 1dc804a. We will fix it in the next release.

souravgupta-msft added this to the Storage-Feb-24 milestone on Jan 11, 2024
vibhansa-msft added the bug label on Jan 15, 2024
@godelized

godelized commented Jan 15, 2024

Hello there.
We have been trying to migrate from https://pkg.go.dev/github.com/Azure/azure-storage-blob-go to the new SDK and encountered downloads getting stuck.
In our setup, we use SDK v1.2.1 with parallel downloads.

Here are stack dumps of the blocked goroutines.
You can see that it never completes; the goroutines have been blocked on channels for more than 4 hours.

goroutine 82055 [chan receive, 250 minutes]:
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile(0xc000f28c18, {0x1f5e680, 0xc002d50960}, {0x1f3e700?, 0xc0024b6438}, {{0x0, 0x0}, 0x400000, 0x0, 0x0, ...})
	/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/storage/[email protected]/blob/client.go:553 +0xaf7
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).DownloadFile(0xc000f28b70?, {0x1f5e680, 0xc002d50960}, 0x17?, 0xc002625b20?)
	/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/storage/[email protected]/blob/client.go:626 +0x28b
github.com/algolia/metis/modules/pkg/objectstorage/azure.Container.download({0xc0004c00a0, 0xc0004bc2d0, {0xc00005c070, 0x6}, 0xc000e74690, 0xc0004d82d0}, {0x1f5e680, 0xc002d50960}, {0xc002193f00, 0x3b}, ...)
	/go/src/github.com/algolia/metis/modules/pkg/objectstorage/azure/azure.go:257 +0x6ae
github.com/algolia/metis/modules/pkg/objectstorage/azure.Container.Download({0xc0004c00a0, 0xc0004bc2d0, {0xc00005c070, 0x6}, 0xc000e74690, 0xc0004d82d0}, {0x1f5e680, 0xc002d50960}, {0xc002193f00, 0x3b}, ...)
	/go/src/github.com/algolia/metis/modules/pkg/objectstorage/azure/azure.go:209 +0x105
github.com/algolia/metis/modules/pkg/objectstorage.DownloadFileResult.func1()
	/go/src/github.com/algolia/metis/modules/pkg/objectstorage/helpers.go:60 +0x6b
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 155
	/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:72 +0x96
goroutine 82087 [chan send, 250 minutes]:
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/internal/shared.(*mmbPool).Release(0xc0024b6438?, {0x7fa8e6640000, 0x400000, 0x400000})
	/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/storage/[email protected]/internal/shared/buffer_manager.go:61 +0x2a
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile.func3(0xc001d01560?)
	/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/storage/[email protected]/blob/client.go:508 +0x1cd
created by github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob.(*Client).downloadFile in goroutine 82055
	/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/storage/[email protected]/blob/client.go:500 +0x96d
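
The "chan send" state in goroutine 82087 is consistent with the over-allocation scenario described earlier: Release appears to be a plain send into the pool's buffered channel, so once more buffers are in flight than the channel can hold, the final Release can never complete. A minimal illustration of that blocked state (an assumption about the pool's shape, not the SDK's actual code; running this as a main function makes Go's runtime report a deadlock):

	buffers := make(chan []byte, 1) // pool sized for one buffer
	buffers <- make([]byte, 4)      // first Release succeeds
	buffers <- make([]byte, 4)      // an over-allocated Release blocks here forever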

@souravgupta-msft
Member

There is a bug in the DownloadFile method that was introduced in commit 1dc804a. We will fix it in the next release.
For temporary mitigation, you can try azblob v1.1.0.
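
Pinning the module back to that release can be done with standard Go tooling:

	go get github.com/Azure/azure-sdk-for-go/sdk/storage/azblob@v1.1.0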

roelarents added a commit to roelarents/argo-workflows that referenced this issue Jan 16, 2024
roelarents added a commit to roelarents/argo-workflows that referenced this issue Jan 17, 2024
vibhansa-msft linked a pull request on Feb 1, 2024 that will close this issue
vibhansa-msft linked a pull request on Feb 5, 2024 that will close this issue
@souravgupta-msft
Member

Hi, we have released the bug fix. Please update to the latest release, azblob v1.3.0.
