
Hang when downloading a large blob #25358

Closed
ktaebum opened this issue Jul 22, 2022 · 7 comments
Labels
  • Client — This issue points to a problem in the data-plane of the library.
  • customer-reported — Issues that are reported by GitHub users external to the Azure organization.
  • issue-addressed — Workflow: The Azure SDK team believes it to be addressed and ready to close.
  • question — The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Storage — Storage Service (Queues, Blobs, Files)

Comments

ktaebum commented Jul 22, 2022

  • Package Name: azure-storage-blob
  • Package Version: 12.13.0
  • Operating System: Ubuntu 20.04 x86
  • Python Version: 3.10

Describe the bug
I think this is related to #10572.
I am trying to download a blob whose size is 24 GB.
I use BlobClient.download_blob and set max_concurrency to 32 on my Azure VM (VM size: Standard_D4ds_v5).
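For reference, a minimal sketch of the download call described above. The account URL, container, blob, and credential values are placeholders, not from the original report; this assumes the common pattern of streaming the blob into a local file.

```python
from azure.storage.blob import BlobClient

# Placeholder connection details for illustration only.
blob_client = BlobClient(
    account_url="https://<account>.blob.core.windows.net",
    container_name="<container>",
    blob_name="<large-blob>",  # the ~24 GB blob
    credential="<credential>",
)

# Download with 32 parallel connections, as described in the report.
with open("large-blob.out", "wb") as f:
    downloader = blob_client.download_blob(max_concurrency=32)
    downloader.readinto(f)
```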

Expected behavior
I expect the downloading to be completed successfully.

Screenshots
The download hangs, as shown in the following screenshot. (Ignore the MB unit; it is actually B, bytes.)
[Screenshot: 2022-07-22 11:18 AM]

Additional context
This is a heisenbug: sometimes the download finishes successfully.

I've seen that the previous issue was fixed by https://github.com/Azure/azure-sdk-for-python/pull/18164/files.
However, I think it would be better if users could configure max_retry, which is currently fixed at 3.

@ghost ghost added customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jul 22, 2022
@ktaebum ktaebum changed the title Hang when downloading large blob Hang when downloading a large blob Jul 22, 2022
@xiangyan99 xiangyan99 added Storage Storage Service (Queues, Blobs, Files) Client This issue points to a problem in the data-plane of the library. CXP Attention labels Jul 22, 2022
@ghost ghost added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Jul 22, 2022
ghost commented Jul 22, 2022

Thank you for your feedback. This has been routed to the support team for assistance.

jalauzon-msft (Member) commented
Hi @ktaebum, thanks for reaching out and sorry for the delay. A couple of follow-up questions/points.

  • How long do you wait once the download starts hanging? Do you ever get an error returned? I ask because we have a setting called read_timeout which has a very high default value of 80,000 seconds (we know this isn't great and will likely be changing it soon). So, if the server has stopped sending data, that's how long we will wait before raising an error. You can try configuring read_timeout on your client constructor to something more reasonable and waiting to see if you then get an error we can investigate further. See the README.
  • The retry logic in the issue you linked is only for response streaming which I'm not sure is taking place here. We have a different, higher-level, configurable retry policy that can be adjusted. See this section of the README for more details on that.
  • It may be interesting to enable debug logging if possible so we can see more details on the requests going out and maybe find the request that's hanging. You can see this page for more info but there's a bit there so here's a sample of one way you could enable the logging.
```python
import sys
import logging
from azure.storage.blob import BlobClient

# Set the logging level for the azure.storage.blob library
logger = logging.getLogger('azure.storage.blob')
logger.setLevel(logging.DEBUG)

# Direct logging output to stdout. Without adding a handler,
# no logging output is visible.
handler = logging.StreamHandler(stream=sys.stdout)
logger.addHandler(handler)

blob_client = BlobClient(..., logging_enable=True)
```

Ultimately, this is likely a server-side issue but let's try and gather some more info before involving the service team. Thanks!

ktaebum (Author) commented Jul 28, 2022

@jalauzon-msft Thanks for the reply.

I didn't wait long (just a couple of minutes), and no error was returned.
When I set read_timeout to 10 seconds, I saw the following:
[Screenshot: 2022-07-28 10:31 AM]

However, I've confirmed that the download does not fail if I set retry_total to a very large number.
Thanks for letting me know about logging. I will try it if I run into another problem.

ktaebum (Author) commented Jul 28, 2022

Unfortunately, Read timed out errors still occur (though not always) even when I set retry_total to a very large number (about 10,000,000) 🙁

jalauzon-msft (Member) commented Aug 27, 2022

Hi @ktaebum, apologies for the long delay. Read timeouts will be automatically retried by the SDK and it seems, from the screenshot you shared, this did help for that particular download as you see it time out and then continue. Changing the retry count will not eliminate read timeouts but will change the number of times a read timeout can be retried. Are you still seeing downloads not complete because of read timeouts? If they are completing after a read timeout, then the retry mechanism is working as expected and you should be good.

I would recommend setting your read_timeout to 60 seconds and your retry_total to something reasonable like 5-10. These are both changes we are planning to make to the defaults in the SDK in an upcoming release.
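Applying those recommendations, the settings can be passed as keyword arguments to the client constructor. This is a sketch with placeholder connection details; read_timeout and retry_total are the keyword names discussed earlier in this thread.

```python
from azure.storage.blob import BlobClient

# Placeholder connection details for illustration only.
blob_client = BlobClient(
    account_url="https://<account>.blob.core.windows.net",
    container_name="<container>",
    blob_name="<large-blob>",
    credential="<credential>",
    read_timeout=60,  # fail a stalled read after 60 seconds so it can be retried
    retry_total=10,   # allow up to 10 retries before giving up
)
```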

If you've done all this and are still having trouble downloading blobs, with so many read timeouts that the blob never finishes downloading, I would recommend opening a support ticket for your Storage account so the service team can investigate further. Thanks!

@jalauzon-msft jalauzon-msft added the issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. label Sep 1, 2022
ghost commented Sep 1, 2022

Hi @ktaebum. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text “/unresolve” to remove the “issue-addressed” label and continue the conversation.

@ghost ghost removed the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Sep 1, 2022
ghost commented Sep 8, 2022

Hi @ktaebum, since you haven’t asked that we “/unresolve” the issue, we’ll close this out. If you believe further discussion is needed, please add a comment “/unresolve” to reopen the issue.

@ghost ghost closed this as completed Sep 8, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023