How to speed up printing blob names in large blob container? #25243

Closed
pathumd opened this issue Jul 15, 2022 · 2 comments
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files)

Comments

pathumd commented Jul 15, 2022

Recently, I've been working on a Python script that prints a blob's name if it matches a keyword specified by the user. There are several blob containers in the storage account.

Everything seems to work fine, except for one blob container. This container contains roughly 1,100,000 blobs and it takes my Python script approximately 23 minutes to scan through all the blobs and check for a match.
I am fairly new to working with Azure in Python, and I was wondering if there is any possible way to speed up the process of printing blob names in a blob container.

The following code is how I am currently printing out the blob names:

```python
next_marker = None
while True:
    generator = container_client.list_blobs(marker=next_marker)

    for item in generator:
        if search_keyword in item.name:
            print("Container: {0}, Blob: {1}\n".format(container_client.container_name, item.name))

    # next_marker is still None when this check first runs,
    # so the loop exits after a single pass over the generator.
    if not next_marker:
        break
    next_marker = generator.next_marker
```
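For reference, here is a minimal sketch of the same scan without the manual marker loop. It assumes `container_client` is an `azure.storage.blob.ContainerClient` from the v12 SDK (the post does not say which version is in use). In that SDK, `list_blobs()` returns a pager that follows continuation tokens on its own, and `name_starts_with` filters by prefix on the service side. The helper name `matching_blob_names` and its `prefix` parameter are illustrative, not part of the SDK.

```python
from typing import Iterator, Optional


def matching_blob_names(container_client, keyword: str,
                        prefix: Optional[str] = None) -> Iterator[str]:
    """Yield the names of blobs whose name contains `keyword`.

    Assumes `container_client` behaves like a v12 ContainerClient:
    list_blobs() returns an iterable of objects with a `.name` attribute
    and fetches further result pages transparently.
    """
    # A prefix (if the naming scheme allows one) narrows the listing on
    # the service side; the substring check still runs on the client.
    for blob in container_client.list_blobs(name_starts_with=prefix):
        if keyword in blob.name:
            yield blob.name


# Hypothetical usage, mirroring the print statement in the post:
# for name in matching_blob_names(container_client, search_keyword):
#     print("Container: {0}, Blob: {1}\n".format(container_client.container_name, name))
```

A substring match still has to enumerate every blob, so this mainly simplifies the loop. The listing time drops only when a known prefix can restrict the scan.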
ghost added the customer-reported and question labels Jul 15, 2022
xiangyan99 added the Storage, Client, and CXP Attention labels Jul 15, 2022
ghost added the needs-team-attention label Jul 15, 2022

ghost commented Jul 15, 2022

Thank you for your feedback. This has been routed to the support team for assistance.

vincenttran-msft (Member) commented

Hi @pathumd, thanks for your inquiry. This appears to be a duplicate of this GitHub issue: Listing blobs names is very slow (#19755)

Rest assured, our team is aware of this scenario and it is on our radar for future feature work. Unfortunately, I do not have a good time estimate for when this feature will be completed, as we have other, higher-priority work to finish first.

With that said, I will mark this as closed for now. Feel free to post your specific scenario as a follow-up on the thread linked above so that we can track your request in one place. You may also re-open this issue if you find that it is not in fact a duplicate.

Thanks!

@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023
4 participants