get_block_list(...): 'utf-8' codec can't decode byte 0x92 in position 5: invalid start byte #16314

Closed
ecc256 opened this issue Jan 24, 2021 · 3 comments
Assignees
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)

Comments

@ecc256
ecc256 commented Jan 24, 2021

azure-storage-blob
12.7.1
Windows Server 2016
Conda Python 3.9.0

The code:
block_list = blob_client.get_block_list(block_list_type='committed')
throws
# 'utf-8' codec can't decode byte 0x92 in position 5: invalid start byte
for any block blob that is a web log created by Azure App Service logging.
It works fine for any block blob created with the Python SDK, where my code UTF-8 encodes every block ID:
block_id = f'{a}'.encode('utf-8')

Did I do something wrong?
Or is there a way to read the block list of a block blob whose block IDs are not UTF-8 encoded?

GetBlockList() works without issues in the C# SDK.
It produces a long list of block names like:
AAAAAD2SEsiRhJdJhjwCnnXFI3U=
AQAAALoHi8mdLzRCucUfofA4DuU=
AgAAAAyB31Bx305PvQ/T7vtFpVg=
AwAAAA11eqSgqvBCogtQtmRIoQ0=
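
The reported error can be reproduced from the first block name above using only the standard library. This is a minimal sketch: it assumes the Python SDK base64-decodes each block ID and then decodes the raw bytes as UTF-8, which is a guess consistent with the error message and not confirmed against the SDK source. `decode_block_id` is a hypothetical helper and not part of azure-storage-blob.

```python
import base64

# Block IDs exactly as listed by the C# SDK in this report (base64 text).
block_ids = [
    "AAAAAD2SEsiRhJdJhjwCnnXFI3U=",
    "AQAAALoHi8mdLzRCucUfofA4DuU=",
    "AgAAAAyB31Bx305PvQ/T7vtFpVg=",
    "AwAAAA11eqSgqvBCogtQtmRIoQ0=",
]


def decode_block_id(b64_id):
    """Hypothetical helper: return text if the raw ID is valid UTF-8,
    otherwise return the raw bytes unchanged."""
    raw = base64.b64decode(b64_id)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


# The raw bytes behind the first ID are binary, not text.
raw = base64.b64decode(block_ids[0])
print(hex(raw[5]))  # 0x92

# Decoding those bytes as UTF-8 fails with the message from the report.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

Byte 5 of the decoded ID is 0x92, a UTF-8 continuation byte that cannot start a character. That matches the position and byte value in the exception, which suggests the block IDs written by App Service logging are arbitrary binary values and not encoded text.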

pip list
azure-common 1.1.26
azure-core 1.10.0
azure-eventhub 5.2.0
azure-eventhub-checkpointstoreblob-aio 1.1.1
azure-identity 1.5.0
azure-kusto-data 1.0.3
azure-kusto-ingest 1.0.3
azure-mgmt-core 1.2.2
azure-mgmt-datafactory 0.15.0
azure-mgmt-kusto 0.10.0
azure-mgmt-resource 15.0.0
azure-storage-blob 12.7.1
azure-storage-queue 12.1.4

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 24, 2021
@xiangyan99 xiangyan99 added bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files) and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 25, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jan 25, 2021
@tasherif-msft tasherif-msft self-assigned this Jan 28, 2021
@tasherif-msft
Contributor

Hi @ecc256 I will investigate this issue and get back to you as soon as possible!

xiafu-msft added a commit to xiafu-msft/azure-sdk-for-python that referenced this issue May 14, 2021
@xiafu-msft
Contributor

Hi @ecc256, PR #18751 should fix the problem. Sorry about the inconvenience!

@tjprescott tjprescott added the Service Attention Workflow: This issue is responsible by Azure service team. label May 14, 2021
@ghost

ghost commented May 14, 2021

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.

Issue Details

Author: ecc256
Assignees: xiafu-msft, tasherif-msft
Labels:

Client, Service Attention, Storage, bug, customer-reported

Milestone: -

rakshith91 pushed a commit to rakshith91/azure-sdk-for-python that referenced this issue Jul 16, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
6 participants