Detect content encoding if invalid charset was specified #2549

decaz · 2017-11-23T13:08:12Z

What do these changes do?

Add processing of invalid charsets while detecting content encoding
Make the aiohttp.ClientResponse.get_encoding method public

Are there changes in behavior for the user?

Content encoding autodetection now works even when the response declares an invalid charset (such a charset is treated as if none had been provided).
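The fallback described above can be sketched with the standard library alone. This is not aiohttp's code: guess_encoding and its detect callback are hypothetical names, and detect stands in for a content sniffer such as chardet.detect(body)["encoding"]. The key step is that codecs.lookup() raises LookupError for an unknown charset, which is then treated like a missing one.

```python
import codecs
from email.message import Message
from typing import Callable, Optional


def guess_encoding(content_type: str, body: bytes,
                   detect: Callable[[bytes], Optional[str]]) -> str:
    """Hypothetical helper: prefer a valid declared charset, else sniff."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        try:
            codecs.lookup(charset)  # raises LookupError for unknown names
            return charset
        except LookupError:
            pass  # invalid charset: behave as if none was given
    # No usable charset: fall back to content-based detection, then UTF-8
    return detect(body) or "utf-8"
```

For example, "text/html; charset=no-such-charset" ends up in the detect branch instead of failing later when the body is decoded.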

Related issue number

There are no open issues that this change resolves.

Checklist

I think the code is well written
Unit tests for the changes exist
Documentation reflects the changes
If you provide code modification, please add yourself to CONTRIBUTORS.txt
- The format is <Name> <Surname>.
- Please keep alphabetical order, the file is sorted by names.
Add a new news fragment into the CHANGES folder
- name it <issue_id>.<type>, for example 588.bugfix
- if you don't have an issue_id change it to the pr id after creating the pr
- ensure type is one of the following:
  - .feature: Signifying a new feature.
  - .bugfix: Signifying a bug fix.
  - .doc: Signifying a documentation improvement.
  - .removal: Signifying a deprecation or removal of public API.
  - .misc: A ticket has been closed, but it is not of interest to users.
- Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

asvetlov · 2017-11-23T13:14:52Z

Sorry, I don't understand your use case.
How does making a private method public fix content encoding autodetection, as you mentioned in the PR's initial message?

decaz · 2017-11-23T13:22:18Z

@asvetlov I have rewritten the description :)
P.S.: doc-spelling behaves strangely on Travis; it succeeds on my machine but fails on CI =/

asvetlov · 2017-11-23T13:27:08Z

Catching LookupError looks good but I still don't understand the reason for _get_encoding -> get_encoding renaming.

Running make doc-spelling locally doesn't check the CHANGES/xxx files.

decaz · 2017-11-23T13:44:26Z

@asvetlov I currently fetch sites by reading the content in chunks, and I have to use the "private" _get_encoding method to get the resulting content's encoding:

encoding = response._get_encoding()

So it would be very handy to have this method as a public helper. What do you think?
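Once the encoding is known (from the declared charset, or from response.get_encoding(), the public name this PR introduces), chunks read one at a time still need care: a multi-byte character can be split across two chunks. A stdlib-only sketch, assuming the chunks arrive as plain bytes (in aiohttp they could come from response.content.iter_chunked(n)); decode_chunks is a hypothetical helper name.

```python
import codecs
from typing import Iterable


def decode_chunks(chunks: Iterable[bytes], encoding: str) -> str:
    """Hypothetical helper: decode a chunked body without breaking characters."""
    # An incremental decoder buffers incomplete byte sequences between calls
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush anything still buffered
    return "".join(parts)
```

Decoding each chunk separately with bytes.decode() would instead fail (or insert replacement characters) whenever a boundary falls inside a character.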

codecov-io · 2017-11-23T13:46:19Z

Codecov Report

Merging #2549 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2549      +/-   ##
==========================================
+ Coverage   97.08%   97.08%   +<.01%     
==========================================
  Files          40       40              
  Lines        8135     8141       +6     
  Branches     1438     1439       +1     
==========================================
+ Hits         7898     7904       +6     
  Misses        100      100              
  Partials      137      137

Impacted Files	Coverage Δ
aiohttp/client_reqrep.py	`97.22% <100%> (+0.03%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e180b12...994b4f6.

asvetlov · 2017-11-23T14:18:15Z

Aaah, makes sense.
Another nice-to-have feature would be allowing the user to specify a maximum data size for sniffing, to avoid reading the whole response into memory.
UniversalDetector may help: https://chardet.readthedocs.io/en/latest/usage.html#example-detecting-encoding-incrementally
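A sketch of capped incremental sniffing, assuming a detector with the feed() / done / close() shape of chardet's UniversalDetector (close() returning a dict with an "encoding" key). sniff_encoding and the 64 KiB default for max_bytes are illustrative choices, not an existing aiohttp parameter.

```python
from typing import Iterable, Optional, Protocol


class IncrementalDetector(Protocol):
    """Assumed interface, modelled on chardet's UniversalDetector."""
    done: bool

    def feed(self, data: bytes) -> None: ...

    def close(self) -> dict: ...


def sniff_encoding(chunks: Iterable[bytes], detector: IncrementalDetector,
                   max_bytes: int = 64 * 1024) -> Optional[str]:
    """Feed chunks until the detector is confident or max_bytes were fed."""
    fed = 0
    for chunk in chunks:
        if detector.done or fed >= max_bytes:
            break
        part = chunk[: max_bytes - fed]  # never exceed the sniffing budget
        detector.feed(part)
        fed += len(part)
    # close() finalises the guess and returns a dict with an "encoding" key
    return detector.close().get("encoding")
```

Only the detection half is shown; a real client would also keep the sniffed chunks so they can be decoded afterwards.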

decaz · 2017-11-23T17:16:22Z

@asvetlov thanks for the advice! I'll think about implementing incremental encoding detection. It may be worth doing this at the level of the response.content.read method (adding a new parameter, for instance detect_encoding=True) that would detect the encoding incrementally, chunk by chunk, and assign it to a response.content.encoding attribute.
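A hypothetical sketch of that idea, written as a standalone wrapper rather than a real aiohttp parameter: DetectingReader, max_sniff and its encoding attribute are illustrative names only. The stream is anything with an async read(n) (aiohttp's StreamReader has one), and the detector is assumed to follow the feed() / done / close() shape of chardet's UniversalDetector.

```python
from typing import Any, Optional


class DetectingReader:
    """Hypothetical wrapper illustrating a read(..., detect_encoding=True) idea."""

    def __init__(self, stream: Any, detector: Any, max_sniff: int = 64 * 1024):
        self._stream = stream
        self._detector = detector
        self._left = max_sniff      # sniffing budget still available, in bytes
        self._finished = False
        self.encoding: Optional[str] = None

    async def read(self, n: int = -1) -> bytes:
        data = await self._stream.read(n)
        if not self._finished:
            part = data[: self._left]
            if part:
                self._detector.feed(part)
                self._left -= len(part)
            # Stop sniffing once the detector is confident, the budget is
            # spent, or the stream hit EOF (an empty read).
            if self._detector.done or self._left <= 0 or not data:
                self.encoding = self._detector.close().get("encoding")
                self._finished = True
        return data
```

Finalising the guess at EOF means short bodies still get an encoding, while long bodies never feed more than max_sniff bytes to the detector.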

asvetlov · 2017-11-23T18:50:00Z

Well, let's merge the PR as is.
Please make an issue/PR for encoding detection improvements.

asvetlov · 2017-11-23T18:50:19Z

Thanks!

lock · 2019-10-28T12:04:42Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue (https://github.com/aio-libs/aiohttp/issues/new) for related bugs.
If you feel there are important points made in this discussion, please include those excerpts in that new issue.

decaz added 4 commits November 23, 2017 15:52

- Add the processing of invalid charsets while detecting content encoding (3fe8eb2)
- Make the aiohttp.ClientResponse.get_encoding method public (14d2e4a)
- Add docs (fcaa303)
- Fix tests (90a33a2)
- Fix change description (994b4f6)
- Update client_reference.rst (325209c)

asvetlov approved these changes Nov 23, 2017

asvetlov merged commit 67eb1e7 into aio-libs:master Nov 23, 2017

decaz deleted the resp-get-encoding branch November 24, 2017 17:09

decaz mentioned this pull request Sep 27, 2019: Memory efficient encoding detection #4112 (closed)

lock bot added the outdated label Oct 28, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019

psf-chronographer bot added the bot:chronographer:provided label ("There is a change note present in this PR") Oct 28, 2019