Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickletools Default Encoding #126997

Closed
Legoclones opened this issue Nov 19, 2024 · 7 comments
Closed

Pickletools Default Encoding #126997

Legoclones opened this issue Nov 19, 2024 · 7 comments
Assignees
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@Legoclones
Copy link
Contributor

Legoclones commented Nov 19, 2024

Bug report

Bug description:

When unpickling using _pickle.c or pickle.py through load/loads, an encoding can be specified using the encoding argument, with the default being ASCII. However, pickletools does not support custom encodings and instead makes assumptions about what encoding it uses, which can lead to either incorrect data being displayed or erroring/not erroring when the normal unpickling process would error.

The three opcodes that I have found this in are STRING, BINSTRING, and SHORT_BINSTRING:

  • On line 359, it assumes ASCII for STRING
  • On line 456, it assumes latin-1 for BINSTRING
  • On line 422, it assumes latin-1 for SHORT_BINSTRING

I think the best solution would be to support encodings as an optional argument in pickletools.py, with the default being set to ASCII (since that's the default encoding for pickle.py and _pickle.c).

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Linked PRs

@Legoclones Legoclones added the type-bug An unexpected behavior, bug, or error label Nov 19, 2024
@kanishka-coder0809
Copy link

hey @Legoclones I think I can work with this issue ...can you please assign me with this

@Legoclones
Copy link
Contributor Author

I don't think I have permission to assign someone

@kanishka-coder0809
Copy link

So should I start working upon this and thank make a PR.?

@serhiy-storchaka serhiy-storchaka self-assigned this Nov 20, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Nov 20, 2024
* Fix support of STRING and GLOBAL opcodes with non-ASCII arguments.
* dis() now outputs non-ASCII bytes in STRING, BINSTRING and
  SHORT_BINSTRING arguments as escaped (\xXX).
@serhiy-storchaka
Copy link
Member

It is not a problem if pickletools accepts an input which is not valid for pickle. But there are real bugs here:

  • pickletools does not support STRING with non-ASCII argument. pickle allows to specify encoding (ASCII by default).
  • pickletools does not support GLOBAL with non-ASCII argument. pickle uses UTF-8.
  • pickletools.dis() outputs non-ASCII arguments for BINSTRING and SHORT_BINSTRING as Latin1-encoded. pickle allows to specify encoding (ASCII by default). It would be better to output non-ASCII bytes as \xXX.

#127062 fixes them.

@serhiy-storchaka
Copy link
Member

@kanishka-coder0809, sorry, I already worked on this issue. It is slightly larger than initially was reported. On other hand, I think that I found a better solution than supporting optional argument for encoding. The latter would require more significant rewriting of the pickletools module and likely could be too complex for backporting.

@Legoclones
Copy link
Contributor Author

That looks great!

@picnixz picnixz added the stdlib Python modules in the Lib dir label Nov 21, 2024
serhiy-storchaka added a commit that referenced this issue Nov 21, 2024
* Fix support of STRING and GLOBAL opcodes with non-ASCII arguments.
* dis() now outputs non-ASCII bytes in STRING, BINSTRING and
  SHORT_BINSTRING arguments as escaped (\xXX).
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 21, 2024
…honGH-127062)

* Fix support of STRING and GLOBAL opcodes with non-ASCII arguments.
* dis() now outputs non-ASCII bytes in STRING, BINSTRING and
  SHORT_BINSTRING arguments as escaped (\xXX).
(cherry picked from commit eaf2171)

Co-authored-by: Serhiy Storchaka <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 21, 2024
…honGH-127062)

* Fix support of STRING and GLOBAL opcodes with non-ASCII arguments.
* dis() now outputs non-ASCII bytes in STRING, BINSTRING and
  SHORT_BINSTRING arguments as escaped (\xXX).
(cherry picked from commit eaf2171)

Co-authored-by: Serhiy Storchaka <[email protected]>
@serhiy-storchaka
Copy link
Member

Thank you for your report @Legoclones.

@serhiy-storchaka serhiy-storchaka added 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 21, 2024
serhiy-storchaka added a commit that referenced this issue Nov 21, 2024
…-127062) (GH-127095)

* Fix support of STRING and GLOBAL opcodes with non-ASCII arguments.
* dis() now outputs non-ASCII bytes in STRING, BINSTRING and
  SHORT_BINSTRING arguments as escaped (\xXX).
(cherry picked from commit eaf2171)

Co-authored-by: Serhiy Storchaka <[email protected]>
serhiy-storchaka added a commit that referenced this issue Nov 21, 2024
…-127062) (GH-127094)

* Fix support of STRING and GLOBAL opcodes with non-ASCII arguments.
* dis() now outputs non-ASCII bytes in STRING, BINSTRING and
  SHORT_BINSTRING arguments as escaped (\xXX).
(cherry picked from commit eaf2171)

Co-authored-by: Serhiy Storchaka <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

4 participants