UnicodeDecodeError in version 51 #1025

Closed
tomek1024 opened this issue Jan 11, 2020 · 2 comments
Labels
crash Problems preventing documents from being rendered

@tomek1024

I updated yesterday from version 50 to 51, and in some cases I get UnicodeDecodeError, although the file is perfectly valid. I use &nbsp; to bind words that should not be left at the end of a line, and &shy; for manual hyphenation.
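For context, both entities resolve to characters that take two bytes in UTF-8, which is why the byte dump further down contains `\xc2\xa0` and `\xc2\xad`. A small sketch using only the Python standard library (the variable names are illustrative, not taken from WeasyPrint):

```python
import html

# Resolve the two HTML entities mentioned above to their Unicode characters.
nbsp = html.unescape('&nbsp;')  # U+00A0 NO-BREAK SPACE
shy = html.unescape('&shy;')    # U+00AD SOFT HYPHEN

# Each is a single character but two bytes once encoded as UTF-8.
nbsp_bytes = nbsp.encode('utf8')
shy_bytes = shy.encode('utf8')

print(repr(nbsp), nbsp_bytes)
print(repr(shy), shy_bytes)
```

Any code that mixes character offsets with byte offsets will therefore disagree by one for every such character in the text.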

I put the file here: https://zecer.elibri.com.pl/lte/unicode-error.html

When I call weasyprint on it, I get:

Traceback (most recent call last):
  File "/usr/local/bin/weasyprint", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/__main__.py", line 212, in main
    getattr(html, 'write_' + format_)(output, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/__init__.py", line 211, in write_pdf
    font_config=font_config).write_pdf(
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/__init__.py", line 168, in render
    font_config)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/document.py", line 393, in _render
    [Page(page_box, enable_hinting) for page_box in page_boxes],
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/document.py", line 393, in <listcomp>
    [Page(page_box, enable_hinting) for page_box in page_boxes],
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/__init__.py", line 126, in layout_document
    pages = list(make_all_pages(context, root_box, html, pages))
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/pages.py", line 804, in make_all_pages
    page, resume_at = remake_page(i, context, root_box, html)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/pages.py", line 743, in remake_page
    page_number, page_state)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/pages.py", line 554, in make_page
    positioned_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 63, in block_level_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 77, in block_level_layout_switch
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 130, in block_box_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 507, in block_container_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 63, in block_level_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 77, in block_level_layout_switch
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 130, in block_box_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 507, in block_container_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 63, in block_level_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 77, in block_level_layout_switch
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 130, in block_box_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/blocks.py", line 373, in block_container_layout
    for line, resume_at in lines_iterator:
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/inlines.py", line 53, in iter_line_boxes
    absolute_boxes, fixed_boxes, first_letter_style)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/inlines.py", line 108, in get_next_linebox
    line_children=[])
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/inlines.py", line 754, in split_inline_box
    line_placeholders, child_waiting_floats, line_children))
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/inlines.py", line 589, in split_inline_level
    context, box, max_x - position_x, skip)
  File "/usr/local/lib/python3.6/dist-packages/weasyprint/layout/inlines.py", line 1008, in split_text_box
    length = len(encoded[:length].decode('utf8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 63: unexpected end of data

I tried to debug it. The variable encoded in inlines.py on line 1008 has the value

b'cz\xc4\x85c\xc4\x85 umy\xc2\xadwalk\xc4\x99, w\xc4\x85skie \xc5\x82\xc3\xb3\xc5\xbcko, p\xc3\xb3\xc5\x82k\xc4\x99 na\xc2\xa0ksi\xc4\x85\xc5\xbcki i\xc2\xa0biurko, a\xc2\xa0na\xc2\xa0nim obli\xc2\xadcze\xc2\xadnia kwan\xc2\xadtowe. Czy mia\xc5\x82a kon\xc2\xadty\xc2\xadnu\xc2\xadowa\xc4\x87?'

and length has the value 64, encoded[:length] is

b'cz\xc4\x85c\xc4\x85 umy\xc2\xadwalk\xc4\x99, w\xc4\x85skie \xc5\x82\xc3\xb3\xc5\xbcko, p\xc3\xb3\xc5\x82k\xc4\x99 na\xc2\xa0ksi\xc4\x85\xc5\xbcki i\xc2'

which is not valid UTF-8: the slice ends on 0xc2, the first byte of a two-byte sequence. It looks to me like the length is incorrectly computed.
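The failure can be reproduced outside WeasyPrint with a prefix of the byte string above. This is only a sketch of the symptom: the helper names `decodes_cleanly` and `safe_prefix` are hypothetical, and backing off to a character boundary is an illustration, not necessarily how the bug was fixed upstream.

```python
# Prefix of the `encoded` value reported above (enough to cover byte 64).
encoded = (
    b'cz\xc4\x85c\xc4\x85 umy\xc2\xadwalk\xc4\x99, w\xc4\x85skie '
    b'\xc5\x82\xc3\xb3\xc5\xbcko, p\xc3\xb3\xc5\x82k\xc4\x99 '
    b'na\xc2\xa0ksi\xc4\x85\xc5\xbcki i\xc2\xa0biurko'
)
length = 64  # value observed in the debugger


def decodes_cleanly(data):
    """Return True if `data` is complete, valid UTF-8."""
    try:
        data.decode('utf8')
        return True
    except UnicodeDecodeError:
        return False


def safe_prefix(data, length):
    """Hypothetical helper: shrink `length` until the cut falls on a
    character boundary, i.e. the byte after the cut is not a UTF-8
    continuation byte (bit pattern 10xxxxxx)."""
    while 0 < length < len(data) and (data[length] & 0xC0) == 0x80:
        length -= 1
    return data[:length]


print(decodes_cleanly(encoded[:length]))      # cut lands inside U+00A0
print(decodes_cleanly(encoded[:length + 1]))  # one byte more completes it
print(safe_prefix(encoded, length).decode('utf8'))
```

The cut at 64 separates `\xc2` from `\xa0`, the two bytes of the no-break space after "i", which matches the "position 63: unexpected end of data" message in the traceback.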

It seems like the problem was introduced in version 51.

@liZe liZe added the crash Problems preventing documents from being rendered label Jan 11, 2020
@liZe liZe closed this as completed in d2278ae Jan 13, 2020
@liZe liZe added this to the 52 milestone Jan 13, 2020
@liZe
Member

liZe commented Jan 13, 2020

Hello!

Sorry for this bug. It only happens in peculiar situations and was introduced while fixing #954. We should add a non-regression test, but it’s probably hard to write one using only Ahem.

@tomek1024
Author

Thank you, the file is rendered correctly now.
