UnicodeEncodeError when using Stream flavor #183
Comments
At this point I'm willing to put try/except code around the
I found this solution (it is a monkey patch): https://stackoverflow.com/questions/63403629/python-camelot-pdf-unicodeencodeerror-when-using-stream-flavor-on-windows/
Thanks @anakin87 this works great.
@anakin87 Would you like to open a PR to fix this in the library itself? :)
It is my first PR. If it is incorrect, please provide some help.
@anakin87 It looks good! I'm waiting for the tests to pass so that I can merge it, even though there isn't a test for the Also, I've noticed that you use a lot of different camelot features, based on your issue tracker replies and SO answers. I would love to chat about how you use camelot if you have some time this / next week!
Python 3.7 on Windows
Using this pdf: http://tsbde.texas.gov/78i8ljhbj/Fiscal-Year-2014-Disciplinary-Actions.pdf
I am running it through Camelot to convert to HTML using the Stream flavor, and I get the following error at execution of the `export` line, once it reaches page 4 of 8: "UnicodeEncodeError - 'charmap' codec can't encode character '\u2010' in position y: character maps to <undefined>."

Pages 1 through 3 get converted nicely; it crashes somewhere between pages 4 and 5. In debug, with the breakpoint after the `tables.export` line, it also brings me to line 19 of cp1252.py, if that's helpful.

I am on Windows, and this seems not to be an issue on Mac. But Windows is our environment, so I have to figure this out. I have done a ton of research on this error, and everything for this in the Python world points to adding either `encoding="utf-8"` or `errors="ignore"`, but those both relate to the `file.read` method and can't be used in Camelot's `export` method.

Any thoughts on what I could add to the script to get around this error? We can't avoid using Windows, and this seems to be the final blocker for us in being able to really make great use of this tool for our PDFs.
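For anyone hitting the same message, here is a minimal sketch of the cause, using only the standard library and no Camelot code. The character in the error, '\u2010' (HYPHEN), has no mapping in cp1252, the codec named in the traceback, so any write that goes through that codec fails. Passing an explicit encoding when opening the output file avoids this. The helper names `can_encode` and `write_text` and the sample string are hypothetical, made up for illustration.

```python
import os
import tempfile

# U+2010 HYPHEN, the character named in the error message.
HYPHEN = "\u2010"

# Hypothetical sample text standing in for one exported table cell.
SAMPLE_HTML = "<td>Fiscal" + HYPHEN + "Year</td>"


def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_text(path, text, encoding="utf-8"):
    """Write text using an explicit encoding instead of the platform default."""
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def demo():
    """Write the sample as UTF-8 to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "page-4.html")
        write_text(path, SAMPLE_HTML)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
```

Here `can_encode(SAMPLE_HTML, "cp1252")` is false while `can_encode(SAMPLE_HTML, "utf-8")` is true, which matches the crash happening only on pages that contain this character. As a possible workaround until the library is patched, and assuming each table exposes its data as a pandas DataFrame through `df`, one could write the output of `table.df.to_html()` with a helper like `write_text` rather than calling `export`.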