ENH: Allow text extraction to keep indentation #2054
Comments
I'd be interested in contributing to this enhancement for PyPDF2 @MartinThoma.
@MrAnayDongre PyPDF2 is deprecated. This is going into pypdf. This is a very complex feature, and I don't yet know myself what would be a good way to start on it. If you want to start contributing to pypdf, I recommend having a look at the issues labelled "Easy".
extract_text now has a layout extraction_mode.
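For readers who have not used this mode, a minimal sketch of calling it follows. The helper name `extract_layout_text` and its parameters are made up for illustration; only `PdfReader`, `reader.pages` and `extract_text(extraction_mode="layout")` come from pypdf, and the mode requires a reasonably recent pypdf release.

```python
def extract_layout_text(pdf_path, page_index=0):
    """Return the text of one page using pypdf's layout extraction mode.

    Hypothetical convenience wrapper, not part of pypdf itself.
    """
    # Imported lazily so that defining this helper does not require pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    page = reader.pages[page_index]
    # "layout" tries to reproduce the visual arrangement of the page,
    # in contrast to the default "plain" mode.
    return page.extract_text(extraction_mode="layout")
```

Calling `extract_layout_text("code_listing.pdf")` (with a placeholder file name) returns the first page's text as a single string; how closely the indentation matches the original depends on the PDF.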
@pubpub-zz The layout mode does not resolve this; further work is needed to convert horizontal positions into the corresponding whitespace. I have therefore re-opened this issue.
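To make "converting horizontal positions into whitespace" concrete, here is a rough sketch of the idea in plain Python. It is not how pypdf implements layout mode. The function `render_layout`, the `(x, y, text)` fragment format and the fixed `char_width` are all assumptions chosen for illustration, and it presumes a monospaced font.

```python
from collections import defaultdict


def render_layout(fragments, char_width=6.0):
    """Place text fragments on a character grid based on their x position.

    fragments: iterable of (x, y, text) tuples, with y growing upwards as in
    PDF user space. char_width is the assumed width of one character.
    """
    fragments = list(fragments)
    if not fragments:
        return ""
    # Treat the leftmost fragment as column 0 so the page margin is ignored.
    left = min(x for x, _, _ in fragments)

    # Group fragments into rows by their (rounded) baseline.
    rows = defaultdict(list)
    for x, y, text in fragments:
        rows[round(y)].append((x, text))

    lines = []
    for y in sorted(rows, reverse=True):  # top of the page first
        line = ""
        for x, text in sorted(rows[y]):
            col = int(round((x - left) / char_width))
            # Never overwrite earlier text; keep at least one space between fragments.
            if line and col <= len(line):
                col = len(line) + 1
            line = line.ljust(col) + text
        lines.append(line)
    return "\n".join(lines)


sample = [(72, 700, "def f():"), (96, 688, "return 1")]
print(render_layout(sample))
```

In this sample the second fragment starts 24 units to the right of the first, which at 6 units per character becomes four leading spaces. Real PDFs complicate this with proportional fonts, varying font sizes and rotated text, which is part of why the feature is hard.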
@stefan6419846, this is the rendering:
Isn't this good?
Sorry, seems like my checkout was somehow broken. Still not optimal, but yes, then we can close this for now.
When we extract Python code from a PDF, it's completely messed up. It would be nice to have an option that keeps the indentation. Maybe a flag for a layout-mode?
Code Example: How the new feature could be used
should give:
Currently, we get: