Mixed text from Columns #68

Closed
wackerm opened this issue May 28, 2015 · 2 comments

Comments

@wackerm

wackerm commented May 28, 2015

If a PDF contains text columns, the text returned by getText() is mangled: the lines of adjacent columns are merged together.

For example:

Column               Column
This is Column       This is Column
one, and there is    two and there is
even more text       not more text

gets transformed to

This is Column This is Column
one, and there is two and there is
even more text not more text

Maybe this could be addressed by separating the columns with a special character such as a tab ("\t")?

@rubenvanerk
Contributor

This is fixed by #505.
You can now do this:

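// Use a tab as the horizontal offset string so column text is separated by "\t".
$config = new \Smalot\PdfParser\Config();
$config->setHorizontalOffset("\t");

// Pass the config as the second constructor argument.
$parser = new \Smalot\PdfParser\Parser([], $config);
$pdf = $parser->parseFile('path/to/file');
$text = $pdf->getText();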
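
Once the columns are separated by tabs, the text can be split back into one block per column. The following is only a rough sketch: `splitColumns()` is a hypothetical helper, not part of the library, and the sample string is an assumption about what `getText()` might return for the example in this issue. The real output depends on the PDF.

```php
<?php
// Assumed sample output; with a real PDF, use $pdf->getText() instead.
$text = "Column\tColumn\n"
    . "This is Column\tThis is Column\n"
    . "one, and there is\ttwo and there is\n"
    . "even more text\tnot more text";

// Hypothetical helper: groups the tab-separated cells of every line by column index.
function splitColumns(string $text, string $separator = "\t"): array
{
    $columns = [];
    foreach (explode("\n", $text) as $line) {
        foreach (explode($separator, $line) as $index => $cell) {
            $columns[$index][] = trim($cell);
        }
    }

    // Join each column's cells back into one block of text.
    return array_map(
        fn (array $cells): string => implode("\n", $cells),
        $columns
    );
}

$columns = splitColumns($text);
echo $columns[0], "\n\n", $columns[1], "\n";
```

This assumes every line contains the same number of tabs. Lines that span the full page width, or columns that are empty on some lines, would need extra handling.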

@k00ni
Collaborator

k00ni commented Jan 19, 2022

Closed. Please reopen if problem persists.

@k00ni k00ni closed this as completed Jan 19, 2022