Description:
PDF input
The file is attached to a pdftotext (poppler) bug report: https://gitlab.freedesktop.org/poppler/poppler/-/issues/332
2004.pdf
Expected output & actual output
getText() returns text that is mostly UTF-16 encoded, but the parser seems to have added some bytes that are not valid UTF-16.
Besides that, is there any way to determine which encoding is used? Or could the parser convert the output to UTF-8?
Code
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile($infile);
$t = $pdf->getText();
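To make the encoding question concrete, here is a rough sketch of how the returned string could be inspected and normalized on the caller's side. It assumes the mbstring extension is available. `normalizeToUtf8()` is a hypothetical helper, not part of smalot/pdfparser, and the `UTF-16BE` fallback is only a guess at what the parser emits.

```php
<?php
// Hypothetical helper (not part of smalot/pdfparser): returns $text as UTF-8.
// $assumedSource is a guess used only when no byte order mark is present.
function normalizeToUtf8(string $text, string $assumedSource = 'UTF-16BE'): string
{
    // A UTF-16 byte order mark states the byte order directly.
    if (strncmp($text, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($text, 2), 'UTF-8', 'UTF-16BE');
    }
    if (strncmp($text, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($text, 2), 'UTF-8', 'UTF-16LE');
    }

    // Valid UTF-8 without NUL bytes is most likely already UTF-8.
    // (ASCII stored as UTF-16 also passes the UTF-8 check, hence the NUL test.)
    if (mb_check_encoding($text, 'UTF-8') && strpos($text, "\0") === false) {
        return $text;
    }

    // Fall back to the assumed source encoding.
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}
```

With the code above, `normalizeToUtf8($t)` would be called on the result of `getText()`. This only helps when the whole string is in a single encoding. If the output really mixes UTF-16 with other bytes, as it appears to here, a conversion like this cannot repair it and the fix would have to happen inside the parser.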