getText() adds the letter "j" to a lot of words in the PDF content #353

conkreet · 2020-10-07T18:58:46Z

When I parse the attached PDF and call getText(), the letter "j" is inserted into seemingly random words. For instance, "BrabantWonen" sometimes becomes "BjrabantWonen", and "aanbrengen" becomes "aanbrengejn". Is this a known issue?

4c011e062aafcc305834aa245734eafc6945c1dc.pdf

GreyWyvern · 2023-08-14T14:02:24Z

Hi @conkreet. Are we able to use your sample PDF 4c011e062aafcc305834aa245734eafc6945c1dc.pdf in the PdfParser test suite? Is the file free to use? Thanks!

k00ni added the "missing or incomplete functionality" label (for something which is not a bug, but more like an incomplete feature) on Oct 8, 2020

GreyWyvern mentioned this issue on Aug 10, 2023 in "PdfParser does not consider the entire document stream" #628 (Closed)

GreyWyvern mentioned this issue on Aug 18, 2023 in "Major Update to PDFObject.php + Ancillary" #634 (Merged)

k00ni closed this as completed in #634 on Nov 7, 2023
