getText() extracts partial text from PDF, only the heading from each page #751

AntiHate opened this issue Dec 10, 2024 · 1 comment

  • PHP Version: 8.3
  • PDFParser Version: 2.11

Description:

I am trying to get the full text from the PDF. getText() extracts only a few lines from each page. If I use getObjects() instead, I get all the text, but the content order is random.

PDF input

sample.pdf

Expected output & actual output

actual output

CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 2 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 3 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 4 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 5 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 6 OF 29 PAGES SPE4A6-25-T-189V SECTION A CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 7 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 8 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 9 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 10 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 11 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 12 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: CONTINUED ON NEXT PAGE PAGE 13 OF 29 PAGES SPE4A6-25-T-189V SECTION B CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 14 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 15 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. 
OF DOCUMENT BEING CONTINUED: PAGE 16 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 17 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 18 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 19 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 20 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION I - CONTRACT CLAUSES (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 21 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 22 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 23 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 24 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 25 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 26 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. 
OF DOCUMENT BEING CONTINUED: PAGE 27 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 28 OF 29 PAGES CONTINUED ON NEXT PAGE SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED) CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 29 OF 29 PAGES SPE4A6-25-T-189V SECTION K - REPRESENTATIONS, CERTIFICATIONS AND STATEMENTS (CONTINUED)

Code

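Getting only partial text using getText():

// $pdf is the parsed document, e.g. (new \Smalot\PdfParser\Parser())->parseFile('sample.pdf')
echo $pdf->getText();

Shows all the text, but in random order:

// Iterate over every object in the document and print its text
$objects = $pdf->getObjects();
foreach ($objects as $object) {
    echo $object->getText();
}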

k00ni added the bug label Dec 10, 2024


orandev commented Dec 10, 2024

Hello
I have the same issue with the following file. I entered "MYSTRING123" in one of the PDF form fields.

sample with mystring123 in one form field.pdf

$pdf->getText() does not show this string, but a loop over getObjects() does.

PHP Version: 8.3
PDFParser Version: 2.11
