Error thrown on getDataTm() - Call to a member function decodeText() on null #450
Comments
@k00ni I will have a look. @eddturtle I had a look at the file; it has just one line, containing: {signature:signer505906:Please+Sign+Here}. Is that OK?
The problem in this case is that the PDF file doesn't behave like "normal" PDF files. I was actually going to open another issue for a file I am working with, created using FPDI, that also doesn't behave like "normal" PDF files. In both cases Page::getTextArray() doesn't return the right data. I already have a workaround for this case that uses Page::getTextArray (changed a little bit). I will open a new issue to discuss the FPDI file so it doesn't get mixed up with this case.
Hi, I have already made the fix/workaround, but when I open the pull request, the automatic validation gives me some errors. Can someone help me with that? (By the way, directions for a Windows machine would work best for me.)
I've tried the same thing again, but changed the pdfparser code on my local computer to copy the changes you made in the linked commit and it looks like it works to me. It's returning data + text through |
Please bear with me; the related pull request is #453. I will have a look next week to move the fix forward. @eddturtle it would be great if you could help us test these changes.
You're welcome @eddturtle! I am just waiting for help from @k00ni so we can finally get the code merged into the master branch.
* Workaround for Issue #450. The file makes two of the Page methods fail. Page->extractDecodedRawData was not returning the correct string; this was corrected. Page->getTextArray breaks when Page->get('Contents') returns a PDFObject for which PDFObject->getTextArray($this) throws an Error. If this is detected and PDFObject->getTextArray() is called instead, it returns the correct data. This is a workaround, because exactly how the format of this PDF differs, and why it fails, needs deeper investigation. I ran all the PageTests and they pass. This happens because the sample PDF file is not formatted the way we usually see in other files. I have a similar (not identical) case for a file created with FPDI that also broke the getTextArray and getDataTm methods, but I am still researching what actually happens before I open an issue for it. As soon as I know what is happening in that case, I will open the issue, hopefully with the workaround or fix already done.
* PageTest: attempt to fix cs issues
* Page.php: fixed cs issues
* ParserTest: fixed failing test testRetainImageContentImpact. This test is a bit wonky because it relies on memory values which may differ from system to system and run to run. Adjusted values to fix it. Ref: https://github.com/smalot/pdfparser/pull/453/checks?check_run_id=3397695916#step:6:22
* refined memory threshold in ParserTest::testRetainImageContentImpact
* Update Page.php
* Taking out line. Removed the line $decodedText = ''; because it was not needed. Thanks @j0k3r
* Changed the catch of Error to catch Throwable.
Co-authored-by: Konrad Abicht
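The fallback described in the commit message above can be sketched roughly as follows. This is only an illustration of the pattern, not the code that was merged: FakeContents and textArrayWithFallback are hypothetical stand-ins, and the simulated failure is an assumption modelled on the error in the issue title.

```php
<?php

// Hypothetical stand-in for a content object; NOT the real smalot/pdfparser class.
final class FakeContents
{
    /** @return string[] */
    public function getTextArray(?object $page = null): array
    {
        if (null !== $page) {
            // Simulates the failure from this issue: a lookup yields null and a
            // method is called on it, which raises an \Error in PHP 7 and later.
            $font = null;
            $font->decodeText([]);
        }

        return ['Please Sign Here'];
    }
}

/**
 * Try the page-aware call first; fall back to the page-less call on any \Throwable.
 *
 * @return string[]
 */
function textArrayWithFallback(FakeContents $contents, object $page): array
{
    try {
        return $contents->getTextArray($page);
    } catch (\Throwable $e) {
        // \Error does not extend \Exception, so catching \Exception would miss it.
        return $contents->getTextArray();
    }
}
```

Catching \Throwable rather than \Error or \Exception covers both hierarchies, which matches the last bullet of the commit message.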
* Fix/workaround for Issue #454. When the PDF file is produced by setasign/fpdi/fpdi or FPDF, this corrects the methods returning nothing. To do that, the code needs to know that the producer is FPDF and needs the page number, both used in conjunction with getXObjects.
* Update Page.php: some of the changes requested on GitHub by @k00ni
* Update Page.php: other changes requested by @k00ni
* Some other recommendations from @k00ni
* After manually running php-cs-fixer: I manually ran dev-tools\vendor\bin\php-cs-fixer fix
* Correcting the phpstan error
* Update Page.php: just a code enhancement
* Removing vscode\launch.json and some corrections mentioned by @k00ni
* Creating some functions to make this clearer, following @k00ni's recommendation to use extra functions for clearer code
* After applying some @k00ni recommendations: many changes following @k00ni's recommendations
* Updating the comment for the isFpdf function: better explanation of the function
* Changes for correcting phpstan errors
* Some changes in comments, function names and variable names
* Reformatted some code parts
Co-authored-by: Konrad Abicht
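A producer check of the kind the isFpdf bullet refers to might look something like the sketch below. The function name looksLikeFpdf and the exact matching rule (a 'Producer' string starting with "FPDF") are assumptions for illustration; the details array is assumed to have the shape returned by the library's Document::getDetails().

```php
<?php

/**
 * Hypothetical helper: true when the document details name FPDF as the producer.
 *
 * @param array<string, mixed> $details e.g. the array from Document::getDetails()
 */
function looksLikeFpdf(array $details): bool
{
    $producer = $details['Producer'] ?? null;

    // Only a string value that starts with "FPDF" counts as a match.
    return is_string($producer) && 0 === strncmp($producer, 'FPDF', 4);
}
```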
@k00ni @eddturtle This problem should be closed, shouldn't it? |
Hello, I'm trying to find the X, Y coords for a specific piece of text inside a PDF. I'm trying to use getDataTm() (correct me if that's the wrong method to use). This works for many PDFs, but throws an error for this one example PDF.
myfile.pdf
Example code:
Error thrown: Call to a member function decodeText() on null
I've tried this on PHP 7.4 and PHP 8.0 (running through Apache 2) on Ubuntu 18.04.
Any ideas on how to get this pdf to process?
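For context, the coordinate lookup being attempted can be sketched like this. It assumes each getDataTm() entry has the shape [[a, b, c, d, x, y], text], which is how the library's documentation appears to describe it; findTextCoordinates and the sample data are made up for illustration. With the library installed, the input would come from something like (new \Smalot\PdfParser\Parser())->parseFile('myfile.pdf')->getPages()[0]->getDataTm().

```php
<?php

/**
 * Return [x, y] pairs for every entry whose text contains $needle.
 *
 * @param array $dataTm entries assumed to be shaped like [[a, b, c, d, x, y], text]
 * @return array<int, array{0: float, 1: float}>
 */
function findTextCoordinates(array $dataTm, string $needle): array
{
    $matches = [];
    foreach ($dataTm as $entry) {
        [$matrix, $text] = $entry;
        if (false !== strpos($text, $needle)) {
            // Elements 4 and 5 of the text matrix hold the x and y translation.
            $matches[] = [(float) $matrix[4], (float) $matrix[5]];
        }
    }

    return $matches;
}

// Made-up sample data standing in for a real getDataTm() result.
$sample = [
    [['1', '0', '0', '1', '201.96', '720.68'], 'Document title'],
    [['1', '0', '0', '1', '72.0', '650.5'], 'Please Sign Here'],
];

print_r(findTextCoordinates($sample, 'Sign'));
```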