-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workaround for the Issue #450 #453
Conversation
The file makes that 2 of the Page methods fails. The Page->extractDecodedRawData was not returning the correct string. This was corrected. The Page->getTextArray breaks when the Page->get(´Contents´) returns a PDFObject, but this object makes that the PDFObject->getTextArray($this) throw an Error. But if you detected it and instead call PDFObject->getTextArray() , it returns the correct data. This is a workaround, because, what is exactly the difference in the format of this PDF and why it fails, needs to have a more deep investigation. I run all the PageTests and they work. This happends because the sample Pdf file is not format as we usually see in other files. Actually, I have a similar (not exactly the same) case for a file created with FPDI, that also broke the getTextArray and getDataTm methods, but I am doing a research to see what is actually happends before I open an Issue for that. As soon as I know what is happening in that case, I will opened the Issue, hopefully with the workaround or fix already done.
Guys, I already run CS Fixer, but I still get the errors. Please some advice here. |
Thank you for your work @izabala!
I am busy this week, but I will try to assist you next week. |
I grabbed a copy of your repo @izabala + phpcs was failing (like above). I opened it in my editor and saved and it auto changed quite a few things, now it passes php cs. Here's the updated code: public function testExtractDecodedRawDataIssue450()
{
$filename = $this->rootDir.'/samples/bugs/Issue450.pdf';
$parser = $this->getParserInstance();
$document = $parser->parseFile($filename);
$pages = $document->getPages();
$page = $pages[0];
$extractedDecodedRawData = $page->extractDecodedRawData();
$this->assertIsArray($extractedDecodedRawData);
$this->assertGreaterThan(3, \count($extractedDecodedRawData));
$this->assertIsArray($extractedDecodedRawData[3]);
$this->assertEquals('TJ', $extractedDecodedRawData[3]['o']);
$this->assertIsArray($extractedDecodedRawData[3]['c']);
$this->assertIsArray($extractedDecodedRawData[3]['c'][0]);
$this->assertEquals(3, \count($extractedDecodedRawData[3]['c'][0]));
$this->assertEquals('{signature:signer505906:Please+Sign+Here}', $extractedDecodedRawData[3]['c'][0]['c']);
}
public function testGetDataTmIssue450()
{
$filename = $this->rootDir.'/samples/bugs/Issue450.pdf';
$parser = $this->getParserInstance();
$document = $parser->parseFile($filename);
$pages = $document->getPages();
$page = $pages[0];
$dataTm = $page->getDataTm();
$this->assertIsArray($dataTm);
$this->assertEquals(1, \count($dataTm));
$this->assertIsArray($dataTm[0]);
$this->assertEquals(2, \count($dataTm[0]));
$this->assertIsArray($dataTm[0][0]);
$this->assertEquals(6, \count($dataTm[0][0]));
$this->assertEquals(1, $dataTm[0][0][0]);
$this->assertEquals(0, $dataTm[0][0][1]);
$this->assertEquals(0, $dataTm[0][0][2]);
$this->assertEquals(1, $dataTm[0][0][3]);
$this->assertEquals(67.5, $dataTm[0][0][4]);
$this->assertEquals(756.25, $dataTm[0][0][5]);
$this->assertEquals('{signature:signer505906:Please+Sign+Here}', $dataTm[0][1]);
} Hope this helps! As mentioned in the issue, I think the actual code change is looking good to me. |
Thanks @k00ni. I wait, I just want to be sure what I am doing wrong! |
This test is a bit wonky because it relies on memory values which may differ from system to system and run to run. Adjusted values to fix it. Ref: https://github.com/smalot/pdfparser/pull/453/checks?check_run_id=3397695916#step:6:22
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@izabala thank you for the PR.
I took the liberty to directly adapt a few files in your repository (see latest commits). All tests are good now.
In ParserTest I had to refine a few lines to fix failing test. It was not related to your changes.
I have one question (see below). After we solved it we are good to go.
Taking out the line: $decodedText = ''; This was not needed. Thanks @j0k3r
To catching Throwable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏿 Thanks.
As always, I will keep this open for a few days so others can comment.
The file makes that 2 of the Page methods fails.
The Page->extractDecodedRawData was not returning the correct string. This was corrected.
The Page->getTextArray breaks when the Page->get(´Contents´) returns a PDFObject, but this object makes that the PDFObject->getTextArray($this) throw an Error. But if you detected it and instead call PDFObject->getTextArray() , it returns the correct data. This is a workaround, because, what is exactly the difference in the format of this PDF and why it fails, needs to have a more deep investigation. I run all the PageTests and they work.
This happends because the sample Pdf file is not format as we usually see in other files. Actually, I have a similar (not exactly the same) case for a file created with FPDI, that also broke the getTextArray and getDataTm methods, but I am doing a research to see what is actually happends before I open an Issue for that. As soon as I know what is happening in that case, I will opened the Issue, hopefully with the workaround or fix already done.