added function to parse base64 encoded PDFs #493
Hello @granjero, thank you for your pull request.
Your change seems small enough to be implemented with just one line of code, like:
$decoded = (new Parser())->parseContent(base64_decode($base64EncodedPdf));
What is the benefit of having this as a new function?
Hello @k00ni, thank you for taking the time to review the request. I found this library while working with an API that returns a PDF as a base64 string. I could not find any mention of base64 strings in the docs. At first I thought I had to save the file and then read it with parseFile(), but I wondered if I could do it on the fly. So I came up with that solution, wrote the function, and it worked. If you think it's worth it, I can write the tests and meet the other labeled requirements. Thank you. jm PS: Forgive my rusty English.
Thanks for getting back to me quickly.
My suggestion would be:
What do you think? If you agree, could you propose such a change?
Ok, so you mean something like this, at the end of DEVELOPER.md or in the documentation:

Base64 encoded PDFs

If you are working with Base64 encoded PDFs, you might want to parse the PDF without saving the file on disk. This sample parses the Base64 encoded PDF and extracts the text from each page.

<?php
// Include Composer autoloader if not already done.
include 'vendor/autoload.php';

// $base64PDF holds the Base64 encoded PDF string, e.g. taken from an API response.
// Parse Base64 encoded PDF string and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseContent(base64_decode($base64PDF));

$text = $pdf->getText();
echo $text;
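A possible refinement, sketched here as an assumption rather than something this PR or pdfparser provides: in its default mode, base64_decode() silently skips characters outside the Base64 alphabet, so a corrupted string can still yield bytes that only fail later inside the parser. The hypothetical helper decodeBase64Pdf() below decodes in strict mode and checks for the "%PDF-" header before anything is handed to the parser.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of pdfparser): decode a Base64 string and
 * make sure the result looks like a PDF before it is parsed.
 */
function decodeBase64Pdf(string $base64Pdf): string
{
    // Strict mode makes base64_decode() return false when the input contains
    // characters outside the Base64 alphabet, instead of silently skipping them.
    $binary = base64_decode($base64Pdf, true);
    if ($binary === false) {
        throw new InvalidArgumentException('Input is not valid Base64.');
    }

    // PDF files normally begin with a "%PDF-" header. This check is
    // deliberately strict and rejects data with leading bytes before it.
    if (strncmp($binary, '%PDF-', 5) !== 0) {
        throw new InvalidArgumentException('Decoded data does not start with a PDF header.');
    }

    return $binary;
}
```

With the library installed, the decoded string would then go to the parser as before, e.g. `$pdf = (new \Smalot\PdfParser\Parser())->parseContent(decodeBase64Pdf($base64PDF));`. Failing early this way gives a clearer error message than a parse failure on garbage input.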
Good suggestion. I refined it a bit. We don't have control over pdfparser.org, therefore DEVELOPER.md is the way to go.
Suggestion for README.md:
Ok. I hope I did it right.
It looks good! But I don't have the time right now to look into it further. I will hopefully do it in the first two weeks of January. Bear with me.
Removed because we decided to focus on improving documentation.
I reverted many of your changes in README.md and only kept the reference to the DEVELOPER.md section, so this PR is only about the Base64 information.
We plan to update documentation overall, which will include some of your changes in README.md (see #498). You are welcome to contribute.
@granjero do you agree with this state of the PR?
I merged our work, @granjero, to keep the show going. Thanks for your contribution.
Sorry, I was AFK.
Line 47: "Base63" should be "Base64".
Instead of receiving a filename like parseFile(), this function receives a Base64 encoded PDF.