Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added function to parse base64 encoded PDFs #493

Merged
merged 4 commits into from
Jan 10, 2022
Merged

Conversation

granjero
Copy link
Contributor

This function instead of receiving a filename like parseFile it receives a base 64 enconded PDF.

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @granjero, thank you for you pull request.

Your changes seem too small and can be implemented with just 1 line of code, like

$decoded = (new Parser())->parseContent(base64_decode($base64EncodedPdf));

What is the benefit having this as a new function?

@granjero
Copy link
Contributor Author

Hello @k00ni, thank you for taking the time in reviewing the request.
I see that your approach works. And does not require the function I proposed.
Nevertheless I think that the function I submitted may be "user friendlier".

I found this library while working with an API that returns a PDF in a base64 string. I could not find any mention in the docs for base64 strings. At first I thought I had to save the file and then read it with parseFile() but wondered if i could do it in the fly. So I came with that solution, wrote the function and it worked.

If you think it's worth it I can write the tests and meet the other labeled requirements.

Thank you.

jm

PS: Forgive my rusty english.

@k00ni
Copy link
Collaborator

k00ni commented Dec 1, 2021

Thanks for getting back to me quickly.

If you think it's worth it I can write the tests and meet the other labeled requirements.

My suggestion would be:

  • no new function
  • extend DEVELOPER.md with a new section which contains the case you described and some example code how to solve it.
  • maybe add a note about it in README.md

What do you think? If you agree, could you propose such a change?

@granjero
Copy link
Contributor Author

granjero commented Dec 1, 2021

Ok, so you are saying something like this at the end of DEVELOPER.md or in the documentation


Base64 encoded PDFs

If working with Base64 encoded PDFs you might want to parse the PDF without saving the file on disk. This sample will parse the Base64 encoded PDF and extract text from each page.

<?php
// Include Composer autoloader if not already done.
include 'vendor/autoload.php';

// Parse Base64 encoded PDF string and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseContent(base64_decode($base64PDF));

$text = $pdf->getText();
echo $text;

@k00ni
Copy link
Collaborator

k00ni commented Dec 7, 2021

Good suggestion. I refined it a bit.

We don't have control over pdfparser.org, therefore DEVELOPER.md is the way to go.

  • Please add your changes to DEVELOPER.md.
  • Please add a sentence to README.md which points to Base64 section in DEVELOPER.md.

Suggestion for README.md:

**For developers:** Please read DEVELOPER.md for more information about integration, 
usage and local development of the PDFParser library. There you will also find information about Base64 PDFs.

@granjero
Copy link
Contributor Author

granjero commented Dec 13, 2021

Ok. I hope I did it right.
@k00ni please tell me if it is ok.
Thanks!

@k00ni
Copy link
Collaborator

k00ni commented Dec 20, 2021

Its looks good! But I dont have the time right now to look into it further. WIll do it hopefully in the first two weeks of January. Bear with me.

k00ni added 2 commits January 3, 2022 09:59
Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted many of your changes in README.md, only kept reference to DEVERLOPER.md section so this PR is only about Base64 information.

We plan to update documentation overall, which will include some of your changes in README.md (see #498). You are welcome to contribute.

@granjero do you agree with this state of the PR?

@k00ni k00ni self-assigned this Jan 3, 2022
@k00ni k00ni merged commit 1991526 into smalot:master Jan 10, 2022
@k00ni
Copy link
Collaborator

k00ni commented Jan 10, 2022

I merged our work @granjero to keep the show going. Thanks for your contribution.

@granjero
Copy link
Contributor Author

Sorry, I was afk.
I'm glad I could contribute!
Keep it going!

Copy link
Contributor Author

@granjero granjero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 47 Base63 instead of Base64

@granjero
Copy link
Contributor Author

@k00ni
There is a typo in the new README.md

Line 47 -> Base63 instead of Base64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants