Is it possible to combine multiple PDFs with the Destination property? #31

Closed
hiroyuki-sato opened this issue Oct 8, 2015 · 11 comments

@hiroyuki-sato

Hello

Is it possible to combine multiple PDFs while keeping the Destination property?

I have two PDFs. One has a destination property (link text).
The other has no destination property.

I can combine the two PDFs, but the destination property is broken in the result.

Here is a sample:
https://gist.github.com/hiroyuki-sato/ce48110f34accec1f837

Thank you for your advice.

Hiroyuki Sato.

@boazsegev
Owner

This raises a different question:

When there is a risk of a destination Name conflict, is it better to attempt a best guess, so that some links ALWAYS work while others point to the wrong destination, or is it better to disable all inner-document links?

At the moment, all the inner document links (Name => Dest combinations) are "disabled" (by not supplying the destination) rather than risk a "broken" link that points to the wrong placement.

...

Although, arguably, maybe it's possible to:

  1. raise an exception whenever a conflict is detected, causing the "merge" to fail; OR
  2. write a renaming algorithm, such as was written for the "secure" copy (which attempts to safeguard against name conflicts at the expense of risking data corruption).

I'd love to read your thoughts on the matter, because it was something I debated with myself and ended up (for project related reasons) going with the highest performance option - which is to disable the links and avoid the issue.
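To make the three options easier to compare, here is a minimal sketch that uses plain Ruby Hashes as stand-ins for the Names => Dest tables. Everything in it (`merge_destinations`, `DestinationConflict`, the `strategy:` keyword, the `_1` suffix scheme) is hypothetical and is not part of the combine_pdf API; `:disable` is modeled as "supply no destinations at all", which is my reading of the current behavior described above.

```ruby
# Hypothetical sketch: Hashes of name => page number stand in for the
# PDF Names => Dest tables. None of these names exist in combine_pdf.
class DestinationConflict < StandardError; end

STRATEGIES = %i[disable raise rename].freeze

# Merges `incoming` into `existing` and returns [merged_table, renames].
# `renames` maps an incoming name to its new name, so the links inside the
# incoming document could be rewritten to match.
def merge_destinations(existing, incoming, strategy: :rename)
  unless STRATEGIES.include?(strategy)
    raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
  # :disable supplies no destinations, so every inner link is inert.
  return [{}, {}] if strategy == :disable

  merged = existing.dup
  renames = {}
  incoming.each do |name, dest|
    unless merged.key?(name)
      merged[name] = dest
      next
    end
    if strategy == :raise
      raise DestinationConflict, "destination name conflict: #{name.inspect}"
    end
    # :rename picks the first numeric suffix that is free in both tables.
    suffix = 1
    suffix += 1 while merged.key?("#{name}_#{suffix}") || incoming.key?("#{name}_#{suffix}")
    new_name = "#{name}_#{suffix}"
    renames[name] = new_name
    merged[new_name] = dest
  end
  [merged, renames]
end

first_doc  = { "chapter1" => 2 }
second_doc = { "chapter1" => 4, "appendix" => 5 }
merged, renames = merge_destinations(first_doc, second_doc, strategy: :rename)
```

With `:rename`, both `chapter1` destinations survive: the second one is stored under `chapter1_1` and `renames` records the mapping. With `:raise` the same call fails, and with `:disable` the merged table is empty.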

@boazsegev
Owner

In a more practical manner, this requires persisting the Names object (referenced or contained by the Catalog object), much as the Info object referenced/contained by the Root object was preserved...

I'll work on preserving the Names dictionary, but I think we should think about the correct approach as far as Destinations are concerned.

@boazsegev
Owner

I uploaded a commit with a persistent Names and Dest object, preserving the links...

But, what about this code:

require 'combine_pdf' # require the unreleased edge version

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf") # links to page 2.
pdf << CombinePDF.load("pdf_with_link.pdf") # should link to page 4... but will link to page 2.
pdf << CombinePDF.load("pdf_no_link.pdf")

pdf.save "combine.pdf"

I'd love to know what you think, as it seems to me that persisting the Names=>Dest dictionary has both an upside (it mostly works and might allow for more features in the future, such as Forms, etc.) and a downside (it could produce unexpected results).

@hiroyuki-sato
Author

Thank you for your comment.

In my opinion, it would be good to have two modes.

  • Disable links (current implementation)
  • Allow broken links.
    • I don't merge multiple files with links in my current project.
    • I just want to merge one file with links and one file without links.
    • So I can accept broken links.

So how about the following code?

pdf = CombinePDF.new(:allow_broken_link => true) # default false. 
pdf << CombinePDF.load("pdf_with_link.pdf") # links to page 2.
pdf << CombinePDF.load("pdf_with_link.pdf") # should link to page 4... but will link to page 2.
pdf << CombinePDF.load("pdf_no_link.pdf")

Additionally, I tried PyPDF2.
Merge PDF files with under 10 lines in Python!

The first PDF's link goes to page 4 (it should go to page 2),
and the second PDF's link goes to page 4. So it produced a broken link.

@boazsegev
Owner

I don't like writing complicated APIs... The secure_copy was a necessary evil and I would be happier if I could avoid it.

I think it would be better to rename incoming links and risk links breaking up than to start PDF configurations that very few developers will be able to manage in an optimal manner...

I'll try writing a renaming mechanism and see how difficult that might end up being.

@boazsegev
Owner

Could you look at the edge version again (the one on github, not the published version)?

It should now rename the links and identify the pages to which it is linking...

However, if the linked pages are EXACTLY the same, the identification will link to the first page that fits the requirements.

require 'combine_pdf' # require the edge version.

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
pdf << CombinePDF.load("pdf_with_link.pdf")
pdf << CombinePDF.load("pdf_no_link.pdf")

pdf.save "combine1.pdf"

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
# link 2
linked = CombinePDF.load("pdf_with_link.pdf")
linked.pages[1].textbox "2nd copy" # try with and without changing the content of the page.
pdf << linked # one way to combine, very fast.
pdf << CombinePDF.load("pdf_no_link.pdf")

pdf.save "combine2.pdf"

Seems reasonable to me... what do you think?
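The "first page that fits" behavior described above can be illustrated with a small sketch. It assumes, purely for illustration, that a link's target page is found by comparing page content; the Hash-based pages and the `find_page_index` helper are hypothetical and do not reflect combine_pdf's internals.

```ruby
# Hypothetical sketch: each page is a Hash of its content, and a link's
# target is located by content equality. Not combine_pdf internals.
def find_page_index(pages, target)
  # Array#index returns the FIRST position whose page equals the target.
  pages.index { |page| page == target }
end

copy_a = [{ text: "Cover" }, { text: "Chapter 1" }]
copy_b = [{ text: "Cover" }, { text: "Chapter 1" }]

# Identical copies: the link from the second copy resolves to the first
# copy's page (index 1, i.e. page 2) instead of index 3 (page 4).
identical_hit = find_page_index(copy_a + copy_b, copy_b[1]) # => 1

# Changing the second copy's page makes it distinguishable again.
copy_b[1] = { text: "Chapter 1", note: "2nd copy" }
distinct_hit = find_page_index(copy_a + copy_b, copy_b[1])  # => 3
```

This is why adding the "2nd copy" textbox in the second example changes where the second link lands: once the pages differ, the lookup no longer stops at the first copy.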

@hiroyuki-sato
Author

Thank you for your work.👍

Could I check out the new version?
It seems that the version has not been pushed yet.
I can't find another branch or a new master version on your GitHub.

The new version looks very good.
I'll comment again after I try it.

Thanks again!!

@boazsegev
Owner

I didn't release the new gem version, but the master branch contains the update, so you can check it out by forking the master branch or (if I remember correctly) by using a URL to GitHub in the gem source property in your Gemfile (if you're using bundler and a Gemfile)...

...It might take me a bit to release the updated version because I'm trying to figure out how to handle outlines as well (some links use the outline for a destination, and using Names without Outlines means that some links work while others are broken).
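For the Gemfile route mentioned above, Bundler's `git:` option is the usual way to point a gem at a repository instead of a released version. The fragment below is a sketch under that assumption; the repository URL is the one quoted in this thread, and pinning `branch: 'master'` is an illustrative choice rather than something the maintainer specified.

```ruby
# Gemfile (config fragment, evaluated by Bundler rather than run directly)
source 'https://rubygems.org'

# Use the unreleased code from the GitHub master branch instead of the
# published gem.
gem 'combine_pdf', git: 'https://github.com/boazsegev/combine_pdf.git', branch: 'master'
```

After editing the Gemfile, running `bundle install` fetches the repository, and `require 'combine_pdf'` under Bundler then loads that checkout.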

@hiroyuki-sato
Author

Thank you for replying.

I tried the master branch, but the links do not work properly.
Your master branch HEAD is currently "bump 8133afe".
It seems you forgot to git push.
https://github.com/boazsegev/combine_pdf.git

Could you check again?

@boazsegev
Owner

Oh.. My bad... I committed but forgot to push (sync)... 😳

You can try now, if you want.

@hiroyuki-sato
Author

Thank you very much.

It worked well.😀

I tried with and without changing the content of the page.

  • Content unchanged: both links jumped to page 2.
  • Content changed: the first link jumped to page 2, the second one jumped to page 4.

I (and other combine_pdf users) would like to know whether
the content needs to be changed or not. A warning message would be helpful.

Without changing the content

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
# link 2
linked = CombinePDF.load("pdf_with_link.pdf")
#linked.pages[1].textbox "2nd copy" # try with and without changing the content of the page.
pdf << linked # one way to combine, very fast.
pdf << CombinePDF.load("pdf_no_link.pdf")

With changing the content

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
# link 2
linked = CombinePDF.load("pdf_with_link.pdf")
linked.pages[1].textbox "2nd copy" # try with and without changing the content of the page.
pdf << linked # one way to combine, very fast.
pdf << CombinePDF.load("pdf_no_link.pdf")
