Is it possible to combine multiple PDFs with the Destination property? #31

hiroyuki-sato · 2015-10-08T10:08:18Z

Hello

Is it possible to combine multiple PDFs with the Destination property?

I have two PDFs. One has a destination property (link text); the other has no destination property.

I can combine the two PDFs, but the destination property is broken in the result.

Here is a sample:
https://gist.github.com/hiroyuki-sato/ce48110f34accec1f837

Thank you for your advice.

Hiroyuki Sato.

boazsegev · 2015-10-08T18:45:56Z

This raises a different question:

When there is a risk of a destination Name conflict, is it better to attempt a best guess, so that some links ALWAYS work while others point to the WRONG destination, or is it better to disable all inner-document links?

At the moment, all the inner document links (Name => Dest combinations) are "disabled" (by not supplying the destination) rather than risk a "broken" link that points to the wrong placement.

...

Although, arguably, maybe it's possible to:

- raise an exception whenever a conflict is detected, causing the "merge" to fail; OR
- write a renaming algorithm, such as was written for the "secure" copy (which attempts to safeguard against name conflicts at the expense of risking data corruption).

I'd love to read your thoughts on the matter, because it was something I debated with myself and ended up (for project related reasons) going with the highest performance option - which is to disable the links and avoid the issue.
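To make the two alternatives above concrete, here is a minimal sketch. It assumes each document's Names => Dest table can be modelled as a plain Ruby hash; `merge_strict`, `merge_renaming`, `DestinationConflict` and the `_2` suffix scheme are hypothetical names for illustration and are not part of combine_pdf.

```ruby
# Hypothetical sketch: plain hashes stand in for each PDF's Names => Dest table.
# None of these names belong to the combine_pdf API.

class DestinationConflict < StandardError; end

# Option 1: fail the merge as soon as two documents define the same name.
def merge_strict(existing, incoming)
  clash = existing.keys & incoming.keys
  raise DestinationConflict, "conflicting names: #{clash.join(', ')}" unless clash.empty?
  existing.merge(incoming)
end

# Option 2: rename conflicting incoming names and return the mapping, so that
# links inside the incoming document can be rewritten to the new names.
def merge_renaming(existing, incoming)
  merged = existing.dup
  renames = {}
  incoming.each do |name, dest|
    new_name = name
    suffix = 1
    while merged.key?(new_name)
      suffix += 1
      new_name = "#{name}_#{suffix}"
    end
    renames[name] = new_name unless new_name == name
    merged[new_name] = dest
  end
  [merged, renames]
end

first  = { "chapter1" => { page: 2 } }
second = { "chapter1" => { page: 4 } }

merged, renames = merge_renaming(first, second)
# merged  => {"chapter1"=>{:page=>2}, "chapter1_2"=>{:page=>4}}
# renames => {"chapter1"=>"chapter1_2"}
```

The renaming variant keeps every destination, but it only helps if the links in the incoming document are rewritten with the returned mapping; the strict variant is simpler but makes such merges fail outright.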

boazsegev · 2015-10-08T19:06:55Z

In a more practical manner, this requires persisting the Names object (referenced or contained by the Catalog object), much as the Info object referenced/contained by the Root object was preserved...

I'll work on preserving the Names dictionary, but I think we should think about the correct approach as far as Destinations are concerned.

boazsegev · 2015-10-08T20:03:17Z

I uploaded a commit with a persistent Names and Dest object, preserving the links...

But, what about this code:

require 'combine_pdf' # require the unreleased edge version

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf") # links to page 2.
pdf << CombinePDF.load("pdf_with_link.pdf") # should link to page 4... but will link to page 2.
pdf << CombinePDF.load("pdf_no_link.pdf")

pdf.save "combine.pdf"

I'd love to know what you think, as it seems to me that persisting the Names=>Dest dictionary has both an upside (it mostly works and might allow for more features in the future, such as Forms, etc.) and a downside (it could produce unexpected results).
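The "should link to page 4" expectation in the example above is just an offset calculation. The sketch below is an illustration only: the page counts (2 pages for pdf_with_link.pdf, 1 page for pdf_no_link.pdf) are assumptions, and `Doc` and `expected_targets` are hypothetical names, not combine_pdf code.

```ruby
# Hypothetical sketch: each document is reduced to a page count plus the
# 1-based local pages its links point to.
Doc = Struct.new(:name, :page_count, :link_targets)

# Returns, for every link, the page it should land on after merging:
# its local target shifted by the number of pages that precede its document.
def expected_targets(docs)
  offset = 0
  docs.flat_map do |doc|
    shifted = doc.link_targets.map { |local| offset + local }
    offset += doc.page_count
    shifted
  end
end

docs = [
  Doc.new("pdf_with_link.pdf", 2, [2]), # assumed: 2 pages, one link to page 2
  Doc.new("pdf_with_link.pdf", 2, [2]), # same file loaded a second time
  Doc.new("pdf_no_link.pdf",   1, [])   # assumed: 1 page, no links
]

expected_targets(docs) # => [2, 4]
```

A shared Names=>Dest dictionary has only one entry per name, which is why both copies end up on page 2 instead of pages 2 and 4.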

hiroyuki-sato · 2015-10-09T00:39:45Z

Thank you for your comment.

In my opinion, two modes would be useful:

- Disable links (current implementation).
- Allow broken links.
  - I don't merge multiple files with links in my current project.
  - I just want to merge one file with links and one file without links.
  - So I can accept broken links.

So how about the following code?

pdf = CombinePDF.new(:allow_broken_link => true) # default false. 
pdf << CombinePDF.load("pdf_with_link.pdf") # links to page 2.
pdf << CombinePDF.load("pdf_with_link.pdf") # should link to page 4... but will link to page 2.
pdf << CombinePDF.load("pdf_no_link.pdf")

Additionally, I tried PyPDF2.
Merge PDF files with under 10 lines in Python!

The first PDF's link goes to page 4 (it should link to page 2),
and the second PDF's link also goes to page 4. So it produced a broken link.

boazsegev · 2015-10-09T18:22:38Z

I don't like writing complicated APIs... The secure_copy was a necessary evil and I would be happier if I could avoid it.

I think it would be better to rename incoming links and risk links breaking up than to start PDF configurations that very few developers will be able to manage in an optimal manner...

I'll try writing a renaming mechanism and see how difficult that might end up being.

boazsegev · 2015-10-10T03:50:50Z

Could you look at the edge version again (the one on github, not the published version)?

It should now rename the links and identify the pages to which it is linking...

However, if the linked pages are EXACTLY the same, the identification will link to the first page that fits the requirements.
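One way to picture this first-match behaviour is the sketch below. It assumes pages are matched by content equality and models them as plain hashes; `resolve_by_content` is a hypothetical helper, and nothing here reflects combine_pdf's actual internals.

```ruby
# Hypothetical sketch: a link target is resolved by finding the first page
# whose content equals the target page's content.
def resolve_by_content(pages, target)
  index = pages.index { |page| page == target }
  index && index + 1 # 1-based page number, or nil if nothing matches
end

copy_a = [{ text: "intro" }, { text: "target" }]
copy_b = [{ text: "intro" }, { text: "target" }]

# Identical copies: the second copy's target is indistinguishable from the first.
identical = copy_a + copy_b
resolve_by_content(identical, copy_b[1]) # => 2

# Once the second copy's page differs, it can be told apart.
copy_b[1] = { text: "target", note: "2nd copy" }
changed = copy_a + copy_b
resolve_by_content(changed, copy_b[1]) # => 4
```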

require 'combine_pdf' # require the edge version.

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
pdf << CombinePDF.load("pdf_with_link.pdf")
pdf << CombinePDF.load("pdf_no_link.pdf")

pdf.save "combine1.pdf"

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
# link 2
linked = CombinePDF.load("pdf_with_link.pdf")
linked.pages[1].textbox "2nd copy" # try with and without changing the content of the page.
pdf << linked # one way to combine, very fast.
pdf << CombinePDF.load("pdf_no_link.pdf")

pdf.save "combine2.pdf"

Seems reasonable to me... what do you think?

hiroyuki-sato · 2015-10-10T23:22:35Z

Thank you for your work. 👍

Could I check out the new version?
It seems that version has not been pushed yet.
I can't find another branch or a new master version on your GitHub.

The new version looks very good.
I'll comment again after trying it.

Thanks again!!

boazsegev · 2015-10-10T23:33:57Z

I didn't release the new gem version, but the master branch contains the update, so you can check it out by forking the master branch or (if I remember correctly) by using a URL to GitHub in the gem source property in your Gemfile (if you're using Bundler and a Gemfile)...
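For the Gemfile route, a minimal sketch could look like the following. It relies on Bundler's `git:` and `branch:` options; the repository URL is the one for this project, while the choice of `master` simply follows the description above.

```ruby
# Gemfile
source 'https://rubygems.org'

# Track the unreleased master branch instead of the published gem.
gem 'combine_pdf', git: 'https://github.com/boazsegev/combine_pdf.git', branch: 'master'
```

After editing the Gemfile, running `bundle install` fetches the branch, and `bundle update combine_pdf` picks up later commits.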

...It might take me a bit to release the updated version because I'm trying to figure out how to handle outlines as well (some links use the outline for a destination, and using Names without Outlines means that some links work while others are broken).

hiroyuki-sato · 2015-10-11T02:47:39Z

Thank you for replying.

I tried the master branch, but the links do not work properly.
Your master branch HEAD is currently "bump 8133afe".
It seems you forgot to git push:
https://github.com/boazsegev/combine_pdf.git

Could you check again?

boazsegev · 2015-10-11T04:19:21Z

Oh.. My bad... I committed but forgot to push (sync)... 😳

You can try now, if you want.

hiroyuki-sato · 2015-10-13T00:12:44Z

Thank you very much.

It worked well. 😀

I tried with and without changing the content of the page.

Content unchanged: both links jumped to page 2.
Content changed: the first link jumped to page 2, the second one jumped to page 4.

I (and other combine_pdf users) would like to know whether
the content needs to be changed or not. A warning message would be helpful.

Without changing the content

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
# link 2
linked = CombinePDF.load("pdf_with_link.pdf")
#linked.pages[1].textbox "2nd copy" # try with and without changing the content of the page.
pdf << linked # one way to combine, very fast.
pdf << CombinePDF.load("pdf_no_link.pdf")

With changing the content

pdf = CombinePDF.new
pdf << CombinePDF.load("pdf_with_link.pdf")
# link 2
linked = CombinePDF.load("pdf_with_link.pdf")
linked.pages[1].textbox "2nd copy" # try with and without changing the content of the page.
pdf << linked # one way to combine, very fast.
pdf << CombinePDF.load("pdf_no_link.pdf")
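The warning requested above could, in principle, come from a check for pages with identical content before links are resolved. The following is only a sketch under that assumption: pages are modelled as plain hashes, and `ambiguous_pages` is a hypothetical helper that does not exist in combine_pdf.

```ruby
# Hypothetical helper: returns groups of 1-based page numbers whose content
# is identical, i.e. pages a content-based link lookup cannot tell apart.
def ambiguous_pages(pages)
  pages.each_with_index
       .group_by { |page, _index| page }
       .values
       .select { |group| group.size > 1 }
       .map { |group| group.map { |_page, index| index + 1 } }
end

# Two unchanged copies of a two-page document.
pages = [
  { text: "intro" }, { text: "target" },
  { text: "intro" }, { text: "target" }
]

ambiguous_pages(pages).each do |numbers|
  warn "pages #{numbers.join(', ')} have identical content; " \
       "links may resolve to page #{numbers.first}"
end
```

With the unchanged copies this reports pages 1 and 3 and pages 2 and 4 as ambiguous; once the second copy's page is modified, that page no longer appears in any group.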

boazsegev added the question label Dec 23, 2015

andrewbaker00 mentioned this issue Jan 26, 2016

Bookmarks/Links drop when combining PDFs #44

Closed

metaist mentioned this issue May 17, 2018

Keep in-document hyperlinks after merged metaist/pdfmerge#22

Open

boazsegev closed this as completed Apr 7, 2021
