Finding Attachments inside PDF #179

Closed
sriram15690 opened this issue Aug 1, 2016 · 5 comments

Comments

@sriram15690

sriram15690 commented Aug 1, 2016

I have a PDF with file attachments inside (I have attached it to this issue as well).

Using the pdf-reader gem, how can I do the following:

  1. Detect whether a PDF has attachments
  2. If it does, find the details of the attachments

If pdf-reader cannot do this, is there another gem or library that can get this information?
PDFWithFileAttachmentAnnotation.pdf

@yob
Owner

yob commented Feb 25, 2017

Here's a very basic script that will tell you if a PDF page has annotations, and some basic details about them:

require 'pdf-reader'

PDF::Reader.open("PDFWithFileAttachmentAnnotation.pdf") do |pdf|
  page = pdf.page(1) # only the first page is inspected
  puts "Has Annotations?"
  puts page.attributes.key?(:Annots)
  puts
  # Look up each annotation entry and print the resolved object
  Array(page.attributes[:Annots]).each do |annot|
    data = pdf.objects.deref(annot)
    puts data.inspect
  end
end

Section 12.5 of the PDF spec has more details on annotations. Section 12.5.6 is particularly interesting: it lists the various types of annotations and how they're stored in a PDF.
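
To narrow this down to the original question, here is a rough sketch that keeps only annotations whose /Subtype is FileAttachment and reads the file name from their file specification (/FS). The method names `file_attachment_details` and `attachments_in` are made up for illustration and are not part of pdf-reader. The sketch assumes attachments are stored as page annotations, so it will not find files embedded in other ways.

```ruby
# Sketch only: these helper names are hypothetical, not pdf-reader API.
# `objects` is anything that responds to #deref, such as PDF::Reader#objects.
def file_attachment_details(annots, objects)
  # /Annots may be an array or a reference to one, so resolve it first
  Array(objects.deref(annots)).map do |ref|
    annot = objects.deref(ref)
    next unless annot.is_a?(Hash) && annot[:Subtype] == :FileAttachment

    # The file specification is either a plain string or a dictionary
    spec = objects.deref(annot[:FS])
    spec = { F: spec } if spec.is_a?(String)
    spec = {} unless spec.is_a?(Hash)

    {
      filename: objects.deref(spec[:UF]) || objects.deref(spec[:F]),
      description: objects.deref(annot[:Contents]),
      embedded: spec.key?(:EF) # /EF points at the embedded file stream(s)
    }
  end.compact
end

# Walks every page of the PDF at `path` and gathers attachment details.
def attachments_in(path)
  require 'pdf-reader'
  PDF::Reader.open(path) do |pdf|
    pdf.pages.flat_map do |page|
      file_attachment_details(page.attributes[:Annots], pdf.objects)
    end
  end
end
```

Calling `attachments_in("PDFWithFileAttachmentAnnotation.pdf")` should return one hash per file attachment annotation, and an empty array would suggest the PDF has none stored this way.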

yob closed this as completed Feb 25, 2017
@dankimio

Is it possible to extract text that is associated with a given annotation?

@yob
Owner

yob commented Jun 16, 2021

I'm not super familiar with Annotations, but it should be. pdf-reader doesn't have any helpers to make it nice though.

The above code snippet doesn't show any annotation text?

@dankimio

I'm getting PDF::Reader::Reference instances, but I'm not sure how to extract text using those.

Array(page.attributes[:Annots]).each do |annot|
  data = pdf.objects.deref(annot)
  puts data.inspect
  # [#<PDF::Reader::Reference:0x000000012400e4c8 @id=357, @gen=0>, #<PDF::Reader::Reference:0x000000012400de88 @id=356, @gen=0>]
end

@yob
Owner

yob commented Jun 19, 2021

If you use the ! version of deref (deref!), it should resolve those references recursively.
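
As a rough illustration of that suggestion, the sketch below resolves the whole /Annots array with deref! and then picks out each annotation's /Subtype and /Contents. It assumes the text of interest lives in the /Contents entry, which may not hold for every annotation type. The helper names `annotation_contents` and `print_annotation_contents` are hypothetical, not part of pdf-reader. Because deref! follows every reference it finds, it can pull in much more of the document than the annotations themselves.

```ruby
# Sketch only: these helper names are hypothetical, not pdf-reader API.
# `objects` is anything that responds to #deref!, such as PDF::Reader#objects.
def annotation_contents(annots, objects)
  return [] if annots.nil? # page has no /Annots entry

  # deref! resolves the array and every reference nested inside it
  Array(objects.deref!(annots)).map do |annot|
    next unless annot.is_a?(Hash)

    { subtype: annot[:Subtype], contents: annot[:Contents] }
  end.compact
end

# Prints the resolved annotation details for one page of the PDF at `path`.
def print_annotation_contents(path, page_number = 1)
  require 'pdf-reader'
  PDF::Reader.open(path) do |pdf|
    page = pdf.page(page_number)
    annotation_contents(page.attributes[:Annots], pdf.objects).each do |details|
      puts details.inspect
    end
  end
end
```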
