Undefined method #empty? when parsing PDF #10

Closed
srogers opened this issue Jan 1, 2015 · 7 comments

@srogers

srogers commented Jan 1, 2015

Hi - I'm looking at combine_pdf to get around the issues with prawn-templates. I have a PDF document that is a form (not a fillable PDF form, just an ordinary PDF created in Word that makes a printed paper form). Then I'm creating a PDF in prawn that looks like the form is filled out when the form is in the background and the user data is laid over it. Prawn-templates worked great for that, but I don't want to be stuck at 0.15 forever - so here we are.

I'm doing some very preliminary experiments with combine_pdf - when I do the following with one of my PDFs that will act as the background:

CombinePDF.new(template_file)

I get this error:

undefined method `empty?' for #<Enumerator: "Identity":bytes>
combine_pdf (0.1.9) lib/combine_pdf/combine_pdf_parser.rb:210:in `_parse_'

and in the log file, there are the messages:

Couldn't connect all values from references - didn't find reference {a big hash}!!!
PDF 1.5 Object streams found - they are not fully supported! attempting to extract objects.

Is there perhaps a way to generate my template file, or to use Acrobat, so that it has no object streams (e.g. saving it while forcing compatibility with older versions of Acrobat)? I have control of the template file, so if it's possible to generate it in a way that doesn't use object streams, I can do that.

Thanks for your help.

@boazsegev
Owner

Hi,

Thank you very much for opening this issue.

It seems to me there might be a real bug here, related to the Ruby version that's in use.

In MRI I expect str[0..-2].bytes to return an Array object. I think that in JRuby it returns an Enumerator (to avoid reallocating memory and to increase performance)... I might be wrong, and it might be a different Ruby version that does that - can you let me know what Ruby version you're running?

I can run through this on my end a few times until I find a solution, but I'd rather have a PDF file I can test on as well.

To conclude:

  1. I believe the code will run correctly on the standard Ruby MRI.
  2. I want to correct the issue so the code runs on all updated Ruby environments.
  3. To correct the issue I need to know the exact Ruby version and type that you are running (e.g. jruby-1.7.16.1), so please let me know.
  4. It would make things easier to check if I had a file that is known to raise the exception - So please send a file, if you can, that you know will reproduce the issue.
  5. The moment I resolve the issue, I'll post an updated gem version and keep you posted.

boazsegev added the bug label Jan 2, 2015
@boazsegev
Owner

P.S.

I tried an untested solution that should leave the current code intact (adding a #to_a)... It might work and it might do nothing.

Please install the 0.1.10 gem version and keep me posted.
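
For context, the #to_a approach mentioned above can be sketched as follows. This is a minimal illustration, not the gem's actual code: the helper name byte_array is made up for the example. It relies on String#bytes returning an Enumerator on Ruby 1.9.3 and an Array on Ruby 2.0+, while #to_a gives an Array in both cases.

```ruby
# Hypothetical helper (not part of combine_pdf): normalize String#bytes so
# callers always get an Array, whichever Ruby version is running.
def byte_array(str)
  # On Ruby 1.9.3, String#bytes returns an Enumerator, which has no #empty?
  # or #length. On Ruby 2.0+ it already returns an Array, and #to_a is a no-op.
  str.bytes.to_a
end

bytes = byte_array("%PDF-1.5\n")

puts bytes.class    # Array
puts bytes.empty?   # false
puts bytes.length   # 9
```

Array methods such as #empty? and #length then behave the same on both Ruby versions.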

P.S. 2

To answer a few of your questions:

PDF 1.5 object streams are actually supported quite well. The warning is there for two reasons:

  1. Not enough testing was done (for my peace of mind) on these types of streams.
  2. Compression support is limited to standard compression (the most common), but PDF 1.5 object streams can use any compression that's possible within PDF files... since CombinePDF normally doesn't need to decompress (inflate) the data - and, frankly, since I don't know enough to write uncommon inflation code - the JPEG group and TIFF group compression systems are not supported yet... hence, support for PDF 1.5 object streams is incomplete.

You can avoid PDF 1.5 object streams by exporting your PDF file as version 1.4. This will force the writer to avoid object streams and get rid of the warning... but it might not resolve the issue at hand (which, I believe, has nothing to do with the PDF 1.5 object streams).
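
To check which version a re-exported template declares, one option is to read the header line at the start of the file. The sketch below is only an illustration: pdf_header_version is a made-up helper, not a combine_pdf method. The header is also only a hint, since a PDF 1.4+ document catalog can override it, and a file labeled 1.5 does not necessarily trigger the warning.

```ruby
# Hypothetical helper (not part of combine_pdf): extract the version from a
# PDF header such as "%PDF-1.5". Returns nil when no header is found.
def pdf_header_version(data)
  # Treat the data as raw bytes so binary content after the header is harmless.
  data.dup.force_encoding(Encoding::BINARY)[/\A%PDF-(\d+\.\d+)/, 1]
end

puts pdf_header_version("%PDF-1.5\n%\x00\x00\x00\x00")   # 1.5
puts pdf_header_version("%PDF-1.4\n")                    # 1.4
puts pdf_header_version("not a PDF").inspect             # nil

# With a real file, only the first few bytes are needed, e.g.:
#   pdf_header_version(IO.binread("template.pdf", 16))
```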

@srogers
Author

srogers commented Jan 2, 2015

Thanks for the quick response. Re Ruby, I'm on MRI ruby-1.9.3-p550 [ x86_64 ] (but I'm upgrading soon).

I switched to 0.1.10 and it solved the problem above, but now I get a different error when saving the PDF:

NoMethodError (undefined method `length' for #<Enumerator: "%PDF-1.5\n%\x00\x00\x00\x00":bytes>):
combine_pdf (0.1.10) lib/combine_pdf/combine_pdf_pdf.rb:147:in `block in to_pdf'

I did a little testing, and I get this error if I just do:

pdf = CombinePDF.new
pdf.save(tmp_filename)

I think I can send you the template file - I need to check to be sure. But the immediate error may be related to my Ruby version vs. yours.

Just as an aside, it would be handy if there were a method comparable to prawn's "render" method that just returned the PDF data in a string. I was doing a send_data with prawn_document.render, so if combine_pdf could do that, it would slide right into the same place.

@boazsegev
Owner

  • The Issue itself:

Oh, this actually explains both the JRuby / RBX tendencies and the issue you discovered...

...I wrote the gem on Ruby 2+, so some of the methods (especially String#bytes) behave differently...

...I probably need to search through the code for every time String#bytes is called and find a suitable replacement. It might take me a few days, as I have a few things on my plate - but I'll do a quick search tonight and try to release a temporary fix by morning.

If it's urgent for you, you can install Ruby 2+ (2.1.5 is what I'm running) and it should run just fine.

  • The rendering issue:

You can call the #to_pdf method instead of the #save method. See the example code I posted regarding issue #9.

The #save(file_name) method is actually just a shortcut: IO.binwrite file_name, to_pdf
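
The relationship between the two methods can be sketched without the gem installed. FakePDF below is a hypothetical stand-in for a CombinePDF document (anything that responds to #to_pdf), and save_pdf is a made-up helper that mirrors the shortcut described above. Neither name comes from combine_pdf.

```ruby
require "tmpdir"

# Hypothetical stand-in for a CombinePDF document: it only needs to respond
# to #to_pdf and return the rendered PDF data as a String.
FakePDF = Struct.new(:body) do
  def to_pdf
    body
  end
end

# Hypothetical helper mirroring the shortcut: saving is writing #to_pdf output.
def save_pdf(pdf, file_name)
  IO.binwrite(file_name, pdf.to_pdf)
end

pdf  = FakePDF.new("%PDF-1.5\n(placeholder body)")
data = pdf.to_pdf   # in-memory String, comparable to prawn's render output

path = File.join(Dir.tmpdir, "combine_pdf_sketch.pdf")
save_pdf(pdf, path)
puts IO.binread(path) == data   # true
File.delete(path)
```

With the real gem, the String returned by #to_pdf is what would be handed to something like send_data in place of prawn_document.render.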

boazsegev pushed a commit that referenced this issue Jan 2, 2015
@boazsegev
Owner

I pushed a fixed version of the gem - I also fixed another bug that was hiding (issue #11) and that I found by pure luck (I used my tax filings for some testing, and they were a bit unusual in their PDF structure, so I was just lucky to find the bug).

Please let me know if it works.

For some reason, I couldn't install MRI 1.9.3 on my Mac... so I couldn't test properly.

P.S.

If you have the time, I'd appreciate it if you could test the writing features as well, as I cannot check whether these features work on Ruby 1.9.3.

Try adding page numbers using the following (I added options to check the box drawing as well as the text drawing):

pdf = CombinePDF.new template_file
pdf.number_pages box_color: [0.9, 0.9, 0.9], box_radius: 8, number_location: [:top, :bottom], opacity: 0.75, font_size: 14, stroke_width: 1, stroke_color: [0.4,0,0]
pdf.save "test1.pdf"

and if you could try the #textbox as well... (although numbering should cover it):

pdf = CombinePDF.new
page = CombinePDF.create_page
page.textbox "test", box_color: [0.8, 0.8, 0.8]
pdf << page
pdf.save 'test2.pdf'

@srogers
Author

srogers commented Jan 2, 2015

The latest version of the gem seems to have solved all my issues. Overlaying one PDF page onto another works. And I verified that the two examples you gave above also work. Apparently the errors have nothing to do with my PDF - just Ruby 1.9.3.

So that is really super! It gets me out of the hole of being stuck on Prawn 0.15 forever due to templates.

I'll be updating to Ruby 2 soon - if there's anything else you want tested in 1.9.3, let me know.

@boazsegev
Owner

Thanks for keeping me posted and for testing the updated gem :)

If anything comes up, feel free to contact me.
