From c77e914f0ab4f81760cf2d39112942082ae96562 Mon Sep 17 00:00:00 2001
From: James Healy
Date: Wed, 18 Jan 2017 23:39:19 +1100
Subject: [PATCH 1/2] add a sample PDF that has a form xobject that references itself

This single-page PDF has a Form XObject in its Resources dict. The Form
XObject has its own Resources dict, which has an XObject pointing to
itself. Nested XObjects are fine, but recursive ones are not.

This is a simpler file that demonstrates an issue originally found in
the PDF provided at [1].

[1] https://github.com/yob/pdf-reader/issues/94#issuecomment-272663109
---
 spec/data/form_xobject_recursive.pdf | Bin 0 -> 1454 bytes
 spec/integration_spec.rb             |  10 ++++++++++
 spec/integrity.yml                   |   3 +++
 3 files changed, 13 insertions(+)
 create mode 100644 spec/data/form_xobject_recursive.pdf

diff --git a/spec/data/form_xobject_recursive.pdf b/spec/data/form_xobject_recursive.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..e065d2d20f4686e444e64809b3b2e51862e6007c
GIT binary patch
literal 1454
zcmaJ>+iu%141MQU@FkCpBwLmpAPBIe&DQSHE-|nHK^|ON&RjHhXDe;DpWi6OPMo%
z(dt5-Lmo1e1~=#PGcojG@P|GS$g#S;gYg*Itk#LCY8>Y}&p8h`=b^{B5;TW2>j|)%
zy2^H`{x4`U0bOR+tJm`$|9a3ABQeQ+wIN5zMmOjaU;#&h-FI^K1cw7Qt9GS90qkRu
zHMj~aZ`~ux9`TbdH|7V5z@9gTvf9E=U`vy19|WFP@D;4U_N0|+J-&`?b=I09?}f6@
zIxCWC^^7Y{cqP0c$8bdB+Da@9K^wGM7Qhy|sdjZrS=!=K;nJlxxMK7JAdB_0L3X*N
zDoopX*3OsY>-+WXolcEowz`$`N*V3{TG`fmD!ZjpzPnrFYCf;(EwJf7WcNcYbh@jX
z;!#@)9wNR-Wlo7YWpiQE@9AP~>1C)+Yk#gEi&QV(PcdzP52>@TKQ?Iz?1-5y-@q(q
zU#MH3fa4YY-@$54finoK1~+WmwLc7%kSL@594G=vJGfN}(jR($r1o3s=pE(Yi%@n=
zJ{+{8I5)?EP;HRC|32lYi%>4QJ>5bMFpa^tmQaR6Kav<{z+5SXhj9+8aVS-!MxpO<
zr6%Ze9IHI^CV1urw9GsnL^4;NN>rfyP!h~nxza=vYDBXo3Xd)?op%lbx6_0S=tQcV
zFe6exdzC6>cS?6Qia=aC%R|c}jPof8L-8hSjIZD#hBq&g?T6EjwEfa+M|-d7B2SZs
z+Kf7hebD!h+7xL5?4nGotSC3gz7^%WvMD-4@2Wr7I;ReZfV!ytPz8Y>V2%fyp!;k;
zEDtupN3U%`^vdSF!Sy0)*cbk!qDSG>CWJ>1YG*TbQrzp>y{*gQw

Date: Sun, 22 Jan 2017 22:59:08 +1100
Subject: [PATCH 2/2] remove special handling of Stream objects from
 ObjectHash#deref!

* deref! was added in 9c520b6 and attempts to return a deep object graph
  that contains no PDF::Reader::Reference objects and can be walked
  without risk of needing to refer back to the associated ObjectHash
  instance.
* This change resolves a recursion bug that can occur in real-world
  files, such as a Form XObject whose Resources dict points back to the
  XObject itself, by no longer recursing into Stream dictionaries. It
  also weakens the guarantee that the returned object graph can be used
  without the ObjectHash: the returned graph may contain a
  PDF::Reader::Stream instance that still holds some
  PDF::Reader::Reference objects.
* I'm not yet convinced that this is the right solution, but the build
  is green and it's an option open to us.
---
 lib/pdf/reader/object_hash.rb | 3 ---
 spec/object_hash_spec.rb      | 5 -----
 2 files changed, 8 deletions(-)

diff --git a/lib/pdf/reader/object_hash.rb b/lib/pdf/reader/object_hash.rb
index 07b654ed..ee290d5c 100644
--- a/lib/pdf/reader/object_hash.rb
+++ b/lib/pdf/reader/object_hash.rb
@@ -109,9 +109,6 @@ def deref!(key)
             hash[k] = deref!(value)
           end
         }
-      when PDF::Reader::Stream
-        object.hash = deref!(object.hash)
-        object
       when Array
         object.map { |value| deref!(value) }
       else
diff --git a/spec/object_hash_spec.rb b/spec/object_hash_spec.rb
index cadf4d74..2770cac8 100644
--- a/spec/object_hash_spec.rb
+++ b/spec/object_hash_spec.rb
@@ -102,11 +102,6 @@
         PDF::Reader::Stream
     end
 
-    it "recursively dereferences references within stream hashes" do
-      font_file = hash.deref! PDF::Reader::Reference.new(15, 0)
-      expect(font_file.hash[:Length]).to eq 2103
-    end
-
     it "recursively dereferences references within arrays" do
       font = hash.deref! PDF::Reader::Reference.new(19, 0)
       expect(font[:DescendantFonts][0][:Subtype]).to eq :CIDFontType0
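The trade-off described in PATCH 2/2 can be illustrated with a minimal, self-contained Ruby sketch. It does not load pdf-reader: `Ref`, `Stream`, `ToyObjectHash` and the `walk_streams:` keyword are hypothetical stand-ins, and the `deref!` below only approximates the shape of the real method as far as the diff context shows it. Object `5 0` models a Form XObject whose own Resources name itself, as in `spec/data/form_xobject_recursive.pdf`.

```ruby
# Hypothetical stand-ins for PDF::Reader::Reference and PDF::Reader::Stream.
# They are illustrative only and are NOT the pdf-reader classes.
Ref = Struct.new(:id, :gen)

class Stream
  # Mirrors the removed lines, which read and assign the stream dict via #hash.
  attr_accessor :hash, :data

  def initialize(hash, data)
    @hash = hash
    @data = data
  end
end

# A toy object store keyed by Ref, standing in for PDF::Reader::ObjectHash.
class ToyObjectHash
  def initialize(objects)
    @objects = objects
  end

  # walk_streams: true reproduces the branch removed in PATCH 2/2;
  # the default (false) reproduces the behaviour after the patch.
  def deref!(key, walk_streams: false)
    object = key.is_a?(Ref) ? @objects[key] : key
    case object
    when Hash
      object.each_with_object({}) do |(k, v), out|
        out[k] = deref!(v, walk_streams: walk_streams)
      end
    when Stream
      # Without this recursion the stream is returned untouched, so any
      # Refs inside its dictionary are left unresolved.
      if walk_streams
        object.hash = deref!(object.hash, walk_streams: true)
      end
      object
    when Array
      object.map { |v| deref!(v, walk_streams: walk_streams) }
    else
      object
    end
  end
end

# A Form XObject (object 5 0) whose Resources refer back to itself as /Fm0.
form_ref = Ref.new(5, 0)
form = Stream.new({ Subtype: :Form, Resources: { XObject: { Fm0: form_ref } } }, "q Q")
store = ToyObjectHash.new(form_ref => form)
page = { Type: :Page, Resources: { XObject: { Fm0: form_ref } } }

resolved = store.deref!(page)
xobject = resolved[:Resources][:XObject][:Fm0]
puts xobject.equal?(form)                                  # → true
puts xobject.hash[:Resources][:XObject][:Fm0].is_a?(Ref)   # → true

begin
  store.deref!(page, walk_streams: true)
rescue SystemStackError
  puts "walking stream dicts recursed until the stack overflowed"
end
```

With the default `walk_streams: false`, resolving the page returns the original stream object and leaves the self-reference inside its dictionary as a `Ref`, which is the weakened guarantee the commit message describes. With `walk_streams: true`, every visit to `/Fm0` descends into the same stream dictionary again, so `deref!` never terminates and Ruby raises `SystemStackError`.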