MalformedPDFError Invalid filter algorithm 31 #500

Open
ollym opened this issue Oct 19, 2022 · 3 comments


ollym commented Oct 19, 2022

PDF file:
EA9DDBD4F46B6A41F4CFC7FE3A222FAF8013C3CEAC0918D1E2A5.pdf

There seems to be an issue with the png_depredict function when running this code:

PDF::Reader.new(file).pages[0].xobjects[:I3].unfiltered_data

# => PDF::Reader::MalformedPDFError (Invalid filter algorithm 31):

That specific xobject is the QR code we're trying to extract and parse, but we're struggling to get the unfiltered_data needed to do so. We'll continue to debug, but may need someone else's help.

Owner

yob commented Oct 20, 2022

The image xobject looks like this:

<</Type /XObject
/Subtype /Image
/Width 100
/Height 100
/ColorSpace [/Indexed /DeviceRGB 1 23 0 R]
/BitsPerComponent 1
/Filter /FlateDecode
/DecodeParms <</Predictor 15 /Colors 1 /BitsPerComponent 1 /Columns 100>>
/Length 265>>

I'm fairly sure it's accurate that 31 isn't a valid filter type in the PNG format, but I suspect png_depredict isn't parsing the data correctly and shouldn't be getting as far as thinking there's a filter type of 31. Maybe because it's a single bit per component? Or maybe because the colour space is indexed 🤔

Unfortunately I'm fairly swamped at the moment with my day job and family life, so I won't be able to take a closer look for a while. Sorry!
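
To make the "single bit per component" hunch concrete, here is a rough sketch of the row arithmetic for the DecodeParms above. It is only an illustration: `png_row_layout` is a hypothetical helper, not part of pdf-reader, and which row length pdf-reader actually computes has not been checked here. With PNG predictors each row is prefixed by one filter-type byte, so a decoder that steps through the data with the wrong row length will read pixel data as if it were a filter type.

```ruby
# Hypothetical helper for illustration only; not part of pdf-reader.
# Computes how many bytes one PNG-predicted row occupies.
def png_row_layout(columns:, colors: 1, bits_per_component: 8)
  bits_per_row = columns * colors * bits_per_component
  row_bytes = (bits_per_row + 7) / 8 # round up to whole bytes
  { row_bytes: row_bytes, stride: row_bytes + 1 } # +1 for the filter byte
end

# Values taken from the /DecodeParms of the image xobject in this issue.
actual = png_row_layout(columns: 100, colors: 1, bits_per_component: 1)

# What a decoder would get if it assumed 8 bits per component (an assumption
# about a possible failure mode, not a confirmed diagnosis).
naive = png_row_layout(columns: 100, colors: 1, bits_per_component: 8)

puts "1-bit rows: #{actual[:stride]} bytes per row"
puts "8-bit assumption: #{naive[:stride]} bytes per row"
```

If the two strides differ, every row boundary after the first lands in the middle of pixel data, which would be one plausible way to end up with a "filter type" such as 31.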

Owner

yob commented Oct 20, 2022

Ouch, this has reminded me that there's only a single unit spec for the Flate filter with PNG shaped data 😬

context "deflated stream with PNG predictors" do
  let(:deflated_path) {
    File.dirname(__FILE__) + "/../../data/deflated_with_predictors.dat"
  }
  let(:depredicted_path) {
    File.dirname(__FILE__) + "/../../data/deflated_with_predictors_result.dat"
  }
  let(:deflated_data) { binread(deflated_path) }
  let(:depredicted_data) { binread(depredicted_path) }

  it "inflates the data" do
    filter = PDF::Reader::Filter::Flate.new(
      :Columns => 5,
      :Predictor => 12
    )
    expect(filter.filter(deflated_data)).to eql(depredicted_data)
  end
end

Author

ollym commented Oct 23, 2022

For those also having issues with this, we found HexaPDF was able to export the image correctly:
https://github.com/gettalong/hexapdf
