NotImplementedError: unsupported filter /JBIG2Decode #1989
Comments
PDF found in #1983 |
from #951 Here is pdfminer implementation: |
#2502 (comment) - here we have another example |
This might need a general design decision if I am not mistaken: Pillow does not seem to support JBIG2, while our implementation currently assumes that all images can be loaded as AFAIK there only is |
Many platforms provide a standalone 'jbig2dec' utility (e.g. on a Mac you can run brew install jbig2dec). Can you farm the decoding out to that tool? I was going to try, but I can't seem to get the raw unfiltered image bytes from the page object. Probably my ignorance... (will keep digging) |
the XObject is a ContentStream. You should be able to access the data with |
Given the data, you should still have a look at the PDF specification on the filter (for PDF 2.0/ISO 32000-2:2020, this is section 7.4.7), especially as |
I have it working for the test file I have been using. Kludgy but working!
I only had to modify one file - filters.py.
|
Changed file type
|
If you intended to append a file here, then this did not work due to you answering by e-mail. Uploading files usually requires using the GitHub UI itself. |
OK.
|
This is a quick MacGyver fix for the /JBIG2Decode error. You need to install the 'jbig2dec' utility; on the Mac it is available through Homebrew. The fix assumes Apple silicon, where the path to the utility is /opt/homebrew/bin/jbig2dec. On an Intel Mac it is probably /usr/local/bin/jbig2dec? |
Relevant class from the code:

import os
import subprocess
from typing import Any, Optional

from pypdf.generic import DictionaryObject


class JBIG2Decode:
    @staticmethod
    def decode(
        data: bytes,
        decode_parms: Optional[DictionaryObject] = None,
        **kwargs: Any,
    ) -> bytes:
        # decode_parms is unused here
        pathin = '/var/tmp/tempin.jbig2'
        pathout = '/var/tmp/tempout.jbig2'
        # Write the raw stream to disk so that jbig2dec can read it.
        with open(pathin, "wb") as fl:
            fl.write(data)
        # check=True raises if jbig2dec exits with an error.
        subprocess.run(['/opt/homebrew/bin/jbig2dec', '-e', '-o', pathout, pathin], check=True)
        with open(pathout, 'rb') as fl:
            data = fl.read()
        os.unlink(pathin)
        os.unlink(pathout)
        return data
|
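The hard-coded Apple silicon path in the JBIG2Decode class above could be replaced by a lookup. The following is only a rough sketch: find_jbig2dec and HOMEBREW_CANDIDATES are hypothetical names, not part of pypdf, and the Intel path is the same guess as in the comment above.

```python
import os
import shutil
from typing import Iterable, Optional

# Locations mentioned in this thread. The second one (Intel Mac) is a guess.
HOMEBREW_CANDIDATES = ("/opt/homebrew/bin/jbig2dec", "/usr/local/bin/jbig2dec")


def find_jbig2dec(
    candidates: Iterable[str] = HOMEBREW_CANDIDATES,
    name: str = "jbig2dec",
) -> Optional[str]:
    """Return a path to the jbig2dec binary, or None if none is found."""
    # Prefer whatever is on PATH; this also covers Linux and Windows installs.
    found = shutil.which(name)
    if found:
        return found
    # Fall back to the known Homebrew locations.
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

The result can then be used in place of '/opt/homebrew/bin/jbig2dec'; a None result should be turned into a clear error before calling subprocess.run.
|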
this solution is not valid for windows / linux... |
This is untested and may cause issues due to opening the same file twice on Windows, but should in general be OS-independent:

import shutil
import subprocess
from tempfile import NamedTemporaryFile
from typing import Any, Optional

from pypdf.generic import DictionaryObject


class JBIG2Decode:
    @staticmethod
    def decode(
        data: bytes,
        decode_parms: Optional[DictionaryObject] = None,
        **kwargs: Any,
    ) -> bytes:
        # decode_parms is unused here
        with NamedTemporaryFile(suffix=".jbig2") as infile:
            infile.write(data)
            # Flush so that jbig2dec sees the complete stream on disk.
            infile.flush()
            result = subprocess.run(
                # Pass the file name, not the file object.
                [shutil.which("jbig2dec"), "--embedded", "--output", "-", infile.name],
                stdout=subprocess.PIPE,
            )
        return result.stdout
|
The utility jbig2dec is available for other platforms.
I am looking into porting the decompressor to python. There are about 20 separate source files to convert. I’ll work on it but it will take time!!
(In reply to pubpub-zz, Jun 4, 2024: "this solution is not valid for windows / linux... Can you try to rehost the code into python natively?")
|
I experimented with this to some extent previously. The version of jbig2dec that I used does not pipe its output to standard output; it needs an output file.
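For jbig2dec versions that cannot write to standard output, a file-based variant might look roughly like this. It is only a sketch: decode_jbig2_via_file is a hypothetical helper, not pypdf code, the -e and -o flags are the ones used earlier in this thread, and the out.pbm name is an assumption (the actual output format depends on the jbig2dec build). The run parameter exists only so that the call can be swapped out for testing.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional


def decode_jbig2_via_file(
    data: bytes,
    binary: Optional[str] = None,
    run: Callable[..., object] = subprocess.run,
) -> bytes:
    """Decode an embedded JBIG2 stream by calling jbig2dec with an output file."""
    binary = binary or shutil.which("jbig2dec")
    if binary is None:
        raise FileNotFoundError("jbig2dec was not found on PATH")
    # A temporary directory avoids fixed paths in /var/tmp, and both files are
    # closed before jbig2dec opens them (relevant on Windows).
    with tempfile.TemporaryDirectory() as tmp:
        infile = Path(tmp) / "in.jbig2"
        outfile = Path(tmp) / "out.pbm"  # assumed name; format depends on the build
        infile.write_bytes(data)
        # -e: embedded stream without file header, -o: output file
        run([binary, "-e", "-o", str(outfile), str(infile)], check=True)
        return outfile.read_bytes()
```

Note that the returned bytes are whatever image file jbig2dec writes, so they may still need converting before the rest of the image handling can use them.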
|
Liked your suggestions!!! I'll still try porting to python (at my pace unfortunately). Here is the updated file. |
Explanation
I found an example for the /JBIG2Decode filter :-)
Code Example
PDF: https://github.com/py-pdf/pypdf/files/12090692/New.Jersey.Coinbase.staking.securities.charges.2023-0606_Coinbase-Penalty-and-C-D.pdf
gives: NotImplementedError: unsupported filter /JBIG2Decode