Desktop (please complete the following information):
OS: Windows
1.16.0
Workaround
I think the issue is with counter_generator. If we pass a generator for output_file, then counter_generator is never called and we can produce the expected outputs:
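The snippet that originally followed is not preserved here, so the following is only a minimal sketch of the idea, not the reporter's code. It assumes, as described above, that pdf2image uses a generator passed as output_file directly instead of wrapping the value in counter_generator. The names constant_name_generator and convert_without_counter are hypothetical helpers introduced for illustration.

```python
def constant_name_generator(name):
    """Yield the same base name forever (hypothetical helper).

    Because this is already a generator, the assumption is that pdf2image
    will not wrap it in counter_generator, so no extra counter is added.
    """
    while True:
        yield name


def convert_without_counter(pdf_path, output_folder, base_name="page"):
    """Convert a PDF to JPEG files, leaving page numbering to pdftoppm/pdftocairo."""
    # Imported lazily so the generator above can be tried without pdf2image installed.
    from pdf2image import convert_from_path

    return convert_from_path(
        pdf_path,
        output_folder=output_folder,
        output_file=constant_name_generator(base_name),
        fmt="jpeg",
    )
```

Calling convert_without_counter("multipage.pdf", ".") from the directory containing the PDF should then leave only the page number added by the pdfto* tool in each file name, if the assumption above holds for the installed version.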
I saw this behavior on a project yesterday - like you, I wasn't expecting that output in the file names. I checked generators.py to look at the counter_generator function. If you look more closely at the output file names, it's not duplicating page numbers - rather, it's appending the number of the thread that handles the page conversion.
A simple fix is to change this in generators.py:
    @threadsafe
    def counter_generator(prefix="", suffix="", padding_goal=4):
        """Returns a joined prefix, iteration number, and suffix"""
        i = 0
        while True:
            i += 1
            yield str(prefix) + str(i).zfill(padding_goal) + str(suffix)
to:
    @threadsafe
    def counter_generator(prefix="", suffix="", padding_goal=4):
        """Returns a joined prefix, iteration number, and suffix"""
        i = 0
        while True:
            i += 1
            yield str(prefix) + str(suffix)
Looks like there's a PR out waiting on merge to do just that and a bit more.
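To make the effect of that change concrete, here is a small standalone comparison of the two generator bodies. It is only a sketch: the @threadsafe decorator from generators.py is omitted so the code runs on its own, and the name plain_generator is a hypothetical label for the modified version.

```python
def counter_generator(prefix="", suffix="", padding_goal=4):
    """Original behavior: prefix + zero-padded counter + suffix."""
    i = 0
    while True:
        i += 1
        yield str(prefix) + str(i).zfill(padding_goal) + str(suffix)


def plain_generator(prefix="", suffix=""):
    """Modified behavior (hypothetical name): the counter is no longer emitted."""
    while True:
        yield str(prefix) + str(suffix)


original = counter_generator("out")
patched = plain_generator("out")

# Each next() call corresponds to one name handed out by the library.
first_original = [next(original) for _ in range(3)]
first_patched = [next(patched) for _ in range(3)]

print(first_original)  # ['out0001', 'out0002', 'out0003']
print(first_patched)   # ['out', 'out', 'out']
```

With the original generator, every name handed out carries a four-digit counter before the pdfto* tool appends its own page number, which matches the extra digits seen in the file names.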
Describe the bug
Given a multi-page PDF, the page number is encoded twice in the output file name: once by pdf2image and again by pdftoppm/pdftocairo.
To Reproduce
Steps to reproduce the behavior:
(1) Download multipage.pdf
(2) Run this code from the same directory as multipage.pdf:
(3) The previous step should produce 10 JPG files. Notice that each filename follows this format:
{PPM-root}{PPPP}-{number}.jpg
Expected behavior
Filenames should have the page number encoded only once (which the pdfto* tools already handle):
{PPM-root}-{number}.jpg
Screenshots
File tree showing outputs for pdf2image, pdftoppm, and pdftocairo: