Background
Dangerzone uses a tool called pdftoppm to convert a PDF document into pixels (see dangerzone/dangerzone/conversion/doc_to_pixels.py, lines 313 to 319 in 18b73d9).
The way we use pdftoppm is the following:
1. Provide a temporary directory where pdftoppm will write PPM files (one per page).
2. Ask pdftoppm to report progress on stderr. Progress reports are lines with the following format: <page> <num_pages> <file>
3. Register a callback handler for the stderr of pdftoppm, which will be called with each line of the progress report as an argument.
4. (callback handler) For each progress line, read the file with the PPM data and get the following info: page width, page height, RGB data.
5. (callback handler) Write this data as separate files under /tmp/dangerzone, which is the directory that the second phase will use to convert pixels to PDF.
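To make the callback handler's job more concrete, here is a minimal sketch of the parsing it has to do. This is not the actual Dangerzone code: the function names parse_progress_line and parse_ppm are hypothetical, and the sketch assumes the PPM files are in the binary "P6" variant with a maximum color value of 255 and no comment lines in the header.

```python
from typing import Tuple


def parse_progress_line(line: str) -> Tuple[int, int, str]:
    """Split a '<page> <num_pages> <file>' progress line into its fields."""
    # Split at most twice, so that a file path containing spaces stays intact.
    page, num_pages, path = line.strip().split(" ", 2)
    return int(page), int(num_pages), path


def parse_ppm(data: bytes) -> Tuple[int, int, bytes]:
    """Return (width, height, rgb_data) for a binary PPM image.

    Assumption: the header is exactly three newline-terminated fields
    (magic number, dimensions, max color value) without comment lines.
    """
    # Split at most three times: the pixel data may contain newline bytes.
    magic, dims, maxval, rgb = data.split(b"\n", 3)
    if magic != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height = (int(n) for n in dims.split())
    if int(maxval) != 255:
        raise ValueError("only 8-bit color channels are handled here")
    if len(rgb) != width * height * 3:
        raise ValueError("pixel data does not match the declared dimensions")
    return width, height, rgb
```

The handler would call parse_progress_line() on each stderr line, read the file it names, and pass its contents to parse_ppm() before writing the results out for the second phase.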
Bug 🐞
Let's dig deeper into how the callback handler is called. We have a generic function that reads lines from a command's output (dangerzone/dangerzone/conversion/common.py, lines 66 to 74 in 18b73d9).
For each line, it appends it to a buffer, and then calls a callback function with the line as an argument.
What's the bug here? We first read the line, and then check if we reached EOF 🤦. So, it's possible that we read the last line of the stream, and then immediately discard it, because we have indeed reached EOF. This means that the callback handler will not be called, and we will not create the necessary files under /tmp/dangerzone for the last page.
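The read-then-check pattern can be reproduced in isolation. The sketch below is not the code from the repository: it assumes the command's output is consumed through an asyncio.StreamReader, and read_stream_buggy and demo are hypothetical names. The stream is fed by hand with two progress lines followed by EOF, which mimics a writer that exits right after writing its last line.

```python
import asyncio
from typing import Callable, List


async def read_stream_buggy(
    sr: asyncio.StreamReader, callback: Callable[[bytes], None]
) -> bytes:
    """Read lines and pass them to the callback, with the buggy ordering."""
    buf = b""
    while True:
        line = await sr.readline()
        # Bug: the EOF check happens after the read, so a line that was
        # read just as the stream hit EOF is dropped.
        if sr.at_eof():
            break
        buf += line
        callback(line)
    return buf


async def demo() -> List[bytes]:
    sr = asyncio.StreamReader()
    # Simulate a writer that exits immediately after its last line.
    sr.feed_data(b"1 2 page-1.ppm\n2 2 page-2.ppm\n")
    sr.feed_eof()
    seen: List[bytes] = []
    await read_stream_buggy(sr, seen.append)
    return seen


seen_buggy = asyncio.run(demo())
print(seen_buggy)  # [b'1 2 page-1.ppm\n']: the line for page 2 is lost
```

When the writer stays alive for a moment after its last line, at_eof() is still false at the time of the check, which is consistent with the bug going unnoticed for so long.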
Impact
This bug was introduced in version 0.4.1 (aeeed41). Users of affected versions may have documents with the last page missing. This bug should not have any impact on the security of the sanitization.
During our QA we never stumbled upon this bug, and we don't have any report from our users hinting at such an issue. I only managed to trigger it today, while working on something that made the callback handler run twice as slow.
If you have been affected though, please let us know. In any case, we will fix this issue ASAP.
Remediation
Change the order of the checks: first check if we are at EOF, and then read the line. Note that we can't check the output of readline() for EOF (i.e., if line == ""), because it will detect empty lines as EOF.
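A minimal sketch of the reordered loop follows, again assuming an asyncio.StreamReader and using the hypothetical names read_stream_fixed and demo. It is an illustration of the remediation described above, not the patch itself.

```python
import asyncio
from typing import Callable, List


async def read_stream_fixed(
    sr: asyncio.StreamReader, callback: Callable[[bytes], None]
) -> bytes:
    """Read lines and pass them to the callback, checking for EOF first."""
    buf = b""
    # Check for EOF before reading, so a line that has already been
    # buffered is never discarded.
    while not sr.at_eof():
        line = await sr.readline()
        buf += line
        callback(line)
    return buf


async def demo() -> List[bytes]:
    sr = asyncio.StreamReader()
    # Same scenario as before: the writer exits right after the last line.
    sr.feed_data(b"1 2 page-1.ppm\n2 2 page-2.ppm\n")
    sr.feed_eof()
    seen: List[bytes] = []
    await read_stream_fixed(sr, seen.append)
    return seen


print(asyncio.run(demo()))  # both progress lines reach the callback
```

One caveat of this sketch: if EOF arrives while readline() is already waiting on an empty buffer, asyncio returns an empty bytes object, so the callback has to tolerate that value.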
The fix for this bug will be included in the upcoming 0.5.0 release.
Do not read a line from the command output and then check if
we are at EOF, because it's possible that the writer immediately exited
after writing the last line of output. Instead, switch the order of
actions.
This is a very serious bug that can lead to Dangerzone excluding the
last page of the document. It should have bit us right from the start
(see aeeed41), but it seems that the
small period of time it takes the kernel to close the file descriptors
was hiding this bug.
Fixes #560
Add a sanity check at the end of the conversion from doc to pixels, to
ensure that the resulting document will have the same number of pages as
the original one.
Refs #560
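The sanity check described in the commit message above could look roughly like the following. This is only a sketch: check_page_count is a hypothetical helper, and it assumes the caller knows the expected page count (for example, from the <num_pages> field of the progress lines) and the page numbers that were actually converted.

```python
from typing import Iterable


def check_page_count(expected: int, converted_pages: Iterable[int]) -> None:
    """Raise ValueError unless pages 1..expected were all converted."""
    converted = set(converted_pages)
    wanted = set(range(1, expected + 1))
    missing = sorted(wanted - converted)
    if missing:
        raise ValueError(f"missing pages after conversion: {missing}")
    unexpected = sorted(converted - wanted)
    if unexpected:
        raise ValueError(f"unexpected pages after conversion: {unexpected}")
```

With a check like this, a dropped last page turns into a hard conversion failure instead of a silently shorter document.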