Dangerzone may exclude the last page of a document #560

apyrgio · 2023-09-26T18:58:12Z

Background

Dangerzone uses a tool called pdftoppm to convert a PDF document into pixels:

dangerzone/dangerzone/conversion/doc_to_pixels.py, lines 313 to 319 in 18b73d9:

    await self.run_command(
        [
            "pdftoppm",
            pdf_filename,
            page_base,
            "-progress",
        ],

The way we use pdftoppm is the following:

1. Provide the PDF document as an argument.
2. Provide a temporary directory where pdftoppm will write PPM files (one per page).
3. Ask pdftoppm to report progress on stderr. Progress reports are lines with the following format: <page> <num_pages> <file>
4. Register a callback handler for the stderr of pdftoppm, which will be called with each line of the progress report as an argument.
5. (callback handler) For each progress line, read the file with the PPM data and get the following info: page width, page height, RGB data.
6. (callback handler) Write this data as separate files under /tmp/dangerzone, which is the directory that the second phase will use to convert pixels to PDF.
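The callback steps in the list above can be sketched roughly as follows. This is a minimal illustration, not Dangerzone's actual handler: the function names (parse_progress_line, read_ppm, on_progress) and the output naming scheme page-<N>.width / .height / .rgb are hypothetical, and the PPM reader assumes pdftoppm writes binary (P6) files with 8-bit samples and no header comments.

```python
import os
import re
import tempfile
from typing import NamedTuple, Tuple

# Binary PPM header: magic "P6", width, height, maxval, then exactly one
# whitespace byte before the raw pixel data. Header comments are not handled
# (an assumption about pdftoppm's output).
_PPM_HEADER = re.compile(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s")


class PageInfo(NamedTuple):
    width: int
    height: int
    rgb: bytes


def parse_progress_line(line: bytes) -> Tuple[int, int, str]:
    """Split a '<page> <num_pages> <file>' progress line (file may contain spaces)."""
    page, num_pages, filename = line.decode().rstrip("\r\n").split(" ", 2)
    return int(page), int(num_pages), filename


def read_ppm(path: str) -> PageInfo:
    """Read width, height and RGB data from a binary (P6) PPM file."""
    with open(path, "rb") as f:
        data = f.read()
    m = _PPM_HEADER.match(data)
    if m is None:
        raise ValueError(f"{path}: not a binary (P6) PPM file")
    width, height, maxval = (int(g) for g in m.groups())
    if maxval != 255:
        raise ValueError(f"{path}: expected 8-bit samples, got maxval {maxval}")
    rgb = data[m.end():]
    if len(rgb) != width * height * 3:
        raise ValueError(f"{path}: truncated RGB data")
    return PageInfo(width, height, rgb)


def on_progress(line: bytes, out_dir: str) -> None:
    """Hypothetical stderr callback: store one page as .width/.height/.rgb files."""
    page, _num_pages, ppm_path = parse_progress_line(line)
    info = read_ppm(ppm_path)
    base = os.path.join(out_dir, f"page-{page}")
    for suffix, content in (
        (".width", str(info.width).encode()),
        (".height", str(info.height).encode()),
        (".rgb", info.rgb),
    ):
        with open(base + suffix, "wb") as f:
            f.write(content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ppm = os.path.join(tmp, "input-1.ppm")
        with open(ppm, "wb") as f:
            # A 2x1 image; the first pixel's bytes include '\n' and ' ' on purpose.
            f.write(b"P6\n2 1\n255\n" + bytes([10, 32, 0, 255, 255, 255]))
        on_progress(f"1 1 {ppm}\n".encode(), tmp)
        print(sorted(os.listdir(tmp)))
```

The header is matched with a regular expression instead of a plain whitespace split because pixel bytes may themselves be whitespace values (0x0A, 0x20), and a greedy split would swallow them.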

Bug 🐞

Let's dig deeper into how the callback handler is called. We have a generic function that reads lines from a command's output:

dangerzone/dangerzone/conversion/common.py, lines 66 to 74 in 18b73d9:

    while True:
        line = await sr.readline()
        if sr.at_eof():
            break
        self.captured_output += line
        if callback is not None:
            callback(line)
        buf += line
    return buf

For each line, it appends the line to self.captured_output, calls the callback (if one was provided) with the line as an argument, and then appends the line to the buffer that it eventually returns.

What's the bug here? We first read the line, and then check if we reached EOF 🤦. If the writer has already written its last line and exited by the time we read that line, sr.at_eof() returns True right after the read, so we break out of the loop and discard the line before it is passed to the callback or appended to the buffer. This means that the callback handler will not be called for the last page, and we will not create the necessary files under /tmp/dangerzone for it.
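To see the failure without pdftoppm, the loop above can be reproduced standalone with an asyncio.StreamReader that is fed all of the output plus the EOF marker up front, which models a writer that exits right after printing its last line. This is a minimal sketch: read_stream_buggy and demo are hypothetical names, and self.captured_output is left out.

```python
import asyncio
from typing import Callable, List, Optional


async def read_stream_buggy(
    sr: asyncio.StreamReader, callback: Optional[Callable[[bytes], None]] = None
) -> bytes:
    """Same control flow as the loop above, minus self.captured_output."""
    buf = b""
    while True:
        line = await sr.readline()
        # at_eof() is True once EOF was received *and* the buffer is empty,
        # which is exactly the state right after reading the final line.
        if sr.at_eof():
            break  # the final line is dropped here
        if callback is not None:
            callback(line)
        buf += line
    return buf


async def demo() -> List[bytes]:
    # Create the reader inside the running event loop.
    sr = asyncio.StreamReader()
    # Simulate a writer that prints three progress lines and exits before the
    # reader runs: all output and the EOF marker are already buffered.
    sr.feed_data(b"1 3 p-1.ppm\n2 3 p-2.ppm\n3 3 p-3.ppm\n")
    sr.feed_eof()
    seen: List[bytes] = []
    await read_stream_buggy(sr, seen.append)
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running it shows only the first two progress lines; the third one (the last page) never reaches the callback. If instead EOF arrives only after the reader has consumed the last line and is blocked in the next readline(), that call returns b"" and only this empty result is discarded, which is why the bug stays hidden as long as the reader keeps up with the writer.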

Impact

This bug was introduced in version 0.4.1 (aeeed41). Users of affected versions may have documents with the last page missing. This bug should not have any impact on the security of the sanitization.

During our QA we never stumbled upon this bug, and we don't have any reports from our users hinting at such an issue. I only managed to trigger it today, while working on a change that made the callback handler twice as slow.

If you have been affected, though, please let us know. In any case, we will fix this issue ASAP.

Remediation

Change the order of the checks: first check if we are at EOF, and then read the line. Note that we can't check the output of readline() for EOF (i.e., if line == ""), because it will detect empty lines as EOF.
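A minimal sketch of the reordered loop, again as a standalone function with hypothetical names (read_stream_fixed, demo) rather than Dangerzone's actual code. The loop condition checks at_eof() before reading, as described above. The `if not line` guard is an addition of this sketch, not part of the remediation described here: if EOF arrives while readline() is already waiting, readline() returns b"", and the guard keeps that empty result away from the callback.

```python
import asyncio
from typing import Callable, List, Optional


async def read_stream_fixed(
    sr: asyncio.StreamReader, callback: Optional[Callable[[bytes], None]] = None
) -> bytes:
    """Check for EOF *before* reading, so every line that is read gets handled."""
    buf = b""
    while not sr.at_eof():
        line = await sr.readline()
        if not line:
            # Sketch-specific guard: EOF arrived while readline() was waiting,
            # so there is nothing to hand to the callback.
            break
        if callback is not None:
            callback(line)
        buf += line
    return buf


async def demo() -> List[bytes]:
    sr = asyncio.StreamReader()
    # Same scenario as before: the writer already exited, everything is buffered.
    sr.feed_data(b"1 3 p-1.ppm\n2 3 p-2.ppm\n3 3 p-3.ppm\n")
    sr.feed_eof()
    seen: List[bytes] = []
    await read_stream_fixed(sr, seen.append)
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the same pre-buffered input as in the buggy case, all three progress lines now reach the callback.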

The fix for this bug will be included in the upcoming 0.5.0 release.

Do not read a line from the command output and then check if we are at EOF, because it's possible that the writer immediately exited after writing the last line of output. Instead, switch the order of actions. This is a very serious bug that can lead to Dangerzone excluding the last page of the document. It should have bitten us right from the start (see aeeed41), but it seems that the small period of time it takes the kernel to close the file descriptors was hiding this bug. Fixes #560

Add a sanity check at the end of the conversion from doc to pixels, to ensure that the resulting document will have the same number of pages as the original one. Refs #560
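The commit message does not say how the sanity check is implemented, so the following is only a sketch under stated assumptions: the expected page count is already known (for example from the <num_pages> field of the pdftoppm progress lines, or from the original PDF), and each rendered page leaves a file named page-<N>.rgb in the output directory. The names check_page_count, rendered_pages and PageCountMismatch, as well as the file naming, are hypothetical.

```python
import os
import tempfile
from typing import List


class PageCountMismatch(Exception):
    """Raised when the rendered pages do not match the original page count."""


def rendered_pages(out_dir: str) -> List[int]:
    """Return the sorted page numbers for which a page-<N>.rgb file exists."""
    pages = []
    for name in os.listdir(out_dir):
        stem, ext = os.path.splitext(name)
        if ext == ".rgb" and stem.startswith("page-") and stem[5:].isdigit():
            pages.append(int(stem[5:]))
    return sorted(pages)


def check_page_count(expected_pages: int, out_dir: str) -> None:
    """Fail loudly instead of silently producing a document with missing pages."""
    found = rendered_pages(out_dir)
    expected = list(range(1, expected_pages + 1))
    if found != expected:
        missing = sorted(set(expected) - set(found))
        raise PageCountMismatch(
            f"expected {expected_pages} pages, rendered {len(found)}; missing: {missing}"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for page in (1, 2):
            open(os.path.join(tmp, f"page-{page}.rgb"), "wb").close()
        check_page_count(2, tmp)  # passes
        try:
            check_page_count(3, tmp)  # the last page is missing
        except PageCountMismatch as e:
            print(e)
```

Comparing the full list of page numbers, rather than only counting files, also reports which pages are missing, which makes a dropped last page easy to spot.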

apyrgio self-assigned this Sep 26, 2023

apyrgio added the bug (Something isn't working) and container labels Sep 26, 2023

apyrgio added this to the 0.5.0 milestone Sep 26, 2023

apyrgio mentioned this issue Sep 26, 2023

Qubes: Stream page data in real time #561

Merged

apyrgio added a commit that referenced this issue Sep 27, 2023

conversion: Add sanity check for page count

dd143d6

apyrgio added a commit that referenced this issue Sep 28, 2023

conversion: Add sanity check for page count

5eb3685

apyrgio added a commit that referenced this issue Sep 28, 2023

conversion: Add sanity check for page count

ccf4132

apyrgio closed this as completed in 6012cd1 Sep 28, 2023
