
Allow client-side timeouts on Windows #632

Closed
apyrgio opened this issue Dec 4, 2023 · 2 comments
Labels
P:windows timeout Dangerzone Times Out

Comments

@apyrgio
Contributor

apyrgio commented Dec 4, 2023

In order to add streaming support on Windows for the first stage of the conversion (see #443, #627), we need to read the stdout of the container from the host. This is currently implemented in Qubes with this method, which uses non-blocking reads under the hood:

def read_bytes(f: IO[bytes], size: int, timeout: float, exact: bool = True) -> bytes:
    """Read bytes from a file-like object."""
    buf = nonblocking_read(f, size, timeout)
    if exact and len(buf) != size:
        raise errors.InterruptedConversion
    return buf

The reason we use non-blocking reads is to stop the execution if the server component does not send all the necessary bytes within the timeout period. The problem here is that our non-blocking read implementation does not pass tests on Windows platforms:

@pytest.mark.skipif(
    platform.system() == "Windows", reason="Cannot test non-blocking read on Windows"
)
def test_nonblocking_read(mocker: MockerFixture) -> None:

because non-blocking reads from pipes do not work on Windows:

Note that on Windows, it only works for sockets; on other operating systems, it also works for other file types (in particular, on Unix, it works on pipes).

(taken from the select module)
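For context, a select-based helper along these lines (a hypothetical sketch; the actual nonblocking_read implementation in Dangerzone may differ) works on Unix pipes precisely because select() accepts their file descriptors, which is exactly what Windows does not support:

```python
import os
import select
from typing import IO


def nonblocking_read_sketch(f: IO[bytes], size: int, timeout: float) -> bytes:
    """Read up to `size` bytes from `f`, waiting at most `timeout` seconds.

    select() accepts pipe file descriptors on Unix, but on Windows it
    accepts only sockets, which is why this approach fails there.
    """
    buf = b""
    while len(buf) < size:
        # Wait until the fd becomes readable, or until the timeout expires.
        ready, _, _ = select.select([f], [], [], timeout)
        if not ready:
            break  # timed out before the remaining bytes arrived
        chunk = os.read(f.fileno(), size - len(buf))
        if not chunk:
            break  # EOF: the writer closed its end of the pipe
        buf += chunk
    return buf
```

The caller (e.g., read_bytes above) can then compare `len(buf)` with `size` to detect a timeout or premature EOF.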

So, we need to somehow support client-side timeouts on Windows without relying on non-blocking reads. In short, we need a primitive that offers blocking operations with a timeout and that also works on Windows.

@apyrgio apyrgio added P:windows timeout Dangerzone Times Out labels Dec 4, 2023
@apyrgio
Contributor Author

apyrgio commented Dec 4, 2023

Proposal

Turns out that a Python queue has such a primitive: https://docs.python.org/3/library/queue.html#queue.Queue.get

We can use it as follows:

  • Set up the queue.
  • Start a thread that performs a blocking read on the pipe.
  • Make the thread enqueue a data chunk on the queue once each read() completes.
  • Dequeue the chunks of data on the parent using a blocking get with a timeout:
    • I.e., Queue.get(block=True, timeout=<timeout>)
    • In case of a timeout error, close the read pipe, in order to force the thread to quit.
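The steps above could be sketched roughly as follows (read_with_timeout is a hypothetical name, and error handling is simplified):

```python
import queue
import threading
from typing import IO


def read_with_timeout(f: IO[bytes], size: int, timeout: float) -> bytes:
    """Blocking read with a timeout, using only primitives that work on Windows.

    A helper thread performs the (possibly blocking) read and enqueues the
    result; the caller dequeues it with Queue.get(timeout=...). On timeout,
    closing the pipe forces the reader thread to quit.
    """
    q: queue.Queue = queue.Queue()

    def reader() -> None:
        try:
            q.put(f.read(size))
        except (OSError, ValueError):
            q.put(b"")  # the pipe was closed underneath us

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    try:
        return q.get(block=True, timeout=timeout)
    except queue.Empty:
        f.close()  # signal the reader thread to give up
        raise TimeoutError(f"read did not complete within {timeout}s")
```

Closing the pipe on timeout matches the last bullet above: Python offers no way to kill a thread directly, so tearing down the file object it is blocked on is the lever we have.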

Performance

Using a queue and a Python thread certainly adds some overhead. On the other hand, the Docker container is doing work between reads, so this overhead may not be noticeable to the user. What we need to answer here is: when does this overhead start to become noticeable, and will it realistically bite us?

I've written a synthetic benchmark where we simulate reading from a pipe with time.sleep and a user-provided delay.
Then, I've measured how long it takes to iterate over time.sleep, and what happens when you add a Python thread and queue to the mix.

Benchmark script
#!/usr/bin/env python3

import queue
import sys
import threading
import time

TEST_DURATION = 1


def main():
    if len(sys.argv) != 2:
        print("Usage: ./bench.py <delay_ms>")
        return 1

    delay = float(sys.argv[1]) / 1000
    iterations = int(TEST_DURATION / delay)
    print(f"Reading with {delay}s delay, iterations: {iterations}")

    # Test 1: Common read
    print("Normal read")
    start = time.monotonic_ns()
    for i in range(iterations):
        time.sleep(delay)
    end = time.monotonic_ns()
    total = normal_total = end - start
    per_iter = total / iterations
    print(f"Total: {total} ns, per iteration: {per_iter} ns")

    # Test 2: Read with queue
    print("Enqueued read")
    q = queue.Queue()

    def enqueued_read():
        for i in range(iterations):
            time.sleep(delay)
            q.put(9)

    t = threading.Thread(target=enqueued_read)
    start = time.monotonic_ns()
    t.start()
    for i in range(iterations):
        q.get(block=True, timeout=2)
    end = time.monotonic_ns()
    total = end - start
    per_iter = total / iterations
    print(f"Total: {total} ns, per iteration: {per_iter} ns")

    slowdown = round((1 - normal_total / total) * 100, 2)
    print(f"Slowdown: {slowdown}%")


if __name__ == "__main__":
    sys.exit(main())

The results are the following:

Delay (ms)   Slowdown
1000         0.03%
100          0.1%
10           0.46%
5            1%
1            2%
0.1          1.5% (!?)
0.01         2.5%
0.001        4%

What we see in this table (ignoring the odd result for the 0.1 ms delay) is that the queue overhead starts to become noticeable (> 1% slowdown) once the work that the container does per page takes less than 5 ms (at least on my machine). Empirically, pdftoppm takes longer than that to process each page and write the output, which means that we can probably go forward with this solution.

@deeplow
Contributor

deeplow commented Feb 23, 2024

Removing this because timeouts were fully removed.

@deeplow deeplow closed this as completed Feb 23, 2024