Defense in Depth - Traceless Sanitization #633
Comments
My take on shred-resistant filesystems and devices, in the context of Docker Desktop, is this:
That's my understanding so far, but if anyone has more experience in this, please chip in!
This is not so clear-cut. As we saw, Ubuntu didn't have /tmp mounted as tmpfs.
That's because we currently don't use any of the Might be worth exploring if
I meant on the host, not the containers. But that would be good for the containers.
I have filed google/gvisor#10530 which would address this. No promise on practical feasibility yet, but I do think that with the document processing pipeline running in gVisor, ensuring that this pipeline runs entirely in unswappable memory becomes easier to systematically guarantee. A gVisor
Oh. I hadn't realized that gVisor emulates various filesystem types. That's really awesome. As for the The good thing about this move is that we will no longer mount pixel data as files, and therefore we can do the reconstruction of the PDF in-memory. The bad thing is that, in order to leave no traces, we would have to do some memory management tricks (e.g., We could perhaps see how cross-platform programs like GnuPG protect their keys from being swapped to the disk. Then, along with the proposed gVisor safeguard, we can have a solid solution to this issue.
I want to point out that if this wasn't the case, then it would be impossible to specify
Right... But as long as this conversion happens within memory that is in the Dangerzone process's own address space, then it too can call One way to ensure that may be to impose a seccomp-bpf filter on the Dangerzone process that blocks use of the If any of the pixel-to-PDF conversion pipeline does require
I believe the relevant file is this one. It calls
Just wanted to point out, all the above gives us a lot of food for thought, once we decide to tackle this issue. One question that immediately spawned from the above is: if you A scenario I'm thinking of is this: suppose we ran PyMuPDF within a gVisor sandbox, and pixel data manipulation takes place in Python's heap memory. This memory is handled by gVisor, which would have to unconditionally
😬
The
Thus it's possible to Therefore, if a Python program calls
Thanks a lot for the explanation, Etienne. I think we have a reasonable path forward here, once we decide to implement this feature 🙂
Per my update on the gVisor bug for a fully- I tried to run Dangerzone under
In order to see where the Python code tries to fork, I added this to the top of

    import threading
    threading.Thread = None

... and it crashed on this spot in

    def convert_documents(
        self, ocr_lang: Optional[str], stdout_callback: Optional[Callable] = None
    ) -> None:
        def convert_doc(document: Document) -> None:
            self.isolation_provider.convert(
                document,
                ocr_lang,
                stdout_callback,
            )

        max_jobs = self.isolation_provider.get_max_parallel_conversions()
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_jobs) as executor:
            executor.map(convert_doc, self.documents)

I serialized the function body by replacing the last three lines with:

    for document in self.documents:
        convert_doc(document)

... and after this change there were no crashes, so we can conclude that this is probably the only "Python-code-initiated" point where the code willingly forks (as opposed to Python-runtime-initiated forks). But even with this change, there are still lots of threads created. So the Python runtime still decides it needs to fork for some reason. This means the approach I had suggested to self-sandbox the Dangerzone application in a seccomp-bpf filter forbidding fork/clone syscalls may not work as a means to enforce that the application memory remains

The next step here is to see why the Python runtime decides to fork, and whether these threads are actually touching any memory that is sensitive or doing I/O on the documents being converted.
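As a starting point for that investigation, here is a minimal sketch (not part of Dangerzone) that lists the kernel-level threads of the current process and matches them against the threads Python's threading module knows about. It assumes Linux, where /proc/self/task has one entry per thread, and Python 3.8+ for native_id. The function names native_thread_ids and report_threads are hypothetical.

```python
import os
import threading


def native_thread_ids():
    """Return the kernel thread IDs of this process, or [] if /proc is unavailable."""
    try:
        return sorted(int(tid) for tid in os.listdir("/proc/self/task"))
    except OSError:
        # Assumption: non-Linux platforms have no /proc/self/task.
        return []


def report_threads():
    """Pair each kernel thread ID with its Python thread name, if there is one."""
    known = {t.native_id: t.name for t in threading.enumerate()}
    return [
        (tid, known.get(tid, "<unknown to the threading module>"))
        for tid in native_thread_ids()
    ]


if __name__ == "__main__":
    for tid, name in report_threads():
        print(tid, name)
```

Threads that show up without a Python name were likely started by a native library or the runtime rather than by application code, which may help narrow down where they come from.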
Parent issue: #221
Security Concern
One aspect of the sanitization that Dangerzone has not targeted yet is to avoid leaving traces of the converted file on disk. Depending on your threat model, this may be troublesome for two reasons:
Current Situation
Dangerzone uses Linux commands (libreoffice, gm, pdftoppm, tesseract) for the various stages of the file conversion. Most of these commands rely on a file to work. Dangerzone is passing files from command to command using two locations:
/tmp directory in the container.
Note
A few notes on the /tmp dir of a container. This directory is not guaranteed to be backed by tmpfs, unless you pass a flag like --tmpfs / --mount type=tmpfs. Even in that case, this functionality may not be available across platforms. For example, Docker states that tmpfs mounts are only available on Linux. Also, WSL used to emulate tmpfs on disk.
Note that there are contradicting accounts on whether WSL2 supports tmpfs on RAM:
In any case, we have to verify the availability of tmpfs mounts on each platform separately.
Upcoming Improvements
The file passing implementation will drastically change for two reasons:
Remaining Problems
These two improvements will drastically reduce the need for file passing during the conversion, but there is one thing that remains, and that is LibreOffice, which does not accept input from stdin.
There is a Python project called pylokit, that wraps LibreOfficeKit and calls some functions directly, without starting an external process. Unfortunately, even these bindings don't offer a way to read a document from memory: https://github.com/xrmx/pylokit/blob/abdfedbdb80ee172785cead189760a77a544045a/pylokit/lokit.py#L112
So, we have to accept that we will create a file in order to use LibreOffice, and our main line of defense will be storing this file in a tmpfs mount. However, we have to account for the cases where this is simply not available on a platform.
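To make the "is this directory really on tmpfs?" check concrete, here is a minimal sketch, assuming a Linux environment where /proc/mounts is readable. The helper names filesystem_type and is_tmpfs are hypothetical and not part of Dangerzone. Note that a result of "tmpfs" only says what the kernel reports; it does not by itself prove that the pages can never reach swap.

```python
import os


def filesystem_type(path, mounts_file="/proc/mounts"):
    """Return the filesystem type of the mount containing `path`, or None."""
    path = os.path.realpath(path)
    best_len, best_type = -1, None
    try:
        with open(mounts_file, encoding="utf-8") as f:
            for line in f:
                fields = line.split()
                if len(fields) < 3:
                    continue
                # /proc/mounts escapes spaces in mount points as \040.
                mount_point = fields[1].replace("\\040", " ")
                prefix = mount_point.rstrip("/") + "/"
                if path == mount_point or path.startswith(prefix):
                    # Prefer the longest mount point; on ties the later
                    # entry wins, since it shadows the earlier mount.
                    if len(mount_point) >= best_len:
                        best_len, best_type = len(mount_point), fields[2]
    except OSError:
        return None
    return best_type


def is_tmpfs(path, mounts_file="/proc/mounts"):
    """True if the kernel reports `path` as living on a tmpfs mount."""
    return filesystem_type(path, mounts_file) == "tmpfs"
```

A caller could run is_tmpfs("/tmp") inside the container before writing the LibreOffice input file, and fall back to another mitigation when it returns False.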
Suggestion
Assuming that an encrypted FUSE fs within the container is prohibitively complex and time-consuming, then we have one more option. We can "shred" the file after LibreOffice has used it, i.e., write random data in the disk blocks where the file was stored. See some existing projects:
Note that this does not offer 100% protection. As @legoktm pointed out, modern filesystems and SSDs are shred-resistant.
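For illustration, a best-effort "shred" can be sketched in a few lines of Python. This is not the SecureDrop or Dangerzone implementation; the function names and the passes and chunk_size parameters are hypothetical. As noted above, copy-on-write filesystems and SSD wear leveling may keep the old blocks around regardless, so this only reduces exposure.

```python
import os


def overwrite_with_random(path, passes=1, chunk_size=1024 * 1024):
    """Overwrite the file's current contents with random bytes (best effort)."""
    size = os.path.getsize(path)
    # "r+b" opens for writing without truncating, so the writes target the
    # existing file rather than a freshly allocated one.
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            # Push the data out of Python's buffer and ask the OS to flush it.
            f.flush()
            os.fsync(f.fileno())


def shred_file(path, passes=1):
    """Overwrite a file with random data, then unlink it."""
    overwrite_with_random(path, passes=passes)
    os.remove(path)
```

In the scenario described here, shred_file would be called on the temporary input document right after LibreOffice has finished reading it.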
Note
@legoktm has pointed out that SecureDrop already uses this approach, when deleting submissions: https://github.com/freedomofpress/securedrop/blob/develop/securedrop/rm.py