Qubes Integration Proof of Concept #437

Merged: apyrgio merged 10 commits into main from qubes-integration-poc on Jun 21, 2023

deeplow
Contributor

@deeplow deeplow commented Jun 7, 2023

Initial work to port Dangerzone to Qubes OS, making use of Qubes VMs as an isolation provider. See BUILD.md for instructions on how to set up and run it. Fixes #411 and fixes #414.

(some) implementation details:

  • one disposable qube is started per document
  • first stage of the conversion (document_to_pixels) takes place in disposable qube
  • unlike with containers, the second conversion stage (OCR + compression) takes place on the host, since the sanitization has already happened by then.

More details can be found on the wiki page

Primary code changes

  • Moves the container code to dangerzone/conversion/ and splits dangerzone.py into two files
    This was needed because the second conversion stage of the original container code now runs on the primary qube (the one that runs the Dangerzone GUI).
  • Qubes conversion is a wrapper (dangerzone/conversion/doc_to_pixels_qubes_wrapper.py)
    At the moment, the Qubes conversion is a simple wrapper around the doc_to_pixels code. This is because in Qubes pages are streamed one by one, instead of all files being sent at once after the conversion is done. The container code, however, is not yet ready to stream pages, so our wrapper converts the pages as before and then streams the results back to the client.

Limitations

  • It does not support simultaneous file conversion as the file being converted is hard-coded as /tmp/input-file.

  • Conversion progress reporting was out of scope, but part of it already works (for the first container stage). However, because of the "wrapper approach", progress jumps from 0% to 50% in a fraction of a second: by the time the wrapper starts sending pages, the whole conversion has already happened.
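As a rough sketch of how the first limitation could be lifted later: give every conversion its own temporary input path instead of the fixed /tmp/input-file. The helper name `save_input_document` is hypothetical and not part of this PR.

```python
import os
import tempfile


def save_input_document(data: bytes) -> str:
    """Write the document to a unique temporary file and return its path.

    Hypothetical helper: the PR itself hard-codes /tmp/input-file.
    """
    # mkstemp() creates the file atomically under a unique name, so two
    # conversions running at the same time never share an input path.
    fd, path = tempfile.mkstemp(prefix="input-file-")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

The caller would be responsible for removing the file once the conversion finishes.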

@deeplow deeplow requested a review from apyrgio June 7, 2023 09:44
Contributor Author

deeplow commented Jun 7, 2023

CI is awfully broken. I can look into that shortly.

Resolved review threads (outdated): dangerzone/conversion/doc_to_pixels_qubes_wrapper.py (4), BUILD.md (2), dangerzone/isolation_provider/qubes.py, dangerzone/logic.py, pyproject.toml, qubes/dz.Convert
@deeplow deeplow force-pushed the qubes-integration-poc branch 13 times, most recently from ff8b956 to 36de9e7 Compare June 7, 2023 13:50
Resolved review threads: BUILD.md, Dockerfile (3), dangerzone/cli.py
@deeplow deeplow force-pushed the qubes-integration-poc branch 2 times, most recently from ab3ae26 to e714f8d Compare June 9, 2023 09:23
Resolved review threads: dangerzone/conversion/pixels_to_pdf.py, dangerzone/conversion/common.py (2), dangerzone/isolation_provider/qubes.py (4)
@apyrgio apyrgio force-pushed the qubes-integration-poc branch 2 times, most recently from 7a77de5 to ac4a81b Compare June 12, 2023 12:32
@deeplow deeplow force-pushed the qubes-integration-poc branch from 468ffe3 to 6ebd959 Compare June 14, 2023 10:03
deeplow added a commit that referenced this pull request Jun 14, 2023
Following a suggestion from @apyrgio [1] to not pollute /usr/local/bin.

[1]: #437 (comment)
@deeplow deeplow marked this pull request as ready for review June 15, 2023 10:29
deeplow added a commit that referenced this pull request Jun 15, 2023
Also removes exit codes in the qubes wrapper.
Following a suggestion from #437 (comment)
@apyrgio apyrgio force-pushed the qubes-integration-poc branch 2 times, most recently from 4c362a5 to 51baa34 Compare June 15, 2023 15:24
@apyrgio apyrgio force-pushed the qubes-integration-poc branch 3 times, most recently from 75003e2 to 71b8f67 Compare June 15, 2023 18:56
Contributor Author

@deeplow deeplow left a comment


I quite like the simplification of the "code teleport" part. And I spotted a potential issue with the Dockerfile change.

Resolved review threads: Dockerfile, INSTALL.md, BUILD.md
@apyrgio apyrgio force-pushed the qubes-integration-poc branch 2 times, most recently from 9a75540 to 29612f4 Compare June 19, 2023 14:06
Contributor Author

@deeplow deeplow left a comment


Good to merge

Contributor Author

@deeplow deeplow left a comment


Found one more issue here.

Resolved review threads: dangerzone/isolation_provider/qubes.py (2)
deeplow and others added 9 commits June 21, 2023 11:44
The files in `container/` no longer make sense to have that name since
the "document to pixels" part will run in Qubes OS in its own virtual
machine.

To adapt to this, this PR does the following:
- Moves all the files in `container` to `dangerzone/conversion`
- Splits the old `container/dangerzone.py` into its two components
  `dangerzone/conversion/{doc_to_pixels,pixels_to_pdf}.py` with a
  `common.py` file for shared functions
- Moves the Dockerfile to the project root and adapts it to the new
  container code location
- Updates the CircleCI config to properly cache Docker images.
- Updates our install scripts to properly build Docker images.
- Adds the new conversion module to the container image, so that it can
  be imported as a package.
- Adapts the container isolation provider to use the new way of calling
  the code.

NOTE: We have made zero changes to the conversion code in this commit,
except for necessary imports in order to factor out some common parts.
Any changes necessary for Qubes integration follow in the subsequent
commits.

For use in containers, creating a /dangerzone directory is fine, but it
is more standard to do this in /tmp.

Add a way to check if the code runs (or should run) on Qubes.

Refs #451
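
A minimal sketch of what such a check could look like. Everything in it is an assumption rather than the code from this commit: the function name, the `QUBES_CONVERSION` override variable, and the use of `/usr/share/qubes/marker-vm` as a marker file that is expected to exist inside Qubes VMs.

```python
import os
from typing import Mapping, Optional


def running_on_qubes(
    environ: Optional[Mapping[str, str]] = None,
    marker_path: str = "/usr/share/qubes/marker-vm",
) -> bool:
    """Return True if the conversion should use disposable qubes.

    Hypothetical helper; the names and heuristics are assumptions.
    """
    env = os.environ if environ is None else environ
    # Assumed explicit override, handy for development and tests:
    # "1" forces Qubes mode, any other value disables it.
    override = env.get("QUBES_CONVERSION")
    if override is not None:
        return override == "1"
    # Assumed heuristic: the marker file is only present inside a qube.
    return os.path.exists(marker_path)
```

Taking the environment and marker path as parameters keeps the check testable outside Qubes.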

It seems that there are at least two Python libraries with libmagic
support:

* PyPI: python-magic (https://pypi.org/project/python-magic/)
  On Fedora it's `python3-magic`
* PyPI: filemagic (https://pypi.org/project/filemagic/)
  On Fedora it's `python3-file-magic`

The first package corresponds to the `py3-magic` package on Alpine
Linux, and it's the one we install in the container. The second package
uses a different API, and it's the only one we can use on Qubes.

To make matters worse, we:

* Can't install the first package on Fedora, because it installs the
  second under the hood:
  https://bugzilla.redhat.com/show_bug.cgi?id=1899279
* Can't install the second package on Alpine Linux (untested), due to
  musl being used instead of glibc:
  https://stackoverflow.com/a/53936722

Ultimately, we need to support both, by trying the first API, and on
failure using the other API.

The "document to pixels" code assumes that the client has called it with
some mount points in which it can write files. This is true for the
container isolation provider, but not for Qubes, which can communicate
with the client only via stdin/stdout.

Add a Qubes wrapper for this code that reads the suspicious document
from stdin and writes the pages to stdout. The on-wire format is the
same as the one that TrustedPDF uses.

Add two RPC calls that can run on disposable VMs:

* dz.Convert: This call simply imports the dangerzone package and runs
  the Qubes wrapper for the "document to pixels" code. This call is
  similar to the way we run the conversion part in a container.
* dz.ConvertDev: This call is for development purposes, and does the
  following:
  - First it receives the `dangerzone.conversion` module as Python
    zipfile. This way, we can quickly iterate on changes on the
    server-side part of Qubes, without altering the templates.
  - Second, it calls the Qubes wrapper for the "document to pixels"
    code, as dz.Convert does.

Add an isolation provider for Qubes, that performs the document
conversion as follows:

Document to pixels phase
------------------------

1. Starts a disposable qube by calling either the dz.Convert or the
   dz.ConvertDev RPC call, depending on the execution context.
2. Sends the file to the disposable qube through its stdin.
   * If we call the conversion from the development environment, also
     pass the conversion module as a Python zipfile, before the
     suspicious document.
3. Reads the number of pages, their dimensions, and the page data.

Pixels to PDF phase
-------------------

1. Writes the page data under /tmp/dangerzone, so that the
   `pixels_to_pdf` module can read them.
2. Passes OCR parameters as envvars.
3. Calls the `pixels_to_pdf` main function, as if it was running within a
   container. Waits until the PDF gets created.
4. Moves the resulting PDF to the proper directory.

Fixes #414

Autodetect in the CLI/GUI if we should run the conversion in disposable
qubes.

Allow creating an RPM package that is to be installed specifically on
Qubes. This package has the following extra properties compared to our
regular RPM packages:

1. Make `python3-magic`, `libreoffice` and `tesseract` requirements
   for installing Dangerzone, since the conversion takes place in a
   disposable qube that needs these packages.
2. Ignore the container.tar.gz file, if it exists.
3. Add our RPC calls under `/etc/qubes-rpc`
apyrgio pushed a commit that referenced this pull request Jun 21, 2023
Following a suggestion from @apyrgio [1] to not pollute /usr/local/bin.

[1]: #437 (comment)
apyrgio pushed a commit that referenced this pull request Jun 21, 2023
Also removes exit codes in the qubes wrapper.
Following a suggestion from #437 (comment)
@apyrgio apyrgio force-pushed the qubes-integration-poc branch from a4390ea to abcc7b4 Compare June 21, 2023 08:47
@deeplow deeplow mentioned this pull request Jun 21, 2023
10 tasks
@apyrgio apyrgio force-pushed the qubes-integration-poc branch from abcc7b4 to c4cc1a9 Compare June 21, 2023 09:17
Add instructions aimed at developers who want to try out Qubes
integration.

Fixes #411
@apyrgio apyrgio force-pushed the qubes-integration-poc branch from c4cc1a9 to 20b24a6 Compare June 21, 2023 12:06
@apyrgio apyrgio merged commit 20b24a6 into main Jun 21, 2023
@apyrgio apyrgio deleted the qubes-integration-poc branch June 26, 2023 11:25
Development

Successfully merging this pull request may close these issues.

Add Qubes OS isolation provider Qubes: Alpha integration
2 participants