This module provides graphical and command-line interfaces for the document sanization solution.
- For execution, it is assumed that the entrusted_container component is deployed (under Docker or Podman). On Mac OS, you need more specifically Docker Desktop.
- For compilation, you need a recent version of the Rust toolchain (rustc compiler and cargo): rust
1.70+
.
The user interface is built with the FLTK toolkit because it’s lightweight and relatively easy to use.
The CLI can be convenient for “terminal warriors” or to supplement shell scripts.
Please checkout the build instructions on the wiki for more information (software dependencies).
On Linux, You’ll need to have xorg
and libxcb
dev libraries (-dev
or -devel
packages accordingly to your Linux distribution).
At the root of this project, open a command prompt and type cargo build --features=gui,fltk/fltk-bundled,fltk/use-wayland
.
./target/debug/entrusted-gui
In the example below, a suspicious PDF file is converted to a searchable PDF (OCR), instead of just PDF images (ocr-lang
parameter).
- OCR is a time consuming task in comparison to just generating PDF images for pages of the original input.
- You only want OCR if you need to be able to select or search text in the resulting PDF
- It increases significantly processing time
- It leverages the tesseract OCR engine behind the scenes
cp ../test_data/gnus-logo.pdf suspicious_file.pdf
./target/debug/entrusted-cli --input-filename suspicious_file.pdf --ocr-lang eng
There’s no explicit command-line support for batch conversion, because in a UNIX/Linux environment shell scripting is much more flexible.
In the example below, all the PDF files in the Downloads
folder of the me
user will be converted (recursive folder traversal).
find /home/me/Downloads \
-name "*.pdf" \
-exec entrusted-cli --input-filename {} \;
Yes, the configuration file (config.toml
) is optional and its location is operating system dependent.
Operating System | Configuration File Location |
---|---|
Linux & Others | $XDG_CONFIG_HOME/com.rimerosolutions.entrusted.entrusted_client/config.toml |
Mac OS | $HOME/Library/Application Support/com.rimerosolutions.entrusted.entrusted_client/config.toml |
Windows | %APPDATA%\com.rimerosolutions.entrusted.entrusted_client\config.toml |
The configuration format is TOML, it’s a bit similar to INI files syntax.
# This must be a valid tesseract lang code
# See https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
ocr-lang = "eng"
# The converted name will be named as follow original-name-sanitized.pdf
file-suffix = "sanitized"
# This is meant mostly for advanced usage (self-hosting, development, etc.)
# container-image-name= docker.io/MY_USERNAME_HERE/entrusted_container:1.2.3
# The requested visual quality of the PDF result influences processing time and result size
# This is one of 'low', 'medium' or 'high' with a default of 'medium'
visual-quality = "medium"
Parameter | Description |
---|---|
ocr-lang | The tesseract OCR langcode if OCR is desired (slower conversions) |
file-suffix | Custom file suffix for converted files (defaults to entrusted ) |
container-image-name | A custom container image for conversions (advanced option) |
visual-quality | The result visual quality (file size, processing time, visuals) |