This project implements a Docker container for dupeGuru.
The GUI of the application is accessed through a modern web browser (no installation or configuration needed on the client side) or via any VNC client.
dupeGuru is a tool to find duplicate files on your computer. It can scan either filenames or contents. The filename scan features a fuzzy matching algorithm that can find duplicate filenames even when they are not exactly the same.
- Quick Start
- Usage
- Docker Compose File
- Docker Image Versioning
- Docker Image Update
- User/Group IDs
- Accessing the GUI
- Security
- Reverse Proxy
- Shell Access
- dupeGuru Deletion Options
- Support or Contact
Important
The Docker command provided in this quick start is given as an example and parameters should be adjusted to your needs.
Launch the dupeGuru docker container with the following command:
docker run -d \
--name=dupeguru \
-p 5800:5800 \
-v /docker/appdata/dupeguru:/config:rw \
-v /home/user:/storage:rw \
jlesage/dupeguru
Where:
- /docker/appdata/dupeguru: This is where the application stores its configuration, state, logs and any files needing persistency.
- /home/user: This location contains files from your host that need to be accessible to the application.
Browse to http://your-host-ip:5800 to access the dupeGuru GUI.
Files from the host appear under the /storage folder in the container.
docker run [-d] \
--name=dupeguru \
[-e <VARIABLE_NAME>=<VALUE>]... \
[-v <HOST_DIR>:<CONTAINER_DIR>[:PERMISSIONS]]... \
[-p <HOST_PORT>:<CONTAINER_PORT>]... \
jlesage/dupeguru
Parameter | Description |
---|---|
-d | Run the container in the background. If not set, the container runs in the foreground. |
-e | Pass an environment variable to the container. See the Environment Variables section for more details. |
-v | Set a volume mapping (shares a folder or file between the host and the container). See the Data Volumes section for more details. |
-p | Set a network port mapping (exposes an internal container port to the host). See the Ports section for more details. |
To customize some properties of the container, the following environment variables can be passed via the -e parameter (one for each variable). The value of this parameter has the format <VARIABLE_NAME>=<VALUE>.
Variable | Description | Default |
---|---|---|
USER_ID | ID of the user the application runs as. See User/Group IDs to better understand when this should be set. | 1000 |
GROUP_ID | ID of the group the application runs as. See User/Group IDs to better understand when this should be set. | 1000 |
SUP_GROUP_IDS | Comma-separated list of supplementary group IDs of the application. | (no value) |
UMASK | Mask that controls how permissions are set for newly created files and folders. The value of the mask is in octal notation. By default, the umask value is 0022, meaning that newly created files and folders are readable by everyone, but only writable by the owner. See the online umask calculator at http://wintelguy.com/umask-calc.pl. | 0022 |
LANG | Set the locale, which defines the application's language, if supported. Format of the locale is language[_territory][.codeset], where language is an ISO 639 language code, territory is an ISO 3166 country code and codeset is a character set, like UTF-8. For example, Australian English using the UTF-8 encoding is en_AU.UTF-8. | en_US.UTF-8 |
TZ | TimeZone used by the container. Timezone can also be set by mapping /etc/localtime between the host and the container. | Etc/UTC |
KEEP_APP_RUNNING | When set to 1, the application will be automatically restarted when it crashes or terminates. | 0 |
APP_NICENESS | Priority at which the application should run. A niceness value of -20 is the highest priority and 19 is the lowest priority. The default niceness value is 0. NOTE: A negative niceness (priority increase) requires additional permissions. In this case, the container should be run with the docker option --cap-add=SYS_NICE. | 0 |
INSTALL_PACKAGES | Space-separated list of packages to install during the startup of the container. List of available packages can be found at https://pkgs.alpinelinux.org. ATTENTION: Container functionality can be affected when installing a package that overrides existing container files (e.g. binaries). | (no value) |
PACKAGES_MIRROR | Mirror of the repository to use when installing packages. List of mirrors is available at https://mirrors.alpinelinux.org. | (no value) |
CONTAINER_DEBUG | Set to 1 to enable debug logging. | 0 |
DISPLAY_WIDTH | Width (in pixels) of the application's window. | 1920 |
DISPLAY_HEIGHT | Height (in pixels) of the application's window. | 1080 |
DARK_MODE | When set to 1, dark mode is enabled for the application. | 0 |
WEB_AUDIO | When set to 1, audio support is enabled, meaning that any audio produced by the application is played through the browser. Note that audio is not supported for VNC clients. | 0 |
WEB_AUTHENTICATION | When set to 1, the application's GUI is protected via a login page when accessed via a web browser. Access is allowed only when providing valid credentials. NOTE: This feature requires secure connection (SECURE_CONNECTION environment variable) to be enabled. | 0 |
WEB_AUTHENTICATION_USERNAME | Optional username to configure for the web authentication. This is a quick and easy way to configure credentials for a single user. To configure credentials in a more secure way, or to add more users, see the Web Authentication section. | (no value) |
WEB_AUTHENTICATION_PASSWORD | Optional password to configure for the web authentication. This is a quick and easy way to configure credentials for a single user. To configure credentials in a more secure way, or to add more users, see the Web Authentication section. | (no value) |
SECURE_CONNECTION | When set to 1, an encrypted connection is used to access the application's GUI (either via a web browser or VNC client). See the Security section for more details. | 0 |
SECURE_CONNECTION_VNC_METHOD | Method used to perform the secure VNC connection. Possible values are SSL or TLS. See the Security section for more details. | SSL |
SECURE_CONNECTION_CERTS_CHECK_INTERVAL | Interval, in seconds, at which the system verifies if web or VNC certificates have changed. When a change is detected, the affected services are automatically restarted. A value of 0 disables the check. | 60 |
WEB_LISTENING_PORT | Port used by the web server to serve the UI of the application. This port is used internally by the container and it is usually not required to be changed. By default, a container is created with the default bridge network, meaning that, to be accessible, each internal container port must be mapped to an external port (using the -p or --publish argument). However, if the container is created with another network type, changing the port used by the container might be useful to prevent conflict with other services/containers. NOTE: a value of -1 disables listening, meaning that the application's UI won't be accessible over HTTP/HTTPs. | 5800 |
VNC_LISTENING_PORT | Port used by the VNC server to serve the UI of the application. This port is used internally by the container and it is usually not required to be changed. By default, a container is created with the default bridge network, meaning that, to be accessible, each internal container port must be mapped to an external port (using the -p or --publish argument). However, if the container is created with another network type, changing the port used by the container might be useful to prevent conflict with other services/containers. NOTE: a value of -1 disables listening, meaning that the application's UI won't be accessible over VNC. | 5900 |
VNC_PASSWORD | Password needed to connect to the application's GUI. See the VNC Password section for more details. | (no value) |
ENABLE_CJK_FONT | When set to 1, open-source computer font WenQuanYi Zen Hei is installed. This font contains a large range of Chinese/Japanese/Korean characters. | 0 |
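As a rough illustration of how several of these variables could be combined, the sketch below wraps the quick start command in a shell function and adds a few -e parameters. The function name run_dupeguru_with_env and the DOCKER override variable are hypothetical helpers introduced only for this example, and the chosen values are arbitrary. Paths and ports are the ones from the quick start and should be adjusted.

```shell
# DOCKER is a hypothetical override: set DOCKER=echo to preview the command
# instead of running it.
DOCKER="${DOCKER:-docker}"

# Start the container with a handful of environment variables set.
# Every variable here is optional; only keep the ones you need.
run_dupeguru_with_env() {
    "$DOCKER" run -d \
        --name=dupeguru \
        -p 5800:5800 \
        -e USER_ID=1000 \
        -e GROUP_ID=1000 \
        -e TZ=Etc/UTC \
        -e DARK_MODE=1 \
        -v /docker/appdata/dupeguru:/config:rw \
        -v /home/user:/storage:rw \
        jlesage/dupeguru
}
```

Calling `DOCKER=echo; run_dupeguru_with_env` prints the arguments that would be passed to docker, which can help when checking the command before running it for real.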
Many tools used to manage Docker containers extract environment variables defined by the Docker image and use them to create/deploy the container. For example, this is done by:
- The Docker application on Synology NAS
- The Container Station on QNAP NAS
- Portainer
- etc.
While this can be useful for users to adjust the value of environment variables to fit their needs, it can also be confusing and dangerous to keep all of them.
A good practice is to set/keep only the variables that are needed for the container to behave as desired in a specific setup. If a variable is kept at its default value, it can be removed. Keep in mind that all variables are optional, meaning that none of them is required for the container to start.
Removing environment variables that are not needed provides some advantages:
- Prevents keeping variables that are no longer used by the container. Over time, with image updates, some variables might be removed.
- Allows the Docker image to change/fix a default value. Again, with image updates, the default value of a variable might be changed to fix an issue, or to better support a new feature.
- Prevents changes to a variable that might affect the correct function of the container. Some undocumented variables, like PATH or ENV, are required to be exposed, but are not meant to be changed by users. However, container management tools still show these variables to users.
- There is a bug with the Container Station on QNAP and the Docker application on Synology, where an environment variable without value might not be allowed. This behavior is wrong: it's absolutely fine to have a variable without value. In fact, this container does have variables without value by default. Thus, removing unneeded variables is a good way to prevent deployment issues on these devices.
The following table describes data volumes used by the container. The mappings are set via the -v parameter. Each mapping is specified with the following format: <HOST_DIR>:<CONTAINER_DIR>[:PERMISSIONS].
Container path | Permissions | Description |
---|---|---|
/config | rw | This is where the application stores its configuration, state, logs and any files needing persistency. |
/storage | rw | This location contains files from your host that need to be accessible to the application. |
/trash | rw | This is where duplicated files are moved when they are sent to trash. |
Here is the list of ports used by the container.
When using the default bridge network, ports can be mapped to the host via the -p parameter (one per port mapping). Each mapping is defined with the following format: <HOST_PORT>:<CONTAINER_PORT>. The port number used inside the container might not be changeable, but you are free to use any port on the host side.
See the Docker Container Networking documentation for more details.
Port | Protocol | Mapping to host | Description |
---|---|---|---|
5800 | TCP | Optional | Port to access the application's GUI via the web interface. Mapping to the host is optional if access through the web interface is not wanted. For a container not using the default bridge network, the port can be changed with the WEB_LISTENING_PORT environment variable. |
5900 | TCP | Optional | Port to access the application's GUI via the VNC protocol. Mapping to the host is optional if access through the VNC protocol is not wanted. For a container not using the default bridge network, the port can be changed with the VNC_LISTENING_PORT environment variable. |
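To make the <HOST_PORT>:<CONTAINER_PORT> format concrete, here is a minimal sketch that maps both container ports to host ports chosen by the caller. The function name run_dupeguru_custom_ports and the DOCKER override variable are hypothetical and exist only for this example. The container-side ports (5800 and 5900) are the ones listed in the table above.

```shell
# DOCKER is a hypothetical override: set DOCKER=echo to preview the command.
DOCKER="${DOCKER:-docker}"

# Map the web and VNC ports of the container to arbitrary host ports.
run_dupeguru_custom_ports() {
    web_port="$1"   # host port for the web interface
    vnc_port="$2"   # host port for VNC clients
    "$DOCKER" run -d \
        --name=dupeguru \
        -p "${web_port}:5800" \
        -p "${vnc_port}:5900" \
        -v /docker/appdata/dupeguru:/config:rw \
        -v /home/user:/storage:rw \
        jlesage/dupeguru
}
```

For example, `run_dupeguru_custom_ports 8080 5901` would make the web interface reachable on host port 8080 and VNC on host port 5901, while the container keeps listening on 5800 and 5900 internally.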
As can be seen, environment variables, volume and port mappings are all specified while creating the container.
The following steps describe the method used to add, remove or update parameter(s) of an existing container. The general idea is to destroy and re-create the container:
- Stop the container (if it is running):
docker stop dupeguru
- Remove the container:
docker rm dupeguru
- Create/start the container using the docker run command, adjusting parameters as needed.
Note
Since all of the application's data is saved under the /config container folder, destroying and re-creating a container is not a problem: nothing is lost and the application comes back with the same state (as long as the mapping of the /config folder remains the same).
Here is an example of a docker-compose.yml file that can be used with Docker Compose.
Make sure to adjust according to your needs. Note that only mandatory network ports are part of the example.
version: '3'
services:
  dupeguru:
    image: jlesage/dupeguru
    ports:
      - "5800:5800"
    volumes:
      - "/docker/appdata/dupeguru:/config:rw"
      - "/home/user:/storage:rw"
Each release of a Docker image is versioned. Prior to October 2022, semantic versioning was used as the versioning scheme.
Since then, the versioning scheme has changed to calendar versioning. The format used is YY.MM.SEQUENCE, where:
- YY is the zero-padded year (relative to year 2000).
- MM is the zero-padded month.
- SEQUENCE is the incremental release number within the month (first release is 1, second is 2, etc).
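The YY.MM.SEQUENCE format can be illustrated with a small shell helper that splits a version string into its three fields. This is only a sketch: the function name parse_calver is hypothetical, the input is assumed to be the bare version string with no prefix, and the version used in the usage note is made up for the example.

```shell
# Split a calendar version of the form YY.MM.SEQUENCE into its fields.
# Returns a non-zero status if the string does not have exactly three
# dot-separated numeric fields.
parse_calver() {
    version="$1"
    case "$version" in
        *[!0-9.]*|*.*.*.*|.*|*.|*..*) return 1 ;;  # bad characters or field count
        *.*.*) ;;                                  # exactly three fields
        *) return 1 ;;
    esac
    yy="${version%%.*}"        # text before the first dot
    rest="${version#*.}"       # text after the first dot
    mm="${rest%%.*}"           # month field
    sequence="${rest#*.}"      # release number within the month
    # The year is relative to 2000, so prefix it with "20".
    echo "year=20${yy} month=${mm} sequence=${sequence}"
}
```

With a made-up version, `parse_calver 23.06.2` prints `year=2023 month=06 sequence=2`, meaning the second release of June 2023.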
Because features are added, issues are fixed, or simply because a new version of the containerized application is integrated, the Docker image is regularly updated. Different methods can be used to update the Docker image.
The system used to run the container may have a built-in way to update containers. If so, this could be your primary way to update Docker images.
Another way is to have the image be automatically updated with Watchtower. Watchtower is a container-based solution for automating Docker image updates. This is a "set and forget" type of solution: once a new image is available, Watchtower will seamlessly perform the necessary steps to update the container.
Finally, the Docker image can be manually updated with these steps:
- Fetch the latest image:
docker pull jlesage/dupeguru
- Stop the container:
docker stop dupeguru
- Remove the container:
docker rm dupeguru
- Create and start the container using the docker run command, with the same parameters that were used when it was deployed initially.
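The manual update steps above could be chained in a small shell function, sketched below. The function name update_dupeguru and the DOCKER override variable are hypothetical. The docker run parameters are the ones from the quick start and should be replaced with the parameters used for your own deployment.

```shell
# DOCKER is a hypothetical override: set DOCKER=echo to preview the commands.
DOCKER="${DOCKER:-docker}"

# Pull the latest image, then destroy and re-create the container.
# Each step only runs if the previous one succeeded.
update_dupeguru() {
    "$DOCKER" pull jlesage/dupeguru &&
    "$DOCKER" stop dupeguru &&
    "$DOCKER" rm dupeguru &&
    "$DOCKER" run -d \
        --name=dupeguru \
        -p 5800:5800 \
        -v /docker/appdata/dupeguru:/config:rw \
        -v /home/user:/storage:rw \
        jlesage/dupeguru
}
```

Since the application's data lives under /config, the re-created container should come back with the same state as long as that mapping is unchanged.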
For owners of a Synology NAS, the following steps can be used to update a container image.
- Open the Docker application.
- Click on Registry in the left pane.
- In the search bar, type the name of the container (jlesage/dupeguru).
- Select the image, click Download and then choose the latest tag.
- Wait for the download to complete. A notification will appear once done.
- Click on Container in the left pane.
- Select your dupeGuru container.
- Stop it by clicking Action->Stop.
- Clear the container by clicking Action->Reset (or Action->Clear if you don't have the latest Docker application). This removes the container while keeping its configuration.
- Start the container again by clicking Action->Start. NOTE: The container may temporarily disappear from the list while it is re-created.
For unRAID, a container image can be updated by following these steps:
- Select the Docker tab.
- Click the Check for Updates button at the bottom of the page.
- Click the update ready link of the container to be updated.
When using data volumes (-v flags), permissions issues can occur between the host and the container. For example, the user within the container may not exist on the host. This could prevent the host from properly accessing files and folders on the shared volume.
To avoid any problem, you can specify the user the application should run as. This is done by passing the user ID and group ID to the container via the USER_ID and GROUP_ID environment variables.
To find the right IDs to use, issue the following command on the host, with the user owning the data volume on the host:
id <username>
Which gives an output like this one:
uid=1000(myuser) gid=1000(myuser) groups=1000(myuser),4(adm),24(cdrom),27(sudo),46(plugdev),113(lpadmin)
The values of uid (user ID) and gid (group ID) are the ones that you should give to the container.
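Instead of copying the IDs by hand, they could be looked up with id -u and id -g and passed directly to the container. The sketch below assumes the quick start paths and ports. The function name run_dupeguru_as and the DOCKER override variable are hypothetical and introduced only for this example.

```shell
# DOCKER is a hypothetical override: set DOCKER=echo to preview the command.
DOCKER="${DOCKER:-docker}"

# Start the container with USER_ID/GROUP_ID taken from a host user.
# "$@" is an optional username; with no argument, id reports the current user.
run_dupeguru_as() {
    uid="$(id -u "$@")" || return 1
    gid="$(id -g "$@")" || return 1
    "$DOCKER" run -d \
        --name=dupeguru \
        -p 5800:5800 \
        -e USER_ID="$uid" \
        -e GROUP_ID="$gid" \
        -v /docker/appdata/dupeguru:/config:rw \
        -v /home/user:/storage:rw \
        jlesage/dupeguru
}
```

For instance, `run_dupeguru_as myuser` would use the IDs of the host user myuser, which should be the user owning the data volume on the host.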
Assuming that container's ports are mapped to the same host's ports, the graphical interface of the application can be accessed via:
- A web browser:
http://<HOST IP ADDR>:5800
- Any VNC client:
<HOST IP ADDR>:5900
By default, access to the application's GUI is done over an unencrypted connection (HTTP or VNC).
Secure connection can be enabled via the SECURE_CONNECTION environment variable. See the Environment Variables section for more details on how to set an environment variable.
When enabled, the application's GUI is accessed over an HTTPs connection when using a browser. All HTTP accesses are automatically redirected to HTTPs.
When using a VNC client, the VNC connection is performed over SSL. Note that few VNC clients support this method. SSVNC is one of them.
SSVNC is a VNC viewer that adds encryption security to VNC connections.
While the Linux version of SSVNC works well, the Windows version has some issues. At the time of writing, the latest version 1.0.30 is not functional, as a connection fails with the following error:
ReadExact: Socket error while reading
However, for your convenience, an unofficial and working version is provided here:
The only difference with the official package is that the bundled version of stunnel has been upgraded to version 5.49, which fixes the connection problems.
Here are the certificate files needed by the container. By default, when they are missing, self-signed certificates are generated and used. All files are PEM encoded, x509 certificates.
Container Path | Purpose | Content |
---|---|---|
/config/certs/vnc-server.pem | VNC connection encryption. | VNC server's private key and certificate, bundled with any root and intermediate certificates. |
/config/certs/web-privkey.pem | HTTPs connection encryption. | Web server's private key. |
/config/certs/web-fullchain.pem | HTTPs connection encryption. | Web server's certificate, bundled with any root and intermediate certificates. |
Tip
To prevent any certificate validity warnings/errors from the browser or VNC client, make sure to supply your own valid certificates.
Note
Certificate files are monitored and relevant daemons are automatically restarted when changes are detected.
To restrict access to your application, a password can be specified. This can be done via two methods:
- By using the VNC_PASSWORD environment variable.
- By creating a .vncpass_clear file at the root of the /config volume. This file should contain the password in clear-text. During the container startup, content of the file is obfuscated and moved to .vncpass.
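For the second method, the file can be created from the host inside the directory that is mapped to /config. The helper below is a minimal sketch: the function name write_vnc_password_file is hypothetical, and writing the password followed by a newline is an assumption about what the container accepts.

```shell
# Write a clear-text VNC password file into the host directory mapped to /config.
# $1: host directory mapped to /config (e.g. /docker/appdata/dupeguru)
# $2: the password to store
write_vnc_password_file() {
    config_dir="$1"
    password="$2"
    printf '%s\n' "$password" > "${config_dir}/.vncpass_clear"
}
```

With the quick start mapping, this would be called as `write_vnc_password_file /docker/appdata/dupeguru 'secret'` before starting the container.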
The level of security provided by the VNC password depends on two things:
- The type of communication channel (encrypted/unencrypted).
- How secure the access to the host is.
When using a VNC password, it is highly desirable to enable the secure connection to prevent sending the password in clear over an unencrypted channel.
Caution
Password is limited to 8 characters. This limitation comes from the Remote Framebuffer Protocol RFC (see section 7.2.2). Any characters beyond the limit are ignored.
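A practical consequence of this limit is that two passwords sharing the same first 8 characters are equivalent. The sketch below shows the part of a password that would actually be taken into account. The function name effective_vnc_password is hypothetical, and the truncation is done with printf purely for illustration (it counts bytes, so it is only meaningful for ASCII passwords).

```shell
# Print the first 8 characters of a password, mirroring the limit
# described above. Anything beyond that is ignored.
effective_vnc_password() {
    printf '%.8s' "$1"
}
```

For example, `effective_vnc_password 'password1'` and `effective_vnc_password 'password2'` both print `password`, so either one would be accepted in place of the other.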
Access to the application's GUI via a web browser can be protected with a login page. When web authentication is enabled, users have to provide valid credentials, otherwise access is denied.
Web authentication can be enabled by setting the WEB_AUTHENTICATION environment variable to 1.
See the Environment Variables section for more details on how to set an environment variable.
Important
Secure connection must also be enabled to use web authentication. See the Security section for more details.
Two methods can be used to configure user credentials:
- Via container environment variables.
- Via password database.
Container environment variables can be used to quickly and easily configure a single user. Username and password are defined via the following environment variables:
WEB_AUTHENTICATION_USERNAME
WEB_AUTHENTICATION_PASSWORD
See the Environment Variables section for more details on how to set an environment variable.
The second method is more secure and allows multiple users to be configured.
The usernames and password hashes are saved into a password database, located at /config/webauth-htpasswd inside the container. This database file has the same format as htpasswd files of the Apache HTTP server. Note that the passwords themselves are not saved into the database, only their hashes. The bcrypt password hashing function is used to generate hashes.
Users are managed via the webauth-user tool included in the container:
- To add a user password: docker exec -ti <container name or id> webauth-user add <username>
- To update a user password: docker exec -ti <container name or id> webauth-user update <username>
- To remove a user: docker exec <container name or id> webauth-user del <username>
- To list users: docker exec <container name or id> webauth-user user
The following sections contain NGINX configurations that need to be added in order to reverse proxy to this container.
A reverse proxy server can route HTTP requests based on the hostname or the URL path.
In this scenario, each hostname is routed to a different application/container.
For example, let's say the reverse proxy server is running on the same machine as this container. The server would proxy all HTTP requests sent to dupeguru.domain.tld to the container at 127.0.0.1:5800.
Here are the relevant configuration elements that would be added to the NGINX configuration:
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
upstream docker-dupeguru {
    # If the reverse proxy server is not running on the same machine as the
    # Docker container, use the IP of the Docker host here.
    # Make sure to adjust the port according to how port 5800 of the
    # container has been mapped on the host.
    server 127.0.0.1:5800;
}
server {
    [...]
    server_name dupeguru.domain.tld;
    location / {
        proxy_pass http://docker-dupeguru;
    }
    location /websockify {
        proxy_pass http://docker-dupeguru;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 86400;
    }
}
In this scenario, the hostname is the same, but different URL paths are used to route to different applications/containers.
For example, let's say the reverse proxy server is running on the same machine as this container. The server would proxy all HTTP requests for server.domain.tld/dupeguru to the container at 127.0.0.1:5800.
Here are the relevant configuration elements that would be added to the NGINX configuration:
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
upstream docker-dupeguru {
    # If the reverse proxy server is not running on the same machine as the
    # Docker container, use the IP of the Docker host here.
    # Make sure to adjust the port according to how port 5800 of the
    # container has been mapped on the host.
    server 127.0.0.1:5800;
}
server {
    [...]
    location = /dupeguru {return 301 $scheme://$http_host/dupeguru/;}
    location /dupeguru/ {
        proxy_pass http://docker-dupeguru/;
        # Uncomment the following line if your Nginx server runs on a port that
        # differs from the one seen by external clients.
        #port_in_redirect off;
        location /dupeguru/websockify {
            proxy_pass http://docker-dupeguru/websockify/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_read_timeout 86400;
        }
    }
}
To get shell access to the running container, execute the following command:
docker exec -ti CONTAINER sh
Where CONTAINER is the ID or the name of the container used during its creation.
When deleting duplicated files, dupeGuru offers two choices:
- Send files to trash
- Delete files directly
The first option moves files to the /trash directory inside the container. This operation can be slow for large files since it may imply a copy of the data before the actual deletion.
There is also an option to link deleted files. It is not recommended to enable this option, since there is a good chance that created links won't make sense outside the container.
Having trouble with the container or have questions? Please create a new issue.
For other great Dockerized applications, see https://jlesage.github.io/docker-apps.