Add Dockerfile for application and documentation #39
Now that the licensing issue on the mentioned gist is resolved (thanks again @bmispelon), I have added the missing documentation and the command that allows for easy loading of exported tags. I have double-checked the documentation and gone through the whole process:
From what I could tell, everything worked out. I have therefore removed the work-in-progress notice and cleared this PR for merge. @danielquinn Feel free to merge whenever, but maybe @TheConnMan wants to check it out first, since he opened the initial Docker PR (#28). |
Ok so I've started learning about Docker these past few days in anticipation of your wanting to merge this. I have a few comments so far:
|
Are you running version 1.6.0? It is the minimum required version for the file to work. You can check by running I'll correct the typo and look into the environment variable issue after I get some sleep, but in general you are correct with your assumption: the defaults are not repeated in the |
Well shit. It looks like that's the latest version in Gentoo. I'll have to take a look around regarding the "right" way to get a more modern version. Thanks for the heads up. Maybe that should go in the doc somewhere too? Get some sleep, we can do this over a few days :-) |
I need to actually download/run this, but at least from what I read of the Dockerfile, I'm a fan 👍 |
If you are talking about mentioning that we need version 1.6.0, it already does. If you mean how to install a current version, linking the install page might not be a bad idea. It suggests executing the following commands as root:
curl -L https://github.com/docker/compose/releases/download/1.6.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
Or, adapted for use with sudo:
curl -L https://github.com/docker/compose/releases/download/1.6.0/docker-compose-`uname -s`-`uname -m` | sudo tee /usr/local/bin/docker-compose > /dev/null
sudo chmod +x /usr/local/bin/docker-compose
Cool, be sure to let us know if it worked out for you! |
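The minimum-version requirement mentioned above can be checked from a script. The following is only a sketch: the helper names `extract_version` and `version_ge` are made up for illustration, and it assumes that `docker-compose --version` prints a line containing `version X.Y.Z`, which may not hold for every release.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of this PR.

# extract_version: read a "... version X.Y.Z, ..." line on stdin, print X.Y.Z.
extract_version() {
    sed -n 's/.*version v\{0,1\}\([0-9][0-9.]*\).*/\1/p'
}

# version_ge A B: succeeds when dotted version A is at least B.
# Expects three numeric fields; suffixes such as "-rc1" are not handled.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

required="1.6.0"
if command -v docker-compose >/dev/null 2>&1; then
    installed=$(docker-compose --version | extract_version)
    if version_ge "$installed" "$required"; then
        echo "docker-compose $installed is new enough"
    else
        echo "docker-compose $installed is older than $required"
    fi
else
    echo "docker-compose not found on PATH"
fi
```

The same `version_ge` helper could be pointed at the output of `docker --version` to check the Docker engine itself.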
Hi folks, I had a go following the instructions in the diff (checking out https://github.com/pitkley/paperless/tree/feature/dockerfile) and ran into the following issue when running
I don't know if this is something you need to fix before landing the PR: I'm a complete docker n00b so it's very possible I'm doing something non-standard that breaks things, or even that this is completely expected. Please ignore if so, I'm quite happy to wait until the PR lands and things are "official". If it is something you want to look into please let me know if I can give any useful details. I'm on OS X El Capitan, versions
|
Hey @tikitu, thanks for taking the time to try it out! Right now the image is not available from the Docker Hub. You have to clone this PR and execute To clone this PR and build the image, the following is one of the ways:
$ git clone https://github.com/pitkley/paperless
$ cd paperless
$ git checkout feature/dockerfile
$ docker build -t paperless .
Then you should be able to continue as you tried above. |
Aha! That's what "this should go to Docker Hub but let's land the PR first" is talking about. Thanks for helping me through the basics @pitkley, I'll give it a whirl! |
Ok I managed to start everything up and have a few more issues:
|
Actually 1c45ca1 is the current parent, so the logging should be working (it is for me at least). To look at stdout, you can run
With docker-compose version 1.6.0 they introduced exactly that, I will push an updated version shortly.
That was pretty much the idea of the file: it should just be for the user's convenience and not a hard requirement. I'm going to rename it!
No, you would have to modify the YAML file, something like this:
diff --git a/docker-compose.yml b/docker-compose.yml
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -17,9 +18,8 @@ services:
         volumes:
             - paperless-data:/usr/src/paperless/data
             - paperless-media:/usr/src/paperless/media
-            - /consume
+            - /local/path/you/choose:/consume
where the part to the left of the colon is the local path you want to mount. I haven't included this in the documentation since those are Docker and docker-compose basics that I would assume a user coming to us via Docker already has, but maybe I'll add this as an example. Thanks for the feedback 👍
PS: The minimum Docker version required is 1.10.0, while stable in Gentoo is only 1.7.1. If you have further issues, that might be the cause! |
👍 for including the |
Done.
The line is there so that the webserver, which doesn't do any text recognition, doesn't have to install unnecessary languages the user might have set in the env-file; it does so by overwriting the value with nothing. So yes, we could remove the line, with the potential drawback of the webserver having a higher startup time and higher space consumption. Moving it to the environment file is not possible since (a) it is already there in the form of the commented-out line and (b) it would always affect both the consumer and the webserver.
Naming it In the end it is up to you.
Generally yes. The reason I didn't is mainly that I wanted the YAML file to be usable as a drop-in without modifications, but that really doesn't make a whole lot of sense, since a mounted consumption directory will be what the end user wants most of the time, I assume. Besides having a non-existent "random" path, we could change it to something like this:
diff --git a/docker-compose.yml b/docker-compose.yml
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -17,9 +18,8 @@ services:
         volumes:
             - paperless-data:/usr/src/paperless/data
             - paperless-media:/usr/src/paperless/media
-            - /consume
+            - $PWD/data:/consume
This would take the current path from the shell and append
The I too was surprised by this; something like this doesn't work:
volumes:
  - paperless-media
  - paperless-data
If there are any more questions, just ask :)! |
Note that YAML file syntax allows for
That might be the approach to take in the example file |
Hey, I've now successfully followed the Docker instructions and had my first document OCRed for me! One tiny bump on the way: the docs say I can find the webserver at |
@tikitu Indeed. Docker creates a virtual machine with a different IP (in your case, If it's not already in the docs for this it would definitely be worth pointing out. You've already figured out how to get the machine IP, but (assuming one has docker-machine installed as well) one can get the IP via |
@tikitu The guide was written with a "vanilla" Linux installation in mind. The In my opinion that is something that should not be in the guide, since we never mention Docker machine (and that would probably blow up the documentation in regards to mounting volumes, since machine starts a separate VM). If the end-user decides to go with Docker machine, they should be able to find all the documentation they need in the machine docs. |
I'm not sure that "since we never mention Docker machine" is a strong enough reason: unless I missed something, this is what I got from starting at https://www.docker.com/ and following the vanilla install instructions for OS X (I'm guessing docker-machine is installed as part of Docker Toolbox, but that's not at all obvious). But it also comes down to how much docker-n00b-support you're willing to write into the docs, of course; after all, these are paperless docs not docker docs. |
Indeed that is the official way to do Docker under OS X, but again, this guide was with Linux in mind where you can access the container under
That is my main issue right now. This is slowly turning into a rabbit hole of rewriting already existing documentation. Some of it is clearly justified and needed to have an easy and pleasant start with Paperless on Docker (regardless of whether you know Docker yet or not), but supporting the "non-native" operating systems (in regards to Docker) will blow up the documentation. One point I take from this, though, is that we should at least mention that you can use Docker, and thus Paperless, on Windows and Mac OS X, and that the guide might not fully apply if you use either of those OSes. What do you think? |
Yup, makes sense. (Once this PR lands I can turn my quibbles into PRs myself instead of bothering you with them.) |
I do have one more substantive question though: this might again just be my (complete) lack of docker-fu, or might be something to tweak. I've tried to add a couple of other OCR languages by editing
Again: might easily be I'm just missing some docker basics. If so I promise not to demand whatever-is-needed makes its way into the docs! |
@tikitu ooh, I think I know this one. The Dockerfile is the basis of both the webserver and the consumer, but in reality, the consumer is the only one that needs Tesseract, and certainly only the consumer needs multiple languages. With this in mind, the base install includes Tesseract with only one language, but when it comes time to build the consumer from the base, we also build the additional languages since they're actually used there. @pitkley thank you so much for all of your work on this. I've been at a conference all day so I've not been able to chime in when I would have liked. Can I assume that you're happy with what we have here and are cool with me merging it? Once I do, I may need some quick tips on how to push it into my Docker account so people can download Paperless from the cloud. As for the discussion about documentation, I agree with @pitkley that this is definitely paperless documentation. I asked for clarification in a few parts because I was largely ignorant about how Docker works, but I think that the right perspective to take when writing docs for this project is that if the user has chosen the Docker route, she knows everything she needs to know about Docker to get it working. Otherwise she would have gone the bare-metal or Vagrant route instead. |
@pitkley I just realised I didn't respond to your comments, so here goes:
I understand now. Just a thought though: given that the webserver doesn't actually use Tesseract (or ImageMagick for that matter), would it be better to tell the composer to install those only on the consumer rather than have them as part of the Dockerfile? Or is that just Not Done in the Docker way of things? I'm still trying to get a grip on standard patterns in Dockerland. In the same vein, I see now why you named it As for changing the consume mount, in the same way you've set This last bit is semantics though, and shouldn't hold up this merge as clearly it's a popular feature that's long overdue. Perhaps the better route down the road would be to not have a So, let me know if you want to change the stuff I said about Tesseract there, and if not, just post a 👍 and I'll merge this baby. |
@tikitu The additional languages get installed after the consumer-container starts, so issuing Why do you think your additional languages don't get used? What seems to indicate this? I've tested it with
Installing the dependencies for the consumer using docker-compose directly is not possible, only indirectly using e.g. an environment variable, just like we do with the additional languages. While I could theoretically implement such a variable, it would drastically increase the startup time of the consumer container, since it would need to install all those hefty dependencies. Also, what one needs to realize in regards to Docker: if you run a container off of an image multiple times, it still only consumes the space of the single image. Everything you add or modify within a container overlays the base image and only requires that delta of space. If we relocated the installation of the consumer's dependencies to runtime, the end user would not gain anything, since the consumed space doesn't change. What they would get is the disadvantage of having to download the dependencies every time the container gets recreated.
The passphrase is a good point. I will change it to a non-existent arbitrary path (which will conveniently throw an error message in docker-compose, maybe nudging the user in the right direction to modify the
I only introduced the
Yes, I will change the stuff mentioned above. When I am done with that, I'll ping you. |
@danielquinn One comment of yours that I forgot to address:
No. The quotes will be part of the variable, causing the installation to fail since neither Other than that, I have fixed/updated the points mentioned above, and I think and hope that that should be it. Could you check the documentation one more time (specifically this paragraph and points 1 and 2 including the notes here)? So, here we go: 👍 |
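The remark about quotes can be illustrated with a small sketch. It assumes that an env file entry is read literally, with everything after the first `=` becoming the value; `EXAMPLE_LANGUAGES` and `env_value` are hypothetical names used only for this demonstration, and the loop merely stands in for whatever the entrypoint does with the value.

```shell
#!/bin/sh
# env_value mimics a literal env file reader: everything after the first "="
# is the value, with any quotes kept as-is. (Hypothetical helper.)
env_value() {
    printf '%s\n' "${1#*=}"
}

# EXAMPLE_LANGUAGES is a made-up variable name, used only for illustration.
quoted=$(env_value 'EXAMPLE_LANGUAGES="deu eng"')
plain=$(env_value 'EXAMPLE_LANGUAGES=deu eng')

# Word-split each value the way a per-language install loop might.
for lang in $quoted; do printf 'quoted entry: %s\n' "$lang"; done
for lang in $plain; do printf 'plain entry:  %s\n' "$lang"; done
```

In the quoted case the entries come out as `"deu` and `eng"`, neither of which is a valid language code, which is the failure described above.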
This commit adds a `Dockerfile` to the root of the project, accompanied by a `docker-compose.yml.example` for simplified deployment. The `Dockerfile` is agnostic to whether it will be the webserver, the consumer, or if it is run for a one-off command (e.g. creation of a superuser, migration of the database, document export, ...).

The container's entrypoint is the `scripts/docker-entrypoint.sh` script. This script verifies that the required permissions are set, remaps the default user's and/or group's ID if required, and installs additional languages if the user wishes to.

After initialization, it analyzes the command the user supplied:

- If the command starts with a slash, it is expected that the user wants to execute a binary file and the command will be executed without further intervention. (Using `exec` to effectively replace the started shell-script and not have any reaping-issues.)
- If the command does not start with a slash, the command will be passed directly to the `manage.py` script without further modification. (Again using `exec`.)

The default command is set to `--help`.

If the user wants to execute a command that is not meant for `manage.py` but doesn't start with a slash, the Docker `--entrypoint` parameter can be used to circumvent the mechanics of `docker-entrypoint.sh`.

Further information can be found in `docs/setup.rst` and in `docs/migrating.rst`.

For additional convenience, a `Dockerfile` has been added to the `docs/` directory which allows for easy building and serving of the documentation. This is documented in `docs/requirements.rst`.
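The dispatch rule described in the commit message can be sketched roughly as follows. This is not the actual `scripts/docker-entrypoint.sh`: the function name `resolve_command` and the `MANAGE_PY` location are assumptions, and the sketch only prints the command line that a real entrypoint would hand to `exec`.

```shell
#!/bin/sh
# Hypothetical sketch of the dispatch rule; not the project's actual script.
# MANAGE_PY is an assumed location for manage.py inside the container.
MANAGE_PY="${MANAGE_PY:-/usr/src/paperless/src/manage.py}"

# resolve_command prints the command line the entrypoint would exec.
resolve_command() {
    # With no arguments, fall back to the documented default command.
    if [ "$#" -eq 0 ]; then
        set -- --help
    fi
    case "$1" in
        /*) printf '%s\n' "$*" ;;                 # leading slash: run the binary as given
        *)  printf '%s %s\n' "$MANAGE_PY" "$*" ;; # otherwise: treat as a manage.py command
    esac
}

resolve_command /bin/bash -l
resolve_command createsuperuser
resolve_command
```

A real entrypoint would replace the `printf` calls with `exec "$@"` and `exec "$MANAGE_PY" "$@"` respectively, so that the shell script is replaced by the target process.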
It all looks good to me. I'll give all the documentation a once-over later next week, but I haven't seen anything glaring that's worth holding up the merge. Here we go indeed! |
Add Dockerfile for application and documentation
Thanks to everyone involved. I am very happy with the end-result and it wouldn't have been as good without your input, testing and feedback! |
🎆
@pitkley Thanks for the explanation of the build/run setup -- indeed I see the package management in the consumer logs, so I must be getting unlucky with the documents. If I can get an unambiguous example of recognition-gone-wrong, where would you like me to drop the document? (This PR should be left to rest on its laurels, I'm sure you'll agree :-) ) |
@tikitu If you find a document, that would be great. What you could also do is check the logs, since they contain messages like "Parsing for eng" or "Parsing for deu". Note that there will always be a "Parsing for eng" to begin with, since the language is only detected after the first OCR run. If a different language is detected, the log will show "Language detected: ..." and "Parsing for ..." directly after that message. In terms of where to put that document, I don't know. You can mail it to me directly if you want, but uploading it somewhere and opening an issue linking to it, so that everyone has access to the document (if you would want that), might be another option. |
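To pick the language-related messages out of a longer log, a small filter along these lines might help. It is a sketch only: `filter_ocr_lines` and `sample_log` are made-up names, and the sample input is invented from the message formats quoted in this thread rather than taken from a real run.

```shell
#!/bin/sh
# filter_ocr_lines keeps only the language-related messages from a log stream.
# (Hypothetical helper name.)
filter_ocr_lines() {
    grep -E 'Parsing for |Language detected: '
}

# Invented sample input, built from the message formats quoted in this thread.
sample_log() {
    cat <<'EOF'
unrelated line
Parsing for eng
Language detected: en
Parsing for eng
EOF
}

sample_log | filter_ocr_lines
```

In practice the output of `docker-compose logs` could be piped into `filter_ocr_lines` instead of the sample input.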
@pitkley I've only ever seen "Parsing for eng" (multiple times for the same document) and "Language detected: en". That's for some Dutch and Greek documents I happened to use as my first test cases, but they're pretty messy: it's possible there's enough stuff that looks like English that it's mis-detecting. I'll try to find a clear example that I'm ok with sharing with the world, and open an issue for it. |
This commit adds a `Dockerfile` to the root of the project, accompanied by a `docker-compose.yml` for simplified deployment. The `Dockerfile` is agnostic to whether it will be the webserver, the consumer, or if it is run for a one-off command (e.g. creation of a superuser, migration of the database, document export, ...).

The container's entrypoint is the `scripts/docker-entrypoint.sh` script. This script verifies that the required permissions are set, remaps the default user's and/or group's ID if required, and installs additional languages if the user wishes to.

After initialization, it analyzes the command the user supplied:

- If the command starts with a slash, it is expected that the user wants to execute a binary file and the command will be executed without further intervention. (Using `exec` to effectively replace the started shell-script and not have any reaping-issues.)
- If the command does not start with a slash, the command will be passed directly to the `manage.py` script without further modification. (Again using `exec`.)

The default command is set to `--help`.

If the user wants to execute a command that is not meant for `manage.py` but doesn't start with a slash, the Docker `--entrypoint` parameter can be used to circumvent the mechanics of `docker-entrypoint.sh`.

Further information can be found in `docs/setup.rst` and in `docs/migrating.rst`.

Some additional points:
Given the discussions in issue #2 (Server setup via Docker) and PR #28 (Docker), this PR will probably supersede PR #28.
If you have skimmed through the migration documentation, you might have noticed that I have left out how to restore data using Docker. I have actually written the corresponding documentation and implementation, but it requires a custom `loaddata` command, which can be found in this gist.
Right now, the license the gist is under is unclear. I have already asked for clarification and as soon as I get a response, I am going to update this PR accordingly.
I have marked this as work in progress. While I have tested everything (at least I think so...) I have documented, I am not comfortable merging this before there are a few responses.
One big point remaining to discuss is Docker Hub. In my eyes it is essential that the official Docker container will be available on the hub, and as up-to-date as possible.
Regarding integrity of the image, using Docker Hub's automated builds is also something I would see as a given. This leaves the issue of "namespacing" -- under what user or organization should this container live?
If we were to bind it to a user, it would have to be @danielquinn as far as I can tell, since Docker Hub requires linking your GitHub account to make automated builds work. Additionally, having Docker Hub rebuild the image as soon as `master` gets updated requires this link as well. (I don't know if we would want this build to happen fully automatically, but rather only if we know building the Docker image will not fail. I have something brewing regarding Travis-CI, this could possibly solve this.)
I have no experience with organizations on Docker Hub, maybe someone with more knowledge on that can support here?
Overall, there are a few open points, although I think the whole Docker Hub issue should be discussed separately and should not block this PR from being merged.