Generate FSCrawler docker images #820

Closed
wants to merge 35 commits into from

Conversation

toto1310
Contributor

Hello

I created the docker images for FSCrawler. I would be happy if you could merge them.

I also pushed them to Dockerhub, and I wrote how to use it.

@toto1310
Contributor Author

Hmm, I will check the warnings above.

@dadoonet
Owner

That looks great! Thank you so much for working on that! ❤️

@dadoonet
Owner

dadoonet commented Oct 1, 2019

That looks amazing. I saw that you modified the current distribution modules which are generating the ZIP files. Is that the way to do it or should we create instead new distribution modules specifically for docker images?

@toto1310
Contributor Author

toto1310 commented Oct 1, 2019

Thank you for taking the time to review my changes and comment on them.

Is that the way to do it or should we create instead new distribution modules specifically for docker images?

I would suggest that we do not need new modules, because a Docker image is just one more way to distribute FSCrawler, like the zip or tarball.
I could also create a new module. However, if you later delete the “es5” submodule or add an “es8” submodule, that would take a bit more work than the current layout does.

@toto1310
Contributor Author

toto1310 commented Oct 1, 2019

Sorry to ask on top of the suggestion above, but could you help me resolve the warning messages from the LGTM analysis?

Is it possible to run the LGTM analysis with the "-Ddocker.skip" flag or the "-Ddocker.skip.build" flag in the same way as the "-DskipTests" flag to be able to build in the environments that docker is not installed in?

@dadoonet
Owner

dadoonet commented Oct 1, 2019

I would suggest that we do not need new modules, because a Docker image is just one more way to distribute FSCrawler, like the zip or tarball.

I see. Actually the Docker image is not uploaded to Maven central but to Docker repository. Makes sense.

I could also create a new module. However, if you later delete the “es5” submodule or add an “es8” submodule, that would take a bit more work than the current layout does.

Right.

Thanks

@dadoonet
Owner

dadoonet commented Oct 1, 2019

Is it possible to run the LGTM analysis with the "-Ddocker.skip" flag or the "-Ddocker.skip.build" flag in the same way as the "-DskipTests" flag to be able to build in the environments that docker is not installed in?

based on https://lgtm.com/help/lgtm/java-extraction#exporting-variable-customizing-maven I'm wondering if you could change the lgtm.yml file and do something like:

extraction:
  python:
    python_setup:
      version: 3
      setup_py: false
  java:
    before_index:
      export DOCKER_SKIP=true

and then change the distribution/pom.xml and add the following properties:

<env.DOCKER_SKIP>false</env.DOCKER_SKIP>
<docker.skip>${env.DOCKER_SKIP}</docker.skip>

Not sure if that would work... I guess you need to try.
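
Before changing lgtm.yml, the effect of the flag can be checked locally by passing it to Maven directly. This is only a minimal sketch: it assumes the Docker plugin honours the -Ddocker.skip flag mentioned above, and the helper name maven_args is hypothetical.

```shell
# Hypothetical helper: build the Maven arguments from DOCKER_SKIP.
# Assumption: the Docker plugin honours -Ddocker.skip, as discussed above.
maven_args() {
  # Default to "false" so that a normal build still produces the images.
  printf '%s\n' "-Ddocker.skip=${DOCKER_SKIP:-false} package"
}
```

With this helper, `DOCKER_SKIP=true; mvn $(maven_args)` should build on a machine without Docker, if the assumption holds. The command substitution is left unquoted on purpose so that the flag and the goal are passed as two separate arguments.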

@toto1310
Contributor Author

toto1310 commented Oct 1, 2019

Thank you very much ❗ I will try it soon.

@dadoonet dadoonet added the new For new features or options label Oct 1, 2019
@dadoonet dadoonet added this to the 2.7 milestone Oct 1, 2019
@dadoonet
Owner

dadoonet commented Oct 1, 2019

Another comment is that it would be great to add some documentation about what you did.
The documentation you wrote in DockerHub is amazing and would be a great fit as well for our documentation.

@toto1310
Contributor Author

toto1310 commented Oct 1, 2019

I am very happy and honored to hear that.

I will add docs too.

By the way, on which page should I add the new docs: "tips" or a new page?

@dadoonet
Owner

dadoonet commented Oct 1, 2019

I wonder if we should add a new section in https://fscrawler.readthedocs.io/en/latest/user/getting_started.html

Which is this file: https://github.com/dadoonet/fscrawler/blob/master/docs/source/user/getting_started.rst

Eventually I'll reorganize the documentation at some point.

@toto1310
Contributor Author

toto1310 commented Oct 1, 2019

Thank you for the details. I understand.
I will try to add a new section in "getting_started".

@toto1310
Contributor Author

toto1310 commented Oct 7, 2019

I have finished the updates.
Could you check them?

@dadoonet
Owner

dadoonet commented Oct 7, 2019

I started but I'm traveling atm and the airport wifi did not give me enough bandwidth to test it 😉

@dadoonet
Owner

dadoonet commented Jan 7, 2020

Thanks for adding the documentation! Could you rebase on master or merge master in your branch?
I'm wondering now what I should do on my side to automatically publish the SNAPSHOTs and the released versions to Dockerhub, so that they stay available in the long run.

Could you guide me on that? (Total n00b on Dockerhub 🤣)

@toto1310
Contributor Author

Could you rebase on master or merge master in your branch?

Today, I did the above.

I'm wondering now what I should do on my side to automatically publish the SNAPSHOTs and the released versions to Dockerhub, so that they stay available in the long run.

Could you guide me on that?

OK, could you try the steps below?

Preparation

1. Create your Dockerhub Account.

2. Create a repository named dadoonet/fscrawler.

3. Pass the Docker ID and password to Maven.

Execution

1. Build jar files and docker images as usual.

For example,

$ mvn package 

or

$ mvn install

2. Push to Dockerhub.

For example,

$ mvn -f distribution/pom.xml docker:push 
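
The preparation and execution steps above could be combined into one script along these lines. This is a rough sketch and not the project's release procedure: it assumes that DOCKER_ID and DOCKER_PASSWORD are set in the environment, and that the Maven Docker plugin reuses the credentials stored by `docker login`, which depends on the plugin. The function name publish_images is hypothetical.

```shell
# Sketch of the build-and-push sequence described above.
# Assumptions: DOCKER_ID and DOCKER_PASSWORD are set in the environment,
# and the Maven Docker plugin reuses the credentials stored by `docker login`.
publish_images() {
  # Log in without putting the password on the command line.
  printf '%s' "$DOCKER_PASSWORD" | docker login -u "$DOCKER_ID" --password-stdin || return 1
  # Build the jar files and the Docker images.
  mvn install || return 1
  # Push the images built by the distribution module.
  mvn -f distribution/pom.xml docker:push
}
```

Calling `publish_images` from the root of the repository runs the three commands in order and stops at the first failure.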

Note

After running the commands above, you get Docker images with the following tags.

  • For Elasticsearch 7
    • 2.7-SNAPSHOT-es7-nolang
    • 2.7-SNAPSHOT-es7-eng
    • 2.7-SNAPSHOT-es7-fra
    • 2.7-SNAPSHOT-es7-jpn
  • For Elasticsearch 6
    • 2.7-SNAPSHOT-es6-nolang
    • 2.7-SNAPSHOT-es6-eng
    • 2.7-SNAPSHOT-es6-fra
    • 2.7-SNAPSHOT-es6-jpn
  • Alias for 2.7-SNAPSHOT-es7-eng
    • latest
    • 2.7-SNAPSHOT

This version number (2.7-SNAPSHOT) is taken from the project.version property in pom.xml.
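
The naming scheme in the list above can be written down as a small loop. This is only an illustration of the pattern; the real tags are produced by the Maven build, and the helper name fscrawler_tags is hypothetical.

```shell
# Print the image tags for one version, following the list above.
# Hypothetical helper; the real tags are produced by the Maven build.
fscrawler_tags() {
  version="$1"
  for es in es7 es6; do
    for lang in nolang eng fra jpn; do
      printf '%s-%s-%s\n' "$version" "$es" "$lang"
    done
  done
  # Aliases for <version>-es7-eng.
  printf '%s\n' latest "$version"
}
```

For example, `fscrawler_tags 2.7-SNAPSHOT` lists the ten tags above, and any of them can be pulled with a command such as `docker pull dadoonet/fscrawler:2.7-SNAPSHOT-es7-eng`.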

@dadoonet
Owner

Amazing @toto1310. It works very well. See https://hub.docker.com/r/dadoonet/fscrawler

I'm trying to add some documentation and make all that work from TravisCI as well, so we can have automatic publication of new snapshots.

Stay tuned!

Thank you so much for your help so far.

@dadoonet
Owner

Could you tell me how to run it?
I suppose it's something like:

docker run dadoonet/fscrawler fscrawler foo

Where am I supposed to put the config file, and how do I mount the data dir?

@toto1310
Contributor Author

Thank you for your work! I'm looking forward to it!!

Could you tell me how to run it?

For example, you can run FSCrawler so that it reads its configuration files from /root/.fscrawler (i.e. --config_dir) and its target files from /tmp/es (i.e. fs.url).

$ docker run -it --rm -v ${PWD}/config:/root/.fscrawler -v ${PWD}/data:/tmp/es:ro dadoonet/fscrawler fscrawler job_name
  • The -it option attaches your terminal to fscrawler's standard input.
    • This lets you type Y or N when fscrawler asks a question.
  • The -v /path/to/host/src/dir:/path/to/container/dst/dir option mounts /path/to/host/src/dir at /path/to/container/dst/dir. /path/to/host/src/dir is on the host machine, and /path/to/container/dst/dir is in the docker container.
    • You can keep the config dir and the data dir on your machine.
    • In this case, ${PWD}/config is the config dir on your machine, and it is mounted at /root/.fscrawler in the docker container.
    • In the same way, ${PWD}/data is mounted at /tmp/es, read-only because of the :ro suffix.

For more detail, please refer to https://docs.docker.com/engine/reference/run/.
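
If you run several jobs, the command above can be wrapped in a small shell function. This is a minimal sketch under the same assumptions as the example: the config dir is ./config, the documents are in ./data, and fs.url in the job settings points to /tmp/es. The function name run_fscrawler is hypothetical.

```shell
# Hypothetical wrapper around the docker run command shown above.
# Assumptions: the config dir is ./config, the documents are in ./data,
# and fs.url in the job settings points to /tmp/es.
run_fscrawler() {
  job_name="$1"
  # Mount the config dir read-write and the data dir read-only.
  docker run -it --rm \
    -v "${PWD}/config:/root/.fscrawler" \
    -v "${PWD}/data:/tmp/es:ro" \
    dadoonet/fscrawler fscrawler "$job_name"
}
```

Then `run_fscrawler job_name` starts the job from the directory that contains config and data. The paths are quoted so that a working directory with spaces in its name still works.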

@dadoonet
Owner

@toto1310 I don't know if you could help here building a new version of the Docker images updated to JDK15 or so.
Could you?

@dadoonet dadoonet mentioned this pull request Mar 5, 2021
@earzur earzur mentioned this pull request Mar 30, 2021
@dadoonet
Owner

Closing this one with #1122

@dadoonet dadoonet closed this Mar 31, 2021
Labels
new For new features or options
2 participants