
Docker available for TransformersReader Class #180

Merged (3 commits into deepset-ai:master) on Jul 7, 2020

Conversation

@guillim (Contributor) commented Jun 30, 2020

I worked on making Docker available for the TransformersReader class, since the FARMReader class is already supported.

I had to make a few changes in search.py in order to add env variables that let you choose which of the two classes to instantiate for the Reader.
In config.py, I created three variables:

  • READER_TOKENIZER
  • READER_USE_TRANSFORMERS
  • GPU_NUMBER

They simply follow the class requirements, nothing fancy (see the sketch below).
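
For illustration, here is a minimal sketch of how these variables could drive the Reader choice. The defaults, model names, module paths and Reader arguments are assumptions based on the haystack code base of that time, not the exact code in this PR:

```python
# config.py (sketch) – read the toggles from the environment
import os

READER_USE_TRANSFORMERS = os.getenv("READER_USE_TRANSFORMERS", "False").lower() == "true"
READER_TOKENIZER = os.getenv("READER_TOKENIZER", "distilbert-base-uncased")
GPU_NUMBER = int(os.getenv("GPU_NUMBER", "-1"))  # device id for TransformersReader; -1 = CPU

# search.py (sketch) – instantiate one of the two Reader classes based on the flag
if READER_USE_TRANSFORMERS:
    from haystack.reader.transformers import TransformersReader
    reader = TransformersReader(
        model="distilbert-base-uncased-distilled-squad",
        tokenizer=READER_TOKENIZER,
        use_gpu=GPU_NUMBER,
    )
else:
    from haystack.reader.farm import FARMReader
    reader = FARMReader(
        model_name_or_path="deepset/roberta-base-squad2",
        use_gpu=GPU_NUMBER >= 0,
    )
```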

I created another config file called docker-compose-custom.yml showing how you can use the TransformersReader by changing a few env variables.

A couple of small unrelated changes I made:

  • Dockerfile: better to copy the /data folder as well, since the container needs it when using InMemoryDocumentStore to load the data (especially useful if you have downloaded your data and run the ipython command to index it); see the sketch after this list
  • transformers.py: I am not 100% sure about this one, but I think there is a typo in the description of the accepted use_gpu values
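
As a rough sketch of why the container needs /data when InMemoryDocumentStore is used (the module path and document dict fields below are assumptions based on the haystack code base of that time):

```python
# With InMemoryDocumentStore there is no external Elasticsearch holding the documents,
# so the container has to (re)index them from the local /data folder at startup.
from pathlib import Path
from haystack.database.memory import InMemoryDocumentStore  # assumed module path

document_store = InMemoryDocumentStore()
docs = [{"name": p.name, "text": p.read_text()} for p in Path("/data").glob("**/*.txt")]
document_store.write_documents(docs)
```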

@tholor self-requested a review on Jul 1, 2020
@tholor added the type:feature (New feature or request) and topic:speed labels on Jul 1, 2020
@tholor (Member) left a comment

Hey @guillim ,

Thanks for this great addition! Very helpful.

Two comments:

  1. The fix to the docstring in transformers.py is absolutely right. Thx!
  2. Why do you suggest a new docker-compose file? I could imagine it being more user-friendly to have a single one here, but with more comments in the "environment" section to clarify the usage of a TransformersReader. What do you think?

@guillim (Contributor, Author) commented Jul 2, 2020

Hello,

Answering point 2 here:
Creating another docker-compose file is a way to make things easier for people who want to build from the Dockerfile with a custom Elasticsearch store. But it can absolutely be deleted if you think it confuses people. I am going to delete it so that you can merge the rest.

guillim and others added 2 commits July 2, 2020 12:54
@tholor (Member) commented Jul 7, 2020

Hey @guillim ,

Sorry for the delay here. I did some testing and added some slight changes to your branch. I hope that's okay with you; if not, feel free to roll back and I'll create a new PR from my own branch.

Suggested changes:

  • the default of READER_USE_TRANSFORMERS should be False (it might otherwise break existing deployments using a FARMReader); see the sketch after this list
  • merged your docker-compose-custom into the existing docker-compose (comments for TransformersReader + build + plain elasticsearch)
  • made the COPY of the "data" folder optional in the Dockerfile, as that folder is not present in the normal haystack repo and therefore breaks builds in environments without it
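
For the first point, the backward-compatible default could look like this (a sketch of the intended behaviour, not the exact diff):

```python
import os

# Keep FARMReader as the default: only an explicit READER_USE_TRANSFORMERS=True
# switches a deployment over to the TransformersReader, so existing setups are unaffected.
READER_USE_TRANSFORMERS = os.getenv("READER_USE_TRANSFORMERS", "False").lower() == "true"
```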

What do you think? If that's all good with you, we'll be ready to merge this.

@guillim (Contributor, Author) commented Jul 7, 2020

Yeah, that's perfect! You're actually right: if we can make everything "backward-compatible", that's better. Totally agreed.

@tholor merged commit 8a616da into deepset-ai:master on Jul 7, 2020