Discussion - stopwords #212
Comments
Hi Leonardo, thank you for opening this issue. I agree with you, it's quite annoying that stopwords are downloaded even when they are not needed. This should have been fixed in #194. I will soon release a new version that includes the patch. Regarding your other questions:
Hope it helps!
Hi Leonardo, I just released a new version (Texthero 1.1.0); stopwords should now be downloaded lazily. Would you mind trying it and letting me know? Later on, we can discuss your other great points further!
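For readers unfamiliar with the term, here is a minimal sketch of what "lazy" loading can look like in general. It is not Texthero's actual implementation: `get_stopwords`, `remove_stopwords`, and the tiny word list are hypothetical stand-ins, and the expensive download is simulated by a plain function body.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_stopwords():
    """Build the default stopword set on first use; later calls hit the cache."""
    # Hypothetical stand-in for an expensive download or file read.
    return frozenset({"a", "an", "the", "and", "or"})


def remove_stopwords(text, stopwords=None):
    """Drop stopwords from whitespace-separated text."""
    # The default list is only fetched when the caller supplies none,
    # so users with their own list never trigger the expensive load.
    words = get_stopwords() if stopwords is None else stopwords
    return " ".join(t for t in text.split() if t.lower() not in words)
```

With this shape, importing the module costs nothing, and passing a custom `stopwords` set skips the default list entirely.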
Hello @jbesomi, sorry for my late answer. Yes, I would like to help, but I'm not sure how to support multilingual stopwords. Adding multilingual embeddings could improve things, but it would also slow down the code. This is tough, heheh. As for removing the spacy stopwords requirement, I'm going to take a look and post a message here.
Thanks for the update, Leo. As you suggested, we can start by improving the stopwords (for English) and see how it goes. Multilingual support requires some thinking and refactoring; we can discuss that later on, once the simpler version is implemented.
I like texthero, and I want to contribute somehow. First, I want to discuss something that bothers me: stopwords.
Problem: I want to deploy a solution without the spacy stopwords requirement and, if possible, add my own stopwords. My solution is based on Docker containers, and it is bad practice to download files every time a new container is instantiated: it causes a cold-start problem and uses unnecessary space (because I don't use them).
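To make the Docker use case concrete, here is a rough sketch of one way to avoid any download at container start: ship a plain-text stopword file inside the image and read it from disk. The function names and the one-word-per-line file format are assumptions for illustration, not part of texthero or spacy.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stopword per line, skipping blank lines and '#' comments."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def strip_stopwords(text, stopwords):
    """Drop every whitespace-separated token found in `stopwords`."""
    return " ".join(t for t in text.split() if t.lower() not in stopwords)
```

Because the file is copied into the image at build time, a new container only pays for a small local read.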
In this sense,
spacy stopwords requirements?
spacy?