fix: Pin nltk version for sentence tokenizer #8786
Pull Request Test Coverage Report for Build 13075527333 (Coveralls)
LGTM
Sorry... I think that NLTK is pretty stable in general, compared to other libraries we usually pin to a specific version (e.g. ...). Can we avoid pinning a specific version?
@anakin87 yeah, I think it should be OK. I will make this change.
Related Issues
DocumentSplitter with nltk==3.8.1 or lower versions raises the error below:
nltk.find:No such file or directory: '/root/nltk_data/tokenizers/punkt/PY3_tab'
Related issue in nltk: "nltk.find:No such file or directory: '/root/nltk_data/tokenizers/punkt/PY3_tab'" (nltk/nltk#3305)
Proposed Changes:
Pin nltk==3.9.1 for SentenceTokenizer and DocumentSplitter.
This PR is a suggestion, open to discussion.
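For anyone who wants to check whether a local environment falls in the range reported above, here is a minimal sketch. It is not code from this PR or from Haystack: the helper names (`parse_version`, `is_reported_affected`, `installed_nltk_version`) are hypothetical, and the only version fact it relies on is the one stated in this description (nltk 3.8.1 or lower is reported to fail). Versions between 3.8.1 and 3.9.1 are not covered by the report, so the check says nothing definitive about them.

```python
from importlib import metadata

# Taken from the PR description: nltk 3.8.1 and lower are reported to fail.
LAST_REPORTED_BAD = (3, 8, 1)


def parse_version(version: str) -> tuple:
    """Turn '3.9.1' into (3, 9, 1), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '3.8' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_reported_affected(version: str) -> bool:
    """True if the version is at or below the last version reported as failing."""
    return parse_version(version) <= LAST_REPORTED_BAD


def installed_nltk_version():
    """Return the installed nltk version string, or None if nltk is missing."""
    try:
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_nltk_version()
    if found is None:
        print("nltk is not installed")
    elif is_reported_affected(found):
        print(f"nltk {found} is in the range reported to fail; consider upgrading")
    else:
        print(f"nltk {found} is above the range reported to fail")
```

The comparison uses plain integer tuples so the sketch needs only the standard library; a real project would more likely express the constraint in its dependency metadata.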
How did you test it?
Ran the tests
Tested an example.
Notes for the reviewer
Checklist
The PR title uses one of the conventional commit types (fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:), with ! added in case the PR includes breaking changes.