Replace deprecated 'punkt' with 'punkt_tab' #83
base: main
The maintainers suggest upgrading NLTK to at least version 3.9.1.
If someone doesn't upgrade their version of NLTK, Pinecone will break for them.
Any update on this? A library like Pinecone should avoid introducing security risks, especially considering the issue mentioned here: nltk/nltk#3266 (comment). It would be best to fully migrate to punkt_tab and enforce a minimum NLTK version to prevent breakage and vulnerabilities.
Just ran into this error in production. Is there a reason the fix hasn't been merged yet? Are there any workarounds in the meantime?
Problem
The NLTK package 'punkt' has been deprecated, resulting in an error when calling a BM25Tokenizer.
Solution
Replace 'punkt' with the new 'punkt_tab'.
Type of Change
This might be a breaking change. NLTK 3.8.1 and lower use 'punkt', whereas NLTK 3.8.2 and above use 'punkt_tab'. The pyproject.toml file specifies nltk = "^3.6.5", which allows any NLTK release from 3.6.5 up to (but not including) 4.0.0, so a fresh install will pick up 3.8.2 or later and break. The NLTK maintainers should not have introduced this breaking change in a patch release, but alas.
Another fix would be to pin the NLTK version.
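To illustrate why the existing constraint picks up the breaking release, here is a small sketch of Poetry's caret rule for a fully specified base version (^X.Y.Z). This is not Poetry's own code; the function names are hypothetical, only plain X.Y.Z bases are handled, and pre-release ordering is ignored.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    # Leading "X.Y.Z" of a version string; missing parts count as 0.
    m = re.match(r"^(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(g) if g is not None else 0 for g in m.groups())
    return (major, minor, patch)


def caret_bounds(base: str) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    # Caret rule: the upper bound bumps the left-most non-zero component,
    # e.g. ^3.6.5 -> [3.6.5, 4.0.0) and ^0.2.3 -> [0.2.3, 0.3.0).
    lower = parse_version(base)
    major, minor, patch = lower
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return lower, upper


def satisfies_caret(version: str, base: str) -> bool:
    # True when `version` falls inside the half-open range allowed by ^base.
    lower, upper = caret_bounds(base)
    return lower <= parse_version(version) < upper


print(satisfies_caret("3.9.1", "3.6.5"))  # True
```

Under this rule NLTK 3.8.1, 3.8.2 and 3.9.1 all satisfy ^3.6.5, which is why a fresh install resolves to a release that expects 'punkt_tab'.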
Test Plan
I tried it locally and it fixed my issue.