RFC: Include MIT-licensed data from Stopwords ISO #1651

Closed
adamreichold wants to merge 3 commits


adamreichold
Collaborator

Adds pre-defined stop word filters for three common European languages based on
the MIT-licensed data from Stopwords ISO [1]. This is put behind a non-default
feature to avoid bloating the binary for users who are not interested in this
while still providing batteries-included usability for those who are.

[1] https://github.com/stopwords-iso
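
The feature-gating idea described above could look roughly like the following. This is only a minimal sketch, not the code from this PR: the `StopWordFilter` type here is a simplified stand-in rather than the crate's real filter, and the feature name `stopwords`, the constructor `builtin_sample`, and its three placeholder words are all assumptions made for illustration. In `Cargo.toml` the feature would be declared under `[features]` (for example `stopwords = []`) and left out of `default`, so the embedded word lists are only compiled in on request.

```rust
use std::collections::HashSet;

/// Simplified, hypothetical stand-in for a stop word filter.
pub struct StopWordFilter {
    words: HashSet<String>,
}

impl StopWordFilter {
    /// Builds a filter from a caller-supplied word list; always available.
    pub fn remove<I, S>(words: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        StopWordFilter {
            words: words.into_iter().map(Into::into).collect(),
        }
    }

    /// Pre-defined list, compiled in only when the assumed `stopwords`
    /// feature is enabled. The words are placeholders, not Stopwords ISO data.
    #[cfg(feature = "stopwords")]
    pub fn builtin_sample() -> Self {
        Self::remove(["a", "an", "the"])
    }

    /// Returns true if `token` is in the stop word list (exact match).
    pub fn is_stop_word(&self, token: &str) -> bool {
        self.words.contains(token)
    }

    /// Returns the tokens that are not stop words, preserving their order.
    pub fn filter<'a>(&self, tokens: &[&'a str]) -> Vec<&'a str> {
        tokens
            .iter()
            .copied()
            .filter(|t| !self.is_stop_word(t))
            .collect()
    }
}
```

Users who do not enable the feature would still construct a filter from their own list with `StopWordFilter::remove(...)`, while the pre-defined constructor simply does not exist in their build.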
@fulmicoton
Collaborator

This list of stopwords seems way too long for search.

@adamreichold
Collaborator Author

> This list of stopwords seems way too long for search.

Alright. Do you know a more reasonable source with compatible licensing terms?

Otherwise, I'd propose that I just break out the first two general improvements to StopWordFilter into a new PR and drop this one?

@adamreichold closed this by deleting the head repository on Nov 1, 2022
@fulmicoton
Collaborator

We can probably steal something from the Lucene world?
