
Add basic Korean support #129

Merged
merged 1 commit into from
Oct 28, 2019


kimbyungnam
Contributor

Using Kkma from the KoNLPy library, this PR adds a Korean sentence tokenizer and word tokenizer. It requires JDK >= 7.
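A minimal sketch of how such a tokenizer could wrap Kkma, assuming KoNLPy's `konlpy.tag.Kkma` class with its `sentences()` and `morphs()` methods. The class name `KkmaTokenizerSketch` and the choice of morphemes as "words" are illustrative assumptions, not the code merged in this PR. The tagger is injectable so the wrapper can be exercised without a JVM.

```python
class KkmaTokenizerSketch:
    """Hypothetical wrapper: Korean sentence/word tokenization via KoNLPy's Kkma.

    Kkma runs on the JVM, so a JDK >= 7 must be installed when the default
    tagger is used.
    """

    def __init__(self, tagger=None):
        if tagger is None:
            # Imported lazily so this module loads even when konlpy is absent.
            from konlpy.tag import Kkma
            tagger = Kkma()
        self._tagger = tagger

    def to_sentences(self, paragraph):
        # Kkma.sentences() splits raw text into a list of sentence strings.
        return tuple(self._tagger.sentences(paragraph))

    def to_words(self, sentence):
        # Assumption: morphemes are used as the "words" of a sentence.
        return tuple(self._tagger.morphs(sentence))
```

With KoNLPy and a JDK installed, `KkmaTokenizerSketch().to_sentences(text)` would return Kkma's sentence split of `text`; the exact segmentation depends on Kkma's analyzer.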

@miso-belica
Owner

Hi @kimbyungnam, thanks for the PR. I guess this one is not easy to review. I usually copy-paste the stopwords into Google Translate to check them, but here the result reads like a strange story :/ I'm not sure how to check these stopwords.

Also, please rebase the branch, add the Korean extra dependencies to setup.py, and update the Travis CI configuration accordingly; otherwise the tests will fail.
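A minimal sketch of the requested setup.py change, assuming an optional extra named `korean` (a hypothetical name) that pulls in KoNLPy. Users would then opt in by installing the package with the `[korean]` extra, while the default install stays free of the JVM dependency.

```python
# Hypothetical extras entry for setup.py; the "korean" extra name is an assumption.
# KoNLPy runs its analyzers on the JVM, so CI (e.g. Travis) also needs a JDK >= 7.
EXTRAS_REQUIRE = {
    "korean": ["konlpy"],
}

# In setup.py this would be passed as:
#     setup(..., extras_require=EXTRAS_REQUIRE)
```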

@kimbyungnam
Contributor Author

Korean has postpositional particles, which carry no lexical meaning on their own. That is probably why the stopwords look strange when you check them via Google Translate, so I have listed the references. If that is not enough, I will find another way to check the stopwords. Thank you!!
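To illustrate why particles make sense as stopwords: they mark grammatical roles (topic, subject, object, and so on) rather than content, so dropping them from a token stream keeps the content-bearing morphemes. The sketch below uses a small hand-picked set of common particles purely for illustration; it is NOT the stopword list added in this PR, and `drop_stopwords` is a hypothetical helper.

```python
# Illustrative only: a few common Korean particles (josa).
# This is NOT the stopword list added in the PR.
ILLUSTRATIVE_PARTICLES = frozenset({"은", "는", "이", "가", "을", "를", "에", "의", "도"})


def drop_stopwords(tokens, stopwords=ILLUSTRATIVE_PARTICLES):
    """Return the tokens that are not stopwords, preserving their order."""
    return [t for t in tokens if t not in stopwords]


print(drop_stopwords(["학생", "이", "책", "을", "읽다"]))  # ['학생', '책', '읽다']
```

Note that exact-match filtering is only safe on morpheme-level tokens: some particle forms (such as "가") also occur as content morphemes (the verb stem "go"), which a real list and tokenizer have to account for.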

Owner

@miso-belica miso-belica left a comment


Thank you ➕

@miso-belica miso-belica merged commit fdada8c into miso-belica:master Oct 28, 2019

2 participants