dimancheite/KhmerWordSegmentation

KhmerWordSegmentation(NLP)

Problem

Khmer word segmentation is considerably more complicated than word segmentation in many other languages. Khmer has no standard rule for using spaces to separate words; spaces are used mainly to make text easier to read. Moreover, the meaning of Khmer words can change with the order in which they are combined, and a single Khmer word can itself be a compound of two or more Khmer words.

Because of these uncertain spacing rules and the complicated word structure described above, Khmer text is hard to segment into words.
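
To make the problem concrete, here is a minimal sketch of one common baseline, greedy longest-match segmentation against a word list. This is not necessarily the approach used in this repository; the `segment` function, the `LEXICON` set, and the sample words are all assumptions made for illustration. It shows why compounds matter: the lexicon contains both "ភាសា" (language) and the compound "ភាសាខ្មែរ" (Khmer language), and the longest match picks the compound.

```python
# Hypothetical sample words, chosen only for this illustration.
I = "ខ្ញុំ"
LOVE = "ស្រឡាញ់"
LANGUAGE = "ភាសា"
KHMER = "ខ្មែរ"
KHMER_LANGUAGE = LANGUAGE + KHMER  # a compound of two shorter words

LEXICON = {I, LOVE, LANGUAGE, KHMER, KHMER_LANGUAGE}


def segment(text, lexicon):
    """Greedy longest-match segmentation (a baseline, not this repo's model).

    Existing whitespace is treated as a hard boundary. Inside each chunk the
    longest lexicon entry starting at the current position is taken; if no
    entry matches, a single code point is emitted as its own token.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    words = []
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            match = None
            # Try the longest candidate first, then shrink.
            for j in range(min(len(chunk), i + max_len), i, -1):
                if chunk[i:j] in lexicon:
                    match = chunk[i:j]
                    break
            if match is None:
                match = chunk[i]  # unknown code point: emit it alone
            words.append(match)
            i += len(match)
    return words


sentence = I + LOVE + KHMER_LANGUAGE  # written without spaces
result = segment(sentence, LEXICON)
# result == [I, LOVE, KHMER_LANGUAGE]
```

A greedy match like this cannot resolve cases where the right split depends on context, and its single-code-point fallback can separate a consonant from its combining signs, which is part of why a trained model is attractive for Khmer.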

Why are we building it?

Ref:

Plan

1. Build a website for:

  • Word segmentation: the user submits a string of sentences, and the site responds with the list of words in those sentences.
  • Word checking: the user submits sentences, and the site responds with the sentences and some suggested words.
  • Word contribution: the user submits Khmer words with their function (noun, verb, ...), which we then use to train our model.
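
The word checking and word contribution features above could be sketched roughly as follows. The plan does not say how suggestions are computed or how contributions are stored, so everything here is an assumption: the in-memory `lexicon` dictionary, the `contribute` and `check_words` helpers, and the use of `difflib.get_close_matches` from the Python standard library as a stand-in for a real suggestion model. The web layer is left out.

```python
from difflib import get_close_matches

# Hypothetical in-memory store: word -> function (noun, verb, ...).
# A real site would persist contributions and feed them to model training.
lexicon = {}


def contribute(word, function):
    """Record a user-contributed word together with its function."""
    word = word.strip()
    if not word:
        raise ValueError("word must not be empty")
    lexicon[word] = function


def check_words(words, n=3):
    """Map each word missing from the lexicon to up to n similar known words."""
    known = list(lexicon)
    return {
        w: get_close_matches(w, known, n=n)
        for w in words
        if w not in lexicon
    }


# Hypothetical sample data for illustration only.
LANGUAGE = "ភាសា"
KHMER = "ខ្មែរ"
LOVE = "ស្រឡាញ់"
contribute(LANGUAGE, "noun")
contribute(KHMER, "noun")
contribute(LOVE, "verb")

TRUNCATED = LANGUAGE[:-1]  # simulate input missing its final vowel sign
report = check_words([KHMER, TRUNCATED])
# KHMER is known, so only TRUNCATED appears in the report,
# with LANGUAGE as its closest suggestion.
```

Character-level similarity is only a placeholder here; suggestions that respect Khmer syllable structure would need a more careful comparison.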

About

Separate Khmer words from given sentences.

Languages

  • Python 54.3%
  • Jupyter Notebook 45.7%