SamuraiT/tinysegmenter: a tokenizer specialized for Japanese

TinySegmenter

TinySegmenter for Python 2.x was written by Masato Hagiwara. For more information about him, see here.

This version of TinySegmenter was modified by Tatsuro Yasukawa to support both Python 3.x and Python 2.x and to be packaged for distribution. It has also been optimized to run faster, thanks to @chezou, @cocoatomo and @methane.

See here for more information about TinySegmenter.

Installation

pip install tinysegmenter3

Usage

import tinysegmenter
statement = '私はpython大好きStanding Engineerです．'
tokenized_statement = tinysegmenter.tokenize(statement)
print(tokenized_statement)
# ['私', 'は', 'python', '大好き', 'Standing', ' Engineer', 'です', '．']
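To apply tokenize to more than one sentence, for example a whole text file, you can loop over the lines and aggregate the results. The sketch below is a hypothetical helper, not part of tinysegmenter: token_frequencies takes any callable that maps a string to a list of tokens, so it can be tried without the library installed and then used with tinysegmenter.tokenize. Stripping whitespace from each token is an assumption based on the sample output above, where ' Engineer' keeps its leading space.

```python
from collections import Counter
from typing import Callable, Iterable, List


def token_frequencies(
    lines: Iterable[str], tokenize: Callable[[str], List[str]]
) -> Counter:
    """Hypothetical helper (not part of tinysegmenter).

    Counts tokens across many lines. `tokenize` is any str -> list[str]
    callable, for example tinysegmenter.tokenize.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: trim surrounding whitespace from each token (the README
        # sample shows ' Engineer' with a leading space) and drop tokens that
        # are whitespace only.
        counts.update(tok.strip() for tok in tokenize(line) if tok.strip())
    return counts


if __name__ == "__main__":
    # str.split is a stand-in tokenizer so this runs without tinysegmenter;
    # pass tinysegmenter.tokenize instead for Japanese segmentation.
    sample = ["a b a", "", "b c "]
    print(token_frequencies(sample, str.split).most_common())
```

With the library installed, the same helper can be pointed at a file (the path here is hypothetical): open it with `encoding='utf-8'` and pass the file object together with `tinysegmenter.tokenize`.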

Test Text

The test text (in the tests directory) is The Time Machine by H. G. Wells, translated into Japanese by Hiroo Yamagata and licensed under CC BY-SA 2.0.

How to Run Tests

Install the requirements listed in requirements.txt:

pip install -r requirements.txt

Then run:

./runtests.sh
