Documentation: https://rhoknp.readthedocs.io/en/latest/
Source Code: https://github.com/ku-nlp/rhoknp
rhoknp is a Python binding for Juman++, KNP, and KWJA.[1]
import rhoknp
# Perform morphological analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply_to_sentence(
    "電気抵抗率は電気の通しにくさを表す物性値である。"
)
# Access the result
for morpheme in sentence.morphemes:  # a.k.a. keitai-so
    ...
# Save the result
with open("result.jumanpp", "wt") as f:
    f.write(sentence.to_jumanpp())
# Load the result
with open("result.jumanpp", "rt") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())
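Install rhoknp with pip:
pip install rhoknp
rhoknp runs Juman++, KNP, and KWJA as external analyzers, so install whichever of them you plan to use separately.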
Let's begin by using Juman++ with rhoknp. Here, we present a simple example demonstrating how Juman++ can be used to analyze a sentence.
import rhoknp
# Perform morphological analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply_to_sentence("電気抵抗率は電気の通しにくさを表す物性値である。")
You can easily access the individual morphemes that make up the sentence.
for morpheme in sentence.morphemes:  # a.k.a. keitai-so
    ...
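As a rough sketch of what the loop body might do, the helper below collects each morpheme's surface form, reading, lemma, and part of speech. It assumes that rhoknp's `Morpheme` objects expose `text`, `reading`, `lemma`, and `pos` attributes (check the API reference); the function name `morpheme_table` is hypothetical. The `TYPE_CHECKING` guard keeps the annotations available to type checkers without importing rhoknp at run time.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only by static type checkers such as mypy
    import rhoknp


def morpheme_table(sentence: rhoknp.Sentence) -> list[tuple[str, str, str, str]]:
    """Return (surface, reading, lemma, POS) for every morpheme in the sentence.

    Hypothetical helper; assumes each morpheme has `text`, `reading`,
    `lemma`, and `pos` attributes.
    """
    return [(m.text, m.reading, m.lemma, m.pos) for m in sentence.morphemes]
```

With the sentence analyzed above, `for row in morpheme_table(sentence): print(*row)` would print one morpheme per line.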
Sentence objects can be saved in the JUMAN format.
# Save the sentence in the JUMAN format
with open("sentence.jumanpp", "wt") as f:
    f.write(sentence.to_jumanpp())
# Load the sentence
with open("sentence.jumanpp", "rt") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())
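The JUMAN format is line-oriented: each morpheme occupies one line of space-separated fields, and a line containing only `EOS` ends the sentence. The stdlib-only sketch below splits one morpheme line into named fields to show that layout. It is not rhoknp's parser (use `Sentence.from_jumanpp` for real data); the field names are our own labels, the sample line is illustrative rather than verbatim Juman++ output, and details such as escaped spaces and alternative-analysis lines starting with `@` are ignored.

```python
from typing import NamedTuple


class JumanLine(NamedTuple):
    """Hypothetical container for the 12 fields of one morpheme line."""
    surface: str
    reading: str
    lemma: str
    pos: str
    pos_id: int
    subpos: str
    subpos_id: int
    conjtype: str
    conjtype_id: int
    conjform: str
    conjform_id: int
    semantics: str


def parse_juman_line(line: str) -> JumanLine:
    # Split into at most 12 fields so that the quoted semantic information,
    # which may itself contain spaces, stays together in the last field.
    f = line.rstrip("\n").split(" ", 11)
    if len(f) != 12:
        raise ValueError(f"expected 12 fields, got {len(f)}: {line!r}")
    return JumanLine(
        f[0], f[1], f[2],
        f[3], int(f[4]),
        f[5], int(f[6]),
        f[7], int(f[8]),
        f[9], int(f[10]),
        f[11],
    )


sample = '電気 でんき 電気 名詞 6 普通名詞 1 * 0 * 0 "代表表記:電気/でんき"'
parsed = parse_juman_line(sample)
print(parsed.lemma, parsed.pos)  # prints: 電気 名詞
```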
Almost the same APIs are available for KNP.
# Perform language analysis by KNP
knp = rhoknp.KNP()
sentence = knp.apply_to_sentence("電気抵抗率は電気の通しにくさを表す物性値である。")
KNP analyzes the sentence at multiple levels, from clauses down to morphemes, and rhoknp exposes each level as an attribute of the sentence.
for clause in sentence.clauses:  # a.k.a. setsu
    ...
for phrase in sentence.phrases:  # a.k.a. bunsetsu
    ...
for base_phrase in sentence.base_phrases:  # a.k.a. kihon-ku
    ...
for morpheme in sentence.morphemes:  # a.k.a. keitai-so
    ...
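These levels nest: a phrase (bunsetsu) contains base phrases (kihon-ku), which in turn contain morphemes. The sketch below renders that nesting as an indented outline. It assumes that `Phrase` objects expose `base_phrases`, that `BasePhrase` objects expose `morphemes`, and that all three unit types expose `text`; the helper name `outline` is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only for static type checking
    import rhoknp


def outline(sentence: rhoknp.Sentence) -> str:
    """Render phrase > base phrase > morpheme nesting as indented lines."""
    lines: list[str] = []
    for phrase in sentence.phrases:  # bunsetsu
        lines.append(phrase.text)
        for base_phrase in phrase.base_phrases:  # kihon-ku
            lines.append("  " + base_phrase.text)
            for morpheme in base_phrase.morphemes:  # keitai-so
                lines.append("    " + morpheme.text)
    return "\n".join(lines)
```

Calling `print(outline(sentence))` on the KNP result above would show how KNP grouped the morphemes.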
Sentence objects can be saved in the KNP format.
# Save the sentence in the KNP format
with open("sentence.knp", "wt") as f:
    f.write(sentence.to_knp())
# Load the sentence
with open("sentence.knp", "rt") as f:
    sentence = rhoknp.Sentence.from_knp(f.read())
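In the KNP format, the morpheme lines of the JUMAN format are interleaved with header lines: a line such as `* 1D` opens a phrase (bunsetsu), `+ 1D` opens a base phrase (kihon-ku), lines starting with `# ` carry comments such as the sentence ID, and `EOS` ends the sentence. The stdlib sketch below counts units by these line shapes purely to illustrate the layout. Real header lines carry many more features than the abbreviated ones assumed here, a morpheme whose surface form is `#` would be miscounted as a comment, and `Sentence.from_knp` remains the way to actually load the data. The names `HEADER` and `count_knp_units` are hypothetical.

```python
import re
from collections import Counter

# A header line starts with "*" (phrase) or "+" (base phrase), then the index
# of its head (-1 for the root) and a dependency type letter, e.g. "+ -1D".
HEADER = re.compile(r"([*+]) -?\d+[DPIA]")


def count_knp_units(knp_text: str) -> Counter:
    """Count comment, phrase, base-phrase, and morpheme lines in KNP-format text."""
    counts: Counter = Counter()
    for line in knp_text.splitlines():
        if not line or line == "EOS":
            continue
        header = HEADER.match(line)
        if header:
            counts["phrase" if header.group(1) == "*" else "base_phrase"] += 1
        elif line.startswith("# "):
            counts["comment"] += 1
        else:
            counts["morpheme"] += 1
    return counts
```

For example, `count_knp_units(sentence.to_knp())` would give a quick size summary of a saved analysis.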
Furthermore, rhoknp provides convenient APIs for document-level language analysis.
document = rhoknp.Document.from_raw_text(
    "電気抵抗率は電気の通しにくさを表す物性値である。単に抵抗率とも呼ばれる。"
)
# If you know sentence boundaries, you can use `Document.from_sentences` instead.
document = rhoknp.Document.from_sentences(
    [
        "電気抵抗率は電気の通しにくさを表す物性値である。",
        "単に抵抗率とも呼ばれる。",
    ]
)
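`Document.from_raw_text` has to find the sentence boundaries itself, whereas `Document.from_sentences` takes them as given. For intuition only, here is a naive stdlib splitter that cuts after the sentence-final marks 。, ！, and ？. rhoknp's own splitting rules are likely more careful (for example around quotes and brackets), so this sketch should not be expected to reproduce them; the name `naive_split` is hypothetical.

```python
from __future__ import annotations

import re

# A run of non-terminal characters followed by an optional terminal mark.
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def naive_split(text: str) -> list[str]:
    """Split Japanese text after 。, ！, or ？ (no handling of quotes or brackets)."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

Applied to the raw text above, it returns the same two sentences that are passed to `Document.from_sentences`, so its output could be handed to that method directly.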
Document objects can be handled in the same way as Sentence objects.
# Perform morphological analysis by Juman++
document = jumanpp.apply_to_document(document)
# Access language units in the document
for sentence in document.sentences:
    ...
for morpheme in document.morphemes:
    ...
# Save the Juman++ analysis results
with open("document.jumanpp", "wt") as f:
    f.write(document.to_jumanpp())
# Load the Juman++ analysis results
with open("document.jumanpp", "rt") as f:
    document = rhoknp.Document.from_jumanpp(f.read())
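Because both Sentence and Document objects expose a `morphemes` attribute, a helper can often be written once for either unit. The sketch below, with the hypothetical name `lemmas`, assumes that every morpheme has a `lemma` attribute; the `TYPE_CHECKING` guard keeps the union annotation available to type checkers without importing rhoknp at run time.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:  # imported only for static type checking
    import rhoknp


def lemmas(unit: Union[rhoknp.Sentence, rhoknp.Document]) -> list[str]:
    """Return the lemma of every morpheme in a sentence or a whole document."""
    return [morpheme.lemma for morpheme in unit.morphemes]
```

`lemmas(document)` would cover the whole document, while `[lemmas(s) for s in document.sentences]` would keep the per-sentence grouping.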
For more information, please refer to the examples and documentation.
Main differences from pyknp
pyknp is the official Python binding for Juman++ and KNP. When developing rhoknp, we redesigned the API with the current use cases of pyknp in mind. The key differences between the two are as follows:
- Support for document-level language analysis: rhoknp allows you to load and instantiate the results of document-level language analysis, including cohesion analysis and discourse relation analysis.
- Strict type-awareness: rhoknp is thoroughly type-annotated, which enables strict type checking and improves code clarity.
- Comprehensive test suite: rhoknp is covered by an extensive test suite. The code coverage report is available on Codecov.
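As an illustration of what the type annotations can buy you, a static checker such as mypy can check a helper like the hypothetical `content_words` below against rhoknp's annotated classes (provided the installed package exposes its annotations to the checker), flagging, for instance, a misspelled attribute name before the code runs. The sketch assumes that `Morpheme` and `Sentence` are exported at the package top level and that morphemes expose `text` and `pos`; the set of content-word parts of speech is our own choice.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only for static type checking
    import rhoknp

# Our own choice of "content word" parts of speech (JUMAN POS names).
CONTENT_POS = frozenset({"名詞", "動詞", "形容詞"})


def is_content_word(morpheme: rhoknp.Morpheme) -> bool:
    """True for nouns, verbs, and adjectives."""
    return morpheme.pos in CONTENT_POS


def content_words(sentence: rhoknp.Sentence) -> list[str]:
    """Surface forms of the content words in the sentence, in order."""
    return [m.text for m in sentence.morphemes if is_content_word(m)]
```

Running `mypy` over a module containing this helper would then check each attribute access against rhoknp's declared types.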
License: MIT
We warmly welcome contributions to rhoknp. You can get started by reading the contribution guide.
Footnotes
[1] The logo was generated by OpenAI DALL·E 2.