Update README to describe zstd format (#88)
* Update README to describe zstd format

* update

* update

* Update README-ja.md
vbkaisetsu authored Mar 7, 2023
1 parent 60ba89e commit 6afa5ed
Showing 4 changed files with 55 additions and 10 deletions.
13 changes: 13 additions & 0 deletions README-ja.md
@@ -45,6 +45,19 @@ Vaporetto provides three ways to generate tokenization models
ヴェネツィア は イタリア に あり ます 。
```

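##### Notes on using the Vaporetto APIs

The distributed models are compressed in the zstd format.
To load these compressed models with the *vaporetto* API, you must decompress them outside of the API.

```rust
// Requires the zstd crate or the ruzstd crate
let reader = zstd::Decoder::new(File::open("path/to/model.bin.zst")?)?;
let model = Model::read(reader)?;
```

You can also decompress the file using the *unzstd* command, which is bundled with recent Linux distributions.

#### Convert KyTea's Model

The second way is also simple: convert a model trained by KyTea.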
15 changes: 15 additions & 0 deletions README.md
@@ -45,6 +45,21 @@ The following will be output:
ヴェネツィア は イタリア に あり ます 。
```

##### Notes for Vaporetto APIs

The distributed models are compressed in the zstd format.
To load these compressed models with the *vaporetto* API,
you must first decompress them outside of the API.

```rust
// Requires the zstd crate or the ruzstd crate
let reader = zstd::Decoder::new(File::open("path/to/model.bin.zst")?)?;
let model = Model::read(reader)?;
```

You can also decompress the file using the *unzstd* command, which is bundled with modern Linux
distributions.
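
Because the API expects already-decompressed data, it can help to check whether a file is zstd-compressed before loading it. The following is a minimal, std-only sketch and not part of the *vaporetto* API: `starts_with_zstd_magic` is a hypothetical helper that compares the first four bytes of a reader against the magic number of a standard zstd frame (`0xFD2FB528`, stored little-endian).

```rust
use std::io::{self, Read};

/// First four bytes of a standard zstd frame (magic number 0xFD2FB528, little-endian).
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];

/// Hypothetical helper (not part of vaporetto): returns `true` if the data
/// behind `reader` begins with the zstd frame magic number.
fn starts_with_zstd_magic<R: Read>(mut reader: R) -> io::Result<bool> {
    let mut head = [0u8; 4];
    match reader.read_exact(&mut head) {
        Ok(()) => Ok(head == ZSTD_MAGIC),
        // Fewer than four bytes available: this cannot be a zstd frame.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

If the check returns `true`, wrap the file in a decoder such as `zstd::Decoder` before passing it to `Model::read`; otherwise pass the file directly. The helper consumes the bytes it reads, so reopen or rewind the file afterwards.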

#### Convert KyTea's Model

The second way is also simple: convert a model trained by KyTea.
24 changes: 19 additions & 5 deletions vaporetto/README.md
@@ -9,15 +9,15 @@ use std::fs::File;

use vaporetto::{Model, Predictor, Sentence};

-let f = File::open("../resources/model.bin").unwrap();
-let model = Model::read(f).unwrap();
-let predictor = Predictor::new(model, true).unwrap();
+let f = File::open("../resources/model.bin")?;
+let model = Model::read(f)?;
+let predictor = Predictor::new(model, true)?;

let mut buf = String::new();

let mut s = Sentence::default();

-s.update_raw("まぁ社長は火星猫だ").unwrap();
+s.update_raw("まぁ社長は火星猫だ")?;
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
@@ -26,7 +26,7 @@ assert_eq!(
buf,
);

-s.update_raw("まぁ良いだろう").unwrap();
+s.update_raw("まぁ良いだろう")?;
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
@@ -53,6 +53,20 @@ The following features are enabled by default:
* `tag-prediction` - Enables tag prediction.
* `charwise-pma` - Uses the [Charwise Daachorse](https://docs.rs/daachorse/latest/daachorse/charwise/index.html) instead of the standard version for faster prediction, although it can make loading a model file slower.

## Notes for distributed models

The distributed models are compressed in the zstd format.
If you want to load these compressed models, you must decompress them outside of the API.

```rust
// Requires the zstd crate or the ruzstd crate
let reader = zstd::Decoder::new(File::open("path/to/model.bin.zst")?)?;
let model = Model::read(reader)?;
```

You can also decompress the file using the *unzstd* command, which is bundled with modern Linux
distributions.
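
The command-line route can also be driven from Rust. The sketch below is std-only and assumes `unzstd` is on the `PATH`; `unzstd_command` and `decompressed_path` are hypothetical helpers, not part of *vaporetto*. By default, `unzstd` writes its output next to the input with the `.zst` suffix removed and keeps the original file.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical helper: builds (but does not run) an `unzstd` invocation
/// for the given compressed model file.
fn unzstd_command(compressed: &Path) -> Command {
    let mut cmd = Command::new("unzstd");
    cmd.arg(compressed);
    cmd
}

/// Hypothetical helper: the path `unzstd` writes to by default, i.e. the
/// input path with its final extension (`.zst`) stripped.
fn decompressed_path(compressed: &Path) -> PathBuf {
    compressed.with_extension("")
}
```

Calling `unzstd_command(path).status()?` runs the tool; on success, the file at `decompressed_path(path)` can be opened and passed to `Model::read` without a decoder.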

## License

Licensed under either of
13 changes: 8 additions & 5 deletions vaporetto/src/lib.rs
@@ -9,19 +9,20 @@
## Examples
```
# fn main() -> Result<(), Box<dyn std::error::Error>> {
use std::fs::File;
use vaporetto::{Model, Predictor, Sentence};
-let f = File::open(\"../resources/model.bin\").unwrap();
-let model = Model::read(f).unwrap();
-let predictor = Predictor::new(model, true).unwrap();
+let f = File::open(\"../resources/model.bin\")?;
+let model = Model::read(f)?;
+let predictor = Predictor::new(model, true)?;
let mut buf = String::new();
let mut s = Sentence::default();
-s.update_raw(\"まぁ社長は火星猫だ\").unwrap();
+s.update_raw(\"まぁ社長は火星猫だ\")?;
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
@@ -30,14 +31,16 @@ assert_eq!(
buf,
);
-s.update_raw(\"まぁ良いだろう\").unwrap();
+s.update_raw(\"まぁ良いだろう\")?;
predictor.predict(&mut s);
s.fill_tags();
s.write_tokenized_text(&mut buf);
assert_eq!(
\"まぁ/副詞/マー 良い/形容詞/ヨイ だろう/助動詞/ダロー\",
buf,
);
# Ok(())
# }
```
"
)]
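
The doctest above wraps its body in a hidden `fn main() -> Result<(), Box<dyn std::error::Error>>`, which is what allows `?` to replace `unwrap()`. As a minimal, std-only illustration of the same pattern (`read_all` is a hypothetical helper, not part of *vaporetto*):

```rust
use std::error::Error;
use std::fs::File;
use std::io::Read;

/// Hypothetical helper: reads a whole file into memory, propagating any
/// I/O error to the caller with `?` instead of panicking via `unwrap()`.
fn read_all(path: &str) -> Result<Vec<u8>, Box<dyn Error>> {
    let mut f = File::open(path)?;
    let mut bytes = Vec::new();
    f.read_to_end(&mut bytes)?;
    Ok(bytes)
}
```

With this shape, a missing model file surfaces as an `Err` value the caller can report, rather than as a panic inside the example.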