Showing 9 changed files with 433 additions and 68 deletions.
@rem Switch the console to the UTF-8 code page (65001)
chcp 65001
@rem Run the DumpMoegirl jar from the build output directory
java -jar out\artifacts\DumpMoegirl\DumpMoegirl.jar -o out\moegirl-debug -p 6
# Dict Trick

A set of tools for conveniently processing dictionary (word list) files.

So far, initial cleanup is complete for the following tools:

## Clean

Filters junk words out of a dictionary and performs Simplified/Traditional Chinese conversion.

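The Clean step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the class and method names (`CleanSketch`, `filterJunkWords`, `openccCommand`) are hypothetical, the junk-word file is assumed to be one word per line, and the sketch assumes Simplified/Traditional conversion is delegated to the `opencc` command-line tool, whose `-i`, `-o` and `-c` options take the input file, output file and conversion configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the Clean step: drop junk words from a word list,
 * then build an OpenCC command line for Simplified/Traditional conversion.
 * Names are illustrative only; this is not the repository's implementation.
 */
public class CleanSketch {

    /** Keeps the first occurrence of each non-blank word that is not in the junk-word set. */
    static List<String> filterJunkWords(List<String> words, Set<String> junkWords) {
        Set<String> kept = new LinkedHashSet<>(); // preserves input order and removes duplicates
        for (String raw : words) {
            String word = raw.strip();
            if (!word.isEmpty() && !junkWords.contains(word)) {
                kept.add(word);
            }
        }
        return new ArrayList<>(kept);
    }

    /** Builds an invocation of the opencc CLI (assumed to be installed and on PATH). */
    static List<String> openccCommand(String inputFile, String outputFile, String configFile) {
        return List.of("opencc", "-i", inputFile, "-o", outputFile, "-c", configFile);
    }

    public static void main(String[] args) {
        List<String> words = List.of("萌娘百科", "的", "猫耳", "萌娘百科", " ");
        Set<String> junk = Set.of("的");
        System.out.println(filterJunkWords(words, junk)); // [萌娘百科, 猫耳]

        // To actually run the conversion (requires OpenCC; t2s.json is just one of its bundled configs):
        // new ProcessBuilder(openccCommand("in.txt", "out.txt", "t2s.json")).inheritIO().start().waitFor();
    }
}
```

Using a `LinkedHashSet` keeps the original entry order while deduplicating, which keeps the output diff-friendly when the word list is regenerated.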
## DumpMoeGirl

Dumps a dictionary from Moegirlpedia (萌娘百科) and processes it with the Clean tool.

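Moegirlpedia runs on MediaWiki, so one plausible way to enumerate its entry titles is the MediaWiki Action API's `list=allpages` module, paging with `apcontinue`. The sketch below only builds the request URLs. The endpoint `https://zh.moegirl.org.cn/api.php` and the `AllPagesQuery` class are assumptions, and DumpMoegirl may well collect its entries in a different way.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that builds MediaWiki list=allpages request URLs. */
public class AllPagesQuery {

    // Assumed endpoint; verify the wiki's actual api.php location before use.
    static final String API = "https://zh.moegirl.org.cn/api.php";

    /** Builds one page of the query; pass null for the first request. */
    static String allPagesUrl(String apcontinue) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("list", "allpages");
        params.put("apnamespace", "0"); // main (article) namespace
        params.put("aplimit", "max");   // let the server choose the largest allowed batch
        params.put("format", "json");
        if (apcontinue != null) {
            // Token taken from the "continue" object of the previous response
            params.put("apcontinue", apcontinue);
        }
        return API + "?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(allPagesUrl(null));
        System.out.println(allPagesUrl("萌"));
    }
}
```

Each JSON response carries a `continue` object; feeding its `apcontinue` value into the next call walks the whole namespace. A real crawler would also need an HTTP client, a JSON parser and polite request pacing.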
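## Usage

1. Download or build the jar files.
2. Download OpenCC. (Java is cross-platform and OpenCC itself is cross-platform, so in theory these tools should also work on Linux.)
3. Edit the OpenCC Simplified/Traditional conversion configuration file as needed.
4. Edit the junk-word file as needed.
5. Edit the configuration file as needed. The `config.txt` in the repository is an example in which the parameters used are annotated. (The configuration file can have any name.)
6. Run `java -jar DumpMoegirl.jar -c config.txt` to perform the crawl using the configuration file.
   Run `java -jar Clean.jar -c config.txt` to filter plain-text entries using the configuration file.
7. Alternatively, skip the configuration file and pass the required parameters directly on the command line.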
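Steps 5 to 7 mean options can come either from a configuration file passed with `-c` or directly from the command line. Below is a minimal sketch of one way to combine them. It assumes, hypothetically, that the configuration file holds one `-flag value` pair per line with `#` comments, and that command-line values override file values; the real `config.txt` format and precedence rules may differ, and `ConfigArgs` is an illustrative name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical merging of "-c config" file options with command-line options. */
public class ConfigArgs {

    /** Parses "-flag value" pairs (or bare "-flag" switches) into an ordered map. */
    static Map<String, String> parseFlags(String[] tokens) {
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            String flag = tokens[i];
            if (!flag.startsWith("-")) {
                throw new IllegalArgumentException("Expected a flag but got: " + flag);
            }
            // A flag followed by another flag, or by nothing, is treated as a switch.
            boolean hasValue = i + 1 < tokens.length && !tokens[i + 1].startsWith("-");
            options.put(flag, hasValue ? tokens[++i] : "true");
        }
        return options;
    }

    /** Assumed config format: one "-flag value" per line; blank lines and '#' comments are skipped. */
    static Map<String, String> parseConfigText(String text) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.strip();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // Split only once so that a value may contain spaces.
            options.putAll(parseFlags(trimmed.split("\\s+", 2)));
        }
        return options;
    }

    /** Command-line options override config-file options; "-c" itself is dropped. */
    static Map<String, String> merge(Map<String, String> fromConfig, Map<String, String> fromCli) {
        Map<String, String> merged = new LinkedHashMap<>(fromConfig);
        merged.putAll(fromCli);
        merged.remove("-c");
        return merged;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> cli = parseFlags(args);
        Map<String, String> fromConfig = cli.containsKey("-c")
                ? parseConfigText(Files.readString(Path.of(cli.get("-c"))))
                : Map.<String, String>of();
        System.out.println(merge(fromConfig, cli));
    }
}
```

Letting the command line win makes it easy to reuse one configuration file and override a single parameter for a quick debug run.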