Finish generating chinese poetry #439
```
@@ -1,12 +1,6 @@
<<<<<<< HEAD
<s>
<e>
<unk>
```
Don't put the dictionary on GitHub; it can be built automatically by a script.
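```
# Chinese Classical Poetry Generation

## Introduction
Based on an encoder-decoder neural network model trained verse-to-verse (sequence to sequence) on the Complete Tang Poems (全唐诗), the model generates the next verse of a poem from a given verse.
```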
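Please describe the default network structure here in a sentence or two, e.g. how many LSTM encoder/decoder layers are used by default and whether attention is used.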
Updated the README and added a brief description.
generate_chinese_poetry/README.md
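````
python preprocess.py --datadir data/raw --outfile data/poems.txt --dictfile data/dict.txt
```

After the script above finishes, it generates the processed training data poems.txt and the data dictionary dict.txt. Each line of poems.txt holds one Tang poem in three columns: title, author, and poem content.
````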
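- data dictionary --> dictionary.
- By default, how is the dictionary built? By word segmentation or per character? Are character frequencies counted, and what is the default frequency cutoff? Please provide some basic information.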
Updated the README and added a description of how the dictionary is built.
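The thread above asks how the dictionary is built by default. The sketch below only illustrates one common approach, per-character counting with a frequency cutoff; the helper name `build_char_dict`, the `min_freq` threshold, and placing the special tokens `<s>`, `<e>`, `<unk>` first (as in the removed dict file excerpt) are assumptions, not necessarily what `preprocess.py` does.

```python
from collections import Counter

# Special tokens seen at the top of the removed dict file; their order is assumed.
SPECIAL_TOKENS = ["<s>", "<e>", "<unk>"]


def build_char_dict(contents, min_freq=1):
    """Hypothetical helper: build a character-level dictionary.

    `contents` yields poem-content strings whose verses are separated
    by "." (the poems.txt layout). Characters seen fewer than
    `min_freq` times are left out and would map to <unk> later.
    """
    counter = Counter()
    for content in contents:
        for verse in content.split("."):
            counter.update(verse)  # counts every character of the verse
    # Most frequent first; ties broken by the character itself for a stable order.
    kept = sorted((ch for ch, n in counter.items() if n >= min_freq),
                  key=lambda ch: (-counter[ch], ch))
    return SPECIAL_TOKENS + kept


vocab = build_char_dict(["白日依山尽.黄河入海流"])
print(len(vocab))
```

Counting single characters sidesteps word segmentation entirely; a word-level dictionary would need a Chinese tokenizer, which this sketch does not cover.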
generate_chinese_poetry/README.md
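````
```

After the script above finishes, it generates the processed training data poems.txt and the data dictionary dict.txt. Each line of poems.txt holds one Tang poem in three columns: title, author, and poem content.
Within the poem content, verses are separated by `.`.
````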
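Going only by the format described above (three tab-separated columns, verses joined with `.`), a line of poems.txt could be read back as follows. `parse_poem_line` is a hypothetical helper, and the sample line is illustrative rather than taken from the generated file.

```python
def parse_poem_line(line):
    """Hypothetical helper: split one poems.txt line into (title, author, verses).

    Assumes the README layout: title, author and content separated by
    tabs, with the verses inside the content separated by ".".
    """
    title, author, content = line.rstrip("\n").split("\t")
    verses = [v for v in content.split(".") if v]  # skip empty pieces
    return title, author, verses


# Illustrative line only; not taken from the generated poems.txt.
sample = "登鹳雀楼\t王之涣\t白日依山尽,黄河入海流.欲穷千里目,更上一层楼\n"
title, author, verses = parse_poem_line(sample)
print(verses)
```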
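After splitting on ".", what is the strategy for constructing the training data? Which part is the source and which is the target? Please explain the data strategy.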
Updated the README and added a brief description of how the data is constructed.
generate_chinese_poetry/README.md
````
                 [required]
--use_gpu TEXT   Whether to use GPU in generation.
--help           Show this message and exit.
```
````
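- Delete lines 104 ~ 115, for the same reason as above.
- For the detailed command-line arguments of the script `generate.py`, please run `python generate.py --help`. Here, only explain the important parameters. (Please write any further explanations in Chinese.)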
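The reviewer's point is that option help is generated from the script's own declarations, so a copy in the README duplicates text that can drift. The help layout in the excerpts ("Show this message and exit.") looks like `click` output; to stay within the standard library, this sketch uses `argparse` instead and declares only `--use_gpu` and `--init_model_path` from the two excerpts. The parser itself, the types, and the defaults are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical argparse stand-in for the scripts' option declarations."""
    parser = argparse.ArgumentParser(description="Chinese poetry generation (sketch)")
    # Kept as a plain string, mirroring "--use_gpu TEXT" in the excerpt.
    parser.add_argument("--use_gpu", default="False",
                        help="Whether to use GPU in generation.")
    parser.add_argument("--init_model_path", default=None,
                        help="The path of a trained model used to initialize "
                             "all the model parameters.")
    return parser


# The same text `--help` would print, generated from the declarations above.
print(build_parser().format_help())
```

Because the help text is derived from `add_argument`, the README can simply tell users to run the script with `--help` and spend its words on the few parameters that need explanation.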
generate_chinese_poetry/README.md
````
--init_model_path TEXT  The path of a trained model used to initialized all
                        the model parameters.
--help                  Show this message and exit.
```
````
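- Delete lines 48 ~ 64. The other examples will each be revised separately later.
- These command-line arguments are just a copy-paste of the output of `python train.py --help` and provide no information beyond it; users can run the script themselves if they need it. The README only needs to remind users to check it.
- Copy-pasting also means this section must be kept in sync whenever the code changes, which adds work.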
generate_chinese_poetry/README.md
```
- `use_gpu`: whether to use the GPU

### Run generation
For example, save the verse `白日依山盡,黃河入海流` in the file `input.txt` as the input for predicting the next verse, then run:
```
Don't construct the source and target like this.
Source: “白日依山尽” --> target: “黄河入海流”.
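Changed how the source/target pairs are built, retrained the model, adjusted the default training parameters based on the training results, and updated the example.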
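The reviewer's suggestion pairs a line (“白日依山尽”) with the line that follows it (“黄河入海流”). A minimal sketch of that pairing, assuming each "."-separated verse may hold a couplet whose lines are delimited by full-width or ASCII commas and full stops; `split_lines` and `make_pairs` are hypothetical helpers, and whether pairs should also cross couplet boundaries is a design choice this sketch does not settle (here every adjacent pair is kept).

```python
import re

# Assumed line delimiters: full-width/ASCII commas and full stops.
_LINE_DELIMS = re.compile(r"[，,。.]")


def split_lines(verses):
    """Hypothetical: break each verse (e.g. a couplet) into single lines."""
    lines = []
    for verse in verses:
        lines.extend(part for part in _LINE_DELIMS.split(verse) if part)
    return lines


def make_pairs(lines):
    """Pair each line with the one after it: (source, target)."""
    return list(zip(lines[:-1], lines[1:]))


pairs = make_pairs(split_lines(["白日依山尽,黄河入海流", "欲穷千里目,更上一层楼"]))
for src, trg in pairs:
    print(src, "-->", trg)
```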
```python
paragraphs = filter(lambda x: len(x), paragraphs)
if len(paragraphs) > 1:
    dataset.append((title, author, paragraphs))
print("Finished...")
```
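One portability note on this excerpt: `len(paragraphs)` after `filter(...)` only works where `filter` returns a list (Python 2); on Python 3 `filter` returns a lazy iterator and `len()` raises `TypeError`. A version-agnostic sketch of the same step using a list comprehension; `keep_poem` is a hypothetical name.

```python
def keep_poem(title, author, paragraphs, dataset):
    """Hypothetical helper mirroring the excerpt: keep poems with >1 paragraph.

    A list comprehension yields a real list on both Python 2 and 3, so
    len() works; on Python 3, len(filter(...)) would raise TypeError.
    """
    paragraphs = [p for p in paragraphs if len(p)]  # drop empty paragraphs
    if len(paragraphs) > 1:
        dataset.append((title, author, paragraphs))
    return dataset


dataset = []
keep_poem("title", "author", ["a", "", "b"], dataset)
keep_poem("title", "author", ["only one", ""], dataset)
print(len(dataset))
```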
Remove this line. If you want to keep it, please print meaningful information, e.g. what exactly has finished?
Deleted.
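If a progress message is kept, the reviewer asks that it say what finished. One way to do that with the standard `logging` module; the `report_loaded` helper and its message text are illustrative, not from the PR.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("preprocess")


def report_loaded(dataset, datadir):
    """Log which step finished and how much data it produced."""
    msg = "Loaded %d poems from %s" % (len(dataset), datadir)
    logger.info(msg)
    return msg


report_loaded([("title", "author", ["a", "b"])], "data/raw")
```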
```python
dataset.append((title, author, paragraphs))
print("Finished...")

print("Constructing vocabularies...")
```
Constructing --> Construct.
Changed.
```python
author = data[1]
paragraphs = ".".join(data[2])
f.write("\t".join((title, author, paragraphs)) + "\n")
print("Finished...")
```
Remove this line. If you want to keep it, please print meaningful information, e.g. what exactly has finished?
Deleted.
```python
with io.open(dictfile, "w", encoding="utf8") as f:
    for v in vocab:
        f.write(v + "\n")
print("Finished...")
```
Remove this line. If you want to keep it, please print meaningful information, e.g. what exactly has finished?
Deleted.
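Once dict.txt exists in this one-token-per-line form, reading it back into an id mapping is short. A sketch, assuming ids follow line order and that characters missing from the dictionary map to `<unk>`; `load_dict` and `encode` are hypothetical helpers, and wrapping each sequence in `<s>`/`<e>` is an assumption about how the special tokens are used.

```python
import io


def load_dict(dictfile):
    """Map each token in a one-token-per-line UTF-8 file to its line index."""
    with io.open(dictfile, "r", encoding="utf8") as f:
        tokens = [line.rstrip("\n") for line in f]
    return {tok: i for i, tok in enumerate(tokens)}


def encode(verse, word_to_id, add_markers=True):
    """Turn a verse into ids, falling back to <unk> for unknown characters."""
    unk = word_to_id["<unk>"]
    ids = [word_to_id.get(ch, unk) for ch in verse]
    if add_markers:
        # Assumed convention: wrap the sequence with start/end tokens.
        ids = [word_to_id["<s>"]] + ids + [word_to_id["<e>"]]
    return ids


# Tiny in-memory dictionary standing in for load_dict("data/dict.txt").
demo = {"<s>": 0, "<e>": 1, "<unk>": 2, "白": 3}
print(encode("白日", demo))
```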
```python
f.write(v + "\n")
print("Finished...")

print("Writing processed data...")
```
Writing --> Write
Changed.
LGTM
Resolve #334