[doc] modify project name to pai_online_project in benchmark page #390
eval page: add parameter num_thresholds to the auc metric.
poson authored Jun 26, 2023
1 parent 10668aa commit b62abb3
Showing 3 changed files with 13 additions and 7 deletions.
8 changes: 4 additions & 4 deletions docs/source/benchmark.md
@@ -42,9 +42,9 @@

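- The dataset was collected from the recommender system logs of the Mobile Taobao client and contains clicks and their associated conversion data. [Tianchi competition link](https://tianchi.aliyun.com/dataset/dataDetail?dataId=408)

- Training data table: pai_rec_dev.AliCCP_sample_train_data_processed
- Training data table: pai_online_project.aliccp_sample_train_kv_split_score

- Test data table: pai_rec_dev.AliCCP_sample_test_data_processeds
- Test data table: pai_online_project.aliccp_sample_test_kv_split_score_1000w (truncated to 10 million rows)

- The resources used for testing on PAI include 2 parameter servers and 9 workers, one of which performs evaluation: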

@@ -70,5 +70,5 @@
### CENSUS

- CENSUS has 48,842 samples, each with 14 attributes, including age, occupation, education, income, etc. The label of each sample is the income level, e.g. >50K or \<=50K. [Census Income dataset link](https://archive.ics.uci.edu/ml/datasets/census+income)
- Training data table: pai_rec_dev.census_income_train
- Test data table: pai_rec_dev.census_income_test
- Training data table: pai_online_project.census_income_train
- Test data table: pai_online_project.census_income_test
6 changes: 6 additions & 0 deletions docs/source/eval.md
@@ -10,6 +10,12 @@ eval_config {
}
}
```
When the conversion rate is very low (around 3 in 10,000), you can additionally set the num_thresholds parameter in auc:
```sql
auc {
num_thresholds: 10000
}
```
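
Why more thresholds help at low conversion rates can be sketched in plain Python. This is an illustration only, under the assumption that the auc metric approximates the ROC curve by evaluating evenly spaced thresholds in [0, 1]; the functions `bucketed_auc` and `exact_auc` and the synthetic scores are hypothetical and not part of the project. When predicted probabilities cluster near zero, a coarse grid of 200 thresholds puts every score into the same bucket, while 10000 thresholds can still separate them.

```python
import bisect
import random


def bucketed_auc(labels, scores, num_thresholds):
    """Approximate AUC from evenly spaced thresholds in [0, 1] (simplified sketch)."""
    pos = sorted(s for y, s in zip(labels, scores) if y == 1)
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    points = []
    for i in range(num_thresholds):
        t = i / (num_thresholds - 1)
        # Fraction of positives / negatives scored strictly above the threshold.
        tpr = (len(pos) - bisect.bisect_right(pos, t)) / len(pos)
        fpr = (len(neg) - bisect.bisect_right(neg, t)) / len(neg)
        points.append((fpr, tpr))
    # Thresholds increase, so fpr decreases; integrate with the trapezoid rule.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x0 - x1) * (y0 + y1) / 2.0
    return area


def exact_auc(labels, scores):
    """Exact AUC: chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    total = 0.0
    for p in pos:
        below = bisect.bisect_left(neg, p)
        ties = bisect.bisect_right(neg, p) - below
        total += below + 0.5 * ties
    return total / (len(pos) * len(neg))


# Synthetic data (assumption): all predicted probabilities lie below 0.003,
# as is typical when the true conversion rate is tiny.
rng = random.Random(0)
labels = [1] * 50 + [0] * 5000
scores = [rng.uniform(0.001, 0.003) for _ in range(50)]
scores += [rng.uniform(0.0, 0.002) for _ in range(5000)]

coarse = bucketed_auc(labels, scores, 200)    # grid spacing ~0.005: collapses to 0.5
fine = bucketed_auc(labels, scores, 10000)    # grid spacing ~0.0001: close to exact
exact = exact_auc(labels, scores)
print(coarse, fine, exact)
```

With 200 thresholds the first non-zero threshold already exceeds every score, so the estimated curve degenerates to the diagonal; the finer grid tracks the exact value closely.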

- metrics_set: configures the evaluation metrics; multiple metrics can be configured, for example:

6 changes: 3 additions & 3 deletions docs/source/train.md
@@ -2,7 +2,7 @@

## train_config

- log_step_count_steps: 200 # print one log line every 200 rounds
- log_step_count_steps: 200 # print one log line every 200 steps

- optimizer_config # optimizer-related parameters

@@ -62,11 +62,11 @@
print(key)
```

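- save_checkpoints_steps: number of rounds between checkpoint saves; the default is 1000
- save_checkpoints_steps: number of steps between checkpoint saves; the default is 1000. When the training data is very large, set this to a larger value

- save_checkpoints_secs: number of seconds between checkpoint saves; cannot be specified together with save_checkpoints_steps

- keep_checkpoint_max: maximum number of checkpoints to keep; the default is 10
- keep_checkpoint_max: maximum number of checkpoints to keep; the default is 10. For large models it can be set to 5 to save storage

- log_step_count_steps: number of rounds between training log lines; the default is 10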
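
The trade-off behind save_checkpoints_steps and keep_checkpoint_max can be estimated with simple arithmetic. The sketch below is a rough illustration only: the helper `checkpoint_plan`, the step counts, and the checkpoint size are hypothetical numbers, and it ignores any extra checkpoint written at the end of training.

```python
def checkpoint_plan(total_steps, save_checkpoints_steps,
                    keep_checkpoint_max, checkpoint_size_gb):
    """Return (checkpoints written, peak storage in GB) for a training run."""
    written = total_steps // save_checkpoints_steps
    # Only the most recent keep_checkpoint_max checkpoints stay on disk.
    kept = min(written, keep_checkpoint_max)
    return written, kept * checkpoint_size_gb


# Hypothetical run: 1,000,000 steps with a 20 GB checkpoint.
default_plan = checkpoint_plan(1_000_000, 1000, 10, 20.0)   # default settings
tuned_plan = checkpoint_plan(1_000_000, 10_000, 5, 20.0)    # larger interval, fewer kept
print(default_plan, tuned_plan)
```

Under these assumed numbers the defaults write 1000 checkpoints and hold 200 GB on disk, while the tuned settings write 100 checkpoints and hold 100 GB.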
