[bugfix]: fix share input bug #313

Merged
merged 9 commits into from
Nov 25, 2022
Changes from 5 commits
15 changes: 15 additions & 0 deletions docs/post_fix.py
@@ -0,0 +1,15 @@
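# -*- encoding:utf-8 -*-
import sys

# Read the whole file (path given as the first argument) into memory first,
# because the same file is rewritten below.
lines = []
with open(sys.argv[1], 'r') as fin:
    for line_str in fin:
        lines.append(line_str)

# Rewrite the file, inserting the language_data.js script tag before each
# line that references searchtools.js.
with open(sys.argv[1], 'w') as fout:
    for line_str in lines:
        if '_static/searchtools.js' in line_str:
            fout.write(
                ' <script type="text/javascript" src="_static/language_data.js"></script>\n'
            )
        fout.write(line_str)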
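
The script above rewrites the file named by `sys.argv[1]` in place. As a minimal sketch, the same logic can be expressed as a pure function over a list of lines, which can be checked without touching the filesystem. The function name `insert_language_data`, the constant `LANGUAGE_DATA_TAG`, and the sample lines are hypothetical and not part of the PR.

```python
# Hypothetical refactoring of the script's logic; not part of the PR.
LANGUAGE_DATA_TAG = (
    ' <script type="text/javascript" src="_static/language_data.js"></script>\n'
)


def insert_language_data(lines):
    """Return a copy of lines with the language_data.js tag inserted
    before every line that references _static/searchtools.js."""
    out = []
    for line_str in lines:
        if '_static/searchtools.js' in line_str:
            out.append(LANGUAGE_DATA_TAG)
        out.append(line_str)
    return out


# Sample input, made up for illustration.
sample = [
    '<head>\n',
    '<script src="_static/searchtools.js"></script>\n',
    '</head>\n',
]
patched = insert_language_data(sample)
```

Like the original script, this sketch is not idempotent: applying it to already patched lines inserts the tag a second time.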
47 changes: 42 additions & 5 deletions docs/source/feature/data.md
@@ -62,16 +62,53 @@ input_fields field:

### input_type:

The following input_type values are currently supported:
The following [input_type](../proto.html#protos.DatasetConfig.InputType) values are currently supported:

- CSVInput: the data is in CSV format; it must be used together with separator

  - train_input_path and eval_input_path must be specified

  ```protobuf
  train_input_path: "data/test/dwd_avazu_ctr_train.csv"
  eval_input_path: "data/test/dwd_avazu_ctr_test.csv"
  ```

- OdpsInputV2: use OdpsInputV2 when running EasyRec on MaxCompute
- OdpsInputV3: use OdpsInputV3 when accessing a MaxCompute table locally or on EMR

  - train_input_path and eval_input_path must be specified
  - they can be passed in through the pai command, see the [reference](../train.md#on-pai)

- OdpsInputV3: use OdpsInputV3 when accessing a MaxCompute table locally or on [DataScience](https://help.aliyun.com/document_detail/170836.html)

- HiveInput and HiveParquetInput: access Hive tables on a Hadoop cluster

  - hive_train_input and hive_eval_input must be configured
  - see [HiveConfig](../proto.html#protos.HiveConfig)

  ```protobuf
  hive_train_input {
    host: "192.168.1"
    username: "admin"
    table_name: "census_income_train_simple"
  }
  hive_eval_input {
    host: "192.168.1"
    username: "admin"
    table_name: "census_income_eval_simple"
  }
  ```

- If you need to use RTP FG:
  - when running EasyRec on EMR or locally, use RTPInput;

  - when running EasyRec on EMR or locally, use RTPInput or HiveRTPInput;
  - when running on Odps, use OdpsRTPInput
- KafkaInput & DatahubInput
  - input types needed for real-time training

- KafkaInput & DatahubInput: input types needed for [real-time training](../online_train.md)

  - KafkaInput requires kafka_train_input and kafka_eval_input to be configured
    - see [KafkaServer](../proto.html#protos.KafkaServer)
  - DatahubInput requires datahub_train_input and datahub_eval_input to be configured
    - see [DataHubServer](../proto.html#protos.DatahubServer)
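
The list above pairs each input_type with the config fields it needs. A minimal sketch of that pairing as a lookup table follows, assuming the config is available as a plain dict. The names `REQUIRED_FIELDS` and `missing_fields` are hypothetical, and this is not EasyRec's own validation code; the real config is a protobuf message.

```python
# Required input fields per input_type, as listed in the documentation above.
# Illustration only; EasyRec's actual checks may differ.
REQUIRED_FIELDS = {
    "CSVInput": ("train_input_path", "eval_input_path"),
    "OdpsInputV2": ("train_input_path", "eval_input_path"),
    "HiveInput": ("hive_train_input", "hive_eval_input"),
    "HiveParquetInput": ("hive_train_input", "hive_eval_input"),
    "KafkaInput": ("kafka_train_input", "kafka_eval_input"),
    "DatahubInput": ("datahub_train_input", "datahub_eval_input"),
}


def missing_fields(input_type, config):
    """Return the required fields for input_type that are absent or empty.

    config is a plain dict standing in for a parsed pipeline config
    (an assumption made for this sketch).
    """
    required = REQUIRED_FIELDS.get(input_type, ())
    return [name for name in required if not config.get(name)]


example = {"train_input_path": "data/test/dwd_avazu_ctr_train.csv"}
print(missing_fields("CSVInput", example))  # ['eval_input_path']
```

Input types the list does not give fields for, such as OdpsInputV3 and the RTP variants, are left out of the table, so `missing_fields` returns an empty list for them.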

### separator:
