docs: fix udf, rest api, select into and download links (#2304)
vagetablechicken authored Aug 9, 2022
1 parent ebeb62a commit 2150992
Showing 16 changed files with 201 additions and 29 deletions.
153 changes: 153 additions & 0 deletions docs/en/quickstart/rest_api.md
@@ -71,4 +71,157 @@ The response:
"data":[["aaa",11,22]]
}
}
```

## Query

The request URL: http://ip:port/dbs/{db_name}

HTTP method: POST

The request body example:

```json
{
"mode": "online",
"sql": "select 1"
}
```

`mode`: one of `"offsync"`, `"offasync"`, `"online"`

The response:

```json
{
"code":0,
"msg":"ok"
}
```
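As an illustration, the query request above can be assembled with Python's standard library. The host, port, and database name below are placeholders, not values taken from this document, and the actual send is left commented out:

```python
import json
from urllib import request

def build_query_request(host, port, db_name, sql, mode="online"):
    """Build a POST request for the /dbs/{db_name} query endpoint.

    mode is one of "offsync", "offasync", "online".
    """
    url = f"http://{host}:{port}/dbs/{db_name}"
    body = json.dumps({"mode": mode, "sql": sql}).encode("utf-8")
    return request.Request(url, data=body, method="POST",
                           headers={"Content-Type": "application/json"})

req = build_query_request("127.0.0.1", 8080, "demo_db", "select 1")
# resp = request.urlopen(req)  # uncomment when an APIServer is reachable
print(req.full_url)  # http://127.0.0.1:8080/dbs/demo_db
```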

## Get Deployment Info


The request URL: http://ip:port/dbs/{db_name}/deployments/{deployment_name}

HTTP method: GET

The response:

```json
{
"code": 0,
"msg": "ok",
"data": {
"name": "",
"procedure": "",
"input_schema": [

],
"input_common_cols": [

],
"output_schema": [

],
"output_common_cols": [

],
"dbs": [

],
"tables": [

]
}
}
```


## List Database

The request URL: http://ip:port/dbs

HTTP method: GET

The response:

```json
{
"code": 0,
"msg": "ok",
"dbs": [

]
}
```

## List Table

The request URL: http://ip:port/dbs/{db}/tables

HTTP method: GET

The response:

```json
{
"code": 0,
"msg": "ok",
"tables": [
{
"name": "",
"table_partition_size": 8,
"tid": ,
"partition_num": 8,
"replica_num": 2,
"column_desc": [
{
"name": "",
"data_type": "",
"not_null": false
}
],
"column_key": [
{
"index_name": "",
"col_name": [

],
"ttl": {

}
}
],
"added_column_desc": [

],
"format_version": 1,
"db": "",
"partition_key": [

],
"schema_versions": [

]
}
]
}
```
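The `code`/`msg`/`tables` envelope can be consumed as ordinary JSON. Below is a sketch that parses a trimmed, made-up response; the field values are illustrative, not from a real server:

```python
import json

# A trimmed sample response following the schema above; values are invented.
raw = """
{"code": 0, "msg": "ok",
 "tables": [{"name": "t1", "partition_num": 8, "replica_num": 2,
             "column_desc": [{"name": "c1", "data_type": "string",
                              "not_null": false}]}]}
"""
resp = json.loads(raw)
assert resp["code"] == 0, resp["msg"]  # a non-zero code signals an error
table_names = [t["name"] for t in resp["tables"]]
print(table_names)  # ['t1']
```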

## Refresh

The request URL: http://ip:port/refresh

HTTP method: POST

Empty request body.

The response:

```json
{
"code":0,
"msg":"ok"
}
```
@@ -1571,7 +1571,7 @@ Example:


```sql
SELECT if_null("hello", "default"), if_null(NULL, "default");
SELECT if_null("hello", "default"), if_null(cast(null as string), "default");
-- output ["hello", "default"]
```

@@ -2663,7 +2663,7 @@ Example:


```sql
SELECT if_null("hello", "default"), if_null(NULL, "default");
SELECT if_null("hello", "default"), if_null(cast(null as string), "default");
-- output ["hello", "default"]
```

6 changes: 3 additions & 3 deletions docs/en/use_case/dolphinscheduler_task_demo.md
@@ -30,7 +30,7 @@ In addition to the feature engineering done by OpenMLDB, the prediction also req
### Configuration
The demo can run on macOS or Linux, or you can use the OpenMLDB docker image provided by us:
```
docker run -it 4pdosc/openmldb:0.5.1 bash
docker run -it 4pdosc/openmldb:0.5.3 bash
```


@@ -43,9 +43,9 @@ In the container, you can directly run the following command to start the OpenML
./init.sh
```

We will complete a workflow of importing data, offline training, and deploying the SQL and model online after successful training. For the online part of the model, you can use a simple predict server. See [predict server source](https://raw.githubusercontent.com/4paradigm/OpenMLDB/main/demo/talkingdata-adtracking-fraud-detection/predict_server.py). You can download it locally and run it in the background:
We will complete a workflow of importing data, offline training, and deploying the SQL and model online after successful training. For the online part of the model, you can use the simple predict server in `/work/talkingdata`. Run it in the background:
```
python3 predict_server.py --no-init > predict.log 2>&1 &
python3 /work/talkingdata/predict_server.py --no-init > predict.log 2>&1 &
```

Note that DolphinScheduler has not yet officially released a version supporting OpenMLDB Task (it is only on the `dev` branch), so please download the [dolphinscheduler-bin](https://github.com/4paradigm/OpenMLDB/releases/download/v0.5.1/apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz) that we have prepared to get a DolphinScheduler version supporting OpenMLDB Task.
2 changes: 1 addition & 1 deletion docs/en/use_case/kafka_connector_demo.md
@@ -22,7 +22,7 @@ For OpenMLDB Kafka Connector implementation, please refer to [extensions/kafka-c
This article starts OpenMLDB in a docker container, so there is no need to download OpenMLDB separately. Moreover, Kafka and the connector can be started in the same container. We recommend that you save the three downloaded packages in the same directory. Let's assume the packages are in the `/work/kafka` directory.

```
docker run -it -v `pwd`:/work/kafka --name openmldb 4pdosc/openmldb:0.5.2 bash
docker run -it -v `pwd`:/work/kafka --name openmldb 4pdosc/openmldb:0.5.3 bash
```

### Steps
2 changes: 1 addition & 1 deletion docs/en/use_case/lightgbm_demo.md
@@ -13,7 +13,7 @@ Note that: (1) this case is based on the OpenMLDB cluster version for tutorial d
- Pull the OpenMLDB docker image and run the corresponding container:

```bash
docker run -it 4pdosc/openmldb:0.5.2 bash
docker run -it 4pdosc/openmldb:0.5.3 bash
```

The image is preinstalled with OpenMLDB and preset with all scripts, third-party libraries, open-source tools and training data required for this case.
8 changes: 4 additions & 4 deletions docs/en/use_case/pulsar_connector_demo.md
@@ -10,7 +10,7 @@ Note that, for the sake of simplicity, for this document, we use Pulsar Standalo

### Download

- You can download the entire demo package [here](https://github.com/vagetablechicken/pulsar-openmldb-connector-demo/releases/download/v0.2/files.tar.gz); it contains everything needed by this demo, including the connector nar, schema files, and config files.
- You can download the entire demo package [here](https://openmldb.ai/download/pulsar-connector/files.tar.gz); it contains everything needed by this demo, including the connector nar, schema files, and config files.

- If you would like to download the connector only, you can [download it here](https://github.com/4paradigm/OpenMLDB/releases/download/v0.4.4/pulsar-io-jdbc-openmldb-2.11.0-SNAPSHOT.nar) from the OpenMLDB release.

@@ -29,7 +29,7 @@ Only OpenMLDB cluster mode can be the sink dist, and only write to online storag

We recommend that you run docker with 'host network', and also bind the volume 'files', which contains the SQL scripts.
```
docker run -dit --network host -v `pwd`/files:/work/taxi-trip/files --name openmldb 4pdosc/openmldb:0.5.2 bash
docker run -dit --network host -v `pwd`/files:/work/pulsar_files --name openmldb 4pdosc/openmldb:0.5.3 bash
docker exec -it openmldb bash
```
@@ -49,7 +49,7 @@ desc connector_test;
```
Run the script:
```
../openmldb/bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client < files/create.sql
/work/openmldb/bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client < /work/pulsar_files/create.sql
```

![table desc](images/table.png)
@@ -209,6 +209,6 @@ select *, string(timestamp(pickup_datetime)), string(timestamp(dropoff_datetime)
```
In OpenMLDB container, run:
```
../openmldb/bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client < files/select.sql
/work/openmldb/bin/openmldb --zk_cluster=127.0.0.1:2181 --zk_root_path=/openmldb --role=sql_client < /work/pulsar_files/select.sql
```
![openmldb result](images/openmldb_result.png)
2 changes: 1 addition & 1 deletion docs/zh/quickstart/openmldb_quickstart.md
@@ -19,7 +19,7 @@ Docker engine version requirement >= 18.03
Pull the image (about 1 GB to download, about 1.7 GB after decompression) and start the docker container:

```bash
docker run -it 4pdosc/openmldb:0.5.2 bash
docker run -it 4pdosc/openmldb:0.5.3 bash
```

6 changes: 3 additions & 3 deletions docs/zh/reference/ip_tips.md
@@ -37,12 +37,12 @@ docker network inspect bridge

The standalone version needs to expose the ports of three components (nameserver, tabletserver, apiserver):
```
docker run -p 6527:6527 -p 9921:9921 -p 8080:8080 -it 4pdosc/openmldb:0.5.2 bash
docker run -p 6527:6527 -p 9921:9921 -p 8080:8080 -it 4pdosc/openmldb:0.5.3 bash
```

The cluster version needs to expose the zk port and the ports of all components:
```
docker run -p 2181:2181 -p 7527:7527 -p 10921:10921 -p 10922:10922 -p 8080:8080 -p 9902:9902 -it 4pdosc/openmldb:0.5.2 bash
docker run -p 2181:2181 -p 7527:7527 -p 10921:10921 -p 10922:10922 -p 8080:8080 -p 9902:9902 -it 4pdosc/openmldb:0.5.3 bash
```

@@ -56,7 +56,7 @@ docker run -p 2181:2181 -p 7527:7527 -p 10921:10921 -p 10922:10922 -p 8080:8080
#### host network
Or, more conveniently, use host networking without port isolation, for example:
```
docker run --network host -it 4pdosc/openmldb:0.5.2 bash
docker run --network host -it 4pdosc/openmldb:0.5.3 bash
```
However, in this case it is easy for a port to be already occupied by another process on the host. If that happens, change the port numbers carefully.

19 changes: 19 additions & 0 deletions docs/zh/reference/sql/dql/SELECT_INTO_STATEMENT.md
@@ -61,5 +61,24 @@ SELECT col1, col2, col3 FROM t1 INTO OUTFILE 'data.csv' OPTIONS ( delimiter = ','
SELECT col1, col2, col3 FROM t1 INTO OUTFILE 'data2.csv' OPTIONS ( delimiter = '|', null_value='NA');
```

## Q&A

Q: Why does `SELECT INTO` fail with "Found duplicate column(s)"?
```
Exception in thread "main" org.apache.spark.sql.AnalysisException: Found duplicate column(s) when inserting into file:/tmp/out: `c1`;
at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:90)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:84)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:944)
```

A: A query statement allows duplicate column names, but `SELECT INTO` also writes the result, and the write step checks for duplicate column names. Please avoid duplicates; you can rename a column with `c1 as c_new`.
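A minimal sketch of the workaround; the table and column names here are illustrative, not taken from this document:

```sql
-- Would fail on write: the result has two columns both named c1
-- SELECT c1, c1 FROM t1 INTO OUTFILE '/tmp/out';

-- Works: rename one of the duplicates first
SELECT c1, c1 AS c1_new FROM t1 INTO OUTFILE '/tmp/out';
```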
@@ -1583,7 +1583,7 @@ Example:

```sql

SELECT if_null("hello", "default"), if_null(NULL, "default");
SELECT if_null("hello", "default"), if_null(cast(null as string), "default");
-- output ["hello", "default"]
```

@@ -1624,7 +1624,7 @@ Example:

```sql

SELECT if_null("hello", "default"), if_null(NULL, "default");
SELECT if_null("hello", "default"), if_null(cast(null as string), "default");
-- output ["hello", "default"]
```
@@ -2669,7 +2669,7 @@ Example:

```sql

SELECT if_null("hello", "default"), if_null(NULL, "default");
SELECT if_null("hello", "default"), if_null(cast(null as string), "default");
-- output ["hello", "default"]
```

6 changes: 3 additions & 3 deletions docs/zh/use_case/dolphinscheduler_task_demo.md
@@ -31,7 +31,7 @@ OpenMLDB aims to achieve development-as-deployment, letting development return to its essence, and

We recommend running the demo and tests in the OpenMLDB image we provide:
```
docker run -it 4pdosc/openmldb:0.5.2 bash
docker run -it 4pdosc/openmldb:0.5.3 bash
```
```{attention}
DolphinScheduler requires an OS user, and that user needs sudo privileges. Therefore, it is recommended to download and start DolphinScheduler inside the OpenMLDB container. Otherwise, please prepare an OS user with sudo privileges.
```
@@ -44,9 +44,9 @@

**Run Predict Server**

We will complete a workflow of importing data, offline training, and deploying the model online after successful training. For the model serving part, you can use a simple predict server; see [predict server source](https://raw.githubusercontent.com/4paradigm/OpenMLDB/main/demo/talkingdata-adtracking-fraud-detection/predict_server.py). You can download it locally and run it in the background:
We will complete a workflow of importing data, offline training, and deploying the model online after successful training. For the model serving part, you can use the simple predict server in `/work/talkingdata`. Run it in the background:
```
python3 predict_server.py --no-init > predict.log 2>&1 &
python3 /work/talkingdata/predict_server.py --no-init > predict.log 2>&1 &
```

**Run DolphinScheduler**
2 changes: 1 addition & 1 deletion docs/zh/use_case/kafka_connector_demo.md
@@ -21,7 +21,7 @@

We recommend that you bind all three downloaded packages to the `kafka` directory. Of course, you can also download the packages after starting the container. We assume the packages are all in the `/work/kafka` directory.
```
docker run -it -v `pwd`:/work/kafka --name openmldb 4pdosc/openmldb:0.5.2 bash
docker run -it -v `pwd`:/work/kafka --name openmldb 4pdosc/openmldb:0.5.3 bash
```

### Steps