feat: main to v0.6 (#2412)

4paradigm · Aug 30, 2022 · ce759f1
1 parent 321e848
commit ce759f1
Showing 483 changed files with 22,835 additions and 5,545 deletions.
diff --git a/.github/workflows/integration-test-src.yml b/.github/workflows/integration-test-src.yml
diff --git a/.github/workflows/sdk.yml b/.github/workflows/sdk.yml
@@ -135,7 +135,7 @@ jobs:
 
       - name: stop services
         run: |
-          cd onebox && sh stop_all.sh && cd - || exit
+          cd onebox && ./stop_all.sh && cd - || exit
           sh steps/ut_zookeeper.sh stop
 
 
@@ -289,7 +289,7 @@ jobs:
         run: |
           cp python/openmldb_sdk/dist/openmldb*.whl .
           cp python/openmldb_tool/dist/openmldb*.whl .
-          twine upload openmldb-*.whl
+          twine upload openmldb*.whl
         env:
           TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
           TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
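
Note on the `twine upload` glob change above: wheel file names replace `-` in the distribution name with `_`, so a wheel built from `python/openmldb_tool` would presumably start with `openmldb_tool-` and would not match the old pattern `openmldb-*.whl`. The sketch below illustrates this with Python's `fnmatch`; the wheel file names and version are assumptions made up for illustration, not taken from the build output.

```python
from fnmatch import fnmatch

# Hypothetical wheel file names (assumed for illustration only).
wheels = [
    "openmldb-0.6.1-py3-none-any.whl",
    "openmldb_tool-0.6.1-py3-none-any.whl",
]


def matched(pattern, names):
    """Return the names that match a shell-style glob pattern."""
    return [name for name in names if fnmatch(name, pattern)]


old_matches = matched("openmldb-*.whl", wheels)  # pattern before this commit
new_matches = matched("openmldb*.whl", wheels)   # pattern after this commit

print("old pattern uploads:", old_matches)
print("new pattern uploads:", new_matches)
```

Under these assumed names, the old pattern selects only the SDK wheel, while the new pattern selects both wheels copied by the two preceding `cp` steps.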

diff --git a/.gitignore b/.gitignore
@@ -96,7 +96,9 @@ java/hybridse-proto/src
 **/scalastyle-output.xml
 
 # test
-logs
+logs/
+out/
+allure-results/
 
 # python builds
 /python/openmldb_sdk/dist/

diff --git a/.gitpod.yml b/.gitpod.yml
@@ -0,0 +1,11 @@
+tasks:
+  - before: |
+      sudo apt update -y
+      DEBIAN_FRONTEND=noninteractive sudo apt-get install -y python3-dev build-essential autoconf git curl
+    init: |
+      make NPROC=16 # gitpod.io offers 16 CPU & 60 GB RAM
+      make install
+
+vscode:
+  extensions:
+    - ms-vscode.cpptools-extension-pack
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,25 @@
 # Changelog
 
+## [0.6.1] - 2022-08-30
+
+### Features
+- Support new built-in functions `last_day` and `regexp_like` (#2262 @HeZean, #2187 @jiang1997)
+- Support Jupyter Notebook for the TalkingData use case (#2354 @vagetablechicken)
+- Add a new API to disable Spark logs of the batch engine (#2359 @tobegit3hub)
+- Add the use case of precision marketing based on OneFlow (#2267 @Elliezza @vagetablechicken @siqi)
+- Support the RPC request timeout in CLI and Python SDK (#2371 @vagetablechicken)
+- Improve the documents (#2021 @liuceyim, #2348 #2316 #2324 #2361 #2315 #2323 #2355 #2328 #2360 #2378 #2319 #2350 #2395 #2398 @michelle-qinqin, #2373 @njzyfr, #2370 @tobegit3hub, #2367 #2382 #2375 #2401 @vagetablechicken, #2387 #2394 @dl239, #2379 @aceforeverd, #2403 @lumianph, #2400 gitpod-for-oss @aceforeverd)
+- Other minor features (#2363 @aceforeverd, #2185 @qsliu2017)
+
+### Bug Fixes
+- `APIServer` will core dump if no `rs` in `QueryResp`. (#2346 @vagetablechicken)
+- Data has not been deleted from `pre-aggr` tables if there are delete operations in a main table. (#2300 @zhanghaohit)
+- Task jobs will core dump when enabling `UnsafeRowOpt` with multiple threads in the Yarn cluster. (#2352 #2364 @tobegit3hub)
+- Other minor bug fixes (#2336 @dl239, #2337 @dl239, #2385 #2372 @aceforeverd, #2383 #2384 @vagetablechicken)
+
+### Code Refactoring
+#2310 @hv789, #2306 #2305 @yeya24, #2311 @Mattt47, #2368 @TBCCC, #2391 @PrajwalBorkar, #2392 @zahyaah, #2405 @wang-jiahua
+
 ## [0.6.0] - 2022-08-10
 
 ### Highlights
@@ -305,6 +325,7 @@ Removed
 - openmldb-0.2.0-linux.tar.gz targets on x86_64
 - aarch64 artifacts consider experimental
 
+[0.6.1]: https://github.com/4paradigm/OpenMLDB/compare/v0.6.0...v0.6.1
 [0.6.0]: https://github.com/4paradigm/OpenMLDB/compare/v0.5.3...v0.6.0
 [0.5.3]: https://github.com/4paradigm/OpenMLDB/compare/v0.5.2...v0.5.3
 [0.5.2]: https://github.com/4paradigm/OpenMLDB/compare/v0.5.1...v0.5.2
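
Note on the 0.6.1 changelog entry for `last_day` and `regexp_like`: the changelog names the two functions without describing them. The Python sketch below models the behavior these names conventionally have in SQL dialects: `last_day` returns the last day of the month containing a date, and `regexp_like` reports whether a string matches a regular expression. It is not OpenMLDB code. The unanchored-search semantics and the use of Python's `re` syntax are assumptions; OpenMLDB's actual argument types, regex dialect, and NULL handling may differ.

```python
import calendar
import datetime
import re


def last_day(d):
    """Return the last day of the month that contains date d."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month)


def regexp_like(target, pattern):
    """Return True if pattern matches anywhere in target.

    Assumption: unanchored search; the real function may anchor differently.
    """
    return re.search(pattern, target) is not None


print(last_day(datetime.date(2022, 8, 30)))
print(regexp_like("openmldb-0.6.1", r"\d+\.\d+\.\d+"))
```

In SQL the corresponding calls would presumably look like `SELECT last_day(some_date_column)` and `SELECT regexp_like(some_string_column, 'pattern')`, where the column names are hypothetical.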

diff --git a/Makefile b/Makefile
@@ -84,7 +84,7 @@ endif
 TEST_TARGET ?=
 TEST_LEVEL ?=
 
-.PHONY: all coverage coverage-cpp coverage-java build test configure clean thirdparty-fast thirdparty openmldb-clean thirdparty-configure thirdparty-clean thirdpartybuild-clean thirdpartysrc-clean
+.PHONY: all coverage coverage-cpp coverage-java build test configure clean thirdparty-fast udf_doc_gen thirdparty openmldb-clean thirdparty-configure thirdparty-clean thirdpartybuild-clean thirdpartysrc-clean
 
 all: build
 
@@ -125,6 +125,10 @@ openmldb-clean:
 	rm -rf "$(OPENMLDB_BUILD_DIR)"
 	@cd java && ./mvnw clean
 
+udf_doc_gen:
+	$(MAKE) build OPENMLDB_BUILD_TARGET=export_udf_info
+	$(MAKE) -C ./hybridse/tools/documentation/udf_doxygen
+
 THIRD_PARTY_BUILD_DIR ?= $(MAKEFILE_DIR)/.deps
 THIRD_PARTY_SRC_DIR ?= $(MAKEFILE_DIR)/thirdsrc
 THIRD_PARTY_DIR ?= $(THIRD_PARTY_BUILD_DIR)/usr

diff --git a/README.md b/README.md
@@ -30,7 +30,7 @@
 12. [Publications](#12-publications)
 13. [The User List](#13-the-user-list)
 
-### OpenMLDB is an open-source machine learning database that provides a feature platform enabling consistent features for training and inference.
+### OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
 
 ## 1. Our Philosophy
 
@@ -86,6 +86,10 @@ In order to achieve the goal of Development as Deployment, OpenMLDB is designed
 
 :point_right: [Read more](https://openmldb.ai/docs/en/main/deploy/index.html)
 
+Or you can directly start working on this repository by clicking on the following button
+
+[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/4paradigm/OpenMLDB)
+
 ## 6. QuickStart
 
 **Cluster and Standalone Versions**
@@ -105,7 +109,11 @@ We are building a list of real-world use cases based on OpenMLDB to demonstrate
 | [New York City Taxi Trip Duration](https://openmldb.ai/docs/en/main/use_case/lightgbm_demo.html) | OpenMLDB, LightGBM                                           | This is a challenge from Kaggle to predict the total ride duration of taxi trips in New York City. You can read [more detail here](https://www.kaggle.com/c/nyc-taxi-trip-duration/). It demonstrates using the open-source tools OpenMLDB + LightGBM to build end-to-end machine learning applications easily. |
 | [Importing real-time data streams from Pulsar](https://openmldb.ai/docs/en/main/use_case/pulsar_openmldb_connector_demo.html) | OpenMLDB, Pulsar, [OpenMLDB-Pulsar connector](https://pulsar.apache.org/docs/next/io-connectors/#jdbc-openmldb) | Apache Pulsar is a cloud-native streaming platform. Based on the OpenMLDB-Pulsar connector, we are able to seamlessly import real-time data streams from Pulsar to OpenMLDB as the online data sources. |
 | [Importing real-time data streams from Kafka](https://openmldb.ai/docs/en/main/use_case/kafka_connector_demo.html) | OpenMLDB, Kafka, [OpenMLDB-Kafka connector](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/kafka-connect-jdbc) | Apache Kafka is a distributed event streaming platform. With the OpenMLDB-Kafka connector, the real-time data streams can be imported from Kafka as the online data sources for OpenMLDB. |
-| [Building an end-to-end ML pipeline in DolphinScheduler](https://openmldb.ai/docs/en/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | We demonstrate how to build an end-to-end machine learning pipeline based on OpenMLDB and DolphinScheduler (an open-source workflow scheduler platform). It consists of feature engineering, model training, and deployment. |
+| [Building end-to-end ML pipelines in DolphinScheduler](https://openmldb.ai/docs/en/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | We demonstrate how to build an end-to-end machine learning pipeline based on OpenMLDB and DolphinScheduler (an open-source workflow scheduler platform). It consists of feature engineering, model training, and deployment. |
+| [Ad Tracking Fraud Detection](https://openmldb.ai/docs/zh/main/use_case/talkingdata_demo.html) | OpenMLDB, XGBoost                                            | This demo uses OpenMLDB and XGBoost to [detect click fraud](https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/) for online advertisements. |
+| [SQL-based ML pipelines](https://openmldb.ai/docs/zh/main/use_case/OpenMLDB_Byzer_taxi.html) | OpenMLDB, Byzer, [OpenMLDB Plugin for Byzer](https://github.com/byzer-org/byzer-extension/tree/master/byzer-openmldb) | Byzer is a low-code open-source programming language for data pipeline, analytics and AI. Byzer has integrated OpenMLDB to deliver the capability of building ML pipelines with SQL. |
+| [Building end-to-end ML pipelines in Airflow](https://openmldb.ai/docs/zh/main/use_case/airflow_provider_demo.html) | OpenMLDB, Airflow, [Airflow OpenMLDB Provider](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/airflow-provider-openmldb), XGBoost | Airflow is a popular workflow management and scheduling tool. This demo shows how to effectively schedule OpenMLDB tasks in Airflow through the provider package. |
+| [Precision marketing](https://openmldb.ai/docs/zh/main/use_case/JD_recommendation.html) | OpenMLDB, OneFlow                                            | OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. This use case demonstrates how to use OpenMLDB for feature engineering and OneFlow for model training/inference, to build an application for [precision marketing](https://jdata.jd.com/html/detail.html?id=1). |
 
 ## 8. Documentation
 
@@ -123,20 +131,14 @@ Furthermore, there are a few important features on the development roadmap but h
 - Optimization based on heterogeneous storage and computing resources
 - A lightweight OpenMLDB for edge computing
 
-## 10. Contributors
+## 10. Contribution
 
 We really appreciate the contribution from our community.
 
 - If you are interested in contributing, please read our [Contribution Guideline](CONTRIBUTING.md) for more details.
 - If you are a new contributor, you may get started with [the list of issues labeled with `good first issue`](https://github.com/4paradigm/OpenMLDB/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
 - If you have experience of OpenMLDB development, or want to tackle a challenge that may take 1-2 weeks, you may find [the list of issues labeled with `call-for-contributions`](https://github.com/4paradigm/OpenMLDB/issues?q=is%3Aopen+is%3Aissue+label%3Acall-for-contributions).
 
-Let's clap hands for our community contributors :clap:
-
-<a href="https://github.com/4paradigm/openmldb/graphs/contributors">
-  <img src="https://contrib.rocks/image?repo=4paradigm/openmldb" width=600/>
-</a>
-
 ## 11. Community
 
 - Website: [https://openmldb.ai/en](https://openmldb.ai/en)

diff --git a/README_cn.md b/README_cn.md
@@ -102,7 +102,11 @@ OpenMLDB 有两种部署模式：集群版（cluster version）和单机版（st
 | [Taxi trip duration prediction](https://openmldb.ai/docs/zh/main/use_case/taxi_tour_duration_prediction.html) | OpenMLDB, LightGBM                                           | This is a Kaggle challenge to predict taxi trip durations in New York City. You can read more about [this use case here](https://www.kaggle.com/c/nyc-taxi-trip-duration/). This case demonstrates using the open-source combination of OpenMLDB + LightGBM to quickly build a complete machine learning application. |
 | [Importing real-time data streams with the Pulsar connector](https://openmldb.ai/docs/zh/main/use_case/pulsar_openmldb_connector_demo.html) | OpenMLDB, Pulsar, [OpenMLDB-Pulsar connector](https://github.com/apache/pulsar/tree/master/pulsar-io/jdbc/openmldb) | Apache Pulsar is a high-performance, cloud-native message queue platform. With the OpenMLDB-Pulsar connector, Pulsar data streams can be used efficiently as online data sources for OpenMLDB, integrating the two seamlessly. |
 | [Importing real-time data streams with the Kafka connector](https://openmldb.ai/docs/zh/main/use_case/kafka_connector_demo.html) | OpenMLDB, Kafka, [OpenMLDB-Kafka connector](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/kafka-connect-jdbc) | Apache Kafka is a distributed message streaming platform. With the OpenMLDB-Kafka connector, real-time data streams can be easily imported into OpenMLDB as online data sources. |
-| [Building an end-to-end machine learning workflow](https://openmldb.ai/docs/zh/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | This case demonstrates building a complete machine learning workflow based on OpenMLDB and DolphinScheduler (an open-source workflow task scheduling platform), covering feature engineering, model training, and deployment. |
+| [Building an end-to-end machine learning workflow in DolphinScheduler](https://openmldb.ai/docs/zh/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | This case demonstrates building a complete machine learning workflow based on OpenMLDB and DolphinScheduler (an open-source workflow task scheduling platform), covering feature engineering, model training, and deployment. |
+| [Online ad click fraud detection](https://openmldb.ai/docs/zh/main/use_case/talkingdata_demo.html) | OpenMLDB, XGBoost                                            | This case demonstrates building an [online advertising anti-fraud application](https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/) based on OpenMLDB and XGBoost. |
+| [Building a full machine learning pipeline with SQL](https://openmldb.ai/docs/zh/main/use_case/OpenMLDB_Byzer_taxi.html) | OpenMLDB, Byzer, [OpenMLDB Plugin for Byzer](https://github.com/byzer-org/byzer-extension/tree/master/byzer-openmldb) | Byzer is a low-code, cloud-native, open-source programming language for data and AI. Byzer has integrated OpenMLDB so that the two can be used together to build a complete machine learning application pipeline. |
+| [Building machine learning applications in Airflow](https://openmldb.ai/docs/zh/main/use_case/airflow_provider_demo.html) | OpenMLDB, Airflow, [Airflow OpenMLDB Provider](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/airflow-provider-openmldb), XGBoost | Airflow is a popular workflow orchestration and management tool. This case shows how to conveniently orchestrate OpenMLDB-based machine learning tasks in Airflow through the provider package. |
+| [Precision marketing](https://openmldb.ai/docs/zh/main/use_case/JD_recommendation.html) | OpenMLDB, OneFlow                                            | OneFlow is a user-friendly, scalable, and efficient deep learning framework. This case shows how to use OpenMLDB for feature engineering together with OneFlow for model training and prediction, to build a [machine learning application for precision marketing](https://jdata.jd.com/html/detail.html?id=1). |
 
 ## 8. OpenMLDB Documentation
 
@@ -121,7 +125,7 @@ OpenMLDB 有两种部署模式：集群版（cluster version）和单机版（st
 - Optimization based on heterogeneous storage and heterogeneous computing resources
 - A lightweight edge version
 
-## 10. Community Developers
+## 10. Community Contributions
 
 We really appreciate the contributions from our community.
 
@@ -130,12 +134,6 @@ OpenMLDB 有两种部署模式：集群版（cluster version）和单机版（st
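 - If you have some development experience, you can look for issues labeled [call-for-contributions](https://github.com/4paradigm/OpenMLDB/issues?q=is%3Aopen+is%3Aissue+label%3Acall-for-contributions).
 - You can also read [this document](https://go005qabor.feishu.cn/docs/doccn7oEU0AlCOGtYz09chIebzd) to learn about development tasks at different levels and to join discussions with the developers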
 
-Let's applaud and thank our existing community contributors :clap:
-
-<a href="https://github.com/4paradigm/openmldb/graphs/contributors">
-  <img src="https://contrib.rocks/image?repo=4paradigm/openmldb" width=600/>
-</a>
-
 ## 11. Join the Community
 
 - Website: [https://openmldb.ai/](https://openmldb.ai)