feat: main to v0.6 (#2412)
dl239 authored Aug 30, 2022
1 parent 321e848 commit ce759f1
Showing 483 changed files with 22,835 additions and 5,545 deletions.
331 changes: 230 additions & 101 deletions .github/workflows/integration-test-src.yml

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions .github/workflows/sdk.yml
@@ -135,7 +135,7 @@ jobs:

- name: stop services
run: |
cd onebox && sh stop_all.sh && cd - || exit
cd onebox && ./stop_all.sh && cd - || exit
sh steps/ut_zookeeper.sh stop
@@ -289,7 +289,7 @@ jobs:
run: |
cp python/openmldb_sdk/dist/openmldb*.whl .
cp python/openmldb_tool/dist/openmldb*.whl .
twine upload openmldb-*.whl
twine upload openmldb*.whl
env:
TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
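
The `twine upload` glob change above can be illustrated with a small sketch. The wheel file names below are hypothetical (the diff does not show the real ones); they assume the tool package is built as `openmldb_tool-<version>-...whl`, in which case the old pattern `openmldb-*.whl` would skip it while `openmldb*.whl` picks up both wheels. Python's `fnmatchcase` is used here only as a stand-in for shell globbing, which behaves the same for these simple names.

```python
from fnmatch import fnmatchcase

# Hypothetical wheel names, assumed for illustration only.
wheels = [
    "openmldb-0.6.1-py3-none-any.whl",
    "openmldb_tool-0.6.1-py3-none-any.whl",
]


def matching(pattern, names):
    """Return the names that match a shell-style glob pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


old_matches = matching("openmldb-*.whl", wheels)  # pattern before the change
new_matches = matching("openmldb*.whl", wheels)   # pattern after the change

print("old pattern uploads:", old_matches)
print("new pattern uploads:", new_matches)
```

Under these assumed names, the old pattern requires a hyphen directly after `openmldb`, so the underscore-named tool wheel is left out of the upload.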
4 changes: 3 additions & 1 deletion .gitignore
@@ -96,7 +96,9 @@ java/hybridse-proto/src
**/scalastyle-output.xml

# test
logs
logs/
out/
allure-results/

# python builds
/python/openmldb_sdk/dist/
11 changes: 11 additions & 0 deletions .gitpod.yml
@@ -0,0 +1,11 @@
tasks:
- before: |
sudo apt update -y
DEBIAN_FRONTEND=noninteractive sudo apt-get install -y python3-dev build-essential autoconf git curl
init: |
make NPROC=16 # gitpod.io offers 16 CPU & 60 GB RAM
make install
vscode:
extensions:
- ms-vscode.cpptools-extension-pack
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,25 @@
# Changelog

## [0.6.1] - 2022-08-30

### Features
- Support new built-in functions `last_day` and `regexp_like` (#2262 @HeZean, #2187 @jiang1997)
- Support Jupyter Notebook for the TalkingData use case (#2354 @vagetablechicken)
- Add a new API to disable Spark logs of the batch engine (#2359 @tobegit3hub)
- Add the use case of precision marketing based on OneFlow (#2267 @Elliezza @vagetablechicken @siqi)
- Support the RPC request timeout in CLI and Python SDK (#2371 @vagetablechicken)
- Improve the documentation (#2021 @liuceyim, #2348 #2316 #2324 #2361 #2315 #2323 #2355 #2328 #2360 #2378 #2319 #2350 #2395 #2398 @michelle-qinqin, #2373 @njzyfr, #2370 @tobegit3hub, #2367 #2382 #2375 #2401 @vagetablechicken, #2387 #2394 @dl239, #2379 @aceforeverd, #2403 @lumianph, #2400 gitpod-for-oss @aceforeverd)
- Other minor features (#2363 @aceforeverd, #2185 @qsliu2017)
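
As a rough illustration of the `last_day` entry in the list above, here is a minimal sketch. It assumes `last_day` returns the last calendar day of the month containing the given date, as same-named functions do in other SQL dialects. The Python function below is purely illustrative and is not OpenMLDB's implementation.

```python
import calendar
from datetime import date


def last_day(d: date) -> date:
    """Return the final calendar day of the month that contains d."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    _, days_in_month = calendar.monthrange(d.year, d.month)
    return d.replace(day=days_in_month)


print(last_day(date(2022, 8, 30)))  # 2022-08-31
print(last_day(date(2024, 2, 10)))  # 2024-02-29 (leap year)
```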

### Bug Fixes
- `APIServer` will core dump if no `rs` in `QueryResp`. (#2346 @vagetablechicken)
- Data has not been deleted from `pre-aggr` tables if there are delete operations in a main table. (#2300 @zhanghaohit)
- Task jobs will core dump when enabling `UnsafeRowOpt` with multiple threads in the Yarn cluster. (#2352 #2364 @tobegit3hub)
- Other minor bug fixes (#2336 @dl239, #2337 @dl239, #2385 #2372 @aceforeverd, #2383 #2384 @vagetablechicken)

### Code Refactoring
#2310 @hv789, #2306 #2305 @yeya24, #2311 @Mattt47, #2368 @TBCCC, #2391 @PrajwalBorkar, #2392 @zahyaah, #2405 @wang-jiahua

## [0.6.0] - 2022-08-10

### Highlights
@@ -305,6 +325,7 @@ Removed
- openmldb-0.2.0-linux.tar.gz targets on x86_64
- aarch64 artifacts are considered experimental

[0.6.1]: https://github.com/4paradigm/OpenMLDB/compare/v0.6.0...v0.6.1
[0.6.0]: https://github.com/4paradigm/OpenMLDB/compare/v0.5.3...v0.6.0
[0.5.3]: https://github.com/4paradigm/OpenMLDB/compare/v0.5.2...v0.5.3
[0.5.2]: https://github.com/4paradigm/OpenMLDB/compare/v0.5.1...v0.5.2
6 changes: 5 additions & 1 deletion Makefile
@@ -84,7 +84,7 @@ endif
TEST_TARGET ?=
TEST_LEVEL ?=

.PHONY: all coverage coverage-cpp coverage-java build test configure clean thirdparty-fast thirdparty openmldb-clean thirdparty-configure thirdparty-clean thirdpartybuild-clean thirdpartysrc-clean
.PHONY: all coverage coverage-cpp coverage-java build test configure clean thirdparty-fast udf_doc_gen thirdparty openmldb-clean thirdparty-configure thirdparty-clean thirdpartybuild-clean thirdpartysrc-clean

all: build

@@ -125,6 +125,10 @@ openmldb-clean:
rm -rf "$(OPENMLDB_BUILD_DIR)"
@cd java && ./mvnw clean

udf_doc_gen:
$(MAKE) build OPENMLDB_BUILD_TARGET=export_udf_info
$(MAKE) -C ./hybridse/tools/documentation/udf_doxygen

THIRD_PARTY_BUILD_DIR ?= $(MAKEFILE_DIR)/.deps
THIRD_PARTY_SRC_DIR ?= $(MAKEFILE_DIR)/thirdsrc
THIRD_PARTY_DIR ?= $(THIRD_PARTY_BUILD_DIR)/usr
20 changes: 11 additions & 9 deletions README.md
@@ -30,7 +30,7 @@
12. [Publications](#12-publications)
13. [The User List](#13-the-user-list)

### OpenMLDB is an open-source machine learning database that provides a feature platform enabling consistent features for training and inference.
### OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.

## 1. Our Philosophy

@@ -86,6 +86,10 @@ In order to achieve the goal of Development as Deployment, OpenMLDB is designed

:point_right: [Read more](https://openmldb.ai/docs/en/main/deploy/index.html)

Or you can directly start working on this repository by clicking on the following button

[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/4paradigm/OpenMLDB)

## 6. QuickStart

**Cluster and Standalone Versions**
@@ -105,7 +109,11 @@ We are building a list of real-world use cases based on OpenMLDB to demonstrate
| [New York City Taxi Trip Duration](https://openmldb.ai/docs/en/main/use_case/lightgbm_demo.html) | OpenMLDB, LightGBM | This is a challenge from Kaggle to predict the total ride duration of taxi trips in New York City. You can read [more detail here](https://www.kaggle.com/c/nyc-taxi-trip-duration/). It demonstrates using the open-source tools OpenMLDB + LightGBM to build end-to-end machine learning applications easily. |
| [Importing real-time data streams from Pulsar](https://openmldb.ai/docs/en/main/use_case/pulsar_openmldb_connector_demo.html) | OpenMLDB, Pulsar, [OpenMLDB-Pulsar connector](https://pulsar.apache.org/docs/next/io-connectors/#jdbc-openmldb) | Apache Pulsar is a cloud-native streaming platform. Based on the OpenMLDB-Pulsar connector, we are able to seamlessly import real-time data streams from Pulsar to OpenMLDB as the online data sources. |
| [Importing real-time data streams from Kafka](https://openmldb.ai/docs/en/main/use_case/kafka_connector_demo.html) | OpenMLDB, Kafka, [OpenMLDB-Kafka connector](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/kafka-connect-jdbc) | Apache Kafka is a distributed event streaming platform. With the OpenMLDB-Kafka connector, the real-time data streams can be imported from Kafka as the online data sources for OpenMLDB. |
| [Building an end-to-end ML pipeline in DolphinScheduler](https://openmldb.ai/docs/en/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | We demonstrate how to build an end-to-end machine learning pipeline based on OpenMLDB and DolphinScheduler (an open-source workflow scheduler platform). It consists of feature engineering, model training, and deployment. |
| [Building end-to-end ML pipelines in DolphinScheduler](https://openmldb.ai/docs/en/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | We demonstrate how to build an end-to-end machine learning pipeline based on OpenMLDB and DolphinScheduler (an open-source workflow scheduler platform). It consists of feature engineering, model training, and deployment. |
| [Ad Tracking Fraud Detection](https://openmldb.ai/docs/zh/main/use_case/talkingdata_demo.html) | OpenMLDB, XGBoost | This demo uses OpenMLDB and XGBoost to [detect click fraud](https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/) for online advertisements. |
| [SQL-based ML pipelines](https://openmldb.ai/docs/zh/main/use_case/OpenMLDB_Byzer_taxi.html) | OpenMLDB, Byzer, [OpenMLDB Plugin for Byzer](https://github.com/byzer-org/byzer-extension/tree/master/byzer-openmldb) | Byzer is a low-code open-source programming language for data pipeline, analytics and AI. Byzer has integrated OpenMLDB to deliver the capability of building ML pipelines with SQL. |
| [Building end-to-end ML pipelines in Airflow](https://openmldb.ai/docs/zh/main/use_case/airflow_provider_demo.html) | OpenMLDB, Airflow, [Airflow OpenMLDB Provider](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/airflow-provider-openmldb), XGBoost | Airflow is a popular workflow management and scheduling tool. This demo shows how to effectively schedule OpenMLDB tasks in the Airflow through the provider package. |
| [Precision marketing](https://openmldb.ai/docs/zh/main/use_case/JD_recommendation.html) | OpenMLDB, OneFlow | OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. This use case demonstrates how to use OpenMLDB for feature engineering and OneFlow for model training/inference to build an application for [precision marketing](https://jdata.jd.com/html/detail.html?id=1). |

## 8. Documentation

@@ -123,20 +131,14 @@ Furthermore, there are a few important features on the development roadmap but h
- Optimization based on heterogeneous storage and computing resources
- A lightweight OpenMLDB for edge computing

## 10. Contributors
## 10. Contribution

We really appreciate the contribution from our community.

- If you are interested in contributing, please read our [Contribution Guideline](CONTRIBUTING.md) for more details.
- If you are a new contributor, you may get started with [the list of issues labeled with `good first issue`](https://github.com/4paradigm/OpenMLDB/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
- If you have experience with OpenMLDB development, or want to tackle a challenge that may take 1-2 weeks, you may look at [the list of issues labeled with `call-for-contributions`](https://github.com/4paradigm/OpenMLDB/issues?q=is%3Aopen+is%3Aissue+label%3Acall-for-contributions).

Let's clap hands for our community contributors :clap:

<a href="https://github.com/4paradigm/openmldb/graphs/contributors">
<img src="https://contrib.rocks/image?repo=4paradigm/openmldb" width=600/>
</a>

## 11. Community

- Website: [https://openmldb.ai/en](https://openmldb.ai/en)
14 changes: 6 additions & 8 deletions README_cn.md
@@ -102,7 +102,11 @@ OpenMLDB has two deployment modes: cluster version and standalone version (st
| [Taxi Trip Duration Prediction](https://openmldb.ai/docs/zh/main/use_case/taxi_tour_duration_prediction.html) | OpenMLDB, LightGBM | This is a Kaggle challenge to predict taxi trip durations in New York City. You can read more in [the description of this scenario](https://www.kaggle.com/c/nyc-taxi-trip-duration/). This use case shows how to use the open-source OpenMLDB + LightGBM stack to quickly build a complete machine learning application. |
| [Importing real-time data streams with the Pulsar connector](https://openmldb.ai/docs/zh/main/use_case/pulsar_openmldb_connector_demo.html) | OpenMLDB, Pulsar, [OpenMLDB-Pulsar connector](https://github.com/apache/pulsar/tree/master/pulsar-io/jdbc/openmldb) | Apache Pulsar is a high-performance, cloud-native message queue platform. With the OpenMLDB-Pulsar connector, Pulsar data streams can efficiently serve as online data sources for OpenMLDB, integrating the two seamlessly. |
| [Importing real-time data streams with the Kafka connector](https://openmldb.ai/docs/zh/main/use_case/kafka_connector_demo.html) | OpenMLDB, Kafka, [OpenMLDB-Kafka connector](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/kafka-connect-jdbc) | Apache Kafka is a distributed event streaming platform. With the OpenMLDB-Kafka connector, real-time data streams can easily be imported into OpenMLDB as online data sources. |
| [Building an end-to-end machine learning workflow](https://openmldb.ai/docs/zh/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | This use case demonstrates how to build a complete machine learning workflow based on OpenMLDB and DolphinScheduler (an open-source workflow scheduling platform), covering feature engineering, model training, and deployment. |
| [Building an end-to-end machine learning workflow in DolphinScheduler](https://openmldb.ai/docs/zh/main/use_case/dolphinscheduler_task_demo.html) | OpenMLDB, DolphinScheduler, [OpenMLDB task plugin](https://dolphinscheduler.apache.org/zh-cn/docs/dev/user_doc/guide/task/openmldb.html) | This use case demonstrates how to build a complete machine learning workflow based on OpenMLDB and DolphinScheduler (an open-source workflow scheduling platform), covering feature engineering, model training, and deployment. |
| [Online Ad Click Fraud Detection](https://openmldb.ai/docs/zh/main/use_case/talkingdata_demo.html) | OpenMLDB, XGBoost | This use case demonstrates building [an online advertising anti-fraud application](https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection/) with OpenMLDB and XGBoost. |
| [Building a full machine learning pipeline with SQL](https://openmldb.ai/docs/zh/main/use_case/OpenMLDB_Byzer_taxi.html) | OpenMLDB, Byzer, [OpenMLDB Plugin for Byzer](https://github.com/byzer-org/byzer-extension/tree/master/byzer-openmldb) | Byzer is a low-code, cloud-native, open-source programming language for data and AI. Byzer has integrated OpenMLDB so that the two can be used together to build a complete machine learning pipeline. |
| [Building machine learning applications in Airflow](https://openmldb.ai/docs/zh/main/use_case/airflow_provider_demo.html) | OpenMLDB, Airflow, [Airflow OpenMLDB Provider](https://github.com/4paradigm/OpenMLDB/tree/main/extensions/airflow-provider-openmldb), XGBoost | Airflow is a popular workflow orchestration and management tool. This use case shows how to conveniently orchestrate OpenMLDB-based machine learning tasks in Airflow through the provider package. |
| [Precision Marketing](https://openmldb.ai/docs/zh/main/use_case/JD_recommendation.html) | OpenMLDB, OneFlow | OneFlow is a user-friendly, scalable, and efficient deep learning framework. This use case shows how to use OpenMLDB for feature engineering together with OneFlow for model training and prediction, to build [a machine learning application for precision marketing](https://jdata.jd.com/html/detail.html?id=1). |

## 8. OpenMLDB Documentation

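@@ -121,7 +125,7 @@ OpenMLDB has two deployment modes: cluster version and standalone version (st
- Optimization based on heterogeneous storage and heterogeneous computing resources
- A lightweight edge version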

## 10. Community Developers
## 10. Community Contribution

We greatly appreciate contributions from the community.

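@@ -130,12 +134,6 @@ OpenMLDB has two deployment modes: cluster version and standalone version (st
- If you have some development experience, you can look for issues labeled [call-for-contributions](https://github.com/4paradigm/OpenMLDB/issues?q=is%3Aopen+is%3Aissue+label%3Acall-for-contributions).
- You can also read [this document](https://go005qabor.feishu.cn/docs/doccn7oEU0AlCOGtYz09chIebzd) to learn about development tasks at different levels and join the discussion with other developers.

Let's applaud and thank our existing community contributors :clap: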

<a href="https://github.com/4paradigm/openmldb/graphs/contributors">
<img src="https://contrib.rocks/image?repo=4paradigm/openmldb" width=600/>
</a>

## 11. Join the Community

- Website: [https://openmldb.ai/](https://openmldb.ai)