Commit

Merge branch 'flinkConfigtoHdfsConfig2' into flinkConfigtoHdfsConfig
cuibo01 committed May 7, 2022
2 parents 643062f + 42a480f commit 655551e
Showing 2,580 changed files with 65,607 additions and 12,386 deletions.
Empty file modified .asf.yaml
100644 → 100755
Empty file.
Empty file modified .codecov.yml
100644 → 100755
Empty file.
Empty file modified .github/ISSUE_TEMPLATE/SUPPORT_REQUEST.md
100644 → 100755
Empty file.
Empty file modified .github/ISSUE_TEMPLATE/config.yml
100644 → 100755
Empty file.
Empty file modified .github/PULL_REQUEST_TEMPLATE.md
100644 → 100755
Empty file.
61 changes: 41 additions & 20 deletions .github/workflows/bot.yml
100644 → 100755
@@ -16,22 +16,26 @@ jobs:
strategy:
matrix:
include:
- scala: "scala-2.11"
spark: "spark2"
skipModules: ""
- scala: "scala-2.11"
spark: "spark2,spark-shade-unbundle-avro"
skipModules: ""
- scala: "scala-2.12"
spark: "spark3.1.x"
skipModules: "!hudi-spark-datasource/hudi-spark3"
- scala: "scala-2.12"
spark: "spark3.1.x,spark-shade-unbundle-avro"
skipModules: "!hudi-spark-datasource/hudi-spark3"
- scala: "scala-2.12"
spark: "spark3"
- scala: "scala-2.12"
spark: "spark3,spark-shade-unbundle-avro"
- scalaProfile: "scala-2.11"
sparkProfile: "spark2.4"
flinkProfile: "flink1.13"

- scalaProfile: "scala-2.11"
sparkProfile: "spark2.4"
flinkProfile: "flink1.14"

- scalaProfile: "scala-2.12"
sparkProfile: "spark2.4"
flinkProfile: "flink1.13"

- scalaProfile: "scala-2.12"
sparkProfile: "spark3.1"
flinkProfile: "flink1.14"

- scalaProfile: "scala-2.12"
sparkProfile: "spark3.2"
flinkProfile: "flink1.14"

steps:
- uses: actions/checkout@v2
- name: Set up JDK 8
@@ -42,7 +46,24 @@ jobs:
architecture: x64
- name: Build Project
env:
SCALA_PROFILE: ${{ matrix.scala }}
SPARK_PROFILE: ${{ matrix.spark }}
SKIP_MODULES: ${{ matrix.skipModules }}
run: mvn install -P "$SCALA_PROFILE,$SPARK_PROFILE" -pl "$SKIP_MODULES" -DskipTests=true -Dmaven.javadoc.skip=true -B -V
SCALA_PROFILE: ${{ matrix.scalaProfile }}
SPARK_PROFILE: ${{ matrix.sparkProfile }}
FLINK_PROFILE: ${{ matrix.flinkProfile }}
run:
mvn clean install -Pintegration-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"$FLINK_PROFILE" -DskipTests=true -B -V
- name: Quickstart Test
env:
SCALA_PROFILE: ${{ matrix.scalaProfile }}
SPARK_PROFILE: ${{ matrix.sparkProfile }}
FLINK_PROFILE: ${{ matrix.flinkProfile }}
if: ${{ !endsWith(env.SPARK_PROFILE, '3.2') }} # skip Spark 3.2 tests until Hadoop is upgraded to 3.x
run:
mvn test -Punit-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"$FLINK_PROFILE" -DfailIfNoTests=false -pl hudi-examples/hudi-examples-flink,hudi-examples/hudi-examples-java,hudi-examples/hudi-examples-spark
- name: Spark SQL Test
env:
SCALA_PROFILE: ${{ matrix.scalaProfile }}
SPARK_PROFILE: ${{ matrix.sparkProfile }}
FLINK_PROFILE: ${{ matrix.flinkProfile }}
if: ${{ !endsWith(env.SPARK_PROFILE, '2.4') }} # skip Spark 2.4 tests as they are covered by Azure CI
run:
mvn test -Punit-tests -D"$SCALA_PROFILE" -D"$SPARK_PROFILE" -D"$FLINK_PROFILE" '-Dtest=org.apache.spark.sql.hudi.Test*' -pl hudi-spark-datasource/hudi-spark
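Taken together, one row of the matrix above expands into a concrete Maven invocation. The following is a local sketch of what the "Build Project" step runs for a hypothetically chosen row (scala-2.12 / spark3.2 / flink1.14) — it mirrors the workflow's `run` command, not the exact CI runner environment:

```shell
# Replay one CI matrix entry locally (hypothetical row choice:
# scala-2.12 / spark3.2 / flink1.14).
SCALA_PROFILE="scala-2.12"
SPARK_PROFILE="spark3.2"
FLINK_PROFILE="flink1.14"

# "Build Project" step: each matrix value is activated as a -D property,
# exactly as the workflow's run command does.
BUILD_CMD="mvn clean install -Pintegration-tests -D$SCALA_PROFILE -D$SPARK_PROFILE -D$FLINK_PROFILE -DskipTests=true -B -V"
echo "$BUILD_CMD"
```

Running the echoed command requires a checkout of the repository; the sketch only shows how the matrix variables flow into the Maven flags.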
Empty file modified .gitignore
100644 → 100755
Empty file.
Empty file modified .idea/vcs.xml
100644 → 100755
Empty file.
Empty file modified LICENSE
100644 → 100755
Empty file.
Empty file modified NOTICE
100644 → 100755
Empty file.
68 changes: 41 additions & 27 deletions README.md
100644 → 100755
@@ -16,21 +16,27 @@
-->

# Apache Hudi
Apache Hudi (pronounced Hoodie) stands for `Hadoop Upserts Deletes and Incrementals`.
Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage).

Apache Hudi (pronounced Hoodie) stands for `Hadoop Upserts Deletes and Incrementals`. Hudi manages the storage of large
analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage).

<img src="https://hudi.apache.org/assets/images/hudi-logo-medium.png" alt="Hudi logo" height="80px" align="right" />

<https://hudi.apache.org/>

[![Build](https://github.com/apache/hudi/actions/workflows/bot.yml/badge.svg)](https://github.com/apache/hudi/actions/workflows/bot.yml)
[![Test](https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_apis/build/status/apachehudi-ci.hudi-mirror?branchName=master)](https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/latest?definitionId=3&branchName=master)
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/org.apache.hudi/hudi/badge.svg)](http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.hudi%22)
![GitHub commit activity](https://img.shields.io/github/commit-activity/m/apache/hudi)
[![Join on Slack](https://img.shields.io/badge/slack-%23hudi-72eff8?logo=slack&color=48c628&label=Join%20on%20Slack)](https://join.slack.com/t/apache-hudi/shared_invite/enQtODYyNDAxNzc5MTg2LTE5OTBlYmVhYjM0N2ZhOTJjOWM4YzBmMWU2MjZjMGE4NDc5ZDFiOGQ2N2VkYTVkNzU3ZDQ4OTI1NmFmYWQ0NzE)
![Twitter Follow](https://img.shields.io/twitter/follow/ApacheHudi)

## Features

* Upsert support with fast, pluggable indexing
* Atomically publish data with rollback support
* Snapshot isolation between writer & queries
* Snapshot isolation between writer & queries
* Savepoints for data recovery
* Manages file sizes, layout using statistics
* Async compaction of row & columnar data
@@ -64,47 +70,55 @@ spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```

To build for integration tests that include `hudi-integ-test-bundle`, use `-Dintegration-tests`.

To build the Javadoc for all Java and Scala classes:
```
# Javadoc generated under target/site/apidocs
mvn clean javadoc:aggregate -Pjavadocs
```

### Build with Scala 2.12
### Build with different Spark versions

The default Scala version supported is 2.11. To build for Scala 2.12 version, build using `scala-2.12` profile
The default Spark version supported is 2.4.4. Refer to the table below for building with different Spark and Scala versions.

```
mvn clean package -DskipTests -Dscala-2.12
```
| Maven build options | Expected Spark bundle jar name | Notes |
|:--------------------------|:---------------------------------------------|:-------------------------------------------------|
| (empty) | hudi-spark-bundle_2.11 (legacy bundle name) | For Spark 2.4.4 and Scala 2.11 (default options) |
| `-Dspark2.4` | hudi-spark2.4-bundle_2.11 | For Spark 2.4.4 and Scala 2.11 (same as default) |
| `-Dspark2.4 -Dscala-2.12` | hudi-spark2.4-bundle_2.12 | For Spark 2.4.4 and Scala 2.12 |
| `-Dspark3.1 -Dscala-2.12` | hudi-spark3.1-bundle_2.12 | For Spark 3.1.x and Scala 2.12 |
| `-Dspark3.2 -Dscala-2.12` | hudi-spark3.2-bundle_2.12 | For Spark 3.2.x and Scala 2.12 |
| `-Dspark3` | hudi-spark3-bundle_2.12 (legacy bundle name) | For Spark 3.2.x and Scala 2.12 |
| `-Dscala-2.12` | hudi-spark-bundle_2.12 (legacy bundle name) | For Spark 2.4.4 and Scala 2.12 |

For example,
```
# Build against Spark 3.2.x
mvn clean package -DskipTests -Dspark3.2 -Dscala-2.12
# Build against Spark 3.1.x
mvn clean package -DskipTests -Dspark3.1 -Dscala-2.12
# Build against Spark 2.4.4 and Scala 2.12
mvn clean package -DskipTests -Dspark2.4 -Dscala-2.12
```

### Build with Spark 3

The default Spark version supported is 2.4.4. To build for different Spark 3 versions, use the corresponding profile:
```
# Build against Spark 3.2.1 (the default build shipped with the public Spark 3 bundle)
mvn clean package -DskipTests -Dspark3

# Build against Spark 3.1.2
mvn clean package -DskipTests -Dspark3.1.x
```
#### What about "spark-avro" module?

Starting from version 0.11, Hudi no longer requires `spark-avro` to be specified using `--packages`.

### Build without spark-avro module

The default hudi-jar bundles the spark-avro module. To build without the spark-avro module, build using the `spark-shade-unbundle-avro` profile:

```
# Checkout code and build
git clone https://github.com/apache/hudi.git && cd hudi
mvn clean package -DskipTests -Pspark-shade-unbundle-avro

# Start command
spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
--packages org.apache.spark:spark-avro_2.11:2.4.4 \
--jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
```

### Build with different Flink versions

The default Flink version supported is 1.14. Refer to the table below for building with different Flink and Scala versions.
| Maven build options | Expected Flink bundle jar name | Notes |
|:---------------------------|:-------------------------------|:------------------------------------------------|
| (empty) | hudi-flink1.14-bundle_2.11 | For Flink 1.14 and Scala 2.11 (default options) |
| `-Dflink1.14` | hudi-flink1.14-bundle_2.11 | For Flink 1.14 and Scala 2.11 (same as default) |
| `-Dflink1.14 -Dscala-2.12` | hudi-flink1.14-bundle_2.12 | For Flink 1.14 and Scala 2.12 |
| `-Dflink1.13` | hudi-flink1.13-bundle_2.11 | For Flink 1.13 and Scala 2.11 |
| `-Dflink1.13 -Dscala-2.12` | hudi-flink1.13-bundle_2.12 | For Flink 1.13 and Scala 2.12 |
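The Flink rows follow the same naming pattern as the Spark table. A small sketch that derives the expected bundle jar name from the chosen profiles — the naming pattern is inferred from the table above, not read from the POMs:

```shell
# Derive the expected Flink bundle jar name from the chosen profiles
# (pattern inferred from the table: hudi-flink<ver>-bundle_<scala>).
FLINK_VERSION="1.13"
SCALA_VERSION="2.12"
BUNDLE_JAR="hudi-flink${FLINK_VERSION}-bundle_${SCALA_VERSION}"
echo "$BUNDLE_JAR"   # hudi-flink1.13-bundle_2.12

# Matching build command for that row of the table:
BUILD_CMD="mvn clean package -DskipTests -Dflink${FLINK_VERSION} -Dscala-${SCALA_VERSION}"
echo "$BUILD_CMD"
```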

## Running Tests

102 changes: 45 additions & 57 deletions azure-pipelines.yml
100644 → 100755
@@ -22,75 +22,59 @@ pool:
vmImage: 'ubuntu-18.04'

variables:
MAVEN_CACHE_FOLDER: $(Pipeline.Workspace)/.m2/repository
MAVEN_OPTS: '-Dmaven.repo.local=$(MAVEN_CACHE_FOLDER) -Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true'
MAVEN_OPTS: '-Dcheckstyle.skip=true -Drat.skip=true -Djacoco.skip=true'
SPARK_VERSION: '2.4.4'
HADOOP_VERSION: '2.7'
SPARK_ARCHIVE: spark-$(SPARK_VERSION)-bin-hadoop$(HADOOP_VERSION)
EXCLUDE_TESTED_MODULES: '!hudi-examples/hudi-examples-common,!hudi-examples/hudi-examples-flink,!hudi-examples/hudi-examples-java,!hudi-examples/hudi-examples-spark,!hudi-common,!hudi-flink-datasource/hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync'

stages:
- stage: test
jobs:
- job: UT_FT_1
displayName: UT FT common & flink & UT client/spark-client
timeoutInMinutes: '90'
timeoutInMinutes: '120'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT common flink client/spark-client
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Punit-tests -pl hudi-common,hudi-flink,hudi-client/hudi-spark-client
options: -Punit-tests -pl hudi-common,hudi-flink-datasource/hudi-flink,hudi-client/hudi-spark-client
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT common flink
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pfunctional-tests -pl hudi-common,hudi-flink
options: -Pfunctional-tests -pl hudi-common,hudi-flink-datasource/hudi-flink
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: UT_FT_2
displayName: FT client/spark-client
timeoutInMinutes: '90'
timeoutInMinutes: '120'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT client/spark-client
inputs:
@@ -99,28 +83,20 @@ stages:
options: -Pfunctional-tests -pl hudi-client/hudi-spark-client
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: UT_FT_3
displayName: UT FT clients & cli & utilities & sync/hive-sync
timeoutInMinutes: '90'
timeoutInMinutes: '120'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT clients & cli & utilities & sync/hive-sync
inputs:
@@ -129,7 +105,7 @@ stages:
options: -Punit-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT clients & cli & utilities & sync/hive-sync
inputs:
@@ -138,48 +114,60 @@ stages:
options: -Pfunctional-tests -pl hudi-client/hudi-client-common,hudi-client/hudi-flink-client,hudi-client/hudi-java-client,hudi-cli,hudi-utilities,hudi-sync/hudi-hive-sync
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: UT_FT_4
displayName: UT FT other modules
timeoutInMinutes: '90'
timeoutInMinutes: '120'
steps:
- task: Cache@2
displayName: set cache
inputs:
key: 'maven | "$(Agent.OS)" | **/pom.xml'
restoreKeys: |
maven | "$(Agent.OS)"
maven
path: $(MAVEN_CACHE_FOLDER)
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'install'
goals: 'clean install'
options: -T 2.5C -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT other modules
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Punit-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Punit-tests -pl $(EXCLUDE_TESTED_MODULES)
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: FT other modules
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pfunctional-tests -pl !hudi-common,!hudi-flink,!hudi-client/hudi-spark-client,!hudi-client/hudi-client-common,!hudi-client/hudi-flink-client,!hudi-client/hudi-java-client,!hudi-cli,!hudi-utilities,!hudi-sync/hudi-hive-sync
options: -Pfunctional-tests -pl $(EXCLUDE_TESTED_MODULES)
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx2g $(MAVEN_OPTS)'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- job: IT
displayName: IT modules
timeoutInMinutes: '120'
steps:
- task: Maven@3
displayName: maven install
inputs:
mavenPomFile: 'pom.xml'
goals: 'clean install'
options: -T 2.5C -Pintegration-tests -DskipTests
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: Maven@3
displayName: UT integ-test
inputs:
mavenPomFile: 'pom.xml'
goals: 'test'
options: -Pintegration-tests -DskipUTs=false -DskipITs=true -pl hudi-integ-test test
publishJUnitResults: false
jdkVersionOption: '1.8'
mavenOptions: '-Xmx4g $(MAVEN_OPTS)'
- task: AzureCLI@2
displayName: Prepare for IT
inputs:
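The `EXCLUDE_TESTED_MODULES` variable introduced in this pipeline works because Maven's `-pl` flag treats a leading `!` as "exclude this module" rather than "select it". A minimal sketch of assembling such a list, using a shortened, illustrative module set instead of the full one from the variable above:

```shell
# Assemble a Maven -pl exclusion list the same way the
# EXCLUDE_TESTED_MODULES pipeline variable does: each module is
# prefixed with '!' so Maven skips it instead of selecting it.
MODULES="hudi-common hudi-cli hudi-utilities"
EXCLUDE_LIST=""
for m in $MODULES; do
  EXCLUDE_LIST="${EXCLUDE_LIST:+$EXCLUDE_LIST,}!$m"
done
echo "$EXCLUDE_LIST"   # !hudi-common,!hudi-cli,!hudi-utilities

# Usage (sketch): mvn test -Punit-tests -pl "$EXCLUDE_LIST"
```

Keeping the list in a single pipeline variable means the UT and FT jobs stay in sync when a module is moved into its own dedicated job.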
Empty file modified conf/hudi-defaults.conf.template
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-flink-bundle_2.11.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-flink-bundle_2.12.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-hadoop-mr-bundle.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-hive-sync-bundle.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-integ-test-bundle.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-kafka-connect-bundle.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-presto-bundle.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-spark-bundle_2.11.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-spark-bundle_2.12.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-spark3-bundle_2.12.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-timeline-server-bundle.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-utilities-bundle_2.11.txt
100644 → 100755
Empty file.
Empty file modified dependencies/hudi-utilities-bundle_2.12.txt
100644 → 100755
Empty file.