Merge branch 'gen_wrappers_script' of https://github.com/ahmedabu98/beam into gen_wrappers_script
ahmedabu98 committed Feb 21, 2024
2 parents 35da8b9 + 8fdb02a commit bc43578
Showing 88 changed files with 3,280 additions and 2,013 deletions.
11 empty files (paths not captured in this view).
@@ -54,7 +54,7 @@ jobs:
beam_PostCommit_Java_PVR_Spark3_Streaming:
name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
runs-on: [self-hosted, ubuntu-20.04, main]
timeout-minutes: 120
timeout-minutes: 180
strategy:
matrix:
job_name: [beam_PostCommit_Java_PVR_Spark3_Streaming]
1 change: 1 addition & 0 deletions .gitignore
@@ -54,6 +54,7 @@ sdks/python/NOTICE
sdks/python/README.md
sdks/python/apache_beam/transforms/xlang/*
sdks/python/apache_beam/portability/api/*
sdks/python/apache_beam/yaml/docs/*
sdks/python/nosetests*.xml
sdks/python/pytest*.xml
sdks/python/postcommit_requirements.txt
13 changes: 10 additions & 3 deletions .test-infra/tools/stale_dataflow_prebuilt_image_cleaner.sh
@@ -56,6 +56,7 @@ done

HAS_STALE_IMAGES=""
FAILED_IMAGES=""
FAILED_COUNT=0

for image_name in ${IMAGE_NAMES[@]}; do
echo IMAGES FOR image ${image_name}
@@ -99,6 +100,7 @@ for image_name in ${IMAGE_NAMES[@]}; do
if [ -z "$MANIFEST" ]; then
# Sometimes a "no such manifest" error is seen. Skip the current image if the command hit an error.
FAILED_IMAGES+=" $current"
FAILED_COUNT=$(($FAILED_COUNT + 1))
continue
fi
SHOULD_DELETE=0
@@ -129,7 +131,7 @@ for image_name in ${IMAGE_NAMES[@]}; do
echo "Failed to delete the following images: ${FAILED_TO_DELETE}. Retrying each of them."
for current in $RETRY_DELETE; do
echo "Trying again to delete image ${image_name}@"${current}". Command: gcloud container images delete ${image_name}@"${current}" --force-delete-tags -q"
gcloud container images delete ${image_name}@"${current}" --force-delete-tags -q || FAILED_IMAGES+=" ${image_name}@${current}"
gcloud container images delete ${image_name}@"${current}" --force-delete-tags -q || { FAILED_IMAGES+=" ${image_name}@${current}"; FAILED_COUNT=$(($FAILED_COUNT + 1)); } # brace group, not a ( ... ) subshell, so these updates persist in the current shell
done
fi
done
@@ -142,5 +144,10 @@ fi

if [ -n "$FAILED_IMAGES" ]; then
echo "Failed to delete images: $FAILED_IMAGES"
exit 1
fi
# Sometimes images may not be deleted on the first pass if they have dependencies on previous images. Only fail if we have a persistent leak.
FAILED_THRESHOLD=10
if [ $FAILED_COUNT -gt $FAILED_THRESHOLD ]; then
echo "Failed to delete more than $FAILED_THRESHOLD images; failing the job."
exit 1
fi
fi
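One detail of this script's retry loop is worth flagging: a fallback after `||` wrapped in `( ... )` runs in a subshell, so any variable assignments it makes (such as appending to `FAILED_IMAGES` or incrementing `FAILED_COUNT`) are discarded when the subshell exits, and the threshold check at the end never sees them. A minimal standalone sketch of the pitfall (not the cleaner script itself):

```shell
# A fallback in a ( ... ) subshell cannot update variables in the parent
# shell; a { ...; } group runs in the current shell and can.
FAILED_COUNT=0
false || ( FAILED_COUNT=$((FAILED_COUNT + 1)) )   # subshell: increment is lost
echo "after subshell: $FAILED_COUNT"              # still 0
false || { FAILED_COUNT=$((FAILED_COUNT + 1)); }  # group: increment persists
echo "after group: $FAILED_COUNT"                 # now 1
```

Using a `{ ...; }` group keeps the accumulated count alive for the `FAILED_THRESHOLD` comparison at the end of the script.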
15 changes: 11 additions & 4 deletions CHANGES.md
@@ -62,12 +62,16 @@

## I/Os

* Added support for handling bad records to BigQueryIO ([#30081](https://github.com/apache/beam/pull/30081)).
* Full Support for Storage Read and Write APIs
* Partial Support for File Loads (Failures writing to files supported, failures loading files to BQ unsupported)
* No Support for Extract or Streaming Inserts
* Support for X source added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).

## New Features / Improvements

* [Enrichment Transform](https://s.apache.org/enrichment-transform) along with GCP BigTable handler added to Python SDK ([#30001](https://github.com/apache/beam/pull/30001)).
* Allow writing clustered and not time partitioned BigQuery tables (Java) ([#30094](https://github.com/apache/beam/pull/30094)).
* Redis cache support added to RequestResponseIO and Enrichment transform (Python) ([#30307](https://github.com/apache/beam/pull/30307)).
* Merged sdks/java/fn-execution and runners/core-construction-java into the main SDK. These artifacts were never meant for users, but we note here that they no longer exist. These merges are steps toward bringing portability into the core SDK alongside all other core functionality.

@@ -78,6 +82,7 @@
* Go SDK users who build custom worker containers may run into issues with the move to distroless containers as a base (see Security Fixes).
* The issue stems from distroless containers lacking additional tools, which current custom container processes may rely on.
* See https://beam.apache.org/documentation/runtime/environments/#from-scratch-go for instructions on building and using a custom container.
* Python SDK has changed the default value for the `--max_cache_memory_usage_mb` pipeline option from 100 to 0. This option was first introduced in 2.52.0 SDK. This change restores the behavior of 2.51.0 SDK, which does not use the state cache. If your pipeline uses iterable side inputs views, consider increasing the cache size by setting the option manually. ([#30360](https://github.com/apache/beam/issues/30360)).

## Deprecations

@@ -132,7 +137,7 @@

## Known Issues

* N/A
* Some Python pipelines that run with 2.52.0-2.54.0 SDKs and use large materialized side inputs might be affected by a performance regression. To restore the prior behavior on these SDK versions, supply the `--max_cache_memory_usage_mb=0` pipeline option. ([#30360](https://github.com/apache/beam/issues/30360)).

# [2.53.0] - 2024-01-04

@@ -173,7 +178,8 @@

## Known Issues

* ([#29987](https://github.com/apache/beam/issues/29987)).
* Potential race condition causing NPE in DataflowExecutionStateSampler in Dataflow Java Streaming pipelines ([#29987](https://github.com/apache/beam/issues/29987)).
* Some Python pipelines that run with 2.52.0-2.54.0 SDKs and use large materialized side inputs might be affected by a performance regression. To restore the prior behavior on these SDK versions, supply the `--max_cache_memory_usage_mb=0` pipeline option. ([#30360](https://github.com/apache/beam/issues/30360)).

# [2.52.0] - 2023-11-17

@@ -192,7 +198,7 @@ should handle this. ([#25252](https://github.com/apache/beam/issues/25252)).
jobs using the DataStream API. By default the option is set to false, so the batch jobs are still executed
using the DataSet API.
* `upload_graph` as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for Java SDK ([PR#28621](https://github.com/apache/beam/pull/28621)).
* state amd side input cache has been enabled to a default of 100 MB. Use `--max_cache_memory_usage_mb=X` to provide cache size for the user state API and side inputs. (Python) ([#28770](https://github.com/apache/beam/issues/28770)).
* Introduced a pipeline option `--max_cache_memory_usage_mb` to configure state and side input cache size. The cache has been enabled to a default of 100 MB. Use `--max_cache_memory_usage_mb=X` to provide cache size for the user state API and side inputs. ([#28770](https://github.com/apache/beam/issues/28770)).
* Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework which includes a preliminary set of IO's and turnkey transforms. More information can be found in the YAML root folder and in the [README](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md).


@@ -218,6 +224,7 @@ as a workaround, a copy of "old" `CountingSource` class should be placed into a
## Known issues

* MLTransform drops the identical elements in the output PCollection. For any duplicate elements, a single element will be emitted downstream. ([#29600](https://github.com/apache/beam/issues/29600)).
* Some Python pipelines that run with 2.52.0-2.54.0 SDKs and use large materialized side inputs might be affected by a performance regression. To restore the prior behavior on these SDK versions, supply the `--max_cache_memory_usage_mb=0` pipeline option. (Python) ([#30360](https://github.com/apache/beam/issues/30360)).

# [2.51.0] - 2023-10-03

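For readers hitting the side-input regression flagged in the Known Issues entries above: `--max_cache_memory_usage_mb` is an ordinary pipeline option, so the workaround is supplied on the command line. A hedged sketch — `my_pipeline.py` and the runner choice are placeholders, not part of this commit; only the option name comes from the changelog:

```shell
# Hypothetical invocation of a Beam Python pipeline with the workaround.
python my_pipeline.py \
  --runner=DataflowRunner \
  --max_cache_memory_usage_mb=0  # disable the state/side-input cache (2.51.0 behavior)
```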
@@ -2646,6 +2646,7 @@ class BeamModulePlugin implements Plugin<Project> {

project.evaluationDependsOn(":sdks:python")
project.evaluationDependsOn(":sdks:java:testing:expansion-service")
project.evaluationDependsOn(":sdks:java:core")
project.evaluationDependsOn(":sdks:java:extensions:python")
project.evaluationDependsOn(":sdks:go:test")

@@ -2710,9 +2711,11 @@ class BeamModulePlugin implements Plugin<Project> {
systemProperty "expansionPort", port
systemProperty "semiPersistDir", config.semiPersistDir
classpath = config.classpath + project.files(
project.project(":sdks:java:core").sourceSets.test.runtimeClasspath,
project.project(":sdks:java:extensions:python").sourceSets.test.runtimeClasspath
)
testClassesDirs = project.files(
project.project(":sdks:java:core").sourceSets.test.output.classesDirs,
project.project(":sdks:java:extensions:python").sourceSets.test.output.classesDirs
)
maxParallelForks config.numParallelTests
3 changes: 3 additions & 0 deletions it/google-cloud-platform/build.gradle
@@ -81,4 +81,7 @@ dependencies {

tasks.register("GCSPerformanceTest", IoPerformanceTestUtilities.IoPerformanceTest, project, 'google-cloud-platform', 'FileBasedIOLT', ['configuration':'large','project':'apache-beam-testing', 'artifactBucket':'io-performance-temp'])
tasks.register("BigTablePerformanceTest", IoPerformanceTestUtilities.IoPerformanceTest, project, 'google-cloud-platform', 'BigTableIOLT', ['configuration':'large','project':'apache-beam-testing', 'artifactBucket':'io-performance-temp'])
tasks.register("BigQueryPerformanceTest", IoPerformanceTestUtilities.IoPerformanceTest, project, 'google-cloud-platform', 'BigQueryIOLT', ['configuration':'medium','project':'apache-beam-testing', 'artifactBucket':'io-performance-temp'])
tasks.register("BigQueryStressTest", IoPerformanceTestUtilities.IoPerformanceTest, project, 'google-cloud-platform', 'BigQueryIOST', ['configuration':'medium','project':'apache-beam-testing', 'artifactBucket':'io-performance-temp'])
tasks.register("BigQueryStorageApiStreamingPerformanceTest", IoPerformanceTestUtilities.IoPerformanceTest, project, 'google-cloud-platform', 'BigQueryStreamingLT', ['configuration':'large', 'project':'apache-beam-testing', 'artifactBucket':'io-performance-temp'])
tasks.register("WordCountIntegrationTest", IoPerformanceTestUtilities.IoPerformanceTest, project, 'google-cloud-platform', 'WordCountIT', ['project':'apache-beam-testing', 'artifactBucket':'io-performance-temp'])
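The Gradle tasks registered above follow the existing `IoPerformanceTest` pattern in this file, so presumably they would be invoked from the repository root like any other task. An assumed usage sketch — the task names are from the diff, but the `:it:google-cloud-platform` project path is inferred from the file location, not stated in this commit:

```shell
# Assumed invocations of the newly registered performance/stress tasks.
./gradlew :it:google-cloud-platform:BigQueryPerformanceTest
./gradlew :it:google-cloud-platform:BigQueryStressTest
./gradlew :it:google-cloud-platform:BigQueryStorageApiStreamingPerformanceTest
```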