-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-7] Initial Dataflow code drop #1
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112105439
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112118850
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112173826
Users should not need to compare DataflowAssert objects on Java equality. Instead, it's nearly always a broken test that will silently fail. Throw an UnsupportedOperationException instead, and direct users to isEqualTo (Singleton) or containsInAnyOrder (Iterable). This change caught a broken test. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112200184
Generalize the 'game' example BigQuery write classes to take a map that specifies how to generate the output fields. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112253306
Some tools don't support .zip in the class path. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112261905
gcloud moved where it stores the credentials configured on the command line. Since there is still no support in standard libraries to get the default project, update DefaultProjectFactory to support the new location. Note that users who have not upgraded gcloud are still supported. ----Release Notes---- The DataflowPipelineRunner will now prefer the default project configuration produced by newer versions of the gcloud utility. Users with old gcloud clients are still supported. [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112281533
From: http://stackoverflow.com/a/4023351/1715495 ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112287922
Fix Javadoc issue in HourlyTeamScore pipeline. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112311676
initializationStateLock should be held for short, bounded amounts of time, because it is acquired on the dynamic work rebalancing code path (requestDynamicSplit) which must be effectively non-blocking. NativeReader.iterator() can do I/O and thus can take unbounded amount of time, so it shouldn't be done under the lock. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112375806
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112415033
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112466110
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112480742
This resolves the user issue on SO: http://stackoverflow.com/questions/34780459/runtimeexception-from-cloud-dataflow-related-to-serializing-coder Since Jackson 2.3, TypeIdResolvers were meant to implement this method since typeFromId(String) became deprecated. This newer versions of Jackson enforce this. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112487029
Custom unbounded readers are read in bundles of at most 10k elements or 10 seconds. A recent change accidentally removed the 10k element limit. This change reintroduces it and adds a test. The previous test also was passing vacuously because the iteration limit was incorrect (it would always have only one iteration). ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112723469
Adapt join-library module to be able to upload to maven-central
Updating version numbers from 1.4.0-SNAPSHOT to 1.5.0-SNAPSHOT ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=113022038
As in 6a11a72, this makes BigQueryIO.Read work in the DirectPipelineRunner as it does in the DataflowPipelineRunner. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112496161
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112515243
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112529131
Also updates /heapz so that it downloads the heapdump rather than just telling you where on the worker it is. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112535088
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112546981
This is a deterministic coder for ByteString. In the wholeStream context, it simply writes the string. Otherwise, it writes the string delimited with its length (encoded as a VarInt). ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112586805
----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112587034
Users who check out and edit the SDK in Eclipse should use m2e's Eclipse import wizard, and should not want to commit their actual project configurations. ----Release Notes---- [] ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=112597945
udim
pushed a commit
that referenced
this pull request
Aug 25, 2020
…odule of the Python SDK (#12657) * Add myself to authors * Add blog post #1: improved annotation support * Add draft of blog post #2: performance runtime type checking * Finish blog post #2 * Remove white space * Resolve PR comments Co-authored-by: Saavan Nanavati <[email protected]>
steveniemitz
referenced
this pull request
in twitter-forks/beam
Sep 18, 2020
Co-authored-by: steve <[email protected]> Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz
referenced
this pull request
in twitter-forks/beam
Sep 18, 2020
Co-authored-by: steve <[email protected]> Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz
referenced
this pull request
in twitter-forks/beam
Nov 7, 2020
Co-authored-by: steve <[email protected]> Co-authored-by: Kanishk Karanawat <[email protected]>
nikie
referenced
this pull request
in nikie/beam
Nov 17, 2020
pabloem
pushed a commit
that referenced
this pull request
Dec 1, 2020
…ovider for Python SDK * Support for NestedValueProvider for Python SDK * Fix typo * Update CHANGES.md * Update value_provider_test.py * Fix NestedValueProvider docstrings. (#1) * Fix isort and doc errors. (#2) * Update CHANGES.md Co-authored-by: Eugene Nikolaiev <[email protected]>
kennknowles
pushed a commit
that referenced
this pull request
Jan 25, 2021
pabloem
pushed a commit
that referenced
this pull request
Feb 17, 2021
Debeziumio PoC (#7) * New DebeziumIO class. * Merge connector code * DebeziumIO and MySqlConnector integrated. * Added FormatFuntion param to Read builder on DebeziumIO. * Added arguments checker to DebeziumIO. * Add simple JSON mapper object (#1) * Add simple JSON mapper object * Fixed Mapper. * Add SqlServer connector test * Added PostgreSql Connector Test PostgreSql now works with Json mapper * Added PostgreSql Connector Test PostgreSql now works with Json mapper * Fixing MySQL schema DataException Using file instead of schema should fix it * MySQL Connector updated from 1.3.0 to 1.3.1 Co-authored-by: osvaldo-salinas <[email protected]> Co-authored-by: Carlos Dominguez <[email protected]> Co-authored-by: Carlos Domínguez <[email protected]> * Add debeziumio tests * Debeziumio testing json mapper (#3) * Some code refactors. Use a default DBHistory if not provided * Add basic tests for Json mapper * Debeziumio time restriction (#5) * Add simple JSON mapper object * Fixed Mapper. * Add SqlServer connector test * Added PostgreSql Connector Test PostgreSql now works with Json mapper * Added PostgreSql Connector Test PostgreSql now works with Json mapper * Fixing MySQL schema DataException Using file instead of schema should fix it * MySQL Connector updated from 1.3.0 to 1.3.1 * Some code refactors. Use a default DBHistory if not provided * Adding based-time restriction Stop polling after specified amount of time * Add basic tests for Json mapper * Adding new restriction Uses a time-based restriction * Adding optional restrcition Uses an optional time-based restriction Co-authored-by: juanitodread <[email protected]> Co-authored-by: osvaldo-salinas <[email protected]> * Upgrade DebeziumIO connector (#4) * Address comments (Change dependencies to testCompile, Set JsonMapper/Coder as default, refactors) (#8) * Revert file * Change dependencies to testCompile * Move Counter sample to unit test * Set JsonMapper as default mapper function * Set String Coder as default coder when using JsonMapper * Change logs from info to debug * Debeziumio javadoc (#9) * Adding javadoc * Added some titles and examples * Added SourceRecordJson doc * Added Basic Connector doc * Added KafkaSourceConsumer doc * Javadoc cleanup * Removing BasicConnector No usages of this class were found overall * Editing documentation * Debeziumio fetched records restriction (#10) * Adding javadoc * Adding restriction by number of fetched records Also adding a quick-fix for null value within SourceRecords Minor fix on both MySQL and PostgreSQL Connectors Tests * Run either by time or by number of records * Added DebeziumOffsetTrackerTest Tests both restrictions: By amount of time and by Number of records * Removing comment * DebeziumIO test for DB2. (#11) * DebeziumIO test for DB2. * DebeziumIO javadoc. * Clean code:removed commented code lines on DebeziumIOConnectorTest.java * Clean code:removing unused imports and using readAsJson(). Co-authored-by: Carlos Domínguez <[email protected]> * Debezium limit records (now configurable) (#12) * Adding javadoc * Records Limit is now configurable (It was fixed before) * Debeziumio dockerize (#13) * Add mysql docker container to tests * Move debezium mysql integration test to its own file * Add assertion to verify that the results contains a record. * Debeziumio readme (#15) * Adding javadoc * Adding README file * Add number of records configuration to the DebeziumIO component (#16) * Code refactors (#17) * Remove/ignore null warnings * Remove DB2 code * Remove docker dependency in DebeziumIO unit test and max number of recods to MySql integration test * Change access modifiers accordingly * Remove incomplete integration tests (Postgres and SqlServer) * Add experimenal tag * Debezium testing stoppable consumer (#18) * Add try-catch-finally, stop SourceTask at finally. * Fix warnings * stopConsumer and processedRecords local variables removed. UT for task stop use case added * Fix minor code style issue Co-authored-by: juanitodread <[email protected]> * Fix style issues (check, spotlessApply) (#19) Co-authored-by: Osvaldo Salinas <[email protected]> Co-authored-by: alejandro.maguey <[email protected]> Co-authored-by: osvaldo-salinas <[email protected]> Co-authored-by: Carlos Dominguez <[email protected]> Co-authored-by: Carlos Domínguez <[email protected]> Co-authored-by: Carlos Domínguez <[email protected]> Co-authored-by: Alejandro Maguey <[email protected]> Co-authored-by: Hassan Reyes <[email protected]> Add missing apache license to README.md Enabling integration test for DebeziumIO (#20) Rename connector package cdc=>debezium. Update doc references (#21) Fix code style on DebeziumIOMySqlConnectorIT
steveniemitz
referenced
this pull request
in twitter-forks/beam
Mar 11, 2021
Co-authored-by: steve <[email protected]> Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz
referenced
this pull request
in twitter-forks/beam
Apr 26, 2021
Co-authored-by: steve <[email protected]> Co-authored-by: Kanishk Karanawat <[email protected]>
pabloem
pushed a commit
that referenced
this pull request
Jan 21, 2022
Adds ReadChangeStreamPartitionDoFn, which is an SDF to read partitions from change streams and process them accordingly. This component receives a change stream name, a partition, a start time and an end time to query. It then initiates a change stream query with the received parameters. Within a change stream, 3 types of records can be received: 1. A Data record 2. A Heartbeat record 3. A Child partitions record Upon receiving #1, the function updates the watermark with the record's commit timestamp and emits the record into the output PCollection. Upon receiving #2, the function updates the watermark with the record's timestamp, but it does not emit any record into the PCollection. Upon receiving #3, the function updates the watermark with the record's timestamp and writes the new child partitions into the metadata table. These partitions will be later scheduled by the DetectNewPartitions component. Once the change stream query for the element partition finishes, it marks the partition as finished in the metadata table and terminates.
hengfengli
referenced
this pull request
in hengfengli/beam
Mar 21, 2022
4 tasks
damccorm
pushed a commit
that referenced
this pull request
Sep 2, 2022
* Tour of Beam frontend blank project * TOBF (ToB frontend): welcome screen (#226) * theme setup * Replaced ThemeProvider with ThemeSwitchNotifier * header with theme mode switcher and logo * page container with header & footer * theme mode tests * renamed the directory to tour-of-beam * compressed beam_logo.png * added missing license comments * rudimentary layout of the first screen * review comments fixes #1 * moved notifyListeners inside then * responsive todo * split into 2 simple functions * deleted redundant constants & replaced 2018 text theme with 2021 * styling refinement * home screen layout * clickable sign in text * font weights fix * removed _getBaseFontTheme function * fixed border and bg color * color fixes * difficulty component * _LastModuleBody * todo in test * footer border * fixed overflows * replaced Project prefix with Tob * replaced then with await * inferred type * started translation of the home screen * sorted translations * Complexity comments * comment fixes * home screen translations * sign in overlay * import fix * integration test does not fail * playground_components package with dismissible_overlay * missing license * removed _dots from build * widgets refinement * renamed home screen to welcome screen * deleted copyWith * _SdkButton * trailing comma & pubspec formatting * license and lints * license * removed license from .metadata * pubspec formatting * total lints update * changed from tour_of_beam to tour-of-beam in build.gradle.kts * license check * _SdkButton mimics Radio button * renamed MyApp to TourOfBeamApp * onChanged mimics Radio button Co-authored-by: darkhan.nausharipov <[email protected]> * removed whitespace from readme (issue-22583) * renamed "content" to "child" to mimic widgets * README in tour-of-beam * translation path rename, grey dot with opacity, footer link text style * report issue in github, grey dot color * table instead of row to clip the laptop image * horizontalHalves & verticalHalves * cropped laptop image * row in an expensive intrinsic height * laptop image in the bottom * ScreenBreakpoints * intrinsic height is not needed! * _WideWelcome and _NarrowWelcome * draft readme * blank line in readme * removed irrelevant info from readme * removed whitespace Co-authored-by: Alexey Inkin <[email protected]> Co-authored-by: darkhan.nausharipov <[email protected]> Co-authored-by: alexeyinkin <[email protected]>
pabloem
pushed a commit
that referenced
this pull request
Sep 24, 2022
* [Tour of Beam][Frontend][#22600] TourScreen layout * theme setup * Replaced ThemeProvider with ThemeSwitchNotifier * header with theme mode switcher and logo * page container with header & footer * theme mode tests * renamed the directory to tour-of-beam * compressed beam_logo.png * added missing license comments * rudimentary layout of the first screen * review comments fixes #1 * moved notifyListeners inside then * responsive todo * split into 2 simple functions * deleted redundant constants & replaced 2018 text theme with 2021 * styling refinement * home screen layout * clickable sign in text * font weights fix * removed _getBaseFontTheme function * fixed border and bg color * color fixes * difficulty component * _LastModuleBody * todo in test * footer border * fixed overflows * replaced Project prefix with Tob * replaced then with await * inferred type * started translation of the home screen * sorted translations * Complexity comments * comment fixes * home screen translations * sign in overlay * import fix * integration test does not fail * playground_components package with dismissible_overlay * missing license * removed _dots from build * widgets refinement * renamed home screen to welcome screen * deleted copyWith * _SdkButton * trailing comma & pubspec formatting * license and lints * license * removed license from .metadata * pubspec formatting * total lints update * changed from tour_of_beam to tour-of-beam in build.gradle.kts * license check * _SdkButton mimics Radio button * renamed MyApp to TourOfBeamApp * onChanged mimics Radio button Tour of Beam frontend blank project [Tour of Beam][Frontend][#22600] TourScreen layout TourScreen layout (#22600) common theme, constants, split view missing license flutter_gen, summary layout details content layout details no functional widgets in split view main screen todos & translation main screen todos & translation comment fixes #1 ExpansionTileWrapper SplitViewController lists in tour screen widgets comment fixes #1 (31.08) split view package in PGC fixed button overflow splitter theme color comment fixes #2 (31.08) gradlew check welcome screen overflow test (#22600) SDK dropdown (#22600) flexible complete unit OutlinedButton (#22600) renamed PageContainer to TobScaffold dropdown style refinement DropdownButton implicit type sdk instead of e licenses #22600 renamed _ShrinkedTour to _NarrowTour #22600 tour screen style refinement #22600 BeamDivider in PGC #22600 removed todo, added license #22600 built with text #22600 _WideWelcome with IntrinsicHeight (#22600) Co-Authored-By: darkhan.nausharipov <[email protected]> * addressing review comments #22600 replaced magic numbers #22600 comments (#22600) comments #22600 comments #22600 comments #22600 comments #22600 comments, flutter 3.3.0 upgrade #22600 renamed ActionPadding to ActionVerticalPadding #22600 actions formatting #22600 * branded sign in buttons #22600 * _BrandedSignInButtons #22600 * _Divider color #22600 * profile #22600 * moved split_view from PGC into ToB #22600 * indentation fix #22600 * split ProfileContent into widgets #22600 * Extract playground components to a separate package (#22600) * Minor fixes (#22600) * Address review issues (#22600) * Upgrade Flutter to v3.3.2 (#22600) * Add precommit Gradle task for playground_components, add code generation to frontend Gradle task, remove generated mocks, fix linter issues (#22600) * startTour button (#22600) * lint fixes (#22600) * Fix highlighting for Python and SCIO (#22600) Co-authored-by: darkhan.nausharipov <[email protected]>
4 tasks
15 tasks
pl04351820
pushed a commit
to pl04351820/beam
that referenced
this pull request
Dec 20, 2023
chore: add split repo templates
lostluck
added a commit
that referenced
this pull request
May 29, 2024
* [prism] Add basic processing time queue. * Initial residual handling refactor. * Re-work teststream initilization. Remove pending element race. * touch up * rm merge duplicate * Simplify watermark hold tracking. * First successful run! * Remove duplicated test run. * Deduplicate processing time heap. * rm debug text * Remove some debug prints, cleanup. * tiny todo cleanup * ProcessingTime workming most of the time! * Some cleanup * try to get github suite to pass #1 * touch * reduce counts a bit, filter tests some. * Clean up unrelated state changes. Clean up comments somewhat. * Filter out dataflow incompatible test. * Refine processing time event comment. * Remove test touch. --------- Co-authored-by: lostluck <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Initial contribution of the Google Cloud Dataflow Java SDK to Apache Beam.
Caveat: There is still a lot to do before this becomes usable as Apache Beam. In particular:
Beaming with joy ;-D