Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-7] Initial Dataflow code drop #1

Merged
merged 1,575 commits into from
Feb 26, 2016
Merged

[BEAM-7] Initial Dataflow code drop #1

merged 1,575 commits into from
Feb 26, 2016

Conversation

francesperry
Copy link
Member

Initial contribution of the Google Cloud Dataflow Java SDK to Apache Beam.

Caveat: There is still a lot to do before this becomes usable as Apache Beam. In particular:

  • Reorganize directories.
  • Incorporate additional drops by Google, Cloudera, and dataArtisans.
  • Make major backwards incompatible API changes.
  • Rename from Dataflow to Beam.

Beaming with joy ;-D

peihe and others added 30 commits January 15, 2016 10:13
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112105439
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112118850
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112173826
Users should not need to compare DataflowAssert objects on Java equality.
Instead, it's nearly always a broken test that will silently fail.

Throw an UnsupportedOperationException instead, and direct users to
isEqualTo (Singleton) or containsInAnyOrder (Iterable).

This change caught a broken test.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112200184
Generalize the 'game' example BigQuery write classes to take a map that specifies how
to generate the output fields.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112253306
Some tools don't support .zip in the class path.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112261905
gcloud moved where it stores the credentials configured on the command line.
Since there is still no support in standard libraries to get the default
project, update DefaultProjectFactory to support the new location.

Note that users who have not upgraded gcloud are still supported.

----Release Notes----
The DataflowPipelineRunner will now prefer the default project configuration
produced by newer versions of the gcloud utility. Users with old gcloud clients
are still supported.
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112281533
From: http://stackoverflow.com/a/4023351/1715495

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112287922
Fix Javadoc issue in HourlyTeamScore pipeline.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112311676
initializationStateLock should be held for short, bounded amounts of time,
because it is acquired on the dynamic work rebalancing code path
(requestDynamicSplit) which must be effectively non-blocking.
NativeReader.iterator() can do I/O and thus can take unbounded amount
of time, so it shouldn't be done under the lock.
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112375806
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112415033
----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112466110
----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112480742
This resolves the user issue on SO:
http://stackoverflow.com/questions/34780459/runtimeexception-from-cloud-dataflow-related-to-serializing-coder
Since Jackson 2.3, TypeIdResolvers were meant to implement
this method since typeFromId(String) became deprecated.
This newer versions of Jackson enforce this.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112487029
Custom unbounded readers are read in bundles of at most
10k elements or 10 seconds. A recent change accidentally removed
the 10k element limit. This change reintroduces it and
adds a test.

The previous test also was passing vacuously because
the iteration limit was incorrect (it would always
have only one iteration).
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112723469
Adapt join-library module to be able to upload to maven-central
Updating version numbers from 1.4.0-SNAPSHOT to 1.5.0-SNAPSHOT

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=113022038
As in 6a11a72, this makes BigQueryIO.Read work in the
DirectPipelineRunner as it does in the DataflowPipelineRunner.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112496161
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112515243
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112529131
Also updates /heapz so that it downloads the heapdump rather than just
telling you where on the worker it is.

----Release Notes----

[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112535088
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112546981
This is a deterministic coder for ByteString. In the
wholeStream context, it simply writes the string. Otherwise,
it writes the string delimited with its length (encoded as a
VarInt).

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112586805
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112587034
Users who check out and edit the SDK in Eclipse should
use m2e's Eclipse import wizard, and should not want to
commit their actual project configurations.

----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=112597945
udim pushed a commit that referenced this pull request Aug 25, 2020
…odule of the Python SDK (#12657)

* Add myself to authors

* Add blog post #1: improved annotation support

* Add draft of blog post #2: performance runtime type checking

* Finish blog post #2

* Remove white space

* Resolve PR comments

Co-authored-by: Saavan Nanavati <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Sep 18, 2020
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Sep 18, 2020
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Nov 7, 2020
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
nikie referenced this pull request in nikie/beam Nov 17, 2020
pabloem pushed a commit that referenced this pull request Dec 1, 2020
…ovider for Python SDK

* Support for NestedValueProvider for Python SDK

* Fix typo

* Update CHANGES.md

* Update value_provider_test.py

* Fix NestedValueProvider docstrings. (#1)

* Fix isort and doc errors. (#2)

* Update CHANGES.md

Co-authored-by: Eugene Nikolaiev <[email protected]>
kennknowles pushed a commit that referenced this pull request Jan 25, 2021
pabloem pushed a commit that referenced this pull request Feb 17, 2021
Debeziumio PoC (#7)

* New DebeziumIO class.

* Merge connector code

* DebeziumIO and MySqlConnector integrated.

* Added FormatFuntion param to Read builder on DebeziumIO.

* Added arguments checker to DebeziumIO.

* Add simple JSON mapper object (#1)

* Add simple JSON mapper object

* Fixed Mapper.

* Add SqlServer connector test

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Fixing MySQL schema DataException

Using file instead of schema should fix it

* MySQL Connector updated from 1.3.0 to 1.3.1

Co-authored-by: osvaldo-salinas <[email protected]>
Co-authored-by: Carlos Dominguez <[email protected]>
Co-authored-by: Carlos Domínguez <[email protected]>

* Add debeziumio tests

* Debeziumio testing json mapper (#3)

* Some code refactors. Use a default DBHistory if not provided

* Add basic tests for Json mapper

* Debeziumio time restriction (#5)

* Add simple JSON mapper object

* Fixed Mapper.

* Add SqlServer connector test

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Added PostgreSql Connector Test

PostgreSql now works with Json mapper

* Fixing MySQL schema DataException

Using file instead of schema should fix it

* MySQL Connector updated from 1.3.0 to 1.3.1

* Some code refactors. Use a default DBHistory if not provided

* Adding based-time restriction

Stop polling after specified amount of time

* Add basic tests for Json mapper

* Adding new restriction

Uses a time-based restriction

* Adding optional restrcition

Uses an optional time-based restriction

Co-authored-by: juanitodread <[email protected]>
Co-authored-by: osvaldo-salinas <[email protected]>

* Upgrade DebeziumIO connector (#4)

* Address comments (Change dependencies to testCompile, Set JsonMapper/Coder as default, refactors) (#8)

* Revert file

* Change dependencies to testCompile
* Move Counter sample to unit test

* Set JsonMapper as default mapper function
* Set String Coder as default coder when using JsonMapper
* Change logs from info to debug

* Debeziumio javadoc (#9)

* Adding javadoc

* Added some titles and examples

* Added SourceRecordJson doc

* Added Basic Connector doc

* Added KafkaSourceConsumer doc

* Javadoc cleanup

* Removing BasicConnector

No usages of this class were found overall

* Editing documentation

* Debeziumio fetched records restriction (#10)

* Adding javadoc

* Adding restriction by number of fetched records

Also adding a quick-fix for null value within SourceRecords
Minor fix on both MySQL and PostgreSQL Connectors Tests

* Run either by time or by number of records

* Added DebeziumOffsetTrackerTest

Tests both restrictions: By amount of time and by Number of records

* Removing comment

* DebeziumIO test for DB2. (#11)

* DebeziumIO test for DB2.

* DebeziumIO javadoc.

* Clean code:removed commented code lines on DebeziumIOConnectorTest.java

* Clean code:removing unused imports and using readAsJson().

Co-authored-by: Carlos Domínguez <[email protected]>

* Debezium limit records (now configurable) (#12)

* Adding javadoc

* Records Limit is now configurable

(It was fixed before)

* Debeziumio dockerize (#13)

* Add mysql docker container to tests

* Move debezium mysql integration test to its own file

* Add assertion to verify that the results contains a record.

* Debeziumio readme (#15)

* Adding javadoc

* Adding README file

* Add number of records configuration to the DebeziumIO component (#16)

* Code refactors (#17)

* Remove/ignore null warnings

* Remove DB2 code

* Remove docker dependency in DebeziumIO unit test and max number of recods to MySql integration test

* Change access modifiers accordingly

* Remove incomplete integration tests (Postgres and SqlServer)

* Add experimenal tag

* Debezium testing stoppable consumer (#18)

* Add try-catch-finally, stop SourceTask at finally.

* Fix warnings

* stopConsumer and processedRecords local variables removed. UT for task stop use case added

* Fix minor code style issue

Co-authored-by: juanitodread <[email protected]>

* Fix style issues (check, spotlessApply) (#19)

Co-authored-by: Osvaldo Salinas <[email protected]>
Co-authored-by: alejandro.maguey <[email protected]>
Co-authored-by: osvaldo-salinas <[email protected]>
Co-authored-by: Carlos Dominguez <[email protected]>
Co-authored-by: Carlos Domínguez <[email protected]>
Co-authored-by: Carlos Domínguez <[email protected]>
Co-authored-by: Alejandro Maguey <[email protected]>
Co-authored-by: Hassan Reyes <[email protected]>

Add missing apache license to README.md

Enabling integration test for DebeziumIO (#20)

Rename connector package cdc=>debezium. Update doc references (#21)

Fix code style on DebeziumIOMySqlConnectorIT
steveniemitz referenced this pull request in twitter-forks/beam Mar 11, 2021
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
steveniemitz referenced this pull request in twitter-forks/beam Apr 26, 2021
Co-authored-by: steve <[email protected]>
Co-authored-by: Kanishk Karanawat <[email protected]>
pabloem pushed a commit that referenced this pull request Jan 21, 2022
Adds ReadChangeStreamPartitionDoFn, which is an SDF to read partitions
from change streams and process them accordingly. This component
receives a change stream name, a partition, a start time and an end time
to query. It then initiates a change stream query with the received
parameters.

Within a change stream, 3 types of records can be received:

1. A Data record
2. A Heartbeat record
3. A Child partitions record

Upon receiving #1, the function updates the watermark with the record's
commit timestamp and emits the record into the output PCollection.
Upon receiving #2, the function updates the watermark with the record's
timestamp, but it does not emit any record into the PCollection.
Upon receiving #3, the function updates the watermark with the record's
timestamp and writes the new child partitions into the metadata table.
These partitions will be later scheduled by the DetectNewPartitions
component.

Once the change stream query for the element partition finishes, it
marks the partition as finished in the metadata table and terminates.
robertwb pushed a commit that referenced this pull request May 6, 2022
@bullet03 bullet03 mentioned this pull request Jul 14, 2022
4 tasks
damccorm pushed a commit that referenced this pull request Sep 2, 2022
* Tour of Beam frontend blank project

* TOBF (ToB frontend): welcome screen (#226)

* theme setup

* Replaced ThemeProvider with ThemeSwitchNotifier

* header with theme mode switcher and logo

* page container with header & footer

* theme mode tests

* renamed the directory to tour-of-beam

* compressed beam_logo.png

* added missing license comments

* rudimentary layout of the first screen

* review comments fixes #1

* moved notifyListeners inside then

* responsive todo

* split into 2 simple functions

* deleted redundant constants &
replaced 2018 text theme with 2021

* styling refinement

* home screen layout

* clickable sign in text

* font weights fix

* removed _getBaseFontTheme function

* fixed border and bg color

* color fixes

* difficulty component

* _LastModuleBody

* todo in test

* footer border

* fixed overflows

* replaced Project prefix with Tob

* replaced then with await

* inferred type

* started translation of the home screen

* sorted translations

* Complexity comments

* comment fixes

* home screen translations

* sign in overlay

* import fix

* integration test does not fail

* playground_components package with
dismissible_overlay

* missing license

* removed _dots from build

* widgets refinement

* renamed home screen to welcome screen

* deleted copyWith

* _SdkButton

* trailing comma & pubspec formatting

* license and lints

* license

* removed license from .metadata

* pubspec formatting

* total lints update

* changed from tour_of_beam to
tour-of-beam in build.gradle.kts

* license check

* _SdkButton mimics Radio button

* renamed MyApp to TourOfBeamApp

* onChanged mimics Radio button

Co-authored-by: darkhan.nausharipov <[email protected]>

* removed whitespace from readme (issue-22583)

* renamed "content" to "child" to mimic widgets

* README in tour-of-beam

* translation path rename,
grey dot with opacity,
footer link text style

* report issue in github, grey dot color

* table instead of row to clip the laptop image

* horizontalHalves & verticalHalves

* cropped laptop image

* row in an expensive intrinsic height

* laptop image in the bottom

* ScreenBreakpoints

* intrinsic height is not needed!

* _WideWelcome and _NarrowWelcome

* draft readme

* blank line in readme

* removed irrelevant info from readme

* removed whitespace

Co-authored-by: Alexey Inkin <[email protected]>
Co-authored-by: darkhan.nausharipov <[email protected]>
Co-authored-by: alexeyinkin <[email protected]>
pabloem pushed a commit that referenced this pull request Sep 24, 2022
* [Tour of Beam][Frontend][#22600] TourScreen layout

* theme setup

* Replaced ThemeProvider with ThemeSwitchNotifier

* header with theme mode switcher and logo

* page container with header & footer

* theme mode tests

* renamed the directory to tour-of-beam

* compressed beam_logo.png

* added missing license comments

* rudimentary layout of the first screen

* review comments fixes #1

* moved notifyListeners inside then

* responsive todo

* split into 2 simple functions

* deleted redundant constants &
replaced 2018 text theme with 2021

* styling refinement

* home screen layout

* clickable sign in text

* font weights fix

* removed _getBaseFontTheme function

* fixed border and bg color

* color fixes

* difficulty component

* _LastModuleBody

* todo in test

* footer border

* fixed overflows

* replaced Project prefix with Tob

* replaced then with await

* inferred type

* started translation of the home screen

* sorted translations

* Complexity comments

* comment fixes

* home screen translations

* sign in overlay

* import fix

* integration test does not fail

* playground_components package with
dismissible_overlay

* missing license

* removed _dots from build

* widgets refinement

* renamed home screen to welcome screen

* deleted copyWith

* _SdkButton

* trailing comma & pubspec formatting

* license and lints

* license

* removed license from .metadata

* pubspec formatting

* total lints update

* changed from tour_of_beam to
tour-of-beam in build.gradle.kts

* license check

* _SdkButton mimics Radio button

* renamed MyApp to TourOfBeamApp

* onChanged mimics Radio button

Tour of Beam frontend blank project

[Tour of Beam][Frontend][#22600] TourScreen layout

TourScreen layout (#22600)

common theme, constants, split view

missing license

flutter_gen, summary layout details

content layout details

no functional widgets in split view

main screen todos & translation

main screen todos & translation

comment fixes #1

ExpansionTileWrapper

SplitViewController

lists in tour screen widgets

comment fixes #1 (31.08)

split view package in PGC

fixed button overflow

splitter theme color

comment fixes #2 (31.08)

gradlew check

welcome screen overflow test (#22600)

SDK dropdown (#22600)

flexible complete unit OutlinedButton (#22600)

renamed PageContainer to TobScaffold

dropdown style refinement

DropdownButton implicit type

sdk instead of e

licenses #22600

renamed _ShrinkedTour to _NarrowTour #22600

tour screen style refinement #22600

BeamDivider in PGC #22600

removed todo, added license #22600

built with text #22600

_WideWelcome with IntrinsicHeight (#22600)

Co-Authored-By: darkhan.nausharipov <[email protected]>

* addressing review comments #22600

replaced magic numbers #22600

comments (#22600)

comments #22600

comments #22600

comments #22600

comments #22600

comments, flutter 3.3.0 upgrade #22600

renamed ActionPadding to ActionVerticalPadding #22600

actions formatting #22600

* branded sign in buttons #22600

* _BrandedSignInButtons #22600

* _Divider color #22600

* profile #22600

* moved split_view from PGC into ToB #22600

* indentation fix #22600

* split ProfileContent into widgets #22600

* Extract playground components to a separate package (#22600)

* Minor fixes (#22600)

* Address review issues (#22600)

* Upgrade Flutter to v3.3.2 (#22600)

* Add precommit Gradle task for playground_components, add code generation to frontend Gradle task, remove generated mocks, fix linter issues (#22600)

* startTour button (#22600)

* lint fixes (#22600)

* Fix highlighting for Python and SCIO (#22600)

Co-authored-by: darkhan.nausharipov <[email protected]>
pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023
lostluck added a commit that referenced this pull request May 29, 2024
* [prism] Add basic processing time queue.

* Initial residual handling refactor.

* Re-work teststream initilization. Remove pending element race.

* touch up

* rm merge duplicate

* Simplify watermark hold tracking.

* First successful run!

* Remove duplicated test run.

* Deduplicate processing time heap.

* rm debug text

* Remove some debug prints, cleanup.

* tiny todo cleanup

* ProcessingTime workming most of the time!

* Some cleanup

* try to get github suite to pass #1

* touch

* reduce counts a bit, filter tests some.

* Clean up unrelated state changes. Clean up comments somewhat.

* Filter out dataflow incompatible test.

* Refine processing time event comment.

* Remove test touch.

---------

Co-authored-by: lostluck <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.