-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-10137] Add KinesisIO for cross-language usage with python wrapper #12297
Conversation
ab6da5a
to
b0afbe6
Compare
R: @TheNeuralBit Could I ask you for review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for doing this out of order! I meant to look at Write but I got halfway through reviewing the Read code before realizing my mistake.
I have a few initial comments
...a/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisReadTransformRegistrar.java
Outdated
Show resolved
Hide resolved
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
Outdated
Show resolved
Hide resolved
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
Outdated
Show resolved
Hide resolved
2c1c700
to
613cf8a
Compare
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
Show resolved
Hide resolved
4786d19
to
24e2a7f
Compare
@TheNeuralBit I've moved all commits to this PR. I have a feeling that big PR is better than few smaller PRs dependent of each other - they are much less maintainable. I'm used to Gerrit where each dependent commit is a PR rebased on the previous commit. I tried to use testcontainers with localstack but I encountered a problem that I can create, read and write streams via boto3, but I'm unable to use it with Beam. What I found is that kinesis java sdk by default uses CBOR which can be disabled by executing java program with kinesis sdk with executing it with -Dcom.amazonaws.sdk.disableCertChecking=true. But I don't have an idea how to do that in beam's execution model - do you have any idea or suggestion how could I do that? In general the tests before introducing localstack work correctly. Commits after the localstack introduction don't work, I just put vanilla kinesis test to indicate that the container seems to work properly. Edit: write test is now working after adding disabling certificate verification option. |
bc57f6d
to
ee2e78b
Compare
@TheNeuralBit I did my best to make read test work but without success. I tried what people suggested here localstack/localstack#507 and https://stackoverflow.com/a/56867870/7946496 but nothing worked. When I use http in service_endpoint, which works for boto3 python sdk I get error: When I use https in the transform argument service_endpoint I get Setting these values in the main method of expansion service also had no effect.
So I'm a bit stuck. How about skipping the read test and creating a Jira issue to make it work? I think that it's possible to run this test with localstack but I'm missing something. I've spent a lot of time on it already and I doubt I'll come up with something. |
aa6090d
to
18b5475
Compare
@TheNeuralBit Sorry for lots of messages but every day brings something new. I've managed to modify existing KinesisIOIT to work with localstack (I'll push it in a separate PR). I've also managed to run cross-language read test. But it required putting this code anywhere in harness code, which is definitely not any solution because it affects whole beam:
I have no idea how to dynamically inject these except for creating custom beam java sdk docker image with modified harness code especially for this (read) test. I don't think it's an acceptable solution though. Do you possibly have a better idea? If this is the only thing we could do I'd just skip read test and just execute only write instead in the postcommit. |
ac912f2
to
9145e29
Compare
7bf5e80
to
42c62b1
Compare
@chamikaramj Done |
Codecov Report
@@ Coverage Diff @@
## master #12297 +/- ##
==========================================
- Coverage 40.22% 40.10% -0.13%
==========================================
Files 454 455 +1
Lines 53669 54025 +356
==========================================
+ Hits 21587 21665 +78
- Misses 32082 32360 +278
Continue to review full report at Codecov.
|
Run Python 3.8 PostCommit |
Run Typescript PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
We can merge when tests pass.
Run Python PreCommit |
Run Typescript PreCommit |
Run Python PreCommit |
1 similar comment
Run Python PreCommit |
Thanks for make it finally merged. |
I'm getting this error when trying
Seems like beam-sdks-java-io-kinesis-expansion-service no longer exists on Maven? |
@yuanhunglo ReadDataFromKinesis transform will be available in the 2.25.0 release. It was merged too late to make it to the 2.24.0. If you want to run it using 2.25 code then you need to build the expansion service using
Then ReadDataFromKinesis should be able to detect the expansion service jar in its default path |
I tried that and received this error. I'll wait till 2.25.0 release to try again. Thanks!
|
Sorry, I forgot there is some additional setup needed to run it locally. You also need to build the jar of the runner you are using - example for Flink (you don't need it for DirectRunner)
Current python container:
Building java sdk container should not be needed - pulling the latest (2.24) from docker hub should be sufficient. But to be on the safe side:
It will definitely be the easiest to wait for 2.25 but on the other hand it's a long time. |
This transform sends bytes stream instead of the whole KinesisRecord
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.