[BEAM-50] Implement BigQueryIO.Write as a custom sink. #48

Closed
wants to merge 1 commit into from

@peihe peihe commented Mar 15, 2016

The commit "Move tempLocation to PipelineOptions." has been sent out as a separate pull request.
I am including it here to serve as the diff base, and will remove it once the other pull request is merged.

@peihe peihe changed the title Implement BigQueryIO.Write as a custom sink. [BEAM-50] Implement BigQueryIO.Write as a custom sink. Mar 15, 2016
@davorbonaci
Member

R: @dhalperi

@peihe peihe force-pushed the custom-bigquery-sink branch from 467a924 to d252f4c Compare March 16, 2016 23:38
@dhalperi
Contributor

@peihe Can you rebase? I don't want GitHub to lose all comments across the rename.

@peihe
Contributor Author

peihe commented Mar 28, 2016

@dhalperi
Done, merged with master.

@@ -778,6 +786,7 @@ private Bound(String name, TableReference ref,
this.createDisposition = createDisposition;
this.writeDisposition = writeDisposition;
this.validate = validate;

Contributor

drop?

Contributor Author

done

@dhalperi
Contributor

Overall, looks pretty good. I'd like to see some more comments -- lots of things going on in this file -- and more sanity checks.

I'd also like to see some end-to-end tests of BigQueryIO.Write, e.g., verifying that the data is actually written.
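
One way to get this kind of coverage without touching a real BigQuery backend is to put the backend behind a small interface and substitute an in-memory fake in tests. The sketch below is only illustrative: `LoadService`, `InMemoryLoadService`, and `writeRows` are hypothetical names, not the `BigQueryServices` API from this pull request, and rows are modeled as plain JSON strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: every name here is hypothetical, not taken from the pull request. */
final class FakeLoadServiceSketch {

  /** Hypothetical seam between the write path and the storage backend. */
  interface LoadService {
    void load(String table, List<String> jsonRows);
  }

  /** In-memory fake that records the rows loaded into each table. */
  static final class InMemoryLoadService implements LoadService {
    private final Map<String, List<String>> tables = new HashMap<>();

    @Override
    public void load(String table, List<String> jsonRows) {
      tables.computeIfAbsent(table, t -> new ArrayList<>()).addAll(jsonRows);
    }

    /** Returns a copy of the rows written to the table, or an empty list if none were written. */
    List<String> rowsIn(String table) {
      return new ArrayList<>(tables.getOrDefault(table, new ArrayList<>()));
    }
  }

  /** Stand-in for the write path under test; it simply forwards rows to the injected service. */
  static void writeRows(String table, List<String> jsonRows, LoadService service) {
    service.load(table, jsonRows);
  }
}
```

A test can then call `writeRows` with an `InMemoryLoadService` and assert on `rowsIn(...)`, which checks that written rows actually reach the backend seam.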

@peihe peihe force-pushed the custom-bigquery-sink branch 2 times, most recently from a7d6519 to 0141f7f Compare March 30, 2016 20:15
BackOff backoff = new AttemptBoundedExponentialBackOff(
MAX_LOAD_JOB_RPC_ATTEMPTS, INITIAL_LOAD_JOB_RPC_BACKOFF_MILLIS);
String retryingJobId = jobId + "-" + i;
insertLoadJob(retryingJobId, loadConfig, Sleeper.DEFAULT, backoff);
Contributor

Confirming: this will block until the load job itself succeeds, right?

Contributor Author

No.
pollJobStatus() blocks until the job is done (SUCCEEDED or FAILED) or until it exceeds the maximum number of retries (UNKNOWN).

Contributor

Sorry, I meant finishes. Thanks.

Contributor

log job ID -- attempting to create BigQuery load job?
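
The blocking behaviour described in this thread (wait until the job is SUCCEEDED or FAILED, otherwise give up with UNKNOWN after a bounded number of polls) can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the `pollJobStatus()` from the pull request: the `Supplier`-based state lookup, the `Sleeper` interface defined here, the `null` meaning "still running", and the doubling backoff are all hypothetical choices made for illustration.

```java
import java.util.function.Supplier;

/** Illustrative only: a simplified stand-in, not the pollJobStatus() from the pull request. */
final class JobPollingSketch {

  enum Status { SUCCEEDED, FAILED, UNKNOWN }

  /** Hypothetical hook so callers (and tests) control how waiting happens. */
  interface Sleeper {
    void sleep(long millis);
  }

  /** Default sleeper backed by Thread.sleep; restores the interrupt flag if interrupted. */
  static final Sleeper THREAD_SLEEPER = millis -> {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  };

  /**
   * Polls fetchState until it reports SUCCEEDED or FAILED, doubling the wait between polls.
   * A null result is treated as "still running". Returns UNKNOWN after maxAttempts polls.
   */
  static Status pollJobStatus(
      Supplier<Status> fetchState, int maxAttempts, long initialBackoffMillis, Sleeper sleeper) {
    long backoffMillis = initialBackoffMillis;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Status state = fetchState.get();
      if (state == Status.SUCCEEDED || state == Status.FAILED) {
        return state;
      }
      if (attempt < maxAttempts) {
        // Wait before the next poll; no wait is needed after the final attempt.
        sleeper.sleep(backoffMillis);
        backoffMillis *= 2;
      }
    }
    return Status.UNKNOWN;
  }
}
```

Injecting the sleeper keeps tests fast, since a test can record the requested waits instead of actually sleeping.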

@peihe peihe force-pushed the custom-bigquery-sink branch from 0141f7f to b70452b Compare March 31, 2016 00:19
CreateDisposition createDisposition,
String tempFile,
Coder<TableRow> coder) {
super(tempFile, "" /* extension */);
Contributor

why not json as extension?

Contributor Author

.avro, right?
done

} catch (IOException e) {
throw new RuntimeException(e);
}
}
Contributor

nice

@peihe
Contributor Author

peihe commented Apr 7, 2016

Ready for another look.
(Travis fails for TextIOTest, seems unrelated)

CreateDisposition createDisposition, WriteDisposition writeDisposition,
boolean validate, BigQueryServices testBigQueryServices) {
super(name);
this.jsonTableRef = jsonTableRef;
this.tableRefFunction = tableRefFunction;
this.schema = schema;
this.jsonSchema = jsonSchema;
this.createDisposition = createDisposition;
this.writeDisposition = writeDisposition;
this.validate = validate;
Contributor

Add checkNotNull calls for all non-@Nullable fields.

Contributor

That the tests now fail is a symptom, I think, that we did not set the defaults in the initial constructor. But feel free to revert if it's not easy to fix, as this is only a minor user win.
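
As a rough illustration of the null checks requested in this thread, a constructor can validate each required argument as it is assigned. The `checkNotNull` mentioned above is presumably Guava's `Preconditions.checkNotNull`; this sketch uses the JDK's `Objects.requireNonNull` instead so that it has no dependencies. The class name is hypothetical, the fields are simplified to `String`, and which fields are nullable is an assumption rather than something the pull request states.

```java
import java.util.Objects;

/** Illustrative only: field names echo the snippet above; nullability here is an assumption. */
final class BoundConfigSketch {
  final String jsonSchema;        // assumed nullable for this sketch
  final String createDisposition; // assumed required
  final String writeDisposition;  // assumed required

  BoundConfigSketch(String jsonSchema, String createDisposition, String writeDisposition) {
    // Nullable field: stored as-is, with no check.
    this.jsonSchema = jsonSchema;
    // Required fields: fail fast, naming the missing argument in the exception message.
    this.createDisposition = Objects.requireNonNull(createDisposition, "createDisposition");
    this.writeDisposition = Objects.requireNonNull(writeDisposition, "writeDisposition");
  }
}
```

With checks like these, a constructor that forgets to supply a default fails immediately at construction time rather than later in the pipeline, which is consistent with the test failures discussed above.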

@dhalperi
Contributor

dhalperi commented Apr 7, 2016

  1. If you sync and push, Travis and Jenkins should pass
  2. LGTM

I would, however, like you to manually disable the backend IO translation for the Dataflow service and run the 1T import integration test (running it manually is fine for now).

That reasonable?

@peihe peihe force-pushed the custom-bigquery-sink branch from 72c1fe7 to 24fc00e Compare April 8, 2016 17:56
@peihe
Contributor Author

peihe commented Apr 11, 2016

Verified with 0M, 1M, 1G, 1T bq imports.
Ready to merge

Thanks @dhalperi

peihe added a commit to peihe/DataflowJavaSDK that referenced this pull request Apr 11, 2016
Create BigQueryServices interface and added BigQueryIO pipeline tests.
Removed BigQueryIO.Write evaluator.
@peihe peihe force-pushed the custom-bigquery-sink branch from a63b339 to 26a4eb1 Compare April 11, 2016 21:32
@asfgit asfgit closed this in 936d028 Apr 11, 2016
@davorbonaci
Member

Yay!

peihe added a commit to peihe/DataflowJavaSDK that referenced this pull request Apr 13, 2016
peihe added a commit to peihe/DataflowJavaSDK that referenced this pull request Apr 13, 2016
dhalperi pushed a commit to GoogleCloudPlatform/DataflowJavaSDK that referenced this pull request Apr 14, 2016
dhalperi pushed a commit to GoogleCloudPlatform/DataflowJavaSDK that referenced this pull request Apr 14, 2016
@peihe peihe deleted the custom-bigquery-sink branch April 15, 2016 18:55
peihe added a commit to peihe/DataflowJavaSDK that referenced this pull request Apr 26, 2016
4 participants