Troubleshooting Guide #176
...igquery/src/main/java/com/google/cloud/flink/bigquery/sink/writer/BigQueryDefaultWriter.java
```diff
@@ -264,7 +264,7 @@ static String timestampRestrictionFromPartitionType(
     // extract a datetime from the value and restrict
     // between previous and next hour
```
I know support for unbounded reads is going to be removed, but this was a very simple and easy change to fix reading of time partitions based on DAY, MONTH, and YEAR.
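For context, a minimal sketch of what a day/month/year-aware restriction could look like; the method, clause format, and partition-type strings here are illustrative assumptions, not the connector's actual code:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch only: restricts a timestamp column to the partition containing
// `value`, for HOUR/DAY/MONTH/YEAR time partitioning.
class PartitionRestrictionSketch {
    static String timestampRestriction(String column, LocalDateTime value, String partitionType) {
        LocalDateTime start;
        switch (partitionType) {
            case "HOUR":
                start = value.withMinute(0).withSecond(0).withNano(0);
                return clause(column, start, start.plusHours(1));
            case "DAY":
                start = value.toLocalDate().atStartOfDay();
                return clause(column, start, start.plusDays(1));
            case "MONTH":
                start = value.toLocalDate().withDayOfMonth(1).atStartOfDay();
                return clause(column, start, start.plusMonths(1));
            case "YEAR":
                start = value.toLocalDate().withDayOfYear(1).atStartOfDay();
                return clause(column, start, start.plusYears(1));
            default:
                throw new IllegalArgumentException("Unsupported partition type: " + partitionType);
        }
    }

    private static String clause(String column, LocalDateTime start, LocalDateTime end) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");
        return String.format(
                "%s >= '%s' AND %s < '%s'", column, fmt.format(start), column, fmt.format(end));
    }
}
```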
...va/com/google/cloud/flink/bigquery/source/reader/deserializer/AvroDeserializationSchema.java
...ogle/cloud/flink/bigquery/source/reader/deserializer/AvroToRowDataDeserializationSchema.java
...om/google/cloud/flink/bigquery/source/reader/deserializer/BigQueryDeserializationSchema.java
TROUBLESHOOT.md
```
- It is also expected that the value passed in the Avro Generic Record follows the Schema.
  Here the “records passed” indicates the modified records after passing through the series of
  subtasks defined in the application pipeline.
```
Not needed
This info is well captured in the preceding and following points.
The value in the Avro record might differ from the Avro schema of the GenericRecord (since Flink does not impose any check on the value of a field). This is the error GMF faced very early on when testing the connector: they accidentally passed an `INTEGER` value in an `ARRAY` type field.
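A minimal sketch of this failure mode, using a hypothetical schema: `GenericData.Record.put` accepts any object, so the bad value only surfaces downstream, for example via Avro's own validation:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;

// Sketch with a hypothetical schema: an ARRAY-typed field is populated with
// an int. put() does not complain; the mismatch is only caught downstream.
public class AvroMismatchSketch {
    public static void main(String[] args) {
        Schema schema = SchemaBuilder.record("Example").fields()
                .name("tags").type().array().items().stringType().noDefault()
                .endRecord();

        GenericData.Record record = new GenericData.Record(schema);
        record.put("tags", 42); // wrong: an int where an array is expected; no error here

        // Avro's validation flags the bad value before it ever reaches the sink.
        boolean valid = GenericData.get().validate(schema, record);
        System.out.println("record valid? " + valid); // prints: record valid? false
    }
}
```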
That is a problem where we must prompt the user to ensure that their destination table matches the schema of the records received by the sink.
The statement we should remove is:

> Here the “records passed” indicates the modified records after passing through the series of
> subtasks defined in the application pipeline.

... because this doesn't prompt a schema check; it just says something fairly obvious that doesn't add much value.
Fair, removed.
89b1e1a to 89050de
```diff
  public AvroDeserializationSchema(String avroSchemaString) {
      this.avroSchemaString = avroSchemaString;
  }

  @Override
- public GenericRecord deserialize(GenericRecord record) throws IOException {
+ public GenericRecord deserialize(GenericRecord record) throws BigQueryConnectorException {
```
You don't need to throw any exception in this method's definition.
Removed.
```java
try {
    if (deserialize != null) {
        out.collect(deserialize);
    }
} catch (Exception e) {
    LOG.error(
            String.format(
                    "Failed to forward the deserialized record %s to the next operator.%nError %s%nCause %s",
                    deserialize, e.getMessage(), e.getCause()));
    throw new BigQueryConnectorException(
            "Failed to forward the deserialized record to the next operator.", e);
}
```
Reduce nesting as much as you can. How about:

```java
if (deserialize == null) {
    return;
}
try {
    out.collect(deserialize);
} catch (Exception e) {
    ...
}
```
Removed.
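For reference, a minimal sketch of the flow after applying the suggestion, combining the early return with the logging and exception wrapping from the hunk above; the host class and method name are hypothetical:

```java
import com.google.cloud.flink.bigquery.common.exceptions.BigQueryConnectorException; // package assumed
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.util.Collector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical host class; only the emit logic is the point here.
class DeserializerEmitSketch {
    private static final Logger LOG = LoggerFactory.getLogger(DeserializerEmitSketch.class);

    void emit(GenericRecord deserialize, Collector<GenericRecord> out) {
        // Early return keeps the happy path flat, as suggested in the review.
        if (deserialize == null) {
            return;
        }
        try {
            out.collect(deserialize);
        } catch (Exception e) {
            LOG.error(
                    String.format(
                            "Failed to forward the deserialized record %s to the next operator.%nError %s%nCause %s",
                            deserialize, e.getMessage(), e.getCause()));
            throw new BigQueryConnectorException(
                    "Failed to forward the deserialized record to the next operator.", e);
        }
    }
}
```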
```java
// Reset the "Since Checkpoint" values to 0.
numberOfRecordsBufferedByBigQuerySinceCheckpoint.dec(
        numberOfRecordsBufferedByBigQuerySinceCheckpoint.getCount());
numberOfRecordsSeenByWriterSinceCheckpoint.dec(
        numberOfRecordsSeenByWriterSinceCheckpoint.getCount());
```
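Side note on the idiom quoted above: Flink's `Counter` interface exposes `inc`/`dec`/`getCount` but no reset, so decrementing by the current count is the usual way to zero a counter. A standalone sketch:

```java
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.SimpleCounter;

// Counter has no reset(), so dec(getCount()) is the idiom for zeroing
// a "since checkpoint" counter at checkpoint time.
public class CounterResetSketch {
    public static void main(String[] args) {
        Counter sinceCheckpoint = new SimpleCounter();
        sinceCheckpoint.inc(42);
        sinceCheckpoint.dec(sinceCheckpoint.getCount()); // back to 0
        System.out.println(sinceCheckpoint.getCount());  // prints 0
    }
}
```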
This change is being tracked in a separate PR. Please remove it from here.
Removed.
TROUBLESHOOT.md
```
- The problem lies with the pipeline, the previous chain of subtasks that are performed before
  sink is called.
- The pipeline is not processing and passing the records forward for the sink.
```
Most likely not an issue in the sink, since previous subtasks are not passing records forward for the sink.
Done.
TROUBLESHOOT.md
```
#### The records are arriving at the sink but not being successfully written to BigQuery.
Check the logs or error message for the following errors:
#### `BigQuerySerializationException`
- This message illustrates that the record(s) could not be serialized by the connector.
```
`record`, not `record(s)`, since the serialization exception for every record will be logged individually.
Done.
TROUBLESHOOT.md
```
Check the logs or error message for the following errors:
#### `BigQuerySerializationException`
- This message illustrates that the record(s) could not be serialized by the connector.
- The error message would also contain the actual cause for the same.
```
Not needed
TROUBLESHOOT.md
```
- Note: This error is not thrown but logged,
  indicating that the connector was "Unable to serialize record" due to this error.
```
> - This error is logged not thrown, explaining why the record could not be serialized.
> - In future, this will be supplemented with dead letter queues.

Also, please mention **logged not thrown** in bold.
Done.
@clmccart Pls review this PR. Thanks!
LGTM. Please merge after @clmccart's approval
Are the changes that aren't the troubleshooting guide supposed to be included in this PR?
TROUBLESHOOT.md
```
### Records are not being written to BigQuery
With the help of metrics available as a part of 0.4.0 release of the connector,
users should be able to track the number of records that enter the sink(writer) and the
number of records successfully written to BigQuery.
```
Let's reference the two metric names here.
Done.
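A minimal sketch of the comparison those counters enable; `numberOfRecordsSeenByWriter` and `numberOfRecordsWrittenToBigQuery` are assumed names modeled on the "SinceCheckpoint" counters in this PR's diff, with `SimpleCounter` standing in for the connector's real metric registration:

```java
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.SimpleCounter;

// Sketch only: the counter names are assumptions, and SimpleCounter stands
// in for counters the connector registers on its metric group.
public class SinkMetricsSketch {
    public static void main(String[] args) {
        Counter numberOfRecordsSeenByWriter = new SimpleCounter();
        Counter numberOfRecordsWrittenToBigQuery = new SimpleCounter();

        // Pretend 100 records entered the sink but only 97 were written.
        numberOfRecordsSeenByWriter.inc(100);
        numberOfRecordsWrittenToBigQuery.inc(97);

        long pending = numberOfRecordsSeenByWriter.getCount()
                - numberOfRecordsWrittenToBigQuery.getCount();
        if (pending > 0) {
            System.out.println(pending + " records entered the sink but are not yet written.");
        }
    }
}
```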
```
users should not face this error.
Users might face this error in case custom serializer is used.

## Known Issues/Limitations
```
Let's also reference the 100 max parallelism here.
We have mentioned this in the README; should we mention it here as well?
@clmccart Yep, some error messages documented in the troubleshooting guide and minor bugs are fixed as a part of this PR as well.
/gcbrun