🐛 Destination Postgres: fix \u0000(NULL) value processing #5336

Merged
merged 13 commits into from
Aug 30, 2021

Conversation

DoNotPanicUA
Contributor

@DoNotPanicUA DoNotPanicUA commented Aug 11, 2021

What

Fix #3476

In addition, this PR does a small rework of jdbc-destination and moves the Postgres-specific implementation into PostgresSqlOperations, which addresses this existing TODO:
// todo (cgardens) - move this into a postgres version of this. this syntax is postgres-specific

How

Remove every \u0000 (NULL) Unicode character from Airbyte data messages.
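
As a rough illustration of the idea (not the code merged in this PR), the sketch below strips the escaped NULL sequence from an already-serialized JSON string using only the Java standard library. The class and method names are hypothetical.

```java
public final class NullEscapeStripper {

    // The Java literal below is the regex for a literal backslash followed by "u0000",
    // which is how a NULL character appears inside serialized JSON text.
    static String stripNullEscapes(String serializedJson) {
        return serializedJson.replaceAll("\\\\u0000", "");
    }

    public static void main(String[] args) {
        // Serialized JSON whose string value contains the six-character NULL escape.
        String json = "{\"name\":\"ab\\u0000cd\"}";
        System.out.println(stripNullEscapes(json)); // prints {"name":"abcd"}
    }
}
```

One caveat with a plain string replace: if the JSON contains an escaped backslash followed by the text u0000, the replace removes part of that escape and can leave invalid JSON, so a node-level cleanup may be safer.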

Recommended reading order

  1. PostgresSqlOperations.java
  2. others

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions
  • Connector version bumped like described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Aug 11, 2021
@sherifnada sherifnada requested review from subodh1810 and removed request for jrhizor, ChristopheDuong and cgardens August 12, 2021 04:32
@sherifnada
Contributor

@subodh1810 would you be able to take a look at this one?

@subodh1810
Contributor

I could not look into it today; I will look into it first thing tomorrow.

Contributor

@subodh1810 subodh1810 left a comment

I have 1 comment.

private List<AirbyteRecordMessage> formatRecords(List<AirbyteRecordMessage> records) {
  // Postgres fails if json contains \u0000 unicode (NULL) in a json.
  records.forEach(airbyteRecordMessage -> airbyteRecordMessage
      .setData(Jsons.deserialize(Jsons.serialize(airbyteRecordMessage.getData()).replaceAll("\\\\u0000", ""))));
Contributor

This looks fine, but my concern is that running Jsons.deserialize(Jsons.serialize(...)) for each record here is going to have a performance impact. How about we move this to BufferedStreamConsumer? We already do a string conversion there, so it would save us from doing the serialization twice.
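
One way to read this suggestion, sketched with hypothetical names (this is not the actual BufferedStreamConsumer API): the consumer already holds each record as a serialized string, so a destination-specific adapter can clean that string in a single pass before buffering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public final class SingleSerializationSketch {

    /** Hypothetical stand-in for a buffering consumer that receives records as strings. */
    static final class BufferingConsumer {
        private final UnaryOperator<String> adapter;
        private final List<String> buffer = new ArrayList<>();

        BufferingConsumer(UnaryOperator<String> adapter) {
            this.adapter = adapter;
        }

        // The record arrives already serialized, so cleaning it costs one string pass
        // instead of an extra serialize/deserialize round trip per record.
        void accept(String serializedRecord) {
            buffer.add(adapter.apply(serializedRecord));
        }

        List<String> buffered() {
            return List.copyOf(buffer);
        }
    }

    // Removes the escaped NULL sequence (literal backslash followed by "u0000").
    static String stripNullEscapes(String serializedJson) {
        return serializedJson.replaceAll("\\\\u0000", "");
    }

    public static void main(String[] args) {
        // Only the Postgres-like consumer gets the cleanup; other destinations pass through.
        BufferingConsumer postgres = new BufferingConsumer(SingleSerializationSketch::stripNullEscapes);
        BufferingConsumer other = new BufferingConsumer(UnaryOperator.identity());

        String record = "{\"v\":\"a\\u0000b\"}";
        postgres.accept(record);
        other.accept(record);

        System.out.println(postgres.buffered());
        System.out.println(other.buffered());
    }
}
```

Passing the adapter in from the destination keeps the cleanup out of destinations that do not need it.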

Contributor Author

Good point! Thanks ;)

@subodh1810 subodh1810 self-requested a review August 24, 2021 18:17
@AllanSituma

Do we have a rough timeline for when this will be ready? I am currently experiencing this issue and can't replicate a very important table. Thanks.

@subodh1810
Contributor

subodh1810 commented Aug 25, 2021

/test connector=destination-postgres

🕑 destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1167618494
✅ destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1167618494

@subodh1810
Contributor

@DoNotPanicUA can you trigger the tests for all the connectors that use BufferedStreamConsumer.java, like I did here, so that we are sure nothing breaks?
#5336 (comment)

@jrhizor jrhizor temporarily deployed to more-secrets August 25, 2021 18:16 Inactive
adaptValueNodes(null, rootNode, null);
}

private void adaptValueNodes(String fieldName, JsonNode node, JsonNode parentNode) {
Contributor

I am not sure I follow the logic of this method. When can node.isValueNode() be true? Also, I don't like that fieldName can be null and we don't have a null check.

Contributor Author

If an element contains a value, it's a value node. An element might also be an array or an object.
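
To make the value/array/object distinction concrete, here is a minimal sketch of a recursive walk. It is not the PR's adaptValueNodes: it uses plain Java collections as a stand-in for Jackson's JsonNode (Map for an object, List for an array, anything else for a value), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class ValueNodeWalkSketch {

    // The NULL character itself (code point 0), built without a unicode escape.
    private static final String NUL = String.valueOf((char) 0);

    /** Returns a cleaned copy of a JSON-like tree with NULL characters removed from strings. */
    static Object adapt(Object node) {
        if (node instanceof Map) {
            // Object node: recurse into every field value.
            Map<String, Object> cleaned = new LinkedHashMap<>();
            for (Map.Entry<?, ?> field : ((Map<?, ?>) node).entrySet()) {
                cleaned.put(String.valueOf(field.getKey()), adapt(field.getValue()));
            }
            return cleaned;
        }
        if (node instanceof List) {
            // Array node: recurse into every element.
            List<Object> cleaned = new ArrayList<>();
            for (Object element : (List<?>) node) {
                cleaned.add(adapt(element));
            }
            return cleaned;
        }
        if (node instanceof String) {
            // Textual value node: this is the only place a NULL character can hide.
            return ((String) node).replace(NUL, "");
        }
        // Other value nodes (numbers, booleans, null) are returned unchanged.
        return node;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "ab" + NUL + "cd");
        record.put("tags", List.of("x" + NUL, 1));
        System.out.println(adapt(record)); // prints {name=abcd, tags=[x, 1]}
    }
}
```

Because this version returns a cleaned copy, it does not need a field name or parent node argument, so the root call has no null placeholders.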

Contributor

@DoNotPanicUA can you put a comment on this method explaining how it works, so that it's clear for anyone reading this code for the first time?

  // TODO Truncate json data instead of throwing whole record away?
  // or should we upload it into a special rejected record folder in s3 instead?
  var emittedAt = Timestamp.from(Instant.ofEpochMilli(recordMessage.getEmittedAt()));
- pairToCopier.get(pair).write(id, data, emittedAt);
+ pairToCopier.get(pair).write(id, Jsons.serialize(recordMessage.getData()), emittedAt);
Contributor

This is a problem: we are now going to do Jsons.serialize twice for Redshift, first in the isValidData method and then here.

Contributor Author

Currently, it's designed in a way that doesn't leave us many options. The original implementation does an additional serialization for all destinations. Here we have one additional serialization for one destination, and only for the Copy flow, so this is already a significant improvement.
I propose creating a new issue for the Redshift destination improvement in order to unblock the Postgres issue.

Contributor

Cool! Please create a follow-up issue to resolve this.

Contributor

@subodh1810 subodh1810 left a comment

Please add a comment explaining the logic of the adaptValueNodes method.

@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/test connector=destination-postgres

🕑 destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1182952718
✅ destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1182952718

@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/test connector=destination-meilisearch

🕑 destination-meilisearch https://github.com/airbytehq/airbyte/actions/runs/1182952917
❌ destination-meilisearch https://github.com/airbytehq/airbyte/actions/runs/1182952917

@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/test connector=destination-mssql

🕑 destination-mssql https://github.com/airbytehq/airbyte/actions/runs/1182953304
✅ destination-mssql https://github.com/airbytehq/airbyte/actions/runs/1182953304

@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/test connector=destination-mysql

🕑 destination-mysql https://github.com/airbytehq/airbyte/actions/runs/1182954038
✅ destination-mysql https://github.com/airbytehq/airbyte/actions/runs/1182954038

@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/test connector=destination-oracle

🕑 destination-oracle https://github.com/airbytehq/airbyte/actions/runs/1182954624
❌ destination-oracle https://github.com/airbytehq/airbyte/actions/runs/1182954624

@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/test connector=destination-redshift

🕑 destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1182954911
✅ destination-redshift https://github.com/airbytehq/airbyte/actions/runs/1182954911

@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/test connector=destination-snowflake

🕑 destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/1182955243
✅ destination-snowflake https://github.com/airbytehq/airbyte/actions/runs/1182955243

@jrhizor jrhizor temporarily deployed to more-secrets August 30, 2021 16:41 Inactive
@DoNotPanicUA
Contributor Author

DoNotPanicUA commented Aug 30, 2021

/publish connector=connectors/destination-postgres

🕑 connectors/destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1183254176
✅ connectors/destination-postgres https://github.com/airbytehq/airbyte/actions/runs/1183254176

Development

Successfully merging this pull request may close these issues.

\u0000 cannot be converted to text - Postgres Destination
7 participants