Destination Oracle: change table limit and format record data #5542
Conversation
@@ -150,8 +152,9 @@ private static void insertRawRecordsInSingleQuery(String tableName,
    int i = 1;
    for (final AirbyteRecordMessage message : records) {
      // 1-indexed
      final JsonNode formattedData = StandardNameTransformer.formatJsonPath(message.getData());
this is modifying the data itself. Is there any way around this?
agreed with sherif. the name transformer was intended to transform "names" (e.g. table names and column names). in general we want to avoid modifying the data if at all possible.
I'm doing the same as in the BigQuery destination (this changes the JSON path to remove any unsupported char in the stream name; it doesn't change the original data):
Lines 124 to 129 in 1954284
final JsonNode formattedData = StandardNameTransformer.formatJsonPath(recordMessage.getData());
return Jsons.jsonNode(ImmutableMap.of(
    JavaBaseConstants.COLUMN_NAME_AB_ID, UUID.randomUUID().toString(),
    JavaBaseConstants.COLUMN_NAME_DATA, Jsons.serialize(formattedData),
    JavaBaseConstants.COLUMN_NAME_EMITTED_AT, formattedEmittedAt));
}
I updated the comment in the PR, sorry for the confusion: Oracle 19 json_value doesn't allow ' or " in keys, so we need to clean the key names of some records using a formatting function. Example: column`_'with"_quotes becomes column___with__quotes.
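The cleaning rule described here can be sketched in Java. The class and method names below are hypothetical, not the actual StandardNameTransformer API:

```java
// Hypothetical sketch of the key-cleaning rule: every character that Oracle 19's
// json_value cannot handle in a key (backticks, quotes, etc.) is replaced with
// an underscore. Names are illustrative, not the real transformer.
public class KeyFormatter {

  // Replace every character outside [A-Za-z0-9_] with '_'.
  static String formatKey(final String key) {
    return key.replaceAll("[^A-Za-z0-9_]", "_");
  }

  public static void main(final String[] args) {
    // column`_'with"_quotes  ->  column___with__quotes
    System.out.println(formatKey("column`_'with\"_quotes"));
  }
}
```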
Example of AIRBYTE_DATA saved using this function. The original data is kept, but to run normalization I need a "cleaned" field name:
'{"date":"2020-08-29T00:00:00Z",
"partition":{"DATA":[{"currency":"EUR"}],
"double_array_data":[[{"id":"EUR"}]],
"column___with__quotes":[{"currency":"EUR"}],     -- formatted
"column`_\'with\\"_quotes":[{"currency":"EUR"}]}, -- original
"id":4.2}'
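A minimal sketch of how a record could carry both keys at once, matching the example above. It uses plain java.util.Map instead of Jackson's JsonNode, and the names are assumptions, not the real formatJsonPath implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: keep the original key and add a cleaned duplicate so that
// normalization (json_value / dbt) has a usable field name. This mimics the
// behavior shown in the AIRBYTE_DATA example; it is not the real code.
public class DualKeySketch {

  static Map<String, Object> withFormattedKeys(final Map<String, Object> node) {
    final Map<String, Object> out = new LinkedHashMap<>();
    for (final Map.Entry<String, Object> e : node.entrySet()) {
      final String cleaned = e.getKey().replaceAll("[^A-Za-z0-9_]", "_");
      if (!cleaned.equals(e.getKey())) {
        out.put(cleaned, e.getValue()); // "formatted" copy for normalization
      }
      out.put(e.getKey(), e.getValue()); // original data is preserved
    }
    return out;
  }
}
```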
fair enough. this feels like a weird sacrifice we are making for dbt.
This is something needed for the destination itself, because the source is writing a JSON key that isn't supported.
Removes support for Oracle 11g in destination.
👍 Fine with me. Is there some way we can check the version of the database at runtime, e.g. during check connection? If so, can we throw a useful error message if the user is on a version of the DB that is too old?
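One possible shape for that runtime check, using the standard JDBC metadata API. The cutoff version and the error text are assumptions, not part of this PR:

```java
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of a version gate that could run during "check connection".
// DatabaseMetaData.getDatabaseMajorVersion() is standard JDBC; the cutoff (12)
// matches the versions this PR intends to support.
public class OracleVersionCheck {

  static boolean isSupported(final int majorVersion) {
    return majorVersion >= 12;
  }

  static void assertSupportedVersion(final Connection conn) throws SQLException {
    final int major = conn.getMetaData().getDatabaseMajorVersion();
    if (!isSupported(major)) {
      throw new IllegalStateException(
          "Oracle " + major + " is not supported: this destination requires Oracle 12 or newer "
              + "(json_value and long identifiers are unavailable in 11g).");
    }
  }
}
```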
What does this do to our tests? The only public container that exists on Dockerhub is Oracle 11g (don't know how...). Oracle lets you download the Oracle 18 or 19 image from their registry, which could be a way to run integration tests. The problem is that this container takes ~45 min to spin up, which can be really complicated. A viable solution is to have a cloud instance, similar to Redshift, to help us with integration tests.
I don't understand what you're saying here. Is it possible to use a container with a newer version of Oracle that is faster to start up? Or are you saying that even the newer versions are really slow? I am fine with us using newer versions of Oracle DB in the tests if it allows us to use a Docker container that is faster to spin up.
@cgardens sorry. The container we use today (Oracle 11g) is pretty fast to spin up. The problem is with Oracle 18/19, which takes ~45 min to set up (the newest versions...)
That is awful! 😵 Yeah, I guess if that's the case we need to just have an Oracle DB that we keep running in our integration test environment. @sherifnada is that the right way to do it, or is there something better?
@cgardens yup we should keep one running encoded via terraform |
/test connector=connectors/destination-oracle
database.query(ctx -> {
  ctx.execute("alter database default tablespace users");
  return null;
});
I removed this because it requires a very strong privilege to execute. I'm running an AWS instance where I can change the permissions of my admin user.
yeah. i think that's okay.
/publish connector=connectors/destination-oracle
* change oracle dest connector
* remove container to run tests, add dependency to secrets/config.json
* correct functions in tests
* gradlew format
What
Closes #5498
Today we support Oracle version 11 as the minimum for both source and destination.
Unfortunately, Oracle 11 doesn't have the
json_value
function or support JSON objects, which makes normalization really hard to implement. Suggestion: remove support for Oracle 11g in the destination. The EOL for Oracle 11g was Dec 2020.
What does this do to our tests? The only public container that exists on Dockerhub is Oracle 11g (don't know how...). Oracle lets you download the Oracle 18 or 19 image from their registry, which could be a way to run integration tests. The problem is that this container takes ~45 min to spin up, which can be really complicated. A viable solution is to have a cloud instance, similar to Redshift, to help us with integration tests.
This modification will support Oracle 12c+ (the first version that supports 128-char table names and JSON objects). The suggestion to use Oracle 19 in tests is because this version is available in the Oracle Registry / AWS RDS.
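For context on the "table limit" part of the PR title: older Oracle versions cap identifier names at 30 bytes, while newer versions (12.2+) allow 128. A trivial sketch of clamping a generated table name to whichever limit applies; the helper name is hypothetical:

```java
// Illustrative helper (hypothetical name): truncate a generated table name to
// the active identifier limit -- 30 bytes on older Oracle, 128 on the newer
// versions this destination now targets.
public class TableNameLimit {

  static String truncate(final String name, final int maxLen) {
    return name.length() <= maxLen ? name : name.substring(0, maxLen);
  }
}
```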
How
Oracle 19 json_value doesn't allow ' or " in keys, so we need to clean the key names of some records using a formatting function. Example: column`_'with"_quotes becomes column___with__quotes.
Recommended reading order
x.java
y.python
Checklist