-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Oracle connector data type mappings #3838
Fix Oracle connector data type mappings #3838
Conversation
570b3d9
to
64951f6
Compare
@@ -215,10 +213,10 @@ public WriteMapping toWriteMapping(ConnectorSession session, Type type) | |||
return WriteMapping.longMapping("number(19,0)", bigintWriteFunction()); | |||
} | |||
if (type instanceof TimestampType) { | |||
return WriteMapping.longMapping(format("timestamp(%s)", timestampDefaultPrecision), timestampWriteFunction(session)); | |||
return WriteMapping.longMapping("timestamp(3)", timestampWriteFunction(session)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 3.. oracle default is 6.
we shouldn't carry a limitation of presto timestamp to oracle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timestamp(6)
will lead to incorrect results (different results when predicate is pushed down and when it is not)
Anyway, this will be addressed soon the right way with the ongoing work at #1284
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so any table create outside presto will have that problem (it will be created as timestamp(6))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Predicate pushdown seems like an unrelated issue. It should work correctly regardless of the precision in Presto vs Oracle. We can't depend on the types matching.
But removing precision here and using timestamp(3)
is the right behavior as that will continue working after we support precision for timestamp in Presto (timestamp
will likely mean timestamp(3)
for compatibility reasons).
oracleServer.execute("CREATE TABLE test (num1 number, num2 number(*,-2))"); | ||
oracleServer.execute("INSERT INTO test VALUES (12345678901234567890.12345678901234567890123456789012345678, 1234567890.123)"); | ||
assertQuery("SELECT * FROM test", "VALUES (12345678901234567890.1234567890, 1234567900.0000000000)"); | ||
oracleServer.execute("CREATE TABLE test (num1 number)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test of negative scale will be moved to another side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are tests for negative scale in testNumberNegativeScaleReadMapping
.
This specific case which was tested here is not supported as it would require decimal precision of 40 in Presto, while up to 38 is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really not, as -2 will affect the first part..
this 1234567890 will be saved as 1234567900
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that Oracle will allow for numbers with up to 38 meaningful digits for types with negative scale (in your example meaningful digits are 12345679).
The number of total digits (including obvious trailing zeroes) larger.
E.g. for `NUMBER(*,-2) total number of digits in represented values can be 40.
If you run such query in Oracle:
SELECT CAST(1234567890123456789012345678901234567890 AS NUMBER(*,-2)) FROM dual;
It will work fine and return 40 digit number 1234567890123456789012345678901234567900
.
As we do not support negative scale in Presto, we need to be able to represent the value, including trailing zeroes. Hence the limitation.
@@ -68,17 +67,4 @@ public OracleConfig setNumberRoundingMode(RoundingMode numberRoundingMode) | |||
this.numberRoundingMode = numberRoundingMode; | |||
return this; | |||
} | |||
|
|||
@Min(4000) | |||
public int getVarcharMaxSize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
starting on oracle 12c varchar could be increased from 4000 to 32767 bytes (any value between) using Extended Data Types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reintroduced the config parameter. Just changed the name slightly, so it is obvious that limit is about number of bytes. Not characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be added the @max(32767)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually can it be something in between 4000 and 32767. Or just 4000 or 32767?
If latter is the case maybe we should make the config parameter oracle.enable-extended-datatypes=true/false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@losipiuk the true/false may work, checking the documentation to activate that is used
ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE;
that change this values:
VARCHAR2 : 4000 bytes
NVARCHAR2 : 4000 bytes
RAW : 2000 bytes
to
VARCHAR2 : 32767 bytes
NVARCHAR2 : 32767 bytes
RAW : 32767 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why exactly do we have this setting? What would be the downside to having this hard-coded, at 4000 (safe value)?
(since this isn't obvious, we should capture the reasoning in the code comment at least)
If desired, in the future, we could try to detect server configuration and adapt, without user needing to set anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda agree with what @findepi wrote. @eskabetxe do you feel strongly about the functionality and can provide rationale? If not I would simplify configuration (after all, the lest configuration switches the better) and live with 4000 for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR removing oracle.varchar2.max-bytes
. We may discuss later on if we want to add it (in this or other form) as a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi this allow you to use the extended data types added on 12c
you can use the hard-coded value, but if the database have the extended version activated and on creation for example you create a table with varchar (5000), the connector will create a nlob field (because 5000 > than 4000 limit)
we can address this after but this was already on connector, we are adding a limitation that previous not exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we all understand what having a parameter changes in logic for CREATE TABLE
issued from Presto to create Oracle table. The question is, is the benefit worth added complexity and extra configuration parameter. As @findepi mentioned above it can be improved to support higher limits without explicit configuration parameter. Also I believe that typical usecase for Oracle connector is to query tables created in Oracle already. Not to create new ones.
8b2954a
to
19154cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the comment from @eskabetxe about the removed negative precision test
@@ -27,7 +27,7 @@ | |||
implements AutoCloseable | |||
{ | |||
private static final SecureRandom random = new SecureRandom(); | |||
private static final int RANDOM_SUFFIX_LENGTH = 12; | |||
private static final int RANDOM_SUFFIX_LENGTH = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only leaves 36^5 = 60466176
maximum values. Also, the choice of this value seems arbitrary. We should probably have a maximum length and validate that the prefix+suffix is under the length. But we can follow up on that.
@@ -215,10 +213,10 @@ public WriteMapping toWriteMapping(ConnectorSession session, Type type) | |||
return WriteMapping.longMapping("number(19,0)", bigintWriteFunction()); | |||
} | |||
if (type instanceof TimestampType) { | |||
return WriteMapping.longMapping(format("timestamp(%s)", timestampDefaultPrecision), timestampWriteFunction(session)); | |||
return WriteMapping.longMapping("timestamp(3)", timestampWriteFunction(session)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Predicate pushdown seems like an unrelated issue. It should work correctly regardless of the precision in Presto vs Oracle. We can't depend on the types matching.
But removing precision here and using timestamp(3)
is the right behavior as that will continue working after we support precision for timestamp in Presto (timestamp
will likely mean timestamp(3)
for compatibility reasons).
@@ -115,7 +116,7 @@ protected TestTable createTableWithDefaultColumns() | |||
{ | |||
return new TestTable( | |||
oracleServer::execute, | |||
"tpch.table", | |||
String.format("%s.table", TEST_SCHEMA), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use static import (that already exists)
.atZone(ZoneId.of(session.getTimeZoneKey().getId())) | ||
.withZoneSameLocal(ZoneOffset.UTC) | ||
.toInstant().toEpochMilli(); | ||
statement.setObject(index, new oracle.sql.TIMESTAMP(new Timestamp(dateTimeAsUtcMillis), Calendar.getInstance(TimeZone.getTimeZone("UTC")))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this calendar instance a constant? They are mutable, but this should be read-only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather leave it as is.
Under the hood getTimeZone()
is called which looks like this:
public TimeZone getTimeZone() {
if (this.sharedZone) {
this.zone = (TimeZone)this.zone.clone();
this.sharedZone = false;
}
return this.zone;
}
It kinda can mutate calendar state, even if it does not do it in our case as sharedZone==false
. Yet it still looks fragile.
Also new oracle.sql.TIMESTAMP(...)
caretes yet another calendar instance internally, so we are not gaining much by avoiding creation of our own :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, Calendar
is mutable even when used "read only".
// Oracle conflates UTC-equivalent zones to UTC. | ||
String zoneId = zonedDateTime.getZone().getId(); | ||
if (zoneId.equals("Z")) { | ||
// Oracle conflates UTC-equivalent zones to UTC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems repeated for no reason
@@ -376,6 +376,7 @@ public WriteMapping toWriteMapping(ConnectorSession session, Type type) | |||
if (writeMapping != null) { | |||
return writeMapping; | |||
} | |||
return super.toWriteMapping(session, type); | |||
throw new PrestoException(NOT_SUPPORTED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix wrapping
@@ -101,11 +109,17 @@ | |||
public class OracleClient | |||
extends BaseJdbcClient | |||
{ | |||
// single UTF char may require at most 4 bytes of storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would read better as
// single UTF char may require up to 4 bytes of storage
return super.toWriteMapping(session, createVarcharType(varcharMaxSize)); | ||
String dataType; | ||
VarcharType varcharType = (VarcharType) type; | ||
if (varcharType.isUnbounded() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: don't wrap here
varbinaryTests(blobDataType()) | ||
.addRoundTrip(blobDataType(), new byte[] {}), | ||
varbinaryTests(rawDataType(2000))); | ||
// The test with the empty array is read-only because Oracle converts treats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "converts" here seems to be a mistake.
@Test | ||
public void testUnsupportedBasicType() | ||
{ | ||
testUnsupportedOracleType("BFILE"); // Never in mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for CONVERT_TO_VARCHAR
or jdbc-types-mapped-to-varchar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not work with BFILE. Actually if one enables CONVERT_TO_VARCHAR
and tries to read from table containing BFILE
column, the query will fail because resultSet.getString(columnIndex)
here would return null
. And that is not supported.
The mapping of unsupported numbers to varchar is already tested in testHighNumberScale
and others.
It could be improved but let's do that as a followup.
rebased |
39aa34d
to
e670484
Compare
Long random suffix does not play well with limits on length of identifiers present in some databases (e.g. Oracle 11).
Timestamp precision is currently a work in progress and soon will be addressed engine-wide.
e670484
to
ef553d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AC
.atZone(ZoneId.of(session.getTimeZoneKey().getId())) | ||
.withZoneSameLocal(ZoneOffset.UTC) | ||
.toInstant().toEpochMilli(); | ||
statement.setObject(index, new oracle.sql.TIMESTAMP(new Timestamp(dateTimeAsUtcMillis), Calendar.getInstance(TimeZone.getTimeZone("UTC")))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather leave it as is.
Under the hood getTimeZone()
is called which looks like this:
public TimeZone getTimeZone() {
if (this.sharedZone) {
this.zone = (TimeZone)this.zone.clone();
this.sharedZone = false;
}
return this.zone;
}
It kinda can mutate calendar state, even if it does not do it in our case as sharedZone==false
. Yet it still looks fragile.
Also new oracle.sql.TIMESTAMP(...)
caretes yet another calendar instance internally, so we are not gaining much by avoiding creation of our own :)
@Test | ||
public void testLegacyDateReadMapping() | ||
{ | ||
legacyDateTests(s -> oracleCreateAndInsert("l_read_date_" + s)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to more appropriate zone
} | ||
|
||
/* | ||
* TODO These tests should be executed in {@link io.prestosql.Session Sessions} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped the comment. It was not valid any more.
@Test | ||
public void testUnsupportedBasicType() | ||
{ | ||
testUnsupportedOracleType("BFILE"); // Never in mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not work with BFILE. Actually if one enables CONVERT_TO_VARCHAR
and tries to read from table containing BFILE
column, the query will fail because resultSet.getString(columnIndex)
here would return null
. And that is not supported.
The mapping of unsupported numbers to varchar is already tested in testHighNumberScale
and others.
It could be improved but let's do that as a followup.
Major motivation for the change is fact that UNNECESSARY rounding mode is more conservative (safer) than HALF_UP.
ef553d6
to
878a246
Compare
Merging. Ci failure unrelated |
No description provided.