
Fix Oracle connector data type mappings #3838

Merged: 16 commits into trinodb:master on May 28, 2020

Conversation

@losipiuk (Author):

No description provided.

@cla-bot cla-bot bot added the cla-signed label May 23, 2020
@losipiuk losipiuk requested a review from electrum May 23, 2020 22:39
@losipiuk losipiuk force-pushed the lo/lo/oracle-improve-type-mapping-o branch 2 times, most recently from 570b3d9 to 64951f6 Compare May 25, 2020 09:53
@@ -215,10 +213,10 @@ public WriteMapping toWriteMapping(ConnectorSession session, Type type)
return WriteMapping.longMapping("number(19,0)", bigintWriteFunction());
}
if (type instanceof TimestampType) {
return WriteMapping.longMapping(format("timestamp(%s)", timestampDefaultPrecision), timestampWriteFunction(session));
return WriteMapping.longMapping("timestamp(3)", timestampWriteFunction(session));
Member:

Why 3? Oracle's default is 6.
We shouldn't carry a limitation of Presto timestamps over to Oracle.

Member:

timestamp(6) will lead to incorrect results (different results when the predicate is pushed down and when it is not).

Anyway, this will soon be addressed the right way by the ongoing work in #1284.
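
To illustrate the kind of inconsistency meant here, a hypothetical sketch in the style of the connector's tests (the table, values, and outcomes are made up for illustration, not from this PR):

    // A sub-millisecond value stored in an Oracle timestamp(6) column:
    oracleServer.execute("CREATE TABLE ts_test (ts timestamp(6))");
    oracleServer.execute("INSERT INTO ts_test VALUES (TIMESTAMP '2020-05-28 12:00:00.123456')");
    // Evaluated in Presto, the value is rounded to millisecond precision and the
    // predicate below matches the row; pushed down to Oracle, the comparison runs
    // against the full .123456 value and matches nothing.
    computeActual("SELECT * FROM ts_test WHERE ts = TIMESTAMP '2020-05-28 12:00:00.123'");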

Member:

So any table created outside Presto will have that problem (it will be created as timestamp(6)).

Member:

correct

Member:

Predicate pushdown seems like an unrelated issue. It should work correctly regardless of the precision in Presto vs Oracle. We can't depend on the types matching.

But removing precision here and using timestamp(3) is the right behavior as that will continue working after we support precision for timestamp in Presto (timestamp will likely mean timestamp(3) for compatibility reasons).

oracleServer.execute("CREATE TABLE test (num1 number, num2 number(*,-2))");
oracleServer.execute("INSERT INTO test VALUES (12345678901234567890.12345678901234567890123456789012345678, 1234567890.123)");
assertQuery("SELECT * FROM test", "VALUES (12345678901234567890.1234567890, 1234567900.0000000000)");
oracleServer.execute("CREATE TABLE test (num1 number)");
Member:

Will the negative scale test be moved somewhere else?

@losipiuk (Author):

There are tests for negative scale in testNumberNegativeScaleReadMapping.
The specific case tested here is not supported, as it would require a decimal precision of 40 in Presto, while only up to 38 is supported.

Member:

Not really: the -2 scale affects the integer part, so the 1234567890 here will be saved as 1234567900.

@losipiuk (Author):

The problem is that Oracle allows numbers with up to 38 meaningful digits for types with negative scale (in your example the meaningful digits are 12345679).
The total number of digits (including the implied trailing zeroes) can be larger.

E.g., for NUMBER(*,-2) the total number of digits in represented values can be 40.
If you run a query such as this in Oracle:

SELECT CAST(1234567890123456789012345678901234567890 AS NUMBER(*,-2)) FROM dual;

it will work fine and return the 40-digit number 1234567890123456789012345678901234567900.

As we do not support negative scale in Presto, we need to be able to represent the value including the trailing zeroes. Hence the limitation.
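
For illustration (a sketch, not code from the PR), java.math.BigDecimal confirms the digit count:

    // The 40-digit value produced by the CAST above:
    BigDecimal value = new BigDecimal("1234567890123456789012345678901234567900");
    // precision() counts every digit, including the two trailing zeroes implied
    // by the negative scale, so this prints 40 -- above Presto's maximum of 38.
    System.out.println(value.precision());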

@@ -68,17 +67,4 @@ public OracleConfig setNumberRoundingMode(RoundingMode numberRoundingMode)
this.numberRoundingMode = numberRoundingMode;
return this;
}

@Min(4000)
public int getVarcharMaxSize()
@eskabetxe (Member), May 26, 2020:

Starting with Oracle 12c, the varchar limit can be increased from 4000 to 32767 bytes (or any value in between) using Extended Data Types.

@losipiuk (Author):

I reintroduced the config parameter. I just changed the name slightly, so it is obvious that the limit is about the number of bytes, not characters.

Member:

@Max(32767) could be added.

@losipiuk (Author):

Actually, can it be anything between 4000 and 32767, or only either 4000 or 32767?
If the latter is the case, maybe we should make the config parameter oracle.enable-extended-datatypes=true/false?

Member:

@losipiuk true/false may work. Checking the documentation, this is activated with:
ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE;

which changes these limits:
VARCHAR2: 4000 bytes
NVARCHAR2: 4000 bytes
RAW: 2000 bytes

to:
VARCHAR2: 32767 bytes
NVARCHAR2: 32767 bytes
RAW: 32767 bytes
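
For reference, a minimal sketch of what the bounded property could have looked like in OracleConfig, following the existing setter pattern in the diff (this is not the PR's final code; as noted below, the oracle.varchar2.max-bytes property was ultimately removed):

    private int varchar2MaxBytes = 4000;

    @Min(4000)
    @Max(32767)
    public int getVarchar2MaxBytes()
    {
        return varchar2MaxBytes;
    }

    @Config("oracle.varchar2.max-bytes")
    public OracleConfig setVarchar2MaxBytes(int varchar2MaxBytes)
    {
        this.varchar2MaxBytes = varchar2MaxBytes;
        return this;
    }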

@findepi (Member):

Why exactly do we have this setting? What would be the downside of hard-coding it at 4000 (the safe value)?
(Since this isn't obvious, we should capture the reasoning in a code comment at least.)

If desired, we could in the future try to detect the server configuration and adapt, without the user needing to set anything.

@losipiuk (Author):

I kinda agree with what @findepi wrote. @eskabetxe, do you feel strongly about the functionality, and can you provide a rationale? If not, I would simplify the configuration (after all, the fewer configuration switches the better) and live with 4000 for now.

@losipiuk (Author):

I updated the PR, removing oracle.varchar2.max-bytes. We may discuss later whether to add it back (in this or another form) as a follow-up.

@eskabetxe (Member), May 28, 2020:

@findepi this allows you to use the extended data types added in 12c.
You can use the hard-coded value, but if the database has extended mode activated and you create a table with, for example, varchar(5000), the connector will create an nclob field (because 5000 is greater than the 4000 limit).

We can address this later, but this was already in the connector; we are adding a limitation that previously did not exist.

@losipiuk (Author):

I think we all understand what having the parameter changes in the CREATE TABLE logic when a table is created in Oracle from Presto. The question is whether the benefit is worth the added complexity and an extra configuration parameter. As @findepi mentioned above, this can be improved to support higher limits without an explicit configuration parameter. Also, I believe the typical use case for the Oracle connector is to query tables already created in Oracle, not to create new ones.

@losipiuk losipiuk force-pushed the lo/lo/oracle-improve-type-mapping-o branch 2 times, most recently from 8b2954a to 19154cc Compare May 27, 2020 12:39
@electrum (Member) left a review:

Please see the comment from @eskabetxe about the removed negative scale test.

@@ -27,7 +27,7 @@
implements AutoCloseable
{
private static final SecureRandom random = new SecureRandom();
private static final int RANDOM_SUFFIX_LENGTH = 12;
private static final int RANDOM_SUFFIX_LENGTH = 5;
Member:

This only leaves 36^5 = 60466176 maximum values. Also, the choice of this value seems arbitrary. We should probably have a maximum length and validate that the prefix+suffix is under the length. But we can follow up on that.
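
A sketch of the suggested follow-up, assuming Oracle's classic 30-character identifier limit and a hypothetical randomSuffix() helper (neither is from this PR):

    private static final int MAX_NAME_LENGTH = 30; // classic Oracle identifier limit

    private static String generatedName(String namePrefix)
    {
        String name = namePrefix + randomSuffix();
        // Fail fast instead of letting Oracle reject the identifier later.
        checkArgument(name.length() <= MAX_NAME_LENGTH,
                "Name '%s' is longer than the maximum length %s", name, MAX_NAME_LENGTH);
        return name;
    }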


@@ -115,7 +116,7 @@ protected TestTable createTableWithDefaultColumns()
{
return new TestTable(
oracleServer::execute,
"tpch.table",
String.format("%s.table", TEST_SCHEMA),
Member:

Use static import (that already exists)

.atZone(ZoneId.of(session.getTimeZoneKey().getId()))
.withZoneSameLocal(ZoneOffset.UTC)
.toInstant().toEpochMilli();
statement.setObject(index, new oracle.sql.TIMESTAMP(new Timestamp(dateTimeAsUtcMillis), Calendar.getInstance(TimeZone.getTimeZone("UTC"))));
Member:

Can we make this calendar instance a constant? They are mutable, but this should be read-only.

@losipiuk (Author):

I would rather leave it as is.
Under the hood, getTimeZone() is called, which looks like this:

    public TimeZone getTimeZone() {
        if (this.sharedZone) {
            this.zone = (TimeZone)this.zone.clone();
            this.sharedZone = false;
        }

        return this.zone;
    }

It can mutate the calendar state, even though it does not in our case since sharedZone == false. It still looks fragile.
Also, new oracle.sql.TIMESTAMP(...) creates yet another calendar instance internally, so we are not gaining much by avoiding the creation of our own :)

Member:

Right, Calendar is mutable even when used "read only".

// Oracle conflates UTC-equivalent zones to UTC.
String zoneId = zonedDateTime.getZone().getId();
if (zoneId.equals("Z")) {
// Oracle conflates UTC-equivalent zones to UTC.
Member:

This seems repeated for no reason

@@ -376,6 +376,7 @@ public WriteMapping toWriteMapping(ConnectorSession session, Type type)
if (writeMapping != null) {
return writeMapping;
}
return super.toWriteMapping(session, type);
throw new PrestoException(NOT_SUPPORTED,
Member:

Fix wrapping

@@ -101,11 +109,17 @@
public class OracleClient
extends BaseJdbcClient
{
// single UTF char may require at most 4 bytes of storage
Member:

This would read better as

// single UTF char may require up to 4 bytes of storage

return super.toWriteMapping(session, createVarcharType(varcharMaxSize));
String dataType;
VarcharType varcharType = (VarcharType) type;
if (varcharType.isUnbounded() ||
Member:

Nit: don't wrap here
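
For reference, a sketch of how the unwrapped check could read once the byte limit is applied (the constant names here are assumptions, not the PR's code):

    // A single UTF-8 character may need up to 4 bytes, so a bounded varchar fits
    // Oracle's 4000-byte varchar2 only up to 1000 characters; longer ones map to nclob.
    private static final int MAX_BYTES_PER_CHAR = 4;
    private static final int ORACLE_VARCHAR2_MAX_CHARS = 4000 / MAX_BYTES_PER_CHAR;

    if (varcharType.isUnbounded() || varcharType.getBoundedLength() > ORACLE_VARCHAR2_MAX_CHARS) {
        dataType = "nclob";
    }
    else {
        dataType = format("varchar2(%s CHAR)", varcharType.getBoundedLength());
    }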

varbinaryTests(blobDataType())
.addRoundTrip(blobDataType(), new byte[] {}),
varbinaryTests(rawDataType(2000)));
// The test with the empty array is read-only because Oracle converts treats
Member:

The word "converts" here seems to be a mistake.

@Test
public void testUnsupportedBasicType()
{
testUnsupportedOracleType("BFILE"); // Never in mapping
Member:

What happens for CONVERT_TO_VARCHAR or jdbc-types-mapped-to-varchar?

@losipiuk (Author):

It does not work with BFILE. Actually, if one enables CONVERT_TO_VARCHAR and tries to read from a table containing a BFILE column, the query fails because resultSet.getString(columnIndex) here returns null, and that is not supported.

The mapping of unsupported numbers to varchar is already tested in testHighNumberScale and others.

It could be improved, but let's do that as a follow-up.

@losipiuk (Author):

rebased

@losipiuk losipiuk force-pushed the lo/lo/oracle-improve-type-mapping-o branch 2 times, most recently from 39aa34d to e670484 Compare May 28, 2020 12:06
@losipiuk losipiuk force-pushed the lo/lo/oracle-improve-type-mapping-o branch from e670484 to ef553d6 Compare May 28, 2020 12:09
@losipiuk (Author) left a review:

AC


@Test
public void testLegacyDateReadMapping()
{
legacyDateTests(s -> oracleCreateAndInsert("l_read_date_" + s));
@losipiuk (Author):

Changed to a more appropriate zone.

}

/*
* TODO These tests should be executed in {@link io.prestosql.Session Sessions}
@losipiuk (Author):

I dropped the comment; it was no longer valid.


Major motivation for the change is the fact that the UNNECESSARY rounding mode is more conservative (safer) than HALF_UP.
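
For illustration of the difference (a sketch, not code from the PR):

    // HALF_UP silently rounds a value that does not fit the target scale:
    new BigDecimal("1.005").setScale(2, RoundingMode.HALF_UP);     // 1.01
    // UNNECESSARY refuses the lossy conversion and throws ArithmeticException:
    new BigDecimal("1.005").setScale(2, RoundingMode.UNNECESSARY); // throws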
@losipiuk losipiuk force-pushed the lo/lo/oracle-improve-type-mapping-o branch from ef553d6 to 878a246 Compare May 28, 2020 15:05
@losipiuk (Author):

Merging. CI failure unrelated.

@losipiuk losipiuk merged commit 2fad53f into trinodb:master May 28, 2020
4 participants