Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance regression of writes in JDBC connectors #8559

Merged
merged 4 commits into from
Jul 15, 2021

Conversation

hashhar
Copy link
Member

@hashhar hashhar commented Jul 14, 2021

This reverts commit 541edbe.

@cla-bot cla-bot bot added the cla-signed label Jul 14, 2021
@@ -161,6 +174,7 @@ else if (javaType == Slice.class) {
PreparedStatement statement = this.statement) {
if (batchSize > 0) {
statement.executeBatch();
connection.commit();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi Do we assume that any connection pool or the driver will ensure that the connection's autoCommit state it reset before giving the connection to a new client?
Should I explicitly save (in constructor) and restore (here) the autocommit state?

I think this will cause failures with Oracle UCP (it doesn't reset things AFAIK) - let's see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ask because I remember that for DELETEs Grzegorz is using autocommit (which makes sense because it's always a single statement there) and if the pool gives a stale connection DELETEs will fail.

Copy link
Member Author

@hashhar hashhar Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it failed as expected.

And it absolutely has to do with pool returning non-reset connections because if I remove anything that uses the PageSink by applying the below diff then the test passes.

I'm fixing this by changing the OraclePoolConnectionFactory to always return connections with autoCommit enabled.

Click here for diff
diff --git a/plugin/trino-oracle/src/test/java/io/trino/plugin/oracle/TestOraclePoolConnectorSmokeTest.java b/plugin/trino-oracle/src/test/java/io/trino/plugin/oracle/TestOraclePoolConnectorSmokeTest.java
index 30d6108283..e8a38c8c75 100644
--- a/plugin/trino-oracle/src/test/java/io/trino/plugin/oracle/TestOraclePoolConnectorSmokeTest.java
+++ b/plugin/trino-oracle/src/test/java/io/trino/plugin/oracle/TestOraclePoolConnectorSmokeTest.java
@@ -13,15 +13,20 @@
  */
 package io.trino.plugin.oracle;
 
+import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
 import io.airlift.testing.Closeables;
 import io.trino.testing.QueryRunner;
+import io.trino.testing.sql.TestTable;
 import org.testng.annotations.AfterClass;
+import org.testng.annotations.Test;
 
 import java.io.IOException;
 
 import static io.trino.plugin.oracle.TestingOracleServer.TEST_PASS;
 import static io.trino.plugin.oracle.TestingOracleServer.TEST_USER;
+import static io.trino.testing.TestingConnectorBehavior.SUPPORTS_CREATE_TABLE;
+import static io.trino.testing.TestingConnectorBehavior.SUPPORTS_DELETE;
 
 public class TestOraclePoolConnectorSmokeTest
         extends BaseOracleConnectorSmokeTest
@@ -44,7 +49,7 @@ public class TestOraclePoolConnectorSmokeTest
                         .put("oracle.connection-pool.enabled", "true")
                         .put("oracle.remarks-reporting.enabled", "false")
                         .build(),
-                REQUIRED_TPCH_TABLES);
+                ImmutableList.of());
     }
 
     @AfterClass(alwaysRun = true)
@@ -54,4 +59,16 @@ public class TestOraclePoolConnectorSmokeTest
         Closeables.closeAll(oracleServer);
         oracleServer = null;
     }
+
+    @Test
+    public void testDeleteAllDataFromTable()
+    {
+        skipTestUnless(hasBehavior(SUPPORTS_CREATE_TABLE) && hasBehavior(SUPPORTS_DELETE));
+        try (TestTable table = new TestTable(oracleServer::execute, "test_delete", "AS SELECT 'good' foo FROM DUAL")) {
+            // not using assertUpdate as some connectors provide update count and some do not
+            assertQuery("SELECT count(*) FROM " + table.getName(), "VALUES 1");
+            getQueryRunner().execute("DELETE FROM " + table.getName());
+            assertQuery("SELECT count(*) FROM " + table.getName(), "VALUES 0");
+        }
+    }
 }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When was playing with Oracle, I noticed that changing a state of the connection survives in a connection in the pool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we assume that any connection pool or the driver will ensure that the connection's autoCommit state it reset before giving the connection to a new client?

That should be guaranteed by the pool.

Anything else is very error-prone.

When was playing with Oracle, I noticed that changing a state of the connection survives in a connection in the pool.

Was this about autocommit state, or something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed it for autocommit state. Hence the 2nd fixup commit.

Hikari does guarantee it - Oracle UCP seems not to.

@hashhar hashhar requested review from wendigo, findepi and kokosing July 14, 2021 16:35
@kokosing
Copy link
Member

Can you please share more about performance regression you are fixing here?

@hashhar
Copy link
Member Author

hashhar commented Jul 15, 2021

Can you please share more about performance regression you are fixing here?

Sure. In 541edbe we changed the PageSink to use autocommit instead of explicit transaction management.
They are equivalent ONLY when a single statement is being executed.
But in JdbcPageSink that's not the case. We add multiple statements (1 per addBatch call) and then executeBatch.

From JDBC javadocs "If a connection is in auto-commit mode, then all its SQL statements will be executed and committed as individual transactions". MySQL and SQL Server follow this and hence those 1000 addBatch calls turn into 1000 commits which kills performance.

Using explicit transaction ensures that we only do 1 COMMIT for the entire batch instead of commit per statement.

For comparision I added a test that does a CTAS and INSERT of tpch.tiny.orders.

On MySQL:

# Current master
INSERT [27598, 35814, 48281, 37213, 42254]; Avg(ms): 38232.0
CTAS [25855, 35361, 40970, 63781, 57870]; Avg(ms): 44767.4

# After revert
INSERT [16066, 14602, 12628, 11886, 14542]; Avg(ms): 13944.8
CTAS [14474, 23864, 24198, 12889, 13070]; Avg(ms): 17699.0

On SQL Server:

# Current master
INSERT [27110, 24551, 29237, 40681, 41884]; Avg(ms): 32692.6
CTAS [46106, 37056, 34141, 29358, 45614]; Avg(ms): 38455.0

# After revert
INSERT [10097, 2962, 7849, 4154, 1923]; Avg(ms): 5397.0
CTAS [10392, 7737, 7869, 2042, 3284]; Avg(ms): 6264.8

@@ -161,6 +174,7 @@ else if (javaType == Slice.class) {
PreparedStatement statement = this.statement) {
if (batchSize > 0) {
statement.executeBatch();
connection.commit();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we assume that any connection pool or the driver will ensure that the connection's autoCommit state it reset before giving the connection to a new client?

That should be guaranteed by the pool.

Anything else is very error-prone.

When was playing with Oracle, I noticed that changing a state of the connection survives in a connection in the pool.

Was this about autocommit state, or something else?

// executed and committed as individual transactions." Notably MySQL and SQL Server respect this which
// leads to multiple commits when we close the connection leading to slow performance. Explicit commits
// where needed ensure that all of the submitted statements are committed as a single transaction and
// performs better.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to separate commit, so that revert is a pure revert.

However, please add this to revert's commit message -- revert commit message should tell the "why revert".

Comment on lines 84 to 86
// Oracle's pool doesn't reset autocommit of connections when reusing them so we explicitly enable autocommit
// by default to match the JDBC spec.
connection.setAutoCommit(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the oracle pool be configured to do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, by passing the connection property OracleConnection.CONNECTION_PROPERTY_AUTOCOMMIT to true to the PoolDataSource.
I'll do that.

Copy link
Member Author

@hashhar hashhar Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite - turns out that it only affects the initial connection. UCP isn't touching the connections once created.

I'll need to keep how I've done it now.

hashhar added 3 commits July 15, 2021 18:06
The JDBC specification says that connections are expected to be in
autocommit mode by default. Oracle's UCP connection pool doesn't reset
the autocommit state when reusing an already created connection.
So we explicitly enable autocommit for any connection returned from the
pool to match the JDBC specification.
This reverts commit 541edbe.

In 541edbe JdbcPageSink was changed to
use autocommit instead of explicit transactions. They are equivalent
ONLY when a single statement is being executed. But in JdbcPageSink we
add multiple statements (1 per `addBatch` call) and then execute all of
them using `executeBatch`.

From JDBC javadocs "If a connection is in auto-commit mode, then all its
SQL statements will be executed and committed as individual
transactions". MySQL and SQL Server follow this and hence each statement
within an `executeBatch` call is executed in it's own transaction which
hurts performance because instead of a single commit we get `batchSize`
(default 1000) commits for each `executeBatch` call.

Using explicit transaction ensures that we only perform a single commit
for the entire batch instead of a commit per statement.
@hashhar hashhar force-pushed the hashhar/page-sink-regression branch from 1aca943 to dd38bae Compare July 15, 2021 12:44
@hashhar
Copy link
Member Author

hashhar commented Jul 15, 2021

@findepi @kokosing PTAL again. applied comments.

@hashhar
Copy link
Member Author

hashhar commented Jul 15, 2021

The failed Oracle test passes locally - pushed an empty commit to make sure.

@hashhar
Copy link
Member Author

hashhar commented Jul 15, 2021

All green.

@hashhar hashhar merged commit 543cd1f into trinodb:master Jul 15, 2021
@hashhar hashhar deleted the hashhar/page-sink-regression branch July 15, 2021 14:52
@hashhar hashhar added this to the 360 milestone Jul 15, 2021
@hashhar hashhar mentioned this pull request Jul 15, 2021
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants