
clickhouse-jdbc driver using a lot of object allocations/memory when inserting data #1115

Closed
lavanyachennupati opened this issue Oct 20, 2022 · 6 comments


When using the clickhouse-jdbc driver to insert data into ClickHouse, we are noticing a lot of object allocations from the StringBuilder.append() and .toString() calls used in several places in the insert path.

Context on how we insert data:

1. We initialize a ClickHouse connection that is reused:

```scala
val properties = new Properties()
properties.setProperty(ClickHouseClientOption.ASYNC.getKey, "true")
properties.setProperty(ClickHouseClientOption.COMPRESS.getKey, "true")
properties.setProperty(ClickHouseClientOption.DECOMPRESS.getKey, "true")
properties.setProperty(ClickHouseClientOption.SOCKET_TIMEOUT.getKey, "300000")
val dataSource = new ClickHouseDataSource(s"jdbc:clickhouse://$ip:8123", properties)
val connection = dataSource.getConnection
```
2. We create a prepared statement from this connection, add an event/row to it, and call `statement.addBatch()` for ~1000 events before doing a `statement.executeBatch()`:

```scala
val statement = connection.prepareStatement(
  s"INSERT INTO $tableName ${ClickHouseEvent.columnNames} VALUES ${ClickHouseEvent.columnValues}"
)
```

```scala
// for each of the ~1000 events:
statement.setString(1, clickHouseEvent.event_id)
statement.setString(2, clickHouseEvent.shortDesc)
statement.setString(3, clickHouseEvent.level)
statement.setString(4, clickHouseEvent.platform)
statement.setLong(5, clickHouseEvent.timestamp)
statement.setInt(6, clickHouseEvent.hour)
statement.setString(7, clickHouseEvent.version)
statement.setString(8, clickHouseEvent.`type`)
statement.setString(9, clickHouseEvent.logger)
statement.setString(10, clickHouseEvent.message)
statement.setString(11, clickHouseEvent.formattedMessage)
............
statement.setString(42, Json.encode(clickHouseEvent.exception))
statement.setString(43, clickHouseEvent.exceptionType)
statement.setString(44, clickHouseEvent.exceptionModule)
statement.setString(45, clickHouseEvent.exceptionValue)
statement.setLong(46, clickHouseEvent.epoch)
statement.addBatch()

// after adding 1000 events to the statement:
statement.executeBatch()
```

There are several places in `statement.addBatch()` where a StringBuilder with an initial capacity of 10 is created and constantly copied/resized, forcing a lot of object allocations and significant memory usage for each of the 46 column values in each of the 1000 rows added.
`statement.executeBatch()` later allocates again through a toString() implementation.
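As an illustration of why this hurts (a standalone model, not the driver's actual code): OpenJDK's AbstractStringBuilder grows its backing array to roughly `oldCapacity * 2 + 2` and copies the old contents each time an append overflows it, so a builder that starts at capacity 10 but ends up holding a few KB of row data gets reallocated many times:

```java
public class SbGrowth {
    // Model how many times a StringBuilder's backing array is reallocated
    // while appending n single characters, starting from initialCapacity.
    // Uses the growth rule of OpenJDK's AbstractStringBuilder (old * 2 + 2).
    static int reallocations(int initialCapacity, int n) {
        int capacity = initialCapacity;
        int reallocs = 0;
        for (int length = 0; length < n; length++) {
            if (length + 1 > capacity) { // next append would overflow
                capacity = capacity * 2 + 2; // grow and copy the array
                reallocs++;
            }
        }
        return reallocs;
    }

    public static void main(String[] args) {
        // A builder sized for 10 chars that grows to hold ~4 KB of row data
        // is copied 9 times (10 -> 22 -> 46 -> ... -> 6142).
        System.out.println(reallocations(10, 4096));   // prints 9
        // Pre-sizing to the expected row size avoids every copy.
        System.out.println(reallocations(4096, 4096)); // prints 0
    }
}
```

With 46 columns per row and 1000 rows per batch, this copy-on-grow pattern repeats per value, which matches the allocation profile above.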

Attaching flame graphs of the allocation profiles, which show about 70% of allocations coming from the JDBC driver.

[Screenshots: three allocation-profile flame graphs, Oct 20, 2022]

Is there a more memory-efficient client that ClickHouse offers? Or any alternative, more memory-efficient ways that ClickHouse supports for inserting data?

cc: @e-mars

zhicwu (Contributor) commented Oct 20, 2022

Hi @lavanyachennupati, what's the exact String passed to PreparedStatement? As ClickHouse does not support prepared statements, the JDBC driver has to figure out the table structure by analyzing the SQL query. Based on the information you provided, it looks like the JDBC driver was trying to build a large SQL statement, which means it didn't get enough information about the target table.

Could you check the Java examples here? You may want to start with the simplified form `insert into <table name> except (<columns not needed>)` and see if it stops the madness.

If you must use a JVM language, you may try the Java client for memory-efficient inserts, for example:

https://github.com/ClickHouse/clickhouse-jdbc/blob/3e462a3cc90707189424b3137738abe59f606c85/clickhouse-client/src/test/java/com/clickhouse/client/ClientIntegrationTest.java#L1335-L1346

#928 contains some test results (QUERY TIME -> WRITE) that you may want to check as well.

e-mars commented Oct 20, 2022

@gingerwizard for visibility

lavanyachennupati (Author) commented Oct 21, 2022

> what's the exact String passed to PreparedStatement?

This is the exact statement:

```scala
val statement = connection.prepareStatement(
  s"""INSERT INTO $tableName
     (event_id,
      shortDesc,
      level,
      platform,
      timestamp,
      hour,
      version,
      type,
      logger,
      message,
      formattedMessage,
      .........
      exceptionType,
      exceptionModule,
      exceptionValue,
      dateTime)
     VALUES
     (?, ?, ?, ?, ?, ......, ?, ?, ?)"""
)
```

We switched to clickhouse-jdbc batch inserts with the approach in the description after we ran into very expensive SQL parsing from javacc, similar to this issue.

As a first step after looking at the examples, we'll try the JDBC batch inserts where the schema might not have to be interpreted.

However, looking at the write-performance benchmarks here, it looks like the Java client for inserting data might perform worse in terms of CPU and memory consumption than JDBC with mixed values (which is our use case, as the majority of our fields are strings, but a few are other data types). It seems using clickhouse-client directly might be the best option performance-wise for us, compared to the Java client and clickhouse-jdbc.

Thanks for taking a look at this super fast. We'll share the results once we try out the above; in the meantime, any suggestions on this approach are very much appreciated.

zhicwu (Contributor) commented Oct 21, 2022

Thanks @lavanyachennupati. Was $tableName substituted with the real table name before you passed the query to PreparedStatement? In your case, since you're not using the input function, the JDBC driver will issue the query `select <columns you specified> from $tableName where 0` to get the column types. Also, if you have a complex expression in the VALUES part, like VALUES(? + 1, ? || 'suffix', ...), the driver will simply give up trying to get the table structure. Anyway, you can try removing the VALUES expression, using the real table name, and trying again. If query logging is not disabled, you can check the query issued before the insert in system.query_log.

As to memory usage: since I didn't specify a heap size in my test, the number is higher than it should be (due to less GC involved). Taking the Mixed query as an example, the test result shows the driver took ~210 MB of memory for the query, but if you re-run the test with -Xms24m -Xmx24m, the number drops to ~77 MB (including non-heap memory usage) while performance is not impacted. You can verify that by using jmap to inspect a histogram of Java objects in the heap: there will be lots of byte[] and String instances, but most of them are not considered live.
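If attaching jmap is inconvenient, the same "are these objects actually live?" question can be spot-checked from inside the JVM with the standard java.lang.management API. This is a rough sketch, not a precise measurement (GC timing makes the numbers approximate), and the allocation loop below only imitates the kind of short-lived builder garbage discussed in this issue:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapSpotCheck {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();

        // Produce short-lived char[]/String garbage: many appends into an
        // under-sized builder, then toString(), discarding each result.
        String kept = null;
        for (int i = 0; i < 1_000; i++) {
            StringBuilder sb = new StringBuilder(10);
            for (int j = 0; j < 4_096; j++) sb.append('x');
            kept = sb.toString();
        }

        long before = mem.getHeapMemoryUsage().getUsed();
        System.gc(); // hint: collect the dead copies
        long after = mem.getHeapMemoryUsage().getUsed();

        // Most of the allocated array copies were not live, so used heap
        // typically shrinks sharply after the collection.
        System.out.printf("used before GC: %d KB, after GC: %d KB (kept: %d chars)%n",
                before >> 10, after >> 10, kept.length());
    }
}
```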

Lastly, on achieving better performance: although clickhouse-client looks faster, please keep in mind that the test was using clickhouse-client to load a file in Native/RowBinary format, meaning you'd have to spend extra time serializing Java objects into a supported format and then handing it over to the client, which is pretty much what the Java client provides :p

lavanyachennupati (Author) commented Oct 21, 2022

> Was $tableName substituted to the real table name before you pass the query to PreparedStatement?

Yes, it is substituted with the real table name. We don't use any complex expressions in the VALUES part like VALUES(? + 1, ? || 'suffix', ...).

Thanks for the additional info on the test setup. We'll try some of the alternatives mentioned and see how the live objects/heap usage look for our use case.

lavanyachennupati (Author) commented:

Changing the prepared statement as suggested above to the following reduced the object allocations.

```scala
connection.prepareStatement(
  s"INSERT INTO $tableName select ${ClickHouseEvent.columnNames} from input ${ClickHouseEvent.columnValues}"
)
```

ClickHouseSqlParser seems to interpret the above SQL as an insert statement, which results in using the InputBasedPreparedStatement implementation of ClickHousePreparedStatement. That implementation writes data in ClickHouse RowBinary format to a java.nio.ByteBuffer-backed input stream via a java.util.concurrent.BlockingQueue, which reduces object allocations a lot compared to the SqlBasedPreparedStatement implementation.
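That streaming idea can be sketched in plain Java. The class below is illustrative only (it is not the driver's actual code): a producer hands serialized row chunks through a bounded BlockingQueue of ByteBuffers to an InputStream consumer, so no giant per-batch String is ever built:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// An InputStream fed by a producer through a bounded queue of ByteBuffers,
// similar in spirit to how InputBasedPreparedStatement streams RowBinary data.
public class QueueBackedInputStream extends InputStream {
    private static final ByteBuffer EOF = ByteBuffer.allocate(0); // end marker
    private final BlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<>(16);
    private ByteBuffer current = ByteBuffer.allocate(0);

    // Producer side: hand over one serialized chunk of rows.
    public void offerChunk(byte[] bytes) {
        putBuffer(ByteBuffer.wrap(bytes));
    }

    @Override
    public void close() {
        putBuffer(EOF); // signal end-of-stream to the consumer
    }

    private void putBuffer(ByteBuffer b) {
        try {
            queue.put(b); // blocks when the queue is full (bounded memory)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    @Override
    public int read() throws IOException {
        while (!current.hasRemaining()) {
            if (current == EOF) return -1; // producer closed the stream
            try {
                current = queue.take(); // wait for the next chunk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
        }
        return current.get() & 0xFF;
    }

    public static void main(String[] args) throws Exception {
        QueueBackedInputStream in = new QueueBackedInputStream();
        Thread producer = new Thread(() -> {
            in.offerChunk("row1\n".getBytes(StandardCharsets.UTF_8));
            in.offerChunk("row2\n".getBytes(StandardCharsets.UTF_8));
            in.close();
        });
        producer.start();
        byte[] all = in.readAllBytes(); // consumer side, e.g. an HTTP request body
        producer.join();
        System.out.print(new String(all, StandardCharsets.UTF_8));
    }
}
```

The bounded queue is the key design choice: the producer blocks when the consumer falls behind, so memory usage stays capped regardless of batch size.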
