clickhouse-jdbc driver using a lot of object allocations/memory when inserting data #1115
Comments
Hi @lavanyachennupati, what's the exact String passed to PreparedStatement? Since ClickHouse does not support prepared statements, the JDBC driver has to figure out the table structure by analyzing the SQL query. Based on the information you provided, it looks like the JDBC driver was trying to build a large SQL statement, which means it didn't get enough information about the target table. Could you check the Java examples here? You may want to start with the simplified version. If you must use a JVM language, you may try the Java client for memory-efficient inserting, for example: #928 contains some test results (QUERY TIME -> WRITE) that you may want to check as well.
@gingerwizard for visibility
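To make the point about the driver needing the table structure concrete, here is a minimal sketch of a JDBC batch insert that declares the column types in the query itself through ClickHouse's input() table function. It is only an illustration, not necessarily the statement discussed in this thread: the class name, the helper names, the table name events, and the columns id and name are all hypothetical, and whether this form reduces allocations in a given driver version is an assumption to verify by profiling.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of clickhouse-jdbc.
class InputFunctionInsertSketch {

    // Builds "INSERT INTO <table> SELECT <cols> FROM input('<col type, ...>')".
    // Column names and types are concatenated as-is, so they must be trusted
    // values; types containing single quotes would need escaping first.
    static String buildInsertSql(String table, Map<String, String> columns) {
        StringJoiner names = new StringJoiner(", ");
        StringJoiner structure = new StringJoiner(", ");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            names.add(column.getKey());
            structure.add(column.getKey() + " " + column.getValue());
        }
        return "INSERT INTO " + table + " SELECT " + names
                + " FROM input('" + structure + "')";
    }

    // Standard JDBC batching: bind one row at a time, then send the batch.
    // Each row must hold one value per column declared in the SQL.
    static void insertBatch(Connection conn, String sql, List<Object[]> rows)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        // Hypothetical two-column table; insertion order is preserved.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "UInt64");
        columns.put("name", "String");
        System.out.println(buildInsertSql("events", columns));
    }
}
```

Only buildInsertSql runs without a server; insertBatch needs a live Connection obtained from the driver.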
This is the exact statement:
We switched to clickhouse-jdbc batch inserts, using the approach in the description, after we ran into very expensive SQL parsing from JavaCC, similar to this issue. Looking at the examples, we will first try the JDBC batch inserts where the schema might not have to be interpreted. However, looking at the write-performance benchmarks here, it looks like the Java client might perform worse in terms of CPU and memory consumption than JDBC with mixed values (which is our use case, as the majority of our fields are strings, with a few other data types). It seems using Thanks for taking a look at this super fast. We will share the results once we try out the above; any suggestions on the above approach in the meantime are very much appreciated.
Thanks @lavanyachennupati. Was As to memory usage, since I didn't specify heap size in my test, the number is higher than it should be (due to less GC involved). Taking Lastly, to achieve better performance, although
Yes, it is substituted with the real table name. We don't use any complex expressions in the VALUES part like VALUES(? + 1, ? || 'suffix', ...). Thanks for the additional info on the test setup. We will try some of the alternatives mentioned and see how the live objects/heap usage looks for our use case.
Changing the prepared statement as suggested above to the following reduced the object allocations.
When using the clickhouse-jdbc driver to insert data into ClickHouse, we are noticing a lot of object allocations from the StringBuilder.append() and .toString() calls that are used in several places when inserting data.
Context on how we insert data:
There are several places in statement.addBatch() where a StringBuilder with initial capacity 10 is created and then constantly copied/resized. This forces a lot of object allocations and significant memory usage, since it happens for each column value (we have 46 columns) in each row of the 1000 events added.
Later, statement.executeBatch() again goes through a toString implementation.
Attaching the flame graphs for allocation profiles, which show about 70% of the allocations coming from the JDBC driver.
Is there a more memory efficient client that clickhouse offers? Or any alternative/more memory efficient ways that clickhouse supports inserting data?
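The resizing cost described above for addBatch() can be illustrated with a small standalone sketch. It is not the driver's code: the class name, the helper, and the sample values are hypothetical, and the initial capacity of 10 is taken from the report. It counts how often a StringBuilder's capacity changes while 46 short values are appended, once starting from a capacity of 10 and once presized. Each capacity change means a new backing array is allocated and the existing contents are copied into it.

```java
// Hypothetical demo class, unrelated to the clickhouse-jdbc sources.
class StringBuilderGrowthSketch {

    // Appends every value and counts how many appends changed the capacity,
    // i.e. forced a new, larger backing array to be allocated.
    static int countResizes(StringBuilder sb, String[] values) {
        int resizes = 0;
        for (String value : values) {
            int before = sb.capacity();
            sb.append(value);
            if (sb.capacity() != before) {
                resizes++;
            }
        }
        return resizes;
    }

    // Builds one sample "row" of fixed-width values such as "value-07".
    static String[] sampleRow(int columns) {
        String[] row = new String[columns];
        for (int i = 0; i < columns; i++) {
            row[i] = String.format("value-%02d", i);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] row = sampleRow(46); // 46 columns, as in the report

        int totalLength = 0;
        for (String value : row) {
            totalLength += value.length();
        }

        int small = countResizes(new StringBuilder(10), row);
        int presized = countResizes(new StringBuilder(totalLength), row);

        System.out.println("resizes with capacity 10: " + small);
        System.out.println("resizes when presized:    " + presized);
    }
}
```

Multiplied by 1000 rows per batch, the discarded intermediate arrays are the kind of short-lived garbage an allocation profile would attribute to StringBuilder.append().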
cc: @e-mars