High-frequency writing scenario: executing prepareStatement.executeBatch() about 20 times, then it blocks! #1374
Hi @xlvchao, so you're reusing a prepared statement to execute multiple batches, right? I wrote a simple loop but couldn't reproduce the issue. Could you share more details, like the table structure and insert query? Moreover, is there any clue in the server log / system.query_log or in the output of …? Lastly, I want to understand how long a batch update takes at your end. In the case that the connection is idle for too long and the network is unstable, you may run into ClickHouse/clickhouse-docs#1178.
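For reference, a reproduce attempt along the lines of the "simple loop" mentioned above might look like the sketch below. The table name `test_batch`, its columns, the row counts, and the JDBC URL are all assumptions for illustration, not details from this thread:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ReproLoop {
    // ceil(totalRows / batchSize): how many executeBatch() calls the loop issues
    static int expectedBatches(int totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) { // no server available; nothing to do
            System.out.println("usage: ReproLoop <jdbc-url>");
            return;
        }
        int totalRows = 20_000, batchSize = 1_000; // ~20 batches of 1000, as reported
        try (Connection conn = DriverManager.getConnection(args[0]);
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO test_batch(id, name) VALUES (?, ?)")) { // assumed table
            for (int i = 1; i <= totalRows; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row-" + i);
                ps.addBatch();
                if (i % batchSize == 0) {
                    long start = System.currentTimeMillis();
                    ps.executeBatch(); // reusing the same statement across batches
                    System.out.println("batch " + (i / batchSize) + " took "
                            + (System.currentTimeMillis() - start) + " ms");
                }
            }
        }
    }
}
```

Timing each `executeBatch()` call this way would show whether a particular batch is the one that blocks.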
1. DB scripts: 2. JDBC demo: 3. Problem: Full GC occurred frequently, triggering a series of brief STW pauses, which is why my program often blocks for a little while.
Thanks @xlvchao, I'm not sure how this is related to the JDBC driver. If you look at the heap usage and thread stacks, you'll see the JVM is occupied by fastjson, which I'm afraid has nothing to do with the driver. By the way, ClickHouse supports the JSONEachRow format, which might be of use in your case, for instance: clickhouse-java/examples/jdbc/src/main/java/com/clickhouse/examples/jdbc/Basic.java Lines 118 to 137 in c25ffc6
@zhicwu It doesn't seem to work. JDBC version: 0.4.6. DB script is here: My code:
The Exception:
Apologies for the confusion. The example I provided was for the RowBinary format. JSONEachRow, on the other hand, is a text-based format where you simply need to write a JSON string into the output stream for insertion, with each row on a separate line. For example:
// CREATE TABLE test_insert_with_format(i Int32, s String) ENGINE=Memory
try (PreparedStatement ps = conn
.prepareStatement(
"INSERT INTO test_insert_with_format(s,i) format JSONEachRow")) {
ps.setObject(1, new ClickHouseWriter() {
@Override
public void write(ClickHouseOutputStream out) throws IOException {
// actually line-break is optional
out.write("{\"i\":2,\"s\":\"22\"}\n".getBytes());
out.write("{\"i\":5,\"s\":\"55\"}\n".getBytes());
}
});
ps.executeUpdate();
}
If you already have the JSON string from the start, there is no need to use fastjson to deserialize it back into a Java object and then serialize it again for the JDBC driver. You can simply pass the JSON string directly to the driver, as demonstrated above. ClickHouse will handle the rest of the process for you.
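To make the "pass the JSON string directly" point concrete, here is a minimal sketch of building a newline-delimited JSONEachRow payload from JSON strings you already have. The helper name `toPayload` is hypothetical, not part of the driver; the resulting bytes are what you would hand to `out.write(...)` inside the `ClickHouseWriter` shown above:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class JsonEachRowPayload {
    // Hypothetical helper: join pre-built JSON rows into one
    // newline-delimited payload suitable for "format JSONEachRow".
    static byte[] toPayload(List<String> jsonRows) {
        StringBuilder sb = new StringBuilder();
        for (String row : jsonRows) {
            sb.append(row).append('\n'); // one JSON object per line
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = toPayload(List.of(
                "{\"i\":2,\"s\":\"22\"}",
                "{\"i\":5,\"s\":\"55\"}"));
        // No fastjson round trip: the strings go into the stream as-is.
        System.out.println(payload.length);
    }
}
```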
@zhicwu I have conducted multiple tests of high-concurrency writing and added tracking logs in the code for analysis. The problem is that sometimes the amount of data in the database matches, while sometimes it is missing a few thousand records. I can ensure that every piece of data, from the start of the test program until it is written via out.write(), is not lost. Therefore, I suspect that either our ClickHouse testing environment may have some problems, or there is a problem with the underlying driver code. Or do I need to modify or add some configuration parameters? Currently, I have added the connection parameter max_queued_buffers=0 to the JDBC URL.
Have you checked |
@zhicwu It doesn't seem as fast as expected, and it is sometimes even slower! DDL:
JDBC version:
Java demo:
@xlvchao, do you still see execution being blocked, OOM, or serialization errors? As to performance, you're comparing insertion using different formats (JSONEachRow vs. RowBinary), so it's not surprising that the latter is faster. However, I'd suggest also considering the cost of JSON deserialization etc. on the client side, as well as resource utilization on the server, for a better comparison. Lastly, if you're handling a small amount of data (as shown in the above example), it's not worth using multi-threading with multiple batches. Instead, one insert with all the data you have should be good enough in most cases. Simply put, it's faster to insert 1M rows in one go than to run 100 inserts of 10,000 rows each.
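The batching advice above can be sketched as follows: each batch maps to one INSERT round trip, so fewer, larger batches mean fewer round trips. The `partition` helper is an illustration, not driver API:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSizing {
    // Split rows into batches of the given size; each batch would become
    // one INSERT round trip against the server.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) rows.add(i);
        // 100 batches of 10,000 rows -> 100 INSERT round trips
        System.out.println(partition(rows, 10_000).size()); // prints 100
        // one batch of 1M rows -> a single INSERT round trip
        System.out.println(partition(rows, 1_000_000).size()); // prints 1
    }
}
```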
@zhicwu I adjusted the JVM parameters, and since then I have never seen execution being blocked, an OOM, or a serialization error. Our team inserts over 15 billion rows into ClickHouse every day, so if I only consider insertion performance, I should consider using clickhouse-client instead of clickhouse-jdbc. Am I right?
Good to know.
Yes, in general, clickhouse-client is faster than clickhouse-jdbc, especially when (de)serialization is involved. If you need to process data before ingesting it into ClickHouse, there won't be much difference - see the comparison among clickhouse-client, curl, and the Java Client (dump or load) in #928. Apart from that, if the source of the data is messages from an MQ, you probably just need to use the built-in connector in ClickHouse for ingestion.
My question is:
High-frequency writing scenario: executing prepareStatement.executeBatch() about 20 times, then it blocks for a while. Each batch is 1,000 records.
JDBC version: 0.3.1
I tried all the versions (from 0.3.1 to 0.4.6), and the problem is still there! Please help me!
And the approach in #991 does not work either.