
Position of SETTINGS matters for configuration and batching #1152

Closed
ivenhov opened this issue Dec 15, 2022 · 3 comments


ivenhov commented Dec 15, 2022

Hi

I'm using the JDBC Java client v0.3.2-patch11 in a Spring Boot application.
I'm trying to enable async_insert, with mixed results so far.

I'm testing with single inserts at the moment, inserting as quickly as possible.
If during my insert I do

        StringBuilder sb = new StringBuilder(1024);
        sb.append("INSERT INTO ").append(_tableDescriptor.getFqn());
        sb.append(" SETTINGS async_insert=1,wait_for_async_insert=0");
        sb.append(" VALUES ");
        sb.append("(?,?,?)");
        String sql = sb.toString();

I can see that disk space on the server does not grow rapidly and the number of parts created stays under control, considering that I'm not batching. This means the setting is working. It also matches the documentation example:
https://clickhouse.com/docs/en/cloud/bestpractices/asynchronous-inserts/

However, if I use

        StringBuilder sb = new StringBuilder(1024);
        sb.append("INSERT INTO ").append(_tableDescriptor.getFqn());
        sb.append(" VALUES ");
        sb.append("(?,?,?)");
        sb.append(" SETTINGS async_insert=1,wait_for_async_insert=0");
        String sql = sb.toString();

then I can see disk usage growing quickly, eventually leading to running out of disk space.
This means the setting is not working.
Why is the position of SETTINGS relevant?
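Spelled out, the two statement shapes being compared look like this (a minimal sketch; "events" is a placeholder table name, not taken from the report):

```java
// Minimal sketch of the two statement shapes compared above.
// "events" is a placeholder table name, not from the original report.
public class InsertSqlShapes {

    // 1st version: SETTINGS before VALUES.
    static String settingsBeforeValues(String table) {
        return "INSERT INTO " + table
                + " SETTINGS async_insert=1,wait_for_async_insert=0"
                + " VALUES (?,?,?)";
    }

    // 2nd version: SETTINGS after VALUES.
    static String settingsAfterValues(String table) {
        return "INSERT INTO " + table
                + " VALUES (?,?,?)"
                + " SETTINGS async_insert=1,wait_for_async_insert=0";
    }
}
```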

I discovered the non-working 2nd version when I was trying to batch 1000 values per insert.
With that version batching is extremely quick, much faster than with the 1st version, so I would like to use it if possible.
I think batching does not work at all in the 1st version.
It may have something to do with parsing breaking due to the inserted SETTINGS clause.
The constructor of SqlBasedPreparedStatement seems to bypass
https://github.com/ClickHouse/clickhouse-jdbc/blob/f285b73d45597e29cf36dff4daf4cb4f9f53453a/clickhouse-jdbc/src/main/java/com/clickhouse/jdbc/internal/SqlBasedPreparedStatement.java#L78
due to parsedStmt.hasValues() returning false.

The same happens in ClickHouseConnectionImpl:
https://github.com/ClickHouse/clickhouse-jdbc/blob/f285b73d45597e29cf36dff4daf4cb4f9f53453a/clickhouse-jdbc/src/main/java/com/clickhouse/jdbc/internal/ClickHouseConnectionImpl.java#L705
where it skips that section because parsedStmt.hasValues() returns false.

As it stands I cannot reliably configure inserts to get decent performance, avoid rapid parts growth, and at the same time benefit from bulk inserts.

Any help with this would be appreciated.
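[Editor's note: one avenue worth checking, an unverified assumption on my part rather than anything confirmed in this thread, is that the clickhouse-jdbc driver can often forward server settings supplied as JDBC URL query parameters, which would sidestep the statement-parsing question entirely. Verify this against the driver docs for v0.3.2 before relying on it:]

```
jdbc:clickhouse://host:8123/mydb?async_insert=1&wait_for_async_insert=0
```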


ivenhov commented Dec 15, 2022

Further testing shows the following behaviour for 100k records inserted, with 1000 records per batch.

1st version of the code

        StringBuilder sb = new StringBuilder(1024);
        sb.append("INSERT INTO ").append("tablename");
        sb.append(" SETTINGS async_insert=1,wait_for_async_insert=0");
        sb.append(" VALUES ");
        sb.append("(?,?,?)");
        String sql = sb.toString();

inserts them in 1m 22s.

2nd version of the code

        StringBuilder sb = new StringBuilder(1024);
        sb.append("INSERT INTO ").append("tablename");
        sb.append(" VALUES ");
        sb.append("(?,?,?)");
        sb.append(" SETTINGS async_insert=1,wait_for_async_insert=0");
        String sql = sb.toString();

inserts them in 1s 600ms, which is so much faster!
The second version performs exactly the same when the SETTINGS line is commented out.
This means that line is irrelevant and not used/parsed at all.
This can be proven by putting in bogus SETTINGS values, which are still tolerated.

This means the 1st version of the code is correct, but the question is:

  • why is there such a performance drop with batch inserts when SETTINGS is used in the correct place?

Also, the parser should probably warn/stop if SETTINGS is used in the wrong place. Even better if the position did not matter at all.

For reference, the call executed is:

List<Object[]> batchArgs...
int[] columnTypes = ...
_jdbcTemplate.batchUpdate(sql, batchArgs, columnTypes);
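[Editor's note: a plain-JDBC equivalent of that call might look like the sketch below. This is an illustration only: the helper names are hypothetical and only java.sql APIs are used; the 1000-per-batch chunking matches the setup described above.]

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the same batched insert without Spring's JdbcTemplate.
// Helper names are hypothetical, not part of the ClickHouse driver.
public class BatchInsert {

    // Split rows into fixed-size batches (e.g. 1000 per batch, as above).
    static <T> List<List<T>> chunk(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return batches;
    }

    // Bind each row's values, then send the whole batch in one executeBatch call.
    static int[] insertBatch(Connection conn, String sql, List<Object[]> rows)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```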

@zhicwu zhicwu added the bug label Dec 16, 2022

zhicwu commented Dec 16, 2022

Hi @ivenhov, thanks for the report, and sorry for the confusing results. It's caused by the outdated SQL parser, which expects the SETTINGS expression at the end of the insert statement, but it should really come before the VALUES expression :<

A few more comments for you to consider:

  1. the second query is invalid
    If you try it using the ClickHouse native CLI client, you'll end up with a parsing error.

  2. async insert
    This is just my opinion, for your reference. Async insert is convenient for small writes from many clients. In the past we used a Buffer table or an MQ to achieve a similar result. However, depending on your use case, it's often more elastic and flexible to use an MQ.

  3. batch insert
    It is expected that batch insert is fast, because ClickHouse is fast :) If you find the JDBC driver too slow for inserting large amounts of data, you may want to switch to the Java client, which is async and stream-based. Performance [WIP] #928 has some more numbers for comparison.

Update: the parsing issue was caused by breaking change on server side ClickHouse/ClickHouse#35883.


ivenhov commented Dec 16, 2022

Thanks for the prompt reply.

  1. You are right about the CLI, the 2nd version is indeed incorrect. I don't know why it didn't occur to me to check that via the CLI.

  2. In my case it may be single messages, but with bursts of 100s per second, at which point I would like to do the batching.
    Messages are coming from RabbitMQ, which I need to pre-process and persist in CH. Because of the nature of the queue I think I will receive only 1 message at a time.
    This is different in Kafka, as far as I know, where you can get multiple messages and advance the pointer.
    I need to look into it.
    Is using async_insert preferred over Buffer tables?

  3. Thank you for the links. I was considering the Java client, but having pure JDBC has its advantages.
    It's definitely on the table if performance suffers.
