How to use clickhouse-jdbc to query millions of data？And stream way for reading large result set? #929

kiwimg · 2022-05-11T02:05:57Z

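How do I use clickhouse-jdbc to query tens of millions of rows? Is there a streaming way to read a very large SELECT result set?
I need to export millions of rows from ck. How does clickhouse-jdbc support this? Which API should I use to solve this problem?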

zhicwu · 2022-05-11T05:36:10Z

Use clickhouse-jdbc to query millions of rows?

By default, clickhouse-jdbc is synchronous and resource-efficient. It simply reads data one field at a time and then deserializes it in the same thread. There's a tiny buffer (8192 bytes), and objects like ClickHouseRecord and ClickHouseValue are reused to reduce CPU and memory consumption. As long as you don't cache lots of data in your application, there's not much to worry about. Having said that, you may still run into an OOM error when dealing with a large field (e.g. a movie stored in a column) with limited memory.
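To illustrate "don't cache lots of data", here is a minimal sketch that writes each row to a `Writer` as soon as it is read, so only the current row is held by the application. It uses only the standard `java.sql` API; the class name `ResultSetCsvExporter` and the simple CSV quoting are assumptions made for illustration and are not part of clickhouse-jdbc.

```java
import java.io.IOException;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper (not part of clickhouse-jdbc): streams a ResultSet to CSV row by row.
final class ResultSetCsvExporter {
    private ResultSetCsvExporter() {
    }

    /** Writes a header line plus one line per row and returns the number of data rows written. */
    static long export(ResultSet rs, Writer out) throws SQLException, IOException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();

        // Header line built from the column labels.
        for (int i = 1; i <= columns; i++) {
            if (i > 1) {
                out.write(',');
            }
            out.write(quote(meta.getColumnLabel(i)));
        }
        out.write('\n');

        // Each row is written immediately; nothing is collected in a list.
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                if (i > 1) {
                    out.write(',');
                }
                out.write(quote(rs.getString(i)));
            }
            out.write('\n');
            rows++;
        }
        out.flush();
        return rows;
    }

    /** Minimal CSV quoting: NULL becomes an empty field, embedded quotes are doubled. */
    static String quote(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }
}
```

A caller would typically open the `Connection`, `Statement`, `ResultSet` and a buffered file writer in one try-with-resources block and pass the last two to `ResultSetCsvExporter.export(rs, writer)`. The JDBC URL and the query are up to the caller.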

Streaming way to read a large SELECT result set?

Both the JDBC driver and the Java client use streaming for both queries and inserts. There's nothing special about it, but you should avoid large SQL statements with many values expressions.
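One way to avoid a single huge `INSERT ... VALUES (...), (...), ...` statement is the standard JDBC batch API, sketched below. The class name `BatchInserter`, the row representation (`Object[]`) and the batch size parameter are assumptions chosen for illustration; only `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper (not part of clickhouse-jdbc): inserts rows through JDBC batches
// instead of concatenating many value expressions into one SQL string.
final class BatchInserter {
    private BatchInserter() {
    }

    /**
     * Binds each row to the parameterized statement and sends the accumulated batch
     * every batchSize rows. Returns the number of rows added.
     */
    static long insert(Connection conn, String sql, Iterable<Object[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                total++;
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the final partial batch
            }
        }
        return total;
    }
}
```

For example, `BatchInserter.insert(conn, "INSERT INTO my_table VALUES (?, ?)", rows, 10000)` keeps the SQL text short no matter how many rows are sent. Here `my_table` and the batch size are placeholders.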

Need to export millions of rows from ck; how does clickhouse-jdbc support it? Which API solves this problem?

The JDBC standard does not provide a convenient way to load or dump data. However, you can use a one-liner in the Java client instead, for example ClickHouseClient.load() or ClickHouseClient.dump().

kiwimg · 2022-05-11T06:29:52Z

Are there any corresponding examples?

zhicwu · 2022-05-11T12:20:06Z

I'll add more details to #928 and maybe examples over the weekend, but for starters:

Java Client
- one-liner for loading and dumping data - see ClickHouseClient.load(...) and ClickHouseClient.dump(...)
- streaming basics - see ClickHouseInputStream.of(...) and ClickHouseOutputStream.of(...)
- piped output stream - ClickHouseDataStreamFactory.getInstance().createPipedOutputStream(...)
- writing data into request - ClickHouseRequest.write().query(...).data(...).send() (see Parquet ingestion issue #909)
- reading response - ClickHouseResponse.getInputStream() (you have to deserialize by yourself in this case)
JDBC Driver
- batch insert - see examples here
- unwrapping - ClickHouseStatement.unwrap(ClickHouseRequest.class) and then use Java Client to do the rest
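For the "reading response" item, where deserialization is left to the caller, one option for a plain export is to skip deserialization and copy the raw bytes straight to a file. The sketch below uses only the JDK. `StreamDump` is a hypothetical name, and the input is assumed to be the stream returned by `ClickHouseResponse.getInputStream()`, whose contents depend on the format requested in the query.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of the ClickHouse Java client): copies a response
// stream to a file without holding the whole payload in memory.
final class StreamDump {
    private StreamDump() {
    }

    /** Copies all bytes from the stream to target, replacing any existing file; returns bytes written. */
    static long toFile(InputStream in, Path target) throws IOException {
        try (InputStream source = in) { // close the stream once fully consumed
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

`Files.copy` reads and writes through an internal buffer, so memory use stays flat regardless of the result size.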

realcbb · 2023-05-23T01:50:17Z

stmt.unwrap(ClickHouseRequest.class).query(selectSQL).output("/data/test.csv").executeAndWait()
This code creates the output file, which should contain data, but the file is empty. Is the code wrong?
I'm using clickhouse-jdbc:0.3.2-patch11.

zhicwu added the question label May 11, 2022

zhicwu changed the title ~~问题使用clickhouse-jdbc 查询千万数据量？ Stream方式读取SELECT超大结果集？~~ How to use clickhouse-jdbc to query millions of data？And stream way for reading large result set? May 11, 2022

zhicwu closed this as completed May 23, 2022
