Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to use clickhouse-jdbc to query millions of data?And stream way for reading large result set? #929

Closed
kiwimg opened this issue May 11, 2022 · 4 comments
Labels

Comments

@kiwimg
Copy link

kiwimg commented May 11, 2022

问题使用clickhouse-jdbc 查询千万数据量? Stream方式读取SELECT超大结果集 ?
需要导出百万级别数据 从ck中导出,clickhouse-jdbc怎么支持?如果使用api 解决这个问题?

@zhicwu
Copy link
Contributor

zhicwu commented May 11, 2022

Use clickhouse-jdbc to query millions of data?

By default, clickhouse-jdbc is synchronous and resource-efficient. It simply reads data one field at a time and then deserialize in the same thread. There's tiny buffer(8192 bytes) and objects like ClickHouseRecord and ClickHouseValue reused for less CPU and memory consumption. As long as you don't cache lots of data in your application, there's not much to worry about. Having said that, you may still run into OOM error when dealing with large field(e.g. a movie stored in a column) with limited memory.

Stream way to read SELECT large result set?

Both JDBC driver and Java client uses streaming for both query and insert. There's nothing special, but you should avoid to use large SQL statement with many values expressions.

Need to export millions of data from ck, how does clickhouse-jdbc support it? If use api to solve this problem?

JDBC standard does not provide convenient way for loading/dumping data. However, you can use one-liner in Java client, for example: ClickHouseClient.load() or ClickHouseClient.dump() instead.

@zhicwu zhicwu changed the title 问题使用clickhouse-jdbc 查询千万数据量? Stream方式读取SELECT超大结果集 ? How to use clickhouse-jdbc to query millions of data?And stream way for reading large result set? May 11, 2022
@kiwimg
Copy link
Author

kiwimg commented May 11, 2022

有对应的例子吗

@zhicwu
Copy link
Contributor

zhicwu commented May 11, 2022

I'll add more details into #928 and maybe examples in weekend, but for starters:

  • Java Client

    • one-liner for loading and dumping data - see ClickHouseClient.load(...) and ClickHouseClient.dump(...)
    • streaming basics - see ClickHouseInputStream.of(...) and ClickHouseOutputStream.of(...)
    • piped output stream - ClickHouseDataStreamFactory.getInstance().createPipedOutputStream(...)
    • writing data into request - ClickHouseRequest.write().query(...).data(...).send() Parquet ingestion issue #909
    • reading response - ClickHouseResponse.getInputStream() (you have to deserialize by yourself in this case)
  • JDBC Driver

    • batch insert - see examples at here
    • unwrapping - ClickHouseStatement.unwrap(ClickHouseRequest.class) and then use Java Client to do the rest

@zhicwu zhicwu closed this as completed May 23, 2022
@realcbb
Copy link

realcbb commented May 23, 2023

stmt.unwrap(ClickHouseRequest.class).query(selectSQL).output("/data/test.csv").executeAndWait()
this code generates the outFile, it should have data, but the file is empty. Is the code wrong?
using clickhouse-jdbc:0.3.2-patch11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants