Tune JDBC fetch-size automatically based on column count #16644
The idea is nice; however, I don't know whether larger fetch sizes cause issues for wide tables. Have you observed something?
I would've expected a different heuristic: the wider the table or the more rows we pull, the higher the fetch count.
One concern I have is that this will impact memory estimation, since the fetch size is no longer a constant value.
        throws SQLException
{
    PreparedStatement statement = connection.prepareStatement(sql);
    statement.setFetchSize(1000);
    // This is a heuristic, not exact science. A better formula can perhaps be found with measurements.
    // Column count is not known for non-SELECT queries. Not setting fetch size for these.
Not setting the fetch size can mean a fetch size of 1 in some drivers. IMO, when we don't know the column count, we should default to the older value of 1k.
I had this originally, but note that the column count is unknown only for non-SELECT queries such as DELETE.
All queries that read data know their projected column count.
Thus I had a choice: make the change defensively, as if I didn't know when the column count may be missing, or write the code "the way it would be written today".
I would expect this can cause memory pressure issues. Note that
In what context do we do memory estimation, taking into account rows prefetched by the JDBC driver?
We don't do it today. Now that I re-read my comment, even today the prefetch is not constant: for wider tables we prefetch more data than for narrow tables. So your change is probably better in this regard. This looks like a good starting point. We can iterate over time if someone finds issues here. Do you think it'd be useful to have a kill switch for some time?
yes, that's the idea
Sure, I can add one.
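For reference, a minimal sketch of what such a kill switch could look like. The class name, the `configuredFetchSize` option and the `select` method are hypothetical and not taken from this PR; the formula assumes `max(100_000 / columnCount, 1_000)`, which matches the 1000 to 100,000 range stated in the commit message.

```java
import java.util.Optional;

// Hypothetical sketch: an explicit override that takes precedence over the heuristic.
final class FetchSizeSelector
{
    // Assumed operator-provided override; empty means "use the heuristic"
    private final Optional<Integer> configuredFetchSize;

    FetchSizeSelector(Optional<Integer> configuredFetchSize)
    {
        this.configuredFetchSize = configuredFetchSize;
    }

    // Returns the value to pass to Statement.setFetchSize,
    // or empty when the fetch size should not be set at all.
    Optional<Integer> select(Optional<Integer> columnCount)
    {
        if (configuredFetchSize.isPresent()) {
            // Kill switch engaged: ignore the column count entirely
            return configuredFetchSize;
        }
        // Column count is absent for non-SELECT queries, so the result is empty for those
        return columnCount.map(count -> Math.max(100_000 / count, 1_000));
    }
}
```

With the override set to 1000, every query gets the old constant back; with it unset, a 5-column SELECT would get 20,000 and a DELETE would leave the driver default untouched.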
// Column count is not known for non-SELECT queries. Not setting fetch size for these.
else if (columnCount.isPresent()) {
    // max (not min) keeps the result in the 1,000 to 100,000 range
    statement.setFetchSize(max(100_000 / columnCount.get(), 1_000));
}
Perhaps we can avoid overriding the fetch size when it is specified with the defaultRowFetchSize connection property, in addition to this change? That would allow users to configure the value on their side. We can retrieve the value with PgConnection#getDefaultFetchSize.
Sounds like this is orthogonal, i.e. the existing code was setting the fetch size unconditionally, and this one sets the fetch size unconditionally, just with a different value.
However, I don't think we should go in that direction at all.
If our intent is to let users configure the fetch size, we should have an explicit toggle rather than inspect the fetch size provided in the JDBC URL string. Note, however, that giving users control is good, but having our code be smarter is even better, and those things are at odds. I do think we should resist the desire to throw a toggle at a problem.
I think the idea is good, as we consider more factors while fetching the values.
private ArrayMapping arrayMapping = ArrayMapping.DISABLED;
private boolean includeSystemTables;
private boolean enableStringPushdownWithCollate;

@Deprecated
Can this instead live in JdbcMetadataConfig?
No, it applies to only some connectors (PostgreSQL, Oracle, Redshift).
LGTM % comments from me and Yuya
See #16269 (comment). tl;dr: few users would benefit from a config toggle; many users may benefit from an auto-adjusted value.
@chenjian2664 I suppose that in the next iteration someone will do some experiments and propose a better formula.
There is a conflict with #16379 (just merged).
Conflicts with #16616 (just merged), will rebase.
The PostgreSQL, Redshift and Oracle connectors had a hard-coded fetch-size value of 1000. The value was found not to be optimal when the server is far away (high latency) or when the number of selected columns is low. This commit improves the latter case by picking the fetch size automatically based on the number of columns projected. After the change, the fetch size is picked automatically in the range 1000 to 100,000.
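To make the stated range concrete, here is a small sketch that tabulates rows and values per fetch under the old constant and under the heuristic. It assumes the formula is `max(100_000 / columnCount, 1_000)`, which is what the 1000 to 100,000 range implies; the class and method names are hypothetical.

```java
// Hypothetical sketch of the column-count heuristic described above.
final class FetchSizeHeuristic
{
    static final int OLD_FETCH_SIZE = 1_000;

    private FetchSizeHeuristic() {}

    // Aims at roughly 100,000 values per round trip, but never fewer than 1,000 rows
    static int fetchSize(int columnCount)
    {
        if (columnCount <= 0) {
            throw new IllegalArgumentException("columnCount must be positive");
        }
        return Math.max(100_000 / columnCount, 1_000);
    }

    public static void main(String[] args)
    {
        System.out.println("columns | old rows | old values | new rows | new values");
        for (int columns : new int[] {1, 10, 50, 100, 500}) {
            int rows = fetchSize(columns);
            System.out.printf("%7d | %8d | %10d | %8d | %10d%n",
                    columns, OLD_FETCH_SIZE, OLD_FETCH_SIZE * columns, rows, rows * columns);
        }
    }
}
```

Under this assumption a single-column projection fetches 100,000 rows per round trip, while anything with 100 or more columns keeps the old 1,000 rows, so the number of values buffered per fetch never drops below what the constant already produced for those tables.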
CI
CI #16652 (again)
Fixes #16153
Alternative to #16269