-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow multiple incremental commits in optimize #1621
Conversation
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
66c5045
to
acca200
Compare
let now = Instant::now(); | ||
if !actions.is_empty() && (self.min_commit_interval.map_or(false, |i| now.duration_since(last_commit) > i) || end) { | ||
let actions = std::mem::take(&mut actions); | ||
last_commit = now; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter if this is updated before the commit success?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters much, because min_commit_interval
is supposed to be much larger than the time it takes to do a commit.
Should we move those out of |
39a12bb
to
236c86a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall approach seems reasonable so far.
Yeah that sounds like a good idea 👍 |
236c86a
to
4c09037
Compare
Done. |
@kvap is this option also exposed to the python bindings? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see one more change to the tests before merging.
@ion-elgreco I've created a follow up issue for the Python bindings: #1640
max_concurrent_tasks: usize, | ||
#[allow(unused_variables)] // used behind a feature flag | ||
max_spill_size: usize, | ||
min_commit_interval: Option<Duration>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for now, but eventually we might want to put these execution settings in a struct.
let maybe_metrics = plan | ||
.execute(dt.object_store(), &dt.state, 1, 20, None) | ||
.await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add at least one test where we pass in a commit interval? even if it doesn't make an intermediate commit, it would be good to know those code paths run through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test where an intermediate commit is actually created.
Currently "optimize" executes the whole plan in one commit, which might fail. The larger the table, the more likely it is to fail and the more expensive the failure is. Add an option in OptimizeBuilder that allows specifying a commit interval. If that is provided, the plan executor will periodically commit the accumulated actions.
4c09037
to
15892d9
Compare
Currently "optimize" executes the whole plan in one commit, which might fail. The larger the table, the more likely it is to fail and the more expensive the failure is.
Add an option in OptimizeBuilder that allows specifying a commit interval. If that is provided, the plan executor will periodically commit the accumulated actions.