Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw a runtime error if the memory allocated to GroupByHash exceeds a limit #3940

Closed
Tracked by #1570
alamb opened this issue Oct 24, 2022 · 1 comment · Fixed by #4405
Closed
Tracked by #1570

Throw a runtime error if the memory allocated to GroupByHash exceeds a limit #3940

alamb opened this issue Oct 24, 2022 · 1 comment · Fixed by #4405

Comments

@alamb
Copy link
Contributor

alamb commented Oct 24, 2022

No description provided.

@alamb alamb changed the title Throw a runtime error if the memory allocated to GroupByHash exceeds some limit Throw a runtime error if the memory allocated to GroupByHash exceeds a limit Oct 24, 2022
@crepererum
Copy link
Contributor

I'm working on this.

crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Nov 14, 2022
Most of it is refactoring to allow us to call the async memory subsystem
while polling the stream. The actual memory accounting is rather easy
(since it's only ever growing except when the stream is dropped).

Helps with apache#3940. (not closing yet, also need to do V1)

Performance Impact:
-------------------

```text
❯ cargo bench -p datafusion --bench aggregate_query_sql -- --baseline issue3940a-pre
    Finished bench [optimized] target(s) in 0.08s
     Running benches/aggregate_query_sql.rs (target/release/deps/aggregate_query_sql-e9e315ab7a06a262)
aggregate_query_no_group_by 15 12
                        time:   [654.77 µs 655.49 µs 656.29 µs]
                        change: [-1.6711% -1.2910% -0.8435%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

aggregate_query_no_group_by_min_max_f64
                        time:   [579.93 µs 580.59 µs 581.27 µs]
                        change: [-3.8985% -3.2219% -2.6198%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

aggregate_query_no_group_by_count_distinct_wide
                        time:   [2.4610 ms 2.4801 ms 2.4990 ms]
                        change: [-2.9300% -1.8414% -0.7493%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Benchmarking aggregate_query_no_group_by_count_distinct_narrow: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
aggregate_query_no_group_by_count_distinct_narrow
                        time:   [1.6578 ms 1.6661 ms 1.6743 ms]
                        change: [-4.5391% -3.5033% -2.5050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by
                        time:   [2.1767 ms 2.2045 ms 2.2486 ms]
                        change: [-4.1048% -2.5858% -0.3237%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Benchmarking aggregate_query_group_by_with_filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter
                        time:   [1.0916 ms 1.0927 ms 1.0941 ms]
                        change: [-0.8524% -0.4230% -0.0724%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by_u64 15 12
                        time:   [2.2108 ms 2.2238 ms 2.2368 ms]
                        change: [-4.2142% -3.2743% -2.3523%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking aggregate_query_group_by_with_filter_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter_u64 15 12
                        time:   [1.0922 ms 1.0931 ms 1.0940 ms]
                        change: [-0.6872% -0.3192% +0.1193%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  4 (4.00%) high severe

aggregate_query_group_by_u64_multiple_keys
                        time:   [14.714 ms 15.023 ms 15.344 ms]
                        change: [-5.8337% -2.7471% +0.2798%] (p = 0.09 > 0.05)
                        No change in performance detected.

aggregate_query_approx_percentile_cont_on_u64
                        time:   [3.7776 ms 3.8049 ms 3.8329 ms]
                        change: [-4.4977% -3.4230% -2.3282%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

aggregate_query_approx_percentile_cont_on_f32
                        time:   [3.1769 ms 3.1997 ms 3.2230 ms]
                        change: [-4.4664% -3.2597% -2.0955%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

I think the mild improvements are either flux or due to the somewhat
manual memory allocation pattern.
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Nov 14, 2022
Most of it is refactoring to allow us to call the async memory subsystem
while polling the stream. The actual memory accounting is rather easy
(since it's only ever growing except when the stream is dropped).

Helps with apache#3940. (not closing yet, also need to do V1)

Performance Impact:
-------------------

```text
❯ cargo bench -p datafusion --bench aggregate_query_sql -- --baseline issue3940a-pre
    Finished bench [optimized] target(s) in 0.08s
     Running benches/aggregate_query_sql.rs (target/release/deps/aggregate_query_sql-e9e315ab7a06a262)
aggregate_query_no_group_by 15 12
                        time:   [654.77 µs 655.49 µs 656.29 µs]
                        change: [-1.6711% -1.2910% -0.8435%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

aggregate_query_no_group_by_min_max_f64
                        time:   [579.93 µs 580.59 µs 581.27 µs]
                        change: [-3.8985% -3.2219% -2.6198%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

aggregate_query_no_group_by_count_distinct_wide
                        time:   [2.4610 ms 2.4801 ms 2.4990 ms]
                        change: [-2.9300% -1.8414% -0.7493%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Benchmarking aggregate_query_no_group_by_count_distinct_narrow: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
aggregate_query_no_group_by_count_distinct_narrow
                        time:   [1.6578 ms 1.6661 ms 1.6743 ms]
                        change: [-4.5391% -3.5033% -2.5050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by
                        time:   [2.1767 ms 2.2045 ms 2.2486 ms]
                        change: [-4.1048% -2.5858% -0.3237%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Benchmarking aggregate_query_group_by_with_filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter
                        time:   [1.0916 ms 1.0927 ms 1.0941 ms]
                        change: [-0.8524% -0.4230% -0.0724%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by_u64 15 12
                        time:   [2.2108 ms 2.2238 ms 2.2368 ms]
                        change: [-4.2142% -3.2743% -2.3523%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking aggregate_query_group_by_with_filter_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter_u64 15 12
                        time:   [1.0922 ms 1.0931 ms 1.0940 ms]
                        change: [-0.6872% -0.3192% +0.1193%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  4 (4.00%) high severe

aggregate_query_group_by_u64_multiple_keys
                        time:   [14.714 ms 15.023 ms 15.344 ms]
                        change: [-5.8337% -2.7471% +0.2798%] (p = 0.09 > 0.05)
                        No change in performance detected.

aggregate_query_approx_percentile_cont_on_u64
                        time:   [3.7776 ms 3.8049 ms 3.8329 ms]
                        change: [-4.4977% -3.4230% -2.3282%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

aggregate_query_approx_percentile_cont_on_f32
                        time:   [3.1769 ms 3.1997 ms 3.2230 ms]
                        change: [-4.4664% -3.2597% -2.0955%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

I think the mild improvements are either flux or due to the somewhat
manual memory allocation pattern.
alamb pushed a commit that referenced this issue Nov 18, 2022
…upedHashAggregateStreamV2` (Row Hash) (#4202)

* refactor: remove needless async

* feat: wire memory management into `GroupedHashAggregateStreamV2`

Most of it is refactoring to allow us to call the async memory subsystem
while polling the stream. The actual memory accounting is rather easy
(since it's only ever growing except when the stream is dropped).

Helps with #3940. (not closing yet, also need to do V1)

Performance Impact:
-------------------

```text
❯ cargo bench -p datafusion --bench aggregate_query_sql -- --baseline issue3940a-pre
    Finished bench [optimized] target(s) in 0.08s
     Running benches/aggregate_query_sql.rs (target/release/deps/aggregate_query_sql-e9e315ab7a06a262)
aggregate_query_no_group_by 15 12
                        time:   [654.77 µs 655.49 µs 656.29 µs]
                        change: [-1.6711% -1.2910% -0.8435%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

aggregate_query_no_group_by_min_max_f64
                        time:   [579.93 µs 580.59 µs 581.27 µs]
                        change: [-3.8985% -3.2219% -2.6198%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

aggregate_query_no_group_by_count_distinct_wide
                        time:   [2.4610 ms 2.4801 ms 2.4990 ms]
                        change: [-2.9300% -1.8414% -0.7493%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Benchmarking aggregate_query_no_group_by_count_distinct_narrow: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
aggregate_query_no_group_by_count_distinct_narrow
                        time:   [1.6578 ms 1.6661 ms 1.6743 ms]
                        change: [-4.5391% -3.5033% -2.5050%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by
                        time:   [2.1767 ms 2.2045 ms 2.2486 ms]
                        change: [-4.1048% -2.5858% -0.3237%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

Benchmarking aggregate_query_group_by_with_filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter
                        time:   [1.0916 ms 1.0927 ms 1.0941 ms]
                        change: [-0.8524% -0.4230% -0.0724%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by_u64 15 12
                        time:   [2.2108 ms 2.2238 ms 2.2368 ms]
                        change: [-4.2142% -3.2743% -2.3523%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking aggregate_query_group_by_with_filter_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter_u64 15 12
                        time:   [1.0922 ms 1.0931 ms 1.0940 ms]
                        change: [-0.6872% -0.3192% +0.1193%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  4 (4.00%) high severe

aggregate_query_group_by_u64_multiple_keys
                        time:   [14.714 ms 15.023 ms 15.344 ms]
                        change: [-5.8337% -2.7471% +0.2798%] (p = 0.09 > 0.05)
                        No change in performance detected.

aggregate_query_approx_percentile_cont_on_u64
                        time:   [3.7776 ms 3.8049 ms 3.8329 ms]
                        change: [-4.4977% -3.4230% -2.3282%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

aggregate_query_approx_percentile_cont_on_f32
                        time:   [3.1769 ms 3.1997 ms 3.2230 ms]
                        change: [-4.4664% -3.2597% -2.0955%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

I think the mild improvements are either flux or due to the somewhat
manual memory allocation pattern.

* refactor: simplify memory accounting

* refactor: de-couple memory allocation
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Nov 25, 2022
@alamb alamb closed this as completed in be1d376 Nov 28, 2022
crepererum added a commit to crepererum/arrow-datafusion that referenced this issue Nov 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants