Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

work in batches of docs #1937

Merged
merged 2 commits into from
Mar 21, 2023
Merged

work in batches of docs #1937

merged 2 commits into from
Mar 21, 2023

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Mar 13, 2023

Work in batches of docs when collecting results

  • Adding collect_block to SegmentCollector for cases without scores.
  • Add specalized fill_buffer for AllQuery. The check in advance is quite expensive
  • Add missing fill_buffer to Box<TDocSet>
  • Fixed size buffer for fill_buffer as a hint to the compiler

The batch handling could also be done for scores, e.g. collect_block_with_score.

Performance improvements look really good

 aggregation::agg_tests::bench::bench_aggregation_average_f64                       8,475,539        5,351,862           -3,123,677  -36.86%   x 1.58 
 aggregation::agg_tests::bench::bench_aggregation_average_f64_multi                 14,934,251       11,492,754          -3,441,497  -23.04%   x 1.30 
 aggregation::agg_tests::bench::bench_aggregation_average_f64_opt                   13,295,030       9,724,984           -3,570,046  -26.85%   x 1.37 
 aggregation::agg_tests::bench::bench_aggregation_average_u64                       8,107,198        5,359,265           -2,747,933  -33.89%   x 1.51 
 aggregation::agg_tests::bench::bench_aggregation_average_u64_and_f64               12,963,225       9,566,766           -3,396,459  -26.20%   x 1.36 
 aggregation::agg_tests::bench::bench_aggregation_average_u64_and_f64_multi         23,682,025       22,689,063            -992,962   -4.19%   x 1.04 
 aggregation::agg_tests::bench::bench_aggregation_average_u64_and_f64_opt           20,925,034       18,479,274          -2,445,760  -11.69%   x 1.13 
 aggregation::agg_tests::bench::bench_aggregation_average_u64_multi                 13,663,622       11,543,616          -2,120,006  -15.52%   x 1.18 
 aggregation::agg_tests::bench::bench_aggregation_average_u64_opt                   13,323,016       9,832,037           -3,490,979  -26.20%   x 1.36 
 aggregation::agg_tests::bench::bench_aggregation_avg_and_range_with_avg            23,500,390       21,357,001          -2,143,389   -9.12%   x 1.10 
 aggregation::agg_tests::bench::bench_aggregation_avg_and_range_with_avg_multi      38,782,161       35,499,394          -3,282,767   -8.46%   x 1.09 
 aggregation::agg_tests::bench::bench_aggregation_avg_and_range_with_avg_opt        33,908,693       30,860,009          -3,048,684   -8.99%   x 1.10 
 aggregation::agg_tests::bench::bench_aggregation_histogram_only                    22,357,460       20,540,925          -1,816,535   -8.12%   x 1.09 
 aggregation::agg_tests::bench::bench_aggregation_histogram_only_hard_bounds        16,599,594       13,048,134          -3,551,460  -21.39%   x 1.27 
 aggregation::agg_tests::bench::bench_aggregation_histogram_only_hard_bounds_multi  20,886,318       17,816,289          -3,070,029  -14.70%   x 1.17 
 aggregation::agg_tests::bench::bench_aggregation_histogram_only_hard_bounds_opt    19,210,837       15,905,418          -3,305,419  -17.21%   x 1.21 
 aggregation::agg_tests::bench::bench_aggregation_histogram_only_multi              29,869,368       27,007,167          -2,862,201   -9.58%   x 1.11 
 aggregation::agg_tests::bench::bench_aggregation_histogram_only_opt                25,172,715       22,390,367          -2,782,348  -11.05%   x 1.12 
 aggregation::agg_tests::bench::bench_aggregation_histogram_with_avg                48,240,088       45,420,994          -2,819,094   -5.84%   x 1.06 
 aggregation::agg_tests::bench::bench_aggregation_histogram_with_avg_multi          68,637,350       66,299,311          -2,338,039   -3.41%   x 1.04 
 aggregation::agg_tests::bench::bench_aggregation_histogram_with_avg_opt            63,490,770       61,551,842          -1,938,928   -3.05%   x 1.03 
 aggregation::agg_tests::bench::bench_aggregation_range_only                        11,631,475       8,509,606           -3,121,869  -26.84%   x 1.37 
 aggregation::agg_tests::bench::bench_aggregation_range_only_multi                  16,446,888       13,241,036          -3,205,852  -19.49%   x 1.24 
 aggregation::agg_tests::bench::bench_aggregation_range_only_opt                    14,737,189       10,854,391          -3,882,798  -26.35%   x 1.36 
 aggregation::agg_tests::bench::bench_aggregation_range_with_avg                    20,722,099       16,146,332          -4,575,767  -22.08%   x 1.28 
 aggregation::agg_tests::bench::bench_aggregation_range_with_avg_multi              29,723,431       26,693,781          -3,029,650  -10.19%   x 1.11 
 aggregation::agg_tests::bench::bench_aggregation_range_with_avg_opt                25,390,323       23,517,092          -1,873,231   -7.38%   x 1.08 
 aggregation::agg_tests::bench::bench_aggregation_stats_f64                         8,244,255        5,338,127           -2,906,128  -35.25%   x 1.54 
 aggregation::agg_tests::bench::bench_aggregation_stats_f64_multi                   14,475,345       12,176,140          -2,299,205  -15.88%   x 1.19 
 aggregation::agg_tests::bench::bench_aggregation_stats_f64_opt                     12,830,282       10,377,016          -2,453,266  -19.12%   x 1.24 
 aggregation::agg_tests::bench::bench_aggregation_terms_few                         6,549,730        3,577,244           -2,972,486  -45.38%   x 1.83 
 aggregation::agg_tests::bench::bench_aggregation_terms_few_multi                   19,969,163       17,757,300          -2,211,863  -11.08%   x 1.12 
 aggregation::agg_tests::bench::bench_aggregation_terms_few_opt                     15,391,557       12,275,784          -3,115,773  -20.24%   x 1.25 
 aggregation::agg_tests::bench::bench_aggregation_terms_many2                       17,329,689       15,005,286          -2,324,403  -13.41%   x 1.15 
 aggregation::agg_tests::bench::bench_aggregation_terms_many2_multi                 36,266,867       33,543,039          -2,723,828   -7.51%   x 1.08 
 aggregation::agg_tests::bench::bench_aggregation_terms_many2_opt                   33,012,959       30,288,960          -2,723,999   -8.25%   x 1.09 
 aggregation::agg_tests::bench::bench_aggregation_terms_many_order_by_term          18,708,459       15,791,601          -2,916,858  -15.59%   x 1.18 
 aggregation::agg_tests::bench::bench_aggregation_terms_many_order_by_term_multi    37,182,700       34,490,866          -2,691,834   -7.24%   x 1.08 
 aggregation::agg_tests::bench::bench_aggregation_terms_many_order_by_term_opt      32,750,782       31,254,708          -1,496,074   -4.57%   x 1.05 
 aggregation::agg_tests::bench::bench_aggregation_terms_many_with_sub_agg           132,758,440      131,027,529         -1,730,911   -1.30%   x 1.01 
 aggregation::agg_tests::bench::bench_aggregation_terms_many_with_sub_agg_multi     203,791,080      212,675,305          8,884,225    4.36%   x 0.96 
 aggregation::agg_tests::bench::bench_aggregation_terms_many_with_sub_agg_opt       198,716,848      188,832,143         -9,884,705   -4.97%   x 1.05 
 aggregation::bucket::range::bench::bench_range_100_buckets                         288,427          282,725                 -5,702   -1.98%   x 1.02 
 aggregation::bucket::range::bench::bench_range_10_buckets                          154,329          149,996                 -4,333   -2.81%   x 1.03 

@PSeitz PSeitz requested a review from fulmicoton March 13, 2023 04:35
@codecov-commenter
Copy link

Codecov Report

Merging #1937 (116f2fb) into main (8459efa) will increase coverage by 0.01%.
The diff coverage is 85.86%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main    #1937      +/-   ##
==========================================
+ Coverage   94.46%   94.48%   +0.01%     
==========================================
  Files         309      309              
  Lines       56822    56825       +3     
==========================================
+ Hits        53676    53690      +14     
+ Misses       3146     3135      -11     
Impacted Files Coverage Δ
src/query/bitset/mod.rs 99.45% <ø> (ø)
src/query/vec_docset.rs 100.00% <ø> (ø)
src/aggregation/bucket/term_agg.rs 98.88% <50.00%> (+2.32%) ⬆️
src/aggregation/collector.rs 88.00% <56.25%> (-8.63%) ⬇️
src/query/boolean_query/boolean_weight.rs 93.50% <80.00%> (+0.06%) ⬆️
src/query/all_query.rs 93.93% <94.73%> (+0.18%) ⬆️
columnar/src/column_values/mod.rs 89.65% <100.00%> (ø)
src/aggregation/buf_collector.rs 52.27% <100.00%> (-15.73%) ⬇️
src/collector/mod.rs 54.63% <100.00%> (+1.36%) ⬆️
src/docset.rs 85.71% <100.00%> (+1.96%) ⬆️
... and 5 more

... and 4 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

}
//#[test]
// pub fn test_fill_buffer() {
// let doc_ids: Vec<DocId> = (1u32..210u32).collect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what should we do with this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reenabled it for the new block size and added another test for AllScorer

@PSeitz PSeitz merged commit 6a7a110 into main Mar 21, 2023
@PSeitz PSeitz deleted the batch_get_val branch March 21, 2023 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants