Perf: use term hashmap in fastfield #2243

Merged 5 commits on Nov 9, 2023

PSeitz commented Nov 9, 2023

Faster indexing, lower memory consumption, and fewer allocations.

  • add shared arena hashmap
  • bench fastfield indexing
  • use shared arena hashmap in columnar
  • lower minimum resize to 1<<3 in term hashmap

index-hdfs/index-hdfs-no-commit-only-FAST
                        time:   [97.217 ms 98.820 ms 100.69 ms]
                        thrpt:  [212.22 MiB/s 216.24 MiB/s 219.81 MiB/s]
                 change:
                        time:   [-43.555% -40.996% -38.348%] (p = 0.00 < 0.05)
                        thrpt:  [+62.202% +69.480% +77.163%]
                        Performance has improved.

index-hdfs/index-hdfs-with-commit-only-FAST
                        time:   [177.69 ms 182.69 ms 188.42 ms]
                        thrpt:  [113.41 MiB/s 116.97 MiB/s 120.26 MiB/s]
                 change:
                        time:   [-22.793% -20.433% -17.976%] (p = 0.00 < 0.05)
                        thrpt:  [+21.915% +25.679% +29.521%]
                        Performance has improved.

index-hdfs/index-hdfs-no-commit-json-without-docstore (no ff)
                        time:   [453.40 ms 465.27 ms 477.92 ms]
                        thrpt:  [44.712 MiB/s 45.928 MiB/s 47.131 MiB/s]
                 change:
                        time:   [-4.2451% -1.3006% +1.9388%] (p = 0.44 > 0.05)
                        thrpt:  [-1.9019% +1.3177% +4.4334%]
                        No change in performance detected.

index-gh/index-gh-no-commit
                        time:   [48.227 ms 48.766 ms 49.341 ms]
                        thrpt:  [45.815 MiB/s 46.356 MiB/s 46.873 MiB/s]
                 change:
                        time:   [-23.855% -22.003% -20.074%] (p = 0.00 < 0.05)
                        thrpt:  [+25.116% +28.209% +31.328%]
                        Performance has improved.

index-gh/index-gh-fast  time:   [18.293 ms 18.679 ms 19.089 ms]
                        thrpt:  [118.42 MiB/s 121.03 MiB/s 123.57 MiB/s]
                 change:
                        time:   [-26.113% -24.553% -22.710%] (p = 0.00 < 0.05)
                        thrpt:  [+29.383% +32.543% +35.342%]
                        Performance has improved.


index-gh/index-gh-with-commit
                        time:   [101.85 ms 103.74 ms 105.75 ms]
                        thrpt:  [21.376 MiB/s 21.791 MiB/s 22.196 MiB/s]
                 change:
                        time:   [-17.899% -15.228% -12.681%] (p = 0.00 < 0.05)
                        thrpt:  [+14.523% +17.963% +21.802%]
                        Performance has improved.


index-wiki/index-wiki-no-commit
                        time:   [26.359 ms 26.873 ms 27.390 ms]
                        thrpt:  [40.285 MiB/s 41.061 MiB/s 41.861 MiB/s]
                 change:
                        time:   [-7.2213% -5.2060% -3.5958%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7299% +5.4919% +7.7834%]
                        Performance has improved.

index-wiki/index-wiki-with-commit
                        time:   [52.138 ms 52.674 ms 53.206 ms]
                        thrpt:  [20.739 MiB/s 20.948 MiB/s 21.164 MiB/s]
                 change:
                        time:   [-11.533% -10.166% -8.7357%] (p = 0.00 < 0.05)
                        thrpt:  [+9.5719% +11.316% +13.036%]
                        Performance has improved.

PSeitz changed the title from "use term hashmap in fastfield text" to "Perf: use term hashmap in fastfield text" Nov 9, 2023
PSeitz changed the title from "Perf: use term hashmap in fastfield text" to "Perf: use term hashmap in fastfield" Nov 9, 2023
PSeitz requested a review from fulmicoton November 9, 2023 04:07
@@ -57,60 +11,8 @@ impl KeyValue {
/// the computation of the hash of the key twice,
/// or copying the key as long as there is no insert.
pub struct ArenaHashMap {
    table: Vec<KeyValue>,
    shared_arena_hashmap: SharedArenaHashMap,

fulmicoton commented Nov 9, 2023

Can you explain the relationship between ArenaHashMap and SharedArenaHashMap?

PSeitz commented Nov 9, 2023

I added some comments to clarify:

/// SharedArenaHashMap is like ArenaHashMap but gets the memory arena
/// passed as an argument to the methods.
/// So one MemoryArena can be shared with multiple SharedArenaHashMap.
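
To illustrate the idea in the comment above, here is a minimal sketch of a hash map that takes its arena as a method argument, so that several maps can store their keys in one buffer. It is not the code from this PR: `Arena`, `SharedMap`, `get_or_insert`, and `shared_arena_demo` are hypothetical names, the open-addressing scheme and the use of the standard library hasher are assumptions, and the initial table size of `1 << 3` only echoes the minimum mentioned in the PR description.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a memory arena: one growable byte buffer.
#[derive(Default)]
struct Arena {
    bytes: Vec<u8>,
}

impl Arena {
    /// Copies `data` into the arena and returns its (offset, length).
    fn store(&mut self, data: &[u8]) -> (usize, usize) {
        let offset = self.bytes.len();
        self.bytes.extend_from_slice(data);
        (offset, data.len())
    }

    fn slice(&self, offset: usize, len: usize) -> &[u8] {
        &self.bytes[offset..offset + len]
    }
}

/// Hypothetical map that owns only its slot table.
/// Key bytes live in whichever arena is passed to each call.
struct SharedMap {
    /// Each occupied slot holds (key offset, key length, id).
    table: Vec<Option<(usize, usize, u32)>>,
    len: usize,
}

impl SharedMap {
    fn new() -> Self {
        // Small initial table (assumption: 1 << 3 slots).
        SharedMap { table: vec![None; 1 << 3], len: 0 }
    }

    fn hash(key: &[u8]) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish() as usize
    }

    /// Returns the id of `key`, assigning the next id if the key is new.
    /// Callers must always pass the same arena to a given map.
    fn get_or_insert(&mut self, arena: &mut Arena, key: &[u8]) -> u32 {
        // Keep the load factor at or below 50% so probing always terminates.
        if (self.len + 1) * 2 > self.table.len() {
            self.grow(arena);
        }
        let mask = self.table.len() - 1;
        let mut slot = Self::hash(key) & mask;
        loop {
            let entry = self.table[slot];
            match entry {
                Some((offset, len, id)) => {
                    if arena.slice(offset, len) == key {
                        return id;
                    }
                    slot = (slot + 1) & mask; // linear probing
                }
                None => {
                    let (offset, len) = arena.store(key);
                    let id = self.len as u32;
                    self.table[slot] = Some((offset, len, id));
                    self.len += 1;
                    return id;
                }
            }
        }
    }

    /// Doubles the table and re-inserts entries; key bytes stay in the arena.
    fn grow(&mut self, arena: &Arena) {
        let new_size = self.table.len() * 2;
        let old = std::mem::replace(&mut self.table, vec![None; new_size]);
        let mask = new_size - 1;
        for entry in old.into_iter().flatten() {
            let mut slot = Self::hash(arena.slice(entry.0, entry.1)) & mask;
            while self.table[slot].is_some() {
                slot = (slot + 1) & mask;
            }
            self.table[slot] = Some(entry);
        }
    }
}

/// Two maps (for example, one per column) sharing a single arena.
fn shared_arena_demo() -> (u32, u32, usize) {
    let mut arena = Arena::default();
    let mut column_a = SharedMap::new();
    let mut column_b = SharedMap::new();
    let a = column_a.get_or_insert(&mut arena, b"INFO");
    let b = column_b.get_or_insert(&mut arena, b"INFO");
    (a, b, arena.bytes.len())
}
```

In this sketch each map carries only a slot table, so creating many small maps does not create many separate key buffers. The trade-off is that the caller has to pass the same arena to a given map on every call.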

PSeitz merged commit 927b443 into main Nov 9, 2023
4 checks passed
PSeitz deleted the term_hash_map branch November 9, 2023 12:44
PSeitz added a commit that referenced this pull request Apr 10, 2024
* add shared arena hashmap

* bench fastfield indexing

* use shared arena hashmap in columnar

lower minimum resize in hashtable

* clippy

* add comments