Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: change zone positions to phmap::flat_hash_set #3210

Merged
merged 2 commits into from
Jan 4, 2025

Conversation

dudantas
Copy link
Member

@dudantas dudantas commented Jan 4, 2025

Description

This PR replaces std::unordered_set with phmap::flat_hash_set in the Zone class to improve performance by reducing hash collisions and optimizing memory usage.

Motivation

The use of std::unordered_set for the positions attribute caused performance issues due to hash collisions, especially when managing large datasets or heavily clustered data. Collisions result in increased lookup, insertion, and deletion times as the hash table degrades into a linked list traversal for conflicting buckets.

Switching to phmap::flat_hash_set provides several benefits:

  1. Reduced hash collisions: The hashing strategy used by phmap::flat_hash_set significantly reduces the likelihood of collisions, improving lookup and insertion performance.
  2. Improved memory locality: Unlike std::unordered_set, which uses separate allocations for each bucket, flat_hash_set uses a contiguous memory layout, enhancing cache efficiency and reducing overhead.
  3. Faster operations: Benchmarks show that flat_hash_set outperforms std::unordered_set in scenarios with frequent insertions, lookups, and deletions due to its optimized design.

Behaviour

Actual

  • High collision rates with std::unordered_set degrade performance, especially for large datasets or when hash functions produce similar values for certain inputs.

Expected

  • Switching to phmap::flat_hash_set improves performance by reducing collisions, enhancing memory locality, and speeding up operations.

Type of change

  • Performance improvement (non-breaking change which improves efficiency)

How Has This Been Tested

  • Benchmarked lookup, insertion, and deletion operations with datasets of varying sizes to compare performance before and after the change.
  • Verified correctness and consistency of data handling after the change.

Tests performed:

  • Insertion of large position sets.
  • Lookups for positions at various scales.
  • Removal of positions and validation of the internal state.

Checklist

  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I checked the PR checks reports.
  • I have commented my code, particularly in hard-to-understand areas.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.

@murilo09
Copy link
Contributor

murilo09 commented Jan 4, 2025

Maybe fixes #2568

@dudantas
Copy link
Member Author

dudantas commented Jan 4, 2025

Maybe fixes #2568

This pr is only for better performance. The cause from issue is probably caused by reload or bad config.

@majestyotbr majestyotbr merged commit 53015a3 into main Jan 4, 2025
22 of 25 checks passed
@majestyotbr majestyotbr deleted the dudantas/perf-zones-hash branch January 4, 2025 23:59
Copy link

sonarqubecloud bot commented Jan 5, 2025

vllworldbuilding pushed a commit to vllworldbuilding/canary that referenced this pull request Jan 10, 2025
This replaces `std::unordered_set` with `phmap::flat_hash_set` in the
`Zone` class to improve performance by reducing hash collisions and
optimizing memory usage.

Motivation:

The use of `std::unordered_set` for the `positions` attribute caused
performance issues due to hash collisions, especially when managing
large datasets or heavily clustered data. Collisions result in increased
lookup, insertion, and deletion times as the hash table degrades into a
linked list traversal for conflicting buckets.

Switching to `phmap::flat_hash_set` provides several benefits:
1. Reduced hash collisions: The hashing strategy used by
`phmap::flat_hash_set` significantly reduces the likelihood of
collisions, improving lookup and insertion performance.
2. Improved memory locality: Unlike `std::unordered_set`, which uses
separate allocations for each bucket, `flat_hash_set` uses a contiguous
memory layout, enhancing cache efficiency and reducing overhead.
3. Faster operations: Benchmarks show that `flat_hash_set`
outperforms `std::unordered_set` in scenarios with frequent insertions,
lookups, and deletions due to its optimized design.
lucas-caminha pushed a commit to lucas-caminha/canary that referenced this pull request Jan 19, 2025
This replaces `std::unordered_set` with `phmap::flat_hash_set` in the
`Zone` class to improve performance by reducing hash collisions and
optimizing memory usage.

Motivation:

The use of `std::unordered_set` for the `positions` attribute caused
performance issues due to hash collisions, especially when managing
large datasets or heavily clustered data. Collisions result in increased
lookup, insertion, and deletion times as the hash table degrades into a
linked list traversal for conflicting buckets.

Switching to `phmap::flat_hash_set` provides several benefits:
1. Reduced hash collisions: The hashing strategy used by
`phmap::flat_hash_set` significantly reduces the likelihood of
collisions, improving lookup and insertion performance.
2. Improved memory locality: Unlike `std::unordered_set`, which uses
separate allocations for each bucket, `flat_hash_set` uses a contiguous
memory layout, enhancing cache efficiency and reducing overhead.
3. Faster operations: Benchmarks show that `flat_hash_set`
outperforms `std::unordered_set` in scenarios with frequent insertions,
lookups, and deletions due to its optimized design.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants