
perf: add cache to get evaluation rules storage method #1910

Merged (2 commits into flipt-io:main) Jul 27, 2023

Conversation

@rudineirk (Contributor) commented Jul 26, 2023:

This improves flag evaluation performance. After some investigation, we found the main cause of delays in evaluation requests was the rules query. In our case, running Flipt with a PostgreSQL DB on AWS RDS, we were seeing delays of up to 1s; with this fix they dropped to a few milliseconds.

This structure could also be used later to implement the cache for `ListFlags` mentioned in issue #935.

flipt=> explain (analyze, buffers) SELECT r.id, r.namespace_key, r.flag_key, r.segment_key, s.match_type, r."rank", c.id, c.type, c.property, c.operator, c.value FROM rules r JOIN segments s ON (r.segment_key = s."key" AND r.namespace_key = s.namespace_key) LEFT JOIN constraints c ON (s."key" = c.segment_key AND s.namespace_key = c.namespace_key) WHERE (r.flag_key = 'test-flag' AND r.namespace_key = 'production') GROUP BY r.id, c.id, s.match_type ORDER BY r."rank" ASC;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=22.17..22.32 rows=59 width=2134) (actual time=0.227..0.230 rows=6 loops=1)
   Sort Key: r.rank
   Sort Method: quicksort  Memory: 26kB
   Buffers: shared hit=14
   ->  HashAggregate  (cost=19.85..20.44 rows=59 width=2134) (actual time=0.218..0.222 rows=6 loops=1)
         Group Key: r.id, c.id, s.match_type
         Batches: 1  Memory Usage: 32kB
         Buffers: shared hit=14
         ->  Hash Join  (cost=2.14..19.40 rows=59 width=2134) (actual time=0.036..0.212 rows=6 loops=1)
               Hash Cond: ((s.key)::text = (r.segment_key)::text)
               Buffers: shared hit=14
               ->  Hash Right Join  (cost=1.06..17.96 rows=59 width=1098) (actual time=0.016..0.163 rows=235 loops=1)
                     Hash Cond: (((c.namespace_key)::text = (s.namespace_key)::text) AND ((c.segment_key)::text = (s.key)::text))
                     Buffers: shared hit=13
                     ->  Seq Scan on constraints c  (cost=0.00..14.94 rows=235 width=107) (actual time=0.005..0.072 rows=235 loops=1)
                           Filter: ((namespace_key)::text = 'production'::text)
                           Buffers: shared hit=12
                     ->  Hash  (cost=1.05..1.05 rows=1 width=1036) (actual time=0.006..0.007 rows=4 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           Buffers: shared hit=1
                           ->  Seq Scan on segments s  (cost=0.00..1.05 rows=1 width=1036) (actual time=0.003..0.005 rows=4 loops=1)
                                 Filter: ((namespace_key)::text = 'production'::text)
                                 Buffers: shared hit=1
               ->  Hash  (cost=1.06..1.06 rows=1 width=2068) (actual time=0.017..0.017 rows=2 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     Buffers: shared hit=1
                     ->  Seq Scan on rules r  (cost=0.00..1.06 rows=1 width=2068) (actual time=0.013..0.014 rows=2 loops=1)
                           Filter: (((namespace_key)::text = 'production'::text) AND ((flag_key)::text = 'test-flag'::text))
                           Rows Removed by Filter: 2
                           Buffers: shared hit=1
 Planning Time: 0.240 ms
 Execution Time: 0.282 ms
(32 rows)

@rudineirk rudineirk requested a review from a team as a code owner July 26, 2023 17:25
@rudineirk rudineirk force-pushed the feat/add-evaluation-rules-cache branch 2 times, most recently from 347e65e to 9e86c1b Compare July 26, 2023 17:40
@rudineirk rudineirk changed the title feat: add cache to get evaluation rules storage method perf: add cache to get evaluation rules storage method Jul 26, 2023
@markphelps (Collaborator) left a comment:

Thanks for the PR @rudineirk !! Definitely love any changes that help improve performance.

I've been thinking about whether moving the existing caching back to wrap the storage layer makes sense instead of the current interceptor/server cache that we have, so this is a good first step!

I'm trying to remember why I decided to move the caching to the server layer in the first place, as it was initially in the storage layer, similar to this approach 🤔. Here is when it changed: https://github.com/flipt-io/flipt/pull/954/files#diff-ee870b8fa81722aba3a872f0a81e22f5e100f4595dda3a4de59b39e23e8e882bR120

I think it makes sense to rely on TTL cache expiration (like you are doing), since that's already what we rely on for the existing entityID evaluation cache.

Also wondering if we even need the existing evaluation cache after this change; I'm not sure how much performance benefit we get there, since this is likely the slowest part of the process as you described. Something for us to benchmark, I suppose.

Have you seen any performance overhead from the marshalling to/from JSON for large sets of EvaluationRules? I'm guessing most flags don't actually have that many rules to begin with, so maybe it's a non-issue?

@rudineirk rudineirk force-pushed the feat/add-evaluation-rules-cache branch from 9e86c1b to 5dc0558 Compare July 26, 2023 18:31
codecov bot commented Jul 26, 2023:

Codecov Report

Merging #1910 (a58d4eb) into main (112e86d) will increase coverage by 0.09%.
The diff coverage is 83.72%.

@@            Coverage Diff             @@
##             main    #1910      +/-   ##
==========================================
+ Coverage   71.21%   71.30%   +0.09%     
==========================================
  Files          60       61       +1     
  Lines        5777     5820      +43     
==========================================
+ Hits         4114     4150      +36     
- Misses       1431     1436       +5     
- Partials      232      234       +2     
| Files Changed | Coverage | Δ |
| --- | --- | --- |
| internal/cache/memory/cache.go | 100.00% <ø> | (ø) |
| internal/cache/redis/cache.go | 63.63% <ø> | (ø) |
| internal/cmd/grpc.go | 0.00% <0.00%> | (ø) |
| internal/server/middleware/grpc/middleware.go | 69.96% <ø> | (ø) |
| internal/storage/cache/cache.go | 85.71% <85.71%> | (ø) |


@rudineirk (Contributor, Author) commented Jul 26, 2023:

> Thanks for the PR @rudineirk !! Definitely love any changes that help improve performance.
>
> Ive been thinking about if moving the existing caching back to wrap the storage layer makes sense instead of the current interceptor/server cache that we have, so this is a good first step!
>
> I'm trying to remember why I decided to move the caching to the server layer in the first place, as it was initially in the storage layer similar to this approach..🤔. Here is when it changed: https://github.com/flipt-io/flipt/pull/954/files#diff-ee870b8fa81722aba3a872f0a81e22f5e100f4595dda3a4de59b39e23e8e882bR120
>
> I think it makes sense to rely on TTL cache expiration (like you are doing) since that's already what we rely on for the existing entityID evaluation cache.
>
> Also wondering if we even need the existing evaluation cache after this change, as im not sure how much performance benefit we get there since this is likely the slowest part of the process as you described.. something for us to benchmark I suppose.
>
> Have you seen any overhead performance-wise of the marshalling to/from JSON for large sets of EvaluationRules? Im guessing most flags don't actually have that many rules to begin with so maybe its a non-issue?

I think the final evaluation result cache is still valid. In our use case we make multiple evaluation requests for the same flag/entityId, and the evaluation result cache helps with this, shaving a few ms off each request. The cache added in this PR helps with evaluation requests for different entityIDs on the same flag, which is where we saw the worst delays.

As for the overhead of decoding a large JSON payload: it has some CPU impact, but I think it's still far better than querying a lot of data from the DB/storage.

@rudineirk rudineirk requested a review from markphelps July 26, 2023 18:42
@yquansah (Contributor) left a comment:

Thanks for the PR indeed! Awesome to see users running the software and making improvements @rudineirk.

I am wondering if we should keep the same original semantics of serializing the rules as protobuf, as we did for EvaluationResponse, but if the JSON serialization is not too big of a CPU hit, as you've stated, then maybe it is not worth worrying about right now.

Code looks good though.

@rudineirk rudineirk force-pushed the feat/add-evaluation-rules-cache branch from 5dc0558 to 7333573 Compare July 26, 2023 19:57
@rudineirk (Contributor, Author) commented:
> Thanks for the PR indeed! Awesome to see users running the software and making improvements @rudineirk.
>
> I am wondering if we should keep the same original semantics of serializing the rules as protobuf as we did EvaluationResponse, but if the JSON serialization is not too big of a CPU hit as you've stated, then maybe it is not worth worrying about that right now.
>
> Code looks good though.

I thought about using protobuf to serialize the cache too, but it would require converting all the storage structs into protobuf, and I don't know whether the performance gains over JSON would be worth it.

@rudineirk rudineirk requested review from markphelps and yquansah July 26, 2023 20:02
@markphelps (Collaborator) left a comment:

Thanks again @rudineirk! Added some thoughts around error handling.

@rudineirk rudineirk force-pushed the feat/add-evaluation-rules-cache branch from 7333573 to 5135f2c Compare July 27, 2023 13:39
@rudineirk rudineirk requested a review from markphelps July 27, 2023 13:40
@markphelps (Collaborator) left a comment:

LGTM! Thank you again @rudineirk!!

@markphelps (Collaborator):

@all-contributors please add @rudineirk for code

@allcontributors (Contributor):

@markphelps

I've put up a pull request to add @rudineirk! 🎉

@markphelps markphelps enabled auto-merge (squash) July 27, 2023 17:46
@markphelps (Collaborator):

will go out in v1.24.0, likely Monday 🎉

@markphelps markphelps merged commit ef8c772 into flipt-io:main Jul 27, 2023
markphelps added a commit that referenced this pull request Jul 27, 2023
* 'main' of https://github.com/flipt-io/flipt:
  docs: add rudineirk as a contributor for code (#1913)
  perf: add cache to get evaluation rules storage method (#1910)
  fix: renable pprof (#1912)