
Indexed symbol search: definitive performance regression #5364

Closed
emidoots opened this issue Aug 26, 2019 · 13 comments
Comments

@emidoots
Member

emidoots commented Aug 26, 2019

Workaround

When upgrading to Sourcegraph v3.7.1, set the following in your site configuration to avoid opting into the new symbol search implementation:

"search.index.symbols.enabled": false,

Problem

After we introduced indexed symbol search in v3.7.1, both symbol and text search performance appears to have regressed. This issue is to investigate and fix that.

Note in the following that:

  1. Each test ran entirely on its own for 17 minutes (so one test could not affect another).
  2. The name field indicates the exact search query that was being run.
  3. The P95 change (orange) column indicates the percentage change in 95th percentile query duration after migrating from Sourcegraph v3.5.2 to v3.7.1. Positive numbers are performance regressions, negative numbers are performance gains.

All queries were run against theoa.sgdev.org over a 10h period using vegeta, and the CSV comparison report was generated using kingkai (a rough sketch of this kind of vegeta run follows the list below). The instance has 12,981 repositories in total, fully cloned and indexed, and vegeta generated 42GB of data (happy to upload this somewhere if anyone is interested). I eliminated all external factors that I could not attribute to the code / design itself, i.e.:

  • All repositories were completely cloned and reindexed.
  • The "before" column actually indicates when Zoekt was constrained in CPU resources and Jaeger was enabled, the "after" column is actually after me doubling the CPU availability count and disabling Jaeger due to proven performance concerns with it. Given this, I expected to see substantially better perf, but did not.

Results: https://docs.google.com/spreadsheets/d/12YpKTP58FOIUkFqrdBotlDJHyinu7-itiY-Hd0xuCoY/edit#gid=2136071970

@emidoots emidoots added search regression Regressions in Sourcegraph labels Aug 26, 2019
@emidoots
Member Author

Some notes on resource consumption of zoekt-webserver in this case:

  • The "before" numbers were running v3.5.2 with zoekt-webserver having 8 CPU and 80G memory.
  • The "after" numbers I thought were zoekt-webserver with 16 CPU, 100G memory (that is what I intended to run with), but it appears I actually did this before bumping the resources based on graphs below so it could have been memory constrained during this test. I'll retest the regressed query benchmarks later with the 100G memory to eliminate this as a cause.

During the testing, this is how CPU usage looked (the spike shows 100% of 7.7 CPUs consumed):

[graph: CPU usage]

And how memory usage looked (GiB):

[graph: memory usage, GiB]

Resident memory was only about half that (indicating maybe we can further tune GOGC or something; a sketch follows the graph below):

[graph: resident memory (RSS)]
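
If we do experiment with GOGC, a sketch only (not something tested here): lowering GOGC from its default of 100 makes the Go GC run more often, which usually shrinks the resident set at some CPU cost. It would just be an extra environment variable on the zoekt-webserver container, e.g.:

# Hypothetical: added to the existing zoekt-webserver container invocation in deploy-sourcegraph-docker.
$ docker run ... -e GOGC=50 ... <zoekt-webserver image>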

No swapping was occurring, though:

[graph: swap usage]

And network bytes in/out (Mbps):

[graphs: network in / network out, Mbps]

@emidoots
Member Author

Note: I suspect the easiest way to make forward progress on this will be to run vegeta against k8s.sgdev.org or a local dev server and find a similarly poor-performing query.

@keegancsmith
Member

keegancsmith commented Aug 27, 2019

This is surprising. My best guess is we are using more ram due to storing symbol information offsets in memory now (previously we never ran ctags). This leaves less RAM for the FS cache => more often paging in memory when reading the posting lists.

But that is just a guess. The best thing to do is run a profile against a regressed pod. What exactly are the CPU and RAM limits/requests, btw? Any chance we were indexing while these tests were running (in either case)?

Another thing is you don't specify index:only in the symbol search, which means we may be falling back to the old symbol search in the new version. However, the more important regression is how this has affected normal indexed text search.
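
(For illustration, an indexed-only symbol query would look something like type:symbol index:only foo -- the query term is just an example.)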

An interesting thing to first rule out is to run the new zoekt-webserver, but only on v15 indexes, and see if we still have a perf regression. That would point to some other surprising change that happened, rather than it being due to symbols.

@kzh can you comment on this and create a plan of action. We can sync sometime this week and do some investigation together if you like. I see you are on this from some chat on Slack, but some visibility for others would be great (so please comment here :) )

I suggest we have a disk snapshot of v15 and v16 indexes so we can test both relatively easily. Alternatively we can temporarily run indexed-search twice, and just update the svc/indexed-server to change which endpoint it points to.
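
For the second option, a rough sketch of the endpoint swap (the second deployment's name and label are hypothetical):

# Run a second indexed-search deployment (e.g. indexed-search-v15) alongside the existing one,
# then repoint the Service selector at whichever deployment you want to benchmark.
$ kubectl patch svc indexed-search -p '{"spec":{"selector":{"app":"indexed-search-v15"}}}'
$ kubectl patch svc indexed-search -p '{"spec":{"selector":{"app":"indexed-search"}}}'   # and back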

@emidoots
Member Author

This leaves less RAM for the FS cache => more often paging in memory when reading the posting lists.

Not saying this isn't a possible explanation, but the graph above shows there is no swapping taking place (at least according to cAdvisor / Docker).

What exactly is the CPU and RAM limits/requests btw?

Since this is a deploy-sourcegraph-docker deployment, requests and limits are the same (there are only limits). For zoekt-webserver:

  • Before (v3.5.2): 8 CPUs, 80 GB memory
  • After (v3.7.1): 16 CPUs, 100 GB memory

Given the CPU bump, it's especially surprising I am seeing these results.

Another thing is you don't specify index:only in the symbol search, which means we may be falling back to the old symbol search in the new version. However, the more important regression is how this has affected normal indexed text search.

I am not specifying index:only, BUT I am 100% confident that all repositories on the instance were both cloned and fully indexed when I ran the tests, with no updates or reindexing occurring while they ran.

An interesting thing to first rule out is to run the new zoekt-webserver, but only on v15 indexes.

If I understand correctly, this isn't possible without a code change because the new zoekt-webserver deletes the v15 indexes and generates the v16 ones, is that right? I assume we just need to add an env var or create a debug image with that disabled?

This update may also interest you:

@kevin interesting (albeit probably not surprising) update here: with "search.index.symbols.enabled": false set, after reindexing occurs and I run another load test, memory usage reduces from the previous 80 GB -> 40 GB and RSS/resident memory from 53 GB -> 27 GB

Also note that:

  1. There weren't any restarts of containers / OOMs so far as I can tell, and we're not running out of memory.
  2. These benchmarks were from v3.5.2 -> v3.7.1. I'll eliminate that explanation next by comparing v3.6.2 -> v3.7.x.

@kzh
Contributor

kzh commented Aug 28, 2019

@keegancsmith

My best guess is we are using more ram due to storing symbol information offsets in memory now (previously we never ran ctags). This leaves less RAM for the FS cache => more often paging in memory when reading the posting lists.

Initially, I thought this was the case too. However, we did another load test after Stephen turned off the feature flag and waited for all the repositories to reindex, which means the bulk of what was added to the index files (the ctags information) would no longer be present. The results were similar, still showing a performance regression for text search. I'm not sure how long it would take for zoekt-webserver to reload all the shards into mmap'ed memory.

@kzh can you comment on this and create a plan of action.

Next steps will be running profiling via pprof while the vegeta load tests are happening. This will probably be done for the k8s instance.

# Expose zoekt-webserver's debug endpoints locally.
$ kubectl port-forward svc/indexed-search 6070
# Collect a CPU profile covering a test run (1020s = 17m; exact duration to be experimented with),
# and run the load tests while the profile is being collected.
$ go tool pprof -http=:8080 'http://localhost:6070/debug/pprof/profile?seconds=1020'
< run load tests >
# Afterwards, grab a heap profile as well.
$ go tool pprof -http=:8081 http://localhost:6070/debug/pprof/heap

From this, we should be able to figure out what is hanging during the high QPS (100) tests.

I suggest we have a disk snapshot of v15 and v16 indexes so we can test both relatively easily. Alternatively we can temporarily run indexed-search twice, and just update the svc/indexed-server to change which endpoint it points to.

I'll try this out. However, the v16 indexes with ctags disabled would be very similar in storage/memory/CPU to the v15 indexes anyway.

@keegancsmith
Member

Plan sounds good. I'd caution against making the vegeta concurrency too high, since profiles become harder to read. Use the lowest QPS rate with a regression. cc @sourcegraph/core-services

@tsenart
Contributor

tsenart commented Aug 28, 2019

report was generated using kingkai

Oh hello there, new tool, I like it!

@emidoots
Member Author

I'd caution against making the vegeta concurrency too high, since profiles become harder to read. Use the lowest QPS rate with a regression.

Definitely agree. I plan to redo these load tests with that in mind to eliminate that concern, since our behavior can be quite erratic at even 1 QPS on very slow / heavy searches.

I found @tsenart added support for sub-1 QPS rates in tsenart/vegeta#423 last night which I've found quite useful -- i.e.:

vegeta attack [...] -rate=0 -max-workers=1 [...]

maybe I'll send a PR to note that in vegeta's README.

@tsenart
Contributor

tsenart commented Aug 29, 2019

I found @tsenart added support for sub-1 QPS rates in tsenart/vegeta#423 last night which I've found quite useful -- i.e.:

It's not sub-1 QPS. With -rate=0 it simply sends requests as fast as possible; -max-workers lets you define a maximum concurrency level.
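
Concretely (the targets file name is illustrative), the two modes look like:

# Paced: 10 requests per second, regardless of how long responses take.
$ vegeta attack -targets=targets.txt -rate=10 -duration=17m > paced.bin
# No pacing, single worker: the next request is sent only once the previous one returns.
$ vegeta attack -targets=targets.txt -rate=0 -max-workers=1 -duration=17m > serial.bin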

@emidoots
Member Author

Yeah, that was poor wording on my part -- thanks for the clarification. I said sub-1 QPS because I was thinking specifically of -rate=0 -max-workers=1 with requests of consistent duration >1s, in which case it is effectively sub-1 QPS -- but I see now that was a roundabout way to explain it.

@kzh
Contributor

kzh commented Aug 30, 2019

Update: I'm currently working on a set of memory optimizations for Zoekt symbol search. This regression is probably due to what Keegan mentioned about increased paging, since there are now more contenders for memory (namely the symbols data). It could probably be fixed simply by increasing memory, but before falling back to that option, I want to see if the memory optimizations will do the trick or at least significantly lessen the necessary memory increase.

Some random observations that I found interesting:

  • The very first search after zoekt-webserver starts (e.g. after kubectl delete pod <indexed-search pod>) is always slow (ex: 10s). I think this is because the indexes are loaded via mmap, which acts somewhat like a cache (initially empty) in physical memory that lazily fetches (pages in) data from disk. Repeating that search afterwards is a lot faster (ex: ~0.5s) since all the relevant data has already been fetched from disk. However, if there is greater demand for memory later, pages currently in the cache are evicted to make room. Most likely this is what's happening in the high-memory-demand tests: a barrage of paging due to insufficient memory. It also means one test query can affect another. (A small sketch of this cold/warm effect follows below.)
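
A minimal sketch of that cold-vs-warm effect (assumes shell access to the indexed-search container and that the shards live under /data/index -- both assumptions about this deployment):

# Drop the OS page cache first (needs root; for experimentation only).
$ sync && echo 3 > /proc/sys/vm/drop_caches
# Cold pass: the shard files are paged in from disk, so this is slow.
$ time cat /data/index/*.zoekt > /dev/null
# Warm pass: the same reads are served from the page cache and are much faster,
# until memory pressure evicts those pages again.
$ time cat /data/index/*.zoekt > /dev/null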

@emidoots emidoots added this to the 3.7.2 patch release milestone Sep 4, 2019
@emidoots
Member Author

emidoots commented Sep 5, 2019

Update: Kevin and I spent a while re-running some independent benchmarks; it appears the cases where performance looked worse were entirely due to factors I hadn't accounted for in my benchmarks. After accounting for the too-high QPS, avoiding running requests in parallel, and a few other things, performance does appear to be generally the same.

I'll post a further update here tomorrow with numbers from an overnight load test.

@emidoots
Member Author

Took me longer than I thought to make sense of the numbers, but I am finally confident enough in them to close this out. Definitive isn't so definitive, and load testing is H.A.R.D.

Final load test comparison between v3.6.2 and v3.7.1: https://docs.google.com/spreadsheets/d/1oPzePjD8YLrnppLm3nk46h48_Cxipz4_QqRMBYaIOYQ/edit?usp=sharing

Quoting my summary in there:

  • Large symbol searches roughly 20x faster, performance and throughput on-par with indexed text search.
  • Some 0-2s symbol searches have worse throughput, but better non-truncated results
  • Text search performance roughly the same, marginally worse in some cases

Thank you @kzh and @keegancsmith for your diligence here in helping me look into this (and sorry this turned out to be a waste of your time!) -- I've learned a lot so I can do this better in the future, and I have some ideas for making our search more seamlessly load-testable that I'll propose later.
