
Indexed symbol search: definitive performance regression #5364

Closed
emidoots opened this issue Aug 26, 2019 · 13 comments
Comments

@emidoots
Member

emidoots commented Aug 26, 2019

Workaround

When upgrading to Sourcegraph v3.7.1, set the following in your site configuration to avoid opting into the new symbol search implementation:

"search.index.symbols.enabled": false,

Problem

After we introduced indexed symbol search in v3.7.1, both symbol and text search performance appears to have regressed. This issue is to investigate and fix that.

Note in the following that:

  1. Each test ran entirely on its own for 17 minutes (so one test could not affect another).
  2. The name field indicates the exact search query that was being run.
  3. The P95 change (orange) column indicates the percentage change in 95th percentile query duration after migrating from Sourcegraph v3.5.2 to v3.7.1. Positive numbers are performance regressions, negative numbers are performance gains.

All queries were run against theoa.sgdev.org over a 10h period using vegeta, and the CSV comparison report was generated using kingkai (a rough sketch of this kind of vegeta run follows the list below). The instance has 12,981 repositories in total, fully cloned and indexed, and vegeta generated 42GB of data (happy to upload this somewhere if anyone is interested). I eliminated all external factors that I could not attribute to the code / design itself, i.e.:

  • All repositories were completely cloned and reindexed.
  • The "before" column actually indicates when Zoekt was constrained in CPU resources and Jaeger was enabled, the "after" column is actually after me doubling the CPU availability count and disabling Jaeger due to proven performance concerns with it. Given this, I expected to see substantially better perf, but did not.

Results: https://docs.google.com/spreadsheets/d/12YpKTP58FOIUkFqrdBotlDJHyinu7-itiY-Hd0xuCoY/edit#gid=2136071970

@emidoots emidoots added search regression Regressions in Sourcegraph labels Aug 26, 2019
@emidoots
Member Author

Some notes on resource consumption of zoekt-webserver in this case:

  • The "before" numbers were running v3.5.2 with zoekt-webserver having 8 CPU and 80G memory.
  • The "after" numbers I thought were zoekt-webserver with 16 CPU, 100G memory (that is what I intended to run with), but it appears I actually did this before bumping the resources based on graphs below so it could have been memory constrained during this test. I'll retest the regressed query benchmarks later with the 100G memory to eliminate this as a cause.

During the testing, this is how CPU usage looked (the spike shows 100% of 7.7 CPUs consumed):

[graph: CPU usage]

And how memory usage looked (GiB):

[graph: memory usage, GiB]

Resident memory was only about half that (indicating maybe we can further tune GOGC or something; a sketch follows the graph below):

[graph: resident memory (RSS)]
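
If we do experiment with GOGC, a sketch only (not something tested here): lowering GOGC from its default of 100 makes the Go GC run more often, which usually shrinks the resident set at some CPU cost. It would just be an extra environment variable on the zoekt-webserver container, e.g.:

# Hypothetical: added to the existing zoekt-webserver container invocation in deploy-sourcegraph-docker.
$ docker run ... -e GOGC=50 ... <zoekt-webserver image>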

No swapping was occurring, though:

[graph: swap usage]

And network bytes in/out (Mbps):

[graphs: network in / network out, Mbps]

@emidoots
Member Author

Note: I suspect the easiest way to make forward progress on this will be to run vegeta against k8s.sgdev.org or a local dev server and find a similarly poor-performing query.

@keegancsmith
Member

keegancsmith commented Aug 27, 2019

This is surprising. My best guess is we are using more ram due to storing symbol information offsets in memory now (previously we never ran ctags). This leaves less RAM for the FS cache => more often paging in memory when reading the posting lists.

But that is just a guess. The best thing to do is run a profile against a regressed pod. What exactly are the CPU and RAM limits/requests, btw? Any chance we were indexing while these tests were running (in either case)?

Another thing is you don't specify index:only in the symbol search, which means we may be falling back to the old symbol search in the new version. However, the more important regression is how this has affected normal indexed text search.
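
(For illustration, an indexed-only symbol query would look something like type:symbol index:only foo -- the query term is just an example.)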

An interesting thing to first rule out is to run the new zoekt-webserver, but only on v15 indexes, and see if we still have a perf regression. That would point to some other surprising change that happened, rather than it being due to symbols.

@kzh can you comment on this and create a plan of action. We can sync sometime this week and do some investigation together if you like. I see you are on this from some chat on Slack, but some visibility for others would be great (so please comment here :) )

I suggest we have a disk snapshot of v15 and v16 indexes so we can test both relatively easily. Alternatively we can temporarily run indexed-search twice, and just update the svc/indexed-server to change which endpoint it points to.
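
For the second option, a rough sketch of the endpoint swap (the second deployment's name and label are hypothetical):

# Run a second indexed-search deployment (e.g. indexed-search-v15) alongside the existing one,
# then repoint the Service selector at whichever deployment you want to benchmark.
$ kubectl patch svc indexed-search -p '{"spec":{"selector":{"app":"indexed-search-v15"}}}'
$ kubectl patch svc indexed-search -p '{"spec":{"selector":{"app":"indexed-search"}}}'   # and back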

@emidoots
Member Author

This leaves less RAM for the FS cache => more often paging in memory when reading the posting lists.

Not saying this isn't a possible explanation, but the graph above shows there is no swapping taking place (at least according to cAdvisor / Docker).

What exactly is the CPU and RAM limits/requests btw?

Since this is a deploy-sourcegraph-docker deployment, requests and limits are the same (there are only limits). For zoekt-webserver:

  • Before (v3.5.2): 8 CPUs, 80 GB memory
  • After (v3.7.1): 16 CPUs, 100 GB memory

Given the CPU bump, it's especially surprising I am seeing these results.

Another thing is you don't specify index:only in the symbol search, which means we may be falling back to the old symbol search in the new version. However, the more important regression is how this has affected normal indexed text search.

I am not specifying index:only, BUT I am 100% confident that all repositories on the instance were both cloned and fully indexed when I ran the tests, with no updates or reindexing occurring while they ran.

An interesting thing to first rule out is to run the new zoekt-webserver, but only on v15 indexes.

If I understand correctly, this isn't possible without a code change because the new zoekt-webserver deletes the v15 indexes and generates the v16 ones, is that right? I assume we just need to add an env var or create a debug image with that disabled?

This update may also interest you:

@kevin interesting (albeit probably not surprising) update here: with "search.index.symbols.enabled": false set, after reindexing occurs and I run another load test, memory usage reduces from the previous 80 GB -> 40 GB and RSS/resident memory from 53 GB -> 27 GB

Also note that:

  1. There weren't any restarts of containers / OOMs so far as I can tell, and we're not running out of memory.
  2. These benchmarks were from v3.5.2 -> v3.7.1. I'll eliminate that explanation next by comparing v3.6.2 -> v3.7.x.

@kzh
Contributor

kzh commented Aug 28, 2019

@keegancsmith

My best guess is we are using more ram due to storing symbol information offsets in memory now (previously we never ran ctags). This leaves less RAM for the FS cache => more often paging in memory when reading the posting lists.

Initially, I thought this was the case too. However, we did another load test after Stephen turned off the feature flag and waited for all the repositories to reindex, which means the bulk of what was added to the index files (the ctags information) would no longer be present. The results were similar, still showing a performance regression for text search. I'm not sure how long it would take for zoekt-webserver to reload all the shards into mmap'ed memory.

@kzh can you comment on this and create a plan of action.

Next steps will be running profiling via pprof while the vegeta load tests are happening. This will probably be done for the k8s instance.

# Expose zoekt-webserver's debug endpoints locally.
$ kubectl port-forward svc/indexed-search 6070
# Collect a CPU profile covering a test run (1020s = 17m; exact duration to be experimented with),
# and run the load tests while the profile is being collected.
$ go tool pprof -http=:8080 'http://localhost:6070/debug/pprof/profile?seconds=1020'
< run load tests >
# Afterwards, grab a heap profile as well.
$ go tool pprof -http=:8081 http://localhost:6070/debug/pprof/heap

From this, we should be able to figure out what is hanging during the high QPS (100) tests.

I suggest we have a disk snapshot of v15 and v16 indexes so we can test both relatively easily. Alternatively we can temporarily run indexed-search twice, and just update the svc/indexed-server to change which endpoint it points to.

I'll try this out. However, the v16 indexes with ctags disabled would be very similar in storage/memory/CPU to the v15 indexes anyway.

@keegancsmith
Member

Plan sounds good. I'd caution against making the vegeta concurrency too high, since profiles become harder to read. Use the lowest QPS rate with a regression. cc @sourcegraph/core-services

@tsenart
Contributor

tsenart commented Aug 28, 2019

report was generated using kingkai

Oh hello there, new tool, I like it!

@emidoots
Member Author

I'd caution against making the vegeta concurrency too high, since profiles become harder to read. Use the lowest QPS rate with a regression.

Definitely agree. I plan to redo these load tests with that in mind to eliminate that concern, since our behavior can be quite erratic at even 1 QPS on very slow / heavy searches.

I found @tsenart added support for sub-1 QPS rates in tsenart/vegeta#423 last night which I've found quite useful -- i.e.:

vegeta attack [...] -rate=0 -max-workers=1 [...]

maybe I'll send a PR to note that in vegeta's README.

@tsenart
Contributor

tsenart commented Aug 29, 2019

I found @tsenart added support for sub-1 QPS rates in tsenart/vegeta#423 last night which I've found quite useful -- i.e.:

It's not sub-1 QPS. With -rate=0 it simply sends requests as fast as possible; -max-workers lets you define a maximum concurrency level.
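
Concretely (the targets file name is illustrative), the two modes look like:

# Paced: 10 requests per second, regardless of how long responses take.
$ vegeta attack -targets=targets.txt -rate=10 -duration=17m > paced.bin
# No pacing, single worker: the next request is sent only once the previous one returns.
$ vegeta attack -targets=targets.txt -rate=0 -max-workers=1 -duration=17m > serial.bin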

@emidoots
Member Author

Yeah, that was poor wording on my part -- thanks for the clarification. I said sub-1 QPS because I was thinking specifically of -rate=0 -max-workers=1 with requests of consistent duration >1s, in which case it is effectively sub-1 QPS -- but I see now that was a roundabout way to explain it.

@kzh
Contributor

kzh commented Aug 30, 2019

Update: I'm currently working on a set of memory optimizations for Zoekt symbol search. This regression is probably due to what Keegan mentioned about increased paging, since there are now more contenders for memory (namely the symbols data). It could probably be fixed simply by increasing memory, but before falling back to that option, I want to see if the memory optimizations will do the trick or at least significantly lessen the necessary memory increase.

Some random observations that I found interesting:

  • The very first search after zoekt-webserver starts (e.g. after kubectl delete pod <indexed-search pod>) is always slow (ex: 10s). I think this is because the indexes are loaded via mmap, which acts somewhat like a cache (initially empty) in physical memory that lazily fetches (pages in) data from disk. Repeating that search afterwards is a lot faster (ex: ~0.5s) since all the relevant data has already been fetched from disk. However, if there is greater demand for memory later, pages currently in the cache are evicted to make room. Most likely this is what's happening in the high-memory-demand tests: a barrage of paging due to insufficient memory. It also means one test query can affect another. (A small sketch of this cold/warm effect follows below.)
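
A minimal sketch of that cold-vs-warm effect (assumes shell access to the indexed-search container and that the shards live under /data/index -- both assumptions about this deployment):

# Drop the OS page cache first (needs root; for experimentation only).
$ sync && echo 3 > /proc/sys/vm/drop_caches
# Cold pass: the shard files are paged in from disk, so this is slow.
$ time cat /data/index/*.zoekt > /dev/null
# Warm pass: the same reads are served from the page cache and are much faster,
# until memory pressure evicts those pages again.
$ time cat /data/index/*.zoekt > /dev/null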

@emidoots emidoots added this to the 3.7.2 patch release milestone Sep 4, 2019
@emidoots
Member Author

emidoots commented Sep 5, 2019

Update: Kevin and I spent a while re-running some independent benchmarks; it appears the cases where performance looked worse were entirely due to factors I hadn't accounted for in my benchmarks. After accounting for the too-high QPS, avoiding running requests in parallel, and a few other things, performance does appear to be generally the same.

I'll post a further update here tomorrow with numbers from an overnight load test.

@emidoots
Member Author

Took me longer than I thought to make sense of the numbers, but I am finally confident enough in them to close this out. Definitive isn't so definitive, and load testing is H.A.R.D.

Final load test comparison between v3.6.2 and v3.7.1: https://docs.google.com/spreadsheets/d/1oPzePjD8YLrnppLm3nk46h48_Cxipz4_QqRMBYaIOYQ/edit?usp=sharing

Quoting my summary in there:

  • Large symbol searches roughly 20x faster, performance and throughput on-par with indexed text search.
  • Some 0-2s symbol searches have worse throughput, but better non-truncated results
  • Text search performance roughly the same, marginally worse in some cases

Thank you @kzh and @keegancsmith for your diligence here in helping me look into this (and sorry this turned out to be a waste of your time!) -- I've learned a lot so I can do this better in the future, and I have some ideas for making our search more seamlessly load-testable that I'll propose later.
