Substantial workload performance degradation #7189
Intra-VPC single-thread iperf3 throughput and the number of retransmits are still on par with the previous build on rack2. Another disk I/O SQL server workload (with sysbench as the load generator, co-located with the database in the same VM) also hasn't shown performance degradation. I also checked the TCP session queues on the load generator and the MongoDB primary. The loadgen send queue length for each of the threads stays between 0 and 1400 requests, and those numbers haven't increased, whereas the DB primary has a very small queue length:
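For illustration, here is a minimal sketch of the kind of per-connection send-queue check described above. It assumes a Linux guest with iproute2's `ss` available and MongoDB listening on the default port 27017; neither detail is stated in this issue, and the actual check may have been done differently:

```python
# Sketch: summarize TCP Send-Q (bytes queued but not yet acknowledged) for
# each loadgen connection to the database, using `ss -tn` output.
import subprocess

def sendq_by_connection(port: int = 27017) -> dict:
    out = subprocess.run(["ss", "-tn"], capture_output=True, text=True).stdout
    queues = {}
    for line in out.splitlines()[1:]:        # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        state, recvq, sendq, local, peer = fields[:5]
        if peer.endswith(f":{port}"):        # connections to the DB primary (assumed port)
            queues[(local, peer)] = int(sendq)
    return queues

if __name__ == "__main__":
    for (local, peer), q in sorted(sendq_by_connection().items()):
        print(f"{local} -> {peer}: {q} bytes in Send-Q")
```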
One thing of interest is that the READ and UPDATE latency/IOPS of the MongoDB workload are still on par with the previous runs; INSERT is the only type of transaction that has degraded. I think crucible is mostly cleared as the source of the issue, so I'm moving this issue into omicron. |
Also dumping the OPTE stats here, although I haven't observed anything out of the ordinary.
|
After trying out different commits on a racklet, a significant performance change shows up between omicron commits On
On
|
Here are some notes from testing on London, which was seeing 40-50 TPS. In short, we don't see anything suspicious in the MongoDB VMs or Crucible; it just seems like they're being asked to do IO slowly. On London, all 3x MongoDB images happen to be on the same sled:
This is convenient, because it gives us a single place to look. First of all, not much IO is happening; here's one second:
Looking at flush performance from the Upstairs' perspective:

```
/*
 * For each flush (keyed by its id in arg0 and the upstairs pid), measure the
 * time from gw-flush-start to up-to-ds-flush-start, and from there to
 * gw-flush-done, as quantize() histograms.
 */
crucible_upstairs*:::gw-flush-start
{
    start[arg0, pid] = timestamp;
}

crucible_upstairs*:::up-to-ds-flush-start
/start[arg0, pid]/
{
    @[probename, pid] = quantize(timestamp - start[arg0, pid]);
    start[arg0, pid] = 0;
    substart[arg0, pid] = timestamp;
}

crucible_upstairs*:::gw-flush-done
/substart[arg0, pid]/
{
    @[probename, pid] = quantize(timestamp - substart[arg0, pid]);
    substart[arg0, pid] = 0;
}

tick-1s
{
    exit(0);
}
```
Using
(I also modified the script to get flushes, and saw similarly reasonable values) Looking at the Propolis values is important, because the Crucible I also duct-taped together timing between claiming permits and submitting jobs; there's nothing concerning here (10s of ns):
All of this suggests that the MongoDB VMs aren't seeing any IO performance degradation; they're just being asked to do Not Very Many IOs. Further observations from @leftwo:
|
One more observation: if the loadgen host is sending 6 Mb/s, that's 786 KiB/s. 45 TPS × 4 KiB per transaction (from the NVMe probes) × 3 replicas = 528 KiB/s, so this is the correct order of magnitude for the performance that we're seeing. (This doesn't say anything about why the loadgen VM is only sending 6 Mb/s; it could either be slow itself, or be slowing down due to backpressure from the MongoDB VMs.) |
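For reference, here is a quick sketch of that back-of-the-envelope arithmetic. The 6 Mb/s, 45 TPS, 4 KiB, and 3-replica figures come from the comment above; the exact KiB/s conversions come out slightly differently depending on binary vs. decimal megabits and rounding, but either way both sides land in the same few-hundred-KiB/s range:

```python
# Back-of-the-envelope check: does the loadgen's network send rate roughly
# match the write rate implied by the observed transaction rate?
link_mbit_per_s = 6                     # loadgen egress rate, as reported above
link_kib_per_s = link_mbit_per_s * 1_000_000 / 8 / 1024    # ~732 KiB/s (decimal Mb)

tps = 45                                # observed transactions per second
write_kib = 4                           # per-transaction write size (NVMe probes)
replicas = 3                            # MongoDB replica set members
db_kib_per_s = tps * write_kib * replicas                  # 540 KiB/s

print(f"network: ~{link_kib_per_s:.0f} KiB/s, DB writes: ~{db_kib_per_s} KiB/s")
# Both sides are in the mid-hundreds of KiB/s -- the same order of magnitude.
```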
Tracing individual IOs on all 3x MongoDB VMs (using the Propolis probes), I see roughly 13 ms pauses between each IO operation. This is the plotting script:

```python
# Plot per-millisecond histograms of write and flush timestamps recorded in
# 'timing_mk' (lines of "W|F <pid> <timestamp-ns>"), one series per Propolis PID.
import pylab as plt
import numpy as np
from collections import defaultdict

writes = defaultdict(lambda: [])
flushes = defaultdict(lambda: [])
for line in open('timing_mk').read().split('\n'):
    if not line: continue
    (mode, pid, ts) = line.split()
    pid = int(pid)
    ts = int(ts)
    if mode == 'W':
        writes[pid].append(ts)
    elif mode == 'F':
        flushes[pid].append(ts)

# Normalize timestamps to the earliest event and plot the first 10 seconds.
start = min([min(writes[k]) for k in writes] + [min(flushes[k]) for k in flushes])
ax = plt.subplot(211)
for k in writes:
    v = (np.array(writes[k]) - start) / 1e9
    v = v[v < 10.0]
    plt.hist(v, bins=10000)
plt.ylabel('write count (1 ms bins)')
plt.subplot(212, sharex=ax)
for k in flushes:
    v = (np.array(flushes[k]) - start) / 1e9
    v = v[v < 10.0]
    plt.hist(v, bins=10000)
plt.ylabel('flush count (1 ms bins)')
plt.xlabel('time (secs)')
```
|
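As a sanity check on the ~13 ms figure, here is a small companion sketch (assuming the same `timing_mk` format as the plotting script above, i.e. lines of `W|F <pid> <timestamp-ns>`) that prints the median gap between consecutive writes for each Propolis process:

```python
# Sketch: compute the median pause between consecutive write IOs per process,
# reading the same 'timing_mk' file as the plotting script above.
import numpy as np
from collections import defaultdict

writes = defaultdict(list)
with open('timing_mk') as f:
    for line in f:
        fields = line.split()
        if len(fields) != 3 or fields[0] != 'W':
            continue
        writes[int(fields[1])].append(int(fields[2]))    # pid -> [timestamp_ns]

for pid, ts in writes.items():
    if len(ts) < 2:
        continue
    gaps_ms = np.diff(np.sort(np.array(ts))) / 1e6       # ns -> ms
    print(f"pid {pid}: median inter-write gap {np.median(gaps_ms):.1f} ms")
```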
Looking at iperf3 traffic between the loadgen and the primary host, things look as expected:
|
Using This is true on the primary and both secondaries.
This is pointing more and more towards the load generator just... not generating load. There are flamegraphs in |
We had the following test setups: dogfood (new, slow) is
london (old, fast) is
Both london and dogfood are on host OS commit 49d6894. In an effort to rule in/out any propolis changes, we built the following TUF repo:
This build was installed on London, the same instances were started up, and the same test was run; we saw the slower performance. Here is mongostat output showing the slow behavior:
The previous (fast) test on London gave us this mongostat output:
DTrace output on the primary node also showed a lower rate of IO. The only difference in Propolis builds is just this one commit:
So now suspicion turns to how that change could have impacted the MongoDB performance test. |
With further testing using mdb black magic by @pfmooney, we were able to confirm it was the viona parameters change that had slowed things down. We were able to get the performance numbers to look better on the london cluster by updating the viona params as follows:
This gave us the "good" numbers on the london setup.
The dtrace output also reflected more IOs flowing through the system (from the primary):
And iostat inside the primary instance showed improvement as well:
A workaround to "turn back on copying" should land with this propolis PR: |
#7206 should put performance back where it was before. |
Propolis: Switch viona back to packet copying for now #823
This is a workaround for #7189. It turns off new work that we think is causing the slow performance until we can get a better idea of what exactly the problem is and if/how we might want to fix it. Co-authored-by: Alan Hanson <[email protected]>
Confirmed that workload performance is back to the original level after #7206. |
A certain workload I've been using for release-to-release performance comparison shows major degradation. The workload comprises a load generator running YCSB and a MongoDB cluster with 3 nodes. They are located on 4 different sleds, and the traffic among them is confined to the VPC they are on:
These were the typical rates of INSERT previously, on omicron commit
e7d32ae2375b0231193f1dc84271f900915b2d6b
(the workload took no more than 3 mins to complete):
The same workload on omicron commit
41d7c9b0c110e6d3690bf96bb969b74f8c385bf6
runs more than 40 times slower (it's been running for two hours and still hasn't completed):
I ran a fio regression test and the disk I/O numbers are roughly the same between the two commits. I'll check the VPC network throughput next to see if there is any change.