
Substantial workload performance degradation #7189

Closed · askfongjojo opened this issue Nov 28, 2024 · 12 comments

askfongjojo commented Nov 28, 2024

A certain workload I've been using for release-to-release performance comparison shows major degradation. The workload comprises a load generator running YCSB and a 3-node MongoDB cluster. The instances are located on 4 different sleds and the traffic among them is confined to the VPC they are on:

root@oxz_switch0:~# omdb db instance list | egrep 'mongo|loadgen'
6b252c74-548c-4f1b-a98a-6c0e2da02c09 running 26ff55b2-cd2d-4772-84b5-35738b014bfb 5f6720b8-8a31-45f8-8c94-8e699218f28b BRM42220017 mongodb-xfs-bs4096-primary          
9e2451c0-6eec-4b90-b49a-6a9a2e02f35f running 38206026-242c-48a0-9fd3-c570d75785e6 a2adea92-b56e-44fc-8a0d-7d63b5fd3b93 BRM42220031 loadgen                             
e97e9fb9-62c3-4745-9d91-b0b6fa2baeba running 0bd90ecf-b1dc-47f1-a527-728aaf560992 7b862eb6-7f50-4c2f-b9a6-0d12ac913d3c BRM44220010 mongodb-xfs-bs4096-secondary2       
fa658925-3e4d-4e76-b839-53a62beb4f5d running 30cb91f3-6b2c-4afb-86b2-70cb24608c75 f15774c1-b8e5-434f-a493-ec43f96cba06 BRM44220005 mongodb-xfs-bs4096-secondary1

These were the typical INSERT rates previously, on omicron commit e7d32ae2375b0231193f1dc84271f900915b2d6b (the workload took no more than 3 minutes to complete):

2024-11-25 22:54:19:811 40 sec: 168557 operations; 4429.2 current ops/sec; est completion in 3 minutes [INSERT: Count=44290, Max=114751, Min=281, Avg=449.16, 90=547, 99=753, 99.9=2283, 99.99=18431] 
2024-11-25 22:54:29:811 50 sec: 210618 operations; 4206.1 current ops/sec; est completion in 3 minutes [INSERT: Count=42062, Max=280063, Min=269, Avg=473.32, 90=580, 99=821, 99.9=1341, 99.99=8823] 
2024-11-25 22:54:39:811 60 sec: 255041 operations; 4442.3 current ops/sec; est completion in 2 minutes [INSERT: Count=44422, Max=11159, Min=266, Avg=448.4, 90=555, 99=750, 99.9=1313, 99.99=4255] 
2024-11-25 22:54:49:811 70 sec: 297718 operations; 4267.7 current ops/sec; est completion in 2 minutes [INSERT: Count=42676, Max=143487, Min=268, Avg=466.57, 90=569, 99=794, 99.9=2255, 99.99=50847] 
2024-11-25 22:54:59:811 80 sec: 340315 operations; 4259.7 current ops/sec; est completion in 2 minutes [INSERT: Count=42598, Max=12839, Min=280, Avg=467.39, 90=575, 99=759, 99.9=1898, 99.99=5727] 
2024-11-25 22:55:09:811 90 sec: 385175 operations; 4486 current ops/sec; est completion in 2 minutes [INSERT: Count=44862, Max=15335, Min=268, Avg=443.99, 90=553, 99=740, 99.9=2113, 99.99=10047] 
2024-11-25 22:55:19:811 100 sec: 430592 operations; 4541.7 current ops/sec; est completion in 2 minutes [INSERT: Count=45416, Max=11783, Min=260, Avg=438.5, 90=546, 99=743, 99.9=1599, 99.99=10327] 
2024-11-25 22:55:29:811 110 sec: 473296 operations; 4270.4 current ops/sec; est completion in 2 minutes [INSERT: Count=42703, Max=106175, Min=272, Avg=466.14, 90=577, 99=790, 99.9=1572, 99.99=10311] 
2024-11-25 22:55:39:811 120 sec: 517959 operations; 4466.3 current ops/sec; est completion in 1 minute [INSERT: Count=44663, Max=238207, Min=275, Avg=445.7, 90=538, 99=731, 99.9=1776, 99.99=11447] 
2024-11-25 22:55:49:811 130 sec: 561913 operations; 4395.4 current ops/sec; est completion in 1 minute [INSERT: Count=43954, Max=113727, Min=264, Avg=453.1, 90=552, 99=731, 99.9=1430, 99.99=19023] 
… (remaining output truncated)

The same workload on omicron commit 41d7c9b0c110e6d3690bf96bb969b74f8c385bf6 runs more than 40 times slower (it's been running for two hours and still hasn't completed):

2024-11-28 03:08:55:912 100 sec: 6182 operations; 53.6 current ops/sec; est completion in 4 hours 27 minutes [INSERT: Count=536, Max=50207, Min=461, Avg=37272.15, 90=40159, 99=40415, 99.9=41119, 99.99=50207] 
2024-11-28 03:09:05:912 110 sec: 7202 operations; 102 current ops/sec; est completion in 4 hours 12 minutes [INSERT: Count=1020, Max=40607, Min=441, Avg=19651.31, 90=40063, 99=40255, 99.9=40447, 99.99=40607] 
2024-11-28 03:09:15:912 120 sec: 9066 operations; 186.4 current ops/sec; est completion in 3 hours 38 minutes [INSERT: Count=1864, Max=40383, Min=366, Avg=10713.42, 90=39903, 99=40159, 99.9=40319, 99.99=40383] 
2024-11-28 03:09:25:912 130 sec: 11088 operations; 202.2 current ops/sec; est completion in 3 hours 13 minutes [INSERT: Count=2022, Max=40383, Min=385, Avg=9881, 90=39839, 99=40159, 99.9=40319, 99.99=40383] 
2024-11-28 03:09:35:912 140 sec: 13059 operations; 197.1 current ops/sec; est completion in 2 hours 56 minutes [INSERT: Count=1971, Max=40255, Min=372, Avg=10131.55, 90=39871, 99=40191, 99.9=40255, 99.99=40255] 
2024-11-28 03:09:45:912 150 sec: 15081 operations; 202.2 current ops/sec; est completion in 2 hours 43 minutes [INSERT: Count=2022, Max=43967, Min=383, Avg=9880.58, 90=39871, 99=40159, 99.9=40351, 99.99=43967] 
2024-11-28 03:09:55:912 160 sec: 17069 operations; 198.8 current ops/sec; est completion in 2 hours 33 minutes [INSERT: Count=1988, Max=43551, Min=368, Avg=10050.21, 90=39839, 99=40191, 99.9=40447, 99.99=43551] 
2024-11-28 03:10:05:912 170 sec: 19017 operations; 194.8 current ops/sec; est completion in 2 hours 26 minutes [INSERT: Count=1948, Max=44767, Min=395, Avg=10253.35, 90=39871, 99=40191, 99.9=40511, 99.99=44767] 
2024-11-28 03:10:15:912 180 sec: 21036 operations; 201.9 current ops/sec; est completion in 2 hours 19 minutes [INSERT: Count=2020, Max=40383, Min=370, Avg=9909.67, 90=39839, 99=40159, 99.9=40351, 99.99=40383] 
2024-11-28 03:10:25:913 190 sec: 23076 operations; 204 current ops/sec; est completion in 2 hours 14 minutes [INSERT: Count=2039, Max=43007, Min=382, Avg=9797.07, 90=39871, 99=40159, 99.9=40287, 99.99=43007] 
2024-11-28 03:10:35:912 200 sec: 25061 operations; 198.5 current ops/sec; est completion in 2 hours 9 minutes [INSERT: Count=1985, Max=40287, Min=358, Avg=10064.11, 90=39871, 99=40191, 99.9=40287, 99.99=40287] 
2024-11-28 03:10:45:912 210 sec: 27041 operations; 198 current ops/sec; est completion in 2 hours 5 minutes [INSERT: Count=1980, Max=40575, Min=391, Avg=10088.1, 90=39871, 99=40159, 99.9=40319, 99.99=40575] 
2024-11-28 03:10:55:913 220 sec: 29075 operations; 203.4 current ops/sec; est completion in 2 hours 2 minutes [INSERT: Count=2035, Max=41311, Min=379, Avg=9818.11, 90=39839, 99=40159, 99.9=40415, 99.99=41311] 

I ran a fio regression test and the disk I/O numbers are roughly the same between the two commits. I'll check the VPC network throughput next to see if there is any change.

askfongjojo added this to the 12 milestone Nov 28, 2024
askfongjojo commented

Intra-VPC single-thread iperf3 throughput and retransmit counts are still on par with the previous build on rack2. Another disk-I/O-heavy SQL server workload (with sysbench as the load generator, co-located with the database in the same VM) also hasn't shown any performance degradation.

I also checked the TCP session queues on the load generator and the MongoDB primary. The loadgen send queue for each of the threads stays between 0 and 1400 bytes and hasn't grown, whereas the DB primary's queues are essentially empty:

ubuntu@loadgen:/opt/local/ycsb/ycsb-mongodb-binding-0.17.0$ netstat -an | egrep 'Address|ESTABLISH'
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0     36 172.30.0.6:22           172.20.17.42:51281      ESTABLISHED
tcp6       0   1393 172.30.0.6:51728        172.30.0.7:27017        ESTABLISHED
tcp6       0      0 172.30.0.6:51712        172.30.0.7:27017        ESTABLISHED
tcp6       0   1393 172.30.0.6:51724        172.30.0.7:27017        ESTABLISHED

ubuntu@primary:~$ netstat -an | egrep 'Address|ESTABLISH'
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 127.0.1.1:27017         127.0.0.1:54468         ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.6:42206        ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.51:59938       ESTABLISHED
tcp        0      0 172.30.0.7:43630        172.30.0.50:27017       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.50:50604       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.50:50590       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.51:59916       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.50:50566       ESTABLISHED
tcp        0      0 172.30.0.7:44978        172.30.0.51:27017       ESTABLISHED
tcp        0      0 172.30.0.7:36776        172.30.0.50:27017       ESTABLISHED
tcp        0      0 127.0.1.1:27017         127.0.0.1:54452         ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.51:59922       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.50:50558       ESTABLISHED
tcp        0      0 127.0.0.1:54468         127.0.1.1:27017         ESTABLISHED
tcp        0      0 127.0.0.1:40132         127.0.1.1:27017         ESTABLISHED
tcp        0     36 172.30.0.7:22           172.20.17.42:52164      ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.51:59910       ESTABLISHED
tcp        0      0 127.0.0.1:54452         127.0.1.1:27017         ESTABLISHED
tcp        0      0 172.30.0.7:44992        172.30.0.51:27017       ESTABLISHED
tcp        0      0 172.30.0.7:43638        172.30.0.50:27017       ESTABLISHED
tcp        0      0 172.30.0.7:43618        172.30.0.50:27017       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.6:42204        ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.6:42216        ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.50:50620       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.51:50834       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.50:50582       ESTABLISHED
tcp        0      0 172.30.0.7:45008        172.30.0.51:27017       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.50:50586       ESTABLISHED
tcp        0      0 172.30.0.7:27017        172.30.0.51:59896       ESTABLISHED
tcp        0      0 172.30.0.7:45022        172.30.0.51:27017       ESTABLISHED
tcp        0      0 127.0.1.1:27017         127.0.0.1:40132         ESTABLISHED

One thing of interest is that the READ and UPDATE latency/IOPS of the MongoDB workload are still on par with previous runs; INSERT is the only type of transaction that has degraded. I think crucible is mostly cleared as the source of the issue, so I'm moving this issue into omicron.

askfongjojo transferred this issue from oxidecomputer/crucible Nov 28, 2024
askfongjojo commented

Also dumping the opte stats here, although I haven't observed anything out of the ordinary.

BRM42220031 # opteadm dump-layer -p opte7 nat
Port opte7 - Layer nat
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Inbound Rules
----------------------------------------------------------------------
ID   PRI  HITS  PREDICATES                 ACTION
6    10   13    inner.ip.dst=172.20.26.17  "Stateful: 172.30.0.6 <=> (external)"
DEF  --   414   --                         "allow"

Outbound Rules
----------------------------------------------------------------------
ID   PRI  HITS  PREDICATES                                                   ACTION
18   10   34    inner.ether.ether_type=IPv4                                  "Stateful: 172.30.0.6 <=> 172.20.26.17"
                meta: router-target=ig=a4361b0b-b461-4674-a9ea-80296755f302  
                                                                             
19   100  0     inner.ether.ether_type=IPv4                                  "Stateful: 172.20.26.13:16384-32767"
                meta: router-target=ig=a4361b0b-b461-4674-a9ea-80296755f302  
                                                                             
20   255  0     meta: router-target-class=ig                                 "Deny"
DEF  --   477   --                                                           "allow"

BRM42220031 # opteadm dump-layer -p opte7 router
Port opte7 - Layer router
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Inbound Rules
----------------------------------------------------------------------
ID   PRI  HITS  PREDICATES  ACTION
DEF  --   534   --          "allow"

Outbound Rules
----------------------------------------------------------------------
ID   PRI  HITS  PREDICATES                              ACTION
4    27   0     inner.ip.dst=192.168.96.0/24            "Meta: Target = Subnet: 192.168.96.0/24"
2    27   0     inner.ip.dst=192.168.64.0/24            "Meta: Target = Subnet: 192.168.64.0/24"
0    27   0     inner.ip.dst=192.168.32.0/24            "Meta: Target = Subnet: 192.168.32.0/24"
3    31   477   inner.ip.dst=172.30.0.0/22              "Meta: Target = Subnet: 172.30.0.0/22"
5    75   120   inner.ip.dst=0.0.0.0/0                  "Meta: Target = IG(Some(a4361b0b-b461-4674-a9ea-80296755f302))"
9    139  0     inner.ip6.dst=fddb:bb4e:6c24:62cc::/64  "Meta: Target = Subnet: fddb:bb4e:6c24:62cc::/64"
7    139  0     inner.ip6.dst=fddb:bb4e:6c24:7631::/64  "Meta: Target = Subnet: fddb:bb4e:6c24:7631::/64"
6    139  0     inner.ip6.dst=fddb:bb4e:6c24:168c::/64  "Meta: Target = Subnet: fddb:bb4e:6c24:168c::/64"
1    139  0     inner.ip6.dst=fddb:bb4e:6c24::/64       "Meta: Target = Subnet: fddb:bb4e:6c24::/64"
8    267  0     inner.ip6.dst=::/0                      "Meta: Target = IG(Some(a4361b0b-b461-4674-a9ea-80296755f302))"
DEF  --   0     --                                      "deny"
BRM42220017 #  opteadm dump-layer -p opte8 nat
Port opte8 - Layer nat
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Inbound Rules
----------------------------------------------------------------------
ID   PRI  HITS   PREDICATES                  ACTION
13   10   1      inner.ip.dst=172.20.26.195  "Stateful: 172.30.0.7 <=> (external)"
DEF  --   12221  --                          "allow"

Outbound Rules
----------------------------------------------------------------------
ID   PRI  HITS   PREDICATES                                                   ACTION
39   10   10     inner.ether.ether_type=IPv4                                  "Stateful: 172.30.0.7 <=> 172.20.26.195"
                 meta: router-target=ig=a4361b0b-b461-4674-a9ea-80296755f302  
                                                                              
40   100  0      inner.ether.ether_type=IPv4                                  "Stateful: 172.20.26.192:32768-49151"
                 meta: router-target=ig=a4361b0b-b461-4674-a9ea-80296755f302  
                                                                              
41   255  0      meta: router-target-class=ig                                 "Deny"
DEF  --   12237  --                                                           "allow"

BRM42220017 #  opteadm dump-layer -p opte8 router 
Port opte8 - Layer router
======================================================================
Inbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Outbound Flows
----------------------------------------------------------------------
PROTO  SRC IP  SPORT  DST IP  DPORT  HITS  ACTION

Inbound Rules
----------------------------------------------------------------------
ID   PRI  HITS   PREDICATES  ACTION
DEF  --   12362  --          "allow"

Outbound Rules
----------------------------------------------------------------------
ID   PRI  HITS   PREDICATES                              ACTION
8    27   0      inner.ip.dst=192.168.96.0/24            "Meta: Target = Subnet: 192.168.96.0/24"
4    27   0      inner.ip.dst=192.168.64.0/24            "Meta: Target = Subnet: 192.168.64.0/24"
2    27   0      inner.ip.dst=192.168.32.0/24            "Meta: Target = Subnet: 192.168.32.0/24"
0    31   12257  inner.ip.dst=172.30.0.0/22              "Meta: Target = Subnet: 172.30.0.0/22"
3    75   122    inner.ip.dst=0.0.0.0/0                  "Meta: Target = IG(Some(a4361b0b-b461-4674-a9ea-80296755f302))"
7    139  0      inner.ip6.dst=fddb:bb4e:6c24:62cc::/64  "Meta: Target = Subnet: fddb:bb4e:6c24:62cc::/64"
6    139  0      inner.ip6.dst=fddb:bb4e:6c24:168c::/64  "Meta: Target = Subnet: fddb:bb4e:6c24:168c::/64"
5    139  0      inner.ip6.dst=fddb:bb4e:6c24::/64       "Meta: Target = Subnet: fddb:bb4e:6c24::/64"
1    139  0      inner.ip6.dst=fddb:bb4e:6c24:7631::/64  "Meta: Target = Subnet: fddb:bb4e:6c24:7631::/64"
9    267  0      inner.ip6.dst=::/0                      "Meta: Target = IG(Some(a4361b0b-b461-4674-a9ea-80296755f302))"
DEF  --   0      --                                      "deny"

askfongjojo commented

After trying out different commits on a racklet, a significant performance change shows up between omicron commits 2a9f1641e68410d3406d950eee9864df971a5afc and 6a773fb0608b446c5a1b35d8293e86ff461f5601, though not as substantial as the delta seen on rack2.

On 6a773fb0608b446c5a1b35d8293e86ff461f5601 (crucible commit 2cfc7e0c8572b3bfafbfc838c4e6d658f442d239):

ubuntu@loadgen:/opt/local/ycsb/ycsb-mongodb-binding-0.17.0$ tail -f logs/workloada-threads_2-load.err
2024-12-02 20:05:14:693 420 sec: 47419 operations; 150.1 current ops/sec; est completion in 7 minutes [INSERT: Count=1501, Max=41919, Min=2234, Avg=13310.47, 90=39839, 99=40895, 99.9=41311, 99.99=41919] 
2024-12-02 20:05:24:693 430 sec: 48753 operations; 133.4 current ops/sec; est completion in 7 minutes [INSERT: Count=1334, Max=235391, Min=2262, Avg=14990.18, 90=39871, 99=40991, 99.9=210943, 99.99=235391] 
2024-12-02 20:05:34:693 440 sec: 50266 operations; 151.3 current ops/sec; est completion in 7 minutes [INSERT: Count=1514, Max=51423, Min=2244, Avg=13228.85, 90=39839, 99=40863, 99.9=41855, 99.99=51423] 
2024-12-02 20:05:44:693 450 sec: 51792 operations; 152.6 current ops/sec; est completion in 6 minutes [INSERT: Count=1525, Max=42207, Min=2264, Avg=13107.17, 90=39743, 99=41023, 99.9=41759, 99.99=42207] 
2024-12-02 20:05:54:693 460 sec: 52993 operations; 120.1 current ops/sec; est completion in 6 minutes [INSERT: Count=1201, Max=218239, Min=2226, Avg=16645.58, 90=39935, 99=41151, 99.9=211199, 99.99=218239] 
2024-12-02 20:06:04:693 470 sec: 54163 operations; 117 current ops/sec; est completion in 6 minutes [INSERT: Count=1170, Max=47103, Min=2260, Avg=17077.77, 90=39967, 99=41087, 99.9=43583, 99.99=47103] 
2024-12-02 20:06:14:693 480 sec: 55530 operations; 136.7 current ops/sec; est completion in 6 minutes [INSERT: Count=1367, Max=42175, Min=2350, Avg=14618.11, 90=39935, 99=40895, 99.9=41631, 99.99=42175] 
2024-12-02 20:06:24:693 490 sec: 56933 operations; 140.3 current ops/sec; est completion in 6 minutes [INSERT: Count=1403, Max=42175, Min=2276, Avg=14248.16, 90=39871, 99=40863, 99.9=41663, 99.99=42175] 
2024-12-02 20:06:34:693 500 sec: 58228 operations; 129.5 current ops/sec; est completion in 5 minutes [INSERT: Count=1295, Max=47871, Min=2168, Avg=15433.19, 90=39935, 99=41055, 99.9=46399, 99.99=47871] 
2024-12-02 20:06:44:693 510 sec: 59673 operations; 144.5 current ops/sec; est completion in 5 minutes [INSERT: Count=1445, Max=41599, Min=2270, Avg=13834.05, 90=39903, 99=40735, 99.9=41247, 99.99=41599] 

On 2a9f1641e68410d3406d950eee9864df971a5afc (crucible commit b7b9d5660b28ca5e865242b2bdecd032c0852d40):

2024-12-02 21:17:03:254 20 sec: 12435 operations; 658.2 current ops/sec; est completion in 2 minutes [INSERT: Count=6581, Max=8559, Min=2122, Avg=3033.1, 90=3545, 99=4299, 99.9=5699, 99.99=6923] 
2024-12-02 21:17:13:253 30 sec: 19104 operations; 666.9 current ops/sec; est completion in 2 minutes [INSERT: Count=6669, Max=13223, Min=2052, Avg=2994.79, 90=3529, 99=4291, 99.9=6883, 99.99=11759] 
2024-12-02 21:17:23:254 40 sec: 25730 operations; 662.6 current ops/sec; est completion in 1 minute [INSERT: Count=6626, Max=7719, Min=2120, Avg=3014.84, 90=3559, 99=4215, 99.9=6215, 99.99=6967] 
2024-12-02 21:17:33:254 50 sec: 32299 operations; 656.9 current ops/sec; est completion in 1 minute [INSERT: Count=6569, Max=60479, Min=2012, Avg=3040.44, 90=3573, 99=4683, 99.9=18255, 99.99=59551] 
2024-12-02 21:17:43:254 60 sec: 38820 operations; 652.1 current ops/sec; est completion in 1 minute [INSERT: Count=6521, Max=8727, Min=2120, Avg=3063.39, 90=3607, 99=4279, 99.9=6031, 99.99=8671] 
2024-12-02 21:17:53:253 70 sec: 45321 operations; 650.1 current ops/sec; est completion in 1 minute [INSERT: Count=6502, Max=35039, Min=2100, Avg=3072.49, 90=3567, 99=4291, 99.9=6603, 99.99=34975] 
2024-12-02 21:18:03:254 80 sec: 52024 operations; 670.3 current ops/sec; est completion in 1 minute [INSERT: Count=6703, Max=10663, Min=2047, Avg=2980.62, 90=3525, 99=4319, 99.9=6759, 99.99=9863] 
2024-12-02 21:18:13:253 90 sec: 58667 operations; 664.3 current ops/sec; est completion in 1 minute [INSERT: Count=6643, Max=8935, Min=2052, Avg=3007.19, 90=3523, 99=4335, 99.9=6535, 99.99=8311] 
2024-12-02 21:18:23:254 100 sec: 65181 operations; 651.4 current ops/sec; est completion in 54 seconds [INSERT: Count=6514, Max=8463, Min=1968, Avg=3067.1, 90=3631, 99=4439, 99.9=5775, 99.99=7787] 
2024-12-02 21:18:33:254 110 sec: 71373 operations; 619.2 current ops/sec; est completion in 45 seconds [INSERT: Count=6191, Max=191487, Min=1916, Avg=3227.53, 90=3689, 99=4795, 99.9=13479, 99.99=189695] 
2024-12-02 21:18:43:253 120 sec: 77994 operations; 662.1 current ops/sec; est completion in 34 seconds [INSERT: Count=6621, Max=12191, Min=2048, Avg=3018.45, 90=3541, 99=4299, 99.9=5799, 99.99=11623]

2a9f1641e68410d3406d950eee9864df971a5afc and the 4 earlier commits I used for this same benchmark all show an average INSERT latency of around 3000 μs.
cc @leftwo @mkeeter


mkeeter commented Dec 3, 2024

Here are some notes from testing on London, which was seeing 40-50 TPS. In short, we don't see anything suspicious in the MongoDB VMs or Crucible; it just seems like they're being asked to do IO slowly.


On London, all 3x MongoDB images happen to be on the same sled:

root@oxz_switch1:~# omdb db instances
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using DNS server for subnet fd00:1122:3344::/48
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using database URL postgresql://root@[fd00:1122:3344:104::3]:32221,[fd00:1122:3344:103::3]:32221,[fd00:1122:3344:102::3]:32221,[fd00:1122:3344:104::4]:32221,[fd00:1122:3344:101::3]:32221/omicron?sslmode=disable
note: database schema version matches expected (116.0.0)
ID                                   STATE   PROPOLIS_ID                          SLED_ID                              HOST_SERIAL NAME
d658b9cd-a0cf-40a5-b628-2f8660016a52 running 021d58ed-2767-40c2-8b28-14c638e6f835 87bb36bd-ba53-4e7d-8451-50e477cc7970 BRM42220062 mongodb-xfs-bs4096-secondary2
73d5f03d-d4d8-46bf-ba01-aa4cb10fd138 running 453342c8-215d-4d8b-bcb7-65facd4a3ddd 87bb36bd-ba53-4e7d-8451-50e477cc7970 BRM42220062 mongodb-xfs-bs4096-primary
1cb2c72f-1771-464c-96ec-bd36015a8327 running 99e6c859-bf00-4aff-bc24-f6f5f5dc9d59 87bb36bd-ba53-4e7d-8451-50e477cc7970 BRM42220062 mongodb-xfs-bs4096-secondary1
feb144ce-8695-425c-8782-9e50989f7af0 running d37fb063-6a9d-498e-b061-2dc85db5a0b2 d28a2240-875a-4c52-a03f-6413478b1ac8 BRM42220030 loadgen

This is convenient, because it gives us a single place to look. First of all, not much IO is happening; here's one second:

BRM42220062 # dtrace -n 'crucible_upstairs*:::gw-flush-start { @[pid, "flush"] = count() } crucible_upstairs*:::gw-write-start { @[pid, "write"] = count() } tick-1s { exit(0); }'
dtrace: description 'crucible_upstairs**:::gw-flush-start ' matched 9 probes
CPU     ID                    FUNCTION:NAME
 86  67492                         :tick-1s

    23612  flush                                                            41
    22821  flush                                                            44
    22839  flush                                                            45
    23612  write                                                            74
    22821  write                                                            81
    22839  write                                                            82

Looking at flush performance from the Upstairs' perspective, with the following DTrace script (perf-upstairs-f.d):

crucible_upstairs*:::gw-flush-start
{
    start[arg0, pid] = timestamp;
}

crucible_upstairs*:::up-to-ds-flush-start
/start[arg0, pid]/
{
    @[probename, pid] = quantize(timestamp - start[arg0, pid]);
    start[arg0, pid] = 0;
    substart[arg0, pid] = timestamp;
}

crucible_upstairs*:::gw-flush-done
/substart[arg0, pid]/
{
    @[probename, pid] = quantize(timestamp - substart[arg0, pid]);
    substart[arg0, pid] = 0;
}

tick-1s {
        exit(0)
}
dtrace: script 'perf-upstairs-f.d' matched 13 probes
CPU     ID                    FUNCTION:NAME
 35  67492                         :tick-1s

  up-to-ds-flush-start                                  22821
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           27
           16384 |@@@@@@@@@@                               9
           32768 |                                         0

  up-to-ds-flush-start                                  22839
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@               25
           16384 |@@@@@@@@@@@@@                            12
           32768 |@                                        1
           65536 |                                         0

  up-to-ds-flush-start                                  23612
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@                                8
           16384 |@@@@@@@@@@@@@@@@@                        15
           32768 |@@@                                      3
           65536 |@@@@@@@@@@                               9
          131072 |                                         0

  gw-flush-done                                         23612
           value  ------------- Distribution ------------- count
           65536 |                                         0
          131072 |@@@                                      3
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    32
          524288 |                                         0

  gw-flush-done                                         22821
           value  ------------- Distribution ------------- count
           65536 |                                         0
          131072 |@@                                       2
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   34
          524288 |                                         0

  gw-flush-done                                         22839
           value  ------------- Distribution ------------- count
          131072 |                                         0
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   36
          524288 |@@                                       2
         1048576 |                                         0

timestamp is in ns, so 262144 ns is ~0.26 ms; that flush latency alone would limit us to ~3814 IOPS, not 40-ish.
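
As a quick sanity check on that bound (my own back-of-envelope in Python, not from the original investigation):

# If every flush took ~262144 ns end-to-end and flushes were fully
# serialized, this latency alone would cap flush throughput at:
flush_latency_ns = 262_144
iops_ceiling = 1e9 / flush_latency_ns
print(int(iops_ceiling), "flushes/sec")   # 3814, vs the ~40/sec observed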

Using nvme-trace.d, we see writes completing without any significant delay from Propolis's perspective:

BRM42220062 # dtrace -s nvme-trace-propolis.d  -p `pgrep -nf propolis`
Tracing propolis PID 23612... Hit Ctrl-C to end.
write(cid=57344) 4096 bytes from offset 0x5057fe000 in 57us
write(cid=1) 4096 bytes from offset 0x5057ff000 in 70us
write(cid=20482) 4096 bytes from offset 0xa0060a000 in 83us
write(cid=24579) 4096 bytes from offset 0x5057ff000 in 46us
write(cid=61445) 4096 bytes from offset 0x5057ff000 in 97us
write(cid=32775) 4096 bytes from offset 0x5057ff000 in 56us
write(cid=8) 4096 bytes from offset 0x505800000 in 75us
write(cid=12297) 4096 bytes from offset 0xa0060b000 in 38us

(I also modified the script to trace flushes, and saw similarly reasonable values.)

Looking at the Propolis values is important, because the Crucible Guest object has to claim IO permits, which happens before the DTrace probes for gw-flush-start. If those permits were highly contended, then we'd see larger values in the Propolis numbers, which we don't see here.

I also duct-taped together timing between claiming permits and submitting jobs; there's nothing concerning here (tens of microseconds):

BRM42220062 # dtrace -q -n 'pid23612::*IOLimitView5claim*:entry { printf("claim at %u\n", timestamp); start[pid] = timestamp } pid23612::*submit_flush*:entry /start[pid]/ { printf("flush %u -> %u\n", timestamp - start[pid], timestamp); start[pid] = 0 } pid23612::*submit_write*:entry /start[pid]/ { printf("write %u -> %u\n", timestamp - start[pid], timestamp); start[pid] = 0 }'
claim at 11672003265470
flush 19557 -> 11672003285027
claim at 11672004303913
write 32872 -> 11672004336785
claim at 11672004489564
flush 11712 -> 11672004501276
claim at 11672018837250
claim at 11672018853801
write 10420 -> 11672018864221
claim at 11672019071723
flush 15569 -> 11672019087292
claim at 11672019505974
write 24687 -> 11672019530661
claim at 11672028978673
write 20338 -> 11672028999011
claim at 11672029268911
flush 8326 -> 11672029277237
claim at 11672059003297
write 15279 -> 11672059018576
claim at 11672059166946
flush 10740 -> 11672059177686
claim at 11672068776133

All of this suggests that the MongoDB VMs aren't seeing any IO performance degradation; they're just being asked to do Not Very Many IOs.


Further observations from @leftwo:

Looking at network traffic, I think I see the loadgen host sending out less than 2 Mb/s
Oh, no, it's more like just below 6 Mb/s

The ubuntu loadgen instance itself is pretty idle. I can't say for sure, but it does not look like it's "trying" very hard.


mkeeter commented Dec 3, 2024

One more observation: if the loadgen host is sending 6 Mb/s, that's roughly 768 KiB/s.

45 TPS × 4 KiB per transaction (from the NVMe probes) × 3 replicas = 540 KiB/s, so this is the correct order of magnitude for the performance that we're seeing.

(This doesn't say anything about why the loadgen VM is only sending 6 Mb/s; it could either be slow itself, or slowing down due to backpressure from the MongoDB VMs.)
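
As a sketch of the same arithmetic (my own check in Python; the numbers come from the observations above):

# Replication-amplified write demand vs. the observed link rate; the two
# should be in the same ballpark if the link traffic is mostly MongoDB
# replication writes.
tps = 45               # transactions/sec on the slow build
write_kib = 4          # KiB written per transaction, per the NVMe probes
replicas = 3
demand_kib_s = tps * write_kib * replicas     # 540 KiB/s
link_kib_s = 6 * 1024 * 1024 / 8 / 1024       # 6 Mb/s -> 768 KiB/s
print(demand_kib_s, link_kib_s)               # 540 768.0 -- same order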


mkeeter commented Dec 3, 2024

Tracing individual IOs on all 3x MongoDB VMs (using the Propolis probes), I see roughly 13 ms pauses between consecutive IO operations:

[figure: per-VM histograms of write and flush counts over time (1 ms bins), showing regular ~13 ms gaps between IOs]

This is dtrace -q -n 'propolis*:::nvme_flush_enqueue { printf("F %u %u\n", pid, timestamp) } propolis*:::nvme_write_enqueue { printf("W %u %u\n", pid, timestamp) } tick-10s { exit(0) }', followed by post-processing with a Python script

import pylab as plt
import numpy as np
from collections import defaultdict

# Parse the DTrace output: each line is "W|F <pid> <timestamp-in-ns>".
writes = defaultdict(list)
flushes = defaultdict(list)
for line in open('timing_mk').read().split('\n'):
    if not line: continue
    (mode, pid, ts) = line.split()
    pid = int(pid)
    ts = int(ts)
    if mode == 'W':
        writes[pid].append(ts)
    elif mode == 'F':
        flushes[pid].append(ts)

# Histogram the first 10 seconds of activity per PID, with 1 ms bins.
start = min([min(writes[k]) for k in writes] + [min(flushes[k]) for k in flushes])
ax = plt.subplot(211)
for k in writes:
    v = (np.array(writes[k]) - start) / 1e9   # ns -> seconds
    v = v[v < 10.0]
    plt.hist(v, bins=10000)
    plt.ylabel('write count (1 ms bins)')

plt.subplot(212, sharex=ax)
for k in flushes:
    v = (np.array(flushes[k]) - start) / 1e9
    v = v[v < 10.0]
    plt.hist(v, bins=10000)
    plt.ylabel('flush count (1 ms bins)')
    plt.xlabel('time (secs)')
plt.show()
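
To put a number on those pauses rather than eyeballing the histograms, a small addition to the same script (my own follow-up, reusing the writes dict parsed above) computes the inter-IO gaps directly:

# Median gap between consecutive writes per PID, in milliseconds.
for k in writes:
    gaps = np.diff(np.sort(np.array(writes[k]))) / 1e6   # ns -> ms
    print(f"pid {k}: median write gap {np.median(gaps):.1f} ms")

A ~13 ms median gap works out to roughly 1/0.013 ≈ 77 IOs/sec per VM, consistent with the per-PID write rates counted by the earlier DTrace one-liner.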


leftwo commented Dec 3, 2024

Looking at iperf3 traffic between the loadgen and the primary host, things look as expected:
(This is over on dogfood now, but a similar test on london's setup looked the same.)

ubuntu@primary:~$ iperf3 -c 172.30.0.6
Connecting to host 172.30.0.6, port 5201
[  5] local 172.30.0.7 port 48336 connected to 172.30.0.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   400 MBytes  3.36 Gbits/sec  134    461 KBytes       
[  5]   1.00-2.00   sec   400 MBytes  3.36 Gbits/sec  314    320 KBytes       
[  5]   2.00-3.00   sec   409 MBytes  3.43 Gbits/sec    0    543 KBytes       
[  5]   3.00-4.00   sec   398 MBytes  3.33 Gbits/sec   65    529 KBytes       
[  5]   4.00-5.00   sec   394 MBytes  3.30 Gbits/sec  125    488 KBytes       
[  5]   5.00-6.00   sec   406 MBytes  3.41 Gbits/sec    0    585 KBytes       
[  5]   6.00-7.00   sec   399 MBytes  3.35 Gbits/sec    0    632 KBytes       
[  5]   7.00-8.00   sec   395 MBytes  3.31 Gbits/sec   63    505 KBytes       
[  5]   8.00-9.00   sec   395 MBytes  3.31 Gbits/sec  120    468 KBytes       
[  5]   9.00-10.00  sec   399 MBytes  3.34 Gbits/sec   21    485 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.90 GBytes  3.35 Gbits/sec  842             sender
[  5]   0.00-10.00  sec  3.90 GBytes  3.35 Gbits/sec                  receiver


mkeeter commented Dec 3, 2024

Using mongotop, we see that MongoDB is basically 99% idle: sampling at 1 s intervals, it's only doing 10-12 ms of work per second.

This is true on the primary and both secondaries.

ubuntu@primary:~$ mongotop 1 --uri="mongodb://admin:[email protected]:27017/?authSource=admin"
2024-12-03T22:17:46.513+0000    WARNING: On some systems, a password provided directly in a connection string or using --uri may be visible to system status programs such as `ps` that may be invoked by other users. Consider omitting the password to provide it via stdin, or using the --config option to specify a configuration file with the password.
2024-12-03T22:17:46.521+0000    connected to: mongodb://[**REDACTED**]@127.0.0.1:27017/?authSource=admin

                             ns    total    read    write    2024-12-03T22:17:47Z
                 local.oplog.rs     10ms    10ms      0ms
                 ycsb.usertable     10ms     0ms     10ms
                 admin.atlascli      0ms     0ms      0ms
              admin.system.keys      0ms     0ms      0ms
             admin.system.users      0ms     0ms      0ms
           admin.system.version      0ms     0ms      0ms
config.external_validation_keys      0ms     0ms      0ms
          config.queryAnalyzers      0ms     0ms      0ms
                config.settings      0ms     0ms      0ms
    config.shardMergeRecipients      0ms     0ms      0ms

                             ns    total    read    write    2024-12-03T22:17:48Z
                 ycsb.usertable     12ms     0ms     12ms
                 local.oplog.rs     10ms    10ms      0ms
                 admin.atlascli      0ms     0ms      0ms
              admin.system.keys      0ms     0ms      0ms
             admin.system.users      0ms     0ms      0ms
           admin.system.version      0ms     0ms      0ms
config.external_validation_keys      0ms     0ms      0ms
          config.queryAnalyzers      0ms     0ms      0ms
                config.settings      0ms     0ms      0ms
    config.shardMergeRecipients      0ms     0ms      0ms

mongostat shows similar numbers to what we've been seeing before:

ubuntu@primary:~$ mongostat --uri="mongodb://admin:[email protected]:27017/?authSource=admin"
2024-12-03T22:13:27.907+0000    WARNING: On some systems, a password provided directly in a connection string or using --uri may be visible to system status programs such as `ps` that may be invoked by other users. Consider omitting the password to provide it via stdin, or using the --config option to specify a configuration file with the password.
insert query update delete getmore command dirty used flushes vsize  res qrw arw net_in net_out conn     set repl                time
    67    *0     *0     *0     133   271|0  1.2% 4.9%       0 1.15G 371M 0|1 0|0   408k    381k   22 mongoRs  SLV Dec  3 22:13:28.922
    70    *0     *0     *0     142   284|0  1.3% 4.9%       0 1.15G 371M 0|1 0|0   431k    400k   22 mongoRs  SLV Dec  3 22:13:29.923
    78    *0     *0     *0     157   313|0  1.3% 4.9%       0 1.15G 371M 0|0 0|0   471k    434k   22 mongoRs  SLV Dec  3 22:13:30.922
    64    *0     *0     *0     129   260|0  1.3% 4.9%       0 1.15G 371M 0|0 0|0   393k    371k   22 mongoRs  SLV Dec  3 22:13:31.922
    66    *0     *0     *0     132   267|0  1.3% 4.9%       0 1.15G 371M 0|0 0|0   402k    377k   22 mongoRs  SLV Dec  3 22:13:32.921
    65    *0     *0     *0     131   264|0  1.3% 4.9%       0 1.15G 371M 0|0 0|0   401k    376k   22 mongoRs  SLV Dec  3 22:13:33.922
    64    *0     *0     *0     129   263|0  1.3% 5.0%       0 1.15G 370M 0|0 0|0   395k    372k   22 mongoRs  SLV Dec  3 22:13:34.922
    62    *0     *0     *0     124   248|0  1.3% 5.0%       0 1.15G 371M 0|0 0|0   375k    358k   22 mongoRs  SLV Dec  3 22:13:35.922
    59    *0     *0     *0     118   241|0  1.3% 5.0%       0 1.15G 371M 0|0 0|0   359k    345k   22 mongoRs  SLV Dec  3 22:13:36.922
    62    *0     *0     *0     125   252|0  1.3% 5.0%       0 1.15G 371M 0|0 0|0   382k    363k   22 mongoRs  SLV Dec  3 22:13:37.922
insert query update delete getmore command dirty used flushes vsize  res qrw arw net_in net_out conn     set repl                time
    64    *0     *0     *0     128   258|0  0.3% 5.0%       1 1.15G 371M 0|0 0|0   388k    368k   22 mongoRs  SLV Dec  3 22:13:38.921
    64    *0     *0     *0     129   259|0  0.4% 5.0%       0 1.15G 371M 0|0 0|0   393k    370k   22 mongoRs  SLV Dec  3 22:13:39.922
    65    *0     *0     *0     129   266|0  0.4% 5.0%       0 1.15G 371M 0|0 0|0   396k    375k   22 mongoRs  SLV Dec  3 22:13:40.922
    60    *0     *0     *0     120   241|0  0.5% 5.0%       0 1.15G 371M 0|0 0|0   363k    348k   22 mongoRs  SLV Dec  3 22:13:41.922
    64    *0     *0     *0     128   259|0  0.5% 5.0%       0 1.15G 371M 0|0 0|0   390k    368k   22 mongoRs  SLV Dec  3 22:13:42.921
    61    *0     *0     *0     123   247|0  0.6% 5.0%       0 1.15G 371M 0|0 0|0   375k    357k   22 mongoRs  SLV Dec  3 22:13:43.922
    65    *0     *0     *0     130   264|0  0.6% 5.0%       0 1.15G 371M 0|0 0|0   395k    373k   22 mongoRs  SLV Dec  3 22:13:44.921
    60    *0     *0     *0     120   242|0  0.6% 5.0%       0 1.15G 371M 0|0 0|0   365k    349k   22 mongoRs  SLV Dec  3 22:13:45.921
    61    *0     *0     *0     123   252|0  0.7% 5.0%       0 1.15G 371M 0|0 0|0   377k    359k   22 mongoRs  SLV Dec  3 22:13:46.921

This is pointing more and more towards the load generator just... not generating load.

There are flamegraphs in /staff/artemis/slow-mongo, but they're not particularly informative: the primary is writing to disk, and the load generator is writing to sockets. In both cases, we're nowhere near saturating the CPU, so they're presumably limited by some other factor.


leftwo commented Dec 4, 2024

We had the following test setups:

dogfood (new, slow) is

  • Omicron is 41d7c9b
  • Propolis is 6936f1a949d15, using Crucible 2cfc7e0c85
  • Crucible is 2cfc7e0c8572b3 for the Downstairs

london (old, fast) is

  • Omicron is 2a9f164
  • Propolis is 8f8fbb74662b4e19, which uses Crucible 1d82cc9fd99
  • Crucible is also 1d82cc9fd9 for the Downstairs

Both london and dogfood are on host OS commit 49d6894

In an effort to rule in/out any propolis changes, we built the following TUF repo:
/staff/lab/london/artemis-update-2024-12-04-oldomicru-newpro
Made up of:

  • Omicron 2a9f164
  • Propolis 6936f1a949d15 using crucible 1d82cc9fd9
  • Crucible 1d82cc9fd9

This build was installed on london, the same instances were started up, and the same test was run; we saw the slower performance.
The host OS did get a bump: SunOS BRM42220062 5.11 helios-2.0.23049 oxide i386 oxide

Here is mongostat output showing the slow behavior:

ubuntu@primary:~$ mongostat --uri="mongodb://admin:[email protected]:27017/?authSource=admin"
2024-12-04T23:18:09.894+0000	WARNING: On some systems, a password provided directly in a connection string or using --uri may be visible to system status programs such as `ps` that may be invoked by other users. Consider omitting the password to provide it via stdin, or using the --config option to specify a configuration file with the password.
insert query update delete getmore command dirty used flushes vsize  res qrw arw net_in net_out conn     set repl                time
    53    *0     *0     *0     106   109|0  0.2% 4.7%       0 1.08G 330M 0|0 0|0   201k    274k   16 mongoRs  SLV Dec  4 23:18:10.913
    51    *0     *0     *0     103   107|0  0.2% 4.8%       0 1.08G 330M 0|0 0|0   196k    270k   16 mongoRs  SLV Dec  4 23:18:11.915
    51    *0     *0     *0     102   104|0  0.2% 4.8%       0 1.08G 331M 0|0 0|0   193k    266k   16 mongoRs  SLV Dec  4 23:18:12.914
    52    *0     *0     *0     104   106|0  0.2% 4.8%       0 1.08G 330M 0|0 0|0   196k    270k   16 mongoRs  SLV Dec  4 23:18:13.913
    51    *0     *0     *0     101   103|0  0.2% 4.8%       0 1.08G 331M 0|0 0|0   194k    268k   16 mongoRs  SLV Dec  4 23:18:14.914

The previous (fast) test on London gave us this mongostat output:

ubuntu@primary:~$ mongostat --uri="mongodb://admin:[email protected]:27017/?authSource=admin"
2024-12-04T16:47:07.550+0000	WARNING: On some systems, a password provided directly in a connection string or using --uri may be visible to system status programs such as `ps` that may be invoked by other users. Consider omitting the password to provide it via stdin, or using the --config option to specify a configuration file with the password.
insert query update delete getmore command dirty  used flushes vsize   res qrw arw net_in net_out conn     set repl                time
   747    *0     *0     *0    2713  2734|0  1.1% 79.9%       0 3.70G 2.81G 2|0 0|1  4.75m   5.22m   25 mongoRs  SLV Dec  4 16:47:08.573
   747    *0     *0     *0    2777  2828|0  1.1% 80.0%       0 3.70G 2.81G 0|0 0|0  4.87m   5.31m   25 mongoRs  SLV Dec  4 16:47:09.570
   755    *0     *0     *0    2886  2923|0  1.2% 79.9%       0 3.70G 2.81G 0|0 0|1  5.01m   5.45m   25 mongoRs  SLV Dec  4 16:47:10.569
   726    *0     *0     *0    2731  2837|0  1.3% 80.0%       0 3.70G 2.81G 0|0 0|0  4.83m   5.22m   25 mongoRs  SLV Dec  4 16:47:11.569
   759    *0     *0     *0    2897  2933|0  1.3% 79.9%       0 3.70G 2.81G 1|0 0|0  5.03m   5.47m   25 mongoRs  SLV Dec  4 16:47:12.570
   735    *0     *0     *0    2698  2789|0  1.4% 79.9%       0 3.70G 2.80G 1|0 0|1  4.78m   5.20m   25 mongoRs  SLV Dec  4 16:47:13.569

DTrace output on the primary node also showed a lower rate of IO.

The only difference between the two Propolis builds is this one commit:

70a94f9fc60c2d12b876ffbbb55efe37d5b408f5
Wire up viona params from illumos#16738

So suspicion now turns to how that change could have impacted the MongoDB performance test.


leftwo commented Dec 5, 2024

With further testing, using mdb black magic by @pfmooney, we were able to confirm that it was the viona parameters change that had slowed things down. We were able to get the performance numbers to look better on the london cluster by updating the viona params as follows:

copy_data: true,

This gave us the "good" numbers on the london setup.

ubuntu@primary:~$ mongostat --uri="mongodb://admin:[email protected]:27017/?authSource=admin"
2024-12-05T00:43:38.079+0000    WARNING: On some systems, a password provided directly in a connection string or using --uri may be visible to system status programs such as `ps` that may be invoked by other users. Consider omitting the password to provide it via stdin, or using the --config option to specify a configuration file with the password.
insert query update delete getmore command dirty  used flushes vsize  res qrw arw net_in net_out conn     set repl                time
   728    *0     *0     *0    2689  2805|0  1.9% 17.3%       0 1.48G 748M 0|0 0|0  4.51m   5.19m   23 mongoRs  SLV Dec  5 00:43:39.098
   714    *0     *0     *0    2638  2756|0  1.9% 17.3%       0 1.48G 748M 1|0 0|0  4.43m   5.09m   23 mongoRs  SLV Dec  5 00:43:40.101
   749    *0     *0     *0    2908  2922|0  2.0% 17.4%       0 1.48G 747M 0|0 0|0  4.71m   5.44m   23 mongoRs  SLV Dec  5 00:43:41.099
   743    *0     *0     *0    2744  2865|0  2.1% 17.4%       0 1.48G 748M 0|0 0|0  4.60m   5.29m   23 mongoRs  SLV Dec  5 00:43:42.100
   750    *0     *0     *0    2869  2937|0  2.1% 17.5%       0 1.48G 748M 1|0 0|0  4.72m   5.42m   23 mongoRs  SLV Dec  5 00:43:43.101
   731    *0     *0     *0    2518  2632|0  1.6% 17.0%       0 1.50G 756M 0|0 0|0  4.29m   5.00m   23 mongoRs  SLV Dec  5 00:43:44.099
   778    *0     *0     *0    3021  3046|0  1.7% 17.1%       0 1.50G 751M 2|0 0|0  4.91m   5.65m   23 mongoRs  SLV Dec  5 00:43:45.099
   755    *0     *0     *0    2824  2477|0  1.8% 17.1%       0 1.50G 748M 1|0 0|0  4.26m   5.22m   23 mongoRs  SLV Dec  5 00:43:46.099
   766    *0     *0     *0    2878  2915|0  1.8% 17.2%       0 1.50G 747M 0|1 0|0  4.72m   5.46m   23 mongoRs  SLV Dec  5 00:43:47.099

The dtrace output also reflected more IOs flowing through the system (from the primary):

 SESSION        DS STATE 0        DS STATE 1        DS STATE 2   UPW   DSW NEXT_JOB DELTA   WRITE_BO    IP0   IP1   IP2     D0    D1    D2     S0    S1    S2    ER0   ER1   ER2    EC0   EC1   EC2
3f842eb0            active            active            active     1     3  2085880  1983          0      1     1     1      2     2     2      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     1     3  2087931  2051          0      1     1     1      2     2     2      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     2  2089882  1951       8192      2     2     2      0     0     0      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     0  2091888  2006          0      0     0     0      0     0     0      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     1     2  2093843  1955          0      1     1     1      1     1     1      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     1  2095856  2013       4096      1     1     1      0     0     0      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     1  2097800  1944       4096      1     1     1      0     0     0      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     1  2099642  1842       4096      1     1     1      0     0     0      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     1  2101605  1963       4096      1     1     1      0     0     0      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     1  2103503  1898          0      0     0     0      1     1     1      0     0     0      0     0     0      0     0     0
3f842eb0            active            active            active     0     0  2105470  1967          0      0     0     0      0     0     0      0     0     0      0     0     0      0     0     0

And iostat inside the primary instance showed improvement as well:

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1         22.50      0.47     0.00   0.00    0.40    21.52  145.13      0.44     0.18   0.12    0.23     3.12    0.00      0.00     0.00   0.00    0.00     0.00   62.01    0.36    0.07  10.58
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1665.00      4.80     0.00   0.00    0.23     2.95    0.00      0.00     0.00   0.00    0.00     0.00  701.00    0.36    0.63 100.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1582.00      4.56     0.00   0.00    0.23     2.95    0.00      0.00     0.00   0.00    0.00     0.00  669.00    0.37    0.61 100.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1489.00      4.30     0.00   0.00    0.23     2.96    0.00      0.00     0.00   0.00    0.00     0.00  626.00    0.36    0.56 100.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1751.00      5.04     0.00   0.00    0.23     2.95    0.00      0.00     0.00   0.00    0.00     0.00  737.00    0.36    0.66 100.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1614.00      4.66     0.00   0.00    0.23     2.95    0.00      0.00     0.00   0.00    0.00     0.00  682.00    0.36    0.61  99.60
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1621.00      4.62     0.00   0.00    0.23     2.92    0.00      0.00     0.00   0.00    0.00     0.00  702.00    0.36    0.63 100.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1583.00      4.57     0.00   0.00    0.23     2.96    0.00      0.00     0.00   0.00    0.00     0.00  668.00    0.37    0.61  99.60
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1505.00      4.37     0.00   0.00    0.23     2.97    0.00      0.00     0.00   0.00    0.00     0.00  631.00    0.36    0.58  98.00
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1579.00      4.61     0.00   0.00    0.23     2.99    0.00      0.00     0.00   0.00    0.00     0.00  660.00    0.36    0.60  99.60

A workaround to "turn back on copying" should land with this propolis PR:
oxidecomputer/propolis#823

leftwo mentioned this issue Dec 5, 2024

leftwo commented Dec 5, 2024

#7206 should put performance back where it was before.

leftwo added a commit that referenced this issue Dec 5, 2024
Propolis: Switch viona back to packet copying for now #823

This is a workaround for #7189. It turns off new work that we think is causing the slow performance until we can get a better idea of what exactly the problem is and if/how we might want to fix it.

Co-authored-by: Alan Hanson <[email protected]>
askfongjojo commented

Confirmed that workload performance is back to the original level after #7206.
