routecache: handle case when sent to our non-loopback address

This patch modifies the OSv routecache to handle the case when a route entry gets added for a non-loopback address and loopback device before packets start coming from outside of the local network. The practical example is an app that on startup calls itself on its HTTP endpoint but using non-loopback address like 192.168.122.15 before any external requests from outside of the guest. This is exactly what would happen when running SeaWeedFS (see #1188). This bug can be also replicated with a golang-httpclient and httpserver-api apps: ./scripts/build image=golang-httpclient,httpserver-monitoring-api //Change '0.0.0.0' in modules/httpserver-api/global_server.cc to '192.168.122.15' ./script/run.py -api -e '/httpclient.so http://192.168.122.15:8000/os/version 0 10' //The app will wait for 10 seconds before shutting down After it starts, run 'curl http://localhost:8000/os/version' and observe curl never receive the response. Let us assume we have an app that binds to 192.168.122.15 and listens on some port. When a call to connect on a non-loopback address comes from the app itself, it goes through the layers of the networking stack and eventually calls route_cache::lookup() to find a route entry for 192.168.122.15 as in the stack trace below. in route_cache::lookup (dst=dst@entry=0xffff8000014bb84c, fibnum=<optimized out>, ret=ret@entry=0xffff8000014bb860) at ./bsd/sys/net/routecache.hh:197 in_pcbladdr (inp=inp@entry=0xffffa0000155a200, faddr=faddr@entry=0xffff8000014bb98c, laddr=laddr@entry=0xffff8000014bb988, cred=0x0) at bsd/sys/netinet/in_pcb.cc:881 in_pcbconnect_setup (inp=0xffffa0000155a200, nam=0xffffa000008e0510, laddrp=0xffff8000014bb9e4, lportp=0xffff8000014bb9e2, faddrp=0xffffa0000155a264, fportp=0xffffa0000155a254, oinpp=0xffff8000014bb9e8, cred=0x0) at bsd/sys/netinet/in_pcb.cc:1056 tcp_connect (tp=tp@entry=0xffffa00000f76800, nam=nam@entry=0xffffa000008e0510, td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:1089 tcp_usr_connect (td=<optimized out>, nam=0xffffa000008e0510, so=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:463 tcp_usr_connect (so=<optimized out>, nam=0xffffa000008e0510, td=<optimized out>) at bsd/sys/netinet/tcp_usrreq.cc:436 kern_connect (fd=<optimized out>, sa=0xffffa000008e0510) at bsd/sys/kern/uipc_syscalls.cc:374 Initially the route cache is empty so it adds new entry that associates the address 192.168.122.15, the loopback device and its netmask 255.0.0.0. Once all subsequent networking stack calls to handle this particular HTTP request complete, at some point later external client calls the same HTTP endpoint from outside of OSv guest. While handling this call, at some point we get to this point as illustrated by the stack trace below: in route_cache::lookup (dst=<optimized out>, fibnum=<optimized out>, ret=0xffff800000c4ead0) at ./bsd/sys/net/routecache.hh:197 in tcp_maxmtu (inc=<optimized out>, flags=0x0) at bsd/sys/netinet/tcp_subr.cc:1690 in tcp_mssopt (inc=0xffffa00000f7b910) at bsd/sys/netinet/tcp_input.cc:3131 in syncookie_generate (flowlabel=<synthetic pointer>, sc=0xffffa00000f7b900, sch=<optimized out>) at bsd/sys/netinet/tcp_syncache.cc:1536 in _syncache_add (inc=<optimized out>, to=<optimized out>, th=<optimized out>, inp=<optimized out>, lsop=0xffff800000c4edc8, m=0xffffa00000f7bf00, toepcb=0x0, tu=0x0) at bsd/sys/netinet/tcp_syncache.cc:1210 in tcp_input (m=0xffffa00000f7bf00, off0=<optimized out>) at bsd/sys/netinet/tcp_input.cc:941 in ip_input (m=<optimized out>) at bsd/sys/netinet/ip_input.cc:774 in netisr_dispatch_src (proto=1, source=<optimized out>, m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769 in netisr_dispatch_src (proto=9, source=<optimized out>, m=0xffffa00000f7bf00) at bsd/sys/net/netisr.cc:769 in virtio::net::receiver (this=0xffff900000ba3000) at drivers/virtio-net.cc:545 This time the destination address (dst) is 192.168.122.1 which is a gateway for the non-loopback (eth0) network. However at this point we have a single route entry created when handling the first request and the search() method used by route_cache::lookup() simply compares this entry network with the network of the IP address in question by applying netmask 255.0.0.0 and it matches so that entry is returned. The problem is that this and all subsequent packets get routed to the loopback device instead of the virtio device in this case. And the client never receives relevant packets to complete the handshake and TCP connection does not get established from the client perspective. Effectively the application will never be able to receive any external requests, however it can still handle any internal ones like the 1st one in this example. What is interesting if we revert the sequence in this example and send external request before the app makes an internal request to itself everything works fine. It is because when handling the external call, the initial and only route entry gets created for IP 192.168.122.1 and netmask 255.255.255.0 and non-loopback (eth0) device. The subsequent internal request would match the netmask and go through the ifloop() like this illustrates: 0xffff80000119b040 /httpclient.so 0 15.368111615 net_packet_in b'IP 192.168.122.15.46890 > 192.168.122.15.8000: Flags [S], seq 744996567, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 15368 ecr 0], length 0' log_packet_in(mbuf*, int) core/net_trace.cc:143 netisr_queue_workstream(netisr_workstream*, unsigned int, netisr_work*, mbuf*, int*) [clone .constprop.0] bsd/sys/net/netisr.cc:633 netisr_queue_src bsd/sys/net/netisr.cc:684 netisr_queue_src bsd/sys/net/netisr.cc:712 if_simloop bsd/sys/net/if_loop.cc:291 ether_output bsd/sys/net/if_ethersubr.cc:256 ip_output(mbuf*, mbuf*, route*, int, ip_moptions*, inpcb*) bsd/sys/netinet/ip_output.cc:621 tcp_output bsd/sys/netinet/tcp_output.cc:1385 tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:465 tcp_usr_connect bsd/sys/netinet/tcp_usrreq.cc:436 kern_connect bsd/sys/kern/uipc_syscalls.cc:374 So this patch fixes this problem by changing the search() method to handle the first scenario. In essence, instead of simply comparing the networks of the entry IP address and dst, it first identifies type of the device the entry is associated with. If non-loopback it does the same as before, if device is loopback if checks if dst is loopback in which case it is a match otherwise it compares full IP addresses. With the patch applied, in first scenario when the second external call is handled the entry device is checked which is loopback, and new logic returns null as full addresses would not match. This patch has been backported from the original one by Jan-Michael Kho contributed to the Spirent fork of OSv - SpirentOrion@f6e6e54. V2: Comparing to V1 this version adds more comments to the code and removes some unnecessary includes. Co-authored-by: "Jan-Michael Kho" <[email protected]> Co-authored-by: Waldemar Kozaczuk <[email protected]> Signed-off-by: Waldemar Kozaczuk <[email protected]>
cloudius-systems · Jul 24, 2022 · a0251df · a0251df
1 parent fd716a7
commit a0251df
Showing 1 changed file with 31 additions and 10 deletions.
diff --git a/bsd/sys/net/routecache.hh b/bsd/sys/net/routecache.hh
@@ -44,17 +44,9 @@
 #ifndef INCLUDED_ROUTECACHE_HH
 #define INCLUDED_ROUTECACHE_HH
 
-// FIXME: probably most of these includes are unnecessary
 #include <bsd/porting/netport.h>
-#include <bsd/porting/sync_stub.h>
-#include <bsd/sys/sys/param.h>
-#include <bsd/sys/sys/mbuf.h>
-#include <bsd/sys/sys/socket.h>
-#include <bsd/sys/sys/domain.h>
 #include <bsd/sys/net/if.h>
-#include <bsd/sys/net/if_dl.h>
 #include <bsd/sys/netinet/in_var.h>
-
 #include <bsd/sys/net/route.h>
 
 #include <osv/rcu.hh>
@@ -88,6 +80,10 @@ public:
 #endif
         return *this;
     }
+
+    bool is_loopback(void) const {
+        return (rt_ifp && (rt_ifp->if_flags & IFF_LOOPBACK)) ? true : false;
+    }
 };
 
 // Silly routing table implementation, allowing search given address in list
@@ -116,10 +112,35 @@ public:
         }
         entries.emplace_front(a, n, r);
     }
+    // address should be in host order
+    bool is_loopback_net(u32 address) const {
+        return ((address >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET) ? true : false;
+    }
     nonlockable_rtentry *search(u32 address) {
         for (silly_rtable_entry &e : entries) {
-            if ((e.address & e.netmask) == (address & e.netmask)) {
-                return &e.rte;
+            if (e.rte.is_loopback() == false) {
+                // The interface associated with this entry is non-loopback
+                // so it is enough to compare the networks of the addresses
+                if ((e.address & e.netmask) == (address & e.netmask)) {
+                    return &e.rte;
+                }
+            } else {
+                // Given the interface associated with this entry is a loopback one,
+                // consider addresses like 127.0.0.1 or 127.0.0.2 as a match
+                if (is_loopback_net(address)) {
+                    return &e.rte;
+                }
+                // Otherwise, we shouldn't use this entry on IP addresses just because they're
+                // on the same network as our non-loopback address. So match the entire
+                // address.
+                // This would handle a case when this route entry was added
+                // when looking up a non-loopback address like 192.168.122.15 but
+                // "converted" or handled as loopback as part of an optimization.
+                // The good practical example is an app like golang-httpclient
+                // which calls httpserver-api listening on 192.168.122.15
+                if (e.address == address) {
+                    return &e.rte;
+                }
             }
         }
         return nullptr;