Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message is sent to the wrong interface #11120

Closed
kghost opened this issue Oct 28, 2021 · 14 comments · Fixed by #14382
Closed

Message is sent to the wrong interface #11120

kghost opened this issue Oct 28, 2021 · 14 comments · Fixed by #14382
Assignees
Milestone

Comments

@kghost
Copy link
Contributor

kghost commented Oct 28, 2021

Problem

My testing box have 2 different active network interfaces:

4: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether d4:d2:52:79:9a:c8 brd ff:ff:ff:ff:ff:ff
inet 192.168.90.187/24 brd 192.168.90.255 scope global dynamic noprefixroute wlp2s0
valid_lft 35978sec preferred_lft 35978sec
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:76:c8:60:61 brd ff:ff:ff:ff:ff:ff
inet 192.168.9.1/24 brd 192.168.9.255 scope global docker0
valid_lft forever preferred_lft forever

The controller sends a message to 192.168.9.1, on the device side the message is received from 192.168.90.187. So the message is sent via the wrong socket.

Expected packet: 192.168.9.1:5541 -> 192.168.9.1:5540
Actual packet: 192.168.90.187:5541 -> 192.168.9.1:5540

we should response the message via the same interface where the message is received.

Scenario

This is triggered by scripts/tests/test_suites.sh

  1. Run ./out/debug/standalone/chip-all-clusters-app
  2. Run ./out/debug/chip-tool pairing qrcode ...
  3. Run ./out/debug/chip-tool test ...

During pairing phrase

Here is where the problem begins, in the following logs, the controller send a packet to 192.168.9.1, this address is discovered by mDNS, but the packet is sent via the wrong interface.

Controller side

[1635412318.928113][1128053:1128078] CHIP:DIS: Attempt to mDNS broadcast failed: ../../src/inet/IPEndPointBasis.cpp:687: OS Error 0x02000065: Network is unreachable
[1635412318.928148][1128053:1128078] CHIP:DMG: ICR moving to [AwaitingDe]
[1635412318.928159][1128053:1128078] CHIP:EM: Sending Standalone Ack for MessageCounter:6 on exchange 48303i
[1635412318.928194][1128053:1128078] CHIP:IN: Prepared encrypted message 0x7fc110ada978 to 0x0000000012344321 of type 0x10 and protocolId (0, 0) on exchange 48303i with MessageCounter:11.
[1635412318.928202][1128053:1128078] CHIP:IN: Sending encrypted msg 0x7fc110ada978 with MessageCounter:11 to 0x0000000012344321 at monotonic time: 1337236480 msec
[1635412318.928208][1128053:1128078] CHIP:IN: Sending msg to address UDP:fdb9:e90f:246d:53e2:6a9c:214:9625:d001:5540
[1635412318.928249][1128053:1128078] CHIP:EM: Flushed pending ack for MessageCounter:6 on exchange 48303i
[1635412318.928254][1128053:1128078] CHIP:EM: ec-- id: 48303i
[1635412318.928326][1128053:1128078] CHIP:DIS: Node ID resolved for 0x0000000012344321 to [192.168.9.1]:5540
[1635412318.928354][1128053:1128078] CHIP:IN: Marking old secure session for device 0x0000000012344321 as expired
[1635412318.928360][1128053:1128078] CHIP:CTL: OnConnectionExpired was called for unknown device, ignoring it.
[1635412318.928365][1128053:1128078] CHIP:IN: New secure session created for device 0x0000000012344321, key 1!!
[1635412318.928601][1128053:1128078] CHIP:TOO: OnAddressUpdateComplete: 12344321: ../../src/controller/CHIPDevice.cpp:443: Success
[1635412318.928607][1128053:1128078] CHIP:CTL: OperationalDiscoveryComplete for device ID 305414945
[1635412318.928611][1128053:1128078] CHIP:CTL: Enabling CASE session establishment for the device
[1635412318.928713][1128053:1128078] CHIP:IN: Marking old secure session for device 0x0000000012344321 as expired
[1635412318.928992][1128053:1128078] CHIP:IN: Creating UnauthenticatedSession with PeerAddress UDP:192.168.9.1:5540
[1635412318.929016][1128053:1128078] CHIP:EM: ec++ id: 48304i with delegate @0x7fc11b8e0160
[1635412318.929182][1128053:1128078] CHIP:IN: Prepared plaintext message 0x559821b33010 to 0x0000000000000000 of type 0x30 and protocolId (0, 0) on exchange 48304i with MessageCounter:1568095081.
[1635412318.929197][1128053:1128078] CHIP:IN: Sending plaintext msg 0x559821b33010 with MessageCounter:1568095081 to 0x0000000000000000 at monotonic time: 1337236481 msec
[1635412318.929205][1128053:1128078] CHIP:IN: Sending msg to address UDP:192.168.9.1:5540
[1635412318.929252][1128053:1128078] CHIP:SC: Sent Sigma1 msg
[1635412318.929299][1128053:1128078] CHIP:DIS: Discovered node without a pending query
[1635412318.929309][1128053:1128078] CHIP:DIS: Node ID resolved for 0x0000000012344321 to [192.168.9.1]:5540

Device side:

[1635412318.929268][1128023:1128023] CHIP:IN: Received msg from address UDP:192.168.90.187:5541
[1635412318.929283][1128023:1128023] CHIP:IN: Creating UnauthenticatedSession with PeerAddress UDP:192.168.90.187:5541
[1635412318.929304][1128023:1128023] CHIP:EM: Received message of type 0x30 with protocolId (0, 0) and MessageCounter:1568095081 on exchange 48304r
[1635412318.929339][1128023:1128023] CHIP:EM: ec++ id: 48304r with delegate @0x555c50c309a8
[1635412318.929346][1128023:1128023] CHIP:EM: Handling via exchange: 48304r, Delegate: 0x0x555c50c309a8
[1635412318.929357][1128023:1128023] CHIP:IN: CASE Server received Sigma1 message. Starting handshake. EC 0x555c50c30fb0
[1635412318.929364][1128023:1128023] CHIP:IN: Marking old secure session for device 0x0000000000000000 as expired
[1635412318.929373][1128023:1128023] CHIP:DIS: CASE handshake initiated, closing all BLE Connections
[1635412318.929380][1128023:1128023] CHIP:SC: Waiting for Sigma1 msg
[1635412318.929385][1128023:1128023] CHIP:IN: CASE Server disabling CASE session setups
[1635412318.929392][1128023:1128023] CHIP:SC: Received Sigma1 msg
[1635412318.929405][1128023:1128023] CHIP:SC: Peer assigned session key ID 2
[1635412318.929414][1128023:1128023] CHIP:DIS: Fabric (1) loaded from storage. Calling OnFabricRetrievedFromStorage
[1635412318.929420][1128023:1128023] CHIP:ZCL: OpCreds: Fabric 0x1 was retrieved from storage. FabricId 0x0000000000000000, NodeId 0x0000000012344321, VendorId 0x000A
[1635412318.929426][1128023:1128023] CHIP:ZCL: OpCreds: Call to writeFabricsIntoFabricsListAttribute
[1635412318.929460][1128023:1128023] CHIP:ZCL: OpCreds: Writing fabric into attribute store at index 0: fabricId 0x0000000000000000, nodeId 0x0000000012344321 vendorId 0x000A
[1635412318.929473][1128023:1128023] CHIP:ZCL: OpCreds: Stored 1 fabrics in fabrics list attribute.

During test phrase

Controller sends message:

[1635412319.074908][1128079:1128093] CHIP:IN: Creating UnauthenticatedSession with PeerAddress UDP:192.168.90.187:5540
[1635412319.074938][1128079:1128093] CHIP:EM: ec++ id: 13075i with delegate @0x7f7da4560160
[1635412319.075071][1128079:1128093] CHIP:IN: Prepared plaintext message 0x55bdb50c3720 to 0x0000000000000000 of type 0x30 and protocolId (0, 0) on exchange 13075i with MessageCounter:2354484034.
[1635412319.075085][1128079:1128093] CHIP:IN: Sending plaintext msg 0x55bdb50c3720 with MessageCounter:2354484034 to 0x0000000000000000 at monotonic time: 1337236627 msec
[1635412319.075090][1128079:1128093] CHIP:IN: Sending msg to address UDP:192.168.90.187:5540
[1635412319.075136][1128079:1128093] CHIP:SC: Sent Sigma1 msg

Device receives message and sends reply

[1635412319.075169][1128023:1128023] CHIP:IN: Received msg from address UDP:192.168.90.187:5541
[1635412319.075191][1128023:1128023] CHIP:EM: Received message of type 0x30 with protocolId (0, 0) and MessageCounter:2354484034 on exchange 13075r
[1635412319.075199][1128023:1128023] CHIP:EM: ec++ id: 13075r with delegate @0x555c50c309a8
[1635412319.075204][1128023:1128023] CHIP:EM: Handling via exchange: 13075r, Delegate: 0x0x555c50c309a8
[1635412319.075213][1128023:1128023] CHIP:IN: CASE Server received Sigma1 message. Starting handshake. EC 0x555c50c30fb0
[1635412319.075217][1128023:1128023] CHIP:DIS: CASE handshake initiated, closing all BLE Connections
[1635412319.075224][1128023:1128023] CHIP:SC: Waiting for Sigma1 msg
[1635412319.075228][1128023:1128023] CHIP:IN: CASE Server disabling CASE session setups
[1635412319.075232][1128023:1128023] CHIP:SC: Received Sigma1 msg
[1635412319.075244][1128023:1128023] CHIP:SC: Peer assigned session key ID 4
[1635412319.075249][1128023:1128023] CHIP:DIS: Fabric (1) loaded from storage. Calling OnFabricRetrievedFromStorage
[1635412319.075253][1128023:1128023] CHIP:ZCL: OpCreds: Fabric 0x1 was retrieved from storage. FabricId 0x0000000000000000, NodeId 0x0000000012344321, VendorId 0x000A
[1635412319.075258][1128023:1128023] CHIP:ZCL: OpCreds: Call to writeFabricsIntoFabricsListAttribute
[1635412319.075292][1128023:1128023] CHIP:ZCL: OpCreds: Writing fabric into attribute store at index 0: fabricId 0x0000000000000000, nodeId 0x0000000012344321 vendorId 0x000A
[1635412319.075304][1128023:1128023] CHIP:ZCL: OpCreds: Stored 1 fabrics in fabrics list attribute.
[1635412319.075339][1128023:1128023] CHIP:DIS: Fabric (1) loaded from storage. Calling OnFabricRetrievedFromStorage
[1635412319.075346][1128023:1128023] CHIP:ZCL: OpCreds: Fabric 0x1 was retrieved from storage. FabricId 0x0000000000000000, NodeId 0x0000000012344321, VendorId 0x000A
[1635412319.075352][1128023:1128023] CHIP:ZCL: OpCreds: Call to writeFabricsIntoFabricsListAttribute
[1635412319.075372][1128023:1128023] CHIP:ZCL: OpCreds: Writing fabric into attribute store at index 0: fabricId 0x0000000000000000, nodeId 0x0000000012344321 vendorId 0x000A
[1635412319.075381][1128023:1128023] CHIP:ZCL: OpCreds: Stored 1 fabrics in fabrics list attribute.
[1635412319.075681][1128023:1128023] CHIP:EM: Piggybacking Ack for MessageCounter:2354484034 on exchange: 13075r
[1635412319.075692][1128023:1128023] CHIP:IN: Prepared plaintext message 0x555c50c30ea8 to 0x0000000000000000 of type 0x31 and protocolId (0, 0) on exchange 13075r with MessageCounter:1414302350.
[1635412319.075699][1128023:1128023] CHIP:IN: Sending plaintext msg 0x555c50c30ea8 with MessageCounter:1414302350 to 0x0000000000000000 at monotonic time: 1337236627 msec
[1635412319.075705][1128023:1128023] CHIP:IN: Sending msg to address UDP:192.168.90.187:5541
[1635412319.075749][1128023:1128023] CHIP:SC: Sent Sigma2 msg

Controller receives message:

[1635412319.075767][1128079:1128093] CHIP:IN: Received msg from address UDP:192.168.9.1:5540
[1635412319.075780][1128079:1128093] CHIP:IN: Creating UnauthenticatedSession with PeerAddress UDP:192.168.9.1:5540
[1635412319.075794][1128079:1128093] CHIP:EM: Received message of type 0x31 with protocolId (0, 0) and MessageCounter:1414302350 on exchange 13075i

Due to that the message is on a different UnauthenticatedSession, it is unable to match the session with the original exchange, so it is unable to handle the message.

Proposed Solution

  • Ensure interface id is set properly to device address when discovered by mDNS.
  • UDP::SendMessage should honor interface in PeerAddress.
@kghost
Copy link
Contributor Author

kghost commented Oct 28, 2021

@kpschoedel @yufengwangca @bzbarsky-apple @pan-apple

This bug is causing some serious problems, we should fix it before TE7

@kghost
Copy link
Contributor Author

kghost commented Oct 28, 2021

Confirm that the PeerAddress, fetched from mDNS doesn't contain proper interface id.

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

Related: #11009

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

The choice not to specify an interface seems quite deliberate - from the DeviceController OnNodeIdResolved call:

    // Only use the mDNS resolution's InterfaceID for addresses that are IPv6 LLA.
    // For all other addresses, we should rely on the device's routing table to route messages sent.
    // Forcing messages down an InterfaceId might fail. For example, in bridged networks like Thread,
    // mDNS advertisements are not usually received on the same interface the peer is reachable on.
    if (nodeData.mAddress.IsIPv6LinkLocal())
    {
        interfaceId = nodeData.mInterfaceId;
    }

What actually has to be matched in order for the session to match. Ie, how is this address used? Matching on exact address seems fragile - IMO nodes should be free to send messages on whatever interface is most convenient.

From the spec (section 4.14.2.4. Exchange Message Processing)

Upon receipt of a message by the Exchange Layer, the message is matched to an existing Exchange.
A given message is part of an Exchange if it satisfies all the following criteria:

1. The Exchange ID of the message matches the Exchange ID of the Exchange,
2. The message has the I Flag set and its Source Node ID is the Initiator of the Exchange,
OR it does not have the I Flag set and its Destination Node ID is the Initiator of the Exchange.
3. The message was received over a secure session that is associated to the Fabric ID and Node ID
of the Peer Node.

^ Does matching the secure session require matching the ip address?

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

The mdns resolver does track the interface, but keep in mind that there are addresses that are reachable over multiple interfaces. Right now, Device and SecureSession assume a single address for a peer, and that address is currently whatever address was last in the mdns packet on whatever interface that was received on. If we rely on strict addresses to match sessions, we'll need to track the interfaces, make sure we don't overwrite on receipt of new mdns packets and also track the local address that we're tagging packets with. That sounds overly restrictive. My preference would be to have the Device class (and maybe even the SecureSession) track all the addresses for a device so it can fall back in case of reachability issues. We also have a gap on the other side because there's nothing to restrict the sender from selecting amongst their own addresses for the source. The device side really has no way to know all the available addresses from the controller side.

TL;DR - if there's a way to match exchange contexts without using the IP addresses, we should do that.

@kghost
Copy link
Contributor Author

kghost commented Oct 28, 2021

Is not about how to match the address, the address we are using to reply is wrong.

Expected behavior:

First session:
Controller 192.168.9.1:5541 -> 192.168.9.1:5540 Device
           192.168.9.1:5541 <- 192.168.9.1:5540
The device cache this session (peer address 192.168.9.1:5541, local interface 192.168.9.1:5540)
Second session:
Controller 192.168.9.1:5541 -> 192.168.9.1:5540 Device
           192.168.9.1:5541 <- 192.168.9.1:5540

Actual behavior

First session:
Controller 192.168.90.187:5541 -> 192.168.9.1:5540 Device
           192.168.90.187:5541 <- 192.168.9.1:5540
The device cache this session (peer address 192.168.90.187:5541, local interface 192.168.9.1:5540)
Second session:
Controller 192.168.90.187:5541 -> 192.168.90.187:5540 Device (The session matches cached session, because the peer address is same)
           192.168.90.187:5541 <- 192.168.9.1:5540 Error: The controller sends a packet to 192.168.90.187:5540 but got a response from 192.168.9.1:5540, it is unable to process the packet.

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

yeah, but what does it mean to use a "wrong" address. I can't find anywhere in the spec that says we're matching on address, so in theory anything that gets the message there should be ok.

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

For TE7, I suppose what we can do is have the device set the address only if another address is not already set.

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

The general challenge with that is that if the device really DOES need to switch its address, we'll be doing the wrong thing.

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

I suppose my question is really - given that the device does receive the message on the "incorrect" address, why is the message being rejected? What part of the code is taking the address into account? And how does that match to spec?

@bzbarsky-apple
Copy link
Contributor

The part of the spec that matches on address is the matching for unencrypted sessions. That has to match on transport address, because there's nothing else to match on.

The real question here is this: how is the interface (and hence source address) selected based on the destination address in this case?

@cecille
Copy link
Contributor

cecille commented Oct 28, 2021

Perhaps I misunderstood then - is this happening during the case session establishment? Or when we start to send encrypted messages?

@franck-apple franck-apple added this to the Test Event 7 milestone Oct 28, 2021
andy31415 pushed a commit that referenced this issue Nov 3, 2021
…1120 (#11276)

* Enforce interface match for unauthenticated session, temporary fix #11120

* Update src/transport/UnauthenticatedSessionTable.h

Co-authored-by: Boris Zbarsky <[email protected]>

Co-authored-by: Boris Zbarsky <[email protected]>
andy31415 pushed a commit that referenced this issue Nov 3, 2021
…1120 (#11276)

* Enforce interface match for unauthenticated session, temporary fix #11120

* Update src/transport/UnauthenticatedSessionTable.h

Co-authored-by: Boris Zbarsky <[email protected]>

Co-authored-by: Boris Zbarsky <[email protected]>
@kghost
Copy link
Contributor Author

kghost commented Nov 3, 2021

Reopen, because it still need a permanent fix

@kghost kghost reopened this Nov 3, 2021
hank820 pushed a commit to hank820/connectedhomeip that referenced this issue Nov 23, 2021
* Enforce interface match for unauthenticated session, temporary fix project-chip#11120

* Update src/transport/UnauthenticatedSessionTable.h

Co-authored-by: Boris Zbarsky <[email protected]>

Co-authored-by: Boris Zbarsky <[email protected]>
pankore pushed a commit to pankore/connectedhomeip that referenced this issue Nov 24, 2021
…oject-chip#11120 (project-chip#11276)

* Enforce interface match for unauthenticated session, temporary fix project-chip#11120

* Update src/transport/UnauthenticatedSessionTable.h

Co-authored-by: Boris Zbarsky <[email protected]>

Co-authored-by: Boris Zbarsky <[email protected]>
PSONALl pushed a commit to PSONALl/connectedhomeip that referenced this issue Dec 3, 2021
…oject-chip#11120 (project-chip#11276)

* Enforce interface match for unauthenticated session, temporary fix project-chip#11120

* Update src/transport/UnauthenticatedSessionTable.h

Co-authored-by: Boris Zbarsky <[email protected]>

Co-authored-by: Boris Zbarsky <[email protected]>
@msandstedt
Copy link
Contributor

Spec PR, https://github.com/CHIP-Specifications/connectedhomeip-spec/issues/4674, is now merged so work on this can resume.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants