Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket timeout when trying to send data on second socket #178

Closed
lemeura opened this issue Sep 29, 2023 · 28 comments
Closed

Socket timeout when trying to send data on second socket #178

lemeura opened this issue Sep 29, 2023 · 28 comments

Comments

@lemeura
Copy link

lemeura commented Sep 29, 2023

Hi,

I'm currently facing a quite systematic issue when using 2 different sockets to send UDS data on the same bus.

The stack trace ends like this:

    IsoTPSocketConnection.send(self, data)
  File "/venv/lib/python3.8/site-packages/udsoncan/connections.py", line 65, in send
    self.specific_send(payload)
  File "/venv/lib/python3.8/site-packages/udsoncan/connections.py", line 321, in specific_send
    self.tpsock.send(payload)
  File "/venv/lib/python3.8/site-packages/isotp/tpsock/__init__.py", line 89, in send
    return self._socket.send(*args, **kwargs)
socket.timeout: timed out
  • I'm running Ubuntu 20.04LTS and the issue appeared after kernel update from 5.15.0-78-generic to 5.15.0-79-generic.
  • I also reproduced the issue with Ubuntu 22.04 LTS after update from 5.19.0-45-generic to 6.2.0-33-generic.
  • Moving back to old kernels restores the good behavior.

In details, the aim of my project is to simulate UDS conversation then:

  • opening a socket to simulate UDS requests
  • opening a second socket to simulate the answers

The 2 sockets are opened successfully, the first one sends the request, the second one raises an exception when trying to send.

This setup was working well with old kernels but with recent ones, the timeout is very frequent but not systematic.

I tried without success:

  • to update all python libs linked to can communication (udsoncan, can-isotp, python-can, msgpack 0.6.2)
  • to install most recent kernel, with the hope to get it fixed

Could you help me to identify the component which introduced this "issue"? I'm not familiar with the different layers in action for this topic:

  • python libs
  • kernel modules
  • kernel core

Thanks in advance, best regards,

@pylessard
Copy link
Owner

how do you initialize your socket?

@hartkopp: What does it means when send() time out?

@pylessard
Copy link
Owner

@lemeura Can you share a CAN log? Is there data being sent at all ?

@lemeura
Copy link
Author

lemeura commented Oct 2, 2023

Hello,

my sockets are initialized like this:
socket 1: IsoTPSocketConnection(interface="can0", rxid=467595842, txid=467550962, name="F", addressing_mode=1)
socket2 : IsoTPSocketConnection(interface="can0", rxid=1901, txid=1869, name="D", addressing_mode=0)

Please find attached the corresponding candump in which the socket timeout occured during the reading of DiDs.

can.log

FI: The timeout does not occur systematically on the same DiD

Actually, I have to keep running an outdated kernel but it's not very safe as none of security updates can be applied.

Best regards,

@pylessard
Copy link
Owner

pylessard commented Oct 2, 2023

Interesting. I wonder if that could be a bug in the kernel module.
Can you check if there's a correlation between the payload size and the issue? I noticed that your communication fails after a payload of 13 bytes which perfectly fits 2 can frames.

I will try to check this, but I will be not so available in the upcoming week.
Maybe you can raise that issue in the kernel module repo : https://github.com/hartkopp/can-isotp

@lumagi
Copy link
Contributor

lumagi commented Oct 2, 2023

I think this might be related to this bug report. It's a race condition in the kernel module that causes the send to time out if there's a second thread listening on the same socket. There's a patch for the bug here. Maybe you want to try this, but it involves recompiling the isotp module.

@pylessard
Copy link
Owner

If that is the case, @lemeura you can always switch to the pure python implementation until the patch is officially released. Or you can recompile the module like @lumagi proposed

@lemeura
Copy link
Author

lemeura commented Oct 3, 2023

Hi @pylessard ,

The issue does not occur systematically on the same frame, sometimes it's failing from the first frame, sometimes it works up to the end (8 Did reading).

@lumagi , @pylessard , I will try to compile the module and I will let you know if it can fix the issue.

Thank you very much for your support.

@lemeura
Copy link
Author

lemeura commented Oct 3, 2023

Hi,

Unfortunately, I didn't succeed in compiling the module, I faced an error mentioning that is useless as the module is part of kernel since 5.10.

FYI, I didn't apply the patch yet because the source code from master seems to not be the one from which the patch has been done.

Please find below, the outputs:

$ git clone https://github.com/hartkopp/can-isotp.git
Cloning into 'can-isotp'...
remote: Enumerating objects: 599, done.
remote: Counting objects: 100% (162/162), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 599 (delta 111), reused 136 (delta 96), pack-reused 437
Receiving objects: 100% (599/599), 120.91 KiB | 2.69 MiB/s, done.
Resolving deltas: 100% (242/242), done.

$ git log
commit 7626d0a0707391970080d493ce69638719938da7 (HEAD -> master, origin/master, origin/HEAD)
Author: Oliver Hartkopp <[email protected]>
Date:   Fri May 20 08:16:13 2022 +0200

    isotp: bump out-of-tree version date
    
    Signed-off-by: Oliver Hartkopp <[email protected]>

$ uname -r
5.15.0-84-generic

$ make
make -C /lib/modules/5.15.0-84-generic/build M=~/Documents/can-isotp modules
make[1]: Entering directory '/usr/src/linux-headers-5.15.0-84-generic'
  CC [M]  ~/Documents/can-isotp/net/can/isotp.o
~/Documents/can-isotp/net/can/isotp.c:93:2: error: #error No need to compile this out-of-tree driver! ISO-TP is part of Linux Mainline kernel since Linux 5.10.
   93 | #error No need to compile this out-of-tree driver! ISO-TP is part of Linux Mainline kernel since Linux 5.10.
      |  ^~~~~
make[3]: *** [scripts/Makefile.build:297: ~/Documents/can-isotp/net/can/isotp.o] Error 1
make[2]: *** [scripts/Makefile.build:560: ~/Documents/can-isotp/net/can] Error 2
make[1]: *** [Makefile:1909: ~/Documents/can-isotp] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.15.0-84-generic'
make: *** [Makefile:21: modules] Error 2

Could you tell me if I'm working with the correct branch ?
Did I miss any steps ?

@hartkopp
Copy link

hartkopp commented Oct 3, 2023

how do you initialize your socket?

@hartkopp: What does it means when send() time out?

Usually when there was no FC answer to a FF first frame within 1000ms.

@hartkopp
Copy link

hartkopp commented Oct 3, 2023

You should just read and follow the error messages ;-)

~/Documents/can-isotp/net/can/isotp.c:93:2: error: #error No need to compile this out-of-tree driver! ISO-TP is part of Linux Mainline kernel since Linux 5.10.
93 | #error No need to compile this out-of-tree driver! ISO-TP is part of Linux Mainline kernel since Linux 5.10.

Could you tell me if I'm working with the correct branch ? Did I miss any steps ?

There is no need to clone any can-isotp repository as the feature of CAN ISOTP sockets is "just there" in your 5.15 kernel.

@hartkopp
Copy link

hartkopp commented Oct 3, 2023

I think this might be related to this bug report. It's a race condition in the kernel module that causes the send to time out if there's a second thread listening on the same socket. There's a patch for the bug here. Maybe you want to try this, but it involves recompiling the isotp module.

@pylessard are you using the epoll() or poll() syscalls within udsoncan? If not, this patch and its description does not apply to the problem IMO.

@pylessard
Copy link
Owner

@hartkopp : I'm using send() and recv(), that's all.

@hartkopp
Copy link

hartkopp commented Oct 3, 2023

@hartkopp : I'm using send() and recv(), that's all.

That's weird. I will take a deeper look into the log file.
Did you ever use the CAN_ISOTP_WAIT_TX_DONE flag from the isotp sockopts?

This flag causes a waiting for the PDU to be transmitted entirely (or returning a timeout error).

@lumagi
Copy link
Contributor

lumagi commented Oct 3, 2023

Please also consider this. I originally did a bit of digging in the CPython source code because I just assumed Python would pass the send/recv calls directly to libC. But instead, it always configures the sockets to be nonblocking and uses poll to implement blocking semantics for its sockets.

Edit: this is the main loop that does the polling (or rather select) until the socket is ready.

@hartkopp
Copy link

hartkopp commented Oct 3, 2023

Ok @lumagi , than it makes sense that the issue relates to your fix!

But the CPython implementation should not act against the users expectation and exchange syscalls without notice. I assume this is used for multi-threading.

@pylessard
Copy link
Owner

pylessard commented Oct 3, 2023

Oh, I must add that udsoncan.IsoTPSocketConnection defines a timeout for the socket by default. By calling settimeout(), it puts the socket into non-blocking mode. I'm not exactly sure if that make usage of epoll. I just read this morning what epoll was to be honest.

https://github.com/pylessard/python-udsoncan/blob/master/udsoncan/connections.py#L280
https://github.com/pylessard/python-can-isotp/blob/master/isotp/tpsock/__init__.py#L84

@PankajJain5889
Copy link

PankajJain5889 commented Oct 6, 2023

> I think this might be related to [this bug report](https://lore.kernel.org/linux-can/[email protected]/T/#m9615a4ecbdb742749886af73414e476498c93d51). It's a race condition in the kernel module that causes the send to time out if there's a second thread listening on the same socket. There's a [patch for the bug here](https://lore.kernel.org/linux-can/[email protected]/T/#u). Maybe you want to try this, but it involves recompiling the isotp module.
I was facing similar issue of socket timing out when sending frames over CAN when I moved to Ubuntu 20.04 and using 5.15 kernel. I saw that any kernel greater than 5.15.78 was facing the same issue
I used this patch on 5.15.133 and 6.6 rc4 and rebuilt the kernel to verify and found that the said issue is resolved but since this is patch work any chance that above mentioned change will come to mainline and will be adopted in Ubuntu (fossa release)

@lumagi
Copy link
Contributor

lumagi commented Oct 6, 2023

Thank you to @hartkopp for fast-tracking the patch. The commit that lead to the regression was introduced to the upstream kernel with release 6.3. That means Ubuntu must be backporting the patches to its LTS kernels. I would assume that they will do so as well for the patch that fixes the issue if it is accepted and released in the next upstream kernel version 6.7.

@hartkopp
Copy link

hartkopp commented Oct 7, 2023

@lumagi we are currently in the 6.6-rc cycle. And when the patch hits the 6.6-rc tree it automatically gets tracked for the kernel.org LTS kernels that have the issue. So it's very likely that Ubuntu will also apply that patch to its LTS kernels.

@lumagi
Copy link
Contributor

lumagi commented Oct 8, 2023

That's good to know, thanks for the clarification @hartkopp

@pylessard
Copy link
Owner

@lemeura : does the conversation above helps you?

@lemeura
Copy link
Author

lemeura commented Oct 9, 2023

Hi all,

I would like to thank you very much for all these clarifications.

The fact that the patch is in the pipe for future integration in the kernel is a very good news, as actually I'm unable to distribute custom kernel on my different setups.

I just have to be patient :) and I will prepare the migration to future Ubuntu LTS 24.04, potentially including directly a fixed kernel.

Best regards,

@pylessard
Copy link
Owner

I will close that issue then.
@lemeura, please reopen if you believe something needs to be addressed

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Oct 11, 2023
With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
@hartkopp
Copy link

Yes. The patch has been applied to Linus' mainline kernel tree - so it will get attention for the stable kernels soon.

honjow pushed a commit to 3003n/linux that referenced this issue Oct 16, 2023
With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this issue Oct 16, 2023
[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Kaz205 pushed a commit to Kaz205/linux that referenced this issue Oct 16, 2023
[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Whissi pushed a commit to Whissi/linux-stable that referenced this issue Oct 19, 2023
[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Whissi pushed a commit to Whissi/linux-stable that referenced this issue Oct 19, 2023
[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Whissi pushed a commit to Whissi/linux-stable that referenced this issue Nov 8, 2023
From: Lukas Magel <[email protected]>

[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Whissi pushed a commit to Whissi/linux-stable that referenced this issue Nov 8, 2023
[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jpuhlman pushed a commit to MontaVista-OpenSourceTechnology/linux-mvista that referenced this issue Dec 20, 2023
Source: Kernel.org
MR: 130463
Type: Integration
Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y
ChangeID: deddf60c271f1fc1edba25c4bf66d02854df5c5d
Description:

[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Armin Kuster <[email protected]>
jpuhlman pushed a commit to MontaVista-OpenSourceTechnology/linux-mvista that referenced this issue Dec 20, 2023
Source: Kernel.org
MR: 130463
Type: Integration
Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y
ChangeID: deddf60c271f1fc1edba25c4bf66d02854df5c5d
Description:

[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Armin Kuster <[email protected]>
jpuhlman pushed a commit to MontaVista-OpenSourceTechnology/linux-mvista that referenced this issue Dec 20, 2023
Source: Kernel.org
MR: 130463
Type: Integration
Disposition: Backport from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable linux-5.10.y
ChangeID: deddf60c271f1fc1edba25c4bf66d02854df5c5d
Description:

[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Armin Kuster <[email protected]>
tuxedo-bot pushed a commit to tuxedocomputers/linux that referenced this issue Jan 17, 2024
BugLink: https://bugs.launchpad.net/bugs/2046269

[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Roxana Nicolescu <[email protected]>
sparkstar pushed a commit to sparkstar/jammy-stable that referenced this issue Jan 18, 2024
BugLink: https://bugs.launchpad.net/bugs/2049417

[ Upstream commit d9c2ba65e651467de739324d978b04ed8729f483 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa79cb5 ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Roxana Nicolescu <[email protected]>
sparkstar pushed a commit to sparkstar/jammy-stable that referenced this issue Jan 22, 2024
BugLink: https://bugs.launchpad.net/bugs/2049417

[ Upstream commit d9c2ba65e651467de739324d978b04ed8729f483 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa79cb5 ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Roxana Nicolescu <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
sileshn pushed a commit to sileshn/ubuntu-kernel-jammy that referenced this issue Feb 4, 2024
BugLink: https://bugs.launchpad.net/bugs/2049417

[ Upstream commit d9c2ba65e651467de739324d978b04ed8729f483 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa79cb5 ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Roxana Nicolescu <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
wanghao75 pushed a commit to openeuler-mirror/kernel that referenced this issue Apr 1, 2024
stable inclusion
from stable-vundefined
commit deddf60c271f1fc1edba25c4bf66d02854df5c5d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9CSYQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=deddf60c271f1fc1edba25c4bf66d02854df5c5d

--------------------------------

[ Upstream commit d9c2ba6 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: sanglipeng <[email protected]>
Elchanz3 pushed a commit to Elchanz3/android_kernel_samsung_sdk-s5e9925_r11s that referenced this issue Jun 2, 2024
[ Upstream commit d9c2ba65e651467de739324d978b04ed8729f483 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa79cb5 ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
eclipse-oniro-oh-bot pushed a commit to eclipse-oniro-mirrors/kernel_linux_5.10 that referenced this issue Jul 6, 2024
stable inclusion
from stable-5.10.200
commit deddf60c271f1fc1edba25c4bf66d02854df5c5d
category: bugfix
issue: #IA8M8T
CVE: NA

Signed-off-by: wanxiaoqing <[email protected]>
---------------------------------------

[ Upstream commit d9c2ba65e651467de739324d978b04ed8729f483 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa79cb5 ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: wanxiaoqing <[email protected]>
fluffball3 pushed a commit to fluffball3/android_kernel_samsung_m33x that referenced this issue Oct 2, 2024
[ Upstream commit d9c2ba65e651467de739324d978b04ed8729f483 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa79cb5 ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@msymt
Copy link

msymt commented Oct 11, 2024

Hi,
I am encountering a timeout issue while using the isotp library for CAN communication on the kernel version 6.8.0-45.
The issue seems to occur when sending data, and it results in the following error message:

$ python3 isotpsndtest.py 
Reception of FlowControl timed out. Stopping transmission.

Here are the details of my setup:

  • Kernel version: 6.8.0-45-generic
  • Operating System: Ubuntu 24.04.1 LTS
  • Virtual Environment: Running on VMware Fusion 13.4.2 (VM)
  • CAN Interface: vcan0 with a bitrate of 500000
  • isotp.Address settings: Using 11-bit addressing mode (Normal_11bits)

This issue has been addressed in an earlier discussion and was believed to be fixed in the 6.6-rc kernel version. However, I am still experiencing the timeout issue on kernel version 6.8.0-45.

Here is the script that causes the timeout:

# isotpsndtest.py 
import isotp
import time
import can

bus = can.interface.Bus(interface='socketcan', channel='vcan0', bitrate=500000)
addr = isotp.Address(isotp.AddressingMode.Normal_11bits, txid=0x7E1, rxid=0x7E9)
stack = isotp.CanStack(bus, address=addr)

stack.send(b'\x01\x02\x03\x04\x05\x06\x07\x08\x09\x10\x01\x02\x03\x04\x05\x06\x07\x08\x09\x10\x01\x02\x03\x04\x05\x06\x07\x08\x09\x10') 

while stack.transmitting():
   stack.process()
   time.sleep(0.0001)

bus.shutdown()

receiver terminal:

$ isotpdump -s 7e1 -d 7e9 vcan0 -a
 vcan0  7E1  [8]  [FF] ln: 30   data: 01 02 03 04 05 06     -  '......'

Steps to reproduce:

  • Set up a virtual CAN interface using vcan0.
  • Run the above script to send data over the CAN interface.
  • The transmission fails with the timeout error message.

Given that this issue was believed to be resolved in the 6.6-rc kernel but still persists in 6.8.0-45, should I be consulting or reporting this problem in a different repository? Any guidance on the correct place to raise this issue would be appreciated.

Best regards,

@pylessard
Copy link
Owner

pylessard commented Oct 11, 2024

do you run isotprecv somehwere? You need a receiver, not just a listener.

@msymt
Copy link

msymt commented Oct 11, 2024

@pylessard

I have resolved the flow control issue by correctly implementing the receiver script. It turns out my initial understanding of IsoTP was insufficient. By setting up the receiver correctly, the transmission now works without any timeouts.

Thank you for your assistance!

@pylessard
Copy link
Owner

Good.
On a side note, I see you use the isotp module like V1.x required it. V2.x is much more performant timing wise.
You don't have to call process() yourself. Just need to do a start() and a stop() Check the doc

Cheers

Ksawlii pushed a commit to Ksawlii/android_kernel_samsung_a53x-FireAsf that referenced this issue Nov 18, 2024
[ Upstream commit d9c2ba65e651467de739324d978b04ed8729f483 ]

With patch [1], isotp_poll was updated to also queue the poller in the
so->wait queue, which is used for send state changes. Since the queue
now also contains polling tasks that are not interested in sending, the
queue fill state can no longer be used as an indication of send
readiness. As a consequence, nonblocking writes can lead to a race and
lock-up of the socket if there is a second task polling the socket in
parallel.

With this patch, isotp_sendmsg does not consult wq_has_sleepers but
instead tries to atomically set so->tx.state and waits on so->wait if it
is unable to do so. This behavior is in alignment with isotp_poll, which
also checks so->tx.state to determine send readiness.

V2:
- Revert direct exit to goto err_event_drop

[1] https://lore.kernel.org/all/[email protected]

Reported-by: Maxime Jayat <[email protected]>
Closes: https://lore.kernel.org/linux-can/[email protected]/
Signed-off-by: Lukas Magel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Fixes: 79e19fa79cb5 ("can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events")
Link: pylessard/python-udsoncan#178 (comment)
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants