System Freeze Issue on IOT2050 After OS Update #583

Open
VikyFlow opened this issue Feb 12, 2025 · 16 comments

@VikyFlow

VikyFlow commented Feb 12, 2025

Dear all,

I am experiencing an issue with my application running on an IOT2050 after upgrading the operating system from Debian 6.1.54-cip6 to Debian 6.1.102-cip26.

My current configuration is:
Operating System: Debian 6.1.102-cip26
Firmware: IOT2050-FW-Update-PKG-V01.04.04-0

Following this update, the system occasionally freezes and stops reading the network interface, even though the interface is still detected by the nmcli con show command.

This issue did not occur with the previous OS version.

Have you encountered similar issues with this OS version? Are there any known solutions or updates available to resolve this problem?

I look forward to your response.

Best regards,
Victoria

@huaqianli
Collaborator

Could you share the log and your device info?

Thanks.

@VikyFlow
Author

VikyFlow commented Feb 12, 2025 via email

@VikyFlow
Author

The error occurred again, even while I was connected through serial.
I pinged my internal interface and it responds 👍🏻
I tried to access it remotely, but that doesn't work 👎🏻

My modules don't work.
I tried to reconnect through serial, but it is super slow. When I try commands, sometimes I get output and sometimes I don't.
Then, after I sent a command, my system rebooted:
root@iot2050-debian:~# systemctl status daemon
Failed to get properties: Connection timed out
root@iot2050-debian:~# AVS@[1100 1150 1150]
SIMATIC IOT2050 SE-Boot Version: V01.04.01-0-g629b172b-0x0000
BuildDate: 20240103
SYSFW ABI: 3.1 [version: 21] [21.9.1--v2021.09a (Terrific Lla]

@VikyFlow
Author

Hi, I've got more logs from the moment it breaks:

Feb 12 11:59:03 iot2050-debian bash[9928]: info: Program[0] [2025-02-12 11:59:03.678] resetted eventid to 0
Feb 12 12:03:17 iot2050-debian bash[444]: warn: Microsoft.AspNetCore.Server.Kestrel[22] As of "02/12/2025 11:03:13 +00:00", the heartbeat has been running for "00:00:01.1319190" which is longer than "00:00:01". This could be caused by thread pool starvation.
Feb 12 12:03:50 iot2050-debian bash[444]: warn: Microsoft.AspNetCore.Server.Kestrel[22] As of "02/12/2025 11:03:17 +00:00", the heartbeat has been running for "00:00:01.2409665" which is longer than "00:00:01". This could be caused by thread pool starvation.
Feb 12 12:04:25 iot2050-debian bash[9928]: Error: eno1-default - no such connection profile.

@jan-kiszka
Collaborator

Can you still get the kernel messages (dmesg) when the situation happened? A complete view around that event would be good.

Is ip link reporting the link on the problematic interface as still present? Are you still receiving packets on that line? Does the remote side see your pings, you just don't get its replies?
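
One way to answer the link and packet questions above, even when the shell is barely responsive, is to read the kernel's sysfs counters directly. The following is only a sketch under assumptions: the interface name `eno1` is taken from the logs in this thread, the 2-second interval is arbitrary, and the helper names are illustrative. `operstate`, `carrier`, and the `statistics/*` files are standard Linux sysfs attributes.

```python
#!/usr/bin/env python3
"""Snapshot link state and packet counters for one interface from sysfs."""
import sys
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # standard Linux sysfs location


def read_attr(iface_dir: Path, name: str) -> str:
    """Return a sysfs attribute as text, or 'n/a' if it cannot be read."""
    try:
        return (iface_dir / name).read_text().strip()
    except OSError:
        # e.g. 'carrier' cannot be read while the interface is down
        return "n/a"


def snapshot(iface: str, base: Path = SYSFS_NET) -> dict:
    """Collect link state and packet counters for one interface."""
    iface_dir = base / iface
    return {
        "operstate": read_attr(iface_dir, "operstate"),
        "carrier": read_attr(iface_dir, "carrier"),
        "rx_packets": read_attr(iface_dir, "statistics/rx_packets"),
        "tx_packets": read_attr(iface_dir, "statistics/tx_packets"),
    }


def rx_is_advancing(before: dict, after: dict) -> bool:
    """True if the receive counter increased between two snapshots."""
    try:
        return int(after["rx_packets"]) > int(before["rx_packets"])
    except ValueError:
        return False  # counter was unreadable in at least one snapshot


if __name__ == "__main__":
    # "eno1" is an assumed default, taken from the logs in this thread
    iface = sys.argv[1] if len(sys.argv) > 1 else "eno1"
    first = snapshot(iface)
    time.sleep(2)  # arbitrary sampling interval
    second = snapshot(iface)
    print(f"{iface}: {second}")
    print("rx_packets advancing:", rx_is_advancing(first, second))
```

If `operstate` stays `up` and `rx_packets` keeps increasing while replies never reach the application, the problem is more likely above the driver than on the wire.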

@VikyFlow
Author

VikyFlow commented Feb 13, 2025 via email

@jan-kiszka
Collaborator

Ok, if that dump contains the time when the network connection started to fail, then the kernel does not notice this in any way.

How about my other questions?

@VikyFlow
Author

VikyFlow commented Feb 14, 2025 via email

@jan-kiszka
Collaborator

OK, if pinging works, then it is likely not a networking issue of the OS, rather an application problem.

Can you still reach the 2050 via ssh? Try to narrow down what is still working and what is not. In some scenarios (not known for this device, though), only small packets still make it through while larger ones get stuck somewhere. You can test that by increasing the packet size in your pings, e.g.
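
A minimal sketch of such a packet-size test follows; it is not the command from the original comment. It assumes iputils `ping` (its `-c`, `-W`, `-s`, and `-M do` options) and IPv4 with a 1500-byte MTU. The size list and the helper names `ping_command` and `sweep` are illustrative.

```python
#!/usr/bin/env python3
"""Ping a host with increasing payload sizes to find where packets stop passing."""
import subprocess
import sys

# Payload sizes in bytes (illustrative). 1472 bytes of payload plus 28 bytes
# of IPv4 and ICMP headers exactly fills an assumed 1500-byte MTU.
DEFAULT_SIZES = (56, 512, 1000, 1400, 1472)


def ping_command(host: str, size: int, count: int = 3) -> list:
    """Build an iputils ping command with a fixed payload size.

    '-M do' sets the don't-fragment bit so oversized packets fail
    instead of being silently fragmented.
    """
    return ["ping", "-c", str(count), "-W", "2", "-M", "do", "-s", str(size), host]


def sweep(host: str, sizes=DEFAULT_SIZES) -> dict:
    """Run one ping per size; map size to True if at least one reply arrived."""
    results = {}
    for size in sizes:
        proc = subprocess.run(ping_command(host, size), capture_output=True, text=True)
        results[size] = proc.returncode == 0  # iputils ping exits 0 on any reply
    return results


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: ping_sweep.py HOST")  # script name is hypothetical
    else:
        for size, ok in sweep(sys.argv[1]).items():
            print(f"payload {size:5d} bytes: {'ok' if ok else 'FAILED'}")
```

Run it from the remote side against the IOT2050 while the device is in the failed state. If small payloads pass and large ones fail, that points at an MTU or fragmentation problem rather than the application.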

@VikyFlow
Author

VikyFlow commented Feb 14, 2025 via email

@jan-kiszka
Collaborator

As I wrote above: narrow down the issue, please. The kernel sees no problem, small-size pings still work, but ssh and other app traffic fail - we are missing some piece in that puzzle in between.

@VikyFlow
Author

Hi,
How can we narrow down the issue?
I want to emphasise that neither SSH nor the serial connection works while my system is in this state.
I can connect via serial, but it's almost as if I'm not connected, because it doesn't respond to commands and is extremely slow, even when typing. However, ping still works on both interfaces.

@jan-kiszka
Collaborator

That is a state we haven't seen yet (to the best of my knowledge).

Is the system under heavy load (top)?

Do interrupts still arrive (/proc/interrupts)? Specifically, are timer events still coming?

How reliably can you reproduce the state? Every variable you can remove from the reproduction pattern may be helpful for eventually having the same on our side.
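
Because an interactive `top` is hard to use on a console this slow, the load and interrupt checks above could be scripted. The sketch below is an illustration under assumptions: it compares two reads of `/proc/interrupts` and prints `/proc/loadavg`. Matching rows on the word "timer" is a guess at how the timer interrupt is labelled on this board, and the function names and 2-second interval are arbitrary.

```python
#!/usr/bin/env python3
"""Check whether interrupt counters (especially timers) are still advancing."""
import time
from pathlib import Path


def parse_interrupts(text: str) -> dict:
    """Map each /proc/interrupts row to its count summed over all CPUs."""
    lines = text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        fields = rest.split()
        counts = []
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break  # rows such as 'Err:' carry fewer count columns
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        key = f"{label.strip()} {description}".strip()
        totals[key] = sum(counts)
    return totals


def deltas(before: dict, after: dict) -> dict:
    """Per-row increase between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


if __name__ == "__main__":
    try:
        first = parse_interrupts(Path("/proc/interrupts").read_text())
        time.sleep(2)  # arbitrary sampling interval
        second = parse_interrupts(Path("/proc/interrupts").read_text())
        print("loadavg:", Path("/proc/loadavg").read_text().strip())
    except OSError as exc:
        print("cannot read /proc:", exc)
    else:
        for key, delta in deltas(first, second).items():
            if "timer" in key.lower():  # assumed label of timer rows
                print(f"{key}: +{delta}")
```

A timer row whose delta stays at zero would suggest that timer events have stopped arriving. A high load average with advancing timers would point towards CPU or I/O starvation instead.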

@VikyFlow
Author

VikyFlow commented Feb 18, 2025 via email

@jan-kiszka
Collaborator

NetworkManager sees a carrier drop at some point. Does it see a re-establishment as well (your filtering might have suppressed messages)?

In any case, ip link reported the link up in the error state.

Any insights about link state from the switch or the peer (if cross-linked)?

@VikyFlow
Author

VikyFlow commented Feb 21, 2025

Hi,

The NetworkManager does detect a carrier drop and subsequently attempts to re-establish the link. The logs show a quick transition from 'unavailable' to 'disconnected' due to 'carrier-changed', followed by an automatic reconnection process where the link state returns to UP and becomes ready.

I checked whether my updated runtime version had something to do with this problem, but the same problem reappears, just with different log messages.

I see this message in the kernel log that could be correlated with the issue I'm facing:

Dec 01 14:28:29 iot2050-debian kernel: keystone-pcie 5600000.pcie: Phy link never came up

ndisc[...,"eno1"]: solicit: failure sending router solicitation: Operation not permitted (1)
Could this help? How can I investigate this issue further?
