Semaphore overflow #2424

Open
niclar opened this issue Feb 26, 2025 · 1 comment

Comments


niclar commented Feb 26, 2025

Hi, we just experienced a "Semaphore overflow" while doing a lot of request/response traffic, using the waitset exclusively.

Might this be a waitset/timed-wait livelock?

Do you have any pointers as to why this may occur, or where we should start looking? Any help is greatly appreciated.

Required information

Operating system:
Ubuntu 24.04.1 LTS

Compiler version:
clang version 19.1.3

Eclipse iceoryx version:
f33d582

Observed result or behaviour:
A process terminates with "Semaphore overflow"

Expected result or behaviour:
no error

Conditions where it occurred / Performed steps:
a lot of req/reps where the terminating module is the server part.

Additional helpful information

RouDi module:

2025-02-11 17:50:58.953 [Trace]: Iceoryx constants is:
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_PUBLISHERS = 4096
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_SUBSCRIBERS = 8192
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_SERVER = 2048
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_CLIENT = 4096
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_SUBSCRIBERS = 8192
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_SUBSCRIBERS_PER_PUBLISHER = 64
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY = 16
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_CHUNKS_HELD_PER_SUBSCRIBER_SIMULTANEOUSLY = 256
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_CLIENTS_PER_SERVER = 512
2025-02-11 17:50:58.953 [Trace]: IOX_MAX_NUMBER_OF_NOTIFIERS = 1024

2025-02-11 17:50:58.953 [Trace]: RouDi config is:
2025-02-11 17:50:58.953 [Trace]: Domain ID = 2
2025-02-11 17:50:58.953 [Trace]: Unique RouDi ID = 0
2025-02-11 17:50:58.953 [Trace]: Monitoring Mode = MonitoringMode::ON
2025-02-11 17:50:58.953 [Trace]: Shares Address Space With Applications = false
2025-02-11 17:50:58.953 [Trace]: Process Termination Delay = 0s 0ns
2025-02-11 17:50:58.953 [Trace]: Process Kill Delay = 45s 0ns
2025-02-11 17:50:58.953 [Trace]: Compatibility Check Level = CompatibilityCheckLevel::PATCH
2025-02-11 17:50:58.953 [Trace]: Introspection Chunk Count = 10
2025-02-11 17:50:58.953 [Trace]: Discovery Chunk Count = 10

2025-02-26 10:32:26.665 [Debug]: Destroy server port from runtime 'persistence.tst' and with service description 'Service: Persistance, Instance: tst, Event: Core'
2025-02-26 10:32:26.665 [Debug]: Destroy server port from runtime 'persistence.tst' and with service description 'Service: Persistance, Instance: tst, Event: ReferenceData'
2025-02-26 10:32:26.665 [Debug]: Destroy server port from runtime 'persistence.tst' and with service description 'Service: ReferencePrice, Instance: tst, Event: Pricing_Persistance'
2025-02-26 10:32:26.665 [Debug]: Destroy server port from runtime 'persistence.tst' and with service description 'Service: Fee, Instance: tst, Event: Pricing_Persistance'
2025-02-26 10:32:26.665 [Debug]: Destroy server port from runtime 'persistence.tst' and with service description 'Service: CoinEntitlement, Instance: tst, Event: Pricing_Persistance'
2025-02-26 10:32:26.665 [Error]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_hoofs/posix/sync/source/mutex.cpp:331 { expected<void, LockError> iox::mutex::lock_impl() -> iox_pthread_mutex_lock } ::: [ 130 ] Owner died
2025-02-26 10:32:26.665 [Error]: The thread/process which owned the mutex died. The mutex is now in an inconsistent state and must be put into a consistent state again with Mutex::make_consistent()
2025-02-26 10:32:26.665 [Fatal]: Locking of an inter-process mutex failed! This indicates that the application holding the lock was terminated or the resources were cleaned up by RouDi due to an unresponsive application.
2025-02-26 10:32:26.665 [Fatal]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_posh/source/popo/building_blocks/locking_policy.cpp:42 [Fatal Error] [POPO__CHUNK_LOCKING_ERROR (code = 61)] in module [iceoryx_posh (id = 2)]
2025-02-26 10:32:26.665 [Fatal]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_posh/source/popo/building_blocks/locking_policy.cpp:42 [PANIC]

"PERSISTENCE" (req/rep server) module:

2025-02-26 10:31:26.724 [Error]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_hoofs/posix/sync/source/semaphore_helper.cpp:48 { expected<void, SemaphoreError> iox::detail::sem_post(iox_sem_t *) -> iox_sem_post } ::: [ 75 ] Value too large for defined data type
2025-02-26 10:31:26.724 [Error]: Semaphore overflow. The maximum value of 2147483647 would be exceeded.
2025-02-26 10:31:26.724 [Fatal]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_posh/source/popo/building_blocks/condition_notifier.cpp:44 [Fatal Error] [POPO__CONDITION_NOTIFIER_SEMAPHORE_CORRUPT_IN_NOTIFY (code = 78)] in module [iceoryx_posh (id = 2)]
2025-02-26 10:31:26.724 [Fatal]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_posh/source/popo/building_blocks/condition_notifier.cpp:44 [PANIC]

"PRICING" (req/rep client) module:

2025-02-26 10:31:28.130 [Error]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_hoofs/posix/sync/source/semaphore_helper.cpp:48 { expected<void, SemaphoreError> iox::detail::sem_post(iox_sem_t *) -> iox_sem_post } ::: [ 75 ] Value too large for defined data type
2025-02-26 10:31:28.130 [Error]: Semaphore overflow. The maximum value of 2147483647 would be exceeded.
2025-02-26 10:31:28.130 [Fatal]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_posh/source/popo/building_blocks/condition_notifier.cpp:44 [Fatal Error] [POPO__CONDITION_NOTIFIER_SEMAPHORE_CORRUPT_IN_NOTIFY (code = 78)] in module [iceoryx_posh (id = 2)]
2025-02-26 10:31:28.130 [Fatal]: /mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/7661b261a7-36a7c943b7.clean/iceoryx_posh/source/popo/building_blocks/condition_notifier.cpp:44 [PANIC]

@elfenpiff
Contributor

@niclar It seems like a deadlock/livelock problem.

The semaphore should only overflow when one side (the publisher) keeps sending messages while the other side is attached to a waitset but never uses the waitset to wait for messages. In that case the semaphore inside the waitset is incremented but never decremented by a `WaitSet::wait()` or `WaitSet::timedWait()` call.
Or there is some kind of deadlock occurring, so that those calls are never reached.
