Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timer is locked and does not work #479

Closed
viordash opened this issue Aug 19, 2022 · 11 comments
Closed

Timer is locked and does not work #479

viordash opened this issue Aug 19, 2022 · 11 comments

Comments

@viordash
Copy link
Contributor

Hi!
The timer does not run if it is started between the re_thread_enter() and re_thread_leave() functions.
Code to reproduce the issue, baresip-webrtc application, calling the CalledFromOtherThread method from another thread:

struct tmr tmr = {};

static void timeout_handler(void *arg) {
	void * tag = arg;
	printf("                          ********timeout_handler, tag:%p, t:%lu\n", tag, tmr_jiffies());
}

static void CalledFromOtherThread(void * tag) {
	printf("                          ---test 0, tag:%p, t:%lu\n", tag, tmr_jiffies());
	tmr_init(&tmr);
	printf("                          ---test 1, tag:%p, t:%lu\n", tag, tmr_jiffies());
	re_thread_enter();
	printf("                          ---test 2, tag:%p, t:%lu\n", tag, tmr_jiffies());

	tmr_start(&tmr, 100, timeout_handler, tag);

	printf("                          ---test 3, tag:%p, t:%lu\n", tag, tmr_jiffies());
	re_thread_leave();
	printf("                          ---test 4, tag:%p, t:%lu\n", tag, tmr_jiffies());
	sys_msleep(2000);
	printf("                          ---test 5, tag:%p, t:%lu\n", tag, tmr_jiffies());
}

static void *test_task(void *argument) {
	sys_msleep(1000);

	printf("                          --------from task, tag:%p, t:%lu\n", argument, tmr_jiffies());
	CalledFromOtherThread(argument);
	pthread_exit(NULL);
}

log output:

demo: listening on:
    http://192.168.2.156:9000/
    https://192.168.2.156:9001/
                          --------from main, tag:0x10ca1, t:700865909
                          ---test 0, tag:0x10ca1, t:700865909
                          ---test 1, tag:0x10ca1, t:700865909
                          ---test 2, tag:0x10ca1, t:700865909
                          ---test 3, tag:0x10ca1, t:700865909
                          ---test 4, tag:0x10ca1, t:700865909
                          ---test 5, tag:0x10ca1, t:700867911
                          ********timeout_handler, tag:0x10ca1, t:700867915
                          --------from task, tag:0x42, t:700868914
                          ---test 0, tag:0x42, t:700868914
                          ---test 1, tag:0x42, t:700868914
                          ---test 2, tag:0x42, t:700868914
                          ---test 3, tag:0x42, t:700868914
                          ---test 4, tag:0x42, t:700868914
                          ---test 5, tag:0x42, t:700870916
                          ********timeout_handler, tag:0x42, t:700877708
demo: request: met=POST, path=/connect, prm=
demo: create session

As you can see from the logs, ********timeout_handler, tag:0x42, t:700877708 happened after 7 seconds, those only after I made a request to http://192.168.2.156:9000/ in the browser. As I understand poll in re/src/main/main.c:736 received an event and then my timer fired.

I came across this error while trying to test baresip-webrtc with another session initiation protocol. And I found out that repeated peerconnection_new, which is called from another thread, does not trigger peerconnection_gather_handler. Since dns query uses data from query_cache and query_handler is then called via a timer.

Is this a real problem with the timer? Or am I not using baresip correctly with multiple threads?

@sreimers
Copy link
Member

At a first look your test has a race condition. CalledFromOtherThread is called with sys_msleep(1000); by the thread and so its very likely called after the application enters re_main. if no other tmr is active at this time fd_poll waits until a event occurs.

@viordash
Copy link
Contributor Author

sys_msleep(1000) is added on purpose so that CalledFromOtherThread is called after re_main has started, to indicate that there is a problem with tmr_start.

@viordash
Copy link
Contributor Author

to fix this, I'll use mqueue for now.
But it seems to me that it would be correct to add an internal pipe to immediately unblock fd_poll after calling re_thread_leave. If you agree, then I will prepare a PR

@viordash
Copy link
Contributor Author

#480 is a draft PR

@sreimers
Copy link
Member

thank you for figuring this out, I think this could be handled by dummy async event like so:

diff --git a/src/main/main.c b/src/main/main.c
index 3584186d..58b76ff7 100644
--- a/src/main/main.c
+++ b/src/main/main.c
@@ -1354,6 +1354,9 @@ void re_thread_leave(void)
        }
 
        re_atomic_rlx_set(&re->thread_enter, false);
+       /* Dummy async event, to ensure timers are properly handled */
+       if (re->async)
+               re_thread_async(NULL, NULL, NULL);
        re_unlock(re);
 }

@viordash
Copy link
Contributor Author

Your solution is better.
I think these lines should be placed before
re_atomic_rlx_set(&re->thread_enter, false);
to pass the re_thread_check() check in the re_thread_async method

@viordash
Copy link
Contributor Author

when implementing this patch, it turned out that it was not needed, re_thread_async_init (ASYNC_WORKERS) creates a periodic timer for 100ms, worker_check, which solves the problem with the timer blocking.
In newer versions of baresip, will the re_thread_async_init(ASYNC_WORKERS) code be permanently present in the main?

@sreimers
Copy link
Member

The periodic worker_check timer is not reliable since this is mostly for debugging (maybe disabled in future RELEASE versions) and the value can change.

@viordash
Copy link
Contributor Author

viordash commented Aug 21, 2022

The PR is ready, but I'm afraid this fix depends on worker_check.
Increasing the period of this timer affects the speed of execution of timers that were affected in another thread. I think re_thread_async(NULL, NULL, NULL) doesn't cause the events poll methods to exit immediately

@sreimers
Copy link
Member

Is your branch up to date? I made the args optional with this commit: 8ca18ff

@viordash
Copy link
Contributor Author

Branch update solved the problem. The other thread's timers fire immediately and independently.
PR is now actual.
Thanks a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants