Workaround random test_suite_platform
fail in time test
#7419
Signed-off-by: Jerry Yu <[email protected]>
Signed-off-by: Jerry Yu <[email protected]>
dfb63fa
to
c9c3e62
Compare
test_suite_platform
fail
Signed-off-by: Jerry Yu <[email protected]>
398386d
to
2f1e85f
Compare
LGTM
@@ -84,7 +84,16 @@ void time_delay_seconds(int delay_secs)
    sleep_ms(delay_secs * 1000);
Do we really need these tests?
Historically, we had somewhat similar tests in the timing module. They often failed on the CI and so we ended up removing them. I fear that we've reintroduced the problem, and this pull request is just one of many.
Got it.
Should I remove these tests here? Or another PR?
The delay tests were removed.
I think we should keep *get
Completely removing the tests may be too much. How about keeping the calls to the functions (so at least we know they don't e.g. crash), but not testing delays?
Possibly even check that t1 = ms_time(); sleep(small); t2 = ms_time(); ASSERT(t2 > t1). @yuhaoth Can you clarify
Built-in mbedtls_time function returns the number of seconds since the
Epoch. That is affected by discontinuous jumps and causes the test to fail.
Work around it with a 1-second tolerance.
Epoch. That is affected by discontinuous jumps. And nanosleep uses
CLOCK_MONOTONIC (a monotonically increasing time source). That will cause
a negative elapsed time difference.
What can cause a negative elapsed time difference? E.g. can this happen from automatic drift adjustment? I would have expected that t2 - t1 might be less than small, but not negative.
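For reference, the weak check suggested in this comment can be sketched in C roughly as follows. This is only an illustration, assuming a POSIX system: monotonic_ms and sleep_for_ms are hypothetical stand-ins for ms_time() and the suite's sleep helper, not Mbed TLS functions, and both readings deliberately come from the same clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for ms_time(): milliseconds read from CLOCK_MONOTONIC. */
static int64_t monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: sleep for `ms` milliseconds, resuming after signals. */
static void sleep_for_ms(long ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    while (nanosleep(&req, &req) == -1 && errno == EINTR) {
        /* req now holds the remaining time; loop to finish the sleep. */
    }
}

/* Elapsed milliseconds across a short sleep, measured on one clock only. */
int64_t elapsed_ms_across_sleep(long small_ms)
{
    int64_t t1 = monotonic_ms();
    sleep_for_ms(small_ms);
    int64_t t2 = monotonic_ms();
    return t2 - t1;
}
```

A test built on this would only assert that the result is not negative (or is greater than zero), rather than asserting an upper bound on the delay.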
Can this happen from automatic drift adjustment?
It happens only in time_delay_seconds because the time sources are different. The built-in mbedtls_time is defined as the standard time function, whose time source is CLOCK_REALTIME, while nanosleep takes CLOCK_MONOTONIC as its time source.
If CLOCK_MONOTONIC is faster than CLOCK_REALTIME and CLOCK_REALTIME is adjusted during the sleep, t2 - t1 < small sometimes happens.
And I think time_delay_milliseconds should not be removed now :) . It uses the same time source.
Possibly even check that
t1 = ms_time(); sleep(small); t2 = ms_time(); ASSERT(t2 > t1).
This check can not resolve the issue.
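To make the two time sources in this explanation concrete, here is a minimal C sketch, assuming a POSIX system. The names time_one_sleep and sleep_timing are hypothetical and are not part of Mbed TLS or its test suite. It times a single nanosleep twice: once with time(), which reads the wall clock in whole seconds, and once with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Result of timing one sleep with two different time sources. */
typedef struct {
    int64_t wallclock_s;   /* difference of two time() readings, in seconds */
    int64_t monotonic_ms;  /* difference of two CLOCK_MONOTONIC readings, in ms */
} sleep_timing;

static int64_t monotonic_now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: sleep for delay_ms and report both elapsed values. */
sleep_timing time_one_sleep(long delay_ms)
{
    struct timespec req = { .tv_sec = delay_ms / 1000,
                            .tv_nsec = (delay_ms % 1000) * 1000000L };
    sleep_timing result;

    time_t w1 = time(NULL);
    int64_t m1 = monotonic_now_ms();

    /* Resume the sleep if a signal interrupts it. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR) {
    }

    int64_t m2 = monotonic_now_ms();
    time_t w2 = time(NULL);

    result.wallclock_s = (int64_t) (w2 - w1);
    result.monotonic_ms = m2 - m1;
    return result;
}
```

If the wall clock is stepped backwards while the process sleeps, wallclock_s can come out smaller than the requested delay, or even negative, while monotonic_ms is not affected by such steps. That is the situation a test asserting on wallclock_s cannot rule out.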
I just updated the comments, but I did not mention automatic drift adjustment. I think that's enough.
And I reverted the last commit, without the Time: delay seconds test.
See Mbed-TLS#1517. They often failed on the CI. Signed-off-by: Jerry Yu <[email protected]>
1d7ddfb
to
4852bb8
Compare
    usleep(milliseconds * 1000);
#endif
}
#endif
CI says this endif needs to stay
It should be fixed
Signed-off-by: Jerry Yu <[email protected]>
The test has some issues we cannot avoid. Put it in code to avoid it being re-introduced. Signed-off-by: Jerry Yu <[email protected]>
 * CLOCK_REALTIME and returns the number of seconds since the Epoch. And
 * `nanosleep` uses CLOCK_MONOTONIC. The time sources are out of sync.
 *
 * If CLOCK_MONOTONIC is faster than CLOCK_REALTIME and `nanosleep` exits at
Surely it's not really about whether one clock is "faster" than the other (I would expect them to tick at the same rate all other things being equal) but if one is adjusted and the other not - which is what happens when one is a "wallclock" timer and the other is a monotonic timer
Yes, that's easier to understand. I will change that.
But I think this problem can be abstracted as a problem of two clocks running at different rates. The wall clock comes from a remote source and the other one comes from a local source. Due to implementation issues, the wall clock shows discontinuous jumps. If the wall clock is updated very frequently, the discontinuous jumps disappear, and the user gets two clocks running at different rates.
I would expect them to tick at the same rate all other things being equal
It should not be expected. The CPU monotonic clock source comes from a crystal oscillator with a PLL. These are not high precision, and for many reasons the clock might run faster or slower than standard time. That's why we need an NTP service to adjust the time.
The wall clock is come from remote
I don't understand. "Wall clock" in this context means that this particular timer from the kernel should match as closely as possible to the time that someone looking at their watch would see. So time may be stepped forwards or backwards as daylight savings happens (for example). A monotonic clock must by definition only ever increase.
The ticks that advance these clocks come from local hardware, which may not be precise. So there is frequency adjustment, which should affect all clocks, so that when each clock says "one second has passed", as close to one second as possible has actually passed.
"Wall clock" in this context means that this particular timer from the kernel should match as closely as possible to the time that someone looking at their watch would see.
I mean it appears to come from a remote server if it is updated fast enough.
So time may be stepped forwards or backwards as daylight savings happens (for example). A monotonic clock must by definition only ever increase.
I do not think daylight savings will affect the value of time(), for it is the number of seconds since the Epoch.
which should affect all clocks
No. CLOCK_BOOTTIME and CLOCK_*_CPUTIME_ID will not be affected by time adjustment. CLOCK_MONOTONIC is affected by the incremental adjustments performed, which is different from CLOCK_REALTIME.
If a decreasing adjustment is performed, CLOCK_REALTIME will change and CLOCK_MONOTONIC will not change.
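For anyone who wants to look at the clocks named in this thread on their own machine, here is a small C sketch, assuming a POSIX system. print_clock_readings and named_clock are hypothetical names for illustration. CLOCK_BOOTTIME is Linux-specific and the CPU-time clocks are optional in POSIX, so those entries are guarded with #ifdef. The sketch only reads the clocks; it does not by itself prove which of them are affected by adjustments.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Pair a clock id with a printable name. */
typedef struct {
    const char *name;
    clockid_t id;
} named_clock;

static const named_clock clocks[] = {
    { "CLOCK_REALTIME", CLOCK_REALTIME },
    { "CLOCK_MONOTONIC", CLOCK_MONOTONIC },
#ifdef CLOCK_BOOTTIME
    { "CLOCK_BOOTTIME", CLOCK_BOOTTIME },
#endif
#ifdef CLOCK_PROCESS_CPUTIME_ID
    { "CLOCK_PROCESS_CPUTIME_ID", CLOCK_PROCESS_CPUTIME_ID },
#endif
#ifdef CLOCK_THREAD_CPUTIME_ID
    { "CLOCK_THREAD_CPUTIME_ID", CLOCK_THREAD_CPUTIME_ID },
#endif
};

/* Print one reading per available clock; return how many could be read. */
int print_clock_readings(void)
{
    int readable = 0;
    for (size_t i = 0; i < sizeof(clocks) / sizeof(clocks[0]); i++) {
        struct timespec ts;
        if (clock_gettime(clocks[i].id, &ts) == 0) {
            printf("%-26s %lld.%09ld\n", clocks[i].name,
                   (long long) ts.tv_sec, ts.tv_nsec);
            readable++;
        }
    }
    return readable;
}
```

Calling it before and after a wall-clock change (for example a manual date change on a test machine) shows which readings moved with the change and which did not.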
Signed-off-by: Jerry Yu <[email protected]>
Looks good to me. I'm not sure I like time_delay_milliseconds as it is, but I don't want to increase the scope of this pull request so I am not requesting any changes there. The priority is to avoid random failures.
@@ -76,6 +74,13 @@ void time_delay_milliseconds(int delay_ms)
/* END_CASE */

/* BEGIN_CASE depends_on:MBEDTLS_HAVE_TIME */

/*
 * WARNING: DONOT ENABLE THIS TEST. RESERVE IT HERE TO KEEP THE REASON.
Minor: English:
 * WARNING: DONOT ENABLE THIS TEST. RESERVE IT HERE TO KEEP THE REASON.
 * WARNING: DO NOT ENABLE THIS TEST. We keep the code here to document the reason.
Yes, could this change be made, then I will approve
Signed-off-by: Jerry Yu <[email protected]>
test_suite_platform fail
test_suite_platform fail in time test
I'm approving this because I'd like the random failures to stop. I'm not fully happy with keeping dead code, but we can remove it later.
LGTM
@xkqian If you're the second person approving this PR, could you set the
Sorry, I thought your approval might also be needed and forgot to ping you @tom-cosgrove-arm. I will take care next time. Thanks.
Description
We got random test_suite_platform failures in CI tests.
The CI reports with this PR
I think it is because time(time_t*) returns Epoch time, which is discontinuous.
Gatekeeper checklist