-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16184. De-flake TestBlockScanner#testSkipRecentAccessFile #3329
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Can you update the description with the reason for failure and detail about the fix |
Done @ayushtkn. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanx @virajjasani for the fix, I tried reproducing this with a simple sleep, and got the same exception, and with the your fix that got sorted.
Changes LGTM
Will wait some time for Takanobu before pushing this.
Thanks @ayushtkn for detailed review with cross-verification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice fix, @virajjasani. LGTM.
Also, it's good to see two consecutive +1s from QA bot for hadoop-hdfs. Haven't seen more than single +1 from QA build for long time on trunk. |
Merged. Thanks for your contribution, @virajjasani. Thanks for your review, @ayushtkn.
That's true. QA results seem pretty good recently. Really thanks for fixing several flaky tests, @virajjasani. |
Reviewed-by: Ayush Saxena <[email protected]> Signed-off-by: Takanobu Asanuma <[email protected]> (cherry picked from commit 1b9927a)
Thanks @tasanuma for all the reviews!! |
…e#3329) Reviewed-by: Ayush Saxena <[email protected]> Signed-off-by: Takanobu Asanuma <[email protected]>
Description of PR
Test TestBlockScanner#testSkipRecentAccessFile is flaky:
Reason for failure:
Thread in VolumnScanner keeps scanning blocks. Before block scan, it executes
TestScanResultHandler#setup
and it keeps waiting oninfo
object until main thread intestSkipRecentAccessFile()
notifiesinfo
object and sets it'sshouldRun
to true for VolumnScanner thread to successfully return fromTestScanResultHandler#setup
. Now in main test, we try to assert thatinfo.blocksScanned
stays 0 which is only possible if VolumnScanner's block scanner thread does not reachinfo.blocksScanned++
point, which cannot be guaranteed always and hence this test is flaky.Fix:
In order to fix this, the main thread in
testSkipRecentAccessFile()
should initializeinfo.sem
as Semaphore (permitting only single thread to take a lock, Mutex) and take a lock by main thread so that until released, block scan thread cannot reach pointinfo.blocksScanned++
and hence the test never fails. Before the block scanner thread reach the point of incrementing blocksScanned count, it has to acquire Semaphore and it gets blocked here until main thread releases the lock.How was this patch tested?
Unit tests