-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UBsan warning from ClientHistory::update_from_ref_and_version in client reset tests #7041
Comments
that seems to be more of a core issue than sync, but we'll have a look in the sync team too. |
During the same test on
Could this assertion failure - |
I don't think they are linked. The assertion you point to would fail if the client never connected to the server. |
So, now there is an assertion in place of this failure, which so only has been observed on linux builders, which sadly doesn't print the symbols for the stacktrace as was shown under #7101. I've put some changes under #7109 to restore symbols partially, but it's not a solution. We should be using addr2line on ci artifacts to restore the stacktrace for failed builds really. |
So the linked pr did catch the failure with the stack trace once Test log, assertion, stacktrace
but it doesn't provide much new info from what i can tell. Essentially the failure happens during second open of main realm after client reset. Is there anything obviously suspicious in test log above? |
I looked a bit at the logs and found something odd. The commit producing client version 18 has ref 11704, but the assertion logs 10120. Is that expected? |
well it's a different ref - history ref, and not a new top level commit ref. So, i'd say yes, it's expected in logs. |
I think i know how this has happened - it can also be seen in the original CI failure log:
sync session received the download message on its thread 140615787591424 and started to integrate it, meanwhile the test itself on main thread 140615787603968 starts opening the realm and updating schema. Simplified stacktrace of the assertion:
So the Transaction::internal_advance_read doesn't hit shortcut path due to updated history and tries to call ClientHistory::update_from_ref_and_version which hits the assertion. The problem is that during promote_to_write the group is set through Replication::initiate_transact, which is called only after Transaction::internal_advance_read. |
One more occurence with realm-core-14.3.0 on the same test: ci run test log
|
@kiburtse Looking at the last logs, it seems the crash is after the download message is integrated. It could be that opening the realm was blocked until the integration was done, or came immediately after. We could try to time it like that and see if we can reproduce it. |
yeah, from what i can tell it also matches with my last comment here explaining how it happens. I've uploaded proposed straight forward fix, but do you have an idea how we could test it? I haven't found the way so far to trigger this exact situation in test. |
I may have a way. My theory is that opening the realm being blocked is the interesting case. We need to keep that transaction open until update_schema blocks. We have some callbacks invoked from that transaction (IIRC) and so with the help of a promise we can prevent the commit until the other thread blocks. |
That stack trace suggests that the very first time that a Transaction is being promoted to write for the ClientReplication instance it's advancing the read version. That's a normal and expected thing in multi-process usage, but should be impossible when only a single process is involved. We'll need to fix this for multi-process sync regardless, but being able to hit this currently points at another bug elsewhere. |
why should it be impossible in single process? |
In order for there to be a new version to advance to, someone must have committed a write transaction, which requires having begun a write transaction. Therefore if beginning a write advances the read transaction, it cannot be the first time a write has been begun in that session. When multiple DB instances are involved the write producing the new version could have been made via a different DB instance, but there should only be one DB instance per file per process. |
Another instance here |
during nightly build baas-integration-tests under ubuntu2004-ubsan reported an issue.
Core version
Core version: 13.22.0
Relevant lines from the log file:
so the test must be next after: 'flx: client reset' -c 'Recover: schema indexes match in before and after states'
The stack trace is missing from ubsan report sadly, but from what i can tell it can only happen from this code path
The text was updated successfully, but these errors were encountered: