Erratic behaviour with some DB populations #1558
Comments
The answer to http://stackoverflow.com/questions/33945987/thread-safeness-at-mongodb-c-driver-regarding-indirect-connection-usage-throug/33977828#33977828 confirms that cursors cannot be used the way we currently use them (as of 0.25.0). PR #1564 solves this, although it is not yet certain whether that suffices to solve this issue or whether something else needs to be fixed.
The CB built from PR #1570 seems to work correctly in orion.lab. It has been running for 5 hours without crashes and without relevant ERRORs in the log, with a
However, before definitively closing this issue, we will hold a "quarantine" period for it. It has been migrated to the 0.27.0 milestone and will be closed at the end of the 0.27.0 development cycle if no new related problem is found (lowering priority to P5).
It seems pretty stable after more than 24 days of uninterrupted operation at orion.lab.fiware.org:
Thus, moving it back to the 0.26.0 milestone and closing.
At the end of the 0.26.0 development cycle (i.e. 0.26.0 release candidates are on the way and the version is almost closed) we have found some problems that make CB behave erratically with some DB populations. We will use this issue to summarize what we know, along with possible research lines and potential solutions.
Facts
When the DB is populated in a specific way (ask @fgalan for the orion-evil-dump.tgz dump), CB behaves erratically. The behaviour depends on whether -subCacheIval is set to 0 (i.e. only one refresh is done, at startup) or to a different value (i.e. refreshes are done periodically).
In the case of no periodic refresh, CB seems to be stable (we have seen it run without failure for days at orion.lab), although some weird things occur:
In the case of periodic refresh, CB crashes after a while (with -subCacheIval 5 it usually crashes in less than 20 minutes at orion.lab). The core backtraces show exceptions (most of the time assertion failures) related to MongoDB C++ driver methods.
Theories
We don't know the actual cause of the problem yet; we only have some theories that need to be validated.
One possible theory is incorrect, thread-unsafe usage of cursors (more details here: http://stackoverflow.com/questions/33945987/thread-safeness-at-mongodb-c-driver-regarding-indirect-connection-usage-throug). A corrupted cursor would explain why assertions in MongoDB C++ driver methods (such as the get*Field family or more()) fail. It could also explain why the item count in the cache doesn't match the actual count in the DB.
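The Stack Overflow question above is about cursors (and the connections behind them) being used from more than one thread. A common way to rule that out is to keep every cursor confined to the thread that created it. The following C++ sketch illustrates only that pattern, under stated assumptions: FakeCursor and countPerThread are hypothetical names, and FakeCursor is a toy stand-in that borrows the more() name mentioned above; it is not the MongoDB C++ driver API and not Orion's mongoBackend code.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a driver cursor (NOT the MongoDB C++ driver API).
// Like a real cursor it carries mutable iteration state (pos_), so sharing
// one instance between threads without locking would be a data race.
class FakeCursor
{
 public:
  explicit FakeCursor(const std::vector<int>& docs) : docs_(docs), pos_(0) {}

  bool more() const { return pos_ < docs_.size(); }
  int  next()       { return docs_[pos_++]; }

 private:
  const std::vector<int>& docs_;  // read-only result set, safe to share
  std::size_t             pos_;   // per-cursor state, must not be shared
};

// Thread-confined pattern: every worker opens its own cursor over the same
// (read-only) result set, so no cursor state is ever touched by two threads.
std::vector<std::size_t> countPerThread(const std::vector<int>& docs, int nThreads)
{
  std::vector<std::size_t> counts(static_cast<std::size_t>(nThreads), 0);
  std::vector<std::thread> workers;

  for (int i = 0; i < nThreads; ++i)
  {
    workers.emplace_back([&docs, &counts, i]()
    {
      FakeCursor cursor(docs);  // created and used by this thread only
      while (cursor.more())
      {
        cursor.next();
        ++counts[static_cast<std::size_t>(i)];  // each thread writes only its own slot
      }
    });
  }

  for (std::thread& w : workers)
  {
    w.join();
  }
  return counts;
}
```

Every worker sees the full result set. Sharing a single FakeCursor between the workers instead would make them race on pos_, which is the kind of corrupted cursor state the theory suspects; the per-thread version avoids that by construction.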
How to validate this theory:
Run CB with -reqMutexPolicy all. That will force all cursor usage into non-concurrent mode (as there are no cursor usages outside mongoBackend, and the "all" request mutex policy ensures that at most one thread at a time is executing code in mongoBackend).
More info to check: how IoTAgent C++ is using cursors:
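As a rough model of why the "all" policy takes concurrency out of the picture, the sketch below funnels every hypothetical backend operation through a single global mutex and records how many threads are ever inside the backend at once. backendMutex, backendOperation and runConcurrentRequests are illustrative names under our own assumptions, not Orion's actual implementation of -reqMutexPolicy.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model of an "all" request mutex policy: one global mutex
// guards every entry into the backend, so at most one thread at a time
// can touch cursors there.
std::mutex       backendMutex;
std::atomic<int> activeInBackend(0);
std::atomic<int> maxActiveInBackend(0);

// Hypothetical backend operation that would create and iterate a cursor.
void backendOperation()
{
  std::lock_guard<std::mutex> lock(backendMutex);  // serialize the whole operation

  // Record the highest number of threads seen inside the backend at once.
  int now  = ++activeInBackend;
  int seen = maxActiveInBackend.load();
  while (now > seen && !maxActiveInBackend.compare_exchange_weak(seen, now))
  {
    // on failure `seen` is reloaded; retry while `now` is still larger
  }

  // ... cursor creation and iteration would happen here ...

  --activeInBackend;
}

// Fire `nRequests` concurrent requests and return the highest number of
// threads ever observed inside the backend at the same time.
int runConcurrentRequests(int nRequests)
{
  std::vector<std::thread> requests;
  for (int i = 0; i < nRequests; ++i)
  {
    requests.emplace_back(backendOperation);
  }
  for (std::thread& t : requests)
  {
    t.join();
  }
  return maxActiveInBackend.load();
}
```

With the lock_guard in place the function always returns 1. Deleting that line would typically let the count exceed 1 on a multi-core machine, which is the concurrent cursor access the theory suspects. The trade-off is that every request waits for the previous one, so this kind of policy fits a validation run better than normal operation.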
Other things that we could test