Handle the "DatabaseError: database disk image is malformed" error #7628
Based on your description, this shouldn't be presented to the user in the typical manner. Instead, we should display an error to the user stating "Database is malformed" and provide additional details indicating that the database file will be renamed and removed later on. Furthermore, we should prevent these errors from being sent to us.
I concur with @synctext's suggestion that we should resolve the issue for the user without needing to restart Tribler.
When a database corruption is detected, we should display a dialog to the user explaining the situation and assuring them that we will automatically address the problem as soon as they press the 'OK' button.
Additionally, please link to any issues that will be resolved once this PR is merged.
It turns out that the PR's fix-db-corruption logic conflicted with the previous Upgrader logic; I have now fixed that. I also linked the PR to the relevant issues.
I've reviewed the GUI components and the tests.
I'll proceed with reviewing the core once the Upgrader has been completed.
I discovered that the corruption-handling logic initially implemented in this PR was incomplete. It handled the error in specific places of the Tribler code, like opening the database connection or calling the […]

As it turns out, the corruption error is unpredictable and can occur in many other places, like […]

Now, I added a new […]

Now, by using the […]

Also, some Core-GUI interaction problems were fixed, and the GUI now correctly reconnects to the restarted Core in all cases.
```python
if isinstance(e, DatabaseIsCorrupted):
    # When the database corruption is detected, we should stop the process immediately.
    # Tribler GUI will restart the process and the database will be recreated.
    process_manager = get_global_process_manager()
    process_manager.sys_exit(EXITCODE_DATABASE_IS_CORRUPTED, e)
```
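The comment in this snippet relies on the GUI restarting the Core when it exits with the dedicated corruption code (99, per the discussion below). The supervisor side of that contract can be sketched as a loop. This is a minimal sketch, not Tribler's GUI code: `start_core` is a hypothetical callable that runs one Core process to completion and returns its exit code, and `run_core_with_restarts` and `max_restarts` are illustrative names.

```python
from typing import Callable, Tuple

# Exit code reserved for database corruption, as stated in the discussion.
EXITCODE_DATABASE_IS_CORRUPTED = 99


def run_core_with_restarts(start_core: Callable[[], int], max_restarts: int = 3) -> Tuple[int, int]:
    """Run the Core, restarting it whenever it exits with the corruption code.

    `start_core` is a hypothetical callable that runs one Core process and
    returns its exit code. Returns (final_exit_code, number_of_restarts).
    """
    restarts = 0
    while True:
        exit_code = start_core()
        if exit_code == EXITCODE_DATABASE_IS_CORRUPTED and restarts < max_restarts:
            # Assumption: the Core removed the corrupted file before exiting,
            # so a fresh start will recreate the database.
            restarts += 1
            continue
        # Any other exit code (or too many restarts) is reported as-is.
        return exit_code, restarts


codes = iter([99, 0])
print(run_core_with_restarts(lambda: next(codes)))  # (0, 1)
```

The `max_restarts` cap is a design choice of the sketch: if the database keeps getting corrupted, the supervisor stops looping and surfaces the error instead.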
Could be removed?
Currently, it is necessary. The actual reason is a bit complex to explain:

An error that is raised during a component's startup is wrapped in `ComponentStartupException`. Then, it is handled in a special way:
```python
def _reraise_startup_exception_in_separate_task(self):
    self.logger.info('Reraise startup exception in separate task')

    async def exception_reraiser():
        self.logger.info('Exception reraiser')
        e = self._startup_exception
        if isinstance(e, ComponentStartupException) and e.component.tribler_should_stop_on_component_error:
            self.logger.info('Shutdown with exit code 1')
            self.exit_code = 1
            self.shutdown_event.set()

        # the exception should be intercepted by event loop exception handler
        self.logger.info(f'Reraise startup exception: {self._startup_exception}')
        raise self._startup_exception

    self.async_group.add_task(exception_reraiser())
```
This logic triggers a weird asyncio behavior described in https://togithub.com/python/cpython/issues/69675. In short, if `CoreExceptionHandler.unhandled_error_observer` is called during the Tribler Core shutdown, the `SystemExit` error raised from `unhandled_error_observer` can be suppressed and ignored by asyncio. Apparently, `unhandled_error_observer` can be called inside the `Task.__del__()` method, which ignores all re-raised exceptions, including `SystemExit`. As a result, Tribler Core finishes with exit code 1 (instead of code 99, dedicated to the database corruption error), and Tribler GUI does not restart the Core.
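The suppression described here follows from a general Python rule: an exception raised inside a `__del__` finalizer does not propagate, `SystemExit` included. CPython passes it to `sys.unraisablehook` (Python 3.8+) and keeps running. The snippet below is a minimal, self-contained illustration of that rule only; `Finalizer` and `raise_system_exit_in_finalizer` are illustrative names, and it does not reproduce the asyncio issue or Tribler's handler.

```python
import sys


class Finalizer:
    """Object whose finalizer raises SystemExit, like an error raised inside Task.__del__()."""

    def __del__(self):
        raise SystemExit(99)


def raise_system_exit_in_finalizer():
    """Return the exception types that were routed to sys.unraisablehook."""
    seen = []
    previous_hook = sys.unraisablehook
    # Record unraisable exceptions instead of printing them to stderr.
    sys.unraisablehook = lambda unraisable: seen.append(unraisable.exc_type)
    try:
        obj = Finalizer()
        del obj  # On CPython the refcount drops to zero and __del__ runs here.
    finally:
        sys.unraisablehook = previous_hook
    return seen


# The interpreter keeps running: the SystemExit(99) was reported, not raised.
print(raise_system_exit_in_finalizer())
```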
Fixing and refactoring the logic of `unhandled_error_observer` can be a tricky and complicated task that is outside the scope of the current PR. On the other hand, raising `SystemExit` directly from the component's method works well and provides the desired results.

Also, raising `SystemExit` in the component allows Tribler Core to restart without first waiting until all components have started.
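The approach chosen here, raising `SystemExit` straight from the component's coroutine, can be sketched as follows. Since Python 3.8, an asyncio task that raises `SystemExit` re-raises it out of the event loop, so the exit code reaches the caller of `asyncio.run()`. This is a minimal sketch under stated assumptions: `DatabaseIsCorrupted` is redefined as a stand-in, `open_database`, `start_component`, and `run_core` are hypothetical, and the real code goes through `process_manager.sys_exit()` rather than raising `SystemExit` itself. Only the exit code 99 comes from the discussion.

```python
import asyncio

# Exit code reserved for database corruption, as stated in the discussion.
EXITCODE_DATABASE_IS_CORRUPTED = 99


class DatabaseIsCorrupted(Exception):
    """Stand-in for Tribler's corruption exception (hypothetical definition)."""


async def open_database():
    # Hypothetical: simulate SQLite reporting corruption during component startup.
    raise DatabaseIsCorrupted("database disk image is malformed")


async def start_component():
    try:
        await open_database()
    except DatabaseIsCorrupted as e:
        # Exit directly from the component instead of relying on an
        # exception handler that might end up running inside a finalizer.
        raise SystemExit(EXITCODE_DATABASE_IS_CORRUPTED) from e


def run_core() -> int:
    """Run the event loop and return the process exit code it would produce."""
    try:
        asyncio.run(start_component())
    except SystemExit as e:
        return e.code
    return 0


print(run_core())  # 99
```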
Could you please put this explanation into the code?
We discussed the requested changes with @kozlovsky in person.
Looks good. Please add a comment about the necessity of `process_manager.sys_exit()` in `component.py`.
Fixes #7623, #7037, #5252, and #1993 (the latter is the pre-ORM version of the same problem)
This PR adds handling for the error "DatabaseError: database disk image is malformed". The error is handled in all of Tribler's databases: "metadata.db", "knowledge.db", "bandwidth.db", and "tags.db" (if it is present during the migration). The database "processes.sqlite" is already handled separately.
The main goal of the PR is to restore normal Tribler functionality by deleting the corrupted database file and allowing Tribler to create a new, fresh database file. As an alternative, we could rename the file instead of deleting it. Still, I'm not sure it is worth it, as the renamed file would take up valuable disk space, and few users will try to restore information from a corrupted database file. It is possible to extract some data from a corrupted file, but I doubt it is worth it for us to provide such recovery tools.
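The delete-and-recreate strategy can be sketched with the standard `sqlite3` module, which surfaces SQLite's corruption error as `sqlite3.DatabaseError` with the message "database disk image is malformed". This is a minimal sketch, not the PR's implementation: `open_with_recovery` and the `init_schema` callback are hypothetical, and the sketch deletes only on that specific message so that unrelated database errors never remove user data.

```python
import os
import sqlite3
from typing import Callable, Tuple

# SQLite's message for SQLITE_CORRUPT, as surfaced by Python's sqlite3 module.
MALFORMED_MESSAGE = "database disk image is malformed"


def open_with_recovery(
    path: str, init_schema: Callable[[sqlite3.Connection], None]
) -> Tuple[sqlite3.Connection, bool]:
    """Open `path`; if SQLite reports corruption, delete the file and start fresh.

    `init_schema` is a hypothetical callback that creates tables and runs the
    first queries, where corruption typically surfaces.
    Returns (connection, recreated).
    """
    conn = sqlite3.connect(path)
    try:
        init_schema(conn)
        return conn, False
    except sqlite3.DatabaseError as e:
        conn.close()  # Release the file handle before deleting the file.
        if MALFORMED_MESSAGE not in str(e):
            raise  # Some other database error: leave the file alone.
    # Delete (rather than rename) the corrupted file, as the description proposes.
    if os.path.exists(path):
        os.remove(path)
    conn = sqlite3.connect(path)
    init_schema(conn)
    return conn, True
```

Matching on the message text is a simplification; it keeps the sketch focused on the one error this PR targets.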
What the PR does: