Handle the "DatabaseError: database disk image is malformed" error #7628

kozlovsky · 2023-10-09T12:52:44Z

Fixes #7623, #7037, #5252, and #1993 (the latter is the pre-ORM version of the same problem)

This PR adds handling to the error "DatabaseError: database disk image is malformed". The error is handled in all databases that Tribler has: "metadata.db", "knowledge.db", "bandwidth.db", and "tags.db" (if it presents during the migration). The database "processes.sqlite" is already handled separately.

The main goal of the PR is to restore normal Tribler functionality by deleting the corrupted database file and allowing Tribler to create a new, fresh database file. Instead of deleting the file, we can rename it as an alternative way to handle the issue. Still, I'm not sure it is worth it, as it will take some valuable disk space, and not many users will try to restore information from the corrupted database file. It is possible to extract some data from the corrupted file, but I doubt it is worth it for us to provide such recovery tools.

What the PR does:

It adds tracking of the corresponding error to most places where the error can arise
In case of an error, the file "<db_filename>.is_corrupted" is created near the corresponding database file.
The notification message box about the database corruption fixing is displayed to the user.
The Tribler Core is automatically restarting under the hood, so the user does not need to restart Tibler.

drew2a · 2023-10-10T06:48:51Z

The error is shown to the user in the usual way; it is possible to add a special handling of the error in the GUI.

Based on your description, this shouldn't be presented to the user in the typical manner. Instead, we should display an error to the user stating "Database is malformed" and provide additional details indicating that the database file will be renamed and removed later on. Furthermore, we should prevent these errors from being sent to us.

drew2a

I concur with @synctext's suggestion that we should resolve the issue for the user without needing to restart Tribler.

When a database corruption is detected, we should display a dialog to the user explaining the situation and assuring them that we will automatically address the problem as soon as they press the 'OK' button.

drew2a · 2023-10-10T16:09:12Z

Additionally, please link to any issues that will be resolved once this PR is merged.

kozlovsky · 2023-10-16T13:00:12Z

Turns out the PR fix-db-corruption logic conflicted with the previous Upgrader logic, now I fixed it. Also, I linked the PR with the relevant issues

drew2a

I've reviewed the GUI components and the tests.

I'll proceed with reviewing the core once the Upgrader has been completed.

src/tribler/gui/tribler_window.py

src/tribler/gui/defs.py

src/tribler/core/utilities/tests/test_pony_utils.py

src/tribler/core/utilities/pony_utils.py

… dialog with the database corruption message

kozlovsky · 2023-11-22T13:12:13Z

I discovered that the initially implemented in this PR corruption-handled logic was incomplete. It was implemented by handling the error in specific places of Tribler code, like opening the database connection or calling the cursor.execute() method. To implement this, a complicated code was added to the TriblerDatabase, TriblerSQLiteProvider, and TriblerPool classes that override the standard PonyORM classes for SQLite providers.

As it turns out, the corruption error is unpredictable and can randomly happen in multiple other places, like cursor.fetchone(), cursor.__next__(), connection.commit(), etc. It is unfeasible to handle all these places in the Tribler code explicitly. For that reason, I completely reimplemented the corruption-handling logic.

Now, I added a new sqlite_utils module that can replace the native sqlite3 module completely. It provides improved versions of the Connection and Cursor classes that handle the database corruption error in all places where the error can arise.

Now, by using the sqlite_utils module instead of the base sqlite3 module, it becomes possible to remove the error-handing logic from many places of Tribler code and make the code more straightforward, as the detection of the database corruption is concentrated in a single place, the sqlite_util module. Significant code was removed from the pony_utils module, and the TriblerPool class is no longer necessary. Now, using context manager in Tribler code to handle the database corruption error is not necessary as the low-level connection and cursor methods handle it.

Also, some Core-GUI interaction problems were fixed, and GUI now correctly re-connects to the restarted Core in all cases.

src/tribler/gui/tribler_window.py

src/tribler/core/components/component.py

src/tribler/core/components/reporter/exception_handler.py

drew2a · 2023-11-22T15:25:33Z

src/tribler/core/components/component.py

+            if isinstance(e, DatabaseIsCorrupted):
+                # When the database corruption is detected, we should stop the process immediately.
+                # Tribler GUI will restart the process and the database will be recreated.
+                process_manager = get_global_process_manager()
+                process_manager.sys_exit(EXITCODE_DATABASE_IS_CORRUPTED, e)


Could be removed?

Currently, it is necessary. The actual reason is a bit complex to explain:

An error that is raised during the component's startup is wrapped in ComponentStartupException. Then, it is handled in a special way:

def _reraise_startup_exception_in_separate_task(self): self.logger.info('Reraise startup exception in separate task') async def exception_reraiser(): self.logger.info('Exception reraiser') e = self._startup_exception if isinstance(e, ComponentStartupException) and e.component.tribler_should_stop_on_component_error: self.logger.info('Shutdown with exit code 1') self.exit_code = 1 self.shutdown_event.set() # the exception should be intercepted by event loop exception handler self.logger.info(f'Reraise startup exception: {self._startup_exception}') raise self._startup_exception self.async_group.add_task(exception_reraiser())

This logic triggers a weird asyncio behavior described in https://togithub.com/python/cpython/issues/69675. In short, if CoreExceptionHandler.unhandled_error_observer is called during the Tribler Core shutdown, the SystemExit error raised from unhandled_error_observer can be suppressed and ignored by asyncio. Apparently, unhandled_error_observer can be called inside the Task.__del__() method that ignores all re-raised exceptions, including the SystemExit error. As a result, Tribler Core finishes with exit code 1 (instead of code 99 dedicated to the database corruption error), and Tribler GUI does not restart the core.

Fixing and refactoring the logic of unhandled_error_observer can be a tricky and complicated task that is outside of the current PR scope. On the other side, raising SystemExit directly from the component's method works well and provides the desired results.

Also, raising SystemExit in the component allows TriblerCore to restart without waiting first until all components start.

Could you please put this explanation into the code?

src/tribler/core/utilities/sqlite_utils.py

src/tribler/core/utilities/tests/test_sqlite.py

src/tribler/gui/event_request_manager.py

drew2a · 2023-11-22T16:26:37Z

We discussed the requested changes with @kozlovsky in person.

drew2a

Looks good. Please, add a comment about necessity of process_manager.sys_exit() in component.py.

kozlovsky changed the title ~~DatabaseError: database disk image is malformed~~ Handle the "DatabaseError: database disk image is malformed" error Oct 9, 2023

kozlovsky force-pushed the fix/database_is_malformed branch 5 times, most recently from dec4465 to 0526d7e Compare October 10, 2023 06:28

kozlovsky marked this pull request as ready for review October 10, 2023 06:28

kozlovsky requested review from a team and drew2a and removed request for a team October 10, 2023 06:28

kozlovsky mentioned this pull request Oct 10, 2023

[7.13.0] TypeError: 'NoneType' object is not iterable #7623

Closed

drew2a suggested changes Oct 10, 2023

View reviewed changes

kozlovsky force-pushed the fix/database_is_malformed branch 7 times, most recently from fada7ac to 394d7f6 Compare October 16, 2023 12:55

kozlovsky requested a review from drew2a October 16, 2023 12:58

kozlovsky force-pushed the fix/database_is_malformed branch from 394d7f6 to 5bdf408 Compare October 16, 2023 14:41

drew2a suggested changes Oct 18, 2023

View reviewed changes

src/tribler/gui/tribler_window.py Outdated Show resolved Hide resolved

src/tribler/gui/defs.py Outdated Show resolved Hide resolved

src/tribler/core/utilities/tests/test_pony_utils.py Outdated Show resolved Hide resolved

drew2a reviewed Oct 18, 2023

View reviewed changes

src/tribler/core/utilities/pony_utils.py Show resolved Hide resolved

drew2a reviewed Oct 18, 2023

View reviewed changes

src/tribler/core/utilities/pony_utils.py Outdated Show resolved Hide resolved

drew2a reviewed Oct 18, 2023

View reviewed changes

src/tribler/core/utilities/pony_utils.py Outdated Show resolved Hide resolved

drew2a reviewed Oct 18, 2023

View reviewed changes

src/tribler/core/utilities/pony_utils.py Show resolved Hide resolved

drew2a reviewed Oct 18, 2023

View reviewed changes

src/tribler/core/utilities/pony_utils.py Outdated Show resolved Hide resolved

kozlovsky added 5 commits November 20, 2023 16:03

CoreManager restarts Core in case of database corruption; use unified…

8a134c8

… dialog with the database corruption message

Satisfy linter

51b3d7b

Add exit_codes.py

8d5b64d

Fix process_compressed_mdblob_threaded

a859041

Add a comment and tests to @catch_db_is_corrupted_exception decorator

8924ca4

kozlovsky force-pushed the fix/database_is_malformed branch from d14792e to b34559b Compare November 22, 2023 12:44

kozlovsky commented Nov 22, 2023

View reviewed changes

src/tribler/gui/tribler_window.py Show resolved Hide resolved

drew2a self-requested a review November 22, 2023 14:51