Use whitelist table in gravity.db directly #600
… domains inside FTL itself. Signed-off-by: DL6ER <[email protected]>
… should prevent the database from staying in a locked state after the first query. Signed-off-by: DL6ER <[email protected]>
Signed-off-by: DL6ER <[email protected]>
…ase is busy. Signed-off-by: DL6ER <[email protected]>
Is it possible to avoid locking the entire database by using WAL? https://www.sqlite.org/draft/wal.html
I looked at WAL before, however, I'm not sure whether it is a good idea mainly due to the following limitation (quoting the documentation):
…DB_count(). Return early in gravityDB_close() if the database is not available. Signed-off-by: DL6ER <[email protected]>
Looking at the SQLite locking states (https://www.sqlite.org/draft/lockingv3.html), it seems that as long as FTL holds a read-only connection to the database, gravity will not be able to acquire a write lock. How does gravity acquire the write lock when FTL is holding a read lock?
The two relevant locks are
When FTL reads from the database (performs the prepared statement), it requests a There are two possible situations:
While case 2 seems to be the best we can do, it looks like we should add something like
to the |
Why not use 2 SQLite databases? Or will maintaining that be a pain in the code?
We are already using multiple SQLite databases, but in this case the data is closely related. The core issue is SQLite's concurrency, and it will probably show up again even if we try to avoid it now.
My only concern before approving this PR is the behavior when the database is locked. Could we put a short timeout on it, so there is a possibility that we can avoid that edge case? Perhaps something around 100-500 ms. Even if gravity takes longer, this would avoid cases where there may be small updates to the database (taking only milliseconds) but FTL wants to read the whitelist.
I do not feel comfortable adding a delay, as this might severely slow down FTL when users use massive lists (6M+) on a low-end device where gravity might need up to twenty (!) minutes to store the domains. Even if we added only 100 ms of possible delay, this could reduce the response rate to at most 10/s. I'm really going back and forth on this; if you still find this a good compromise, you might be able to convince me to add some small delay. However, I consider 100 ms to be already at the upper limit of what should be added.
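To make the trade-off in this exchange concrete, here is a minimal sketch of a bounded wait on a locked database. It uses Python's standard sqlite3 module rather than FTL's C code. The single-column whitelist table, the file name, and the helper names try_read and demo are assumptions made for illustration, and BEGIN EXCLUSIVE merely stands in for a long gravity run.

```python
import os
import sqlite3
import tempfile


def try_read(path, timeout_s):
    """Try one read; return True on success, False if the database stayed locked."""
    # timeout_s is how long SQLite may keep retrying a busy database
    # before giving up (0 means fail at once).
    conn = sqlite3.connect(path, timeout=timeout_s, isolation_level=None)
    try:
        conn.execute("SELECT COUNT(*) FROM whitelist").fetchone()
        return True
    except sqlite3.OperationalError as err:
        if "locked" not in str(err):
            raise  # some other problem, e.g. a missing table
        return False
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "gravity.db")
        writer = sqlite3.connect(path, isolation_level=None)
        # Hypothetical one-column table, not the real gravity schema.
        writer.execute("CREATE TABLE whitelist (domain TEXT UNIQUE)")

        writer.execute("BEGIN EXCLUSIVE")       # stands in for a long gravity run
        during = try_read(path, timeout_s=0.1)  # may wait up to about 100 ms
        writer.execute("COMMIT")

        after = try_read(path, timeout_s=0.1)   # lock released: read succeeds
        writer.close()
        return during, after
```

In this sketch the read fails while the writer holds its lock and succeeds once the lock is released. If lookups are handled one after another and each may wait up to 100 ms, throughput during a long write is capped at roughly 10 lookups per second, which is the figure raised in the comment above.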
The response rate issue is big enough that I'm willing to live with this compromise of assuming the domain is not on the whitelist. I can't think of any way to get around this in our current setup, as we need fast (no waiting) access to the whitelist, but SQLite will lock the whole database during the write. The only way I can think of getting around this would be to implement our own hashed whitelist so we would always have it accessible, but that is out of the question and would make this PR redundant.
I see the commit is in the development branch. I am really grateful you fine people are working on adding support for huge whitelists. Thank you!
You should wait for pi-hole/pi-hole#2803 to be merged. This PR will change the schema of the gravity database, and testing will be much smoother if you start only once we have finally agreed on what it should look like. The current Also note our usual disclaimer that the
By submitting this pull request, I confirm the following (please check boxes, e.g. [X]). Failure to fill in the template will close your PR:
Please submit all pull requests against the development branch. Failure to do so will delay or deny your request.
How familiar are you with the codebase?: 10
Use the vw_whitelist view of table whitelist when checking if a domain is contained in the whitelist, instead of copying whitelisted domains into FTL's memory on receipt of SIGHUP.
This change will not only decrease FTL's memory footprint but also allow much better scaling to very large whitelists (looking into the 100k+ domains regime).
To achieve good performance, we keep the database connection to gravity.db open at all times so that a prepared (= precompiled to SQLite byte code) statement is ready for querying the whitelist.
In vivo tests on a Raspberry Pi with a whitelist containing 100,000 whitelisted domains combined with the regex . (matches any character) confirm that checking the whitelist should typically take no longer than 2 milliseconds, even for very large whitelists. The fast reaction is achieved by using SELECT EXISTS(...), which is highly optimized:
Note that querying strategies like
are less performant as they will almost always result in an iterating co-routine (even when the column in the WHERE clause is UNIQUE!).
In contrast, SELECT EXISTS(...) returns immediately when it finds the first occurrence of the contained SELECT statement. The domain itself will be bound at lookup time using SQLite bindings (placeholder ?).
A drawback of relying on the always-open connection to the whitelist view is that when the database is locked (e.g., pihole -g is currently storing the blocked domains), we cannot check whether a domain is contained in the whitelist or not. To avoid blocking the DNS service until gravity is eventually done, we immediately return and assume that the domain is not on the whitelist. In this case, we log like
As gravity will send SIGHUP at the end, this possibly wrong cache entry will be removed soon after the query was made. On the next query of this domain, the database should no longer be locked and the query to the database can succeed.
Overall, the benefits seem to outweigh the drawbacks, but this PR may serve as a discussion platform.
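The lookup strategy and the locked-database fallback described above can be sketched as follows. This is an illustration in Python's standard sqlite3 module, not FTL's actual C implementation. The schema (an enabled column, the view definition), the helper names in_whitelist and demo, and the printed message are assumptions. Only the SELECT EXISTS(...) form, the ? placeholder, and the vw_whitelist name come from the description.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified schema; the real gravity.db layout may differ.
SCHEMA = """
CREATE TABLE whitelist (
    domain  TEXT UNIQUE NOT NULL,
    enabled INTEGER NOT NULL DEFAULT 1
);
CREATE VIEW vw_whitelist AS SELECT domain FROM whitelist WHERE enabled = 1;
"""

# EXISTS stops at the first matching row; the domain is bound via "?".
LOOKUP = "SELECT EXISTS(SELECT 1 FROM vw_whitelist WHERE domain = ?)"


def in_whitelist(conn, domain):
    """Return True if domain is whitelisted; False if it is not or the DB is locked."""
    try:
        (found,) = conn.execute(LOOKUP, (domain,)).fetchone()
    except sqlite3.OperationalError as err:
        if "locked" not in str(err):
            raise
        # Do not wait for the writer: treat the domain as not whitelisted.
        print(f"whitelist lookup skipped for {domain}: {err}")
        return False
    return bool(found)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "gravity.db")
        writer = sqlite3.connect(path, isolation_level=None)
        writer.executescript(SCHEMA)
        writer.execute("INSERT INTO whitelist (domain) VALUES ('good.example')")

        # timeout=0: fail immediately instead of waiting on a locked database.
        reader = sqlite3.connect(path, timeout=0, isolation_level=None)
        results = [
            in_whitelist(reader, "good.example"),
            in_whitelist(reader, "bad.example"),
        ]

        writer.execute("BEGIN EXCLUSIVE")  # stands in for a running pihole -g
        results.append(in_whitelist(reader, "good.example"))
        writer.execute("COMMIT")

        results.append(in_whitelist(reader, "good.example"))
        reader.close()
        writer.close()
        return results
```

Here the reader keeps one connection open for all lookups. While the writer holds its exclusive lock, a whitelisted domain is reported as not whitelisted, and the same lookup succeeds again once the lock is released, mirroring the possibly wrong cache entry that the final SIGHUP clears.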
This PR fixes #596 (at least partially).