Database size optimizations #191
Comments
Hi, changing the primary key for torrents to anything other than the info hash would be a complex rework with several downsides I can think of, the main one being that the info hash is a globally recognised ID (unlike an auto-incrementing ID). So there would have to be a lot of value-add to justify this, and I'm not convinced 10% saved disk space is worth it. You've suggested a few other optimisations, and I think each would need discussing on its own merits, with an understanding of the dev work each would entail. If there is any low-hanging fruit where we can make a big improvement with minimal impact, those are the kind of optimisations we'd want to find.
Shouldn't the info hash be more or less an additional It was more than 12% on my test database, and depending on the number of files per torrent it could be more, since each torrent file stores the info hash 3 times (1x in the column and 2x in indexes). So for torrents with many files this would save a lot more. And I guess when storing the torrent pieces it would be an even bigger improvement, because there would be even more records per torrent. IMHO not storing the same "large" value multiple times in many tables is low-hanging fruit, and I would be surprised if more space could be saved with less work.
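As a rough back-of-the-envelope check of the "3 copies per row" argument above, the saving can be estimated as follows. This is only a sketch: the 20-byte hash size (a raw BitTorrent v1 SHA-1 info hash) and the 8-byte `bigint` are assumptions, the row count is made up, and alignment, per-tuple and index overhead are ignored, so real numbers will differ.

```python
# Rough per-row estimate; all sizes are assumptions, not measurements.
INFO_HASH_BYTES = 20  # assumption: raw 20-byte SHA-1 info hash (BitTorrent v1)
BIGINT_BYTES = 8      # size of a PostgreSQL bigint
COPIES_PER_ROW = 3    # from the comment above: 1x in the column + 2x in indexes


def bytes_saved(rows: int, copies: int = COPIES_PER_ROW) -> int:
    """Bytes saved by replacing the hash with a bigint key.

    Ignores alignment padding and per-tuple/index overhead.
    """
    return rows * copies * (INFO_HASH_BYTES - BIGINT_BYTES)


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical number of torrent_files rows
    print(f"{bytes_saved(rows) / 1024**2:.0f} MiB saved (rough estimate)")
```

Whether this is a large or small fraction of the database depends on how wide the remaining columns are, which this estimate does not model.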
Yes, this is possible, and it would require updating nearly every single query in the application, and always having to join to that table would incur a performance hit. So it would be a ton of work for a marginal disk space saving and a reduction in performance. I'd be surprised if 12% disk space couldn't be saved with much less disruptive changes, but at the moment performance improvements are more valuable to find than disk space savings.
One thing I've done is partition
Is your feature request related to a problem? Please describe
While having a look at the database, I found that using a different foreign key (or using a foreign key at all) could save more than 10% of disk space.
Describe the solution you'd like
Every table that has a relation to a torrent is linked via the info-hash. By adding an ID field with the data type `bigserial` (or `bigint`) to the `torrents` table, using the new ID as a foreign key in the tables `torrent_files` and `torrents_torrent_sources`, and removing the `info_hash` field from those tables (except in the `torrents` table, of course), I saved more than 12% space in a test database I converted.

Of course, the `info_hash` field should also be replaced in the other tables I didn't mention.

Maybe the info-hash in the field `torrent_contents.id` could also be replaced by the ID.

Additional context
The same approach, using a small foreign key field and storing the text/data in a separate table, could also be applied to other tables that store text that may seem small because it's only a few characters, but storing the same text a few million times adds up quickly.
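The surrogate-key idea can be sketched roughly like this. It is a minimal illustration, not the project's schema: SQLite (via Python's `sqlite3`) stands in for PostgreSQL so the snippet runs on its own, `INTEGER PRIMARY KEY` stands in for `bigserial`, and the `name`, `path` and `size` columns and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- torrents keeps the info hash once, plus a compact surrogate key.
    CREATE TABLE torrents (
        id        INTEGER PRIMARY KEY,   -- stands in for bigserial
        info_hash BLOB NOT NULL UNIQUE,
        name      TEXT NOT NULL          -- hypothetical column
    );
    -- Child rows reference the surrogate key instead of repeating the hash.
    CREATE TABLE torrent_files (
        torrent_id INTEGER NOT NULL REFERENCES torrents(id),
        path       TEXT NOT NULL,        -- hypothetical column
        size       INTEGER NOT NULL,     -- hypothetical column
        PRIMARY KEY (torrent_id, path)
    );
""")

info_hash = bytes.fromhex("aa" * 20)  # made-up 20-byte hash
cur = conn.execute(
    "INSERT INTO torrents (info_hash, name) VALUES (?, ?)",
    (info_hash, "example"),
)
torrent_id = cur.lastrowid
conn.executemany(
    "INSERT INTO torrent_files (torrent_id, path, size) VALUES (?, ?, ?)",
    [(torrent_id, "a.mkv", 100), (torrent_id, "b.srt", 5)],
)


def files_for_hash(conn, info_hash):
    """Look up files by info hash; this now needs a join through torrents."""
    rows = conn.execute(
        """SELECT f.path, f.size
           FROM torrent_files f
           JOIN torrents t ON t.id = f.torrent_id
           WHERE t.info_hash = ?
           ORDER BY f.path""",
        (info_hash,),
    )
    return rows.fetchall()


print(files_for_hash(conn, info_hash))
```

Note that any lookup by info hash now goes through a join on `torrents`, which is the query-rewrite and performance cost raised in the comments.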
Some other fields that I haven't tried but which could be replaced with a `smallserial` or `smallint` foreign key (which only takes up 2 bytes, and the maximum value of 32,767 should suffice because the fields have very few unique values):

- `torrent_hints.content_type`
- `torrent_hints.content_source`
- `torrent_hints.video_resolution`
- `torrent_hints.video_source`
- `torrent_hints.video_codec`
- `torrent_hints.video_3d`
- `torrent_hints.video_modifier`
- `torrent_hints.release_group`
- `torrent_sources` and `torrents_torrent_sources`

Maybe even `torrent_hints.content_id` with a normal integer, because there are more unique values.

I don't know how the bloom filters work, but maybe the unique values of the fields `torrents_torrent_sources.bfsd` and `torrents_torrent_sources.bfpe` could also be stored in a separate table and only be referenced by a foreign key. In my database the table `torrents_torrent_sources` has 6,531,913 records and the two bloom filter fields have only 20,513 and 19,745 unique values, which looks like a lot of duplicates to me.
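The lookup-table idea for low-cardinality values might look like the sketch below. Again this is only an illustration under assumptions: SQLite stands in for PostgreSQL (where the key would be `smallserial`/`smallint`), the `video_codecs` table, the `video_codec_id` column and the sample rows are hypothetical, and `codec_id` is a helper invented for this example. The same pattern could in principle be tried for the repeated bloom filter values, although whether that is sensible depends on how those fields are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each distinct value is stored once in a small lookup table.
    CREATE TABLE video_codecs (
        id   INTEGER PRIMARY KEY,   -- would be smallserial in PostgreSQL
        name TEXT NOT NULL UNIQUE
    );
    -- The wide table stores only the small key.
    CREATE TABLE torrent_hints (
        torrent_id     INTEGER PRIMARY KEY,
        video_codec_id INTEGER REFERENCES video_codecs(id)
    );
""")


def codec_id(conn, name):
    """Return the lookup id for name, inserting the value on first use."""
    conn.execute("INSERT OR IGNORE INTO video_codecs (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM video_codecs WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


hints = [(1, "x264"), (2, "x265"), (3, "x264"), (4, "x264")]  # made-up rows
for torrent_id, codec in hints:
    conn.execute(
        "INSERT INTO torrent_hints (torrent_id, video_codec_id) VALUES (?, ?)",
        (torrent_id, codec_id(conn, codec)),
    )

distinct = conn.execute("SELECT COUNT(*) FROM video_codecs").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM torrent_hints").fetchone()[0]
print(f"{distinct} distinct values referenced by {total} rows")
```

Reading the original text back requires a join to the lookup table, so the space saving has to be weighed against the extra query complexity.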