Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: [#261] store original info-hashes #269

Merged
merged 2 commits into from
Sep 12, 2023
Merged

feat: [#261] store original info-hashes #269

merged 2 commits into from
Sep 12, 2023

Conversation

josecelano
Copy link
Member

@josecelano josecelano commented Sep 5, 2023

When you upload a torrent, the infohash might change if the info dictionary contains non-standard fields because we remove them. That leads to a different infohash. We store the
original infohash in a new table so that we can know if the torrent was previously uploaded.

If we do not store the original infohash we could reject uploads producing the same canonical infohash. Still, there is no way for the user to ask if a torrent exists with a given original infohash. They only would be able to interact with the API with the canonical infohash.

Sometimes it's useful to use the original infohash, for instance, if you are importing torrents from an external source and you want to check if the original torrent (with the original infohash) was already uploaded.

@codecov-commenter
Copy link

codecov-commenter commented Sep 5, 2023

Codecov Report

Merging #269 (42840d4) into develop (08f0b86) will decrease coverage by 1.09%.
The diff coverage is 9.09%.

@@             Coverage Diff             @@
##           develop     #269      +/-   ##
===========================================
- Coverage    44.73%   43.65%   -1.09%     
===========================================
  Files           77       77              
  Lines         4124     4240     +116     
===========================================
+ Hits          1845     1851       +6     
- Misses        2279     2389     +110     
Files Changed Coverage Δ
src/app.rs 100.00% <ø> (ø)
src/common.rs 100.00% <ø> (ø)
src/databases/database.rs 30.55% <ø> (ø)
src/databases/mysql.rs 0.00% <0.00%> (ø)
src/errors.rs 15.55% <0.00%> (-0.48%) ⬇️
src/tracker/service.rs 2.59% <0.00%> (-0.07%) ⬇️
...s/from_v1_0_0_to_v2_0_0/databases/sqlite_v2_0_0.rs 81.15% <ø> (ø)
src/web/api/v1/contexts/torrent/handlers.rs 50.62% <0.00%> (ø)
src/databases/sqlite.rs 22.32% <4.25%> (-1.30%) ⬇️
src/services/torrent.rs 19.26% <7.69%> (-1.29%) ⬇️
... and 2 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@da2ce7
Copy link
Contributor

da2ce7 commented Sep 5, 2023

Good Work @josecelano

One thing:

I suggest that we rename the original_info_hash to info_hash, and add a new boolean column original_is_unknown. So we do not loose information in the migration.

@josecelano josecelano linked an issue Sep 5, 2023 that may be closed by this pull request
@josecelano
Copy link
Member Author

josecelano commented Sep 5, 2023

Good Work @josecelano

One thing:

I suggest that we rename the original_info_hash to info_hash, and add a new boolean column original_is_unknown. So we do not loose information in the migration.

Thank you!. And that's a good suggestion! I still have to finish some pending tasks.

  • Test MySQL migration
  • Write documentation for the "multiple original infohashes" feature (explaining what original and canonical infohashes are)
  • Review how we catch errors depending on UNIQUE constraints. I had to change the SQLite version because it relies on the error message and it was too generic. I have to change it also for MySQL.
  • Add some tests
  • We should also refactor the huge Torrent::add_torrent function, but I think that could be another PR. See Refactor: services::torrent::Index::add_torrent #276
  • Change torrent details endpoints returning 302 when the user uses any of the original infohash if it's not the canonical one. See Original and canonical info-hashes #275.
  • Change torrent upload endpoints returning 409 when you try to upload a torrent with a new original infohash whose canonical infohash already exists. See Original and canonical info-hashes #275.

Regarding the naming, I'm not sure at all. Do you like these terms: original and canonical?

I do not like the table name:

Current

CREATE TABLE IF NOT EXISTS torrust_torrent_info_hashes (
	original_info_hash	TEXT NOT NULL,
	canonical_info_hash	TEXT NOT NULL,
	PRIMARY KEY(original_info_hash),
	FOREIGN KEY(canonical_info_hash) REFERENCES torrust_torrents (info_hash) ON DELETE CASCADE
);

New Cameron's proposal

CREATE TABLE IF NOT EXISTS torrust_torrent_info_hashes (
	info_hash TEXT NOT NULL,
	canonical_info_hash TEXT NOT NULL,
        original_is_unknown BOOLEAN NULL DEFAULT NULL,
	PRIMARY KEY(original_info_hash),
	FOREIGN KEY(canonical_info_hash) REFERENCES torrust_torrents (info_hash) ON DELETE CASCADE
);

New Jose's proposal

CREATE TABLE IF NOT EXISTS torrust_torrent_original_info_hashes (
	info_hash TEXT NOT NULL,
	canonical_info_hash TEXT NOT NULL,
        original_is_unknown BOOLEAN NULL DEFAULT NULL,
	PRIMARY KEY(original_info_hash),
	FOREIGN KEY(canonical_info_hash) REFERENCES torrust_torrents (info_hash) ON DELETE CASCADE
);

I can't come up with a better name for the table. The one that describes it well is torrust_torrent_original_info_hashes, but it's too long.

These are three proposals from ChatGPT:

  • Canonical Torrent Group
  • Infohash Group
  • Canonicalized Torrent Set

For all the torrents that share the same canonical infohash.

Finally, notice that I have not added a record, as you originally proposed, with the original and canonical infohases being the same. If you upload a torrent the first time which has a different original and canonical infohashes, only one record will be created. I thought it would be simpler, you only have one record per original infohash. Notice that the primary key is only the original infohash and not the pair original/canonical.

@da2ce7
Copy link
Contributor

da2ce7 commented Sep 5, 2023

original_is_unknown BOOLEAN NULL DEFAULT NULL, could be NOT NULL.

@josecelano
Copy link
Member Author

Integration tests fail because there is a new container image for the Tracker. I'm working on a new PR to fix it.

When you upload a torrent, the infohash might change if the `info` dictionary contains non-standard fields because we remove them. That leads to a different infohash. We store the
 original infohash in a new table so that we can know if the torrent was previously uploaded.

If we do not store the original infohash we could reject uploads producing the same canonical infohash. Still, there is no way for the user to ask if a torrent exists with a given original infohash. They only would be able to interact with the API with the canonical infohash.

Sometimes it's useful to use the original infohash, for instance, if you are importing torrents from an external source and you want to check if the original torrent (with the original infohash) was already uploaded.
@josecelano
Copy link
Member Author

I've created a new epic issue to track all changes related to canonical info-hashes.

…ical infohash

If you upload a torrent, the infohash migth change if the `info`
dictionary contains custom fields. The Index removes non-standard custom
fields, and that generates a new infohash for the torrent.

If you upload a second torrent which is different from a previous one
only in the custom fields, the same canonical infohash will be
generated, so the torrent will be rejected as duplicated. The new
original infohash will be stored in the database.
@josecelano
Copy link
Member Author

ACK 110e159

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Changing Infohash from Unsupported Info-Fields
3 participants