Command to migrate DB data from v1.0.0 to v2.0.0 #77

Merged
josecelano merged 53 commits into develop from issue-56-db-migration-command-from-v1-to-v2 on Nov 30, 2022

Conversation

josecelano
Member

@josecelano josecelano commented Oct 26, 2022

Depends on: #87

I've created the basic scaffolding of the command. My plan is:

  • Migrate all the table records.
  • Insert torrents in DB from uploads dir.

Potential problems are:

  • AUTOINCREMENTS
  • Foreign keys

UPDATE 2022-11-30

Tasks

  • Transfer torrent_categories
  • Transfer torrent_user
  • Allow login for the transferred users using the Pbkdf2 for hashing the password
  • Use date and time in today_iso8601 function instead of only the date
  • Transfer torrent_tracker_keys
  • Remove unused content from dir upgrades. We do not need the MySQL stuff for this migration. There was no MySQL support for version v1.0.0
  • Transfer torrent_torrents. See below the subtasks for this task
  • Transfer torrent_torrent_files. It was not used in v1.0.0
  • Make registration_date optional and leave it empty for imported users
  • Add a new DB field imported_date to the torrust_users table with the importation date for user records imported with this command
  • Refactor
  • Add automated tests
  • Intensive testing
  • Write documentation explaining how to upgrade
  • Rename destiny DB to target DB.

Subtasks from issue #84

This rework is needed because @ldpr found a bug in the current version that affects the upgrader. We have to fix that bug and update the "upgrader".

Transfer torrents subtasks

  • Fill table torrust_torrents
  • Fill table torrust_torrent_files
  • Fill table torrust_torrent_announce_urls
  • Fill table torrust_torrent_info

Testing subtasks

Assertions for integration test:

  • torrust_users table
  • torrust_user_authentication table
  • torrust_user_profiles table
  • torrust_tracker_keys table
  • torrust_torrents table
  • torrust_torrent_files table
  • torrust_torrent_info table
  • torrust_torrent_announce_urls table

Some cases to test:

  • Torrent with pieces.

Using a single tracker URL or multiple URLs:

  • Torrent with announce (single tracker url).
  • Torrent with announce_list (multiple tracker URLs).

Containing one file or multiple files:

  • Torrent with one file.
  • Torrent with multiple files.

Add tests for the application behaviour changed:

  • Users can have their password hashed with "argon2id" or "pbkdf2-sha256". Add a test for the "verify_password" function.

Things I'm not going to test:

  • Torrent with MD5 checksum. I do not know how to include the MD5 checksum in the torrent file.
  • Torrent with root_hash (BEP-0030).
  • The user registration date can be null for users that have been imported. We do not test it for MySQL since the upgrader is only for SQLite.

@josecelano josecelano linked an issue Oct 26, 2022 that may be closed by this pull request
@josecelano josecelano force-pushed the issue-56-db-migration-command-from-v1-to-v2 branch from 85fbda3 to ed27159 Compare October 31, 2022 14:11
@josecelano
Member Author

hi @WarmBeer, I'm working on transferring user data. I've done all except the password. Since the hashing algorithm has changed from Pbkdf2 to Argon2, I have to think about how to move the data.

I think the best way to do it would be to add a new column to the database with the type of hashing used and use the old method until the user changes the password.

What do you think?

cc @da2ce7

@mickvandijke
Member

mickvandijke commented Oct 31, 2022

Hey @josecelano ,

I think we should revert back to Pbkdf2, since we are dropping username/password authentication in the future anyway in favour of torrust/teps#8.

@da2ce7
Contributor

da2ce7 commented Oct 31, 2022

Users do not need to change their password to change the password hashing algorithm: they only need to present a valid password.

  1. Save imported user hashed-passwords and nonces into a new "Pbkdf2" table.
  2. User tries to login: Find the correct table: "Pbkdf2" or "Argon2". Confirm password normally.

Optional: Generate a new password nonce and calculate the password hash with "Argon2". Make an atomic transaction that inserts the new hashed password and nonce, and drops the old one.

I think that it is too late to move away from "Argon2", as there might be people using our unreleased v2 who have active users in this format.
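
The optional upgrade step above could be sketched roughly as follows. This is only an illustration under assumptions: the `Login` enum, the `login` function and both closures are hypothetical names, and the closures stand in for real Pbkdf2/Argon2 verification and hashing, which are not implemented here.

```rust
/// Outcome of a login attempt in this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
enum Login {
    /// The password did not match the stored hash.
    Rejected,
    /// The password matched and the stored hash is already Argon2.
    Accepted,
    /// The password matched a legacy hash; the caller should replace the
    /// stored value with this new hash inside a single transaction.
    AcceptedWithUpgrade(String),
}

/// `verify` and `hash_with_argon2` are injected so the sketch stays
/// independent of any hashing crate (assumption, not the project's API).
fn login(
    stored_hash: &str,
    password: &str,
    verify: impl Fn(&str, &str) -> bool,
    hash_with_argon2: impl Fn(&str) -> String,
) -> Login {
    // Always confirm the password against whatever hash is stored.
    if !verify(stored_hash, password) {
        return Login::Rejected;
    }
    if stored_hash.starts_with("$argon2id$") {
        Login::Accepted
    } else {
        // A valid password was presented, so it can be re-hashed now.
        Login::AcceptedWithUpgrade(hash_with_argon2(password))
    }
}
```

The caller would persist the new hash and drop the old one atomically, so a failure between the two steps cannot leave the user without a valid hash.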

@josecelano josecelano force-pushed the issue-56-db-migration-command-from-v1-to-v2 branch from 158e957 to 4afd4a5 Compare November 2, 2022 18:44
@josecelano
Member Author

hi @WarmBeer @da2ce7 ,

I did not know we had been using the PHC string format since version v1.0.0. That means the password hash column in the DB contains values like this:

$pbkdf2-sha256$i=10000,l=32$DGEHp7+y+kvTBUt3XdqFgA$JsK3jdz4Yz4hUPmWmPCyWk+NlgkdN9TCkbcjeQV68fA

I only had to import the hash from the old DB to the new DB field and change the login controller to use a different verification depending on the prefix of the hash value.

NOTE: that means we do not need to have two columns for both hash types and a third one with the name of the algorithm used.

In the previous example: pbkdf2-sha256.
That's the hash algorithm ID.
And this is the password verification in the login:

pub fn verify_password(
    password: &[u8],
    user_authentication: &UserAuthentication,
) -> Result<(), ServiceError> {
    // wrap string of the hashed password into a PasswordHash struct for verification
    let parsed_hash = PasswordHash::new(&user_authentication.password_hash)?;

    match parsed_hash.algorithm.as_str() {
        "argon2id" => {
            if Argon2::default()
                .verify_password(password, &parsed_hash)
                .is_err()
            {
                return Err(ServiceError::WrongPasswordOrUsername);
            }

            Ok(())
        }
        "pbkdf2-sha256" => {
            if Pbkdf2.verify_password(password, &parsed_hash).is_err() {
                return Err(ServiceError::WrongPasswordOrUsername);
            }

            Ok(())
        }
        _ => Err(ServiceError::WrongPasswordOrUsername),
    }
}
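
The prefix-based dispatch can be illustrated without any hashing crate. The sketch below only extracts the algorithm ID from a PHC string and maps it to a scheme; `HashScheme`, `phc_algorithm_id` and `scheme_for` are hypothetical names, and the real code relies on `PasswordHash::new` for parsing instead.

```rust
/// Hashing schemes the login code has to distinguish (hypothetical type).
#[derive(Debug, PartialEq)]
enum HashScheme {
    Argon2id,
    Pbkdf2Sha256,
}

/// Extracts the algorithm identifier from a PHC string (`$<id>$...`).
/// Returns `None` when the leading `$` or the identifier is missing.
fn phc_algorithm_id(phc: &str) -> Option<&str> {
    let rest = phc.strip_prefix('$')?;
    let id = rest.split('$').next()?;
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Maps the identifier to one of the two schemes accepted at login.
/// Anything else is treated as unsupported.
fn scheme_for(phc: &str) -> Option<HashScheme> {
    match phc_algorithm_id(phc)? {
        "argon2id" => Some(HashScheme::Argon2id),
        "pbkdf2-sha256" => Some(HashScheme::Pbkdf2Sha256),
        _ => None,
    }
}
```

Because the algorithm ID travels inside the hash value itself, a single column is enough to store both kinds of hashes.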

We can upgrade the hash when the user logs in if we want. Should I do it now in this PR? Are we going to do it at all? I would do it, to keep things simple, but in a new minor release (v2.1.0). It's an optional feature, and I would prefer to make the minimum number of changes in this upgrade. We are already making a lot of changes.

@da2ce7
Contributor

da2ce7 commented Nov 3, 2022

@josecelano Assigning the current date as the registration date is a loss of information. I suggest that we make that field optional and add a new field that contains the date of import.
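
A minimal sketch of that suggestion, assuming simplified row shapes: `UserV1`, `UserV2`, `import_user` and the `date_imported` field name are hypothetical and only illustrate leaving the registration date empty while recording when the import happened.

```rust
/// Hypothetical, trimmed-down shape of a v1.0.0 user row.
struct UserV1 {
    user_id: i64,
}

/// Hypothetical shape of the v2.0.0 row; both dates are nullable.
#[derive(Debug, PartialEq)]
struct UserV2 {
    user_id: i64,
    /// Unknown for imported users, so it stays empty instead of being guessed.
    date_registered: Option<String>,
    /// Set only for rows created by the upgrader.
    date_imported: Option<String>,
}

/// Converts an old row, keeping the ID and recording only the import time.
fn import_user(old: &UserV1, import_time: &str) -> UserV2 {
    UserV2 {
        user_id: old.user_id,
        date_registered: None,
        date_imported: Some(import_time.to_string()),
    }
}
```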

@josecelano
Member Author

hi @da2ce7, yes, it does not make sense at all. I was probably thinking about creating a new site, as if the users were registered at that moment (but the migration has to be transparent to the user). I realised it later, but I have not changed it yet. I think your suggestion is the right way to do it.

Besides, I wanted to propose another change for the future. It would be nice to have "created_at" and "updated_at" fields in all the tables. I think that could help with debugging. Ideally, we could get that info from the log, but it's very useful information when you are trying to find out what happened, and it's very handy to have it in the table.

@josecelano
Member Author

hi @WarmBeer, it seems this table was not used at all:

CREATE TABLE IF NOT EXISTS torrust_torrent_files (
    file_id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    torrent_id INTEGER NOT NULL,
    number INTEGER NOT NULL,
    path VARCHAR(255) NOT NULL,
    length INTEGER NOT NULL,
    FOREIGN KEY(torrent_id) REFERENCES torrust_torrents(torrent_id)
)

I see the migration in this commit 6d66a0a, but I do not see any DB query to use it. Could you confirm it?

@josecelano
Member Author

Hi @josecelano ,

The table is being used here:

https://github.com/torrust/torrust-index-backend/blob/e8a6fe75654e769d4d12b49d77bae1ff159bc13c/src/databases/mysql.rs#L416

Which is part of this function:

https://github.com/torrust/torrust-index-backend/blob/e8a6fe75654e769d4d12b49d77bae1ff159bc13c/src/databases/mysql.rs#L365

Sorry @WarmBeer, I did not explain it well. I mean, it was created before release v1.0.0, but it was not used after that release. Since I'm only working on migrating from v1.0.0 to v2.0.0, that means that table should be empty. I'm using the index with the backend v1.0.0 and I can upload a torrent but that table remains empty.

@mickvandijke
Member

Oh sorry @josecelano, you are right. This table is not used at all prior to V2.

@josecelano
Member Author

hi @WarmBeer, two questions:

  1. Were all torrents in v1.0.0 always public? I do not see any column to mark them as private.
  2. Infohashes were lowercase in v1.0.0 and uppercase in v2.0.0. I suppose it does not matter. I've converted them to uppercase.

@josecelano
Member Author

hi @WarmBeer @da2ce7, I'm done with transferring the data. Now I want to refactor a little bit and make some tests with more data and cases. And improve the documentation.

@WarmBeer, apart from the previous questions (1 and 2), I have one more:

  3. It seems that if the torrent has "announce" and "announce_list" (BEP-0012 for multiple trackers), we only add the "announce" (tracker URL) to the torrust_torrent_announce_urls table. But BEP-0012 says "announce_list" should have higher priority.

image

Our code for SQLite.

@mickvandijke
Member

Hi @josecelano , Sorry for the late response.

  1. Were all torrents in v1.0.0 always public? I do not see any column to mark them as private.

In both V1 and V2, torrents are always public on the torrent index. There is currently nothing in place to restrict viewing rights.

But if you are talking about the .torrent metadata setting, then no. The private flag would just be inside the .torrent file for V1, whereas in V2 it is in the database.

  2. Infohashes were lowercase in v1.0.0 and uppercase in v2.0.0. I suppose it does not matter. I've converted them to uppercase.

I do not actively remember making this change and what drove that decision. But I suppose it doesn't matter 😅.

  3. It seems that if the torrent has "announce" and "announce_list" (BEP-0012 for multiple trackers), we only add the "announce" (tracker URL) to the torrust_torrent_announce_urls table. But BEP-0012 says "announce_list" should have higher priority.

Good find. This is indeed a mistake. I will open a PR to prioritize the announce_list field over the announce field.
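
The intended priority could look roughly like the sketch below. The `tracker_urls` function and its parameter shapes are hypothetical, and falling back to "announce" when the list is present but empty is an assumption of this example, not something stated in the thread.

```rust
/// Returns the tracker URLs to store for a torrent.
/// `announce_list` is a list of tiers, each tier being a list of URLs.
fn tracker_urls(announce: Option<&str>, announce_list: Option<&[Vec<String>]>) -> Vec<String> {
    if let Some(tiers) = announce_list {
        // Flatten the tiers, keeping their original order.
        let urls: Vec<String> = tiers.iter().flatten().cloned().collect();
        if !urls.is_empty() {
            // The multi-tracker list wins over the single "announce" URL.
            return urls;
        }
    }
    // Fall back to the single tracker URL, if there is one.
    announce.map(|url| vec![url.to_string()]).unwrap_or_default()
}
```

Note that flattening discards the tier grouping, which is acceptable only if the stored table does not need to preserve tiers.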

@josecelano
Member Author

  1. OK, thank you. I've fixed it here.
  2. It does matter, but we should always use the same in our DB and UI. I prefer the lowercase version, but let me know which one you prefer to change the upgrader.
  3. OK, I will fix it in the upgrader.

@mickvandijke
Member

I usually prefer lowercase in the database and uppercase in the frontend.
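
That convention could be sketched as below, assuming 40-character hexadecimal (v1) infohashes. `normalize_infohash` and `display_infohash` are hypothetical helper names, not functions from the code base.

```rust
/// Normalizes an infohash for storage: validates it as 40 hex characters
/// and lowercases it. Returns `None` for anything else.
fn normalize_infohash(raw: &str) -> Option<String> {
    let is_v1_hex = raw.len() == 40 && raw.chars().all(|c| c.is_ascii_hexdigit());
    if is_v1_hex {
        Some(raw.to_ascii_lowercase())
    } else {
        None
    }
}

/// Formats a stored infohash for the frontend, where uppercase is preferred.
fn display_infohash(stored: &str) -> String {
    stored.to_ascii_uppercase()
}
```

Normalizing once at the storage boundary means lookups never depend on the case used by the upgrader or by an uploader.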

@mickvandijke
Member

#80

@josecelano josecelano force-pushed the issue-56-db-migration-command-from-v1-to-v2 branch from 231ba87 to 096e296 Compare November 9, 2022 18:16
@josecelano josecelano force-pushed the issue-56-db-migration-command-from-v1-to-v2 branch 3 times, most recently from 0eef98f to 7905fd9 Compare November 10, 2022 12:58
@josecelano
Member Author

hi @WarmBeer, I've added a test to check that one torrent is transferred correctly from the old DB and file to the new one. The torrent I use in the test is a torrent:

  • With one file.
  • Using "pieces" instead of "root_hash".
  • Using "announce_list" instead of "announce".
  • No MD5 checksum.

I want to add a second test with a new torrent:

  • With multiple files (folder).
  • Using "root_hash" instead of "pieces".
  • Using "announce" instead of "announce_list".
  • With MD5 checksum.

But I do not know how to force those options (root_hash, announce, and MD5 checksum) using the BitTorrent client:

image

I want to cover all cases with two torrents if possible. Do you know how I can create a torrent with those fields?

That's the last automatic test I want to add. After that, I will test the migrated DB with a running app.

@mickvandijke
Member

Hey @josecelano ,

I'm sorry, but I don't know of any alternative torrent file creation tools that support these extensions or BitTorrent v2. Even qBittorrent does not show any options for creating v2 torrent files.

@josecelano
Member Author

hey @WarmBeer, I did not know we don't support the BitTorrent Protocol Specification v2 (BEP 52).

I'm going to assume we only have v1 torrents. If that's the case, I'm almost done with the tests.

On the other hand, do you know why the md5sum field is always empty in TorrentInfo?

pub struct TorrentInfo {
    pub name: String,
    #[serde(default)]
    pub pieces: Option<ByteBuf>,
    #[serde(rename = "piece length")]
    pub piece_length: i64,
    #[serde(default)]
    pub md5sum: Option<String>,
    #[serde(default)]
    pub length: Option<i64>,
    #[serde(default)]
    pub files: Option<Vec<TorrentFile>>,
    #[serde(default)]
    pub private: Option<u8>,
    #[serde(default)]
    pub path: Option<Vec<String>>,
    #[serde(default)]
    #[serde(rename = "root hash")]
    pub root_hash: Option<String>,
}

I do not see any reference to that field in https://www.bittorrent.org/.

@josecelano josecelano force-pushed the issue-56-db-migration-command-from-v1-to-v2 branch from 0810f4a to f9baa2c Compare November 11, 2022 17:32
@josecelano josecelano marked this pull request as ready for review November 11, 2022 18:32
@josecelano josecelano requested a review from da2ce7 November 11, 2022 18:33
josecelano and others added 22 commits November 30, 2022 13:57
Extract different testers for every type of data transferred.
We do not need to read migrations from dir because they are not going to
change for version v1.0.0.
The application now supports two hashing methods:

- "pbkdf2-sha256": the old one. Only for imported users from DB version
  v1.0.0.
- "argon2": the new one for registered users.
Imported users from DB version v1.0.0 (only SQLite) do not have a
"date_registered" field. We have to copy that behavior in MySQL even if
we do not have users imported from previous versions in MySQL.

Support for MySQL was added after the version v1.0.0.
Instead of regenerating ID sequence we keep the category id.
Because we were keeping IDs for all tables except for this one.

@ldpr helped testing the migration script and found the issue with the
categories IDs.

Co-authored-by: ldpr <[email protected]>
@josecelano josecelano force-pushed the issue-56-db-migration-command-from-v1-to-v2 branch from 3c40e67 to 5a7d875 Compare November 30, 2022 13:59
@josecelano josecelano removed the Needs Rebase Base Branch has Incompatibilities label Nov 30, 2022
@josecelano
Member Author

ACK 5a7d875

@josecelano
Member Author

josecelano commented Nov 30, 2022

Hi @WarmBeer @da2ce7 @ldpr, I've changed my mind. Instead of importing stats automatically from the tracker, I've created a new console command to do it manually and optionally.

cargo run --bin import_tracker_statistics

You can run that command after running the upgrader if you want to pre-fill the stats cache table (torrust_torrent_tracker_stats).

The reason is that after merging #87 the app works fine again, showing all torrents on the list even if they do not have stats yet.

The app imports stats:

  • Every hour for all torrents (settings option)
  • When you upload a new torrent or
  • When you load the torrent detail page.

You can decide if you want to pre-populate the table or let the app fill it up.

Contributor

@da2ce7 da2ce7 left a comment

ACK 5a7d875

@josecelano josecelano merged commit 34cc8f5 into develop Nov 30, 2022
@josecelano josecelano deleted the issue-56-db-migration-command-from-v1-to-v2 branch November 30, 2022 15:40
Successfully merging this pull request may close these issues.

Provide a migration script for older Torrust versions
4 participants