Research and define files migration without losing git history #5574

Closed
2 tasks done
pcapellan opened this issue Jul 15, 2024 · 5 comments

pcapellan commented Jul 15, 2024

Main issue
wazuh/wazuh-packages#2904

Description

We are currently migrating files from several repositories into a new one. In a normal migration, the change history of the files is lost, since they are simply 'moved' from one repository to another.

This issue aims to find a way to avoid this history loss and document the process that must be followed to achieve it.

Tasks

  • Research a method, plugin, or process that avoids losing the files' git history
  • Define a process to implement this solution

Conclusion

After researching and testing different solutions, I would like to highlight two methods that we could use to accomplish the migration. Neither method requires any plugin or installation apart from git itself:

Method 1 - Git filter-branch

Note

The CppServer team has defined a slightly different approach to this one. Link

  1. Clone the source repository and navigate to it:
    git clone <source_repository>
    cd <path/to/source/repository>
  2. Remove origin remote:
    git remote rm origin
  3. Select the directories to 'migrate' and clean the repository:
    git filter-branch -f --prune-empty --index-filter   'git rm --cached -r -q -- . ; git reset -q $GIT_COMMIT -- <dir_to_migrate_1> <dir_to_migrate_2> <dir_to_migrate_x>' -- <source_branch>

    We can list all the directories we want to migrate.

  4. Switch to the target repository's directory and add the source repository as a temporary local remote
    cd <path/to/target/repository>
    git remote add <source_name> <path/to/source/repository>
  5. Pull the files and their history from the source, and remove the temporary local remote
    git pull <source_name> <source_branch> --allow-unrelated-histories --no-rebase
    git remote rm <source_name>
Real-life example
  1. Clone the source repository and navigate to it:
    git clone https://github.com/wazuh/wazuh-jenkins.git
    cd wazuh-jenkins
  2. Remove origin remote:
    git remote rm origin
  3. Select the directories to 'migrate' and clean the repository
    git filter-branch -f --prune-empty --index-filter   'git rm --cached -r -q -- . ; git reset -q $GIT_COMMIT -- bootstrap/ new-jenkins/' -- master
    
    git gc --aggressive
  4. Switch to the target repository's directory and add the source repository as a temporary local remote
    cd ~/new_repo
    git remote add jenkins /home/wazuh-jenkins
  5. Pull the files and their history from the source, and remove the temporary local remote
    git pull jenkins master  --allow-unrelated-histories --no-rebase
    git remote rm jenkins

Check result:

~/new_repo$ ls
bootstrap  new-jenkins
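
The steps above can be rehearsed on throwaway repositories before touching real ones. The following is a minimal sketch, not the procedure run in this issue: the repository layout, the directory names (keep/, drop/), the commit messages and the demo identity are all hypothetical. FILTER_BRANCH_SQUELCH_WARNING only silences the advisory that recent git versions print before filter-branch starts.

```shell
#!/bin/sh
# Sandbox rehearsal of Method 1 (all names below are hypothetical).
set -eu

# Fixed identity and no filter-branch advisory, so the script runs unattended.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

WORK=$(mktemp -d)
SRC="$WORK/source"
DST="$WORK/target"

# Source repository: two commits touch keep/, one touches drop/.
git init -q "$SRC"
cd "$SRC"
git symbolic-ref HEAD refs/heads/main
mkdir keep drop
echo one > keep/a.txt;  git add . ; git commit -q -m "keep: add a"
echo x > drop/b.txt;    git add . ; git commit -q -m "drop: add b"
echo two >> keep/a.txt; git add . ; git commit -q -m "keep: update a"

# Step 3: rewrite history so that only keep/ survives.
# Commits that become empty (the drop/ one) are pruned.
git filter-branch -f --prune-empty --index-filter \
  'git rm --cached -r -q -- . ; git reset -q $GIT_COMMIT -- keep' -- main >/dev/null

# Target repository with one pre-existing, unrelated commit.
git init -q "$DST"
cd "$DST"
git symbolic-ref HEAD refs/heads/main
echo target > README; git add . ; git commit -q -m "target: initial commit"

# Steps 4-5: temporary local remote, pull with unrelated histories, clean up.
git remote add source "$SRC"
git pull -q --no-edit --no-rebase --allow-unrelated-histories source main
git remote rm source

# The migrated commits should now be listed for the keep/ path.
git log --oneline -- keep
```

Because the sandbox lives under a temporary directory, it can be deleted afterwards without affecting any real clone.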

Pros:

  • Just a few easy steps
  • We can list all the directories to migrate in only one command

Cons:

  • Slow when executing on a big source repository
  • Rewrites the local clone of the source repository: everything outside the migrated directories is removed from it

Method 2 - Git split

  1. Clone the source repository and navigate to it:
    git clone <source_repository>
    cd <path/to/source/repository>
  2. Remove origin remote:
    git remote rm origin
  3. Select the directories to 'migrate' and assign a new split branch to each one:
    git subtree split -P <dir_to_migrate_1> -b <split_branch_1>
    git subtree split -P <dir_to_migrate_2> -b <split_branch_2>
    git subtree split -P <dir_to_migrate_x> -b <split_branch_x>
  4. Switch to the target repository's directory and add the source repository as a local remote.
    cd <path/to/target/repository>
    git remote add <source_name> <path/to/source/repository>
  5. Fetch and merge each split branch. Note that git pull merges each branch's files at the root of the target repository; to place them under a specific directory, use git read-tree --prefix as in the real-life example below.
    git fetch <source_name> <split_branch_1>
    git fetch <source_name> <split_branch_2>
    git fetch <source_name> <split_branch_x>
    
    git pull  <source_name> <split_branch_1> --no-rebase --allow-unrelated-histories
    git pull  <source_name> <split_branch_2> --no-rebase --allow-unrelated-histories
    git pull  <source_name> <split_branch_x> --no-rebase --allow-unrelated-histories
  6. Cleanup
    git remote rm <source_name>
Real-life example
  1. Clone the source repository and navigate to it:
    git clone https://github.com/wazuh/qa-integration-framework.git
    cd qa-integration-framework/
  2. Remove origin remote:
    git remote rm origin
  3. Select the directories to 'migrate' and assign a new split branch to each one:
    git subtree split -P src/wazuh_testing/tools/monitors/ -b integration_monitors
    git subtree split -P src/wazuh_testing/constants/ -b integration_constants
  4. Switch to the target repository's directory and add the source repository as a local remote.
    cd ../new_repo
    git remote add integration /home/qa-integration-framework
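  5. Fetch and merge each split branch into a specific directory
    git fetch integration integration_monitors
    git fetch integration integration_constants
    
    # Record the split branch as a merge parent so that its commits stay reachable
    # (assumes new_repo already has at least one commit).
    git merge -s ours --no-commit --allow-unrelated-histories integration/integration_monitors
    git read-tree --prefix=monitors/ -u integration/integration_monitors
    git commit -am "Merge integration monitors to new repo"
    
    git merge -s ours --no-commit --allow-unrelated-histories integration/integration_constants
    git read-tree --prefix=constants/ -u integration/integration_constants
    git commit -am "Merge integration constants to new repo"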
  6. Cleanup
    git remote rm integration

Check results:

~/new_repo$ ls
constants  monitors
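
Method 2 can also be rehearsed in a sandbox. This is a minimal sketch under stated assumptions rather than the commands run in this issue: all repository, directory and branch names are hypothetical, and it uses the subtree-merge recipe (git merge -s ours --no-commit followed by git read-tree --prefix) so that the split branch becomes a parent of the resulting commit. It also assumes the target already has at least one commit, and it exits early if git subtree is not installed, since some distributions package it separately.

```shell
#!/bin/sh
# Sandbox rehearsal of Method 2 (all names below are hypothetical).
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# git subtree lives in git's contrib area and may be packaged separately.
if [ ! -x "$(git --exec-path)/git-subtree" ]; then
  echo "git subtree is not installed; skipping" >&2
  exit 0
fi

WORK=$(mktemp -d)
SRC="$WORK/source"
DST="$WORK/target"

# Source repository: two commits touch src/tools, one touches src/other.
git init -q "$SRC"
cd "$SRC"
git symbolic-ref HEAD refs/heads/main
mkdir -p src/tools src/other
echo v1 > src/tools/mon.py;  git add . ; git commit -q -m "tools: add monitor"
echo x > src/other/x.py;     git add . ; git commit -q -m "other: add x"
echo v2 >> src/tools/mon.py; git add . ; git commit -q -m "tools: update monitor"

# Step 3: extract the history of src/tools into its own branch.
git subtree split -P src/tools -b split_tools >/dev/null

# Target repository with one pre-existing commit.
git init -q "$DST"
cd "$DST"
git symbolic-ref HEAD refs/heads/main
echo target > README; git add . ; git commit -q -m "target: initial commit"

# Steps 4-5: fetch the split branch and merge it under monitors/.
git remote add source "$SRC"
git fetch -q source split_tools
git merge -q -s ours --no-commit --allow-unrelated-histories source/split_tools
git read-tree --prefix=monitors/ -u source/split_tools
git commit -q -m "Merge tools history under monitors/"

# Step 6: cleanup.
git remote rm source

# The split commits are now ancestors of HEAD.
git log --oneline --graph
```

One caveat of this layout: the older commits recorded the files at the root of the split branch, so a path-limited query such as git log -- monitors/ may list only the merge commit, while a plain git log shows the full migrated history.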

Pros:

  • Steps are simple and easy to understand
  • We can easily decide the destination folder for each directory/file migrated
  • The source repository folder does not get any directories removed

Cons:

  • Has more steps than the other option
  • We have to create a split branch for each directory to migrate
@pcapellan pcapellan added the level/subtask Subtask issue label Jul 15, 2024
@pcapellan pcapellan self-assigned this Jul 15, 2024
@wazuhci wazuhci moved this to Backlog in Release 4.10.0 Jul 15, 2024
@QU3B1M QU3B1M assigned QU3B1M and unassigned pcapellan Jul 16, 2024
@QU3B1M QU3B1M changed the title Research: migrated files history Research and define files migration without losing git history Jul 16, 2024
@wazuhci wazuhci moved this from Backlog to In progress in Release 4.10.0 Jul 16, 2024
@QU3B1M
Member

QU3B1M commented Jul 16, 2024

Update report

Researched git's native capabilities for importing code together with its history from another repository. Git appears to be able to do this: the --allow-unrelated-histories option exists for this purpose.

Testing the solution with a dummy repository:

  1. Clone the source repository and navigate to it:
    git clone https://github.com/QU3B1M/hardhat-token-farm.git
    cd hardhat-token-farm
  2. Remove origin remote:
    git remote rm origin
  3. Select the directory to 'migrate' and to keep the history from:
    git filter-branch --subdirectory-filter contracts/ -- --all
  4. Clean up:
    git reset --hard
    git gc --aggressive
    git prune
    git clean -fd
  5. Switch to the target repository's directory and add the source repository as a local remote
    cd ../target
    git remote add contracts ../hardhat-token-farm/
  6. Pull the files and their history from the source
    git pull contracts main --allow-unrelated-histories
    remote: Enumerating objects: 40, done.
    remote: Counting objects: 100% (40/40), done.
    remote: Compressing objects: 100% (28/28), done.
    remote: Total 40 (delta 10), reused 40 (delta 10), pack-reused 0
    Unpacking objects: 100% (40/40), 6.99 KiB | 1.40 MiB/s, done.
    From ../hardhat-token-farm
     * branch            main       -> FETCH_HEAD
     * [new branch]      main       -> contracts/main
  7. Remove the local remote
    git remote rm contracts
  8. Check the files and their history
    /target$ ls
    QBMToken.sol  TokenFarm.sol  test
    /target$ git log
    commit 3f68292c88864623b2148155823f485d964eaaa6 (HEAD -> master)
    Author: QU3B1M <[email protected]>
    Date:   Sat Apr 2 12:09:42 2022 -0300
    
        Issue tokens tests
    
    commit 1ea79f5b87e99a38bbda174c202175978d079b2e
    Author: QU3B1M <[email protected]>
    Date:   Sat Apr 2 11:46:24 2022 -0300
    
        Staking Tests
    
    commit d92e68634a02fc826fb1249731af379104fc1d2b

@wazuhci wazuhci moved this from In progress to Pending review in Release 4.10.0 Jul 16, 2024
@jnasselle
Member

Review

Nice job @QU3B1M! Just one question: could you please try this with two repos? I mean, given repos A and B, create a new repo C containing some folders from A and some folders from B. This is closer to our real scenario, since we'll be merging several repos into one.
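
The two-repository scenario described in this review can be tried in a sandbox. The following is a minimal sketch, assuming the index-filter approach of Method 1; the repository names (repo_a, repo_b, repo_c), directory names and the make_source helper are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sandbox sketch: folders from repos A and B combined in a new repo C.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

WORK=$(mktemp -d)

# make_source NAME DIR (hypothetical helper): builds a throwaway repository
# with two commits inside DIR and one outside it, then rewrites its history
# so that only DIR remains.
make_source() {
  git init -q "$WORK/$1"
  cd "$WORK/$1"
  git symbolic-ref HEAD refs/heads/main
  mkdir "$2" misc
  echo 1 > "$2/file.txt";  git add . ; git commit -q -m "$1: add $2/file.txt"
  echo x > misc/noise.txt; git add . ; git commit -q -m "$1: unrelated change"
  echo 2 >> "$2/file.txt"; git add . ; git commit -q -m "$1: update $2/file.txt"
  git filter-branch -f --prune-empty --index-filter \
    "git rm --cached -r -q -- . ; git reset -q \$GIT_COMMIT -- $2" -- main >/dev/null
}

make_source repo_a dir_a
make_source repo_b dir_b

# Repository C starts with a single empty commit.
git init -q "$WORK/repo_c"
cd "$WORK/repo_c"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "repo_c: initial commit"

# Pull each filtered source in turn through a temporary local remote.
for src in repo_a repo_b; do
  git remote add "$src" "$WORK/$src"
  git pull -q --no-edit --no-rebase --allow-unrelated-histories "$src" main
  git remote rm "$src"
done

# Expect one initial commit, two commits per source and two merge commits.
git log --oneline --graph
```

This works without conflicts here because the two sources contribute different top-level directories; sources that share paths would need the directories relocated first, for example with the read-tree --prefix step of Method 2.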

@wazuhci wazuhci moved this from Pending review to On hold in Release 4.10.0 Jul 16, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 4.10.0 Jul 16, 2024
@QU3B1M
Member

QU3B1M commented Jul 17, 2024

On hold 🔐

This issue will remain on hold until 4.9.0 release testing is complete.


Issue resumed

@wazuhci wazuhci moved this from In progress to On hold in Release 4.10.0 Jul 17, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 4.10.0 Jul 17, 2024
@QU3B1M
Member

QU3B1M commented Jul 18, 2024

Update report

Working on a more realistic solution. Having some trouble when migrating "complex" paths. WIP

@wazuhci wazuhci moved this from In progress to Pending review in Release 4.10.0 Jul 19, 2024
@damarisg damarisg changed the title Research and define files migration without losing git history Spike - Research and define files migration without losing git history Jul 22, 2024
@wazuhci wazuhci moved this from Pending review to In review in Release 4.10.0 Jul 22, 2024
@damarisg
Member

We will work with Method 2 - Git split.

LGTM!

@damarisg damarisg moved this from In review to Done in Release 4.10.0 Jul 22, 2024
@damarisg damarisg changed the title Spike - Research and define files migration without losing git history Research and define files migration without losing git history Aug 8, 2024