Migrate home directories of Pangeo users to the 2i2c Hub #653

Closed
4 of 5 tasks
choldgraf opened this issue Sep 2, 2021 · 27 comments
Labels
Task Actions that don't involve changing our code or docs.

Comments

@choldgraf
Member

choldgraf commented Sep 2, 2021

Description

There are many users that are currently on the old Pangeo JupyterHub (at https://us-central1-b.gcp.pangeo.io/). We should migrate their user home directories to the hub that we are deploying.

Value / benefit

This will minimize the disruption that these users feel when they migrate from one hub to the next.

Implementation details

We should understand whether we need to simply point the old hub's user filesystem to our new hub, or if we will have to move those filesystems instead. Update 2021-10-06: We will be copying contents from one filesystem to another.

We'll need to make sure that the new hub is "ready to go" when this happens, because it will force many users to use the new hub since that's where their work will be. Update 2021-10-06: Sarah plans to make posts in Pangeo discourse which will detail when the domain name switch will happen and when the last data migration happened before that date (a gap she will try to minimise to the best of her ability).

Since the new hub (https://pangeo.2i2c.cloud) has current users, at the hub database step of the move we will need to take care to merge the two databases rather than overwrite them, so no one's access is lost.

The old hub is at https://us-central1-b.gcp.pangeo.io/.

Tasks to complete

Updates

@choldgraf added the Task (Actions that don't involve changing our code or docs.) and 🏷️ pangeo labels Sep 2, 2021
@rabernat
Contributor

This issue is a MUST HAVE for the migration of the Pangeo GCP production cluster.

@sgibson91
Member

Docs we already have on moving user directories: https://pilot-hubs.2i2c.org/en/latest/howto/operate/move-hub.html

Since the new hub is using Google Filestore, I will need to mount that to a VM to be able to carry out the transfer. Instructions here: https://cloud.google.com/filestore/docs/creating-instances

I'm keen to use rsync rather than scp, as I think it handles the question of when to overwrite files more elegantly: https://linuxize.com/post/how-to-transfer-files-with-rsync-over-ssh/

This StackOverflow answer provides a Python class to merge SQLite databases: https://stackoverflow.com/a/61954182

@sgibson91
Member

@paigem there is a wrinkle with migrating the COESSING users' home directories. Since the COESSING hub uses Google for auth and the Pangeo hub uses GitHub, the paths won't match up even if I do transfer the data, because they'll contain the users' emails, not their GitHub handles. Therefore, I strongly recommend they download their work locally and upload it to the new hub when the switch is made (both hubs will be available simultaneously for a short while), since there'll be no way for me to map emails to GitHub handles.

@sgibson91
Member

@jhamman @scottyhq if either of you have the time to go through the GCP hub's NFS storage with me, I'd really appreciate it! It's not set up how I was expecting.

  • I ssh'd into the homedir-manager-2 VM
  • Under /home there are some user dirs but mostly gke-<SOME_HASH> dirs, e.g.:
drwxr-xr-x 3 gke-efc6f9dd0553c8d21056 gke-efc6f9dd0553c8d21056 4096 Aug 16  2020 gke-efc6f9dd0553c8d21056
drwxr-xr-x 3 jhamman                  jhamman                  4096 Jun 26  2020 jhamman
  • I was expecting something that looked more like:
drwx------ 10 ubuntu ubuntu 4096 Jul 26 07:59 <GITHUB_ID>

I could be looking in the wrong place though?

@scottyhq
Contributor

scottyhq commented Oct 7, 2021

@jhamman @scottyhq if either of you have the time to go through the GCP hub's NFS storage with me, I'd really appreciate it! It's not set up how I was expecting.

Sorry, but I have only worked on the AWS infrastructure, so I won't be able to help here.

@rabernat
Contributor

rabernat commented Oct 8, 2021

I am the one who executed the previous migration of home directories from the older cluster (ocean.pangeo.io) to the current one (https://us-central1-b.gcp.pangeo.io/). This was very hard because the old cluster used ORCID (via Globus) for auth, so we had to create a mapping between ORCID and GitHub user name. Then we gzipped each user's home directory from one cluster and extracted it to the new cluster under a new username one at a time.

Some of the scripts I used to do this were archived here:
https://gist.github.com/rabernat/c9b352de926756342e86da662a0eadf9

I believe the script is telling us that the user homedirs should live in /mnt/nfs/uscentral1b/. However, I guess the absolute path depends on how the NFS volume is mounted.

Is this at all helpful?

@sgibson91
Member

sgibson91 commented Oct 8, 2021

Thanks Ryan, I'm sure the scripts will come in handy, but I'm still struggling to find anything!

sgibson@homedir-manager-2:~$ sudo ls -al /mnt/nfs
total 8
drwxr-xr-x 2 root root 4096 Jun 26  2020 .
drwxr-xr-x 3 root root 4096 Jun 26  2020 ..
sgibson@homedir-manager-2:~$ find uscentral1b -type d
find: ‘uscentral1b’: No such file or directory
sgibson@homedir-manager-2:~$ sudo find uscentral1b -type d
find: ‘uscentral1b’: No such file or directory
sgibson@homedir-manager-2:~$

Update: A different find command

sgibson@homedir-manager-2:/$ sudo find . -type d -name uscentral1b
sgibson@homedir-manager-2:/$

@rabernat
Contributor

rabernat commented Oct 9, 2021

Here is how I look at the home directories

I don't know what the VM instance homedir-manager-2 is.

@sgibson91
Member

Thanks @rabernat, the above worked. I wasn't aware the old cluster was using a filestore (which is good as that's what the new cluster is also using!). I've got the directories now :)

I don't know what the VM instance homedir-manager-2 is.

I actually think this is the VM for the last migration as it has ocean.pangeo.io/ under rpa 😂

@sgibson91
Member

sgibson91 commented Oct 11, 2021

I have successfully mounted each NFS filestore to a VM in each Google Cloud project and found the locations of existing user directories and where they should be copied to. However, I am now stuck establishing an ssh connection between the two VMs.

Definitions:

  • target VM: in the pangeo-integration-te GCP project
  • source VM: in the pangeo-181919 GCP project

What I did:

  • On the target VM, ran sudo -s and set a root password. This changed my prompt from sgibson@target-vm to ubuntu@target-vm
  • Created an ssh key pair on the target VM: ssh-keygen -f nfs-transfer-key. I made sure the output files were in ~/.ssh/
  • Copied nfs-transfer-key.pub from the target VM to the source VM
  • Attempted to copy the user directories from the source VM to the target VM, by running the following command on the target VM
    rsync -azvhP ubuntu@SOURCE-VM-IP:/mnt/filestore/uscentral1b/ /mnt/filestore/staging/
    
    I also tried this command:
    scp -p -r -i ~/.ssh/nfs-transfer-key ubuntu@SOURCE-VM-IP:/mnt/filestore/uscentral1b/ /mnt/filestore/staging/
    
    I tried both with and without sudo
  • Every time I try, I see the following error: Permission denied (publickey).

I also have logs from:

  • ssh -vvv ... from the target VM to the source VM; and
  • tail -f /var/log/auth.log

but I wasn't sure which parts of those logs would be safe to copy-paste into a public issue.
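
One detail that may be worth ruling out here: unlike the scp command, the rsync command above does not name the key, so unless an ssh config file or agent supplies it, ssh will offer only its default identities (such as ~/.ssh/id_rsa) and not nfs-transfer-key. A minimal sketch of passing the key explicitly follows. The helper name pull_with_key is hypothetical, and the paths are the placeholders already used in this comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and arguments are illustrative, not from the thread).
# Copies SRC to DST with rsync, telling ssh exactly which private key to offer.
pull_with_key() {
  local key="$1" src="$2" dst="$3"
  # -a archive, -z compress, -h human-readable sizes, -P progress + keep partial files
  # IdentitiesOnly=yes stops ssh from offering other keys before the named one.
  rsync -azhP -e "ssh -i ${key} -o IdentitiesOnly=yes" "${src}" "${dst}"
}

# On the source VM, the public key has to be appended to the login user's
# authorized_keys (a regular file, not a directory), for example:
#   cat nfs-transfer-key.pub >> ~/.ssh/authorized_keys
#
# Then, on the target VM (placeholders as used in this thread, not run here):
#   pull_with_key ~/.ssh/nfs-transfer-key \
#     ubuntu@SOURCE-VM-IP:/mnt/filestore/uscentral1b/ /mnt/filestore/staging/
```

If the key path contains spaces this sketch would need extra quoting inside the -e string; it assumes a simple path like the one above.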

@sgibson91
Member

Things changed:

  • /home/ubuntu/.ssh/authorized_keys on the source VM should be a file, not a directory (however, this did not resolve the above issue)

@sgibson91
Member

I've narrowed this down to something being up with the ubuntu user that we're trying to ssh as (see step 3). I can ssh into the source VM from my local machine using the keys I generated as sgibson (having listed the public part in /home/sgibson/.ssh/authorized_keys) but not as ubuntu (also having listed the public part of the key under /home/ubuntu/.ssh/authorized_keys).

I had trouble with the su ubuntu command listed in our docs since GCP VMs don't come configured with a root password, so I had to do something like sudo -s && su ubuntu, and I'm just not sure that has been set up properly.

However, the --archive [-a] option to rsync claims to preserve attributes of files; I'm hoping that includes the UID? If so, maybe we can just forget the ubuntu user part?

@sgibson91
Member

sgibson91 commented Oct 11, 2021

I also double checked that I could ssh from target VM to source VM with the original ssh key as sgibson AND I CAN. But rsync then doesn't work, I get the same "Permission denied (publickey)" error. However, scp was successful in copying over a single file 🙌🏻 BUT the user and group were root rather than ubuntu which is not what I think we're after.

Basically, I think I botched the whole su ubuntu part of the instructions here

@sgibson91
Member

I think I needed to follow something like these instructions: https://www.digitalocean.com/community/tutorials/how-to-create-a-sudo-user-on-ubuntu-quickstart so I'll try that tomorrow

We should definitely update our docs on this!

@damianavila
Contributor

damianavila commented Oct 12, 2021

@sgibson91, quick question, what are the permissions and ownership in the .ssh directories?
In the past, I have experienced "Permission denied" issues when the ownership and the permissions were not the expected ones... For instance, for the files under /home/ubuntu/.ssh, I would expect ownership by the ubuntu:ubuntu user/group, the .ssh directory with chmod 700, public keys with chmod 644, and private ones with chmod 600, IIRC.
From your description of the problem, it seems some ownership/permission issue is the underlying cause, IMHO.

@choldgraf choldgraf moved this to Needs input 🙌 in Sprint Board Oct 12, 2021
@sgibson91
Member

I would expect ownership by the ubuntu:ubuntu user/group

I agree, and my current suspicion is that it's because the ubuntu user/group doesn't exist and I'll need to set that up using the link in my previous comment.

@sgibson91
Member

@sgibson91, quick question, what are the permissions and ownership in the .ssh directories? In the past, I have experienced "Permission denied" issues when the ownership and the permissions were not the expected ones... For instance, for the files under /home/ubuntu/.ssh, I would expect ownership by the ubuntu:ubuntu user/group, the .ssh directory with chmod 700, public keys with chmod 644, and private ones with chmod 600, IIRC. From your description of the problem, it seems some ownership/permission issue is the underlying cause, IMHO.

I tried your suggestion, @damianavila, still with no luck 😭

ubuntu@pangeo-migration-vm:~$ ls -al
total 28
drwxr-xr-x 3 ubuntu ubuntu 4096 Oct 11 12:30 .
drwxr-xr-x 4 root   root   4096 Oct  6 10:06 ..
-rw------- 1 ubuntu ubuntu 2003 Oct 11 14:44 .bash_history
-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
drwx------ 2 ubuntu ubuntu 4096 Oct 11 14:01 .ssh
-rw-r--r-- 1 ubuntu ubuntu    0 Oct 11 11:47 .sudo_as_admin_successful
ubuntu@pangeo-migration-vm:~$ ls -al .ssh/
total 20
drwx------ 2 ubuntu ubuntu 4096 Oct 11 14:01 .
drwxr-xr-x 3 ubuntu ubuntu 4096 Oct 11 12:30 ..
-rw------- 1 ubuntu ubuntu    0 Oct  6 10:05 authorized_keys
-rw-r--r-- 1 ubuntu ubuntu  222 Oct 11 11:58 known_hosts
-rw------- 1 ubuntu ubuntu 2610 Oct 14 13:47 nfs-transfer-key
-rw-r--r-- 1 ubuntu ubuntu  580 Oct 14 13:47 nfs-transfer-key.pub
ubuntu@pangeo-migration-vm:~$ chmod 700 ~/.ssh/nfs-transfer-key.pub
ubuntu@pangeo-migration-vm:~$ chmod 644 ~/.ssh/nfs-transfer-key
ubuntu@pangeo-migration-vm:~$ scp -r -p -i ~/.ssh/nfs-transfer-key ubuntu@SOURCE-VM-IP:/mnt/filestore/uscentral1b/aaronspring/Climpred_demo.ipynb /mnt/filestore/staging/aaronspring/
ubuntu@SOURCE-VM-IP: Permission denied (publickey).
ubuntu@pangeo-migration-vm:~$

@damianavila
Contributor

damianavila commented Oct 14, 2021

Well... chmod 700 should be applied to the .ssh directory, chmod 644 to public keys, and chmod 600 to private ones.
Judging from the output you pasted above, I think you have them set differently (i.e. nfs-transfer-key should be 600 instead of 644).
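
The modes listed above can be sketched as a small helper. This is only an illustration: the function name fix_ssh_perms is hypothetical, and the key name in the usage comment is the one from earlier in this thread. The reasoning behind the modes is that the ssh client refuses a private key that other users can read, and sshd (with its default StrictModes setting) ignores authorized_keys when the file or its parent directories are writable by group or others.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the modes described above to one key pair.
fix_ssh_perms() {
  local ssh_dir="$1" key_name="$2"
  chmod 700 "${ssh_dir}"                    # directory: owner only
  chmod 600 "${ssh_dir}/${key_name}"        # private key: owner read/write only
  chmod 644 "${ssh_dir}/${key_name}.pub"    # public key: readable by everyone
  if [ -f "${ssh_dir}/authorized_keys" ]; then
    chmod 600 "${ssh_dir}/authorized_keys"  # owner read/write only
  fi
}

# Example (not run here):
#   fix_ssh_perms ~/.ssh nfs-transfer-key
#   ls -al ~/.ssh    # check the resulting modes
```

Ownership matters as well as mode, so the files would also need to belong to the user that ssh runs as (ubuntu:ubuntu in the case discussed here).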

Btw, maybe we can jump on a video call together? I have pinged you in Slack to find some time.

@rabernat
Contributor

I think a reasonable course of action would be to just exclude the $HOME/.ssh directory from the migration completely.

Rotating SSH keys periodically would be a wise choice anyway.

Also, please do not clobber my home directory. I have the same username on both systems.

@sgibson91
Member

sgibson91 commented Oct 14, 2021

I think a reasonable course of action would be to just exclude the $HOME/.ssh directory from the migration completely.

I am not trying to migrate the .ssh folder, I am trying to give the old VM a public ssh key from the new VM so that I can scp/rsync the home dirs across! At the minute I cannot transfer anything!

Also, please do not clobber my home directory. I have the same username on both systems.

Ideally, I would like to use rsync so the 2 are merged rather than overwritten but I guess the only way for me to guarantee that your home directory is not clobbered would be for me to exclude it and for you to migrate it yourself?
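
For reference, a sketch of what "merged rather than overwritten" could look like with rsync. The thread does not say which flags were eventually used, so this is one possible policy, not a record of the actual migration: without --delete, rsync never removes files that exist only on the destination, and --ignore-existing additionally skips any file the destination already has. The helper name merge_homedirs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge SRC into DST without touching files DST already has.
merge_homedirs() {
  local src="$1" dst="$2"
  # -a                 recurse; preserve modes, times and symlinks
  #                    (owner/group are only preserved when run as root)
  # --ignore-existing  skip any file that already exists on the destination
  # (no --delete)      files that exist only on the destination are left alone
  # The trailing slashes make rsync copy the contents of SRC into DST.
  rsync -a --ignore-existing "${src%/}/" "${dst%/}/"
}

# Example (placeholders as used in this thread, not run here):
#   merge_homedirs ubuntu@SOURCE-VM-IP:/mnt/filestore/uscentral1b /mnt/filestore/staging
```

A softer alternative is --update, which only skips files that are newer on the destination, so that older copies there are still refreshed from the source.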

@rabernat
Contributor

Oops, sorry for parachuting in with an irrelevant suggestion. I clearly misinterpreted the context.

I would like to use rsync so the 2 are merged rather than overwritten

This sounds perfect. So no special treatment needed. 👍

@sgibson91
Member

Ok, I'm now at the stage where I've managed to migrate 1 user home dir, but it has not migrated with the correct ownership. It has migrated with ownership ubuntu:root rather than ubuntu:ubuntu. I'm not sure if this is because I had to use sudo so rsync had permissions to create directories. I guess the worst case scenario here is that we run a recursive chown command over the filestore.
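
The worst-case fallback mentioned above could be sketched as follows. The helper names reown_tree and count_wrong_group are hypothetical, and the path in the usage comment is the staging placeholder from earlier in the thread. Recent rsync versions (3.1.0 and later) also offer a --chown=USER:GROUP option that sets ownership during the transfer, which might avoid the separate pass; like chown itself, it needs root on the receiving side to assign files to another user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reset ownership of everything under DIR.
# OWNER and GROUP may be names or numeric ids; giving files to another
# user requires root.
reown_tree() {
  local dir="$1" owner="$2" group="$3"
  chown -R "${owner}:${group}" "${dir}"
}

# Hypothetical helper: count entries under DIR whose group is not GROUP,
# as a quick before/after check.
count_wrong_group() {
  local dir="$1" group="$2"
  find "${dir}" ! -group "${group}" | wc -l
}

# Example (not run here; shell functions cannot be called through sudo,
# so the plain command is shown instead):
#   sudo chown -R ubuntu:ubuntu /mnt/filestore/staging
```

On ~1TB of small files a recursive chown can take a while, so running the count first on one user's directory is a cheap way to see whether it is needed at all.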

@sgibson91 sgibson91 moved this from Needs input 🙌 to In Progress ⚡ in Sprint Board Oct 15, 2021
@sgibson91
Member

I opened #753 to better document this process

@sgibson91
Member

sgibson91 commented Oct 19, 2021

Update

So after a chat with @yuvipanda today, there are a couple of things I realised won't be strictly necessary for the migration, particularly for staging.

User home dirs on staging

There is a difference between how Pangeo currently uses NFS directories and how we at 2i2c have set them up. Currently, Pangeo has one single folder called uscentral1b containing all directories regardless of a user logging in via the staging or prod hub. At 2i2c, we have configured separate subdirs for staging and prod and these do not mirror each other. This is so that if staging is breached, users' files are not accessible. It also gives the engineers the freedom to break staging without risking users' files.

Given that the Pangeo home dirs are ~1TB of data, by not copying them into the staging subfolder we save ourselves from needlessly doubling the NFS size or making users' files vulnerable to breaches on staging. This means that users' home dirs will not be available on staging after migration, but I think that's a fair expectation.

Hence the remaining to-do items for staging migration:

JupyterHub databases

I had wondered if I'd need to merge the two databases from the old and new JupyterHubs, but this is only critical if users have been added manually. Once #733 has been processed, that won't be the case as auth is being handled "outside" the hub (i.e. by GitHub). The JupyterHub db has been designed to be transient and able to reconstruct itself from previous states, hence I don't think we need to do anything with the hub db.

Remaining to-do items for prod migration:

@sgibson91 sgibson91 moved this from In Progress ⚡ to Todo 👍 in Sprint Board Oct 20, 2021
@sgibson91
Member

I am scheduling the next data migration for 5th November 2021, ready for the prod hub to go live on 8th November 2021.

@sgibson91 sgibson91 moved this from Todo 👍 to In Progress ⚡ in Sprint Board Nov 5, 2021
@sgibson91
Member

Kicked off the migration process into the prod folder on the filestore

@sgibson91
Member

Migration completed!

Repository owner moved this from In Progress ⚡ to Done 🎉 in Sprint Board Nov 8, 2021
5 participants