Migrate home directories of Pangeo users to the 2i2c Hub #653
Comments
This issue is a MUST HAVE for the migration of the Pangeo GCP production cluster. |
Docs we already have on moving user directories: https://pilot-hubs.2i2c.org/en/latest/howto/operate/move-hub.html Since the new hub is using Google Filestore, I will need to mount that to a VM to be able to carry out the transfer. Instructions here: https://cloud.google.com/filestore/docs/creating-instances I'm keen to use This StackOverflow answer provides a Python class to merge SQLite db's https://stackoverflow.com/a/61954182 |
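For the database question, here is a minimal sketch of a non-destructive merge using SQLite's `ATTACH DATABASE` and `INSERT OR IGNORE`. It is not the class from the StackOverflow answer, and the `users` table with a unique `name` column is a hypothetical stand-in rather than the real JupyterHub schema.

```python
import sqlite3


def merge_users(target_db: str, source_db: str) -> int:
    """Copy user names from source_db into target_db without overwriting.

    Assumes (hypothetically) that both databases have a `users` table with a
    UNIQUE `name` column. Returns the number of rows added to the target.
    """
    conn = sqlite3.connect(target_db)
    try:
        # Make the source database visible under the schema name `src`.
        conn.execute("ATTACH DATABASE ? AS src", (source_db,))
        before = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        # OR IGNORE skips rows that would violate the UNIQUE constraint,
        # so existing target users are left untouched.
        conn.execute(
            "INSERT OR IGNORE INTO users (name) SELECT name FROM src.users"
        )
        conn.commit()
        after = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        conn.execute("DETACH DATABASE src")
        return after - before
    finally:
        conn.close()
```

Running this against copies of both databases first would be the safer way to try it.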
@paigem there is a wrinkle with migrating the COESSING users' home directories. Since the COESSING hub uses Google for auth and the Pangeo hub uses GitHub, the paths won't match up even if I do transfer the data, because they'll contain the users' emails, not their GitHub handles. Therefore, I strongly recommend they download their work locally and upload it to the new hub when the switch is made (both hubs will be available simultaneously for a short while), since there'll be no way for me to map emails to GitHub handles. |
@jhamman @scottyhq if either of you have the time to go through the GCP hub's NFS storage with me, I'd really appreciate it! It's not set up how I was expecting.
I could be looking in the wrong place though? |
Sorry, but I have only worked on the AWS infrastructure, so I won't be able to help here. |
I am the one who executed the previous migration of home directories from the older cluster (ocean.pangeo.io) to the current one (https://us-central1-b.gcp.pangeo.io/). This was very hard because the old cluster used ORCID (via Globus) for auth, so we had to create a mapping between ORCID and GitHub user name. Then we gzipped each user's home directory from one cluster and extracted it to the new cluster under a new username one at a time. Some of the scripts I used to do this were archived here: I believe that the script is telling us that the user homedirs should live in Is this at all helpful? |
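The per-user approach described above (archive one home directory, then extract it under a new username) could be sketched roughly as below. This is not the archived script: the function names, the directory layout, and the shape of the mapping (old name to GitHub user name) are all assumptions for illustration.

```python
import tarfile
from pathlib import Path
from typing import Dict


def archive_home(old_root: Path, old_name: str, archive_dir: Path) -> Path:
    """Gzip one user's home directory into archive_dir/<old_name>.tar.gz."""
    archive = archive_dir / f"{old_name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores paths relative to the home directory itself,
        # so the old username is not baked into the archive.
        tar.add(old_root / old_name, arcname=".")
    return archive


def restore_home(archive: Path, new_root: Path, new_name: str) -> Path:
    """Extract an archive into new_root/<new_name>, creating it if needed."""
    dest = new_root / new_name
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest


def migrate_homes(
    mapping: Dict[str, str], old_root: Path, new_root: Path, archive_dir: Path
) -> None:
    """Move home directories one user at a time, renaming via the mapping."""
    for old_name, new_name in mapping.items():
        archive = archive_home(old_root, old_name, archive_dir)
        restore_home(archive, new_root, new_name)
```

Note that extracting over an existing directory replaces files with the same name, so this sketch suits empty targets better than merges.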
Thanks Ryan, I'm sure the scripts will come in handy, but I'm still struggling to find anything!
Update: A different
|
Here is how I look at the home directories
I don't know what the VM instance |
Thanks @rabernat, the above worked. I wasn't aware the old cluster was using a filestore (which is good as that's what the new cluster is also using!). I've got the directories now :)
I actually think this is the VM for the last migration as it has |
I have successfully mounted each NFS filestore to a VM in each Google Cloud project and found the locations of existing user directories and where they should be copied to. However, I am now stuck establishing an ssh connection between the two VMs. Definitions:
What I did:
I also have logs from:
but wasn't sure what from those logs would be safe to copy-paste into a public issue |
Things changed:
|
I've narrowed this down to something being up with the I had trouble with the However, the |
I also double checked that I could ssh from target VM to source VM with the original ssh key as Basically, I think I botched the whole |
I think I needed to follow something like these instructions: https://www.digitalocean.com/community/tutorials/how-to-create-a-sudo-user-on-ubuntu-quickstart so I'll try that tomorrow. We should definitely update our docs on this! |
@sgibson91, quick question, what are the permissions and ownership in the |
I agree, and my current suspicion is that it's because the |
I tried your suggestion @damianavila, still with no luck 😭
|
Well... it should be chmod 700 on the .ssh directory, chmod 644 for public keys, and chmod 600 for private ones. Btw, maybe we can jump on a video call together? I have pinged you in Slack to find some time. |
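As a quick way to apply those modes in one go, here is a small sketch. The function name is made up, and treating every file that does not end in `.pub` as private (so `authorized_keys`, `known_hosts` and `config` also get 600) is a conservative assumption, not something stated above.

```python
import os
from pathlib import Path


def fix_ssh_permissions(ssh_dir: Path) -> None:
    """Apply 700 to the .ssh directory, 644 to *.pub files, 600 to the rest."""
    os.chmod(ssh_dir, 0o700)
    for entry in ssh_dir.iterdir():
        if not entry.is_file():
            continue  # leave subdirectories alone
        # Public keys may be world-readable; everything else is owner-only.
        mode = 0o644 if entry.suffix == ".pub" else 0o600
        os.chmod(entry, mode)
```

For example, `fix_ssh_permissions(Path.home() / ".ssh")` would be run as the user who owns the directory.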
I think a reasonable course of action would be to just exclude the Rotating SSH keys periodically would be a wise choice anyway. Also, please do not clobber my home directory. I have the same username on both systems. |
I am not trying to migrate the .ssh folder, I am trying to give the old VM a public ssh key from the new VM so that I can scp/rsync the home dirs across! At the minute I cannot transfer anything!
Ideally, I would like to use rsync so the 2 are merged rather than overwritten but I guess the only way for me to guarantee that your home directory is not clobbered would be for me to exclude it and for you to migrate it yourself? |
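For reference, a rough sketch of how such an rsync call could be assembled so that existing files on the target are never overwritten and chosen users are skipped. The helper name, host and paths are placeholders; `--archive`, `--ignore-existing`, `--exclude` and `--dry-run` are standard rsync options, but whether `--ignore-existing` is the right merge policy here is an assumption.

```python
from typing import List, Sequence


def build_rsync_command(
    source: str,
    dest: str,
    exclude_users: Sequence[str] = (),
    dry_run: bool = True,
) -> List[str]:
    """Build an rsync command that merges source into dest without clobbering."""
    # --archive preserves permissions, times and symlinks;
    # --ignore-existing skips any file already present on the receiver.
    cmd = ["rsync", "--archive", "--ignore-existing"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be copied, change nothing
    for user in exclude_users:
        # A leading slash anchors the pattern at the root of the transfer.
        cmd.append(f"--exclude=/{user}/")
    # Trailing slashes copy the *contents* of source into dest.
    cmd += [source.rstrip("/") + "/", dest.rstrip("/") + "/"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the dry-run output looks right.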
Oops, sorry for parachuting in with an irrelevant suggestion. I clearly misinterpreted the context.
This sounds perfect. So no special treatment needed. 👍 |
Ok, I'm now at the stage where I've managed to migrate 1 user home dir, but it has not migrated with the correct ownership. It has migrated with ownership |
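If ownership has to be corrected after the copy, a recursive chown along these lines is one option. This is only a sketch: the function name is hypothetical, and the uid/gid to use depend on how the hub's user pods are configured, which is not stated here. Changing ownership to another user normally requires root.

```python
import os
from pathlib import Path


def chown_tree(root: Path, uid: int, gid: int) -> int:
    """Set uid/gid on root and everything below it; return entries changed."""
    os.chown(root, uid, gid, follow_symlinks=False)
    changed = 1
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # follow_symlinks=False changes the link itself, not its target.
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
            changed += 1
    return changed
```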
I opened #753 to better document this process |
Update
So after a chat with @yuvipanda today, there are a couple of things I realised won't be strictly necessary for the migration, particularly for staging.
User home dirs on staging
There is a difference between how Pangeo currently uses NFS directories and how we at 2i2c have set them up. Currently, Pangeo has one single folder called Given that the Pangeo home dirs are ~1TB of data, by not copying them into the Hence the remaining to-do items for staging migration:
JupyterHub databases
I had wondered if I'd need to merge the two databases from the old and new JupyterHubs, but this is only critical if users have been added manually. Once #733 has been processed, that won't be the case, as auth is being handled "outside" the hub (i.e. by GitHub). The JupyterHub db has been designed to be transient and able to reconstruct itself from previous states, hence I don't think we need to do anything with the hub db. Remaining to-do items for prod migration:
|
I am scheduling the next data migration for 5th November 2021 ready for the prod hub to go live on 8th November 2021 |
Kicked off the migration process into the |
Migration completed! |
Description
There are many users that are currently on the old Pangeo JupyterHub (at https://us-central1-b.gcp.pangeo.io/). We should migrate their user home directories to the hub that we are deploying.
Value / benefit
This will minimize the disruption that these users feel when they migrate from one hub to the next.
Implementation details
We should understand whether we need to simply point the old hub's user filesystem to our new hub, or if we will have to move those filesystems instead.
Update 2021-10-06: We will be copying contents from one filesystem to another.
We'll need to make sure that the new hub is "ready to go" when this happens, because it will force many users to use the new hub since that's where their work will be.
Update 2021-10-06: Sarah plans to make posts in Pangeo discourse which will detail when the domain name switch will happen and when the last data migration happened before that date (a gap she will try to minimise to the best of her ability).
Since the new hub (https://pangeo.2i2c.cloud) has current users, at the hub database step of the move we will need to take care to merge the two databases rather than overwrite them, so no one's access is lost.
The old hub is at https://us-central1-b.gcp.pangeo.io/.
Tasks to complete
Investigate how to merge SQLite databases so we don't overwrite any of the new hub's user data. Not required. See Migrate home directories of Pangeo users to the 2i2c Hub #653 (comment)
Updates