[New Hub] Callysto #1439

Closed
4 of 9 tasks
colliand opened this issue Jun 20, 2022 · 39 comments · Fixed by #1710

@colliand
Contributor

colliand commented Jun 20, 2022

Hub Description

2i2c will work with @ianabc from PIMS and @byrcyb from Cybera (perhaps others?) to transition the current Callysto hub to a new service managed by 2i2c.

The Callysto hub should be a "standard education hub" since it will be used by students and teachers in grades 5-12 who are making their initial forays into data-intensive studies.

Community Representative(s)

@ianabc @byrcyb

Important dates

Note: the target and required dates are outdated; we need to update them according to new information and prioritization.

  • Target start date: 2022-07-08 end of August?
  • Required start date: 2022-07-15 mid September?
  • Any important dates for usage: Fall term (end of September)

Hub Authentication Type

Other (may not be possible, please specify in comments)

Hub logo information

Hub user image

Extra features you'd like to enable

  • Specific cloud provider or datacenter: must be in Canada; need input here from @byrcyb
  • Dedicated Kubernetes cluster
  • Scalable Dask Cluster

Other relevant information

No response

Hub URL

..2i2c.cloud

Hub Type

No response

Tasks to deploy the hub

  • Engineer who will deploy the hub is assigned
  • Deploy information filled in above
  • Initial Hub deployment PR: Add callysto cluster and hub #1649
  • Administrators able to log on
  • Community Representative satisfied with hub environment
  • Hub now in steady-state
@damianavila
Contributor

@colliand request ack. I have added the issue to our backlog board and we will prioritize the hub deployment according to available engineering resources so we can deploy the hub in a timely manner.

@ianabc @byrcyb, we will ping you soon with some questions about the specifics of the hub deployment.

@damianavila damianavila removed their assignment Jun 21, 2022
@GeorgianaElena
Member

GeorgianaElena commented Jun 27, 2022

After following the discussion in the leads repo, I tried to come up with a summary of the things we know so far and the info we need in order to deploy this hub 🚀

Please feel free to update it with any info that I've missed and ask any questions that I didn't.

What we know

  • the new hub to be deployed in its own cluster, on GCP
  • the data to be stored in Canada
  • we should use some form of ids as hub usernames and not emails

Info missing about the hub

  • 1. Google has datacenters in Montréal and Toronto. Is there any preference between the two?
  • 2. Which identity providers should be supported?
    I see Google, Microsoft and institutional providers in the example at https://www.callysto.ca/2018/12/04/how-does-callysto-keep-student-and-teacher-data-secure.
  • 3. Hub logo information section
    The hub landing page can have a specific logo and a link attached to it. This generally points to a community or institutional website. Which image and link should be used?
    • URL to Hub Image: {{ URL HERE }}
    • URL for Image Link: {{ URL HERE }}
  • 4. Hub user image section
    It is possible to customize the user image for the hub, as long as the image is in a public registry
    • Repository for user image: { REPO LINK IF IT EXISTS }
    • User image registry: { REGISTRY IF ONE ALREADY EXISTS }
    • User image tag and name: { NAME AND TAG IF IT EXISTS }
  • 5. Are there any extra features needed?
    • Scalable Dask Cluster?
    • GPUs?

Questions

where @yuvipanda's thoughts can help ✨

  • for the hub usernames I was thinking of using the id in the sub claim with either Auth0 or CILogon. I was thinking mostly about the latter, since logging in with an institutional provider will probably be wanted. What do you think?
  • the cluster should be regional, right?
  • is there any other info that we're still missing to create the cluster that I still haven't asked?

@GeorgianaElena GeorgianaElena self-assigned this Jun 27, 2022
@yuvipanda
Member

Thanks for working on this, @GeorgianaElena!

Cluster should definitely be regional!

I think CILogon is the way to go here, as it lets us use Google / MS / Institutional providers. Question is, would we need to limit which institutions? We can probably limit them by listing out their cilogon IdP ids.

Re: Toronto vs Montreal, my current suggestion is to pick Montreal because they do have GPU availability (https://cloud.google.com/compute/docs/gpus/gpu-regions-zones), so if we ever need GPUs in the future we can do that. Neither GPUs nor Dask are required now.

For usernames, we should find something that is:

  1. stable (doesn't change across logins)
  2. Not user controlled
  3. Non-reversible

Does id match these criteria? If we can't find any in the info CILogon sends us, we can instead use an HMAC to generate a non-reversible id from something like eppn (does Microsoft or Google auth set these?). We will have to keep a secret key used for this, however. That shouldn't be too difficult, given we already have secret storage infrastructure.
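
To make the HMAC fallback concrete, here is a minimal sketch using only the Python standard library. The function name, the example identifier, and the key are all hypothetical; SHA-256 is an assumed choice of digest, and in practice the key would be loaded from secret storage rather than written inline.

```python
import hashlib
import hmac


def pseudonymous_username(identifier: str, secret_key: bytes) -> str:
    """Derive a stable, non-reversible username from an identifier such as an eppn."""
    # Keyed hash: the same identifier and key always give the same digest,
    # but the identifier cannot be recovered from the digest.
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical values for illustration only.
key = b"replace-with-a-long-random-secret"
username = pseudonymous_username("student@example.org", key)
print(username)
```

The result is stable across logins as long as the key does not change, and it is not user controlled because the user never sees or supplies the key. Rotating the key would change every username, so the key would need to be treated as long-lived.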

@yuvipanda
Member

Who would manage the GCP billing account?

@byrcyb

byrcyb commented Jun 28, 2022

Thanks @GeorgianaElena and @yuvipanda. We (Callysto/Cybera/PIMS) are going to manage the billing account and are currently investigating which public cloud provider (AWS, GCP, Azure) can provide us with cost-effective options that are located at a Canadian data centre. If GCP is what we go with then Montreal seems like quite a reasonable choice. Are there any specific requirements that 2i2c has that we should pass onto these cloud providers when we ask for options/quotes?

@byrcyb

byrcyb commented Jun 28, 2022

Note, I don't think our investigation will take too long as we already have a rough estimate from one provider and have conversations going on with the other two.

@damianavila
Contributor

damianavila commented Jun 28, 2022

For additional context about the above assignments, @GeorgianaElena will be the lead developer for this hub, with @yuvipanda assisting as a secondary helper. This is maybe already obvious from the above comments, just making it explicit 😉.

@GeorgianaElena
Member

Does id match these criteria?

@yuvipanda, according to the cilogon docs here and here, I believe so. They mainly talk about the cilogon sub claim which is of the form http://cilogon.org/serverA/users/<id> as matching these criteria and since all the other parts are const (I think), then id must also match it.
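
As a small illustration of that sub claim shape, the sketch below pulls the trailing id out of a URL of the form quoted above. The helper name and the numeric id are made up for the example, and the check on the `users` path segment is an assumption about the format rather than something taken from the CILogon docs.

```python
from urllib.parse import urlparse


def id_from_sub(sub: str) -> str:
    """Return the trailing <id> segment of a CILogon-style sub claim."""
    path = urlparse(sub).path  # e.g. "/serverA/users/12345"
    parts = [p for p in path.split("/") if p]
    # Assumed layout: .../users/<id>; anything else is rejected.
    if len(parts) < 2 or parts[-2] != "users":
        raise ValueError(f"unexpected sub claim format: {sub!r}")
    return parts[-1]


# The numeric id here is made up for illustration.
print(id_from_sub("http://cilogon.org/serverA/users/12345"))
```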

@yuvipanda
Member

@GeorgianaElena sounds great! So this is now waiting on figuring out cloud provider and getting back to us.

@GeorgianaElena
Member

@byrcyb do you have any updates about the cloud provider preference? Thanks!

@byrcyb

byrcyb commented Jul 4, 2022

The person who's in charge of this is on vacation until Wednesday of this week. So, unfortunately, no new updates. I'll be in touch shortly once I hear anything.

@damianavila damianavila moved this from Needs Shaping / Refinement to Waiting in DEPRECATED Engineering and Product Backlog Jul 5, 2022
@damianavila
Contributor

@byrcyb, do you have any updates?

Btw, can you confirm the "real" usage will start by the Fall term (by the end of August, am I correct)?
I am trying to get a better sense of the timeline so we can better accommodate (and secure) the technical resources to fulfill this new hub request in a timely manner.
Thanks!

@beakkay

beakkay commented Jul 7, 2022

Afternoon team, @damianavila, @GeorgianaElena, @yuvipanda
I have set up a meeting with Google cloud to discuss the technical aspects of hosting the Callysto Hub with them, with 2i2c managing the hub.
Who will be available to meet with Cybera and Google to review the technical aspects and set out the plan going forward?
Just to be clear, we have not set up an agreement with Google; this meeting is to flesh out all that is needed and to project some of the applicable costs.
The tentative meeting is for 3 pm MT, Tuesday 12 July.
As soon as we know who we can include in the meeting, please provide their email addresses in order for me to forward the meeting invite.
Thank you for your assistance and support with the project
Elmar - Callysto Project Coordinator
(my email for any questions is [email protected])

@colliand
Contributor Author

colliand commented Jul 8, 2022

Thanks @beakkay! 2i2c regularly launches hub service on Google cloud so a review engagement with our engineering team will not be required on our side. We will just need access to the account to set up the hub infrastructure. This page in our documentation provides insights into cloud cost forecasting. Callysto collaborator @ianabc can likely also provide guidance on cloud costs.

@beakkay

beakkay commented Jul 8, 2022 via email

@byrcyb

byrcyb commented Jul 18, 2022

@damianavila in terms of "real" usage, yes, Fall term is what we are aiming for but end of September is more the target date than end of August.

@yuvipanda
Member

@damianavila asked me to comment about what we would need from a cloud provider as 'requirements'. I think it's:

  1. Making sure we (2i2c) have rights to grant unrestricted access to anyone we need in the cloud project
  2. We have access to the billing tools so we can look at cost reports
  3. There aren't additional technical restrictions in place enforced on us

Additionally, we support all three major cloud providers, but at least personally I'd love for it to not be Azure haha :)

@yuvipanda
Member

Note that these requirements are basically 'standard' - anyone creating any account on these cloud providers with a credit card basically meets all these requirements already. I'm listing them so additional restrictions don't get put on during negotiations with cloud provider.

@beakkay

beakkay commented Aug 9, 2022

Afternoon,

We have successfully created and updated the GCP environment on which the Callysto hub will be hosted.
The last step is to add the email address that will be allowed access to the environment.
Could you please provide the email addresses we can add to the environment as administrator/s?

Thank you

@sgibson91
Member

Hello @beakkay, could you please add all members of the Open Engineering team listed here? Their email addresses are linked beneath their profiles.

@Chealion

Owner access has been granted to the Open Engineering Team - let us know if there's anything else we should do to assist with locking down or removing folks in case of any staffing changes.

@damianavila damianavila moved this from Waiting to Needs Shaping / Refinement in DEPRECATED Engineering and Product Backlog Aug 10, 2022
@damianavila
Contributor

Thanks @Chealion!

Btw, can we go back and revisit the questions raised by @GeorgianaElena on this post: #1439 (comment). That information would be needed to start with the deployment process. Thanks!

@byrcyb

byrcyb commented Aug 11, 2022

Yes, thanks @damianavila, we're working on this now.

@byrcyb

byrcyb commented Aug 17, 2022

Here's what we have so far and a question about logos:

Info missing about the hub

  • 1. Google has datacenters in Montréal and Toronto. Is there any preference between the two?

Montreal

  • 2. Which identity providers should be supported?

Yes, we're ok with sticking with Google and Microsoft. We don't require the institutional identity providers anymore.

  • 3. Hub logo information section
    The hub landing page can have a specific logo and a link attached to it. This generally points to a community or institutional website. Which image and link should be used?

    • URL to Hub Image: {{ URL HERE }}
    • URL for Image Link: {{ URL HERE }}

We have quite a few Hub and Image logos to choose from. Are there any specifications you require in terms of layout (e.g. vertical or horizontal), resolution, format or other?

  • 4. Hub user image section
    It is possible to customize the user image for the hub, as long as the image is in a public registry

  • 5. Are there any extra features needed?

    • Scalable Dask Cluster?
    • GPUs?

We don't have any extra features needed at the moment but we are interested in having the classic / retro notebook as the default to start with for the moment. I don't know if that's something we would discuss now or later though.

Thank you

@GeorgianaElena
Member

GeorgianaElena commented Aug 18, 2022

Thanks a lot for filling in this info @byrcyb.

We have quite a few Hub and Image logos to choose from. Are there any specifications you require in terms of layout (e.g. vertical or horizontal), resolution, format or other?

Not really. This image will go right above the login button. Check out the 2i2c staging hub https://staging.2i2c.cloud as an example.

Also, do you have any preference regarding the hub URL? Is http://callysto.2i2c.cloud/ ok?

@byrcyb

byrcyb commented Aug 19, 2022

No problem, thanks for your patience on this @GeorgianaElena

For the hub image logo you can use https://www.callysto.ca/wp-content/uploads/2022/08/Callysto-HUB_vertical.png
For the image url you can use https://www.callysto.ca/

We're still figuring out the hub URL question. My preference would be to keep it as hub.callysto.ca but I'm not sure if that will cause any problems since that's our current hub url. @ianabc thoughts?

@ianabc
Contributor

ianabc commented Aug 22, 2022

Yeah, our current hub is on hub.callysto.ca and it's been around for long enough that there are a lot of links pointing there. It might be worth doing something like 2i2c.callysto.ca for consistency with the domain, then we can update the DNS records for hub.callysto.ca after the move to make sure nobody gets lost.

@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Aug 25, 2022
@GeorgianaElena
Member

Updates

  • @byrcyb and @ianabc, there are now staging (https://staging.callysto.2i2c.cloud) and prod (https://callysto.2i2c.cloud) hubs running in the new callysto cluster. PR here: Add callysto cluster and hub #1649.

  • Right now the hubs are locked down because access is only allowed to a list of users (currently this is just me). The hub uses the oidc claim to identify a hub username, which is a unique and opaque id.

  • In order to add you as hub admins to these hubs, we need to know your IDs. You can find them by going to https://cilogon.org/testidp and logging in with the Google or Microsoft provider and the account you would like to access the hub with. Under User Attributes you should find the OpenID, as in the screenshot below.

Screenshot 2022-08-31 at 12 53 53

Question

  • In order to give access to other hub users, you would need to add their OpenIDs through the Hub Admin Panel.
    Is this something that you'd like to do, or would you like the hub to be open to anyone with a Google or Microsoft account instead?

Yeah, our current hub is on hub.callysto.ca and it's been around for long enough that there are a lot of links pointing there. It might be worth doing something like 2i2c.callysto.ca for consistency with the domain, then we can update the DNS records for hub.callysto.ca after the move to make sure nobody gets lost.

@ianabc, do you want to set a CNAME to callysto.2i2c.cloud so I can set the hub URL to be 2i2c.callysto.ca?

@ianabc
Contributor

ianabc commented Aug 31, 2022

Thanks @GeorgianaElena. I've gone through the cilogon process and I'll send my openID in slack. I suspect that we will want to open the hub more generally to any microsoft or google account, which is what we are currently doing on hub.callysto.ca.

Related to that though, we need to manage the migration of user data from the old hub to the new one. On the old hub, accounts are identified by a hash (computed as a function of their email or account identifier) and I think we are doing something similar on the new hub. I started writing an extension for the old hub to help us capture the relevant information, but we'll need to figure out what their new hash/identifier would be so we can map the accounts. Also, there might be a better way of doing this than the extension I was writing. Do you know how users are identified on the new hub? e.g. is their storage related to their OpenID value or something like that?
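
One way to frame the mapping problem, as a rough sketch only: if each old account identifier can be paired with the user's new OpenID, the old hashed usernames can be translated with a simple lookup table. Everything named here is hypothetical. In particular, the old hub's hash function is not shown in this thread, so it is passed in as a parameter and the demo uses a trivial stand-in.

```python
from typing import Callable, Dict, Iterable, Tuple


def build_account_map(
    accounts: Iterable[Tuple[str, str]],
    old_hash: Callable[[str], str],
) -> Dict[str, str]:
    """Map old-hub usernames (hashes) to new-hub usernames (OpenIDs).

    accounts: pairs of (old account identifier, new OpenID).
    old_hash: the old hub's hashing function, supplied by the caller.
    """
    mapping: Dict[str, str] = {}
    for old_identifier, new_openid in accounts:
        old_username = old_hash(old_identifier)
        # Refuse to silently overwrite if two OpenIDs claim the same old account.
        if old_username in mapping and mapping[old_username] != new_openid:
            raise ValueError(f"conflicting OpenIDs for {old_username}")
        mapping[old_username] = new_openid
    return mapping


def demo_hash(identifier: str) -> str:
    """Stand-in for illustration; NOT the old hub's real hash function."""
    return "old-" + identifier.lower()


pairs = [("Teacher@example.org", "openid-0001")]
print(build_account_map(pairs, demo_hash))
```

The resulting table could then drive a copy of each old home directory to the storage location for the matching new username, assuming storage on the new hub is keyed by username.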

@ianabc
Contributor

ianabc commented Aug 31, 2022

I've added the CNAME record, it should be propagating now.

@GeorgianaElena
Member

Thanks a lot @ianabc! Hub is now running at https://2i2c.callysto.ca and I've added you as an admin and also removed the allowed users list, which means that it matches the access level of the old hub.

@GeorgianaElena
Member

Do you know how users are identified on the new hub? e.g. is their storage related to their OpenID value or something like that?

Yes, so the hub should only be aware of the users' OpenIDs since these are the hub usernames.

@GeorgianaElena
Member

@ianabc and @byrcyb, can you please confirm you were able to log in to the hub and everything works as expected? Thanks

@byrcyb

byrcyb commented Sep 7, 2022

hi @GeorgianaElena I haven't had a chance to test all of the functionality yet, but, I was able to login successfully. A couple of initial questions:

  • I noticed that my first login took over 4 minutes from when the server was requested to when I landed in the Hub Dashboard. Is that expected behaviour?
  • In the Hub Dashboard, there are two folders shared and shared-readwrite. What are those for? Can that be configured?
  • In the Hub Dashboard, the logo is the JupyterHub logo, can we replace that with the horizontal CallystoHub logo?

image

  • In the Hub Dashboard, is it possible to not show the "Clusters" tab?
  • I was able to create a notebook and use a nbgitpuller link to pull in a notebook repository so that seems to be working.

@GeorgianaElena
Member

I was able to login successfully

Yay 🎉

  • I noticed that my first login took over 4 minutes from when the server was requested to when I landed in the Hub Dashboard. Is that expected behaviour?

I believe it took this long because spawning the server required a new node to be created, which usually takes a couple of minutes. Since this was your first login, the user image probably also had to be downloaded after the node came up, which I suspect added a couple more minutes.
Subsequent server starts shouldn't take this long. If a lot of users are expected at once, we could also keep a number of nodes available all the time to eliminate the spin-up time. But my suggestion is to wait until the hub gets more usage.

  • In the Hub Dashboard, there are two folders shared and shared-readwrite. What are those for? Can that be configured?

We have docs about how to use these folders here: https://docs.2i2c.org/en/latest/admin/howto/data.html

  • In the Hub Dashboard, the logo is the JupyterHub logo, can we replace that with the horizontal CallystoHub logo?
  • In the Hub Dashboard, is it possible to not show the "Clusters" tab

I believe we have the infrastructure to easily change everything that's part of the hub UI, but I will need to double-check these two, as I'm not confident they are part of the hub's UI rather than the notebook's. I will come back with a clearer answer asap.

  • I was able to create a notebook and use a nbgitpuller link to pull in a notebook repository so that seems to be working.

Yuhuuuuu 🚀

@byrcyb

byrcyb commented Sep 12, 2022

Hi @GeorgianaElena I showed the hub to some high school educators on our team and there were some comments and questions about the CILogon page that is encountered when logging into the hub. Are there ways to customize:

  1. The CILogon Logo - it's confusing for users since they don't know who CILogon is and may not trust that this is part of the normal login process.
  2. The language used to make it more user friendly as it relates to the "Consent to Attribute Release" - e.g. not sharing this data (attributes) with any 3rd parties, and why we are asking for it.
  3. The mechanism to select the Identity Provider? Currently it's a drop-down but can we use the Logos that people select instead which is similar to what we currently use in hub.callysto.ca

I'm also not sure if this is part of the custom jupyter-server templates issue that you linked 😄 #1697

@GeorgianaElena
Member

In the Hub Dashboard, the logo is the JupyterHub logo, can we replace that with the horizontal CallystoHub logo?
In the Hub Dashboard, is it possible to not show the "Clusters" tab

@byrcyb, this is now almost done. I have not deployed it to the main hub yet because I am waiting for feedback from the team, but you can check it out on the staging hub if you want, which is running at https://staging.callysto.2i2c.cloud/hub/home 🚀

there were some comments and questions about the CILogon page that is encountered when logging into the hub. Are there ways to customize:

Unfortunately these CILogon related customizations are not possible. But we will prioritize deliberating whether this is something we should support as part of our infrastructure.

@byrcyb

byrcyb commented Sep 16, 2022

Thanks for the update @GeorgianaElena.

I was doing more testing and noticed that downloading our Callysto notebooks as PDF via LaTeX or HTML throws a 500: Internal Server Error page. I'm not sure if this is a hub issue, perhaps something with nbconvert, or something else. Any help on this is greatly appreciated. Here's the error thrown when I try to download PDF via HTML:

nbconvert failed: Pyppeteer is not installed to support Web PDF conversion. Please install nbconvert[webpdf] to enable.

The error log when I try to download PDF via LaTeX is much longer so I've attached it (as a PDF).
500 - Internal Server Error.pdf

@GeorgianaElena
Member

  • @byrcyb, it looks like you're missing the pyppeteer package in the user image.
    The hub is currently using this custom user image and version that needs to be updated, I believe:

singleuser:
  image:
    name: callysto/2i2c
    tag: 0.1.0
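
A quick way to confirm which piece is missing, as a sketch: the snippet below can be run in a notebook on the hub to report which modules cannot be found in the current environment. The helper name is hypothetical; `pyppeteer` is the module named in the nbconvert error message, and the check only tests importability, not whether PDF export fully works.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both modules are importable in the running user image.
print(missing_modules(["nbconvert", "pyppeteer"]))
```

If `pyppeteer` shows up in the output, installing `nbconvert[webpdf]` in the user image, as the error message suggests, would be the likely fix.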
