feat: Specify image in server request #452

olevski · 2020-11-03T22:37:36Z

Closes #304

This took a while because I realized that some of the code I wrote was already present where we check for the gitlab image, so I spent a bit more time cleaning this up to avoid repeated or near-duplicate code.

The logic implemented here is as follows:

- look for image in the payload from the request to create a new user interactive server
- if image is found then:
  - check whether the image name matches the regex for a dockerhub public image, a google container registry (gcr) public image, or a gitlab image
  - if any of those match and the image is confirmed to exist then use that image
  - in the case of a gitlab image, if the image is part of renku's gitlab, check whether the image is public or not and add an image pull secret accordingly
  - in the case that the requested image cannot be found, respond with a 404
- if image is not found (what we had before this PR):
  - look at the commit for the current project, try to find an image that matches the current repo and commit
  - if such an image is not found use the default renku image specified in the environment variable
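The flow above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `choose_image`, `ImageNotFound`, and the `image_exists` callback are hypothetical names, and the existence check is injected so that the registry calls stay out of the example.

```python
from typing import Callable, Optional, Tuple


class ImageNotFound(Exception):
    """Hypothetical error for a requested image that cannot be found (a 404 in the API)."""


def choose_image(
    payload: dict,
    commit_image: str,
    default_image: str,
    image_exists: Callable[[str], bool],
) -> Tuple[str, bool]:
    """Return (image, default_image_used) following the flow described above."""
    requested_image: Optional[str] = payload.get("image", None)
    if requested_image is not None:
        # An explicitly requested image must exist, otherwise the request fails.
        if not image_exists(requested_image):
            raise ImageNotFound(requested_image)
        return requested_image, False
    # No image in the payload: behave as before this PR.
    if image_exists(commit_image):
        return commit_image, False
    return default_image, True
```

The second element of the returned tuple corresponds to the default_image_used information mentioned below.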

Also, in each of the above cases the following annotations are added to the user pod that is launched:

- renku.io/default_image_used: True or False, depending on whether the image tied to the current commit could not be found and the default image was used instead
- renku.io/image: the name of the image used, e.g. renku/renkulab-py:3.7-renku0.10.4-0.6.3

This is currently deployed at https://tasko.dev.renku.ch/. I found the easiest way to test is to use telepresence and replace None with the image you would like to test in requested_image = payload.get("image", None) on line 85 of api/notebooks.py.

lorenzo-cavazzi · 2020-11-04T12:44:10Z

At first sight, the code looks great. I'm going to test this soon.

Should we add a few more tests to cover the newly created functions, including gcr_public_image_exists, dockerhub_public_image_exists, etc.?
I'm not sure about possible rate-limit problems when invoking third-party services like dockerhub and gcr, though 🤔

ableuler

This looks really good and seems to give us what we need! 🎉
The only bigger refactoring that I would propose is to try to make the logic for checking the existence of an image a bit simpler and more generic. I believe that this should be possible by doing something along these lines:

if image does not include host, prepend registry-1.docker.io/
parse image into host, repository, tag
send a GET request to https://{host}/v2/{repository}/manifests/{tag}; if 200, image exists, all good, return
if 401, get the token URL and service from the Www-Authenticate header of the 401 response
make an access token request specifying the pull scope for the given repository (for RenkuLab's gitlab include the user's oauth token as Authorization: basic {gitlab_oauth_token})
use the acquired access token for another GET request to https://{host}/v2/{repository}/manifests/{tag}; if 200, image exists, all good, return

I haven't tested this through myself, but I'm pretty sure that this (or some small variation of it) should work. This would allow for all publicly available image registries and reduce the need for all the logical branching in the code.
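The first two steps (defaulting the host and splitting the name) can be sketched as below. This is only an illustration under assumptions: `ImageRef`, `parse_image`, and `manifest_url` are hypothetical names, the host detection follows the usual Docker convention (a first component containing "." or ":" or equal to "localhost" is treated as a host), and the `library/` prefix for single-component Docker Hub names is an addition that the steps above do not mention.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    host: str
    repository: str
    reference: str  # a tag or a digest such as "sha256:..."


def parse_image(image: str, default_host: str = "registry-1.docker.io") -> ImageRef:
    first, _, rest = image.partition("/")
    # Treat the first path component as a host only if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        host, remainder = first, rest
    else:
        host, remainder = default_host, image
    if "@" in remainder:
        # Digest form: repository@sha256:...
        repository, _, reference = remainder.partition("@")
    elif ":" in remainder.rsplit("/", 1)[-1]:
        # Tag form: the ":" must come after the last "/".
        repository, _, reference = remainder.rpartition(":")
    else:
        repository, reference = remainder, "latest"
    # Assumption: official Docker Hub images live under "library/".
    if host == default_host and "/" not in repository:
        repository = "library/" + repository
    return ImageRef(host, repository, reference)


def manifest_url(ref: ImageRef) -> str:
    # URL used for the existence check in the steps above.
    return f"https://{ref.host}/v2/{ref.repository}/manifests/{ref.reference}"
```

A name whose first component merely contains a dot (for example a namespace like lorenzo.cavazzi.tech given without a registry host) would be misread as a host by this heuristic, so a real implementation may need extra care there.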

olevski · 2020-11-04T14:05:57Z

@lorenzo-cavazzi I can add the tests you suggest; that is a good idea.

@ableuler if a response from the API (either for docker or gitlab) without a token tells you where to go to authenticate then that is great. This is what I had issues with - I did not try to see what is in the response when you do not have a token. The most trouble I had is figuring out the URL to authenticate for renku's gitlab (which may also change with different deployments) and also even for a managed gitlab. So I will try what you propose. I originally did want to have a more unified approach but gave up mostly because I was not sure what is the endpoint where you should authenticate.

ableuler · 2020-11-04T14:29:07Z

because I was not sure what is the endpoint where you should authenticate.

Yes, that's not specified since the token endpoint is not part of the registry itself. The v2 registry documentation only specifies how the client must be informed about where to authenticate: https://github.com/docker/distribution/blob/c192a281f8ac6f2a351fe729c8a56108f8edb377/docs/spec/auth/token.md#how-to-authenticate
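As a rough illustration of what the linked document describes, the 401 response carries a Bearer challenge whose realm and service parameters tell the client where to request a token. The sketch below parses such a header and builds the token request URL; `parse_bearer_challenge` and `token_request_url` are hypothetical helper names, and no network request is made here.

```python
import re
from urllib.parse import urlencode


def parse_bearer_challenge(header: str) -> dict:
    """Parse a header like: Bearer realm="...",service="...",scope="..." into a dict."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unsupported auth scheme: {scheme!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_request_url(challenge: dict, repository: str) -> str:
    """Build the token URL asking for pull access to the given repository."""
    query = {"scope": f"repository:{repository}:pull"}
    if "service" in challenge:
        query["service"] = challenge["service"]
    return f"{challenge['realm']}?{urlencode(query)}"
```

The token returned from that URL would then be sent as a Bearer token in the Authorization header of the second manifest request.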

rokroskar · 2020-11-04T17:02:41Z

Thanks @olevski!

I have yet to test it out but I'm a bit confused by this:

The most trouble I had is figuring out the URL to authenticate for renku's gitlab (which may also change with different

You have the GITLAB_URL environment variable and you have the user's oauth token so you can use the gitlab registry api.

Two more points:

"if such an image is not found use the default renku image specified in the environment variable" --> I would just fail if no valid image can be found. It's super confusing for users to get an environment with an image they don't expect. The UI should give them the option to launch with the default image but then this will be specified in the UI request with image in the payload.
Why do we need the extra annotations? Why would you need to know that a default image was used? The renku.io/image annotation is redundant with the image set for the container that you can get out of the pod manifest anyway. I can definitely see the usefulness of giving this information to the UI in the response, but it doesn't need to be a pod annotation.

lorenzo-cavazzi · 2020-11-04T17:17:16Z

@rokroskar

"if such an image is not found use the default renku image specified in the environment variable" --> I would just fail if no valid image can be found. It's super confusing for users to get an environment with an image they don't expect. The UI should give them the option to launch with the default image but then this will be specified in the UI request with image in the payload.

This applies only when no custom image is specified, so the behavior is the same as before but the information about the default image being used is saved and the UI can give feedback to the user when that actually happens.
It always fails when the image is provided and it's not accessible.

The second one is a good point, we didn't consider the information is already there and I thought an annotation was the easiest solution.

olevski · 2020-11-04T22:55:46Z

So I think I either have some really weird behaviour with telepresence or someone else is trying to run telepresence at the same time as me right now. If it is the latter then sorry for reinstalling the whole deployment a few times. I did not think of this. Also I just deployed some new code that addresses Andreas' comments. I have not had a chance to test it out fully yet. But checking for a public image is now much simpler as Andreas suggested. I did not know about the Www-Authenticate in the header.

ableuler · 2020-11-05T00:35:36Z

@olevski sometimes when telepresence doesn't exit properly, you have to clean up manually:

Check that the local flask server isn't running anymore, I usually do this by checking if something is using the specific port, for example lsof -i tcp:8000.
Delete the deployment created by telepresence using kubectl
Scale the original deployment back up from 0 to the desired number of replicas.

After that the application should be in a clean state again and you should be able to start another telepresence session normally.

olevski · 2020-11-05T11:54:35Z

Ok so I made additional changes to the code to address comments you all posted:

@ableuler
- As you suggested checking whether a public image exists is now done with less logic and one single function. Suggesting to use www-authenticate in the header was a lifesaver.
- I also tested with a sha256 instead of a tag and things work
@rokroskar
- I am not sure if you want to change how we operate with regard to the default image. Even without this PR (and as @lorenzo-cavazzi mentioned) if the image that is tied to the current commit does not exist renku will not fail but will use a default image. I have retained this in the new code here. When the user requests a specific image (i.e. the API gets something from the UI for the image parameter) then we fail if that image does not exist. Let me know if you agree with this approach.
- I removed the annotations from the pod and edited the function that lists the pod parameters for the servers endpoint to return a field called image, which is the image name, and another field called default_image_used. So there are no more unneeded annotations; I pull this data from the pod manifest as you suggested.
@lorenzo-cavazzi
- I added tests for the added code that parses a user-specified image name and also for the functions that check if the image exists.
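For illustration only, reading the image out of a pod manifest could look something like the sketch below. The function name `summarize_server` and the `default_image` argument are hypothetical, and how default_image_used is actually derived is not spelled out in this thread; comparing against the configured default image is just one assumed option.

```python
def summarize_server(pod: dict, default_image: str) -> dict:
    """Build the extra fields for a server entry from a pod manifest given as a dict."""
    containers = pod.get("spec", {}).get("containers", [])
    # The image is already part of the container spec, so no annotation is needed.
    image = containers[0]["image"] if containers else None
    return {
        "name": pod.get("metadata", {}).get("name"),
        "image": image,
        # Assumption: the default image was used if the pod runs exactly that image.
        "default_image_used": image == default_image,
    }
```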

Let me know what you think.

rokroskar · 2020-11-06T08:17:15Z

Even without this PR (and as @lorenzo-cavazzi mentioned) if the image that is tied to the current commit does not exist renku will not fail but will use a default image.

Right, this is the behavior I'm talking about changing. Maybe it's better to do it in a separate PR, but atm it can happen that a user's environment will be created with some default image, even though they think they asked for something else. This situation is super confusing. It's true that this is normally handled by the UI, but we've had cases where there was some issue with the image late in the process which resulted in the default image being used and things got super confusing. So my point is just that I'd rather be conservative and fail early in this case.

lorenzo-cavazzi · 2020-11-06T08:30:29Z

If that is a desirable change, I'd rather do it now than later since we are adding variables both here and in the UI to handle that specific case better.
I personally think the default image is confusing, and it may not be necessary anymore now that we check for the image existence in the UI -- the user already knows beforehand if something is wrong.
We could give it a try and, if it turns out that the default image was used frequently and it was actually useful, we could re-introduce it later.

olevski · 2020-11-14T13:53:15Z

Ok this is super weird: the tests I added pass on python 3.8.5 (which is the version in my local environment) but fail on python 3.7, which is what the tests in the GitHub Actions run on. I switched to 3.7 and I can replicate the failed tests, so I can figure out what the problem is. We should also maybe specify the python version in the Pipfile so that we avoid similar issues in the future.

olevski · 2020-11-14T20:29:36Z

Hi guys, sorry for the delays. But I fixed the tests I mentioned were failing earlier - they were failing because the unittest.mock API is just a bit different between python 3.7 and 3.8.
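Which part of the API differed is not stated here. One documented difference is that the args and kwargs properties of call_args were only added in python 3.8, so a version-independent test can unpack call_args as a tuple instead. A minimal sketch, with a hypothetical `create_server` mock standing in for the real call:

```python
from unittest import mock


def recorded_call():
    """Call a mock once and return its positional and keyword arguments."""
    create_server = mock.Mock()  # hypothetical stand-in for the mocked function
    create_server("some-server", image="renku/renkulab-py:3.7-renku0.10.4-0.6.3")
    # Tuple unpacking works on both 3.7 and 3.8, unlike call_args.kwargs.
    args, kwargs = create_server.call_args
    return args, kwargs
```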

In addition to this I addressed all the outstanding comments and did some tests to confirm everything works. When someone requests a nested gitlab image now things work (as long as the image exists).

This is currently deployed at https://tasko.dev.renku.ch/.

Lastly, one thing I did not touch (I think we agreed on tackling this in a separate PR) is what is returned when repeated POST or GET requests are made to the endpoint that creates the servers (and the server tied to that commit exists). Currently the server that was created is returned, but if one changes the image in the POST request the response is still the original server, even though it has a different image than the one in the POST request. I propose that we tackle this in a separate PR.

One last thing that I wanted to mention is that I added tests for the logic of requesting a non-existent image, the case where you should fall back to the default or the case where a specific image is requested. These are only unit tests but they still test that the expected image name is found in the request to create the server on the backend.

olevski · 2020-11-16T20:14:11Z

If someone is testing right now, I have to deploy an older version to test something else at https://tasko.dev.renku.ch/. Will post back when the version tied to this PR is back on.

olevski · 2020-11-16T23:16:01Z

ok the version that matches this PR is back on https://tasko.dev.renku.ch/

lorenzo-cavazzi · 2020-11-17T16:21:52Z

I am trying to work with this PR while developing the UI counterpart. All seems to work fine for a while, then I start getting 404 errors when trying to create new environments.

The response to POST /servers is a 404 with a text like this:

Cannot find project New-project---all-good for user: lorenzo.cavazzi.tech.

The 2 relevant entries logged in the notebook pod are

[2020-11-17 15:52:30,485] DEBUG in notebooks: Request to create server: New-project---all-good-5b3c3d1d with options: {'namespace': 'lorenzo.cavazzi.tech', 'project': 'New-project---all-good', 'commit_sha': 'eb86b2c9296a062354606c4c0a124db62086e666', 'branch': 'master', 'serverOptions': {'cpu_request': 0.1, 'defaultUrl': '/lab', 'gpu_request': 0, 'lfs_auto_fetch': False, 'mem_request': '1G'}} for user: {'kind': 'user', 'name': 'lorenzo.cavazzi.tech', 'admin': False, 'groups': [], 'server': None, 'pending': None, 'created': '2020-10-22T17:07:55.600737Z', 'last_activity': '2020-11-17T15:51:40.525623Z', 'servers': None}
[2020-11-17 15:52:30,767] ERROR in gitlab_: Cannot get project: lorenzo.cavazzi.tech/New-project---all-good for user: {'kind': 'user', 'name': 'lorenzo.cavazzi.tech', 'admin': False, 'groups': [], 'server': None, 'pending': None, 'created': '2020-10-22T17:07:55.600737Z', 'last_activity': '2020-11-17T15:51:40.525623Z', 'servers': None}, error: 401: invalid_token

If I try to log out and log in again, everything works fine.

[2020-11-17 15:59:01,800] DEBUG in notebooks: Request to create server: New-project---all-good-5b3c3d1d with options: {'namespace': 'lorenzo.cavazzi.tech', 'project': 'New-project---all-good', 'commit_sha': 'eb86b2c9296a062354606c4c0a124db62086e666', 'branch': 'master', 'serverOptions': {'cpu_request': 0.1, 'defaultUrl': '/lab', 'gpu_request': 0, 'lfs_auto_fetch': False, 'mem_request': '1G'}} for user: {'kind': 'user', 'name': 'lorenzo.cavazzi.tech', 'admin': False, 'groups': [], 'server': None, 'pending': None, 'created': '2020-10-22T17:07:55.600737Z', 'last_activity': '2020-11-17T15:58:17.705863Z', 'servers': None}
[2020-11-17 15:59:03,052] DEBUG in notebooks: Creating server New-project---all-good-5b3c3d1d with {'namespace': 'lorenzo.cavazzi.tech', 'project': 'New-project---all-good', 'branch': 'master', 'commit_sha': 'eb86b2c9296a062354606c4c0a124db62086e666', 'project_id': 5177, 'notebook': None, 'image': 'registry.dev.renku.ch/lorenzo.cavazzi.tech/new-project---all-good:eb86b2c', 'git_clone_image': 'lorenzocavazzitech/git-clone:0.8.3-334f16b', 'server_options': {'cpu_request': 0.1, 'defaultUrl': '/lab', 'gpu_request': 0, 'lfs_auto_fetch': False, 'mem_request': '1G'}}
[2020-11-17 15:59:04,304] DEBUG in notebooks: spawn initialized for New-project---all-good-5b3c3d1d

It seems that renku-notebooks considers the credentials expired but they aren't -- I can browse private repositories in the UI without problems. Maybe the error is misleading and the issue is with the JupyterHub credentials that we use to create a server but don't use to get the servers list through GET /servers.
It's a bit hard to replicate. I think it will happen if you interact with the environments from the UI as a logged-in user, then close the interface and open it again later (after a few hours?).
Not sure if this helps, but I think that in the past I was getting a 404 on GET /servers when JupyterHub credentials were expired, triggering a re-login in the UI. Is it possible that this behavior has changed? It would explain the problem, but solving it would be hard.

P.S. I've used a private project, but it's the same with public projects.

olevski · 2020-11-17T21:31:59Z

@lorenzo-cavazzi I cannot replicate the behaviour you describe exactly. I think I did get something similar though because for a while even though I was logged in as tasko.olevski renku thought I was logged in as my other account olevski90 and I would not be able to create environments or even see the active ones because somehow the requests sent were for olevski90 and not tasko.olevski. This happened even though my profile said I was logged in as tasko.olevski. After I logged out and logged back in things worked normally though. I have not been able to get anything weird ever since.

This is also really weird because I did not touch the authentication code at all in this PR.

ableuler · 2020-11-18T08:05:04Z

JupyterHub gets its gitlab oauth token "directly" from gitlab (without the gateway being involved). This can lead to weird login situations where the gateway has a gitlab oauth token for user A and JupyterHub has one for user B. This is likely to happen especially in our dev setup, where different RenkuLab instances rely on the same gitlab AND we often juggle multiple users. I am pretty confident that this is unrelated to the code changes proposed in this PR.

ableuler

I think we're there 🎉 .
Only 3 one-liner suggestions.

renku_notebooks/api/notebooks.py

Co-authored-by: Andreas Bleuler <[email protected]>

ableuler

LGTM!

olevski added 12 commits November 2, 2020 15:13

add checks if a specified image is present

6434b1e

modify notebooks api

42eae4a

edits

3309c75

add annotation

8871f7e

add ability to use commit sha as image name

c773c2f

Edits

df460ef

update annotations

95e3d53

Edit spawner

133affe

Edit spawner

71729fb

Edit spawner

f7c56bb

ensure tests pass

3b194d6

Fix black formatting

eade11c

olevski requested a review from a team as a code owner November 3, 2020 22:37

Merge branch 'master' into specify-image-in-server-request

205e7c9

ableuler reviewed Nov 4, 2020

refactor

ceff3a0

olevski added 3 commits November 5, 2020 12:03

refactor code

ccc0422

fix tests

54221dc

add public image tests

6c866d3

rokroskar changed the title ~~enhancement: Specify image in server request~~ feat: Specify image in server request Nov 6, 2020

olevski added 4 commits November 14, 2020 13:16

Add comments and docstrings

0e34b5e

Fix failing tests

35230bc

Check failing test fix

a903b2c

Check failing test fix

a58c1b4

olevski added 4 commits November 14, 2020 20:10

Fix tests

4fbd420

Merge branch 'master' into specify-image-in-server-request

b6013b5

move default image used to annotations

28123f9

edits

debb6d9

Merge branch 'master' into specify-image-in-server-request

5e1faef

lorenzo-cavazzi mentioned this pull request Nov 17, 2020

custom images for interactive environments SwissDataScienceCenter/renku-ui#1109

Merged

ableuler suggested changes Nov 18, 2020

olevski and others added 7 commits November 19, 2020 01:43

fix comment

d9b6ca4

Co-authored-by: Andreas Bleuler <[email protected]>

Fix error message

3cb723f

Co-authored-by: Andreas Bleuler <[email protected]>

add debug log message when using default image

c7fe1b2

Merge branch 'master' into specify-image-in-server-request

b4b7452

fix tests

c987cdd

remove hardcoded annotation prefix

55268d7

Merge branch 'master' into specify-image-in-server-request

4cb658b

ableuler approved these changes Nov 23, 2020

ableuler requested a review from rokroskar November 23, 2020 20:09

rokroskar approved these changes Nov 23, 2020

olevski merged commit 95d4f92 into master Nov 24, 2020

olevski deleted the specify-image-in-server-request branch November 24, 2020 10:36
