Docs currently suggest to use VSCode in the bastion node #62

Open
pierreglaser opened this issue Jul 9, 2024 · 18 comments
Labels
enhancement New feature or request

Comments

@pierreglaser

pierreglaser commented Jul 9, 2024

Hey, and thanks for putting this documentation together! It may be worth bookmarking the link on SWC's #computing channel to make it easily discoverable.

In the "Remote development" section of the docs, it is more or less suggested to use VSCode in the bastion node:

Then, when you click on the “Open a Remote Window” button in the bottom left corner of the VS Code window, you will see a list of the SSH hosts you have configured in your ~/.ssh/config file. You can then select the host you want to connect to - e.g. swc-gateway.

But if many users start doing this, the bastion node could run into memory errors due to many memory-hungry VSCode apps being open at the same time. Could the docs be updated to instead recommend, and explain how to, set up VSCode on a compute node? A guide to do exactly this was actually put together by Cristofer Holobetz (to find it, search "cristofer holobetz pdf" on Slack). Thanks!

@pierreglaser pierreglaser added the enhancement New feature or request label Jul 9, 2024
@niksirbi
Member

niksirbi commented Jul 9, 2024

Thanks @pierreglaser, we've also noticed the issue with our remote development guide, and we're discussing changes to that here.

We are considering updating or even removing that section, because it's indeed misleading.

Regarding Cristofer's guide, I have some reservations. It's indeed possible to ssh into a compute node and do remote development via VSCode in this way. However, this way of running jobs is not really controlled/limited via SLURM and could lead to consuming all resources in a node. At least that's what I've understood from my discussions with @adamltyson on this topic.

It would be great to come up with a remote development solution that doesn't burden the bastion node and also respects SLURM.

@niksirbi
Member

niksirbi commented Jul 9, 2024

Other potential solutions for remote development:

  • Some version of what you describe here with JupyterLab. I've tried these instructions, and they work. They should also respect SLURM resource allocation because you are starting the JupyterLab session with srun.
  • Using a Jupyter notebook/lab app running via Open OnDemand. @lauraporta in our team has had success with this and it could be the more user-friendly way to go, provided that we work out the kinks and document the workflow.

@pierreglaser
Author

Thanks for the quick answer!

However, this way of running jobs is not really controlled/limited via SLURM and could lead to consuming all resources in a node. At least that's what I've understood from my discussions with @adamltyson on this topic.

I'm not sure about this: Cristofer's tutorial clearly states that the node in which VSCode will be started should be obtained through SLURM, using srun for instance. Did you have anything else in mind? I think that this part of the docs is very useful and I recommend keeping it, unless there are clear drawbacks (which right now I don't see once this bastion node issue is addressed).

@adamltyson
Member

Very possible that I'm wrong, but I don't understand how just by SSH-ing to a node, somehow that workload is therefore monitored by the SLURM job scheduler. It may be possible if you request the entire node, but then I'm not sure if SLURM will be able to kill the job etc.

@niksirbi
Member

niksirbi commented Jul 9, 2024

I think if you start an interactive job via srun, and then ssh into it using the node name, you are indeed using the node you requested, but I don't know if the constraints on memory, cores etc are respected in that case. For example if you do srun --mem 8G and then ssh into that node, what guarantees that you won't exceed the 8G?

Anyhow, I'll message Cristofer so he can also participate in the discussion. I think he may have asked the scientific computing team about this.

@lauraporta
Member

In case @niksirbi is right, I've found another possible solution: start a job that runs sshd and connect VSCode to it. This way, the resources used by VSCode will effectively be controlled by SLURM. I haven't tested this solution yet.
Also, Open OnDemand offers code-server: VSCode accessed via the browser, running within a SLURM job. I was interested in installing it some time ago.

@pierreglaser
Author

pierreglaser commented Jul 9, 2024

but I don't understand how just by SSH-ing to a node, somehow that workload is therefore monitored by the SLURM job scheduler

The VSCode app has to be run on a compute node obtained through SLURM, as stated in Cristofer's tutorial. However, to forward VSCode's ports back to your local machine, you have to start an ssh process from your machine to the compute node. This ssh process doesn't run anything; it only allows ports to be forwarded, which would require much more complex solutions to do via SLURM.

For example if you do srun --mem 8G and then ssh into that node, what guarantees that you won't exceed the 8G?

As SLURM is currently configured on the SWC cluster, memory limits are enforced through the cgroup plugin, so yes, you are guaranteed not to exceed 8G.

@niksirbi
Member

You may very well be right @pierreglaser, but I don't sufficiently understand the internals of SLURM, cgroup, VScode's remote ssh plugin and their interactions to be confident about it. We may have to do some tests to confirm and consult Alex and John about it. Assuming we can confirm this, I'm happy to update the VSCode instructions according to @cristofer-holobetz 's guide.
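
One possible way to run such a test, sketched under assumptions: start a shell with srun, open a second shell by ssh-ing into the same node, and compare which cgroup each shell belongs to. The helper below only reads /proc/self/cgroup. The name in_slurm_cgroup is hypothetical, and the job_<id> pattern is an assumption about how SLURM's cgroup plugin names job cgroups; the actual layout on the SWC cluster may differ.

```shell
#!/usr/bin/env bash
# Report whether the current shell sits inside a SLURM job cgroup.
# ASSUMPTION: SLURM's cgroup plugin places job processes under a cgroup
# path containing "job_<id>"; the exact layout depends on the cluster.
in_slurm_cgroup() {
    # Optional argument: cgroup file to inspect (defaults to this shell's).
    local cgroup_file="${1:-/proc/self/cgroup}"
    if grep -Eq '/job_[0-9]+' "$cgroup_file"; then
        echo "inside a SLURM job cgroup"
    else
        echo "NOT inside a SLURM job cgroup"
        return 1
    fi
}

# Usage: run once in the srun shell and once in the ssh shell, then compare.
#   in_slurm_cgroup
```

If the ssh shell reports that it is not inside a job cgroup, processes started from it (including a VSCode remote server) would not be covered by the limits requested via srun.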

@pierreglaser
Author

Mmmh. Note that regardless of whether I'm correct, Cristofer's solution is an improvement over what is currently officially suggested (use VSCode on the login node). So not sure why we should delay moving forward with this solution.

@adamltyson
Member

I think it's important we only document things we know to be correct. It's unlikely that users will regularly consult this documentation to change their workflows.

@niksirbi for now, shall we just remove this section?

@niksirbi
Member

For now I suggest the following:

  • Remove the "remote development" section for now, to minimise the damage from additional people reading it and applying it as is
  • Apply the other small fixes suggested in Updates to ssh howto guide #61 that could help with reducing the workload on the login nodes
  • Open a new issue for coming up with a proper long-form "Remote development with VSCode" guide, with input from Pierre and Cristofer

I can get this done this week if you agree.

@adamltyson
Member

Sounds good to me. Thanks!

@pierreglaser
Author

pierreglaser commented Jul 12, 2024

I looked more into this issue: it turns out VSCode remote SSH mode does not use SLURM. Cristofer's tutorial required you to request a compute node via SLURM before connecting to that node with VSCode, which led me to think otherwise. But this step is not required, as VSCode just starts its own ssh connection.

As the link @lauraporta referenced shows, this seems to be a well-documented issue on the VSCode side, with only partial fixes available. One option uses sshd within a SLURM-allocated compute node (the one Laura mentioned), but the SLURM environment variables are not inherited by the new connections, and additional hacks are required to make it fully functional, so it is not ideal.
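
To see whether a given shell carries the SLURM environment, one can count the SLURM_* variables it exposes. This is a minimal sketch; count_slurm_vars is a hypothetical helper name, and it makes no claim about which variables a particular SLURM setup exports.

```shell
#!/usr/bin/env bash
# Count the environment variables whose names start with "SLURM_".
# "|| true" stops grep's non-zero exit status (returned when nothing
# matches) from being treated as a failure; the count is still printed.
count_slurm_vars() {
    env | grep -c '^SLURM_' || true
}

# Usage: compare the number printed in the srun shell with the number
# printed in a shell opened through the job's sshd or a plain ssh login.
#   count_slurm_vars
```

A count of 0 in the second shell would be consistent with the variables not being inherited by new connections.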

Another option is to use code-server (a program which serves VSCode as a web app) on compute nodes, and use VSCode in your local machine's browser. Unlike the other alternatives, the steps to get set up are very simple:

  1. Install code-server by downloading the binaries (alternatively, we could ask IT to set it up globally on all nodes):

     export VERSION=4.91.0
     mkdir -p ~/.local/lib ~/.local/bin
     # Download the release tarball and unpack it into ~/.local/lib
     curl -fL https://github.com/coder/code-server/releases/download/v$VERSION/code-server-$VERSION-linux-amd64.tar.gz \
       | tar -C ~/.local/lib -xz
     mv ~/.local/lib/code-server-$VERSION-linux-amd64 ~/.local/lib/code-server-$VERSION
     ln -s ~/.local/lib/code-server-$VERSION/bin/code-server ~/.local/bin/code-server
     # Use $HOME here: a tilde inside double quotes is not expanded
     export PATH="$HOME/.local/bin:$PATH"

  2. Ask SLURM for an interactive node via srun (srun --pty /bin/bash -l) and find its hostname.
  3. Start code-server on some port (code-server --bind-addr=localhost:8081).
  4. Forward the port between the compute node and your machine: ssh pierreg@<compute-node-hostname> -J hpc-gw1 -N -L 8081:localhost:8081
  5. Open localhost:8081 in Google Chrome (the code-server UI was flawless as far as I tried; I tried other browsers before this one and the code-server UI was buggy).
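
The last steps of the list above can be wrapped in two small helpers. This is a sketch only: the function names start_code_server and tunnel_command, the default port, and the default gateway alias are illustrative assumptions; hpc-gw1 and the --bind-addr flag are taken from the commands quoted above.

```shell
#!/usr/bin/env bash
# Run on the compute node, inside the shell obtained via srun.
# Binds code-server to localhost only, so it is reachable solely
# through the ssh tunnel. ASSUMPTION: code-server is on PATH.
start_code_server() {
    local port="${1:-8081}"
    code-server --bind-addr="localhost:${port}"
}

# Print the ssh command to run on your local machine.
# Arguments: user, compute node hostname, [port], [gateway alias].
tunnel_command() {
    local user="$1" node="$2" port="${3:-8081}" gateway="${4:-hpc-gw1}"
    echo "ssh ${user}@${node} -J ${gateway} -N -L ${port}:localhost:${port}"
}

# Usage on the compute node:
#   tunnel_command "$USER" "$(hostname)"   # copy the output to your machine
#   start_code_server
```

Printing the tunnel command from the compute node avoids copying the hostname by hand; the command itself still has to be run on the local machine.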

The in-browser experience is pretty much feature-complete since VSCode is run under the hood (you can install extensions, start a terminal, etc.).

code-server seems widespread, and the solution is both robust and respects SLURM. WDYT?

@niksirbi
Member

niksirbi commented Jul 15, 2024

Thanks a lot for investigating this @pierreglaser!

I gave it a shot and it indeed seems to work just fine (incl. from Firefox). This is definitely an improvement on the previous guide, so I'll write up something and have it tested by a few more people.

If all seems well we can ask IT to centrally install code-server, which will make the instructions even simpler.

@adamltyson
Member

If we're asking IT to install stuff centrally, is it worth just going straight for VSCode via OOD?

@niksirbi
Member

Well, the two things are complementary. If people would like to use a VSCode app via OOD, IT would have to centrally install code-server anyway and then link it to OOD.
Pierre's instructions provide a way to use code-server directly, without making it an OOD app. In a way OOD is an abstraction layer that will make this procedure more user-friendly, and can additionally serve as an entry-point to other apps like Jupyter Lab.

So installing code-server (+ the how to guide that comes with it) is a stepping stone towards full OOD functionality, not opposed to it.

@adamltyson
Member

Cool, I assumed that the existing VSCode OOD app worked some other way, so it would be duplicating effort for IT.

@niksirbi
Member

From what I can find browsing online, using code-server seems to be the most popular choice for creating a VSCode app for OOD, see https://discourse.openondemand.org/t/vscode-showcase/2256


4 participants