Misadventures in non-vnc Matlab app construction #1781

Closed
gbyrket opened this issue Jan 24, 2022 · 10 comments
@gbyrket
Contributor

gbyrket commented Jan 24, 2022

Firstly, my apologies if I should be directly contacting VT-ARC or the Mathworks people about this – I thought surely someone else would eventually ask about this topic (given the intense interest during the tips-n-tricks call) but I’ve not seen anything here, nor in ARC’s github-issues. If there’s a better venue (e.g. discord/slack/etc), please let me know, thanks!

Trying to follow the Dev Guide from VT, I’ve made the following observations/additions/etc…

The guide mentions modifying manifest.yml, but I wonder if that is supposed to suggest modifying form.yml? (To specify your cluster and customize fields.)

Similarly, although the guide does mention updating template/script.sh.erb for the path to the .sif image, I also needed to update several of the bind parameters. Beyond the obvious filesystem differences, though, it was definitely not clear to me at first that $MATLAB_DIR and $TMPFS seem to be VT-site-specific environment variables. I also had to bind our newer version of readline’s ‘libhistory’.

[jason@wind] $ cd ~/ondemand/dev/bc_vt_matlab_html/template
[jason@wind] $ diff -u script.sh.erb-origBindings script.sh.erb
--- script.sh.erb-origBindings 2021-11-24 12:07:04.000000000 -0700
+++ script.sh.erb 2021-12-10 11:13:59.033674779 -0700
@@ -20,12 +20,13 @@
+echo "SUSPECTED VT-SPECIFIC VARS: MATLAB_DIR=$MATLAB_DIR ... TMPFS=$TMPFS"
 export SINGULARITYENV_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
 export SINGULARITYENV_PATH=$PATH
 singularity run --nv --writable-tmpfs \
-  --bind=$MATLAB_DIR:/opt/matlab,$TMPFS:/tmp,/work/${USER},/projects \
-  --bind=`pwd`/matlab.rc:/mathworks.rc,/cm,/etc/slurm/slurm.conf \
-  --bind=/lib64/libhistory.so.6:/lib/x86_64-linux-gnu/libhistory.so.6 \
+  --bind=/packages/matlab/R2021b:/opt/matlab,/tmp,/projects \
+  --bind=`pwd`/matlab.rc:/mathworks.rc,/etc/slurm/slurm.conf \
+  --bind=/lib64/libhistory.so.7:/lib/x86_64-linux-gnu/libhistory.so.6 \
   --bind=/usr/lib64/libmunge.so.2:/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge \
   --bind=`pwd`/entrypoint.sh:/entrypoint.sh \
   /home/jason/ondemand/dev/bc_vt_matlab_html/matlab.sif bash /entrypoint.sh
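
Since an unset site variable such as $MATLAB_DIR silently turns into an empty bind source, it may help to check the bind list before calling singularity. The following is only a sketch: missing_bind_sources is a hypothetical helper, not part of the VT app, and it assumes each --bind value is a comma-separated list of "source[:destination]" entries.

```shell
# Hypothetical helper (not from the VT app): print every host-side bind
# source that is empty or does not exist on this node.
# Each argument is one --bind value, e.g. "/src:/dest,/other".
missing_bind_sources() {
  for spec in "$@"; do
    # One --bind value may carry several comma-separated entries.
    printf '%s\n' "$spec" | tr ',' '\n' | while IFS= read -r entry; do
      [ -n "$entry" ] || continue
      src="${entry%%:*}"   # host path is everything before the first ':'
      if [ -z "$src" ]; then
        # Typical result of an unset variable such as $MATLAB_DIR.
        printf "<empty source in '%s'>\n" "$entry"
      elif [ ! -e "$src" ]; then
        printf '%s\n' "$src"
      fi
    done
  done
}

# Possible use in script.sh.erb, before `singularity run`:
#   missing_bind_sources "$MATLAB_DIR:/opt/matlab,$TMPFS:/tmp,/work/${USER},/projects"
```

Any line it prints is a path (or an empty variable) that would need to be localized for the site.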

@osc-bot osc-bot added this to the Backlog milestone Jan 24, 2022
@gbyrket
Contributor Author

gbyrket commented Jan 24, 2022

When I launch my dev Matlab app, it gets stuck “Starting”, and the app/script output indicates that matlab-jupyter-app cannot be found…

[jason@wind] $ cd ~/ondemand/data/sys/dashboard/batch_connect/dev/bc_vt_matlab_html/output/f36b9577-ac0f-4894-9adb-b3d318b6405a
[jason@wind] $ cat output.log
starting before
No modules loaded
Script starting...
Waiting for Matlab to open port 41492...
/home/jason/ondemand/data/sys/dashboard/batch_connect/dev/bc_vt_matlab_html/output/f36b9577-ac0f-4894-9adb-b3d318b6405a
module works
starting singularity
starting Matlab on cn31 using 41492
SUSPECTED VT-SPECIFIC VARS: MATLAB_DIR= ... TMPFS=
retrieved ENV variables from matlab.rc
MWI_APP_PORT=41492
MWI_BASE_URL=/matlab
TMPDIR=/tmp
MWI_EXT_URL=ood.arc.vt.edu
MLM_LICENSE_FILE=/opt/matlab/licenses/network.lic
To use the web-desktop: http://ood.arc.vt.edu/matlab/index.html
starting web matlab
/entrypoint.sh: line 30: matlab-jupyter-app: command not found

@gbyrket
Contributor Author

gbyrket commented Jan 24, 2022

Though my app session is stuck starting up, I can stay in the ondemand-generated working-directory to leverage the resources it already prepared. When I interactively shell in, matlab-jupyter-app is clearly on my $PATH:

[jason@wind] $ singularity shell --nv --writable-tmpfs \
--bind=/packages/matlab/R2021b:/opt/matlab,/tmp,/projects                          \
--bind=`pwd`/matlab.rc:/mathworks.rc,/etc/slurm/slurm.conf                         \
--bind=/lib64/libhistory.so.7:/lib/x86_64-linux-gnu/libhistory.so.6                \
--bind=/usr/lib64/libmunge.so.2:/lib/x86_64-linux-gnu/libmunge.so.2,/var/run/munge \
--bind=`pwd`/entrypoint.sh:/entrypoint.sh                                          \
/home/jason/ondemand/dev/bc_vt_matlab_html/matlab.sif

INFO: Could not find any nv files on this host!

Singularity> which matlab-jupyter-app
/usr/local/bin/matlab-jupyter-app
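
One difference between the failing batch run and the interactive shell above is that script.sh.erb exports SINGULARITYENV_PATH=$PATH. As I understand the Singularity documentation, that variable replaces the container's PATH instead of extending it, so a job whose host PATH lacks /usr/local/bin could lose the command. This is a hypothesis the thread does not confirm; the sketch below only checks for it, and path_contains is a hypothetical helper.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check what the job is about to hand to the container as its PATH.
if ! path_contains "${SINGULARITYENV_PATH:-}" /usr/local/bin; then
  echo "note: /usr/local/bin is not in SINGULARITYENV_PATH" >&2
fi

# Untested alternative to overriding PATH wholesale: Singularity also
# documents SINGULARITYENV_PREPEND_PATH and SINGULARITYENV_APPEND_PATH,
# which add to the container's own PATH, e.g.
#   export SINGULARITYENV_PREPEND_PATH=$PATH
```

If the note is printed inside the job but not on the login node, the full path in entrypoint.sh would be working around this override.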

@gbyrket gbyrket self-assigned this Jan 24, 2022
@gbyrket
Contributor Author

gbyrket commented Jan 24, 2022

Now, obviously there are some variables in template/before.sh.erb (that define the ephemeral matlab.rc) that everyone should be localizing; but more to the point, I can modify template/entrypoint.sh to provide the full path to matlab-jupyter-app to move things along.

$ diff -u entrypoint.sh-orig entrypoint.sh
--- entrypoint.sh-orig 2021-12-10 12:50:53.352527908 -0700
+++ entrypoint.sh 2021-12-10 12:51:36.872916964 -0700
@@ -27,4 +27,4 @@
 echo ""
 echo starting web matlab
-matlab-jupyter-app
+/usr/local/bin/matlab-jupyter-app

Unfortunately, although that definitely got me closer and I can launch the app, I’m still hitting some error and not sure how to go about diagnosing/debugging it:

@gbyrket
Contributor Author

gbyrket commented Jan 24, 2022

The output.log file doesn’t show anything enlightening, and there doesn’t appear to be anything remotely special in my /var/log/ondemand-nginx logs.

                        < M A T L A B (R) >
              Copyright 1984-2021 The MathWorks, Inc.
              R2021b (9.11.0.1769968) 64-bit (glnxa64)
                         September 17, 2021

INFO:MATLABProxyApp:Waiting for MATLAB to exit...
INFO:MATLABProxyApp:MATLAB has exited with errorcode: -9
ERROR:MATLABProxyApp:MATLAB returned an unexpected error. For more details, see the log below.

INFO:MATLABProxyApp:Cleaning up matlab_ready_file.../tmp/MWI/31511/connector.securePort
INFO:MATLABProxyApp:MATLAB_LOG_DIR:/tmp/MWI/31511
INFO:MATLABProxyApp:MATLAB_READY_FILE:/tmp/MWI/31511/connector.securePort
INFO:MATLABProxyApp:Starting MATLAB on port 31511
MATLAB is selecting SOFTWARE OPENGL rendering.

                        < M A T L A B (R) >
              Copyright 1984-2021 The MathWorks, Inc.
              R2021b (9.11.0.1769968) 64-bit (glnxa64)
                         September 17, 2021

INFO:MATLABProxyApp:Waiting for MATLAB to exit...
INFO:MATLABProxyApp:MATLAB has exited with errorcode: -9
ERROR:MATLABProxyApp:MATLAB returned an unexpected error. For more details, see the log below.
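
A note on reading "errorcode: -9": assuming MATLABProxyApp reports the child's return code the way Python's subprocess machinery does (an assumption, not something the log states), a negative value means the process was terminated by that signal number. The sketch below decodes it; signal_name is a hypothetical helper.

```shell
# Hypothetical helper: turn a negative return code such as "-9" into the
# name of the signal that ended the process.
signal_name() {
  code="${1#-}"     # strip the leading minus sign
  kill -l "$code"   # print the signal name for that number
}

signal_name -9

# Signal 9 is SIGKILL. On a batch node one common source is a memory
# limit being enforced; with Slurm, the job's accounting record can be
# inspected for that (the job id variable is a placeholder):
#   sacct -j "$SLURM_JOB_ID" --format=JobID,State,ExitCode,MaxRSS,ReqMem
```

A SIGKILL is never sent by MATLAB itself, so the cause is more likely to be found in the scheduler or kernel logs than in output.log.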

@gbyrket
Contributor Author

gbyrket commented Jan 24, 2022

A bunch of work later, I fixed some Ruby typos, tweaked the resource allotment, and probably some other things. But I still would appreciate some help understanding a few things:

1. At the end of the entrypoint script, I don’t understand why I have to provide the full path to matlab-jupyter-app (as mentioned above, I can shell-into the container and it IS in root’s path).
2. I am unsure of the source or significance of an “/opt/matlab/parallel_remote” warning that appears after the MATLAB masthead, as displayed in a session’s output.log (see below).
3. Is it fine to leave MWI_BASE_URL=/matlab in before.sh? (I ask because these comments make it seem like we should be matching our OOD installation’s reverse-proxy → OnDemandApps/Dockerfile at 609eb334cec0d95a2163500556644ac01454a256 · AdvancedResearchComputing/OnDemandApps · GitHub)
4. Starting the app up takes a LONG. LONG. TIME. Often much more than a few minutes, though not always. strace’ing the activity on the compute-node shows it looking at an endless list of files, as though it were validating the installation base every. single. run.
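
To narrow down the slow start-up, the strace output can be reduced to a count of file-related calls per directory, which shows whether the time goes into the MATLAB install tree, /tmp, or somewhere else. This is a minimal sketch: top_dirs_from_strace is a hypothetical helper, and it assumes the log was written with strace's -o option so that each line holds one syscall.

```shell
# Capture on the compute node (PID and the log path are placeholders):
#   strace -f -e trace=file -o /tmp/matlab.strace -p "$PID"

# Hypothetical helper: list the directories that appear most often as
# the first path argument of the syscalls recorded in an strace log.
top_dirs_from_strace() {
  log="$1"
  limit="${2:-10}"
  # Take the first quoted absolute path on each line, drop its last
  # component, then count how often each directory occurs.
  sed -n 's/^[^"]*"\(\/[^"]*\)".*/\1/p' "$log" \
    | sed -e 's|/[^/]*$||' -e 's|^$|/|' \
    | sort | uniq -c | sort -rn | head -n "$limit"
}

# Example: top_dirs_from_strace /tmp/matlab.strace 20
```

If one directory on a network filesystem dominates the counts, the delay may be metadata latency on that filesystem and not MATLAB re-validating anything.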
Here is that “/opt/matlab/parallel_remote” warning that appears after the MATLAB masthead, as displayed in a session’s output.log:

INFO:MATLABProxyApp:Installing handler for signal: 15
MATLAB is selecting SOFTWARE OPENGL rendering.
Discovered Matlab listening on port 58714!
Generating connection YAML file...
ESC[?1hESC=
< M A T L A B (R) >
Copyright 1984-2021 The MathWorks, Inc.
R2021b (9.11.0.1769968) 64-bit (glnxa64)
September 17, 2021

Warning: Name is nonexistent or not a directory: /opt/matlab/parallel_remote

To get started, type doc.
For product information, visit www.mathworks.com.

Starting CPP Connector on Worker
Warming up worker
Thanks for any input!
–jason

@gbyrket
Contributor Author

gbyrket commented Jan 24, 2022

Hi Jason,

we ended up not using the container, but locally installed Anaconda with the Matlab plugin, and locally installed Matlab. Our config is at OOD-apps-v3/matlab_html_app at master · CHPC-UofU/OOD-apps-v3 · GitHub

Though, almost all our users keep using the VNC Matlab app, probably because they are used to it.

HTH,
Martin

@Micket
Contributor

Micket commented Apr 22, 2022

Heads up on this: matlab-jupyter-app doesn't offer any form of authentication. Guess the port number and you are in. I considered this absolutely disastrous and expressed this to MathWorks already, but no word on adding authentication yet.

@johrstrom
Contributor

johrstrom commented Apr 22, 2022

Network namespaces seem to be catching on. We added network namespaces for our general public.

In any case, here's how I set up tensorboard, which also doesn't have auth. When it is set up in its own network namespace behind a proxy, it is secure, because nothing else can reach it except through that proxy.

OSC/bc_osc_tensorboard@4a0cd2e

@Micket
Contributor

Micket commented Apr 23, 2022

@johrstrom very interesting, i was keeping half an eye open on #712 as a possible solution.
The subuid seems to be the big hurdle though; at least i gave up on that for user namespaces for singularity, since i found setting up subuid files on each node to be rather unmanageable, and lots of documentation out there seems to indicate that LDAP etc. just isn't supported.

Buuuut i do see shadow-maint/shadow#321 was merged, perhaps there is hope?

@treydock
Contributor

I wrote a tool to get LDAP data into /etc/subuid and /etc/subgid and try and do it so that if new users are added/removed the UID/GID range assigned to someone doesn't abruptly change (though never found a way to validate this was needed): https://github.com/treydock/subid-ldap. OSC uses this with OpenLDAP to dump our users into those files for both Singularity and Podman. The tool is in Go so should be pretty portable to whatever OS you run.
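
To illustrate the stability concern, here is one way a range can be made independent of who else exists: derive it from the user's own UID. This is a sketch of a hypothetical scheme, not how subid-ldap works; the base, minimum UID, range size, and the example users and UIDs are all assumptions. The output follows the name:start:count format of /etc/subuid.

```shell
# Hypothetical scheme (NOT the subid-ldap algorithm): compute a user's
# subordinate ID range from the user's own UID, so adding or removing
# other users never shifts it.
subid_line() {
  user="$1"
  uid="$2"
  base=100000     # assumed first subordinate ID
  min_uid=1000    # assumed lowest regular-user UID
  count=65536     # assumed range size per user
  start=$(( base + (uid - min_uid) * count ))
  printf '%s:%s:%s\n' "$user" "$start" "$count"
}

subid_line jason 1000   # prints jason:100000:65536
subid_line alice 1002   # prints alice:231072:65536
```

The trade-off is that sparse or very large UIDs consume the ID space quickly, which is one reason a tool might track assignments explicitly.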

@lukew3 lukew3 added the discussion Issues that are up for discussion with no particular action items. label Jul 18, 2022
@gbyrket gbyrket closed this as completed Oct 17, 2023