ch3: TCP ports are always bound to INADDR_ANY #6010

zsalvet · 2022-05-17T10:50:25Z

TCP ports are always bound to INADDR_ANY (open to the Internet),
even when the user asks for a specific interface or address (such as localhost)
via MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE or MPIR_CVAR_CH3_INTERFACE_HOSTNAME.
A connection attempt from any external entity can easily trigger an assert
(e.g. in recv_id_or_tmpvc_info()); there is absolutely no authentication
involved.
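
For illustration, here is a minimal sketch of the difference between binding a listener to INADDR_ANY and binding it to one requested address. This is not MPICH code: `listen_on_addr` is a hypothetical helper, and it assumes a plain IPv4 literal such as "127.0.0.1" rather than an interface name or hostname.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of MPICH): create a TCP listener bound to
 * one specific IPv4 address instead of INADDR_ANY. Passing port 0 lets the
 * kernel pick a free port. Returns the listening fd, or -1 on error. */
int listen_on_addr(const char *ipv4, unsigned short port)
{
    struct sockaddr_in sa;
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);

    /* With INADDR_ANY this would be: sa.sin_addr.s_addr = htonl(INADDR_ANY);
     * here only the requested address accepts connections. */
    if (inet_pton(AF_INET, ipv4, &sa.sin_addr) != 1)
        return -1;              /* not a valid IPv4 literal */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A listener created with `listen_on_addr("127.0.0.1", 0)` is reachable only through the loopback interface, which is the behaviour one would expect when localhost is requested.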

hzhou · 2022-05-17T12:43:25Z

What is your use case that this is an issue?

zsalvet · 2022-05-17T13:24:00Z

Security port scans occasionally appear to crash an app using MPICH on our cluster.
(There is also potential for abuse between different users; it is difficult to secure such
ports externally without a large performance and functionality impact, IMO.)

hzhou · 2022-05-17T13:34:23Z

Try this patch -- #5900 -- and see if it fixes the assertion error. That patch only prevents such assertion errors in hydra. Last time I checked, I didn't encounter the issue with ch3:nemesis, but I can see how a similar issue could exist in the netmod. Could you attach a crash log?

The solution will just add some basic measures to prevent network port scans from interrupting the jobs. Will that be sufficient?

zsalvet · 2022-05-17T14:01:18Z

Unfortunately, we are getting assertions in ch3:nemesis much more often than in hydra:
Assertion failed in file src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 572: hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_ID_INFO || hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_TMPVC_INFO

I made a simple hotfix by replacing the failing assert path with "*got_sc_eof = 1; goto fn_exit;"
(I was even able to apply a binary patch to one statically linked binary-only application :-) ),
and it survived all simple scans. I would prefer stronger check than "HYD" or pkt type constant though,
something like secretword in mpd...
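
The idea behind the hotfix can be sketched as follows: check the first header from a new connection and report failure to the caller instead of asserting, so that a port scan closes one socket rather than aborting the job. The packet-type values, the `conn_hdr_t` layout, and `first_packet_ok` are all hypothetical stand-ins; the real definitions live in the nemesis TCP netmod and may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values and layout, for illustration only. */
enum { PKT_ID_INFO = 1, PKT_TMPVC_INFO = 2 };

typedef struct {
    uint32_t pkt_type;
    uint32_t datalen;
} conn_hdr_t;

/* Returns 1 if the first bytes received on a new connection look like a
 * valid handshake header, 0 if the peer should simply be dropped.
 * Unlike an assert, a 0 result leaves the process running. */
int first_packet_ok(const void *buf, size_t len)
{
    conn_hdr_t hdr;

    if (buf == NULL || len < sizeof(hdr))
        return 0;                       /* short read: not a handshake */

    memcpy(&hdr, buf, sizeof(hdr));     /* copy to avoid unaligned access */

    return hdr.pkt_type == PKT_ID_INFO || hdr.pkt_type == PKT_TMPVC_INFO;
}
```

On a 0 result the caller would treat the connection as closed, which is what setting `*got_sc_eof = 1` does in the hotfix. This only filters accidental traffic; it is not authentication.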

hzhou · 2022-05-17T14:09:40Z

OK, I'll investigate and see about adding some basic checks for ch3. Note that ch3 is a legacy device and will only receive minimal maintenance. If deploying a new MPICH is an option, we strongly recommend using the ch4 device.

> I would prefer stronger check than "HYD" or pkt type constant though, something like secretword in mpd...

Hehe, HYD is a secret word. It should work as well as any other secret word for preventing port scans. If you are aiming to defend against deliberate attacks, MPI is the wrong layer for it.

zsalvet · 2022-05-17T14:44:57Z

Can you give me some pointer to "best" or standard practice, if MPI is the wrong layer ?

hzhou · 2022-05-17T14:53:03Z

> Can you give me some pointer to "best" or standard practice, if MPI is the wrong layer ?

I would suggest preventing access to your cluster from the external internet altogether. You can launch your jobs using a login node or launch node.

zsalvet · 2022-05-17T15:14:07Z

Accessibility from the external internet is the easy part. We would like to allow our (many) users to connect
to running jobs where desirable (imagine e.g. an interactive Jupyter notebook or RStudio launching MPI backend
computations, visualizations running in the Cactus framework, etc.) and to allow running multiple smaller jobs (potentially owned by different
users) on a single (manycore) machine. If connections are not authenticated at the MPI layer,
complicated packet filtering with dynamic rules is required, I am afraid...

hzhou · 2022-05-17T15:24:31Z

I see. This is a good conversation. The next layer of security is to control the specific port range to be used. You can use MPIR_CVAR_PORTRANGE for this purpose. This should work with ch3, but we need to patch libfabric or ucx in order to do the same for ch4. With a specific port range, you can block outside internet access to those specific ports.
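
As a rough sketch of what restricting listeners to a port range means in practice: parse a range and bind to the first free port inside it, so that a firewall rule only needs to cover that range. The "low:high" string format is my assumption here, and `parse_port_range` and `bind_in_range` are hypothetical helpers, not MPICH functions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Parse "low:high" (assumed format) into two port numbers.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
int parse_port_range(const char *s, int *low, int *high)
{
    char *end;
    long lo, hi;

    if (s == NULL)
        return -1;
    lo = strtol(s, &end, 10);
    if (end == s || *end != ':')
        return -1;
    s = end + 1;
    hi = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;
    if (lo < 1 || hi > 65535 || lo > hi)
        return -1;
    *low = (int) lo;
    *high = (int) hi;
    return 0;
}

/* Bind a TCP socket to the first free port in [low, high] on the loopback
 * address. Returns the bound fd, or -1 if no port in the range is free. */
int bind_in_range(int low, int high)
{
    int port;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    for (port = low; port <= high; port++) {
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sa.sin_port = htons((unsigned short) port);
        if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) == 0)
            return fd;          /* found a free port in the range */
    }
    close(fd);
    return -1;
}
```

With the range known in advance, the packet filter can use one static rule for those ports instead of dynamic rules per job.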

hzhou changed the title ~~TCP ports are always bound to INADDR_ANY~~ misc: TCP ports are always bound to INADDR_ANY May 19, 2022

hzhou changed the title ~~misc: TCP ports are always bound to INADDR_ANY~~ ch3: TCP ports are always bound to INADDR_ANY May 19, 2022
