ch3: TCP ports are always bound to INADDR_ANY #6010

Open
zsalvet opened this issue May 17, 2022 · 9 comments

@zsalvet

zsalvet commented May 17, 2022

TCP ports are always bound to INADDR_ANY (open to the Internet),
even when the user asks for a specific interface or address (such as localhost)
via MPIR_CVAR_NEMESIS_TCP_NETWORK_IFACE or MPIR_CVAR_CH3_INTERFACE_HOSTNAME.
A connection attempt from any external entity can easily trigger an assert
(e.g. in recv_id_or_tmpvc_info()); there is no authentication
involved.
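
To make the difference concrete, here is a minimal sketch of binding a listening socket to one requested address instead of INADDR_ANY. It is not MPICH code: `listen_on` and `bound_addr` are hypothetical helper names, the sketch is IPv4-only, and it assumes a POSIX sockets environment.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not MPICH code: create a listening TCP socket bound to
 * the IPv4 address in addr_str, or to INADDR_ANY when addr_str is NULL.
 * Port 0 lets the kernel pick a free port. Returns the fd, or -1 on error. */
int listen_on(const char *addr_str, unsigned short port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);

    if (addr_str == NULL) {
        /* Reachable on every interface, which is what the issue describes. */
        sa.sin_addr.s_addr = htonl(INADDR_ANY);
    } else if (inet_pton(AF_INET, addr_str, &sa.sin_addr) != 1) {
        return -1;              /* not a valid dotted-quad address */
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Store the address the socket is actually bound to (host byte order) in *out.
 * Returns 0 on success, -1 on error. */
int bound_addr(int fd, uint32_t *out)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    if (getsockname(fd, (struct sockaddr *) &sa, &len) < 0)
        return -1;
    *out = ntohl(sa.sin_addr.s_addr);
    return 0;
}
```

With `listen_on("127.0.0.1", 0)` the socket only accepts connections arriving on the loopback interface, while `listen_on(NULL, 0)` accepts them on every interface.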

@hzhou
Contributor

hzhou commented May 17, 2022

What is your use case that this is an issue?

@zsalvet
Author

zsalvet commented May 17, 2022

Security port scans appear to occasionally crash an app using MPICH on our cluster.
There is also potential for abuse between different users, and it is difficult to secure such
ports externally without a large performance and functionality impact, IMO.

@hzhou
Contributor

hzhou commented May 17, 2022

Try this patch -- #5900 -- and see if it fixes the assertion error. That patch only prevents such assertion errors in hydra. Last time I checked, I didn't encounter the issue with ch3:nemesis, but I can see how a similar issue could exist in the netmod. Could you attach a crash log?

The solution will just add some basic measures to prevent network port scans from interrupting jobs. Will that be sufficient?

@zsalvet
Author

zsalvet commented May 17, 2022

Unfortunately, we are getting assertions in ch3:nemesis much more often than in hydra:
Assertion failed in file src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 572: hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_ID_INFO || hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_TMPVC_INFO

I made a simple hotfix by replacing the failing assert path with "*got_sc_eof = 1; goto fn_exit;"
(I was even able to apply a binary patch to one statically linked, binary-only application :-) ),
and it survived all simple scans. I would prefer a stronger check than "HYD" or a pkt type constant, though,
something like the secretword in mpd...
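
The idea behind that hotfix can be sketched as follows. This is only an illustration, not the actual socksm.c code: the enum values, `struct pkt_header`, and `read_conn_header` are hypothetical stand-ins, and only the `got_sc_eof` flag name is borrowed from the hotfix line quoted above.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in packet types. The real constants are defined in MPICH and their
 * values are not reproduced here. */
enum pkt_type { PKT_ID_INFO = 1, PKT_TMPVC_INFO = 2 };

/* Stand-in header layout, assumed for illustration only. */
struct pkt_header {
    int pkt_type;
    int datalen;
};

/* Parse a connection header from buf. Instead of asserting on an unexpected
 * packet type, set *got_sc_eof so the caller treats the peer as having closed
 * the connection and drops the socket.
 * Returns 0 for an accepted header, -1 otherwise. */
int read_conn_header(const unsigned char *buf, size_t len,
                     struct pkt_header *hdr, int *got_sc_eof)
{
    *got_sc_eof = 0;

    /* A port scanner may send fewer bytes than a full header. */
    if (len < sizeof(*hdr)) {
        *got_sc_eof = 1;
        return -1;
    }
    memcpy(hdr, buf, sizeof(*hdr));

    /* Same condition as the failing assertion, handled as an error path. */
    if (hdr->pkt_type != PKT_ID_INFO && hdr->pkt_type != PKT_TMPVC_INFO) {
        *got_sc_eof = 1;
        return -1;
    }
    return 0;
}
```

This only rejects traffic that does not look like a connection header; it does not authenticate the peer, so anyone who sends a well-formed header is still accepted.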

@hzhou
Contributor

hzhou commented May 17, 2022

OK, I'll investigate and see about adding some basic checks for ch3. Note that the ch3 device is a legacy device and will only receive minimal maintenance. If deploying a new MPICH is an option, we strongly recommend using the ch4 device.

> I would prefer stronger check than "HYD" or pkt type constant though, something like secretword in mpd...

Hehe, HYD is a secret word. It should work as well as any other secret word for keeping port scans from disrupting jobs. If you are aiming to defend against deliberate attacks, MPI is the wrong layer for it.

@zsalvet
Author

zsalvet commented May 17, 2022

Can you give me some pointers to "best" or standard practice, if MPI is the wrong layer?

@hzhou
Contributor

hzhou commented May 17, 2022

> Can you give me some pointer to "best" or standard practice, if MPI is the wrong layer ?

I would suggest preventing access to your cluster from the external internet altogether. You can launch your jobs from a login node or launch node.

@zsalvet
Author

zsalvet commented May 17, 2022

Accessibility from the external internet is the easy part. We would like to allow our (many) users to connect
to running jobs where desirable (imagine e.g. an interactive Jupyter notebook or RStudio launching MPI backend
computations, visualizations running in the Cactus framework, etc.) and to allow running multiple smaller jobs (potentially owned by different
users) on a single (manycore) machine. If connections are not authenticated at the MPI layer,
complicated packet filtering with dynamic rules is required, I am afraid...

@hzhou
Contributor

hzhou commented May 17, 2022

I see. This is a good conversation. The next layer of security is to control the specific port range to be used. You can use MPIR_CVAR_PORTRANGE for this purpose. This should work with ch3, but we need to patch libfabric or ucx in order to do the same for ch4. With a specific port range, you can block outside internet access to those specific ports.
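
As a rough illustration of what restricting the port range means at the socket level, the sketch below binds to the first free port inside a fixed range, so that a firewall rule only has to cover that range. It is not how MPICH implements MPIR_CVAR_PORTRANGE: `bind_in_range` is a hypothetical helper, and binding to the loopback address is an assumption made to keep the example local.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not MPICH code: bind a TCP socket on the loopback
 * address to the first free port in [low, high]. On success, store the port
 * in *chosen and return the fd. Return -1 if the range is empty or every
 * port in it is unavailable. */
int bind_in_range(unsigned short low, unsigned short high, unsigned short *chosen)
{
    if (low > high)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* unsigned int avoids wrap-around when high is 65535. */
    for (unsigned int port = low; port <= high; port++) {
        struct sockaddr_in sa;
        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sa.sin_port = htons((unsigned short) port);

        /* A failed bind leaves the socket unbound, so it can be retried. */
        if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) == 0) {
            *chosen = (unsigned short) port;
            return fd;
        }
    }

    close(fd);
    return -1;
}
```

A call such as `bind_in_range(50000, 50100, &port)` either yields a socket on a port in that range or fails, which makes the set of ports a packet filter needs to cover predictable.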

@hzhou hzhou changed the title TCP ports are always bound to INADDR_ANY misc: TCP ports are always bound to INADDR_ANY May 19, 2022
@hzhou hzhou changed the title misc: TCP ports are always bound to INADDR_ANY ch3: TCP ports are always bound to INADDR_ANY May 19, 2022