Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't remove server a user by port in the manager if a plugin failed to initialize #1612

Closed
talentlessguy opened this issue Aug 12, 2024 · 9 comments

Comments

@talentlessguy
Copy link

I'm running a shadowsocks server in manager mode, where each config is using v2ray plugin and stress testing it with 300 and more configurations. Sometimes it errors (because of race conditions).

Here is the code I am adding users with (it's in Go, can be written in any other language):

package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
)

func SendCommand(command string, bufsize int) (string, error) {
	p := make([]byte, bufsize)
	conn, err := net.Dial("udp", "127.0.0.1:6100")
	if err != nil {
		return "", err
	}
	conn.Write([]byte(command))
	_, err = bufio.NewReader(conn).Read(p)
	if err != nil {
		return "", err
	}
	conn.Close()

	return string(p), err
}

func AddUserCommand(port int, password string) (string, error) {
	return SendCommand(fmt.Sprintf("add: {\"server_port\": %d, \"password\": \"%s\", \"plugin\":\"v2ray-plugin\", \"plugin_opts\": \"server\" }", port, password), 4096)
}

func main() {
  for _, u := range users { // a list of users stored somewhere
    AddUserCommand(u.Port, u.Pwd)
  }
}

Here's an example of such kind of an error:

2024/08/12 22:09:05 failed to start server: app/proxyman/inbound: failed to listen TCP on 45703 > transport/internet: failed to listen on address: 0.0.0.0:45703 > transport/internet/websocket: failed to listen TCP(for WS) on 0.0.0.0:45703 > listen tcp 0.0.0.0:45703: bind: address already in use
2024-08-12T22:09:05.548672213+03:00 ERROR plugin exited with status: exit status: 1 

Running this:

$ echo 'remove: {"server_port": 45703}' | nc -u localhost 6100
ok

Doesn't have any effect. The port is still occupied:

$ fuser 45703/tcp
45703/tcp:           72350
$ ps aux | grep 72350
v1rtl      72350  0.0  1.2 2876384 417264 pts/0  Sl+  22:09   0:01 ssmanager -c config.json

I'm using this config for shadowsocks-manager:

{
  "timeout":300,
  "method":"chacha20-ietf-poly1305",
  "nameserver":"8.8.8.8",
  "mode":"tcp_only",
  "manager_address": "127.0.0.1",
  "manager_port": 6100
}

Specs

  • Arch Linux 6.10.2
  • shadowsocks-rust: 1.19.4
  • shadowsocks-v2ray-plugin: 5.13.0-1
@talentlessguy talentlessguy changed the title Can't remove server a user by port, still getting "ok" in response Can't remove server a user by port in the manager if a plugin failed to initialize Aug 12, 2024
@zonyitoo
Copy link
Collaborator

Just tried with your provided commands:

Adding a new server with a plugin that doesn't exist in the local environment:

$ echo 'add: {"server_port":45703,"password":"a","plugin":"v2ray-plugin"}' | nc -u 127.0.0.1 6100
ok

ssmanger's log:

2024-08-13T14:36:32.739271+08:00 TRACE main ThreadId(01) shadowsocks_service::manager::server: crates/shadowsocks-service/src/manager/server.rs:216: received Add(ServerConfig { server_port: 45703, password: "a", method: None, no_delay: None, plugin: Some("v2ray-plugin"), plugin_opts: None, plugin_mode: None, mode: None, users: None }) from SocketAddr(127.0.0.1:57073)
2024-08-13T14:36:32.739488+08:00 TRACE main ThreadId(01) shadowsocks::plugin::ss_plugin: crates/shadowsocks/src/plugin/ss_plugin.rs:8: Starting plugin "v2ray-plugin", opt: None, arg: [], remote: 0.0.0.0:45703, local: 127.0.0.1:59764
2024-08-13T14:36:32.746896+08:00 ERROR main ThreadId(01) shadowsocks::plugin: crates/shadowsocks/src/plugin/mod.rs:85: failed to start plugin "v2ray-plugin" for server 0.0.0.0:45703, err: No such file or directory (os error 2)
2024-08-13T14:36:32.746945+08:00 ERROR main ThreadId(01) shadowsocks_service::manager::server: crates/shadowsocks-service/src/manager/server.rs:300: failed to start server (0.0.0.0:45703), error: No such file or directory (os error 2)

Port 45703 is not occupied (netstat was running on macOS):

$ netstat -anv|grep 45703

So it couldn't be reproduced.

@talentlessguy
Copy link
Author

talentlessguy commented Aug 13, 2024

The issue is not with a specific port, its with adding a lot of configurations with V2Ray and running into a race condition where a server fails to bind to a port.

After it fails to bind, it's neither possible to remove the server from the manager, nor re-add it. The port is occupied but is unusable. The only solution is to restart the manager at the moment.

@talentlessguy
Copy link
Author

talentlessguy commented Aug 13, 2024

I can attach a repository with a Go program that allocate 500 ports and start 500 SS+V2Ray servers for easier reproduction, if needed

@zonyitoo
Copy link
Collaborator

If I understand it correctly, the v2ray-plugin failed to start but didn't exit as expected, so the port was still being occupied by the server instance.

@talentlessguy
Copy link
Author

talentlessguy commented Aug 13, 2024

If I understand it correctly, the v2ray-plugin failed to start but didn't exit as expected, so the port was still being occupied by the server instance.

yes, that's the problem. And ssmanager doesn't let me neither remove no do anything else about the plugin. Do you think it's a bug in v2ray-plugin, or in a way how ss-rust handles plugins and their signals in general?

for context, I am able to send SIGKILL to a plugin to kill it, but when there's 500 instances it's hard to distinct between them, so preferably remove: would also properly shut down the plugin like it does already, including when a port is occupied already and the error is not handled

@zonyitoo
Copy link
Collaborator

If any of the plugin processes exited unexpectly, the watcher will exit with an error message:

vfut.push(
async move {
match plugin.join().await {
Ok(status) => {
error!("plugin exited with status: {}", status);
Ok(())
}
Err(err) => {
error!("plugin exited with error: {}", err);
Err(err)
}
}
}
.boxed(),
);

which will eventually cause the server's run task exit and then release all resources.

In your case, I suspected that v2ray-plugin didn't exit, so ssmanager still thought the plugin was working well. When sending remove: command to ssmanager, it tries to remove the Server instance in ssmanager, which will call destructors including the Plugin::drop, which will eventually send SIGTERM and then send SIGKILL to plugin's process.

You can still enable -vvv to print all logs and see what exactly happening in ssmanager.

@talentlessguy
Copy link
Author

Below are logs using -vvv verbosity level, which include errors. It resulted in 9 race conditions.

Here is one of them (see ERROR):

2024-08-13T19:47:10.103343319+03:00 TRACE   tokio-runtime-worker ThreadId(17) mio::poll: /build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-0.8.11/src/poll.rs:682: deregistering event source from poller    
2024-08-13T19:47:10.103491473+03:00 DEBUG   tokio-runtime-worker ThreadId(17) shadowsocks::plugin: crates/shadowsocks/src/plugin/mod.rs:182: terminating plugin process None, local_addr: 127.0.0.1:39119    
2024-08-13T19:47:10.103521997+03:00 ERROR   tokio-runtime-worker ThreadId(17) shadowsocks_service::server::server: crates/shadowsocks-service/src/server/server.rs:208: plugin exited with status: exit status: 1    
2024-08-13T19:47:10.103543975+03:00 TRACE   tokio-runtime-worker ThreadId(17) mio::poll: /build/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-0.8.11/src/poll.rs:682: deregistering event source from poller    

And this is what V2Ray plugin stdout emitted:

2024/08/13 19:47:10 failed to start server: app/proxyman/inbound: failed to listen TCP on 34411 > transport/internet: failed to listen on address: 0.0.0.0:34411 > transport/internet/websocket: failed to listen TCP(for WS) on 0.0.0.0:34411 > listen tcp 0.0.0.0:34411: bind: address already in use

Here's ssmanager full log
log.txt

zonyitoo added a commit that referenced this issue Aug 14, 2024
Replaced FutureUnordered with futures::select_all, and put all sub-tasks
into individual tokio tasks.
@zonyitoo
Copy link
Collaborator

Please try with the latest commit.

@talentlessguy
Copy link
Author

works like a charm now, thank you for such a fast fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants