Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Considering alternative coroutine sslocal impls #2452

Closed
Mygod opened this issue Feb 20, 2020 · 42 comments · Fixed by #2454
Closed

Considering alternative coroutine sslocal impls #2452

Mygod opened this issue Feb 20, 2020 · 42 comments · Fixed by #2454

Comments

@Mygod
Copy link
Contributor

Mygod commented Feb 20, 2020

Moving discussion from shadowsocks/shadowsocks-org#154.

We are trying to make ACL handling of shadowsocks-android less hacky (for example shadowsocks/shadowsocks-libev#2627), however, things seem challenging.

  1. While shadowsocks-libev more or less takes the approach of doing async I/O via libev, libev requires you to manually keep track of your execution point to resume your connection processing. This makes extending functionalities error-prone and overall the code is hard to read. In fact, there are still traces of synchronous I/O in shadowsocks-libev (e.g. get_sockaddr), which could block the entire thread if network conditions are bad.
  2. Modern programming language more or less solves this problem with coroutines, examples include golang/rust/c++20/Kotlin. However, go-shadowsocks2 (missed opportunity for go2shadowsocks btw) and shadowsocks-rust do not have support for ACL files yet. As we use really large ACL files, we need to integrate re2 or look for a regex library with similar performance. (golang stdlib does not suffice: regexp: port RE2's DFA matcher to the regexp package golang/go#11646)
  3. Implementing everything in JVM is a bad idea since every byte array is going to be an object in GC heap, and there are no good Kotlin async I/O libraries either. It seems like there are similar issues in golang (objects escaping to heap) but I am not an expert in golang.
  4. Doing PAC is going to be even more expensive than ACL since PAC is actually Javascript.

Thoughts are appreciated.

CC @madeye @riobard @zonyitoo

@riobard
Copy link

riobard commented Feb 20, 2020

For point 2, Go's RE engine is pretty robust.

For point 3, Go's GC is really good now and it's relatively easy to write memory-efficient code. But compared to languages without GC and runtime (Rust/C/C++) there's still a memory penalty. However recent discussion on HN makes me more confident in Go compared to Rust.

@Mygod I like the go2shadowsocks name! 😂

@Mygod
Copy link
Contributor Author

Mygod commented Feb 20, 2020

I don't know that article but it seems ancient (it even predates re2). We are talking about 100KB of regex pattern matching domain names so I am not sure if it is as performant as you claimed.

Some benchmarks would be nice...

@riobard
Copy link

riobard commented Feb 20, 2020

The point of Cox's article is that Go uses finite automata-based regex engine so the performance should be good enough. Just checked re2 is similar so I guess you should expect similar performance.

@Mygod
Copy link
Contributor Author

Mygod commented Feb 20, 2020

As mentioned in golang/go#11646, Go uses NFA so its matching time would be at least linear in pattern length for our use case. We want sublinear matching time which could be achieved via DFA, but DFA is harder to implement.

@riobard
Copy link

riobard commented Feb 20, 2020

Ah good to know there's still room for improvement! 😄

@riobard
Copy link

riobard commented Feb 20, 2020

Out of curiosity, what do you do with ACL that makes it so large?

@Mygod
Copy link
Contributor Author

Mygod commented Feb 20, 2020

GFW lists and China lists.

While in theory DFA can take exponential size, almost all (except <5 maybe) of the rules are simply matching domain suffixes so a good DFA implementation should take almost linear size and only exponential in the number of exceptions. Of course, if no such good implementation exists, maybe another way is to handle those domain suffix matching differently using a DFA-esque algorithm (say, using Aho-Corasick with an alphabet over all the possible parts of the domain) and handle the rest using a regexp.

@riobard
Copy link

riobard commented Feb 20, 2020

If it’s almost suffix marching, wouldn’t trie-based data structure more efficient (both space and time)?

@Mygod
Copy link
Contributor Author

Mygod commented Feb 20, 2020

Yeah. Trie is another DFA-esque algorithm that I was suggesting previously. :)

@riobard
Copy link

riobard commented Feb 20, 2020

Ah sorry I missed that part. Anyway, in this case I don’t think regex is the best solution.

@madeye
Copy link
Contributor

madeye commented Feb 21, 2020

I think we all agree that shadowsocks-rust is the best choice here. it looks not much work to implement ACL in shadowsocks-rust, at least for ss-local only.

@zonyitoo What do you think?

@Mygod
Copy link
Contributor Author

Mygod commented Feb 21, 2020

Actually I think after discussion with @riobard I think the only drawback of using golang is GC. I am really unfamiliar with rust so I really don't know which one to use now.

@madeye
Copy link
Contributor

madeye commented Feb 21, 2020

Last time we faced similar issues in overture due to the memory leak in golang's regex routine: #1639

Not sure if this is still a problem now.

@madeye
Copy link
Contributor

madeye commented Feb 21, 2020

@riobard What about implementing the ACL in ss-go first, and let's compare the performance between implementations.

@zonyitoo
Copy link

I think we all agree that shadowsocks-rust is the best choice here. it looks not much work to implement ACL in shadowsocks-rust, at least for ss-local only.

@zonyitoo What do you think?

Sure. But I haven't looked deep into the ACL implementation in libev yet.

@madeye
Copy link
Contributor

madeye commented Feb 21, 2020

@zonyitoo The ACL function in ss-libev is very naive. I think you can build new ACL rules to support more features, like forwarding different domains to different upstreams.

Some example ACL for shadowsocks-libev can be found here: https://github.com/shadowsocks/shadowsocks-libev/blob/master/acl/

@madeye
Copy link
Contributor

madeye commented Feb 21, 2020

Here're the missing features in ss-go and ss-rust for android integration:

  1. protect() call back. This is used to set flags on the raw socket to bypass Android system's NAT rules. It's straightforward in go with official APIs. For rust, we can achieve this with AsRawFd()
  2. Traffic update call back. This is straightforward. Rust implementation should already have this for ss-manager.
  3. Local reverse DNS lookup. This is used to lookup the domain name associate with an IP address for ACL rules. Given we already have a Kotlin based local resolver, this lookup can also be achieved with a callback to Android application.
  4. ACL logic. We can take this opportunity to refine this part. For example, given ss-rust already supports multiple upstreams, we can add new rules to support fine-grained policy routing.

BTW, the above logic can also be shared with an iOS implementation.

@riobard
Copy link

riobard commented Feb 21, 2020

I have no idea how the current ACL works and what features it supports, but judging from https://github.com/shadowsocks/shadowsocks-libev/blob/master/acl/ there are two major categories: 1) CIDR-matching, which can be efficiently calculated using a radix tree, and 2) domain-matching, as discussed before, is also mostly suffix matching.

Why did we end up with regex in the first place? 😂

@Mygod
Copy link
Contributor Author

Mygod commented Feb 21, 2020

There is also an ACL implementation in this repo written in Kotlin and C++ (re2), which might be more readable... I would love to write them myself (and learn golang/rust) but unfortunately time is a luxury I don't have, so let me post a write-up explaining things sometime soon.

@Mygod
Copy link
Contributor Author

Mygod commented Feb 21, 2020

ACL

Let's first start with ACL. ACL is used to decide whether a given IPv4/IPv6/hostname should be bypassed. Basically there are two parts of ACL, one is hostname matching and the other is IP matching. The matching behavior (for a standard socks5 request) is described by the following algorithm:

  1. On input IP, set ips = [input] and skip to 4;
  2. If hostname matches some rule in proxy list, the request is proxied; if hostname matches some rule in bypass list, the request is bypassed; if it matches both list, the behavior is undefined.
  3. Hostname matching is inconclusive and we resolve the hostname locally and let ips be the resolving results;
  4. Check the default behavior for unmatched IPs and match IPs against the other list. For example, if the default behavior is to proxy the request, bypass the request if any IP in ips matches the bypass IP list, otherwise proxy the request.

For hostname matching, we can concatenate all regexps using |. Of course, the speedup mentioned above (#2452 (comment)) could be employed. For IP matching, we can either use a exhaustive search (shadowsocks-libev maybe, last time I checked), a radix tree, or a binary search (shadowsocks-android).

Why ACL

Here is my assumed reasoning for using ACL. With Chrome, you can configure proxy with extensive rulesets using extensions. Android does not have a good interface for socks5 proxy except for a global VpnService (or a transproxy with root), so a socksifier is employed. As mentioned above, PAC is Javascript and we want to avoid having to compile/interpret hundreds of KBs of Javascript on a phone, therefore a reduced version (ACL) is employed. The socksifying process is illustrated like this:

client applications <-> linux tun device <-> linux kernel <-> linux tun fd <-> tun2socks <-> sslocal

Android will set up iptables routes so that all traffic will go to the tun device by default. To bind shadowsocks traffic to the underlying network (there is a DefaultNetworkListener that decides the underlying network to pass shadowsocks traffic through), pass the socket fd through an ancillary message to the socket at ./protect_path and wait for response. Similarly, traffic update is done through ./stat_path. (this is besides the point of doing ACL but I am mentioning these wrt #2452 (comment) by @madeye)

Complications of doing ACL with socksified VPN

  1. Linux does not support per-interface DNS very well and Android needs to accommodate dynamic network environment. In our context, we need to do local DNS resolving in step 3 through Android's DNS resolving API. Part of that logic is currently in LocalDnsServer but I am thinking that we can provide a local socket (maybe ./local_dns_path) to provide sslocal this interface.
  2. Different from browser environment, all hostnames are resolved by client before connecting to IPs directly, instead of connecting to hostnames. This is because the clients are oblivious to how the tun device actually operate. To make ACL matching work correctly under this case, we need to resolve DNS based on ACL rules, and also remember our DNS requests and responses for future reverse lookups.

More rigorously, we want to have a local DNS relay doing the following: (currently implemented by LocalDnsServer)

  1. On input DNS packet, attempt to parse it as an A/AAAA request. Forward the request to remote on failure.
  2. We assume both ACL matching and the remote DNS request will be slow and we will send the request asynchronously in this step, and cancel it later if it is not needed.
  3. Match the hostname as in ACL step 2. If hostname matches something, resolve the DNS request locally/remotely as the outcome and skip to step 5.
  4. If nothing is matched, resolve hostname locally and attempt to match IPs as in ACL step 4 and decide the outcome.
  5. Look up each IP as in ACL step 4. If the current outcome does not match each IP's matched block, add (IP -> outcome) to the exceptionIpPool.

With the DNS server set up, we can do the ACL matching under VPN/transproxy mode:

  1. On input IP, check that if IP is in exceptionIpPool. If so, use the outcome there.
  2. Otherwise, use the old ACL matching behavior.

Outlook

I think with this change we will be able to do a lot more in the future, e.g. policy routing (mentioned above by @madeye), #2087, #2301, etc.

@riobard
Copy link

riobard commented Feb 21, 2020

@Mygod That's… quite some work 😂

@Mygod
Copy link
Contributor Author

Mygod commented Feb 21, 2020

Yeah the complicated part is local DNS resolving. If we are only doing ACL in the new impl then we have not ventured far from keep using libev impl. :)

@zonyitoo

This comment has been minimized.

@zonyitoo
Copy link

zonyitoo commented Feb 21, 2020

BTW, supporting ACL like libev is quite a simple job. I will try to implement that in the following weekends.

@zonyitoo
Copy link

zonyitoo commented Feb 21, 2020

I have just finished reading acl.c of libev. It is quite a simple and clean implementation. But where is the regex matching thing? I haven't seen anything in this project.

Ah .. I see. https://github.com/shadowsocks/shadowsocks-libev/blob/master/acl/gfwlist.acl

They are in the rule.c.

@zonyitoo
Copy link

zonyitoo commented Feb 21, 2020

Privoxy uses very similar rules: https://www.privoxy.org/user-manual/actions-file.html#HOST-PATTERN . I am curious about what they did for improving performance.

@madeye
Copy link
Contributor

madeye commented Feb 22, 2020

@zonyitoo A best practice is concatenating rules together like rule1|rule2|rule3, then the DFA engine in RE2/regex will do the optimization for us.

Here are some sample codes in shadowsocks-android:

  1. https://github.com/shadowsocks/shadowsocks-android/blob/master/core/src/main/jni/jni-helper.cpp#L41
  2. https://github.com/shadowsocks/shadowsocks-android/blob/master/core/src/main/jni/jni-helper.cpp#L93

Some search shows the regex engine in rust should be even faster than RE2: https://github.com/rust-lang/regex/blob/master/bench/log/05/re2-vs-rust

@zonyitoo
Copy link

zonyitoo commented Feb 22, 2020

protect() call back. This is used to set flags on the raw socket to bypass Android system's NAT rules.

Does Android have a C API for that? I can only find a Java API.

It's straightforward in go with official APIs.

Hmm? How to do that?

@madeye
Copy link
Contributor

madeye commented Feb 22, 2020

Yes, only Java API. It means we need a callback through JNI. An example can be found here: https://github.com/madeye/BaoLianDeng/blob/master/app/src/main/java/io/github/baoliandeng/core/LocalVpnService.java#L215

In golang, we have a callback in Dial interface: https://golang.org/src/net/dial.go#L97

zonyitoo added a commit to shadowsocks/shadowsocks-rust that referenced this issue Feb 23, 2020
@zonyitoo
Copy link

zonyitoo commented Feb 23, 2020

shadowsocks-rust have just finished supporting ACL in TCP relays.

  • Not tested yet.
  • Not sure the implementation is correct.
  • UDP is not supported yet.
  • Performance issue:
    1. Each connection may have to perform 2 DNS resolution, one for ACL, one for connect.
    2. ACL IP set matching should merge duplicated networks (or overlapped networks)

@Mygod
Copy link
Contributor Author

Mygod commented Feb 23, 2020

Thanks guys!

Re protect: As mentioned above, shadowsocks-android uses ./protect_path. JVM implementation:

private inner class ProtectWorker : ConcurrentLocalSocketListener("ShadowsocksVpnThread",
File(Core.deviceStorage.noBackupFilesDir, "protect_path")) {
override fun acceptInternal(socket: LocalSocket) {
socket.inputStream.read()
val fd = socket.ancillaryFileDescriptors!!.single()!!
try {
socket.outputStream.write(if (underlyingNetwork.let { network ->
if (network != null) try {
DnsResolverCompat.bindSocket(network, fd)
return@let true
} catch (e: IOException) {
when ((e.cause as? ErrnoException)?.errno) {
// also suppress ENONET (Machine is not on the network)
OsConstants.EPERM, 64 -> e.printStackTrace()
else -> printLog(e)
}
return@let false
} catch (e: ReflectiveOperationException) {
check(Build.VERSION.SDK_INT < 23)
printLog(e)
}
protect(fd.int)
}) 0 else 1)
} finally {
fd.closeQuietly()
}
}
}

Re 2-DNS issue: I think all the DNS probes are necessary? (unless I am missing something) In addition to my explanations above, let me recall the part that involves DNS in VPN mode:

  1. In SOCKS5, if we receive a HOSTNAME request, we resolve it once locally and apply ACL accordingly.
  2. In a normal VPN connection, we first receive a DNS request from client, which results in up to two DNS resolutions (once locally, once remotely). After that, the client might issue a request directly to the resolved IP, so sslocal does an IP reverse lookup in exceptionIpPool and route the traffic accordingly. No DNS requests are done in this step.

Re-IP merging: I think it is safe to assume the input does not have duplicated blocks unless you are using a binary search, since there the algorithm will not work correctly if there are duplicates.

@Mygod
Copy link
Contributor Author

Mygod commented Feb 23, 2020

Also none of these are set in stone so feel free to propose changes when appropriate.

@madeye madeye mentioned this issue Feb 25, 2020
15 tasks
@madeye
Copy link
Contributor

madeye commented Feb 25, 2020

I just got shadowsocks-rust successfully integrated into shadowsocks-android.

For the next step, @zonyitoo can help implement those two RPCs in shadowsocks-rust.

  1. https://github.com/shadowsocks/shadowsocks-libev/blob/master/src/android.c#L49
  2. https://github.com/shadowsocks/shadowsocks-libev/blob/master/src/android.c#L98

It's a long term project, so take your time, no rush.

@zonyitoo
Copy link

Ok. I can only work on it in weekends.

@ohsorry
Copy link

ohsorry commented Feb 26, 2020

I prefer separate filtering modules. An in-memory index is worth considering, basically a balanced tree.

@Mygod
Copy link
Contributor Author

Mygod commented Feb 26, 2020

Yeah we can definitely put ACL part directly in this repo if it is doable.

@ohsorry
Copy link

ohsorry commented Feb 26, 2020

https://github.com/shadowsocks/shadowsocks-libev/tree/master/acl
Can we merge these into two files? One is called "IP.list" and the other is "domain.list". Each item in the list can be prioritized. If a conflict occurs, let the user choose whether to bypass or connect directly.
IP matching may use a segment tree/B tree, domain name matching uses a regular binary search tree.

zonyitoo added a commit to shadowsocks/shadowsocks-rust that referenced this issue Feb 26, 2020
@Mygod
Copy link
Contributor Author

Mygod commented Feb 27, 2020

@madeye #2452 (comment) is what I was talking about in PR. I think this implementation is much easier and does not involve RPC for reverse lookup. Let me know if you want to change it.

zonyitoo added a commit to shadowsocks/shadowsocks-rust that referenced this issue Mar 1, 2020
zonyitoo added a commit to shadowsocks/shadowsocks-rust that referenced this issue Mar 1, 2020
zonyitoo added a commit to shadowsocks/shadowsocks-rust that referenced this issue Mar 1, 2020
zonyitoo added a commit to shadowsocks/shadowsocks-rust that referenced this issue Mar 2, 2020
@madeye madeye linked a pull request Mar 4, 2020 that will close this issue
15 tasks
@Mygod
Copy link
Contributor Author

Mygod commented Mar 15, 2020

@madeye
Copy link
Contributor

madeye commented Mar 16, 2020

@Mygod After several weeks of developing with Rust, I'm quite sure that it's the best language for system programming.

We're also discussing the adoption of Rust internally. However, one concern is that there's no ISO 26262 standard compiler for Rust so far. It means no safe certificate for the production software written by Rust.

@Mygod
Copy link
Contributor Author

Mygod commented Mar 16, 2020

I see. I guess that's why you are making time to contribute to the rust repo. 🤣

@ohsorry
Copy link

ohsorry commented Mar 16, 2020

Consider my proposal, advanced routing options can be placed in the "Advanced" menu, or even make the ACL editable. Most SS users only need an easy-to-use configuration: should I proxy this IP / URL?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants