[SIP] Authentication based multi-user-single-port #130
Have you considered the possibility that NAT might mess with your cache? Namely, if two clients behind the same NAT router try to connect to the same server with different credentials, god bless you, because they present the same source IP address to the server. |
Maybe that's what we call THE COST :) Things cannot be perfect. It depends on a BALANCE.
That's just a personal comment. This SIP needs more balance in any case. |
More words: if a shadowsocks server supports only a countable number of users, that's abnormal behavior too. Following on this idea, maybe a later SIP should be about exchanging shadowsocks within a kind of circle (known friends? trusted servers?) |
Hmm, that works if you're okay with THE COST of users one day complaining to you that it's not working because of the NAT vs. cache issue. Otherwise, I suggest either not taking the cache approach, or using other protocols that already support multi-user, like v2mess (I haven't looked at the protocol yet, but it seems to support this use case). Different people prefer different balances between things. I don't think Shadowsocks is intended to cover every kind of balance you might wish for. |
Hmmm… I think if we're gonna officially support multiuser per port, we might as well address the problem cleanly? #54 is still open ^_^ |
But I agree this hack is neat in that it does not require any changes in the clients. 👍 |
Also, the problem I pointed out might occur more frequently than you imagine, thanks to the exhausted IPv4 pool and widely deployed CGN. It's likely that one will run into such frustration despite having taken precautions. |
CGN is a major concern. We might need to run some tests to determine the rough size of NAT pools used by ISPs doing massive CGN. |
NAT should not be a problem, as long as not all of the users are behind the same NAT address. Say five users are behind the same NAT IP address: at most five keys are cached for that IP. |
This SIP just suggests a kind of multi-user-single-port solution for shadowsocks without modifying the protocol. But as mentioned by @Mygod, shadowsocks is not designed for this purpose. I listed this SIP here since it's already implemented in third-party software. If anyone is interested in it as well, please follow this SIP and apply the suggested optimizations. |
My worry is that people will eventually abuse this hack to run commercial services. It's not gonna scale well when users are mostly behind CGN with small pool of public IPs, e.g. mobile networks in China. |
CGN also applies to ADSL connections. One should also not forget NAT routers in enterprises, schools, etc. A good way to combat this is to enlarge the cache size and always do a fallback lookup. |
Fallback lookup is always needed. Even if a key is cached, authentication is still required; if authentication fails, a fallback lookup is performed. I don't expect millions of users on one single port. A reasonable assumption is thousands of users per server and hundreds per port. And of course, it cannot scale for commercial usage. |
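A minimal Go sketch of the cache-then-fallback lookup described above, assuming AES-256-GCM, toy keys, and a fixed nonce; the `server` type, `findKey`, and `remember` are hypothetical names, and real Shadowsocks subkey derivation from the salt is omitted. A cached key is only a hint: it still has to authenticate the chunk, and a miss falls through to a scan of the remaining keys.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"errors"
	"fmt"
)

// server is a hypothetical sketch: one AEAD per user key (subkey derivation
// omitted) plus a bounded per-IP list of keys that recently succeeded.
type server struct {
	aeads    []cipher.AEAD
	cache    map[string][]int // source IP -> key indices, newest first
	maxPerIP int
}

func newServer(users, maxPerIP int) *server {
	s := &server{cache: make(map[string][]int), maxPerIP: maxPerIP}
	for u := 0; u < users; u++ {
		key := make([]byte, 32)
		for j := range key {
			key[j] = byte(u + 1) // toy keys; real ones come from passwords
		}
		block, err := aes.NewCipher(key)
		if err != nil {
			panic(err)
		}
		aead, err := cipher.NewGCM(block)
		if err != nil {
			panic(err)
		}
		s.aeads = append(s.aeads, aead)
	}
	return s
}

// findKey tries the keys cached for ip first. A cache hit is never trusted
// blindly: each candidate must still pass Open. If none verify (e.g. another
// user behind the same NAT), it falls back to scanning the remaining keys.
func (s *server) findKey(ip string, nonce, chunk []byte) (int, error) {
	buf := make([]byte, 0, len(chunk))
	tried := make(map[int]bool)
	for _, i := range s.cache[ip] {
		tried[i] = true
		if _, err := s.aeads[i].Open(buf[:0], nonce, chunk, nil); err == nil {
			return i, nil
		}
	}
	for i, a := range s.aeads {
		if tried[i] {
			continue
		}
		if _, err := a.Open(buf[:0], nonce, chunk, nil); err == nil {
			s.remember(ip, i)
			return i, nil
		}
	}
	return -1, errors.New("no key authenticated the chunk")
}

// remember puts key i at the front of ip's list, capped at maxPerIP entries,
// so five users behind one NAT address cost at most five cached keys.
func (s *server) remember(ip string, i int) {
	keys := append([]int{i}, s.cache[ip]...)
	if len(keys) > s.maxPerIP {
		keys = keys[:s.maxPerIP]
	}
	s.cache[ip] = keys
}

func main() {
	s := newServer(100, 5)
	nonce := make([]byte, 12)
	chunk := s.aeads[42].Seal(nil, nonce, []byte{0x00, 0x10}, nil)
	i, err := s.findKey("203.0.113.7", nonce, chunk)
	fmt.Println(i, err, s.cache["203.0.113.7"])
}
```

Capping the per-IP list (here `maxPerIP`) bounds memory per NAT address; under heavy CGN, the cap is what pushes more lookups into the fallback path.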
In some places, the ISP may NAT an entire neighborhood, which may include 10,000 end users, assigning them addresses from the shared address space prefix (100.64.0.0/10) defined in https://tools.ietf.org/html/rfc6598 |
@celeron533 This is CGN mentioned above. |
Hmm, why not use an ElGamal-like method to identify users? |
Compatibility.
|
FYI, Outline Servers have all been migrated to outline-ss-server this week. They don't yet use the single-port feature, but we intend to enable it in a few weeks, after I implement the IP->cipher cache. We can roll it out gradually and see how it performs in the wild. In my own tests, the added latency for 100 users without any optimization on a crappy $5 VPS can be significant (tens of milliseconds), but it varies wildly, and I believe the optimizations will help significantly. Also, outline-ss-server has Prometheus metrics, so we will be able to expose latency metrics and admins will be able to monitor them. BTW, outline-ss-server still allows multiple ports, and you can have multiple keys per port and multiple ports per key. You can always start a new port if one becomes overloaded. One nice feature is that you can do that without creating a new process for each port or stopping the running one. |
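To illustrate "multiple keys per port, and multiple ports per key" in one process, here is a hypothetical Go sketch; `AccessKey` and its fields are assumptions, not outline-ss-server's actual config schema. Grouping keys by port gives each listener its own trial-decryption set, and diffing two configs shows which listeners to start or stop without touching the others.

```go
package main

import (
	"fmt"
	"sort"
)

// AccessKey is a hypothetical config entry binding one credential to one port.
type AccessKey struct {
	ID     string
	Port   int
	Cipher string
	Secret string
}

// keysByPort groups the config so a single process can serve every port,
// each port trial-decrypting only against the keys assigned to it.
func keysByPort(keys []AccessKey) map[int][]AccessKey {
	m := make(map[int][]AccessKey)
	for _, k := range keys {
		m[k.Port] = append(m[k.Port], k)
	}
	return m
}

// portsToChange lists listeners to open and close when the config is
// reloaded; ports present in both configs keep running untouched.
func portsToChange(oldCfg, newCfg map[int][]AccessKey) (toOpen, toClose []int) {
	for p := range newCfg {
		if _, ok := oldCfg[p]; !ok {
			toOpen = append(toOpen, p)
		}
	}
	for p := range oldCfg {
		if _, ok := newCfg[p]; !ok {
			toClose = append(toClose, p)
		}
	}
	sort.Ints(toOpen)
	sort.Ints(toClose)
	return toOpen, toClose
}

func main() {
	before := keysByPort([]AccessKey{
		{"alice", 9000, "chacha20-ietf-poly1305", "secret-a"},
		{"bob", 9000, "aes-256-gcm", "secret-b"}, // two keys share port 9000
	})
	after := keysByPort([]AccessKey{
		{"alice", 9000, "chacha20-ietf-poly1305", "secret-a"},
		{"bob", 9000, "aes-256-gcm", "secret-b"},
		{"alice", 9001, "chacha20-ietf-poly1305", "secret-a"}, // same key on a new port
	})
	toOpen, toClose := portsToChange(before, after)
	fmt.Println(toOpen, toClose)
}
```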
It's worth mentioning that the single-port feature has some very good motivation:
|
I now have a benchmark for my single-port implementation. These are the results on an idle $5 DigitalOcean machine in Frankfurt:
That's 1.3 ms to go over 100 ciphers for a TCP connection, and 0.6 ms for a UDP datagram. That will probably be worse under load, but it gives an idea of the added latency. There are 2MB of allocations for one TCP connection. I believe that can be significantly reduced by sharing buffers, but it gets a little tricky with the code structure and with different ciphers needing different buffer sizes (I guess I need to find the max buffer size). |
@fortuna That's a lot of allocs/op. Is that normal? |
PR Jigsaw-Code/outline-ss-server#8 makes the TCP performance on par with UDP. We no longer allocate so much memory:
The ~2MB of allocations came from allocating a buffer for an entire encrypted chunk (~16KB) for each of the 100 ciphers I tried. Now I allocate only one buffer for all ciphers. As for the number of allocations, it's just that I'm doing the operation 100 times. For 1 cipher only, I get these numbers:
|
FYI, I've added an optimization to outline-ss-server that keeps the most recently used cipher at the front of the list. This way the time to find the cipher is proportional to the number of ciphers in active use, rather than the total number of ciphers. Furthermore, I've added the This should be enough to inform us whether the performance is good enough. It would be great if people here gave it a try and reported back. The latest binary with the changes is v1.0.3 and can be found in the releases: |
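A move-to-front list along these lines can be sketched in Go with `container/list`; `cipherList` and `find` are hypothetical names, and the `try` callback stands in for the actual trial decryption.

```go
package main

import (
	"container/list"
	"fmt"
)

// cipherList is a hypothetical move-to-front list of cipher IDs. It is not
// safe for concurrent use; a real server would guard it with a mutex.
type cipherList struct {
	l *list.List
}

func newCipherList(ids ...string) *cipherList {
	cl := &cipherList{l: list.New()}
	for _, id := range ids {
		cl.l.PushBack(id)
	}
	return cl
}

// find tries IDs front to back and moves the first match to the front, so
// the cost of a lookup tracks how many ciphers are in active use rather than
// the total. It also reports how many candidates were tried.
func (cl *cipherList) find(try func(id string) bool) (string, int, bool) {
	tried := 0
	for e := cl.l.Front(); e != nil; e = e.Next() {
		tried++
		id := e.Value.(string)
		if try(id) {
			cl.l.MoveToFront(e)
			return id, tried, true
		}
	}
	return "", tried, false
}

func main() {
	cl := newCipherList("k1", "k2", "k3", "k4")
	isK4 := func(id string) bool { return id == "k4" }
	_, first, _ := cl.find(isK4)
	_, second, _ := cl.find(isK4)
	fmt.Println(first, second) // k4 is tried 4th, then 1st
}
```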
Update: Outline has been running servers with multi-user support on a single port for a few months now. Some organizations have 300 keys on a server, with over 100 active on any given day. Median latency due to cipher finding is around 10ms, and CPU usage is minimal (bandwidth is the bottleneck). At the 90th percentile you can see occasional cases close to 1 second, but that's not common and may be due to other factors, such as a burst in CPU usage (maybe expensive Prometheus queries). Has anyone here tried the single-port feature? How was your experience? |
Average 10ms latency looks too slow to me. Assuming 300 users and the worst case where 300 authentications are performed for each connection, a single authentication takes 33 µs. That means more than 33k cycles on a 1 GHz CPU, which is too long to authenticate a small packet. Can you elaborate on how the latency is measured? |
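The worst-case estimate above works out as follows, assuming all 300 keys are tried before the match and a 1 GHz clock:

$$
\frac{10\ \text{ms}}{300} \approx 33.3\ \mu\text{s per authentication},\qquad
33.3\ \mu\text{s} \times 10^{9}\ \text{cycles/s} \approx 3.3 \times 10^{4}\ \text{cycles}
$$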
10ms is about 2998 light-kilometers, which might or might not be acceptable depending on the use case: it's probably not acceptable for game streaming, but probably OK for downloading or video streaming. 😄 |
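For reference, the light-kilometer figure is the distance light travels in vacuum during 10 ms of added latency:

$$
c \times 10\ \text{ms} \approx 299\,792\ \text{km/s} \times 0.01\ \text{s} \approx 2\,998\ \text{km}
$$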
This site says that 20ms is an excellent RTT, so 10ms shouldn't be perceptible. Also, this is latency added per connection, not per packet. |
How about UDP connections/packets (which are mostly used in latency-sensitive applications)? |
I have a benchmark above: #130 (comment). UDP takes about 9 microseconds per cipher. |
@fortuna Sorry, I mean to ask whether the added latency for UDP connections is per-connection or per-packet. |
Yes, the added latency is per packet.
|
I think it would be more appropriate to optimize for UDP connections (I think there are UDP lookup caches in the libev implementation).
Oh, the cipher finding overhead is per UDP packet from the client. We don't need to find the cipher for the UDP packets from the remote target, because the chosen cipher is saved in the UDP association. That means the overhead will be minimal if you're watching a video. I guess it could be a concern if you're live streaming, but then your cipher will be kept near the front of the cipher list, which minimizes the overhead.
|
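A hypothetical Go sketch of why only client-to-server UDP packets pay the search cost: the cipher found for a client packet is stored in the association, and replies from the remote target just read it back. `udpNAT`, `fromClient`, and `fromTarget` are illustrative names, and the `search` callback stands in for trial decryption.

```go
package main

import "fmt"

// udpNAT is a hypothetical UDP association table.
type udpNAT struct {
	cipherFor map[string]string // client address -> cipher ID chosen for it
	lookups   int               // number of trial-decryption searches performed
}

// fromClient handles one client packet. Each Shadowsocks AEAD UDP packet has
// its own salt, so the cipher search runs per packet; the result is saved in
// the association for the replies.
func (n *udpNAT) fromClient(clientAddr string, search func() string) string {
	n.lookups++
	id := search()
	n.cipherFor[clientAddr] = id
	return id
}

// fromTarget picks the cipher for a reply from the remote target back to
// clientAddr. It only reads the association, so no search happens.
func (n *udpNAT) fromTarget(clientAddr string) (string, bool) {
	id, ok := n.cipherFor[clientAddr]
	return id, ok
}

func main() {
	n := &udpNAT{cipherFor: make(map[string]string)}
	n.fromClient("198.51.100.1:5000", func() string { return "alice" })
	for i := 0; i < 1000; i++ { // e.g. video packets flowing downstream
		n.fromTarget("198.51.100.1:5000")
	}
	fmt.Println(n.lookups)
}
```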
@fortuna Is it technically possible to do a cache for UDP packets as well? |
Update: @bemasc has merged Jigsaw-Code/outline-ss-server#25, which adds a new optimization to the cipher finding. We now associate a "last client IP" with each cipher. When a new request arrives, we look up the ciphers that had the client IP as their last IP and try them first, before trying the prioritized list. If a cipher is accessed by a single IP, it will always be tried first. With this optimization, any extra latency will be almost gone for almost everyone, even with hundreds of active access keys. @Mygod, the heuristic of pushing used ciphers to the front of the list, as well as the new one, applies to both TCP and UDP. |
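The "last client IP" ordering can be sketched in Go as follows; `entry` and `candidates` are hypothetical names, not the code from Jigsaw-Code/outline-ss-server#25. It is a linear pass for clarity; the string comparisons are far cheaper than the AEAD attempts they save.

```go
package main

import "fmt"

// entry is a hypothetical cipher record carrying the IP that last used it.
type entry struct {
	id     string
	lastIP string
}

// candidates returns the trial order for a request from ip: ciphers whose
// last client IP equals ip first, then all others in their prioritized order.
func candidates(prioritized []*entry, ip string) []*entry {
	out := make([]*entry, 0, len(prioritized))
	for _, e := range prioritized {
		if e.lastIP == ip {
			out = append(out, e)
		}
	}
	for _, e := range prioritized {
		if e.lastIP != ip {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	ciphers := []*entry{
		{id: "alice", lastIP: "203.0.113.1"},
		{id: "bob", lastIP: "203.0.113.2"},
		{id: "carol", lastIP: "203.0.113.3"},
	}
	for _, e := range candidates(ciphers, "203.0.113.3") {
		fmt.Print(e.id, " ")
	}
	fmt.Println()
	// After a successful match, the server would set e.lastIP to the client IP.
}
```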
@fortuna Neat! Almost two orders of magnitude latency reduction in the common case! I'm really surprised by how far you guys have pushed forward without changing the protocol 👍 |
I also implemented many users on one port using Python asyncio. The core idea is to use a DB; the code is here: |
We can use the same technique to eliminate the need to select an encryption method. The server tries both AES-256-GCM and ChaCha20-Poly1305 with the same password (they have the same tag size and salt size, and thus exactly the same packet layout), and the client chooses whichever is fastest on its platform. Removing encryption selection might be too radical for us (and would lack foresight: it is through this selector that we've introduced new protocols), but it's still an option for other shadowsocks-like protocols. |
This may be a stupid question, but what prevents us from using a HashSet for cipher lookup? |
@lzm0 there's no id in the Shadowsocks protocol that can be mapped to the credentials to use, so there's no key to lookup. That's why we need to use trial decryption. |
Shadowsocks 2022 (#196) has a protocol extension that brings native multi-user-single-port support without trial decryption: https://github.com/Shadowsocks-NET/shadowsocks-specs/blob/main/2022-2-shadowsocks-2022-extensible-identity-headers.md |
Background
Previous discussions suggest that we perform authentication (SIP004) on the first chunk with each user's key, and identify the user by the key that succeeds.
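A minimal Go sketch of identifying a user from the first chunk, assuming the SIP004 AEAD framing: a per-session subkey derived with HKDF-SHA1 (info `ss-subkey`) from the user's master key and the salt, and the first chunk (the 2-byte payload length plus its 16-byte tag) sealed under the all-zero nonce. It uses AES-256-GCM only, assumes master keys were already derived from passwords, and all function names are hypothetical.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/sha1"
	"errors"
	"fmt"
)

// hkdfSHA1 is RFC 5869 HKDF (extract, then expand) with SHA-1.
func hkdfSHA1(secret, salt, info []byte, n int) []byte {
	ext := hmac.New(sha1.New, salt)
	ext.Write(secret)
	prk := ext.Sum(nil)
	var out, t []byte
	for i := byte(1); len(out) < n; i++ {
		exp := hmac.New(sha1.New, prk)
		exp.Write(t)
		exp.Write(info)
		exp.Write([]byte{i})
		t = exp.Sum(nil)
		out = append(out, t...)
	}
	return out[:n]
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// identify derives each user's session subkey from the salt and tries to
// open the length chunk; the first key whose tag verifies is the user.
func identify(masterKeys [][]byte, salt, lengthChunk []byte) (int, error) {
	nonce := make([]byte, 12) // the first chunk of a session uses nonce 0
	for i, mk := range masterKeys {
		aead, err := newGCM(hkdfSHA1(mk, salt, []byte("ss-subkey"), len(mk)))
		if err != nil {
			return -1, err
		}
		if _, err := aead.Open(nil, nonce, lengthChunk, nil); err == nil {
			return i, nil
		}
	}
	return -1, errors.New("no user key authenticated the first chunk")
}

// bytes32 builds a toy 32-byte value for the demo.
func bytes32(b byte) []byte {
	out := make([]byte, 32)
	for i := range out {
		out[i] = b
	}
	return out
}

func main() {
	users := [][]byte{bytes32(1), bytes32(2), bytes32(3)}
	salt := bytes32(9) // in practice: random, sent in clear before the first chunk
	// Client side: user 2 seals the 2-byte payload length with its subkey.
	aead, err := newGCM(hkdfSHA1(users[2], salt, []byte("ss-subkey"), 32))
	if err != nil {
		panic(err)
	}
	chunk := aead.Seal(nil, make([]byte, 12), []byte{0x00, 0x05}, nil)
	fmt.Println(identify(users, salt, chunk))
}
```

Because the salt differs per session, the subkey derivation has to be repeated for every candidate key, so it is part of the per-key cost alongside the tag check.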
Implementation Consideration
Performing GCM/Poly1305 on the first chunk should be very fast. It's expected that even a naive implementation would support thousands of users without any notable overhead.
Still, we can cache the successful key for each source IP, which would save most of the computation. To prevent potential DDoS attacks, an IP that fails authentication too many times should be blocked.
Given this SIP doesn't involve any protocol change, only server code needs to be modified. The only limitation here is that AEAD ciphers are required.
Example
Jigsaw implemented a go-ss2-based server here: https://github.com/Jigsaw-Code/outline-ss-server. An early report shows that it works quite well with 100 users: #128 (comment)