Feature: implement p2c balancer with random, ewma, leastloaded #3211

Closed
sysulq opened this issue Jan 7, 2021 · 15 comments
Labels
enhancement New feature or request

Comments

@sysulq
Contributor

sysulq commented Jan 7, 2021

A p2c (power of two choices) implementation should perform better than the current for-loop implementation in apisix/balancer/ewma.lua when the upstream contains lots of backend servers.

I would like to send a PR for this, if you are interested.
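
Roughly the selection step I have in mind (a sketch only; the function and helper names below are placeholders, not APISIX identifiers):

```lua
-- Sketch only, not APISIX code: `p2c_pick` and `ewma_score` are placeholder names.
-- Pick two distinct nodes at random and keep the one with the lower EWMA score,
-- so selection stays O(1) instead of scanning every node.
local function p2c_pick(servers, ewma_score)
    local n = #servers
    if n == 1 then
        return servers[1]
    end

    local i = math.random(n)
    local j = math.random(n - 1)
    if j >= i then
        j = j + 1            -- shift so the two indexes are always distinct
    end

    local a, b = servers[i], servers[j]
    if ewma_score(a) <= ewma_score(b) then
        return a
    end
    return b
end
```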

@Firstsawyou
Contributor

If the performance is better, you are welcome to submit a PR.

@tokers
Contributor

tokers commented Jan 7, 2021

@hnlq715 A benchmark result is also desired, to let us know the gains :).

@sysulq
Contributor Author

sysulq commented Jan 7, 2021

> @hnlq715 A benchmark result is also desired, to let us know the gains :).

Absolutely 😄

@sysulq
Contributor Author

sysulq commented Jan 14, 2021

Simple benchmark result with 20 servers in upstream:

  • roundrobin: 10068.77 qps
  • ewma: 9207.02 qps
  • p2cewma: 10101.92 qps

As the number of servers in the upstream grows, the difference between p2cewma and ewma should become more significant.

apisix: 1 worker + 1 upstream + no plugin
+ curl http://127.0.0.1:9080/apisix/admin/routes/1 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/hello",
    "plugins": {
    },
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:1980": 1,
            "127.0.0.1:1981": 1,
            "127.0.0.1:1982": 1,
            "127.0.0.1:1983": 1,
            "127.0.0.1:1984": 1,
            "127.0.0.1:1985": 1,
            "127.0.0.1:1986": 1,
            "127.0.0.1:1987": 1,
            "127.0.0.1:1988": 1,
            "127.0.0.1:1989": 1,
            "127.0.0.1:1990": 1,
            "127.0.0.1:1991": 1,
            "127.0.0.1:1992": 1,
            "127.0.0.1:1993": 1,
            "127.0.0.1:1994": 1,
            "127.0.0.1:1995": 1,
            "127.0.0.1:1996": 1,
            "127.0.0.1:1997": 1,
            "127.0.0.1:1998": 1,
            "127.0.0.1:1999": 1
        }
    }
}'
{"node":{"key":"\/apisix\/routes\/1","value":{"update_time":1610590265,"create_time":1610538147,"plugins":{},"priority":0,"status":1,"id":"1","uri":"\/hello","upstream":{"nodes":{"127.0.0.1:1981":1,"127.0.0.1:1982":1,"127.0.0.1:1983":1,"127.0.0.1:1984":1,"127.0.0.1:1985":1,"127.0.0.1:1986":1,"127.0.0.1:1987":1,"127.0.0.1:1980":1,"127.0.0.1:1989":1,"127.0.0.1:1990":1,"127.0.0.1:1991":1,"127.0.0.1:1992":1,"127.0.0.1:1993":1,"127.0.0.1:1994":1,"127.0.0.1:1995":1,"127.0.0.1:1996":1,"127.0.0.1:1997":1,"127.0.0.1:1998":1,"127.0.0.1:1999":1,"127.0.0.1:1988":1},"pass_host":"pass","hash_on":"vars","type":"roundrobin"}}},"action":"set"}
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.88ms    2.22ms  39.39ms   95.48%
    Req/Sec     5.11k     1.18k   10.61k    86.14%
  51352 requests in 5.10s, 204.76MB read
Requests/sec:  10068.77
Transfer/sec:     40.15MB
+ sleep 1
+ echo -e '\n\napisix: 1 worker + 1 upstream + no plugin + ewma'


apisix: 1 worker + 1 upstream + no plugin + ewma
+ curl http://127.0.0.1:9080/apisix/admin/routes/1 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/hello",
    "plugins": {
    },
    "upstream": {
        "type": "ewma",
        "nodes": {
            "127.0.0.1:1980": 1,
            "127.0.0.1:1981": 1,
            "127.0.0.1:1982": 1,
            "127.0.0.1:1983": 1,
            "127.0.0.1:1984": 1,
            "127.0.0.1:1985": 1,
            "127.0.0.1:1986": 1,
            "127.0.0.1:1987": 1,
            "127.0.0.1:1988": 1,
            "127.0.0.1:1989": 1,
            "127.0.0.1:1990": 1,
            "127.0.0.1:1991": 1,
            "127.0.0.1:1992": 1,
            "127.0.0.1:1993": 1,
            "127.0.0.1:1994": 1,
            "127.0.0.1:1995": 1,
            "127.0.0.1:1996": 1,
            "127.0.0.1:1997": 1,
            "127.0.0.1:1998": 1,
            "127.0.0.1:1999": 1
        }
    }
}'
{"node":{"key":"\/apisix\/routes\/1","value":{"update_time":1610590272,"create_time":1610538147,"plugins":{},"priority":0,"status":1,"id":"1","uri":"\/hello","upstream":{"nodes":{"127.0.0.1:1981":1,"127.0.0.1:1982":1,"127.0.0.1:1983":1,"127.0.0.1:1984":1,"127.0.0.1:1985":1,"127.0.0.1:1986":1,"127.0.0.1:1987":1,"127.0.0.1:1980":1,"127.0.0.1:1989":1,"127.0.0.1:1990":1,"127.0.0.1:1991":1,"127.0.0.1:1992":1,"127.0.0.1:1993":1,"127.0.0.1:1994":1,"127.0.0.1:1995":1,"127.0.0.1:1996":1,"127.0.0.1:1997":1,"127.0.0.1:1998":1,"127.0.0.1:1999":1,"127.0.0.1:1988":1},"pass_host":"pass","hash_on":"vars","type":"ewma"}}},"action":"set"}
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.98ms    1.98ms  24.63ms   96.23%
    Req/Sec     4.67k   771.19     8.17k    83.17%
  46953 requests in 5.10s, 187.21MB read
Requests/sec:   9207.02
Transfer/sec:     36.71MB
+ sleep 1
+ echo -e '\n\napisix: 1 worker + 1 upstream + no plugin + p2cewma'


apisix: 1 worker + 1 upstream + no plugin + p2cewma
+ curl http://127.0.0.1:9080/apisix/admin/routes/1 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/hello",
    "plugins": {
    },
    "upstream": {
        "type": "p2cewma",
        "nodes": {
            "127.0.0.1:1980": 1,
            "127.0.0.1:1981": 1,
            "127.0.0.1:1982": 1,
            "127.0.0.1:1983": 1,
            "127.0.0.1:1984": 1,
            "127.0.0.1:1985": 1,
            "127.0.0.1:1986": 1,
            "127.0.0.1:1987": 1,
            "127.0.0.1:1988": 1,
            "127.0.0.1:1989": 1,
            "127.0.0.1:1990": 1,
            "127.0.0.1:1991": 1,
            "127.0.0.1:1992": 1,
            "127.0.0.1:1993": 1,
            "127.0.0.1:1994": 1,
            "127.0.0.1:1995": 1,
            "127.0.0.1:1996": 1,
            "127.0.0.1:1997": 1,
            "127.0.0.1:1998": 1,
            "127.0.0.1:1999": 1
        }
    }
}'
{"node":{"key":"\/apisix\/routes\/1","value":{"update_time":1610590279,"create_time":1610538147,"plugins":{},"priority":0,"status":1,"id":"1","uri":"\/hello","upstream":{"nodes":{"127.0.0.1:1981":1,"127.0.0.1:1982":1,"127.0.0.1:1983":1,"127.0.0.1:1984":1,"127.0.0.1:1985":1,"127.0.0.1:1986":1,"127.0.0.1:1987":1,"127.0.0.1:1980":1,"127.0.0.1:1989":1,"127.0.0.1:1990":1,"127.0.0.1:1991":1,"127.0.0.1:1992":1,"127.0.0.1:1993":1,"127.0.0.1:1994":1,"127.0.0.1:1995":1,"127.0.0.1:1996":1,"127.0.0.1:1997":1,"127.0.0.1:1998":1,"127.0.0.1:1999":1,"127.0.0.1:1988":1},"pass_host":"pass","hash_on":"vars","type":"p2cewma"}}},"action":"set"}
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.71ms    1.34ms  28.22ms   96.96%
    Req/Sec     5.08k   668.32     5.80k    83.00%
  50537 requests in 5.00s, 201.50MB read
Requests/sec:  10101.92
Transfer/sec:     40.28MB

@spacewander
Member

@hnlq715
Looks like the ingress controller has switched its ewma implementation to p2cewma: https://github.com/kubernetes/ingress-nginx/blob/master/rootfs/etc/nginx/lua/balancer/ewma.lua.

Our ewma balancer is forked from an old version of its ewma implementation. Maybe we can replace our ewma with p2cewma so that we catch up with the ingress controller?

@sysulq
Contributor Author

sysulq commented Jan 14, 2021

@spacewander
Yep, it seems like it has switched to p2cewma; I'd like to benchmark this implementation too.

https://github.com/kubernetes/ingress-nginx/blob/master/rootfs/etc/nginx/lua/balancer/ewma.lua#L126

@sysulq
Contributor Author

sysulq commented Jan 14, 2021

I think we should not use shared.DICT to store the ewma data, so that we avoid locking; this improves performance significantly (see the sketch after the numbers below).
  • roundrobin: 10807.92 qps
  • ewma from ingress: 9246.09 qps
  • p2cewma: 10234.47 qps
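
A rough illustration of the per-worker storage (the names and the decay constant are made up for this sketch, not taken from the actual code):

```lua
-- Sketch only: each worker keeps its own EWMA table, so the hot path
-- needs no shared-dict access and no lock.
local DECAY_TIME = 10             -- seconds; an assumed decay constant
local ewma_state = {}             -- endpoint -> { score, last_touched_at }

local function get_ewma(endpoint)
    local s = ewma_state[endpoint]
    return s and s.score or 0
end

local function update_ewma(endpoint, rtt, now)
    local s = ewma_state[endpoint] or { score = rtt, last_touched_at = now }
    local elapsed = math.max(now - s.last_touched_at, 0)
    local weight = math.exp(-elapsed / DECAY_TIME)
    s.score = s.score * weight + rtt * (1 - weight)
    s.last_touched_at = now
    ewma_state[endpoint] = s
end
```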

apisix: 1 worker + 1 upstream + no plugin
+ curl http://127.0.0.1:9080/apisix/admin/routes/1 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/hello",
    "plugins": {
    },
    "upstream": {
        "type": "roundrobin",
        "nodes": {
            "127.0.0.1:1980": 1,
            "127.0.0.1:1981": 1,
            "127.0.0.1:1982": 1,
            "127.0.0.1:1983": 1,
            "127.0.0.1:1984": 1,
            "127.0.0.1:1985": 1,
            "127.0.0.1:1986": 1,
            "127.0.0.1:1987": 1,
            "127.0.0.1:1988": 1,
            "127.0.0.1:1989": 1,
            "127.0.0.1:1990": 1,
            "127.0.0.1:1991": 1,
            "127.0.0.1:1992": 1,
            "127.0.0.1:1993": 1,
            "127.0.0.1:1994": 1,
            "127.0.0.1:1995": 1,
            "127.0.0.1:1996": 1,
            "127.0.0.1:1997": 1,
            "127.0.0.1:1998": 1,
            "127.0.0.1:1999": 1
        }
    }
}'
{"action":"set","node":{"key":"\/apisix\/routes\/1","value":{"uri":"\/hello","status":1,"upstream":{"type":"roundrobin","nodes":{"127.0.0.1:1984":1,"127.0.0.1:1985":1,"127.0.0.1:1986":1,"127.0.0.1:1987":1,"127.0.0.1:1988":1,"127.0.0.1:1989":1,"127.0.0.1:1980":1,"127.0.0.1:1981":1,"127.0.0.1:1992":1,"127.0.0.1:1993":1,"127.0.0.1:1994":1,"127.0.0.1:1995":1,"127.0.0.1:1996":1,"127.0.0.1:1997":1,"127.0.0.1:1998":1,"127.0.0.1:1999":1,"127.0.0.1:1991":1,"127.0.0.1:1990":1,"127.0.0.1:1982":1,"127.0.0.1:1983":1},"hash_on":"vars","pass_host":"pass"},"create_time":1610538147,"update_time":1610595383,"id":"1","plugins":{},"priority":0}}}
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.72ms    1.95ms  36.29ms   96.83%
    Req/Sec     5.43k   726.03     6.36k    75.49%
  55119 requests in 5.10s, 219.78MB read
Requests/sec:  10807.92
Transfer/sec:     43.09MB
+ sleep 1
+ echo -e '\n\napisix: 1 worker + 1 upstream + no plugin + ewma'


apisix: 1 worker + 1 upstream + no plugin + ewma
+ curl http://127.0.0.1:9080/apisix/admin/routes/1 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/hello",
    "plugins": {
    },
    "upstream": {
        "type": "ewma",
        "nodes": {
            "127.0.0.1:1980": 1,
            "127.0.0.1:1981": 1,
            "127.0.0.1:1982": 1,
            "127.0.0.1:1983": 1,
            "127.0.0.1:1984": 1,
            "127.0.0.1:1985": 1,
            "127.0.0.1:1986": 1,
            "127.0.0.1:1987": 1,
            "127.0.0.1:1988": 1,
            "127.0.0.1:1989": 1,
            "127.0.0.1:1990": 1,
            "127.0.0.1:1991": 1,
            "127.0.0.1:1992": 1,
            "127.0.0.1:1993": 1,
            "127.0.0.1:1994": 1,
            "127.0.0.1:1995": 1,
            "127.0.0.1:1996": 1,
            "127.0.0.1:1997": 1,
            "127.0.0.1:1998": 1,
            "127.0.0.1:1999": 1
        }
    }
}'
{"action":"set","node":{"key":"\/apisix\/routes\/1","value":{"uri":"\/hello","status":1,"upstream":{"type":"ewma","nodes":{"127.0.0.1:1984":1,"127.0.0.1:1985":1,"127.0.0.1:1986":1,"127.0.0.1:1987":1,"127.0.0.1:1988":1,"127.0.0.1:1989":1,"127.0.0.1:1980":1,"127.0.0.1:1981":1,"127.0.0.1:1992":1,"127.0.0.1:1993":1,"127.0.0.1:1994":1,"127.0.0.1:1995":1,"127.0.0.1:1996":1,"127.0.0.1:1997":1,"127.0.0.1:1998":1,"127.0.0.1:1999":1,"127.0.0.1:1991":1,"127.0.0.1:1990":1,"127.0.0.1:1982":1,"127.0.0.1:1983":1},"hash_on":"vars","pass_host":"pass"},"create_time":1610538147,"update_time":1610595390,"id":"1","plugins":{},"priority":0}}}
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.04ms    2.23ms  29.88ms   95.80%
    Req/Sec     4.65k   751.32     5.77k    80.00%
  46328 requests in 5.01s, 184.73MB read
Requests/sec:   9246.09
Transfer/sec:     36.87MB
+ sleep 1
+ echo -e '\n\napisix: 1 worker + 1 upstream + no plugin + p2cewma'


apisix: 1 worker + 1 upstream + no plugin + p2cewma
+ curl http://127.0.0.1:9080/apisix/admin/routes/1 -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/hello",
    "plugins": {
    },
    "upstream": {
        "type": "p2cewma",
        "nodes": {
            "127.0.0.1:1980": 1,
            "127.0.0.1:1981": 1,
            "127.0.0.1:1982": 1,
            "127.0.0.1:1983": 1,
            "127.0.0.1:1984": 1,
            "127.0.0.1:1985": 1,
            "127.0.0.1:1986": 1,
            "127.0.0.1:1987": 1,
            "127.0.0.1:1988": 1,
            "127.0.0.1:1989": 1,
            "127.0.0.1:1990": 1,
            "127.0.0.1:1991": 1,
            "127.0.0.1:1992": 1,
            "127.0.0.1:1993": 1,
            "127.0.0.1:1994": 1,
            "127.0.0.1:1995": 1,
            "127.0.0.1:1996": 1,
            "127.0.0.1:1997": 1,
            "127.0.0.1:1998": 1,
            "127.0.0.1:1999": 1
        }
    }
}'
{"action":"set","node":{"key":"\/apisix\/routes\/1","value":{"uri":"\/hello","status":1,"upstream":{"type":"p2cewma","nodes":{"127.0.0.1:1984":1,"127.0.0.1:1985":1,"127.0.0.1:1986":1,"127.0.0.1:1987":1,"127.0.0.1:1988":1,"127.0.0.1:1989":1,"127.0.0.1:1980":1,"127.0.0.1:1981":1,"127.0.0.1:1992":1,"127.0.0.1:1993":1,"127.0.0.1:1994":1,"127.0.0.1:1995":1,"127.0.0.1:1996":1,"127.0.0.1:1997":1,"127.0.0.1:1998":1,"127.0.0.1:1999":1,"127.0.0.1:1991":1,"127.0.0.1:1990":1,"127.0.0.1:1982":1,"127.0.0.1:1983":1},"hash_on":"vars","pass_host":"pass"},"create_time":1610538147,"update_time":1610595397,"id":"1","plugins":{},"priority":0}}}
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.76ms    2.01ms  41.86ms   97.58%
    Req/Sec     5.15k   597.40     5.84k    85.29%
  52193 requests in 5.10s, 208.11MB read
Requests/sec:  10234.47
Transfer/sec:     40.81MB

@membphis added the enhancement (New feature or request) label Jan 14, 2021
@membphis
Member

The performance is good enough, PR welcome @hnlq715

@ElvinEfendi

My 2 cents: https://github.com/kubernetes/ingress-nginx/blob/master/rootfs/etc/nginx/lua/balancer/ewma.lua has always been configured to be p2c; check the local PICK_SET_SIZE = 2. It is just implemented generically, but I think it makes sense to delete the loop and make the implementation specific to p2c.
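
In other words (an illustrative sketch of the idea, not the actual ingress-nginx code): sample PICK_SET_SIZE random candidates and keep the best one, so PICK_SET_SIZE = 2 already behaves as p2c.

```lua
-- Illustrative only: a generic "pick k candidates, keep the best" loop.
-- With PICK_SET_SIZE = 2 this is exactly power-of-two-choices.
local PICK_SET_SIZE = 2

local function pick_best(endpoints, score_of)
    local k = math.min(PICK_SET_SIZE, #endpoints)
    local seen = {}
    local best, best_score = nil, math.huge

    while k > 0 do
        local idx = math.random(#endpoints)
        if not seen[idx] then
            seen[idx] = true
            k = k - 1
            local s = score_of(endpoints[idx])
            if s < best_score then
                best, best_score = endpoints[idx], s
            end
        end
    end

    return best
end
```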

As to not using a shared dictionary: I've actually tried that in ingress-nginx (kubernetes/ingress-nginx#3295), and we ran with it for at least one version. But then I brought back the shared dictionary and locking in kubernetes/ingress-nginx#4448. At my company we saw issues with 20 NGINX replicas and 1000 backends, each replica running 40 workers. Without the shared dictionary we were seeing load-balancing problems: the NGINX workers were taking too long to "know" that a backend was overloaded.

@moonming
Member

@ElvinEfendi Thank you for sharing, it is very helpful.
@hnlq715 @membphis what do you think?

@sysulq
Contributor Author

sysulq commented Jan 15, 2021

@ElvinEfendi Very insightful, thanks

@moonming @membphis
Reviewing nginx's random implementation, it supports sharing data between worker processes via shared memory.
And this article from nginx.org demonstrates p2c configured with an upstream zone.

Maybe it's wiser to use the shared dict to share data between worker processes, even with some performance loss.
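
For comparison, sharing the scores through a lua_shared_dict would look roughly like this (the dict name balancer_ewma and the key scheme are only examples):

```lua
-- Sketch only. Assumes nginx.conf declares:  lua_shared_dict balancer_ewma 10m;
local shm = ngx.shared.balancer_ewma

local function get_shared_ewma(endpoint)
    -- visible to every worker, at the cost of shared-memory access
    return shm:get("ewma:" .. endpoint) or 0
end

local function set_shared_ewma(endpoint, score)
    local ok, err = shm:set("ewma:" .. endpoint, score)
    if not ok then
        ngx.log(ngx.WARN, "failed to update ewma for ", endpoint, ": ", err)
    end
end
```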

@tokers
Contributor

tokers commented Jan 15, 2021

> @ElvinEfendi Very insightful, thanks
>
> @moonming @membphis
> Reviewing nginx's random implementation, it supports sharing data between worker processes via shared memory.
> And this article from nginx.org demonstrates p2c configured with an upstream zone.
>
> Maybe it's wiser to use the shared dict to share data between worker processes, even with some performance loss.

If I remember correctly, only NGINX Plus supports sharing load balancing data among workers. I also support storing the data in shared memory; the performance loss should be acceptable.

@sysulq
Contributor Author

sysulq commented Jan 15, 2021

@tokers

NGINX Plus features get merged into the open source version smoothly :-)
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#zone

Syntax:  zone name [size];
Default: —
Context: upstream

This directive appeared in version 1.9.0.
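
In plain nginx.conf the combination looks roughly like this (the upstream name and servers are only examples; the random directive needs nginx >= 1.15.1):

```nginx
upstream backend {
    zone backend 64k;        # balancer state shared among worker processes (open source since 1.9.0)
    random two least_conn;   # p2c: pick two servers at random, use the one with fewer connections
    server 127.0.0.1:1980;
    server 127.0.0.1:1981;
}
```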

@tokers
Contributor

tokers commented Jan 15, 2021


OK, my Nginx version is old :)

@spacewander
Member

Solved. Open another issue if you want to implement another p2c balancer.
