这一节主要讲解Nginx的负载均衡算法,并介绍Tengine新引入的vnswrr算法(目前我司的接入层使用的负载均衡算法)。在讨论不同的算法前,重申下前提,下面的算法基本都会在每次选择一个节点之后,修改节点的current_weight,这种情况其实是线程不安全的,但是,由于Nginx是多进程(Worker),Master进程fork出Worker进程之后,两个进程的地址空间不共享(在读的时候共享,发生写的时候会复制一份),Worker和Master以及Worker之间都是线程安全的,且Worker内部是单线程的eventloop,因此是没有问题的。
一、(Weighted)Round Robin (带权重)轮询算法是非常常用的负载均衡算法,例如按照下面的配置三个权重分别是1、2、3的三个server,分别用A、B、C表示
1 2 3 4 5 upstream hello { server 127.0.0.1:9090 weight=1; server 127.0.0.1:9090 weight=2; server 127.0.0.1:9090 weight=3; }
为了方便选择,引入current_weight的概念,current_weight初始值为0,同时总的权重值称为total_weight,当前为6。算法执行的过程是,每次server节点增加weight到currnet_weight,然后选择current_weight最大的,并将这个节点的current_weight减去total_weigth,具体如下图,绿色节点代表本轮选中的节点 上述一轮选择过程C被选中3次,B被选中2次,A被选中1次,符合权重的设置,且选择较为平滑,每个节点没有被连续选中。注意在实际使用过程,上述weight其实用的是effective_weight,表示当前节点的动态权重,这个权重不会超过weight,因为节点运行过程可能会上下线或者故障,因此effective_weight其实是变化的,整个算法的时间复杂度为O(N)。Nginx里对应的选择逻辑的核心算法如下
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 static ngx_http_upstream_rr_peer_t *ngx_http_upstream_get_peer(ngx_http_upstream_rr_peer_data_t *rrp) { time_t now; uintptr_t m; #if (T_NGX_HTTP_UPSTREAM_RANDOM) ngx_int_t total, flag; #else ngx_int_t total; #endif ngx_uint_t i, n, p; ngx_http_upstream_rr_peer_t *peer, *best; now = ngx_time(); best = NULL ; total = 0 ; #if (NGX_SUPPRESS_WARN) p = 0 ; #endif #if (T_NGX_HTTP_UPSTREAM_RANDOM) if (rrp->peers->init_number == NGX_CONF_UNSET_UINT) { rrp->peers->init_number = ngx_random() % rrp->peers->number; } for (peer = rrp->peers->peer, i = 0 ; i < rrp->peers->init_number; i++) { peer = peer->next; } flag = 1 ; for (i = rrp->peers->init_number; i != rrp->peers->init_number || flag; i = (i + 1 ) % rrp->peers->number, peer = peer->next ? peer->next : rrp->peers->peer) { flag = 0 ; #else for (peer = rrp->peers->peer, i = 0 ; peer; peer = peer->next, i++) { #endif n = i / (8 * sizeof (uintptr_t )); m = (uintptr_t ) 1 << i % (8 * sizeof (uintptr_t )); if (rrp->tried[n] & m) { continue ; } if (peer->down) { continue ; } #if (NGX_HTTP_UPSTREAM_CHECK) if (ngx_http_upstream_check_peer_down(peer->check_index)) { continue ; } #endif if (peer->max_fails && peer->fails >= peer->max_fails && now - peer->checked <= peer->fail_timeout) { continue ; } if (peer->max_conns && peer->conns >= peer->max_conns) { continue ; } peer->current_weight += peer->effective_weight; total += peer->effective_weight; if (peer->effective_weight < peer->weight) { peer->effective_weight++; } if (best == NULL || peer->current_weight > best->current_weight) { best = peer; p = i; } } if (best == NULL ) { return NULL ; } rrp->current = best; n = p / (8 * sizeof (uintptr_t )); m = (uintptr_t ) 1 << p % (8 * sizeof (uintptr_t )); rrp->tried[n] |= m; best->current_weight -= total; if (now - best->checked > best->fail_timeout) { best->checked = now; } return best; }
二、(Weighted)Random 在权重的基础上,随机选择,配置文件如下
1 2 3 4 5 6 upstream hello { random; server 127.0.0.1:9090 weight=1; server 127.0.0.1:9090 weight=2; server 127.0.0.1:9090 weight=3; }
选择的流程例子如下 Nginx代码实现如下,时间复杂度为O(log(TW))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 static ngx_int_t ngx_http_upstream_update_random(ngx_pool_t *pool, ngx_http_upstream_srv_conf_t *us) { total_weight = 0 ; for (peer = peers->peer, i = 0 ; peer; peer = peer->next, i++) { ranges[i].peer = peer; ranges[i].range = total_weight; total_weight += peer->weight; } rcf->ranges = ranges; return NGX_OK; } static ngx_uint_t ngx_http_upstream_peek_random_peer(ngx_http_upstream_rr_peers_t *peers, ngx_http_upstream_random_peer_data_t *rp) { ngx_uint_t i, j, k, x; x = ngx_random() % peers->total_weight; i = 0 ; j = peers->number; while (j - i > 1 ) { k = (i + j) / 2 ; if (x < rp->conf->ranges[k].range) { j = k; } else { i = k; } } return i; }
三、(Weigted)Hash 配置文件如下,下面是根据请求的uri哈希,也就是每次只要请求的uri不变,打到的upstream节点也是不会变的
1 2 3 4 5 6 upstream hello { hash $request_uri ; server 127.0.0.1:9090 weight=1; server 127.0.0.1:9090 weight=2; server 127.0.0.1:9090 weight=3; }
核心算法如下,先通过crc32算法计算出hash,然后将hash对总权重取模,再利用Random算法类似的原理根据权重选节点(只存在输入参数的差异),只是这里用的方法的时间复杂度是O(N)
1 2 3 4 5 6 7 8 9 10 w = hp->hash % hp->rrp.peers->total_weight; peer = hp->rrp.peers->peer; p = 0 ; while (w >= peer->weight) { w -= peer->weight; peer = peer->next; p++; }
除了上述这种通用的hash,还存在ip_hash,算法基本是一样,只是hash的来源换成了client的ip,这里就不赘述。值得注意的是hash方式容易产生流量分配不均的问题,因此hash参数的选择需要慎重,一般是在statefull的后端(例如带session的情况)才需要使用基于Hash的负载均衡。crc32的hash离散程度应该是不及mur_mur_hash的。
四、(Weighted)Least Connection 按照每个server上的连接数来决定,同时考虑了权重,例如节点A权重为2,连接数为2,节点B权重为4,连接数为3,在选择的时候会选择B。 配置文件如下
1 2 3 4 5 6 upstream hello { least_conn; server 127.0.0.1:9090 weight=1; server 127.0.0.1:9090 weight=2; server 127.0.0.1:9090 weight=3; }
核心算法如下,当遇到多个节点连接数一样的时候(many=1),会用之前讲到的weight round robin来选择出一个,整体时间复杂度为O(N)
1 2 3 4 5 6 7 8 9 10 11 12 if (best == NULL || peer->conns * best->weight < best->conns * peer->weight) { best = peer; many = 0 ; p = i; } else if (peer->conns * best->weight == best->conns * peer->weight) { many = 1 ; }
五、vnswrr(Virtual Node Smooth Weighted Round Robin) 这个负载均衡算法是淘宝开源的Tengine新加入的算法,配置文件如下,只是多了个vnswrr
1 2 3 4 5 6 upstream hello { vnswrr; server 127.0.0.1:9090 weight=1; server 127.0.0.1:9090 weight=2; server 127.0.0.1:9090 weight=3; }
其执行过程如下图,这里把起始点进行了随机选择(图中选择第三个),其实质就是按照权重把节点都创建到一个数组(按照 weight round robin的顺序),然后轮流选择 就是这个傻瓜算法,Tengine官方没提供详细的介绍,而且取了个玄乎的名字。算法属于空间换时间,时间复杂度从O(N)降低O(1),而空间复杂度从O(1)提升到了O(TW),Tengine官方声称性能60%(节点数2000个时),但是没提供具体的权重数据,如果权重达到50000时,同时每个虚拟节点占用16B,会占用1G+的内存,当然实际使用过程一般不会配置这么大的权重。 Tengine的具体代码实现如下
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 static ngx_http_upstream_rr_peer_t *ngx_http_upstream_get_vnswrr(ngx_http_upstream_vnswrr_peer_data_t *vnsp) { time_t now; uintptr_t m; ngx_uint_t i, n, p, flag, begin_number; ngx_http_upstream_rr_peer_t *peer, *best; ngx_http_upstream_rr_peers_t *peers; ngx_http_upstream_rr_vpeers_t *vpeers; ngx_http_upstream_rr_peer_data_t *rrp; ngx_http_upstream_vnswrr_srv_conf_t *uvnscf; now = ngx_time(); best = NULL ; #if (NGX_SUPPRESS_WARN) p = 0 ; #endif rrp = &vnsp->rrp; peers = rrp->peers; uvnscf = vnsp->uvnscf; vpeers = uvnscf->vpeers; if (uvnscf->last_number == NGX_CONF_UNSET_UINT) { uvnscf->init_number = ngx_random() % peers->number; if (peers->weighted) { peer = vpeers[uvnscf->init_number].vpeer; } else { for (peer = peers->peer, i = 0 ; i < uvnscf->init_number; i++) { peer = peer->next; } } uvnscf->last_number = uvnscf->init_number; uvnscf->last_peer = peer; } if (peers->weighted) { if (uvnscf->vnumber != peers->total_weight && (uvnscf->last_number + 1 == uvnscf->vnumber)) { n = peers->total_weight - uvnscf->vnumber; if (n > peers->number) { n = peers->number; } ngx_http_upstream_init_virtual_peers(peers, uvnscf, uvnscf->vnumber, n + uvnscf->vnumber); } begin_number = (uvnscf->last_number + 1 ) % uvnscf->vnumber; peer = vpeers[begin_number].vpeer; } else { if (uvnscf->last_peer && uvnscf->last_peer->next) { begin_number = (uvnscf->last_number + 1 ) % peers->number; peer = uvnscf->last_peer->next; } else { begin_number = 0 ; peer = peers->peer; } } for (i = begin_number, flag = 1 ; i != begin_number || flag; i = peers->weighted ? ((i + 1 ) % uvnscf->vnumber) : ((i + 1 ) % peers->number), peer = peers->weighted ? vpeers[i].vpeer : (peer->next ? peer->next : peers->peer)) { flag = 0 ; if (peers->weighted) { n = peers->total_weight - uvnscf->vnumber; if (n > peers->number) { n = peers->number; } ngx_http_upstream_init_virtual_peers(peers, uvnscf, uvnscf->vnumber, n + uvnscf->vnumber); n = vpeers[i].rindex / (8 * sizeof (uintptr_t )); m = (uintptr_t ) 1 << vpeers[i].rindex % (8 * sizeof (uintptr_t )); } else { n = i / (8 * sizeof (uintptr_t )); m = (uintptr_t ) 1 << i % (8 * sizeof (uintptr_t )); } if (rrp->tried[n] & m) { continue ; } if (peer->down) { continue ; } if (peer->max_fails && peer->fails >= peer->max_fails && now - peer->checked <= peer->fail_timeout) { continue ; } #if defined(nginx_version) && nginx_version >= 1011005 if (peer->max_conns && peer->conns >= peer->max_conns) { continue ; } #endif #if (NGX_HTTP_UPSTREAM_CHECK) if (ngx_http_upstream_check_peer_down(peer->check_index)) { continue ; } #endif best = peer; uvnscf->last_peer = peer; uvnscf->last_number = i; p = i; break ; } if (best == NULL ) { return NULL ; } rrp->current = best; if (peers->weighted) { n = vpeers[p].rindex / (8 * sizeof (uintptr_t )); m = (uintptr_t ) 1 << vpeers[p].rindex % (8 * sizeof (uintptr_t )); } else { n = p / (8 * sizeof (uintptr_t )); m = (uintptr_t ) 1 << p % (8 * sizeof (uintptr_t )); } rrp->tried[n] |= m; if (now - best->checked > best->fail_timeout) { best->checked = now; } return best; }
六、总结 负载均衡算法是Nginx比较基础的功能,但是大部分算法的时间复杂度都达到了O(N)级别,只有random达到了O(log(TW))级别,不知道为什么没有优化(如果upstream里节点较少其实影响不大)。vnswrr算法性能更好,但是官方没有patch到Nginx主分支中,这又是什么原因呢? 个人理解可能是国外开发者比较注重代码的可读性、简单性,简单的代码是最不容易出错的,因为高效率的算法也是有代价的,例如vnswrr的代价就是内存占用多了如果权重配的很大,同时后端的server节点数量很多时,每个Worker的负载均衡的内存使用可能要上G,也是对内存的一种浪费。
七、Reference
Tengine
深入浅出Nginx负载均衡算法
ngx_http_upstream_vnswrr_module