Implementing a Keepalived + nginx LB High-Availability Cluster
Introduction to keepalived#
keepalived is software that solves the high-availability problem of load balancers (directors), providing failover for the balancer itself. It was originally designed to make ipvs services highly available; a script-call interface was later added, giving it a degree of control over service processes, and through this interface it can also provide high availability for nginx and haproxy load balancers. keepalived is a software implementation of the VRRP protocol and relies on VRRP to float addresses between nodes. So what is VRRP?
The VRRP protocol#
VRRP (Virtual Router Redundancy Protocol) solves router redundancy by grouping physical routers into virtual routers.
In the VRRP topology model, the master node periodically sends multicast messages (VRRP advertisements) to the multicast address 224.0.0.18. If a backup node receives no VRRP advertisement within a set period, it assumes the master has gone down, takes over the VIP (virtual IP address), and failover occurs.
A VRRP advertisement carries the following three items:
- VIP: the virtual IP address
- Virtual Rtr ID: the virtual router ID
A physical router may have multiple interfaces, and interfaces from different physical routers can be combined into multiple virtual routers; this ID tells them apart
- Priority
The VIP floats preferentially to the node with the higher priority
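The election rule that the Priority field drives can be sketched as a tiny shell helper (a simplification for illustration only; real VRRP elections also involve preemption settings and break priority ties by IP address):

```shell
# Pick the VRRP master from "name priority" pairs fed on stdin:
# the highest priority wins.
elect_master() {
    sort -k2,2nr | head -n1 | cut -d' ' -f1
}

printf '%s\n' "node1 100" "node2 95" | elect_master   # node1 wins
```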
keepalived features#
- Implements load balancer failover based on the VRRP protocol
- Generates ipvs rules for all nodes in the cluster (predefined in the configuration file)
- Performs health checks on the real servers of an ipvs cluster
- Through its script-call interface, executes scripts whose defined behavior can influence cluster state; it is this interface that lets keepalived provide failover for nginx load balancers
This article walks through building a highly available nginx load-balancing cluster under both the single-master and dual-master models.
Single-master model: KA high-availability cluster#
Topology#
In the network topology above, nginx server1 and server2 are two load balancers with keepalived installed:
- Master node nginx server1: in normal operation (active state) the VIP is bound to its NIC, and it distributes client requests to the backend real servers while nginx server2 stands by (passive state)
- When nginx server1 fails, the backup node nginx server2 becomes the master: the VIP floats to server2, which then responds to clients and distributes requests in server1's place, completing the failover
A KA high-availability model with one node in the active state and the other in the passive state is called the single-master model.
IP address assignment#
nginx LB server1:
ens33: 192.168.196.131/24
ens37: 172.16.253.93/16
nginx LB server2:
ens37: 192.168.196.133/24
ens33: 172.16.251.171/16
Web server1:
ens37: 192.168.196.129/24
Web server2:
ens37: 192.168.196.132/24
VIP: 172.16.255.168/16 –> the publicly advertised service address
Service configuration#
With the servers prepared and IP addresses configured, first synchronize the clocks so that all servers agree on the system time; the ntp or chrony service can do this.
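On CentOS 7 that typically means installing and enabling chrony on every node (package and service names below assume CentOS 7), then spot-checking that the clocks agree. The drift comparison here is run locally purely for illustration; in practice the second timestamp would come from another host, e.g. over ssh:

```shell
# On each server (assumes CentOS 7 repositories):
#   yum install -y chrony
#   systemctl enable --now chronyd
#   chronyc sources      # confirm a reachable time source

# Crude agreement check between two hosts: compare epoch seconds.
t_local=$(date +%s)
t_remote=$(date +%s)   # in practice: ssh 192.168.196.133 'date +%s'
drift=$(( t_local > t_remote ? t_local - t_remote : t_remote - t_local ))
echo "clock drift: ${drift}s"
[ "$drift" -le 2 ] && echo "clocks agree" || echo "resynchronize with chrony/ntp"
```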
Web server configuration#
- server1:
$ yum install -y nginx
$ echo "backend server 1" > /usr/share/nginx/html/index.html
$ systemctl start nginx
- server2:
$ yum install -y nginx
$ echo "backend server 2" > /usr/share/nginx/html/index.html
$ systemctl start nginx
nginx server configuration#
server1 –> master node
- Install and configure keepalived
$ yum install -y keepalived
$ vim /etc/keepalived/keepalived.conf <--edit the main configuration file
global_defs {
notification_email { <--notification email settings
root@localhost <--recipient address
}
notification_email_from keepalived@localhost <--sender address
smtp_server 127.0.0.1 <--use the local mail service
smtp_connect_timeout 30
router_id node1 <--name of this router node
vrrp_mcast_group4 224.3.10.67 <--multicast group for VRRP advertisements
script_user root
enable_script_security
}
vrrp_script chk_down { <--script enabling manual maintenance of the server; see note [1]
script "/usr/bin/test -f /etc/keepalived/down && exit 1 || exit 0"
interval 1
weight -10
}
vrrp_script chk_nginx { <--periodic nginx health-check script; see note [2]
script "/usr/bin/killall -0 nginx && exit 0 || exit 1"
interval 1 <--check interval
weight -10 <--subtract 10 from the virtual router's priority when the script exits non-zero
fall 2 <--two consecutive failures mark the check as failed
rise 1 <--one success marks it as recovered
}
vrrp_instance VI_1 { <--define a vrrp instance, i.e. a virtual router
state MASTER <--this virtual router starts as master
interface ens37 <--NIC that sends and receives VRRP advertisements
virtual_router_id 51 <--virtual router ID, must be unique
priority 100
advert_int 1 <--interval between VRRP advertisements
authentication {
auth_type PASS
auth_pass AhpaeQ9J <--authenticates VRRP advertisements
}
virtual_ipaddress {
172.16.255.168/16 dev ens37 <--the VIP
}
track_script { <--invoke the scripts defined above
chk_down
chk_nginx
}
notify_master "/etc/keepalived/notify.sh master" <--see note [3]
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
$ scp /etc/keepalived/notify.sh 192.168.196.133:/etc/keepalived/notify.sh
[1] If the down file exists, the script exits with status 1 and the priority of the vrrp instance (virtual router) that calls it drops by 10. Here the master's priority of 100 becomes 90, lower than the backup's 95, so the backup node becomes master and failover occurs
[2] If the nginx service process is not running, the script exits with status 1 and the priority of the calling vrrp instance drops by 10
[3] A generic notification hook: one script handles the notifications for transitions into any of the three states master, backup, and fault
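The chk_nginx script hinges on `killall -0`, which sends no signal at all and merely reports via its exit status whether a process with that name exists. The same probe can be demonstrated with the shell builtin `kill -0` against a PID that is guaranteed to exist (the current shell) and one that effectively cannot:

```shell
# kill -0 probes a PID without signaling it; exit status 0 means it exists.
if kill -0 $$ 2>/dev/null; then
    echo "process alive"     # the current shell always exists
fi

# A PID far beyond the kernel's pid_max cannot exist, so the probe fails:
kill -0 99999999 2>/dev/null && echo "alive" || echo "gone"
```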
$ vim /etc/keepalived/notify.sh
#!/bin/bash
#
contact='root@localhost'
notify() {
local mailsubject="$(hostname) to be $1, vip floating"
local mailbody="$(date +'%F %T'): vrrp transition, $(hostname) changed to be $1"
echo "$mailbody" | mail -s "$mailsubject" $contact
}
case $1 in
master)
notify master
;;
backup)
notify backup
;;
fault)
notify fault
;;
*)
echo "Usage: $(basename $0) {master|backup|fault}"
exit 1
;;
esac
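The dispatch logic of notify.sh can be dry-run without a mail server. The sketch below repackages the same case statement as a function, with `mail` replaced by `echo` so it runs anywhere:

```shell
# Dry-run version of notify.sh's state dispatch: mail replaced by echo.
notify_state() {
    case "$1" in
        master|backup|fault)
            echo "$(hostname) changed to be $1"
            ;;
        *)
            echo "Usage: notify_state {master|backup|fault}"
            return 1
            ;;
    esac
}

notify_state master                               # prints the transition message
notify_state bogus || echo "rejected as expected" # usage line, returns 1
```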
- Configure the nginx proxy
$ vim /etc/nginx/nginx.conf
upstream srvs {
server 192.168.196.129;
server 192.168.196.132;
server 127.0.0.1:8080 backup; <--nginx standby entry implements the sorry server; note that port 80 is already in use
}
server {
...
location / {
proxy_pass http://srvs;
}
...
}
$ vim /etc/nginx/conf.d/web2.conf <--a virtual host listening on 8080 serves as the sorry server
server {
server_name www.web2.com;
listen 8080;
root /var/www/html/web2;
}
$ echo 'nginx LB sorry server 1' > /var/www/html/web2/index.html
$ systemctl start nginx
server2 –> backup node
- Install and configure keepalived
$ yum install -y keepalived
$ vim /etc/keepalived/keepalived.conf
global_defs {
notification_email {
root@localhost
}
notification_email_from keepalived@localhost
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id node2
vrrp_mcast_group4 224.3.10.67 <--same multicast group as node1
script_user root
enable_script_security
}
vrrp_script chk_down {
script "/usr/bin/test -f /etc/keepalived/down && exit 1 || exit 0"
interval 1
weight -10
}
vrrp_script chk_nginx {
script "/usr/bin/killall -0 nginx && exit 0 || exit 1"
interval 1
weight -10
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP <--this node starts as backup
interface ens33
virtual_router_id 51
priority 95 <--must be lower than the master's priority
advert_int 1
authentication {
auth_type PASS
auth_pass AhpaeQ9J
}
virtual_ipaddress {
172.16.255.168/16 dev ens33 <--same VIP as node1
}
track_script {
chk_down
chk_nginx
}
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
- Configure the nginx proxy
$ vim /etc/nginx/nginx.conf
upstream srvs {
server 192.168.196.129;
server 192.168.196.132;
server 127.0.0.1:8080 backup;
}
server{
...
location / {
proxy_pass http://srvs;
}
...
}
$ vim /etc/nginx/conf.d/vhost.conf
server{
server_name www.vhost.com;
listen 8080;
root "/usr/share/nginx/html";
}
$ systemctl start nginx
Start keepalived and test#
1. node1
$ systemctl start keepalived
$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2017-09-05 16:24:29 CST; 4s ago
Process: 3721 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 3722 (keepalived)
CGroup: /system.slice/keepalived.service
├─3722 /usr/sbin/keepalived -D
├─3723 /usr/sbin/keepalived -D
└─3724 /usr/sbin/keepalived -D
Sep 05 16:24:30 centos7.ffu.com Keepalived_healthcheckers[3723]: Registering Kernel netlink reflector
Sep 05 16:24:30 centos7.ffu.com Keepalived_healthcheckers[3723]: Registering Kernel netlink command channel
Sep 05 16:24:30 centos7.ffu.com Keepalived_healthcheckers[3723]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 05 16:24:30 centos7.ffu.com Keepalived_healthcheckers[3723]: Configuration is using : 7553 Bytes
Sep 05 16:24:30 centos7.ffu.com Keepalived_healthcheckers[3723]: Using LinkWatch kernel netlink reflector...
Sep 05 16:24:30 centos7.ffu.com Keepalived_vrrp[3724]: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 05 16:24:31 centos7.ffu.com Keepalived_vrrp[3724]: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 05 16:24:31 centos7.ffu.com Keepalived_vrrp[3724]: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 05 16:24:31 centos7.ffu.com Keepalived_vrrp[3724]: VRRP_Instance(VI_1) Sending gratuitous ARPs on ens37 f....168
Sep 05 16:24:31 centos7.ffu.com Keepalived_healthcheckers[3723]: Netlink reflector reports IP 172.16.255.168 added
Hint: Some lines were ellipsized, use -l to show in full.
The status output shows that node1 has become the master node and bound the VIP to the specified NIC; confirm with the ip command:
$ ip a list ens37
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:f1:c7:d5 brd ff:ff:ff:ff:ff:ff
inet 172.16.253.93/16 brd 172.16.255.255 scope global dynamic ens37
valid_lft 62845sec preferred_lft 62845sec
inet 172.16.255.168/16 scope global secondary ens37
valid_lft forever preferred_lft forever
inet6 fe80::fce:4707:e290:f1e0/64 scope link
valid_lft forever preferred_lft forever
Check mail to confirm that the node state-transition notification arrived:
$ mail
Heirloom Mail version 12.5 7/5/10. Type ? for help.
"/var/spool/mail/root": 1 message 1 new
>N 1 root Tue Sep 5 17:26 18/681 "centos7.ffu.com to be master, vip floating"
2. node2
$ systemctl start keepalived
$ systemctl status keepalived -l
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2017-09-05 16:38:24 CST; 19s ago
Process: 8822 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 8825 (keepalived)
CGroup: /system.slice/keepalived.service
├─8825 /usr/sbin/keepalived -D
├─8826 /usr/sbin/keepalived -D
└─8827 /usr/sbin/keepalived -D
Sep 05 16:38:24 Centos7 Keepalived_healthcheckers[8826]: Netlink reflector reports IP fe80::d254:507:7c6e:2e23 added
Sep 05 16:38:24 Centos7 Keepalived_healthcheckers[8826]: Netlink reflector reports IP fe80::250:56ff:fe20:3183 added
Sep 05 16:38:24 Centos7 Keepalived_healthcheckers[8826]: Registering Kernel netlink reflector
Sep 05 16:38:24 Centos7 Keepalived_healthcheckers[8826]: Registering Kernel netlink command channel
Sep 05 16:38:24 Centos7 Keepalived_healthcheckers[8826]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 05 16:38:24 Centos7 Keepalived_healthcheckers[8826]: Configuration is using : 7807 Bytes
Sep 05 16:38:24 Centos7 Keepalived_healthcheckers[8826]: Using LinkWatch kernel netlink reflector...
Sep 05 16:38:24 Centos7 Keepalived_vrrp[8827]: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 05 16:38:24 Centos7 Keepalived_vrrp[8827]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Sep 05 16:38:24 Centos7 Keepalived_vrrp[8827]: VRRP_Script(chk_down) succeeded
Sep 05 16:38:24 Centos7 Keepalived_vrrp[8827]: VRRP_Script(chk_nginx) succeeded
node2's priority of 95 is lower than node1's 100. The status output shows both chk_down and chk_nginx succeeding (exit status 0), so node2 becomes the backup node and does not take the VIP.
$ mail
Heirloom Mail version 12.5 7/5/10. Type ? for help.
"/var/spool/mail/root": 1 message 1 new
>N 1 root Tue Sep 5 17:26 18/693 "Centos7 to be backup, vip floating"
3. Client tests
- Access the web service through the VIP; requests are scheduled to the backends:
$ while : ; do curl http://172.16.255.168 ;sleep 1 ;done
backend server 1
backend server 2
backend server 1
backend server 2
- Manually stop the nginx process on node1. The chk_nginx script then returns 1, node1's priority drops to 90, below node2's 95, so node2 becomes the master and the VIP floats over, completing the failover. The keepalived status, IP addresses, and mail can be inspected as before.
–> node1
$ systemctl stop nginx
$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2017-09-05 16:37:24 CST; 11min ago
... output omitted ...
Sep 05 16:47:53 centos7.ffu.com Keepalived_vrrp[5171]: VRRP_Script(chk_nginx) failed
Sep 05 16:47:55 centos7.ffu.com Keepalived_vrrp[5171]: VRRP_Instance(VI_1) Received higher prio advert
Sep 05 16:47:55 centos7.ffu.com Keepalived_vrrp[5171]: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 05 16:47:55 centos7.ffu.com Keepalived_vrrp[5171]: VRRP_Instance(VI_1) removing protocol VIPs.
Sep 05 16:47:55 centos7.ffu.com Keepalived_healthcheckers[5170]: Netlink reflector reports IP 172.16.255.168 removed
Hint: Some lines were ellipsized, use -l to show in full.
$ ip a list ens37
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:f1:c7:d5 brd ff:ff:ff:ff:ff:ff
inet 172.16.253.93/16 brd 172.16.255.255 scope global dynamic ens37
valid_lft 61554sec preferred_lft 61554sec
inet6 fe80::fce:4707:e290:f1e0/64 scope link
valid_lft forever preferred_lft forever
$ mail
Heirloom Mail version 12.5 7/5/10. Type ? for help.
"/var/spool/mail/root": 2 messages 1 new 2 unread
U 1 root Tue Sep 5 17:26 19/691 "centos7.ffu.com to be master, vip floating"
>N 2 root Tue Sep 5 17:30 18/681 "centos7.ffu.com to be backup, vip floating"
–> node2
$ systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2017-09-05 16:38:24 CST; 10min ago
... output omitted ...
Sep 05 16:38:24 Centos7 Keepalived_vrrp[8827]: VRRP_Script(chk_nginx) succeeded
Sep 05 16:47:55 Centos7 Keepalived_vrrp[8827]: VRRP_Instance(VI_1) forcing a new MASTER election
Sep 05 16:47:56 Centos7 Keepalived_vrrp[8827]: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 05 16:47:57 Centos7 Keepalived_vrrp[8827]: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 05 16:47:57 Centos7 Keepalived_vrrp[8827]: VRRP_Instance(VI_1) setting protocol VIPs.
Sep 05 16:47:57 Centos7 Keepalived_vrrp[8827]: VRRP_Instance(VI_1) Sending gratuitous ARPs on ens33 for 172.1....168
Sep 05 16:47:57 Centos7 Keepalived_healthcheckers[8826]: Netlink reflector reports IP 172.16.255.168 added
Sep 05 16:48:02 Centos7 Keepalived_vrrp[8827]: VRRP_Instance(VI_1) Sending gratuitous ARPs on ens33 for 172.1....168
Hint: Some lines were ellipsized, use -l to show in full.
$ ip a list ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:38:c5:21 brd ff:ff:ff:ff:ff:ff
inet 172.16.251.171/16 brd 172.16.255.255 scope global dynamic ens33
valid_lft 55724sec preferred_lft 55724sec
inet 172.16.255.168/16 scope global secondary ens33
valid_lft forever preferred_lft forever
inet6 fe80::d254:507:7c6e:2e23/64 scope link
valid_lft forever preferred_lft forever
$ mail
Heirloom Mail version 12.5 7/5/10. Type ? for help.
"/var/spool/mail/root": 2 messages 1 new 2 unread
U 1 root Tue Sep 5 17:26 19/703 "Centos7 to be backup, vip floating"
>N 2 root Tue Sep 5 17:30 18/693 "Centos7 to be master, vip floating"
–> The client access test also shows the failover in progress
backend server 1
backend server 2
curl: (7) Failed connect to 172.16.255.168:80; Connection refused
curl: (7) Failed connect to 172.16.255.168:80; Connection refused
curl: (7) Failed connect to 172.16.255.168:80; Connection refused
backend server 1
backend server 2
Restart the nginx service on node1 and failback occurs.
- Stop the nginx service on every host in the web server cluster in turn; the sorry server takes over, as the access test shows:
backend server 1
backend server 2
backend server 1
backend server 1
nginx LB sorry server 2
nginx LB sorry server 2
Restart nginx on the web servers to resume serving the site to the outside.
Dual-master model: KA high-availability cluster#
Compared with the single-master topology above, the dual-master model puts two VRRP instances on each load balancer, each node acting as master for one instance and backup for the other. VIP1 and VIP2 both face network users, so both load balancers are in the active state.
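The arithmetic behind failover in this model can be sketched: each instance's effective priority is its base priority plus the weight of any failed tracked script, and the node with the higher effective priority holds that instance's VIP (a simplification of keepalived's election, for illustration only):

```shell
# Effective priority = base + (weight if the tracked check failed, else 0).
effective() {   # args: base weight check_ok(1=passing, 0=failing)
    local base=$1 weight=$2 ok=$3
    if [ "$ok" -eq 1 ]; then
        echo "$base"
    else
        echo $((base + weight))
    fi
}

# Normal operation: node1 holds VIP1 (100 > 95), node2 holds VIP2.
echo "VI_1: node1=$(effective 100 -10 1) node2=$(effective 95 -10 1)"
# nginx dies on node1: its VI_1 priority drops to 90 (< 95), so VIP1 floats to node2.
echo "VI_1 after failure: node1=$(effective 100 -10 0) node2=$(effective 95 -10 1)"
```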
- Reusing the topology above, add a second vrrp instance to the keepalived configuration file on both nodes
–> node1
vrrp_instance VI_2 {
state BACKUP
interface ens37
virtual_router_id 61 <--note: virtual router IDs on the same physical router must be unique
priority 95
advert_int 1
authentication {
auth_type PASS
auth_pass AhpaeQ0J
}
virtual_ipaddress {
172.16.255.68/16 dev ens37 <--VIP2
}
track_script {
chk_down
chk_nginx
}
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
–> node2
vrrp_instance VI_2 {
state MASTER
interface ens33
virtual_router_id 61
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass AhpaeQ0J
}
virtual_ipaddress {
172.16.255.68/16 dev ens33
}
track_script {
chk_down
chk_nginx
}
notify_master "/etc/keepalived/notify.sh master"
notify_backup "/etc/keepalived/notify.sh backup"
notify_fault "/etc/keepalived/notify.sh fault"
}
- Start the keepalived service on node1 and node2
VIP1 and VIP2 are bound to node1 and node2 respectively:
$ ip a list ens37
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:f1:c7:d5 brd ff:ff:ff:ff:ff:ff
inet 172.16.253.93/16 brd 172.16.255.255 scope global dynamic ens37
valid_lft 79856sec preferred_lft 79856sec
inet 172.16.255.168/16 scope global secondary ens37
valid_lft forever preferred_lft forever
$ ip a list ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:38:c5:21 brd ff:ff:ff:ff:ff:ff
inet 172.16.251.171/16 brd 172.16.255.255 scope global dynamic ens33
valid_lft 39490sec preferred_lft 39490sec
inet 172.16.255.68/16 scope global secondary ens33
valid_lft forever preferred_lft forever
- Access test: both load balancers are in the active state:
$ while : ; do curl http://172.16.255.68 ;sleep 1 ;done
backend server 1
backend server 2
backend server 1
backend server 2
$ while : ; do curl http://172.16.255.168 ;sleep 1 ;done
backend server 1
backend server 2
backend server 1
backend server 2
- Manually stop the nginx service on node1; VIP1 floats to node2, completing the failover.
–> node2 now holds both VIP1 and VIP2
$ ip a list ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:38:c5:21 brd ff:ff:ff:ff:ff:ff
inet 172.16.251.171/16 brd 172.16.255.255 scope global dynamic ens33
valid_lft 38315sec preferred_lft 38315sec
inet 172.16.255.68/16 scope global secondary ens33
valid_lft forever preferred_lft forever
inet 172.16.255.168/16 scope global secondary ens33
valid_lft forever preferred_lft forever
–> The failover also shows up in the client access test
$ while : ; do curl http://172.16.255.168 ;sleep 1 ;done
backend server 2
backend server 1
curl: (7) Failed connect to 172.16.255.168:80; Connection refused
curl: (7) Failed connect to 172.16.255.168:80; Connection refused
backend server 2
backend server 1
Failover for node2 can be tested the same way. This completes the two-node keepalived + nginx load-balancing high-availability cluster in both the single-master and dual-master models.