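We run a cluster: Gateway and Register are on the same server, BusinessWorker is on a separate server, and everything is on one LAN.
Gateway runs 2 processes,
BusinessWorker runs 8 processes.
Our number of clients is bounded, fewer than 2,000, but after the service starts, connections in status keeps growing slowly.
So I added logging to onWebSocketConnect in the Gateway service to check whether clients were reconnecting repeatedly. The log shows no new connections, yet connections keeps rising, roughly 1 every 2 seconds and sometimes more than 10 at once. Eventually the server's connection count gets high enough that clients drop offline. iptables is currently turned off on all of these servers.
1. What causes the connection count to climb like this?
2. I also noticed that after a client disconnects, onClose is not called?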
/etc/sysctl.conf:
fs.file-max=65535
net.ipv4.tcp_max_tw_buckets = 20000
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_max_syn_backlog = 262144
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 65535
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_syncookies = 1
#net.ipv4.tcp_tw_len = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.ip_local_port_range = 1024 65000
net.nf_conntrack_max = 6553500
net.netfilter.nf_conntrack_max = 6553500
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
/proc/<pid>/limits:
Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            10485760     unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             65535        65535        processes
Max open files            65535        65535        files
Max locked memory         65536        65536        bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       15233        15233        signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us
ulimit -a:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15233
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 65535
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
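Long-lived connection applications must have a heartbeat
An application that holds long-lived connections must have a heartbeat. It is worth saying three times: add a heartbeat, add a heartbeat, add a heartbeat. Running without one is asking for trouble.
You can use this heartbeat configuration: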
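$gateway = new Gateway("Websocket://0.0.0.0:8585");
$gateway->pingInterval = 30;        // heartbeat interval in seconds
$gateway->pingNotResponseLimit = 1; // close the connection if the client sends nothing for pingInterval * pingNotResponseLimit seconds
$gateway->pingData = '';            // empty: the server sends no heartbeat, so the client has to send one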
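For how the heartbeat works, see http://doc2.workerman.net/326139
Why the connections keep increasing
In your case, the long-lived connections between clients and server were most likely idle for a long time and were dropped by a firewall on some routing node (or were cut by extreme events such as a network or power failure). The server cannot detect this kind of disconnect, because no FIN packet ever reaches it, so it needs a heartbeat to find out whether a connection is still alive. With the configuration above, for example, a connection is closed if no heartbeat arrives from the client within 30 seconds.
Because these connections were dropped abnormally and the server does not know it, the Linux kernel still treats them as valid and keeps them open, although they are actually dead. As clients reconnect and the cycle repeats, dead connections pile up.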
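One way to check this from the kernel's side is to sample how many ESTABLISHED connections the gateway port holds and compare that number with the client count you expect. The helper below is only a sketch: count_established is a hypothetical name, 8585 is the port taken from the configuration above, and it assumes the usual Linux `netstat -nt` column layout (local address in column 4, state in column 6).

```shell
# count_established PORT
# Reads `netstat -nt` output on stdin and prints how many ESTABLISHED
# connections have PORT as their local port.
count_established() {
    awk -v port="$1" '
        $6 == "ESTABLISHED" {
            n = split($4, a, ":")      # the port is the last ":"-separated field
            if (a[n] == port) total++
        }
        END { print total + 0 }        # "+ 0" prints 0 when nothing matched
    '
}

# Example (assumed port 8585), sampling once a minute:
#   while sleep 60; do
#       echo "$(date '+%T') $(netstat -nt | count_established 8585)"
#   done
```

If this number keeps climbing well past the real client count while the connect log shows nothing new, the extra entries are probably dead connections of the kind described above.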
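In addition, anything above 1000 concurrent connections requires following http://doc.workerman.net/315302, and you must also install what http://doc.workerman.net/315116 describes; otherwise similar problems will appear.
Run this command on the server: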
netstat -nt | grep ESTABLISHED | grep PORT | awk '{print $5}' | awk -F : '{print $1}' | sort | uniq -c | sort -rn
Replace PORT with the actual gateway port.
The output looks like this:
79 221.226.97.95
28 223.74.34.15
21 116.192.14.116
12 218.97.247.118
10 47.90.103.189
10 119.129.70.195
8 39.88.21.160
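The first number is the connection count, and the value after it is the IP the connections come from.
This shows which IPs hold the most connections, so you can check the clients at those IPs and see whether they really opened that many.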
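A plain grep on the port number also matches lines where the number shows up in the remote port or inside an IP address, which can inflate the counts. The sketch below is a stricter variant under the same assumptions about `netstat -nt` columns; count_by_ip is a hypothetical helper name, not a Workerman or system command.

```shell
# count_by_ip PORT
# Reads `netstat -nt` output on stdin and prints "<count> <remote ip>",
# highest count first, for ESTABLISHED connections whose LOCAL port is PORT.
count_by_ip() {
    awk -v port="$1" '
        $6 == "ESTABLISHED" {
            n = split($4, l, ":")                 # local address: port is the last field
            if (l[n] != port) next
            m = split($5, r, ":")                 # remote address: drop the trailing port
            ip = r[1]
            for (i = 2; i < m; i++) ip = ip ":" r[i]   # keeps IPv6-style addresses whole
            count[ip]++
        }
        END { for (ip in count) print count[ip], ip }
    ' | sort -rn
}

# Example (assumed port 8585):
#   netstat -nt | count_by_ip 8585
```

Matching on the local address column keeps outbound connections from this machine to some other host's port out of the result.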