1. Problem Description

We suddenly noticed severe memory pressure on a host node. At first this seemed implausible: the node was running only a handful of containers, and per-container resource accounting put memory usage at roughly a dozen GB at most, which in theory should have left tens of GB free. Prompted by this discrepancy, we inspected the host's memory usage and found the following.

From the process point of view, memory usage by business and system processes was modest; the largest process used only about 5 GB, nowhere near enough to explain the machine-wide consumption. Looking further at /proc/meminfo, Slab memory was as high as 38 GB, almost all of it unreclaimable (SUnreclaim), which is clearly abnormal.

The ps output shows the top memory consumer was merely mysqld at about 5 GB; others such as Elasticsearch, kube-apiserver, and the Ceph OSDs used limited memory as well. The combined RSS of the top few dozen processes falls far short of the 50+ GB of machine-wide usage, so the problem clearly does not lie with application or container processes.
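The gap between the per-process view and machine-wide usage can be quantified quickly by summing RSS across all processes. This is only a rough sketch: RSS counts shared pages once per process, so the sum slightly overstates actual usage.

```shell
# Sum the RSS (kB) of every process and report it in GB.
# Note: shared pages are counted in each process, so this overestimates.
ps -eo rss= | awk '{ sum += $1 } END { printf "%.1f GB\n", sum / 1024 / 1024 }'
```

If this sum is far below the "used" figure reported by free, the memory is being held by the kernel (slab, page tables, etc.) rather than by userspace.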

[root@tanqidi ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         56G        1.5G         18M        4.4G        5.5G
Swap:            0B          0B          0B

[root@tanqidi ~]# cat /proc/meminfo 
MemTotal:       65674112 kB
MemFree:         1586104 kB
MemAvailable:    5799208 kB
Buffers:         2570496 kB
Cached:          1102840 kB
SwapCached:            0 kB
Active:         21484124 kB
Inactive:        3171600 kB
Active(anon):   20635988 kB
Inactive(anon):     1964 kB
Active(file):     848136 kB
Inactive(file):  3169636 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               872 kB
Writeback:             0 kB
AnonPages:      20796084 kB
Mapped:           638320 kB
Shmem:             19080 kB
KReclaimable:     921308 kB
Slab:           38854192 kB
SReclaimable:     921308 kB
SUnreclaim:     37932884 kB
KernelStack:       96368 kB
PageTables:       106724 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    32837056 kB
Committed_AS:   57620592 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      117424 kB
VmallocChunk:          0 kB
Percpu:            46912 kB
HardwareCorrupted:     0 kB
AnonHugePages:  10602496 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:    51476352 kB
DirectMap2M:    14583808 kB
DirectMap1G:     3145728 kB

[root@tanqidi ~]# ps -eo pid,ppid,user,cmd,%mem,rss --sort=-%mem | head -40
   PID   PPID USER     CMD                         %MEM   RSS
172023 171990 1001     mysqld --wsrep_start_positi  7.8 5166000
275014 274962 centos   /app/elasticsearch/jdk/bin/  3.0 1987664
134972 134880 root     kube-apiserver --advertise-  2.6 1710564
  3863   3841 167      ceph-osd --foreground --id   1.8 1187456
 16160  16139 root     java -server -Xms512m -Xmx1  1.7 1134292
109023 109003 167      ceph-mon --fsid=45492291-00  1.5 1024888
112063 112025 centos   /opt/jdk-12/bin/java -Xms1g  1.3 868776
141425 141394 root     ks-apiserver --logtostderr=  0.9 649948
267940 267893 root     java -server -Xms512m -Xmx1  0.9 644640
501764 501731 root     java -server -Xms512m -Xmx2  0.8 581900
134924      1 root     ./titanagent -d -b /etc/tit  0.6 410000
102189 102168 root     /home/weave/scope --mode=pr  0.5 361308
508992 508950 root     kube-controller-manager --a  0.4 287928
134987 134855 root     etcd --advertise-client-url  0.3 236280
179495      1 root     /usr/bin/dockerd -H fd:// -  0.3 204248
182113 181958 root     /usr/local/bin/cephcsi --no  0.2 178208
133496      1 root     /usr/bin/kubelet --bootstra  0.2 176096
355222 355159 167      ceph-mds --fsid=45492291-00  0.2 149400
 69171  69149 root     /home/weave/scope --mode=ap  0.2 149220
183555 183480 root     /usr/local/bin/cephcsi --no  0.1 124068
143161 143138 root     /app/redis/src/redis-server  0.1 118976
158491 158460 root     /home/weave/scope --mode=pr  0.1 105316
509044 509000 root     kube-scheduler --authentica  0.1 89160
443163      1 root     /usr/bin/monitor-agent -con  0.1 85264
418389 418361 root     /usr/local/bin/cephcsi --no  0.1 78404
  2654      1 root     /bin/sh /opt/monitor/osw/el  0.1 69080
409574 409359 centos   haproxy -W -db -f /usr/loca  0.1 67100
409359 409234 centos   haproxy -W -db -f /usr/loca  0.1 67088
  1562      1 root     /usr/bin/containerd          0.1 66308
236769 236741 root     ganesha.nfsd -F -L STDERR -  0.0 65384
 69705  69682 65532    /velero server --features=   0.0 59688
204312 204300 root     calico-node -felix           0.0 59424
161760 161722 root     /usr/local/bin/cephcsi --no  0.0 56380
379559 379539 root     /bin/thanos rule --data-dir  0.0 53900
433477      1 root     /usr/bin/containerd-shim-ru  0.0 50836
433499      1 root     /usr/bin/containerd-shim-ru  0.0 49568
 36217      1 root     /usr/bin/containerd-shim-ru  0.0 49184
433858      1 root     /usr/bin/containerd-shim-ru  0.0 48044
159452 159329 root     /vminsert-prod --storageNod  0.0 47580
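Since nearly all of the Slab memory is SUnreclaim, a natural next step is to identify which slab cache is growing. slabtop shows this interactively; a non-interactive sketch over /proc/slabinfo is below (root is required to read the file, and the column positions assume the slabinfo 2.1 format, where field 3 is `<num_objs>` and field 4 is `<objsize>`):

```shell
# Rank slab caches by approximate footprint: num_objs * objsize (bytes).
# /proc/slabinfo is root-readable only; errors are silenced for brevity.
awk 'NR > 2 { printf "%s %d\n", $1, $3 * $4 }' /proc/slabinfo 2>/dev/null \
  | sort -k2 -nr | head -10
```

A single cache (often kmalloc-* or a filesystem/driver cache) dominating the list usually points at the leaking kernel path.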

2. Solution

Because this machine is a Kubernetes worker node that had been running for a long time without a reboot, the point at which memory began growing abnormally could no longer be determined. Rebooting the host to release the memory was considered, but the node runs storage components such as Ceph OSD and MySQL, so a direct reboot carries significant risk; in the worst case the node could fail to boot or the storage could be left in an abnormal state. We therefore chose to release resources gradually at the application layer, giving the host a relatively gentle restart to minimize the impact on storage and cluster stability.

First, we scaled the replica counts of block-storage-related containers such as Ceph OSD and MySQL to 0 so they would exit safely. We then ran systemctl stop kubelet, followed by systemctl restart docker to restart the container runtime, achieving a gentle application-layer restart. Once Docker was back up, we started kubelet again and gradually restored the workloads on the node.
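The sequence above can be sketched as a script. The workload names below are placeholders (assumptions, not the actual names on this cluster), and a DRY_RUN guard is included so the steps can be previewed before being run for real:

```shell
# Gentle application-layer restart: drain storage workloads, then
# bounce the container runtime. DRY_RUN=1 (default) only prints commands.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

# 1. Scale storage-related workloads to zero so they exit cleanly
#    (statefulset/deployment names are hypothetical).
run kubectl scale statefulset mysql --replicas=0
run kubectl scale deployment ceph-osd --replicas=0

# 2. Stop kubelet so it cannot restart containers mid-operation.
run systemctl stop kubelet

# 3. Restart the container runtime; all running containers are torn down.
run systemctl restart docker

# 4. Bring kubelet back; workloads are rescheduled gradually.
run systemctl start kubelet
```

Run once with DRY_RUN=1 to review the plan, then with DRY_RUN=0 to execute.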

After Docker and kubelet were restarted, the unreclaimable SUnreclaim memory was successfully released and the node's available memory recovered to about 48 GB. We will continue to watch memory usage on this node; if SUnreclaim grows abnormally again, we will dig deeper to pinpoint the specific component or kernel module that is leaking memory.

[root@tanqidi appuser]# free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         13G         41G         17M        7.0G         48G
Swap:            0B          0B          0B
[root@tanqidi appuser]# cat /proc/meminfo
MemTotal:       65674112 kB
MemFree:        43915596 kB
MemAvailable:   50777080 kB
Buffers:         1056344 kB
Cached:          5524412 kB
SwapCached:            0 kB
Active:         14991032 kB
Inactive:        4338480 kB
Active(anon):   12506288 kB
Inactive(anon):     3892 kB
Active(file):    2484744 kB
Inactive(file):  4334588 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               520 kB
Writeback:             0 kB
AnonPages:      12068124 kB
Mapped:          1513104 kB
Shmem:             18100 kB
KReclaimable:     768128 kB
Slab:            1893868 kB
SReclaimable:     768128 kB
SUnreclaim:      1125740 kB
KernelStack:       58224 kB
PageTables:        60516 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    32837056 kB
Committed_AS:   36981664 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       75824 kB
VmallocChunk:          0 kB
Percpu:            46912 kB
HardwareCorrupted:     0 kB
AnonHugePages:   5595136 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:    51476352 kB
DirectMap2M:    14583808 kB
DirectMap1G:     3145728 kB
[root@tanqidi appuser]#
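For the follow-up observation, a minimal check like the one below could be run periodically (e.g. from cron) to flag abnormal SUnreclaim growth early. The 20 GB threshold is an arbitrary assumption for this node, not a general recommendation:

```shell
# Warn when SUnreclaim in /proc/meminfo exceeds a threshold (in kB).
THRESHOLD_KB=$((20 * 1024 * 1024))   # assumed 20 GB alert threshold
sunreclaim_kb=$(awk '/^SUnreclaim:/ { print $2 }' /proc/meminfo 2>/dev/null)
if [ "${sunreclaim_kb:-0}" -gt "$THRESHOLD_KB" ]; then
    echo "WARN: SUnreclaim=${sunreclaim_kb} kB exceeds ${THRESHOLD_KB} kB"
fi
```

Logging the value over time would also make it possible to correlate the growth with specific workloads or kernel activity.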