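I have a Cisco ISR4431 internet edge router that reboots at random roughly every 5 days. After each reboot it takes 10 to 60 minutes before traffic flows normally again. It runs BGP and routes for a /19 and a /20 network, so the load should be relatively light for a device of this class.
The only suspicious thing I can see is that 94% of memory is consumed, so I suspect it is holding more BGP routes than it should, although the same configuration ran for years on the old router without ever becoming unstable. I'm not sure how to diagnose the problem further, or whether this is a hardware problem or a configuration problem.
Unfortunately, the router is on the other side of the country, and I can't physically get to it until quarantine ends.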
sh ver:
Cisco IOS XE Software, Version 03.16.04b.S - Extended Support Release
Cisco IOS Software, ISR Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 15.5(3)S4b, RELEASE SOFTWARE (fc1)
sh logging
*Apr 28 14:47:09.074: %LINK-3-UPDOWN: Interface GigabitEthernet0/0/2, changed state to up
*Apr 28 14:47:10.074: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/0/2, changed state to up
*Apr 28 14:50:12.834: %PLATFORM-4-ELEMENT_WARNING:smand: RP/0: Committed Memory value 94% exceeds warning level 90%
*Apr 28 14:52:00.253: %IOSXE_INFRA-6-PROCPATH_CLIENT_HOG: IOS shim client 'fman stats bipc' took 685 msec (runtime: 256 msec) to process a 'tdl_qfpmib_throughput_data' message
*Apr 28 15:00:14.511: %PLATFORM-4-ELEMENT_WARNING:smand: RP/0: Committed Memory value 94% exceeds warning level 90%
sh processes cpu sorted
CPU utilization for five seconds: 13%/0%; one minute: 3%; five minutes: 3%
PID  Runtime(ms)  Invoked  uSecs  5Sec    1Min   5Min   TTY  Process
193  230311       5004     46025  12.39%  1.63%  1.22%  0    BGP Scanner
117  22772        228335   99     0.15%   0.10%  0.10%  0    IOSXE-RP Punt Se
240  31843        1902016  16     0.07%   0.14%  0.15%  0    Inline Power
414  2694         20294    132    0.07%   0.00%  0.00%  0    NTP
284  18520        605984   30     0.07%   0.09%  0.08%  0    HTTP CORE
The BGP section of the configuration looks like this:
router bgp 7835
 no bgp log-neighbor-changes
 neighbor ZZ.ZZ.6.113 remote-as XXX
 neighbor ZZ.ZZ.6.113 password XXXXXX
 !
 address-family ipv4
  network XX.XX.160.0 mask 255.255.240.0
  network YY.YY.64.0 mask 255.255.224.0
  network YY.YY.79.0
  neighbor ZZ.ZZ.6.113 activate
  neighbor ZZ.ZZ.6.113 soft-reconfiguration inbound
  neighbor ZZ.ZZ.6.113 filter-list 1 out
 exit-address-family
!
Some further diagnostics:
sh platform resources
**State Acronym: H - Healthy, W - Warning, C - Critical
Resource                 Usage                 Max             Warning         Critical        State
----------------------------------------------------------------------------------------------------
RP0 (ok, active)                                                                               C
 Control Processor       32.12%                100%            90%             95%             H
 DRAM                    3849MB(99%)           3872MB          90%             95%             C
ESP0(ok, active)                                                                               H
 QFP                                                                                           H
  DRAM                   1663176KB(79%)        2097152KB       80%             90%             H
  IRAM                   0KB(0%)               0KB             80%             90%             H
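To make the State column easier to read, here is a minimal Python sketch of how I understand the H/W/C states to follow from the Warning and Critical thresholds. The function name `classify` and the `>=` comparisons are my own assumptions, not documented IOS XE behavior; the numbers are copied from the `sh platform resources` output.

```python
def classify(usage_pct, warning_pct, critical_pct):
    """Map a usage percentage to H, W or C.

    Assumption: a threshold counts as crossed when usage >= threshold.
    """
    if usage_pct >= critical_pct:
        return "C"
    if usage_pct >= warning_pct:
        return "W"
    return "H"


# (resource, usage %, warning %, critical %) copied from "sh platform resources".
rows = [
    ("RP0 Control Processor", 32.12, 90, 95),
    ("RP0 DRAM", 99, 90, 95),
    ("ESP0 QFP DRAM", 79, 80, 90),
    ("ESP0 QFP IRAM", 0, 80, 90),
]

states = {name: classify(usage, warn, crit) for name, usage, warn, crit in rows}

# Headroom left on the route processor, in MB (3872 MB max, 3849 MB used).
rp_dram_free_mb = 3872 - 3849

if __name__ == "__main__":
    for name, state in states.items():
        print(f"{name}: {state}")
    print(f"RP0 DRAM free: {rp_dram_free_mb} MB")
```

Under those assumptions the sketch gives the same states as the router reports: only RP0 DRAM is critical, with 23 MB of headroom left.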
Memory:
show processes memory sorted
Processor Pool Total: 1688347248 Used: 1417980160 Free: 270367088
lsmpi_io Pool Total: 6295128 Used: 6294296 Free: 832
PID  TTY  Allocated  Freed      Holding    Getbufs  Retbufs  Process
510  0    904032136  54730248   901424352  0        0        BGP Router
271  0    257116280  1297600    256693920  0        0        IP RIB Update
0    0    352326368  108678280  227122576  0        0        *Init*
79   0    8209072    12176      7592984    0        0        IOSD ipc task
389  0    3889024    5160       3925856    799092   0        EEM ED Syslog
409  0    1439256    26792      1442328    0        0        EEM Server
155  0    3223184    91024      1057808    0        0        CWAN OIR Handler
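To put those numbers in proportion, here is a rough Python sketch that expresses the top three Holding values as a share of the Processor pool. The byte counts are copied from the `show processes memory sorted` output; the helper name `share` and the choice to add BGP Router and IP RIB Update together are my own, and I am only assuming the RIB usage is driven by BGP.

```python
# Figures copied from "show processes memory sorted" (bytes).
PROCESSOR_POOL_TOTAL = 1_688_347_248
PROCESSOR_POOL_USED = 1_417_980_160

# "Holding" column for the three largest consumers.
holding = {
    "BGP Router": 901_424_352,
    "IP RIB Update": 256_693_920,
    "*Init*": 227_122_576,
}


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


pool_used_pct = share(PROCESSOR_POOL_USED, PROCESSOR_POOL_TOTAL)
shares = {name: share(held, PROCESSOR_POOL_TOTAL) for name, held in holding.items()}

# Assumption: the RIB memory is mostly BGP-learned routes, so add the two together.
bgp_plus_rib_pct = share(
    holding["BGP Router"] + holding["IP RIB Update"], PROCESSOR_POOL_TOTAL
)

if __name__ == "__main__":
    print(f"Processor pool used: {pool_used_pct}%")
    for name, pct in shares.items():
        print(f"{name}: {pct}% of pool")
    print(f"BGP Router + IP RIB Update: {bgp_plus_rib_pct}% of pool")
```

By this arithmetic BGP Router alone holds a little over half of the Processor pool, and together with IP RIB Update about two thirds, which is why I suspect the BGP table.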