Oracle VM Server 3.4.5 can become unstable because of a bug in the Oracle Unbreakable Enterprise Kernel (UEK).
Kernel version 4.1.12-124.14.5.el6uek.x86_64 introduced a memory leak in the i40e network module.
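To decide quickly whether a given host matches this description, the two conditions (kernel release and i40e module loaded) can be tested with a small script. This is a minimal sketch: the function names is_affected_release and uses_i40e are hypothetical helpers, and the check only recognises the exact release named in this article, not other builds that may carry the same bug.

```shell
#!/bin/sh
# Kernel release named in this article as carrying the i40e leak.
AFFECTED_RELEASE="4.1.12-124.14.5.el6uek.x86_64"

# Succeeds when the given release string is exactly the affected release.
is_affected_release() {
    [ "$1" = "$AFFECTED_RELEASE" ]
}

# Succeeds when the given lsmod output lists a module named i40e
# (the module name is the first column of lsmod output).
uses_i40e() {
    printf '%s\n' "$1" | awk '$1 == "i40e" { found = 1 } END { exit !found }'
}

# Example wiring on a live host (commented out so the file can be sourced safely):
# if is_affected_release "$(uname -r)" && uses_i40e "$(lsmod)"; then
#     echo "this host runs the affected kernel with i40e loaded"
# fi
```

Keeping the checks as functions that take their input as arguments makes them easy to run against output collected from several servers.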
The following backtrace was collected from /var/log/messages after the problem occurred:
Dec 12 07:06:30 efuovs02 kernel: [1508192.885203] ntpd invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Dec 12 07:06:30 efuovs02 kernel: [1508192.885208] ntpd cpuset=/ mems_allowed=0
Dec 12 07:06:30 efuovs02 kernel: [1508192.885217] CPU: 3 PID: 4751 Comm: ntpd Not tainted 4.1.12-124.14.5.el6uek.x86_64 #2
Dec 12 07:06:30 efuovs02 kernel: [1508192.885221] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 02/14/2018
Dec 12 07:06:30 efuovs02 kernel: [1508192.885224] 0000000000000000 ffff8804484cf678 ffffffff816e4bdb ffff88044d44aa00
Dec 12 07:06:30 efuovs02 kernel: [1508192.885230] 0000000000000000 ffff8804484cf708 ffffffff816e32d1 01ff8804484cf688
Dec 12 07:06:30 efuovs02 kernel: [1508192.885235] ffff8804484cf718 ffff8804484cf6c8 ffffffff811fc561 ffff8804484cf800
Dec 12 07:06:30 efuovs02 kernel: [1508192.885241] Call Trace:
Dec 12 07:06:30 efuovs02 kernel: [1508192.885251] [<ffffffff816e4bdb>] dump_stack+0x63/0x81
Dec 12 07:06:30 efuovs02 kernel: [1508192.885256] [<ffffffff816e32d1>] dump_header+0x7f/0x1f3
Dec 12 07:06:30 efuovs02 kernel: [1508192.885264] [<ffffffff811fc561>] ? vmpressure+0x21/0x90
Dec 12 07:06:30 efuovs02 kernel: [1508192.885272] [<ffffffff8118e53c>] oom_kill_process+0x1cc/0x3c0
Dec 12 07:06:30 efuovs02 kernel: [1508192.885283] [<ffffffff8108de0e>] ? has_capability_noaudit+0x1e/0x30
Dec 12 07:06:31 efuovs02 kernel: [1508192.885288] [<ffffffff8118eaab>] __out_of_memory+0x31b/0x530
Dec 12 07:06:31 efuovs02 kernel: [1508192.885294] [<ffffffff8118ee5b>] out_of_memory+0x5b/0x80
Dec 12 07:06:31 efuovs02 kernel: [1508192.885300] [<ffffffff81194d42>] __alloc_pages_nodemask+0x952/0xab0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885307] [<ffffffff811de28d>] alloc_pages_vma+0xbd/0x260
Dec 12 07:06:31 efuovs02 kernel: [1508192.885311] [<ffffffff8118a59e>] ? find_get_entry+0x1e/0xc0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885317] [<ffffffff811ce6bd>] read_swap_cache_async+0xed/0x170
Dec 12 07:06:31 efuovs02 kernel: [1508192.885322] [<ffffffff811ce82d>] swapin_readahead+0xed/0x190
Dec 12 07:06:31 efuovs02 kernel: [1508192.885328] [<ffffffff811bbfe0>] handle_mm_fault+0x12d0/0x1770
Dec 12 07:06:31 efuovs02 kernel: [1508192.885335] [<ffffffff8121d910>] ? poll_select_copy_remaining+0x130/0x130
Dec 12 07:06:31 efuovs02 kernel: [1508192.885340] [<ffffffff8106d57f>] __do_page_fault+0x1af/0x480
Dec 12 07:06:31 efuovs02 kernel: [1508192.885346] [<ffffffff816f2c1c>] ? page_fault+0xcc/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885350] [<ffffffff8106d87f>] do_page_fault+0x2f/0x80
Dec 12 07:06:31 efuovs02 kernel: [1508192.885354] [<ffffffff816f2be4>] ? page_fault+0x94/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885359] [<ffffffff816f2bdd>] ? page_fault+0x8d/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885363] [<ffffffff816f2bd6>] ? page_fault+0x86/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885367] [<ffffffff816f2c5f>] page_fault+0x10f/0x120
Dec 12 07:06:31 efuovs02 kernel: [1508192.885375] [<ffffffff813316c5>] ? copy_user_enhanced_fast_string+0x5/0x10
Dec 12 07:06:31 efuovs02 kernel: [1508192.885379] [<ffffffff8121d7d1>] ? set_fd_set+0x21/0x30
Dec 12 07:06:31 efuovs02 kernel: [1508192.885384] [<ffffffff8121e5aa>] core_sys_select+0x1fa/0x2f0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885392] [<ffffffff810f8fc3>] ? ntp_notify_cmos_timer+0x23/0x30
Dec 12 07:06:31 efuovs02 kernel: [1508192.885396] [<ffffffff810f8a1d>] ? do_adjtimex+0xed/0x100
Dec 12 07:06:31 efuovs02 kernel: [1508192.885402] [<ffffffff810ed3ac>] ? SYSC_adjtimex+0x4c/0x80
Dec 12 07:06:31 efuovs02 kernel: [1508192.885410] [<ffffffff810209e9>] ? read_tsc+0x9/0x10
Dec 12 07:06:31 efuovs02 kernel: [1508192.885414] [<ffffffff810f68cb>] ? ktime_get_ts64+0x4b/0x110
Dec 12 07:06:31 efuovs02 kernel: [1508192.885419] [<ffffffff8121e74b>] SyS_select+0xab/0x100
Dec 12 07:06:31 efuovs02 kernel: [1508192.885424] [<ffffffff816ed451>] ? system_call_after_swapgs+0xdb/0x18c
Dec 12 07:06:31 efuovs02 kernel: [1508192.885428] [<ffffffff816ed51a>] system_call_fastpath+0x18/0xd4
Dec 12 07:06:31 efuovs02 kernel: [1508192.885457] Mem-Info:
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469] active_anon:1452 inactive_anon:1426 isolated_anon:65
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469] active_file:4559 inactive_file:873 isolated_file:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469] unevictable:1547 dirty:20 writeback:31 unstable:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469] slab_reclaimable:6776 slab_unreclaimable:8649
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469] mapped:3007 shmem:0 pagetables:1705 bounce:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885469] free:33536 free_pcp:918 free_cma:0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885483] Node 0 DMA free:15740kB min:60kB low:72kB high:84kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 12 07:06:31 efuovs02 kernel: [1508192.885499] lowmem_reserve[]: 0 2661 15921 15921
Dec 12 07:06:31 efuovs02 kernel: [1508192.885508] Node 0 DMA32 free:64064kB min:11076kB low:13844kB high:16612kB active_anon:5912kB inactive_anon:5876kB active_file:112kB inactive_file:52kB unevictable:836kB isolated(anon):256kB isolated(file):0kB present:2781336kB managed:2751088kB mlocked:836kB dirty:0kB writeback:0kB mapped:668kB shmem:0kB slab_reclaimable:4792kB slab_unreclaimable:6452kB kernel_stack:912kB pagetables:1692kB unstable:0kB bounce:0kB free_pcp:1156kB local_pcp:248kB free_cma:0kB writeback_tmp:0kB pages_scanned:619296 all_unreclaimable? yes
Dec 12 07:06:31 efuovs02 kernel: [1508192.885524] lowmem_reserve[]: 0 0 13260 13260
Dec 12 07:06:31 efuovs02 kernel: [1508192.885532] Node 0 Normal free:54340kB min:54392kB low:67988kB high:81584kB active_anon:0kB inactive_anon:0kB active_file:18124kB inactive_file:3440kB unevictable:5352kB isolated(anon):4kB isolated(file):0kB present:13979888kB managed:13534768kB mlocked:5352kB dirty:80kB writeback:124kB mapped:11360kB shmem:0kB slab_reclaimable:22312kB slab_unreclaimable:28144kB kernel_stack:2880kB pagetables:5128kB unstable:0kB bounce:0kB free_pcp:2516kB local_pcp:572kB free_cma:0kB writeback_tmp:0kB pages_scanned:129384 all_unreclaimable? yes
Dec 12 07:06:31 efuovs02 kernel: [1508192.885546] lowmem_reserve[]: 0 0 0 0
Dec 12 07:06:31 efuovs02 kernel: [1508192.885551] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15740kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885573] Node 0 DMA32: 143*4kB (UE) 111*8kB (UEM) 209*16kB (UE) 140*32kB (UE) 94*64kB (UEM) 57*128kB (UEM) 28*256kB (UEM) 7*512kB (UEM) 4*1024kB (EM) 9*2048kB (MR) 2*4096kB (MR) = 64068kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885596] Node 0 Normal: 8736*4kB (UEM) 1360*8kB (UEM) 208*16kB (UEM) 32*32kB (UE) 2*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 54400kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885615] 8002 total pagecache pages
Dec 12 07:06:31 efuovs02 kernel: [1508192.885618] 1494 pages in swap cache
Dec 12 07:06:31 efuovs02 kernel: [1508192.885621] Swap cache stats: add 3717250313, delete 3717248819, find 2895172777/5168362256
Dec 12 07:06:31 efuovs02 kernel: [1508192.885624] Free swap = 4129656kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885626] Total swap = 4194300kB
Dec 12 07:06:31 efuovs02 kernel: [1508192.885628] 4194303 pages RAM
Dec 12 07:06:31 efuovs02 kernel: [1508192.885630] 0 pages HighMem/MovableOnly
Dec 12 07:06:31 efuovs02 kernel: [1508192.885632] 118864 pages reserved
Dec 12 07:06:31 efuovs02 kernel: [1508192.885634] 0 pages cma reserved
Dec 12 07:06:31 efuovs02 kernel: [1508192.885636] 0 pages hwpoisoned
Dec 12 07:06:31 efuovs02 kernel: [1508192.885638] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Dec 12 07:06:31 efuovs02 kernel: [1508192.885650] [ 983] 0 983 2677 266 11 3 111 -1000 udevd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885657] [ 3783] 0 3783 125771 1414 48 5 0 -1000 multipathd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885662] [ 4334] 0 4334 6944 399 15 3 108 -1000 auditd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885667] [ 4368] 0 4368 61281 438 23 3 377 0 rsyslogd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885671] [ 4383] 0 4383 2832 352 11 3 164 0 irqbalance
Dec 12 07:06:31 efuovs02 kernel: [1508192.885675] [ 4412] 32 4412 4760 397 16 3 74 0 rpcbind
Dec 12 07:06:31 efuovs02 kernel: [1508192.885680] [ 4436] 29 4436 5853 354 17 3 112 0 rpc.statd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885684] [ 4481] 0 4481 5790 0 15 3 50 0 rpc.idmapd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885689] [ 4522] 0 4522 2106 268 12 5 28 0 fcoemon
Dec 12 07:06:31 efuovs02 kernel: [1508192.885694] [ 4537] 81 4537 5373 0 15 3 62 0 dbus-daemon
Dec 12 07:06:31 efuovs02 kernel: [1508192.885698] [ 4612] 0 4612 1030 310 9 5 41 0 o2hbmonitor
Dec 12 07:06:31 efuovs02 kernel: [1508192.885702] [ 4632] 0 4632 47286 410 51 3 221 0 cupsd
Dec 12 07:06:31 efuovs02 kernel: [1508192.885706] [ 4692] 0 4692 1039 323 9 5 30 0 acpid
Dec 12 07:06:31 efuovs02 kernel: [1508192.885711] [ 4718] 0 4718 1580 211 8 5 27 0 mcelog
Dec 12 07:06:31 efuovs02 kernel: [1508192.885715] [ 4738] 0 4738 16579 304 34 3 185 -1000 sshd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885720] [ 4751] 38 4751 6644 567 18 3 162 0 ntpd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885724] [ 4796] 0 4796 3235 331 15 6 143 0 xenstored
Dec 12 07:06:32 efuovs02 kernel: [1508192.885729] [ 4803] 0 4803 21126 307 21 6 69 0 xenconsoled
Dec 12 07:06:32 efuovs02 kernel: [1508192.885733] [ 4807] 0 4807 53362 393 65 3 525 0 qemu-system-i38
Dec 12 07:06:32 efuovs02 kernel: [1508192.885737] [ 4910] 0 4910 20252 478 45 3 239 0 master
Dec 12 07:06:32 efuovs02 kernel: [1508192.885742] [ 4922] 89 4922 20315 486 46 3 238 0 qmgr
Dec 12 07:06:32 efuovs02 kernel: [1508192.885746] [ 4930] 0 4930 29223 395 16 3 171 0 crond
Dec 12 07:06:32 efuovs02 kernel: [1508192.885750] [ 5036] 0 5036 5291 283 15 3 67 0 atd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885755] [ 5345] 0 5345 38468 249 14 5 30 0 osmdaemon
Dec 12 07:06:32 efuovs02 kernel: [1508192.885759] [ 5366] 0 5366 85597 650 61 7 1514 0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885764] [ 5378] 0 5378 24079 467 27 6 113 0 ovmport
Dec 12 07:06:32 efuovs02 kernel: [1508192.885768] [ 5390] 0 5390 60521 410 65 6 920 0 ovmwatch
Dec 12 07:06:32 efuovs02 kernel: [1508192.885772] [ 5405] 0 5405 208969 656 87 7 1558 0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885777] [ 5772] 0 5772 177327 1015 89 6 1775 0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885782] [ 5789] 0 5789 49154 741 71 7 1366 0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885786] [ 5831] 0 5831 82559 555 70 6 1491 0 devmon
Dec 12 07:06:32 efuovs02 kernel: [1508192.885790] [ 5901] 0 5901 1031 292 9 5 18 0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885794] [ 5903] 0 5903 1031 292 8 5 19 0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885798] [ 5905] 0 5905 1031 292 9 5 19 0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885802] [ 5907] 0 5907 1031 292 9 5 19 0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885806] [ 5909] 0 5909 1031 292 9 5 19 0 mingetty
Dec 12 07:06:32 efuovs02 kernel: [1508192.885812] [26455] 0 26455 11091 458 44 5 108 0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885816] [27087] 0 27087 11091 458 45 5 108 0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885820] [27845] 0 27845 11091 458 44 5 109 0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885825] [27996] 0 27996 11091 458 44 5 107 0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885829] [14189] 0 14189 11091 458 44 5 109 0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885833] [16371] 0 16371 11091 458 44 5 109 0 socat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885838] [14238] 0 14238 2676 256 11 3 129 -1000 udevd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885842] [14374] 0 14374 2676 240 11 3 119 -1000 udevd
Dec 12 07:06:32 efuovs02 kernel: [1508192.885846] [15869] 0 15869 22957 931 62 6 1730 0 python
Dec 12 07:06:32 efuovs02 kernel: [1508192.885851] [16935] 0 16935 28695 2029 16 5 64 0 OSWatcher
Dec 12 07:06:32 efuovs02 kernel: [1508192.885855] [ 5867] 89 5867 20272 1250 45 3 229 0 pickup
Dec 12 07:06:32 efuovs02 kernel: [1508192.885860] [ 8948] 0 8948 27070 682 17 5 71 0 vmsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885864] [ 8951] 0 8951 27070 675 17 5 77 0 mpsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885868] [ 8953] 0 8953 1581 328 11 5 42 0 vmstat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885872] [ 8958] 0 8958 27070 659 17 6 41 0 iosub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885876] [ 8959] 0 8959 25258 441 13 5 46 0 mpstat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885880] [ 8966] 0 8966 25261 420 11 5 19 0 iostat
Dec 12 07:06:32 efuovs02 kernel: [1508192.885884] [ 8971] 0 8971 27070 679 17 5 0 0 xtop
Dec 12 07:06:32 efuovs02 kernel: [1508192.885888] [ 8976] 0 8976 27070 695 17 5 0 0 psmemsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885892] [ 8977] 0 8977 3771 483 20 5 3 0 top
Dec 12 07:06:32 efuovs02 kernel: [1508192.885896] [ 8980] 0 8980 27070 680 17 5 0 0 oswsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885901] [ 8985] 0 8985 28695 1794 15 5 131 0 OSWatcher
Dec 12 07:06:32 efuovs02 kernel: [1508192.885905] [ 8986] 0 8986 27564 523 19 5 8 0 ps
Dec 12 07:06:32 efuovs02 kernel: [1508192.885909] [ 8987] 0 8987 27070 54 12 5 0 0 psmemsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885913] [ 8988] 0 8988 27070 52 11 5 0 0 oswsub
Dec 12 07:06:32 efuovs02 kernel: [1508192.885917] Out of memory: Kill process 5772 (python) score 0 or sacrifice child
Dec 12 07:06:32 efuovs02 kernel: [1508192.886216] Killed process 5772 (python) total-vm:709308kB, anon-rss:0kB, file-rss:4060kB
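A dump like this is long, so it helps to pull out only the lines that say when the OOM killer ran and which process it killed. The following is a rough sketch rather than a complete log parser: oom_invocations and oom_killed_processes are hypothetical helper names, and the patterns are taken from the message wording in the excerpt above, so they may need adjusting for other kernel versions.

```shell
#!/bin/sh
# Reads syslog text on stdin and prints how many times the OOM killer was invoked.
oom_invocations() {
    grep -c 'invoked oom-killer'
}

# Reads syslog text on stdin and prints "<pid> <name>" for every process
# reported by a "Killed process <pid> (<name>)" line.
oom_killed_processes() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Example use on a live host (commented out; needs read access to the log):
# oom_invocations < /var/log/messages
# oom_killed_processes < /var/log/messages
```

Run periodically, the first count gives a simple signal that a host is running out of memory again.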
How to fix the OVS Kernel Memory Leak
Download the following kernel packages, which include the memory leak fix for the i40e module (link to Oracle RPM repository):
kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm
kernel-uek-firmware-4.1.12-124.21.1.el6uek.noarch.rpm
[root@efuovs02 new_Kernel]# rpm -qp --changelog kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm | grep -B 3 28228724
warning: kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
* Tue Oct 30 2018 Brian Maly <brian.maly@oracle.com> [4.1.12-124.20.8.el6uek]
- scsi: lpfc: devloss timeout race condition caused null pointer reference (James Smart) [Orabug: 27994179]
- scsi: qla2xxx: Fix race condition between iocb timeout and initialisation (Ben Hutchings) [Orabug: 28013813]
- i40e: Add programming descriptors to cleaned_count (Alexander Duyck) [Orabug: 28228724]
- i40e: Fix memory leak related filter programming status (Alexander Duyck) [Orabug: 28228724]
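The changelog check can be wrapped so that only the entries tagged with a given Orabug number are printed, without the unrelated context lines that grep -B 3 also returns. This is a small sketch; orabug_entries is a hypothetical helper name, and it assumes the "[Orabug: NNNNNNNN]" tag format seen in the changelog excerpt.

```shell
#!/bin/sh
# Reads RPM changelog text on stdin and prints only the entries
# tagged with the given Orabug number.
orabug_entries() {
    grep -F "[Orabug: $1]"
}

# Example use against a downloaded package (commented out; needs rpm and the file):
# rpm -qp --changelog kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm | orabug_entries 28228724
```

An empty result for a candidate package would suggest that it does not carry the fix and should not be rolled out.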
Install the new OVS Kernel
The new kernel was installed on every OVS server in the farm using the steps below.
[root@efuovs02 new_Kernel]# rpm -ivh kernel*
warning: kernel-uek-4.1.12-124.21.1.el6uek.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing... ########################################### [100%]
1:kernel-uek-firmware ########################################### [ 50%]
2:kernel-uek ########################################### [100%]
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.1.12-124.21.1.el6uek.x86_64
Found linux image: /boot/vmlinuz-4.1.12-124.14.5.el6uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-124.14.5.el6uek.x86_64.img
done
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.1.12-124.21.1.el6uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-124.21.1.el6uek.x86_64.img
Found linux image: /boot/vmlinuz-4.1.12-124.14.5.el6uek.x86_64
Found initrd image: /boot/initramfs-4.1.12-124.14.5.el6uek.x86_64.img
done
[root@efuovs02 new_Kernel]# ....
[root@efuovs02 new_Kernel]# reboot
[root@efuovs02 ~]# uname -a
Linux efuovs02 4.1.12-124.21.1.el6uek.x86_64 #2 SMP Tue Nov 6 13:31:13 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
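When the same upgrade is repeated on many servers, the final uname check can be scripted. The sketch below compares a kernel release against the build under which the changelog lists the i40e fix (4.1.12-124.20.8.el6uek). Treating that build as the first fixed one is an assumption drawn from the changelog excerpt, release_has_fix is a hypothetical helper name, and the comparison relies on the version sort (-V) option of GNU sort.

```shell
#!/bin/sh
# Build under which the changelog lists the i40e fix (assumed first fixed build).
FIRST_FIXED="4.1.12-124.20.8.el6uek.x86_64"

# Succeeds when the given release sorts at or after FIRST_FIXED in version order.
release_has_fix() {
    [ "$(printf '%s\n%s\n' "$FIRST_FIXED" "$1" | sort -V | head -n 1)" = "$FIRST_FIXED" ]
}

# Example use after the reboot (commented out so the file can be sourced safely):
# if release_has_fix "$(uname -r)"; then
#     echo "running kernel includes the i40e fix"
# else
#     echo "running kernel predates the i40e fix"
# fi
```

Because the old kernel stays installed and remains in the grub menu, checking the running release after the reboot confirms that the host actually booted the new one.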