
I have a DigitalOcean droplet running Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-89-generic x86_64). My website went down at approximately 6:48 AM GMT on Fri, Dec 01. In /var/log/apache2/access.log, I discovered that the last entries immediately before the website went down were:

45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/331/et-core-unified-tb-508-tb-516-deferred-331.min.css?ver=1701308913 HTTP/1.1" 200 7247 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/25/et-core-unified-25.min.css?ver=1701337684 HTTP/1.1" 200 5295 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/25/et-core-unified-tb-13-tb-74-deferred-25.min.css?ver=1701337684 HTTP/1.1" 200 6148 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/24899/et-core-unified-24899.min.css?ver=1701308552 HTTP/1.1" 200 5295 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/331/et-core-unified-331.min.css?ver=1701308912 HTTP/1.1" 200 5295 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/23/et-core-unified-tb-13-tb-74-deferred-23.min.css?ver=1701336351 HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"

I was expecting to find evidence of the cause in /var/log/apache2/error.log. To my surprise, this was all I found:

[Fri Dec 01 00:00:01.001724 2023] [mpm_prefork:notice] [pid 851] AH00163: Apache/2.4.52 (Ubuntu) OpenSSL/3.0.2 configured -- resuming normal operations
[Fri Dec 01 00:00:01.001885 2023] [core:notice] [pid 851] AH00094: Command line: '/usr/sbin/apache2'
[Fri Dec 01 12:02:26.984774 2023] [mpm_prefork:notice] [pid 853] AH00163: Apache/2.4.52 (Ubuntu) OpenSSL/3.0.2 configured -- resuming normal operations
[Fri Dec 01 12:02:27.071107 2023] [core:notice] [pid 853] AH00094: Command line: '/usr/sbin/apache2'

Nothing happened around 6:48 AM on Fri, Dec 01. My server uses GMT/UTC, as do all the logs I copied above.

I am wondering if the problem was not something that would be sent to the error logs at all: perhaps RAM and CPU were exhausted, and hitting their limits took the server down. In particular, I know RAM usage reached 100%:

[screenshot: RAM usage graph reaching 100%]

CPU, load, disk I/O, and bandwidth also showed unusual peaks at the same time (around 6:48 to 6:50 AM GMT/UTC on Fri, Dec 1), which is when the website went down:

[screenshots: CPU, load, disk I/O, and bandwidth graphs showing spikes at ~6:48 AM]

I know something happened on Fri, Dec 01 in the two minutes between 6:48 and 6:50 AM GMT/UTC, as evidenced by the server's resource consumption, but the Apache logs at /var/log/apache2/ are missing any information about that window. What I would like to do is figure out which processes were consuming the highest percentage of resources at 6:48 AM on Fri, Dec 01, since I know RAM reached 100% of its capacity. I would like something similar to what Windows offers with Ctrl + Alt + Del: in Task Manager, under the Processes tab, clicking the Memory column sorts processes by RAM consumption from highest to lowest. For example, on my computer I can tell that Google Chrome is consuming more RAM than any other single application:

[screenshot: Windows Task Manager, processes sorted by memory]
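For reference, this is a sketch of the live Linux counterpart of that Task Manager view (it only shows the present, not the past, which is the core of my problem):

```shell
# Processes sorted by resident memory (RSS), highest first -
# the Linux equivalent of Task Manager sorted by the Memory column.
ps aux --sort=-rss | head -15

# Or interactively: run `top` and press Shift+M to sort by memory.
```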

The problem, of course, is that I cannot go back in time and monitor per-process RAM consumption on the Ubuntu server to see what was happening between 6:48 and 6:50 AM GMT/UTC on Fri, Dec 1. So I need to recover that information from the logs, and /var/log/apache2/ does not seem to provide anything useful; in fact, it is missing any record of errors at that time. Is there a way to look back in time at RAM and/or CPU consumption by process on Ubuntu? Alternatively, how can I find in the logs what took my website down?
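One hedged starting point when a site dies with no Apache errors is to check the kernel log for OOM-killer activity, since memory exhaustion is reported by the kernel rather than by Apache. `dmesg` is cleared on reboot, so /var/log/syslog (and its rotated copies) is the durable source:

```shell
# Search the kernel ring buffer (current boot only) for OOM kills:
dmesg -T | grep -i "out of memory"

# Search syslog, which survives reboots:
grep -Ei "out of memory|oom-killer" /var/log/syslog

# Older incidents may be in rotated, compressed logs:
zgrep -Ei "out of memory|oom-killer" /var/log/syslog.*.gz 2>/dev/null
```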

UPDATE 1: By running grep "Dec  1 06:48" /var/log/syslog (note the two spaces before the single-digit day in syslog timestamps), I discovered what happened at 06:48 GMT/UTC on Dec 1:

Dec  1 06:48:11 jaimemontoya kernel: [67708.613307] [UFW BLOCK] IN=eth0 OUT= MAC=92:df:ec:15:65:70:fe:00:00:00:01:01:08:00 SRC=89.248.163.205 DST=143.198.36.10 LEN=40 TOS=0x00 PREC=0x00 TTL=239 ID=35961 PROTO=TCP SPT=59246 DPT=3644 WINDOW=1024 RES=0x00 SYN URGP=0
Dec  1 06:48:17 jaimemontoya kernel: [67713.912771] systemd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec  1 06:48:17 jaimemontoya kernel: [67713.912798] CPU: 0 PID: 1 Comm: systemd Not tainted 5.15.0-89-generic #99-Ubuntu
Dec  1 06:48:17 jaimemontoya kernel: [67713.912806] Hardware name: DigitalOcean Droplet/Droplet, BIOS 20171212 12/12/2017
Dec  1 06:48:17 jaimemontoya kernel: [67713.912818] Call Trace:
Dec  1 06:48:17 jaimemontoya kernel: [67713.912822]  <TASK>
Dec  1 06:48:17 jaimemontoya kernel: [67713.912876]  show_stack+0x52/0x5c
...........
Dec  1 06:48:17 jaimemontoya kernel: [67713.913069]  asm_exc_page_fault+0x27/0x30
Dec  1 06:48:17 jaimemontoya kernel: [67713.913075] RIP: 0033:0x7f2280df1040
Dec  1 06:48:17 jaimemontoya kernel: [67713.913091] Code: Unable to access opcode bytes at RIP 0x7f2280df1016.
Dec  1 06:48:17 jaimemontoya kernel: [67713.913092] RSP: 002b:00007ffe4d14fa18 EFLAGS: 00010202
Dec  1 06:48:17 jaimemontoya kernel: [67713.913097] RAX: 0000000000000000 RBX: 000055a43072c670 RCX: 000055a4308ed840
Dec  1 06:48:17 jaimemontoya kernel: [67713.913099] RDX: 0000000000000000 RSI: 000055a42ff1dfe1 RDI: 000055a43072c670
Dec  1 06:48:17 jaimemontoya kernel: [67713.913100] RBP: 0000000000000000 R08: 0000000000000000 R09: dd020cb76a33553c
Dec  1 06:48:17 jaimemontoya kernel: [67713.913101] R10: 83e2e07d8b22b2e2 R11: 00007ffe4d1ed090 R12: 000055a42ff1dfe1
Dec  1 06:48:17 jaimemontoya kernel: [67713.913102] R13: 0000000000000000 R14: 0000000000061824 R15: 0000000000000000
Dec  1 06:48:17 jaimemontoya kernel: [67713.913109]  </TASK>
Dec  1 06:48:17 jaimemontoya kernel: [67713.913113] Mem-Info:
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118] active_anon:19496 inactive_anon:176522 isolated_anon:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  active_file:25 inactive_file:294 isolated_file:33
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  unevictable:7678 dirty:0 writeback:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  slab_reclaimable:5762 slab_unreclaimable:11722
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  mapped:24147 shmem:24761 pagetables:3898 bounce:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  kernel_misc_reclaimable:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  free:12123 free_pcp:224 free_cma:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913123] Node 0 active_anon:77984kB inactive_anon:706088kB active_file:100kB inactive_file:1176kB unevictable:30712kB isolated(anon):0kB isolated(file):132kB mapped:96588kB dirty:0kB writeback:0kB shmem:99044kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3152kB pagetables:15592kB all_unreclaimable? no
Dec  1 06:48:17 jaimemontoya kernel: [67713.913133] Node 0 DMA free:4344kB min:732kB low:912kB high:1092kB reserved_highatomic:0KB active_anon:4kB inactive_anon:10456kB active_file:12kB inactive_file:32kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913144] lowmem_reserve[]: 0 903 903 903 903
Dec  1 06:48:17 jaimemontoya kernel: [67713.913147] Node 0 DMA32 free:44148kB min:44320kB low:55400kB high:66480kB reserved_highatomic:0KB active_anon:77980kB inactive_anon:695632kB active_file:88kB inactive_file:1148kB unevictable:30712kB writepending:0kB present:1032032kB managed:965124kB mlocked:27640kB bounce:0kB free_pcp:896kB local_pcp:896kB free_cma:0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913152] lowmem_reserve[]: 0 0 0 0 0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913155] Node 0 DMA: 14*4kB (M) 8*8kB (M) 4*16kB (UM) 8*32kB (U) 5*64kB (U) 4*128kB (UM) 2*256kB (U) 3*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 4344kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913166] Node 0 DMA32: 209*4kB (UE) 254*8kB (ME) 136*16kB (UME) 152*32kB (UME) 75*64kB (ME) 38*128kB (ME) 16*256kB (UME) 14*512kB (ME) 13*1024kB (UM) 0*2048kB 0*4096kB = 44148kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913186] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913191] 27388 total pagecache pages
Dec  1 06:48:17 jaimemontoya kernel: [67713.913192] 0 pages in swap cache
Dec  1 06:48:17 jaimemontoya kernel: [67713.913195] Swap cache stats: add 0, delete 0, find 0/0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913196] Free swap  = 0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913197] Total swap = 0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913197] 262006 pages RAM
Dec  1 06:48:17 jaimemontoya kernel: [67713.913198] 0 pages HighMem/MovableOnly
Dec  1 06:48:17 jaimemontoya kernel: [67713.913198] 16885 pages reserved
Dec  1 06:48:17 jaimemontoya kernel: [67713.913199] 0 pages hwpoisoned
Dec  1 06:48:17 jaimemontoya kernel: [67713.913200] Tasks state (memory values in pages):
Dec  1 06:48:17 jaimemontoya kernel: [67713.913200] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Dec  1 06:48:17 jaimemontoya kernel: [67713.913205] [    339]     0   339    34126      874   290816        0          -250 systemd-journal
...........
Dec  1 06:48:38 jaimemontoya kernel: [67735.153143] [  19981]     0 19981      411       26    32768        0             0 sshd
Dec  1 06:48:38 jaimemontoya kernel: [67735.153146] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/apache2.service,task=apache2,pid=14813,uid=1000
Dec  1 06:48:38 jaimemontoya kernel: [67735.153172] Out of memory: Killed process 14813 (apache2) total-vm:358052kB, anon-rss:38456kB, file-rss:3128kB, shmem-rss:76672kB, UID:1000 pgtables:440kB oom_score_adj:0
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Control process exited, code=killed, status=15/TERM
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 851 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 13197 (apache2) with signal SIGKILL.
...........
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 19950 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 19968 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Failed with result 'oom-kill'.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Unit process 13197 (apache2) remains running after unit stopped.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Unit process ...........
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Unit process 19968 (apache2) remains running after unit stopped.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Consumed 18min 58.770s CPU time.
Dec  1 06:48:39 jaimemontoya systemd[1]: Started MySQL Community Server.
Dec  1 06:48:58 jaimemontoya kernel: [67755.616034] [UFW BLOCK] IN=eth0 OUT= MAC=92:df:ec:15:65:70:fe:00:00:00:01:01:08:00 SRC=167.94.146.79 DST=143.198.36.10 LEN=44 TOS=0x00 PREC=0x00 TTL=40 ID=53055 PROTO=TCP SPT=63693 DPT=2762 WINDOW=1024 RES=0x00 SYN URGP=0
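Note that the OOM report itself already contains the historical per-process view: the "Tasks state" table dumped above lists every task's rss (resident memory, in 4 KiB pages) at the moment of the kill. A sketch for sorting that table by memory, assuming the standard "[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name" column layout and counting fields from the end of the line so the variable-width timestamp does not matter:

```shell
# Extract the per-task table from the OOM dump and sort by RSS.
# rss is the 5th field from the end; multiply by 4 to get KiB
# (pages are 4 KiB on x86_64).
grep 'kernel: \[[0-9.]*\] \[ *[0-9]*\]' /var/log/syslog \
  | awk '{ printf "%10d KiB  %s\n", $(NF-4) * 4, $NF }' \
  | sort -rn \
  | head -20
```

The top of that list shows which processes actually held the memory when the system ran out, which is often more informative than the single process the kernel chose to kill.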

UPDATE 2: After examining the logs from UPDATE 1, I can see the immediate cause:

Dec  1 06:48:38 jaimemontoya kernel: [67735.153172] Out of memory: Killed process 14813 (apache2) total-vm:358052kB, anon-rss:38456kB, file-rss:3128kB, shmem-rss:76672kB, UID:1000 pgtables:440kB oom_score_adj:0
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Control process exited, code=killed, status=15/TERM
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 851 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 13197 (apache2) with signal SIGKILL.

Now I need a way to determine which process to blame for this "Out of memory: Killed process 14813 (apache2)".
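For next time, one option (a sketch; package name and default paths are the Debian/Ubuntu ones, and the key bindings are from atop's documentation) is to run atop as a service. It periodically snapshots per-process CPU and RAM to daily files under /var/log/atop/, which can later be replayed like a Task Manager for the past:

```shell
# Install and enable atop's background recorder:
sudo apt-get install -y atop
sudo systemctl enable --now atop

# Later, replay the day of an incident starting just before it began
# (atop_YYYYMMDD is the default daily file name):
atop -r /var/log/atop/atop_20231201 -b 06:40
# Inside atop: Shift+M sorts processes by memory, 't' steps to the
# next recorded sample.
```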

Jaime Montoya is a new contributor to this site.
Comments:

  • If the out-of-memory (OOM) killer killed a process, you normally see a message in the kernel log (with dmesg, for example, although that gets flushed at reboot). – HBruijn, Dec 1 at 14:03
  • In general, you need to set up a sufficient level of auditing and logging before an incident occurs to have sufficient granularity and information after the incident. – HBruijn, Dec 1 at 14:11
  • The OS logs OOM events, IIRC for Ubuntu in /var/log/syslog. If the OOM killer triggers, it kills the program with the best score to free up sufficient memory, which is not always the culprit of the system running out of memory in the first place. – HBruijn, Dec 1 at 14:17
  • Unless you have swap enabled, running a system at 83% memory usage is a recipe for a crash. However, you do not want anything going to swap, because CPU usage can skyrocket, causing thrashing. At the same time, disk I/O dramatically increased and then your system collapsed. Your droplet has plenty of CPU but insufficient memory. You did not find the root cause; you found the logs of the system trying to save itself after the root cause. Save yourself a lot of wasted time and double the memory to keep usage under 50%. – Dec 3 at 7:23
  • @JohnHanley If you have the possibility of having swap space, it is to your advantage to survive rather than crash due to OOM. You do not have to use Red Hat to take advantage of their expertise and recommendations to survive in the real world. – Dec 4 at 4:24
