We're running a reverse-proxy application in Docker that is very network-intensive. dockerd occasionally allocates memory using MADV_HUGEPAGE, and we've observed that this sometimes stalls the application: dockerd pauses emptying the container log buffers, and writes to them block. These stalls are long enough to cause a significant amount of dropped traffic.

Checking /proc/buddyinfo confirmed that these issues occur when there are no free high-order pages available, usually on busy hosts with high uptime.
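
For anyone wanting to reproduce the check, here is a minimal sketch of how the free high-order blocks can be summed from buddyinfo. It assumes the usual layout (`Node N, zone NAME` followed by one count per order, starting at order 0) and that order 9 corresponds to a 2 MiB THP, which holds for 4 KiB base pages on x86_64. The function name `high_order_free` is purely illustrative.

```shell
# Print, per zone, how many free blocks exist at or above a given order.
# Assumed layout: "Node N, zone NAME c0 c1 ... cMAX" (one count per order).
high_order_free() {
    min_order="${1:-9}"             # order 9 = 2 MiB with 4 KiB pages (assumption)
    file="${2:-/proc/buddyinfo}"    # overridable so a saved snapshot can be inspected
    awk -v min="$min_order" '
        $1 == "Node" && $3 == "zone" {
            total = 0
            # Field 5 holds the order-0 count, so order n lives in field 5 + n.
            for (i = 5 + min; i <= NF; i++) total += $i
            printf "node %d zone %s free_blocks_order>=%d: %d\n", $2 + 0, $4, min, total
        }' "$file"
}
```

Running `high_order_free 9` on an affected host should print 0 for the zones in question if the hypothesis above is right.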

We've switched the defrag setting from [madvise] to [defer] in an attempt to limit the impact of compaction stalls, whilst retaining the benefit of transparent hugepages rather than disabling them entirely. We haven't yet had a recurrence of our issue, but it's early days.

These settings are set during boot by a systemd unit we added.

[foobar.local:/root]:mgmt:$ cat /sys/kernel/mm/transparent_hugepage/defrag
always [defer] defer+madvise madvise never

[foobar.local:/root]:mgmt:$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

[foobar.local:/root]:mgmt:$ cat /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
1
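
As a rough sketch, a boot-time step could apply the three values shown above along these lines. This is not the actual unit: the function name `apply_thp_settings` and the optional directory argument are hypothetical, and writing to the real sysfs paths requires root.

```shell
# Write the THP settings shown above into the transparent_hugepage sysfs directory.
# The directory argument is a hypothetical convenience for dry runs against a scratch dir.
apply_thp_settings() {
    thp_dir="${1:-/sys/kernel/mm/transparent_hugepage}"
    echo madvise > "$thp_dir/enabled" &&           # THP only for MADV_HUGEPAGE regions
    echo defer   > "$thp_dir/defrag" &&            # defrag mode selected above
    echo 1       > "$thp_dir/khugepaged/defrag"    # let khugepaged defragment when collapsing
}
```

A oneshot systemd unit could then call a script containing `apply_thp_settings` with no argument.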

In plain English, I understand this to mean: only allocate a THP for MADV_HUGEPAGE regions; if one is not available, fall back and wake kswapd/kcompactd. khugepaged will attempt to collapse the pages into a THP at a later point.

[foobar.local:/root]:mgmt:$ egrep 'thp|compact' /proc/vmstat
compact_migrate_scanned 7153737
compact_free_scanned 994921216
compact_isolated 7446472
compact_stall 510
compact_fail 336
compact_success 174
compact_daemon_wake 508
compact_daemon_migrate_scanned 845040
compact_daemon_free_scanned 146659185
thp_fault_alloc 1260
thp_fault_fallback 313
thp_collapse_alloc 1118
thp_collapse_alloc_failed 1703
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 152
thp_split_page_failed 0
thp_deferred_split_page 2102
thp_split_pmd 719
thp_split_pud 0
thp_zero_page_alloc 4
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0

Given the above settings, I'm surprised to see the compact_stall counter still increasing. Why is compact_stall still going up? Should we be concerned, or is it only a side effect of the deferred allocation? thp_fault_fallback is also increasing, so some deferral appears to be happening.
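
To make "still increasing" measurable, one option is to diff two snapshots of /proc/vmstat. The following is a minimal sketch; the function name `vmstat_delta` and the snapshot file names are illustrative, and it assumes the first snapshot is non-empty.

```shell
# Print every compaction/THP counter that changed between two /proc/vmstat snapshots.
# Usage (illustrative):
#   cat /proc/vmstat > before.txt; sleep 60; cat /proc/vmstat > after.txt
#   vmstat_delta before.txt after.txt
vmstat_delta() {
    awk '
        $1 ~ /^(compact_|thp_)/ {
            # First file: remember the starting value of each counter.
            if (NR == FNR) { before[$1] = $2; next }
            # Second file: report only counters that moved.
            d = $2 - before[$1]
            if (d != 0) print $1, d
        }' "$1" "$2"
}
```

Comparing the deltas of compact_stall, compact_daemon_wake and thp_fault_fallback over the same interval shows whether the stalls track the deferred fallbacks or rise independently of them.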
