What can be done to protect against data corruption under low free/available memory conditions? #1964
-
While investigating #1963 and trying to replicate the triggering conditions, it became evident (somewhat expected and understandable) that, under memory pressure, some applications experience data corruption because nanos crashes. Assuming that the assertions in nanos are not recoverable (meaning that when those checks fail, things are already screwed and there is no graceful way to handle the failure or report it to the application), is there anything we can do in nanos to prevent or alleviate these scenarios? I would prefer nanos to initiate or signal a controlled shutdown instead of a kernel panic. While I think such applications should be able to detect and react to these scenarios on their own (i.e. notice that they have insufficient resources for the work they are about to do, and shut down cleanly), in reality that is unfortunately not always the case. The following are some real failures, experienced while testing, that most of the time caused fatal data corruption.

frame trace:
ffffc0001fe77f10: ffffffff800c4ab2 (poll_internal.constprop.0 + 0000000000000282/0000000000000d4a)
ffffc0001fe77fb0: ffffffff800e1a23 (syscall_handler + 00000000000002f3/0000000000000636)
loaded klibs:
assertion w->retval++ < (w->poll_fds->length / sizeof(struct pollfd)) failed at /nanos/src/unix/poll.c:939 (IP 0xffffffff800c0c63) in poll_notify(); halt

frame trace:
ffffc0003987fe10: ffffffff800c277b (wait_notify + 00000000000001cb/00000000000003c8)
ffffc0003987fe50: ffffffff800bf83f (notify_dispatch_with_arg + 000000000000009f/00000000000001a5)
ffffc0003987fec0: ffffffff800a922d (efd_write_bh + 00000000000001cd/00000000000001f2)
ffffc0003987ff20: ffffffff800a3113 (blockq_check_timeout + 0000000000000063/0000000000000289)
ffffc0003987ff80: ffffffff800d8d76 (write + 0000000000000166/0000000000000247)
ffffc0003987ffb0: ffffffff800e1a23 (syscall_handler + 00000000000002f3/0000000000000636)
loaded klibs:
assertion w->retval++ < (w->poll_fds->length / sizeof(struct pollfd)) failed at /nanos/src/unix/poll.c:939 (IP 0xffffffff800c0c63) in poll_notify(); halt

frame trace:
ffffc000142bfda0: ffffffff8004e8bb (pagecache_node_fetch_internal + 00000000000001fb/0000000000000b18)
ffffc000142bff20: ffffffff800d3556 (file_read + 0000000000000126/000000000000031c)
ffffc000142bff80: ffffffff800d925d (pread + 000000000000016d/0000000000000247)
ffffc000142bffb0: ffffffff800e1a23 (syscall_handler + 00000000000002f3/0000000000000636)
ffffc000142bfff0: 271c466de8251a0f
00000033f5eb6d60: 000000000290a0e8
0000000001231300: 44c7491977000003
loaded klibs:
assertion pp->refcount++ == 0 failed at /nanos/src/kernel/pagecache.c:249 (IP 0xffffffff8004abdf) in realloc_pagelocked(); halt

frame trace:
ffffc0000b33f370: ffffffff800853c1 (encode_value_internal + 00000000000002e1/0000000000000b58)
ffffc0000b33f3f0: ffffffff80087768 (encode_tuple_each + 0000000000000078/00000000000000cc)
ffffc0000b33f420: ffffffff800846a5 (iterate + 0000000000000075/0000000000000182)
ffffc0000b33f450: ffffffff80084a93 (encode_tuple_internal + 00000000000002d3/000000000000049f)
ffffc0000b33f560: ffffffff80087768 (encode_tuple_each + 0000000000000078/00000000000000cc)
ffffc0000b33f590: ffffffff800846a5 (iterate + 0000000000000075/0000000000000182)
ffffc0000b33f5c0: ffffffff80084a93 (encode_tuple_internal + 00000000000002d3/000000000000049f)
ffffc0000b33f6d0: ffffffff80087768 (encode_tuple_each + 0000000000000078/00000000000000cc)
ffffc0000b33f700: ffffffff800846a5 (iterate + 0000000000000075/0000000000000182)
ffffc0000b33f730: ffffffff80084a93 (encode_tuple_internal + 00000000000002d3/000000000000049f)
ffffc0000b33f840: ffffffff80087768 (encode_tuple_each + 0000000000000078/00000000000000cc)
ffffc0000b33f870: ffffffff800846a5 (iterate + 0000000000000075/0000000000000182)
ffffc0000b33f8a0: ffffffff80084a93 (encode_tuple_internal + 00000000000002d3/000000000000049f)
ffffc0000b33f9b0: ffffffff80087768 (encode_tuple_each + 0000000000000078/00000000000000cc)
ffffc0000b33f9e0: ffffffff800846a5 (iterate + 0000000000000075/0000000000000182)
ffffc0000b33fa10: ffffffff80084a93 (encode_tuple_internal + 00000000000002d3/000000000000049f)
ffffc0000b33fb20: ffffffff80087768 (encode_tuple_each + 0000000000000078/00000000000000cc)
ffffc0000b33fb50: ffffffff800846a5 (iterate + 0000000000000075/0000000000000182)
ffffc0000b33fb80: ffffffff80084a93 (encode_tuple_internal + 00000000000002d3/000000000000049f)
ffffc0000b33fc90: ffffffff80087768 (encode_tuple_each + 0000000000000078/00000000000000cc)
ffffc0000b33fcc0: ffffffff800846a5 (iterate + 0000000000000075/0000000000000182)
ffffc0000b33fcf0: ffffffff80084a93 (encode_tuple_internal + 00000000000002d3/000000000000049f)
ffffc0000b33fe00: ffffffff80087ca4 (encode_tuple + 0000000000000054/000000000000009e)
ffffc0000b33fe40: ffffffff800a027e (log_write + 000000000000004e/000000000000017e)
ffffc0000b33fe90: ffffffff80099841 (filesystem_log_rebuild + 0000000000000031/00000000000000b4)
ffffc0000b33fed0: ffffffff800a03de (log_flush_timer_expired + 000000000000002e/000000000000007e)
ffffc0000b33fef0: ffffffff80082b0a (timer_service + 00000000000000ea/0000000000000372)
ffffc0000b33ff40: ffffffff80054774 (runloop_internal + 0000000000000164/0000000000000b42)
ffffc0000b33ffc0: ffffffff80041a3f (context_switch_finish + 000000000000006f/00000000000001c9)
ffffc0000b33fff0: 22065b6b80600255
ffffc00000a4ffe8: ffffffff800011a0
0000004202b91d48: 000000420a36cf5b
loaded klibs:
assertion buffer_extend(b, words + 1) failed at /nanos/src/runtime/tuple.c:291 (IP 0xffffffff800839f0) in push_header(); halt
mutex_lock_internal: lock already held - cpu 0, mutex 0xffffc00000808880, ctx 0xffffc0000b338000, ra 0xffffffff8004691e
-
I see 2 types of assert failures being mentioned in this discussion:

1. `w->retval++ < (w->poll_fds->length / sizeof(struct pollfd))` and `pp->refcount++ == 0`, which indicate that there is either a flaw in the logic that manages the data structures involved, or some kind of corruption of kernel memory caused by some other piece of code (unidentified, and possibly unrelated to where the assert failure occurs). To solve these, we need to either identify the root cause by reading the code (which I haven't been able to do so far), or find a way to reliably reproduce the issues.
2. `buffer_extend(b, words + 1)`; this is …

General-purpose OSes like Linux have an out-of-memory (OOM) killer that, when memory usage reaches a critical state, selects one or more user processes and kills them (with SIGKILL) to reclaim the memory they occupy. When this happens, the affected processes are terminated immediately; in other words, from the point of view of the process being killed, it is not a controlled shutdown.