
Why does accessing memory cause numerous pgfaults? #300

Open
xfan1024 opened this issue Dec 21, 2024 · 1 comment

xfan1024 commented Dec 21, 2024

Description

I've noticed that on the SG2042, memory-intensive workloads running in user mode spend a significant amount of time in kernel mode. It seems that many page faults are being taken in the kernel.

Typically, page faults happen on the first access to a page, or when the system has swap enabled. However, even with swap disabled and after every page has been touched once, there are still numerous page faults. Are these page faults necessary? If not, can they be optimized?
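For reference, the first-touch behavior is easy to demonstrate: a freshly mapped anonymous page faults once on its first access and then stays resident. Below is a minimal sketch using Python's resource module (ru_minflt counts this process's minor faults); the 64 MiB size and the 4 KiB page size are my own assumptions, not taken from the test below. On a machine without this problem, the second pass should report few or no faults.

import mmap
import resource

SIZE = 64 * 1024 * 1024   # 64 MiB, arbitrary
PAGE = 4096               # assumes 4 KiB base pages

def minflt():
    # Minor (soft) page faults taken by this process so far
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

buf = mmap.mmap(-1, SIZE)  # anonymous mapping, no page touched yet

before = minflt()
for off in range(0, SIZE, PAGE):  # first touch: one write per page
    buf[off] = 1
after_first = minflt()
for off in range(0, SIZE, PAGE):  # second pass over the same pages
    buf[off] = 2
after_second = minflt()

print("first touch :", after_first - before)
print("second touch:", after_second - after_first)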

Steps to reproduce

pgfault.py

This script monitors the pgfault counter from /proc/vmstat.

import time

def read_pgfault():
    # "pgfault" in /proc/vmstat counts all page faults, minor and major
    with open("/proc/vmstat", "r") as f:
        for line in f:
            if line.startswith("pgfault"):
                return int(line.split()[1])
    return 0

def main():
    previous_pgfault = None
    while True:
        current_pgfault = read_pgfault()
        if previous_pgfault is not None:
            diff = current_pgfault - previous_pgfault
            print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} Current pgfault: {current_pgfault}, Diff: {diff}")
        previous_pgfault = current_pgfault
        time.sleep(1)

if __name__ == "__main__":
    main()
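A possible extension (my own addition, not part of the script above): also watch pgmajfault, which counts only major faults, to confirm that swap/storage is not involved, and the NUMA hint-fault counter if the kernel exposes it:

# Sketch: watch several /proc/vmstat counters, not just pgfault.
# "numa_hint_faults" only exists when the kernel has CONFIG_NUMA_BALANCING.
COUNTERS = ("pgfault", "pgmajfault", "numa_hint_faults")

def read_counters():
    values = {}
    with open("/proc/vmstat", "r") as f:
        for line in f:
            key, _, value = line.partition(" ")
            if key in COUNTERS:
                values[key] = int(value)
    return values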

memtest.c

This is the test program that accesses memory.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_THREADS 64
#define NUM_ELEMENTS ((size_t)((1ull * 1024 * 1024 * 1024) / sizeof(uint64_t) / NUM_THREADS))
#define NUM_ITERATIONS 128

struct thread_data
{
    uint64_t *data;
    size_t elements;
    size_t iterations;
};

void memtest(uint64_t *data, size_t elements)
{
    for (size_t i = 0; i < elements; i++)
        data[i] = (uint64_t)i;
}

void *thread_memtest(void *arg)
{
    struct thread_data *data = (struct thread_data *)arg;
    for (size_t i = 0; i < data->iterations; i++)
        memtest(data->data, data->elements);
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t threads[NUM_THREADS];
    struct thread_data thread_data[NUM_THREADS];
    
    for (size_t i = 0; i < NUM_THREADS; i++)
    {
        thread_data[i].data = (uint64_t *)malloc(NUM_ELEMENTS * sizeof(uint64_t));
        thread_data[i].elements = NUM_ELEMENTS;
        thread_data[i].iterations = NUM_ITERATIONS;
    }

    printf("press enter to warm up");
    getchar();
    for (size_t i = 0; i < NUM_THREADS; i++)
        memtest(thread_data[i].data, thread_data[i].elements);

    printf("press enter to start test");
    getchar();
    for (size_t i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, thread_memtest, &thread_data[i]);

    for (size_t i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
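To build and run (assuming gcc): gcc -O2 -pthread memtest.c -o memtest. Each thread owns 16 MiB (1 GiB in total across the 64 threads), so after the warm-up pass every page of the data arrays should already be resident.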

Test Results

Test on SG2042 (linux 6.6)

warm up stage

2024-12-21 12:25:30 Current pgfault: 747897, Diff: 0
2024-12-21 12:25:31 Current pgfault: 762836, Diff: 14939
2024-12-21 12:25:32 Current pgfault: 781113, Diff: 18277
2024-12-21 12:25:33 Current pgfault: 781113, Diff: 0

test stage

A large number of pgfaults occur here

2024-12-21 12:25:34 Current pgfault: 781113, Diff: 0
2024-12-21 12:25:35 Current pgfault: 781247, Diff: 134
2024-12-21 12:25:36 Current pgfault: 781247, Diff: 0
2024-12-21 12:25:37 Current pgfault: 781247, Diff: 0
2024-12-21 12:25:38 Current pgfault: 785357, Diff: 4110
2024-12-21 12:25:39 Current pgfault: 800029, Diff: 14672
2024-12-21 12:25:40 Current pgfault: 817000, Diff: 16971
2024-12-21 12:25:41 Current pgfault: 834280, Diff: 17280
2024-12-21 12:25:43 Current pgfault: 836192, Diff: 1912
2024-12-21 12:25:44 Current pgfault: 836320, Diff: 128
2024-12-21 12:25:45 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:46 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:47 Current pgfault: 836320, Diff: 0
2024-12-21 12:25:48 Current pgfault: 836362, Diff: 42
2024-12-21 12:25:49 Current pgfault: 836362, Diff: 0

Test on x86_64

Only the warm-up stage causes pgfaults; the test stage causes none.

warm up stage

2024-12-22 01:39:34 Current pgfault: 1235160, Diff: 0
2024-12-22 01:39:35 Current pgfault: 1268376, Diff: 33216
2024-12-22 01:39:36 Current pgfault: 1268376, Diff: 0

test stage

These 134 pgfaults are most likely caused by starting the threads (e.g., their stacks being mapped in), not by accessing the data array.

2024-12-22 01:39:38 Current pgfault: 1268376, Diff: 0
2024-12-22 01:39:39 Current pgfault: 1268510, Diff: 134
2024-12-22 01:39:40 Current pgfault: 1268510, Diff: 0
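One kernel mechanism that can re-fault memory that is already resident, even with swap disabled, is automatic NUMA balancing: the kernel periodically write-protects ranges of a task's address space so that the next access takes a minor "hint fault". Whether that explains the SG2042 numbers is only a guess on my part; a quick check of the sysctl knob:

# Guesswork diagnostic: automatic NUMA balancing causes periodic minor
# faults ("hint faults") on warm memory. A value of 0 means disabled.
def numa_balancing_enabled():
    try:
        with open("/proc/sys/kernel/numa_balancing") as f:
            return f.read().strip() != "0"
    except FileNotFoundError:
        # kernel built without CONFIG_NUMA_BALANCING
        return False

print("numa_balancing enabled:", numa_balancing_enabled())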

xfan1024 (Author) commented Dec 21, 2024

On the earlier linux-6.1.55, the throughput of concurrent memory access by 64 threads sometimes fell below 10 MB/s, and that is the combined speed of all threads, not the per-thread speed.
