Syscalls
By design, most applications running on OSv do not execute system calls when they call libc functions. For example, an invocation of mmap() is a direct local function call resolved by the OSv dynamic linker; it involves very few instructions and is therefore very fast. On Linux, the same call is considerably more expensive because it goes through a wrapper function in glibc, which then invokes the system call SYS_mmap, and that involves a CPU ring switch and a virtual address space switch, among other things. This OSv optimization may not be as relevant as one would hope, especially when applications make few mmap() calls, as is often the case, but that is a topic for another story.
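To make the two paths concrete, here is a minimal C sketch for a Linux-compatible environment on x86_64 or aarch64. The helper names map_via_libc and map_via_syscall are hypothetical and exist only for this illustration. The first goes through the libc function, which OSv resolves to a local call; the second asks for the system call explicitly.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Path 1: the libc function. On OSv this is a plain local function call. */
static void *map_via_libc(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Path 2: the system call itself, requested through syscall().
   Arguments are widened to long because syscall() is variadic. */
static void *map_via_syscall(size_t len)
{
    long ret = syscall(SYS_mmap, NULL, len,
                       (long)(PROT_READ | PROT_WRITE),
                       (long)(MAP_PRIVATE | MAP_ANONYMOUS), -1L, 0L);
    return (void *)ret;   /* -1 on failure, which equals MAP_FAILED */
}
```

Both helpers return a usable anonymous mapping; only the route taken to reach the mmap implementation differs.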
Some applications, such as Golang programs or statically linked executables (see this for more details), bypass the libc layer and invoke system calls directly using the SYSCALL (x86_64) or SVC (aarch64) instruction. To support those, OSv implements the system call handler machinery in assembly for both x86_64 and aarch64.
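As an illustration of what "invoking a system call directly" means, the following sketch issues getpid with the raw instruction and no libc wrapper. It assumes GCC or Clang inline assembly and the Linux calling convention for system calls; raw_getpid and raw_matches_libc are hypothetical names, not part of OSv.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), used only for comparison */

/* Issue getpid with the raw instruction, bypassing libc entirely. */
static long raw_getpid(void)
{
#if defined(__x86_64__)
    long ret;
    /* The number goes in rax; SYSCALL clobbers rcx and r11. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_getpid)
                     : "rcx", "r11", "memory");
    return ret;
#elif defined(__aarch64__)
    /* The number goes in x8; the result comes back in x0. */
    register long x8 __asm__("x8") = SYS_getpid;
    register long x0 __asm__("x0");
    __asm__ volatile("svc #0" : "=r"(x0) : "r"(x8) : "memory");
    return x0;
#else
#error "this sketch covers x86_64 and aarch64 only"
#endif
}

/* Returns 1 when the raw path and the libc path agree. */
static int raw_matches_libc(void)
{
    return raw_getpid() == (long)getpid();
}
```

Code of this kind is what the OSv handler has to catch, since there is no libc symbol for the dynamic linker to resolve.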
Unlike Linux, where libc functions like mmap() delegate to the corresponding system calls (SYS_mmap in the example above), in OSv the opposite happens: the system call handler delegates to the libc function. Just like Linux, OSv implements handlers for the SYSCALL and SVC instructions on x86_64 and aarch64 respectively (see syscall_entry in arch/x64/entry.S and handle_system_call in arch/aarch64/entry.S). This tricky low-level assembly code (~50 instructions) switches to a dedicated system call stack, saves all necessary registers, and delegates to syscall_wrapper() and eventually to syscall(), both implemented in C++ in linux.cc. Finally, syscall() has a switch statement whose cases call the relevant libc function implemented by OSv.
Sometimes there is no libc function that syscall() can invoke directly with a simple preprocessor statement like the one below:
SYSCALL2(listen, int, int);
In those cases, such as futex() or getdents64(), we implement the relevant function ourselves and delegate to it like so:
int futex(int *uaddr, int op, int val, const struct timespec *timeout,
          int *uaddr2, uint32_t val3)
{
    switch (op & FUTEX_CMD_MASK) {
    ...
    }
}
SYSCALL6(futex, int *, int, int, const struct timespec *, int *, uint32_t);
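The following self-contained sketch shows one way macros of this style could expand into the cases of a variadic dispatcher. It is a simplified model and not the code in linux.cc: the DEMO_ names, the numbers, and the functions add2 and negate are all hypothetical.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical system call numbers, for illustration only. */
enum { DEMO_NR_add2 = 1, DEMO_NR_negate = 2 };

/* Stand-ins for the libc functions a real dispatcher would call. */
static long add2(int a, int b) { return a + b; }
static long negate(int a)      { return -a; }

/* Each macro expands to one case label that unpacks the variadic
   arguments and calls the function with the same name. */
#define DEMO_SYSCALL1(fn, t1)                 \
    case DEMO_NR_##fn: do {                   \
        va_list args;                         \
        va_start(args, number);               \
        t1 a1 = va_arg(args, t1);             \
        va_end(args);                         \
        return fn(a1);                        \
    } while (0)

#define DEMO_SYSCALL2(fn, t1, t2)             \
    case DEMO_NR_##fn: do {                   \
        va_list args;                         \
        va_start(args, number);               \
        t1 a1 = va_arg(args, t1);             \
        t2 a2 = va_arg(args, t2);             \
        va_end(args);                         \
        return fn(a1, a2);                    \
    } while (0)

static long demo_syscall(long number, ...)
{
    switch (number) {
    DEMO_SYSCALL2(add2, int, int);
    DEMO_SYSCALL1(negate, int);
    }
    /* No case matched: report it and fail instead of crashing. */
    fprintf(stderr, "demo_syscall(): unimplemented system call %ld\n", number);
    errno = ENOSYS;
    return -1;
}
```

With this shape, supporting another call costs one line in the switch, provided a function with a matching name and signature already exists.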
As one can tell, system call invocations in OSv are slower and more expensive than regular direct local function calls, but they should still be faster than on Linux because OSv has to switch neither the virtual memory mapping nor the CPU ring.
Please note that when OSv does not implement (that is, expose) a specific system call by delegating to the relevant libc function, the syscall() function logs that fact and returns ENOSYS instead of crashing:
syscall(): unimplemented system call <nnn>
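Because the failure is reported rather than fatal, an application can check for ENOSYS and take another route. The sketch below is one hedged example of that pattern on a Linux-compatible system. fill_random is a hypothetical helper, and getrandom is only a stand-in for whichever call might be missing (OSv itself already exposes it).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf with len random bytes (keep len small, e.g. <= 256).
   Returns 0 on success and -1 on failure. */
static int fill_random(unsigned char *buf, size_t len)
{
    /* First try the system call directly. */
    long n = syscall(SYS_getrandom, buf, len, 0L);
    if (n == (long)len)
        return 0;
    if (n >= 0 || errno != ENOSYS)
        return -1;            /* short result or a genuine error */

    /* The call is not implemented: fall back to the device file. */
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, buf, len);
    close(fd);
    return got == (ssize_t)len ? 0 : -1;
}
```

The caller sees the same result on either path, so a missing system call degrades to a slower route instead of an abort.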
Here is a list of 73 system calls OSv exposes as of April 2023:
accept4
bind
clock_getres
clock_gettime
close
connect
dup3
epoll_create1
epoll_ctl
epoll_pwait
epoll_wait
eventfd2
exit
exit_group
fcntl
fdatasync
flock
fstat
fstatat
fsync
ftruncate
futex
getcwd
getdents64
getgid
get_mempolicy
getpeername
getpid
getrandom
getsockname
getsockopt
gettid
getuid
ioctl
listen
lseek
madvise
mincore
mkdir
mkdirat
mmap
munmap
nanosleep
open
openat
pipe2
pread64
pselect6
pwrite64
read
readlinkat
recvfrom
recvmsg
renameat
rt_sigaction
rt_sigprocmask
sched_getaffinity
sched_setaffinity
sched_yield
select
sendmsg
sendto
set_mempolicy
setsockopt
sigaltstack
socket
stat
statfs
symlinkat
tgkill
uname
unlinkat
write
Most of the time, implementing (or exposing) a new system call is trivial as long as there is a corresponding libc function in OSv (see the introduction section). Here is a list of 60 system calls that should be trivial to expose (minus those that do not map exactly one-to-one to the libc functions):
accept
access
alarm
chdir
creat
dup
dup2
epoll_create
eventfd
fallocate
faccessat
fchdir
fstatfs
futimesat
getitimer
getpriority
getrlimit
getrusage
gettimeofday
kill
lstat
mprotect
msync
pause
pipe
poll
ppoll
prctl
readlink
readv
rename
rmdir
sched_get_priority_max
sched_get_priority_min
sendfile
sethostname
setitimer
setpriority
setrlimit
shmget
shmat
shmctl
shmdt
shutdown
socketpair
symlink
sync
sysinfo
time
timerfd_create
timerfd_gettime
timerfd_settime
times
truncate
umask
unlink
utime
utimensat
utimes
writev