As Jan Braunwarth eloquently explains in his bachelor thesis, the OSv block cache is very inefficient:
"OSv also has a cache that should increase I/O performance, but it is very inefficient and, as you can see in Figure 4.8, does not lead to an increase but rather a dramatic drop. If you look at how the block cache works, it quickly becomes clear why this is. Each I/O is initially divided by the cache into 512 byte blocks. Then, when a read request is made, each block is checked to see whether it is already in the cache and, if so, copied directly from there to the target address. Since the RAM can answer the request much faster, this administrative effort is worth it. The problem is what happens when the block is not yet in the cache.
For example, if an application wants to read a 1 MiB file that is not yet in the cache, the request is divided into 2048 I/Os, each 512B in size. These 2048 requests are then all processed sequentially and also copied from the block cache to the target address. The measured IOPS are therefore significantly lower than the number of SQEs that were processed by the NVMe."
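In other words, the access pattern described above boils down to something like the following sketch. This is illustrative only, not OSv's actual code: the toy cache and the helper read_block_from_device are made up for the example.

```cpp
// Illustrative sketch of the pattern described above -- not OSv source.
// A 1 MiB read is split into 2048 sequential 512-byte lookups, each
// followed by a memcpy from the cache block to the caller's buffer.
#include <cstddef>
#include <cstring>
#include <unordered_map>
#include <vector>

constexpr std::size_t BLOCK_SIZE = 512;

// Toy stand-in for the block cache: block number -> 512-byte block.
std::unordered_map<std::size_t, std::vector<char>> cache;

// Stand-in for a single 512 B device read (the expensive part on a miss).
std::vector<char> read_block_from_device(std::size_t blkno)
{
    (void)blkno;
    return std::vector<char>(BLOCK_SIZE, 0);
}

void cached_read(char* dst, std::size_t offset, std::size_t len)
{
    for (std::size_t done = 0; done < len; done += BLOCK_SIZE) {  // 1 MiB -> 2048 iterations
        std::size_t blkno = (offset + done) / BLOCK_SIZE;
        auto it = cache.find(blkno);
        if (it == cache.end()) {                                  // miss: one tiny device I/O
            it = cache.emplace(blkno, read_block_from_device(blkno)).first;
        }
        std::memcpy(dst + done, it->second.data(), BLOCK_SIZE);   // copy cache -> target buffer
    }
}
```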
It must be noted, however, that most applications will not be affected by this inefficiency, as they go through the VFS layer. The filesystem drivers in OSv (ZFS, RoFS, and recently EXT4) bypass the block cache and call devops->strategy() directly.
To reproduce this problem, one can use the fio app set up to read from the disk directly (as this bypasses the file system):
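For illustration, an fio job file along these lines exercises that path; the device name /dev/vblk0 and the psync ioengine are assumptions and may need adjusting for a given setup:

```
# rawread.fio -- sequential 1 MiB reads straight from the block device,
# bypassing the filesystem (device name is an assumption, adjust as needed)
[global]
ioengine=psync
rw=read
bs=1M
size=1G

[rawread]
filename=/dev/vblk0
```

With a job like this, every 1 MiB read submitted by fio ends up split into 2048 sequential 512-byte requests in the block cache path described above.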
There are at least two options to fix this moderately important issue:
- improve the block cache so that it actually helps, and make the filesystem drivers that call devops->strategy() use it as well (more difficult)
- re-implement bread and bwrite with code similar to what the strategy functions do, like in this proposed patch (easy):
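The proposed patch itself is not reproduced here. Purely as an illustration, a bread replacement along those lines might look like the sketch below; it assumes OSv's bio helpers (alloc_bio, bio_wait, destroy_bio) and the devops->strategy() entry point, and the exact header, field, and helper names are taken from memory and may differ from the tree:

```cpp
// Hedged sketch only: header, field, and helper names follow the OSv bio
// interface as the issue describes it, but may not match the tree exactly.
#include <errno.h>
#include <osv/bio.h>     // assumed location of the bio declarations
#include <osv/device.h>  // assumed location of struct device / devops

// Read 'len' bytes at byte offset 'offset' from 'dev' into 'buf' by handing
// one large bio straight to the driver, the way the filesystem strategy
// paths do, instead of looping over 512-byte cache blocks.
static int direct_bread(struct device *dev, void *buf, size_t len, off_t offset)
{
    struct bio *bio = alloc_bio();
    if (!bio) {
        return ENOMEM;
    }
    bio->bio_cmd    = BIO_READ;
    bio->bio_dev    = dev;
    bio->bio_data   = buf;
    bio->bio_offset = offset;
    bio->bio_bcount = len;

    dev->driver->devops->strategy(bio);  // submit the whole request at once
    int error = bio_wait(bio);           // block until the driver completes it
    destroy_bio(bio);
    return error;
}
```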