While running incus-benchmark launch --count 200 --parallel 11 (with --parallel >= 11 on this 12-thread machine), some instances fail with these errors:
(sysadm_r)@pluto ~ # incus-benchmark launch --count 200 --parallel 11
Test environment:
Server backend: incus
Server version: 6.7
Kernel: Linux
Kernel architecture: x86_64
Kernel version: 6.6.62-gentoo-dist
Storage backend: zfs
Storage version: 2.2.5-r0-gentoo
Container backend: lxc | qemu
Container version: 6.0.2 | 8.2.3
Test variables:
Container count: 200
Container mode: unprivileged
Startup mode: normal startup
Image: images:ubuntu/22.04
Batches: 18
Batch size: 11
Remainder: 2
[Dec 13 09:37:00.962] Found image in local store: 3bf7665e786f9baf780a5774fae5b43f5a29c79d76af1474ff6e8f212fcfa25d
[Dec 13 09:37:00.962] Batch processing start
[Dec 13 09:37:58.761] Processed 11 containers in 57.798s (0.190/s)
[Dec 13 09:38:54.180] Processed 22 containers in 113.217s (0.194/s)
[Dec 13 09:40:51.015] Processed 44 containers in 230.053s (0.191/s)
[Dec 13 09:41:25.084] Failed to launch container 'benchmark-052': Failed creating instance from image: Failed to activate volume: Failed to locate zvol for "zdata/incus/containers/benchmark-052": context deadline exceeded
[Dec 13 09:41:25.099] Failed to launch container 'benchmark-055': Failed creating instance from image: Failed to activate volume: Failed to locate zvol for "zdata/incus/containers/benchmark-055": context deadline exceeded
[Dec 13 09:44:18.373] Processed 88 containers in 437.411s (0.201/s)
[Dec 13 09:44:53.504] Failed to launch container 'benchmark-093': Failed creating instance from image: Failed to activate volume: Failed to locate zvol for "zdata/incus/containers/benchmark-093": context deadline exceeded
[Dec 13 09:45:43.469] Failed to launch container 'benchmark-107': Failed creating instance from image: Failed to mount "/dev/zvol/zdata/incus/containers/benchmark-107" on "/var/lib/incus/storage-pools/default/containers/benchmark-107" using "ext4": no such file or directory
[Dec 13 09:51:44.892] Processed 176 containers in 883.929s (0.199/s)
[Dec 13 09:53:45.504] Batch processing completed in 1004.542s
(sysadm_r)@pluto ~ #
The ZFS storage itself is fine and healthy. The mentioned datasets are missing from the pool. If I increase --parallel further, I get more failing instances (both "context deadline exceeded" and "no such file or directory").
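One way to confirm which instances have no backing dataset is to compare the output of zfs list against the expected instance names. The sketch below is a hypothetical Python helper, not part of Incus or incus-benchmark. It assumes the dataset layout seen in the log (zdata/incus/containers/<instance>) and that you pass in the instance names you want to check; list_datasets and missing_datasets are illustrative names.

```python
import subprocess


def list_datasets(parent: str) -> str:
    """Return `zfs list` output for `parent` and its children (needs the zfs CLI)."""
    # -H: scripted mode (no header line); -o name: print only the name column;
    # -r: recurse into child datasets.
    return subprocess.run(
        ["zfs", "list", "-H", "-o", "name", "-r", parent],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def missing_datasets(zfs_list_output: str, parent: str, instances: list[str]) -> list[str]:
    """Return the instance names that have no dataset directly under `parent`."""
    present = {line.strip() for line in zfs_list_output.splitlines() if line.strip()}
    # Keep the caller's order so the result lines up with the benchmark log.
    return [name for name in instances if f"{parent}/{name}" not in present]
```

For example, missing_datasets(list_datasets("zdata/incus/containers"), "zdata/incus/containers", ["benchmark-052", "benchmark-055", "benchmark-093", "benchmark-107"]) should return whichever of those four names have no dataset in the pool.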
Steps to reproduce
Use a ZFS storage backend (here: 2x NVMe for the system and as ZFS special device, 2x spinning rust for data).
Run the above incus-benchmark command with --parallel matching (almost) the number of system CPU threads.
Information to attach
I also see the following errors, even for instances that did not fail in the end:
To me this looks like a load issue, with incusd perhaps not waiting long enough for devices to appear: it eventually gave up on 107 (which does not exist), but not, for example, on 086 (which exists and is running).
Here is some of the output of incus monitor --pretty, from another run.
benchmark-055 failed with "no such file or directory":
Is your system using udev? Do you also have /lib/udev/zvol_id or zvol_id available in the $PATH?
Normally Incus relies on udev to automatically create the needed /dev/zvol entries. When that doesn't happen, it can also use zvol_id as a fallback mechanism to try to figure out the right device.
There is a 30s timeout for that logic, at which point you get the "Failed to locate zvol for" error.
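The lookup described above can be modelled roughly as a polling loop with a deadline. This is a simplified, hypothetical Python sketch, not Incus's actual Go implementation: wait_for_zvol, its parameters, and the fallback callable (standing in for the zvol_id-based scan) are illustrative names. It checks for the udev-created /dev/zvol/<dataset> path, then asks the fallback, and gives up once the timeout expires.

```python
import os
import time
from typing import Callable, Optional


def wait_for_zvol(
    dataset: str,
    timeout: float = 30.0,
    poll_interval: float = 0.1,
    dev_root: str = "/dev/zvol",
    fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Poll until a block device for `dataset` exists, or raise TimeoutError.

    Hypothetical model: `fallback` stands in for a zvol_id-style scan and
    returns a device path (or None) for the given dataset.
    """
    deadline = time.monotonic() + timeout
    link = os.path.join(dev_root, dataset)
    while True:
        # Preferred path: udev has already created /dev/zvol/<pool>/<dataset>.
        if os.path.exists(link):
            return link
        # Fallback path: resolve the device some other way.
        if fallback is not None:
            found = fallback(dataset)
            if found is not None and os.path.exists(found):
                return found
        if time.monotonic() >= deadline:
            # Wording mirrors the error seen in the benchmark log.
            raise TimeoutError(f'Failed to locate zvol for "{dataset}": context deadline exceeded')
        time.sleep(poll_interval)
```

In this model, a longer timeout only helps if the device does eventually appear; if udev never creates the link and the fallback never finds the device, it just delays the same error.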
The other cases would generally imply that we did locate a /dev/zvol device but that it was somehow gone by the time we tried to mount it, or maybe was only partially set up by that point.
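One hypothetical way a caller could tolerate that race (device found, then gone or not yet usable at mount time) is to retry the mount a few times when it fails with ENOENT. This is only an illustration under that assumption, not what Incus does; mount_with_retry and the mount_fn callable are illustrative names, and mount_fn stands in for whatever actually performs the mount.

```python
import errno
import time
from typing import Callable


def mount_with_retry(
    mount_fn: Callable[[str, str], None],
    device: str,
    target: str,
    attempts: int = 5,
    delay: float = 0.5,
) -> int:
    """Call mount_fn(device, target), retrying only on ENOENT.

    Returns the number of attempts used; re-raises the last error otherwise.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            mount_fn(device, target)
            return attempt
        except OSError as exc:
            # Only "no such file or directory" is treated as transient; any other
            # error (bad filesystem, permissions, ...) is raised immediately.
            if exc.errno != errno.ENOENT or attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

Retrying only hides the symptom if the device really is still being set up; if the dataset was never created, every attempt fails and the original ENOENT is still reported.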
I'm running your example on a system also using ZFS block mode here but so far without any problem.
stgraber changed the title from "failing instances during incus-benchmark with ZFS storage backend" to "ZFS block mode devices sometimes don't appear in time (or disappear)" on Dec 13, 2024.
> Do you also have /lib/udev/zvol_id or zvol_id available in the $PATH?

Yes.

> I'm running your example on a system also using ZFS block mode here but so far without any problem.
How much system load do you get? I see very high I/O wait when it happens. Is there any way to raise this 30s timeout, just to test whether waiting longer would help here?
Issue description
Here is another one on the same setup as in #1483. After the run, indeed 4 instances were missing.
benchmark-082 failed with "context deadline exceeded" (incus monitor --pretty output from the other run):