-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt to new (v2) VFIO migration interface #741
Conversation
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Signed-off-by: William Henderson <[email protected]>
Hi William, thanks! Looks like a great start. As for the first change here, this looks good, but could you add some context to the commit message? In particular:
|
lib/libvfio-user.c
Outdated
ssize_t ret; | ||
|
||
if (req->flags & VFIO_DEVICE_FEATURE_PROBE) { | ||
msg->out.iov.iov_base = msg->in.iov.iov_base; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation doesn't make much sense to me: it has no effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We set the reply to equal the request, as this is what VFIO and hence vfio-user dictates for the success response - this has since been replaced by the following, as the aliasing caused a double free: 😳
msg->out.iov.iov_base = malloc(msg->in.iov.iov_len);
msg->out.iov.iov_len = msg->in.iov.iov_len;
memcpy(msg->out.iov.iov_base, msg->in.iov.iov_base,
msg->out.iov.iov_len);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such an API would not be useful. I think you're supposed to respond with the set of flags known to the server. (And, if get, or set, is set, whether the server supports that verb for the given flag(s)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to diverge from VFIO here?
From documentation of VFIO_DEVICE_FEATURE:
Get, set, or probe feature data of the device. The feature is selected using the FEATURE_MASK portion of the flags field. Support for a feature can be probed by setting both the FEATURE_MASK and PROBE bits. A probe may optionally include the GET and/or SET bits to determine read vs write access of the feature respectively. Probing a feature will return success if the feature is supported and all of the optionally indicated GET/SET methods are supported. The format of the data portion of the structure is specific to the given feature. The data portion is not required for probing. GET and SET are mutually exclusive, except for use with PROBE.
* ``VFIO_DEVICE_FEATURE_SET`` instructs the server to set the feature data to | ||
that given in the ``data`` field of the payload. | ||
|
||
* ``VFIO_DEVICE_FEATURE_PROBE`` instructs the server to probe for feature |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is probing for one or more of these three things:
- is the flag supported at all
- can I do a "get" for this flag
- can I do a "set" for this flag
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is if you don't set GET/SET, it is successful if the feature is supported at all, but if you set GET/SET it is only successful if whatever commands you probed for are also supported. How would you suggest I word this more clearly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a whole section on probe, saying something like:
Probing
If just VFIO_DEVICE_FEATURE_PROBE
is set, the server should return the corresponding flags that are supported.
If VFIO_DEVICE_FEATURE_PROBE
is set along with VFIO_DEVICE_FEATURE_GET
, VFIO_DEVICE_FEATURE_SET
, or both, the server should return whether the corresponding operation(s) are supported for the given flags.
indicated via the flags field; at least ``VFIO_MIGRATION_STOP_COPY`` must be | ||
set. | ||
|
||
There is no data field of the request message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit of the spec is pretty much just lifted from VFIO, I don't believe you can call SET on this feature...?
https://elixir.bootlin.com/linux/v6.3.8/source/include/uapi/linux/vfio.h#L814
| VFIO_DEVICE_STATE_PRE_COPY_P2P | 7 | (not used in vfio-user) | | ||
+--------------------------------+-------+---------------------------------------------------------------------+ | ||
|
||
* *data_fd* is unused in vfio-user, as the ``VFIO_USER_MIG_DATA_READ`` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this is always going to be completely meaningless, we should discuss whether we should diverge here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, the only reason why we could keep it is to slightly simplify QEMU implementation, though I'm not sure that's a strong reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we haven't discussed whether we're going to support memory mapped data transfers for migration (this FD is something we could use), @w-henderson have you given any thought to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought about this - I quite like the idea of following VFIO as closely as we can so my preference is probably to leave it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
support memory mapped data transfers for migration (this FD is something we could use)
What I said doens't make sense, the FD needs to be passed via ancillary data. I don't have a strong opinion about keeping it, perhaps qemu-devel may help us decide.
``data_fd`` file descriptor in ``<linux/vfio.h>`` | ||
(``struct vfio_device_feature_mig_state``) to enable all data transport to use | ||
the single already-established UNIX socket. Hence, the migration data is | ||
treated like a stream, so the client must continue reading until no more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this look in vfio? Is a stream "safe", or would we ever want to support sparse read/writes in some way?
I'm not clear how this would even work if we wanted to do device state dirty tracking in PRE_COPY state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in VFIO we never "seek", only read/write until no more reading/writing can be done. I'm also unsure about how we'd do device state dirty tracking in PRE_COPY state - I think the idea is we track the dirty pages in PRE_COPY so we can only send the necessary migration data once we get to STOP_COPY but I'm not too sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe in VFIO we never "seek", only read/write until no more reading/writing can be done.
That's correct.
I'm not sure I understand the rest of the discussion though, especially how dirty page tracking is related to device state tracking.
I have the following question (perhaps we're talking about the same thing?), regarding PRE_COPY, from VFIO:
* RUNNING -> PRE_COPY
* RUNNING_P2P -> PRE_COPY_P2P
* STOP -> STOP_COPY
* PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
* which share a data transfer session. Moving between these states alters
* what is streamed in session, but does not terminate or otherwise affect
* the associated fd.
*
* These arcs begin the process of saving the device state and will return a
* new data_fd. The migration driver may perform actions such as enabling
* dirty logging of device state when entering PRE_COPY or PER_COPY_P2P.
*
* Each arc does not change the device operation, the device remains
* RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
* in PRE_COPY_P2P -> STOP_COPY.
IIUC PRE_COPY returns an FD as this starts a new copy session, so in vfio-user this means that the client can start sending VFIO_USER_MIG_DATA_READ
messages until the server returns 0 length read (equivalnt to the data_fd
returning no more data in VFIO). While in PRE_COPY state, the device migth generate more device data, how does the VFIO client know this? Does reading from the data_fd
return data (it would previously return EOF).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VFIO defines PRE_COPY
to mean
The device is running normally but tracking internal state changes
which maybe implies that we don't actually transfer any data in that state, rather just do whatever we can in the background to prepare for transfer and track the changes to keep this up to date? I'm not sure if this makes sense though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These arcs begin the process of saving the device state and will return a new data_fd.
This suggests otherwise, but I agree it's not clear at all.
Each arc does not change the device operation, the device remains RUNNING, P2P quiesced or in STOP.
Can you comment on this? Does the device stay in RUNNING
state or does it transition to RUNNING | PRE_COPY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely doesn't stay in the exact same state though.
By state do you mean RUNNING
etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, RUNNING
and PRE_COPY
are distinct states that have the same behaviour from the client's perspective but do things differently behind the scenes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, the documentation says:
Each arc does not change the device operation, the device remains RUNNING, P2P quiesced or in STOP.
while you say:
It definitely doesn't stay in the exact same state though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused too, I had been thinking of them as distinct states but I guess it's supposed to be more like different behaviour in the same state? That sounds a bit dodgy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, and I was hoping you'd look at the kernel code to figure it out 😄.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've skimmed through it, need to sit down and compare with VFIO v2.
Could you include in the commit message from which commit/kernel version you're basing this? If master doesn't differ too much from latest stable version, then we could probably use the latter.
| VFIO_DEVICE_STATE_PRE_COPY_P2P | 7 | (not used in vfio-user) | | ||
+--------------------------------+-------+---------------------------------------------------------------------+ | ||
|
||
* *data_fd* is unused in vfio-user, as the ``VFIO_USER_MIG_DATA_READ`` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, the only reason why we could keep it is to slightly simplify QEMU implementation, though I'm not sure that's a strong reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this WIP, can you explain in the commit message how libvfio-user handles migration v2? E.g. it understands all migration-related operations but fails them, etc
type->subtype = VFIO_REGION_SUBTYPE_MIGRATION; | ||
vfio_reg->cap_offset = sizeof(struct vfio_region_info); | ||
} | ||
|
||
if (vfu_reg->mmap_areas != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're at it, can you add a separate commit where you invert this check so that identation is decreased?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also explain what this commit does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the client/server sample work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed from migration_read_data
onwards, can you explain in the commit message how the code changes and why? Also, some comments in the code would also help, I know it wasn't there in v1 but it should have been, let's improve with v2.
@@ -267,13 +267,13 @@ enum vfio_device_mig_state { | |||
// used for read request and write response | |||
struct vfio_user_mig_data_without_data { | |||
uint32_t argsz; | |||
uint32_t size; | |||
uint64_t size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm honest I can't remember why I changed this, should I change it back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it probably doesn't make sense to be 64-bit given that the entire message's length is 32-bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we should change it back to 32-bit
@w-henderson Is this PR up for review or WIP I see that the build is failing? Are you going to split this into small logical PRs? |
This PR will (when done):
Closes #654