-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable dbus-broker reexecute #310
base: main
Are you sure you want to change the base?
Conversation
if (mem_fd < 0) | ||
return error_fold(mem_fd); | ||
|
||
(void) serialize_basic(f, "max_ids", "%d", broker->bus.peers.ids); |
Check failure
Code scanning / CodeQL
Wrong type of arguments to formatting function
|
||
/* Recover: sasl */ | ||
peer_recover_sasl(peer, sasl); | ||
if (r < 0) |
Check warning
Code scanning / CodeQL
Comparison result is always the same
Thank you for working on this! It is our most requested feature and will surely be greatly appreciated by many. Some questions before I look at the code in detail (I am sure I'll think of more things in the coming days):
|
Regarding the draining, it might be worth while to introduce a Freeze would finish any partial dispatch and stop further dispatching. We still need to serialize messages that are queued but not dispatched though. An open question is what to do about misbehaving peers who have stopped reading/worrying from/to their sockets. |
I didn't think about the version. Currently, I only serialize the peer's
These should be enough to recover a peer. But yes, we may have some big changes in the future version,
I don't close the peer's fd during reexecuting, and write all messages out before reexecuting. After reexecuting, I reepoll these fd and process it. I tested this by the following testcase:
No message was lost. Currently, this commit doesn't support create a new connection during reexecuting. |
This is really cool - one suggestion, maybe use systemd's FD store? That should allow to avoid needing to manually handle things, and a restart should "just" work? |
Thanks for your suggestion, I'll look into it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @teg already mentioned, we appreciate all efforts to implement re-execution of dbus-broker. I don't mind the approach taken by this PR, but I thinks it lacks crucial required elements.
In particular, I would suggest implementing re-execution of the launcher in a separate PR. This can be implemented without requiring dbus-broker to re-execute and can be merged separately.
For dbus-broker, I think this requires serializing the socket/connection object. This is a rather delicate matter but we should be able to verify the behavior in the unit-tests. I don't think your approach works, since it relies on flushing all queues and thus clients can block a re-execution by intentionally filling their receive-queue.
Anyway, thanks a lot for the PR! I think we can definitely take this as a base to move forward!
} while (!r); | ||
|
||
Peer *peeri; | ||
c_rbtree_for_each_entry(peeri, &broker->bus.peers.peer_tree, registry_node) { | ||
socket_dispatch_write(&peeri->connection.socket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not reliably flush the entire queue. There are several scenarios where this will not block and instead retain data in the user-space queue. In those cases, we would lose messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I toke it for granted that all messages we have gotten will be sent out by socket_dispatch_write
successfully. But if there are not many messages flying between dbus and peers, I think it's probobly fine.
By the way, do you think if we should support new client to establish connection during reexecuting? I haved checked systemd, systemctl start/stop
doesn't work when systemctl daemon-reexec
is running. Supporting that seems very complicated IMHO.
This add a new feature to dbus-broker. With this patch, we can reexecute dbus-broker by running
dbus-broker-launch --reexec
.The main idea behind this commit is:
Reexecute
message, it will serialize the main information (socket fd, id, uid, pid, match rule, request name, sasl) to the anonymous file,AddListener
to broker.AddListener
message, and it uses thesocket fd
saved in anonymous file to create new peers, and recover id/uid/pid/...Ready
signal, when thedbus-broker-launch --reexec
gets this signal, the reexecuting process is over.I'll split this commit to several small commits after your first review. Please take a look, thank a lot.