Draft: Supervisor as an L1 Consistency Source #136
Conversation
```mermaid
flowchart TD
    L1([L1]) -->|Batches| Supervisor
    Supervisor -->|L1 Content| ChainA[Chain A Node]
    Supervisor -->|L1 Content| ChainB[Chain B Node]
    ChainA -->|Log Events| Supervisor
    ChainB -->|Log Events| Supervisor
```
To note: L1 data does not drive new Log Entries into the Supervisor the way this diagram suggests. We take new log entries during Unsafe updates, and then simply advance pointers during Safe updates.
However, maybe it should? We could hold two separate databases, one for Unsafe events and one for Safe events. The Unsafe DB could be trimmed aggressively, and the Safe DB would be the archival version. It would maybe make reorg handling nicer, because we'd know exactly what to roll back or purge. It is twice as much RPC communication between nodes and supervisor, though, so in some ways it seems needless.
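A minimal Go sketch of that two-database idea, purely for illustration: the types and method names below are hypothetical and not the real op-supervisor API. It shows unsafe entries being recorded eagerly, promoted to the archival Safe DB on Safe updates, and rewound independently on an L2 reorg.

```go
// Hypothetical sketch of separate Unsafe and Safe log databases, so reorg
// handling knows exactly which entries to roll back or purge.
// All names are illustrative, not the real op-supervisor API.
package logdb

type LogEntry struct {
	ChainID     uint64
	BlockNumber uint64
	LogIndex    uint32
	LogHash     [32]byte
}

// DualDB holds unsafe (recent, aggressively trimmable) and safe (archival) entries.
type DualDB struct {
	unsafe []LogEntry // written on every Unsafe update
	safe   []LogEntry // promoted only once the data is derived from L1
}

// AddUnsafe records log entries as soon as a node reports an unsafe block.
func (d *DualDB) AddUnsafe(entries ...LogEntry) {
	d.unsafe = append(d.unsafe, entries...)
}

// PromoteSafe moves unsafe entries at or below blockNum into the safe DB,
// mirroring "advancing pointers" on a Safe update.
func (d *DualDB) PromoteSafe(chainID, blockNum uint64) {
	var remaining []LogEntry
	for _, e := range d.unsafe {
		if e.ChainID == chainID && e.BlockNumber <= blockNum {
			d.safe = append(d.safe, e)
		} else {
			remaining = append(remaining, e)
		}
	}
	d.unsafe = remaining
}

// RewindUnsafe drops unsafe entries above blockNum after an L2 reorg; the safe
// DB is untouched, which is what makes rollbacks easy to reason about.
func (d *DualDB) RewindUnsafe(chainID, blockNum uint64) {
	var kept []LogEntry
	for _, e := range d.unsafe {
		if e.ChainID == chainID && e.BlockNumber > blockNum {
			continue
		}
		kept = append(kept, e)
	}
	d.unsafe = kept
}
```

Whether the doubled RPC traffic is worth this simpler rewind story is exactly the trade-off raised above.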
How does the supervisor handle an op-node that drops out of sync (e.g. its execution client DB is wiped so it starts from genesis again)? If the design depended on feeding L1 data consistently across all op-node instances, that node would now be out of sync.

The talk of a "super node", and approaches where op-node becomes more of a slave to op-supervisor, makes me wonder if we wouldn't be better off adding multi-tenancy to op-node so it can be the consensus client for multiple chains in a single instance (and just ditch op-supervisor). The execution engines would still be separate, so they could still have the issue of suddenly being wiped (or a new node being added to the dependency set), but maybe you could just leverage execution-layer syncing to let them catch back up?

Most of the work in op-node is scanning through L1, so it would be a lot more lightweight if we did that once and extracted the data for all chains at once, and you'd know that they all have a consistent view of the L1 chain. With a direct connection to the execution client, you could query it for the existence of logs, and you'd have perfect knowledge of reorgs across all chains, so you know what can be cached and when things need to be rechecked. And you could actually just shove it into op-program and fault proofs would "work" for maybe 2-3 chains before we need to split the problem up. It is a pretty extreme change though...
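To make the multi-tenant idea concrete, here is a rough Go sketch of a single L1 scanner fanning each block out to per-chain derivation pipelines. All interfaces and names are assumptions for illustration, not existing op-node code.

```go
// Hypothetical sketch of a "multi-tenant op-node": traverse L1 once and hand
// each block to every tenant chain, so all chains share one view of L1.
package multitenant

import "context"

// L1Block is a minimal stand-in for an L1 block plus its batch/receipt data.
type L1Block struct {
	Number uint64
	Hash   [32]byte
}

// ChainDeriver is whatever consumes L1 data for a single L2 chain.
type ChainDeriver interface {
	ChainID() uint64
	ProcessL1Block(ctx context.Context, blk L1Block) error
}

// Scanner walks L1 exactly once on behalf of every tenant chain.
type Scanner struct {
	chains []ChainDeriver
	next   uint64 // next L1 block number to process
}

// Step fetches the next L1 block (fetching is elided behind a callback) and
// pushes it to all chains, guaranteeing a consistent L1 view across the set.
func (s *Scanner) Step(ctx context.Context, fetch func(uint64) (L1Block, error)) error {
	blk, err := fetch(s.next)
	if err != nil {
		return err
	}
	for _, c := range s.chains {
		if err := c.ProcessL1Block(ctx, blk); err != nil {
			return err // a failing chain halts the scan so views never diverge
		}
	}
	s.next++
	return nil
}
```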
Great point, this is indeed a gap. However, it is also a gap in the current architecture -- a self-managed node may lose all its state, and the Supervisor today has no knowledge of that happening. In fact, if we don't program defensively against it, it could be even worse: when the Node starts traversing early blocks again, it may look like a reorg to the Supervisor and trigger an inappropriate purge of the Supervisor's own data.
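As a hypothetical illustration of that defensive programming, the Supervisor could classify a conflicting update before purging anything. The names and the gap threshold below are made up for the sketch, not real protocol constants.

```go
// Hypothetical guard against treating a node resync as an L2 reorg.
package guard

// Update is a block a node reports to the Supervisor (illustrative fields).
type Update struct {
	Number uint64
	Hash   [32]byte
}

// Records is the Supervisor's own view of one chain (illustrative interface).
type Records interface {
	Head() uint64
	HashAt(n uint64) (hash [32]byte, known bool)
}

// resyncGap is an arbitrary illustrative threshold, not a real protocol value.
const resyncGap = 1000

// Classify decides how to react to an update that may conflict with local data.
func Classify(rec Records, u Update) string {
	if h, ok := rec.HashAt(u.Number); ok && h == u.Hash {
		return "already-known" // node is replaying blocks we already indexed; ignore
	}
	if rec.Head() > u.Number+resyncGap {
		// The node is re-traversing early blocks while we hold a much newer head:
		// assume it lost its state rather than purging our own data as a "reorg".
		return "node-resync"
	}
	return "reorg" // genuinely conflicting recent block: roll back as usual
}
```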
💯, I view that solution as analogous to the one being suggested here, but by coupling things more tightly, the control/authority of the system becomes even easier to reason about.
Great point
Great point!
It might be extreme, but IMO it's unavoidable that the responsibilities of the Supervisor start including consistency. Whether that happens in a discrete Supervisor component, or it's embedded into the Node, the concerns and properties are the same. We already know we want a Safety Index for the OP Node, and the Supervisor already has functionality for such a thing. Bringing them together makes sense. "What is a Supernode but a miserable pile of log events?"
@ajsutton - @sebastianst points out that partitioning the services like this (having the Supervisor be its own binary) allows for composability with other L2 client implementations, rather than being bound to op-node. Arguably, it would be the responsibility of protocol-adhering clients to implement a supervisor.
Yeah, I think the downside of a separate binary is that alt clients won't implement it and we won't get client diversity, because everything hinges on op-supervisor. And then op-supervisor's design and API will influence the design decisions of clients, potentially making them all look similar.

I'm still not sure I'd actually advocate for the multi-tenant op-node approach I suggested though. It seems like a pretty extreme pivot at a time where we are focussed on shipping something. Even if we later decide it is the way to go for some reason, I suspect we're better off shipping as-is first anyway: we wouldn't waste that much more effort than if we pivoted now, but we would learn a heck of a lot that could lead to a better design if we pivoted later.

But I do think it would be a fun little hackathon project to try and make op-program multi-tenant...