Draft: Supervisor as an L1 Consistency Source #136
Conversation
```mermaid
flowchart TD
    L1([L1]) -->|Batches| Supervisor
    Supervisor -->|L1 Content| ChainA[Chain A Node]
    Supervisor -->|L1 Content| ChainB[Chain B Node]
    ChainA -->|Log Events| Supervisor
    ChainB -->|Log Events| Supervisor
```
To note: L1 data does not drive new Log Entries into the Supervisor the way this diagram suggests. We take new log entries during Unsafe updates, and then simply advance pointers during Safe updates.
However, maybe it should? We could hold two separate databases, one for Unsafe events and one for Safe events. The Unsafe DB could be trimmed aggressively, and the Safe DB would be the archival version. It would maybe make reorg handling nicer, because we'd know exactly what to roll back or purge. It is twice as much RPC communication between nodes and supervisor, though, so in some ways it seems needless.
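A minimal Go sketch of that two-database idea, purely for illustration: the types and method names below are hypothetical and not the real op-supervisor API. It shows unsafe entries being recorded eagerly, promoted to the archival Safe DB on Safe updates, and rewound independently on an L2 reorg.

```go
// Hypothetical sketch of separate Unsafe and Safe log databases, so reorg
// handling knows exactly which entries to roll back or purge.
// All names are illustrative, not the real op-supervisor API.
package logdb

type LogEntry struct {
	ChainID     uint64
	BlockNumber uint64
	LogIndex    uint32
	LogHash     [32]byte
}

// DualDB holds unsafe (recent, aggressively trimmable) and safe (archival) entries.
type DualDB struct {
	unsafe []LogEntry // written on every Unsafe update
	safe   []LogEntry // promoted only once the data is derived from L1
}

// AddUnsafe records log entries as soon as a node reports an unsafe block.
func (d *DualDB) AddUnsafe(entries ...LogEntry) {
	d.unsafe = append(d.unsafe, entries...)
}

// PromoteSafe moves unsafe entries at or below blockNum into the safe DB,
// mirroring "advancing pointers" on a Safe update.
func (d *DualDB) PromoteSafe(chainID, blockNum uint64) {
	var remaining []LogEntry
	for _, e := range d.unsafe {
		if e.ChainID == chainID && e.BlockNumber <= blockNum {
			d.safe = append(d.safe, e)
		} else {
			remaining = append(remaining, e)
		}
	}
	d.unsafe = remaining
}

// RewindUnsafe drops unsafe entries above blockNum after an L2 reorg; the safe
// DB is untouched, which is what makes rollbacks easy to reason about.
func (d *DualDB) RewindUnsafe(chainID, blockNum uint64) {
	var kept []LogEntry
	for _, e := range d.unsafe {
		if e.ChainID == chainID && e.BlockNumber > blockNum {
			continue
		}
		kept = append(kept, e)
	}
	d.unsafe = kept
}
```

Whether the doubled RPC traffic is worth this simpler rewind story is exactly the trade-off raised above.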
How does the supervisor handle an op-node that drops out of sync (e.g. its execution client DB is wiped so it starts from genesis again)? If the design depended on feeding L1 data consistently across all op-node instances, that node would now be out of sync.

The talk of a "super node", and approaches where op-node becomes more of a slave to op-supervisor, makes me wonder if we wouldn't be better off adding multi-tenancy to op-node so it can be the consensus client for multiple chains in a single instance (and just ditch op-supervisor). The execution engines would still be separate, so they could still have the issue of suddenly being wiped (or a new node being added to the dependency set), but maybe you could just leverage execution-layer syncing to let them catch back up?

Most of the work in op-node is scanning through L1, so it would be a lot more lightweight if we did that once and extracted the data for all chains at once, and you'd know that they all have a consistent view of the L1 chain. With a direct connection to the execution client, you could query it for the existence of logs, and you'd have perfect knowledge of reorgs across all chains, so you know what can be cached and when things need to be rechecked. And you could actually just shove it into op-program and fault proofs would "work" for maybe 2-3 chains before we need to split the problem up. It is a pretty extreme change though...
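To make the multi-tenant idea concrete, here is a rough Go sketch of a single L1 scanner fanning each block out to per-chain derivation pipelines. All interfaces and names are assumptions for illustration, not existing op-node code.

```go
// Hypothetical sketch of a "multi-tenant op-node": traverse L1 once and hand
// each block to every tenant chain, so all chains share one view of L1.
package multitenant

import "context"

// L1Block is a minimal stand-in for an L1 block plus its batch/receipt data.
type L1Block struct {
	Number uint64
	Hash   [32]byte
}

// ChainDeriver is whatever consumes L1 data for a single L2 chain.
type ChainDeriver interface {
	ChainID() uint64
	ProcessL1Block(ctx context.Context, blk L1Block) error
}

// Scanner walks L1 exactly once on behalf of every tenant chain.
type Scanner struct {
	chains []ChainDeriver
	next   uint64 // next L1 block number to process
}

// Step fetches the next L1 block (fetching is elided behind a callback) and
// pushes it to all chains, guaranteeing a consistent L1 view across the set.
func (s *Scanner) Step(ctx context.Context, fetch func(uint64) (L1Block, error)) error {
	blk, err := fetch(s.next)
	if err != nil {
		return err
	}
	for _, c := range s.chains {
		if err := c.ProcessL1Block(ctx, blk); err != nil {
			return err // a failing chain halts the scan so views never diverge
		}
	}
	s.next++
	return nil
}
```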
Great point, this is indeed a gap. However, it is also a gap in the current architecture -- a self-managed node may lose all its state, and the Supervisor today has no knowledge of that happening. In fact, if we don't program defensively against it, it could be even worse: when the Node starts traversing early blocks again, it may look like a reorg to the Supervisor and trigger an inappropriate purge of the Supervisor's own data.
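As a hypothetical illustration of that defensive programming, the Supervisor could classify a conflicting update before purging anything. The names and the gap threshold below are made up for the sketch, not real protocol constants.

```go
// Hypothetical guard against treating a node resync as an L2 reorg.
package guard

// Update is a block a node reports to the Supervisor (illustrative fields).
type Update struct {
	Number uint64
	Hash   [32]byte
}

// Records is the Supervisor's own view of one chain (illustrative interface).
type Records interface {
	Head() uint64
	HashAt(n uint64) (hash [32]byte, known bool)
}

// resyncGap is an arbitrary illustrative threshold, not a real protocol value.
const resyncGap = 1000

// Classify decides how to react to an update that may conflict with local data.
func Classify(rec Records, u Update) string {
	if h, ok := rec.HashAt(u.Number); ok && h == u.Hash {
		return "already-known" // node is replaying blocks we already indexed; ignore
	}
	if rec.Head() > u.Number+resyncGap {
		// The node is re-traversing early blocks while we hold a much newer head:
		// assume it lost its state rather than purging our own data as a "reorg".
		return "node-resync"
	}
	return "reorg" // genuinely conflicting recent block: roll back as usual
}
```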
💯, I view that solution as analogous to the one being suggested here, but by coupling things more tightly, the control/authority of the system becomes even easier to reason about.
Great point
Great point!
It might be extreme, but IMO it's unavoidable that the responsibilities of the Supervisor start including consistency. Whether that happens in a discrete Supervisor component, or it's embedded into the Node, the concerns and properties are the same. We already know we want a Safety Index for the OP Node, and the Supervisor already has functionality for such a thing. Bringing them together makes sense. "What is a Supernode but a miserable pile of log events?"
@ajsutton - @sebastianst points out that partitioning the services like this (having the Supervisor be its own binary) allows for composability with other L2 client implementations, rather than being bound to op-node. Arguably, it would be the responsibility of protocol-adhering clients to implement a supervisor.
Yeah, I think the downside of a separate binary is that alt clients won't implement it and we won't get client diversity, because everything hinges on op-supervisor. And then op-supervisor's design and API will influence the design decisions of clients, potentially making them all look similar.

I'm still not sure I'd actually advocate for the multi-tenant op-node approach I suggested though. It seems like a pretty extreme pivot at a time where we are focussed on shipping something. Even if we later decide it is the way to go for some reason, I suspect we're better off shipping as-is first anyway: we wouldn't waste that much more effort than if we pivoted now, but we would learn a heck of a lot that could lead to a better design if we pivoted later.

But I do think it would be a fun little hackathon project to try and make op-program multi-tenant...