-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for actor events. #1049
Conversation
Differentiated between ActorEvent (the event as emitted by the actor) and StampedEvent (the event that goes on chain, after it has been stamped with extra information, such as the emitter actor).
TODO: - Implement gas calculation. - Validate that values in event entries are valid DAG-CBOR.
TODO: make the CID optional.
If any actor events were emitted during execution, this field will contain the CID of the root of the AMT holding the StampedEvents. Otherwise, this will be None (serializing to a CBOR NULL value on the wire).
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #1049 +/- ##
==========================================
- Coverage 51.69% 50.95% -0.75%
==========================================
Files 125 128 +3
Lines 10914 11073 +159
==========================================
Hits 5642 5642
- Misses 5272 5431 +159
|
// TODO this can be zero-copy if the AMT supports a batch set operation that takes an | ||
// iterator of references and flushes the batch at the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien any quick ideas here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks! But we have one impediment to using that API as-is. Would love you hear your thoughts here: #1083 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok, but i want steb review here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (just nits)
As for doing zero copy things for AMT, it would likely require introducing a lifetime to Nodes (leading to the classic rust lifetime parameter creep).
We could do type shenanigans but that would be a larger project probably...
fvm/src/executor/default.rs
Outdated
Some(Block::new(DAG_CBOR, msg.params.bytes())) | ||
}; | ||
let (res, gas_used, mut backtrace, exec_trace, events_root, events) = | ||
self.map_machine(|machine| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've reached the point where we may want to just define a struct (we can define one inside this function). Even if we destructure immediately, it'll get us nice named fields and prevent potential mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can do that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let evt: ActorEvent = context.memory.read_cbor(event_off, event_len)?; | ||
context.kernel.emit_event(evt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is where you fell into the validation hole. Yeah, it would be nice to at least charge something before parsing. Especially because the values might be large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, um. I'm going to make you hate me. We should do this with a gas-charging serde visitor. But we can do that in a followup patch (I'm "happy" to write it).
Basically, we can charge gas as we deserialize and stop immediately if/when we run out without ever doing any work we haven't charged for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(we can probably extend this concept to other read_cbor
cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have these gas fees in place here:
- Syscall fee, which we had discussed to extend with dynamic prices based on parameter lengths.
- Event emission -- flat rate per event.
- Event emission -- per byte cost of serialized event. This is a storage cost.
- Event emission -- per indexed field.
In a perfect world, we would charge for the cost of serde by metering it -- we have discussed about instrumenting codec logic and charging gas for it, but my impression is that that's a tomorrow thing.
In absence of the above, what would the pricing policy of this gas-charging serde visitor be? Would we charge for anything other than bytes consumed? Can't we include the projected cost for that in (1) or (3)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with the current version is that we charge after processing the event. If we use a serde visitor, se'd be able to charge for 2-4 while parsing (and stop parsing early if we run out of gas).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the syscall fee with parameter length costs, charged at the time of the syscall, cover the concern here?
What I'm trying to figure out is what exactly are we charging for with this gas-charging serde visitor, if we are not charging for compute? Aren't we just charging for bytes (which can be covered in a simpler manner through (1))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would be charging for compute.
To be clear, I'm not suggesting that we change the charging model. We're already charging for all the operations in 2-4. However, a user can get us to do a lot of free work before we get around to charging because we have to first parse the structure entirely before we can determine how much it should cost.
I'm just trying to find a way to charge as we parse, so we can abort early.
But this isn't even consensus critical, so we can punt till later.
|
||
/// Commits the events to the machine by building the events AMT, and making sure that events | ||
/// are written to the store. | ||
fn commit_events(&self, events: &[StampedEvent]) -> Result<Option<Cid>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this doesn't touch the machine state, so I'd consider just defining it as a helper function that takes the machine as an input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future it will, though. We are planning to fold events into a block-level index structure at some point. My understanding is that the logic to populate that data structure would go here, and be returned when finalizing the Machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I assume that logic would go in the client. But I could see it going here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(basically, the FVM usually deals with the state-tree only)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that we don't return events to the client in ApplyRet (which we're doing now, but because to support transient call cases, and not sure if it needs to be there, might remove), doing it there would involve the extra work of re-fetching the events from the store just to populate the structure.
FWIW, I think forming the receipts HAMT should also be done in the Machine, with us returning the receipt_root to the client. IMO we whould trend towards consolidating all consensus-relevant execution logic inside the FVM. Right now there is still a gray area, but I'm strongly advocating for that future event indexing logic to live here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm strongly against it. In general, the FVM should just be the VM. It manages the state-tree, applies messages, and returns receipts. It doesn't care about the receipt HAMT, tipsets, the message lists, etc.
- Different chains (e.g., in IPC) may want to use the FVM but may also have different chain structures and indexing requirements. E.g., I expect some chains will use cryptographic accumulators, vector commitments, etc.
- Filecoin clients will likely want to index events in some form of database anyways, so it's not like they won't have to process them.
fvm/src/call_manager/default.rs
Outdated
@@ -51,6 +52,8 @@ pub struct InnerDefaultCallManager<M: Machine> { | |||
invocation_count: u64, | |||
/// Limits on memory throughout the execution. | |||
limits: M::Limiter, | |||
/// Accumulator for events emitted in this call stack. | |||
events: Vec<StampedEvent>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle reverts/aborts somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is unfortunately a bit of a can of worms.
- Filecoin does not revert side effects on non-zero exit codes (i.e. you can change your state root and abort with a non-zero exit code, and the protocol doesn’t revert your state change).
- Events are trickier though, because they are fire-and-forget. Once an event is emitted, you can't take a compensating action to un-emit it.
- By contrast, you can always revert a state root update. For example, you track your previous state root, update your state root, detect a failure scenario, and revert back to your original state root, before returning a non-zero exit code.
- Ethereum does revert side effects (including storage updates) on REVERT. We currently enforce the rollback of storage mutations by using a transaction block in the EVM runtime (i.e. entirely actor-side), but this begs to introduce protocol support for rollback on failure, which would create a new precedent.
I can think of two ways to handle this:
- Layer events in the call manager and the last layer on a non-zero exit. This would prevent us from intentionally emitting events on failure (e.g. I could reasonably conceive a "window post failed" event).
- Enable actors to throw away events emitted through their execution. The EVM runtime would use this feature, and we could make the transactional Runtime discard events by default on failure. Future native actors (or built-in actors) could choose retain events on failure.
I think (2) gives us the most flexibility.
Ultimately I propose we solve this in Sapphire (and I can own it myself). We can ship Topaz with this caveat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Filecoin totally reverts everything on non-zero exit codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^What Steb said -- we do so in both the legacy VM and the FVM.
I think that makes most (all?) of the problems you're describing go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was in doubt and it sounded strange that it didn't, but I wasn't able to find where it was happening. It happens through the layering in the state tree:
ref-fvm/fvm/src/call_manager/default.rs
Line 160 in 0801fd8
fn with_transaction( |
/// The key of this event. | ||
pub key: String, | ||
/// Any DAG-CBOR encodeable type. | ||
pub value: RawBytes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this were an IgnoreAny
(or Ipld
if we wanted to actually keep it). By using RawBytes
, we've "hidden" any internal CIDs from the codec.
But I don't feel too strongly about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I guess I feel somewhat strongly about this. I'd say that the value should either by "arbitrary bytes", or an actual CBOR value. Having cbor as a byte-string within another cbor-value is icky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the concern. But in practice I'm not convinced it changes much. What's important is that the validation logic is in place somewhere. This being represented as an Ipld::Ipld
here makes things harder for actor code (because Entry
is a shared type). An IgnoredAny
would simply not work for the same reason. We would need to bifurcate this type into the FVM-side (which could use IgnoredAny
or Ipld::Ipld
) and the SDK-side type. And I'm not sure what the right field type would be for the latter if not RawBytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, these behave completely differently.
Ipld
andIgnoreAny
means the data would actually be cbor.RawBytes
means that the data would be a byte array (that may be cbor if we validate it).
The important thing is that any links in that raw data wouldn't be resolvable. I.e., any IPLD tooling would treat it as "just bytes".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This being represented as an Ipld::Ipld here makes things harder for actor code (because Entry is a shared type). An IgnoredAny would simply not work for the same reason.
The "right" solution is likely to make the Event
generic over this type. Then we can use whatever type we want for the value, depending on the context.
- When validating, we can use
IgnoredAny
(once we fix it to actually validate fully, but we don't really care about validation right now anyways). - The most "general" version is
Ipld
, it just gets expensive because it allocates a lot. - We should implement a
RawValue
equivalent for IPLD: https://docs.rs/serde_json/latest/serde_json/value/struct.RawValue.html. We just don't have that right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ipld and IgnoreAny means the data would actually be cbor.
It just changes the timing at which you verify. Using Ipld
or IgnoreAny
would give us that guarantee during deserialization. Validating explicitly gives us that guarantee later. But in all cases you would walk away guaranteeing that the input is DAG-CBOR.
The important thing is that any links in that raw data wouldn't be resolvable. I.e., any IPLD tooling would treat it as "just bytes".
How much of a problem is that?
Inside the FVM, we never expect to resolve these links (for all we know, they could be pointing at some NFT stored in the public IPFS network). I'd argue that we have no interest in processing Entry values at all, beyond validating that they're correct DAG-CBOR. And the reason we enforce DAG-CBOR is to give observers (which will be vary varied) some guarantees about the structure of the data. Otherwise, they will have zero ability to reason about the events they observe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much of a problem is that?
It breaks tooling:
- If the user linked to data, pinning a receipt wouldn't pin any data linked-to by an event.
- Selectors wouldn't be able to see inside events.
I'd argue that we have no interest in processing Entry values at all, beyond validating that they're correct DAG-CBOR.
Either we care or we don't.
- If we care, it should be an embedded CBOR object. E.g., if we ever want to maybe consume events on-chain.
- If we don't, we can say that the user should use CBOR, but I see no reason to spend extra time validating (because it doesn't actually matter to us).
Given this discussion, I'm leaning towards the second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this discussion, I'm leaning towards the second.
Which is basically what we have, just with an extra "we're never going to validate this" guarantee.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, how about we table this discussion until I get around to updating the FIP? Then we can discuss with a larger audience there.
In short, I still think it's beneficial to enforce DAG-CBOR as these events are meant for external observability, and not having a consistent data format hinders such observability. I do not think we should reason at all about these payloads. No link discovery, no reachability analysis, nothing. Validating structural correctness is where I'd draw the line in terms of FVM responsibility.
If we don't reason about these, I'd argue it's not a requirement to use types that are transparent to IPLD tooling (e.g. Ipld::Ipld
, I don't know about IgnoredAny
) in the implementation. We may opt to do so for technical reasons, though, not arguing that. But bearing in mind that that would require more type sophistication here (I can sense your eyes light up 😆).
// TODO this can be zero-copy if the AMT supports a batch set operation that takes an | ||
// iterator of references and flushes the batch at the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two nits, but everything else can be done in followups. We might as well merge.
let _ = | ||
sdk::send::send(&our_addr, EMIT_SUBCALLS_REVERT, params.into(), Zero::zero()) | ||
.ok(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the ok()
(but I'm not sure why you're not just calling unwrap
to assert that this worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Result here is deliberately unused, and clippy was complaining. One of these sends is going to fail (see below), and ignoring that failure is part of the test scenario. I can document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why clippy, WHY!!!
Reference implementation of filecoin-project/FIPs#483.
Part of #1048.
TODO
Validate that values in event entries are valid DAG-CBOR.Macro for emitting events.