Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for actor events. #1049

Merged
merged 31 commits into from
Nov 15, 2022
Merged

add support for actor events. #1049

merged 31 commits into from
Nov 15, 2022

Conversation

raulk
Copy link
Member

@raulk raulk commented Nov 6, 2022

Reference implementation of filecoin-project/FIPs#483.
Part of #1048.

TODO

raulk added 9 commits November 5, 2022 14:47
Differentiated between ActorEvent (the event as emitted by the actor) and
StampedEvent (the event that goes on chain, after it has been stamped with
extra information, such as the emitter actor).
TODO:
- Implement gas calculation.
- Validate that values in event entries are valid DAG-CBOR.
TODO: make the CID optional.
If any actor events were emitted during execution, this field will contain the CID of the
root of the AMT holding the StampedEvents. Otherwise, this will be None (serializing to a
CBOR NULL value on the wire).
@codecov-commenter
Copy link

codecov-commenter commented Nov 6, 2022

Codecov Report

Merging #1049 (02e7944) into master (e1a44a4) will decrease coverage by 0.74%.
The diff coverage is 0.61%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1049      +/-   ##
==========================================
- Coverage   51.69%   50.95%   -0.75%     
==========================================
  Files         125      128       +3     
  Lines       10914    11073     +159     
==========================================
  Hits         5642     5642              
- Misses       5272     5431     +159     
Impacted Files Coverage Δ
fvm/src/call_manager/default.rs 0.00% <0.00%> (ø)
fvm/src/call_manager/mod.rs 20.00% <ø> (ø)
fvm/src/executor/default.rs 0.90% <0.00%> (-0.06%) ⬇️
fvm/src/executor/mod.rs 0.00% <0.00%> (ø)
fvm/src/kernel/default.rs 14.95% <0.00%> (-0.12%) ⬇️
fvm/src/machine/boxed.rs 0.00% <0.00%> (ø)
fvm/src/machine/default.rs 28.12% <0.00%> (-4.49%) ⬇️
fvm/src/machine/mod.rs 49.29% <ø> (ø)
fvm/src/syscalls/event.rs 0.00% <0.00%> (ø)
fvm/src/syscalls/mod.rs 0.00% <0.00%> (ø)
... and 13 more

Comment on lines +277 to +278
// TODO this can be zero-copy if the AMT supports a batch set operation that takes an
// iterator of references and flushes the batch at the end.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stebalien any quick ideas here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks! But we have one impediment to using that API as-is. Would love you hear your thoughts here: #1083 (comment).

Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, but i want steb review here.

Copy link
Contributor

@mriise mriise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (just nits)

As for doing zero copy things for AMT, it would likely require introducing a lifetime to Nodes (leading to the classic rust lifetime parameter creep).

We could do type shenanigans but that would be a larger project probably...

fvm/src/call_manager/default.rs Outdated Show resolved Hide resolved
fvm/src/gas/price_list.rs Show resolved Hide resolved
fvm/src/gas/price_list.rs Show resolved Hide resolved
fvm/src/machine/boxed.rs Show resolved Hide resolved
shared/src/version/mod.rs Show resolved Hide resolved
testing/integration/tests/events_test.rs Outdated Show resolved Hide resolved
@raulk raulk marked this pull request as ready for review November 14, 2022 23:31
fvm/src/executor/default.rs Outdated Show resolved Hide resolved
fvm/src/call_manager/default.rs Outdated Show resolved Hide resolved
Some(Block::new(DAG_CBOR, msg.params.bytes()))
};
let (res, gas_used, mut backtrace, exec_trace, events_root, events) =
self.map_machine(|machine| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've reached the point where we may want to just define a struct (we can define one inside this function). Even if we destructure immediately, it'll get us nice named fields and prevent potential mistakes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +12 to +13
let evt: ActorEvent = context.memory.read_cbor(event_off, event_len)?;
context.kernel.emit_event(evt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is where you fell into the validation hole. Yeah, it would be nice to at least charge something before parsing. Especially because the values might be large.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, um. I'm going to make you hate me. We should do this with a gas-charging serde visitor. But we can do that in a followup patch (I'm "happy" to write it).

Basically, we can charge gas as we deserialize and stop immediately if/when we run out without ever doing any work we haven't charged for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we can probably extend this concept to other read_cbor cases)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have these gas fees in place here:

  1. Syscall fee, which we had discussed to extend with dynamic prices based on parameter lengths.
  2. Event emission -- flat rate per event.
  3. Event emission -- per byte cost of serialized event. This is a storage cost.
  4. Event emission -- per indexed field.

In a perfect world, we would charge for the cost of serde by metering it -- we have discussed about instrumenting codec logic and charging gas for it, but my impression is that that's a tomorrow thing.

In absence of the above, what would the pricing policy of this gas-charging serde visitor be? Would we charge for anything other than bytes consumed? Can't we include the projected cost for that in (1) or (3)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with the current version is that we charge after processing the event. If we use a serde visitor, se'd be able to charge for 2-4 while parsing (and stop parsing early if we run out of gas).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the syscall fee with parameter length costs, charged at the time of the syscall, cover the concern here?

What I'm trying to figure out is what exactly are we charging for with this gas-charging serde visitor, if we are not charging for compute? Aren't we just charging for bytes (which can be covered in a simpler manner through (1))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would be charging for compute.

To be clear, I'm not suggesting that we change the charging model. We're already charging for all the operations in 2-4. However, a user can get us to do a lot of free work before we get around to charging because we have to first parse the structure entirely before we can determine how much it should cost.

I'm just trying to find a way to charge as we parse, so we can abort early.

But this isn't even consensus critical, so we can punt till later.


/// Commits the events to the machine by building the events AMT, and making sure that events
/// are written to the store.
fn commit_events(&self, events: &[StampedEvent]) -> Result<Option<Cid>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this doesn't touch the machine state, so I'd consider just defining it as a helper function that takes the machine as an input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future it will, though. We are planning to fold events into a block-level index structure at some point. My understanding is that the logic to populate that data structure would go here, and be returned when finalizing the Machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I assume that logic would go in the client. But I could see it going here as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(basically, the FVM usually deals with the state-tree only)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that we don't return events to the client in ApplyRet (which we're doing now, but because to support transient call cases, and not sure if it needs to be there, might remove), doing it there would involve the extra work of re-fetching the events from the store just to populate the structure.

FWIW, I think forming the receipts HAMT should also be done in the Machine, with us returning the receipt_root to the client. IMO we whould trend towards consolidating all consensus-relevant execution logic inside the FVM. Right now there is still a gray area, but I'm strongly advocating for that future event indexing logic to live here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm strongly against it. In general, the FVM should just be the VM. It manages the state-tree, applies messages, and returns receipts. It doesn't care about the receipt HAMT, tipsets, the message lists, etc.

  1. Different chains (e.g., in IPC) may want to use the FVM but may also have different chain structures and indexing requirements. E.g., I expect some chains will use cryptographic accumulators, vector commitments, etc.
  2. Filecoin clients will likely want to index events in some form of database anyways, so it's not like they won't have to process them.

@@ -51,6 +52,8 @@ pub struct InnerDefaultCallManager<M: Machine> {
invocation_count: u64,
/// Limits on memory throughout the execution.
limits: M::Limiter,
/// Accumulator for events emitted in this call stack.
events: Vec<StampedEvent>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to handle reverts/aborts somehow.

Copy link
Member Author

@raulk raulk Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is unfortunately a bit of a can of worms.

  1. Filecoin does not revert side effects on non-zero exit codes (i.e. you can change your state root and abort with a non-zero exit code, and the protocol doesn’t revert your state change).
  2. Events are trickier though, because they are fire-and-forget. Once an event is emitted, you can't take a compensating action to un-emit it.
    • By contrast, you can always revert a state root update. For example, you track your previous state root, update your state root, detect a failure scenario, and revert back to your original state root, before returning a non-zero exit code.
  3. Ethereum does revert side effects (including storage updates) on REVERT. We currently enforce the rollback of storage mutations by using a transaction block in the EVM runtime (i.e. entirely actor-side), but this begs to introduce protocol support for rollback on failure, which would create a new precedent.

I can think of two ways to handle this:

  1. Layer events in the call manager and the last layer on a non-zero exit. This would prevent us from intentionally emitting events on failure (e.g. I could reasonably conceive a "window post failed" event).
  2. Enable actors to throw away events emitted through their execution. The EVM runtime would use this feature, and we could make the transactional Runtime discard events by default on failure. Future native actors (or built-in actors) could choose retain events on failure.

I think (2) gives us the most flexibility.

Ultimately I propose we solve this in Sapphire (and I can own it myself). We can ship Topaz with this caveat.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Filecoin totally reverts everything on non-zero exit codes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^What Steb said -- we do so in both the legacy VM and the FVM.

I think that makes most (all?) of the problems you're describing go away.

Copy link
Member Author

@raulk raulk Nov 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was in doubt and it sounded strange that it didn't, but I wasn't able to find where it was happening. It happens through the layering in the state tree:

fn with_transaction(

fvm/tests/dummy.rs Show resolved Hide resolved
/// The key of this event.
pub key: String,
/// Any DAG-CBOR encodeable type.
pub value: RawBytes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this were an IgnoreAny (or Ipld if we wanted to actually keep it). By using RawBytes, we've "hidden" any internal CIDs from the codec.

But I don't feel too strongly about that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess I feel somewhat strongly about this. I'd say that the value should either by "arbitrary bytes", or an actual CBOR value. Having cbor as a byte-string within another cbor-value is icky.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the concern. But in practice I'm not convinced it changes much. What's important is that the validation logic is in place somewhere. This being represented as an Ipld::Ipld here makes things harder for actor code (because Entry is a shared type). An IgnoredAny would simply not work for the same reason. We would need to bifurcate this type into the FVM-side (which could use IgnoredAny or Ipld::Ipld) and the SDK-side type. And I'm not sure what the right field type would be for the latter if not RawBytes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, these behave completely differently.

  • Ipld and IgnoreAny means the data would actually be cbor.
  • RawBytes means that the data would be a byte array (that may be cbor if we validate it).

The important thing is that any links in that raw data wouldn't be resolvable. I.e., any IPLD tooling would treat it as "just bytes".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being represented as an Ipld::Ipld here makes things harder for actor code (because Entry is a shared type). An IgnoredAny would simply not work for the same reason.

The "right" solution is likely to make the Event generic over this type. Then we can use whatever type we want for the value, depending on the context.

  • When validating, we can use IgnoredAny (once we fix it to actually validate fully, but we don't really care about validation right now anyways).
  • The most "general" version is Ipld, it just gets expensive because it allocates a lot.
  • We should implement a RawValue equivalent for IPLD: https://docs.rs/serde_json/latest/serde_json/value/struct.RawValue.html. We just don't have that right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ipld and IgnoreAny means the data would actually be cbor.

It just changes the timing at which you verify. Using Ipld or IgnoreAny would give us that guarantee during deserialization. Validating explicitly gives us that guarantee later. But in all cases you would walk away guaranteeing that the input is DAG-CBOR.

The important thing is that any links in that raw data wouldn't be resolvable. I.e., any IPLD tooling would treat it as "just bytes".

How much of a problem is that?

Inside the FVM, we never expect to resolve these links (for all we know, they could be pointing at some NFT stored in the public IPFS network). I'd argue that we have no interest in processing Entry values at all, beyond validating that they're correct DAG-CBOR. And the reason we enforce DAG-CBOR is to give observers (which will be vary varied) some guarantees about the structure of the data. Otherwise, they will have zero ability to reason about the events they observe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of a problem is that?

It breaks tooling:

  • If the user linked to data, pinning a receipt wouldn't pin any data linked-to by an event.
  • Selectors wouldn't be able to see inside events.

I'd argue that we have no interest in processing Entry values at all, beyond validating that they're correct DAG-CBOR.

Either we care or we don't.

  • If we care, it should be an embedded CBOR object. E.g., if we ever want to maybe consume events on-chain.
  • If we don't, we can say that the user should use CBOR, but I see no reason to spend extra time validating (because it doesn't actually matter to us).

Given this discussion, I'm leaning towards the second.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this discussion, I'm leaning towards the second.

Which is basically what we have, just with an extra "we're never going to validate this" guarantee.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, how about we table this discussion until I get around to updating the FIP? Then we can discuss with a larger audience there.

In short, I still think it's beneficial to enforce DAG-CBOR as these events are meant for external observability, and not having a consistent data format hinders such observability. I do not think we should reason at all about these payloads. No link discovery, no reachability analysis, nothing. Validating structural correctness is where I'd draw the line in terms of FVM responsibility.

If we don't reason about these, I'd argue it's not a requirement to use types that are transparent to IPLD tooling (e.g. Ipld::Ipld, I don't know about IgnoredAny) in the implementation. We may opt to do so for technical reasons, though, not arguing that. But bearing in mind that that would require more type sophistication here (I can sense your eyes light up 😆).

Comment on lines +277 to +278
// TODO this can be zero-copy if the AMT supports a batch set operation that takes an
// iterator of references and flushes the batch at the end.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits, but everything else can be done in followups. We might as well merge.

fvm/src/call_manager/default.rs Outdated Show resolved Hide resolved
fvm/src/call_manager/default.rs Outdated Show resolved Hide resolved
Comment on lines +119 to +121
let _ =
sdk::send::send(&our_addr, EMIT_SUBCALLS_REVERT, params.into(), Zero::zero())
.ok();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the ok() (but I'm not sure why you're not just calling unwrap to assert that this worked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Result here is deliberately unused, and clippy was complaining. One of these sends is going to fail (see below), and ignoring that failure is part of the test scenario. I can document it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why clippy, WHY!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants