Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the blockstore #1694

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Refactor the blockstore #1694

wants to merge 3 commits into from

Conversation

Stebalien
Copy link
Member

First, remove the multihash "code" from the arguments and use a plain u64. This will let us compile the actors without importing hashing libraries.

fixes #1238
fixes #504

Second, replace the concrete Block type with a trait. We can then implement this trait for, e.g., fvm_ipld_encoding::IpldBlock.

Finally, make the blockstore object safe. This will let us work with the blockstore as &dyn Blockstore in most cases.

First, remove the multihash "code" from the arguments and use a plain
u64. This will let us compile the actors without importing hashing
libraries.

fixes #1238
fixes #504

Second, replace the concrete `Block` type with a trait. We can then
implement this trait for, e.g., `fvm_ipld_encoding::IpldBlock`.

Finally, make the blockstore object safe. This will let us work with the
blockstore as `&dyn Blockstore` in most cases.
@Stebalien Stebalien force-pushed the feat/blockstore-refactor branch from 5bed4e0 to c74826b Compare March 3, 2023 23:43
self.put_keyed(&k, block.as_ref())?;
Ok(k)
}
fn put(&self, mh_code: u64, block: &dyn Block) -> Result<Cid>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'm using a trait for Block to allow abstracting over blocks that own their data, and blocks that borrow their data.
  • After implementing this, I think we need to replace mh_code with some form of MultihashType { size: u32, code: u64 }.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could simply remove the multihash-code parameter and say that the blockstore gets to decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think removing it is right at this base level. Currying it in to a wrapper store could make sense though.

@Stebalien Stebalien requested a review from anorth March 3, 2023 23:45
@Stebalien
Copy link
Member Author

@anorth can I get feedback on the blockstore trait (and the block trait)? You can ignore everything else.

Copy link
Member

@anorth anorth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like there's two different interfaces for storing blocks to express here:

  • The core one of (CID, [u8])
  • A convenience method in terms of (MHCode, Codec, [u8]) that computes the CID

In both cases the codec is assumed to match the data format. In the first one also the hash is assumed to have been computed with the MH code in the CID. The point being that the blockstore isn't checking anything.

Given that, it doesn't make much sense to me why in put() the MH code is separate from the Block, but the codec isn't. A more convenient method might compute both the serialisation and the CID, from appropriate parameters. But this probably would be less good than the pattern of wrapping, like CborStore.

So forgive me if I've overstepped the scope here, but I don't really get why the core Blockstore trait and implementations should do hashing and CID construction. Instead they'd be simpler as a pure (CID, data) store abstraction, and then wrappers can curry in either or both the hashing function and serialisation method, no neither need to be passed around. How bad would it be to remove (deprecate) put? I think there are few call sites, since most code uses a wrapper like CborStore.

[Unfortunately I can see one piece of friction in that most actor code declares to use a Blockstore but in fact uses CborStore via impl<T: Blockstore> CborStore for T {}, so we'd probably need to (a) restrict all those declarations and (b) make CborStore wrap rather than extend BlockStore.]

An alternative intent might be that the blockstore should be responsible for the correctness of hashes corresponding to the data (even if it can't verify encoding), but in that case put_keyed() should not exist, and the API look more like the FVM storage syscalls.

In that latter case, or if my earlier thoughts are off the rails, then yes I think these traits look about right.

self.put_keyed(&k, block.as_ref())?;
Ok(k)
}
fn put(&self, mh_code: u64, block: &dyn Block) -> Result<Cid>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think removing it is right at this base level. Currying it in to a wrapper store could make sense though.

@Stebalien
Copy link
Member Author

So forgive me if I've overstepped the scope here, but I don't really get why the core Blockstore trait and implementations should do hashing and CID construction. Instead they'd be simpler as a pure (CID, data) store abstraction, and then wrappers can curry in either or both the hashing function and serialisation method, no neither need to be passed around. How bad would it be to remove (deprecate) put? I think there are few call sites, since most code uses a wrapper like CborStore.

The issue is that I'm to support two distinct use-cases:

  • FVM -> Client: The FVM will already know the CID so I need a way to "put" with a pre-computed CID (put_keyed).
  • Builtin actors -> FVM: For security, the FVM needs to compute the CID internally. I.e., it can't trust a CID passed by the user.

I can probably drop the FVM -> Client use-case here, honestly. I.e., get rid of put_keyed, leaving only put.

@anorth
Copy link
Member

anorth commented Mar 27, 2023

The built-in-actors test VM is another example with the two use cases: untrusted use from within the actors, but trusted use from the surrounding test infrastructure code. However, we can probably support these with two different traits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants