
feat: add Filter to TrieWalker #1598

Merged: 5 commits into ethereum:master on Dec 9, 2024

Conversation

@KolbyML (Member) commented Dec 5, 2024

What was wrong?

Walking a whole state trie is very time-consuming.

How was it fixed?

By making it possible to filter the trie so that only a slice of it is walked. This PR will be used in my follow-up PR, which adds a random-walk state snapshot mode.

That mode will be useful for the random-slice state snapshot bridge mode I will be adding.

@KolbyML KolbyML requested review from morph-dev and removed request for morph-dev December 5, 2024 05:06
@KolbyML KolbyML self-assigned this Dec 5, 2024
@KolbyML KolbyML marked this pull request as draft December 5, 2024 05:06
@KolbyML KolbyML requested a review from morph-dev December 5, 2024 05:13
@KolbyML KolbyML force-pushed the add-random-snapshot-gossip branch from 84dba35 to 2568b79 on December 5, 2024 05:15
@KolbyML KolbyML marked this pull request as ready for review December 5, 2024 05:29
@morph-dev (Collaborator) left a comment:

A few minor comments. Nothing deal-breaking, but I would say they are nice improvements.

};
}

let slice_size = U256::MAX / U256::from(slice_count);
morph-dev (Collaborator):

I don't think this logic is accurate. Let's see what happens if slice_count is a power of 2 (the most likely value to be used), e.g. 16.

Because U256::MAX = 0xffff..ff, slice_size = 0x0fff..ff, leading to the following:

  • first slice would be [ 0x0000..00 - 0x0fff..fe]
  • second slice would be [0x0fff..ff - 0x1fff..fd]
  • third slice would be [0x1fff..fe - 0x2fff..fc]
  • ...
  • last slice would be [0xefff..f1 - 0xffff..ff]

This would make the last slice bigger than the rest, but that's not the main issue. The main issue is that slices break in unexpected places, and the hexary trie is not split into 16 equal parts as expected.


To fix it, I think we should do: let slice_size = U256::MAX / U256::from(slice_count) + 1.

This works perfectly when slice_count is a power of two.

If slice_count is not a power of two, we know that some slices will be bigger than the rest, and the trie will be broken up in a weird way anyway.

In your implementation the last one would be bigger than the rest, and in mine it would be smaller. In both cases the difference shouldn't be more than slice_count, so I see both of them as equally good solutions.
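
For illustration, a minimal Rust sketch of the boundary math with the suggested + 1 fix (names here are assumed for the example, not the PR's exact code):

use alloy::primitives::U256;

/// Inclusive [start, end] bounds of slice `slice_index` out of
/// `slice_count` equal slices of the 256-bit key space.
fn slice_bounds(slice_index: u16, slice_count: u16) -> (U256, U256) {
    assert!(slice_index < slice_count);
    // With the `+ 1` fix, slice_size for slice_count = 16 is 0x1000..00,
    // so slices break exactly on nibble boundaries.
    let slice_size = U256::MAX / U256::from(slice_count) + U256::from(1);
    let start = slice_size * U256::from(slice_index);
    // For a non-power-of-two slice_count the last slice is slightly
    // smaller; saturating_add clamps its end to U256::MAX.
    (start, start.saturating_add(slice_size - U256::from(1)))
}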

}

impl Filter {
pub fn new_random_filter(slice_count: u16) -> Self {
morph-dev (Collaborator):

I would let the caller control which slice is being created. Meaning, I would change this function to:

pub fn new(slice_index: u16, slice_count: u16) -> Self {
    assert!(slice_index < slice_count, "...");
    ...
}

This way, we can also add more tests and make sure that slice splitting works correctly for non-trivial cases (and prevent the issue I explained in the other comment).

At least, we should have a function like this. And if you want to keep new_random_filter, then we can do something like:

pub fn random(slice_count: u16) -> Self {
    Self::new(thread_rng().gen_range(0..slice_count), slice_count)
}
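
For reference, a usage sketch of the suggested API (names assumed from the snippets above; random relies on rand's thread_rng and gen_range):

let fourth_slice = Filter::new(3, 16);  // deterministic, easy to test
let random_slice = Filter::random(16);  // slice_index drawn from 0..16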

assert!(filter.is_included(&[0x1, 0x5, 0x5]));
assert!(!filter.is_included(&[0x1, 0x5, 0x4]));
assert!(filter.is_included(&[0x2]));
assert!(filter.is_included(&[0x3]));
morph-dev (Collaborator):

nit: I would add one more example:

assert!(!filter.is_included(&[0x3, 0x0, 0x1]));

}

fn partial_prefix(prefix: U256, shift_amount: usize) -> B256 {
prefix
morph-dev (Collaborator):

Any reason not to do one of the following:

  1. prefix | (U256::MAX << shift_amount) - I think this is the cleanest.
     I would go one step further and say that it is not clear what shift_amount means. Instead, I would use the opposite value:

     let prefix_length = 4 * path.len();
     let partial_prefix_mask = !(U256::MAX >> prefix_length);
     let partial_start_prefix = self.start_prefix & partial_prefix_mask;
     let partial_end_prefix = self.end_prefix & partial_prefix_mask;

  2. (prefix >> shift_amount) << shift_amount - same idea as yours, but simpler.

In either case, this is simple enough that I would inline it below.

KolbyML (Member, Author) replied:

I chose 2, as it is simpler to understand.
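
As a small illustration of option 2 (assumed names, not the PR's exact code), it clears the low bits so only the nibbles walked so far take part in the comparison:

use alloy::primitives::U256;

fn truncate_to_prefix(value: U256, nibble_count: usize) -> U256 {
    // 4 bits per nibble; nibble_count is assumed to be in 1..=64 here.
    let shift_amount = 256 - 4 * nibble_count;
    (value >> shift_amount) << shift_amount
}

// e.g. truncate_to_prefix(0xabcd..00, 2) == 0xab00..00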


#[derive(Debug, Clone)]
pub struct Filter {
start_prefix: U256,
morph-dev (Collaborator):

nit: why do we call this _prefix? Wouldn't start and end be sufficient and more correct?

KolbyML (Member, Author) replied:

sure

B256::from_slice(&compress_nibbles(key_path))
}

pub fn partial_nibble_path_to_right_padded_b256(partial_nibble_path: &[u8]) -> B256 {
morph-dev (Collaborator):

Not a big fan of converting them to Vec and then to B256.

How about:

pub fn nibbles_to_right_padded_b256(nibbles: &[u8]) -> B256 {
    let mut result = B256::ZERO;
    for (i, nibble) in nibbles.iter().enumerate() {
        if i % 2 == 0 {
            result[i / 2] |= nibble << 4;
        } else {
            result[i / 2] |= nibble;
        }
    }
    result
}

We also don't need compress_nibbles and we can modify full_nibble_path_to_address_hash to:

pub fn full_nibble_path_to_address_hash(key_path: &[u8]) -> B256 {
    if key_path.len() != 64 {
        panic!(
            "Key path should always be 64 bytes long: {}",
            key_path.len()
        );
    }

    nibbles_to_right_padded_b256(key_path)
}
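
A quick sanity check of the expected behavior (a hypothetical test, assuming the function above):

let padded = nibbles_to_right_padded_b256(&[0xa, 0xb, 0xc]);
assert_eq!(padded[0], 0xab); // first two nibbles packed into one byte
assert_eq!(padded[1], 0xc0); // odd trailing nibble left-aligned in its byte
assert_eq!(padded[2], 0x00); // rest of the word is zero padding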

@morph-dev (Collaborator) commented:
Actually, there is one thing, and we can decide if it is a problem or not.

Let's say we have two Filters (slice_count is 512):

  • FilterA { start: 0xab10..00, end: 0xab7f..ff }
  • FilterB { start: 0xab80..00, end: 0xabff..ff }

And let's say that the trie node at path [0xa, 0xb] is a Leaf node, with prefix: [0xc, 0xd, ...] (making the entire path for the data stored inside the leaf node: 0xabcd...).

Intuitively, one would expect that this trie node (since it's a leaf) would be visited only by the TrieWalker that has FilterB, and not be visited by the TrieWalker with FilterA. But it would actually be visited by both of them.

I personally don't think this is an issue. But I wanted to raise awareness and get your opinion as well.

@KolbyML (Member, Author) commented Dec 9, 2024

> Actually, there is one thing, and we can decide if it is a problem or not.
>
> Let's say we have two Filters (slice_count is 512):
>
>   • FilterA { start: 0xab10..00, end: 0xab7f..ff }
>   • FilterB { start: 0xab80..00, end: 0xabff..ff }
>
> And let's say that the trie node at path [0xa, 0xb] is a Leaf node, with prefix: [0xc, 0xd, ...] (making the entire path for the data stored inside the leaf node: 0xabcd...).
>
> Intuitively, one would expect that this trie node (since it's a leaf) would be visited only by the TrieWalker that has FilterB, and not be visited by the TrieWalker with FilterA. But it would actually be visited by both of them.
>
> I personally don't think this is an issue. But I wanted to raise awareness and get your opinion as well.

I made Filter more inclusive because I thought it was simpler to reason about that way, especially while walking diverging paths in a branch node. I don't think including a few extra nodes is an issue here, as you said.
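
A hedged sketch of why both filters visit the node (assumed names, not the PR's exact code): truncating both bounds to the node's prefix length maps FilterA and FilterB onto the same 0xab00..00 value.

use alloy::primitives::U256;

fn is_included(start: U256, end: U256, node_prefix: U256, nibble_count: usize) -> bool {
    let shift = 256 - 4 * nibble_count;
    let partial_start = (start >> shift) << shift; // 0xab00..00 for both filters
    let partial_end = (end >> shift) << shift;     // 0xab00..00 for both filters
    partial_start <= node_prefix && node_prefix <= partial_end
}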

@KolbyML (Member, Author) commented Dec 9, 2024

@morph-dev ready for another review

@morph-dev (Collaborator) left a comment:

:shipit:

/// Create a new filter that includes the whole trie
/// Slice index must be less than slice count or it will panic
pub fn new(slice_index: u16, slice_count: u16) -> Self {
assert!(
morph-dev (Collaborator):

This condition would fail if slice_count is zero (which otherwise seems to be a supported case).
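
One hedged way to keep slice_count == 0 usable (field and parameter names assumed from the diff context) would be to treat it as "no filtering":

pub fn new(slice_index: u16, slice_count: u16) -> Self {
    if slice_count == 0 {
        // A single slice spanning the whole key space.
        return Self { start: U256::ZERO, end: U256::MAX };
    }
    assert!(
        slice_index < slice_count,
        "slice_index must be less than slice_count"
    );
    let slice_size = U256::MAX / U256::from(slice_count) + U256::from(1);
    let start = slice_size * U256::from(slice_index);
    Self { start, end: start.saturating_add(slice_size - U256::from(1)) }
}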

assert_eq!(filter.end_prefix, U256::MAX);
let filter = Filter::random(1);
assert_eq!(filter.start, U256::ZERO);
assert_eq!(filter.end, U256::MAX);
}

#[test]
fn test_is_included() {
morph-dev (Collaborator):

nit: should be renamed to contains

}
}

#[cfg(test)]
mod tests {
use alloy::hex::FromHex;

use super::*;

#[test]
fn test_new_random_filter() {
morph-dev (Collaborator):

nit: names of tests usually don't start with test_ (it's redundant, since they are already inside the tests module)

@Skippyy9 commented Dec 9, 2024

Awesome

@KolbyML KolbyML merged commit 580a72b into ethereum:master Dec 9, 2024
9 checks passed