
floresta-chain: new optimized chainstore #251

Open · wants to merge 1 commit into master
Conversation

Davidson-Souza
Collaborator

The current chainstore is based on kv, but it has a few problems:

  • When we flush, we get a huge heap spike
  • We incur a 2-3x overhead on headers
  • Retrieving headers during IBD gets slow if we flush early

This commit introduces a bare-bones, ad hoc store that consists of two parts:

  • An open-addressing, file-backed, memory-mapped hash map that keeps the block_hash -> block_height relation
  • A flat file that contains the serialized block headers, in ascending order

To recover a header given its block height, we simply use pointer arithmetic inside the flat file. To look one up by block hash, we consult the map first and then read the header from the flat file. This has several advantages: we don't need explicit flushes (the OS flushes the mapped pages at fixed intervals), flushes are asynchronous (the OS performs them), we get caching for free (mmapped pages stay in memory as long as we need them), and our cache reacts to system constraints, because the kernel always knows how much memory we still have.
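The height-based lookup described above can be sketched as follows. The names and the 80-byte serialized-header size are assumptions (not the actual floresta-chain code), and a plain byte buffer stands in for the mmapped flat file:

```rust
// Sketch of header retrieval by height via pointer arithmetic over a
// flat file. HEADER_SIZE and the function names are hypothetical.
const HEADER_SIZE: usize = 80;

/// pos(h) = h * size_of(serialized header)
fn header_offset(height: usize) -> usize {
    height * HEADER_SIZE
}

/// Read the raw header bytes for `height` straight out of the mapped region.
/// Returns None if the height is past the end of the file.
fn get_header(mmap: &[u8], height: usize) -> Option<&[u8]> {
    let start = header_offset(height);
    mmap.get(start..start + HEADER_SIZE)
}

fn main() {
    // Fake "file" holding three headers, each filled with its height byte.
    let mut file = vec![0u8; 3 * HEADER_SIZE];
    for h in 0..3 {
        file[header_offset(h)..header_offset(h) + HEADER_SIZE].fill(h as u8);
    }
    let hdr = get_header(&file, 2).expect("height in range");
    assert!(hdr.iter().all(|&b| b == 2));
    println!("header 2 starts at byte {}", header_offset(2)); // 160
}
```

With a real mmap the slice would come from the mapped region instead of a `Vec`, but the arithmetic is identical: no index structure is needed for height lookups at all.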

@JoseSK999
Contributor

My understanding is that kv buckets also don't need explicit flushes. The data saved in a bucket is flushed to disk by the OS periodically, and it's kept in the inner sled pagecache (so we should have fast access), which is configured with a capacity of 100MB in KvChainStore::new.

@Davidson-Souza
Collaborator Author

That was my understanding too, but for some reason, even before using the cache (second part of #169), it didn't really flush on its own, and if we had an unclean shutdown we would lose our progress (or a good part of it). It would also become increasingly CPU- and IO-heavy as we made progress; I suspect that's due to the block locator getting bigger as our chain grows.

But the biggest problem for me, and the one I couldn't find an alternative for, was the heap spike. It would always crash on my phone, before or after #169. With this PR, it runs fine!

@JoseSK999
Contributor

JoseSK999 commented Oct 8, 2024

It would also become increasingly more CPU and IO heavy as we made progress, I suspect it's due to the block locator getting bigger as our chain grows.

Isn't that the expected behavior of a node in IBD, as it moves from old empty blocks to more recent ones?

Also was the OS not flushing on its own on desktop or on mobile?

@Davidson-Souza
Collaborator Author

Isn't that the expected behavior of a node in IBD, as it moves from old empty blocks to more recent ones?

Not this much, at least not for headers. They are small and have a constant size.

Also was the OS not flushing on its own on desktop or on mobile?

Both. At least on my setup.

@Davidson-Souza Davidson-Souza force-pushed the new-chainstore branch 6 times, most recently from c82d96c to 4f43d68 Compare January 2, 2025 19:53
The current chainstore is based on `kv`, but it has a few problems:
  - When we flush, we get a huge heap spike
  - We incur a 2-3x overhead on headers
  - Retrieving headers during IBD gets slow if we flush early

This commit introduces a bare-bones, ad hoc store that consists of
three parts:
  - An open-addressing, file-backed and memory-mapped hash map that
    keeps the block_hash -> block_height relation
  - A flat file that contains the serialized block headers, in
    ascending order
  - An LRU cache to avoid going through the map every time

To recover a header given its block height, we simply use pointer
arithmetic inside the flat file. To look one up by block hash, we
consult the map first, then read the header from the flat file. This
has the advantage of not needing explicit flushes (the OS will flush
the mapped pages at fixed intervals), flushes are async (the OS does
them), we get caching for free (mmapped pages stay in memory if we
need them), and our cache reacts to system constraints, because the
kernel always knows how much memory we still have.
@Davidson-Souza Davidson-Souza changed the title [WIP] floresta-chain: new optimized chainstore floresta-chain: new optimized chainstore Jan 3, 2025
@Davidson-Souza Davidson-Souza marked this pull request as ready for review January 3, 2025 00:26

@jaoleal jaoleal left a comment


Nice changes, here's my superficial review... I still haven't finished.

You're right, this needs a lot of testing and review.
It looks like a nice job!

//! embedded database that doesn't require any runtime dependency. However, floresta-chain uses the
//! database in a very unusual way: it downloads a bunch of small chunks of data that needs to be
//! indexed and retrieved, all at once (~800k for mainnet at the time of writing). If we simply
//! keep evething in memory, and then make one big batch, most embedded databases will see a big

s/evething/everything

//! into the given header, we do this by keeping a persistent, open-addressing hash map that map block
//! hashes -> heights. Then from the height we can work out the block header in the headers file.
//!
//! ## Calculations

This is some good documentation! But can you explain what a load factor is? It's mentioned several times and seems important, so it would be good to have its meaning right there for readers with less expertise in data structures.

IMO you can include this

///! # Good to know
///!
///! The load factor of a hashmap is the ratio between buckets in use and the total number of
///! buckets (the slots of the hashmap). It expresses the likelihood of hash collisions, which
///! degrade performance.
///!
///! For more detailed information, please refer to [Hash table](https://en.wikipedia.org/wiki/Hash_table) on Wikipedia.
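To make the suggested note concrete: the load factor is occupied buckets divided by total buckets. A tiny standalone illustration (the numbers are made up, loosely matching ~850k mainnet headers indexed in a 2^20-bucket map):

```rust
// Load factor = occupied buckets / total buckets.
fn load_factor(occupied: u64, total_buckets: u64) -> f64 {
    occupied as f64 / total_buckets as f64
}

fn main() {
    let total_buckets = 1u64 << 20; // map capacity: 1,048,576 buckets
    let occupied = 850_000;         // hypothetical number of indexed headers
    let lf = load_factor(occupied, total_buckets);
    // Past roughly 0.7-0.8, an open-addressing map probes many more
    // buckets per lookup, so performance degrades quickly.
    println!("load factor: {:.2}", lf);
    assert!(lf < 1.0);
}
```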

///
/// When computing the size in bytes, we will round the number to the nearest power of 2, minus
/// 1. This lets us do some optimizations like use & instead of %, and use << instead of *.
pub index_mmap_size: Option<usize>,

s/index_mmap_size/index_map_size
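On the quoted size comment: the power-of-two trick works because, for a capacity of 2^k, `hash % capacity` equals `hash & (capacity - 1)`, a single AND instruction. A minimal check (hypothetical values, not the actual index code):

```rust
fn main() {
    let capacity: u64 = 1 << 16; // must be a power of two
    let mask = capacity - 1;     // the "minus 1" the doc comment mentions
    for hash in [0u64, 1, 65_535, 65_536, 123_456_789] {
        // A single AND replaces the (slower) modulo when the
        // capacity is a power of two.
        assert_eq!(hash % capacity, hash & mask);
    }
    println!("bucket for hash 123456789: {}", 123_456_789u64 & mask);
}
```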

Comment on lines +99 to +102
/// This is the size of the flat file that holds all of our block headers. We keep all headers
/// in a simple flat file, one after the other. That file then gets mmaped into RAM, so we can
/// use pointer arithmetic to find specific block, since pos(h) = h * size_of(DiskBlockHeader)
/// The default value is having space for 10 million blocks.

You misspelled "mmapped"
s/mmaped/mmapped
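As a back-of-the-envelope check of the quoted default (assuming 80-byte serialized headers; the real DiskBlockHeader may carry extra metadata and be larger):

```rust
fn main() {
    const HEADER_SIZE: u64 = 80;    // assumed serialized size in bytes
    const BLOCKS: u64 = 10_000_000; // default capacity from the doc comment
    let bytes = BLOCKS * HEADER_SIZE;
    // Reserving a file this big is cheap on most filesystems: pages are
    // only backed by storage once they are actually written (sparse file).
    println!("{} bytes (~{} MiB) reserved", bytes, bytes / (1024 * 1024));
}
```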

/// The permission for all the files we create
///
/// This is the permission we give to all the files we create. The default value is 0o600
pub file_permission: Option<u32>,

Can't this be an Option<u16>?

pub cache_size: Option<usize>,
/// The permission for all the files we create
///
/// This is the permission we give to all the files we create. The default value is 0o600

Did you mean "The default value is 600"?

version: u32,
/// Hash of the last block in the chain we believe has more work on
best_block: BlockHash,
/// How many blocks are pilled on this chain?

You can remove the "?" here.
