Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate whether we could have a faster and leaner chainbase data storage using B-trees instead of a binary tree #1072

Open
greg7mdp opened this issue Dec 16, 2024 · 0 comments
Assignees

Comments

@greg7mdp
Copy link
Contributor

greg7mdp commented Dec 16, 2024

Introduction

In-memory B-tree containers can have significant benefits when storing small value_types. See the design doc on Google's B-tree Containers.

However, whether B-tree containers are a good alternative to regular binary trees or not depend on the size of the values stored. As the Google design doc states:

  • When value_type is large, fewer values can be stored per node so the benefits of B-tree are lessened.
  • When value_type is expensive to move, B-tree may become more expensive than STL alternatives because B-tree needs to move values within and between nodes to maintain balance, whereas binary trees can just move pointers instead. std::array<int32_t, 16> is an example of a value_type for which STL sorted containers currently outperform B-trees.

So whether B-tree containers should be considered for Chainbase will depend on the size of the payload stored within tree nodes.

Storing row values in B-tree nodes

=> This would not be a solution allowing to index by multiple keys.

However, maybe we can still consider using B-tree containers, where the nodes would store pointers to the value_type instead of the value_type itself. I will consider this possibility next.

Storing row value pointers in B-tree nodes

I have instrumented Chainbase's node_allocator to dump the count of nodes allocated while loading a snapshot per payload size (where the payload size if the actual allocation size, minus the 16 bytes used for storing the three pointers to the siblings and parent).

In the last section we have the results for 3 different snapshots. They show that between August 2024 and now, we have seen a huge increase in the number of rows of size 64 and 112 bytes (shown in bold below), these two size together representing about 70GB out of the total memory used for node storage (excluding strings/vectors).

If we use the same optimization as in offset_node_base (allowing to store 3 offset pointers in 16 bytes), this would allow us to store 48 pointers per 256 byte node, allowing for 48 value pointers on leaf nodes, and 23 value pointers on intermediate nodes.

If we assume an average of 40 value pointers per node, this would save almost 10 bytes per row (6.4 bytes per value instead of 16 currently, but then that's without the parent pointer), so for a total of 766 million rows a saving of ~7.6GB (for the latest snapshot, a memory usage reduction of 7.1%).

I expect that this change may also allow for better performance as well (both lookups and updates), which in theory could be faster by a factor of 3.5x (log2(11) / log2(2)), but the improvement will likely be less in our case (or it could possibly be slower) because the comparison of values will require a pointer dereferencing.

In any case, further experimentation to compare the currently used binary trees with B-trees holding pointers will be necessary to solidify these assumptions.

Collected data

snapshot Aug 27, 2023

Note: In the tables below, I subtracted 16 bytes (the size of the offset_node_base) to the actual allocated size to compute what I called the payload size. This is incorrect for indices using multiple keys, as this 16 byte cost is incurred for every key. So for example when a row needed 80 bytes, I used payload size of 64 byte, but the actual payload could be less (48 bytes for example if the table uses one secondary key).

Chainbase memory used: 21.01GB

payload size num_allocated btree - values per intermediate node (248 / (sz + 8)) btree - values per leaf node (256 / sz) mem/node (bin / int / leaf)
24 12,650,325 7 10 40 / 37 / 26
48 65,536 4 5 64 / 64 / 51
56 6,005,259 3 4 72 / 85 / 64
64 54,337,418 3 4 80 / 85 / 64
72 28,296,137 3 3 88 / 85 / 85
80 28,796,696 2 3 96 / 128 / 85
88 15,138 2 2 104 / 128 / 128
96 7,647,198 2 2 112 / 128 / 128
112 655,236 2 2 128 / 128 / 128
120 6,005,260 1 2 136 / 256 / 128
136 12,650,325 1 1 152 / 256 / 256
192 5,832 1 1 208 / 256 / 256
200 1 1 1 216 / 256 / 256

snapshot Aug 11, 2024

Chainbase memory used: 22.42GB

payload size num_allocated btree - values per intermediate node (248 / (sz + 8)) btree - values per leaf node (256 / sz) mem/node (bin / int / leaf)
24 13,209,626 7 10 40 / 37 / 26
48 65,536 4 5 64 / 64 / 51
56 6,264,868 3 4 72 / 85 / 64
64 69,365,660 3 4 80 / 85 / 64
72 26,829,738 3 3 88 / 85 / 85
80 22,166,552 2 3 96 / 128 / 85
88 13,104 2 2 104 / 128 / 128
96 17,977,498 2 2 112 / 128 / 128
112 8,123,230 2 2 128 / 128 / 128
120 6,264,869 1 2 136 / 256 / 128
136 13,209,626 1 1 152 / 256 / 256
192 5,749 1 1 208 / 256 / 256
200 1 1 1 216 / 256 / 256

snapshot Sept 29, 2024

Chainbase memory used: 63.12GB

payload size num_allocated btree - values per intermediate node (248 / (sz + 8)) btree - values per leaf node (256 / sz) mem/node (bin / int / leaf)
24 13,257,994 7 10 40 / 37 / 26
48 65,536 4 5 64 / 64 / 51
56 6,273,805 3 4 72 / 85 / 64
64 164,238,971 3 4 80 / 85 / 64
72 26,865,822 3 3 88 / 85 / 85
80 23,655,119 2 3 96 / 128 / 85
88 13,960 2 2 104 / 128 / 128
96 15,738,610 2 2 112 / 128 / 128
112 199,938,036 2 2 128 / 128 / 128
120 6,273,806 1 2 136 / 256 / 128
136 13,257,993 1 1 152 / 256 / 256
200 1 1 1 216 / 256 / 256

snapshot Dec 16, 2024

Chainbase memory used: 106.15GB

payload size num_allocated btree - values per intermediate node (248 / (sz + 8)) btree - values per leaf node (256 / sz) mem/node (bin / int / leaf)
24 13,269,907 7 10 40 / 37 / 26
48 65,536 4 5 64 / 64 / 51
56 6,292,467 3 4 72 / 85 / 64
64 261,261,459 3 4 80 / 85 / 64
72 26,932,251 3 3 88 / 85 / 85
80 32,286,909 2 3 96 / 128 / 85
88 12,446 2 2 104 / 128 / 128
96 12,964,842 2 2 112 / 128 / 128
112 397,570,166 2 2 128 / 128 / 128
120 6,292,468 1 2 136 / 256 / 128
136 13,269,907 1 1 152 / 256 / 256
200 1 1 1 216 / 256 / 256
@greg7mdp greg7mdp added this to the Spring v2.0.0-rc1 milestone Dec 16, 2024
@greg7mdp greg7mdp self-assigned this Dec 16, 2024
@greg7mdp greg7mdp changed the title Faster and leaner chainbase data storage using B-trees instead of a binary tree Investigate whether we could have a faster and leaner chainbase data storage using B-trees instead of a binary tree Dec 17, 2024
@greg7mdp greg7mdp moved this from Todo to In Progress in Team Backlog Dec 17, 2024
@greg7mdp greg7mdp added 👍 lgtm and removed triage labels Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

No branches or pull requests

2 participants