Investigate whether we could have a faster and leaner chainbase data storage using B-trees instead of a binary tree
Introduction
In-memory B-tree containers can have significant benefits when storing small value_types. See the design doc on Google's B-tree Containers.
However, whether B-tree containers are a good alternative to regular binary trees depends on the size of the values stored. As the Google design doc states:
When value_type is large, fewer values can be stored per node so the benefits of B-tree are lessened.
When value_type is expensive to move, B-tree may become more expensive than STL alternatives because B-tree needs to move values within and between nodes to maintain balance, whereas binary trees can just move pointers instead. std::array<int32_t, 16> is an example of a value_type for which STL sorted containers currently outperform B-trees.
So whether B-tree containers should be considered for Chainbase will depend on the size of the payload stored within tree nodes.
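As a concrete illustration of that trade-off, here is a minimal sketch using Abseil's absl::btree_set as a stand-in for the B-tree containers described in the design doc (the type names are illustrative, not chainbase types):

```cpp
// Sketch of the trade-off above: with a small value_type, many values fit in
// one B-tree node and lookups touch far fewer cache lines than a node-per-value
// binary tree; with a large value_type the per-node fanout collapses and values
// become expensive to shuffle between nodes, so the advantage mostly disappears.
#include <array>
#include <cstdint>
#include <set>
#include "absl/container/btree_set.h"

using small_value = std::uint64_t;                 // many values per 256-byte node
using large_value = std::array<std::int32_t, 16>;  // 64 bytes each: few per node, costly to move

absl::btree_set<small_value> btree_index;   // good fit for B-tree nodes
std::set<large_value>        rbtree_index;  // STL node-based tree may win here (per the quote above)
```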
Storing row values in B-tree nodes
=> Since a B-tree moves values within and between nodes to stay balanced, rows stored inline in one index's nodes could not be stably referenced by the other indices, so this would not be a solution that allows indexing by multiple keys.
However, we may still be able to use B-tree containers whose nodes store pointers to the value_type instead of the value_type itself. I will consider this possibility next.
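A minimal sketch of what such an index could look like, again using absl::btree_set as a placeholder for whichever B-tree implementation would actually be used (row and by_id_compare are illustrative names, not chainbase types):

```cpp
// Hypothetical "pointers in B-tree nodes" index: the tree only holds 8-byte
// pointers (or packed offsets) to rows that live elsewhere, and the comparator
// dereferences them to compare keys.
#include <cstdint>
#include "absl/container/btree_set.h"

struct row {
    std::uint64_t id;
    // ... the rest of the (possibly large) payload stays out of the tree nodes
};

struct by_id_compare {
    bool operator()(const row* a, const row* b) const { return a->id < b->id; }
};

// Many small keys per node, but every comparison now costs an extra pointer
// dereference into the row itself (the caveat discussed further below).
using row_index = absl::btree_set<const row*, by_id_compare>;
```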
Storing row value pointers in B-tree nodes
I have instrumented Chainbase's node_allocator to dump the count of nodes allocated per payload size while loading a snapshot (where the payload size is the actual allocation size minus the 16 bytes used to store the three pointers to the node's children and parent).
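For illustration, the instrumentation amounts to keeping a histogram along these lines (a sketch only, not the actual node_allocator patch):

```cpp
// Histogram of node allocations per payload size, where the payload is the
// requested node size minus the 16 bytes of packed node links.
#include <cstddef>
#include <cstdio>
#include <map>

static std::map<std::size_t, std::size_t> alloc_histogram;  // payload size -> node count

void record_allocation(std::size_t node_size) {
    constexpr std::size_t link_overhead = 16;  // the three packed offset pointers
    alloc_histogram[node_size - link_overhead] += 1;
}

void dump_histogram() {
    for (auto [payload, count] : alloc_histogram)
        std::printf("payload %zu bytes: %zu nodes\n", payload, count);
}
```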
The last section has the results for four snapshots. They show that between August 2024 and now we have seen a huge increase in the number of rows of size 64 and 112 bytes (shown in bold below), these two sizes together representing about 70GB out of the total memory used for node storage (excluding strings/vectors).
If we use the same optimization as in offset_node_base (which packs 3 offset pointers into 16 bytes), we could store 48 pointers per 256-byte node: 48 value pointers on leaf nodes, and 23 value pointers on intermediate nodes (the remaining slots holding child pointers).
If we assume an average of 40 value pointers per node, this would save almost 10 bytes per row (6.4 bytes per value instead of the current 16, though that no longer includes a parent pointer), so across a total of 766 million rows a saving of ~7.6GB (a memory usage reduction of 7.1% for the latest snapshot).
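For reference, the arithmetic behind those numbers (the 256-byte node size and the 40-values-per-node average are the assumptions stated above, not measured values):

```cpp
// Back-of-the-envelope check of the savings estimate above.
constexpr double bytes_per_link  = 16.0 / 3.0;               // packed offset pointers: ~5.33 B each
constexpr int    node_size       = 256;                      // assumed B-tree node size
constexpr int    links_per_node  = int(node_size / bytes_per_link);  // 48 pointers per node
constexpr double bytes_per_value = node_size / 40.0;         // ~6.4 B per row at 40 values/node
constexpr double saved_per_row   = 16.0 - bytes_per_value;   // ~9.6 B vs. the 16 B of node links today
constexpr double total_saved_gb  = 766e6 * saved_per_row / 1e9;  // ~7.4 GB (the ~7.6GB above rounds
                                                                 // the per-row saving up to 10 B)
```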
I expect that this change could also improve performance (both lookups and updates), which in theory could be faster by a factor of 3.5x (log2(11) / log2(2)), but the improvement will likely be smaller in our case (or it could even be slower) because each comparison will require a pointer dereference.
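To spell out where the 3.5x comes from: a balanced binary tree visits about log2(N) nodes per lookup, while a B-tree with effective fanout b visits about log_b(N), so the reduction in nodes visited is roughly

$$\frac{\log_2 N}{\log_b N} = \log_2 b, \qquad \log_2 11 \approx 3.46 \approx 3.5$$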
In any case, further experimentation to compare the currently used binary trees with B-trees holding pointers will be necessary to solidify these assumptions.
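A simple lookup micro-benchmark along these lines could be a starting point (a sketch only; real measurements would have to run against chainbase's allocator and actual row distributions, and cover updates as well as lookups):

```cpp
// Time point lookups against any sorted index, e.g. the currently used
// binary-tree index vs. a B-tree of row pointers; "Index" only needs a
// count(key) member, so std::set and absl::btree_set both work.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

template <class Index, class Key>
double lookup_seconds(const Index& idx, const std::vector<Key>& probes) {
    auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (const Key& k : probes) hits += idx.count(k);
    auto stop = std::chrono::steady_clock::now();
    std::printf("hits: %zu\n", hits);  // side effect so the loop is not optimized away
    return std::chrono::duration<double>(stop - start).count();
}
```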
Collected data
snapshot Aug 27, 2023
Note: In the tables below, I subtracted 16 bytes (the size of the offset_node_base) from the actual allocated size to compute what I called the payload size. This is not quite right for indices using multiple keys, as the 16-byte cost is incurred for every key. So for example when a row needed 80 bytes I used a payload size of 64 bytes, but the actual payload could be less (48 bytes for example if the table uses one secondary key).
Chainbase memory used: 21.01GB
snapshot Aug 11, 2024
Chainbase memory used: 22.42GB
snapshot Sept 29, 2024
Chainbase memory used: 63.12GB
snapshot Dec 16, 2024
Chainbase memory used: 106.15GB