Investigate whether we could have a faster and leaner chainbase data storage using B-trees instead of a binary tree
Introduction
In-memory B-tree containers can have significant benefits when storing small value_types. See the design doc on Google's B-tree Containers.
However, whether B-tree containers are a good alternative to regular binary trees depends on the size of the values stored. As the Google design doc states:
When value_type is large, fewer values can be stored per node so the benefits of B-tree are lessened.
When value_type is expensive to move, B-tree may become more expensive than STL alternatives because B-tree needs to move values within and between nodes to maintain balance, whereas binary trees can just move pointers instead. std::array<int32_t, 16> is an example of a value_type for which STL sorted containers currently outperform B-trees.
So whether B-tree containers should be considered for Chainbase will depend on the size of the payload stored within tree nodes.
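As a concrete illustration of that trade-off, here is a minimal sketch using Abseil's absl::btree_set as a stand-in for the B-tree containers described in the design doc (the type names are illustrative, not chainbase types):

```cpp
// Sketch of the trade-off above: with a small value_type, many values fit in
// one B-tree node and lookups touch far fewer cache lines than a node-per-value
// binary tree; with a large value_type the per-node fanout collapses and values
// become expensive to shuffle between nodes, so the advantage mostly disappears.
#include <array>
#include <cstdint>
#include <set>
#include "absl/container/btree_set.h"

using small_value = std::uint64_t;                 // many values per 256-byte node
using large_value = std::array<std::int32_t, 16>;  // 64 bytes each: few per node, costly to move

absl::btree_set<small_value> btree_index;   // good fit for B-tree nodes
std::set<large_value>        rbtree_index;  // STL node-based tree may win here (per the quote above)
```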
Storing row values in B-tree nodes
=> Since a B-tree moves values within and between nodes to stay balanced, rows stored inline in one index's nodes could not be stably referenced by the other indices, so this would not be a solution that allows indexing by multiple keys.
However, we may still be able to use B-tree containers whose nodes store pointers to the value_type instead of the value_type itself. I will consider this possibility next.
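A minimal sketch of what such an index could look like, again using absl::btree_set as a placeholder for whichever B-tree implementation would actually be used (row and by_id_compare are illustrative names, not chainbase types):

```cpp
// Hypothetical "pointers in B-tree nodes" index: the tree only holds 8-byte
// pointers (or packed offsets) to rows that live elsewhere, and the comparator
// dereferences them to compare keys.
#include <cstdint>
#include "absl/container/btree_set.h"

struct row {
    std::uint64_t id;
    // ... the rest of the (possibly large) payload stays out of the tree nodes
};

struct by_id_compare {
    bool operator()(const row* a, const row* b) const { return a->id < b->id; }
};

// Many small keys per node, but every comparison now costs an extra pointer
// dereference into the row itself (the caveat discussed further below).
using row_index = absl::btree_set<const row*, by_id_compare>;
```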
Storing row value pointers in B-tree nodes
I have instrumented Chainbase's node_allocator to dump the count of nodes allocated per payload size while loading a snapshot (where the payload size is the actual allocation size minus the 16 bytes used to store the three pointers to the node's children and parent).
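For illustration, the instrumentation amounts to keeping a histogram along these lines (a sketch only, not the actual node_allocator patch):

```cpp
// Histogram of node allocations per payload size, where the payload is the
// requested node size minus the 16 bytes of packed node links.
#include <cstddef>
#include <cstdio>
#include <map>

static std::map<std::size_t, std::size_t> alloc_histogram;  // payload size -> node count

void record_allocation(std::size_t node_size) {
    constexpr std::size_t link_overhead = 16;  // the three packed offset pointers
    alloc_histogram[node_size - link_overhead] += 1;
}

void dump_histogram() {
    for (auto [payload, count] : alloc_histogram)
        std::printf("payload %zu bytes: %zu nodes\n", payload, count);
}
```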
The last section has the results for four snapshots. They show that between August 2024 and now we have seen a huge increase in the number of rows of size 64 and 112 bytes (shown in bold below), these two sizes together representing about 70GB out of the total memory used for node storage (excluding strings/vectors).
If we use the same optimization as in offset_node_base (which packs 3 offset pointers into 16 bytes), we could store 48 pointers per 256-byte node: 48 value pointers on leaf nodes, and 23 value pointers on intermediate nodes (the remaining slots holding child pointers).
If we assume an average of 40 value pointers per node, this would save almost 10 bytes per row (6.4 bytes per value instead of the current 16, though that no longer includes a parent pointer), so across a total of 766 million rows a saving of ~7.6GB (a memory usage reduction of 7.1% for the latest snapshot).
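For reference, the arithmetic behind those numbers (the 256-byte node size and the 40-values-per-node average are the assumptions stated above, not measured values):

```cpp
// Back-of-the-envelope check of the savings estimate above.
constexpr double bytes_per_link  = 16.0 / 3.0;               // packed offset pointers: ~5.33 B each
constexpr int    node_size       = 256;                      // assumed B-tree node size
constexpr int    links_per_node  = int(node_size / bytes_per_link);  // 48 pointers per node
constexpr double bytes_per_value = node_size / 40.0;         // ~6.4 B per row at 40 values/node
constexpr double saved_per_row   = 16.0 - bytes_per_value;   // ~9.6 B vs. the 16 B of node links today
constexpr double total_saved_gb  = 766e6 * saved_per_row / 1e9;  // ~7.4 GB (the ~7.6GB above rounds
                                                                 // the per-row saving up to 10 B)
```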
I expect that this change could also improve performance (both lookups and updates), which in theory could be faster by a factor of 3.5x (log2(11) / log2(2)), but the improvement will likely be smaller in our case (or it could even be slower) because each comparison will require a pointer dereference.
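To spell out where the 3.5x comes from: a balanced binary tree visits about log2(N) nodes per lookup, while a B-tree with effective fanout b visits about log_b(N), so the reduction in nodes visited is roughly

$$\frac{\log_2 N}{\log_b N} = \log_2 b, \qquad \log_2 11 \approx 3.46 \approx 3.5$$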
In any case, further experimentation to compare the currently used binary trees with B-trees holding pointers will be necessary to solidify these assumptions.
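A simple lookup micro-benchmark along these lines could be a starting point (a sketch only; real measurements would have to run against chainbase's allocator and actual row distributions, and cover updates as well as lookups):

```cpp
// Time point lookups against any sorted index, e.g. the currently used
// binary-tree index vs. a B-tree of row pointers; "Index" only needs a
// count(key) member, so std::set and absl::btree_set both work.
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

template <class Index, class Key>
double lookup_seconds(const Index& idx, const std::vector<Key>& probes) {
    auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (const Key& k : probes) hits += idx.count(k);
    auto stop = std::chrono::steady_clock::now();
    std::printf("hits: %zu\n", hits);  // side effect so the loop is not optimized away
    return std::chrono::duration<double>(stop - start).count();
}
```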
Collected data
snapshot Aug 27, 2023
Note: In the tables below, I subtracted 16 bytes (the size of the offset_node_base) from the actual allocated size to compute what I called the payload size. This is not quite right for indices using multiple keys, as the 16-byte cost is incurred for every key. So for example when a row needed 80 bytes I used a payload size of 64 bytes, but the actual payload could be less (48 bytes for example if the table uses one secondary key).
Chainbase memory used: 21.01GB
snapshot Aug 11, 2024
Chainbase memory used: 22.42GB
snapshot Sept 29, 2024
Chainbase memory used: 63.12GB
snapshot Dec 16, 2024
Chainbase memory used: 106.15GB