Implement counted b-trees. #10

bigeasy · 2012-05-17T19:51:54Z

Counted b-trees means that we keep a count for every page, but the user will have to pay for the count, it will be calculated.

When we open a b-tree, we visit all of the leaf pages. All of them. However, we only read from the back of the page to find the the address array so that we can read the address of the next page and the count of records in the file.

Alternatively, we can append this information every time we write a record to the page, or we can append the file position of the last position array written.

You know, that's not a bad idea. As long as we've made the file brittle with file positions, we may as well capitalize on the benefits. Once you've gone and editing a file in Vim, you've pretty much thrown everything off anyway. There will be utilities to tidy a file that you recover in a text editor, essentially a compiler, but once you start poking around with the file, it has become source code. The checksums are off, the file positions are off, etc.

The compiler can play the leaf page as a log and rewrite, reporting failed checksums, but not entirely freaking out and dying, because it's purpose is to rebuild hand edited leaf files. Something that no one should ever do, but possibly a great recovery feature, that you can extract all your data as plain JSON.

bigeasy · 2012-05-18T22:17:37Z

File positions are useless because a position array is the count of records at a point in time. You need to move forward through the file in order to get the actual count.

We need to append the count with every record.

Also, the page will only change when the file is rewritten, so we can probably put that in the first record, so we can load our counts by reading the first and last record of every leaf page in the database.

bigeasy · 2012-05-19T17:43:23Z

We're recording the address of the right sibling leaf page in the position array entry. We need to keep it there. When we load the leaf, we move backward until we find it. We don't want to also have to open the front of the file as well, that becomes to jittery.

We simply write a position array entry at the head of the file, simply to record the right sibling leaf page.

bigeasy · 2012-05-20T19:53:44Z

Our counted b-tree implementation requires that we crack every leaf file when we reopen an index, to get the counts from the file. If we're going to do this, get all the counts, we can take this opportunity to check our balance, see if there are any potential splits or merges that might have been interrupted, or never executed because of a hard shutdown.

bigeasy · 2012-05-20T19:55:54Z

Can we go faster on load if either the sibling is in the file name, or else the file has an extension if it is a leaf? Sibling in filename would make loading difficult, need to pattern match to find the file. Extension means we can load the counts by looking at the end of every file, without having to look at the front of the file to find the next leaf.

We not store a checksum of each line of JSON in branch page file and leaf page files. Until we implement a CRC 32, we're using SHA1, which is available through the `crypto` Node.js module. The checksum is appended as a hexadecimal number after the JSON string on each line of a branch page file or a leaf page file. Closes #9. The test fixtures have been updated to read and write b‑tree file directories containing branch page files and leaf page files with checksummed lines of JSON. Each leaf page entry now includes an entry number. This can be used to verify the integrity of the file, to insure that no entries are missing. The entry number should increase by one with each entry. A break in the series indicates corruption. Closes #25. We added this because we checksum each line of a leaf page file, but we do not checksum the entire leaf page file. We know that each line is correct, but without the entry count, we don't know if any lines or missing, repeated, or out of order. Each insert and delete entry now includes the record count of the leaf page including the effect of the entry. This was added in anticipation of counted our b‑tree implementation described in #10. Closes #26. Each position array entry now includes a leaf page file format version number, because changes to file format are inevitable. Current version number is `1`. Closes #28. A position array entry is always the first entry of a leaf page file. We'll be able to look at the head of the file to find the right sibling leaf page. This is part of a future counted b‑tree strategy of skimming leaf pages at startup to read leaf page record counts. Closes #29. Removed a long chunk of exposition on the editablity of leaf page files. They're not really meant for hacking with `vim`. They can be managed with `git`, viewed with `less`, watched with `tail -f`, and munged using a Node.js program of the adopter's design. We store file positions however, so making an edit in your favorite text editor is likely to throw those file positions off. Part of the value of JSON and plain text is the visibility. Part of the value is that an application developer can restore corrupt files using utilities written in response to a specific instance of corruption, informed by the visibility. The data is still there. You can read through a corrupted file, recovering what can be recovered, inserting it into a new b‑tree. Also, I noticed that I was not adding the dot to the file suffixes when rewriting files in create. I added the dots.

bigeasy · 2013-08-19T18:25:51Z

This is becoming an externalized implementation, through a recipe, just like null keys or duplicate keys. One can use a version number and MVCC to implement counted b-trees in an external tree. I'm not sure it belongs here, or in Memento, but I'll leave it here for now.

bigeasy · 2015-11-26T03:09:18Z

Counting can be implemented by writing the count increase for each leaf page when the leaf page is locked. The count log for a leaf page would increment or decrement the count for each branch page visited during the descent. The count for each page would be kept in memory, in a hash table or in Magazine so it would be updated synchronously.

However, as I consider this I begin to realize that a counted b-tree internally to Strata doesn't matter so much. What good does it do if you're using the MVCC library to create a versioned tree?

As a user, you could use the lock you have on a leaf to update your housekeeping. You could then perform housekeeping by getting a mutator lock on the leaf, then updating the counts on all the pages. But, to really use this, it must be somewhat internal, so you can get a count based on a range of values, for example.

Then how do you version the count? This is a problem for a higher level implementation. The thought I'm trying to capture is that you can use the lock on the leaf to log changes, keep he counts in memory, then have a writer thread log all the branch counts. It manages the compexity in the same way that the balancer is only one to operate on branches.

See #10.

* Use new `Journalist.drain()` in shutdown. See #662. * Implement `Strata.drain()`, Cursor 100% coverage. Closes #662. * Remove ghosts. Closes #580. Closes #660. * Calculate and return heft on insert/remove. Closes #661. * Notes on counts and Merkle. See #10. Closes #663.

ghost assigned bigeasy May 17, 2012

bigeasy mentioned this issue May 24, 2012

Prepend an empty position array entry to the head of a leaf page file. #29

Closed

bigeasy pushed a commit that referenced this issue Sep 21, 2020

Notes on counts and Merkle.

9cccf19

See #10.

bigeasy mentioned this issue Sep 23, 2020

Release B-tree version 2.0.0-alpha.37. #663

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement counted b-trees. #10

Implement counted b-trees. #10

bigeasy commented May 17, 2012

bigeasy commented May 18, 2012

bigeasy commented May 19, 2012

bigeasy commented May 20, 2012

bigeasy commented May 20, 2012

bigeasy commented Aug 19, 2013

bigeasy commented Nov 26, 2015

Implement counted b-trees. #10

Implement counted b-trees. #10

Comments

bigeasy commented May 17, 2012

bigeasy commented May 18, 2012

bigeasy commented May 19, 2012

bigeasy commented May 20, 2012

bigeasy commented May 20, 2012

bigeasy commented Aug 19, 2013

bigeasy commented Nov 26, 2015