Skip to content

Commit

Permalink
Notes on counts and Merkle.
Browse files Browse the repository at this point in the history
See #10.
  • Loading branch information
flatheadmill committed Sep 21, 2020
1 parent 39a5e08 commit 9cccf19
Showing 1 changed file with 47 additions and 0 deletions.
47 changes: 47 additions & 0 deletions diary.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,50 @@
## Sun Sep 20 20:45:19 CDT 2020

Occurs to me that an easy way to do counts or hashes is to just have an
additional file in page directory for counts or hashes. It would be a short file
that would be easy to read, be per leaf, and you could probably use timestamps
on the file to determine if you ought to load the page and recalcuate.

Which would be a place to start. Currently, we're content that a failure to
write a leaf file means that the last few appends are lost, and that we'll
notice and start complaining to user, instead of failing silently and misleading
the user into believing that their data has been saved when it has not.

If we have two files, once that is small containing meta-data, how do we know
that the smaller file is valid relative to the larger file? If the file is a
summary of a branch, then it is a summary of all the children, so how do we know
that this summary is correct?

With Amalgamate we are close to having a write-ahead log, oh, and it occurs to
me now that there is no good way to merge a count or a Merkel tree, not unless
we adapt Amalgamate to only consider itself committed once the stage is merged
into the primary tree, and then to only reference the primary tree in its
iterators. What then is the use, really, of a pre-calculated count? It is only
useful if we are counting by an index. Yes, I suppose that is useful.

Two nascent thoughts, then. Some sort of count cache that is hashed on versions,
some sort of version number for a version set to order that cache, so a version
number that is ever increasing, and then a version set version that is ever
increasing, and now it does seem to make more sense to keep this meta-data in an
external index, not inside the tree. This verison set version number, may as
well implement it and see what it enables.

Second nascent thought is just that if the primary tree is large, and the stages
are small, you could query a count by the primary tree first, and calculate only
those primary tree pages that do not fit somehow.

Finally, if we want to have this merge thing, and we want to have a tree
properties like merkel and count, then we need a definitive tree and need to
expose the three structure through our clever iterators. If the first key in a
primary key page and the last key in a primary page resolve to the same index in
a stage, then there is nothing to search for, and assuming that the stages are
small, oh, but then we're storing first and last keys in our little lookup file,
but, this might be a useful optimization. It is going to want to be pluggable,
let me construct a little cupcake for a page when you've added stuff to it.

Of course, each branch page does have a key range, we could use the branch pages
of the primary to determine these ranges.

## Sat Aug 22 21:18:25 CDT 2020

git log -n 1 bc45430aedcb1dc35256d321b83009ce28821f2f
Expand Down

0 comments on commit 9cccf19

Please sign in to comment.