Notes on counts and Merkle.

See #10.
bigeasy · Sep 21, 2020 · 9cccf19
1 parent 39a5e08
Showing 1 changed file with 47 additions and 0 deletions.
diff --git a/diary.md b/diary.md
@@ -1,3 +1,50 @@
+## Sun Sep 20 20:45:19 CDT 2020
+
+Occurs to me that an easy way to do counts or hashes is to just have an
+additional file in the page directory for counts or hashes. It would be a short
+file, easy to read, one per leaf, and you could probably use timestamps on the
+file to determine whether you ought to load the page and recalculate.
+
+That would be a place to start. Currently, we're content that a failure to
+write a leaf file means that the last few appends are lost, and that we'll
+notice and start complaining to the user, instead of failing silently and
+misleading the user into believing that their data was saved when it was not.
+
+If we have two files, a small one containing meta-data, how do we know that
+the smaller file is valid relative to the larger file? If the file summarizes a
+branch, then it summarizes all of that branch's children, so how do we know
+that this summary is correct?
+
+With Amalgamate we are close to having a write-ahead log. Oh, and it occurs to
+me now that there is no good way to merge a count or a Merkle tree, not unless
+we adapt Amalgamate to consider itself committed only once the stage is merged
+into the primary tree, and then to reference only the primary tree in its
+iterators. What, then, is the use of a pre-calculated count, really? It is only
+useful if we are counting by an index. Yes, I suppose that is useful.
+
+Two nascent thoughts, then. First, some sort of count cache that is hashed on
+versions, with some sort of version number for a version set to order that
+cache: a version number that is ever increasing, and then a version-set version
+that is also ever increasing. Now it does seem to make more sense to keep this
+meta-data in an external index, not inside the tree. As for this version-set
+version number, I may as well implement it and see what it enables.
+
+The second nascent thought is just that if the primary tree is large and the
+stages are small, you could query the primary tree for a count first, and then
+recalculate only those primary tree pages that do not fit somehow.
+
+Finally, if we want to have this merge thing, and we want tree properties like
+Merkle hashes and counts, then we need a definitive tree and need to expose the
+tree structure through our clever iterators. If the first key and the last key
+in a primary page resolve to the same index in a stage, then there is nothing in
+the stage to search for, assuming that the stages are small. Oh, but then we're
+storing first and last keys in our little lookup file. Still, this might be a
+useful optimization. It is going to want to be pluggable: let me construct a
+little cupcake for a page when you've added stuff to it.
+
+Of course, each branch page does have a key range, so we could use the branch
+pages of the primary tree to determine these ranges.
+
 ## Sat Aug 22 21:18:25 CDT 2020
 
 git log -n 1 bc45430aedcb1dc35256d321b83009ce28821f2f