-
Notifications
You must be signed in to change notification settings - Fork 0
Hash based memtable implementations
Other than the default memtable implementation using skip lists, users can use other types of memtable implementation, for example HashLinkList and HashSkipList, to speed-up some queries.
As their names imply, HashSkipList organizes data in a hash table with each hash bucket to be a skip list, while HashLinkList organizes data in a hash table with each hash bucket as a sorted single linked list. Both types are built to reduce number of comparisons when doing queries. One good use case is to combine them with PlainTable SST format and store data in RAMFS.
When doing a look-up or inserting a key, target key's prefix is retrieved using Options.prefix_extractor, which is used to find the hash bucket. Inside a hash bucket, all the comparisons are done using whole (internal) keys, just as SkipList based memtable.
Comparison:
Mem Table Type | SkipList | HashSkipList | HashLinkList |
---|---|---|---|
Optimized Use Case | General | Range query within a specific key prefix | Range query within a specific key prefix and there are only a small number of rows for each prefix |
Index type | binary search | hash + binary search | hash + linear search |
Support totally ordered full db scan? | naturally | very costly (copy and sort to create a temporary totally-ordered view) | very costly (copy and sort to create a temporary totally-ordered view) |
Memory Overhead | Average (multiple pointers per entry) | High (Hash Buckets + Skip List Metadata for non-empty buckets + multiple pointers per entry) | Lower (Hash buckets + pointer per entry) |
MemTable Flush | Fast with constant extra memory | Slow with high temporary memory usage | Slow with high temporary memory usage |
The biggest limitation of the hash based memtables is that doing scan across multiple prefixes requires copy and sort, which is very slow and memory costly.
Contents
- RocksDB Wiki
- Overview
- RocksDB FAQ
-
Developer's Guide
- Basic Operations
- Known Issues
- Block-based Table Format
- MANIFEST
- Block Cache
- PlainTable Format
- Bloom Filter
- Hash-Based Memtable
- Prefix seek
- Read-Modify-Write Operator
- Tailing Iterator
- Single Delete
- Time to Live (TTL) Support
- Huge Page TLB Support
- Column Families
- Universal compaction style
- FIFO compaction style
- Write Ahead Log File Format
- WAL Recovery Modes
- EventListener
- Rate Limiter
- RocksDB Options File
- Transactions
- Creating and Ingesting SST files
- Statistics
- Perf Context and IO Stats Context
- Logger
- Tools / Utilities
- Implementation Details
- RocksJava
- Performance
- Misc