How do you choose a sample size for the cache? #6

abdulmi · 2023-01-31T19:29:43Z

Is there a strategy for choosing the sample size for the cache? Based on the example provided, it's the sum of all windows, but I wanted to check whether there is a suggested way to choose the sample size, and what the implications are of choosing a bigger or smaller one.

The parameter I am talking about is the one (samples) passed to the WTinyLFUCache::with_sizes function:

pub fn with_sizes(
    window_cache_size: usize,
    protected_cache_size: usize,
    probationary_cache_size: usize,
    samples: usize
) -> Result<Self, WTinyLFUError>


al8n · 2023-10-18T06:36:52Z

Hi, sorry for the late reply. The right size depends heavily on the content that will be stored, so unfortunately I have no general suggestions here.

vlovich · 2023-10-29T23:45:16Z

I had a similar question, and I think the root of it is that it's unclear how to turn the size and samples parameters into an estimate of total memory usage, and vice versa. More broadly, it's not really clear what those values control. AFAICT:

Number of elements in the LRU: window_cache_size
Number of elements in the segmented cache: probationary_cache_size + protected_cache_size
Number of elements in the CountMinSketch: somewhere between 2 * (window_cache_size + probationary_cache_size + protected_cache_size) and 4 * (window_cache_size + probationary_cache_size + protected_cache_size), if I'm not mistaken, because there are 4 filters, each of length size rounded down to approximately the nearest power of 2.
Number of bits within the bloom filter: derived from samples and the false-positive rate, although the exact relationship isn't immediately clear to me, because the logic isn't trivial, there are no docs to guide me, and I've only skimmed the implementation.

Is that correct? Is there a more elegant way to convert human-friendly requirements (e.g. "I want to use at most 100 MiB of space with a false-positive rate of 1%" and maybe something like "here's the percentage breakdown among the different components within the LFU") into the actual parameters needed?
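
To make the breakdown above concrete, here is a minimal sketch of the arithmetic. It is not the crate's code: `Estimate`, `floor_power_of_two`, and `estimate` are hypothetical names, the sketch sizing follows my reading above (4 rows, each rounded down to a power of two), and the bloom filter size uses the textbook formula m = -n * ln(p) / (ln 2)^2, which may differ from what the implementation actually does.

```rust
/// Rough per-component counts for a W-TinyLFU cache.
/// Hypothetical helper type; nothing here calls into the crate.
#[derive(Debug, PartialEq)]
struct Estimate {
    lru_entries: usize,
    segmented_entries: usize,
    sketch_counters: usize,
    bloom_bits: usize,
}

/// Largest power of two that is <= n (returns 0 for n == 0).
fn floor_power_of_two(n: usize) -> usize {
    if n == 0 {
        0
    } else {
        1usize << (usize::BITS - 1 - n.leading_zeros())
    }
}

/// Estimate component sizes from the `with_sizes` arguments plus a
/// target false-positive rate (0 < fp_rate < 1).
fn estimate(
    window: usize,
    protected: usize,
    probationary: usize,
    samples: usize,
    fp_rate: f64,
) -> Estimate {
    let total = window + protected + probationary;
    // Assumption: 4 sketch rows, each of length `total` rounded down
    // to a power of two, so the result lies between 2x and 4x `total`.
    let sketch_counters = 4 * floor_power_of_two(total);
    // Assumption: textbook optimal Bloom filter sizing.
    let ln2 = std::f64::consts::LN_2;
    let bloom_bits = (-(samples as f64) * fp_rate.ln() / (ln2 * ln2)).ceil() as usize;
    Estimate {
        lru_entries: window,
        segmented_entries: protected + probationary,
        sketch_counters,
        bloom_bits,
    }
}

fn main() {
    let e = estimate(100, 800, 100, 1000, 0.01);
    println!("{:?}", e);
}
```

With 1,000 total entries this gives 2,048 sketch counters and a bloom filter of a little under 10 bits per sample at a 1% false-positive rate. These are counts of entries, counters, and bits, not bytes, so the per-entry key and value size still has to be added on top.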

vlovich · 2023-10-31T17:50:45Z

@al8n can you help provide some color commentary on what the different parameters control & how they relate to memory usage / cache hit ratio? I'm not sure I fully understand how to convert target memory usage requirements into the parameters for the cache.
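
For illustration, this is roughly the kind of conversion I have in mind, written as a sketch under several assumptions. `Sizes` and `sizes_for_budget` are hypothetical and not part of the crate; `entry_bytes` is an average per-entry size the caller has to measure; the percentages are example inputs, not recommendations; `samples` is set to the sum of all windows, as in the example mentioned at the top of this issue; and the overhead of the sketch and bloom filter is ignored.

```rust
/// Hypothetical parameter set mirroring the arguments of `with_sizes`.
#[derive(Debug, PartialEq)]
struct Sizes {
    window: usize,
    protected: usize,
    probationary: usize,
    samples: usize,
}

/// Split a memory budget into the three cache sizes.
/// `window_pct` is the share of all entries given to the window LRU;
/// `protected_pct` is the share of the remaining entries given to the
/// protected segment. Returns None for unusable inputs.
fn sizes_for_budget(
    budget_bytes: usize,
    entry_bytes: usize,
    window_pct: usize,
    protected_pct: usize,
) -> Option<Sizes> {
    if entry_bytes == 0 || window_pct > 100 || protected_pct > 100 {
        return None;
    }
    // Assumption: every entry costs `entry_bytes` on average.
    let total = budget_bytes / entry_bytes;
    let window = total * window_pct / 100;
    let main = total - window;
    let protected = main * protected_pct / 100;
    let probationary = main - protected;
    if window == 0 || protected == 0 || probationary == 0 {
        return None;
    }
    // Assumption: samples equals the sum of all windows.
    Some(Sizes { window, protected, probationary, samples: total })
}

fn main() {
    // 100 MiB budget, 1 KiB per entry, 1% window, 80% of the rest protected.
    let sizes = sizes_for_budget(100 * 1024 * 1024, 1024, 1, 80);
    println!("{:?}", sizes);
}
```

The resulting values could then be passed to WTinyLFUCache::with_sizes in the order window, protected, probationary, samples. Whether that is a sensible way to pick them is exactly what I'm asking about.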
