Limit the cache based on arbitrary measures #38

Open
rphmeier opened this issue Oct 12, 2016 · 4 comments
Comments

rphmeier commented Oct 12, 2016

Motivation:
My use-case for lru-cache involves inspecting items' heap size. Rather than hack it in, I think it would be beneficial to make the cache adaptable to whatever limit needs to be imposed.

You might have a set of heterogeneous items, each using a different amount of allocation on the heap. Maybe a more general Meter trait might help:

trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

The default Meter could be a Count:

struct Count;

impl<T> Meter<T> for Count { 
    fn measure(&self, _: &T) -> usize { 1 }
}

Here's an implementation that would use the HeapSizeOf trait (I'd recommend this one for inclusion under a feature gate, as it's generally useful):

struct HeapSize;

impl<T: HeapSizeOf> Meter<T> for HeapSize {
    fn measure(&self, item: &T) -> usize {
        item.heap_size_of_children() + ::std::mem::size_of::<T>()
    }
}

When you add or remove an item, you add meter.measure(&item) to, or subtract it from, the running total. If the metered total exceeds the maximum while adding an item, other items must be evicted in a loop until the total falls back under the limit.
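The accounting described above might be sketched like this. Everything here is illustrative — the `MeteredCache` type, its fields, and the queue-backed storage are stand-ins, not the crate's actual design:

```rust
use std::collections::VecDeque;

// Hypothetical `Meter` trait from the proposal.
trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

struct Count;
impl<T> Meter<T> for Count {
    fn measure(&self, _: &T) -> usize { 1 }
}

// Toy queue-backed cache; illustrates only the metered insert/evict loop,
// not real LRU key/value bookkeeping.
struct MeteredCache<T, M: Meter<T>> {
    items: VecDeque<T>,
    meter: M,
    total: usize,
    max: usize,
}

impl<T, M: Meter<T>> MeteredCache<T, M> {
    fn new(meter: M, max: usize) -> Self {
        MeteredCache { items: VecDeque::new(), meter, total: 0, max }
    }

    // Returns every item displaced by this insertion.
    fn insert(&mut self, item: T) -> Vec<T> {
        self.total += self.meter.measure(&item);
        self.items.push_back(item);
        let mut evicted = Vec::new();
        // Evict from the least-recently-used end until the total fits again.
        while self.total > self.max {
            let old = self.items.pop_front().expect("limit smaller than one item");
            self.total -= self.meter.measure(&old);
            evicted.push(old);
        }
        evicted
    }
}
```

Returning the evicted items as a `Vec` here sidesteps the "multiple displaced items" question below at the cost of an allocation.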

Caveats:

  • get_mut or interior mutability could allow items to be altered such that their measure at insertion time differs from their measure at removal. We could add a UniformMeter<T> which takes no reference to individual items when measuring -- indicating that the measurement is unrelated to item size. get_mut might only be implemented when the meter is uniform, while fn get(&mut self, key: &Q) -> Option<&T> would serve as an alternative implemented for all caches. This doesn't solve the interior mutability issue, but that can be addressed the same way as in HashMap: documentation specifying that it's a logic error to modify the value such that its measure would be altered.
  • insert can remove multiple items -- how do we return each of them to the caller efficiently? (maybe an iterator?)
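The first caveat's gating idea could look roughly like this (names hypothetical; a single-slot stand-in replaces the real cache so the trait bounds stay visible):

```rust
// Hypothetical `Meter` trait plus a marker for meters whose measurement
// ignores the item itself, so mutation cannot invalidate the recorded measure.
trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

trait UniformMeter<T>: Meter<T> {}

struct Count;
impl<T> Meter<T> for Count {
    fn measure(&self, _: &T) -> usize { 1 }
}
impl<T> UniformMeter<T> for Count {}

// Single-slot stand-in for the cache.
struct Slot<T, M> {
    item: Option<T>,
    meter: M,
}

impl<T, M: Meter<T>> Slot<T, M> {
    // Shared access works for any meter; `&mut self` so a real cache
    // could still update its LRU ordering on access.
    fn get(&mut self) -> Option<&T> {
        self.item.as_ref()
    }
}

impl<T, M: UniformMeter<T>> Slot<T, M> {
    // Mutable access is offered only when the measure cannot change.
    fn get_mut(&mut self) -> Option<&mut T> {
        self.item.as_mut()
    }
}
```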

Alternatively:

  • check the measure before and after get_mut and adjust the total correspondingly. The issue here is that it would need to be done through a CacheHandle<'a, T: 'a> which contains that logic in its destructor. This would break backwards compatibility, and would need to handle returning displaced items if the size grows.
  • Re-measure every item in the cache whenever necessary. This seems prohibitively expensive.
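The first alternative might be sketched as a guard whose destructor re-measures and adjusts the total. All names here are hypothetical, a single-slot `Cache` stands in for the real map, and handling of displaced items on growth is omitted:

```rust
use std::ops::{Deref, DerefMut};

trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

struct SizeMeter;
impl Meter<Vec<u8>> for SizeMeter {
    fn measure(&self, item: &Vec<u8>) -> usize { item.len() }
}

// Single-slot stand-in for the cache, tracking a metered total.
struct Cache<T, M: Meter<T>> {
    item: T,
    meter: M,
    total: usize,
}

impl<T, M: Meter<T>> Cache<T, M> {
    fn get_mut(&mut self) -> CacheHandle<'_, T, M> {
        let before = self.meter.measure(&self.item);
        CacheHandle { cache: self, before }
    }
}

struct CacheHandle<'a, T, M: Meter<T>> {
    cache: &'a mut Cache<T, M>,
    before: usize,
}

impl<'a, T, M: Meter<T>> Deref for CacheHandle<'a, T, M> {
    type Target = T;
    fn deref(&self) -> &T { &self.cache.item }
}

impl<'a, T, M: Meter<T>> DerefMut for CacheHandle<'a, T, M> {
    fn deref_mut(&mut self) -> &mut T { &mut self.cache.item }
}

impl<'a, T, M: Meter<T>> Drop for CacheHandle<'a, T, M> {
    // Re-measure on drop and adjust the running total; a real cache would
    // also have to evict displaced items here if the measure grew.
    fn drop(&mut self) {
        let after = self.cache.meter.measure(&self.cache.item);
        self.cache.total = self.cache.total - self.before + after;
    }
}
```

Note that `mem::forget`-ing the handle would skip the adjustment, which is the leaking concern discussed further down.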

Looking forward to your comments and naming suggestions below.

@apasel422 (Contributor)

This is a neat idea. I need to ponder this further, but I think I'm in favor of going the HashMap route and just documenting that changing an item's measure after it's been inserted is a logic error, potentially also providing a method that can be used to update the measure in a controlled way once it's in the map.

@rphmeier (Author)

@apasel422 The situation is a bit different, though, since the measure can be changed through plain old mutability as opposed to just interior mutability like HashMap keys.

I wonder if Meter should be Meter<K, V> and measurement should be done based on both key and value sizes (String as key type comes to mind)
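A key-aware meter might be sketched as follows (the two-parameter trait and the example implementation are both hypothetical):

```rust
// Hypothetical two-parameter meter charging both key and value.
trait Meter<K, V> {
    fn measure(&self, key: &K, value: &V) -> usize;
}

// Example: String keys contribute their byte length alongside the value.
struct StringKeyed;
impl Meter<String, Vec<u8>> for StringKeyed {
    fn measure(&self, key: &String, value: &Vec<u8>) -> usize {
        key.len() + value.len()
    }
}
```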

@apasel422 (Contributor)

I do like the idea of a CacheHandle (modulo naming?), even though it's possible to mem::forget it. A recent change to BinaryHeap is similar and declared leaking OK, since it doesn't affect memory safety. As for backwards compatibility, we could just have a semver bump, or declare a new method on LruCache<K, V, M> that returns CacheHandle instead of &mut V and leave get_mut only on LruCache<K, V> (assuming it defaults to Count).

I was also thinking it could be Meter<K, V>.

Is there a way to avoid storing a duplicate usize with Count, since it's derivable from the underlying LinkedHashMap's len()?

luser commented Dec 1, 2016

I have a patch for this mostly done, with the exception of the get_mut bits.

luser added a commit to luser/lru-cache that referenced this issue Dec 2, 2016
This commit allows creating LruCaches that are limited in size by
something other than a count of entries. It does this by providing a
new trait, `Meter`, where implementations of the trait provide a
`measure` method to do the measurement. The default behavior of
measuring by the count of entries is provided by a `Count` type.

Due to difficulties with mutating entries making it possible to change
an item's measure, the `get_mut` and `iter_mut` methods are provided
only on LruCaches using `Count` for sizing.

A bit of optimization is provided for the default case--to avoid having
to redundantly store the item count the `Meter` trait allows implementations
to provide a type for storing the measurement, and the `Count` implementation
uses `()`. The `CountableMeter` trait allows `LruCache` to work with this
or any `Meter` implementation that uses `usize` for measurement.

Finally an optional feature is added, `heapsize`, which uses the heapsize
crate to provide HeapSizeOf, a measurement based on the heap memory usage
of values stored in the cache.
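The associated-storage trick the commit describes might look roughly like this simplified sketch (the crate's actual `Meter`/`CountableMeter` traits may differ in shape and naming):

```rust
// Each meter picks its own accumulator type, so `Count` can use `()` and
// defer to the underlying map's len() instead of storing a duplicate usize.
trait Meter<K, V> {
    type Measure: Default;
    fn add(&self, acc: &mut Self::Measure, key: &K, value: &V);
    fn sub(&self, acc: &mut Self::Measure, key: &K, value: &V);
}

struct Count;
impl<K, V> Meter<K, V> for Count {
    type Measure = (); // nothing stored; the size is the entry count
    fn add(&self, _: &mut (), _: &K, _: &V) {}
    fn sub(&self, _: &mut (), _: &K, _: &V) {}
}

struct ByteSize;
impl Meter<String, Vec<u8>> for ByteSize {
    type Measure = usize;
    fn add(&self, acc: &mut usize, key: &String, value: &Vec<u8>) {
        *acc += key.len() + value.len();
    }
    fn sub(&self, acc: &mut usize, key: &String, value: &Vec<u8>) {
        *acc -= key.len() + value.len();
    }
}
```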
rillian pushed a commit to rillian/lru-cache that referenced this issue Jul 13, 2017