Limit the cache based on arbitrary measures #38

Open
rphmeier opened this issue Oct 12, 2016 · 4 comments
Comments

rphmeier commented Oct 12, 2016

Motivation:
My use-case for lru-cache involves inspecting items' heap size. Rather than hack it in, I think it would be beneficial to make the cache adaptable to whatever limit needs to be imposed.

You might have a set of heterogeneous items, each using a different amount of allocation on the heap. Maybe a more general Meter trait might help:

trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

The default Meter could be a Count:

struct Count;

impl<T> Meter<T> for Count { 
    fn measure(&self, _: &T) -> usize { 1 }
}

Here's an implementation that would use the HeapSizeOf trait (I'd recommend this one for inclusion under a feature gate, as it's generally useful):

struct HeapSize;

impl<T: HeapSizeOf> Meter<T> for HeapSize {
    fn measure(&self, item: &T) -> usize {
        item.heap_size_of_children() + ::std::mem::size_of::<T>()
    }
}

When you add or remove an item, you add meter.measure(&item) to, or subtract it from, the running total. If the metered total exceeds the maximum while adding an item, other items must be evicted in a loop until the total falls back under the limit.
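The accounting described above might be sketched like this. Everything here is illustrative — the `MeteredCache` type, its fields, and the queue-backed storage are stand-ins, not the crate's actual design:

```rust
use std::collections::VecDeque;

// Hypothetical `Meter` trait from the proposal.
trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

struct Count;
impl<T> Meter<T> for Count {
    fn measure(&self, _: &T) -> usize { 1 }
}

// Toy queue-backed cache; illustrates only the metered insert/evict loop,
// not real LRU key/value bookkeeping.
struct MeteredCache<T, M: Meter<T>> {
    items: VecDeque<T>,
    meter: M,
    total: usize,
    max: usize,
}

impl<T, M: Meter<T>> MeteredCache<T, M> {
    fn new(meter: M, max: usize) -> Self {
        MeteredCache { items: VecDeque::new(), meter, total: 0, max }
    }

    // Returns every item displaced by this insertion.
    fn insert(&mut self, item: T) -> Vec<T> {
        self.total += self.meter.measure(&item);
        self.items.push_back(item);
        let mut evicted = Vec::new();
        // Evict from the least-recently-used end until the total fits again.
        while self.total > self.max {
            let old = self.items.pop_front().expect("limit smaller than one item");
            self.total -= self.meter.measure(&old);
            evicted.push(old);
        }
        evicted
    }
}
```

Returning the evicted items as a `Vec` here sidesteps the "multiple displaced items" question below at the cost of an allocation.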

Caveats:

  • get_mut or interior mutability could allow items to be altered such that their measure at insertion time differs from their measure at removal. We could add a UniformMeter<T> which takes no reference to individual items when measuring -- indicating that the measurement is unrelated to item size. get_mut might only be implemented when the meter is uniform, while fn get(&mut self, key: &Q) -> Option<&T> would serve as an alternative implemented for all caches. This doesn't solve the interior mutability issue, but that can be addressed the same way as in HashMap: documentation specifying that it's a logic error to modify the value such that its measure would be altered.
  • insert can remove multiple items -- how do we return each of them to the caller efficiently? (maybe an iterator?)
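The first caveat's gating idea could look roughly like this (names hypothetical; a single-slot stand-in replaces the real cache so the trait bounds stay visible):

```rust
// Hypothetical `Meter` trait plus a marker for meters whose measurement
// ignores the item itself, so mutation cannot invalidate the recorded measure.
trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

trait UniformMeter<T>: Meter<T> {}

struct Count;
impl<T> Meter<T> for Count {
    fn measure(&self, _: &T) -> usize { 1 }
}
impl<T> UniformMeter<T> for Count {}

// Single-slot stand-in for the cache.
struct Slot<T, M> {
    item: Option<T>,
    meter: M,
}

impl<T, M: Meter<T>> Slot<T, M> {
    // Shared access works for any meter; `&mut self` so a real cache
    // could still update its LRU ordering on access.
    fn get(&mut self) -> Option<&T> {
        self.item.as_ref()
    }
}

impl<T, M: UniformMeter<T>> Slot<T, M> {
    // Mutable access is offered only when the measure cannot change.
    fn get_mut(&mut self) -> Option<&mut T> {
        self.item.as_mut()
    }
}
```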

Alternatively:

  • check the measure before and after get_mut and adjust the total correspondingly. The issue here is that it would need to be done through a CacheHandle<'a, T: 'a> which contains that logic in its destructor. This would break backwards compatibility, and would need to handle returning displaced items if the size grows.
  • Re-measure every item in the cache whenever necessary. This seems prohibitively expensive.
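The first alternative might be sketched as a guard whose destructor re-measures and adjusts the total. All names here are hypothetical, a single-slot `Cache` stands in for the real map, and handling of displaced items on growth is omitted:

```rust
use std::ops::{Deref, DerefMut};

trait Meter<T> {
    fn measure(&self, item: &T) -> usize;
}

struct SizeMeter;
impl Meter<Vec<u8>> for SizeMeter {
    fn measure(&self, item: &Vec<u8>) -> usize { item.len() }
}

// Single-slot stand-in for the cache, tracking a metered total.
struct Cache<T, M: Meter<T>> {
    item: T,
    meter: M,
    total: usize,
}

impl<T, M: Meter<T>> Cache<T, M> {
    fn get_mut(&mut self) -> CacheHandle<'_, T, M> {
        let before = self.meter.measure(&self.item);
        CacheHandle { cache: self, before }
    }
}

struct CacheHandle<'a, T, M: Meter<T>> {
    cache: &'a mut Cache<T, M>,
    before: usize,
}

impl<'a, T, M: Meter<T>> Deref for CacheHandle<'a, T, M> {
    type Target = T;
    fn deref(&self) -> &T { &self.cache.item }
}

impl<'a, T, M: Meter<T>> DerefMut for CacheHandle<'a, T, M> {
    fn deref_mut(&mut self) -> &mut T { &mut self.cache.item }
}

impl<'a, T, M: Meter<T>> Drop for CacheHandle<'a, T, M> {
    // Re-measure on drop and adjust the running total; a real cache would
    // also have to evict displaced items here if the measure grew.
    fn drop(&mut self) {
        let after = self.cache.meter.measure(&self.cache.item);
        self.cache.total = self.cache.total - self.before + after;
    }
}
```

Note that `mem::forget`-ing the handle would skip the adjustment, which is the leaking concern discussed further down.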

Looking forward to your comments and naming suggestions below.

@apasel422 (Contributor)

This is a neat idea. I need to ponder this further, but I think I'm in favor of going the HashMap route and just documenting that changing an item's measure after it's been inserted is a logic error, potentially also providing a method that can be used to update the measure in a controlled way once it's in the map.

@rphmeier (Author)

@apasel422 The situation is a bit different, though, since the measure can be changed through plain old mutability as opposed to just interior mutability like HashMap keys.

I wonder if Meter should be Meter<K, V> and measurement should be done based on both key and value sizes (String as key type comes to mind)
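A key-aware meter might be sketched as follows (the two-parameter trait and the example implementation are both hypothetical):

```rust
// Hypothetical two-parameter meter charging both key and value.
trait Meter<K, V> {
    fn measure(&self, key: &K, value: &V) -> usize;
}

// Example: String keys contribute their byte length alongside the value.
struct StringKeyed;
impl Meter<String, Vec<u8>> for StringKeyed {
    fn measure(&self, key: &String, value: &Vec<u8>) -> usize {
        key.len() + value.len()
    }
}
```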

@apasel422 (Contributor)

I do like the idea of a CacheHandle (modulo naming?), even though it's possible to mem::forget it. A recent change to BinaryHeap is similar and declared leaking OK, since it doesn't affect memory safety. As for backwards compatibility, we could just have a semver bump, or declare a new method on LruCache<K, V, M> that returns CacheHandle instead of &mut V and leave get_mut only on LruCache<K, V> (assuming it defaults to Count).

I was also thinking it could be Meter<K, V>.

Is there a way to avoid storing a duplicate usize with Count, since it's derivable from the underlying LinkedHashMap's len()?

luser commented Dec 1, 2016

I have a patch for this mostly done, with the exception of the get_mut bits.

luser added a commit to luser/lru-cache that referenced this issue Dec 2, 2016
This commit allows creating LruCaches that are limited in size by
something other than a count of entries. It does this by providing a
new trait, `Meter`, where implementations of the trait provide a
`measure` method to do the measurement. The default behavior of
measuring by the count of entries is provided by a `Count` type.

Due to difficulties with mutating entries making it possible to change
an item's measure, the `get_mut` and `iter_mut` methods are provided
only on LruCaches using `Count` for sizing.

A bit of optimization is provided for the default case--to avoid having
to redundantly store the item count the `Meter` trait allows implementations
to provide a type for storing the measurement, and the `Count` implementation
uses `()`. The `CountableMeter` trait allows `LruCache` to work with this
or any `Meter` implementation that uses `usize` for measurement.

Finally an optional feature is added, `heapsize`, which uses the heapsize
crate to provide HeapSizeOf, a measurement based on the heap memory usage
of values stored in the cache.
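The associated-storage trick the commit describes might look roughly like this simplified sketch (the crate's actual `Meter`/`CountableMeter` traits may differ in shape and naming):

```rust
// Each meter picks its own accumulator type, so `Count` can use `()` and
// defer to the underlying map's len() instead of storing a duplicate usize.
trait Meter<K, V> {
    type Measure: Default;
    fn add(&self, acc: &mut Self::Measure, key: &K, value: &V);
    fn sub(&self, acc: &mut Self::Measure, key: &K, value: &V);
}

struct Count;
impl<K, V> Meter<K, V> for Count {
    type Measure = (); // nothing stored; the size is the entry count
    fn add(&self, _: &mut (), _: &K, _: &V) {}
    fn sub(&self, _: &mut (), _: &K, _: &V) {}
}

struct ByteSize;
impl Meter<String, Vec<u8>> for ByteSize {
    type Measure = usize;
    fn add(&self, acc: &mut usize, key: &String, value: &Vec<u8>) {
        *acc += key.len() + value.len();
    }
    fn sub(&self, acc: &mut usize, key: &String, value: &Vec<u8>) {
        *acc -= key.len() + value.len();
    }
}
```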
rillian pushed a commit to rillian/lru-cache that referenced this issue Jul 13, 2017