Remove default independent sampler jitter but ensure positive variance #888

Open
wants to merge 9 commits into
base: develop
16 changes: 16 additions & 0 deletions tests/unit/models/gpflow/test_sampler.py
@@ -285,6 +285,22 @@ def test_independent_reparametrization_sampler_reset_sampler(qmc: bool, qmc_skip
    npt.assert_array_less(1e-9, tf.abs(samples2 - samples1))


@pytest.mark.parametrize("qmc", [True, False])
@pytest.mark.parametrize("dtype", [tf.float32, tf.float64])
def test_independent_reparametrization_sampler_sample_ensures_positive_variance(
Collaborator:
I am not sure what this test is doing. Does setting the kernel amplitude to 0 make the model variance equal to zero? Should we then check that the model prediction variance is zero, but that the sampler applies the right fix?

Collaborator Author:
Yes, that's right. I've now added an assert that the model variance is zero.

    qmc: bool, dtype: tf.DType
) -> None:
    model = QuadraticMeanAndRBFKernel(kernel_amplitude=tf.constant(0, dtype=dtype))
    sampler = IndependentReparametrizationSampler(100, model, qmc=qmc)
    x = tf.constant([[1.0]], dtype=dtype)
    _, model_var = model.predict(x)
    npt.assert_array_equal(model_var, tf.constant([[0]]))
    variance = tf.math.reduce_variance(sampler.sample(x))  # default jitter
    assert variance > (1e-7 if dtype is tf.float32 else 1e-17)
    variance = tf.math.reduce_variance(sampler.sample(x, jitter=0.0))  # explicit jitter
    assert variance > (1e-7 if dtype is tf.float32 else 1e-17)


@pytest.mark.parametrize("qmc", [True, False])
@pytest.mark.parametrize("sample_size", [0, -2])
def test_batch_reparametrization_sampler_raises_for_invalid_sample_size(
26 changes: 26 additions & 0 deletions tests/unit/utils/test_misc.py
@@ -29,6 +29,7 @@
    LocalizedTag,
    Ok,
    Timer,
    ensure_positive,
    flatten_leading_dims,
    get_value_for_tag,
    jit,
@@ -222,3 +223,28 @@ def test_flatten_leading_dims_invalid_output_dims(output_dims: int) -> None:
    x_old = tf.random.uniform([2, 3, 4, 5])  # [2, 3, 4, 5]
    with pytest.raises(TF_DEBUGGING_ERROR_TYPES):
        flatten_leading_dims(x_old, output_dims=output_dims)


@pytest.mark.parametrize(
    "t, expected",
    [
        (
            tf.constant(0, dtype=tf.float32),
            tf.constant(1e-6, dtype=tf.float32),
        ),
        (
            tf.constant(0, dtype=tf.float64),
            tf.constant(1e-16, dtype=tf.float64),
        ),
        (
            tf.constant([[-1.0, 0.0], [1e-7, 1.0]], dtype=tf.float32),
            tf.constant([[1e-6, 1e-6], [1e-6, 1.0]], dtype=tf.float32),
        ),
        (
            tf.constant([[-1.0, 0.0], [1e-7, 1.0]], dtype=tf.float64),
            tf.constant([[1e-16, 1e-16], [1e-7, 1.0]], dtype=tf.float64),
        ),
    ],
)
def test_ensure_positive(t: TensorType, expected: TensorType) -> None:
    npt.assert_array_equal(ensure_positive(t), expected)
2 changes: 1 addition & 1 deletion tests/util/acquisition/sampler.py
@@ -31,7 +31,7 @@ def sample(self, at: TensorType, *, jitter: float = DEFAULTS.JITTER) -> TensorTy
         """
         :param at: Batches of query points at which to sample the predictive distribution, with
             shape `[..., B, D]`, for batches of size `B` of points of dimension `D`.
-        :param jitter: placeholder
+        :param jitter: unused
         :return: The samples, of shape `[..., S, B, L]`, where `S` is the `sample_size`, `B` the
             number of points per batch, and `L` the dimension of the model's predictive
             distribution.
5 changes: 3 additions & 2 deletions trieste/models/gpflow/sampler.py
@@ -32,6 +32,7 @@
from ...space import EncoderFunction
from ...types import TensorType
from ...utils import DEFAULTS, flatten_leading_dims
from ...utils.misc import ensure_positive
from ..interfaces import (
    ProbabilisticModel,
    ReparametrizationSampler,
@@ -114,7 +115,7 @@ def __init__(
         "at: [N..., 1, D] # IndependentReparametrizationSampler only supports batch sizes of one",
         "return: [N..., S, 1, L]",
     )
-    def sample(self, at: TensorType, *, jitter: float = DEFAULTS.JITTER) -> TensorType:
+    def sample(self, at: TensorType, *, jitter: float = 0.0) -> TensorType:
         """
         Return approximate samples from the `model` specified at :meth:`__init__`. Multiple calls to
         :meth:`sample`, for any given :class:`IndependentReparametrizationSampler` and ``at``, will
@@ -133,7 +134,7 @@ def sample(self, at: TensorType, *, jitter: float = DEFAULTS.JITTER) -> TensorTy
         tf.debugging.assert_greater_equal(jitter, 0.0)

         mean, var = self._model.predict(at[..., None, :, :])  # [..., 1, 1, L], [..., 1, 1, L]
-        var = var + jitter
+        var = ensure_positive(var + jitter)
Collaborator Author:
(note that we could alternatively ignore the jitter argument here, even if it's explicitly provided, if we think that would be better)

Collaborator:
This version might be a bit difficult to read and debug, as we are potentially applying a correction twice (we apply the jitter with the sum, then with ensure_positive we potentially add an offset).

But I'm not sure if there exists a better alternative

Collaborator Author:
One solution to both this comment and the one at the end would be to change the default value to -1, and comment that this magic value doesn't add jitter but ensures that the variance is positive. And then if the user specifies an explicit non-negative jitter we can use that unmodified?

(Engineering-wise it would be nicer to make jitter an Optional[float] but that would necessitate changing the interface and modifying the other samplers too.)
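
To make the `Optional[float]` idea concrete, here is a minimal sketch of the semantics under discussion. It is not the trieste interface: `resolve_variance` and `FLOOR` are hypothetical names, plain Python floats stand in for tensors, and the floor simply reuses the float64 threshold from `ensure_positive` in this PR.

```python
from typing import List, Optional, Sequence

# Hypothetical floor, reusing the float64 threshold from this PR's ensure_positive.
FLOOR = 1e-16


def resolve_variance(var: Sequence[float], jitter: Optional[float] = None) -> List[float]:
    """Illustrative only: how an optional jitter could be interpreted.

    jitter=None   -> add nothing, just clamp the variance to a small positive floor.
    jitter >= 0.0 -> add the user's jitter unmodified, with no extra correction.
    """
    if jitter is None:
        return [max(v, FLOOR) for v in var]
    if jitter < 0.0:
        raise ValueError("jitter must be non-negative")
    return [v + jitter for v in var]


# Default behaviour: zero or slightly negative variances are lifted to the floor.
default_result = resolve_variance([0.0, -1e-20, 2.0])

# Explicit jitter: applied as given, so only one correction is ever in play.
explicit_result = resolve_variance([0.0, 1.0], jitter=0.5)
```

With this shape, the "correction applied twice" concern above does not arise, since each call takes exactly one of the two branches.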

Collaborator:
I would explicitly ignore the jitter here and add to the docstrings that it is ignored. Perhaps let's also do it properly and change it to be optional.

Collaborator:
here I think there should be no reason for the user to want a different jitter, right @vpicheny ?

Collaborator:
can you also please check GPflux and keras samplers?
in Keras (https://github.com/secondmind-labs/trieste/blob/25d2a038fc1a74485337afac4fa45f29a4c4a311/trieste/models/keras/sampler.py#L171C59-L171C67), we have the same use-case and we should use the new ensure_positive function there as well

Collaborator:
I think the main case for the jitter here is when the sampling is used with an acquisition function, possibly using sqrt(var) or log(var) or cdf(mean, var), that would fail if it is numerically zero but negative.

Otherwise we would probably just want to avoid any offset that would get in the way, e.g. say the output is not rescaled and has very very small values so adding 1e-6 would change everything.

We could leave this logic to the acquisition function, or just ensure here that we are "just positive".
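
A small scalar sketch of the two effects described above, assuming plain Python floats in place of tensors. The local `ensure_positive` is a stand-in with the float64 threshold from this PR, not the library function, and the example values are made up. Note that `math.sqrt` raises on a negative input, whereas TensorFlow ops would be expected to return NaN instead.

```python
import math


def ensure_positive(x: float, floor: float = 1e-16) -> float:
    """Scalar stand-in for the tensor helper in this PR (float64 threshold)."""
    return max(x, floor)


# A predictive variance that is numerically zero but slightly negative.
var = -1e-18

try:
    math.sqrt(var)
    sqrt_failed = False
except ValueError:  # math.sqrt rejects negative inputs
    sqrt_failed = True

# Clamping first makes the square root well defined.
std = math.sqrt(ensure_positive(var))

# An additive 1e-6 jitter swamps a genuinely small variance ...
small_var = 1e-8
jitter_rel_change = ((small_var + 1e-6) - small_var) / small_var

# ... whereas clamping leaves any value above the floor untouched.
clamp_rel_change = (ensure_positive(small_var) - small_var) / small_var
```

Here the additive jitter changes the small variance by roughly a factor of 100, while the clamp changes it by nothing, which is the "just positive" behaviour argued for above.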

Collaborator Author:
@vpicheny so is your suggestion to ignore the jitter in IndependentReparametrizationSampler but still call ensure_positive?

@hstojic similarly, are you proposing to call ensure_positive in deep_ensemble_trajectory rather than adding DEFAULTS.JITTER?


        def sample_eps() -> tf.Tensor:
            self._initialized.assign(True)
6 changes: 6 additions & 0 deletions trieste/utils/misc.py
@@ -462,3 +462,9 @@ def _flatten_module( # type: ignore[no-untyped-def]
        for subvalue in subvalues:
            # Predicate is already tested for these values.
            yield subvalue


def ensure_positive(x: TensorType) -> TensorType:
    """Ensure that all the elements in `x` are strictly positive (using a dtype-dependent
    capping threshold)."""
    return tf.math.maximum(x, 1e-6 if x.dtype == tf.float32 else 1e-16)
Collaborator:
naive question, is 1e-6 the lowest we can have with single precision?

Collaborator Author:
Not at all. This was just based on scaling up the suggested value of 1e-16 for float64. Both numbers can go significantly smaller if we want: float32 can go down to around 1e-38 and float64 to around 2e-308. Do you have any intuition for how small we should make these?
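
For reference, the limits quoted above can be checked with the standard library alone. This is only a sketch: `to_float32` is a hypothetical helper that rounds through `struct`, and the candidate thresholds are just the values mentioned in this thread.

```python
import struct
import sys

# Smallest positive normal numbers for each precision.
float64_min_normal = sys.float_info.min  # about 2.2e-308
float32_min_normal = 2.0 ** -126         # about 1.2e-38


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Every threshold discussed here is still strictly positive in float32.
candidates = (1e-6, 1e-16, 1e-32)
survives_float32 = {c: to_float32(c) > 0.0 for c in candidates}

# A value far below the float32 range rounds to zero.
underflows = to_float32(1e-46) == 0.0
```

So any of the candidate thresholds is representable in single precision; the choice between them is about downstream behaviour rather than representability.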

Collaborator:
I think we may be fine with the smallest number for each precision that makes it positive, though it may depend on the downstream usage. At the moment we are just taking the sqrt and doing some multiplication, which will take it to 0, but in this use case that should be fine, I think? The eps contribution would be removed in these cases, but I'm not sure that's relevant.

Collaborator:
I agree, I would probably vote for a very small value in both cases. 1e-6 is way too high.

And maybe we do not need to differentiate between single and double precision? Both could be e.g. 1e-32 or something
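
To illustrate the point about the eps contribution being removed, here is a float64 sketch of a reparametrized draw `mean + sqrt(var) * eps` with the variance set to each candidate floor. `reparam_sample` and `EPS_DRAW` are hypothetical names, and the fixed draw of 0.5 and the means of 1.0 and 0.0 are arbitrary illustrative choices.

```python
import math
import sys

EPS_DRAW = 0.5  # one fixed "standard normal" draw, chosen for illustration


def reparam_sample(mean: float, var: float, eps: float = EPS_DRAW) -> float:
    """Scalar version of a reparametrized sample: mean + sqrt(var) * eps."""
    return mean + math.sqrt(var) * eps


# With the current float64 floor the perturbation is still visible next to a mean of 1.
kept_at_1e16 = reparam_sample(1.0, 1e-16) != 1.0

# With much smaller floors it is absorbed by rounding, so every sample equals the mean.
absorbed_at_1e32 = reparam_sample(1.0, 1e-32) == 1.0
absorbed_at_min = reparam_sample(1.0, sys.float_info.min) == 1.0

# The effect depends on the scale of the mean: around zero the perturbation survives.
visible_at_zero_mean = reparam_sample(0.0, 1e-32) != 0.0
```

Under these assumptions, a floor such as 1e-32 keeps `sqrt` and `log` well defined but yields identical samples whenever the mean is of order one, which matters only if downstream code relies on a non-zero sample variance (as the new sampler test in this PR does).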
