Fix ValueSupport to allow non-integer discrete support #941

richardreeve · 2019-07-26T01:36:34Z

I've made this PR to provide a better way of handling non-integer discrete support, which Distributions doesn't currently offer. There are cleverer things that could be done, but this seems to work and, so far, covers all of the functionality I've seen people asking for.

- ~~New ValueSupport type hierarchy, allowing non-integer discrete support through CountableSupport <: ValueSupport~~ [now in #945, New type hierarchy for ValueSupport]
- Default to pmf() and logpmf() for distributions with countable support, since they have mass rather than density, but allow fallback to [log]pdf() for compatibility.
- Incorporate #916 (RFC: Allow DiscreteNonParametric to have non-Real support), allowing DiscreteNonParametric to have non-integer discrete support and addressing #887 (DiscreteNonParametric should be <: ContinuousDistribution).
- Preparatory work for #925 (Handling discontinuous density functions), to allow discontinuous densities.
- Incorporate the Dirac distribution from #861 (Implement Dirac distribution), updated to the new ValueSupport.
- Add a Slab-and-Spike (SpikeSlab) distribution for TuringLang/Turing.jl#847 (Add Slab-and-Spike prior).
- Add a general mixture of continuous and discrete distributions (CompoundDistribution), addressing #332 (MixtureModel with Continuous and Discrete).
- Implement general zero-inflated (ZeroInflated) and hurdle (Hurdle) distributions as special cases of CompoundDistribution.
- Add tests.
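Both ZeroInflated and Hurdle can be read as a two-component mixture: a point mass at zero plus a count distribution. The sketch below uses the standard textbook definitions with a Poisson base. It is not the PR's CompoundDistribution code, and `poisson_pmf`, `zeroinflated_pmf` and `hurdle_pmf` are hypothetical helper names.

```julia
# Hypothetical helpers for illustration; these are not Distributions.jl functions.

# Poisson(λ) probability mass at k, built up iteratively to avoid factorial overflow.
function poisson_pmf(λ::Real, k::Integer)
    k < 0 && return 0.0
    p = exp(-λ)
    for i in 1:k
        p *= λ / i
    end
    return p
end

# Zero-inflated: with probability w emit a structural zero (a point mass at 0),
# otherwise draw from the base Poisson, which can also produce zeros.
zeroinflated_pmf(w::Real, λ::Real, k::Integer) =
    (k == 0 ? w : 0.0) + (1 - w) * poisson_pmf(λ, k)

# Hurdle: every zero comes from the point mass (probability w); positive counts
# come from the base Poisson truncated to k >= 1 and renormalised.
function hurdle_pmf(w::Real, λ::Real, k::Integer)
    k < 0 && return 0.0
    k == 0 && return float(w)
    return (1 - w) * poisson_pmf(λ, k) / (1 - poisson_pmf(λ, 0))
end
```

The difference shows up at zero: in the zero-inflated model P(0) is w plus the base model's own zeros, while in the hurdle model P(0) is exactly w.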

This PR is now split: the ValueSupport changes are in #945, and the new distributions that use the new type hierarchy are here. This PR still contains both, but the important part is the latter; the former will drop out once #945 is merged.


DilumAluthge · 2019-07-26T02:30:24Z

I like it

codecov-io · 2019-07-26T03:05:30Z

Codecov Report

Merging #941 into master will decrease coverage by 1.27%.
The diff coverage is 54.32%.

```diff
@@            Coverage Diff             @@
##           master     #941      +/-   ##
==========================================
- Coverage   77.31%   76.03%   -1.28%
==========================================
  Files         112      114       +2
  Lines        5369     5508     +139
==========================================
+ Hits         4151     4188      +37
- Misses       1218     1320     +102
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/multivariate/mvtdist.jl | `58.69% <ø> (-0.45%)` | ⬇️ |
| src/multivariate/mvlognormal.jl | `96.77% <ø> (-0.06%)` | ⬇️ |
| src/multivariate/mvnormal.jl | `71.08% <ø> (-0.18%)` | ⬇️ |
| src/multivariate/mvnormalcanon.jl | `80.43% <ø> (-0.42%)` | ⬇️ |
| src/multivariate/dirichletmultinomial.jl | `98.55% <ø> (ø)` | ⬆️ |
| src/multivariate/dirichlet.jl | `59.35% <ø> (-0.22%)` | ⬇️ |
| src/Distributions.jl | `100% <ø> (ø)` | ⬆️ |
| src/univariate/discrete/poisson.jl | `62.5% <0%> (ø)` | ⬆️ |
| src/univariate/discrete/binomial.jl | `73.68% <0%> (ø)` | ⬆️ |
| src/univariate/discrete/geometric.jl | `79.68% <0%> (ø)` | ⬆️ |
| ... and 26 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update a15c8ba...45809a4.


matbesancon · 2019-07-30T13:31:51Z

@richardreeve could you merge master into your branch? Very sorry for the mess, it's a big PR that just got merged (but boilerplaty so the diff should be trivial)

richardreeve · 2019-07-30T21:16:13Z

Done. Any comments welcome by the way @matbesancon and others! I need to put in some tests for the new distributions, but apart from that I’m pretty happy with this and I think it does what I need, and I think @DilumAluthge is also okay with it...

src/common.jl

matbesancon · 2019-07-30T21:22:24Z

src/univariate/continuous/normal.jl

```diff
@@ -169,11 +169,11 @@ invlogccdf(d::Normal, lp::Real) = xval(d, -norminvlogcdf(lp))
 function quantile(d::Normal, p::Real)
```


Wait, what? Was the code wrong before? If so, could you add a test that this fixes?

The code was previously giving the right answer by accident: the final line of the function (the xval() call) also handled these edge cases correctly on its own. I just corrected the earlier branches, which weren't returning anything. So two mistakes cancelled each other out, and there's no wrong behaviour to test.
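The "missing return" situation described here is a general Julia behaviour: a function returns the value of its last expression, so an `if` block whose value is neither returned nor last is computed and then discarded. A minimal sketch, using a hypothetical `general_formula` (the quantile of Uniform(0, 1)) that, like the xval() line, already copes with the edge cases; this is not the actual Normal quantile code:

```julia
# Hypothetical stand-in for the general formula; it already returns the right
# answer at p = 0 and p = 1, so the edge-case branches are redundant.
general_formula(p::Real) = float(p)

# Before: the branch values are computed and then discarded, because there is
# no `return` and the `if` is not the last expression in the function.
function quantile_before(p::Real)
    if p == 0
        0.0
    elseif p == 1
        1.0
    end
    general_formula(p)   # the last expression is the return value
end

# After: the edge cases return explicitly.
function quantile_after(p::Real)
    if p == 0
        return 0.0
    elseif p == 1
        return 1.0
    end
    return general_formula(p)
end
```

Both versions agree on every input, which is why the fix changes no observable behaviour.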

matbesancon · 2019-07-30T21:23:58Z

src/univariate/discrete/bernoulli.jl

```diff
-pdf(d::Bernoulli, x::Bool) = x ? succprob(d) : failprob(d)
-pdf(d::Bernoulli, x::Int) = x == 0 ? failprob(d) :
+pmf(d::Bernoulli, x::Bool) = x ? succprob(d) : failprob(d)
+pmf(d::Bernoulli, x::Int) = x == 0 ? failprob(d) :
```


No need to add two methods or to hard-code Int:
Bool <: Integer, and true == T(1) for any T <: Integer.
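The single-method suggestion can be sketched with a hypothetical `ToyBernoulli` standing in for Distributions' Bernoulli (this is not the library's code). Because `Bool <: Integer` and `true == 1`, one `x::Integer` method covers `Bool`, `Int`, `Int8`, `UInt` and so on:

```julia
# Hypothetical stand-in for Distributions.Bernoulli, for illustration only.
struct ToyBernoulli
    p::Float64
end
succprob(d::ToyBernoulli) = d.p
failprob(d::ToyBernoulli) = 1 - d.p

# One method for every Integer subtype, Bool included; values outside {0, 1}
# have zero mass.
pmf(d::ToyBernoulli, x::Integer) =
    x == 1 ? succprob(d) : x == 0 ? failprob(d) : zero(d.p)
```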

Again, this isn't my code - all I've done is searched and replaced pdf with pmf for discrete distributions. I'm sure there are lots of problems with the existing code, but I felt that I was already being ambitious enough!

matbesancon · 2019-07-30T21:25:28Z

OK, before going any further: this PR is huge and should probably be split into several. There should be one for the type hierarchy, then we'll see about the rest.

richardreeve · 2019-07-30T22:00:28Z

I did wonder about that, but it's worth pointing out that what you're spotting up there are not my introductions. Although the changes touch a lot of files, it's mostly just search and replace of pdf for pmf...

richardreeve · 2019-07-30T22:38:25Z

It's also true that the Dirac work has been sitting around as #861 for a while and needed updating to this new hierarchy, and the same is true in #916 for the DiscreteNonParametric and Categorical distributions - I've just merged and updated these, so they have been largely reviewed already. Also, the type hierarchy is all in common.jl (except for exports in Distributions.jl and the doc updates), so it can be reviewed almost entirely in isolation. However, I can split it out if you really think it will be better as I obviously want this to be merged.

matbesancon · 2019-07-31T14:30:31Z

@richardreeve maybe wait for #945 before continuing this? Otherwise you risk running into merge conflicts

richardreeve · 2019-07-31T14:47:52Z

Absolutely. I'm just keeping the two synced by merging in rr/countable and checking it still works...


richardreeve · 2019-08-03T23:55:56Z

Keeping up to date with #945, so we now have ContiguousSupport as a subtype of CountableSupport.
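As a rough picture of the hierarchy being described, here is a sketch under stated assumptions. ValueSupport, ContinuousSupport, CountableSupport, ContiguousSupport and UnionSupport are the names used in this PR's commits, but the definitions below are guesses, not the code in #945. `ArbitraryCountableSupport`, `support_eltype` and `naturalmeasure` are hypothetical names.

```julia
# ValueSupport parameterised by the eltype of its support.
abstract type ValueSupport{T} end

# Support on (a subset of) the reals, e.g. Normal.
struct ContinuousSupport{T<:Real} <: ValueSupport{T} end

# Any countable support: integers, but also strings, symbols, ...
abstract type CountableSupport{T} <: ValueSupport{T} end

# Support over contiguous integers a:b, as for most discrete families.
struct ContiguousSupport{T<:Integer} <: CountableSupport{T} end

# Hypothetical catch-all for other countable sets (e.g. DiscreteNonParametric over strings).
struct ArbitraryCountableSupport{T} <: CountableSupport{T} end

# Mixed support built from two components with a common eltype.
struct UnionSupport{T, A<:ValueSupport{T}, B<:ValueSupport{T}} <: ValueSupport{T} end

# The element type is recoverable from the type parameter.
support_eltype(::Type{<:ValueSupport{T}}) where {T} = T

# Which kind of "density" a support naturally carries.
naturalmeasure(::Type{<:CountableSupport}) = :mass
naturalmeasure(::Type{<:ContinuousSupport}) = :density
naturalmeasure(::Type{<:UnionSupport}) = :mixed
```

Making CountableSupport abstract is what lets ContiguousSupport be a subtype of it, since Julia only allows subtyping abstract types.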

richardreeve · 2019-08-04T12:02:48Z

This PR is now split: the ValueSupport changes are in #945, and the new distributions that use the new type hierarchy are here. This PR still contains both, but the important part is the latter; the former will drop out once #945 is merged.

cscherrer · 2019-08-04T13:40:18Z

Can you give some more detail of the benefit of splitting pdf/pmf? Formally, a pmf is still a pdf over counting measure. Breaking them apart forces everything to consider two separate cases without any obvious benefit. This can be a pain point for PPL implementation in particular.

I can see a benefit if you're describing something that's singular over the base measure like a sample from a Dirichlet process, but I don't see that addressed here.

richardreeve · 2019-08-04T13:55:09Z

Once you start having arbitrary mixtures of distributions, that includes mixtures of continuous and discrete distributions. If you do that, you get infinite spikes in the (pdf) density that represent discrete probability masses (in the pmf). However, in the pdf there is no way of quantifying the size of the spikes (as far as I'm aware).

For cumulative densities this isn't a problem, but I don't see a way around having two functions for the density and mass functions if you want to generally be able to describe the density and the mass of distributions with discontinuous cdfs but continuous support (like spike-and-slab and hurdle distributions).

cscherrer · 2019-08-04T14:24:35Z

> Once you start having arbitrary mixtures of distributions, that includes mixtures of continuous and discrete distributions. If you do that, you get infinite spikes in the (pdf) density that represent discrete probability masses (in the pmf). However, in the pdf there is no way of quantifying the size of the spikes (as far as I'm aware).

Ok, so this addresses the problem of singularities. That makes sense. Say you have a mixture like this, and d is in the support of the discrete component, while c is outside it. Does your implementation return

```julia
pdf(mix, d) == Inf
pdf(mix, c) == continuousWeight * pdf(continuousComponent, c)
pmf(mix, d) == discreteWeight * pmf(discreteComponent, d)
pmf(mix, c) == 0
```

? Is that what it "should" return, or am I thinking about it wrong?

> For cumulative densities this isn't a problem, but I don't see a way around having two functions for the density and mass functions if you want to generally be able to describe the density and the mass of distributions with discontinuous cdfs but continuous support (like spike-and-slab and hurdle distributions).

Me neither. I hadn't considered carefully representing mixed continuous/discrete distributions as even being a possibility, but it would be pretty great. It seems like it would also allow things like MvNormals with positive semidefinite covariance, which could help for working with low-rank models.

Have you seen such general representation of distributions in other systems? It would be helpful to know what trouble people have had to be sure we're learning from their mistakes.

richardreeve · 2019-08-04T14:59:59Z

> Ok, so this addresses the problem of singularities. That makes sense. Say you have a mixture like this, and d is in the support of the discrete component, while c is outside it. Does your implementation return
> ```julia
> pdf(mix, d) == Inf
> pdf(mix, c) == continuousWeight * pdf(continuousComponent, c)
> pmf(mix, d) == discreteWeight * pmf(discreteComponent, d)
> pmf(mix, c) == 0
> ```
> ? Is that what it "should" return, or am I thinking about it wrong?

Yes, that's exactly what I was thinking of! SpikeSlab and CompoundDistribution in this PR implement this in fact, though I haven't put any testing in yet to check as I got distracted by creating #945.
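The four expected results can be reproduced with a toy spike-and-slab: a point mass at `spike` with weight `w`, plus a Normal slab with weight `1 - w`. This is a minimal sketch assuming those semantics; `ToySpikeSlab` and `normal_pdf` are hypothetical names, and this is not the SpikeSlab type from this PR.

```julia
# Hypothetical spike-and-slab, for illustration only.
struct ToySpikeSlab
    w::Float64      # weight on the discrete spike
    spike::Float64  # location of the point mass
    μ::Float64      # slab mean
    σ::Float64      # slab standard deviation
end

normal_pdf(x::Real, μ::Real, σ::Real) = exp(-((x - μ) / σ)^2 / 2) / (σ * sqrt(2π))

# pmf: only the discrete component carries mass.
pmf(d::ToySpikeSlab, x::Real) = x == d.spike ? d.w : 0.0

# pdf with respect to Lebesgue measure: infinite at the atom (when it has
# positive weight), otherwise the weighted slab density.
pdf(d::ToySpikeSlab, x::Real) =
    (x == d.spike && d.w > 0) ? Inf : (1 - d.w) * normal_pdf(x, d.μ, d.σ)
```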

> For cumulative densities this isn't a problem, but I don't see a way around having two functions for the density and mass functions if you want to generally be able to describe the density and the mass of distributions with discontinuous cdfs but continuous support (like spike-and-slab and hurdle distributions).

> Me neither. I hadn't considered carefully representing mixed continuous/discrete distributions as even being a possibility, but it would be pretty great. It seems like it would also allow things like MvNormals with positive semidefinite covariance, which could help for working with low-rank models.

Great, I hadn't thought of that. Obviously it's important that this extension to the type hierarchy is as extensible as possible, though it doesn't necessarily have to handle everything on day one. At the moment, I'm not at all confident that the UnionSupport type I suggest is the right way to go, so I'm open to other suggestions.

> Have you seen such general representation of distributions in other systems? It would be helpful to know what trouble people have had to be sure we're learning from their mistakes.

Unfortunately not. It's only having the ability to work on Distributions.jl in Julia directly that gave me the idea that I might be able to do this at all...

cscherrer · 2019-08-04T15:36:07Z

I guess to be really rigorous about it we should be working (eventually) in terms of the refinement to Lebesgue's decomposition.
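For reference, this is the standard textbook form of the refined Lebesgue decomposition for a σ-finite (e.g. probability) measure μ on the real line, with λ the Lebesgue measure. It is not taken from the PR; it only shows where pdf and pmf would each fit.

$$
\mu = \mu_{\mathrm{ac}} + \mu_{\mathrm{pp}} + \mu_{\mathrm{sc}},
\qquad
\mu_{\mathrm{ac}}(A) = \int_A f \, d\lambda,
\qquad
\mu_{\mathrm{pp}}(A) = \sum_{i:\, x_i \in A} p_i
$$

Here $f = d\mu_{\mathrm{ac}}/d\lambda$ is the density (the pdf's job), $p_i = \mu(\{x_i\})$ are the atoms (the pmf's job, i.e. a density with respect to counting measure), and $\mu_{\mathrm{sc}}$ is the singular continuous part: it has no atoms but is concentrated on a Lebesgue-null set, as with the Cantor distribution. Spike-and-slab, zero-inflated and hurdle distributions all have $\mu_{\mathrm{sc}} = 0$.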

richardreeve · 2019-08-04T15:43:23Z

I may have to take your word for that!

richardreeve · 2019-08-04T15:43:53Z

In the meanwhile, it would be good to know what you think of this first step...

cscherrer · 2019-08-04T16:17:01Z

I think it's definitely a step in the right direction. And if it's really non-breaking, I say go for it. But I'm having trouble seeing how this can be non-breaking. If we have mixed discrete/continuous distributions, then

```julia
pdf(mix, d) == Inf
pdf(mix, c) == continuousWeight * pdf(continuousComponent, c)
pmf(mix, d) == discreteWeight * pmf(discreteComponent, d)
pmf(mix, c) == 0
```

seems sensible. But say we have some simple example of this, like mix is a mixture of a standard Normal() and a Bernoulli(0.5). Then we'd have

```julia
pdf(mix, 0.0) == Inf
pdf(Bernoulli(0.5), 0) == 0.5
```

This would seem to imply that pdf(Normal(),0.0) == Inf, which of course isn't right. The problem is that we're implicitly switching base measures.
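One way to see the base-measure issue in numbers is to keep the two parts of this mixture apart, each with respect to its own base measure. The sketch assumes equal 0.5/0.5 weights, which the comment does not specify, and `stdnormal_pdf`, `bernoulli_pmf` and `mixture_parts` are hypothetical helpers rather than Distributions.jl functions.

```julia
stdnormal_pdf(x::Real) = exp(-x^2 / 2) / sqrt(2π)
bernoulli_pmf(p::Real, x::Real) = x == 1 ? p : x == 0 ? 1 - p : zero(p)

# Assumed weights: wcont on Normal(), wdisc on Bernoulli(p).
function mixture_parts(x::Real; wcont = 0.5, wdisc = 0.5, p = 0.5)
    return (density = wcont * stdnormal_pdf(x),   # w.r.t. Lebesgue measure
            mass = wdisc * bernoulli_pmf(p, x))   # w.r.t. counting measure
end
```

At `x = 0.0` the density part stays finite (half of `pdf(Normal(), 0.0)`, about 0.2), while the mass part is 0.25. The `Inf` only appears if the atom is forced into a Lebesgue density, which is the implicit switch of base measures.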

richardreeve · 2019-08-04T16:45:34Z

You're right, but it's non-breaking in the sense that everything behaves exactly as it did before except for new distributions, which obviously can't be breaking however you code them. And up till now it hasn't been possible to make these mixed distributions at all, so I'm safe...

My feeling is that we ought to move to a system where pdf and pmf are distinct, though, and so the `pdf(d::CountableDistribution, x) = pmf(d, x)` compatibility fallback should be dropped, for exactly the reason you bring up.

The only other option I can think of would be a third (mostly optional) argument to pdf that lets you choose the type of density to return. By default it would return the appropriate density:

```julia
pdf(d::CountableDistribution, x) = pdf(d, x, Mass())
pdf(d::ContinuousDistribution, x) = pdf(d, x, Density())
pdf(::ContinuousDistribution, _, ::Mass) = 0.0
pdf(d::CountableDistribution, x, ::Density) = insupport(d, x) ? Inf : 0.0
```

But this would not be defined for mixed distributions, so you'd have to select the appropriate density type explicitly. I don't have any feeling for whether that's a good idea, but it would allow us to maintain backward compatibility indefinitely...
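The proposal can be made concrete with a self-contained toy. This is a sketch under assumptions: `Mass` and `Density` are the names from the comment, while `DensityKind`, `ToyDist`, `ToyCountable`, `ToyContinuous`, `ToyPoint`, `ToyExponential` and `ToyMix` are hypothetical stand-ins, not Distributions.jl types. The mixture deliberately gets no two-argument `pdf`, so callers must pick `Mass()` or `Density()` explicitly.

```julia
abstract type DensityKind end
struct Mass <: DensityKind end
struct Density <: DensityKind end

abstract type ToyDist end
abstract type ToyCountable <: ToyDist end
abstract type ToyContinuous <: ToyDist end

struct ToyPoint <: ToyCountable      # point mass (Dirac) at `at`
    at::Float64
end
struct ToyExponential <: ToyContinuous
    rate::Float64
end

insupport(d::ToyPoint, x) = x == d.at
insupport(d::ToyExponential, x) = x >= 0

# Default kind per family, as in the proposal.
pdf(d::ToyCountable, x) = pdf(d, x, Mass())
pdf(d::ToyContinuous, x) = pdf(d, x, Density())

# Cross cases from the proposal.
pdf(::ToyContinuous, _, ::Mass) = 0.0
pdf(d::ToyCountable, x, ::Density) = insupport(d, x) ? Inf : 0.0

# Family-specific cases.
pdf(d::ToyPoint, x, ::Mass) = insupport(d, x) ? 1.0 : 0.0
pdf(d::ToyExponential, x, ::Density) = insupport(d, x) ? d.rate * exp(-d.rate * x) : 0.0

# Mixed distribution: only the explicit three-argument form exists.
# Assumes 0 < w < 1 (w == 0 would give 0 * Inf == NaN at the atom).
struct ToyMix <: ToyDist
    w::Float64
    point::ToyPoint
    cont::ToyExponential
end
pdf(m::ToyMix, x, k::DensityKind) = m.w * pdf(m.point, x, k) + (1 - m.w) * pdf(m.cont, x, k)
```

Calling `pdf(m, x)` on a `ToyMix` raises a `MethodError`, which matches the point that mixed distributions would need the density type chosen explicitly, while existing two-argument calls on purely discrete or continuous distributions keep working.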

aplavin and others added 9 commits May 4, 2019 11:28

Implement Dirac distribution

7f354a2

Merge branch 'master' into patch-1

5dc860d

Float return values instead of Int

d6503c4

Remove the restriction that the support values of a DiscreteNonParametric be Real

d438ccc

Merge branch 'da/discrete-non-parametric' of https://github.com/DilumAluthge/Distributions.jl into rr/support

caed648

ValueSupport is now parameterised by the eltype of its support, and introducing ContinuousSupport, CountableSupport and UnionSupport.

2afcb05

Fix univariates and multivariates to use new ValueSupport - especially DiscreteNonParametric...

5965616

Update testing.

e50262f

Fixing log[c]cdf to work with DiscreteNonParametric.

31051b8

DilumAluthge approved these changes Jul 26, 2019


richardreeve added 8 commits July 28, 2019 21:53

Merge branch 'patch-1' of https://github.com/aplavin/Distributions.jl into rr/support

61f8114

Fixing function signatures, especially for quantile(), which can take any real. Some others for Dirac.

335766f

Categorical can support any integer.

4783b13

Fixing eltypes for Categorical and Dirac types.

d73a229

Fixing Dirac and DiscreteNonParametric

a9359ec

Minor ValueSupport fixes

fe08381

Add in AbstractMixtureDistribution as supertype with AbstractMixtureModel as its subtype. Add SpikeSlab distribution and general CompoundDistribution as new AbstractMixtureDistribution subtypes.

a4eceb6

Add in ZeroInflated and Hurdle distributions as special cases of CompoundDistribution

88de1b2

Merge branch 'master' into rr/support

67dca49

matbesancon requested a review from mschauer July 30, 2019 21:17

matbesancon reviewed Jul 30, 2019


src/common.jl (outdated, resolved)

matbesancon reviewed Jul 30, 2019


richardreeve added 2 commits July 31, 2019 14:58

Merge branch 'rr/countable' into rr/support

492c090

Remove NonMatrixDistribution

f4e5156

richardreeve added 11 commits July 31, 2019 17:05

Add in some more tests for new code.

f8e9918

Merge branch 'master' into rr/countable

6642ae9

Updating nsamples() and adding testing.

42477c6

Remove 0.7 on appveyor.

6c1c76e

Merge branch 'rr/countable' into rr/support

8c6e326

Update readme to mention pmf.

4b64208

Merge branch 'master' into rr/countable

bb9c2ac

Merge branch 'master' into rr/countable

acad538

Add in a specific subtype of countable support - ContiguousSupport - where the support is over contiguous integers (as most are).

0c148ef

Fix non-ContiguousSupport distributions.

d42c02a

Merge branch 'rr/countable' into rr/support

45809a4

Add in ContiguousSupport for support in a:b.

richardreeve mentioned this pull request Aug 4, 2019

Change to ValueSupport interface #951

Closed

DilumAluthge mentioned this pull request Sep 3, 2019

RFC: Allow DiscreteNonParametric to have non-Real support #916

Closed

richardreeve mentioned this pull request Dec 1, 2020

DiscreteNonParametric should be <: ContinuousDistribution #887

Open

richardreeve closed this by deleting the head repository Dec 6, 2023
