-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ValueSupport to allow non-integer discrete support #941
Conversation
…ntroducing ContinuousSupport, CountableSupport and UnionSupport.
…y DiscreteNonParametric...
I like it |
Codecov Report
@@ Coverage Diff @@
## master #941 +/- ##
==========================================
- Coverage 77.31% 76.03% -1.28%
==========================================
Files 112 114 +2
Lines 5369 5508 +139
==========================================
+ Hits 4151 4188 +37
- Misses 1218 1320 +102
Continue to review full report at Codecov.
|
…into rr/support
… any real. Some others for Dirac.
…odel as its subtype. Add SpikeSlab distribution and general CompoundDistribution as new AbstractMixtureDistribution subtypes.
@richardreeve could you merge master into your branch? Very sorry for the mess, it's a big PR that just got merged (but boilerplaty so the diff should be trivial) |
Done. Any comments welcome by the way @matbesancon and others! I need to put in some tests for the new distributions, but apart from that I’m pretty happy with this and I think it does what I need, and I think @DilumAluthge is also okay with it... |
@@ -169,11 +169,11 @@ invlogccdf(d::Normal, lp::Real) = xval(d, -norminvlogcdf(lp)) | |||
function quantile(d::Normal, p::Real) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait what? Some the code wrong before? If so, could you add a test fixed by this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code previously was giving the right answer by accident as the final line of code (the xval() call) worked on its own anyway even in these edge cases so there's nothing to test. I just corrected the earlier code that was not previously returning anything. So two mistakes cancelled each other out and there's nothing to test.
pdf(d::Bernoulli, x::Bool) = x ? succprob(d) : failprob(d) | ||
pdf(d::Bernoulli, x::Int) = x == 0 ? failprob(d) : | ||
pmf(d::Bernoulli, x::Bool) = x ? succprob(d) : failprob(d) | ||
pmf(d::Bernoulli, x::Int) = x == 0 ? failprob(d) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add two methods nor to hard-code Int
:
Bool <: Integer
and true == T(1)
for any T <: Integer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this isn't my code - all I've done is searched and replaced pdf with pmf for discrete distributions. I'm sure there are lots of problems with the existing code, but I felt that I was already being ambitious enough!
ok before going any further, this PR is huge, should probably be chunked in multiple ones, there should be one for the type hierarchy, then we'll see for the rest of it |
I did wonder about that, but it's worth pointing out that what you're spotting up there are not my introductions. Although the changes touch a lot of files, it's mostly just search and replace of pdf for pmf... |
It's also true that the Dirac work has been sitting around as a #861 for a while and needed updating to this new hierarchy, and the same is true in #916 for the DiscreteNonParametric and Categorical distributions - I've just merged and updated these, so they have been largely reviewed already. Also, the type hierarchy is all in common.jl (except for exports in Distributions.jl and the doc updates), so it can be reviewed almost entirely in isolation. However, I can split it out if you really think it will be better as I obviously want this to be merged. |
@richardreeve maybe wait for #945 before continuing this? Otherwise you risk running into merge conflicts |
Absolutely. I'm just keeping the two synced by merging in rr/countable and checking it still works... |
…where the support is over contiguous integers (as most are).
Add in ContiguousSupport for support in a:b.
Keeping up to date with #945, so we now have |
Can you give some more detail of the benefit of splitting pdf/pmf? Formally, a pmf is still a pdf over counting measure. Breaking them apart forces everything to consider two separate cases without any obvious benefit. This can be a pain point for PPL implementation in particular. I can see a benefit if you're describing something that's singular over the base measure like a sample from a Dirichlet process, but I don't see that addressed here. |
Once you start having arbitrary mixtures of distributions, that includes mixtures of continuous and discrete distributions. If you do that, you get infinite spikes in the (pdf) density that represent discrete probability masses (in the pmf). However, in the pdf there is no way of quantifying the size of the spikes (as far as I'm aware). For cumulative densities this isn't a problem, but I don't see a way around having two functions for the density and mass functions if you want to generally be able to describe the density and the mass of distributions with discontinuous cdfs but continuous support (like spike-and-slab and hurdle distributions). |
Ok, so this addresses the problem of singularities. That makes sense. Say you have a mixture like this, and pdf(mix, d) == Inf
pdf(mix, c) == continuousWeight * pdf(continuousComponent, c)
pmf(mix, d) == discreteWeight * pmf(discreteComponent, d)
pmf(mix, c) == 0 ? Is that what it "should" return, or am I thinking about it wrong?
Me neither. I hadn't considered carefully representing mixed continuous/discrete distributions as even being a possibility, but it would be pretty great. It seems like it would also allow things like Have you seen such general representation of distributions in other systems? It would be helpful to know what trouble people have had to be sure we're learning from their mistakes. |
Yes, that's exactly what I was thinking of!
Great, I hadn't thought of that. Obviously it's important that this extension to the type hierarchy is as extensible as possible - though it doesn't necessarily have to be able to handle everything on day one. At the moment, I'm not at all that confident that the
Unfortunately not. It's only having the ability to work on |
I guess to be really rigorous about it we should be working (eventually) in terms of the refinement to Lebesgue's decomposition. |
I may have to take your word for that! |
In the meanwhile, it would be good to know what you think of this first step... |
I think it's definitely a step in the right direction. And if it's really non-breaking, I say go for it. But I'm having trouble seeing how this can be non-breaking. If we have mixed discrete/continuous distributions, then pdf(mix, d) == Inf
pdf(mix, c) == continuousWeight * pdf(continuousComponent, c)
pmf(mix, d) == discreteWeight * pmf(discreteComponent, d)
pmf(mix, c) == 0 seems sensible. But say we have some simple example of this, like pdf(mix, 0.0) == Inf
pdf(Bernoulli(0.5)) == 0.5 This would seem to imply that |
You're right, but it's non-breaking in the sense that everything behaves exactly as it did before except for new distributions, which obviously can't be breaking however you code them. And up till now it hasn't been possible to make these mixed distributions at all, so I'm safe... My feeling is that we ought to move to a system where pdf and pmf are distinct though, and so the The only other option I can think of would be to have a third (mostly optional) argument to pdf(d::CountableDistribution, x) = pdf(d, x, Mass())
pdf(d::ContinuousDistribution, x) = pdf(d, x, Density())
pdf(::ContinuousDistribution, _, ::Mass) = 0.0
pdf(d::CountableDistribution, x, ::Density) = insupport(d, x) ? Inf : 0.0 But this would not be defined for mixed distributions, so you'd have to select the appropriate density type explicitly. I don't have any feeling for whether that's a good idea, but it would allow us to maintain backward compatibility indefinitely... |
I've made this PR to provide a better way of handling non-integer discrete support, since it doesn't currently exist in Distributions. There are cleverer things that could be done, but this seems to work and so far provides all of the functionality people are talking about that I've seen.
New ValueSupport type hierarchy, allowing non-integer discrete support through CountableSupport <: ValueSupport[now in New type hierarchy for ValueSupport #945]pmf()
andlogpmf()
for distributions with countable support as we have mass not density, but allow fallback to[log]pdf()
for compatibility.DiscreteNonParametric
to have non-Real
support #916 allowing DiscreteNonParametric to have non-integer discrete support, addressing DiscreteNonParametric should be <: ContinuousDistribution #887This PR is now split, with the
ValueSupport
changes in #945 and the new distributions using the new type hierarchy here. This PR contains both, but the important stuff is the latter, as the former will go once #945 is merged.