Type parameterization #771

cscherrer · 2018-09-15T02:06:43Z

Currently, a Distribution, say D, is parameterized by F <: VariateForm and S <: ValueSupport, where

F can be Univariate, Multivariate, or Matrixvariate
S can be Discrete or Continuous.

There's a lot of useful structural information missing in this parameterization.
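Here is a minimal, self-contained sketch of the gap. It assumes local stand-ins that only mirror the names VariateForm, ValueSupport, Univariate and Continuous; this is not the Distributions.jl source. MyNormal, MyGamma and params_of are hypothetical. One toy type is supported on all of R and the other only on positive values, yet both carry exactly the same type parameters.

```julia
# Local stand-ins that mirror the two-parameter layout; not the Distributions.jl source.
abstract type VariateForm end
struct Univariate <: VariateForm end

abstract type ValueSupport end
struct Discrete <: ValueSupport end
struct Continuous <: ValueSupport end

abstract type Distribution{F<:VariateForm,S<:ValueSupport} end

# Hypothetical toy distributions: one supported on all of R, one on positive values.
struct MyNormal <: Distribution{Univariate,Continuous}
    mu::Float64
    sigma::Float64
end

struct MyGamma <: Distribution{Univariate,Continuous}
    shape::Float64
    scale::Float64
end

# Recover the (F, S) parameters of any distribution.
params_of(::Distribution{F,S}) where {F,S} = (F, S)

# Both report (Univariate, Continuous): nothing in the type says that
# MyGamma can never produce a negative value.
params_of(MyNormal(0.0, 1.0)) == params_of(MyGamma(2.0, 1.0))  # → true
```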

If D is one-dimensional, can it be negative? Is there an upper bound?
If D is multivariate, is it over R^n, or constrained, for example to a simplex?
If D is matrix-variate, does it have to be positive definite?
These constraints only address the values; what about the parameter space?

It seems the current type hierarchy could be replaced by one that's parameterized by the supports of the parameter space and the observation space. This would make it much easier, for example, to automate the kind of transformations done by Stan and replicated in some of @tpapp's libraries. It would also make it natural to extend to distributions over data structures; for example, a Markov chain would be a distribution over an iterator.
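The following sketch shows what parameterizing by both spaces could look like. It assumes hypothetical support types (RealLine, PositiveReals) and toy distributions (NormalS, ExponentialS), none of which exist in Distributions.jl. Each parameter and the observation space carry a support type. A Stan-style unconstraining map (identity on R, log on (0, Inf)) can therefore be chosen by dispatch alone.

```julia
# Hypothetical support types; none of these names come from Distributions.jl.
abstract type Support end
struct RealLine <: Support end        # all of R
struct PositiveReals <: Support end   # (0, Inf)

# P lists one Support per parameter (as a Tuple type); X is the observation space.
abstract type SupportedDist{P<:Tuple,X<:Support} end

struct NormalS <: SupportedDist{Tuple{RealLine,PositiveReals},RealLine}
    mu::Float64      # location, any real
    sigma::Float64   # scale, positive
end

struct ExponentialS <: SupportedDist{Tuple{PositiveReals},PositiveReals}
    rate::Float64    # positive
end

obs_support(::SupportedDist{P,X}) where {P,X} = X()
param_supports(::SupportedDist{P,X}) where {P,X} = map(T -> T(), fieldtypes(P))

# Stan-style unconstraining map, selected purely from the support type.
to_unconstrained(::RealLine, x::Real) = x
to_unconstrained(::PositiveReals, x::Real) = log(x)
to_unconstrained(d::SupportedDist, x::Real) = to_unconstrained(obs_support(d), x)

# Assumes the struct fields are declared in the same order as the entries of P.
unconstrain_params(d::SupportedDist) =
    map(to_unconstrained, param_supports(d), ntuple(i -> getfield(d, i), nfields(d)))
```

For example, unconstrain_params(NormalS(0.5, 1.0)) gives (0.5, 0.0): the location passes through unchanged and the scale is log-transformed.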

All of the functionality I'm describing could instead be implemented in a bolt-on sort of way, but it would be much more usable to have it all in one place. Also, my suggestion would break a lot of existing code. I wonder if there may be a way to have an alternate constructor and gracefully transition to something more informative.

I could try to take a crack at this, but it would be helpful to know...

Would others be interested in this functionality as well?
Are there performance or expressiveness reasons this is fundamentally a bad idea?
How can this be done in a way that avoids leading to two incompatible libraries?

As it is, this library is already a lot more enjoyable to use than its counterpart in other languages. Thank you for your work on it, and for any insights you might have into this issue.


abraunst · 2019-05-14T15:03:01Z

Would parameterizing by the support of the distribution forbid mixtures of distributions with different supports? That would be a setback in my opinion. Alternatively, a mixture of two distributions could always be allowed, with a support equal to the union of the component supports.
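One way to keep mixtures unrestricted under a support-parameterized design is to compute the mixture's support type from its components. This sketch assumes hypothetical support types and toy distributions. support_union folds one known inclusion ([0, Inf) inside R) and otherwise records a formal UnionOf, so no combination is rejected.

```julia
# Hypothetical support types and toy distributions, for illustration only.
abstract type Support end
struct Naturals <: Support end        # {0, 1, 2, ...}
struct NonnegReals <: Support end     # [0, Inf)
struct RealLine <: Support end        # all of R
struct UnionOf{A<:Support,B<:Support} <: Support end  # formal union when no rule applies

abstract type Dist{S<:Support} end
struct Exponential1 <: Dist{NonnegReals} end
struct Normal0 <: Dist{RealLine} end
struct Poisson1 <: Dist{Naturals} end

support_of(::Dist{S}) where {S} = S()

# Union of two supports: use known inclusions, otherwise keep a formal union.
support_union(::S, ::S) where {S<:Support} = S()
support_union(::NonnegReals, ::RealLine) = RealLine()
support_union(::RealLine, ::NonnegReals) = RealLine()
support_union(a::Support, b::Support) = UnionOf{typeof(a),typeof(b)}()

# Two-component mixture whose support parameter is computed from its components.
struct Mixture2{S<:Support,A<:Dist,B<:Dist} <: Dist{S}
    a::A
    b::B
    w::Float64   # weight on component a
end

function mixture(a::Dist, b::Dist, w::Real)
    S = typeof(support_union(support_of(a), support_of(b)))
    return Mixture2{S,typeof(a),typeof(b)}(a, b, w)
end
```

With this rule, mixture(Exponential1(), Normal0(), 0.3) has support RealLine. mixture(Poisson1(), Normal0(), 0.5) is still allowed, with support UnionOf{Naturals,RealLine}.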

tpapp · 2019-05-14T15:26:52Z

I am not sure that a very rich type system like this is a good match for describing the various features of distributions.

Is it used much for dispatch? Could accessor/query functions for the various properties, ideally type-stable ones (i.e., traits), be a better match?
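The trait alternative can be sketched with hypothetical names (SupportTrait, supporttrait, needs_log_transform) and toy types that share no supertype. Each property is answered by a query function of the type. The query is therefore resolved at compile time and can drive dispatch without being encoded in a type parameter.

```julia
# Hypothetical trait types and toy distributions; not an existing Distributions.jl API.
abstract type SupportTrait end
struct Unbounded <: SupportTrait end
struct BoundedBelow <: SupportTrait end

# Toy types with no common supertype: the traits do the classifying.
struct Norm
    mu::Float64
    sigma::Float64
end
struct Expo
    rate::Float64
end

# The query is a function of the type, so the compiler can resolve it statically.
supporttrait(::Type{Norm}) = Unbounded()
supporttrait(::Type{Expo}) = BoundedBelow()
supporttrait(::Type{T}) where {T} = throw(ArgumentError("no support trait defined for $T"))
supporttrait(d) = supporttrait(typeof(d))

# Dispatch on the trait value rather than on a type parameter.
needs_log_transform(d) = needs_log_transform(supporttrait(d))
needs_log_transform(::Unbounded) = false
needs_log_transform(::BoundedBelow) = true
```

Adding a new property this way means adding one more query function. The existing type definitions do not need to change.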

mschauer · 2019-05-15T09:23:37Z

I am also pushing this view a bit. Imho, most of the time this type system doesn't help but gets in the way of usability. Usually a single abstract supertype (AbstractArray or Distribution) or traits (as for iterators) work nicely.

cscherrer · 2019-05-18T10:39:29Z

Thanks for the responses. Here are a few possibilities:

1. Distributions parameterized by support
2. A support method that returns the correct support
3. (Below)

For (1), I would expect mixtures to result in a Distribution over the union of the supports of the components. This approach is mathematically nice, but has a few potential problems:

It requires a very rich type hierarchy that would need to "keep up" with the pace at which new distributions are added. Likely either the hierarchy would fall out of sync with the distributions, or the support guarantee would be invalid for some of them.
Often we don't know the support until runtime. If we have Uniform(f(0), f(1)), we have no idea what the support is until f is evaluated, which makes it awkward at best to reason about.
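The runtime problem shows up in a small example built on a hypothetical UniformR type whose support is returned as a value. UniformR, Interval, support and isnonnegative are assumptions for illustration. Two distributions of the same type end up with different supports, depending on the function passed in. No type parameter fixed at construction time could reflect that difference.

```julia
# Hypothetical types: the support is a runtime value, not a type parameter.
struct Interval
    lo::Float64
    hi::Float64
end

struct UniformR
    a::Float64
    b::Float64
    function UniformR(a::Real, b::Real)
        a < b || throw(ArgumentError("need a < b, got a=$a, b=$b"))
        return new(a, b)
    end
end

support(d::UniformR) = Interval(d.a, d.b)
isnonnegative(s::Interval) = s.lo >= 0

# Whether the support is nonnegative depends on f and g, not on the type.
f(x) = 2x - 1
g(x) = x + 1
d = UniformR(f(0), f(1))  # support [-1, 1]
e = UniformR(g(0), g(1))  # support [1, 2]
```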

For (2), the biggest (only?) issue is that it breaks existing use. The information returned by the current implementation is useful, but it's wrong. "Support" has a well-established meaning in measure theory. There are several variants of it that tend to coincide on "well-behaved" measures, and all of them are entirely different from the current implementation in Distributions.jl.

Some CAS systems (it's like "PIN number", I know) involve the concept of a carrier. For example, if you sample from a gamma distribution the result will be positive, but you may still represent it as a Float64. If code will be broken anyway (say at 2.0), it seems sensible to be able to represent both the carrier and the (actual) support.

The implementation of this could involve something like an abstract type Distribution{T}. T is the carrier, and would likely match the current (incorrect) implementation of support. Then support could be a function that returns the (correct) support.
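The carrier/support split can be sketched under the single-parameter abstract type described above. Here it is named Dist, to keep it separate from the real Distribution. GammaC, PoissonC, HalfLine, NonnegIntegers and carrier are hypothetical. carrier reports the Julia type used to represent a draw. support reports the (closed) set a draw can actually fall in.

```julia
# Sketch of a carrier/support split under a hypothetical single-parameter
# abstract type; not the Distributions.jl definitions.
abstract type Dist{T} end   # T is the carrier: the Julia type used to represent a draw

struct GammaC <: Dist{Float64}
    shape::Float64
    scale::Float64
end

struct PoissonC <: Dist{Int}
    lambda::Float64
end

carrier(::Dist{T}) where {T} = T

# Hypothetical set types for the actual (closed, measure-theoretic) support.
struct HalfLine            # [lo, Inf)
    lo::Float64
end
struct NonnegIntegers end  # {0, 1, 2, ...}

Base.in(x::Real, s::HalfLine) = x >= s.lo
Base.in(x::Real, ::NonnegIntegers) = isinteger(x) && x >= 0

support(::GammaC) = HalfLine(0.0)
support(::PoissonC) = NonnegIntegers()
```

For example, -1.0 is a perfectly good Float64, the carrier of GammaC(2.0, 1.0), but it lies outside that distribution's support.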

matbesancon · 2019-05-23T07:37:58Z

@cscherrer @mschauer we could progress on this. One thing that would be acceptable is replace the current "Discrete" with an IntegerSupport, possibly sub-typed with FullInteger and PositiveInteger.
Distributions like Dirac #861 would then be Discrete but not Integer. Continuous could similarly be sub-typed with Positive.

Overall, I think keeping the two parameters is fine, but the type hierarchy could be made more flexible with this.

Thoughts?
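The proposed refinement can be sketched as follows. IntegerSupport, FullInteger, PositiveInteger and Positive are the names from the comment. Discrete and Continuous are made abstract here so they can have subtypes. The toy distributions, OtherDiscrete, and the query functions are hypothetical. Code that dispatches on Discrete keeps working, while new code can additionally require integer-valued outcomes.

```julia
# IntegerSupport, FullInteger, PositiveInteger and Positive follow the comment;
# everything else is hypothetical and is not the Distributions.jl hierarchy.
abstract type ValueSupport end
abstract type Discrete <: ValueSupport end
abstract type Continuous <: ValueSupport end

abstract type IntegerSupport <: Discrete end
struct FullInteger <: IntegerSupport end      # all of Z
struct PositiveInteger <: IntegerSupport end  # {1, 2, ...}
struct Positive <: Continuous end             # (0, Inf)
struct OtherDiscrete <: Discrete end          # hypothetical: countable, not integer-valued

abstract type Dist{S<:ValueSupport} end
struct TrialsUntilSuccess <: Dist{PositiveInteger}   # hypothetical: number of trials, >= 1
    p::Float64
end
struct DiracAt <: Dist{OtherDiscrete}                # point mass at a (possibly non-integer) x
    x::Float64
end
struct ExponentialP <: Dist{Positive}
    rate::Float64
end

# Existing code that asks "is it discrete?" keeps working for every refinement...
isdiscrete(::Dist{<:Discrete}) = true
isdiscrete(::Dist) = false
# ...while new code can additionally ask for integer-valued outcomes.
integer_valued(::Dist{<:IntegerSupport}) = true
integer_valued(::Dist) = false
```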

tpapp · 2019-05-23T07:54:44Z

@matbesancon: can you clarify what the type hierarchy is actually used for?

matbesancon · 2019-05-23T07:57:18Z

It is used:

To signal what a distribution is and what to expect from it (a multivariate distribution is expected to return vectors)
To dispatch on corresponding methods

tpapp · 2019-05-23T08:08:41Z

To dispatch on corresponding methods

Sure, in general this is what types are used for. But I meant specifically in this package.

My understanding of the code suggests that while it is used in a few places, e.g. for eltype, most of the time it is just carried around for no apparent reason.

There are a lot of instances when the type is just used to dispatch to a MethodError that would be thrown anyway.

I respect that different programmers have different styles, and that a package as large and complex as Distributions can benefit from some structure. But I wonder if the type hierarchy in Distributions reflects a design that was replaced by using traits as Julia evolved.

matbesancon · 2019-05-23T08:34:03Z

My understanding of the code suggests that while it is used in a few places, e.g. for eltype, most of the time it is just carried around for no apparent reason.

It is exposed behavior, and I believe it is used by packages that depend on Distributions in an 'advanced' way.

There are a lot of instances when the type is just used to dispatch to a MethodError that would be thrown anyway.

Agreed, this can be removed, but it's a common pattern to explicitly show errors where an interface must be implemented.

But I wonder if the type hierarchy in Distributions reflects a design that was replaced by using traits as Julia evolved.

Traits are still not used everywhere, and I believe we should not throw out the type-parameter baby with the bathwater of the current types.
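Both comments refer to the explicit-error pattern. It can be sketched with hypothetical names (ToySampleable, draw, Coin, Unfinished). The explicit fallback only changes the error message. Without it, calling draw on a subtype that lacks a method still fails, but with a generic MethodError.

```julia
using Random

# Hypothetical abstract type and function names, for illustration only.
abstract type ToySampleable end

# Explicit fallback: states the interface and gives a targeted message.
# Without this method, calling draw on a type that lacks it would still fail,
# just with a generic MethodError.
draw(s::ToySampleable) =
    error("draw is not implemented for $(typeof(s)); subtypes of ToySampleable must define it")

struct Coin <: ToySampleable
    p::Float64   # probability of returning true
end
draw(c::Coin) = rand() < c.p

struct Unfinished <: ToySampleable end   # forgot to implement draw
```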

mschauer · 2019-05-23T13:03:55Z

Reasonable uses are imho: indicating what pdf returns (a density or a probability vector or undefined) or whether rand(D, 5) creates a Matrix or a Vector.
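The rand(D, 5) case can be sketched with local stand-ins for the variate forms and hypothetical toy types (StdNormal, StdNormal2D, draw1, samples). The variate-form parameter alone decides whether n draws come back as a Vector or as a matrix with one column per draw.

```julia
using Random

# Local stand-ins for the variate-form parameter; not the Distributions.jl source.
abstract type VariateForm end
struct Univariate <: VariateForm end
struct Multivariate <: VariateForm end

abstract type Dist{F<:VariateForm} end
struct StdNormal <: Dist{Univariate} end
struct StdNormal2D <: Dist{Multivariate} end   # two independent standard normals

draw1(::StdNormal) = randn()
draw1(::StdNormal2D) = randn(2)
dim(::StdNormal2D) = 2

# The variate form alone decides the container returned for n draws.
samples(d::Dist{Univariate}, n::Integer) = [draw1(d) for _ in 1:n]   # Vector of scalars

function samples(d::Dist{Multivariate}, n::Integer)
    X = Matrix{Float64}(undef, dim(d), n)   # one column per draw
    for j in 1:n
        X[:, j] = draw1(d)
    end
    return X
end
```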

tpapp · 2022-07-25T08:43:32Z

Revisiting this issue: traits are now used pervasively in Julia, so a potential redesign could just replace (a lot of) the type hierarchy with traits, addressing this and a host of related issues.

zsunberg · 2023-04-14T21:57:11Z

It would help me to switch to traits for this: JuliaPOMDP/POMDPs.jl#473 (comment)

matbesancon mentioned this issue on Sep 20, 2018 in RFC: First take on cleaning up EmpiricalUnivariateDistribution #661 (Closed)

matbesancon mentioned this issue on May 14, 2019 in DiscreteNonParametric should be <: ContinuousDistribution #887 (Open)

matbesancon mentioned this issue on May 17, 2019 in Release Distributions.jl v1.0 #880 (Open)

matbesancon added the types label on May 23, 2019

richardreeve mentioned this issue on Jul 8, 2019 in Handling discontinuous density functions #925 (Open)
