The generalized normal distribution has a shape parameter p. It includes the normal distribution (p = 2) and the double exponential (Laplace) distribution (p = 1) as special cases, and it approaches the uniform distribution in the limit p → ∞.
These cases correspond to the $L_p$ norms, whether the distribution is used as a likelihood or as a prior for regularization.
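For reference, the density can be written as below. This follows the parameterization in the linked Wikipedia article, with its shape parameter renamed to p. The scale symbol $\alpha$ does not appear in the request itself and is an assumption made here for illustration.

$$
f(x \mid \mu, \alpha, p) = \frac{p}{2\alpha\,\Gamma(1/p)} \exp\!\left(-\left(\frac{|x-\mu|}{\alpha}\right)^{p}\right)
$$

With $p = 2$ and $\alpha = \sigma\sqrt{2}$ this is normal($\mu$, $\sigma$); with $p = 1$ and $\alpha = b$ it is double_exponential($\mu$, $b$); as $p \to \infty$ it tends to the uniform density on $[\mu-\alpha, \mu+\alpha]$. For $n$ independent observations the negative log-likelihood is

$$
-\log L = n \log\frac{2\alpha\,\Gamma(1/p)}{p} + \alpha^{-p} \sum_{i=1}^{n} |x_i - \mu|^{p},
$$

so for fixed $\alpha$ and $p$, maximizing the likelihood over $\mu$ is the same as minimizing the $p$-th power of the $L_p$ norm of the residuals.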
For example, when maximizing the likelihood over beta,
y ~ normal(X * beta, sigma);
yields the criterion: minimize the $L_2$ norm of (X*beta - y), a.k.a. least squares. In contrast,
y ~ double_exponential(X * beta, sigma);
yields the criterion: minimize the $L_1$ norm of (X*beta - y), a.k.a. least absolute deviations.
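The link between the likelihood and the norm can be checked numerically. The following is a minimal sketch in Python (not Stan) for an intercept-only model, where the design matrix is a column of ones: it minimizes the sum of |y_i - m|^p over a single location m. The helper names `lp_loss` and `fit_location` are hypothetical, and ternary search is used only because the loss is convex for p >= 1.

```python
def lp_loss(residuals, p):
    """Sum of |r|^p, i.e. the p-th power of the L_p norm of the residuals."""
    return sum(abs(r) ** p for r in residuals)


def fit_location(y, p, iters=200):
    """Minimize sum |y_i - m|^p over m by ternary search (loss is convex for p >= 1)."""
    lo, hi = min(y), max(y)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if lp_loss([v - m1 for v in y], p) < lp_loss([v - m2 for v in y], p):
            hi = m2  # the minimizer cannot lie to the right of m2
        else:
            lo = m1  # the minimizer cannot lie to the left of m1
    return (lo + hi) / 2


y = [1.0, 2.0, 3.0, 4.0, 100.0]
print(fit_location(y, p=2))  # least squares: close to the mean of y
print(fit_location(y, p=1))  # least absolute deviations: close to the median of y
```

For this data the p = 2 fit lands near the mean (22) and the p = 1 fit near the median (3), which shows how the choice of p changes the sensitivity to the outlier.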
Similarly, in the Bayesian interpretation of ridge and LASSO regression,
beta ~ normal(0, lambda);
y ~ normal(X * beta, sigma);
produces $L_2$-regularized (ridge) regression, and
beta ~ double_exponential(0, lambda);
y ~ normal(X * beta, sigma);
produces $L_1$-regularized (LASSO) regression.
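To make the correspondence concrete, here is a small Python sketch for a single predictor. It writes out the negative log posterior (up to an additive constant) for each prior and the closed-form minimizer, i.e. the posterior mode. The function names are hypothetical, and sigma and lambda are treated as fixed, which is an assumption of this sketch rather than something the models above require.

```python
import math


def ridge_objective(beta, x, y, sigma, lam):
    """-log posterior + const for y ~ normal(x*beta, sigma), beta ~ normal(0, lam)."""
    sq = sum((yi - xi * beta) ** 2 for xi, yi in zip(x, y))
    return sq / (2 * sigma ** 2) + beta ** 2 / (2 * lam ** 2)


def lasso_objective(beta, x, y, sigma, lam):
    """-log posterior + const for y ~ normal(x*beta, sigma), beta ~ double_exponential(0, lam)."""
    sq = sum((yi - xi * beta) ** 2 for xi, yi in zip(x, y))
    return sq / (2 * sigma ** 2) + abs(beta) / lam


def ridge_map(x, y, sigma, lam):
    """Posterior mode under the normal prior: the OLS estimate shrunk toward zero."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + sigma ** 2 / lam ** 2)


def lasso_map(x, y, sigma, lam):
    """Posterior mode under the Laplace prior: soft-thresholding of sum(x*y)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    shrunk = max(abs(sxy) - sigma ** 2 / lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # the unregularized least-squares slope is 2
print(ridge_map(x, y, sigma=1.0, lam=1.0), lasso_map(x, y, sigma=1.0, lam=1.0))
print(ridge_map(x, y, sigma=1.0, lam=0.02), lasso_map(x, y, sigma=1.0, lam=0.02))
```

With the tight prior (lam = 0.02) the LASSO mode is exactly zero while the ridge mode is small but nonzero. This is the usual sparsity difference between the $L_1$ and $L_2$ penalties.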
The generalized normal distribution would make it convenient to use an arbitrary $L_p$ norm as the optimization criterion, or even to estimate a suitable value of p by treating it as a parameter.
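As a rough illustration of what such a distribution would compute, here is a Python sketch of the log density, checked against the normal and Laplace special cases. The name `generalized_normal_lpdf` and its argument order (location `mu`, scale `alpha`, shape `p`) are assumptions for this sketch, not an existing Stan signature. The parameterization follows the linked Wikipedia article with the shape renamed to p.

```python
import math


def generalized_normal_lpdf(x, mu, alpha, p):
    """Log density: log p - log(2*alpha) - lgamma(1/p) - (|x - mu| / alpha)^p."""
    return (math.log(p) - math.log(2.0 * alpha) - math.lgamma(1.0 / p)
            - (abs(x - mu) / alpha) ** p)


def normal_lpdf(x, mu, sigma):
    """Log density of normal(mu, sigma), for comparison."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)


def laplace_lpdf(x, mu, b):
    """Log density of double_exponential(mu, b), for comparison."""
    return -math.log(2.0 * b) - abs(x - mu) / b


sigma = 1.5
for x in (-2.0, 0.3, 4.0):
    # p = 2 with alpha = sigma * sqrt(2) should reproduce the normal
    print(generalized_normal_lpdf(x, 0.0, sigma * math.sqrt(2.0), 2.0),
          normal_lpdf(x, 0.0, sigma))
    # p = 1 with alpha = b should reproduce the Laplace
    print(generalized_normal_lpdf(x, 0.0, sigma, 1.0),
          laplace_lpdf(x, 0.0, sigma))
```

When p is itself a parameter, the normalizing term `log(p) - lgamma(1/p)` depends on p and cannot be dropped as a constant, unlike in the fixed-p least-squares and least-absolute-deviations cases.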
Description
Implement Generalized normal distribution https://en.wikipedia.org/wiki/Generalized_normal_distribution