add residuals method for GLM #499
base: master
Cool. Could you add more tests to cover other families?
I'm not sure what's the best default, but doing the same as R is reasonable unless we have reasons to differ.
Regarding your XXX comment, if the relationship appears to hold for all families, we could ask others for confirmation.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
# XXX I think this might be the same as
# 2 * wrkresid, but I'm not 100% sure if that holds across families
@dmbates Does this relationship hold across families?
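For anyone who wants to probe the question numerically, here is a minimal sketch in plain Julia. It does not use the PR's code: it assumes the textbook definitions (working residual `(y - μ) * dη/dμ`, deviance residual as the signed square root of the unit deviance) for a Poisson response with log link, and all function names are hypothetical. The quantities in the diff may be defined differently, so this checks only the conjecture as read under those assumptions.

```julia
# Assumed definitions (not taken from the PR diff):
#   working residual  = (y - μ) * dη/dμ, which is (y - μ) / μ for the log link
#   deviance residual = sign(y - μ) * sqrt(unit deviance)

# Poisson unit deviance; the y == 0 case uses the limit y * log(y / μ) -> 0.
poisson_unit_deviance(y, μ) = 2 * ((y == 0 ? zero(μ) : y * log(y / μ)) - (y - μ))

deviance_residual(y, μ) = sign(y - μ) * sqrt(poisson_unit_deviance(y, μ))
working_residual(y, μ) = (y - μ) / μ

# Made-up responses and fitted means, for illustration only.
y = [0.0, 1.0, 3.0, 7.0]
μ = [0.5, 2.0, 2.0, 5.0]

dev = deviance_residual.(y, μ)
wrk = working_residual.(y, μ)

# If deviance residual == 2 * working residual held, every ratio would be 2.
ratios = dev ./ wrk
```

Under these definitions the ratios differ from one observation to the next, so a fixed factor of 2 does not hold for this family and link.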
Can you use the `devresid` function here?
No, it seems that function is lower level: it is used to compute the associated field rather than to return said field.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master     #499   +/-   ##
=======================================
  Coverage   90.48%   90.49%
=======================================
  Files           8        8
  Lines        1125     1136   +11
=======================================
+ Hits         1018     1028   +10
- Misses        107      108    +1
@nalimilan the current doctest failures are unrelated to this PR, but do you have any objections to me updating them? For now, I would update the expected output, but in the long term we should improve pretty-printing of the types.
Sure. Would you feel like adding tests for all other model families? I'm always uncomfortable when what we test varies from one family to the next.
Of course -- I've already got tests in there for deviance, response and working residuals of the families:
I've also added commented-out tests for the same families with Pearson residuals.
Bump. Perhaps @mousum-github can also help review this PR?
Thanks, @ViralBShah, let me go through the PR.
We can easily add Pearson residuals using the following: (the sum of squared Pearson residuals divided by the residual degrees of freedom is the estimated dispersion parameter)
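The snippet referred to in this comment did not survive in the page text. As a stand-in, here is a minimal sketch in plain Julia of the standard Pearson residual and the dispersion estimate described in the parenthetical. It is not the commenter's code: the data, the coefficient count `p`, and the function names are all made up for illustration, and the variance function shown is the Poisson one.

```julia
# Illustrative values only: y is the response, μ the fitted means, w the prior weights.
y = [2.0, 0.0, 5.0, 3.0, 1.0]
μ = [1.5, 0.8, 4.2, 3.1, 1.4]
w = ones(length(y))
p = 2                      # assumed number of estimated coefficients

# Poisson variance function V(μ) = μ.
poisson_variance(μ) = μ

# Pearson residual: weighted raw residual scaled by the square root of the variance function.
pearson_residuals(y, μ, w, V) = (y .- μ) .* sqrt.(w) ./ sqrt.(V.(μ))

r = pearson_residuals(y, μ, w, poisson_variance)

# Pearson estimate of the dispersion: sum of squared residuals over residual dof.
dispersion = sum(abs2, r) / (length(y) - p)
```

Passing the variance function as an argument keeps the residual computation itself family-agnostic; only `V` changes between families.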
@mousum-github Maybe you know the answer to https://github.com/JuliaStats/GLM.jl/pull/499/files#r976581242? @palday Do you plan to add tests for other model families?
I am not sure about the relationship. I tried but was unable to establish such a relationship easily.
It's based on the variance function: so should be |
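The formula that followed this sentence is missing from the page text. For reference, and as the usual textbook definition rather than a reconstruction of what the commenter wrote, the Pearson residual with prior weight $w_i$ and variance function $V$ is commonly given as below, together with the standard variance functions for the families discussed in this PR ($\theta$ denotes the negative binomial shape parameter).

```latex
r_i^{P} = \frac{\sqrt{w_i}\,\bigl(y_i - \hat{\mu}_i\bigr)}{\sqrt{V(\hat{\mu}_i)}},
\qquad
V(\mu) =
\begin{cases}
1                 & \text{Normal} \\
\mu               & \text{Poisson} \\
\mu(1 - \mu)      & \text{Bernoulli / Binomial (proportions)} \\
\mu^{2}           & \text{Gamma} \\
\mu^{3}           & \text{Inverse Gaussian} \\
\mu + \mu^{2}/\theta & \text{Negative binomial}
\end{cases}
```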
Models with weights are not tested, though; we need to make sure that either the result is correct or an error is thrown.
Co-authored-by: Alex Arslan <[email protected]>
@mousum-github please don't start making changes on my PR without asking me first.
I would like to add a few more test cases this week.
Sure, @palday. It is my bad.
The following compares three types of residuals (response, deviance and Pearson) for different distributions and link functions in Julia and R. Overall the comparisons look good.
julia> using GLM, Random, LinearAlgebra, DataFrames, RCall
julia> Random.seed!(12345);
julia> df = DataFrame(x0 = ones(10), x1 = rand(10), x2 = rand(10), n = rand(50:70, 10), w=1 .+ 0.01*rand(10), off=1 .+ 0.01*rand(10));
julia> β = [1, -1, 1];
julia> X = Matrix(df[:, 1:3]);
julia> η = X*β;
julia> σ = 1;
julia>
julia> # Poisson Distribution with Log link #
julia> Random.seed!(123456);
julia> μ = GLM.linkinv.(LogLink(), η);
julia> df.y = rand.(Poisson.(μ));
julia> mdl = glm(@formula(y ~ x1 + x2), df, Poisson(););
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> @rput df;
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=poisson());
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> isapprox(rresid1, jresid1, atol=1.0E-6) && isapprox(rresid2, jresid2, atol=1.0E-6) && isapprox(rresid3, jresid3, atol=1.0E-6)
true
julia> # ..with weights and offset #
julia> mdl = glm(@formula(y ~ x1 + x2), df, Poisson(); wts=df.w, offset=df.off);
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=poisson(), weights=w, offset=off);
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> isapprox(rresid1, jresid1, atol=1.0E-6) && isapprox(rresid2, jresid2, atol=1.0E-6) && isapprox(rresid3, jresid3, atol=1.0E-6)
true
julia>
julia> # Binomial Distribution with Logit link #
julia> Random.seed!(123456);
julia> μ = GLM.linkinv.(LogitLink(), η);
julia> n = df.n;
julia> df.y = rand.(Binomial.(n, μ)) ./ n;
julia> mdl = glm(@formula(y ~ x1 + x2), df, Binomial(););
julia> @rput df;
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=binomial());
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
┌ Warning: RCall.jl: Warning in eval(family$initialize) :
│ non-integer #successes in a binomial glm!
└ @ RCall C:\Users\user\.julia\packages\RCall\6kphM\src\io.jl:172
julia> @rget rresid1 rresid2 rresid3;
julia> rresid1 ≈ jresid1 && rresid2 ≈ jresid2 && rresid3 ≈ jresid3
true
julia> # .. with weights and offset #
julia> mdl = glm(@formula(y ~ x1 + x2), df, Binomial(); wts=df.w, offset=df.off);
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=binomial(), weights=w, offset=off);
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
┌ Warning: RCall.jl: Warning in eval(family$initialize) :
│ non-integer #successes in a binomial glm!
│ Warning in eval(family$initialize) :
│ non-integer #successes in a binomial glm!
└ @ RCall C:\Users\user\.julia\packages\RCall\6kphM\src\io.jl:172
julia> @rget rresid1 rresid2 rresid3;
julia> rresid1 ≈ jresid1 && rresid2 ≈ jresid2 && rresid3 ≈ jresid3
true
julia>
julia> # Bernoulli Distribution with Logit link #
julia> Random.seed!(123456);
julia> μ = GLM.linkinv.(LogitLink(), η);
julia> df.y = rand.(Bernoulli.(μ));
julia> mdl = glm(@formula(y ~ x1 + x2), df, Bernoulli(););
julia> @rput df;
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=binomial());
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> rresid1 ≈ jresid1 && rresid2 ≈ jresid2 && rresid3 ≈ jresid3
true
julia> # .. with weights and offset #
julia> mdl = glm(@formula(y ~ x1 + x2), df, Bernoulli(); wts=df.w, offset=df.off);
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=binomial(), weights=w, offset=off);
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
┌ Warning: RCall.jl: Warning in eval(family$initialize) :
│ non-integer #successes in a binomial glm!
│ Warning in eval(family$initialize) :
│ non-integer #successes in a binomial glm!
└ @ RCall C:\Users\user\.julia\packages\RCall\6kphM\src\io.jl:172
julia> @rget rresid1 rresid2 rresid3;
julia> rresid1 ≈ jresid1 && rresid2 ≈ jresid2 && rresid3 ≈ jresid3
true
julia>
julia>
julia> # Negative Binomial Distribution with Log link #
julia> Random.seed!(123456);
julia> μ = GLM.linkinv.(LogLink(), η);
julia> π = μ ./ (μ .+ 10.0);
julia> df.y = rand.(NegativeBinomial.(10, π));
julia> mdl = negbin(@formula(y ~ x1 + x2), df, LogLink(); rtol=1.0E-12);
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> @rput df;
julia> R"""
library("MASS");
mdl = glm.nb(y ~ x1+x2, data=df);
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> isapprox(rresid1, jresid1, atol=1.0E-4) && isapprox(rresid2, jresid2, atol=1.0E-4) && isapprox(rresid3, jresid3, atol=1.0E-4)
false
julia>
julia> # Gamma Distribution with InverseLink link #
julia> Random.seed!(1234);
julia> μ = GLM.linkinv.(LogLink(), η);
julia> df.y = rand.(Gamma.(μ, σ));
julia> mdl = glm(@formula(y ~ x1 + x2), df, Gamma(););
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> @rput df;
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=Gamma());
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> isapprox(rresid1, jresid1, atol=1.0E-6) && isapprox(rresid2, jresid2, atol=1.0E-6) && isapprox(rresid3, jresid3, atol=1.0E-6)
true
julia> # .. with weights and offset #
julia> mdl = glm(@formula(y ~ x1 + x2), df, Gamma(); wts=df.w, offset=df.off);
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=Gamma(), weights=w, offset=off);
cf = coef(mdl);
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> isapprox(rresid1, jresid1, atol=1.0E-6) && isapprox(rresid2, jresid2, atol=1.0E-6) && isapprox(rresid3, jresid3, atol=1.0E-6)
true
julia>
julia> # InverseGaussian Distribution with InverseSquareLink link #
julia> Random.seed!(1234);
julia> μ = GLM.linkinv.(InverseSquareLink(), η);
julia> df.y = rand.(InverseGaussian.(μ, σ));
julia> mdl = glm(@formula(y ~ x1 + x2), df, InverseGaussian(););
julia> @rput df;
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=inverse.gaussian());
cff = coef(mdl);
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> isapprox(rresid1, jresid1, atol=1.0E-6) && isapprox(rresid2, jresid2, atol=1.0E-6) && isapprox(rresid3, jresid3, atol=1.0E-6)
true
julia> # .. with weights and offset #
julia> mdl = glm(@formula(y ~ x1 + x2), df, InverseGaussian(); wts=df.w, offset=df.off);
julia> jresid1,jresid2,jresid3 = GLM.residuals(mdl, type=:response), GLM.residuals(mdl, type=:deviance), GLM.residuals(mdl, type=:pearson);
julia> R"""
mdl = glm(y ~ x1+x2, data=df, family=inverse.gaussian(), weights=w, offset=off);
rresid1 = resid(mdl, type='response');
rresid2 = resid(mdl, type='deviance');
rresid3 = resid(mdl, type='pearson');
""";
julia> @rget rresid1 rresid2 rresid3;
julia> isapprox(rresid1, jresid1, atol=1.0E-6) && isapprox(rresid2, jresid2, atol=1.0E-6) && isapprox(rresid3, jresid3, atol=1.0E-6)
true
@palday Are you OK with @mousum-github adding more tests?
@mousum-github Thanks for checking. However, existing tests already fit models for all families, so it would be better to reuse these models and just add checks that
@nalimilan - I also considered suggesting some tests within existing test cases. Since there are conflicts in the
For the residuals functionality, existing test cases:
So far I have added test cases (locally) to the following
And I am planning to add some more test cases to the following by this week
Also, I have changed the existing residuals function for
My plan is to raise a PR to
@nalimilan, please let us know if there is a better way to proceed.
The changes in the residuals function with
    resid = response(model) - fitted(model)
    if length(model.rr.wts) > 0 && (type === :deviance || type === :pearson)
        return resid .* sqrt.(model.rr.wts)
    else
        return resid
    end
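To make the effect of that scaling concrete, here is a self-contained sketch that mirrors the snippet's logic on plain vectors. `scaled_residuals` is a hypothetical helper written for this illustration, not a function in GLM.jl, and the numbers are made up. It assumes a normal-response model, where the deviance and Pearson residuals coincide with the weight-scaled raw residuals.

```julia
# Hypothetical stand-in for the snippet above, taking plain vectors instead of a fitted model.
function scaled_residuals(y, μ, wts; type::Symbol = :response)
    resid = y .- μ
    if !isempty(wts) && (type === :deviance || type === :pearson)
        # Scale by sqrt(weight) so squared residuals sum to the weighted RSS.
        return resid .* sqrt.(wts)
    else
        return resid
    end
end

y   = [1.0, 2.0, 4.0]
μ   = [1.5, 2.0, 3.0]
wts = [1.0, 4.0, 0.25]

raw     = scaled_residuals(y, μ, wts; type = :response)   # plain differences y - μ
pearson = scaled_residuals(y, μ, wts; type = :pearson)    # each difference times sqrt(weight)

# The squared scaled residuals add up to the weighted residual sum of squares.
weighted_rss = sum(wts .* (y .- μ) .^ 2)
```

One consequence of this choice is that `sum(abs2, pearson)` matches `weighted_rss`, which is the property a weighted test case could check.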
Sorry, I've been a bit underwater with other priorities. I will start integrating these suggestions -- thanks @mousum-github !
Thanks. I guess that's the best approach so that @palday can have a look at the changes.
Thanks @nalimilan.
I haven't had the time to figure out how the denominator is computed for Pearson residuals and so haven't yet implemented them.
I set the default residual type to `deviance` to match R's behavior, though that might be surprising for GLMs with normal distribution and identity link.
Note: once again I've noticed that `-0` in outputs creates a problem. Tests fail for me on Julia 1.8 with Apple Silicon because of sign-swapping on zero.
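The sign-of-zero issue can be reproduced without any model. The sketch below uses only Base Julia and shows why a numeric comparison passes while printed output differs. `normalize_zero` is a hypothetical helper, shown as one possible way to canonicalize values before printing, not something the PR does.

```julia
x = -0.0

numerically_equal = (x == 0.0)        # true: == treats -0.0 and 0.0 as equal
same_under_isequal = isequal(x, 0.0)  # false: isequal distinguishes the sign bit
has_sign_bit = signbit(x)             # true
printed = string(x)                   # "-0.0", the text a doctest would compare

# Adding positive zero maps -0.0 to +0.0 under the default IEEE 754 rounding
# mode and leaves every other finite value unchanged.
normalize_zero(v) = v + zero(v)

printed_normalized = string(normalize_zero(x))   # "0.0"
```

Tests based on `≈` or `==` are unaffected by the sign of zero. Only comparisons of printed output, such as doctests, are sensitive to it.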