Request for help #12

Open · goedman opened this issue Feb 2, 2019 · 17 comments

goedman commented Feb 2, 2019

Tamas, I was wondering if you would mind taking a look at m10.04d.jl?

Until chapter 10 in StatisticalRethinking.jl I've had no trouble adding the DynamicHMC versions in addition to the Turing and CmdStan versions, e.g. m10.04t.jl and m10.04s.jl for model 10.4.

I followed the approach I'd used successfully for m10.02d1.jl, which works fine, but in this case I get

julia> include("/Users/rob/.julia/dev/StatisticalRethinking/scripts/10/m10.04d.jl")

MCMC, adapting ϵ (75 steps)
0.011 s/step ...done
MCMC, adapting ϵ (25 steps)
0.0088 s/step ...done
MCMC, adapting ϵ (50 steps)
ERROR: LoadError: ArgumentError: isfinite(value) && all(isfinite, gradient) || value == -Inf must hold.
Stacktrace:
 [1] LogDensityProblems.ValueGradient{Float64,Array{Float64,1}}(::Float64, ::Array{Float64,1}) at /Users/rob/.julia/packages/ArgCheck/BUMkA/src/checks.jl:165
 [2] Type at /Users/rob/.julia/packages/LogDensityProblems/F34aW/src/LogDensityProblems.jl:73 [inlined]
 [3] logdensity(::Type{LogDensityProblems.ValueGradient}, ::LogDensityProblems.ForwardDiffLogDensity{TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}},ForwardDiff.GradientConfig{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##3#4")){TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}}},Float64},Float64,9,Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##3#4")){TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}}},Float64},Float64,9},1}}}, ::Array{Float64,1}) at /Users/rob/.julia/packages/LogDensityProblems/F34aW/src/AD_ForwardDiff.jl:45

<snipped>

Different starting values in θ = (β = [1.0, 0.0], α = [-1.0, 10.0, -1.0, -1.0, -1.0, 0.0, 2.0]) do not solve the issue, so I wonder whether this way of modeling such problems is correct.

Thanks for your help (if doable!),
Rob

tpapp (Owner) commented Feb 2, 2019

Will look into it today.

tpapp (Owner) commented Feb 2, 2019

I think it is an issue with AD producing NaNs or similar. Ideally one would investigate, but I added a quick fix in the form of a wrapper and opened a PR in the repo.
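A minimal sketch of how the wrapper would be applied, assuming the p model struct and the problem_transformation helper from the m10.04d.jl script (the surrounding code here is only illustrative, based on snippets elsewhere in this thread):

# sketch only: wrap the ForwardDiff gradient so evaluations that error out are
# rejected by the sampler instead of aborting the whole run
P = TransformedLogDensity(problem_transformation(p), p)
∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P))
chain, NUTS_tuned = NUTS_init_tune_mcmc(∇P, 1000)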

Please review and let me know if this works.

tpapp (Owner) commented Feb 2, 2019

Also, you can investigate corner cases of log densities with LogDensityProblems.stresstest.

goedman (Author) commented Feb 2, 2019

As I mentioned in the SR pull request comment, I will try several more models and compare the results. I wonder if rejecting the cases where AD produces NaNs might influence the resulting densities. Maybe stress testing will help out here.

If that's ok, I'll try some more models before closing this issue.

tpapp (Owner) commented Feb 2, 2019

That's fine, keep it open as long as you like and feel free to ask if there is anything I can help with.

goedman (Author) commented Feb 10, 2019

Tamas, in trying to dig a bit deeper into the posterior draws from m10.4d.jl, I have applied stresstest:

LogDensityProblems.stresstest(P, N=1000, scale=1.0)
0-element Array{Array{Float64,1},1}

P in this case is defined as:

P = TransformedLogDensity(problem_transformation(p), p)
#∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P));
∇P = ADgradient(:ForwardDiff, P);

Interestingly, interchanging the ∇P options in m12.6d.jl gives spot-on (and unchanged) results, which I interpret as some support for my earlier hypothesis that rejecting the errors might create a bias.

Am I correct to assume that stresstest, if applied correctly, would show parameter values where NaNs occur? I was hoping to be able to plot those regions as part of the debugging process.

tpapp (Owner) commented Feb 10, 2019

You should try stresstest with the ∇P (plain vanilla ADgradient, no LogDensityRejectErrors). It should indeed return the vector of locations that are not OK (e.g. evaluate to NaNs). Plotting is a good strategy.
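A sketch of how that might look, reusing the call shape already shown in this thread (the Plots.jl scatter call is only an illustrative assumption):

# sketch: stress-test the plain AD gradient and inspect the failing locations
∇P = ADgradient(:ForwardDiff, P)   # no LogDensityRejectErrors wrapper
bad = LogDensityProblems.stresstest(∇P, N=1000, scale=1.0)

# each element of bad is a parameter vector at which the value/gradient is not OK;
# plotting two of the coordinates can show whether the failures cluster in a region
using Plots
scatter(getindex.(bad, 1), getindex.(bad, 2), xlabel="θ[1]", ylabel="θ[2]")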

goedman (Author) commented Feb 10, 2019

As you also suggest, I had commented out the LogDensityRejectErrors line and replaced it with the vanilla ADgradient:

P = TransformedLogDensity(problem_transformation(p), p)
#∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P));
∇P = ADgradient(:ForwardDiff, P);

and it fails on the NUTS_init_tune_mcmc(∇P, 1000) line.
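One way to pin this down might be to evaluate the value and gradient directly at a single point, using the call signature visible in the stack trace earlier in this issue; the zero vector below is just an arbitrary trial point (this model has 9 parameters: 2 in β and 7 in α):

# sketch: evaluate value and gradient at one 9-dimensional point;
# at a problematic point this throws the same error shown above,
# which identifies the offending parameter values
x = zeros(9)
vg = LogDensityProblems.logdensity(LogDensityProblems.ValueGradient, ∇P, x)
vg.value, vg.gradient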

I'll experiment a bit more.

goedman (Author) commented Mar 11, 2019

Tamas, I noticed (and tested) your updates to DynamicHMC and LogDensityProblems. Could I ask for a little push to get stresstest() working, so I can try to resolve the only model that continues to give different results in DynamicHMC vs. Turing & CmdStan?

In m10.4d1.jl (results illustrated in m10.4d1) I've tried to use stresstest() and different AD methods.

If I enable do_stresstest it will show:

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/scripts/10/m10.04d1.jl")
MCMC, adapting ϵ (75 steps)
ERROR: Error while loading expression starting at /Users/rob/.julia/dev/DynamicHMCModels/scripts/10/m10.04d1.jl:87
caused by [exception 1]
InvalidLogDensityException: value is Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##3#4")){TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}}},Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN)

Using LogDensityRejectErrors provides very stable results, and the plots of the chains look reasonable (I've also tried 3 and 4 chains). It is true that the pooled estimate of a[2] is a bit higher in scale, but it is surprising that this is the only model where I've seen a problem.

In particular the sd values are very high; in Turing, using NUTS, that is usually an indication that the adaptation draws are included, but I don't think the DynamicHMC posterior includes those.

tpapp (Owner) commented Mar 11, 2019

If I understand correctly, there are two issues:

  1. the script errors,

  2. the results are different.

Did these happen after the recent (minor) updates to LogDensityProblems and DynamicHMC, or did you have them before?

goedman (Author) commented Mar 11, 2019

Correct. W.r.t. issue 2, normally (for all other models) I use LogDensityRejectErrors(ADgradient(:ForwardDiff, P)) and the results are spot on.

W.r.t. issue 1, I have experimented with stresstest (by setting do_stresstest = true) only for the m10.4d1.jl model. For the last few weeks it has failed on the ADgradient() call with the script error above.

tpapp (Owner) commented Mar 11, 2019

Are you using the latest LogDensityProblems and DynamicHMC?

I could not reproduce the errors above.

Also, I am not sure how I would detect the discrepancy; should I look at the plots?

goedman (Author) commented Mar 11, 2019

In the script the results from Turing and CmdStan are included:

rethinking = "
      mean   sd  5.5% 94.5% n_eff Rhat
a[1] -0.74 0.27 -1.19 -0.31  2899    1
a[2] 10.77 5.20  4.60 20.45  1916    1
a[3] -1.05 0.28 -1.50 -0.62  3146    1
a[4] -1.05 0.28 -1.50 -0.61  3525    1
a[5] -0.73 0.28 -1.17 -0.28  3637    1
a[6]  0.22 0.27 -0.21  0.67  3496    1
a[7]  1.82 0.41  1.21  2.50  3202    1
bp    0.83 0.27  0.42  1.27  2070    1
bpC  -0.13 0.31 -0.62  0.34  3430    1
";

I tried this morning with:

[[DynamicHMC]]
deps = ["ArgCheck", "DataStructures", "DocStringExtensions", "LinearAlgebra", "LogDensityProblems", "Parameters", "Random", "Statistics", "StatsFuns", "Test", "TransformVariables"]
git-tree-sha1 = "f9c9e6e703afccb02a1179e4886c97134ff20b55"
uuid = "bbc10e6e-7c05-544b-b16e-64fede858acb"
version = "1.0.4"

...

[[LogDensityProblems]]
deps = ["ArgCheck", "BenchmarkTools", "DiffResults", "DocStringExtensions", "Parameters", "Pkg", "Random", "Requires", "Test", "TransformVariables"]
git-tree-sha1 = "1ce5ebd16b5026a35cbf4daec66b2433b83d0d23"
uuid = "6fdf6af0-433a-55f7-b3ed-c6c6e0b8df7c"
version = "0.8.0"

in Manifest.toml.

goedman (Author) commented Mar 11, 2019

The results I'm getting:

Log evidence      = 0.0
Iterations        = 1:3000
Thinning interval = 1
Chains            = 1
Samples per chain = 3000
parameters        = bp, bpC

Empirical Posterior Estimates
───────────────────────────────────────────
parameters
     Mean    SD   Naive SE  MCSE     ESS   
 bp 1.5337 3.6815   0.0672 0.0886 1728.4017
bpC 0.4149 0.2460   0.0045 0.0038 3000.0000

Quantiles
───────────────────────────────────────────
parameters
      2.5%    25.0%   50.0%  75.0%  97.5% 
 bp -10.7577 -1.0319 1.4902 4.0067 16.9960
bpC  -0.4675  0.2435 0.4137 0.5796  1.3324

Log evidence      = 0.0
Iterations        = 1:3000
Thinning interval = 1
Chains            = 1
Samples per chain = 3000
pooled            = a[1], a[2], a[3], a[4], a[5], a[6], a[7]

Empirical Posterior Estimates
─────────────────────────────────────────────
pooled
       Mean    SD   Naive SE  MCSE     ESS   
a[1] -1.9824 3.6843   0.0673 0.0887 1727.1929
a[2]  9.8785 6.0428   0.1103 0.1973  937.6438
a[3] -2.2879 3.6866   0.0673 0.0899 1681.8172
a[4] -2.2809 3.6940   0.0674 0.0893 1712.3698
a[5] -1.9804 3.6927   0.0674 0.0896 1698.3607
a[6] -1.0489 3.6885   0.0673 0.0885 1736.5395
a[7]  0.5142 3.6860   0.0673 0.0908 1647.0961

Quantiles
─────────────────────────────────────────────
pooled
       2.5%    25.0%   50.0%   75.0%   97.5% 
a[1] -17.1055 -4.4397 -1.9079  0.5685 10.6230
a[2]  -7.1898  5.7305  9.3678 13.3436 34.4317
a[3] -17.9862 -4.7485 -2.2723  0.2305 10.1737
a[4] -17.6815 -4.7411 -2.2326  0.2594 10.6519
a[5] -17.6976 -4.3925 -1.9362  0.5755  9.9957
a[6] -16.6376 -3.5379 -1.0127  1.5214 11.4556
a[7] -14.9577 -1.9840  0.5571  3.0386 12.6625

or, just printing the means of the posterior vector (from another run):

2-element Array{Array{Float64,1},1}:
 [1.3940051607581527, 0.4210166348245524]                                                                                                     
 [-1.844585401458457, 10.05388176750645, -2.1519882289247705, -2.14490281190503, -1.8458711437273487, -0.9082955910437809, 0.6615688096440819]
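(For reference, a hypothetical sketch of how such per-parameter means could be computed, assuming posterior is the vector of (β, α) named tuples obtained from the transformed chain; the variable name is an assumption:)

using Statistics
# element-wise means over the posterior draws of each parameter block
[mean(getfield.(posterior, :β)), mean(getfield.(posterior, :α))]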

goedman (Author) commented Mar 12, 2019

Tamas, when you say "I could not reproduce the errors above", does that path (with do_stresstest=true) work on your system? Or, if you use the LogDensityRejectErrors path, do you see estimates like those in my previous post?

tpapp (Owner) commented Apr 9, 2019

Apologies for not getting back to you about this yet; this is a busy time for me.

I really appreciate this test case. Various other users have reported subtle (or not-so-subtle) maladaptation and bias issues. I have a couple of guesses about this:

  1. bugs in AD, or just numerical corner cases, for which I created "(stress)test derivatives" (LogDensityProblems.jl#42),
  2. overly eager dense matrix adaptation, which can be spurious, to be addressed in "adapt a diagonal mass matrix by default" (DynamicHMC.jl#35),
  3. a possible bug in the adaptation code,
  4. a possible bug in the tree-building and/or acceptance rate calculation code.

To investigate 3 and 4, I will be working on tpapp/DynamicHMC.jl#30. I ask for your patience in the meantime. The test cases are greatly appreciated.

goedman (Author) commented Apr 9, 2019

Thank you Tamas. Absolutely no apologies needed! I simply can't think of anyone contributing as much to the Julia community as you do!

Any time I notice an update to one of the various packages I will continue to try it. For now I will always compare the results to at least Turing and Stan. When I started the work on StatisticalRethinking I had a time frame of about 2 years in mind, so from that point of view I'm not even halfway.
