Question about difference between #9
Answer 1. The random effects are only integrated out when `effects = "integrateoutRE"`.

Answer 2. Both are back transformed to the response scale.

Answer 3. They don't both set the REs to 0; see Answer 2.

Answer 4. All REs are integrated out when `effects = "integrateoutRE"` is used.

Answer 5. To get Prob(Y = 1 | X1, X2, u) where you have averaged across random realizations of u from the assumed distribution (btw, this = integrating over u), use `effects = "integrateoutRE"`. I don't have a particularly convenient way to get that estimate for only the "observed" u random intercept values, but you could generate all the predictions using […].

Extra. Not an answer to any specific question, but perhaps relevant here: my only real interest case in writing […]
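As a concrete illustration of what "integrating over u" means here (this is an editorial sketch, not code from the thread — the objects `fit`, `x`, and `id` are hypothetical), the Monte Carlo integration performed by `effects = "integrateoutRE"` can be done by hand for a random-intercept logistic model:

```r
library(brms)

# Hypothetical fitted model (assumption, not from this thread):
#   fit <- brm(y ~ x + (1 | id), family = "bernoulli", data = d)
post <- as_draws_df(fit)
k <- 1000L

# Fixed-effects linear predictor at x = 1, one value per posterior draw
eta <- post$b_Intercept + post$b_x * 1

# For each posterior draw: add k random intercepts sampled from the
# assumed N(0, sd_id^2) distribution, back transform to probabilities,
# and average over the k samples (numerical Monte Carlo integration)
p_marg <- vapply(seq_along(eta), function(i) {
  u <- rnorm(k, mean = 0, sd = post$`sd_id__Intercept`[i])
  mean(plogis(eta[i] + u))
}, numeric(1))

# Posterior median and 95% ETI of the marginal probability
quantile(p_marg, probs = c(.5, .025, .975))
```

Note that this marginal probability is generally not equal to `plogis(eta)` (the prediction with the random intercept set to 0), because `plogis()` is nonlinear.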
Dear Joshua,

Thank you so very much for your clear and informative answers. They are really helpful! I still have some difficulties understanding your Answer 4. Maybe part of it is because I have bits of information floating through my brain about "broad" and "narrow" inference in the context of random effects models and I am trying to connect your answer to that. To quote the SAS help website, "in the narrow inference space, conclusions are drawn about the particular values of the random effects selected in the study." I guess one could interpret this statement in several ways:

1. We can draw conclusions about a specific random effect (e.g., class #1 in your example with 20 classes);
2. We can draw conclusions about all random effects in the study (e.g., what is the average effect of some predictor X on the scale of interest across the 20 classes included in the study).

To make this concrete, if our study includes 20 classes (each with multiple students) and Y = student score on a test (measured once for each student; continuous) and X = study time (measured once for each student), then 1. would allow us to determine the effect of X on Y in class #1. In contrast, 2. would allow us to determine the average effect of X on Y across just the 20 classes included in the study.

What is not clear to me is whether brmsmargins() with type = "effectsRE" computes the effect of X on Y described in 1. or in 2. Or does it compute the average value of Y (given X) across all the students in class #1 (say) for 1.? Or does it compute the average value of Y (given X) across all the students in each of the 20 classes and then average that value across the 20 classes? As you can see, I am really lost in the weeds here (and I haven't even mentioned the possibility that it may predict the value of Y for a given X for a randomly selected student in class #1, etc.).
For "broad space inferences", the 20 classes in your example would be viewed as being representative of a larger universe of classes about which we wish to make inferences. I think here it is clearer to me that we could average a quantity — whether it is an effect on the log odds scale or a probability in the context of a binary mixed effects model — across the entire distribution of random effects. Then that would allow us to compute things like a "marginal coefficient in the broad inference sense" and a "marginal effect in the broad inference sense", with the "broad sense" taken to mean "on average across all the schools represented by the 20 schools included in our study".

In my mind, I don't see any reason why we wouldn't be able to restrict this averaging to just the distribution of "predicted" random effects in our study (e.g., restrict the averaging to just the 20 schools, when Y = whether or not the student passed the test). This would then allow us to get "marginal coefficients in a narrow sense" and "marginal effects in a narrow sense", with the "narrow sense" taken to mean "on average across just the 20 schools in our study".

Am I totally off base here to suggest marginal effects (and marginal coefficients) can be defined either in a "narrow sense" or in a "broad sense"? I have not seen any explicit discussion of this in the literature, but in practice it would be helpful to draw such a distinction (if the distinction is warranted). As a practitioner, I would like the ability to compute both of these types of marginal effects/coefficients (but just don't have the programming skills to pull it off on my own, unfortunately). I looked into the marginaleffects package and I can't tell whether it only does type = "includeRE" (which is my current understanding based on its vignette, which compares marginaleffects with brmsmargins).
Since I don't yet fully understand how "includeRE" works in brmsmargins, I struggle to understand what marginaleffects does with the same setting. Ultimately, I would love it if I could find a way to compute marginal effects and marginal probabilities in a model fitted with the ordbetareg() function in the ordbetareg package. This function is built on top of brms and fits a model to zero-and-one inflated proportions; it fits a single model component at a time, which makes it easier to work with. But it fits effects on a latent scale which is not intuitive, so marginal effects are a must.

Thank you very much for the Extra bonus. That mirrors exactly what I feel, except that I don't have the computational tools to pull off the most sensible quantities from my model (which is frustrating). This is further compounded by not always understanding what else can be computed instead given the currently available R options. I would very much appreciate any insights/clarifications you can throw my way. These things have been niggling at me for a long time, and each time I understand one more nuance, I can start obsessing about some other elusive aspect of statistical practice. 😁

With huge gratitude,
Isabella
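For what it's worth, the "narrow sense" average described above — averaging predictions over just the groups observed in the study rather than over the assumed random-effect distribution — can be sketched with `posterior_epred()`. This is an editorial illustration, not code from the thread; the model `fit`, the outcome `pass`, the predictor `study`, and the grouping variable `class` are all hypothetical:

```r
library(brms)

# Hypothetical model (assumption):
#   fit <- brm(pass ~ study + (1 | class), family = "bernoulli", data = d)
classes <- unique(d$class)               # e.g., the 20 observed classes
nd <- expand.grid(study = c(0, 1), class = classes)

# Predictions that include each class's estimated random intercept
# (the default re_formula = NULL keeps the REs in)
p <- posterior_epred(fit, newdata = nd)  # draws x nrow(nd)

# Average over the observed classes at each value of `study`,
# then contrast: a "narrow sense" average marginal effect
p0 <- rowMeans(p[, nd$study == 0, drop = FALSE])
p1 <- rowMeans(p[, nd$study == 1, drop = FALSE])
quantile(p1 - p0, probs = c(.5, .025, .975))
```

The "broad sense" analogue replaces the observed classes with new levels drawn from the fitted random-effect distribution, as with `effects = "integrateoutRE"` or `sample_new_levels = "gaussian"` later in this thread.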
Thanks again @JWiley for taking so much time to write this package and answer our newbie questions. I'm still trying to grok the differences between the values of the `effects` argument.

Below, I replicate the results produced by brmsmargins(). Do you think it is possible to achieve something like this with `posterior_epred(model, allow_new_levels = TRUE, sample_new_levels = "gaussian")`? These two arguments do something which, to me, sounds quite similar to what you describe here: […]
Below, I show an example with raw […].

**Simulate data and fit model**

```r
library(brms)
library(collapse)
library(data.table)
library(brmsmargins)

h <- .0001

d <- withr::with_seed(
  seed = 12345, code = {
    nGroups <- 100
    nObs <- 20
    theta.location <- matrix(rnorm(nGroups * 2), nrow = nGroups, ncol = 2)
    theta.location[, 1] <- theta.location[, 1] - mean(theta.location[, 1])
    theta.location[, 2] <- theta.location[, 2] - mean(theta.location[, 2])
    theta.location[, 1] <- theta.location[, 1] / sd(theta.location[, 1])
    theta.location[, 2] <- theta.location[, 2] / sd(theta.location[, 2])
    theta.location <- theta.location %*% chol(matrix(c(1.5, -.25, -.25, .5^2), 2))
    theta.location[, 1] <- theta.location[, 1] - 2.5
    theta.location[, 2] <- theta.location[, 2] + 1
    d <- data.table(
      x = rep(rep(0:1, each = nObs / 2), times = nGroups))
    d[, ID := rep(seq_len(nGroups), each = nObs)]
    for (i in seq_len(nGroups)) {
      d[ID == i, y := rbinom(
        n = nObs,
        size = 1,
        prob = plogis(theta.location[i, 1] + theta.location[i, 2] * x))
      ]
    }
    copy(d)
  })

mlogit <- brms::brm(
  y ~ 1 + x + (1 + x | ID), family = "bernoulli",
  data = d, seed = 1234,
  silent = 2, refresh = 0,
  chains = 4L, cores = 4L, backend = "cmdstanr")
```

**3 slightly different […]**
I do not think it is off base or unreasonable to consider narrow or broad inference. I have not given the narrow sense a lot of consideration, as that has not been a focus in my own research. I would think you could do the narrow inference in […]
@vincentarelbundock Intriguing. I had not seen the `sample_new_levels` argument. Yes, that sounds on target. How do the credible intervals look?
Yep, it's easy to increase the number of draws, and the CIs look basically the same. However, […] Here are three ultra-minimal examples.

```r
library(brms)
library(brmsmargins)

mod <- brm(am ~ vs + (1 | gear), data = mtcars, family = "bernoulli", backend = "cmdstanr")
```

**fixedonly**

```r
brmsmargins(
  object = mod,
  at = data.frame(vs = c(0, 1)),
  CI = .95, CIType = "ETI",
  contrasts = cbind("AME vs" = c(-1, 1)),
  effects = "fixedonly")$ContrastSummary |>
  data.frame()
#>            M        Mdn        LL        UL PercentROPE PercentMID   CI CIType
#> 1 -0.4413672 -0.4098166 -0.999893 0.1445257          NA         NA 0.95    ETI
#>   ROPE  MID  Label
#> 1 <NA> <NA> AME vs

nd <- expand.grid(vs = 0:1, gear = 1)
nd$H <- ifelse(nd$vs == 0, -1, 1)
p <- brms::posterior_epred(
  mod,
  re_formula = NA,
  newdata = nd)
quantile(nd$H %*% t(p), probs = c(.5, .025, .975))
#>        50%       2.5%      97.5%
#> -0.4098166 -0.9998930  0.1445257
```

**includeRE**

```r
brmsmargins(
  object = mod,
  at = data.frame(vs = c(0, 1)),
  CI = .95, CIType = "ETI",
  contrasts = cbind("AME vs" = c(-1, 1)),
  effects = "includeRE")$ContrastSummary |>
  data.frame()
#>            M        Mdn         LL         UL PercentROPE PercentMID   CI
#> 1 -0.1321468 -0.1385877 -0.3117927 0.09439569          NA         NA 0.95
#>   CIType ROPE  MID  Label
#> 1    ETI <NA> <NA> AME vs

nd <- rbind(
  transform(mtcars, vs = 0),
  transform(mtcars, vs = 1))
K <- nrow(mtcars)
nd$H <- ifelse(nd$vs == 0, -1 / K, 1 / K)
p <- brms::posterior_epred(
  mod,
  newdata = nd)
quantile(nd$H %*% t(p), probs = c(.5, .025, .975))
#>         50%        2.5%       97.5%
#> -0.13858772 -0.31179267  0.09439569
```

**integrateoutRE**

```r
K <- 1000
brmsmargins(
  k = K,
  object = mod,
  at = data.frame(vs = c(0, 1)),
  CI = .95, CIType = "ETI",
  contrasts = cbind("AME vs" = c(-1, 1)),
  effects = "integrateoutRE")$ContrastSummary |>
  data.frame()
#>            M        Mdn         LL        UL PercentROPE PercentMID   CI CIType
#> 1 -0.2168032 -0.1879975 -0.6459883 0.0722646          NA         NA 0.95    ETI
#>   ROPE  MID  Label
#> 1 <NA> <NA> AME vs

nd <- expand.grid(
  vs = 0:1,
  gear = sample(1e6:2e6, K))
nd$H <- ifelse(nd$vs == 0, -1 / K, 1 / K)
p <- brms::posterior_epred(
  mod,
  allow_new_levels = TRUE,
  sample_new_levels = "gaussian",
  newdata = nd)
quantile(nd$H %*% t(p), probs = c(.5, .025, .975))
#>         50%        2.5%       97.5%
#> -0.18651311 -0.63625547  0.07199052
```
Note on speed: it is the […]. Edit: […] slow. Edit 2: And in particular these two lines are very slow, where […]
So sorry for spamming your Github (I should find a different venue)!! But here's a comparison of the 3 different kinds of "new levels". In this particular example, "gaussian" comes very close to matching your results, but the other ones are not too far off and are somewhat faster. Using […]:

```r
library(brms)
library(brmsmargins)
library(tictoc)

dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv")
dat$x <- as.numeric(dat$output > median(dat$output))
dat$y <- as.numeric(dat$emp > median(dat$emp))
mod <- brm(y ~ x + (1 | firm), data = dat, backend = "cmdstanr", family = "bernoulli")

K <- 1000
tic()
bm <- brmsmargins(
  k = K,
  object = mod,
  at = data.frame(x = c(0, 1)),
  CI = .95, CIType = "ETI",
  contrasts = cbind("AME x" = c(-1, 1)),
  effects = "integrateoutRE")
bm$ContrastSummary |> data.frame()
#>            M        Mdn         LL        UL PercentROPE PercentMID   CI CIType
#> 1 0.09660978 0.09525802 0.05968685 0.1405216          NA         NA 0.95    ETI
#>   ROPE  MID Label
#> 1 <NA> <NA> AME x
toc()
#> 36.21 sec elapsed

manual <- function(method = "old_levels") {
  nd <- expand.grid(
    x = 0:1,
    firm = sample(1e6:2e6, K))
  nd$H <- ifelse(nd$x == 0, -1 / K, 1 / K)
  p <- posterior_epred(
    mod,
    allow_new_levels = TRUE,
    sample_new_levels = method,
    newdata = nd)
  quantile(nd$H %*% t(p), probs = c(.5, .025, .975))
}

tic()
manual("gaussian")
#>        50%       2.5%      97.5%
#> 0.09539001 0.05998184 0.13916398
toc()
#> 30.687 sec elapsed

tic()
manual("old_levels")
#>        50%       2.5%      97.5%
#> 0.10122629 0.08244516 0.12020882
toc()
#> 0.924 sec elapsed

tic()
manual("uncertainty")
#>        50%       2.5%      97.5%
#> 0.10255900 0.08044943 0.12789991
toc()
#> 2.822 sec elapsed
```
@vincentarelbundock Very cool. I've no problems with these discussions continuing here on GitHub. From what I understand of the […]

Generally, because there are also many posterior draws getting averaged over, I doubt that a very high `k` is needed.

I assume you'll add support now in `marginaleffects`?
Yes, that would seem to be the main benefit for you. Should be mostly plug-and-play. Cool!
Very good to know, thanks!
That makes intuitive sense. It might eventually be fun to do some more serious benchmarking. I can't replicate it now, but I ran into a case earlier where the […]
Yes, the dev version from Github currently has a prototype. I have not tested at all, but it generates output:

```r
library(brms)
library(marginaleffects)

dat <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv")
dat$x <- as.numeric(dat$output > median(dat$output))
dat$y <- round(dat$emp)
dat$y <- ifelse(dat$y > 5, 5, dat$y)
mod <- brm(
  y ~ x + (1 | firm),
  data = dat,
  family = "categorical")

comparisons(
  mod,
  newdata = datagrid(firm = -100:-1),
  allow_new_levels = TRUE,
  sample_new_levels = "gaussian") |>
  summary()
#>   Group Term Contrast   Effect     2.5 %   97.5 %
#> 1     0    x    1 - 0 -0.03499 -0.078327 -0.01163
#> 2     1    x    1 - 0 -0.08354 -0.136475 -0.04241
#> 3     2    x    1 - 0 -0.02185 -0.063338  0.01209
#> 4     3    x    1 - 0  0.04863  0.018110  0.09265
#> 5     4    x    1 - 0  0.03570  0.007809  0.08047
#> 6     5    x    1 - 0  0.05366  0.018884  0.11054
#>
#> Model type: brmsfit
#> Prediction type: response
```
@vincentarelbundock One more question on your implementation: how are you handling AMEs for continuous variables?
Users can do whatever they want/need. By default, the […]

```r
newdata = datagrid(firm = -100:-1)
```

This means that we make predictions only for a dataset with 100 rows (one per RE level) and all covariates at their means or modes. The result is a data frame with 500 rows: 100 rows per unit with 5 response levels. You can use the […]

```r
newdata = datagridcf(firm = -100:-1)
```

You can also specify for what kind of "hypothetical" individual(s) you want to compute marginal effects:

```r
newdata = datagrid(x = c(10, 100), z = "dingus", firm = -100:-1)
```

Then, the […]

Does that answer the question?
Dear Joshua,
Thank you very much for developing the brmsmargins package. I am trying to understand a paragraph included in the help file for the brmsmargins() function and would appreciate any clarifications you are able to provide. The paragraph in question is as follows:
> "Use the fitted() function to generate predictions based on this modified dataset. If effects is set to "fixedonly" (meaning only generate predictions using fixed effects) or to "includeRE" (meaning generate predictions using fixed and random effects), then predictions are generated entirely using the fitted() function and are typically back transformed to the response scale. For mixed effects models with fixed and random effects where effects is set to "integrateoutRE", then fitted() is only used to generate predictions using the fixed effects on the linear scale. For each prediction generated, the random effects are integrated out by drawing k random samples from the model assumed random effect(s) distribution. These are added to the fixed effects predictions, back transformed, and then averaged over all k random samples to perform numerical Monte Carlo integration."
Question 1
Can you confirm whether the last part of the above paragraph, which starts with "For each prediction generated, the random effects are integrated out", applies to each of the settings `effects = "fixedonly"`, `effects = "includeRE"`, and `effects = "integrateoutRE"`? Or does it apply only to the `effects = "integrateoutRE"` setting?
Question 2
Can you confirm whether both `effects = "fixedonly"` and `effects = "integrateoutRE"` ignore the random effects (i.e., they set them to 0), with the difference between them being that `effects = "fixedonly"` returns the results on the natural response scale while `effects = "integrateoutRE"` returns them on the link scale? (For example, for a mixed effects binary regression model, `effects = "fixedonly"` would return fitted probabilities whereas `effects = "integrateoutRE"` would return fitted log odds.)
Question 3
If both `effects = "fixedonly"` and `effects = "integrateoutRE"` set the random effects in the fitted part of the model to 0, I do not understand why we would want to "integrate out the random effects" for these two settings. The only setting for which it would make sense to "integrate out the random effects" would be `effects = "includeRE"` (since that is the only setting where the random effects would NOT be set to 0; unless, perhaps, there are multiple random effects in the model and only some of those would be set to 0?). I don't see a `re_formula` argument to brmsmargins() which could be used to indicate which random effects are set to 0 and which are not, so I am not sure how one would even specify which random effects are set to 0 and which are kept in the model.
Question 4
When would one want to use `effects = "includeRE"` versus `effects = "integrateoutRE"`?
Question 5
Getting back to the example of a mixed effects binary logistic regression model (with just a random intercept u for simplicity and 2 predictors, X1 and X2): if I needed to compute something like Prob(Y = 1 | X1, X2, u) for X1 and X2 set at 2 relevant values x1 and x2 and u averaged across the observed random intercept values, would there be a way to get this out of brmsmargins? If I then needed to compute Prob(Y = 1 | X1, X2, u) but averaged across random realizations of u from the assumed distribution of the random intercepts — say, Normal(0, sigma_u^2) with sigma_u^2 estimated from the data — would there be a way to compute this via brmsmargins? What does the latter estimate? What does the former estimate? Why not use the former instead of the latter (e.g., because we can get a better approximation if we use more random effects than what was observed in the data)?