Statistics 541: Random Effects
Homework:
In this assignment you will simulate some repeated measurements data
and then fit models to it to see how much of the information you can
recover. The scientific setting we are simulating is that each
person has his or her own sensitivity to the sun. We will apply two
different sunscreens to their noses and measure how much sun
exposure shows (as redness). They will use one sunscreen for 10 days
and then switch to the other sunscreen. We are interested in whether
there is a difference between the treatments.
The data you create will then consist of 20 measurements
for 15 people (300 observations in all). Suppose we measure
redness on a 10 point scale (1-10), with 1 being "winter time
pale" and 10 being "burned to a bright red." We then observe
Y_ij, the observation for person i taken at the end of
day j, and X_ij, which is 0 for j = 1..10 and 1
for j = 11..20. The model is

    Y_ij = beta_0 + beta_1 X_ij + epsilon_ij
- Describe three different scientific stories as to how
the epsilon_ij might be correlated. Be as
concrete as possible--in other words, suppose you were
trying to explain to someone who knows basic statistics
(but not advanced statistics) why these shouldn't be
analyzed as simply i.i.d. errors.
- For each of your above three scientific stories,
describe how you might model them probabilistically or
statistically.
- Simulate a model of the form

    Y_ij = beta_0 + beta_1 X_ij + epsilon_i + epsilon_ij

  where all the epsilons are independent. Make various
  plots to see if your simulation makes sense. You will
  have to truncate the numbers to make the Y's land on the
  numbers 1 through 10. You will have to pick a reasonable
  value for the variance so it isn't degenerate. You will
  have to think hard about how big a variance you should
  have on epsilon_i, since it describes the
  individual effects. You will
  have to think about what are reasonable values of beta.
  (A simulation and plotting sketch appears after this list.)
- Do some exploratory data analysis to show that your
  model looks right. Make some histograms, etc.
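A minimal simulation sketch, assuming Python with numpy and
matplotlib (the assignment itself mentions JMP, so this is only one
way to do it). The parameter values beta_0 = 5, beta_1 = -1,
tau = 1.5, and sigma = 1 are illustrative choices, not values
prescribed by the assignment.

    # Simulate Y_ij = beta0 + beta1*X_ij + epsilon_i + epsilon_ij,
    # truncated onto the 1-10 redness scale, and look at a few plots.
    import numpy as np
    import matplotlib.pyplot as plt

    def simulate(rng, n_people=15, n_days=20,
                 beta0=5.0, beta1=-1.0, tau=1.5, sigma=1.0):
        person = np.repeat(np.arange(n_people), n_days)     # i index, length 300
        day = np.tile(np.arange(1, n_days + 1), n_people)   # j index
        x = (day > 10).astype(float)                         # 0 for days 1-10, 1 for days 11-20
        eps_i = rng.normal(0, tau, n_people)                 # one person effect per subject
        eps_ij = rng.normal(0, sigma, n_people * n_days)     # day-to-day noise
        y = np.clip(np.round(beta0 + beta1 * x + eps_i[person] + eps_ij), 1, 10)
        return y, x, person, eps_i

    rng = np.random.default_rng(541)
    y, x, person, eps_i = simulate(rng)

    # Exploratory checks: does this look like 1-10 redness scores?
    plt.hist(y, bins=np.arange(0.5, 11.5, 1.0))
    plt.title("Simulated redness scores")
    plt.show()

    person_means = np.array([y[person == i].mean() for i in range(15)])
    plt.scatter(eps_i, person_means)
    plt.xlabel("true epsilon_i"); plt.ylabel("person mean of Y")
    plt.show()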
Now that we have a data set, it is time to analyze it. The
primary parameter of interest is beta_1.
- First run a two-sample t-test. In other words, just
  compare the average for the observations under treatment 0
  with those under treatment 1. Run this test on your data.
  Does this test have the correct size? In other words,
  will 95% confidence intervals cover 95% of the time?
  (You might be able to check this by simply resimulating
  20 times and seeing how often you cover the truth. See
  the first sketch after this list.)
- Now run a "Tukey test." This means create an estimate
of beta1 for each person. Now compute the
one sample confidence interval based on these 15
numbers. Will a 95% confidence created in this fashion
cover 95% of the time?
- Use GEE (Generalized Estimating Equations, from the
  paper I passed out) to estimate the
  standard error of the simple regression done in the
  first part. (Note: this one is impossible to do in JMP;
  see the third sketch after this list.)
- Use a fixed effects model to deal with the individual
  effects. In other words, include the subject as a
  variable. This will generate 15 estimates of the
  individual effects. Plot the actual effects vs. the
  estimated individual effects. Test whether this is a
  45-degree line, as an "estimate" should be. Due to
  regression to the mean it won't in fact be a 45-degree
  line. Explain this. (The last sketch after this list
  covers this and the next two items.)
- Use a random effects model to recover both the
  individual effects and beta_1.
  If your software package does not do this for you
  automatically, it can be
  done as follows. First estimate the fixed effects. Now
  estimate how accurately you estimate each of the fixed
  effects above. (JMP gives you standard errors.)
  Now estimate the variance of the estimated fixed
  effects (by computing a variance):

    variance(estimated fixed effects) = var(epsilon_i) + SE^2

  So we can now estimate var(epsilon_i), which is
  also the covariance between the fixed effects estimates
  and the true random effects. From this we can now
  construct a regression equation for the
  epsilon_i based on the estimated fixed effects.
- Plot your estimated random effects vs. the true random
  effects. Does it seem to be a better fit than the fixed
  effects plot?
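A sketch of the naive two-sample t-test and a crude coverage check,
assuming scipy and reusing the simulate() function, rng, y, and x
from the simulation sketch earlier. The rounding and truncation to
the 1-10 scale distort the true mean difference a little, so treat
the coverage count as rough.

    import numpy as np
    from scipy import stats

    # Naive two-sample t-test: treatment days vs. control days, ignoring persons.
    print(stats.ttest_ind(y[x == 1], y[x == 0]))

    # Coverage check: resimulate 20 times and see how often the naive
    # 95% CI for the mean difference covers the true beta_1.
    beta1_true, covered, n_sims = -1.0, 0, 20
    for _ in range(n_sims):
        ys, xs, _, _ = simulate(rng)
        a, b = ys[xs == 1], ys[xs == 0]
        diff = a.mean() - b.mean()
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        tcrit = stats.t.ppf(0.975, len(a) + len(b) - 2)
        covered += (diff - tcrit * se) <= beta1_true <= (diff + tcrit * se)
    print(f"naive 95% CI covered the truth in {covered} of {n_sims} simulations")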
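A sketch of the per-person ("Tukey") analysis, reusing y, x, and
person from the simulation sketch earlier; scipy is assumed.

    import numpy as np
    from scipy import stats

    # Estimate beta_1 separately for each person, then build a one-sample
    # confidence interval from those 15 numbers.
    n_people = 15
    per_person = np.array([
        y[(person == i) & (x == 1)].mean() - y[(person == i) & (x == 0)].mean()
        for i in range(n_people)
    ])

    mean_diff = per_person.mean()
    se = per_person.std(ddof=1) / np.sqrt(n_people)
    tcrit = stats.t.ppf(0.975, n_people - 1)
    print("per-person estimate of beta_1:", mean_diff)
    print("95% CI:", (mean_diff - tcrit * se, mean_diff + tcrit * se))

    # Equivalently, a one-sample t-test on the 15 differences:
    print(stats.ttest_1samp(per_person, 0.0))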
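A sketch of the GEE fit with an exchangeable working correlation,
assuming statsmodels is available (JMP cannot do this, as the
assignment notes), again reusing y, x, and person; the naive OLS
standard errors are printed alongside for comparison.

    import statsmodels.api as sm

    X = sm.add_constant(x)                 # columns: intercept, treatment indicator
    gee = sm.GEE(y, X, groups=person, cov_struct=sm.cov_struct.Exchangeable())
    print(gee.fit().summary())             # compare its SE for x with the naive SE

    print(sm.OLS(y, X).fit().bse)          # naive (i.i.d.) standard errors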
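A sketch of the fixed-effects fit and the by-hand random-effects
recipe from the last three items, assuming statsmodels and
matplotlib and reusing y, x, person, and the true eps_i from the
simulation sketch; a packaged random-intercept fit (MixedLM) is
included at the end for comparison.

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    # Fixed effects: one dummy per subject, no overall intercept.
    n_people = 15
    dummies = (person[:, None] == np.arange(n_people)).astype(float)
    X_fe = np.column_stack([x, dummies])
    fe_fit = sm.OLS(y, X_fe).fit()

    alpha_hat = fe_fit.params[1:]            # per-subject intercepts (fixed effects)
    se_alpha = fe_fit.bse[1:]                # their standard errors
    eff_hat = alpha_hat - alpha_hat.mean()   # centered, comparable to the true eps_i

    # True vs. estimated individual effects: regression to the mean keeps
    # this from being a 45-degree line.
    plt.scatter(eps_i, eff_hat)
    plt.xlabel("true epsilon_i"); plt.ylabel("estimated fixed effect")
    plt.show()

    # Recipe: var(estimated effects) = var(epsilon_i) + SE^2, so
    tau2_hat = max(eff_hat.var(ddof=1) - np.mean(se_alpha**2), 0.0)
    shrink = tau2_hat / (tau2_hat + np.mean(se_alpha**2))  # regression coefficient
    re_hat = shrink * eff_hat                # shrunken "random effect" estimates

    plt.scatter(eps_i, re_hat)
    plt.xlabel("true epsilon_i"); plt.ylabel("estimated random effect")
    plt.show()

    # For comparison, a packaged random-intercept fit:
    mixed = sm.MixedLM(y, sm.add_constant(x), groups=person).fit()
    print(mixed.summary())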
Administrivia
- I'm off to see my grandma (she is doing much better now)
Random intercept
Basic model
- Simple random effect: Y_ij = X_ij*beta + epsilon_i + epsilon_ij
- Called a random intercept
- sigma^2 = var(epsilon_ij), tau^2 = var(epsilon_i)
- Often interested in the distribution of the epsilon_i
- In that case, want the mean and variance
- Obviously, the mean can't be separated from the intercept, so assume it is zero
- The variance is the interesting term
- Simpler model: Y_ij = alpha + epsilon_i + epsilon_ij
- Var(Y_ij) = Var(epsilon_i) + Var(epsilon_ij)
- We can easily estimate Var(Y_ij)
- We can easily estimate Var(epsilon_ij) (from the within-person variation)
- Hence: estimate Var(epsilon_i) = Var(Y_ij) - Var(epsilon_ij) (see the sketch below)
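A minimal sketch of the moment estimates above, assuming numpy, a
balanced design, and data arranged as a K-by-n array; the parameter
values are illustrative.

    import numpy as np

    def variance_components(y):
        K, n = y.shape
        # Var(epsilon_ij): pooled within-person variance
        sigma2_hat = np.mean([y[i].var(ddof=1) for i in range(K)])
        # Var(Y_ij): total variance, ignoring the grouping
        total_var = y.ravel().var(ddof=1)
        # Hence Var(epsilon_i) = Var(Y_ij) - Var(epsilon_ij)
        tau2_hat = total_var - sigma2_hat
        return sigma2_hat, tau2_hat

    rng = np.random.default_rng(0)
    K, n, tau, sigma = 15, 20, 1.5, 1.0
    y = 5.0 + rng.normal(0, tau, (K, 1)) + rng.normal(0, sigma, (K, n))
    # Should be roughly (1.0, 2.25); the simple moment estimate of tau^2
    # is slightly biased downward.
    print(variance_components(y))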
Standard errors
- alpha-hat = sum(Y)/(Kn), say
- var(alpha-hat) = (1/(Kn)^2) var(sum(Y))
- var(alpha-hat) = (1/(Kn)^2)(sum var(Y_ij) + sum cov(Y_ij, Y_i'j'))
- Covariances across different people are zero, so only within-person terms remain:
- var(alpha-hat) = (1/(Kn)^2)(Kn(sigma^2+tau^2) + sum cov(Y_ij, Y_ij'))
- var(alpha-hat) = (1/(Kn)^2)(Kn(sigma^2+tau^2) + sum_i n_i(n_i-1) tau^2)
- For a balanced design (n_i = n): var(alpha-hat) = sigma^2/(Kn) + tau^2/K (checked in the sketch below)
- Notice: with an unbalanced design, weighted least squares makes sense
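A quick Monte Carlo check of the balanced-design formula
var(alpha-hat) = sigma^2/(Kn) + tau^2/K, assuming numpy; the
parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    K, n, tau, sigma, alpha = 15, 20, 1.5, 1.0, 5.0

    alpha_hats = []
    for _ in range(5000):
        y = alpha + rng.normal(0, tau, (K, 1)) + rng.normal(0, sigma, (K, n))
        alpha_hats.append(y.mean())          # sum(Y)/(Kn)

    print("simulated var(alpha-hat):", np.var(alpha_hats))
    print("formula:                 ", sigma**2 / (K * n) + tau**2 / K)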
Fixed effects version
- Y_ij = X_ij*beta + epsilon_i + epsilon_ij
- sigma^2 = var(epsilon_ij)
- The epsilon_i are unknown parameters
- This is just a one-way ANOVA
- Obviously no estimate of tau, since there isn't a tau in the model
- Estimate the individual effects by the fitted epsilon_i-hat
Estimating the random effects
- Plot the estimates of the effects vs. the true effects
- Notice: a perfect regression setup
- Compute means, variances, covariances
- But we want the regression the other way around
- It can still be done (see the sketch below)
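A sketch of that reverse regression, assuming numpy and a balanced
design in which each estimated effect is the true epsilon_i plus
noise with variance SE^2 = sigma^2/n; the slope
tau^2/(tau^2 + SE^2) is the shrinkage factor.

    import numpy as np

    rng = np.random.default_rng(2)
    K, n, tau, sigma = 15, 20, 1.5, 1.0

    eps_i = rng.normal(0, tau, K)                              # true random effects
    alpha_hat = eps_i + rng.normal(0, sigma / np.sqrt(n), K)   # estimated (fixed) effects

    # Regression of the true effect on the estimated effect:
    # slope = cov / var(estimate) = tau^2 / (tau^2 + SE^2)
    slope_theory = tau**2 / (tau**2 + sigma**2 / n)
    slope_data = np.cov(eps_i, alpha_hat)[0, 1] / np.var(alpha_hat, ddof=1)
    print(slope_theory, slope_data)

    re_hat = slope_data * (alpha_hat - alpha_hat.mean())       # predicted random effects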
What is a random effect and what is a parameter?
- Are you a Bayesian or a frequentist? Bayesians don't distinguish between the two.
- Are you interested in describing the distribution or the actual values?
- How many effects do you have? (if few, then random effects don't help very much)
More complex random effects models
- random slopes (often interested in the average slope, say HGH); see the sketch below
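A sketch of a random-slopes fit, assuming statsmodels and reusing
the y, x, person arrays from the homework simulation; the
fixed-effect coefficient on x estimates the average slope.

    import statsmodels.api as sm

    X = sm.add_constant(x)       # fixed effects: intercept and x
    Z = sm.add_constant(x)       # random effects: intercept and slope per person
    fit = sm.MixedLM(y, X, groups=person, exog_re=Z).fit()
    print(fit.summary())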