Statistics 541: Random Effects
Homework:
In this assignment you will simulate some repeated measurements data
and then fit models to it to see how much of the information you can
recover. The scientific setting we are simulating is that each
person has his or her own sensitivity to the sun. We will apply two
different sunscreens to their noses and measure how much sun
exposure shows (as redness). They will use one sunscreen for 10 days
and then switch to the other sunscreen. We are interested in whether
there is a difference between the treatments.
The data you create will then consist of 20 measurements
for 15 people (300 observations in all). Suppose we measure
redness on a 10 point scale (1-10), with 1 being "winter time
pale" and 10 being "burned to a bright red." We then observe
Y_ij, the observation for person i taken at the end of
day j, and X_ij, which is 0 for j = 1..10 and 1
for j = 11..20. The model is

    Y_ij = beta_0 + beta_1 X_ij + epsilon_ij
- Describe three different scientific stories as to how
the epsilon_ij might be correlated. Be as
concrete as possible--in other words, suppose you were
trying to explain to someone who knows basic statistics
(but not advanced statistics) why these shouldn't be
analyzed as simply i.i.d. errors.
- For each of your above three scientific stories,
describe how you might model them probabilistically or
statistically.
- Simulate a model of the form

    Y_ij = beta_0 + beta_1 X_ij + epsilon_i + epsilon_ij

  where all the epsilons are independent. Make various
  plots to see if your simulation makes sense. You will
  have to truncate the numbers to make the Y's land on the
  numbers 1 through 10. You will have to pick a reasonable
  value for the variance so it isn't degenerate. You will
  have to think hard about how big a variance you should
  have on epsilon_i, since it describes the
  individual effects. You will
  have to think about what are reasonable values of beta.
  (A simulation and plotting sketch appears after this list.)
- Do some exploratory data analysis to show that your
  model looks right. Make some histograms, etc.
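A minimal simulation sketch, assuming Python with numpy and
matplotlib (the assignment itself mentions JMP, so this is only one
way to do it). The parameter values beta_0 = 5, beta_1 = -1,
tau = 1.5, and sigma = 1 are illustrative choices, not values
prescribed by the assignment.

    # Simulate Y_ij = beta0 + beta1*X_ij + epsilon_i + epsilon_ij,
    # truncated onto the 1-10 redness scale, and look at a few plots.
    import numpy as np
    import matplotlib.pyplot as plt

    def simulate(rng, n_people=15, n_days=20,
                 beta0=5.0, beta1=-1.0, tau=1.5, sigma=1.0):
        person = np.repeat(np.arange(n_people), n_days)     # i index, length 300
        day = np.tile(np.arange(1, n_days + 1), n_people)   # j index
        x = (day > 10).astype(float)                         # 0 for days 1-10, 1 for days 11-20
        eps_i = rng.normal(0, tau, n_people)                 # one person effect per subject
        eps_ij = rng.normal(0, sigma, n_people * n_days)     # day-to-day noise
        y = np.clip(np.round(beta0 + beta1 * x + eps_i[person] + eps_ij), 1, 10)
        return y, x, person, eps_i

    rng = np.random.default_rng(541)
    y, x, person, eps_i = simulate(rng)

    # Exploratory checks: does this look like 1-10 redness scores?
    plt.hist(y, bins=np.arange(0.5, 11.5, 1.0))
    plt.title("Simulated redness scores")
    plt.show()

    person_means = np.array([y[person == i].mean() for i in range(15)])
    plt.scatter(eps_i, person_means)
    plt.xlabel("true epsilon_i"); plt.ylabel("person mean of Y")
    plt.show()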
Now that we have a data set, it is time to analyze it. The
primary parameter of interest is beta_1.
- First run a two-sample t-test. In other words, just
  compare the average for the observations under treatment 0
  with those under treatment 1. Run this test on your data.
  Does this test have the correct size? In other words,
  will 95% confidence intervals cover 95% of the time?
  (You might be able to check this by simply resimulating
  20 times and seeing how often you cover the truth. See
  the first sketch after this list.)
- Now run a "Tukey test." This means create an estimate
of beta1 for each person. Now compute the
one sample confidence interval based on these 15
numbers. Will a 95% confidence created in this fashion
cover 95% of the time?
- Use GEE (Generalized Estimating Equations, from the
  paper I passed out) to estimate the
  standard error of the simple regression done in the
  first part. (Note: this one is impossible to do in JMP;
  see the third sketch after this list.)
- Use a fixed effects model to deal with the individual
  effects. In other words, include the subject as a
  variable. This will generate 15 estimates of the
  individual effects. Plot the actual effects vs. the
  estimated individual effects. Test whether this is a
  45-degree line, as an "estimate" should be. Due to
  regression to the mean it won't in fact be a 45-degree
  line. Explain this. (The last sketch after this list
  covers this and the next two items.)
- Use a random effects model to recover both the
  individual effects and beta_1.
  If your software package does not do this for you
  automatically, it can be
  done as follows. First estimate the fixed effects. Now
  estimate how accurately you estimate each of the fixed
  effects above. (JMP gives you standard errors.)
  Now estimate the variance of the estimated fixed
  effects (by computing a variance):

    variance(estimated fixed effects) = var(epsilon_i) + SE^2

  So we can now estimate var(epsilon_i), which is
  also the covariance between the fixed effects estimates
  and the true random effects. From this we can now
  construct a regression equation for the
  epsilon_i based on the estimated fixed effects.
- Plot your estimated random effects vs. the true random
  effects. Does it seem to be a better fit than the fixed
  effects plot?
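A sketch of the naive two-sample t-test and a crude coverage check,
assuming scipy and reusing the simulate() function, rng, y, and x
from the simulation sketch earlier. The rounding and truncation to
the 1-10 scale distort the true mean difference a little, so treat
the coverage count as rough.

    import numpy as np
    from scipy import stats

    # Naive two-sample t-test: treatment days vs. control days, ignoring persons.
    print(stats.ttest_ind(y[x == 1], y[x == 0]))

    # Coverage check: resimulate 20 times and see how often the naive
    # 95% CI for the mean difference covers the true beta_1.
    beta1_true, covered, n_sims = -1.0, 0, 20
    for _ in range(n_sims):
        ys, xs, _, _ = simulate(rng)
        a, b = ys[xs == 1], ys[xs == 0]
        diff = a.mean() - b.mean()
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        tcrit = stats.t.ppf(0.975, len(a) + len(b) - 2)
        covered += (diff - tcrit * se) <= beta1_true <= (diff + tcrit * se)
    print(f"naive 95% CI covered the truth in {covered} of {n_sims} simulations")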
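A sketch of the per-person ("Tukey") analysis, reusing y, x, and
person from the simulation sketch earlier; scipy is assumed.

    import numpy as np
    from scipy import stats

    # Estimate beta_1 separately for each person, then build a one-sample
    # confidence interval from those 15 numbers.
    n_people = 15
    per_person = np.array([
        y[(person == i) & (x == 1)].mean() - y[(person == i) & (x == 0)].mean()
        for i in range(n_people)
    ])

    mean_diff = per_person.mean()
    se = per_person.std(ddof=1) / np.sqrt(n_people)
    tcrit = stats.t.ppf(0.975, n_people - 1)
    print("per-person estimate of beta_1:", mean_diff)
    print("95% CI:", (mean_diff - tcrit * se, mean_diff + tcrit * se))

    # Equivalently, a one-sample t-test on the 15 differences:
    print(stats.ttest_1samp(per_person, 0.0))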
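A sketch of the GEE fit with an exchangeable working correlation,
assuming statsmodels is available (JMP cannot do this, as the
assignment notes), again reusing y, x, and person; the naive OLS
standard errors are printed alongside for comparison.

    import statsmodels.api as sm

    X = sm.add_constant(x)                 # columns: intercept, treatment indicator
    gee = sm.GEE(y, X, groups=person, cov_struct=sm.cov_struct.Exchangeable())
    print(gee.fit().summary())             # compare its SE for x with the naive SE

    print(sm.OLS(y, X).fit().bse)          # naive (i.i.d.) standard errors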
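A sketch of the fixed-effects fit and the by-hand random-effects
recipe from the last three items, assuming statsmodels and
matplotlib and reusing y, x, person, and the true eps_i from the
simulation sketch; a packaged random-intercept fit (MixedLM) is
included at the end for comparison.

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    # Fixed effects: one dummy per subject, no overall intercept.
    n_people = 15
    dummies = (person[:, None] == np.arange(n_people)).astype(float)
    X_fe = np.column_stack([x, dummies])
    fe_fit = sm.OLS(y, X_fe).fit()

    alpha_hat = fe_fit.params[1:]            # per-subject intercepts (fixed effects)
    se_alpha = fe_fit.bse[1:]                # their standard errors
    eff_hat = alpha_hat - alpha_hat.mean()   # centered, comparable to the true eps_i

    # True vs. estimated individual effects: regression to the mean keeps
    # this from being a 45-degree line.
    plt.scatter(eps_i, eff_hat)
    plt.xlabel("true epsilon_i"); plt.ylabel("estimated fixed effect")
    plt.show()

    # Recipe: var(estimated effects) = var(epsilon_i) + SE^2, so
    tau2_hat = max(eff_hat.var(ddof=1) - np.mean(se_alpha**2), 0.0)
    shrink = tau2_hat / (tau2_hat + np.mean(se_alpha**2))  # regression coefficient
    re_hat = shrink * eff_hat                # shrunken "random effect" estimates

    plt.scatter(eps_i, re_hat)
    plt.xlabel("true epsilon_i"); plt.ylabel("estimated random effect")
    plt.show()

    # For comparison, a packaged random-intercept fit:
    mixed = sm.MixedLM(y, sm.add_constant(x), groups=person).fit()
    print(mixed.summary())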
Administrivia
- I'm off to see my grandma (she is doing much better now)
Random intercept
Basic model
- Simple random effect: Y_ij = X_ij*beta + epsilon_i + epsilon_ij
- Called a random intercept
- sigma^2 = var(epsilon_ij), tau^2 = var(epsilon_i)
- Often interested in the distribution of the epsilon_i
- In that case, want the mean and variance
- Obviously, the mean can't be separated from the intercept, so assume it is zero
- The variance is the interesting term
- Simpler model: Y_ij = alpha + epsilon_i + epsilon_ij
- Var(Y_ij) = Var(epsilon_i) + Var(epsilon_ij)
- We can easily estimate Var(Y_ij)
- We can easily estimate Var(epsilon_ij) (from the within-person variation)
- Hence: estimate Var(epsilon_i) = Var(Y_ij) - Var(epsilon_ij) (see the sketch below)
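A minimal sketch of the moment estimates above, assuming numpy, a
balanced design, and data arranged as a K-by-n array; the parameter
values are illustrative.

    import numpy as np

    def variance_components(y):
        K, n = y.shape
        # Var(epsilon_ij): pooled within-person variance
        sigma2_hat = np.mean([y[i].var(ddof=1) for i in range(K)])
        # Var(Y_ij): total variance, ignoring the grouping
        total_var = y.ravel().var(ddof=1)
        # Hence Var(epsilon_i) = Var(Y_ij) - Var(epsilon_ij)
        tau2_hat = total_var - sigma2_hat
        return sigma2_hat, tau2_hat

    rng = np.random.default_rng(0)
    K, n, tau, sigma = 15, 20, 1.5, 1.0
    y = 5.0 + rng.normal(0, tau, (K, 1)) + rng.normal(0, sigma, (K, n))
    # Should be roughly (1.0, 2.25); the simple moment estimate of tau^2
    # is slightly biased downward.
    print(variance_components(y))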
Standard errors
- alpha-hat = sum(Y)/(Kn), say
- var(alpha-hat) = (1/(Kn)^2) var(sum(Y))
- var(alpha-hat) = (1/(Kn)^2)(sum var(Y_ij) + sum cov(Y_ij, Y_i'j'))
- Covariances across different people are zero, so only within-person terms remain:
- var(alpha-hat) = (1/(Kn)^2)(Kn(sigma^2+tau^2) + sum cov(Y_ij, Y_ij'))
- var(alpha-hat) = (1/(Kn)^2)(Kn(sigma^2+tau^2) + sum_i n_i(n_i-1) tau^2)
- For a balanced design (n_i = n): var(alpha-hat) = sigma^2/(Kn) + tau^2/K (checked in the sketch below)
- Notice: with an unbalanced design, weighted least squares makes sense
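A quick Monte Carlo check of the balanced-design formula
var(alpha-hat) = sigma^2/(Kn) + tau^2/K, assuming numpy; the
parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    K, n, tau, sigma, alpha = 15, 20, 1.5, 1.0, 5.0

    alpha_hats = []
    for _ in range(5000):
        y = alpha + rng.normal(0, tau, (K, 1)) + rng.normal(0, sigma, (K, n))
        alpha_hats.append(y.mean())          # sum(Y)/(Kn)

    print("simulated var(alpha-hat):", np.var(alpha_hats))
    print("formula:                 ", sigma**2 / (K * n) + tau**2 / K)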
Fixed effects version
- Y_ij = X_ij*beta + epsilon_i + epsilon_ij
- sigma^2 = var(epsilon_ij)
- The epsilon_i are unknown parameters
- This is just a one-way ANOVA
- Obviously no estimate of tau, since there isn't a tau in the model
- Estimate the individual effects by the fitted epsilon_i-hat
Estimating the random effects
- Plot the estimates of the effects vs. the true effects
- Notice: a perfect regression setup
- Compute means, variances, covariances
- But we want the regression the other way around
- It can still be done (see the sketch below)
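A sketch of that reverse regression, assuming numpy and a balanced
design in which each estimated effect is the true epsilon_i plus
noise with variance SE^2 = sigma^2/n; the slope
tau^2/(tau^2 + SE^2) is the shrinkage factor.

    import numpy as np

    rng = np.random.default_rng(2)
    K, n, tau, sigma = 15, 20, 1.5, 1.0

    eps_i = rng.normal(0, tau, K)                              # true random effects
    alpha_hat = eps_i + rng.normal(0, sigma / np.sqrt(n), K)   # estimated (fixed) effects

    # Regression of the true effect on the estimated effect:
    # slope = cov / var(estimate) = tau^2 / (tau^2 + SE^2)
    slope_theory = tau**2 / (tau**2 + sigma**2 / n)
    slope_data = np.cov(eps_i, alpha_hat)[0, 1] / np.var(alpha_hat, ddof=1)
    print(slope_theory, slope_data)

    re_hat = slope_data * (alpha_hat - alpha_hat.mean())       # predicted random effects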
What is a random effect and what is a parameter?
- Are you a Bayesian or a frequentist? Bayesians don't distinguish between the two.
- Are you interested in describing the distribution or the actual values?
- How many effects do you have? (if few, then random effects don't help very much)
More complex random effects models
- random slopes (often interested in the average slope, say HGH); see the sketch below
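A sketch of a random-slopes fit, assuming statsmodels and reusing
the y, x, person arrays from the homework simulation; the
fixed-effect coefficient on x estimates the average slope.

    import statsmodels.api as sm

    X = sm.add_constant(x)       # fixed effects: intercept and x
    Z = sm.add_constant(x)       # random effects: intercept and slope per person
    fit = sm.MixedLM(y, X, groups=person, exog_re=Z).fit()
    print(fit.summary())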