In this assignment you will simulate some repeated-measures data and then fit models to it to see how much of the information you can recover. The scientific study we are simulating is that each person has a sensitivity to the sun. We will apply two different sunscreens to their nose and measure how red it gets. They will use one sunscreen for 10 days and then switch to the other sunscreen. We are interested in whether there is a difference between the treatments.

The data you will create will then consist of 20 measurements
for 15 people (300 observations in all). Suppose we measure
redness on a 10 point scale (1-10), with 1 being "winter time
pale" and 10 being "burned to a bright red." We then observe
Y_{ij} which is the observation taken at the end of
each day, and X_{ij} which is 0 for j = 1..10, and 1
for j = 11..20. The model is that

Y_{ij} = beta_{0} + beta_{1} X_{ij} + epsilon_{ij}

- Describe three different scientific stories as to how the epsilon_{ij} might be correlated. Be as concrete as possible--in other words, suppose you were trying to explain to someone who knows basic statistics (but not advanced statistics) why these shouldn't be analyzed as simply i.i.d. errors.
- For each of your above three scientific stories, describe how you might model them probabilistically or statistically.
- Simulate a model of the form:

Y_{ij} = beta_{0} + beta_{1} X_{ij} + epsilon_{i} + epsilon_{ij}

where all the epsilons are independent. Make various plots to see if your simulation makes sense. You will have to truncate the numbers to make the Y's land on the numbers 1 through 10. You will have to pick a reasonable value for the variance so it isn't degenerate. You will have to think hard about how big a variance you should have on epsilon_{i} since it describes individual effects. You will have to think about what are reasonable values of beta.
- Do some exploratory data analysis to show that your model looks right. Make some histograms, etc.
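One way this simulation might look (a minimal NumPy sketch; the parameter values beta0 = 4, beta1 = 1.5, tau = 1.5, sigma = 1 are placeholder assumptions -- the assignment asks you to pick and defend your own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameter values -- pick and justify your own for the assignment.
n_people, n_days = 15, 20
beta0, beta1 = 4.0, 1.5      # baseline redness and sunscreen effect
tau, sigma = 1.5, 1.0        # sd of the person effect eps_i and of the daily noise eps_ij

person = np.repeat(np.arange(n_people), n_days)       # i index: 0..14, each repeated 20 times
day = np.tile(np.arange(1, n_days + 1), n_people)     # j index: 1..20 for each person
x = (day > 10).astype(float)                          # X_ij: 0 for days 1-10, 1 for days 11-20

eps_i = rng.normal(0.0, tau, n_people)                # one individual effect per person
eps_ij = rng.normal(0.0, sigma, n_people * n_days)    # independent daily noise
y_latent = beta0 + beta1 * x + eps_i[person] + eps_ij
y = np.clip(np.rint(y_latent), 1, 10)                 # round and truncate onto the 1-10 scale

print(y.shape, y.min(), y.max())
```

Histograms of `y` by treatment and spaghetti plots of each person's 20 days are natural checks that the simulation looks sensible.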

- First run a two-sample t-test. In other words, just compare the average for those under treatment 0 with those under treatment 1. Run this test on your data. Does this test provide correct size? In other words, will 95% confidence intervals cover 95% of the time? (You might be able to check this by simply resimulating 20 times and see how often you cover the truth.)
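A sketch of that coverage check in Python (the values beta0 = 4, beta1 = 1.5, tau = 1.5, sigma = 1 are assumptions carried over from the simulation step, the truncation to the 1-10 scale is skipped for simplicity, and 200 replications instead of 20 give a steadier answer):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_people, n_days = 15, 20
beta0, beta1, tau, sigma = 4.0, 1.5, 1.5, 1.0   # assumed true values

x = np.tile((np.arange(1, n_days + 1) > 10).astype(float), n_people)
person = np.repeat(np.arange(n_people), n_days)

covered, n_sims = 0, 200
for _ in range(n_sims):
    eps_i = rng.normal(0.0, tau, n_people)
    y = beta0 + beta1 * x + eps_i[person] + rng.normal(0.0, sigma, n_people * n_days)
    a, b = y[x == 1], y[x == 0]
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)  # naive iid-style SE
    tcrit = stats.t.ppf(0.975, a.size + b.size - 2)
    covered += (diff - tcrit * se <= beta1 <= diff + tcrit * se)
print(covered / n_sims)
```

In this paired design the person effects cancel out of the difference of means (everyone gets both treatments) but still inflate the naive standard error, so the interval comes out conservative: coverage well above 95%.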
- Now run a "Tukey test." This means create an estimate of beta_{1} for each person. Now compute the one-sample confidence interval based on these 15 numbers. Will a 95% confidence interval created in this fashion cover 95% of the time?
- Use GEE (Generalized Estimating Equations, from the paper I passed out) to estimate the standard error of the simple regression done in the first part. (Note: This one is impossible to do in JMP.)
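The per-person ("Tukey") version can be sketched as follows, again with assumed parameter values (beta0 = 4, beta1 = 1.5, tau = 1.5, sigma = 1) and no truncation; each person's beta_1 estimate is just their treatment mean minus their control mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_people, n_days = 15, 20
beta0, beta1, tau, sigma = 4.0, 1.5, 1.5, 1.0   # assumed true values

x = np.tile((np.arange(1, n_days + 1) > 10).astype(float), n_people)
person = np.repeat(np.arange(n_people), n_days)
eps_i = rng.normal(0.0, tau, n_people)
y = beta0 + beta1 * x + eps_i[person] + rng.normal(0.0, sigma, n_people * n_days)

# One beta_1 estimate per person: mean of days 11-20 minus mean of days 1-10.
b1 = np.array([y[(person == i) & (x == 1)].mean() - y[(person == i) & (x == 0)].mean()
               for i in range(n_people)])

m = b1.mean()
se = b1.std(ddof=1) / np.sqrt(n_people)
tcrit = stats.t.ppf(0.975, n_people - 1)
print(f"95% CI for beta_1: ({m - tcrit * se:.2f}, {m + tcrit * se:.2f})")
```

Because eps_i cancels out of each within-person difference, repeating this over many simulated data sets should show coverage close to 95%. For the GEE part, the `GEE` class in the Python statsmodels package is one option outside of JMP.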
- Use a fixed effects model to deal with individual effects. In other words, include the subject as a variable. This will generate 15 estimates of the individual effects. Plot the actual effects vs. the estimated individual effects. Test if this is a 45 degree line as an "estimate" should be. Due to regression to the mean it won't in fact be a 45 degree line. Explain this.
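A fixed effects sketch (Python, same assumed parameter values as above): one dummy column per subject absorbs beta_0 + eps_i, so the subject coefficients estimate the individual levels. Centering by the known beta0 below is purely for illustration; with real data you would subtract the grand mean.

```python
import numpy as np

rng = np.random.default_rng(3)
n_people, n_days = 15, 20
beta0, beta1, tau, sigma = 4.0, 1.5, 1.5, 1.0   # assumed true values

x = np.tile((np.arange(1, n_days + 1) > 10).astype(float), n_people)
person = np.repeat(np.arange(n_people), n_days)
eps_i = rng.normal(0.0, tau, n_people)
y = beta0 + beta1 * x + eps_i[person] + rng.normal(0.0, sigma, n_people * n_days)

# Design matrix: 15 subject dummies (no common intercept) plus the treatment column.
X = np.zeros((n_people * n_days, n_people + 1))
X[np.arange(n_people * n_days), person] = 1.0
X[:, -1] = x
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
subject_hat = coef[:n_people] - beta0     # estimated individual effects eps_i-hat
beta1_hat = coef[-1]

# Regressing the TRUE effects on the ESTIMATES has population slope
# tau^2 / (tau^2 + SE^2) < 1 -- regression to the mean (slight here, since tau is large).
slope = np.polyfit(subject_hat, eps_i, 1)[0]
print(beta1_hat, slope)
```

Plotting `eps_i` against `subject_hat` with a 45-degree reference line makes the shrinkage visible.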
- Use a random effects model to recover both the individual effect and beta_{1}. If your software package does not do this for you automatically, it can be done as follows. First estimate the fixed effects. Now estimate how accurately you estimate each of the fixed effects above. (JMP gives you standard errors.) Now estimate the variance of the estimated fixed effects (by computing a variance).

variance(estimated fixed effects) = var(epsilon_{i}) + SE^{2}

So we can now estimate var(epsilon_{i}), which is also the covariance between the fixed effects estimates and the true random effects. From this we can now construct a regression equation for the epsilon_{i} based on the estimated fixed effects.
- Plot your estimated random effects vs. the true random effects. Does it seem to be a better fit than the fixed effect plot?
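That moment-based recipe can be sketched directly (Python, same assumed parameter values; `sigma2_hat / n_days` is used as an approximate common SE^2 for every subject estimate, which is a simplifying assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
n_people, n_days = 15, 20
beta0, beta1, tau, sigma = 4.0, 1.5, 1.5, 1.0   # assumed true values

x = np.tile((np.arange(1, n_days + 1) > 10).astype(float), n_people)
person = np.repeat(np.arange(n_people), n_days)
eps_i = rng.normal(0.0, tau, n_people)
y = beta0 + beta1 * x + eps_i[person] + rng.normal(0.0, sigma, n_people * n_days)

# Step 1: fixed effects fit (subject dummies + treatment column).
X = np.zeros((n_people * n_days, n_people + 1))
X[np.arange(n_people * n_days), person] = 1.0
X[:, -1] = x
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fixed = coef[:n_people]                     # estimates of beta0 + eps_i

# Step 2: how accurately each fixed effect is estimated.
resid = y - X @ coef
sigma2_hat = resid @ resid / (len(y) - X.shape[1])
se2 = sigma2_hat / n_days                   # approximate SE^2 of each subject estimate

# Step 3: var(estimated fixed effects) = var(eps_i) + SE^2, so subtract.
tau2_hat = fixed.var(ddof=1) - se2

# Step 4: regression of the true random effect on the estimated fixed effect
# has slope tau2 / (tau2 + SE^2): shrink the centered estimates toward zero.
shrink = tau2_hat / (tau2_hat + se2)
eps_hat = shrink * (fixed - fixed.mean())   # predicted random effects
print(tau2_hat, shrink)
```

Plotting `eps_hat` against the true `eps_i` should look at least as tight as the fixed effects plot, since the shrinkage removes some estimation noise.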

- I'm off to see my grandma (she is doing much better now)

- Simple random effect: Y_{ij} = X_{ij}*beta + epsilon_{i} + epsilon_{ij}
  - Called a random intercept
- sigma^{2} = var(epsilon_{ij}), tau^{2} = var(epsilon_{i})
- Often interested in the distribution of epsilon_{i}
  - In that case, want mean and variance
  - Obviously, the mean can't be estimated, so assume it is zero
  - The variance is the interesting term
- Simpler: Y_{ij} = alpha + epsilon_{i} + epsilon_{ij}
  - Var(Y_{ij}) = Var(epsilon_{i}) + Var(epsilon_{ij})
  - We can easily estimate Var(Y_{ij})
  - We can easily estimate Var(epsilon_{ij})
  - Hence: estimate Var(epsilon_{i}) = Var(Y_{ij}) - Var(epsilon_{ij})
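In the intercept-only model this decomposition is just two sample variances (a sketch; alpha = 5, tau = 1.5, sigma = 1 are assumed values):

```python
import numpy as np

rng = np.random.default_rng(5)
n_people, n_days = 15, 20
alpha, tau, sigma = 5.0, 1.5, 1.0               # assumed true values

person = np.repeat(np.arange(n_people), n_days)
eps_i = rng.normal(0.0, tau, n_people)
y = alpha + eps_i[person] + rng.normal(0.0, sigma, n_people * n_days)

# Var(eps_ij): pooled within-person variance (the person effect is constant within i).
within = np.mean([y[person == i].var(ddof=1) for i in range(n_people)])
# Var(Y_ij): overall variance; then subtract to estimate Var(eps_i).
tau2_hat = y.var(ddof=1) - within
print(within, tau2_hat)
```

`within` should land near sigma^2 = 1 and `tau2_hat` near tau^2 = 2.25, up to simulation noise.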

- alpha-hat = sum(Y)/(Kn), say
- var(alpha-hat) = (1/(Kn)^{2}) var(sum(Y))
- var(alpha-hat) = (1/(Kn)^{2}) (sum(var(Y)) + sum(covariance(Y_{ij}, Y_{i'j'})))
- var(alpha-hat) = (1/(Kn)^{2}) (Kn(sigma^{2} + tau^{2}) + sum(covariance(Y_{ij}, Y_{ij'})))
- var(alpha-hat) = (1/(Kn)^{2}) (Kn(sigma^{2} + tau^{2}) + sum_{i} n_{i}(n_{i} - 1) tau^{2})
- Notice: with an unbalanced design, weighted least squares makes sense
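A quick Monte Carlo check of the last line (Python; the balanced case, n_i = n for all K people, with assumed variance components tau^2 = 2.25 and sigma^2 = 1):

```python
import numpy as np

rng = np.random.default_rng(6)
K, n = 15, 20
tau2, sigma2 = 2.25, 1.0                         # assumed variance components

# Theoretical variance of alpha-hat from the derivation above (balanced case).
theory = (K * n * (sigma2 + tau2) + K * n * (n - 1) * tau2) / (K * n) ** 2

# Monte Carlo: alpha-hat is the grand mean of Y over many simulated data sets.
alpha_hats = []
for _ in range(5000):
    eps_i = rng.normal(0.0, np.sqrt(tau2), K)    # person effects, each repeated n times
    y = eps_i.repeat(n) + rng.normal(0.0, np.sqrt(sigma2), K * n)
    alpha_hats.append(y.mean())
print(theory, np.var(alpha_hats))                # the two should agree closely
```

Algebraically the balanced formula collapses to tau^2/K + sigma^2/(Kn), which shows why the person-level variance, not the 300 raw observations, dominates the precision of alpha-hat.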

- Y_{ij} = X_{ij}*beta + epsilon_{i} + epsilon_{ij}
- sigma^{2} = var(epsilon_{ij})
- epsilon_{i} are unknown parameters
- This is just a one-way ANOVA
- Obviously no estimate of tau since there isn't a tau in the model
- Estimate the effects by epsilon-hat_{i}

- Plot estimates of effects vs true effect
- Notice: a perfect regression setup
- Compute means, variances, covariances
- But, we want the regression the other way around
- Still can be done

- Are you a Bayesian or a frequentist? Bayesians don't distinguish between the two.
- Are you interested in describing the distribution or the actual values?
- How many effects do you have? (if few, then random effects don't help very much)

- random slopes (often interested in the average slope, say HGH)

Last modified: Thu Apr 12 08:53:02 2001
