STAT 541: Influence


Consider the following data on cleaning crews. We expect the number of rooms cleaned to be linear in the number of crews sent to clean them. In fact, we might even expect that if we send zero crews, zero rooms get cleaned. Let Y be the number of rooms cleaned and X the number of crews sent out. How many rooms does each crew clean?
  1. Plot the data, run a simple regression and create prediction bounds. Does the data appear homoskedastic?

  2. Transform the data to do a weighted least squares fit. Try a standard deviation proportional to X and one proportional to the square root of X. Plot both. Which appears to be more homoskedastic?

  3. Use the White estimator (the sandwich estimator) to generate standard errors for the slope and intercept.

  4. Discussion question: (Please type up a one-page answer to the following.) Compare the confidence intervals for the slope in each of the methods above. Which ones do you believe? Are the ones that are theoretically wrong qualitatively wrong? Our theory suggests that the intercept should be zero. Which is the correct test to run? Do we fail to reject the null? Do any of the other tests incorrectly reject the null?

  5. Pick the weighted least squares model that appears to be the most homoskedastic. Now use the White estimator on that model. Does it change the standard errors very much?

  6. The envelope please: Add up all the rooms cleaned. Add up all the crews. Divide these two to come up with an average number of rooms cleaned per crew. This should match one of the slopes you computed above. (You could also compute a standard error by hand to see which confidence intervals above are the closest to describing the right answer. But you don't have to do this.)
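
The fitting steps in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the course's own solution: the function names `wls_fit` and `slope_through_origin` are made up here, the data are simulated rather than the cleaning crews data set, and the White errors are the basic HC0 form of the sandwich estimator. Weights are taken as 1/SD^2, so a standard deviation proportional to X means weights 1/X^2, and one proportional to the square root of X means weights 1/X.

```python
import math
import random

def wls_fit(x, y, w=None):
    """Fit y = b0 + b1*x by weighted least squares (w=None means OLS).

    Returns the coefficients, the classical standard errors, and the
    White (HC0 sandwich) standard errors.
    """
    n = len(x)
    if w is None:
        w = [1.0] * n
    # Entries of X'WX and X'Wy for the design matrix with columns [1, x].
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    # A = (X'WX)^{-1}, written out for the 2x2 case.
    a00, a01, a11 = swxx / det, -swx / det, sw / det
    b0 = a00 * swy + a01 * swxy
    b1 = a01 * swy + a11 * swxy
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Classical errors: s^2 * A, with s^2 the weighted residual variance.
    s2 = sum(wi * ei * ei for wi, ei in zip(w, e)) / (n - 2)
    # White errors: A * M * A, with M = sum of (w_i e_i)^2 [1, x_i][1, x_i]'.
    m00 = sum((wi * ei) ** 2 for wi, ei in zip(w, e))
    m01 = sum((wi * ei) ** 2 * xi for wi, ei, xi in zip(w, e, x))
    m11 = sum((wi * ei) ** 2 * xi * xi for wi, ei, xi in zip(w, e, x))
    v00 = a00 * a00 * m00 + 2 * a00 * a01 * m01 + a01 * a01 * m11
    v11 = a01 * a01 * m00 + 2 * a01 * a11 * m01 + a11 * a11 * m11
    return {
        "b0": b0, "b1": b1,
        "se_b0": math.sqrt(s2 * a00), "se_b1": math.sqrt(s2 * a11),
        "white_se_b0": math.sqrt(v00), "white_se_b1": math.sqrt(v11),
    }

def slope_through_origin(x, y, w):
    """WLS slope for the no-intercept model y = b*x with weights w."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den

# Simulated stand-in data (NOT the course data): the true slope 3.7 and the
# noise level are arbitrary, and the noise SD grows like sqrt(crews).
random.seed(0)
crews = [random.choice([2, 4, 6, 8, 10, 12, 16]) for _ in range(50)]
rooms = [3.7 * c + random.gauss(0.0, 2.0 * math.sqrt(c)) for c in crews]

fits = {
    "OLS": wls_fit(crews, rooms),
    "WLS, SD ~ X": wls_fit(crews, rooms, [1.0 / c ** 2 for c in crews]),
    "WLS, SD ~ sqrt(X)": wls_fit(crews, rooms, [1.0 / c for c in crews]),
}
for name, fit in fits.items():
    print(name, {k: round(v, 3) for k, v in fit.items()})

# Item 6: total rooms over total crews, next to a no-intercept WLS slope.
ratio = sum(rooms) / sum(crews)
origin_slope = slope_through_origin(crews, rooms, [1.0 / c for c in crews])
print("ratio of totals:", round(ratio, 6))
print("no-intercept WLS slope, weights 1/X:", round(origin_slope, 6))
```

The last two printed numbers agree because, for the no-intercept model with weights 1/X, the WLS slope simplifies algebraically to sum(Y) / sum(X). Whether that is the model intended in item 6 is an inference, not something the assignment states.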


    Influence = leverage x outlier

    Various definitions of influence
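
One standard definition that makes the "leverage x outlier" slogan concrete is Cook's distance, which factors into an outlier term and a leverage term. It is given here only as an illustration, since the outline does not say which definitions are meant. In the formula, h_ii is the i-th diagonal entry of the hat matrix H, e_i is the i-th residual, s^2 is the usual residual variance estimate, and p is the number of fitted coefficients.

```latex
D_i \;=\;
\underbrace{\frac{r_i^{2}}{p}}_{\text{outlier}}
\times
\underbrace{\frac{h_{ii}}{1-h_{ii}}}_{\text{leverage}},
\qquad
r_i = \frac{e_i}{s\sqrt{1-h_{ii}}},
\qquad
H = X\,(X^{\top}X)^{-1}X^{\top}.
```

A point needs both a large studentized residual r_i and a large leverage h_ii to get a large D_i; either factor alone being small keeps the influence small.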

    Last modified: Thu Feb 22 08:41:05 2001