Class 12 Stat701 Fall 1997

Estimation, Heteroscedasticity and Weighted Least Squares.


Today's class.


What makes a good estimate?

A good estimator, $\hat{\theta}$, of a population parameter $\theta$ has at least two properties:

* On average it takes on the correct value, that is $E(\hat{\theta}) = \theta$.
* It is concentrated around the true value.

To see that the first is not enough by itself, consider civilians living around a missile testing range: the missiles may be on target on average, but that is little comfort if the individual shots are widely scattered.

Technically the first of these conditions is termed unbiased, and an unbiased estimator that is most concentrated about the true value is called efficient.

So a good estimator is unbiased and efficient.

Further, it is highly desirable to be able to estimate accurately the variability/s.e. of an estimator.

It is a fact that under the classical regression assumptions, the least squares estimate of the slope $\beta_1$ is both unbiased and efficient. Further, the estimate of its standard error is accurate, so we quantify the uncertainty correctly.

Recall that the ordinary least squares (OLS) estimates are defined as the values of $b_0$ and $b_1$ that minimize

\[
\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 .
\]
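For concreteness, here is a minimal sketch (in Python/NumPy, purely illustrative and not part of the course software) of computing the OLS intercept and slope from this criterion; x and y are hypothetical data vectors.

    import numpy as np

    def ols_fit(x, y):
        """Return (b0, b1) minimizing sum_i (y_i - b0 - b1*x_i)^2."""
        X = np.column_stack([np.ones_like(x), x])       # design matrix: columns [1, x_i]
        b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares solution
        return b0, b1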

Heteroscedasticity

Under conditions of heteroscedasticity things change:

To be precise, the OLS estimate of $\beta_1$ is still unbiased, but it is not efficient. That is, there is another estimator out there that is more concentrated about the true value than the OLS estimate. In addition, the estimate of the standard error that comes from OLS is just plain wrong: it misrepresents the standard error. The bottom line is that estimation is OK but could be better, and inference (CIs and p-values) is broken.

Options:

1. A very straightforward option is to keep the OLS estimate, recognize that it is not efficient, but at least get the standard error right. One way to do that is to use bootstrap estimates of the standard error (see the sketch after this list).

2. If you want to deal with the efficiency issue you need to leave the OLS paradigm. The way to get efficient estimates in the presence of heteroscedasticity is to do Weighted Least Squares (WLS).
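Returning to option 1, here is a minimal sketch of a pairs bootstrap for the standard error of the OLS slope (Python/NumPy, purely illustrative; the function name and default of 1000 resamples are assumptions for the example, not a prescription).

    import numpy as np

    def bootstrap_se_slope(x, y, n_boot=1000, seed=0):
        """Pairs bootstrap: resample (x_i, y_i) pairs, refit OLS, take the SD of the slopes."""
        rng = np.random.default_rng(seed)
        n = len(x)
        slopes = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)              # resample cases with replacement
            slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]  # slope of the refitted OLS line
        return slopes.std(ddof=1)                         # bootstrap estimate of the slope's s.e.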

WLS estimates are defined as the values of $b_0$ and $b_1$ that minimize

\[
\sum_{i=1}^{n} w_i (y_i - b_0 - b_1 x_i)^2 ,
\]

where the $w_i$ are the weights.
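A minimal WLS sketch in the same illustrative setup: rescaling each row by the square root of its weight turns the weighted criterion into an ordinary least squares problem.

    import numpy as np

    def wls_fit(x, y, w):
        """Return (b0, b1) minimizing sum_i w_i * (y_i - b0 - b1*x_i)^2."""
        X = np.column_stack([np.ones_like(x), x])
        sw = np.sqrt(w)                                   # square roots of the weights
        # Multiplying each row of X and y by sqrt(w_i) reduces WLS to OLS
        b0, b1 = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        return b0, b1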

The question, of course, is just what the weights should be. Intuitively you should give higher weight in the regression to observations that are more "informative" and downweight the less informative ones. It turns out that information is just the inverse of variability, so if we denote the variance of $y_i$ by $\sigma_i^2$, then $w_i = 1/\sigma_i^2$. That is, we weight inversely proportional to the variance.

Next question: how do we get at $\sigma_i^2$? Options here are

* Sometimes we know it. An example is Poisson data, where the variance is proportional to the mean. Another example is when there is an a priori reason for believing that the variance is a function of some variable (often a proxy for size).
* Otherwise we model it. The idea here is that if we knew the true error terms, the $\varepsilon_i$, then by definition $E(\varepsilon_i^2) = \sigma_i^2$. Unfortunately we don't have the true $\varepsilon_i$'s, so we use the residuals as surrogates and build a model for the squared residuals. If we call the fitted values from the squared-residuals model $\hat{\sigma}_i^2$, then we use $w_i = 1/\hat{\sigma}_i^2$ as the weights in the WLS. Clearly this is a two-step procedure: first run a preliminary OLS regression to get the residuals and obtain the weights, then rerun a WLS regression (see the sketch after this list).
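A sketch of the two-step procedure just described (illustrative only; in particular, regressing the squared residuals on x itself is just one possible variance model, assumed here for concreteness).

    import numpy as np

    def two_step_wls(x, y):
        """Two-step WLS: model the squared OLS residuals, then reweight."""
        # Step 1: preliminary OLS fit and its residuals
        b1, b0 = np.polyfit(x, y, 1)                      # np.polyfit returns the slope first
        resid = y - (b0 + b1 * x)
        # Model the squared residuals (here, a simple linear model in x; an assumption)
        a1, a0 = np.polyfit(x, resid ** 2, 1)
        sigma2_hat = np.clip(a0 + a1 * x, 1e-8, None)     # guard against non-positive estimates
        # Step 2: rerun the regression with weights 1 / estimated variance
        w = 1.0 / sigma2_hat
        X = np.column_stack([np.ones_like(x), x])
        sw = np.sqrt(w)
        return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]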

In summary, in the presence of heteroscedasticity we have

    Estimator                 Unbiased?   Efficient?   Standard error estimate
    OLS                       yes         no           wrong
    WLS (correct weights)     yes         yes          right

To illustrate these points I have run some simulations on a fake dataset, one in which I generate the data myself, so I know the "truth". This approach (Monte Carlo simulation) is the most common way to understand the long-run properties of estimates and procedures.

The idea is to create a dataset and run a regression, saving the coefficients and standard errors. Then a new dataset is created and the new coefficients etc. are saved. This is repeated many times (2000 for this example), and the long-run/average properties of the estimates are investigated.

It may even be the case that some of the saved quantities are themselves obtained by simulation, in this case the bootstrap standard error estimates, so that there are two levels of simulation. If each bootstrap estimate takes 1000 resamples from a single dataset, and there are 2000 replicates in the entire simulation, then we end up running 2,000,000 regressions!
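A skeleton of such a simulation (Python/NumPy; the sample size, the true coefficients, and the heteroscedastic error form below are illustrative assumptions, not the specification actually used in the class example).

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_reps = 50, 2000                                  # observations per dataset, replications
    true_b0, true_b1 = 1.0, 2.0                           # the known "truth"
    x = np.linspace(1, 10, n)
    slopes = np.empty(n_reps)

    for r in range(n_reps):
        y = true_b0 + true_b1 * x + rng.normal(scale=0.5 * x)   # error spread grows with x
        slopes[r] = np.polyfit(x, y, 1)[0]                # save the OLS slope estimate

    print(slopes.mean())                                  # close to true_b1: OLS is unbiased
    print(slopes.std(ddof=1))                             # the true long-run spread of the slope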

The Fake.

[Table: the data-generating model for the fake dataset.]

Comments:



Richard Waterman
Wed Oct 15 23:11:29 EDT 1997