STAT 541: Summary

Administrivia

Always run a regression first--it helps you understand your data
Estimators need standard errors to be useful
Identification of independence requires scientific knowledge, not statistical knowledge
Design in orthogonality or suffer collinearity
Use generalized linear models for efficiency (i.e. when publishing)

This class pushed regression
We can handle almost any difficulty that arises in regression
Hence if you have trouble with your data, running a regression is a way to make sure you can deal with the primary problems
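
As a reminder of what "run a regression first" produces, here is a minimal sketch of a one-predictor least-squares fit that reports each coefficient with its standard error. The function name `simple_regression` and the toy data are assumptions made for illustration, not material from the course, and the standard errors assume independent errors with constant variance.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = a + b*x. Returns (a, b, se_a, se_b)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = ybar - b * xbar                # intercept
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                 # residual variance estimate
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return a, b, se_a, se_b

# Toy data, invented for this sketch
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, se_a, se_b = simple_regression(x, y)
print(f"slope = {b:.3f} (SE {se_b:.3f}), intercept = {a:.3f} (SE {se_a:.3f})")
```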

Anyone can write down an estimator that will "guess" the right answer. (Method of moments, MLE, "it just feels right" estimators)
But without a standard error these are practically useless:
- How to justify significance?
- How to do Bonferroni?
- How to create confidence intervals?
- How to tell if one estimator is more accurate than another?
Cheap standard errors: Two independent estimates of the same thing.
- for example: one based on the future and the other based on the past (this would have shortened an Econometrica paper of mine by 30 pages)
- for example: histograms instead of kernel-smoothed densities
- in expensive simulations: Run it twice
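
The "run it twice" idea can be sketched as follows, assuming the replicate estimates are independent, unbiased for the same quantity, and equally variable. The helper name `combine_replicates` is hypothetical. With exactly two estimates a and b, the formula reduces to the average (a + b)/2 with standard error |a - b|/2.

```python
import math
import statistics

def combine_replicates(estimates):
    """Average k independent estimates and attach a standard error.

    Assumes (this is an assumption of the sketch) that the estimates are
    independent, unbiased for the same quantity, and have equal variance.
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two independent estimates")
    center = statistics.mean(estimates)
    # SE of the average = sample SD of the replicates / sqrt(k)
    se = statistics.stdev(estimates) / math.sqrt(k)
    return center, se

# Two runs of a (hypothetical) expensive simulation
center, se = combine_replicates([10.0, 12.0])
print(f"estimate = {center:.2f}, SE = {se:.2f}")
```

With only two replicates the standard error rests on one degree of freedom, so a 95% interval needs the t multiplier of about 12.7 rather than 1.96. It is cheap, not precise.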

If there isn't independence, the SEs are wrong
We can identify from the data:
- heteroskedasticity
- distributions of errors (normal, Cauchy, etc.)
- covariance structure of repeated measurements
- linearity
- complex patterns (say polynomials)
It is impossible to identify independence
- The best we can do is look for some simple form of dependence
- Say the simple forms found in time series
So know the science behind your data to identify independence
Note: bootstrapping won't help.
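
To see how badly the usual SE can fail without independence, here is a small simulation sketch. It uses an AR(1) series as one of the "simple forms found in time series"; the parameter values, the seed, and the helper names are assumptions chosen for illustration. The naive SE of the mean (SD divided by root n) is compared with the actual spread of the mean across repeated series.

```python
import math
import random
import statistics

def ar1_series(n, phi, rng):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal e[t]."""
    # Start from the stationary distribution so there is no burn-in effect
    x = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi ** 2)
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def naive_se(xs):
    """SE of the mean that would be correct if the data were independent."""
    return statistics.stdev(xs) / math.sqrt(len(xs))

def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation: a check for one simple form of dependence."""
    m = statistics.mean(xs)
    num = sum((u - m) * (v - m) for u, v in zip(xs, xs[1:]))
    den = sum((u - m) ** 2 for u in xs)
    return num / den

rng = random.Random(541)
n, phi, reps = 200, 0.8, 500
means, naive = [], []
for _ in range(reps):
    xs = ar1_series(n, phi, rng)
    means.append(statistics.mean(xs))
    naive.append(naive_se(xs))

actual_se = statistics.stdev(means)          # true sampling SD of the mean
typical_naive_se = statistics.mean(naive)    # what the naive formula reports
print(f"actual SE ~ {actual_se:.3f}, naive SE ~ {typical_naive_se:.3f}")
```

The lag-1 autocorrelation catches this particular dependence, but only because the simulation uses a form we thought to look for. Dependence of a form nobody checks for leaves no such warning.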

If the coefficients of the regression are actually important, orthogonality/randomization is almost necessary for them to make sense
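
One way to quantify the cost of collinearity: in a regression with an intercept and two predictors whose sample correlation is r, the variance of each slope estimate is multiplied by 1/(1 - r^2) relative to an orthogonal design with the same spread in each predictor. The sketch below computes that factor for a designed 2x2 factorial and for a made-up observational data set; both data sets and the function names are assumptions for illustration.

```python
import math

def correlation(u, v):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def variance_inflation(x1, x2):
    """Factor 1 / (1 - r^2) by which collinearity inflates Var(slope)
    in a two-predictor regression with an intercept."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)

# Designed experiment: 2x2 factorial, predictors orthogonal by construction
design_x1 = [-1, -1, 1, 1]
design_x2 = [-1, 1, -1, 1]

# Observational data (invented): the second predictor tracks the first
obs_x1 = [1.0, 2.0, 3.0, 4.0]
obs_x2 = [1.1, 1.9, 3.2, 3.9]

print("designed:", variance_inflation(design_x1, design_x2))
print("observational:", variance_inflation(obs_x1, obs_x2))
```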

Efficiency requires believing a model
Some models are easy to believe (say binary data, or Poisson data). These don't need the disclaimers below.
Often the more efficient your estimator is, the more sensitive it becomes to the assumptions of your model being correct
Tests based on ranks often avoid this problem
If you have explored your data first using regression, you can avoid this problem by hand
If your robust estimator disagrees with your efficient estimator, use the robust one
If your robust confidence interval is much wider than your efficient one, life is good. Give both and let the reader decide whether they are willing to make the added assumption necessary to justify the narrower interval.
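
The efficiency/robustness trade-off can be illustrated with a small simulation. As stand-ins (my choice, not the course's), the sample mean plays the efficient estimator under a normal model and the sample median plays the robust one. The contamination scheme, sample size, seed, and helper names are likewise assumptions of this sketch.

```python
import random
import statistics

def normal_draw(rng):
    """One observation from the model we hope is true: N(0, 1)."""
    return rng.gauss(0.0, 1.0)

def contaminated_draw(rng):
    """Mostly N(0, 1), but 10% of observations come from N(0, 10^2)."""
    scale = 10.0 if rng.random() < 0.1 else 1.0
    return rng.gauss(0.0, scale)

def sampling_sd(estimator, draw, n, reps, rng):
    """Monte Carlo SD of an estimator over repeated samples of size n."""
    values = [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(values)

rng = random.Random(2001)
n, reps = 50, 1000

sd_mean_clean = sampling_sd(statistics.mean, normal_draw, n, reps, rng)
sd_median_clean = sampling_sd(statistics.median, normal_draw, n, reps, rng)
sd_mean_dirty = sampling_sd(statistics.mean, contaminated_draw, n, reps, rng)
sd_median_dirty = sampling_sd(statistics.median, contaminated_draw, n, reps, rng)

print(f"model correct: mean {sd_mean_clean:.3f}, median {sd_median_clean:.3f}")
print(f"model wrong:   mean {sd_mean_dirty:.3f}, median {sd_median_dirty:.3f}")
```

When the normal model holds, the mean is the more precise of the two. Under contamination the ordering reverses, which is the sensitivity described above.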

Last modified: Tue Apr 24 07:00:43 2001