STAT 541: Summary

# Statistics 541: Summary

• Evaluations

## Summary

• Always run a regression first: it helps you understand your data
• Estimators need standard errors to be useful
• Identification of independence requires scientific knowledge, not statistical knowledge
• Design in orthogonality or suffer collinearity
• Use generalized linear models for efficiency (i.e. when publishing)

### Regression

• This class pushed regression
• We can handle almost any difficulty that arises in regression
• Hence, if you have trouble with your data, regression lets you deal with the primary problems
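
A minimal sketch of such a first-pass regression, assuming a single predictor and simulated data. The function name `simple_ols`, the seed, and the true line y = 1 + 2x are made up for illustration; they are not from the course. It returns the slope with its standard error, plus the residuals, which are the usual place to start looking for trouble.

```python
import math
import random

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, se_b, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    s2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_b = math.sqrt(s2 / sxx)
    return a, b, se_b, residuals

# Hypothetical data: true intercept 1, true slope 2, unit-variance noise.
rng = random.Random(541)
x = [i / 10 for i in range(100)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

a, b, se_b, residuals = simple_ols(x, y)
print(f"slope = {b:.3f}, standard error = {se_b:.3f}")
```

Plotting `residuals` against `x` (or against time, or against fitted values) is the natural next step for spotting curvature or unequal variance.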

### Standard errors

• Anyone can write down an estimator that will "guess" the right answer. (Method of moments, MLE, "it just feels right" estimators)
• But without a standard error these are practically useless:
• How to justify significance?
• How to do Bonferroni?
• How to create confidence intervals?
• How to tell if one estimator is more accurate than another?
• Cheap standard errors: Two independent estimates of the same thing.
• for example: One based on the future and the other based on the past (this would have shortened an Econometrica paper of mine by 30 pages)
• for example: histograms instead of kernel-smoothed densities
• in expensive simulations: Run it twice
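
The "run it twice" idea can be sketched as follows, assuming the two runs are independent and equally precise. With estimates a and b, Var(a - b) = 2σ² while the average has variance σ²/2, so |a - b|/2 estimates the standard error of the average. The names `cheap_se` and `expensive_simulation` are hypothetical, and the Monte Carlo estimate of π is only a stand-in for a genuinely costly simulation.

```python
import random

def cheap_se(est_a, est_b):
    """Combine two independent, equally precise estimates of one quantity.

    Returns (combined estimate, estimated standard error of the combination).
    """
    combined = (est_a + est_b) / 2
    # Var(a - b) = 2*sigma^2 and Var(combined) = sigma^2 / 2,
    # so SE(combined) is estimated by |a - b| / 2.
    se = abs(est_a - est_b) / 2
    return combined, se

def expensive_simulation(seed, n=20_000):
    """Stand-in for a costly simulation: Monte Carlo estimate of pi."""
    rng = random.Random(seed)
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1 for _ in range(n))
    return 4 * hits / n

# Different seeds make the two runs independent.
run_1 = expensive_simulation(seed=1)
run_2 = expensive_simulation(seed=2)
estimate, se = cheap_se(run_1, run_2)
print(f"estimate = {estimate:.4f}, cheap SE = {se:.4f}")
```

This standard error rests on a single degree of freedom, so it is crude: a 95% interval would use the t multiplier for 1 degree of freedom (about 12.7) rather than 1.96. It is still far better than no standard error at all.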

### Independence

• If there isn't independence the SEs are wrong
• We can identify from the data:
• heteroskedasticity
• distributions of errors (normal, Cauchy, etc)
• covariance structure of repeated measurements
• linearity
• complex patterns (say polynomials)
• It is impossible to identify independence
• The best we can do is look for some simple form of dependence
• Say the simple forms found in time series
• So know the science behind your data to identify independence
• Note: bootstrapping won't help.
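
A small simulation can illustrate both points, under assumptions chosen only for this sketch: an AR(1) series with coefficient 0.8 as the "simple form of dependence", and a bootstrap that resamples individual observations. The textbook standard error of the mean badly understates the actual sampling variability, and the bootstrap simply reproduces the same wrong answer because it also treats the observations as independent.

```python
import random
import statistics

def ar1_series(rng, n, phi):
    """Stationary AR(1) series x[t] = phi*x[t-1] + e[t]; the true mean is 0."""
    x = rng.gauss(0, 1) / (1 - phi ** 2) ** 0.5  # start in the stationary distribution
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        series.append(x)
    return series

def naive_se(series):
    """Textbook SE of the mean; valid only for independent observations."""
    return statistics.stdev(series) / len(series) ** 0.5

def iid_bootstrap_se(rng, series, n_boot=500):
    """Bootstrap SE that resamples single observations, so it assumes independence too."""
    n = len(series)
    means = [statistics.fmean(rng.choices(series, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)

rng = random.Random(541)
n, phi, reps = 200, 0.8, 300
samples = [ar1_series(rng, n, phi) for _ in range(reps)]

# What actually happens to the sample mean across repeated data sets.
actual_sd = statistics.stdev([statistics.fmean(s) for s in samples])
# What the textbook formula claims, averaged over the same data sets.
typical_naive = statistics.fmean([naive_se(s) for s in samples])
# What the bootstrap claims for the first data set.
boot_se = iid_bootstrap_se(rng, samples[0])

print(f"actual SD of the mean : {actual_sd:.3f}")
print(f"typical naive SE      : {typical_naive:.3f}")
print(f"bootstrap SE (set 1)  : {boot_se:.3f} vs naive {naive_se(samples[0]):.3f}")
```

Here the dependence was built in, so a time-series method could repair the standard error. With real data, nothing in the output above would reveal a form of dependence that nobody thought to look for.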

### Orthogonality and randomization

• If the coefficients of the regression are actually important, orthogonality/randomization is almost necessary for them to make sense
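
One way to see why, sketched under an assumed model y = 1 + 2·x1 + 3·x2 + noise (all names and numbers here are illustrative): compare the coefficient of x1 with and without x2 in the regression. In a balanced factorial design the two answers agree, so "the effect of x1" has a single meaning. With collinear predictors the coefficient depends heavily on what else is in the model.

```python
import random

def centered(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def slope_alone(x1, y):
    """Coefficient of x1 when y is regressed on x1 only (with intercept)."""
    c1, cy = centered(x1), centered(y)
    return dot(c1, cy) / dot(c1, c1)

def slope_adjusted(x1, x2, y):
    """Coefficient of x1 when y is regressed on x1 and x2 (with intercept)."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    return (s22 * dot(c1, cy) - s12 * dot(c2, cy)) / (s11 * s22 - s12 ** 2)

rng = random.Random(541)
n = 200

# Designed experiment: balanced 2x2 factorial, so x1 and x2 are exactly orthogonal.
x1_design = [-1, -1, 1, 1] * (n // 4)
x2_design = [-1, 1, -1, 1] * (n // 4)

# Observational data: x2 tracks x1 closely (correlation about 0.95).
x1_obs = [rng.gauss(0, 1) for _ in range(n)]
x2_obs = [0.95 * a + (1 - 0.95 ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1_obs]

def response(x1, x2):
    # Assumed true model for this illustration only.
    return [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

y_design = response(x1_design, x2_design)
y_obs = response(x1_obs, x2_obs)

# How much the x1 coefficient moves when x2 is added to the model.
shift_design = slope_alone(x1_design, y_design) - slope_adjusted(x1_design, x2_design, y_design)
shift_obs = slope_alone(x1_obs, y_obs) - slope_adjusted(x1_obs, x2_obs, y_obs)

print(f"orthogonal design: coefficient shift = {shift_design:.2e}")
print(f"collinear data   : coefficient shift = {shift_obs:.2f}")
```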

### Efficiency

• Efficiency requires believing a model
• Some models are easy to believe (say binary data, or Poisson data). These don't need the disclaimers below.
• Often, the more efficient your estimator is, the more sensitive it becomes to the assumptions of your model being correct
• Tests based on ranks often avoid this problem
• If you have explored your data first using regression, you can avoid this problem by hand
• If your robust estimator disagrees with your efficient estimator, use the robust one
• If your robust confidence interval is much wider than your efficient one, life is good. Give both and let the reader decide if they are willing to make the added assumption necessary to justify the narrower interval.
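
The trade-off can be sketched with the simplest pair of estimators: the sample mean (efficient if the normal model is right) and the sample median (robust). The Cauchy distribution is used here only as an assumed example of the model being wrong; the function names, sample size, and number of repetitions are arbitrary choices for illustration.

```python
import math
import random
import statistics

def sampling_sd(estimator, draw, rng, n=100, reps=1000):
    """Standard deviation of an estimator over repeated samples of size n."""
    estimates = [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(estimates)

def normal_draw(rng):
    return rng.gauss(0, 1)

def cauchy_draw(rng):
    # Inverse-CDF draw from a standard Cauchy: heavy tails, no finite variance.
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(541)
results = {}
for model, draw in [("normal", normal_draw), ("cauchy", cauchy_draw)]:
    results[(model, "mean")] = sampling_sd(statistics.fmean, draw, rng)
    results[(model, "median")] = sampling_sd(statistics.median, draw, rng)

for (model, name), sd in results.items():
    print(f"{model:7s} {name:7s} sampling SD = {sd:.3f}")
```

Under the normal model the mean is somewhat more precise than the median. Under the heavy-tailed alternative the median is still usable, while the mean's precision collapses.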

Last modified: Tue Apr 24 07:00:43 2001