Always run a regression first--it helps you understand your
data
Estimators need standard errors to be useful
Identification of independence requires scientific knowledge
not statistical knowledge
Design in orthogonality or suffer collinearity
Use generalized linear models for efficiency (i.e. when
publishing)
Regression
This class pushed regression
We can handle almost any difficulty that arises in regression
Hence if you have trouble with your data, regression lets you
make sure that you can deal with the primary problems
Standard errors
Anyone can write down an estimator that will "guess" the right
answer. (Method of moments, MLE, "it just feels right"
estimators)
But without a standard error these are practically useless:
How to justify significance?
How to do Bonferroni?
How to create confidence intervals?
How to tell if one estimator is more accurate than
another?
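A minimal sketch of why the standard error is the missing ingredient in the questions above: given an estimate and its SE, a confidence interval and a Bonferroni adjustment are one line each. The function name and the numbers are hypothetical, and the interval assumes the estimator is approximately normal.

```python
from statistics import NormalDist

def confidence_interval(estimate, se, level=0.95, n_comparisons=1):
    """Normal-theory interval; n_comparisons > 1 applies Bonferroni."""
    # Bonferroni: split the total error rate evenly across comparisons.
    alpha = (1 - level) / n_comparisons
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se

# Hypothetical estimate 2.0 with standard error 0.5.
print(confidence_interval(2.0, 0.5))                   # single interval
print(confidence_interval(2.0, 0.5, n_comparisons=5))  # one of five
```

Without `se` neither call can be made, which is the point of the list above.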
Cheap standard errors: Two independent estimates of the
same thing.
for example: One based on the future and the other based on the
past (this would have shortened an Econometrica paper of
mine by 30 pages)
for example: histograms instead of kernel smoothed densities
in expensive simulations: Run it twice
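The "run it twice" idea can be sketched as follows, assuming the two estimates are independent, unbiased, and equally variable. The function name and the inputs are hypothetical. With only two values the standard error has one degree of freedom, so it is rough, but it is nearly free.

```python
def cheap_standard_error(estimate_a, estimate_b):
    """Combine two independent estimates of the same quantity.

    Returns (combined estimate, standard error of the combined estimate).
    Assumes both estimates are unbiased with the same variance.
    """
    combined = (estimate_a + estimate_b) / 2.0
    # With n = 2 the sample variance is (a - b)^2 / 2, so the variance
    # of the average is (a - b)^2 / 4 and its square root is |a - b| / 2.
    standard_error = abs(estimate_a - estimate_b) / 2.0
    return combined, standard_error

# Hypothetical results from two runs of the same simulation.
estimate, se = cheap_standard_error(10.4, 9.6)
print(estimate, se)
```

An interval built from this SE should use the t distribution with one degree of freedom, so it will be wide.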
Independence
If there isn't independence the SEs are wrong
We can identify from the data:
heteroskedasticity
distributions of errors (normal, Cauchy, etc)
covariance structure of repeated measurements
linearity
complex patterns (say polynomials)
It is impossible to identify independence
The best we can do is look for some simple form of
dependence
Say the simple forms found in time series
So know the science behind your data to identify independence
Note: bootstrapping won't help.
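A small simulation sketch of the bootstrapping note, assuming an AR(1) series as the (hypothetical) form of dependence. The ordinary bootstrap resamples observations as if they were independent, so it reproduces the same wrong SE. All names and parameter values here are illustrative.

```python
import random
import statistics

def ar1_series(n, phi, rng):
    """Simulate x[t] = phi * x[t-1] + noise with standard normal noise."""
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series

def bootstrap_se_of_mean(data, n_resamples, rng):
    """Naive bootstrap: resample observations as if independent."""
    means = [statistics.fmean(rng.choices(data, k=len(data)))
             for _ in range(n_resamples)]
    return statistics.stdev(means)

rng = random.Random(0)
n, phi = 200, 0.9

# Bootstrap SE of the mean, computed from one observed series.
observed = ar1_series(n, phi, rng)
naive_se = bootstrap_se_of_mean(observed, 500, rng)

# Reference: spread of the sample mean over many fresh series.
fresh_means = [statistics.fmean(ar1_series(n, phi, rng)) for _ in range(500)]
actual_se = statistics.stdev(fresh_means)

print(naive_se, actual_se)
```

With strong positive dependence the bootstrap SE comes out several times too small. Nothing in the resampled data warns you of this.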
Orthogonality and randomization
If the coefficients of the regression are actually important,
orthogonality/randomization is almost necessary for them to make
sense
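A toy sketch of why orthogonality matters for interpreting coefficients. With an orthogonal design the coefficient on x1 is the same whether or not x2 is in the model; with a correlated design it can change sign. The data are made up, and the fits omit the intercept for brevity (the regressors are centered).

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slope_alone(x, y):
    """Least-squares coefficient of y on x alone (no intercept)."""
    return dot(x, y) / dot(x, x)

def slopes_together(x1, x2, y):
    """Least-squares coefficients of y on x1 and x2 jointly."""
    a, b, c = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    p, q = dot(x1, y), dot(x2, y)
    det = a * c - b * b  # near zero when x1 and x2 are collinear
    return (c * p - b * q) / det, (a * q - b * p) / det

y = [1.0, 3.0, 4.0, 8.0]
x1 = [-1.0, -1.0, 1.0, 1.0]
x2_orthogonal = [-1.0, 1.0, -1.0, 1.0]   # dot(x1, x2) == 0
x2_correlated = [-2.0, -1.0, 1.0, 2.0]   # dot(x1, x2) == 6

print(slope_alone(x1, y))                        # x1 on its own
print(slopes_together(x1, x2_orthogonal, y)[0])  # unchanged by x2
print(slopes_together(x1, x2_correlated, y)[0])  # depends on x2
```

Here the coefficient on x1 is 2.0 alone and in the orthogonal design, but -2.5 once the correlated regressor enters.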
Efficiency
Efficiency requires believing a model
Some models are easy to believe (say binary data, or Poisson
data). These don't need the disclaimers below.
Often, the more efficient your estimators are, the more sensitive
they become to the assumptions of your model being
correct
Tests based on ranks often avoid this problem
If you have explored your data first using regression, you can
avoid this problem by hand
If your robust estimator disagrees with your efficient
estimator, use the robust one
If your robust confidence interval is much wider than your efficient
one, life is good. Give both and let the reader decide if they
are willing to make the added assumption necessary to justify
the narrower window.
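One way to report both intervals, sketched for the simplest case of a location estimate rather than a full regression. The efficient interval is the usual t interval for the mean, which assumes normal errors. The robust one is an order-statistic interval for the median, which assumes only independent continuous observations. The data and function names are hypothetical.

```python
import math
import statistics

def mean_interval(data, critical):
    """Efficient under normality: mean +/- critical * standard error."""
    m = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return m - critical * se, m + critical * se

def median_interval(data, level=0.95):
    """Distribution-free interval for the median from order statistics.

    The count of observations below the true median is Binomial(n, 1/2).
    """
    xs = sorted(data)
    n = len(xs)
    tail, k = 0.0, 0
    # Find the largest k with P(Binomial(n, 1/2) <= k - 1) <= (1 - level) / 2.
    while tail + math.comb(n, k) / 2 ** n <= (1 - level) / 2:
        tail += math.comb(n, k) / 2 ** n
        k += 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    # k-th smallest and k-th largest observations.
    return xs[k - 1], xs[n - k]

data = [9.1, 9.8, 10.0, 10.1, 10.2, 10.3, 10.4, 10.6, 10.9, 11.2]

# 2.262 is the 97.5% t quantile for 9 degrees of freedom.
print("efficient:", mean_interval(data, critical=2.262))
print("robust:   ", median_interval(data))
```

For this sample the robust interval is the wider of the two. Printing both lets the reader see what the normality assumption buys.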