Statistics 102H: Predictions and CI (cont'd)
Can't the predictions just deal with bad data?
Try a linear fit to curved data (look at prediction intervals)
Try a linear fit to heteroskedastic data (look at prediction intervals)
Coverage varies depending on where you are in the data
sometimes 100% (i.e. too wide)
sometimes 50% or less (i.e. too narrow)
Hence looking at the graph is VERY important
At least some of these we can fix
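The uneven coverage described above can be checked with a small simulation. This is only a sketch under stated assumptions: the data are synthetic (noise standard deviation proportional to X), the true line, sample sizes, and the "low" and "high" bands of X are arbitrary choices, and the interval uses the normal quantile 1.96 in place of the t quantile, which is reasonable only because n is large. None of the names below come from the course materials.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = alpha + beta * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))  # one pooled residual standard deviation
    return alpha, beta, s, xbar, sxx


def prediction_interval(x0, fit, n, z=1.96):
    """Approximate 95% prediction interval at x0 (normal quantile, large n)."""
    alpha, beta, s, xbar, sxx = fit
    half = z * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    center = alpha + beta * x0
    return center - half, center + half


def coverage_by_region(seed=0, n=500, n_new=20000):
    """Fit once, then measure how often new points fall inside the interval."""
    rng = random.Random(seed)

    def draw():
        x = rng.uniform(1, 10)
        y = 2 + 3 * x + rng.gauss(0, 0.5 * x)  # assumed: noise sd grows with x
        return x, y

    xs, ys = zip(*[draw() for _ in range(n)])
    fit = fit_line(xs, ys)
    hits = {"low": [0, 0], "high": [0, 0]}  # [inside, total] per band of X
    for _ in range(n_new):
        x, y = draw()
        region = "low" if x < 3 else ("high" if x > 8 else None)
        if region is None:
            continue
        lo, hi = prediction_interval(x, fit, n)
        hits[region][0] += int(lo <= y <= hi)
        hits[region][1] += 1
    return {k: inside / total for k, (inside, total) in hits.items()}
```

Calling coverage_by_region() returns the observed coverage in each band. With these assumed settings the low-X band should come out close to 100% (interval too wide) and the high-X band noticeably below the nominal 95% (interval too narrow), because a single pooled standard deviation is being applied everywhere.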
Transformations for heteroskedasticity
What is a transformation?
Suppose "log(Y) vs X" and "Y vs exp(X)" both fit equally well.
Which to use?
Different prediction bounds
Do actual transformation
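One way to see why the choice of transformation changes the prediction bounds: if the fit is done on log(Y), the bounds are symmetric on the log scale and become asymmetric (multiplicative) once mapped back to Y. The sketch below assumes a line already fitted on the log scale; the function name and the numbers in the usage note are hypothetical, and the interval ignores the uncertainty in the fitted coefficients, so it is a simplification rather than a full prediction interval.

```python
import math


def back_transformed_interval(alpha, beta, s, x0, z=1.96):
    """Approximate 95% bounds for Y when log(Y) = alpha + beta * x + error.

    alpha, beta: coefficients fitted on the log scale (assumed given)
    s: residual standard deviation on the log scale
    Simplification: uncertainty in alpha and beta is ignored.
    """
    center = alpha + beta * x0          # predicted log(Y)
    lower = math.exp(center - z * s)    # map each bound back to the Y scale
    middle = math.exp(center)
    upper = math.exp(center + z * s)
    return lower, middle, upper
```

For example, back_transformed_interval(1.0, 0.5, 0.4, 2.0) gives bounds that are always positive and stretch further above the middle value than below it. Bounds from a fit of Y against exp(X) would instead be symmetric around the fitted value on the original scale.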
Example of picking a transformation:
picking between log-log and 1/X model in housing data
Generate all the new variables
Plot on new axis
Which is less heteroskedastic?
Example useful for linear case:
Cleaning crews data
linear-linear or should we use log-log or sqrt-sqrt?
Plot prediction intervals on original graph to see
Save residuals
The log-log transformation (elasticities)
consider the asteroid data
raw plot is very ugly (sketch picture)
too extreme in both directions
log-log tames a whole range of values
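As a sketch of the elasticity reading of a log-log fit: if Y is roughly c * X^b with multiplicative noise, then regressing log(Y) on log(X) gives a slope near b, which is the elasticity. The asteroid data are not reproduced here, so the example below uses synthetic data spanning six orders of magnitude; the constants 5.0 and 1.5, the noise level, and the function names are all assumptions for illustration.

```python
import math
import random


def make_power_law_data(n=400, c=5.0, b=1.5, noise_sd=0.3, seed=1):
    """Synthetic data: y = c * x**b * exp(noise), with x spread over 1 to 10**6."""
    rng = random.Random(seed)
    xs = [10 ** rng.uniform(0, 6) for _ in range(n)]
    ys = [c * x ** b * math.exp(rng.gauss(0, noise_sd)) for x in xs]
    return xs, ys


def loglog_fit(xs, ys):
    """Least squares fit of log(y) on log(x); the slope is the elasticity."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope
```

Running loglog_fit(*make_power_law_data()) should return a slope close to the assumed exponent 1.5 and an intercept close to log(5.0). On the raw scale the same data are dominated by a handful of huge values, which is the "ugly" picture the log-log axes tame.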
Homework
(next time?)
What about that statistical error?
A little background:
Regression does least squares
solution to least squares is a linear combination of error + truth
Hence normality sets in (via the CLT)
E(alpha-hat) = alpha (This is called unbiased)
Thus accuracy of alpha-hat and beta-hat depends on where the data is collected
more spread out X's generate better slope estimates
But might introduce unwanted data to the problem. Hence the design of where to put the X's takes care.
Error looks like sigma / (sqrt(n) * sigma(X))
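The claims above (unbiasedness, and better slope estimates from more spread out X's) can be checked by simulation. The sketch below repeatedly generates Y from an assumed true line with normal errors at fixed X's, refits the slope each time, and compares the spread of the fitted slopes with sigma / (sqrt(n) * sigma(X)), taking sigma(X) to be the population standard deviation of the X's. The true line, sigma, the number of repetitions, and the two designs in the usage note are arbitrary choices.

```python
import math
import random
import statistics


def ols_slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx


def simulate_slopes(xs, alpha=1.0, beta=2.0, sigma=1.0, reps=2000, seed=0):
    """Return (mean, sd) of the fitted slope over repeated samples at fixed xs."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(ols_slope(xs, ys))
    return statistics.mean(slopes), statistics.stdev(slopes)


def slope_error_formula(xs, sigma=1.0):
    """sigma / (sqrt(n) * sd(X)), with sd(X) the population sd of the xs."""
    return sigma / (math.sqrt(len(xs)) * statistics.pstdev(xs))
```

A possible comparison is a narrow design, xs = [i / 10 for i in range(50)], against a wide one, xs = [float(i) for i in range(50)]. In both cases the mean fitted slope should sit near the assumed true slope of 2, the simulated spread should track the formula, and the wide design should have a slope error ten times smaller, since its X's are ten times more spread out.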
Cell phones
Current statistics: here.
Looks like March 2002 = 180 Million?
Last modified: Wed Jan 29 07:29:13 2003