(No need to predict truth--that is ahead of your time)
But, if you predict future predictions, they will all have to
cite you!
Ignore future variation: only look at statistical error
Y - y-hat: its spread gives the prediction interval
EY - y-hat: its spread gives the confidence interval
Note: Y - y-hat = (Y - EY) - (y-hat - EY)
difference of two independent random variables (the new Y is independent of the old data used to compute y-hat)
hence variances add
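A quick simulation can check this decomposition: refit the line on fresh data many times, draw a new Y each time, and compare Var(Y - y-hat) with Var(Y - EY) + Var(EY - y-hat). This is a sketch in Python (the course itself uses JMP). The true line, sigma, the design x = 0, ..., 9 and the new point x = 15 are all made-up values for illustration.

```python
import random
import statistics

# Hypothetical setup: true line EY = ALPHA + BETA * x with normal errors (SD SIGMA).
ALPHA, BETA, SIGMA = 1.0, 2.0, 1.0
XS = [float(i) for i in range(10)]  # where the old data were collected: x = 0, ..., 9
X_NEW = 15.0                        # the new x, outside the old data


def fit(xs, ys):
    """Least squares estimates (alpha-hat, beta-hat) for simple regression."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - beta_hat * xbar, beta_hat


def simulate(reps=20000, seed=0):
    """Return sample variances of Y - y-hat, Y - EY and EY - y-hat at X_NEW."""
    rng = random.Random(seed)
    ey = ALPHA + BETA * X_NEW
    pred_err, noise, stat_err = [], [], []
    for _ in range(reps):
        ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in XS]  # old data
        a, b = fit(XS, ys)
        y_hat = a + b * X_NEW
        y_new = ey + rng.gauss(0, SIGMA)  # new Y, drawn independently of the old data
        pred_err.append(y_new - y_hat)
        noise.append(y_new - ey)
        stat_err.append(ey - y_hat)
    return (statistics.variance(pred_err),
            statistics.variance(noise),
            statistics.variance(stat_err))


v_pred, v_noise, v_stat = simulate()
print(f"Var(Y - y-hat)                = {v_pred:.3f}")
print(f"Var(Y - EY) + Var(EY - y-hat) = {v_noise + v_stat:.3f}")
```

With these made-up values the theory gives Var(Y - y-hat) = sigma^2 (1 + 1/n + (x - xbar)^2 / Sxx), about 2.44, and both printed numbers should land near it.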
Variance of Y - EY is constant (sigma^2)
variance of EY-y-hat grows as we get far from the data
we have previously collected
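To see the growth in numbers, here is a sketch using the standard simple-regression formulas with sigma treated as known (an assumption; in practice sigma is estimated and t quantiles replace the 2 below): Var(EY - y-hat) = sigma^2 (1/n + (x - xbar)^2 / Sxx), where Sxx is the sum of (x_i - xbar)^2, and Var(Y - y-hat) adds sigma^2 on top. The function name and the design points are hypothetical.

```python
import math


def standard_errors(xs, x, sigma):
    """Return (SE of EY - y-hat, SE of Y - y-hat) at the point x.

    Textbook simple-regression formulas, sigma assumed known:
      Var(EY - y-hat) = sigma^2 * (1/n + (x - xbar)^2 / Sxx)
      Var(Y - y-hat)  = sigma^2 + Var(EY - y-hat)
    """
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    var_conf = sigma ** 2 * (1 / n + (x - xbar) ** 2 / sxx)
    return math.sqrt(var_conf), math.sqrt(sigma ** 2 + var_conf)


xs = [float(i) for i in range(10)]  # hypothetical design: x = 0, 1, ..., 9
for x in [4.5, 9.0, 15.0, 30.0]:
    se_conf, se_pred = standard_errors(xs, x, sigma=1.0)
    print(f"x = {x:5.1f}  confidence SE = {se_conf:.3f}  prediction SE = {se_pred:.3f}")
```

A rough 95% interval is y-hat plus or minus 2 SE. The confidence SE is smallest at xbar and shrinks as n grows; the prediction SE never drops below sigma.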
Where does statistical error come from?
A little background:
Regression does least squares
The least squares solution is the truth plus a linear combination of the errors
Hence the CLT kicks in and the estimates are approximately normal
E(alpha-hat) = alpha (This is called unbiased)
E(beta-hat) = beta (This is called unbiased)
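For the slope, "truth plus a linear combination of the errors" is exact: beta-hat is the sum of w_i y_i with w_i = (x_i - xbar)/Sxx, and since the w_i sum to 0 and the sum of w_i x_i is 1, this equals beta + the sum of w_i e_i. A sketch that checks this identity and the unbiasedness by simulation, with made-up true values and design points:

```python
import random
import statistics

ALPHA, BETA, SIGMA = 1.0, 2.0, 1.0  # hypothetical true values
XS = [0.0, 1.0, 2.0, 5.0, 8.0]      # hypothetical design points

xbar = statistics.fmean(XS)
sxx = sum((x - xbar) ** 2 for x in XS)
weights = [(x - xbar) / sxx for x in XS]  # beta-hat = sum(w_i * y_i)


def one_sample(rng):
    """Draw one data set; return alpha-hat, beta-hat and beta + sum(w_i * e_i)."""
    errors = [rng.gauss(0, SIGMA) for _ in XS]
    ys = [ALPHA + BETA * x + e for x, e in zip(XS, errors)]
    beta_hat = sum(w * y for w, y in zip(weights, ys))
    alpha_hat = statistics.fmean(ys) - beta_hat * xbar
    beta_from_errors = BETA + sum(w * e for w, e in zip(weights, errors))  # truth + linear comb. of errors
    return alpha_hat, beta_hat, beta_from_errors


rng = random.Random(1)
draws = [one_sample(rng) for _ in range(20000)]
print("mean alpha-hat:", statistics.fmean(a for a, _, _ in draws))
print("mean beta-hat: ", statistics.fmean(b for _, b, _ in draws))
print("max |beta-hat - (beta + sum w*e)|:", max(abs(b - c) for _, b, c in draws))
```

The averages should sit close to the true ALPHA and BETA, and the last line should be at rounding-error size.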
Thus the accuracy of alpha-hat and beta-hat depends on where the data are collected
more spread out X's generate better slope estimates
But spreading them out might introduce unwanted data into the problem. Hence the placement of the X's must be designed with care.
Standard error of beta-hat looks like sigma / (sqrt(n) * sigma(X)), where sigma(X) is the standard deviation of the X's
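A sketch comparing two hypothetical designs with the same n: X's spread over [0, 1] versus over [0, 10]. Taking sigma(X) as the population SD of the X's (so sqrt(n) * sigma(X) = sqrt(Sxx)), the formula predicts the wide design's slope error is 10 times smaller, and a simulation checks it. The noise level, sample size and true line are assumptions.

```python
import math
import random
import statistics

SIGMA, N = 1.0, 20  # hypothetical noise level and sample size


def slope_sd_by_simulation(xs, reps=10000, seed=2):
    """Empirical standard deviation of beta-hat over repeated samples at fixed xs."""
    rng = random.Random(seed)
    xbar = statistics.fmean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slopes = []
    for _ in range(reps):
        ys = [2.0 * x + rng.gauss(0, SIGMA) for x in xs]  # true slope 2, intercept 0
        ybar = statistics.fmean(ys)
        slopes.append(sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx)
    return statistics.stdev(slopes)


def slope_sd_formula(xs):
    """sigma / (sqrt(n) * sigma(X)), with sigma(X) the population SD of the x's."""
    return SIGMA / (math.sqrt(len(xs)) * statistics.pstdev(xs))


narrow = [i / (N - 1) for i in range(N)]     # x's spread over [0, 1]
wide = [10 * i / (N - 1) for i in range(N)]  # same count, spread over [0, 10]
for name, xs in [("narrow", narrow), ("wide", wide)]:
    print(name, round(slope_sd_formula(xs), 4), round(slope_sd_by_simulation(xs), 4))
```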
Translating this statistical error into confidence intervals
y-hat = alpha-hat + beta-hat x
So errors in alpha-hat and beta-hat lead to errors in
y-hat
Unfortunately, alpha-hat and beta-hat are correlated
So the formula is kinda ugly
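The ugliness is the covariance term. The standard simple-regression results are Var(alpha-hat) = sigma^2 (1/n + xbar^2/Sxx), Var(beta-hat) = sigma^2/Sxx and Cov(alpha-hat, beta-hat) = -sigma^2 xbar/Sxx, so Var(y-hat) = Var(alpha-hat) + x^2 Var(beta-hat) + 2x Cov(alpha-hat, beta-hat). Completing the square collapses this to sigma^2 (1/n + (x - xbar)^2/Sxx). The sketch below (hypothetical design x = 1, ..., 10, sigma assumed known) computes both forms and reports the correlation, which is negative whenever xbar > 0.

```python
import math


def var_yhat_two_ways(xs, x, sigma=1.0):
    """Return (ugly form, tidy form, corr(alpha-hat, beta-hat)) for Var(y-hat) at x."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    var_a = sigma ** 2 * (1 / n + xbar ** 2 / sxx)  # Var(alpha-hat)
    var_b = sigma ** 2 / sxx                         # Var(beta-hat)
    cov_ab = -sigma ** 2 * xbar / sxx                # Cov(alpha-hat, beta-hat)
    # Ugly form: y-hat = alpha-hat + beta-hat * x, keeping the covariance term.
    ugly = var_a + x ** 2 * var_b + 2 * x * cov_ab
    # Tidy form, after completing the square.
    tidy = sigma ** 2 * (1 / n + (x - xbar) ** 2 / sxx)
    corr = cov_ab / math.sqrt(var_a * var_b)
    return ugly, tidy, corr


xs = [float(i) for i in range(1, 11)]  # hypothetical design: x = 1, ..., 10
for x in [0.0, 5.5, 20.0]:
    ugly, tidy, corr = var_yhat_two_ways(xs, x)
    print(f"x = {x:5.1f}  ugly = {ugly:.4f}  tidy = {tidy:.4f}  corr = {corr:.3f}")
```

If the X's are centered so that xbar = 0, the covariance vanishes and the ugly form is already the tidy one.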
Cell phones
pages 29-38 and pages 53-56
Extrapolation is where the fun is! (And the danger)
Looks like March 2002 = 180 Million.
Cottages
pages 78 - 85, and 89 - 98
Does linear regression really capture the spread of our uncertainty?
Consider confidence intervals for polynomial regression. YIKES!
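A sketch of why: for a polynomial fit, the variance of the fitted value at x is sigma^2 v'(X'X)^{-1} v, where X is the design matrix with columns 1, x, ..., x^d and v = (1, x, ..., x^d). The code uses only the Python standard library (a small Gauss-Jordan inverse stands in for a linear algebra package) and assumes sigma = 1 and a hypothetical design of 20 evenly spaced points on [-1, 1]. It prints the standard error of the fit for degrees 1, 3 and 5, inside and outside the data.

```python
import math


def invert(m):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def confidence_se(xs, x, degree, sigma=1.0):
    """SE of the fitted polynomial at x: sigma * sqrt(v' (X'X)^{-1} v)."""
    p = degree + 1
    design = [[xi ** k for k in range(p)] for xi in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    v = [x ** k for k in range(p)]
    quad = sum(v[i] * inv[i][j] * v[j] for i in range(p) for j in range(p))
    return sigma * math.sqrt(quad)


xs = [-1 + 2 * i / 19 for i in range(20)]  # hypothetical design: 20 points on [-1, 1]
for x in [0.0, 1.0, 1.5, 2.0]:
    ses = [confidence_se(xs, x, d) for d in (1, 3, 5)]
    print(f"x = {x:3.1f}  SE (degree 1, 3, 5) = " + ", ".join(f"{s:8.3f}" for s in ses))
```

Inside [-1, 1] the three degrees give comparable standard errors; at x = 2 the degree-5 standard error is more than ten times the straight line's.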
Homework
Be sure you understand all the pictures in the first three
chapters of the book. The primary issue is to understand WHY
they were generated, not how.
Read the blurb on page 109. Look at the car89 data using simple
regression. Don't follow the book--just play around with the
data.
Look at the stat-lib data
archive. Find a data set that sounds interesting--read it
into JMP and analyse it. Examples that might be of interest:
sleep (question: Does size affect hours of sleep? Does predation affect hours of sleep?)
world series (question: Is there a home field advantage? Has it increased or decreased over time?)
places: looks at various properties of places: crime, transportation, education.
If you don't want to unpack it, I've done so here: (.DAT, .documentation, .KEY)
Notice: you will probably have to use an editor and maybe Excel
to read these into JMP. Be sure to page to the bottom of the
file in case there are surprises! (Sometimes there is a
different data set at the bottom of the file.)
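One way to hunt for surprises without eyeballing every line: a small Python sketch (an aid alongside the JMP workflow, not part of it) that reports every line where the number of whitespace-separated fields changes. Header lines, documentation text and a second data set at the bottom will all show up as changes. The sample text is made up.

```python
def field_count_changes(lines):
    """Return (line_number, old_count, new_count) wherever the number of
    whitespace-separated fields changes; blank lines are skipped."""
    changes = []
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines do not count as a change
        if previous is not None and len(fields) != previous:
            changes.append((number, previous, len(fields)))
        previous = len(fields)
    return changes


sample = """\
1.0 2.0 3.0
4.0 5.0 6.0

x y
7 8
""".splitlines()
print(field_count_changes(sample))  # [(4, 3, 2)]
```

To check a real file, pass an open file object in place of `sample`.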