Its primary question is whether it should build some cottages around 2500 sq-ft in size.

- Using S-plus, read in the data
- Run a simple regression and create a prediction interval for a 2500 sq-ft cottage.
- Run a multiple regression on both square-ft and (square-ft)^2 (y ~ x + x^2). Now create a prediction interval for a 2500 sq-ft cottage. Why is there such a big difference from the interval you created before?
- Generate prediction intervals for cottages of sizes 0, 500, 1000, ..., 3000 square ft using both the linear and the quadratic model. Plot these intervals on a graph (you might want to do this by hand if you can't trick S-plus into doing it for you).
- Sketch a few quadratics that might stay within your bounds for the entire range of the data.
- Do you think they should build these cottages or not? Type up a one paragraph description of your reasoning.
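The assignment calls for S-plus, but the prediction-interval arithmetic in the steps above can be sketched in plain Python. The data below are made up for illustration (the course dataset is not reproduced here), and the t multiplier of 2 is a rough stand-in for the proper qt(.975, n - 2) critical value:

```python
import math

# Hypothetical cottage data (sq ft, price in $1000s); these numbers are
# invented for illustration, not the course dataset.
x = [1000, 1200, 1500, 1800, 2000, 2200, 2600, 3000, 3500]
y = [80, 95, 110, 135, 150, 160, 190, 220, 265]

def linear_prediction_interval(x, y, x0, t_mult=2.0):
    """Approximate 95% prediction interval for a NEW observation at x0
    under y = a + b*x + noise.  t_mult is a rough critical value; a real
    analysis would use the t distribution with n - 2 df."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # Prediction (not confidence) interval: the "+1" accounts for the
    # noise in the new observation itself.
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    yhat = a + b * x0
    return yhat - t_mult * se_pred, yhat, yhat + t_mult * se_pred

lo, fit, hi = linear_prediction_interval(x, y, 2500)
print(f"predicted at 2500 sq-ft: {fit:.1f}, PI: ({lo:.1f}, {hi:.1f})")
```

The quadratic model's interval works the same way but needs the full design-matrix formula, which is where a package (or S-plus's predict with interval bands) earns its keep.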

Exclude the 3500 ft point. Now re-generate the linear fit with its prediction intervals. Notice how most of the "accuracy" of prediction of the linear fit is due to the 3500 ft cottage. Also generate the quadratic fit on this smaller data set.

Print the entire mess (with all 5 fits on it). Obviously this is a fairly useless graph!

- Skiing Kinderschool example: Y = how much child learns. Xs = instructor age, instructor skiing experience, instructor teaching experience, instructor sex, student age, student sex, student experience.
- Does the teacher matter?
- OK, the teacher matters, do interactions matter?
- needs a divisor (could use TSS or SSE; tradition says SSE)
- called partial F
- tests whether the whole subspace is irrelevant
- Not much power against only one variable being relevant
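The partial F in these notes can be written out explicitly. A minimal sketch (the function name and numbers are mine):

```python
def partial_f(sse_reduced, sse_full, q, n, p_full):
    """Partial F statistic for dropping q predictors from a full model
    with p_full slope terms, fit on n observations:

        F = ((SSE_reduced - SSE_full) / q) / (SSE_full / (n - p_full - 1))

    Compare against the F(q, n - p_full - 1) distribution for a p-value.
    """
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full - 1))

# Illustrative numbers (invented): dropping 3 of 7 predictors on n = 30.
print(partial_f(120.0, 100.0, q=3, n=30, p_full=7))
```

A large F says the dropped subspace explained real variation; a small one says the reduced model does about as well.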

- Interactions
- instructor variables * student variables
- (quadratic surface, so include squared variables)
- Hopefully nothing is significant
- Tests if the R-squared has gone up significantly
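That "has R-squared gone up significantly?" test is the same partial F, just rewritten in terms of R-squared (the TSS cancels out of numerator and denominator). A sketch, with the function name mine:

```python
def partial_f_from_r2(r2_full, r2_reduced, q, n, p_full):
    """Partial F written in R^2 form: tests whether R^2 rose
    significantly when the q extra (e.g. interaction) terms were added.

        F = ((R2_full - R2_red) / q) / ((1 - R2_full) / (n - p_full - 1))
    """
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - p_full - 1))

# Illustrative numbers (invented): R^2 rises 0.7 -> 0.8 from q = 2 terms.
print(partial_f_from_r2(0.8, 0.7, q=2, n=25, p_full=4))
```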

- Peak vs off peak
- consider a variable X_7 = PEAK = 1 if peak, 0 if off-peak
- add it to the regression
- doesn't do much
- interact it with all the other 6 variables
- NOW it does a lot! Creates two totally different regressions
- The partial F is very useful and meaningful here
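The "two totally different regressions" effect is just interaction columns. A minimal sketch (function name mine), assuming rows of raw predictors and a 0/1 PEAK flag per row:

```python
def add_peak_interactions(rows, peak):
    """Given predictor rows and a 0/1 PEAK indicator per row, append
    PEAK itself and PEAK * x_j for every column j.  Fitting ONE
    regression on this design is algebraically the same as fitting
    separate peak and off-peak regressions: off-peak slope is beta_j,
    peak slope is beta_j + gamma_j (the interaction coefficient)."""
    out = []
    for r, p in zip(rows, peak):
        out.append(list(r) + [p] + [p * v for v in r])
    return out

# One off-peak row and one peak row with two raw predictors each:
print(add_peak_interactions([[1.0, 2.0], [1.0, 2.0]], [0, 1]))
```

The partial F on the block of interaction columns then tests "is the peak regression actually different?" all at once, which is exactly where the notes say it is useful.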

- How partial F is usually used (albeit incorrectly)
- Look at regression with all 13 variables in it,
- Looks like X1, X4, X7, X13 are useful (large t-statistics)
- Let's test if they are
- Run partial-F-test
- Reject if the p-value is less than .05. WRONG!
- Consider experiment that does this on random data.
- There are 13 choose 4 different F statistics being looked at each time
- A total of 715 different tests!
- So use Bonferroni (see M:page 50)
- If each test is done at .05/715, total chance of error is less than .05
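The 715 and the Bonferroni-adjusted level can be checked directly with a quick stdlib sketch:

```python
import math

# Picking 4 "interesting" coefficients out of 13 means the selection
# procedure implicitly considered 13-choose-4 possible partial F tests.
n_tests = math.comb(13, 4)
print(n_tests)  # 715

# Bonferroni: run each test at .05 / 715 so the total chance of any
# false rejection stays below .05.
alpha_each = 0.05 / n_tests
print(f"{alpha_each:.2e}")
```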

Last modified: Thu Feb 8 08:35:09 2001
