STAT 541: F Tests

Homework

Homework: (due next Thursday) A construction company that builds lakefront vacation cottages is considering a move to construction of a line of larger vacation homes. It has been building cottages in the 500 to 1000 square-foot range. It recently built one that was 3500 square feet.

Its primary question is whether it should build some cottages around 2500 sq-ft in size.

Using S-plus, read in the data.
Run a simple regression and create a prediction interval for a 2500 sq-ft cottage.
Run a multiple regression on both square-ft and (square-ft)² (y ~ x + I(x^2); in S-plus formula notation a bare x^2 is not read as squaring x). Now create a prediction interval for a 2500 sq-ft cottage. Why is there such a big difference from the interval you created before?
Generate prediction intervals for cottages of sizes 0, 500, 1000, ..., 3000 square ft using both the linear and the quadratic model. Plot these intervals on a graph (you might want to do this by hand if you can't trick S-plus into doing it for you).
Sketch a few quadratics that might stay within your bounds for the entire range of the data.
Do you think they should build these cottages or not? Type up a one-paragraph description of your reasoning.
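The comparison between the two prediction intervals can be sketched numerically. The example below is a minimal sketch in Python using only the standard library, not S-plus. The cottage sizes and responses in it are invented for illustration (the course data set is not reproduced here), the function names are hypothetical, and the t critical values are hard-coded from a standard t table for 10 and 9 degrees of freedom. It uses the usual interval yhat(x0) +/- t * s * sqrt(1 + x0'(X'X)^-1 x0).

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def prediction_interval(xs, ys, x0, degree, t_crit):
    """Fit a polynomial of the given degree by least squares and return
    the prediction interval (low, high) for a new observation at x0."""
    p = degree + 1
    X = [[x ** k for k in range(p)] for x in xs]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    beta = solve(XtX, Xty)
    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(X, ys)]
    s2 = sum(e * e for e in resid) / (len(xs) - p)   # residual variance
    v0 = [x0 ** k for k in range(p)]
    leverage = sum(a * w for a, w in zip(v0, solve(XtX, v0)))
    yhat = sum(b * v for b, v in zip(beta, v0))
    half = t_crit * math.sqrt(s2 * (1 + leverage))
    return yhat - half, yhat + half

# Hypothetical data: size in thousands of sq ft, response in made-up units.
sizes = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00, 3.50]
y = [28, 33, 31, 37, 36, 42, 40, 45, 47, 46, 52, 170]

# 97.5% t quantiles from a t table: 10 df (linear), 9 df (quadratic).
lin = prediction_interval(sizes, y, 2.5, degree=1, t_crit=2.228)
quad = prediction_interval(sizes, y, 2.5, degree=2, t_crit=2.262)
print("linear:    %.1f to %.1f" % lin)
print("quadratic: %.1f to %.1f" % quad)
```

Sizes are entered in thousands of square feet so that the squared term stays on a comparable scale to the linear one, which keeps the normal equations well behaved.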

The above can be done more simply using JMP. Load the data (.jmp) and do a fit-Y-by-X. Now add a line. On the line button, have it add confidence-intervals-individual. Add the quadratic line also. Now add the confidence intervals for the quadratic fit. While you're at it, try doing a cubic fit.

Exclude the 3500 ft point. Now re-generate the linear fit with its prediction intervals. Notice how most of the "accuracy" of prediction of the linear fit is due to the 3500 ft cottage. Also generate the quadratic fit on this smaller data set.
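One way to see why the 3500 ft cottage carries so much of the linear fit is to compute each point's leverage in the simple regression, h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2. The sketch below is in Python rather than JMP, and the sizes are invented for illustration (eleven cottages between 500 and 1000 sq ft plus one at 3500), not the course data.

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression on xs."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [1 / n + (x - xbar) ** 2 / sxx for x in xs]

# Hypothetical cottage sizes in sq ft.
sizes = [500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 3500]
for size, h in zip(sizes, leverages(sizes)):
    print(f"{size:5d} sq ft  leverage {h:.3f}")
```

Leverages in a simple regression always sum to 2 (one per fitted coefficient), so a value close to 1 for a single point means the fitted line is pinned almost entirely by that point at its end of the range. With these made-up sizes the last point's leverage is above 0.9.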

Print the entire mess (with all 5 fits on it). Obviously this is a fairly useless graph!

F Tests

Kinderschool example continued

Skiing Kinderschool example: Y = how much the child learns. Xs = instructor age, instructor skiing experience, instructor teaching experience, instructor sex, student age, student sex, student experience.
Does the teacher matter?
OK, the teacher matters; do interactions matter?
- needs a divisor (could use TSS or SSE; tradition says SSE)
- called the partial F
- tests whether the whole subspace is irrelevant
- Not much power against only one variable being relevant
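The notes do not write the statistic out, so the following is a sketch of the standard textbook form, assumed here: with q coefficients being tested and n - p residual degrees of freedom in the full model, F = [(SSE_reduced - SSE_full)/q] / [SSE_full/(n - p)]. Dividing numerator and denominator by TSS gives the same statistic in terms of R-squared. The function names and the numbers are illustrative only.

```python
import math

def partial_f_from_sse(sse_reduced, sse_full, q, df_full):
    """Partial F: drop in SSE per tested coefficient, divided by the
    full model's mean squared error (the traditional SSE divisor)."""
    return ((sse_reduced - sse_full) / q) / (sse_full / df_full)

def partial_f_from_r2(r2_reduced, r2_full, q, df_full):
    """Same statistic written with R-squared (numerator and denominator
    of the SSE form each divided by TSS)."""
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)

# Made-up sums of squares: TSS = 200, reduced SSE = 120, full SSE = 100,
# q = 4 tested coefficients, 50 residual degrees of freedom.
tss, sse_red, sse_full = 200.0, 120.0, 100.0
f_sse = partial_f_from_sse(sse_red, sse_full, q=4, df_full=50)
f_r2 = partial_f_from_r2(1 - sse_red / tss, 1 - sse_full / tss, q=4, df_full=50)
print(f_sse, f_r2)  # the two forms agree up to rounding
```

The result is compared with an F distribution on (q, n - p) degrees of freedom.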
Interactions
- instructor variables * student variables
- (quadratic surface, so include squared variables)
- Hopefully nothing is significant
- Tests if the R-sq'd has gone up significantly
Peak vs. off-peak
- consider a variable X₇ = PEAK = 1 if peak, 0 if off-peak
- add it to regression
- doesn't do much
- interact it with all the other 6 variables
- NOW it does a lot! Creates two totally different regressions
- The partial F is very useful and meaningful here
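The "two totally different regressions" remark can be checked directly: a model containing PEAK and PEAK times every other regressor fits the peak and off-peak lessons separately, so its SSE equals the sum of the SSEs from two separate regressions. The sketch below shows this with a single regressor x standing in for the six in the notes. The data are invented (and constructed so that the two groups have the same mean but opposite slopes), the names are hypothetical, and the small least-squares helper is written out only so that the example needs nothing beyond the Python standard library.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def sse(rows, y):
    """Residual sum of squares from least squares of y on the design rows."""
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

# Invented data: x is one regressor, peak is the 0/1 dummy.
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
peak = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.0, 5.1, 3.9, 3.1, 2.0]

no_peak = sse([[1, xi] for xi in x], y)
dummy_only = sse([[1, xi, pk] for xi, pk in zip(x, peak)], y)
interacted = sse([[1, xi, pk, pk * xi] for xi, pk in zip(x, peak)], y)

# Two separate simple regressions, one per group.
separate = 0.0
for group in (0, 1):
    xs = [xi for xi, pk in zip(x, peak) if pk == group]
    ys = [yi for yi, pk in zip(y, peak) if pk == group]
    separate += sse([[1, xi] for xi in xs], ys)

# Partial F for the whole PEAK block (dummy plus interaction: q = 2).
n = len(y)
f_block = ((no_peak - interacted) / 2) / (interacted / (n - 4))
print(no_peak, dummy_only, interacted, separate, f_block)
```

In this constructed example the dummy alone leaves the SSE essentially unchanged, while the dummy together with its interaction removes most of it, which is the pattern the notes describe.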
How partial F is usually used (albeit incorrectly)
- Look at regression with all 13 variables in it
- Looks like X1, X4, X7, X13 are useful (large t-statistics)
- Let's test if they are
- Run partial-F-test
- Reject if the p-value of the F test is less than .05. WRONG!
- Consider experiment that does this on random data.
- There are 13 choose 4 different F statistics being looked at each time
- A total of 715 different tests!
- So use Bonferroni (see M:page 50)
- If each test is done at level .05/715, the total chance of a false rejection is at most .05
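The experiment on random data mentioned above can be sketched as follows. This is a minimal Python simulation under stated assumptions, not the procedure from the course: the sample size of 40, the number of repetitions, the seed, and all function names are arbitrary choices. The "honest" 5% cutoff is estimated from the simulation itself (using four variables fixed in advance), since the standard library has no F quantile function. The response and all 13 regressors are pure noise, so every rejection is a false one.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def sse(XtX, Xty, yty, cols):
    """SSE of the regression on the given columns, from cross-products:
    SSE = y'y - b'X'y, where b solves the normal equations."""
    A = [[XtX[i][j] for j in cols] for i in cols]
    c = [Xty[i] for i in cols]
    beta = solve(A, c)
    return yty - sum(b * ci for b, ci in zip(beta, c))

def one_experiment(rng, n=40, k=13):
    """Regress noise on k noise regressors. Return the partial F for the
    4 variables with the largest |t| and for 4 variables fixed in advance."""
    X = [[1.0] + [rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    p = k + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    yty = sum(yi * yi for yi in y)
    cols = list(range(p))
    sse_full = sse(XtX, Xty, yty, cols)

    def partial_f(tested):
        keep = [c for c in cols if c not in tested]
        drop = sse(XtX, Xty, yty, keep) - sse_full
        return (drop / len(tested)) / (sse_full / (n - p))

    # t squared for one variable equals the partial F for dropping it alone.
    t2 = {j: partial_f([j]) for j in range(1, p)}
    top4 = sorted(t2, key=t2.get, reverse=True)[:4]
    return partial_f(top4), partial_f([1, 2, 3, 4])

def simulate(reps=200, seed=541):
    rng = random.Random(seed)
    results = [one_experiment(rng) for _ in range(reps)]
    honest = sorted(f_fixed for _, f_fixed in results)
    cutoff = honest[reps * 95 // 100]   # empirical 95th percentile
    rate = sum(f_sel > cutoff for f_sel, _ in results) / reps
    return cutoff, rate

n_tests = math.comb(13, 4)
print("subsets of 4 out of 13:", n_tests)
print("Bonferroni per-test level:", 0.05 / n_tests)
cutoff, rate = simulate()
print("rejection rate after picking by t-statistic:", rate)
```

If selecting the variables by their t-statistics were harmless, the printed rejection rate would sit near .05. The Bonferroni level printed alongside is .05 divided by the number of possible choices of four variables out of 13.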

Last modified: Thu Feb 8 08:35:09 2001