Statistics 541: F Tests
Homework
(Due next Thursday.) A construction company that builds
lakefront vacation cottages is considering a move to construction of a
line of larger vacation homes. It has been building cottages in the
500 - 1000 square-foot range. It recently built one which was 3500
square feet.
Its primary question is whether it should build some cottages around
2500 sq-ft in size.
- Using S-plus read in the data
- Run a simple regression and create a prediction interval
for a 2500 sq-ft cottage.
- Run a multiple regression on both square-ft and
(square-ft)^2 (y ~ x + I(x^2); a bare x^2 inside an S formula is not read as squaring). Now create a
prediction interval for a 2500 sq-ft cottage. Why is
there such a big difference from the interval you
created before?
- Generate prediction intervals for cottages of sizes 0,
500, 1000, ..., 3000 square ft using both the linear and
the quadratic model. Plot these intervals on a graph
(you might want to do this by hand if you can't trick
S-plus into doing it for you.)
- Sketch a few quadratics that might stay within your
bounds for the entire range of the data.
- Do you think they should build these cottages or not?
Type up a one paragraph description of your reasoning.
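One way to see where the difference between the two intervals comes from: for either model, the half-width of a prediction interval at a new size x0 is (t quantile) * (residual SD) * sqrt(1 + h0), where h0 = v0' (X'X)^{-1} v0 depends only on the sizes in the data, not on the prices. The sketch below computes that last factor for a straight-line and a quadratic fit. The sizes are hypothetical stand-ins (six cottages between 500 and 1000 sq-ft plus one at 3500), not the course data, and the helper names are made up for this illustration.

```python
# Hypothetical cottage sizes in thousands of square feet (NOT the course data):
# six cottages in the 500-1000 sq-ft range plus the single 3500 sq-ft one.
SIZES = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 3.5]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def leverage(xs, x0, degree):
    """Return h0 = v0' (X'X)^{-1} v0 for a polynomial fit of the given degree."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    v0 = [x0 ** k for k in range(degree + 1)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)]
           for i in range(degree + 1)]
    z = solve(xtx, v0)
    return sum(v * w for v, w in zip(v0, z))


def width_factor(xs, x0, degree):
    """Prediction-interval half-width divided by (t quantile * residual SD)."""
    return (1.0 + leverage(xs, x0, degree)) ** 0.5


linear_factor = width_factor(SIZES, 2.5, degree=1)
quadratic_factor = width_factor(SIZES, 2.5, degree=2)
print(f"linear:    sqrt(1 + h0) = {linear_factor:.3f}")
print(f"quadratic: sqrt(1 + h0) = {quadratic_factor:.3f}")
```

The t quantile and the residual SD also differ between the two fits, so this is only the part of the story that comes from where the data sit; the real intervals still need S-plus or JMP and the actual data.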
The above can be done more simply using JMP. Load the data (.jmp) and do a fit-Y-by-X. Now add a
line. On the line button, have it add
confidence-intervals-individual. Add the quadratic line also.
Now add the confidence intervals for the quadratic fit. While
you're at it, try doing a cubic fit.
Exclude the 3500 ft point. Now re-generate the linear fit with
its prediction intervals. Notice how most of the "accuracy" of
prediction of the linear fit is due to the 3500 ft cottage.
Also generate the quadratic fit on this smaller data set.
Print the entire mess (with all 5 fits on it). Obviously this
is a fairly useless graph!
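The effect of excluding the 3500 ft point on the linear fit can be sketched without JMP. For simple regression the prediction-interval half-width at x0 is (t quantile) * (residual SD) * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx), and the last factor depends only on the sizes. The sizes below are hypothetical stand-ins for the course data, and the function names are made up for this illustration.

```python
# Hypothetical cottage sizes in square feet (NOT the course data).
SMALL_COTTAGES = [500, 600, 700, 800, 900, 1000]
BIG_COTTAGE = 3500


def linear_leverage(xs, x0):
    """Leverage of a new point x0 under simple linear regression on xs."""
    n = len(xs)
    mean = sum(xs) / n
    sxx = sum((x - mean) ** 2 for x in xs)
    return 1.0 / n + (x0 - mean) ** 2 / sxx


def width_factor(xs, x0):
    """Prediction-interval half-width in units of (t quantile * residual SD)."""
    return (1.0 + linear_leverage(xs, x0)) ** 0.5


with_big = width_factor(SMALL_COTTAGES + [BIG_COTTAGE], 2500)
without_big = width_factor(SMALL_COTTAGES, 2500)
print(f"with the 3500 sq-ft cottage:    {with_big:.2f}")
print(f"without the 3500 sq-ft cottage: {without_big:.2f}")
```

Dropping the point also changes the t quantile and the residual SD, which this sketch ignores; it only isolates how much the single large cottage stretches the range of sizes the line is anchored to.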
F Tests
Kinderschool example continued
- Skiing Kinderschool example: Y = how much child learns. Xs =
instructor age, instructor skiing experience, instructor
teaching experience, instructor sex, student age, student
sex, student experience.
- Does the teacher matter?
- OK, the teacher matters, do interactions matter?
- needs a divisor (could use TSS, or SSE, tradition says SSE)
- called partial F
- tests whether the whole subspace is irrelevant
- Not much power against only one variable being relevant
- Interactions
- instructor variables * student variables
- (quadratic surface, so include squared variables)
- Hopefully nothing is significant
- Tests if the R-sq'd has gone up significantly
- Peak vs off peak
- consider a variable X7 = PEAK (1 if peak, 0 if off-peak)
- add it to regression
- doesn't do much
- interact it with all the other 6 variables
- NOW it does a lot! Creates two totally different regressions
- The partial F is very useful and meaningful here
- How partial F is usually used (albeit incorrectly)
- Look at regression with all 13 variables in it,
- Looks like X1, X4, X7, X13 are useful (large t-statistics)
- Let's test if they are
- Run partial-F-test
- Reject if the p-value of the F statistic is less than .05. WRONG!
- Consider experiment that does this on random data.
- There are 13 choose 4 different F statistics being looked at each time
- A total of 715 different tests!
- So use Bonferroni (see M:page 50)
- If each test is done at level .05/715, the total chance of a false rejection is at most .05
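The two calculations in this outline, the partial F statistic with its traditional SSE-based divisor and the Bonferroni adjustment for picking 4 variables out of 13, can be sketched as follows. The function names and the sums of squares in the usage line are made up for illustration; only the counts 13, 4, and .05 come from the notes.

```python
from math import comb


def partial_f(sse_reduced, sse_full, n_tested, df_full):
    """Partial F: drop in SSE per tested variable, divided by the full model's MSE.

    df_full is the residual degrees of freedom of the full model
    (number of observations minus number of fitted coefficients,
    intercept included).
    """
    return ((sse_reduced - sse_full) / n_tested) / (sse_full / df_full)


def bonferroni_level(family_alpha, n_tests):
    """Per-test level that keeps the overall false-rejection rate <= family_alpha."""
    return family_alpha / n_tests


# Number of ways to pick 4 "promising" variables out of 13.
n_tests = comb(13, 4)
per_test = bonferroni_level(0.05, n_tests)
print(f"{n_tests} possible tests, each run at level {per_test:.2e}")

# Made-up sums of squares, purely to show the call: 100 observations,
# 13 variables plus an intercept in the full model, 4 variables under test.
f_stat = partial_f(sse_reduced=120.0, sse_full=100.0, n_tested=4, df_full=100 - 14)
print(f"partial F = {f_stat:.2f}")
```

The resulting F would be compared against an F distribution with (n_tested, df_full) degrees of freedom, using the Bonferroni per-test level rather than .05 when the 4 variables were chosen by looking at the t-statistics first.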
Last modified: Thu Feb 8 08:35:09 2001