Dealing with non-independence
- Administrivia:
- Final: May 4th 1:30 - 3:30 in 10 Leidy labs
- Extra office hours by Nuria: 10 - 1 April 26 and 27th.
- things to include in your writeup:
- prediction for a sample person (or possibly 2 or 3
people). Give a prediction interval (i.e. +/- 2
RMSE)
- interpret slopes. Like: "greeks drink on average
2 more drinks a week than non-greeks."
- Give confidence intervals for your slopes. For
example, "greeks drink between 1.5 and 5.5 drinks
more per week than non-greeks."
- give at least one confidence interval for a variable not
in your model (for example, say, "there was no
significant difference between college and engineering students in
the number of drinks they drink. More precisely,
college students drink from 1 drink fewer
per week to 2 drinks more per week than engineering students.")
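The writeup items above (a prediction with +/- 2 RMSE, and a slope with its confidence interval) can be sketched as follows. This is only an illustration: the data are simulated, and the names `greek` and `drinks` and all the numbers are assumptions, not class data. The +/- 2 multiplier is the rough 95% rule from the notes.

```python
import numpy as np

# Simulated, hypothetical data: drinks per week and a 0/1 greek indicator.
rng = np.random.default_rng(0)
n = 200
greek = rng.integers(0, 2, size=n)                  # 1 = greek, 0 = non-greek
drinks = 4.0 + 2.0 * greek + rng.normal(0, 3, size=n)

# Least-squares fit of drinks on greek, with an intercept column.
X = np.column_stack([np.ones(n), greek])
beta, *_ = np.linalg.lstsq(X, drinks, rcond=None)
resid = drinks - X @ beta
rmse = np.sqrt(resid @ resid / (n - X.shape[1]))    # residual standard error

# Standard errors of the coefficients: sqrt of diag of RMSE^2 (X'X)^-1.
se = rmse * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

# Rough 95% confidence interval for the greek slope: estimate +/- 2 SE.
slope_ci = (beta[1] - 2 * se[1], beta[1] + 2 * se[1])

# Rough 95% prediction interval for one greek student: prediction +/- 2 RMSE.
pred = beta[0] + beta[1] * 1
pred_interval = (pred - 2 * rmse, pred + 2 * rmse)

print("slope:", beta[1], "CI:", slope_ci)
print("prediction:", pred, "interval:", pred_interval)
```

Note that the prediction interval (for one person) is much wider than the confidence interval (for the average difference); they answer different questions.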
Non-independence
- Definition: two observations are related through more than
their X's
- technical definition: the RESIDUALS are correlated
- Example: "How much opera do you listen to?"
- Ask everyone on a dorm floor
- think: two people per room
- ab, cd, ef, gh are all room-mate pairs
- the amount a listens to opera is highly related to the
amount b does
- Response: Ignore it
- a & b are correlated
- in a plot of opera vs year, a & b show up as pairs of nearby points
- basically doubling the data
- t-statistics are wrong, CIs are wrong, prediction intervals are
wrong, p-values are wrong, plots are misleading
- BAD IDEA!
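To see why "basically doubling the data" is a problem, here is a minimal sketch of the extreme case where each roommate is an exact copy of the other. The data are simulated and the variable names (`year`, `opera`) are illustrative assumptions. The slope does not change, but the reported standard error shrinks by roughly 1/sqrt(2), so t-statistics and p-values look better than they should.

```python
import numpy as np

def slope_and_se(x, y):
    """Return the least-squares slope of y on x and its usual standard error."""
    n = len(x)
    xc = x - x.mean()
    sxx = xc @ xc
    slope = (xc @ y) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    rmse = np.sqrt(resid @ resid / (n - 2))
    return slope, rmse / np.sqrt(sxx)

# Simulated, hypothetical data: one person per room.
rng = np.random.default_rng(1)
n = 50
year = rng.integers(1, 5, size=n).astype(float)     # class year 1-4
opera = 1.0 + 0.5 * year + rng.normal(0, 2, size=n)

slope_one, se_one = slope_and_se(year, opera)

# Extreme dependence: the roommate repeats the same (year, opera) values.
year_two = np.concatenate([year, year])
opera_two = np.concatenate([opera, opera])
slope_two, se_two = slope_and_se(year_two, opera_two)

print("slope:", slope_one, slope_two)               # identical slopes
print("SE ratio (doubled / original):", se_two / se_one)
```

No new information was added by the copies, yet the standard error claims there was.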
- Response: remove 1/2 of the data
- only look at a, c, e, g, etc
- gets rid of dependence problems
- unfortunately not as much data
- Works fine in practice (often used in sample surveys)
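A minimal sketch of the "remove 1/2 of the data" response, on simulated data. The `room` id array and the other names are assumptions for illustration. Here the first listed occupant of each room is kept; picking one at random would serve the same purpose.

```python
import numpy as np

# Simulated, hypothetical dorm floor: two people per room.
rng = np.random.default_rng(2)
n_rooms = 40
room = np.repeat(np.arange(n_rooms), 2)                 # room ids 0,0,1,1,...
shared = np.repeat(rng.normal(0, 2, size=n_rooms), 2)   # taste shared within a room
year = rng.integers(1, 5, size=2 * n_rooms).astype(float)
opera = 1.0 + 0.5 * year + shared + rng.normal(0, 1, size=2 * n_rooms)

# Keep only one person per room (the first occurrence of each room id).
_, first_idx = np.unique(room, return_index=True)
year_half = year[first_idx]
opera_half = opera[first_idx]

# Ordinary regression on the independent half of the data.
slope, intercept = np.polyfit(year_half, opera_half, 1)
print("n used:", len(first_idx), "slope:", slope)
```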
- Response: regress a on b, c on d, e on f, g on h
- use room mate to predict listening habits
- uses all the data: 1/2 is converted to X, 1/2 stays as Y
- Of course, you must be willing to use the roommate as a
predictor (you might not want to do this if you were
interested in year vs opera effects.)
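The roommate-as-predictor response can be sketched like this, again on simulated data with made-up names (`a` and `b` hold the opera hours of the first and second roommate in each room). Each room contributes exactly one (X, Y) point, so the points are independent of one another.

```python
import numpy as np

# Simulated, hypothetical opera hours for the two roommates in each room.
rng = np.random.default_rng(3)
n_rooms = 40
shared = rng.normal(0, 2, size=n_rooms)                 # taste shared within a room
a = 3.0 + shared + rng.normal(0, 1, size=n_rooms)       # first roommate (Y)
b = 3.0 + shared + rng.normal(0, 1, size=n_rooms)       # second roommate (X)

# One regression only: a on b. Each room appears once.
slope, intercept = np.polyfit(b, a, 1)
resid = a - (intercept + slope * b)
print("slope of a on b:", slope)
```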
- Response: regress both ways
- a on b AND b on a
- draw picture. Once you see that the residual for a on b is positive, you
know the residual for b on a tends to be negative
- so residuals are correlated
- t-statistics are wrong, CIs are wrong, prediction intervals are
wrong, p-values are wrong, plots are misleading
- BAD IDEA!
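The claim that regressing both ways leaves correlated residuals can be checked with a small simulation. This is a sketch under assumptions: the data are simulated, the names are illustrative, and "both ways" is taken to mean stacking (Y = a, X = b) and (Y = b, X = a) into one regression. The two residuals from the same room come out negatively correlated.

```python
import numpy as np

# Simulated, hypothetical opera hours (centered) for roommate pairs.
rng = np.random.default_rng(4)
n_rooms = 2000
shared = rng.normal(0, 1, size=n_rooms)
a = shared + rng.normal(0, 1, size=n_rooms)
b = shared + rng.normal(0, 1, size=n_rooms)

# Regress both ways: every person is a Y once and an X once.
y = np.concatenate([a, b])      # own listening
x = np.concatenate([b, a])      # roommate's listening
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Residuals that come from the same room.
resid_a_on_b = resid[:n_rooms]
resid_b_on_a = resid[n_rooms:]
pair_corr = np.corrcoef(resid_a_on_b, resid_b_on_a)[0, 1]
print("slope:", slope, "within-room residual correlation:", pair_corr)
```

A clearly nonzero within-room correlation means the independence assumption fails, which is why the usual t, CI, and p-value formulas cannot be trusted here.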
Time series
- example: Y = calories/person/day, X = year
- regressing Y on X gives correlated errors (draw picture)
- instead, regress Y_t on Y_{t-1}
- other example: seasonal adjustment
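The calories example can be sketched with a simulated series. Everything here is an assumption for illustration: the trend, the error process (each year's error carries over 0.8 of the previous year's), and the numbers. The lag-1 correlation of the residuals is used as a simple check for correlated errors.

```python
import numpy as np

def lag1_corr(e):
    """Correlation between each residual and the one before it."""
    return np.corrcoef(e[:-1], e[1:])[0, 1]

# Simulated, hypothetical yearly series: trend plus errors that carry over.
rng = np.random.default_rng(5)
n = 200
year = np.arange(n, dtype=float)            # years since the start of the series
err = np.zeros(n)
for t in range(1, n):
    err[t] = 0.8 * err[t - 1] + rng.normal(0, 20)
calories = 2000 + 5 * year + err

# Regress Y on X = year: residuals stay correlated from year to year.
slope, intercept = np.polyfit(year, calories, 1)
resid_trend = calories - (intercept + slope * year)

# Regress Y_t on Y_{t-1}: last year's value absorbs most of the carry-over.
slope_lag, intercept_lag = np.polyfit(calories[:-1], calories[1:], 1)
resid_lag = calories[1:] - (intercept_lag + slope_lag * calories[:-1])

print("lag-1 residual correlation, Y on year:   ", lag1_corr(resid_trend))
print("lag-1 residual correlation, Y_t on Y_t-1:", lag1_corr(resid_lag))
```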
Last modified: Tue Apr 25 13:15:30 2000