Questions about the project? (suggested due date: next Tuesday)
Sure there aren't any questions?
Really no questions?
Positive?
Logistic
Suppose Y is discrete (binary response) and X is one or more
continuous variable(s). We want to write P(Y=1) as a function of X
beta. This violates any sense of linear regression, so how do we
proceed? Logistic regression uses logit(P(Y=1)) = X beta. But why is
this at all reasonable?
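To make the link function concrete, here is a minimal sketch of the logit and its inverse. The coefficient values alpha and beta are hypothetical, chosen only for illustration; they do not come from any data in these notes.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse of logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only.
alpha, beta = -1.0, 2.0

for x in (-2.0, 0.0, 0.5, 3.0):
    p = inv_logit(alpha + beta * x)
    # logit(p) recovers the linear predictor alpha + beta * x.
    print(f"x = {x:5.1f}  P(Y=1|x) = {p:.4f}  logit = {logit(p):.4f}")
```

The point of the sketch: the linear predictor can be any real number, while the probability it maps to always stays strictly between 0 and 1.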
Normal distribution motivation
Consider a simple regression of Y (binary) on X (continuous).
Think of the joint distribution
Three ways of thinking about it:
P(Y,X)
P(Y|X)
P(X|Y) (this one we have done already)
So suppose X|Y is normal
We have a two sample t-test situation (simple linear regression
with one binary regressor.)
So, working with this model, compute P(Y|X)
Or more easily, the odds P(Y=1|X)/P(Y=0|X) = exp(alpha +
beta X)
Oops: Slope isn't 1/(difference in means) but is proportional to
the difference in means itself!
Can this possibly be correct?
Ok, graphically it is fine, but does it make any sense?
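The odds computation can be written out as a short derivation. This is a sketch under assumptions the notes do not state explicitly: X|Y=k is normal with mean \mu_k and a common variance \sigma^2 (the pooled two-sample t-test setup), and \pi = P(Y=1). The symbols \mu_0, \mu_1, \sigma, and \pi are introduced here for illustration.

```latex
\begin{align*}
\log\frac{P(Y=1\mid X=x)}{P(Y=0\mid X=x)}
  &= \log\frac{\pi}{1-\pi}
     - \frac{(x-\mu_1)^2}{2\sigma^2}
     + \frac{(x-\mu_0)^2}{2\sigma^2} \\
  &= \underbrace{\log\frac{\pi}{1-\pi}
     - \frac{\mu_1^2-\mu_0^2}{2\sigma^2}}_{\alpha}
     + \underbrace{\frac{\mu_1-\mu_0}{\sigma^2}}_{\beta}\, x
\end{align*}
```

The x^2 terms cancel because the two variances are equal, which is why the log-odds is exactly linear in x and why the slope is the difference in means divided by the common variance.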
Regression slopes review
slope of Y|X = beta
slope of X|Y = gamma
if the slope of Y|X (beta) increases, does gamma increase or decrease?
decreases: Scale Y by a factor r. Then beta -> r beta
and gamma -> gamma/r.
increases: slope is cov/var. So if beta increases, the
covariance increases and hence gamma increases
Oops: that doesn't help
SD line: slope is SD(y)/SD(x). Truly the best fit line!
Keeping the SD line constant, the slopes are identical.
This is what is happening in the logistic regression problem
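The two competing arguments about beta and gamma can be checked numerically. The sketch below uses synthetic data (the generating slope 0.5 and the seed are arbitrary choices, not from the notes). The scale factor that the notes call r is named `scale` here so it is not confused with the correlation, which is named `corr`.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance; cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def slopes(x, y):
    """Return (beta, gamma, corr): slope of y on x, slope of x on y, correlation."""
    sxy, sxx, syy = cov(x, y), cov(x, x), cov(y, y)
    return sxy / sxx, sxy / syy, sxy / (sxx * syy) ** 0.5

random.seed(0)  # synthetic data, for illustration only
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]

beta, gamma, corr = slopes(x, y)

# Rescaling y: beta is multiplied by the factor, gamma is divided by it.
scale = 3.0
beta_c, gamma_c, _ = slopes(x, [scale * yi for yi in y])

# Standardizing both variables fixes the SD line at slope 1;
# then both regression slopes equal the correlation.
sx, sy = cov(x, x) ** 0.5, cov(y, y) ** 0.5
beta_s, gamma_s, _ = slopes([xi / sx for xi in x], [yi / sy for yi in y])

print("beta, gamma, corr:", beta, gamma, corr)
print("after rescaling y:", beta_c, gamma_c)
print("after standardizing:", beta_s, gamma_s)
```

Both arguments in the notes are right about different things: rescaling moves the slopes in opposite directions, while a stronger association with the SDs held fixed moves them together, since beta * gamma equals the squared correlation.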
Interpretation of the wrong-way two-sample t test model
Slope of logistic regression measures how far apart the two
distributions are
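A small numerical check of this interpretation, as a sketch under assumptions not spelled out in the notes: X|Y=k is normal with mean mu_k and a common standard deviation sigma, and the two groups are equally likely a priori. The function names and parameter values are hypothetical. The log-odds is computed directly from Bayes' rule, and its slope in x is read off from two points.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_odds(x, mu0, mu1, sigma, prior1=0.5):
    """log P(Y=1|x)/P(Y=0|x) by Bayes' rule, with X|Y=k ~ N(mu_k, sigma^2)."""
    num = prior1 * normal_pdf(x, mu1, sigma)
    den = (1.0 - prior1) * normal_pdf(x, mu0, sigma)
    return math.log(num / den)

def numeric_slope(mu0, mu1, sigma):
    """Slope of the log-odds in x; it is linear, so two points suffice."""
    return log_odds(1.0, mu0, mu1, sigma) - log_odds(0.0, mu0, mu1, sigma)

sigma = 1.0  # assumed common SD
for gap in (0.5, 1.0, 2.0):
    # Compare the numeric slope with (mu1 - mu0) / sigma^2.
    print(gap, numeric_slope(0.0, gap, sigma), gap / sigma ** 2)
```

Under these assumptions the slope grows with the gap between the group means and shrinks as the common variance grows, so it measures separation in units of variance.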