Statistics 541: Logistic
- Read section 7.4
- Questions about the project? (suggested due date: next Tuesday)
- Sure there aren't any questions?
- Really no questions?
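Suppose Y is discrete (a binary response) and X is one or more
continuous variables. We want to write P(Y=1) as a function of X
beta. A probability must lie in [0, 1] while X beta can be any real
number, so ordinary linear regression doesn't fit. How do we
proceed? Logistic regression uses logit(P(Y=1)) = X beta, where
logit(p) = log(p/(1-p)); equivalently, P(Y=1) = exp(X beta)/(1 + exp(X beta)).
But why is this at all reasonable?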
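A minimal Python sketch of the logit link and its inverse, showing that any value of the linear predictor X beta maps to a valid probability. The names `logit` and `inv_logit` are our own illustrative labels, not functions from any particular library.

```python
import math

def logit(p: float) -> float:
    """Log odds: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Inverse logit (logistic function): maps X beta back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Any linear predictor value becomes a probability, and logit undoes it.
for eta in (-5.0, 0.0, 2.0):
    p = inv_logit(eta)
    print(eta, round(p, 4), round(logit(p), 4))
```

The logit is the link applied to P(Y=1); the inverse logit is what turns a fitted X beta back into a probability.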
Normal distribution motivation
Consider a simple regression of Y (binary) on X (continuous).
- Think of the joint distribution of (X, Y)
- Three ways of thinking about it: P(X, Y), P(Y|X), and P(X|Y)
- P(X|Y) (this one we have done already)
- So suppose X|Y is normal, with the same variance in both groups
- We have a two-sample t-test situation (a simple linear regression
of X on one binary regressor, Y).
- So, working with this model, compute P(Y|X) by Bayes' rule
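- Or, more easily, the odds P(Y=1|X)/P(Y=0|X) = exp(alpha + beta X),
where, with mu_j = E(X|Y=j) and common variance sigma^2,
beta = (mu_1 - mu_0)/sigma^2 and
alpha = log(P(Y=1)/P(Y=0)) - (mu_1^2 - mu_0^2)/(2 sigma^2)
- Oops: the slope isn't 1/(difference in means) but is the difference in
means divided by the variance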
- Can this possibly be correct?
- OK, graphically it is fine, but does it make any sense?
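As a numerical check on the derivation, here is a small Python sketch that computes P(Y=1|X=x) directly from Bayes' rule with two normal groups sharing one variance, and compares it to the logistic curve with beta = (mu_1 - mu_0)/sigma^2 and alpha = log(P(Y=1)/P(Y=0)) - (mu_1^2 - mu_0^2)/(2 sigma^2). The parameter values are hypothetical, chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical parameters (not from the notes): group means, common SD, P(Y=1).
mu0, mu1, sigma, pi1 = 0.0, 2.0, 1.5, 0.3
pi0 = 1.0 - pi1

def posterior_bayes(x):
    """P(Y=1 | X=x) computed directly from Bayes' rule."""
    num = pi1 * normal_pdf(x, mu1, sigma)
    return num / (num + pi0 * normal_pdf(x, mu0, sigma))

# Logistic coefficients implied by the equal-variance normal model.
beta = (mu1 - mu0) / sigma ** 2
alpha = math.log(pi1 / pi0) - (mu1 ** 2 - mu0 ** 2) / (2 * sigma ** 2)

def posterior_logistic(x):
    """P(Y=1 | X=x) from the logistic curve exp(a + b x)/(1 + exp(a + b x))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

for x in (-2.0, 0.0, 1.0, 3.5):
    print(x, posterior_bayes(x), posterior_logistic(x))

# Naive guess 1/(mu1 - mu0) versus the actual slope (mu1 - mu0)/sigma^2.
print(1.0 / (mu1 - mu0), beta)
```

The two posterior columns agree to rounding error, while the naive slope 1/(mu_1 - mu_0) differs from beta whenever sigma^2 is not equal to (mu_1 - mu_0)^2.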
Regression slopes review
- slope of Y|X = beta
- slope of X|Y = gamma
- If the slope of Y|X (beta) increases, does gamma increase or decrease?
- decreases: Scale Y by a factor r. Then beta --> r beta
and gamma -> gamma/r.
- increases: slope is cov/var. So if beta increases with the variances
held fixed, the covariance increases and hence gamma increases
- Oops: that doesn't help
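- SD line: slope is the ratio SD(Y)/SD(X). Truly the best fit line!
- Keeping the SD line constant (SD(X) and SD(Y) fixed), beta = r SD(Y)/SD(X)
and gamma = r SD(X)/SD(Y) both move with the correlation r; in
standardized units the two slopes are identical (both equal r).
- This is what is happening in the logistic regression problem: the
logistic slope is (difference in means)/variance, so rescaling X by r
divides it by r, yet with the variance held fixed a bigger difference in
means gives a bigger slope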
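A short Python sketch of the slope facts above on a small made-up data set (illustrative only). It checks that scaling Y multiplies beta and divides gamma, and that both slopes are the correlation times an SD ratio, so beta * gamma = r^2. The scale factor is called `c` here to avoid a clash with the correlation `r`; the helper names `cov` and `slope` are our own.

```python
import math

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def slope(y, x):
    """Least-squares slope of y regressed on x: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)

# Small made-up data set, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1]

beta = slope(y, x)    # slope of Y|X
gamma = slope(x, y)   # slope of X|Y
r = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))   # correlation
sd_ratio = math.sqrt(cov(y, y) / cov(x, x))        # slope of the SD line

# Scaling Y by a factor c multiplies beta by c and divides gamma by c.
c = 10.0
y_scaled = [c * v for v in y]
print(slope(y_scaled, x) / beta, gamma / slope(x, y_scaled))  # both c, up to rounding

# Both regression slopes are r times an SD ratio, so beta * gamma = r^2.
print(beta, r * sd_ratio)
print(gamma, r / sd_ratio)
print(beta * gamma, r ** 2)
```

Holding SD(X) and SD(Y) fixed, beta and gamma are both proportional to r, which is why the "increases" answer holds in that setting while the "decreases" answer comes from rescaling.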
Interpretation of the wrong-way two-sample t-test model
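- The slope of the logistic regression measures how far apart the two
group means are, relative to the within-group variance
- Different variances in the two groups --> the log odds is quadratic
in X, so add a quadratic term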
- Doesn't make sense if X is an outlier
- p-values for the logistic slope and the two-sample t-test should be
identical (or at least close)
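To see why unequal variances call for a quadratic term, this Python sketch computes the log odds log P(Y=1|x)/P(Y=0|x) from Bayes' rule with X|Y=j ~ N(mu_j, sigma_j^2) and checks that it equals a + b x + c x^2, with c = 1/(2 sigma_0^2) - 1/(2 sigma_1^2). The parameter values are hypothetical; c vanishes exactly when sigma_0 = sigma_1, recovering the linear logit.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Hypothetical parameters: the two groups now have different variances.
mu0, sigma0 = 0.0, 1.0
mu1, sigma1 = 2.0, 2.0
pi1 = 0.4
pi0 = 1.0 - pi1

def log_odds_bayes(x):
    """log P(Y=1|x)/P(Y=0|x), straight from Bayes' rule."""
    return ((math.log(pi1) + log_normal_pdf(x, mu1, sigma1))
            - (math.log(pi0) + log_normal_pdf(x, mu0, sigma0)))

# Coefficients of the quadratic a + b x + c x^2 implied by the model.
a = (math.log(pi1 / pi0) + math.log(sigma0 / sigma1)
     + mu0 ** 2 / (2 * sigma0 ** 2) - mu1 ** 2 / (2 * sigma1 ** 2))
b = mu1 / sigma1 ** 2 - mu0 / sigma0 ** 2
c = 1 / (2 * sigma0 ** 2) - 1 / (2 * sigma1 ** 2)

for x in (-3.0, -1.0, 0.0, 1.5, 4.0):
    print(x, log_odds_bayes(x), a + b * x + c * x ** 2)
```

With these values c is nonzero, so a logistic regression with only a linear term in X would be misspecified.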
Last modified: Tue Mar 20 11:44:50 2001