STAT 541: Logistic

Administrivia

Read section 7.4
Questions about the project? (suggested due date: next Tuesday)
Sure there aren't any questions?
Really no questions?
Positive?

Logistic

Suppose Y is discrete (binary response) and X is one or more continuous variable(s). We want to write P(Y = 1) as a function of X beta. This violates the assumptions of linear regression (a probability must stay between 0 and 1, while X beta is unbounded), so how do we proceed? Logistic regression uses logit(P(Y = 1)) = X beta, i.e. P(Y = 1) = 1/(1 + exp(-X beta)). But why is this at all reasonable?
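
To make the link function concrete, here is a minimal Python sketch of the logit and its inverse. The coefficients alpha and beta are hypothetical values chosen only for illustration; nothing here is fitted to data.

```python
import math

def logistic(eta):
    """Inverse logit: map a linear predictor eta = X beta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p):
    """Log odds: map a probability p in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and slope, for illustration only.
alpha, beta = -1.0, 0.5

for x in (-4.0, 0.0, 2.0, 8.0):
    p = logistic(alpha + beta * x)
    # p always stays strictly between 0 and 1; logit(p) recovers alpha + beta * x.
    print(f"x = {x:5.1f}   P(Y=1|x) = {p:.3f}   logit = {logit(p):.3f}")
```

However large or small X beta gets, the fitted probability stays inside (0, 1), which is what a straight line for P(Y = 1) cannot guarantee.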

Normal distribution motivation

Consider a simple regression of Y (binary) on X (continuous).

Think of the joint distribution
Three ways of thinking about it:
- P(Y,X)
- P(Y|X)
- P(X|Y) (this one we have done already)
So suppose X|Y is normal
We have a two sample t-test situation (simple linear regression with one binary regressor.)
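So, working with this model, compute P(Y|X)
Or more easily, the odds P(Y=1|X)/P(Y=0|X) = exp(alpha + beta X)
Oops: Slope isn't 1/(difference in means) but is proportional to the difference in means itself! (beta = (mu_1 - mu_0)/sigma^2, where mu_k is the mean of X given Y = k and sigma^2 is the common variance.)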
Can this possibly be correct?
OK, graphically it is fine, but does it make any sense?
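
The calculation above can be checked numerically. The sketch below assumes X|Y=k is normal with mean mu_k and a common standard deviation sigma, and that P(Y=1) = pi1. All four numbers are hypothetical. It computes the log odds directly from Bayes' rule and compares them with the closed form alpha + beta x.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_odds(x, mu0, mu1, sigma, pi1):
    """log[P(Y=1|x) / P(Y=0|x)] from Bayes' rule with X|Y=k ~ N(mu_k, sigma^2)."""
    num = pi1 * normal_pdf(x, mu1, sigma)
    den = (1.0 - pi1) * normal_pdf(x, mu0, sigma)
    return math.log(num / den)

# Hypothetical group means, common SD, and P(Y=1); illustration only.
mu0, mu1, sigma, pi1 = 0.0, 2.0, 1.5, 0.3

# Closed form obtained by expanding the two normal exponents:
# the x^2 terms cancel because the variances are equal.
beta = (mu1 - mu0) / sigma**2
alpha = math.log(pi1 / (1.0 - pi1)) - (mu1**2 - mu0**2) / (2.0 * sigma**2)

for x in (-1.0, 0.5, 3.0):
    print(f"x = {x:4.1f}   Bayes: {log_odds(x, mu0, mu1, sigma, pi1):.6f}"
          f"   alpha + beta*x: {alpha + beta * x:.6f}")
```

The two columns agree, so the slope of the log odds is the difference in means divided by the common variance.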

Regression slopes review

slope of Y|X = beta
slope of X|Y = gamma
if beta increases does gamma increase or decrease?
- decreases: Scale Y by a factor r. Then beta --> r beta and gamma --> gamma/r.
- increases: slope is cov/var. So if beta increases, the covariance increases and hence gamma increases
- Oops: that doesn't help
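SD line: slope is SD(y)/SD(x). Truly the best fit line!
Keeping the SD line constant, both slopes are the correlation times a fixed factor (beta = corr * SD(y)/SD(x), gamma = corr * SD(x)/SD(y)), so they move together; in standardized units the slopes are identical.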
This is what is happening in the logistic regression problem
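
Both arguments in this review can be seen on simulated data. The sketch below uses a hypothetical sample (500 points, true slope 0.8) and a hand-written least-squares slope. It shows that rescaling Y moves beta and gamma in opposite directions, while dividing out the SD-line slope leaves both equal to the correlation.

```python
import math
import random

def slope(u, v):
    """Least-squares slope of v regressed on u: cov(u, v) / var(u)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var = sum((a - mu) ** 2 for a in u)
    return cov / var

def sd(u):
    """Standard deviation (divisor n; the choice cancels in the ratios below)."""
    m = sum(u) / len(u)
    return math.sqrt(sum((a - m) ** 2 for a in u) / len(u))

# Hypothetical simulated data, for illustration only.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
y = [0.8 * a + random.gauss(0.0, 1.0) for a in x]

beta = slope(x, y)    # slope of Y|X
gamma = slope(y, x)   # slope of X|Y

# Argument 1: rescale Y. beta is multiplied by the factor, gamma is divided by it.
scale = 3.0
y_scaled = [scale * b for b in y]
beta_s, gamma_s = slope(x, y_scaled), slope(y_scaled, x)
print(f"beta: {beta:.4f} -> {beta_s:.4f}   gamma: {gamma:.4f} -> {gamma_s:.4f}")

# Argument 2: hold the SD line fixed. Both slopes reduce to the correlation.
sd_line = sd(y) / sd(x)
print(f"beta / sd_line = {beta / sd_line:.4f}   gamma * sd_line = {gamma * sd_line:.4f}")
```

So whether gamma goes up or down with beta depends on what is held fixed: the SDs (they move together) or the correlation (they move in opposite directions).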

Interpretation of the wrong-way two-sample t test model

Slope of logistic regression measures how far apart the two distributions are
Different variances --> add a quadratic term
Doesn't make sense if X is an outlier
p-values should be identical (or at least close)
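
For the "different variances" point above, the quadratic term can be derived directly. This is a sketch under the assumption that X|Y=k is N(mu_k, sigma_k^2) with prior probabilities pi_k; the symbols pi_k, sigma_k, and delta are notation introduced here, not taken from the notes.

```latex
\begin{aligned}
\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)}
  &= \log\frac{\pi_1}{\pi_0} + \log\frac{\sigma_0}{\sigma_1}
     - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_0)^2}{2\sigma_0^2} \\
  &= \alpha + \beta x + \delta x^2, \\[4pt]
\beta  &= \frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2}, \qquad
\delta = \frac{1}{2}\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right).
\end{aligned}
```

When sigma_0 = sigma_1 = sigma, delta is zero and beta reduces to (mu_1 - mu_0)/sigma^2, which is the equal-variance case from the normal distribution motivation.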

Last modified: Tue Mar 20 11:44:50 2001