
# Statistics 541: Logistic

## Administrivia

• Read section 7.4
• Questions about the project? (suggested due date: next Tuesday)
• Sure there aren't any questions?
• Really no questions?
• Positive?

## Logistic

Suppose Y is discrete (a binary response) and X is one or more continuous variables. We want to write P(Y = 1) as a function of X beta. This violates any sense of linear regression, so how do we proceed? Logistic regression uses logit(P(Y = 1)) = X beta, that is, P(Y = 1) = exp(X beta)/(1 + exp(X beta)). But why is this at all reasonable?
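
As a minimal sketch of the link function (the names `logit` and `inverse_logit` and the coefficient values are ours, chosen only for illustration): the logit maps a probability to a log-odds, and its inverse maps a linear predictor X beta back into (0, 1).

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Map a linear predictor eta = X beta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical intercept and slope, not taken from the notes.
alpha, beta = -1.0, 0.5
probabilities = [inverse_logit(alpha + beta * x) for x in (-4.0, 0.0, 2.0, 8.0)]
```

However large or small X beta gets, the fitted probability stays strictly between 0 and 1, which is what a straight line in X cannot guarantee.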

## Normal distribution motivation

Consider a simple regression of Y (binary) on X (continuous).
• Think of the joint distribution
• Three ways of thinking about it:
• P(Y,X)
• P(Y|X)
• P(X|Y) (this one we have done already)
• So suppose X|Y is normal
• We have a two sample t-test situation (simple linear regression with one binary regressor.)
• So, working with this model, compute P(Y|X)
• Or more easily, the odds P(Y=1|X)/P(Y=0|X) = exp(alpha + beta X)
• Oops: Slope isn't 1/(difference in means) but is proportional to the difference in means itself!
• Can this possibly be correct?
• OK, graphically it is fine, but does it make any sense?
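
The computation can be sketched as follows, assuming (in our notation, not the notes') that X | Y = k is N(\mu_k, \sigma^2) with a common variance and that \pi = P(Y = 1). By Bayes' rule the normalizing constants cancel and the x^2 terms drop out:

```latex
\begin{align*}
\log\frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)}
  &= \log\frac{\pi}{1-\pi}
     - \frac{(x-\mu_1)^2}{2\sigma^2} + \frac{(x-\mu_0)^2}{2\sigma^2} \\
  &= \underbrace{\log\frac{\pi}{1-\pi} - \frac{\mu_1^2-\mu_0^2}{2\sigma^2}}_{\alpha}
     + \underbrace{\frac{\mu_1-\mu_0}{\sigma^2}}_{\beta}\, x .
\end{align*}
```

Under these assumptions the slope is the difference in means divided by the common variance, so it grows with the difference in means rather than with its reciprocal.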

## Regression slopes review

• slope of Y|X = beta
• slope of X|Y = gamma
• if beta (the slope of Y|X) increases, does gamma increase or decrease?
• decreases: Scale Y by a factor r. Then beta --> r beta and gamma --> gamma/r.
• increases: slope is cov/var. So if beta increases, the covariance increases and hence gamma increases
• Oops: that doesn't help
• SD line: slope is the ratio SD(y)/SD(x). Truly the best fit line!
• Keeping the SD line constant, the slopes are identical.
• This is what is happening in the logistic regression problem
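
A small numerical sketch of the two arguments above, using made-up data and helper names of our own (`cov`, `slopes`). It checks that rescaling Y by r sends beta to r beta and gamma to gamma/r, and, as one reading of the SD-line bullets, that both slopes equal the correlation once they are measured relative to the SD line.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance of two equal-length lists; cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def slopes(x, y):
    """Return (beta, gamma): the slope of Y on X and the slope of X on Y."""
    c = cov(x, y)
    return c / cov(x, x), c / cov(y, y)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

beta, gamma = slopes(x, y)

# Rescale Y by a factor r: beta is multiplied by r, gamma is divided by r.
r = 3.0
beta_scaled, gamma_scaled = slopes(x, [r * v for v in y])

# SD line slope SD(y)/SD(x); relative to it, both slopes equal the correlation.
sd_slope = math.sqrt(cov(y, y) / cov(x, x))
correlation = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))
beta_relative = beta / sd_slope    # slope of Y on X in SD units
gamma_relative = gamma * sd_slope  # slope of X on Y in SD units
```

Both answers to the question are right in their own setting: changing the units of Y moves the slopes in opposite directions, while strengthening the association with the SDs held fixed moves them together.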

## Interpretation of the wrong-way two-sample t test model

• Slope of logistic regression measures how far apart the two distributions are
• Different variances --> add a quadratic term
• Doesn't make sense if X is an outlier
• p-values should be identical (or at least close)
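
The first two bullets can be checked numerically. This is a sketch under assumptions of our own: X | Y = k is normal with mean mu_k and standard deviation sigma_k, the prior P(Y = 1) defaults to 0.5, and all parameter values are invented. With equal variances the log-odds is a straight line with slope (mu1 - mu0)/sigma^2; with unequal variances a quadratic term in x appears.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_odds(x, mu0, sigma0, mu1, sigma1, pi1=0.5):
    """log of P(Y=1|x)/P(Y=0|x) when X | Y=k is N(mu_k, sigma_k^2) and P(Y=1) = pi1."""
    prior = math.log(pi1 / (1.0 - pi1))
    return prior + normal_logpdf(x, mu1, sigma1) - normal_logpdf(x, mu0, sigma0)


def second_difference(f, x, h=1.0):
    """f(x+h) - 2 f(x) + f(x-h): zero for a line, 2*a*h^2 for a quadratic with x^2 coefficient a."""
    return f(x + h) - 2.0 * f(x) + f(x - h)


def equal_variance(x):
    # Hypothetical parameters: means 0 and 2, common SD 1.5.
    return log_odds(x, mu0=0.0, sigma0=1.5, mu1=2.0, sigma1=1.5)


def unequal_variance(x):
    # Hypothetical parameters: same means, SDs 1 and 2.
    return log_odds(x, mu0=0.0, sigma0=1.0, mu1=2.0, sigma1=2.0)


slope_equal = equal_variance(1.0) - equal_variance(0.0)      # (mu1 - mu0) / sigma^2
curvature_equal = second_difference(equal_variance, 0.0)     # no quadratic term
curvature_unequal = second_difference(unequal_variance, 0.0)  # 1/sigma0^2 - 1/sigma1^2
```

Moving the means further apart makes `slope_equal` larger, which is the sense in which the logistic slope measures the separation of the two groups.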

Last modified: Tue Mar 20 11:44:50 2001