STAT 541: Finding Independence

# Statistics 541: Finding Independence

## Administrivia

• How to make prediction intervals in JMP (and possibly other software): if you regress Y on X1 and X2 and want a prediction at X1 = 5, X2 = 10, create Z1 = X1 - 5 and Z2 = X2 - 10, and regress Y on Z1 and Z2 instead. The intercept of this regression is the fitted value at Z1 = 0, Z2 = 0, which is the same point as X1 = 5, X2 = 10. Any squared or interaction terms must be built from the centered variables (e.g. Z1^2, not X1^2) so that they also vanish at that point. For example, if you are doing a quadratic logistic regression, the regression formula is:

$$\operatorname{logit}(P(Y)) = \alpha + \beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_1^2 + \cdots$$

But since Z1 and Z2 are both zero at the point of interest, this reduces to

$$\operatorname{logit}(P(Y)) = \alpha + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \beta_3 \cdot 0^2 + \cdots = \alpha$$

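So the confidence interval the software reports for the intercept alpha is exactly a confidence interval for logit(P(Y)) at X1 = 5, X2 = 10; applying the inverse logit to its endpoints gives an interval for P(Y) itself. (For an ordinary linear regression, a prediction interval for a single new Y also needs the residual variance: its SE is sqrt(SE(intercept)^2 + s^2).)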
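To see the centering trick at work, here is a minimal sketch in Python, simplified to a single predictor X1 and made-up 0/1 data (not from any study). `fit_logistic` is a hypothetical helper, a textbook Newton-Raphson fit with the usual inverse-information standard errors, standing in for JMP. The intercept of the centered fit should equal the fitted logit at X1 = 5 from the uncentered fit, and its SE should equal the SE you would otherwise have to assemble by hand from the coefficient covariance matrix.

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Fit logit(P(Y=1)) = a + b*x by Newton-Raphson.

    Returns (a, b, cov), where cov is the inverse Fisher information
    (the usual large-sample covariance of a and b), evaluated at the
    last iterate, which has converged after `iters` steps.
    """
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p            # score for a
            g_b += (y - p) * x      # score for b
            i_aa += w               # Fisher information entries
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (a, b) += I^{-1} (g_a, g_b)
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (i_aa * g_b - i_ab * g_a) / det
    cov = [[i_bb / det, -i_ab / det], [-i_ab / det, i_aa / det]]
    return a, b, cov

def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))

# Made-up illustrative data: one predictor X1 and a 0/1 response.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
x0 = 5  # the point where we want an interval

# Route 1: fit on X1, then combine the coefficients by hand.
a, b, cov = fit_logistic(x1, y)
logit_hat = a + b * x0
se_hand = math.sqrt(cov[0][0] + 2 * x0 * cov[0][1] + x0 ** 2 * cov[1][1])

# Route 2 (centering): regress on Z1 = X1 - 5; the intercept is the answer.
z1 = [x - x0 for x in x1]
a_c, b_c, cov_c = fit_logistic(z1, y)
se_intercept = math.sqrt(cov_c[0][0])

# 95% interval for logit(P(Y)) at X1 = 5, mapped to the probability scale.
lo, hi = a_c - 1.96 * se_intercept, a_c + 1.96 * se_intercept
print(f"logit: {logit_hat:.4f} vs intercept {a_c:.4f}")
print(f"SE:    {se_hand:.4f} vs {se_intercept:.4f}")
print(f"95% CI for P(Y) at X1 = 5: ({inv_logit(lo):.3f}, {inv_logit(hi):.3f})")
```

Route 2 needs nothing beyond the standard intercept output, which is why the trick carries over to any package that reports a confidence interval for the intercept.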

## Finding Independence

Experiment: A study has been performed to determine how students interact with professors. (OK, it is currently being performed here at Penn, so we don't have any data yet.) Students are observed raising their hands, and we record who gets called on. 1520 events are collected. The question of interest is whether male professors call on male students more or less often than on female students, and likewise for female professors.

For each event, the following is recorded:

• The sex of the professor
• number of males with their hands up
• number of females with their hands up
• who gets called on
• fraction of males in the class

19 classes of data were collected, with 80 observations taken in each class (19 × 80 = 1520 events). There were 10 male professors and 10 female professors, but one female professor used cold calling and so was eliminated from the study, leaving 10 classes taught by men and 9 by women. How should this be analyzed?
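One way to hold a single event in code is sketched below. The class and field names are hypothetical (the study's actual coding is not given), and `class_id` is an assumption added so that events can be grouped by class. `fraction_male_hands` is the predictor used in the naive logistic regression further down.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HandRaiseEvent:
    """One observed event; all names here are hypothetical."""
    class_id: int                 # which of the 19 classes (assumed field)
    professor_sex: str            # "M" or "F"
    males_hands_up: int           # number of males with their hands up
    females_hands_up: int         # number of females with their hands up
    called_on_sex: str            # sex of the student called on: "M" or "F"
    fraction_male_in_class: float

    @property
    def fraction_male_hands(self) -> float:
        """Fraction of raised hands that are male.

        Assumes at least one hand is up, since someone gets called on.
        """
        return self.males_hands_up / (self.males_hands_up + self.females_hands_up)

N_CLASSES, EVENTS_PER_CLASS = 19, 80
assert N_CLASSES * EVENTS_PER_CLASS == 1520  # matches the total in the text

example = HandRaiseEvent(class_id=1, professor_sex="M", males_hands_up=3,
                         females_hands_up=1, called_on_sex="F",
                         fraction_male_in_class=0.6)
print(example.fraction_male_hands)
```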

### Different attacks

• Regression vs. logistic regression
• Subsample times when there are the same number of males and females?
• What variables should we use? (Only the sex of the student called on and of the professor? Also who has their hand up?)
• What do we want to test for? (intercept? interaction term? some slope?)
• What is the sample size?

### Naive logistic regression

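• Run a logistic regression of called_on_sex on fraction_male_hands, using only the male professors.
• Look at the prediction at a fraction of 0.5 male hands. Is this "intercept" zero on the logit scale, i.e., is a male student called on with probability 1/2 when half of the raised hands are male?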
• If we reject that it is zero, what have we proven? That:
  • Each professor has their own pet student whom they call on?
  • Male professors are biased?
  • These particular male professors are biased?
  • People sitting at the front of the class are called on more often?
  • The person called on is NOT a random draw from the students who have their hands up.
• Now suppose we compute an intercept for each of the 10 male professors. That gives us 10 numbers. We can now test whether they differ from zero on average, treating each professor as a single observation. What have we proven?
  • This is much closer to showing that male professors are biased.
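The per-professor idea can be sketched as a one-sample t-test in which each male professor contributes exactly one number. The intercepts below are made-up values for illustration only, not data from the study, and the helper `one_sample_t` is hypothetical. With 10 professors there are 9 degrees of freedom; 2.262 is the two-sided 5% critical value of the t distribution with 9 df.

```python
import math
import statistics

def one_sample_t(estimates):
    """t statistic and df for H0: the mean of the estimates is zero."""
    k = len(estimates)
    mean = statistics.mean(estimates)
    se = statistics.stdev(estimates) / math.sqrt(k)
    return mean / se, k - 1

# Made-up per-professor intercepts (logit scale, at fraction_male_hands = 0.5),
# one per male professor; illustration only, NOT data from the study.
intercepts = [0.31, -0.12, 0.45, 0.08, 0.22, -0.05, 0.38, 0.15, 0.02, 0.27]

t_stat, df = one_sample_t(intercepts)
T_CRIT_95 = 2.262  # two-sided 5% critical value of t with 9 df
print(f"t = {t_stat:.2f} on {df} df; reject H0: {abs(t_stat) > T_CRIT_95}")
```

Because the professor is the unit of analysis, the sample size here is 10, not the 800 events those professors generated.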

### Doesn't clustering hurt us statistically a lot?

• Suppose we have n samples from 10 different hospitals.
• Suppose they are all truly IID normal(mu, sigma^2).
• We should use X-bar = sum/(10n), with SE = sigma/sqrt(10n).
• But suppose we worry that the data might not be independent within each hospital.
• So we compute X-bar1, X-bar2, ..., X-bar10.
• Under our assumption, each is normal(mu, sigma^2/n).
• We then use classical statistics on these 10 numbers.
• The average of the averages is the same as X-bar before (since every hospital has the same n).
• Its SE = (sigma/sqrt(n))/sqrt(10) = sigma/sqrt(10n).
• The SAME SE as before!
• So this gratuitous clustering hasn't hurt (very much): the only cost is that sigma/sqrt(n) must now be estimated from just 10 numbers, so we use a t distribution with 9 degrees of freedom instead of an essentially normal one.
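A small simulation makes the comparison concrete. This is a sketch under the bullet list's assumptions, with an assumed n = 50 per hospital and all values IID normal with mu = 0 and sigma = 1; the names `simulate_once` and `sample_sd` and the seed are arbitrary choices. The two point estimates agree exactly because every hospital has the same n. The cluster-based SE is centered near the same value sigma/sqrt(10n), but it varies much more from data set to data set because it is estimated from only 10 numbers.

```python
import math
import random
import statistics

def sample_sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = math.fsum(xs) / len(xs)
    return math.sqrt(math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def simulate_once(rng, k=10, n=50, mu=0.0, sigma=1.0):
    """One data set: k hospitals, n IID normal(mu, sigma^2) values each.

    Returns (grand mean, mean of hospital means, naive SE, cluster SE).
    """
    hospitals = [[rng.gauss(mu, sigma) for _ in range(n)] for _ in range(k)]
    pooled = [x for h in hospitals for x in h]
    means = [statistics.fmean(h) for h in hospitals]  # X-bar1, ..., X-bark
    naive_se = sample_sd(pooled) / math.sqrt(k * n)   # treats all k*n values as IID
    cluster_se = sample_sd(means) / math.sqrt(k)      # uses only the k averages
    return statistics.fmean(pooled), statistics.fmean(means), naive_se, cluster_se

rng = random.Random(541)
runs = [simulate_once(rng) for _ in range(1000)]
grand, mean_of_means, _, _ = runs[0]
naive = [r[2] for r in runs]
cluster = [r[3] for r in runs]
true_se = 1.0 / math.sqrt(10 * 50)  # sigma / sqrt(10n)

print(f"grand mean {grand:.6f} vs mean of means {mean_of_means:.6f}")
print(f"true SE {true_se:.4f}")
print(f"naive SE:   mean {statistics.fmean(naive):.4f}, sd {sample_sd(naive):.4f}")
print(f"cluster SE: mean {statistics.fmean(cluster):.4f}, sd {sample_sd(cluster):.4f}")
```

The extra spread of the cluster SE is exactly what the t distribution with 9 degrees of freedom accounts for.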

Last modified: Thu Apr 5 08:31:09 2001