STAT 541: Finding Independence

Administrivia

How to get a confidence interval for a prediction in JMP (and possibly other software): If you regress Y on X1, X2 and you want to predict at X1=5, X2=10, then create Z1 = X1-5, Z2 = X2-10, and regress Y on Z1 and Z2 instead. The intercept of this regression is the fitted value at Z1=0, Z2=0, which is the same point as X1=5, X2=10. For example, if you are fitting a quadratic logistic regression, the regression formula is:
logit(P(Y)) = alpha + beta₁ Z1 + beta₂ Z2 + beta₃ Z1² + ...
But since Z1 and Z2 are both zero at the point of interest, this reduces to
logit(P(Y)) = alpha + beta₁ 0 + beta₂ 0 + beta₃ 0² + ... = alpha
A confidence interval for the intercept alpha is therefore a confidence interval for the prediction (on the logit scale) at X1=5, X2=10.
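The centering trick can be checked numerically. The sketch below is only an illustration: it uses ordinary least squares rather than a logistic fit, the data are synthetic, and `ols` is a hypothetical helper written for this example (it is not a JMP function). It shows that the intercept of the regression on Z1, Z2 equals the original regression's prediction at X1=5, X2=10, so whatever interval the software reports for the intercept applies to that prediction.

```python
import random

def ols(rows, y):
    """Least-squares coefficients; each design row starts with 1.0 for the intercept."""
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(p):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [u - f * v for u, v in zip(m[k], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]

random.seed(0)
# Synthetic data, made up purely for illustration.
x1 = [random.uniform(0, 10) for _ in range(50)]
x2 = [random.uniform(0, 20) for _ in range(50)]
y = [2.0 + 0.5 * u + 0.3 * v + random.gauss(0, 1) for u, v in zip(x1, x2)]

# Original regression: Y on X1, X2, then predict at X1=5, X2=10.
b = ols([[1.0, u, v] for u, v in zip(x1, x2)], y)
pred_at_point = b[0] + b[1] * 5 + b[2] * 10

# Centered regression: Y on Z1 = X1 - 5, Z2 = X2 - 10.
bz = ols([[1.0, u - 5, v - 10] for u, v in zip(x1, x2)], y)
intercept_centered = bz[0]

print(pred_at_point, intercept_centered)  # the two numbers agree
```

The slopes are unchanged by the shift; only the intercept moves, and it moves to the prediction at the chosen point.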

Finding Independence

Experiment: A study has been performed to determine how students interact with professors. (OK, it is currently being performed here at Penn, so we don't have any data yet.) Students are observed raising their hands, and data are recorded as to who gets called on. 1520 events are collected. The question of interest is whether male professors call on male students more or less often than female students, and likewise for female professors.

For each event, the following is recorded:

The sex of the professor
number of males with their hands up
number of females with their hands up
who gets called on
fraction of males in the class

19 classes of data were collected, with 80 observations taken in each class (19 × 80 = 1520 events). There were 10 male professors and 10 female professors, but one female professor used cold calling and so was eliminated from the study. How should this be analyzed?

Different attacks

Regression vs. logistic regression
Subsample times when there are the same number of males and females?
Which variables should we use? (Only sex of called on and professor? Also who has hand up?)
What do we want to test for? (intercept? interaction term? some slope?)
What is the sample size?

Naive logistic regression

Run a logistic regression of called_on_sex on fraction_male_hands for only the male professors.
Look at the prediction at fraction_male_hands = 0.5. Is this "intercept" zero on the logit scale (that is, is the probability of calling on a male 0.5)?
If we reject that it is zero, what have we proven?
- Each professor has their own pet that they call on?
- Males are biased?
- These males are biased?
- People sitting at the front of classes are called on more often?
- The person called on is NOT a random draw from the students who have their hands up
Now suppose we compute an intercept for each of the 10 professors. That gives us 10 numbers. We can now test if they are in fact different from zero. What have we proven?
This is much closer to showing that males are biased.
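The per-professor analysis can be sketched as follows. Everything here is an assumption made for illustration: the data are simulated (the real study has no data yet), `fit_logistic` and `simulate_professor` are hypothetical helpers, and the bias built into the simulation is invented. Each professor gets their own logistic fit with the predictor centered at 0.5 male hands, and the 10 intercepts are then treated as 10 observations in a one-sample t-test.

```python
import math
import random
import statistics

def fit_logistic(x, y, iters=25):
    """Newton's method for logit(P(y=1)) = a + b*x; returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, a + b * xi))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept component
            g1 += (yi - p) * xi     # gradient, slope component
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a, b = (a + (h11 * g0 - h01 * g1) / det,
                b + (h00 * g1 - h01 * g0) / det)
    return a, b

def simulate_professor(rng, n_events=80, bias=0.0):
    """Synthetic events: x = fraction of male hands - 0.5, y = 1 if a male is called on."""
    xs, ys = [], []
    for _ in range(n_events):
        males, females = rng.randint(1, 5), rng.randint(1, 5)
        frac = males / (males + females)
        logit = math.log(frac / (1.0 - frac)) + bias  # bias = 0 is a random draw
        p_male = 1.0 / (1.0 + math.exp(-logit))
        xs.append(frac - 0.5)
        ys.append(1 if rng.random() < p_male else 0)
    return xs, ys

rng = random.Random(541)
intercepts = []
for _ in range(10):  # 10 male professors, each with an invented personal bias
    xs, ys = simulate_professor(rng, bias=rng.gauss(0.4, 0.3))
    a_hat, _slope = fit_logistic(xs, ys)
    intercepts.append(a_hat)

# One-sample t-test on the 10 intercepts against zero.
mean_a = statistics.mean(intercepts)
se_a = statistics.stdev(intercepts) / math.sqrt(len(intercepts))
t_stat = mean_a / se_a
T_CRIT_9DF = 2.262  # two-sided 5% critical value of Student's t with 9 df
print(t_stat, abs(t_stat) > T_CRIT_9DF)
```

The unit of analysis is the professor, not the event, so the effective sample size for the t-test is 10 rather than 800.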

Doesn't clustering hurt us statistically a lot?

Suppose we have n samples from each of 10 different hospitals.
Suppose they are all truly IID normal(mu,sigma²).
We should use X-bar = sum/(10n), with SE = sigma/sqrt(10n)
But, suppose we worry that the data might not be independent within each hospital
So we compute X-bar₁, X-bar₂,..., X-bar₁₀.
Each under our assumption is normal(mu,sigma²/n)
We then use classical statistics on these 10 numbers
The average of the averages is the same as X-bar before
Its SE = (sigma/sqrt(n))/sqrt(10)
The SAME SE as before!
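So this gratuitous clustering hasn't hurt (very much). The only cost is that sigma must now be estimated from 10 numbers, so the t-interval uses 9 degrees of freedom instead of 10n-1.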
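The hospital argument can be checked with a short simulation. This is a sketch under assumed values: `MU`, `SIGMA`, `N` and the random data are invented for illustration. It compares the grand mean with the mean of the 10 hospital means, and the two standard error formulas.

```python
import math
import random
import statistics

random.seed(541)
MU, SIGMA, N, HOSPITALS = 10.0, 2.0, 50, 10  # hypothetical values

# N truly IID normal(MU, SIGMA^2) observations per hospital.
data = [[random.gauss(MU, SIGMA) for _ in range(N)] for _ in range(HOSPITALS)]

# Pooled analysis: one big sample of 10n observations.
grand_mean = statistics.mean([x for hospital in data for x in hospital])
se_pooled = SIGMA / math.sqrt(HOSPITALS * N)

# Clustered analysis: reduce each hospital to its mean, then average the 10 means.
hospital_means = [statistics.mean(hospital) for hospital in data]
mean_of_means = statistics.mean(hospital_means)
se_clustered = (SIGMA / math.sqrt(N)) / math.sqrt(HOSPITALS)

# In practice sigma is unknown; the clustered SE is estimated from the 10 means.
se_estimated = statistics.stdev(hospital_means) / math.sqrt(HOSPITALS)

print(grand_mean, mean_of_means)   # same point estimate (equal cluster sizes)
print(se_pooled, se_clustered)     # same true standard error
print(se_estimated)                # noisy estimate based on only 10 numbers
```

The point estimates coincide only because every hospital contributes the same number of observations; with unequal cluster sizes the mean of means would weight hospitals equally rather than observations.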

Last modified: Thu Apr 5 08:31:09 2001