Stat 112: Model, Parameters and Lurking Variables
- Statistics: scatter diagram of Y vs X
- Probability: histograms of future observations of Y given X
- Both describe how X relates to Y
- Model details
- Describes the population
- Average Y = beta0 + beta1 x
- Actual Y = beta0 + beta1 x + noise
- Noise is normally distributed with mean 0 and SD = sigma
- Three Greek letters (parameters):
- beta0 = intercept
- beta1 = slope
- sigma = SD of the noise
- Called "Simple Linear Regression Model"
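The model details above can be sketched as a small simulation. This is only an illustration: the parameter values, the seed, and the function name simulate_y are made up for the example and do not come from the course data.

```python
import random

def simulate_y(x, beta0, beta1, sigma, rng):
    """Draw one Actual Y = beta0 + beta1 * x + noise, with noise ~ Normal(0, sigma)."""
    average_y = beta0 + beta1 * x      # the "Average Y" line of the model
    noise = rng.gauss(0.0, sigma)      # normal noise with mean 0 and SD sigma
    return average_y + noise

rng = random.Random(112)               # fixed seed so the run is repeatable
beta0, beta1, sigma = 10.0, 2.0, 3.0   # hypothetical parameter values
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [simulate_y(x, beta0, beta1, sigma, rng) for x in xs]

for x, y in zip(xs, ys):
    print(f"x = {x:.1f}  average Y = {beta0 + beta1 * x:.1f}  actual Y = {y:.2f}")
```

Each actual Y scatters around its average Y; a larger sigma gives a wider scatter.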
- Estimating parameters from data
- Fit by least squares: Y = b0 + b1 X
- b0 estimates beta0
- b1 estimates beta1
- JMP will compute the standard error (SE) of each estimate for you
- Good guess for the slope: b1 +/- 2 * SE(b1), roughly a 95% confidence interval for beta1
- Look at insurance data (or see handout)
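The estimation steps above can be sketched by hand with the usual least squares formulas. This is a minimal sketch, not what JMP does internally: the function name fit_line is made up, and the data are invented for the example (they are not the insurance data).

```python
import math

def fit_line(xs, ys):
    """Least squares fit of Y = b0 + b1 X; returns (b0, b1, s, se_b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # estimates beta1 (slope)
    b0 = y_bar - b1 * x_bar             # estimates beta0 (intercept)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))        # estimates sigma (SD of the noise)
    se_b1 = s / math.sqrt(sxx)          # standard error of the slope
    return b0, b1, s, se_b1

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1, s, se_b1 = fit_line(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, s = {s:.3f}")
print(f"slope guess: {b1 - 2 * se_b1:.3f} to {b1 + 2 * se_b1:.3f}")
```

The last line prints b1 +/- 2 * SE as a range of plausible values for beta1.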
- A model is not the same as causation.
- X ---> Y could generate model
- Y ---> X could generate model (draw smoking = Y, cancer = X)
- Z ---> X and Z ---> Y could generate model
- In all of the above, the model might hold
- But X causes Y only in the first case
- Proving causation requires a controlled experiment
- Proving the model requires only data (easy to do with observation)
- If we have good residuals and a good linear fit, do we have causation?
- NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO!
- Why: Lurking variables
- X causes Y is possible
- Y causes X is possible
- Z causes both X and Y is possible (Z is the lurking variable)
- Example: infant death rates and breast feeding (a lurking variable is at work)
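The lurking-variable case can be illustrated with a small simulation. This is a sketch under made-up assumptions: Z drives both X and Y, X has no effect on Y at all, and every number and name here is invented for the example. In the observational data the fitted slope of Y on X is clearly positive; when X is instead assigned at random, as in a controlled experiment, the slope is near zero.

```python
import random

def slope(xs, ys):
    """Least squares slope b1 for the fit Y = b0 + b1 X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

rng = random.Random(2000)   # fixed seed so the run is repeatable
n = 5000

# Lurking variable Z causes both X and Y; X never enters the formula for Y.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_obs = [zi + rng.gauss(0.0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0.0, 0.5) for zi in z]

# Controlled experiment: X is assigned at random, independent of Z.
x_exp = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_exp = [zi + rng.gauss(0.0, 0.5) for zi in z]

print(f"observational slope: {slope(x_obs, y_obs):.2f}")
print(f"experimental slope:  {slope(x_exp, y_exp):.2f}")
```

The linear model fits the observational data well, yet changing X does nothing to Y, which is why a good fit alone cannot establish causation.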
Last modified: Tue Feb 1 09:54:30 EST 2000