# Statistics 541: JMP and L and M estimators

## Administrivia

- First homework:
  - Find a data set with between 5 and 20 columns of comparable
    information.
  - Read the data into JMP and compute the statistics computed on
    page 10 of the handout. Look at comparison box plots for all your
    columns. Be sure to put them on the same scale!
  - Read the data into Splus and compute the statistics for all the
    columns in your dataset. (See the Splus code on the pages between
    17 and 18.) The idea is to do this somewhat automatically, either
    by writing a function or by using your editor to build a command
    list. You should be able to regenerate your output after a small
    change in the data without clicking lots and lots of buttons.
  - Comment on what you found out scientifically. (E.g., which column
    has the highest values, and why is this of interest?)
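Since the Splus code itself is on separate handout pages, here is a minimal sketch (in Python, with made-up function and column names) of the "somewhat automatically" idea: one function that summarizes every column, so the output can be regenerated after a data change without any clicking.

```python
# Hypothetical sketch: summarize every column of a small dataset in
# one call, so the output regenerates automatically when data change.
import statistics

def summarize(columns):
    """Return {name: (n, mean, median, stdev, IQR)} for each column."""
    out = {}
    for name, values in columns.items():
        values = sorted(values)
        n = len(values)
        q1 = values[n // 4]          # crude quartiles, for illustration
        q3 = values[(3 * n) // 4]
        out[name] = (n,
                     statistics.mean(values),
                     statistics.median(values),
                     statistics.stdev(values),
                     q3 - q1)
    return out

data = {"a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [2.0, 2.5, 3.0, 3.5, 100.0]}
for name, stats in summarize(data).items():
    print(name, stats)
```

Editing `data` (or reading it from a file) and rerunning the script is the whole workflow; nothing column-specific is typed twice.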

## JMP

Look at repairs.jmp.

## Mathematics of robustness

The goal is to provide a framework for discussing the properties of estimators.

### L estimators

- Rank the data (called Y_{(1)}, ..., Y_{(n)}, the order statistics)
- An L estimator is a weighted sum of the ranked data
- easy example: sample average is an L estimator
- harder example: the trimmed mean is an L estimator (trim everything
and you have the median)
- IQR is an L estimate of scale
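The examples above can be checked numerically: the average, the trimmed mean, and the median are all weighted sums of the order statistics, differing only in the weights. A short sketch (function names are my own):

```python
# Sketch: L estimators as weighted sums of the order statistics
# Y_(1) <= ... <= Y_(n).

def l_estimate(y, weights):
    """Weighted sum of the sorted data: sum_i w_i * Y_(i)."""
    ys = sorted(y)
    return sum(w * v for w, v in zip(weights, ys))

y = [7.0, 1.0, 4.0, 100.0, 3.0]
n = len(y)

# Sample average: equal weights 1/n.
avg = l_estimate(y, [1.0 / n] * n)

# Trimmed mean: drop the smallest and largest value, equal
# weights on the rest.
trim = l_estimate(y, [0.0] + [1.0 / (n - 2)] * (n - 2) + [0.0])

# Median: all weight on the middle order statistic (odd n).
med = l_estimate(y, [0.0, 0.0, 1.0, 0.0, 0.0])

print(avg, trim, med)   # the outlier 100 moves avg, not trim or med
```

Trimming more and more pushes all the weight onto the middle order statistic, which is exactly the "trim everything and you have the median" remark above.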

### M estimators

- objective function: minimize sum rho(y_{i} - theta)
- phi is the derivative of rho
- Newton says the solution solves: sum phi(y_{i} - theta) = 0
- Example: rho = x^{2}, phi = 2x, solution is the average
- Example: rho = |x|, phi = sign(x), solution is the median
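Both examples can be verified by solving sum phi(y_i - theta) = 0 numerically. A small sketch (the bisection solver is my own choice; it works here because both phi functions are nondecreasing, making the sum a decreasing function of theta):

```python
# Sketch: solve sum_i phi(y_i - theta) = 0 by bisection for the
# two rho/phi pairs above.

def solve_m(y, phi, lo=-1e6, hi=1e6, tol=1e-9):
    """Root of g(theta) = sum_i phi(y_i - theta), assuming g decreasing."""
    def g(theta):
        return sum(phi(v - theta) for v in y)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sign(x):
    return (x > 0) - (x < 0)

y = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_hat = solve_m(y, lambda x: 2.0 * x)   # rho = x^2  -> the average
med_hat = solve_m(y, sign)                 # rho = |x|  -> the median
print(mean_hat, med_hat)                   # 22.0 and (about) 3.0
```

Note how the outlier 100 drags the rho = x^2 solution to 22 while the rho = |x| solution stays at the middle observation.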

### Invariance

Shift invariance: T(y + c) = T(y) + c
Scale invariance: T(ay) = aT(y)

Obvious for L estimators. Not always true for M estimators.
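A quick numeric check of the two definitions for the median (an L estimator, so both should hold exactly); the sample values are made up:

```python
# Check shift and scale invariance, T(y + c) = T(y) + c and
# T(ay) = aT(y), for the median on a small sample.
import statistics

y = [1.0, 4.0, 9.0, 2.0, 5.0]
c, a = 10.0, 3.0

assert statistics.median([v + c for v in y]) == statistics.median(y) + c
assert statistics.median([a * v for v in y]) == a * statistics.median(y)
print("median is shift and scale invariant on this sample")
```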

### Classic M-estimator

The classic M-estimator is the biweight:
phi(u) = u(1 - u^{2})^{2} for |u| < 1 (and 0 otherwise)
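Because phi(u) = u(1 - u^2)^2 vanishes for |u| >= 1, the estimating equation is a weighted mean with weights w(u) = (1 - u^2)^2, which suggests an iteratively reweighted average. A sketch; the residual scaling (MAD times a tuning constant c) is an assumption of mine, not from the notes:

```python
# Sketch: biweight location estimate by iteratively reweighted
# averaging.  Observations with |u| >= 1 get weight exactly 0.
import statistics

def biweight_location(y, c=6.0, iters=50):
    theta = statistics.median(y)                        # robust start
    s = statistics.median([abs(v - theta) for v in y])  # MAD scale
    if s == 0:
        return theta
    for _ in range(iters):
        u = [(v - theta) / (c * s) for v in y]
        w = [(1 - ui ** 2) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        theta = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
    return theta

y = [1.0, 2.0, 3.0, 4.0, 1000.0]
print(biweight_location(y))   # the wild value 1000 gets weight 0
```

The hard rejection of distant points (weight exactly zero) is what distinguishes the biweight from, say, the rho = x^2 estimator, whose weights never vanish.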

### Sensitivity and influence

sensitivity is the effect of adding one observation to a small
dataset.
Influence curve is n times the effect of adding one observation to a
large number of observations.

Claim: The influence curve is proportional to the phi function.
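The claim can be eyeballed with the finite-sample version, the sensitivity curve SC(x) = (n+1) * (T(y with x added) - T(y)). A sketch on a made-up sample: for the mean, SC(x) is linear in x (matching phi = 2x up to constants); for the median it is bounded (matching phi = sign(x)).

```python
# Sketch: finite-sample sensitivity curve as a stand-in for the
# influence curve.
import statistics

def sensitivity(T, y, x):
    """(n+1) * (T(y plus the extra point x) - T(y))."""
    n = len(y)
    return (n + 1) * (T(y + [x]) - T(y))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
for x in (-100.0, 0.0, 100.0):
    print(x,
          sensitivity(statistics.mean, y, x),    # grows linearly in x
          sensitivity(statistics.median, y, x))  # stays bounded
```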

### Properties of estimators

- efficiency
- breakdown (how much bad data can be included without
arbitrarily blowing up the estimator)
- gross error sensitivity (the maximum of the influence curve)
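Breakdown is easy to demonstrate: corrupt a single observation arbitrarily and compare estimators. A small sketch (sample values made up); the mean has breakdown 0, while the median tolerates this single wild value:

```python
# Demo of breakdown: one arbitrarily wild observation ruins the
# mean but barely moves the median.
import statistics

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
bad = clean[:-1] + [1e9]          # replace one value with garbage

print(statistics.mean(clean), statistics.median(clean))
print(statistics.mean(bad), statistics.median(bad))   # mean blows up
```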

Last modified: Tue Jan 23 08:50:00 2001