# Statistics 541: JMP and L and M estimators

## Administrivia

• 1st homework:
• Find a data set with between 5 and 20 columns of comparable information.
• Read the data into JMP and compute the statistics computed on page 10 of the handout. Look at comparison box plots for all your data. Be sure to put them on the same scale!
• Read the data into Splus and compute the statistics for all the columns in your dataset. (See the Splus code on the pages between 17 and 18.) The idea is to do this somewhat automatically, either by creating a function or by using your editor to build a command list. You should be able to regenerate your output after a small change to the data without having to click lots and lots of buttons.
• Comment on what you found out scientifically. (E.g., which column has the largest values, and why is this of interest?)
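The "somewhat automatic" part of the homework can be sketched as follows. This is just an illustration of the idea in Python (the course itself uses JMP and Splus); the function name, the column names, and the particular statistics shown are mine, not from the handout.

```python
import statistics

def column_stats(columns):
    """Compute the same summary for every column, so the whole
    report regenerates after any change to the data -- no clicking."""
    out = {}
    for name, values in columns.items():
        out[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "sd": statistics.stdev(values),
        }
    return out

# Hypothetical example data: edit one number and just rerun.
data = {"height": [1.6, 1.7, 1.8], "weight": [55.0, 70.0, 90.0]}
for name, stats in column_stats(data).items():
    print(name, stats)
```

The point is not the particular statistics but the structure: one function applied uniformly to every column.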

## JMP

Look at repairs.jmp.

## Mathematics of robustness

The goal is to provide a framework for discussing properties of estimators.

### L estimators

• Rank the data (written Y(1), ..., Y(n))
• An L estimator is a weighted sum of the ranked data (the order statistics)
• easy example: the sample average is an L estimator
• harder example: the trimmed mean is an L estimator (trim everything and you are left with the median)
• the IQR is an L estimate of scale
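The definition above can be made concrete with a short sketch: an L estimator is just a weighted sum of the sorted data, and different weight vectors give the average, the trimmed mean, and (with maximal trimming) the median. The function names and example data here are mine, purely for illustration.

```python
def l_estimate(y, weights):
    """Weighted sum of the sorted (ranked) data."""
    ys = sorted(y)
    return sum(w * v for w, v in zip(weights, ys))

def mean_weights(n):
    # Sample average: equal weight 1/n on every order statistic.
    return [1.0 / n] * n

def trimmed_weights(n, k):
    # Trimmed mean: zero weight on the k smallest and k largest,
    # equal weight on the rest.
    w = [0.0] * n
    for i in range(k, n - k):
        w[i] = 1.0 / (n - 2 * k)
    return w

y = [3, 1, 4, 1, 5, 9, 2, 6, 5]
print(l_estimate(y, mean_weights(len(y))))       # the sample average
print(l_estimate(y, trimmed_weights(len(y), 4))) # trim "everything": the median
```

With n = 9 and k = 4, only the middle order statistic survives, which is exactly the "trim everything and you have the median" remark above.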

### M estimators

• objective function = sum rho(yi - t), where t is the candidate estimate
• phi is the derivative of rho
• setting the derivative to zero gives the estimating equation sum phi(yi - t) = 0 (solved, e.g., by Newton's method)
• Example: rho = x^2, phi = 2x; the solution is the average
• Example: rho = |x|, phi = sign(x); the solution is the median
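The two examples can be checked numerically. The sketch below solves the estimating equation sum phi(yi - t) = 0 by simple bisection (rather than Newton's method, just to keep the code short); the function names and data are mine. For phi = 2x the root is the average; for phi = sign(x) the root is a median.

```python
def m_estimate(y, phi):
    """Solve sum(phi(y_i - t)) = 0 for t by bisection.
    Assumes phi is nondecreasing, so the phi-sum decreases in t."""
    lo, hi = min(y), max(y)
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(phi(v - mid) for v in y) > 0:
            lo = mid   # estimate too small: phi-sum still positive
        else:
            hi = mid
    return (lo + hi) / 2

y = [1.0, 2.0, 3.0, 10.0]
mean = m_estimate(y, lambda u: 2 * u)               # rho = u^2
med  = m_estimate(y, lambda u: (u > 0) - (u < 0))   # rho = |u|
print(mean)  # the average of y
print(med)   # a median of y (any value in [2, 3] solves the equation here)
```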

### Invariance

Shift invariance: T(y + c) = T(y) + c

Scale invariance: T(ay) = aT(y)

Both are obvious for L estimators. Not always true for M estimators.
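A quick numerical check of the last point. The median (an L estimator) is scale invariant, but an M estimator whose phi-function clips at a fixed constant is not: rescaling the data changes which observations get clipped. The sketch below uses a Huber-type phi with a fixed cutoff k as the example of a non-invariant M estimator; the solver, data, and choice of k are mine.

```python
import statistics

def huber_estimate(y, k=1.0):
    """M estimate with phi(u) clipped to [-k, k] for a FIXED k
    (no scale adjustment), solved by bisection."""
    phi = lambda u: max(-k, min(k, u))
    lo, hi = min(y), max(y)
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(phi(v - mid) for v in y) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y = [0.0, 0.5, 10.0]
a = 10.0
# Median: T(ay) equals a*T(y).
print(statistics.median([a * v for v in y]), a * statistics.median(y))
# Fixed-k Huber: T(ay) does NOT equal a*T(y).
print(huber_estimate([a * v for v in y]), a * huber_estimate(y))
```

In practice scale invariance is restored by standardizing the residuals with a robust scale estimate before applying phi.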

### Classic M-estimator

The classic M-estimator is the biweight:

phi(u) = u(1 - u^2)^2 for |u| < 1, and phi(u) = 0 otherwise
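One common way to compute a biweight location estimate is by iterated weighted means: each observation gets weight phi(u)/u = (1 - u^2)^2 inside the cutoff and weight zero outside, so gross outliers are ignored entirely. A minimal sketch, assuming the MAD as a rough scale and a tuning constant c = 6 (both choices are mine, not from the notes):

```python
import statistics

def biweight_phi(u):
    return u * (1 - u**2)**2 if abs(u) < 1 else 0.0

def biweight_location(y, c=6.0, iters=50):
    """Iteratively reweighted mean with biweight weights
    w_i = (1 - u_i^2)^2, u_i = (y_i - t) / (c * MAD)."""
    t = statistics.median(y)
    s = statistics.median([abs(v - t) for v in y]) or 1.0  # MAD as scale
    for _ in range(iters):
        w = []
        for v in y:
            u = (v - t) / (c * s)
            w.append((1 - u**2)**2 if abs(u) < 1 else 0.0)
        t = sum(wi * v for wi, v in zip(w, y)) / sum(w)
    return t

y = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
print(biweight_location(y))        # close to the mean of 1..4; 100 gets weight 0
```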

### Sensitivity and influence

Sensitivity is the effect of adding one observation to a small dataset.

Influence curve is n times the effect of adding one observation to a large number of observations.

Claim: The influence curve is proportional to the phi-function
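The claim is easy to see empirically: plot n times the change in the estimate as a function of the added point x. For the mean the curve is linear in x (like phi = 2u); for the median it is bounded (like phi = sign(u)). The function name and example data below are mine.

```python
import statistics

def sensitivity_curve(estimator, y, x):
    """n * (T(y with x added) - T(y)): the finite-sample
    version of the influence curve at x."""
    n = len(y)
    return n * (estimator(y + [x]) - estimator(y))

y = [float(v) for v in range(-5, 6)]  # symmetric sample, so T(y) = 0
for x in [-100.0, -1.0, 0.0, 1.0, 100.0]:
    sc_mean = sensitivity_curve(statistics.mean, y, x)   # grows linearly in x
    sc_med  = sensitivity_curve(statistics.median, y, x) # stays bounded
    print(x, sc_mean, sc_med)
```

The bounded curve for the median is why its gross error sensitivity (next list) is finite while the mean's is not.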

### Properties of estimators

• efficiency
• breakdown (how much bad data can be included without arbitrarily blowing up the estimator)
• gross error sensitivity (max of influence curve)

Last modified: Tue Jan 23 08:50:00 2001