# Statistics 541: JMP and L and M estimators

## Administrivia

- First homework:
  - Find a data set with between 5 and 20 columns of comparable
    information.
  - Read the data into JMP and compute the statistics computed on
    page 10 of the handout. Look at comparison box plots for all your
    columns. Be sure to put them on the same scale!
  - Read the data into Splus and compute the statistics for all the
    columns in your dataset. (See the Splus code on the pages between
    17 and 18.) The idea is to do this somewhat automatically, either
    by writing a function or by using your editor to build a command
    list. You should be able to regenerate your output after a small
    change in the data without clicking lots and lots of buttons.
  - Comment on what you found out scientifically. (E.g., which column
    has the highest values, and why is this of interest?)
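Since the Splus code itself is on separate handout pages, here is a minimal sketch (in Python, with made-up function and column names) of the "somewhat automatically" idea: one function that summarizes every column, so the output can be regenerated after a data change without any clicking.

```python
# Hypothetical sketch: summarize every column of a small dataset in
# one call, so the output regenerates automatically when data change.
import statistics

def summarize(columns):
    """Return {name: (n, mean, median, stdev, IQR)} for each column."""
    out = {}
    for name, values in columns.items():
        values = sorted(values)
        n = len(values)
        q1 = values[n // 4]          # crude quartiles, for illustration
        q3 = values[(3 * n) // 4]
        out[name] = (n,
                     statistics.mean(values),
                     statistics.median(values),
                     statistics.stdev(values),
                     q3 - q1)
    return out

data = {"a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [2.0, 2.5, 3.0, 3.5, 100.0]}
for name, stats in summarize(data).items():
    print(name, stats)
```

Editing `data` (or reading it from a file) and rerunning the script is the whole workflow; nothing column-specific is typed twice.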

## JMP

Look at repairs.jmp.

## Mathematics of robustness

The goal is to provide a framework for discussing the properties of estimators.

### L estimators

- Rank the data (called Y_{(1)}, ..., Y_{(n)}, the order statistics)
- An L estimator is a weighted sum of the ranked data
- easy example: sample average is an L estimator
- harder example: the trimmed mean is an L estimator (trim everything
and you have the median)
- IQR is an L estimate of scale
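The examples above can be checked numerically: the average, the trimmed mean, and the median are all weighted sums of the order statistics, differing only in the weights. A short sketch (function names are my own):

```python
# Sketch: L estimators as weighted sums of the order statistics
# Y_(1) <= ... <= Y_(n).

def l_estimate(y, weights):
    """Weighted sum of the sorted data: sum_i w_i * Y_(i)."""
    ys = sorted(y)
    return sum(w * v for w, v in zip(weights, ys))

y = [7.0, 1.0, 4.0, 100.0, 3.0]
n = len(y)

# Sample average: equal weights 1/n.
avg = l_estimate(y, [1.0 / n] * n)

# Trimmed mean: drop the smallest and largest value, equal
# weights on the rest.
trim = l_estimate(y, [0.0] + [1.0 / (n - 2)] * (n - 2) + [0.0])

# Median: all weight on the middle order statistic (odd n).
med = l_estimate(y, [0.0, 0.0, 1.0, 0.0, 0.0])

print(avg, trim, med)   # the outlier 100 moves avg, not trim or med
```

Trimming more and more pushes all the weight onto the middle order statistic, which is exactly the "trim everything and you have the median" remark above.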

### M estimators

- objective function: minimize sum rho(y_{i} - theta)
- phi is the derivative of rho
- Newton says the solution solves: sum phi(y_{i} - theta) = 0
- Example: rho = x^{2}, phi = 2x, solution is the average
- Example: rho = |x|, phi = sign(x), solution is the median
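Both examples can be verified by solving sum phi(y_i - theta) = 0 numerically. A small sketch (the bisection solver is my own choice; it works here because both phi functions are nondecreasing, making the sum a decreasing function of theta):

```python
# Sketch: solve sum_i phi(y_i - theta) = 0 by bisection for the
# two rho/phi pairs above.

def solve_m(y, phi, lo=-1e6, hi=1e6, tol=1e-9):
    """Root of g(theta) = sum_i phi(y_i - theta), assuming g decreasing."""
    def g(theta):
        return sum(phi(v - theta) for v in y)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sign(x):
    return (x > 0) - (x < 0)

y = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_hat = solve_m(y, lambda x: 2.0 * x)   # rho = x^2  -> the average
med_hat = solve_m(y, sign)                 # rho = |x|  -> the median
print(mean_hat, med_hat)                   # 22.0 and (about) 3.0
```

Note how the outlier 100 drags the rho = x^2 solution to 22 while the rho = |x| solution stays at the middle observation.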

### Invariance

Shift invariance: T(y + c) = T(y) + c
Scale invariance: T(ay) = aT(y)

Obvious for L estimators. Not always true for M estimators.
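A quick numeric check of the two definitions for the median (an L estimator, so both should hold exactly); the sample values are made up:

```python
# Check shift and scale invariance, T(y + c) = T(y) + c and
# T(ay) = aT(y), for the median on a small sample.
import statistics

y = [1.0, 4.0, 9.0, 2.0, 5.0]
c, a = 10.0, 3.0

assert statistics.median([v + c for v in y]) == statistics.median(y) + c
assert statistics.median([a * v for v in y]) == a * statistics.median(y)
print("median is shift and scale invariant on this sample")
```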

### Classic M-estimator

The classic M-estimator is the biweight:
phi(u) = u(1 - u^{2})^{2} for |u| < 1 (and 0 otherwise)
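Because phi(u) = u(1 - u^2)^2 vanishes for |u| >= 1, the estimating equation is a weighted mean with weights w(u) = (1 - u^2)^2, which suggests an iteratively reweighted average. A sketch; the residual scaling (MAD times a tuning constant c) is an assumption of mine, not from the notes:

```python
# Sketch: biweight location estimate by iteratively reweighted
# averaging.  Observations with |u| >= 1 get weight exactly 0.
import statistics

def biweight_location(y, c=6.0, iters=50):
    theta = statistics.median(y)                        # robust start
    s = statistics.median([abs(v - theta) for v in y])  # MAD scale
    if s == 0:
        return theta
    for _ in range(iters):
        u = [(v - theta) / (c * s) for v in y]
        w = [(1 - ui ** 2) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        theta = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
    return theta

y = [1.0, 2.0, 3.0, 4.0, 1000.0]
print(biweight_location(y))   # the wild value 1000 gets weight 0
```

The hard rejection of distant points (weight exactly zero) is what distinguishes the biweight from, say, the rho = x^2 estimator, whose weights never vanish.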

### Sensitivity and influence

sensitivity is the effect of adding one observation to a small
dataset.
Influence curve is n times the effect of adding one observation to a
large number of observations.

Claim: The influence curve is proportional to the phi function.
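The claim can be eyeballed with the finite-sample version, the sensitivity curve SC(x) = (n+1) * (T(y with x added) - T(y)). A sketch on a made-up sample: for the mean, SC(x) is linear in x (matching phi = 2x up to constants); for the median it is bounded (matching phi = sign(x)).

```python
# Sketch: finite-sample sensitivity curve as a stand-in for the
# influence curve.
import statistics

def sensitivity(T, y, x):
    """(n+1) * (T(y plus the extra point x) - T(y))."""
    n = len(y)
    return (n + 1) * (T(y + [x]) - T(y))

y = [1.0, 2.0, 3.0, 4.0, 5.0]
for x in (-100.0, 0.0, 100.0):
    print(x,
          sensitivity(statistics.mean, y, x),    # grows linearly in x
          sensitivity(statistics.median, y, x))  # stays bounded
```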

### Properties of estimators

- efficiency
- breakdown (how much bad data can be included without
arbitrarily blowing up the estimator)
- gross error sensitivity (the maximum of the influence curve)
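Breakdown is easy to demonstrate: corrupt a single observation arbitrarily and compare estimators. A small sketch (sample values made up); the mean has breakdown 0, while the median tolerates this single wild value:

```python
# Demo of breakdown: one arbitrarily wild observation ruins the
# mean but barely moves the median.
import statistics

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
bad = clean[:-1] + [1e9]          # replace one value with garbage

print(statistics.mean(clean), statistics.median(clean))
print(statistics.mean(bad), statistics.median(bad))   # mean blows up
```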

Last modified: Tue Jan 23 08:50:00 2001