# Statistics 541: Box plots

• Homeworks
• one project
• one final exam
• Software:
  • JMP IN for most of the stuff
  • Splus for more difficult stuff
  • You may use other software, but I won't be able to help you with it
• Follow links from my web page: http://diskworld.wharton.upenn.edu (Hint: --> teaching --> stat 541)

## Course goals

• Real data: Learn statistics useful for analysing real data. This means that you have to identify the real problem, not the part of the problem that is most easily modelled. This requires knowledge of the underlying science.
• Graphics: The primary tools are graphical methods: linked plots and dynamic graphics. The RIGHT simple plot goes a long way.
• Standard errors: In reporting information, the thing that separates statistics from the animals is the concept of the error of an estimator. Before an estimate is useful, a standard error for it must be computed.
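As a small illustration of the standard-error point, here is a minimal sketch for the simplest estimator, the sample mean, whose standard error is s / sqrt(n). The data values and the function name `mean_with_standard_error` are made up for illustration; they are not from the course materials.

```python
import math
import statistics

def mean_with_standard_error(sample):
    """Return (sample mean, standard error of the mean)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate spread")
    mean = statistics.fmean(sample)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, se

# Hypothetical measurements, made up for illustration.
sample = [4.1, 5.0, 4.7, 5.3, 4.9]
mean, se = mean_with_standard_error(sample)
print(f"estimate = {mean:.2f}, standard error = {se:.2f}")
```

Reporting the estimate together with its standard error (here roughly 4.8 with a standard error of about 0.2) tells the reader how far the estimate could plausibly be from the truth.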

## Today's material: EDA (Exploratory Data Analysis) and box plots

What is EDA?
• (For further reading, see Mosteller and Tukey, chapter 3)
• Population vs. sample: probability reasons from the population to the sample (-->); statistics reasons from the sample back to the population (<--)
• Models in theoretical stat can be very complex
• Models used in the real world are often as simple as:

DATA = SIGNAL + NOISE

• A statistic can be ANY function of the data
• Real-world statistics, again, are very simple
• Goals:
  • Resistance: (small changes in the data don't change the statistic very much)
  • Robustness: (small changes in the model don't change the properties of the statistic very much)
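A quick sketch of what resistance looks like in practice, reading "small changes in data" as a change to a small fraction of the observations (here, one gross recording error). The numbers are made up for illustration; the comparison of mean and median is a standard textbook example, not something taken from these notes.

```python
import statistics

# Hypothetical sample, made up for illustration.
clean = [10, 11, 12, 13, 14]
# Same sample with one gross recording error (14 typed as 140).
contaminated = [10, 11, 12, 13, 140]

# How far does each statistic move when one observation goes bad?
mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)

print("shift in mean:  ", mean_shift)
print("shift in median:", median_shift)
```

The mean is dragged far from the bulk of the data by a single bad value, while the median does not move at all, so the median is the resistant statistic of the two.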

What are box plots?
• What is the distribution of the sample?
• There is extensive theory on how to do histograms
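• A box plot is much simpler
• The 25th, 50th, and 75th percentiles of the data (lower quartile, median, upper quartile) define the box
• For a normal distribution the quartiles are at mu +/- 0.67 sigma, with the median at mu
• Inner fences = lower quartile - 1.5 IQR and upper quartile + 1.5 IQR, where IQR = upper quartile - lower quartile
• Outer fences = lower quartile - 3 IQR and upper quartile + 3 IQR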
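The box plot ingredients listed above can be sketched in a few lines. This is a minimal sketch under stated assumptions: "lower" is read as the lower quartile, the mirror-image upper fences (upper quartile + 1.5 IQR and + 3 IQR) are added following the usual Tukey convention, and the quartile rule (`method="inclusive"`) is only one of several conventions in use. The function name `box_plot_summary` and the data are hypothetical.

```python
import statistics
from statistics import NormalDist

def box_plot_summary(data):
    """Quartiles, IQR, and inner/outer fences for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points.
    # method="inclusive" is an assumed convention; packages differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "inner_fences": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer_fences": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

# Hypothetical data with one suspicious value, made up for illustration.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(data)
lo, hi = summary["inner_fences"]
outside = [x for x in data if x < lo or x > hi]
print(summary)
print("points outside the inner fences:", outside)

# Check of the normal-distribution remark: the upper quartile of a
# standard normal sits about 0.67 standard deviations above the mean.
z = NormalDist().inv_cdf(0.75)
print(f"normal upper quartile: mu + {z:.4f} sigma")
# For a normal, IQR = 2z sigma, so the inner fences sit at mu +/- 4z sigma.
print(f"normal inner fences:   mu +/- {4 * z:.3f} sigma")
```

For normally distributed data the inner fences land near mu +/- 2.7 sigma, which is why only a small fraction of well-behaved observations fall outside them.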