# Statistics 102H Homework and Projects

## Homeworks to be collected

• First homework
• First project: Multiple regression.
• Work alone
• Due shortly after spring break
• Read the end of the instructions to see how to download data (your data is set up by first name, not by SS#)
• Second project: Work in a group on this project.

## Exercise mentioned in class

• If you didn't look at the examples I mentioned last time, DO SO!
• Revisiting Display ft.
• We look at two transformations: log(x) and 1/x
• Make both of these transformations using the formula editor in JMP
• Now plot Y vs the new variable log(x)
• Now plot Y vs the new variable 1/x
• Which fits better?
• Which graph seems more useful for economics: the original graph given in the book, or the ones you just created?
• Which graph seems more useful for statistics? In other words, which one lets you check your assumptions better?
• You should now have looked at all the examples (except cell phones) up to page 95.
• Re-read introduction to class 2 (p 39 - 52).
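If you want to see the "which fits better?" comparison outside of JMP, here is a rough sketch in plain Python. The data are made up for illustration (they are not the display data from the book), and `fit_line` is just a helper name I picked. It fits Y on log(x) and Y on 1/x and compares the R-squared values.

```python
import math

def fit_line(x, y):
    """Least-squares intercept, slope, and R^2 for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Hypothetical data: Y grows like log(x) plus a little fixed "noise".
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [1.5, -2.0, 0.5, 1.0, -1.5, 2.0, -0.5, -1.0]
y = [10 + 40 * math.log(xi) + e for xi, e in zip(x, noise)]

# The two transformed predictors (what the JMP formula editor would create).
log_x = [math.log(xi) for xi in x]
inv_x = [1 / xi for xi in x]

_, _, r2_log = fit_line(log_x, y)
_, _, r2_inv = fit_line(inv_x, y)

print(f"R^2 using log(x): {r2_log:.4f}")
print(f"R^2 using 1/x:    {r2_inv:.4f}")
```

R-squared is only one way to compare; the plots of Y against each new variable (and the residual plots) are what let you check the assumptions.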

### Homework from prediction intervals class

• Compute the solution to the least squares equation I presented in class today. In other words, find the alpha and beta that minimize:

$$\sum_i (Y_i - \alpha - \beta X_i)^2$$

• You can start looking at the first short assignment.
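Once you have derived your answer, you can check it numerically. The sketch below uses the standard textbook closed form (slope = Sxy/Sxx, intercept = mean of Y minus slope times mean of X) on made-up data, then confirms that nudging either coefficient makes the sum of squares larger. The function names and the numbers are mine, chosen only for illustration.

```python
def sse(alpha, beta, x, y):
    """Sum of squared residuals for the line alpha + beta * x."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))

def least_squares(x, y):
    """Closed-form minimizer of the sum of squares (standard textbook result)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical data, made up for this check.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = least_squares(x, y)
best = sse(alpha_hat, beta_hat, x, y)

# Any small move away from (alpha_hat, beta_hat) should increase the sum of squares.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
worse = [sse(alpha_hat + da, beta_hat + db, x, y) for da, db in nudges]

print("alpha_hat =", alpha_hat, "beta_hat =", beta_hat)
print("minimum SSE =", best)
print("SSE after nudges =", worse)
```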

### Homework from confidence interval class

• Be sure you understand all the pictures in the first three chapters of the book. The primary issue is to understand WHY they were generated, not how.
• Read blurb on page 109. Look at the car89 data using simple regression. Don't follow the book--just play around with the data.
• Look at the stat-lib data archive. Find a data set that sounds interesting--read it into JMP and analyse it. Examples that might be of interest:
• sleep (question: does size affect hours of sleep? Does predation affect hours of sleep?)
• world series (question: Is there a home field advantage? Has it increased or decreased over time?)
• places looks at various properties of places: crime, transportation, education. If you don't want to unpack it, I've done so here: (.DAT, .documentation, .KEY)
Notice: you will probably have to use an editor and maybe Excel to read these into JMP. Be sure to page to the bottom of the file in case there are surprises! (Sometimes there is a different data set at the bottom of the file.)

### Homework from Multiple regression

• You should now be able to finish the first assignment. I have edited it a bit from when I first put it up.
• Look at the am stat archive. Pick a dataset that sounds interesting (say AAUP, car, or baseball). Pick an interesting Y variable and do a multiple regression. Match all the pictures we generated in class 4. You should be able to understand both how to interpret each graph and why it is important to generate it.

### Homework from collinearity class

• Look at the relationship between price and units
• What do you want to use as a Y variable?
• Average cost?
• Total cost? (Ave_cost * units)
• Log(total cost)
• log(average cost)?
• Look at a correlation matrix. Which variables are highly collinear with other variables?
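JMP will give you the correlation matrix directly, but here is a small sketch of what it is computing and how you might flag highly correlated pairs. The column names and values are hypothetical (your project's variables will differ), and the 0.9 cutoff is an arbitrary choice for the example, not a rule.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    return sab / math.sqrt(saa * sbb)

# Hypothetical columns, made up for illustration.
data = {
    "units": [10, 20, 30, 40, 50, 60],
    "total_cost": [120, 210, 330, 390, 520, 600],
    "labor_hours": [5, 9, 16, 19, 26, 29],
    "room_temp": [70, 65, 72, 68, 71, 66],
}
names = list(data)

# Full correlation matrix, stored as a dict keyed by (row, column).
corr = {(a, b): correlation(data[a], data[b]) for a in names for b in names}

for a in names:
    print(f"{a:12s}", " ".join(f"{corr[(a, b)]:6.2f}" for b in names))

# Flag pairs whose correlation is large in absolute value.
threshold = 0.9  # arbitrary cutoff for this sketch
high_pairs = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(corr[(a, b)]) > threshold
]
print("Highly correlated pairs:", high_pairs)
```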

### Homework from Package handling and indicator class

• Find the categorical variables in your project
• Do a cross tab to see how collinear they are with each other
• Look at a variety of relationships between the categorical variables and the continuous variables.
• Color code by one of the categorical variables. Make different symbols for another categorical variable. Now look at your scatter plot matrix. See any interesting patterns?
• Use a formula to convert a two-category variable into an indicator function. Now it can be summarized by a correlation with other variables. Why doesn't this work if there are three categories?
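Here is a sketch of the indicator idea from the last bullet, using made-up data. With two categories, the choice of which one gets coded 1 only flips the sign of the correlation. With three categories, the numeric codes are arbitrary, and different codings give genuinely different correlations. The variable names and values are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    return sab / math.sqrt(saa * sbb)

# Two categories (hypothetical data): either coding gives the same |correlation|.
rush = ["yes", "no", "yes", "no", "yes", "no"]
cost = [9, 4, 8, 5, 10, 3]
ind_yes = [1 if r == "yes" else 0 for r in rush]
ind_no = [1 if r == "no" else 0 for r in rush]
r_yes = correlation(ind_yes, cost)
r_no = correlation(ind_no, cost)
print("two categories:", r_yes, r_no)

# Three categories (hypothetical data): the answer depends on the arbitrary codes.
group = ["a", "b", "c", "a", "b", "c"]
y = [1, 2, 6, 2, 3, 7]
coding_1 = {"a": 0, "b": 1, "c": 2}
coding_2 = {"a": 0, "b": 2, "c": 1}
r_coding_1 = correlation([coding_1[g] for g in group], y)
r_coding_2 = correlation([coding_2[g] for g in group], y)
print("three categories:", r_coding_1, r_coding_2)
```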

### Homework from Categorical variables

• How are rush and detail related to the average cost?
• How are they related to each other?
• Suppose that there is a cost for rush, but there isn't any cost for detail. What would a simple regression of average cost on detail look like? (Your data may actually show this, or it might not show it. This doesn't matter. Figure out what the relationship would be anyway!)
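One way to explore the last question is to simulate it. The sketch below assumes, purely for illustration, that rush jobs and detail jobs tend to occur together (80% of the time they match); whether that holds in your data is exactly what the previous bullet asks you to check. Cost depends only on rush. All the numbers (base cost 10, rush premium 5, sample size 2000) are made up.

```python
import random

def slope(x, y):
    """Least-squares slope for a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

rng = random.Random(102)  # fixed seed so the run is repeatable
n = 2000

rush, detail, avg_cost = [], [], []
for _ in range(n):
    r = 1 if rng.random() < 0.5 else 0
    # Assumption: detail matches rush 80% of the time.
    d = r if rng.random() < 0.8 else 1 - r
    # Cost depends on rush only; detail has no effect by construction.
    c = 10 + 5 * r + rng.gauss(0, 1)
    rush.append(r)
    detail.append(d)
    avg_cost.append(c)

# Simple regression of average cost on detail, ignoring rush.
slope_detail = slope(detail, avg_cost)

# The same regression within each rush group.
slopes_within = []
for level in (0, 1):
    d_sub = [d for d, r in zip(detail, rush) if r == level]
    c_sub = [c for c, r in zip(avg_cost, rush) if r == level]
    slopes_within.append(slope(d_sub, c_sub))

print("slope of cost on detail, ignoring rush:", slope_detail)
print("slope of cost on detail within rush = 0 and rush = 1:", slopes_within)
```

Try changing the 0.8 to 0.5 (detail unrelated to rush) and see what happens to the first slope.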

### Homework from Executive comp

• Work through the timing production runs case: page 189.
• Notice, there are 4 or 5 managers in your project.
• Color code by manager
• Are there any obvious patterns when you look at your scatter plot matrix?
• Are they significant when you put them in your project? Which one is best, which one is worst?
• Are big jobs better to do by some manager instead of some other manager? (Hint: do the same interaction that is done in the timing example.)
• While you are at it, notice that there is a strong relationship between manager and plant. Why is this the case? Does this change your feelings about how to compare the managers?
• If you want to look at a sample midterm, here is an exam from 621. It is close in style, but they covered a bit more material than we will end up covering, so some of the questions won't make sense. (Questions 36-40 are on material we haven't covered. You might be able to do them, but don't worry if you can't.)

### Homework from FPP

• Read chapters 1 and 2 of FPP(A)
• Do the first part of your Second project.
• Read through the review exercises (page 24) with your project group. Discuss each question and come up with an answer.

### Homework to prep for final exam

I made up a handout (.ps) to help guide you for the final. You can get the answers to the Freedman problems from outside my door.

The answers to the MLE questions are: (I did these in my head--so I'm not positive they are correct. If you disagree, send me an email.)

• n/sum(log(x))
• (n/sum(log(x)))^2
• sqrt(sum(x^2)/(2n))