Assignment #5

This assignment builds a regression tree for the amount spent by customers who visit amazon.com and purchase an item in the video games category. (.JMP) The following questions resemble those in the example considered in class, in which we built a regression tree for the amount spent on toys; only the purchase type has changed. As in that example, amazon.com would like to understand the demographic characteristics of its customers; in particular, it would like to know which segments of customers buy heavily and which purchase relatively small amounts. It would also like to learn whether customers coming from the various referring sites are similar, or whether referrals from some sites generate larger sales.

NOTE: Answer all the questions as briefly as you can. Unlike previous exercises, which emphasized communication skills, this exercise is meant to expose you to trees.

  1. Extract the video game purchasers from the amazon_baskets.jmp file. Place these in a separate file (which should have 545 rows). From these 545, reserve a validation sample of 145 cases, leaving the other 400 for estimation. Conclude this step by showing a comparison of the amount spent on video games in the two samples; in particular, are there significant differences in spending? Should there be?
  2. Construct a balanced tree having only 3 splits. As the predictors, use all of the factors in the data file from Referring Domain through Income (continuous). Allow the software to pick the optimal split, but keep the tree balanced. Show the tree summary and offer a brief interpretation. Do the 3 splits look meaningful?
  3. Starting from the balanced tree built in Q2, continue the splitting by allowing the software to choose both how and where to split. Continue until you believe the data no longer support further splits. For the resulting tree:
    1. Show the tree diagram,
    2. Offer a brief interpretation of the meaning of this tree, noting particularly the roles of demographic factors like age, income, and education, and
    3. Explain how you stopped the fitting process.
  4. Are the trees described in Q2 and Q3 calibrated? Show the appropriate plots.
  5. Fit a regression model which is motivated by one of your trees above. Do you like it better or worse than the original tree?
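
For readers who want to see the mechanics of Q1 and Q4 outside JMP, the following is a minimal Python sketch, not the procedure JMP itself uses. It assumes the purchase amounts are available as a plain list; the amounts generated here are synthetic stand-ins rather than the amazon_baskets.jmp data, and the function names (`split_holdout`, `welch_t`, `calibration_table`) are hypothetical helpers introduced only for illustration.

```python
import random
import statistics


def split_holdout(rows, n_validation, seed=0):
    """Randomly reserve n_validation rows; return (estimation, validation)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means (a minus b)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def calibration_table(predicted, actual):
    """Group cases by predicted value (one group per leaf of a tree) and
    return {prediction: mean of the actual responses in that group}.
    A calibrated fit has each group's mean close to its prediction."""
    groups = {}
    for p, y in zip(predicted, actual):
        groups.setdefault(p, []).append(y)
    return {p: statistics.mean(ys) for p, ys in groups.items()}


# Synthetic stand-in for the 545 purchase amounts (NOT the real data).
rng = random.Random(1)
amounts = [round(rng.lognormvariate(3.0, 0.8), 2) for _ in range(545)]

# Q1: reserve 145 cases for validation, leaving 400 for estimation.
estimation, validation = split_holdout(amounts, n_validation=145)
print(len(estimation), len(validation))
print("Welch t:", welch_t(estimation, validation))

# Q4: toy calibration check with two hypothetical leaf predictions.
print(calibration_table([10.0, 10.0, 30.0, 30.0], [8.0, 12.0, 25.0, 35.0]))
```

In this sketch, a Welch t statistic well above 2 in absolute value would suggest that mean spending differs between the two samples. For calibration, plotting the group means from `calibration_table` against the predictions and comparing them to the 45-degree line gives one version of the plot that Q4 asks for.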
    Last modified: Thu Dec 11 15:38:12 EST 2003