STAT 541: Bankruptcy

Extrapolation

• Lots of data for x=1..3, little data for x=4..6, but we are interested in x=4..6
• Should we use all the data? Or just the data from 4..6?
• Using all the data assumes the model holds everywhere
• Borrows strength. Generates a good null distribution.
• Maybe then: extrapolate the fit from 1..3 to generate y-hat-linear, then run a regression on the residual Y - y-hat-linear. Advantage: uses all the data, but doesn't lean on it very much.
• Bad idea: simply use all the data. Because the points in 1..3 dominate the fit, this is basically the same as using only the data from 1..3.
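The two-stage idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the synthetic data, the `truth` function, the sample sizes, and the choice to fit the residual regression on the sparse region with a linear correction are all hypothetical and not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: many points on x in [1, 3], few on x in [4, 6].
x_dense = rng.uniform(1, 3, size=500)
x_sparse = rng.uniform(4, 6, size=20)

def truth(x):
    # Assumed relationship: linear, with a mild bend past x = 3.5.
    return 2.0 + 1.5 * x + 0.2 * np.maximum(x - 3.5, 0.0) ** 2

y_dense = truth(x_dense) + rng.normal(0, 0.5, size=x_dense.size)
y_sparse = truth(x_sparse) + rng.normal(0, 0.5, size=x_sparse.size)

# Stage 1: fit a line using only the data-rich region 1..3.
slope, intercept = np.polyfit(x_dense, y_dense, deg=1)

# Extrapolate that line into the region of interest (y-hat-linear).
y_hat_linear = intercept + slope * x_sparse

# Stage 2: regress the residual Y - y-hat-linear on x in 4..6.
resid = y_sparse - y_hat_linear
corr_slope, corr_intercept = np.polyfit(x_sparse, resid, deg=1)

def predict(x):
    """Extrapolated line plus the fitted correction."""
    x = np.asarray(x, dtype=float)
    return (intercept + slope * x) + (corr_intercept + corr_slope * x)

print("stage-1 line:", intercept, slope)
print("correction:  ", corr_intercept, corr_slope)
```

One caveat about this particular sketch: with an unrestricted linear correction, the combined prediction coincides with an ordinary least-squares line fit to the 4..6 data alone. The practical value of the two-stage form is that the extrapolated line supplies a baseline (a correction of zero) against which the stage-2 coefficients can be tested or shrunk.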

Fit on what your criterion is going to be

• Point made by George Easton: robust estimators should be evaluated with robust losses
• Thinking about how you would validate an estimator helps focus your mind on what you are trying to accomplish
• Efficiency says to fit based on statistical loss, not economic loss. This requires the model to be correct, so it is taking a risk and isn't robust
• Often a good idea to fit based on the criterion you are going to evaluate with; this is a robust technique
• Lots of fun research on this (see calibration and no-regret). Fun at least to me!
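A toy illustration of fitting to the evaluation criterion, not taken from the lecture: for a constant predictor, minimizing squared loss gives the sample mean, while minimizing absolute loss gives the sample median. The skewed lognormal data and the helper names `sq_loss` and `abs_loss` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical skewed outcomes, so the mean and median differ noticeably.
y_train = rng.lognormal(mean=0.0, sigma=1.0, size=2000)
y_test = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

def sq_loss(c, y):
    """Mean squared error of the constant prediction c."""
    return np.mean((y - c) ** 2)

def abs_loss(c, y):
    """Mean absolute error of the constant prediction c."""
    return np.mean(np.abs(y - c))

# Fit a constant under each criterion.
c_sq = y_train.mean()        # minimizes squared loss on the training data
c_abs = np.median(y_train)   # minimizes absolute loss on the training data

# Evaluate both fits under both criteria on held-out data.
for name, c in [("fit by squared loss", c_sq), ("fit by absolute loss", c_abs)]:
    print(name, "| test squared:", sq_loss(c, y_test),
          "| test absolute:", abs_loss(c, y_test))
```

If the evaluation is going to use absolute loss, the constant fit by absolute loss is the one matched to it; the fit by squared loss is only the right choice when squared loss is the criterion.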

Example: Bankruptcy

• the problem:
• forecast bankruptcies based on things credit card companies know.
• not an economic model
• prediction is goal and not estimating the parameters
• millions of person-months of observations
• 1000s of bankruptcies
• 100 basic variables --> 67000 interactions, and dummies for missing values
• Economic loss
• Ideally, classification, with absolute error loss
• More interested in classifying people close to a 5% chance of bankruptcy than people close to a .001% chance.
• closer to squared error than to weighted error
• Most people don't go bankrupt
• Most (as in 90%) people have a forecast of .001 or less
• weighting them heavily would lead to an extrapolation error
• Our criterion, then, is quadratic loss; better would be weighted quadratic loss, weighting by the importance of the person.
• Searching for independence
• repeated measurements on each person