Recall, for linear regression the response variable \(Y\) is quantitative and generally thought of as continuous
How do we handle response variables that are qualitative (i.e., categorical)?
Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category, or class
Often the methods used for classification first predict the probability that the observation belongs to each of the categories of a qualitative variable, as the basis for making the classification
Widely used classifiers:
Qualitative variables take values in an unordered set \(C\)
eye color \(\in\) {brown, blue, green}
color \(\in\) {blue, red, green, yellow}
Given a feature vector \(X\) and a qualitative response \(Y\) taking values in the set \(C\), the classification task is to build a function \(C(X)\) that takes as a input the feature vector \(X\) and predicts its value for \(Y\)
Often we are more interested in estimating the probabilities that \(X\) belongs to each category in \(C\)
We are interested in predicting whether an individual will default on his or her credit card payment, on the basis of annual income and monthly credit card balance
In the left-hand panel, we have plotted annual income and monthly credit card balance for a subset of 10,000 individuals
In the center and right-hand panels, two pairs of boxplots are shown. The first shows the distribution of balance split by the binary default variable; the second is a similar plot for income
Suppose for the example above that we code the response as follows:
\(Y = \left\{\begin{matrix} 0 \\ 1 \end{matrix}\right.\)
Where 0 equals “No”
And 1 equals “Yes”
Can we simply perform linear regression of Y on X and classify as \(Yes\) if \(\hat{Y}\) > 0.5
In this case of a binary outcome, linear regression does a decent job as a classifier and is equivalent to Linear Discriminant Analysis (discussed below)
Since in the population \(E(Y|X = x)= Pr(Y = 1 | X = X)\), we might think that regression is perfect for this task
However, linear regression might produce probabilities that are less than zero or larger than 1