Given some data we want to be able to decide for each observation to which of the possible classes it belongs.

Think, as an example, at the problem of email classification. We want to separate the spam emails from the “ham” emails. In this case is the text of the email and .

We call true state of nature the class to which is more probable to belong.

So, in case we have two classes (), decide that:

  • if true state of nature
  • if true state of nature

Therefore, whenever we observe a particular , the probability of error is:

  • if we classify as
  • if we classify as

In order to minimize the error we should classify as belonging to the true state of nature (the class with the greater probability).

Loss function

A more general way to do classification is to use a loss function. This approach allows to take actions upon each observation and not only to decide the true state of nature. So we can refuse to make a decision (the action of not deciding a class).

Definition

Let be the set of states of nature (or categories). Let be the set of possible actions.

The loss incurred for taking the action when the state of nature is is:

Resuming the previous example, we can specify a loss function associated to a matrix that specify for each possible couple of and the relative loss value:

spam ham
spam 0 100
ham 1 0
  • The loss of choosing spam when the state is ham is (classifing a “good” email as spam is a very wrong action)
  • Classifing a spam as ham is a wrong action, but not so much.
  • Classifing correctly an email has zero loss. .

Risk or Expected Loss

To decide the best action to make we want to minimize the expected loss carried out by our choice or, to put in other words, we want to choose the less risky action.

Definition

The risk of choosing the action on the observation is:

We want to find the value that minimizes the risk:

In the email classification problem we calculate now the expected losses in the case we decide for spam and in the other case:

  • ,
  • ,

We can say nothing about which one is the less risky, because we don’t know the value of , and we don’t know (the true value).

To estimate the value of we can use several approaches as the MLE or the MAP.