Loss Functions Part-2
This is a continuation of the previous Loss Functions post and covers loss functions used for classification. For regression problems we use least squares error as the loss function. It gives us a convex loss surface, which we can optimize by finding its global minimum. With logistic regression, however, the picture changes completely: least squares error gives us a non-convex loss function with more than one local minimum. The curve becomes wavy because of the non-linear sigmoid function used in the logistic regression hypothesis, and those multiple local minima are bad for gradient descent, the method we use to find the minimum.

Cross-Entropy Loss

This is the most common choice for classification problems. Cross-entropy loss increases as the predicted probability diverges from the actual label. An important property is that cross-entropy loss heavily penalizes predictions that are confident but wrong. We can’t give equal weight to all false results...
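To make this concrete, here is the standard textbook form of the two pieces involved (the notation below is the usual one and not from the original post): the logistic regression hypothesis squashes a linear score through the sigmoid, and the binary cross-entropy loss compares its output against the true label y ∈ {0, 1}:

```latex
h_\theta(x) = \sigma(\theta^\top x) = \frac{1}{1 + e^{-\theta^\top x}}

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big)\log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big]
```

Unlike least squares composed with the sigmoid, this J(θ) is convex in θ, which is exactly why it replaces least squares for logistic regression.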
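A minimal NumPy sketch of the "confident but wrong" point (the function names here are illustrative, not from any particular library): for a positive example, we compare cross-entropy against squared error as the predicted probability drifts toward the wrong answer.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy (log loss), averaged over the samples."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # clip to avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def squared_error(y_true, p_pred):
    """Mean squared error, shown only for comparison."""
    return np.mean((y_true - p_pred) ** 2)

# One positive example (y = 1) with increasingly confident wrong predictions.
y = np.array([1.0])
for p in [0.9, 0.5, 0.1, 0.01]:
    pred = np.array([p])
    print(f"p={p:<5} cross-entropy={binary_cross_entropy(y, pred):6.3f} "
          f"squared error={squared_error(y, pred):6.3f}")
```

Squared error saturates at 1.0 no matter how wrong the prediction is, while cross-entropy grows without bound (roughly 4.6 already at p = 0.01), so a confidently wrong prediction is punished far harder, which is the behavior described above.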