Standardization and Normalization
Feature scaling is one of the important steps in data preprocessing: it equalizes the range of values across columns. Most ML algorithms expect features to be on comparable scales, and non-uniform value ranges can impair the model's ability to learn from the data, so we scale the data before feeding it into the model for training. Feature scaling matters especially for algorithms that rely on gradient calculations or distance measures: linear regression, logistic regression, LDA, KNN, K-means, and neural networks all need feature scaling as a preprocessing step.
There are various scaling techniques in use; the two most popular are Min-Max Scaling (Normalization) and Z-Score Scaling (Standardization).
Min-Max Scaling
In this approach, features are rescaled to values in the range [0, 1] using the formula:

X_scaled = (X - X_min) / (X_max - X_min)

where X_min is the minimum value in the feature and X_max is the maximum value in the feature. Each value in the column is shifted by the column minimum and then divided by the column range.
Min-Max Scaling is quite simple to implement using pandas' min() and max() functions. Alternatively, we can import the MinMaxScaler class from scikit-learn's preprocessing module.
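Here is a minimal sketch of both approaches, assuming a toy DataFrame with hypothetical age and salary columns (not from the original post):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two features on very different scales
df = pd.DataFrame({
    "age":    [22, 35, 58, 41],
    "salary": [25000, 48000, 90000, 61000],
})

# Manual min-max scaling with pandas' min() and max()
manual = (df - df.min()) / (df.max() - df.min())

# The same transformation with scikit-learn's MinMaxScaler
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(scaled)  # every value now lies in [0, 1]
```

Both approaches give the same result; the scaler object has the advantage that it can be fit on the training data and then reused to transform the test data with the same minimum and maximum.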
From the output above, we can see that all the values are transformed into the range [0, 1]. KNN, K-means, and artificial neural networks work well with this technique; these algorithms don't make assumptions about the distribution of the data.
Z-Score Scaling (Standardization)
Standardization rescales the values so that they are centered around the mean with unit standard deviation: the mean of the attribute becomes zero and the distribution has a standard deviation of one. Each value is transformed as z = (x - mean) / std, where the mean and standard deviation are computed per column.
We can use the StandardScaler class from scikit-learn's preprocessing module to implement z-score scaling.
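A minimal sketch, reusing the same hypothetical toy data as above:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same hypothetical toy data as in the min-max example
df = pd.DataFrame({
    "age":    [22, 35, 58, 41],
    "salary": [25000, 48000, 90000, 61000],
})

scaler = StandardScaler()
standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(standardized)
print(standardized.mean())       # ~0 for each column
print(standardized.std(ddof=0))  # ~1 (StandardScaler divides by the population std)
```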
We can see from the output above that z-score scaling has transformed each value in the data so that every column has zero mean and unit standard deviation.
Algorithms like logistic regression, linear regression, and linear discriminant analysis commonly use standardization.
However, algorithms like decision trees and random forests don't need feature scaling, since they split on feature thresholds and the magnitude of a feature has no impact on the resulting splits.
Happy Learning!