"""What does the Normalization & Standardization of Data basically means?"""
Normalization and standardization are both defined as processes of rescaling original data without changing its behavior or nature. They are both part of data preprocessing.
What is Normalization?
Normalization is one of the techniques used in data pre-processing. We define a new range (usually [0, 1]) and rescale the data accordingly. This technique is useful in classification algorithms involving neural networks or distance-based algorithms (e.g. KNN, K-means). It is also known as Min-Max scaling.
Why is normalization important?
Let’s understand it with an example. Suppose we are building a predictive model using a dataset that contains the net worth of the citizens of a country. In this dataset we find a large variation in the data. If we feed this data to train a model, it may generate undesirable results. So, to get rid of that, we opt for normalization.
What is Standardization?
Data standardization is the process of rescaling one or more attributes so that they have a mean value of 0 and a standard deviation of 1. Standardization assumes that your data has a Gaussian (bell curve) distribution.
Which is Better: Normalization or Standardization?
Normalization is useful when your data has varying scales and the algorithm you are using makes no assumptions about the distribution of your data, such as k-nearest neighbors and artificial neural networks. Standardization, on the other hand, assumes that your data follows a Gaussian (bell curve) distribution, so it is the better choice when that assumption holds.
How do we perform Normalization?
Z-score: converts all values in a column to z-scores.
The values in the column are transformed using the following formula:

z = (x - μ) / σ

where x is a value in the data column, μ is the mean of the column, and σ is its standard deviation.
We can see the significance of normalizing with the z-transform in the following example.
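A minimal sketch of the z-transform in Python, using NumPy; the column of values here is made up purely for illustration:

import numpy as np

# Hypothetical column with widely varying magnitudes (e.g. net worth)
x = np.array([1_000.0, 50_000.0, 250_000.0, 1_200_000.0, 9_000_000.0])

# Z-score transform: subtract the column mean, divide by its standard deviation
z = (x - x.mean()) / x.std()

print(z)         # values are now centered around 0
print(z.mean())  # ~0.0
print(z.std())   # 1.0

After the transform, the values are directly comparable regardless of their original scale.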
Decimal Normalization: if v_i is a value of attribute A, then the normalized value u_i is given as

u_i = v_i / 10^j

where j is the smallest integer such that max(|u_i|) < 1.
Suppose the data for attribute A is: -10, 201, 301, -401, 501, 601, 701. To normalize this data:
Step 1: The maximum absolute value in the given data is m = 701.
Step 2: Divide the given data by 1000 (i.e. j = 3, since 10^3 is the smallest power of 10 that exceeds 701).
Result: The normalized data is: -0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701
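A minimal Python sketch of the same decimal-scaling calculation, assuming the data listed above (NumPy is used only for convenience):

import numpy as np

# The example data for attribute A
v = np.array([-10, 201, 301, -401, 501, 601, 701], dtype=float)

# Find the smallest j such that every |v_i / 10^j| is below 1
j = 0
while np.abs(v / 10**j).max() >= 1:
    j += 1

u = v / 10**j
print(j)  # 3
print(u)  # -0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701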
Min-Max Normalization: in this technique of data normalization, a linear transformation is performed on the original data. The minimum and maximum values are fetched from the data, and each value is replaced according to the following formula:
v' = ((v - min(A)) / (max(A) - min(A))) * (new_max(A) - new_min(A)) + new_min(A)
min(A) and max(A) are the minimum and maximum values of attribute A, respectively.
v’ is the new value of each entry in the data.
v is the old value of each entry in the data.
new_max(A) and new_min(A) are the maximum and minimum values of the required range (i.e. its boundary values), respectively.
We can see the significance of normalizing with the min-max scaler in the following example.
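A minimal Python sketch of min-max scaling, reusing the same illustrative data and assuming the usual [0, 1] target range:

import numpy as np

v = np.array([-10, 201, 301, -401, 501, 601, 701], dtype=float)

# Target range [new_min(A), new_max(A)]; [0, 1] is the most common choice
new_min, new_max = 0.0, 1.0

v_scaled = (v - v.min()) / (v.max() - v.min()) * (new_max - new_min) + new_min

print(v_scaled)  # every value now lies within [0, 1]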
Precautions before normalizing data columns:
Choose the numeric columns to normalize: if you don't choose individual columns, by default all numeric-type columns in the input are included, and the same normalization process is applied to all of them. This can lead to strange results if you include numeric columns that shouldn't be normalized! Always check the columns carefully.
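A minimal sketch of this precaution in pandas; the DataFrame and its column names (net_worth, age, zip_code) are hypothetical:

import pandas as pd

# zip_code is numeric in type but categorical in meaning,
# so rescaling it would produce meaningless values
df = pd.DataFrame({
    "net_worth": [1_000.0, 50_000.0, 250_000.0],
    "age":       [25, 40, 63],
    "zip_code":  [10001, 94105, 60601],
})

# Normalize only the columns that genuinely carry magnitude information
cols = ["net_worth", "age"]
df[cols] = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())

print(df)  # zip_code is left untouched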

