September 20, 2021

# Feature normalization

Before writing the next post about algorithms, I thought it was important to first talk about feature normalization, as it will be relevant to almost all the algorithms moving forward.

Some algorithms apply a penalty to the model coefficients. For example, in Ridge regression the L2 penalty is the sum of the squares of all the coefficients in the formula. If your input features are on different scales, for example house price is in millions of dollars but the number of years since the house was built is only in the range of 1-50, this difference will have a huge impact on the L2 penalty.
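To see why scale matters, here is a minimal sketch with made-up coefficient values: on unscaled features, one coefficient can be enormous simply because its feature is measured in tiny units, and it then dominates the L2 penalty all by itself.

```python
import numpy as np

# Hypothetical coefficients for two features on very different scales:
# price-in-dollars needs a tiny coefficient, years-since-built a large one.
coefs_unscaled = np.array([0.000002, 1500.0])

# After rescaling both features to [0, 1], the coefficients
# end up on a comparable footing (again, made-up values).
coefs_scaled = np.array([2.0, 7.5])

def l2(w):
    """The Ridge L2 penalty: sum of squared coefficients."""
    return np.sum(w ** 2)

print(l2(coefs_unscaled))  # ~2.25e6, dominated entirely by one coefficient
print(l2(coefs_scaled))    # both coefficients contribute meaningfully
```

With the unscaled coefficients, the penalty is effectively deciding everything based on one feature's units rather than its importance.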

With feature normalization, we take the features and rescale them to the same range, so that when the model is created using the .fit method they are all on the same scale. Transforming the input features this way means the Ridge penalty is, in some sense, applied more fairly to all features, without unduly weighting some more than others just because of a difference in scale.

You will see as we proceed that feature normalization is important for a number of different learning algorithms beyond regularized regression, including K-Nearest Neighbors, Support Vector Machines, Neural Networks, and others.

In the example below we are going to apply a widely used form of feature normalization called MinMax scaling. This computes the min and max values of each feature on the training data and then applies the transformation to each feature.
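Concretely, for each feature the scaler applies x_scaled = (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1. A tiny sketch with made-up values:

```python
import numpy as np

# A single feature with a min of 1 and a max of 50 (made-up values).
x = np.array([1.0, 10.0, 25.0, 50.0])

# MinMax scaling: map each value into [0, 1].
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # first value becomes 0.0, last becomes 1.0
```

This is exactly what MinMaxScaler does for every column, using the min and max it learned during fit.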

In the example below I am using a water quality dataset which I have stored in my Azure Blob Storage. This dataset has several numeric features such as ph, Hardness, Solids, Chloramines, Sulfate, etc.

As you can see with the code below, the features have very different ranges of values; by scaling them to the same range, some algorithms will produce a better model.

```python
from azureml.core import Workspace, Dataset

subscription_id = 'myid'
resource_group = 'mlplayground'
workspace_name = 'mlplayground'

workspace = Workspace(subscription_id, resource_group, workspace_name)

dataset = Dataset.get_by_name(workspace, name='WaterQuality')
df = dataset.to_pandas_dataframe()
print(df)
```
```
          ph    Hardness        Solids  Chloramines     Sulfate  \
0          NaN  204.890455  20791.318981     7.300212  368.516441
1     3.716080  129.422921  18630.057858     6.635246         NaN
2     8.099124  224.236259  19909.541732     9.275884         NaN
3     8.316766  214.373394  22018.417441     8.059332  356.886136
4     9.092223  181.101509  17978.986339     6.546600  310.135738
...        ...         ...           ...          ...         ...
3271  4.668102  193.681735  47580.991603     7.166639  359.948574
3272  7.808856  193.553212  17329.802160     8.061362         NaN
3273  9.419510  175.762646  33155.578218     7.350233         NaN
3274  5.126763  230.603758  11983.869376     6.303357         NaN
3275  7.874671  195.102299  17404.177061     7.509306         NaN

      Conductivity  Organic_carbon  Trihalomethanes  Turbidity  Potability
0       564.308654       10.379783        86.990970   2.963135           0
1       592.885359       15.180013        56.329076   4.500656           0
2       418.606213       16.868637        66.420093   3.055934           0
3       363.266516       18.436524       100.341674   4.628771           0
4       398.410813       11.558279        31.997993   4.075075           0
...            ...             ...              ...        ...         ...
3271    526.424171       13.894419        66.687695   4.435821           1
3272    392.449580       19.903225              NaN   2.798243           1
3273    432.044783       11.039070        69.845400   3.298875           1
3274    402.883113       11.168946        77.488213   4.708658           1
3275    327.459760       16.140368        78.698446   2.309149           1
```

Now, the fun part: how do we use the MinMaxScaler?

There are several ways, but I prefer to write the results of the scaler back into the same dataframe, so I assign the transformation back to the same dataframe columns. That way, at least visually, it will be easier for the reader.

1. We import MinMaxScaler from sklearn.preprocessing.
2. We instantiate it.
3. We call the scaler's fit_transform method with a list of all the feature columns, and assign the result of the transformation back to the same dataframe columns. This basically means we are replacing the original values with the scaled ones.
```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

features = ["ph", "Hardness", "Solids", "Chloramines", "Sulfate",
            "Conductivity", "Organic_carbon", "Trihalomethanes", "Turbidity"]
df[features] = scaler.fit_transform(df[features])

print(df)
```

The result is self-explanatory: all the numbers are now in the same range of values, and we can continue on our journey.

```
            ph  Hardness    Solids  Chloramines   Sulfate  Conductivity  \
0          NaN  0.571139  0.336096     0.543891  0.680385      0.669439
1     0.265434  0.297400  0.300611     0.491839       NaN      0.719411
2     0.578509  0.641311  0.321619     0.698543       NaN      0.414652
3     0.594055  0.605536  0.356244     0.603314  0.647347      0.317880
4     0.649445  0.484851  0.289922     0.484900  0.514545      0.379337
...        ...       ...       ...          ...       ...           ...
3271  0.333436  0.530482  0.775947     0.533436  0.656047      0.603192
3272  0.557775  0.530016  0.279263     0.603473       NaN      0.368912
3273  0.672822  0.465486  0.539101     0.547807       NaN      0.438152
3274  0.366197  0.664407  0.191490     0.465860       NaN      0.387157
3275  0.562477  0.535635  0.280484     0.560259       NaN      0.255266

      Organic_carbon  Trihalomethanes  Turbidity  Potability
0           0.313402         0.699753   0.286091           0
1           0.497319         0.450999   0.576793           0
2           0.562017         0.532866   0.303637           0
3           0.622089         0.808065   0.601015           0
4           0.358555         0.253606   0.496327           0
...              ...              ...        ...         ...
3271        0.448062         0.535037   0.564534           1
3272        0.678284              NaN   0.254915           1
3273        0.338662         0.560655   0.349570           1
3274        0.343638         0.622659   0.616120           1
3275        0.534114         0.632478   0.162441           1
```

If you don't apply the same scaling to the training and test sets, you'll end up with more or less random data skew, which will invalidate your results. And if you prepare the scaler (or another normalization method) by showing it the test data instead of the training data, you get a phenomenon called data leakage, where the training phase has information that leaked from the test set.
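The safe pattern, sketched below with a tiny made-up array, is to fit the scaler on the training split only and then reuse that same fitted scaler to transform the test split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up data: 10 samples, 2 features.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = MinMaxScaler()
# Fit ONLY on the training data, so the test set leaks no information...
X_train_scaled = scaler.fit_transform(X_train)
# ...then apply the SAME learned min/max to the test data.
X_test_scaled = scaler.transform(X_test)
```

Note that test values can land slightly outside [0, 1] if the test set contains values beyond the training min or max, and that is fine; what matters is that both splits go through the identical transformation.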

One downside to performing feature normalization is that the resulting model and the transformed features may be harder to interpret.
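One mitigation, if you are using a scikit-learn scaler, is its inverse_transform method, which maps scaled values back into the original units when you need to report or inspect them:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data in its original units.
X = np.array([[1.0], [10.0], [50.0]])

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Map the scaled values back to the original units for interpretation.
X_original = scaler.inverse_transform(X_scaled)
print(X_original)
```

This lets you train on scaled features while still presenting results in units your readers understand.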

That's all for now, folks. I hope this clarifies what feature normalization is.