In this piece, my goal is to suggest the mathematical background necessary to build products or conduct academic research in machine learning. These suggestions are derived from conversations with machine learning researchers and practitioners.
Outliers are the data points that are distinctly separate from the overall pattern of the rest of the data.
The StandardScaler assumes your data is normally distributed within each feature and will scale them such that the distribution is now centred around 0, with a standard deviation of 1.
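The arithmetic behind that transformation is simple enough to sketch in plain Python: subtract each feature's mean, then divide by its standard deviation. This is only an illustration of the formula; in practice you would use `sklearn.preprocessing.StandardScaler` itself.

```python
# A plain-Python sketch of what StandardScaler computes per feature:
# subtract the mean, divide by the standard deviation, so the result
# is centred at 0 with standard deviation 1.
from statistics import mean, pstdev

def standardize(values):
    """Center a single feature at 0 with unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population std, as StandardScaler uses
    return [(v - mu) / sigma for v in values]

scaled = standardize([2.0, 4.0, 6.0, 8.0])
# the mean of `scaled` is 0 and its standard deviation is 1
```

Note that this assumes the feature has nonzero variance; StandardScaler handles the constant-feature edge case for you.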
A Cambridge University course with lecture notes, providing an Introduction to string theory and conformal field theory.
The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators.
A great framework for addressing machine learning problems: there is no single template for solving a data science problem, and the roadmap changes with every new dataset and new problem. But we do see similar steps across many different projects.
Feature scaling is a method used to standardize the range of independent variables or features of data.
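One common form of feature scaling is min-max scaling, which maps each feature into [0, 1] so that features measured on different scales contribute comparably. Here is a minimal plain-Python sketch of the formula (sklearn's `MinMaxScaler` does this for you in practice):

```python
# Min-max scaling: map a feature's values into [0, 1] using
# (v - min) / (max - min). Purely illustrative; assumes the
# feature is not constant.

def min_max_scale(values):
    """Rescale a single feature to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0]
scaled = min_max_scale(heights_cm)
# scaled is [0.0, 0.333..., 0.666..., 1.0]
```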
This paper aims to clarify how and why data are normalized or standardized; these two processes are used in the data preprocessing stage, in which the data are prepared for later processing by data mining and machine learning techniques.
When performing classification you often want not only to predict the class label, but also obtain a probability of the respective label. This probability gives you some kind of confidence on the prediction. Some models can give you poor estimates of the class probabilities and some even do not support probability prediction. The calibration module allows you to better calibrate the probabilities.
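A simple way to see whether predicted probabilities are calibrated is to bin predictions and compare each bin's average predicted probability with the fraction of positives actually observed in it; this is the idea behind reliability diagrams. The sketch below illustrates that binning in plain Python (it is not sklearn's API; for real work use tools such as `sklearn.calibration.CalibratedClassifierCV`):

```python
# Check calibration by binning: a well-calibrated model has, in every
# bin, a mean predicted probability close to the observed positive rate.

def reliability_bins(probs, labels, n_bins=5):
    """Return (mean predicted prob, observed positive rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            pos_rate = sum(y for _, y in b) / len(b)
            out.append((mean_p, pos_rate))
    return out

# Toy data that happens to be perfectly calibrated: 1 positive in 5
# among the p=0.2 predictions, 4 positives in 5 among the p=0.8 ones.
probs = [0.2] * 5 + [0.8] * 5
labels = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
result = reliability_bins(probs, labels, n_bins=5)
```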
Very complex, but... there is a good explanation of data entropy. Roughly, it measures the information lost when you encode or compress information; the main goal is to minimize that loss when simplifying problems. KL divergence measures the loss. And... you can use this to evaluate unsupervised learning algorithms.
Kullback–Leibler divergence is a very useful way to measure the difference between two probability distributions. In this post we'll go over a simple example to help you better grasp this interesting concept.
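For discrete distributions the definition is short enough to compute directly: KL(P || Q) = Σᵢ P(i) · log(P(i) / Q(i)). The sketch below shows two of its key properties, that it is zero when the distributions match and that it is not symmetric:

```python
# Discrete Kullback–Leibler divergence: KL(P || Q) measures how much
# information is lost when Q is used to approximate P. It is 0 when
# P == Q, always non-negative, and not symmetric in its arguments.
from math import log

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as lists of probabilities."""
    # terms with p_i == 0 contribute 0 by convention
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
# kl_divergence(p, p) is 0, and kl_divergence(p, q) != kl_divergence(q, p)
```

This asymmetry is why KL divergence is called a divergence rather than a distance.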