In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable.
Kernel density estimation is a fundamental data smoothing problem in which inferences about the population are made from a finite data sample.
Kernel density estimation is a really useful statistical tool with an intimidating name. Often shortened to KDE, it’s a technique that lets you create a smooth curve given a set of data.
This can be useful if you want to visualize just the “shape” of some data, as a kind of continuous replacement for the discrete histogram.
How does it work?
The KDE algorithm takes a parameter, bandwidth, that affects how “smooth” the resulting curve is.
Changing the bandwidth changes the shape of the kernel: a lower bandwidth means only points very close to the current position are given any weight, which leads to the estimate looking squiggly; a higher bandwidth means a wider, flatter kernel where distant points also contribute, which makes the estimate smoother.
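To make the effect of bandwidth concrete, here is a minimal from-scratch sketch of a Gaussian KDE in NumPy. The helper names `gaussian_kde_1d` and `roughness`, the sample data, and the two bandwidth values are assumptions made up for this illustration; they are not part of any library.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on `grid`.

    Hypothetical helper for illustration only, not a library function.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample,
    # shape (len(grid), len(samples)).
    u = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal kernel evaluated at each scaled distance.
    weights = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and divide by the bandwidth so the curve integrates to 1.
    return weights.sum(axis=1) / (len(samples) * bandwidth)

def roughness(density):
    # Total absolute change in slope: larger means a more wiggly curve.
    return np.abs(np.diff(density, n=2)).sum()

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=200)  # made-up sample data
grid = np.linspace(-8.0, 8.0, 1601)

narrow = gaussian_kde_1d(samples, grid, bandwidth=0.05)  # squiggly estimate
wide = gaussian_kde_1d(samples, grid, bandwidth=1.0)     # smooth estimate

print("roughness, bandwidth 0.05:", roughness(narrow))
print("roughness, bandwidth 1.0: ", roughness(wide))
```

Both curves describe the same 200 points; only the width of the kernel placed on each point differs, so the narrow version should report a much larger roughness value.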
What is a kernel, in non-parametric statistics?
In non-parametric statistics, a kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables’ density functions, or in kernel regression to estimate the conditional expectation of a random variable.
(https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Kernel_(statistics).html)
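As a rough illustration of what "weighting function" means here, the sketch below defines two kernels commonly used for density estimation, the Gaussian and the Epanechnikov, and numerically checks the properties such kernels usually share: they are non-negative, symmetric around zero, and integrate to one. The function names and the evaluation grid are assumptions chosen for this example.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density: every point gets some weight, fading with distance.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov_kernel(u):
    # Parabolic kernel: points farther than 1 (in bandwidth units) get zero weight.
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

u = np.linspace(-6.0, 6.0, 12001)  # scaled distances (x - x_i) / bandwidth
du = u[1] - u[0]

for name, kernel in [("gaussian", gaussian_kernel),
                     ("epanechnikov", epanechnikov_kernel)]:
    w = kernel(u)
    area = w.sum() * du                       # simple Riemann-sum integral
    symmetric = np.allclose(w, kernel(-u))    # K(u) == K(-u)
    print(name, "area ~", round(float(area), 3),
          "non-negative:", bool(np.all(w >= 0)),
          "symmetric:", bool(symmetric))
```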
How to plot KDE using Pandas Series?
Let’s see..
Series.plot.kde(bw_method=None, ind=None, **kwds)
Parameters:
- bw_method
The method used to calculate the estimator bandwidth. This can be ‘scott’, ‘silverman’, a scalar constant or a callable. If None (default), ‘scott’ is used.
- ind
Evaluation points for the estimated PDF:
If None (default), 1000 equally spaced points are used.
If ind is a NumPy array, the KDE is evaluated at the points passed. If ind is an integer, ind number of equally spaced points are used.
Returns:
axes : matplotlib.axes.Axes or numpy.ndarray of them
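The parameters above can be tried out with a short sketch like the following. It assumes pandas, matplotlib and SciPy are installed (the KDE plot relies on SciPy), uses a small made-up Series, and selects the non-interactive "Agg" backend only so that it runs without a display. Passing `ax` through the keyword arguments is a choice made here to keep the three plots on separate axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])  # made-up example data

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

# Defaults: 'scott' bandwidth, 1000 equally spaced evaluation points.
ax_default = s.plot.kde(ax=ax1)

# Integer `ind`: that many equally spaced evaluation points.
ax_int = s.plot.kde(ind=50, ax=ax2)

# Array `ind`: the KDE is evaluated exactly at the points passed.
points = np.linspace(0, 6, 25)
ax_array = s.plot.kde(ind=points, ax=ax3)

# Each call returns the matplotlib Axes it drew on.
print(type(ax_default))
```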
Note - A scalar bandwidth can be specified. Using a small bandwidth value can lead to over-fitting, while using a large bandwidth value may result in under-fitting.
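A quick way to see the note in action is to draw the same Series with a small, a default, and a large bandwidth on one set of axes. This is only a sketch: the data and the values 0.3 and 3 are arbitrary choices for illustration, the "Agg" backend is an assumption so it runs without a display, and SciPy must be installed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])  # made-up example data

fig, ax = plt.subplots()
# Small scalar bandwidth: the curve hugs individual points (over-fitting).
s.plot.kde(bw_method=0.3, ax=ax, label="bw_method=0.3")
# Default bandwidth ('scott').
s.plot.kde(ax=ax, label="default (scott)")
# Large scalar bandwidth: the curve is flattened out (under-fitting).
s.plot.kde(bw_method=3, ax=ax, label="bw_method=3")
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to look at the three curves side by side.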
Happy to help 🙂 You can reach out to me at harjotsaini69@gmail.com for any questions.