What does .describe do?
It generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
Parameters:
- percentiles:
The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
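Now, some of you might wonder how the percentiles are calculated. Let’s see:
Behind the scenes, pandas computes them using linear interpolation between the two nearest values in the sorted data. This is also the default behaviour of the NumPy function numpy.percentile, where it is controlled by the ‘interpolation’ parameter (renamed ‘method’ in newer NumPy versions).
For example, let’s say we have this output: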
             One
count   4.000000
mean    7.000000
std     2.581989
min     4.000000
25%     5.500000
50%     7.000000
75%     8.500000
max    10.000000
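25% is calculated as: 4 + (10 - 4) * (1/4) = 5.5
75% is calculated as: 4 + (10 - 4) * (3/4) = 8.5
This min-to-max shortcut works here because the underlying values (4, 6, 8, 10) are evenly spaced. In general, the percentile is interpolated between the two nearest values in the sorted data.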
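If you want to check the calculation yourself, here is a minimal sketch. It assumes the column held the values 4, 6, 8 and 10, which reproduce every statistic in the output above. The helper linear_percentile is a hypothetical name written for illustration; it is not part of pandas or NumPy.

```python
import pandas as pd

# Assumed data: these four values reproduce the output shown above.
values = [4, 6, 8, 10]
df = pd.DataFrame({"One": values})
stats = df["One"].describe()
print(stats)

def linear_percentile(data, q):
    """Illustrative helper: percentile by linear interpolation (q between 0 and 1)."""
    ordered = sorted(data)
    pos = q * (len(ordered) - 1)              # fractional position in the sorted data
    lower = int(pos)                          # index of the value just below that position
    upper = min(lower + 1, len(ordered) - 1)  # index of the value just above
    frac = pos - lower                        # how far we are between the two
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

q25 = linear_percentile(values, 0.25)  # 4 + (6 - 4) * 0.75 = 5.5
q75 = linear_percentile(values, 0.75)  # 8 + (10 - 8) * 0.25 = 8.5
print(q25, q75)
```

For unevenly spaced data such as [1, 2, 3, 10], the same helper gives 1.75 for the 25th percentile, whereas min + (max - min) * (1/4) would give 3.25, so the shortcut should not be relied on in general.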
Bonus Info –
The 25th percentile is also called the first quartile.
The 50th percentile is the median.
The 75th percentile is also called the third quartile.
The difference between the third and first quartiles is the interquartile range.
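The quartiles and the interquartile range can be read straight off the describe output. The sketch below is only an illustration: the sample values and the extra 10th and 90th percentiles are assumptions picked for the example.

```python
import pandas as pd

# Assumed sample data, chosen only for illustration.
s = pd.Series([4, 6, 8, 10], name="One")

# Ask for extra percentiles on top of the quartiles.
summary = s.describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.9])
print(summary)

# Interquartile range = third quartile - first quartile.
iqr = summary["75%"] - summary["25%"]

# The 50% row matches the median of the Series.
median_matches = summary["50%"] == s.median()
print(iqr, median_matches)
```

Each requested percentile appears in the result under a label such as 10% or 90%.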
- include:
‘all’, list-like of dtypes or None (default)
‘all’ : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the provided data types.
None (default) : The result will include all numeric columns.
- exclude:
list-like of dtypes or None (default)
A list-like of dtypes : Excludes the provided data types from the result.
None (default) : The result will exclude nothing.
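To see how include and exclude change which columns get summarized, here is a small sketch. The DataFrame and its column names (price, qty, city) are made up for this example, and the city column is forced to the object dtype so the result does not depend on the pandas version's default string type.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns and one object (string) column.
df = pd.DataFrame({
    "price": [4.0, 6.0, 8.0, 10.0],
    "qty": [1, 2, 3, 4],
    "city": pd.Series(["Pune", "Delhi", "Pune", "Agra"], dtype=object),
})

default_cols = list(df.describe().columns)                   # numeric columns only
all_cols = list(df.describe(include="all").columns)          # every column
object_cols = list(df.describe(include=[object]).columns)    # object columns only
no_float_cols = list(df.describe(exclude=[np.float64]).columns)  # everything except floats

print(default_cols, all_cols, object_cols, no_float_cols)
```

With include='all', statistics that do not apply to a column (for example mean for city) show up as NaN.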
Returns:
Series or DataFrame
Summary statistics of the Series or DataFrame provided.
Note-
For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq.
The top is the most common value.
The freq is the most common value’s frequency. Timestamps also include the first and last items (this applies to older pandas releases; from pandas 2.0 onward, datetime columns are summarized like numeric ones).
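Here is a quick sketch of what describe returns for object data. The city names are made-up sample values.

```python
import pandas as pd

# Hypothetical string data stored with the object dtype.
cities = pd.Series(["Pune", "Delhi", "Pune", "Agra"], dtype=object)

summary = cities.describe()
print(summary)

# "Pune" appears twice, so it is the top value with a freq of 2.
top_value = summary["top"]
top_freq = summary["freq"]
```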
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html
Happy to help 🙂 For any queries, you can reach out to me at harjotsaini69@gmail.com.
See you in the next blog with another data science topic 🙂