Seaborn: Distribution Plots
Seaborn is built for statistical data visualization, and a crucial aspect of understanding data is examining its distributions. Seaborn provides several powerful functions to visualize univariate (single variable) and bivariate (two variables) distributions more easily and attractively than with Matplotlib alone.
1. Histograms (histplot)
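histplot builds on Matplotlib's hist, adding options such as a KDE overlay (kde=True), grouping by a categorical variable (hue), and alternative normalizations (stat). It groups the values of a numerical variable into bins and draws a bar for the number of observations in each bin.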
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
# Load a built-in dataset for demonstration
tips = sns.load_dataset('tips')
plt.figure(figsize=(8, 5))
sns.histplot(data=tips, x='total_bill', kde=True, color='skyblue', edgecolor='black', bins=15)
plt.title('Distribution of Total Bill with KDE')
plt.xlabel('Total Bill ($)')
plt.ylabel('Count')
plt.show()
# Histogram of one variable split by category (bars stacked per group)
plt.figure(figsize=(8, 5))
sns.histplot(data=tips, x='total_bill', hue='sex', multiple='stack', palette='pastel', bins=15)
plt.title('Total Bill Distribution by Gender')
plt.xlabel('Total Bill ($)')
plt.ylabel('Count')
plt.show()
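To make the effect of bins=15 concrete, the binning behind a histogram can be sketched with NumPy alone. This illustrates the idea and is not seaborn's internal code. The bills array is synthetic stand-in data (an assumption, so the example runs without downloading the tips dataset).

```python
import numpy as np

# Synthetic, right-skewed stand-in for a bill column (assumption: not the real tips data)
rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)

# Split the observed range into 15 equal-width bins and count observations per bin
counts, edges = np.histogram(bills, bins=15)
widths = np.diff(edges)

# Rescaling the counts so that the bar areas sum to 1 gives a density histogram,
# the normalization histplot applies with stat='density'
density = counts / (counts.sum() * widths)

print(counts)
print(np.round(edges, 2))
```

Fifteen bins produce sixteen edges, and every observation falls into exactly one bin. Changing the number of bins changes the apparent shape of the distribution, so it is worth trying a few values.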
2. Kernel Density Estimate Plots (kdeplot)
KDE plots estimate the probability density function of a continuous variable by placing a smooth kernel on each observation and summing the results. They show the shape of a distribution without depending on bin edges, although the result does depend on the chosen smoothing bandwidth.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
tips = sns.load_dataset('tips')
plt.figure(figsize=(8, 5))
sns.kdeplot(data=tips, x='total_bill', fill=True, color='purple', alpha=0.5, linewidth=2)
plt.title('Kernel Density Estimate of Total Bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('Density')
plt.show()
# Bivariate KDE plot (2D density plot)
plt.figure(figsize=(8, 6))
sns.kdeplot(data=tips, x='total_bill', y='tip', fill=True, cmap='viridis', cbar=True)
plt.title('Bivariate KDE of Total Bill vs Tip')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
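The role of the bandwidth can be seen in a hand-rolled Gaussian KDE. The sketch below is a simplified illustration, not seaborn's implementation. The helper gaussian_kde_1d and the synthetic bills array are hypothetical names introduced here. The bw_adjust argument mirrors the kdeplot parameter of the same name, which scales the default bandwidth; the default is assumed here to follow Scott's rule of thumb.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw_adjust=1.0):
    """Average of Gaussian bumps centred on each sample (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for one dimension, scaled by bw_adjust
    bandwidth = bw_adjust * samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)  # synthetic stand-in data

# Evaluate on a grid that extends past the data so the tails are included
grid = np.linspace(bills.min() - 20, bills.max() + 20, 1000)
smooth = gaussian_kde_1d(bills, grid, bw_adjust=2.0)  # wider kernels, smoother curve
wiggly = gaussian_kde_1d(bills, grid, bw_adjust=0.3)  # narrower kernels, more detail

# Both curves are densities, so the area under each is approximately 1
dx = grid[1] - grid[0]
print(smooth.sum() * dx, wiggly.sum() * dx)
```

With seaborn itself, the same comparison would be sns.kdeplot(data=tips, x='total_bill', bw_adjust=2.0) against bw_adjust=0.3. A bandwidth that is too large hides real structure, and one that is too small shows noise as if it were structure.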
3. Combined Distribution Plot (displot)
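displot is a figure-level function: it creates its own figure and returns a FacetGrid, drawing histograms (kind='hist'), KDEs (kind='kde'), or empirical cumulative distribution function (ECDF) plots (kind='ecdf'). Its row and col parameters split the distribution across a grid of subplots (facets).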
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
tips = sns.load_dataset('tips')
# Combined histogram and KDE using displot
sns.displot(data=tips, x='total_bill', kind='hist', kde=True, rug=True, height=5, aspect=1.2)
plt.suptitle('Distribution of Total Bill (Displot - Hist + KDE + Rug)', y=1.02)
plt.show()
# Displot with multiple categories in facets
sns.displot(data=tips, x='total_bill', col='time', row='sex', hue='smoker', kind='kde', fill=True)
plt.suptitle('Total Bill KDE by Time, Sex, and Smoker Status', y=1.02)
plt.show()
4. Empirical Cumulative Distribution Function Plots (ecdfplot)
An ECDF plot shows, for each value, the proportion of observations that are less than or equal to it. Unlike histograms and KDEs, it requires no bin size or smoothing parameter, and every observation is represented directly in the curve.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
tips = sns.load_dataset('tips')
plt.figure(figsize=(8, 5))
sns.ecdfplot(data=tips, x='total_bill', complementary=False, color='darkgreen')
plt.title('Empirical Cumulative Distribution Function of Total Bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion')
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()
# ECDF plots by category
plt.figure(figsize=(8, 5))
sns.ecdfplot(data=tips, x='total_bill', hue='smoker')
plt.title('ECDF of Total Bill by Smoker Status')
plt.xlabel('Total Bill ($)')
plt.ylabel('Proportion')
plt.show()
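The definition of the ECDF is simple enough to compute by hand, which helps when reading the plot. The sketch below uses NumPy only. The ecdf helper and the synthetic bills array are hypothetical names introduced for illustration.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the proportion of observations <= each one."""
    x = np.sort(np.asarray(values, dtype=float))
    proportion = np.arange(1, x.size + 1) / x.size
    return x, proportion

rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)  # synthetic stand-in data

x, proportion = ecdf(bills)

# Read a quantile straight off the curve: the first x where the proportion reaches 0.5
median_from_ecdf = x[np.searchsorted(proportion, 0.5)]

# The complementary ECDF (complementary=True in ecdfplot) is 1 minus the proportion
survival = 1.0 - proportion

print(median_from_ecdf, np.mean(bills <= median_from_ecdf))
```

Quantiles can be read directly from an ECDF in this way: find where the curve crosses the desired proportion on the y-axis and read the corresponding value on the x-axis.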
5. Rug Plot (rugplot)
A rug plot draws a small tick along the axis at each observation. It is often combined with a histogram or KDE plot to show the individual data points behind the overall distribution.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
tips = sns.load_dataset('tips')
plt.figure(figsize=(8, 2))
sns.kdeplot(data=tips, x='total_bill', linewidth=2, color='blue')
sns.rugplot(data=tips, x='total_bill', color='red', height=0.1)  # height is the tick length as a fraction of the axes height
plt.title('KDE with Rug Plot for Total Bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('') # Hide y-label
plt.yticks([]) # Hide y-axis ticks
plt.show()
Further Topics:
- jointplot for combined bivariate and univariate distributions.
- pairplot for visualizing pairwise relationships and distributions across an entire DataFrame.
- Customizing plot aesthetics (styles, colors, fonts).
- Understanding different kernel density estimation parameters.
Distribution plots are fundamental for understanding the nature of your data, identifying skewness, outliers, and comparing distributions across different groups. Seaborn simplifies the creation of these insightful visualizations.