
Seaborn: Statistical Data Visualization

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn is particularly adept at visualizing the relationships between multiple variables.

Key Features:

  • High-level Interface: Simplifies the creation of complex plots compared to raw Matplotlib.
  • Built-in Themes: Comes with attractive default styles and color palettes, which you can apply globally with sns.set_theme().
  • Statistical Plotting: Focuses on visualizing statistical relationships, distributions, and categorical data.
  • Integrates with Pandas: Accepts pandas DataFrames directly, so plot variables can be referenced by column name (for example, x='total_bill').
  • Specialized Plots: Offers specialized plots for visualizing linear regression models, matrices of data, and more.
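The sketch below is a rough illustration of three of the features listed above: themes, palettes, and DataFrame integration. It applies a theme with sns.set_theme(), inspects a named palette with sns.color_palette(), and plots DataFrame columns by name. The small DataFrame and its column names are hypothetical stand-ins.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Apply a built-in style and color palette to all subsequent plots
sns.set_theme(style="whitegrid", palette="deep")

# A named palette is a list of RGB tuples with components in [0, 1]
deep = sns.color_palette("deep")
viridis5 = sns.color_palette("viridis", 5)  # sample 5 colors from a Matplotlib colormap

# Hypothetical DataFrame: columns are referenced by name, not passed as arrays
df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue", "Wed", "Wed"],
    "sales": [3, 5, 2, 6, 4, 7],
})

fig, ax = plt.subplots(figsize=(5, 3))
# One bar per day (height = mean sales); axis labels come from the column names
sns.barplot(data=df, x="day", y="sales", ax=ax)
```

sns.set_theme() changes global Matplotlib settings, so calling it once near the top of a script or notebook is usually enough. Call plt.show() to display the figure.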

Getting Started: Installation

You can install Seaborn with pip or conda. Its required dependencies (NumPy, pandas, and Matplotlib) are installed automatically if they are not already present.

Using pip:

pip install seaborn

Using conda:

conda install seaborn
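After installing, you can confirm that Seaborn and its core dependencies import correctly by printing their versions. This is a minimal sketch, and the helper name dependency_versions is hypothetical.

```python
import matplotlib
import numpy as np
import pandas as pd
import seaborn as sns

def dependency_versions():
    """Hypothetical helper: installed versions of Seaborn and its core dependencies."""
    return {
        "seaborn": sns.__version__,
        "matplotlib": matplotlib.__version__,
        "pandas": pd.__version__,
        "numpy": np.__version__,
    }

# Print one aligned line per package
for name, version in dependency_versions().items():
    print(f"{name:>10}: {version}")
```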

Basic Concepts: Enhancing Matplotlib

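Seaborn builds on Matplotlib. Its plotting functions draw onto ordinary Matplotlib figures and axes, but they offer a more convenient syntax for common statistical plots. Once a theme is set with sns.set_theme(), they also produce more polished default aesthetics.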

Example: Enhancing a Scatter Plot

import matplotlib.pyplot as plt
import seaborn as sns

# Load a built-in dataset for demonstration
iris = sns.load_dataset('iris')

# Original Matplotlib scatter plot (without Seaborn styling)
plt.figure(figsize=(6, 4))
plt.scatter(iris['sepal_length'], iris['sepal_width'])
plt.title('Iris Sepal Length vs Width (Matplotlib)')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()

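# Seaborn scatter plot: set_theme() applies Seaborn's default style, and
# hue='species' colors the points by group and adds a legend automatically
sns.set_theme()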
plt.figure(figsize=(6, 4))
sns.scatterplot(x='sepal_length', y='sepal_width', hue='species', data=iris)
plt.title('Iris Sepal Length vs Width (Seaborn)')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()
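The comparison above works because Seaborn draws on ordinary Matplotlib objects. Axes-level functions such as sns.scatterplot accept an ax argument and return that same Axes, so regular Matplotlib methods keep working afterwards. In this minimal sketch, a hypothetical synthetic DataFrame replaces iris, so the code runs without downloading a dataset (sns.load_dataset fetches data from an online repository).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small synthetic DataFrame (hypothetical data standing in for iris)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "length": rng.normal(5.8, 0.8, 60),
    "width": rng.normal(3.0, 0.4, 60),
    "group": np.repeat(["a", "b", "c"], 20),
})

fig, ax = plt.subplots(figsize=(6, 4))
# Pass the target Axes explicitly; scatterplot draws on it and returns it
returned = sns.scatterplot(data=df, x="length", y="width", hue="group", ax=ax)
# Continue customizing with plain Matplotlib methods
returned.set(xlabel="Length (cm)", ylabel="Width (cm)", title="Synthetic scatter")
```

Passing ax explicitly lets you place Seaborn plots into Matplotlib subplot grids that you lay out yourself.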

Common Seaborn Plots

Distribution Plots (displot, histplot, kdeplot)

import matplotlib.pyplot as plt
import seaborn as sns

# Load a built-in dataset
tips = sns.load_dataset('tips')

# Histogram with KDE
sns.histplot(tips['total_bill'], kde=True)
plt.title('Distribution of Total Bill')
plt.show()

# Kernel Density Estimate plot
sns.kdeplot(tips['total_bill'], fill=True)
plt.title('KDE of Total Bill')
plt.show()

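# displot is figure-level: kind='hist' draws a histogram, kde=True overlays
# a KDE curve, and rug=True adds a rug plot of the individual observations
sns.displot(data=tips, x='total_bill', kind='hist', kde=True, rug=True)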
plt.title('Displot of Total Bill')
plt.show()

Relational Plots (relplot, scatterplot, lineplot)

import matplotlib.pyplot as plt
import seaborn as sns

# Load a built-in dataset
fmri = sns.load_dataset('fmri')

# Line plot: for each event, the mean signal at each timepoint (aggregated across
# subjects) with a shaded 95% confidence interval
sns.lineplot(x='timepoint', y='signal', hue='event', data=fmri)
plt.title('fMRI Signal over Time')
plt.show()

Categorical Plots (catplot, boxplot, violinplot, swarmplot)

import matplotlib.pyplot as plt
import seaborn as sns

# Load a built-in dataset
tips = sns.load_dataset('tips')

# Box plot
sns.boxplot(x='day', y='total_bill', data=tips)
plt.title('Total Bill by Day (Box Plot)')
plt.show()

# Violin plot
sns.violinplot(x='day', y='total_bill', data=tips)
plt.title('Total Bill by Day (Violin Plot)')
plt.show()
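A box plot summarizes each group with a few numbers. Under Matplotlib's default convention, which sns.boxplot follows unless whis is changed:

  • The box spans the first to the third quartile.
  • The inner line marks the median.
  • The whiskers reach the most extreme points within 1.5 times the interquartile range (IQR) of the box.
  • Points beyond the whiskers are drawn individually as outliers.

The hypothetical helper below computes those numbers with pandas and assumes linearly interpolated quartiles.

```python
import pandas as pd

def box_summary(values, whis=1.5):
    """Hypothetical helper: the statistics a default box plot draws for one group."""
    s = pd.Series(values, dtype=float)
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])  # linear interpolation
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = s[(s >= low_fence) & (s <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),    # most extreme point still within the fences
        "whisker_high": inside.max(),
        "outliers": sorted(s[(s < low_fence) | (s > high_fence)].tolist()),
    }

stats = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

To get one summary per box in the tips example, use something like {day: box_summary(g) for day, g in tips.groupby('day', observed=True)['total_bill']}.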

Matrix Plots (heatmap, clustermap)

import matplotlib.pyplot as plt
import seaborn as sns

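# Reshape the long-form data into a month-by-year table of passenger counts;
# each (month, year) pair occurs once, so pivot() needs no aggregation and keeps
# the integer counts that fmt='d' expects
flights = sns.load_dataset('flights')
flights_pivot = flights.pivot(index='month', columns='year', values='passengers')

# Heatmap (a larger figure keeps the 144 cell annotations readable)
plt.figure(figsize=(10, 8))
sns.heatmap(flights_pivot, annot=True, fmt='d', cmap='YlGnBu')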
plt.title('Passengers by Month and Year (Heatmap)')
plt.show()
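The flights heatmap shows raw counts. To plot an actual correlation matrix, a common pattern is:

  • Compute the coefficients with DataFrame.corr().
  • Pin the color scale to [-1, 1].
  • Use a diverging colormap centered at 0.

The sketch below applies this pattern to hypothetical synthetic columns.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical numeric data: "b" is built from "a", "c" is independent noise
rng = np.random.default_rng(7)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=200),
    "c": rng.normal(size=200),
})

corr = df.corr()  # Pearson correlation coefficients by default

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(
    corr,
    annot=True, fmt=".2f",      # print each coefficient with two decimals
    vmin=-1, vmax=1, center=0,  # fixed scale so colors are comparable across plots
    cmap="coolwarm",            # diverging colormap: blue for negative, red for positive
    square=True,
    ax=ax,
)
ax.set_title("Correlation matrix")
```

Fixing vmin and vmax matters here. Without them, the colormap stretches to the observed range, and weak correlations can look as saturated as strong ones.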

Further Topics:

  • Customizing Seaborn Plots
  • Facet Grids for Multi-variate Analysis
  • Regression Plots
  • Pair Plots
  • Plotting with different themes and palettes

This document provides a basic introduction to Seaborn. More detailed topics, advanced plotting techniques, and practical examples will be covered in subsequent files.