Data Visualization with Pandas
Pandas provides built-in plotting capabilities through integration with Matplotlib, making it easy to create visualizations directly from DataFrames and Series. This guide covers the essential visualization techniques for data analysis.
Quick Start
Calling .plot() on a DataFrame or Series draws the chart with Matplotlib and returns the Matplotlib Axes it drew on, so you can display it with plt.show() or refine it with any Matplotlib call.
import pandas as pd
import matplotlib.pyplot as plt
# Basic line plot
df = pd.DataFrame({
'x': range(10),
'y': [1, 3, 2, 5, 4, 6, 5, 7, 6, 8]
})
df.plot(x='x', y='y')
plt.show()
Common Plot Types
Line Plots
Perfect for time series and continuous data.
# Single line
df['column'].plot()
# Multiple lines
df.plot(y=['col1', 'col2', 'col3'])
# Customized line plot
df.plot(
x='date',
y='value',
kind='line',
title='My Line Plot',
xlabel='Date',
ylabel='Value',
figsize=(10, 6),
color='blue',
linewidth=2,
linestyle='--'
)
plt.show()
Bar Charts
Great for comparing categories.
# Vertical bar chart
df.plot(kind='bar', x='category', y='value')
# Horizontal bar chart
df.plot(kind='barh', x='category', y='value')
# Stacked bar chart
df.plot(kind='bar', stacked=True)
# Bar chart of the mean value per category
df.groupby('category')['value'].mean().plot(kind='bar')
plt.show()
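For bars drawn side by side within each category, plot a DataFrame that has one column per series. The following is a minimal sketch, assuming a hypothetical long-format table with region, quarter, and sales columns (illustrative names, not part of the examples above):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import pandas as pd

# Hypothetical long-format data: one row per (region, quarter)
sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'sales': [120, 150, 90, 130],
})

# Reshape so that each quarter becomes its own column
wide = sales.pivot(index='region', columns='quarter', values='sales')

# One cluster of bars per region, one bar per quarter
ax = wide.plot(kind='bar', rot=0)
ax.set_ylabel('Sales')
```

Each column of the reshaped frame becomes one bar within every cluster, and the index supplies the cluster labels.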
Histograms
Visualize distributions of numerical data.
# Basic histogram
df['column'].plot(kind='hist')
# With bins
df['column'].plot(kind='hist', bins=30)
# Multiple columns
df[['col1', 'col2']].plot(kind='hist', bins=20, alpha=0.5)
# Normalized histogram
df['column'].plot(kind='hist', density=True)
plt.show()
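With density=True the bar heights are scaled so that the total bar area equals 1, which makes histograms of differently sized samples comparable. A small sketch that checks this on synthetic data (the sample and variable names are assumptions for illustration):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import numpy as np
import pandas as pd

# Synthetic sample; the seed only makes the sketch reproducible
rng = np.random.default_rng(0)
samples = pd.Series(rng.normal(size=500), name='samples')

ax = samples.plot(kind='hist', bins=20, density=True)

# Each bar is a rectangle; height * width summed over all bars gives the total area
area = sum(bar.get_height() * bar.get_width() for bar in ax.patches)
print(round(area, 6))
```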
Box Plots
Show distributions and identify outliers.
# Single box plot
df.boxplot(column='value')
# Multiple columns
df.boxplot(column=['col1', 'col2', 'col3'])
# Grouped box plot
df.boxplot(column='value', by='category')
# Horizontal box plot
df.plot(kind='box', vert=False)
plt.show()
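By default, Matplotlib draws whiskers out to the furthest data points within 1.5 times the interquartile range (IQR) of the quartiles and marks anything beyond as an outlier. The same rule can be sketched by hand to list the outlying values; the data below is made up for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import pandas as pd

# Illustrative data with one obviously extreme value
values = pd.Series([10, 12, 11, 13, 12, 14, 11, 13, 12, 40], name='value')

# Quartiles and interquartile range
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1

# Fences matching the default whisker rule (1.5 * IQR)
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())

# The box plot marks the same point beyond the upper whisker
ax = values.plot(kind='box')
```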
Scatter Plots
Explore relationships between variables.
# Basic scatter plot
df.plot(kind='scatter', x='var1', y='var2')
# With size and color
df.plot(
kind='scatter',
x='var1',
y='var2',
s=df['population']/1000, # Point size
c='category', # Color by column values; the column must be numeric or categorical dtype
colormap='viridis',
alpha=0.5
)
plt.show()
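When the grouping column holds plain strings, one option is to convert it to integer codes first and color by those. This is a sketch under that assumption; the category_code column and the sample values are hypothetical:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import pandas as pd

points = pd.DataFrame({
    'var1': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    'var2': [2.1, 3.9, 6.2, 7.8, 10.1, 12.2],
    'category': ['a', 'b', 'a', 'c', 'b', 'c'],
})

# Map each distinct label to an integer code (labels are sorted: a=0, b=1, c=2)
points['category_code'] = points['category'].astype('category').cat.codes

# Color by the numeric code column
ax = points.plot(
    kind='scatter',
    x='var1',
    y='var2',
    c='category_code',
    colormap='viridis',
    alpha=0.8,
)
```

The colorbar then shows codes rather than labels, so for presentation a Seaborn scatterplot with hue may be the more readable choice.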
Pie Charts
Show proportions of a whole.
# Basic pie chart
df['category'].value_counts().plot(kind='pie')
# Customized
df.groupby('category')['value'].sum().plot(
kind='pie',
autopct='%1.1f%%',
startangle=90,
figsize=(8, 8)
)
plt.ylabel('') # Remove y-label
plt.show()
Area Plots
Show cumulative totals or contributions over time.
# Stacked area plot
df.plot(kind='area', stacked=True, alpha=0.5)
# Unstacked area plot
df.plot(kind='area', stacked=False)
plt.show()
Time Series Visualization
Date-Based Plots
# Create time series data
dates = pd.date_range('2024-01-01', periods=100)
df = pd.DataFrame({
'date': dates,
'value': range(100)
})
df = df.set_index('date')
# Plot time series
df.plot()
# Resample and plot
df.resample('W').mean().plot()
# One subplot per column (useful when the DataFrame holds several series)
df.plot(subplots=True, figsize=(10, 8))
plt.show()
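To see what resampling does before plotting it, the sketch below rebuilds the 100-day series and inspects the weekly result. With the 'W' rule, bins end on Sundays, so the last bin may cover only part of a week. The names ts and weekly are illustrative:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import pandas as pd

# 100 consecutive days starting Monday, 2024-01-01
dates = pd.date_range('2024-01-01', periods=100)
ts = pd.DataFrame({'value': range(100)}, index=dates)

# Average the daily values within each week (weeks end on Sunday)
weekly = ts.resample('W').mean()
print(len(weekly))               # number of weekly bins
print(weekly['value'].iloc[0])   # mean of the first seven days
print(weekly['value'].iloc[-1])  # mean of the final, partial week

ax = weekly.plot(marker='o', title='Weekly mean')
```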
Rolling Statistics
# Plot with rolling mean
df['value'].plot(label='Original')
df['value'].rolling(window=7).mean().plot(label='7-day MA')
plt.legend()
plt.show()
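A rolling mean is undefined until the window is full, so the first window - 1 points are NaN and the smoothed line starts late. Passing min_periods relaxes this. A short sketch with a 3-point window on made-up data:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import pandas as pd

series = pd.Series(range(10), dtype='float64', name='value')

# Strict window: the first two results are NaN
ma = series.rolling(window=3).mean()

# Relaxed window: average whatever is available, starting from one point
ma_partial = series.rolling(window=3, min_periods=1).mean()

ax = series.plot(label='Original')
ma.plot(ax=ax, label='3-point MA')
ma_partial.plot(ax=ax, label='3-point MA (min_periods=1)', linestyle=':')
ax.legend()
```

The early values produced with min_periods=1 average fewer points, so they are noisier than the rest of the line.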
Customization
Styling Plots
# Use a style (the 'seaborn-v0_8*' names require Matplotlib 3.6 or later)
plt.style.use('seaborn-v0_8') # or 'ggplot', 'fivethirtyeight', etc.
# Custom colors
df.plot(color=['#FF0000', '#00FF00', '#0000FF'])
# Custom figure size
df.plot(figsize=(12, 6))
# Grid
df.plot(grid=True)
# Title and labels
ax = df.plot()
ax.set_title('My Plot Title', fontsize=16)
ax.set_xlabel('X Label', fontsize=12)
ax.set_ylabel('Y Label', fontsize=12)
plt.show()
Subplots
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
df['col1'].plot(ax=axes[0, 0], kind='line')
df['col2'].plot(ax=axes[0, 1], kind='bar')
df['col3'].plot(ax=axes[1, 0], kind='hist')
df.plot(ax=axes[1, 1], kind='scatter', x='col1', y='col2')
plt.tight_layout()
plt.show()
Saving Plots
# Save to file
ax = df.plot()
plt.savefig('my_plot.png', dpi=300, bbox_inches='tight')
# Different formats
plt.savefig('plot.pdf')
plt.savefig('plot.svg')
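plt.savefig writes whichever figure is current, which can be ambiguous once several figures are open. Saving through the Figure that owns the Axes avoids that. The sketch below writes to an in-memory buffer purely so it leaves no file behind; in practice you would pass a filename instead:

```python
import io

import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import matplotlib.pyplot as plt
import pandas as pd

demo = pd.DataFrame({'y': [1, 3, 2, 5]})
ax = demo.plot()

# The Figure that owns the Axes returned by pandas
fig = ax.get_figure()

# Save this specific figure; a path such as 'my_plot.png' works the same way
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight')

# Release the figure's memory once it has been written
plt.close(fig)
png_bytes = buffer.getvalue()
```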
Advanced Visualization
Heatmaps
# Correlation heatmap
import seaborn as sns
corr = df.corr(numeric_only=True) # Skip non-numeric columns, which raise an error in pandas 2.x
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
plt.show()
Pivot Table Visualization
# Create and plot pivot table
pivot = df.pivot_table(
values='sales',
index='month',
columns='category',
aggfunc='sum'
)
pivot.plot(kind='bar', stacked=True)
plt.show()
Multiple Y-Axes
# Create figure and primary axis
ax1 = df.plot(y='col1', color='blue', legend=False)
ax1.set_ylabel('Column 1', color='blue')
# Create secondary y-axis
ax2 = ax1.twinx()
df['col2'].plot(ax=ax2, color='red', legend=False)
ax2.set_ylabel('Column 2', color='red')
plt.show()
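Pandas also offers a shorthand for the same layout: the secondary_y argument moves the listed columns to a right-hand axis, which is then reachable as ax.right_ax. A minimal sketch with made-up data (the metrics frame is hypothetical):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import pandas as pd

# Two columns on very different scales
metrics = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [100, 400, 900, 1600],
})

# col1 stays on the left axis, col2 moves to the right axis
ax = metrics.plot(secondary_y=['col2'])
ax.set_ylabel('Column 1')
ax.right_ax.set_ylabel('Column 2')
```

By default the legend marks the moved column with "(right)", which helps readers match each line to its axis.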
Best Practices
1. Choose the Right Chart Type
- Line: Time series, trends
- Bar: Comparisons between categories
- Scatter: Relationships between variables
- Histogram: Distribution of a single variable
- Box: Distribution with outliers
- Pie: Parts of a whole (use sparingly)
2. Label Everything
ax = df.plot()
ax.set_title('Clear, Descriptive Title')
ax.set_xlabel('X-Axis Label with Units')
ax.set_ylabel('Y-Axis Label with Units')
ax.legend(['Series 1', 'Series 2'])
plt.show()
3. Use Appropriate Colors
# Colorblind-friendly palettes
df.plot(colormap='viridis') # Good for sequential data
df.plot(colormap='RdYlBu') # Good for diverging data
4. Control Figure Size
# Make it readable
df.plot(figsize=(12, 6)) # Width x Height in inches
Common Patterns
Comparing Multiple DataFrames
fig, ax = plt.subplots(figsize=(10, 6))
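# Plot a Series so that label names the line; a multi-column DataFrame uses its column names instead
df1['value'].plot(ax=ax, label='Dataset 1')
df2['value'].plot(ax=ax, label='Dataset 2')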
ax.legend()
plt.show()
Annotating Plots
ax = df.plot()
ax.annotate(
'Peak Value',
xy=(10, 100),
xytext=(12, 110),
arrowprops=dict(arrowstyle='->')
)
plt.show()
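Rather than hard-coding the coordinates, the annotation can be anchored to the data. The sketch below looks up the maximum with idxmax and max; the series and the text offsets are illustrative choices:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to view plots interactively
import pandas as pd

series = pd.Series([3, 7, 4, 12, 6, 5], name='value')

# Position (index label) and height of the largest value
peak_x = series.idxmax()
peak_y = series.max()

ax = series.plot()
annotation = ax.annotate(
    f'Peak: {peak_y}',
    xy=(peak_x, peak_y),                  # point the arrow targets
    xytext=(peak_x + 0.5, peak_y - 2),    # where the text is placed
    arrowprops=dict(arrowstyle='->'),
)
```

This works as written for a numeric index. With a datetime index, the text offset would need to be a Timedelta rather than a number.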
Creating Publication-Ready Figures
plt.style.use('seaborn-v0_8-paper')
fig, ax = plt.subplots(figsize=(8, 6))
df.plot(ax=ax, linewidth=2)
ax.set_title('Publication Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X Label', fontsize=12)
ax.set_ylabel('Y Label', fontsize=12)
ax.grid(True, alpha=0.3)
ax.legend(frameon=True, fancybox=True)
plt.tight_layout()
plt.savefig('publication_figure.png', dpi=300, bbox_inches='tight')
plt.show()
Integration with Seaborn
For more advanced statistical visualizations, combine Pandas with Seaborn:
import seaborn as sns
# Set style
sns.set_style('whitegrid')
# Use Seaborn with Pandas DataFrames
sns.scatterplot(data=df, x='var1', y='var2', hue='category')
sns.boxplot(data=df, x='category', y='value')
sns.violinplot(data=df, x='category', y='value')
plt.show()
Next Steps
- Statistical Operations - Analyze your data before plotting
- GroupBy Operations - Aggregate data for visualization
- Time Series Analysis - Advanced time-based plots
