Data Visualization with Pandas

Pandas provides built-in plotting capabilities through integration with Matplotlib, making it easy to create visualizations directly from DataFrames and Series. This guide covers the essential visualization techniques for data analysis.

Quick Start

Calling plot() on a DataFrame or Series draws the figure with Matplotlib and returns a Matplotlib Axes object, which you can customize further before calling plt.show().

import pandas as pd
import matplotlib.pyplot as plt

# Basic line plot
df = pd.DataFrame({
    'x': range(10),
    'y': [1, 3, 2, 5, 4, 6, 5, 7, 6, 8]
})

df.plot(x='x', y='y')
plt.show()

Common Plot Types

Line Plots

Perfect for time series and continuous data.

# Single line
df['column'].plot()

# Multiple lines
df.plot(y=['col1', 'col2', 'col3'])

# Customized line plot
df.plot(
    x='date',
    y='value',
    kind='line',
    title='My Line Plot',
    xlabel='Date',
    ylabel='Value',
    figsize=(10, 6),
    color='blue',
    linewidth=2,
    linestyle='--'
)
plt.show()

Bar Charts

Great for comparing categories.

# Vertical bar chart
df.plot(kind='bar', x='category', y='value')

# Horizontal bar chart
df.plot(kind='barh', x='category', y='value')

# Stacked bar chart
df.plot(kind='bar', stacked=True)

# Bar chart of the mean value per category
df.groupby('category')['value'].mean().plot(kind='bar')
plt.show()
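A grouped bar chart in the usual sense places the bars of several series side by side within each category. One way to get there, sketched below, is to reshape long-format data so that each series becomes its own column. The sales DataFrame and its year column are hypothetical, made up for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import pandas as pd

# Hypothetical long-format data: one row per (category, year) pair
sales = pd.DataFrame({
    "category": ["A", "A", "B", "B", "C", "C"],
    "year": [2023, 2024, 2023, 2024, 2023, 2024],
    "value": [10, 12, 7, 9, 15, 11],
})

# Reshape to wide format: one row per category, one column per year
wide = sales.pivot(index="category", columns="year", values="value")

# Each column is drawn as its own set of bars, side by side within a category
ax = wide.plot(kind="bar", rot=0)
ax.set_ylabel("Value")
```

If the data contains several rows per (category, year) pair, pivot raises an error; pivot_table with an aggregation function is the usual alternative in that case.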

Histograms

Visualize distributions of numerical data.

# Basic histogram
df['column'].plot(kind='hist')

# With bins
df['column'].plot(kind='hist', bins=30)

# Multiple columns
df[['col1', 'col2']].plot(kind='hist', bins=20, alpha=0.5)

# Normalized histogram
df['column'].plot(kind='hist', density=True)
plt.show()

Box Plots

Show distributions and identify outliers.

# Single box plot
df.boxplot(column='value')

# Multiple columns
df.boxplot(column=['col1', 'col2', 'col3'])

# Grouped box plot
df.boxplot(column='value', by='category')

# Horizontal box plot
df.plot(kind='box', vert=False)
plt.show()

Scatter Plots

Explore relationships between variables.

# Basic scatter plot
df.plot(kind='scatter', x='var1', y='var2')

# With size and color
df.plot(
    kind='scatter',
    x='var1',
    y='var2',
    s=df['population']/1000,  # Point size
    c='category',              # Color by column; needs numeric (or categorical-dtype) values
    colormap='viridis',
    alpha=0.5
)
plt.show()
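When the column used for color holds plain string labels, one possible workaround is to convert the labels to integer codes first. The sketch below assumes a small made-up points DataFrame; the category_code column is a helper introduced here, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import pandas as pd

# Hypothetical data with a string label column
points = pd.DataFrame({
    "var1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "var2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "category": ["low", "low", "mid", "mid", "high", "high"],
})

# Map each distinct label to an integer code (labels are ordered alphabetically)
points["category_code"] = points["category"].astype("category").cat.codes

# The numeric codes can now be mapped through a colormap
ax = points.plot(
    kind="scatter",
    x="var1",
    y="var2",
    c="category_code",
    colormap="viridis",
)
```

The colorbar then shows the integer codes rather than the original labels, so a plot intended for readers may still need a custom legend.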

Pie Charts

Show proportions of a whole.

# Basic pie chart
df['category'].value_counts().plot(kind='pie')

# Customized
df.groupby('category')['value'].sum().plot(
    kind='pie',
    autopct='%1.1f%%',
    startangle=90,
    figsize=(8, 8)
)
plt.ylabel('')  # Remove y-label
plt.show()

Area Plots

Show cumulative totals or contributions over time.

# Stacked area plot (each column must be all positive or all negative)
df.plot(kind='area', stacked=True, alpha=0.5)

# Unstacked area plot
df.plot(kind='area', stacked=False)
plt.show()

Time Series Visualization

Date-Based Plots

# Create time series data
dates = pd.date_range('2024-01-01', periods=100)
df = pd.DataFrame({
    'date': dates,
    'value': range(100)
})
df.set_index('date', inplace=True)

# Plot time series
df.plot()

# Resample and plot
df.resample('W').mean().plot()

# One subplot per column (useful when the DataFrame holds several series)
df.plot(subplots=True, figsize=(10, 8))
plt.show()

Rolling Statistics

# Plot with rolling mean
df['value'].plot(label='Original')
df['value'].rolling(window=7).mean().plot(label='7-day MA')
plt.legend()
plt.show()
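A rolling mean with window=7 has no value for the first six points, so the smoothed line starts later than the original. The following self-contained sketch uses synthetic data (an assumption, generated here only for illustration) to show that behavior and the min_periods option that relaxes it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import numpy as np
import pandas as pd

# Synthetic daily series: a linear trend plus random noise
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
series = pd.Series(np.arange(60) + rng.normal(0, 5, size=60), index=dates, name="value")

# Default: the first window - 1 results are NaN
rolling_mean = series.rolling(window=7).mean()

# min_periods=1 averages over whatever points are available so far
rolling_partial = series.rolling(window=7, min_periods=1).mean()

# Draw the raw series and the smoothed series on the same axes
ax = series.plot(label="Original", alpha=0.5)
rolling_mean.plot(ax=ax, label="7-day MA")
ax.legend()
```

Whether to use min_periods depends on the analysis: the early values it produces are averages over fewer than seven points and are therefore noisier.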

Customization

Styling Plots

# Use a style
plt.style.use('seaborn-v0_8')  # Name used by Matplotlib 3.6+; or 'ggplot', 'fivethirtyeight', etc.

# Custom colors
df.plot(color=['#FF0000', '#00FF00', '#0000FF'])

# Custom figure size
df.plot(figsize=(12, 6))

# Grid
df.plot(grid=True)

# Title and labels
ax = df.plot()
ax.set_title('My Plot Title', fontsize=16)
ax.set_xlabel('X Label', fontsize=12)
ax.set_ylabel('Y Label', fontsize=12)
plt.show()

Subplots

# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

df['col1'].plot(ax=axes[0, 0], kind='line')
df['col2'].plot(ax=axes[0, 1], kind='bar')
df['col3'].plot(ax=axes[1, 0], kind='hist')
df.plot(ax=axes[1, 1], kind='scatter', x='col1', y='col2')

plt.tight_layout()
plt.show()

Saving Plots

# Save to file
ax = df.plot()
plt.savefig('my_plot.png', dpi=300, bbox_inches='tight')

# Different formats
plt.savefig('plot.pdf')
plt.savefig('plot.svg')
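plt.savefig writes whichever figure is currently active, which can be ambiguous once several figures are open. A more explicit variant, sketched below, saves through the Figure object that owns the Axes and then closes it. The temporary output directory is an assumption made so the example does not write into the working directory.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up DataFrame for illustration
frame = pd.DataFrame({"x": range(5), "y": [1, 3, 2, 5, 4]})

ax = frame.plot(x="x", y="y")
fig = ax.get_figure()  # the Figure that contains this Axes

# Save through the Figure, so it is clear which plot is written
out_path = Path(tempfile.mkdtemp()) / "my_plot.png"
fig.savefig(out_path, dpi=150, bbox_inches="tight")

# Release the figure's memory when generating many plots in a loop
plt.close(fig)
```

In scripts, saving before calling plt.show() is the safer order, since the figure may no longer be available once the window has been closed.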

Advanced Visualization

Heatmaps

# Correlation heatmap
import seaborn as sns

corr = df.corr(numeric_only=True)  # Skip non-numeric columns (required in pandas 2.0+)
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
plt.show()
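If Seaborn is not installed, a similar correlation heatmap can be approximated with Matplotlib alone. This is a minimal sketch under that assumption, using a made-up DataFrame; it does not reproduce Seaborn's cell annotations.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; the string column is excluded from the correlation
frame = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
    "label": list("vwxyz"),
})
corr = frame.corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(5, 4))

# Fix the color scale to [-1, 1] so that zero correlation sits at the midpoint
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)

# Label rows and columns with the column names
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(image, ax=ax)
```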

Pivot Table Visualization

# Create and plot pivot table
pivot = df.pivot_table(
    values='sales',
    index='month',
    columns='category',
    aggfunc='sum'
)
pivot.plot(kind='bar', stacked=True)
plt.show()

Multiple Y-Axes

# Create figure and primary axis
ax1 = df.plot(y='col1', color='blue', legend=False)
ax1.set_ylabel('Column 1', color='blue')

# Create secondary y-axis
ax2 = ax1.twinx()
df['col2'].plot(ax=ax2, color='red', legend=False)
ax2.set_ylabel('Column 2', color='red')

plt.show()
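Pandas also offers a shortcut for the same idea: the secondary_y argument of plot() places the listed columns on a right-hand axis. The sketch below assumes a small made-up DataFrame whose two columns differ widely in scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import pandas as pd

# Hypothetical data: col2 is far larger in magnitude than col1
frame = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "col2": [100, 400, 900, 1600, 2500],
})

# col2 is drawn against a secondary y-axis on the right
ax = frame.plot(secondary_y=["col2"])
ax.set_ylabel("Column 1")

# The secondary axis is exposed as the right_ax attribute of the primary axes
ax.right_ax.set_ylabel("Column 2")
```

By default pandas appends "(right)" to the legend entry of each column on the secondary axis, which helps readers match lines to scales.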

Best Practices

1. Choose the Right Chart Type

Line: Time series, trends
Bar: Comparisons between categories
Scatter: Relationships between variables
Histogram: Distribution of single variable
Box: Distribution with outliers
Pie: Parts of a whole (use sparingly)

2. Label Everything

ax = df.plot()
ax.set_title('Clear, Descriptive Title')
ax.set_xlabel('X-Axis Label with Units')
ax.set_ylabel('Y-Axis Label with Units')
ax.legend(['Series 1', 'Series 2'])
plt.show()

3. Use Appropriate Colors

# Colorblind-friendly colormaps (passed by name, no extra import needed)
df.plot(colormap='viridis')  # Good for sequential data
df.plot(colormap='RdYlBu')   # Good for diverging data

4. Control Figure Size

# Make it readable
df.plot(figsize=(12, 6))  # Width x Height in inches

Common Patterns

Comparing Multiple DataFrames

fig, ax = plt.subplots(figsize=(10, 6))
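# Plot one column from each DataFrame as a Series so that label sets the
# legend entry; DataFrame.plot uses the column names instead
df1['value'].plot(ax=ax, label='Dataset 1')
df2['value'].plot(ax=ax, label='Dataset 2')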
ax.legend()
plt.show()

Annotating Plots

ax = df.plot()
ax.annotate(
    'Peak Value',
    xy=(10, 100),       # Point being annotated, in data coordinates
    xytext=(12, 110),   # Position of the label text
    arrowprops=dict(arrowstyle='->')
)
plt.show()
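Hard-coded coordinates only work if they happen to match the data. One way to place the annotation on the actual peak is to look it up with idxmax, as in this sketch; the series and the text offsets are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop in a notebook
import pandas as pd

# Hypothetical series with a single clear maximum
series = pd.Series([3, 7, 12, 9, 4], name="value")
ax = series.plot()

# Locate the peak: index label of the maximum and the maximum itself
peak_x = int(series.idxmax())
peak_y = int(series.max())

# Point the arrow at the peak and offset the text slightly up and to the right
annotation = ax.annotate(
    "Peak Value",
    xy=(peak_x, peak_y),
    xytext=(peak_x + 0.5, peak_y + 1),
    arrowprops=dict(arrowstyle="->"),
)
```

With a DatetimeIndex the offset for xytext would need to be a time delta rather than a plain number.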

Creating Publication-Ready Figures

plt.style.use('seaborn-v0_8-paper')

fig, ax = plt.subplots(figsize=(8, 6))
df.plot(ax=ax, linewidth=2)
ax.set_title('Publication Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X Label', fontsize=12)
ax.set_ylabel('Y Label', fontsize=12)
ax.grid(True, alpha=0.3)
ax.legend(frameon=True, fancybox=True)

plt.tight_layout()
plt.savefig('publication_figure.png', dpi=300, bbox_inches='tight')
plt.show()

Integration with Seaborn

For more advanced statistical visualizations, combine Pandas with Seaborn:

import seaborn as sns

# Set style
sns.set_style('whitegrid')

# Use Seaborn with Pandas DataFrames
sns.scatterplot(data=df, x='var1', y='var2', hue='category')
sns.boxplot(data=df, x='category', y='value')
sns.violinplot(data=df, x='category', y='value')
plt.show()

Next Steps

Statistical Operations - Analyze your data before plotting
GroupBy Operations - Aggregate data for visualization
Time Series Analysis - Advanced time-based plots