#data science#data visualization#pandas#python#matplotlib

Data Visualization with Pandas

A hands-on guide to building line, bar, histogram, box, scatter, KDE, Andrews curve, and subplot visualizations directly from a pandas DataFrame or Series.

May 24, 2026 at 8:15 PM18 min readFollowFollow (Hindi)

Topics You Will Master

Generating line, bar, and stacked bar plots directly from DataFrames
Creating histograms, box plots, and density estimation plots
Building scatter matrices and Andrews curves for multi-variable analysis
Configuring subplots and layouts on DataFrame plotting calls
Best For

Data analysts and engineers seeking a fast, convenient way to visualize Pandas DataFrames inline.

Expected Outcome

Quick and efficient inline DataFrame visualizations for rapid exploratory data analysis.

Data visualization is essential for understanding data distributions, trends, and patterns before starting any modeling task. While dedicated plotting libraries like Seaborn are useful, you often need to generate plots quickly during data manipulation.

Pandas provides built-in plotting capabilities directly on DataFrames and Series, wrapping Matplotlib. This allows you to generate line, bar, histogram, box, scatter, and area plots quickly without writing extensive plotting boilerplate.

In this tutorial, you will build a wide range of plots directly from Pandas DataFrames and Series. You will work with random walks, the Iris dataset, and the Titanic dataset to learn the full suite of Pandas visual tools.

Prerequisites: Python 3.x, Pandas, NumPy, Seaborn, Matplotlib.

Setup

Before plotting, you must set up the environment by importing the required data processing libraries.

Imports and Libraries

Group the necessary libraries into a single import block, which sets up Pandas, NumPy, Seaborn, Matplotlib, and random utility functions:

PYTHON
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from numpy.random import randn, randint, uniform, sample

Data Structure Concepts

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It consists of three principal components: the data, rows, and columns. A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). Each column of a DataFrame is a Series.

We will create a DataFrame df containing 1,000 random normal values indexed by a daily datetime range starting on June 7, 2019, and a Series ts structured similarly:

PYTHON
df = pd.DataFrame(randn(1000), index = pd.date_range('2019-06-07', periods = 1000), columns=['value'])
ts = pd.Series(randn(1000), index = pd.date_range('2019-06-07', periods = 1000))

Inspect the first five rows of the generated DataFrame:

PYTHON
df.head()
value
2019-06-07 -0.992350
2019-06-08 -0.849183
2019-06-09 0.126559
2019-06-10 0.640230
2019-06-11 -0.975090

Preview the first five values of the generated Series:

PYTHON
ts.head()
OUTPUT
2019-06-07    0.430385
2019-06-08    1.810955
2019-06-09    3.207345
2019-06-10   -0.366252
2019-06-11    1.406304
Freq: D, dtype: float64

Verify the underlying class types of both data structures:

PYTHON
type(df),type(ts)
OUTPUT
(pandas.core.frame.DataFrame, pandas.core.series.Series)

Line Plots

Line plots show trends over a continuous interval or time series. We will apply cumulative sum transformations to simulate a random walk.

Cumulative Transformations

Use the cumsum() function to compute the cumulative sum of the data along the index axis:

PYTHON
df['value'] = df['value'].cumsum()
df.head()
value
2019-06-07 -0.992350
2019-06-08 -2.833884
2019-06-09 -4.548859
2019-06-10 -5.623604
2019-06-11 -7.673438

Apply the same cumulative sum transformation to the Series:

PYTHON
ts = ts.cumsum()
ts.head()
OUTPUT
2019-06-07     0.430385
2019-06-08     2.671725
2019-06-09     8.120411
2019-06-10    13.202844
2019-06-11    19.691581
Freq: D, dtype: float64

Confirm that the structures remain a DataFrame and Series:

PYTHON
type(df), type(ts)
OUTPUT
(pandas.core.frame.DataFrame, pandas.core.series.Series)

Visualizing Line Charts

Plot the Series using a custom figure size of 10x5 inches:

PYTHON
ts.plot(figsize=(10,5))

The resulting chart tracks the cumulative trajectory of the random walk Series over time:

Line plot of cumulative random walk Series values over time

Now, plot the DataFrame containing the single value column:

PYTHON
df.plot()

The line plot tracks the cumulative values of the DataFrame over the datetime index:

Line plot of cumulative random walk DataFrame values over time

Load the built-in Iris dataset to explore multi-variable line plotting:

PYTHON
iris = sns.load_dataset('iris')
iris.head()

| | sepal_length | sepal_width | petal_length | petal_width | species | |---|---|---|---|---|---|---| | 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | | 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | | 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | | 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | | 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |

Plot the Iris features with a custom title and axis labels:

PYTHON
ax = iris.plot(figsize=(15,8), title='Iris Dataset')
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')

The multi-line plot shows the measurements for all four Iris features across the samples:

Multi-line plot of Iris dataset features showing sepal and petal dimensions

Customize line styles by passing Matplotlib formatting strings (e.g., 'r--' for red dashed lines) and enabling the legend:

PYTHON
ts.plot(style = 'r--', label = 'Series', legend = True)

The resulting line plot displays the Series as a red dashed line accompanied by a legend:

Line plot of cumulative Series using a red dashed line style

For data covering multiple orders of magnitude, use a logarithmic y-axis scale by setting logy=True:

PYTHON
iris.plot(legend = False, figsize = (10, 5), logy = True)

The logarithmic scale compresses the vertical distribution, highlighting relative variations across samples:

Line plot of Iris features using a logarithmic y-axis scale

Secondary Y-Axes

When plotting columns with different measurement units, plot subset columns on a secondary y-axis to compare trends without squeezing smaller values. Drop sepal and petal widths to build the first DataFrame:

PYTHON
x = iris.drop(['sepal_width', 'petal_width'], axis = 1)
x.head()
sepal_length petal_length species
0 5.1 1.4 setosa
1 4.9 1.4 setosa
2 4.7 1.3 setosa
3 4.6 1.5 setosa
4 5.0 1.4 setosa

Create the second DataFrame by dropping sepal and petal lengths:

PYTHON
y = iris.drop(['sepal_length', 'petal_length'], axis = 1)
y.head()
sepal_width petal_width species
0 3.5 0.2 setosa
1 3.0 0.2 setosa
2 3.2 0.2 setosa
3 3.1 0.2 setosa
4 3.6 0.2 setosa

Plot x on the primary y-axis, and overlay y on the secondary y-axis using the same axes object ax:

PYTHON
ax = x.plot()
y.plot(figsize = (16,10), secondary_y = True, ax = ax)

The plot separates lengths on the left y-axis and widths on the right y-axis:

Line plot displaying primary lengths and secondary widths on dual y-axes

Adjust the tick label resolution by enabling the x_compat compatibility parameter:

PYTHON
x.plot(figsize=(10,5), x_compat = True)

The resulting chart adjusts grid alignment settings for compatibility:

Line plot of Iris lengths with tick resolution compatibilities enabled

Bar Plots

Bar plots display comparisons among discrete categories. Before plotting, drop non-numeric columns like species to isolate the variables:

PYTHON
df = iris.drop(['species'], axis = 1)
df.head()
sepal_length sepal_width petal_length petal_width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2

Generate a bar plot for the first observation in the dataset by passing kind='bar':

PYTHON
df.iloc[0].plot(kind='bar')

The bar plot compares the four feature measurements for a single Iris flower:

Bar plot of Iris feature values for the first sample row

Alternatively, call the .bar() plotting method directly on the slice:

PYTHON
df.iloc[0].plot.bar()

The resulting bar chart displays the identical row comparisons:

Bar plot of Iris feature values for the first sample row using plot.bar

Load the Titanic dataset to explore discrete categorical distributions:

PYTHON
titanic = sns.load_dataset('titanic')
titanic.head()
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

Plot a histogram of passenger ticket classes to check frequencies:

PYTHON
titanic['pclass'].plot(kind = 'hist')

The histogram columns show passenger frequencies across the three ticket classes:

Histogram of passenger ticket class distributions on the Titanic

Create a small DataFrame with 10 rows and 4 columns to study multi-series bar plots:

PYTHON
df = pd.DataFrame(randn(10, 4), columns=['a', 'b', 'c', 'd'])
df.head()
a b c d
0 -0.358585 -0.530212 -1.037960 -0.620583
1 0.063102 0.872088 0.429474 2.020268
2 -1.064892 -0.521098 -0.238016 1.559072
3 -0.277393 -1.246629 1.723683 -0.069810
4 -1.123548 -0.375084 0.528301 0.739006

Plot the column values side-by-side for each index:

PYTHON
df.plot.bar()

The bar plot clusters the values of columns a, b, c, and d for each row:

Bar plot showing side-by-side column comparisons for a ten-row DataFrame

Stacked Bar Plots

To stack the values of the columns vertically for each index, set stacked=True:

PYTHON
df.plot.bar(stacked = True)

The stacked bar plot shows the cumulative contribution of each column to the total row value:

Stacked bar plot showing cumulative row totals across columns

Alternatively, draw the stacked plot by passing the kind argument:

PYTHON
df.plot(kind = 'bar', stacked = True)

The resulting chart displays the identical stacked columns:

Stacked bar plot showing cumulative row totals using kind='bar'

To plot the stacked bars horizontally, use .barh(stacked=True) and remove the border spines with plt.axis('off'):

PYTHON
df.plot.barh(stacked = True)
plt.axis('off')

The horizontal bar plot displays row components stacked along the x-axis with borders hidden:

Horizontal stacked bar plot with axis lines disabled

Histograms

A histogram represents the distribution of numerical data by grouping values into continuous intervals.

Histogram Implementations

Generate a default histogram combining all Iris features:

PYTHON
iris.plot.hist()

The histogram overlays the distribution spreads of all numerical Iris features:

Histogram of Iris feature distributions

Alternatively, pass kind='hist' to get the same output:

PYTHON
iris.plot(kind = 'hist')

The resulting chart displays the identical feature histograms:

![Histogram of Iris feature distributions using kind='hist']](../../images/image-58.png)

Stack the histograms and increase the bin count to 50 for finer interval resolution:

PYTHON
iris.plot(kind = 'hist', stacked = True, bins = 50)

The stacked histogram displays the combined distribution peaks across 50 bins:

Stacked histogram of Iris feature distributions using 50 bins

Rotate the histogram layout horizontally by setting the orientation parameter:

PYTHON
iris.plot(kind = 'hist', stacked = True, bins = 50, orientation = 'horizontal')

The horizontal stacked histogram plots value counts along the x-axis:

Horizontal stacked histogram of Iris feature distributions

Processing Differences

Use the diff() function to calculate the difference of consecutive values down the index axis:

PYTHON
iris['sepal_width'].diff()[:10]
OUTPUT
0    NaN
1   -0.5
2    0.2
3   -0.1
4    0.5
5    0.3
6   -0.5
7    0.0
8   -0.5
9    0.2
Name: sepal_width, dtype: float64

Plot the distribution of these consecutive differences:

PYTHON
iris['sepal_width'].diff().plot(kind = 'hist', stacked = True, bins = 50)

The histogram shows the spread of step differences for the sepal width measurements:

Histogram of consecutive step differences in Iris sepal width

Drop the categorical columns to isolate the numerical features:

PYTHON
df = iris.drop(['species'], axis = 1)
df.diff()[:10]
sepal_length sepal_width petal_length petal_width
0 NaN NaN NaN NaN
1 -0.2 -0.5 0.0 0.0
2 -0.2 0.2 -0.1 0.0
3 -0.1 -0.1 0.2 0.0
4 0.4 0.5 -0.1 0.0
5 0.4 0.3 0.3 0.2
6 -0.8 -0.5 -0.3 -0.1
7 0.4 0.0 0.1 -0.1
8 -0.6 -0.5 -0.1 0.0
9 0.5 0.2 0.1 -0.1

Plot separate histograms for each feature's difference values on a grid layout:

PYTHON
df.diff().hist(color = 'r', alpha = 0.5, figsize=(10,10))

The resulting 2x2 grid plots the difference distributions for each numerical column:

Grid of histograms showing difference spreads for all numerical Iris columns

Box Plots

Box plots summarize a distribution using five statistics: the minimum, lower quartile (), median, upper quartile (), and maximum.

The box plot diagram below details how these parameters partition a distribution:

Box plot anatomy diagram detailing minimum, lower quartile, median, upper quartile, maximum, and IQR

Define a styling dictionary to customize the colors of the boxes and whiskers:

PYTHON
color = {'boxes': 'DarkGreen', 'whiskers': 'r'}
color
OUTPUT
{'boxes': 'DarkGreen', 'whiskers': 'r'}

Generate box plots for the numerical Iris columns using the custom style colors:

PYTHON
df.plot(kind = 'box', figsize=(10,5), color = color)

The box plots compare the medians, interquartile ranges, and outliers across the Iris features:

Box plots comparing distributions and outlier points of Iris features

Rotate the box plots horizontally by setting vert=False:

PYTHON
df.plot(kind = 'box', figsize=(10,5), color = color, vert = False)

The horizontal box plots display the same distributions rotated 90 degrees:

Horizontal box plots comparing Iris feature distributions

Area and Scatter Plots

Area plots stack filled line segments, showing cumulative totals over an axis. Generate a stacked area plot using the Iris features:

PYTHON
df.plot(kind = 'area')

The area plot stacks the feature values to display cumulative totals:

Stacked area plot of Iris feature values

Disable stacking by setting stacked=False to overlay transparent filled line areas:

PYTHON
df.plot.area(stacked = False)

The unstacked area plot displays overlapping filled curves representing individual feature dimensions:

Unstacked area plot of Iris feature values

Scatter plots map individual samples as coordinate points. Plot sepal length against petal length:

PYTHON
df.plot.scatter(x = 'sepal_length', y = 'petal_length')

The scatter plot shows the positive correlation between Iris sepal length and petal length:

Scatter plot of Iris sepal length vs petal length

Color-code the points based on sepal width values by passing the column to the c parameter:

PYTHON
df.plot.scatter(x = 'sepal_length', y = 'petal_length', c = 'sepal_width')

The points are shaded based on sepal width values using the default colormap:

Scatter plot of sepal length vs petal length colored by sepal width

Overlay two scatter series onto the same plot by sharing the axes reference ax:

PYTHON
ax = df.plot.scatter(x = 'sepal_length', y = 'petal_length', label = 'Length');
df.plot.scatter(x = 'sepal_width', y = 'petal_width', label = 'Width', ax = ax, color = 'r')

The plot overlays the sepal length vs. petal length (blue) and sepal width vs. petal width (red) series:

Overlaid scatter plot comparing length and width coordinates on a single axes

Scale the marker diameters dynamically by passing values from a continuous column (e.g., petal_width) to the s parameter:

PYTHON
df.plot.scatter(x = 'sepal_length', y = 'petal_length', c = 'sepal_width', s = df['petal_width']*200)

The plot adjusts marker sizes based on petal width, adding a third variable dimension:

Bubble scatter plot of sepal length vs petal length with marker sizes scaled by petal width

Hex and Pie Plots

Hexbin plots aggregate crowded coordinate points into hexagonal bins. This is useful for large datasets to avoid overplotting.

Hexagonal Binning

Plot sepal length vs. petal length using hexbins, scaling the bin colors by sepal width:

PYTHON
df.plot.hexbin(x = 'sepal_length', y = 'petal_length', gridsize = 10, C = 'sepal_width')

The hexbin plot aggregates the density of data points within hexagonal regions:

Hexbin plot aggregating sepal length vs petal length points

Pie Charts

Pie charts display proportional relationships within a single column. Extract the first row of data:

PYTHON
d = df.iloc[0]
d
OUTPUT
sepal_length    5.1
sepal_width     3.5
petal_length    1.4
petal_width     0.2
Name: 0, dtype: float64

Plot the first row's feature values as a pie chart:

PYTHON
d.plot.pie(figsize = (10,10))

The pie chart compares the proportions of the features for the single sample:

Pie chart comparing Iris feature proportions for the first row observation

To compare multiple observations, transpose the first three rows of the DataFrame:

PYTHON
d = df.head(3).T
d
0 1 2
sepal_length 5.1 4.9 4.7
sepal_width 3.5 3.0 3.2
petal_length 1.4 1.4 1.3
petal_width 0.2 0.2 0.2

Generate separate pie subplots for each sample column:

PYTHON
d.plot.pie(subplots = True, figsize = (20, 20))

The grid displays three pie charts comparing feature proportions across the samples:

Grid of three pie charts comparing feature proportions for the first three Iris samples

Add percentage annotations and customize label sizes using autopct and fontsize:

PYTHON
d.plot.pie(subplots = True, figsize = (20, 20), fontsize = 16, autopct = '%.2f')

The pie charts display percentage values calculated for each wedge:

Grid of pie charts displaying percentage annotations with customized label sizes

If the sum of values in a Series is less than 1, Pandas plots an incomplete pie chart with an empty wedge representing the remainder:

PYTHON
x=[0.2]*4
print(x)
print(sum(x))
OUTPUT
[0.2, 0.2, 0.2, 0.2]
0.8

Generate the pie chart for this series:

PYTHON
series = pd.Series(x, index = ['a','b','c', 'd'], name = 'Pie Plot')
series.plot.pie()

The pie chart leaves a blank segment representing the 0.2 remainder:

Incomplete pie chart showing an empty wedge representing values summing to less than one

Scatter Matrix

A scatter matrix (pairs plot) displays all pairwise correlations across numerical columns in a single grid layout. Import the utility function:

PYTHON
from pandas.plotting import scatter_matrix

Plot the pairwise relationships, using kernel density estimates along the diagonal:

PYTHON
scatter_matrix(df, figsize= (8,8), diagonal='kde', color = 'r')
plt.show()

The scatter matrix displays pairwise scatter plots off the diagonal and univariate KDE plots along the diagonal:

Scatter matrix grid comparing pairwise relationships across numerical Iris columns

KDE Plots

Kernel Density Estimate (KDE) plots visualize the probability density of a continuous variable. Plot the density for the cumulative Series:

PYTHON
ts.plot.kde()

The KDE plot displays the smooth probability density curve of the Series values:

Kernel Density Estimate plot showing the probability density distribution of the Series

Andrews Curves

Andrews curves project multivariate data onto a one-dimensional curve using Fourier coefficients:

Where the coefficients correspond to the feature values, and is linearly spaced between and . Each row is plotted as a curve. Import the function:

PYTHON
from pandas.plotting import andrews_curves

Plot the Andrews curves grouped by sepal width:

PYTHON
andrews_curves(df, 'sepal_width')

The Andrews curves display the multivariate relationships as overlapping wave patterns:

Andrews curves plot projecting the Iris features

Subplots

To split DataFrame columns into separate subplots, set subplots=True. Set sharex=False to give each plot its own independent x-axis:

PYTHON
df.plot(subplots = True, sharex = False)
plt.tight_layout()

The grid stacks four line charts showing individual column values over the sample index:

Stack of line subplots displaying individual Iris feature dimensions

Customize the subplot layout grid by passing a layout tuple (e.g., (2,2) for a 2x2 grid):

PYTHON
df.plot(subplots = True, sharex = False, layout = (2,2), figsize = (16,8))
plt.tight_layout()

The 2x2 grid displays the four feature subplots in a clean rectangular arrangement:

Grid of 2x2 subplots displaying individual Iris feature dimensions

Conclusion

In this tutorial, you explored Pandas' built-in data visualization tools. By working with random walks, the Iris dataset, and the Titanic dataset, you generated line, bar, stacked bar, area, scatter, hexbin, pie, scatter matrix, KDE, Andrews curve, and grid subplot visualizations.

Key takeaways:

  • Convenient Wrapper: Pandas plotting wraps Matplotlib, allowing you to create charts directly from DataFrame and Series objects with minimal syntax.
  • Plot Customization: You can customize figure sizes, axes titles, legends, log scales, secondary y-axes, and color mapping directly from the .plot() call.
  • Grid Layouts: Splitting columns into subplots or using tools like scatter_matrix provides a quick overview of multi-variable relationships.
  • Analytical Selection: Selecting the right plot type (e.g., hexbins for dense scatter plots or boxen/box plots for quantiles) is key to understanding your data distribution before modeling.

Next steps:

  • Read Complete Seaborn Tutorial to learn high-level statistical visualization techniques.
  • Explore Matplotlib Crash Course to gain low-level layout control and create custom compound figures.
  • Apply these visualization techniques to your own dataset to identify patterns and correlations during exploratory data analysis.

Find this tutorial useful?

Subscribe to our YouTube channels for more practical production walk-throughs.

Discussion & Comments