scientific-skills/seaborn/references/function_reference.md
This document provides a comprehensive reference for all major seaborn functions, organized by category.
Purpose: Create a scatter plot with points representing individual observations.
Key Parameters:
- data - DataFrame, array, or dict of arrays
- x, y - Variables for x and y axes
- hue - Grouping variable for color encoding
- size - Grouping variable for size encoding
- style - Grouping variable for marker style
- palette - Color palette name or list
- hue_order - Order for categorical hue levels
- hue_norm - Normalization for numeric hue (tuple or Normalize object)
- sizes - Size range for size encoding (tuple or dict)
- size_order - Order for categorical size levels
- size_norm - Normalization for numeric size
- markers - Marker style(s) (string, list, or dict)
- style_order - Order for categorical style levels
- legend - How to draw legend: "auto", "brief", "full", or False
- ax - Matplotlib axes to plot on

Example:
sns.scatterplot(data=df, x='height', y='weight',
hue='gender', size='age', style='smoker',
palette='Set2', sizes=(20, 200))
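
The call above assumes a DataFrame `df` with those columns already exists. A minimal runnable sketch follows; the synthetic data, the axis labels, and the headless `Agg` backend are assumptions made for illustration, not part of the reference.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption) so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data, invented for illustration only
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "height": rng.normal(170, 10, n),
    "weight": rng.normal(70, 12, n),
    "gender": rng.choice(["F", "M"], n),
    "age": rng.integers(18, 80, n),
    "smoker": rng.choice(["yes", "no"], n),
})

# scatterplot() is axes-level: it draws on the current axes and returns them
ax = sns.scatterplot(data=df, x="height", y="weight",
                     hue="gender", size="age", style="smoker",
                     palette="Set2", sizes=(20, 200))
ax.set(xlabel="Height (cm)", ylabel="Weight (kg)")
```

Because the return value is a Matplotlib Axes, further customization goes through the usual Matplotlib methods.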
Purpose: Draw a line plot with automatic aggregation and confidence intervals for repeated measures.
Key Parameters:
- data - DataFrame, array, or dict of arrays
- x, y - Variables for x and y axes
- hue - Grouping variable for color encoding
- size - Grouping variable for line width
- style - Grouping variable for line style (dashes)
- units - Grouping variable for sampling units (no aggregation within units)
- estimator - Function for aggregating across observations (default: mean)
- errorbar - Method for error bars: "sd", "se", "pi", ("ci", level), ("pi", level), or None
- n_boot - Number of bootstrap iterations for CI computation
- seed - Random seed for reproducible bootstrapping
- sort - Sort data before plotting
- err_style - "band" or "bars" for error representation
- err_kws - Additional parameters for error representation
- markers - Marker style(s) for emphasizing data points
- dashes - Dash style(s) for lines
- legend - How to draw legend
- ax - Matplotlib axes to plot on

Example:
sns.lineplot(data=timeseries, x='time', y='signal',
hue='condition', style='subject',
errorbar=('ci', 95), markers=True)
Purpose: Figure-level interface for drawing relational plots (scatter or line) onto a FacetGrid.
Key Parameters:
All parameters from scatterplot() and lineplot(), plus:
- kind - "scatter" or "line"
- col - Categorical variable for column facets
- row - Categorical variable for row facets
- col_wrap - Wrap columns after this many columns
- col_order - Order for column facet levels
- row_order - Order for row facet levels
- height - Height of each facet in inches
- aspect - Aspect ratio (width = height * aspect)
- facet_kws - Additional parameters for FacetGrid

Example:
sns.relplot(data=df, x='time', y='measurement',
hue='treatment', style='batch',
col='cell_line', row='timepoint',
kind='line', height=3, aspect=1.5)
Purpose: Plot univariate or bivariate histograms with flexible binning.
Key Parameters:
- data - DataFrame, array, or dict
- x, y - Variables (y optional for bivariate)
- hue - Grouping variable
- weights - Variable for weighting observations
- stat - Aggregate statistic: "count", "frequency", "probability", "percent", "density"
- bins - Number of bins, bin edges, or method ("auto", "fd", "doane", "scott", "stone", "rice", "sturges", "sqrt")
- binwidth - Width of bins (overrides bins)
- binrange - Range for binning (tuple)
- discrete - Treat x as discrete (centers bars on values)
- cumulative - Compute cumulative distribution
- common_bins - Use same bins for all hue levels
- common_norm - Normalize across hue levels
- multiple - How to handle hue: "layer", "dodge", "stack", "fill"
- element - Visual element: "bars", "step", "poly"
- fill - Fill bars/elements
- shrink - Scale bar width (for multiple="dodge")
- kde - Overlay KDE estimate
- kde_kws - Parameters for KDE
- line_kws - Parameters for step/poly elements
- thresh - Minimum count threshold for bins
- pthresh - Minimum probability threshold
- pmax - Maximum probability for color scaling
- log_scale - Log scale for axis (bool or base)
- legend - Whether to show legend
- ax - Matplotlib axes

Example:
sns.histplot(data=df, x='measurement', hue='condition',
stat='density', bins=30, kde=True,
multiple='layer', alpha=0.5)
Purpose: Plot univariate or bivariate kernel density estimates.
Key Parameters:
- data - DataFrame, array, or dict
- x, y - Variables (y optional for bivariate)
- hue - Grouping variable
- weights - Variable for weighting observations
- palette - Color palette
- hue_order - Order for hue levels
- hue_norm - Normalization for numeric hue
- multiple - How to handle hue: "layer", "stack", "fill"
- common_norm - Normalize across hue levels
- common_grid - Use same grid for all hue levels
- cumulative - Compute cumulative distribution
- bw_method - Method for bandwidth: "scott", "silverman", or scalar
- bw_adjust - Bandwidth multiplier (higher = smoother)
- log_scale - Log scale for axis
- levels - Number or values for contour levels (bivariate)
- thresh - Minimum density threshold for contours
- gridsize - Grid resolution
- cut - Extension beyond data extremes (in bandwidth units)
- clip - Data range for curve (tuple)
- fill - Fill area under curve/contours
- legend - Whether to show legend
- ax - Matplotlib axes

Example:
# Univariate
sns.kdeplot(data=df, x='measurement', hue='condition',
fill=True, common_norm=False, bw_adjust=1.5)
# Bivariate
sns.kdeplot(data=df, x='var1', y='var2',
fill=True, levels=10, thresh=0.05)
Purpose: Plot empirical cumulative distribution functions.
Key Parameters:
- data - DataFrame, array, or dict
- x, y - Variables (specify one)
- hue - Grouping variable
- weights - Variable for weighting observations
- stat - "proportion" or "count"
- complementary - Plot complementary CDF (1 - ECDF)
- palette - Color palette
- hue_order - Order for hue levels
- hue_norm - Normalization for numeric hue
- log_scale - Log scale for axis
- legend - Whether to show legend
- ax - Matplotlib axes

Example:
sns.ecdfplot(data=df, x='response_time', hue='treatment',
stat='proportion', complementary=False)
Purpose: Plot tick marks showing individual observations along an axis.
Key Parameters:
- data - DataFrame, array, or dict
- x, y - Variable (specify one)
- hue - Grouping variable
- height - Height of ticks (proportion of axis)
- expand_margins - Add margin space for rug
- palette - Color palette
- hue_order - Order for hue levels
- hue_norm - Normalization for numeric hue
- legend - Whether to show legend
- ax - Matplotlib axes

Example:
sns.rugplot(data=df, x='value', hue='category', height=0.05)
Purpose: Figure-level interface for distribution plots onto a FacetGrid.
Key Parameters:
All parameters from histplot(), kdeplot(), and ecdfplot(), plus:
- kind - "hist", "kde", "ecdf"
- rug - Add rug plot on marginal axes
- rug_kws - Parameters for rug plot
- col - Categorical variable for column facets
- row - Categorical variable for row facets
- col_wrap - Wrap columns
- col_order - Order for column facets
- row_order - Order for row facets
- height - Height of each facet
- aspect - Aspect ratio
- facet_kws - Additional parameters for FacetGrid

Example:
sns.displot(data=df, x='measurement', hue='treatment',
col='timepoint', kind='kde', fill=True,
height=3, aspect=1.5, rug=True)
Purpose: Draw a bivariate plot with marginal univariate plots.
Key Parameters:
- data - DataFrame
- x, y - Variables for x and y axes
- hue - Grouping variable
- kind - "scatter", "kde", "hist", "hex", "reg", "resid"
- height - Size of the figure (square)
- ratio - Ratio of joint to marginal axes
- space - Space between joint and marginal axes
- dropna - Drop missing values
- xlim, ylim - Axis limits (tuples)
- marginal_ticks - Show ticks on marginal axes
- joint_kws - Parameters for joint plot
- marginal_kws - Parameters for marginal plots
- hue_order - Order for hue levels
- palette - Color palette

Example:
sns.jointplot(data=df, x='var1', y='var2', hue='group',
kind='scatter', height=6, ratio=4,
joint_kws={'alpha': 0.5})
Purpose: Plot pairwise relationships in a dataset.
Key Parameters:
- data - DataFrame
- hue - Grouping variable for color encoding
- hue_order - Order for hue levels
- palette - Color palette
- vars - Variables to plot (default: all numeric)
- x_vars, y_vars - Variables for x and y axes (non-square grid)
- kind - "scatter", "kde", "hist", "reg"
- diag_kind - "auto", "hist", "kde", None
- markers - Marker style(s)
- height - Height of each facet
- aspect - Aspect ratio
- corner - Plot only lower triangle
- dropna - Drop missing values
- plot_kws - Parameters for non-diagonal plots
- diag_kws - Parameters for diagonal plots
- grid_kws - Parameters for PairGrid

Example:
sns.pairplot(data=df, hue='species', palette='Set2',
vars=['sepal_length', 'sepal_width', 'petal_length'],
corner=True, height=2.5)
Purpose: Draw a categorical scatterplot with jittered points.
Key Parameters:
- data - DataFrame, array, or dict
- x, y - Variables (one categorical, one continuous)
- hue - Grouping variable
- order - Order for categorical levels
- hue_order - Order for hue levels
- jitter - Amount of jitter: True, float, or False
- dodge - Separate hue levels side-by-side
- orient - "v" or "h" (usually inferred)
- color - Single color for all elements
- palette - Color palette
- size - Marker size
- edgecolor - Marker edge color
- linewidth - Marker edge width
- native_scale - Use numeric scale for categorical axis
- formatter - Formatter for categorical axis
- legend - Whether to show legend
- ax - Matplotlib axes

Example:
sns.stripplot(data=df, x='day', y='total_bill',
hue='sex', dodge=True, jitter=0.2)
Purpose: Draw a categorical scatterplot with non-overlapping points.
Key Parameters:
Same as stripplot(), except:
- No jitter parameter (point positions are computed to avoid overlap)
- size - Marker size (important for avoiding overlap)
- warn_thresh - Threshold for warning about too many points (default: 0.05)

Note: Computationally intensive for large datasets. Use stripplot for >1000 points.
Example:
sns.swarmplot(data=df, x='day', y='total_bill',
hue='time', dodge=True, size=5)
Purpose: Draw a box plot showing quartiles and outliers.
Key Parameters:
- data - DataFrame, array, or dict
- x, y - Variables (one categorical, one continuous)
- hue - Grouping variable
- order - Order for categorical levels
- hue_order - Order for hue levels
- orient - "v" or "h"
- color - Single color for boxes
- palette - Color palette
- saturation - Color saturation intensity
- width - Width of boxes
- dodge - Separate hue levels side-by-side
- fliersize - Size of outlier markers
- linewidth - Box line width
- whis - IQR multiplier for whiskers (default: 1.5)
- notch - Draw notched boxes
- showcaps - Show whisker caps
- showmeans - Show mean value
- meanprops - Properties for mean marker
- boxprops - Properties for boxes
- whiskerprops - Properties for whiskers
- capprops - Properties for caps
- flierprops - Properties for outliers
- medianprops - Properties for median line
- native_scale - Use numeric scale
- formatter - Formatter for categorical axis
- legend - Whether to show legend
- ax - Matplotlib axes

Example:
sns.boxplot(data=df, x='day', y='total_bill',
hue='smoker', palette='Set3',
showmeans=True, notch=True)
Purpose: Draw a violin plot combining boxplot and KDE.
Key Parameters:
Same as boxplot(), plus:
- bw_method - KDE bandwidth method
- bw_adjust - KDE bandwidth multiplier
- cut - KDE extension beyond extremes
- density_norm - "area", "count", "width"
- inner - "box", "quartile", "point", "stick", None
- split - Split violins for hue comparison
- scale - Scaling method: "area", "count", "width" (deprecated in seaborn 0.13; use density_norm)
- scale_hue - Scale across hue levels (deprecated in seaborn 0.13; use common_norm)
- gridsize - KDE grid resolution

Example:
sns.violinplot(data=df, x='day', y='total_bill',
hue='sex', split=True, inner='quartile',
palette='muted')
Purpose: Draw enhanced box plot for larger datasets showing more quantiles.
Key Parameters:
Same as boxplot(), plus:
- k_depth - "tukey", "proportion", "trustworthy", "full", or int
- outlier_prop - Proportion of data as outliers
- trust_alpha - Alpha for trustworthy depth
- showfliers - Show outlier points

Example:
sns.boxenplot(data=df, x='day', y='total_bill',
hue='time', palette='Set2')
Purpose: Draw a bar plot with error bars showing statistical estimates.
Key Parameters:
- data - DataFrame, array, or dict
- x, y - Variables (one categorical, one continuous)
- hue - Grouping variable
- order - Order for categorical levels
- hue_order - Order for hue levels
- estimator - Aggregation function (default: mean)
- errorbar - Error representation: "sd", "se", "pi", ("ci", level), ("pi", level), or None
- n_boot - Bootstrap iterations
- seed - Random seed
- units - Identifier for sampling units
- weights - Observation weights
- orient - "v" or "h"
- color - Single bar color
- palette - Color palette
- saturation - Color saturation
- width - Bar width
- dodge - Separate hue levels side-by-side
- errcolor - Error bar color (deprecated in seaborn 0.13; use err_kws)
- errwidth - Error bar line width (deprecated in seaborn 0.13; use err_kws)
- capsize - Error bar cap width
- native_scale - Use numeric scale
- formatter - Formatter for categorical axis
- legend - Whether to show legend
- ax - Matplotlib axes

Example:
sns.barplot(data=df, x='day', y='total_bill',
hue='sex', estimator='median',
errorbar=('ci', 95), capsize=0.1)
Purpose: Show counts of observations in each categorical bin.
Key Parameters:
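Same as barplot(), but only one of x or y is given, and the estimator and error bar parameters do not apply. In addition:
- stat - "count" or "percent" (seaborn 0.13 also accepts "proportion" and "probability")

Example: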
sns.countplot(data=df, x='day', hue='time',
palette='pastel', dodge=True)
Purpose: Show point estimates and confidence intervals with connecting lines.
Key Parameters:
Same as barplot(), plus:
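- markers - Marker style(s)
- linestyles - Line style(s)
- scale - Scale for markers (deprecated in seaborn 0.13; set markersize and linewidth instead)
- join - Connect points with lines (deprecated in seaborn 0.13; use linestyle="none" to disable)
- capsize - Error bar cap width

Example: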
sns.pointplot(data=df, x='time', y='total_bill',
hue='sex', markers=['o', 's'],
linestyles=['-', '--'], capsize=0.1)
Purpose: Figure-level interface for categorical plots onto a FacetGrid.
Key Parameters:
All parameters from categorical plots, plus:
- kind - "strip", "swarm", "box", "violin", "boxen", "bar", "point", "count"
- col - Categorical variable for column facets
- row - Categorical variable for row facets
- col_wrap - Wrap columns
- col_order - Order for column facets
- row_order - Order for row facets
- height - Height of each facet
- aspect - Aspect ratio
- sharex, sharey - Share axes across facets
- legend - Whether to show legend
- legend_out - Place legend outside figure
- facet_kws - Additional FacetGrid parameters

Example:
sns.catplot(data=df, x='day', y='total_bill',
hue='smoker', col='time',
kind='violin', split=True,
height=4, aspect=0.8)
Purpose: Plot data and a linear regression fit.
Key Parameters:
- data - DataFrame
- x, y - Variables or data vectors
- x_estimator - Apply estimator to x bins
- x_bins - Bin x for estimator
- x_ci - CI for binned estimates
- scatter - Show scatter points
- fit_reg - Plot regression line
- ci - CI for regression estimate (int or None)
- n_boot - Bootstrap iterations for CI
- units - Identifier for sampling units
- seed - Random seed
- order - Polynomial regression order
- logistic - Fit logistic regression
- lowess - Fit lowess smoother
- robust - Fit robust regression
- logx - Log-transform x
- x_partial, y_partial - Partial regression (regress out variables)
- truncate - Limit regression line to data range
- dropna - Drop missing values
- x_jitter, y_jitter - Add jitter to data
- label - Label for legend
- color - Color for all elements
- marker - Marker style
- scatter_kws - Parameters for scatter
- line_kws - Parameters for regression line
- ax - Matplotlib axes

Example:
# order > 1 cannot be combined with robust, logistic, lowess, or logx
sns.regplot(data=df, x='total_bill', y='tip',
order=2, ci=95,
scatter_kws={'alpha': 0.5})
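
The model options `order` (when greater than 1), `logistic`, `lowess`, `robust`, and `logx` are mutually exclusive, so a call should request only one of them. The sketch below fits a second-order polynomial to synthetic data and compares the drawn curve with a direct `numpy.polyfit` fit. The data and the names `fit_x`, `fit_y`, and `coeffs` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with a mild quadratic trend
rng = np.random.default_rng(6)
x = np.linspace(0, 10, 80)
df = pd.DataFrame({
    "total_bill": x,
    "tip": 0.5 + 0.3 * x - 0.02 * x**2 + rng.normal(0, 0.1, x.size),
})

fig, ax = plt.subplots()
# ci=None skips the bootstrap, so only the fitted curve is drawn
sns.regplot(data=df, x="total_bill", y="tip", order=2, ci=None,
            scatter_kws={"alpha": 0.5}, ax=ax)

# The regression curve is the only Line2D on the axes
fit_x, fit_y = ax.lines[0].get_data()
coeffs = np.polyfit(df["total_bill"], df["tip"], 2)
```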
Purpose: Figure-level interface for regression plots onto a FacetGrid.
Key Parameters:
All parameters from regplot(), plus:
- hue - Grouping variable
- col - Column facets
- row - Row facets
- palette - Color palette
- col_wrap - Wrap columns
- height - Facet height
- aspect - Aspect ratio
- markers - Marker style(s)
- sharex, sharey - Share axes
- hue_order - Order for hue levels
- col_order - Order for column facets
- row_order - Order for row facets
- legend - Whether to show legend
- legend_out - Place legend outside
- facet_kws - FacetGrid parameters

Example:
sns.lmplot(data=df, x='total_bill', y='tip',
hue='smoker', col='time', row='sex',
height=3, aspect=1.2, ci=None)
Purpose: Plot residuals of a regression.
Key Parameters:
Same as regplot(), but:
- lowess - Fit lowess smoother to residuals

Example:
sns.residplot(data=df, x='x', y='y', lowess=True,
scatter_kws={'alpha': 0.5})
Purpose: Plot rectangular data as a color-encoded matrix.
Key Parameters:
- data - 2D array-like data
- vmin, vmax - Anchor values for colormap
- cmap - Colormap name or object
- center - Value at colormap center
- robust - Use robust quantiles for colormap range
- annot - Annotate cells: True, False, or array
- fmt - Format string for annotations (e.g., ".2f")
- annot_kws - Parameters for annotations
- linewidths - Width of cell borders
- linecolor - Color of cell borders
- cbar - Draw colorbar
- cbar_kws - Colorbar parameters
- cbar_ax - Axes for colorbar
- square - Force square cells
- xticklabels, yticklabels - Tick labels (True, False, int, or list)
- mask - Boolean array to mask cells
- ax - Matplotlib axes

Example:
# Correlation matrix of the numeric columns (assumes numpy is imported as np)
corr = df.select_dtypes('number').corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
cmap='coolwarm', center=0, square=True,
linewidths=1, cbar_kws={'shrink': 0.8})
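
A self-contained version of the masked correlation heatmap is sketched below. The DataFrame is synthetic, and the extra `label` column is a hypothetical non-numeric column added to show why the numeric columns are selected before calling `corr()`. The mask hides the upper triangle and the diagonal, so only the cells below the diagonal are drawn and annotated.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: four numeric columns plus one text column
rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
df["label"] = "x"

# Restrict to numeric columns before computing correlations
corr = df.select_dtypes("number").corr()

# True marks cells to hide: the upper triangle including the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots()
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", center=0, square=True,
            linewidths=1, cbar_kws={"shrink": 0.8}, ax=ax)
```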
Purpose: Plot a hierarchically-clustered heatmap.
Key Parameters:
All parameters from heatmap(), plus:
- pivot_kws - Parameters for pivoting (if needed)
- method - Linkage method: "single", "complete", "average", "weighted", "centroid", "median", "ward"
- metric - Distance metric for clustering
- standard_scale - Standardize data: 0 (rows), 1 (columns), or None
- z_score - Z-score normalize data: 0 (rows), 1 (columns), or None
- row_cluster, col_cluster - Cluster rows/columns
- row_linkage, col_linkage - Precomputed linkage matrices
- row_colors, col_colors - Additional color annotations
- dendrogram_ratio - Ratio of dendrogram to heatmap
- colors_ratio - Ratio of color annotations to heatmap
- cbar_pos - Colorbar position (tuple: x, y, width, height)
- tree_kws - Parameters for dendrogram
- figsize - Figure size

Example:
sns.clustermap(data, method='average', metric='euclidean',
z_score=0, cmap='viridis',
row_colors=row_colors, col_colors=col_colors,
figsize=(12, 12), dendrogram_ratio=0.1)
Purpose: Multi-plot grid for plotting conditional relationships.
Initialization:
g = sns.FacetGrid(data, row=None, col=None, hue=None,
col_wrap=None, sharex=True, sharey=True,
height=3, aspect=1, palette=None,
row_order=None, col_order=None, hue_order=None,
hue_kws=None, dropna=False, legend_out=True,
despine=True, margin_titles=False,
xlim=None, ylim=None, subplot_kws=None,
gridspec_kws=None)
Methods:
- map(func, *args, **kwargs) - Apply function to each facet
- map_dataframe(func, *args, **kwargs) - Apply function with full DataFrame
- set_axis_labels(x_var, y_var) - Set axis labels
- set_titles(template, **kwargs) - Set subplot titles
- set(**kwargs) - Set attributes on all axes
- add_legend(legend_data, title, label_order, **kwargs) - Add legend
- savefig(*args, **kwargs) - Save figure

Example:
g = sns.FacetGrid(df, col='time', row='sex', hue='smoker',
height=3, aspect=1.5, margin_titles=True)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.7)
g.add_legend()
g.set_axis_labels('Total Bill ($)', 'Tip ($)')
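# With margin_titles=True, titles come from col_template and row_template
g.set_titles(col_template='{col_name}', row_template='{row_name}')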
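
A runnable sketch of the same workflow is shown below on synthetic data (all values are invented for illustration). It uses `map_dataframe()`, which passes each facet's subset as `data=` so that variables can be named by keyword. Because the grid is created with `margin_titles=True`, the titles are set through `col_template` and `row_template`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption)
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic tips-like data
rng = np.random.default_rng(8)
n = 80
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "sex": rng.choice(["Female", "Male"], n),
    "smoker": rng.choice(["Yes", "No"], n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

g = sns.FacetGrid(df, col="time", row="sex", hue="smoker",
                  height=3, aspect=1.5, margin_titles=True)
# Each facet receives its own subset of df via the data= keyword
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip", alpha=0.7)
g.add_legend()
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.set_titles(col_template="{col_name}", row_template="{row_name}")
```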
Purpose: Grid for plotting pairwise relationships in a dataset.
Initialization:
g = sns.PairGrid(data, hue=None, vars=None,
x_vars=None, y_vars=None,
hue_order=None, palette=None,
hue_kws=None, corner=False,
diag_sharey=True, height=2.5,
aspect=1, layout_pad=0.5,
despine=True, dropna=False)
Methods:
- map(func, **kwargs) - Apply function to all subplots
- map_diag(func, **kwargs) - Apply to diagonal
- map_offdiag(func, **kwargs) - Apply to off-diagonal
- map_upper(func, **kwargs) - Apply to upper triangle
- map_lower(func, **kwargs) - Apply to lower triangle
- add_legend(legend_data, **kwargs) - Add legend
- savefig(*args, **kwargs) - Save figure

Example:
# corner=True is omitted because it removes the upper-triangle axes that map_upper needs
g = sns.PairGrid(df, hue='species', vars=['a', 'b', 'c', 'd'],
height=2.5)
g.map_upper(sns.scatterplot, alpha=0.5)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, kde=True)
g.add_legend()
Purpose: Grid for bivariate plot with marginal univariate plots.
Initialization:
g = sns.JointGrid(data=None, x=None, y=None, hue=None,
height=6, ratio=5, space=0.2,
dropna=False, xlim=None, ylim=None,
marginal_ticks=False, hue_order=None,
palette=None)
Methods:
- plot(joint_func, marginal_func, **kwargs) - Plot both joint and marginals
- plot_joint(func, **kwargs) - Plot joint distribution
- plot_marginals(func, **kwargs) - Plot marginal distributions
- refline(x, y, **kwargs) - Add reference line
- set_axis_labels(xlabel, ylabel, **kwargs) - Set axis labels
- savefig(*args, **kwargs) - Save figure

Example:
g = sns.JointGrid(data=df, x='x', y='y', hue='group',
height=6, ratio=5, space=0.2)
g.plot_joint(sns.scatterplot, alpha=0.5)
g.plot_marginals(sns.histplot, kde=True)
g.set_axis_labels('Variable X', 'Variable Y')