Data visualization python cheat sheet


This section provides a few cheat sheets related to Python, data wrangling, and data visualization. Even with a solid understanding of Python and its libraries, it's almost impossible to remember the syntax of every function in the ecosystem. That's where cheatsheets are useful 🔥!

Matplotlib cheatsheet

Datacamp provides a cheatsheet describing the basics of matplotlib. Matplotlib is the most widely used library for data visualization with Python. You can read more about it on its dedicated page.

The following 2 cheatsheets from the official matplotlib repository are also very handy:

Matplotlib cheatsheet by matplotlib (page 1).

Matplotlib cheatsheet by matplotlib (page 2).

Seaborn cheatsheet

Datacamp provides a cheatsheet describing the basics of seaborn. Seaborn is also a widely used library for data visualization with Python. It lets you produce very clean charts with less code. You can read more about it on its dedicated page.

Pandas cheatsheet

Datacamp provides a cheatsheet describing the basics of pandas. Pandas is mainly used for data manipulation with Python, but also offers some dataviz helpers.

Timeseries

Animation

Create beautiful, customizable plots easily

This cheat-sheet contains the elements of a plot you will most commonly need in a clear and organized fashion, with code and examples. Before you create any plot, it is recommended to scroll through this cheat-sheet to get a clear idea of how you are going to construct the visualization — after all, your plot is only as clear to the audience as it is in your mind.

All images created by author unless explicitly stated otherwise.

Steps to creating a visualization

  1. Prepare the data according to how many dimensions your plot has (a distribution plot has one dimension, a boxplot has two, etc.).
  2. Initiate the graph world (the ‘world’ upon which the plot rests) by setting its aesthetics, such as style or palette.
  3. Create the plot.
  4. Customize the plot with titles, labels, and additional features.
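The four steps above can be sketched end to end as follows. This is a minimal illustration rather than a canonical recipe: the data is synthetic, and the style, palette, figure size, and label text are arbitrary choices made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# 1. Prepare the data: a scatterplot needs two dimensions (synthetic values here).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)

# 2. Initiate the graph world: style, palette, and figure size.
sns.set_style("whitegrid")
sns.set_palette("deep")
plt.figure(figsize=(8, 5))

# 3. Create the plot.
ax = sns.scatterplot(x=x, y=y)

# 4. Customize the plot.
plt.title("Synthetic example")
plt.xlabel("x")
plt.ylabel("y")
```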

Imports

The two most popular libraries for Python plotting — matplotlib and seaborn — should be loaded under their common aliases, plt and sns, for quick access to their functions and properties without needing to type out their complete lengthy names.

import matplotlib.pyplot as plt
import seaborn as sns

Initiating the Graph World

Creating a figure is necessary to specify the graph size.

plt.figure(figsize=(horizontal_length,vertical_length))
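As a small sketch of the call above, note that matplotlib interprets figsize in inches, as (width, height). The values 10 and 4 are arbitrary and chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# figsize is (width, height), measured in inches.
fig = plt.figure(figsize=(10, 4))

# The size can be read back from the figure object.
width, height = fig.get_size_inches()
print(width, height)
```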

Seaborn styles control the background and grid of the graph space. There are five built-in styles in seaborn (darkgrid, whitegrid, dark, white, and ticks), which can be loaded using sns.set_style.

sns.set_style(name_of_style)
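One way to see what each style changes is to inspect its settings with sns.axes_style, which returns the parameters of a style without applying it. The sketch below only checks whether each style enables the grid; picking that one setting is a choice made for this example.

```python
import seaborn as sns

# The built-in seaborn style names.
style_names = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style returns a dict of matplotlib settings for the named style.
grid_enabled = {name: sns.axes_style(name)["axes.grid"] for name in style_names}
print(grid_enabled)

# Apply one of the styles to all plots created afterwards.
sns.set_style("whitegrid")
```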

Seaborn contexts are built-in presets that scale the elements of the plot, such as the size of the labels and the width of the lines, without changing the overall style. The four contexts, from smallest to largest, are paper, notebook, talk, and poster.

sns.set_context(name_of_context)
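The scaling effect of each context can be inspected with sns.plotting_context, which returns the settings of a context without applying it. The sketch below compares the base font size only, as one representative setting.

```python
import seaborn as sns

# The built-in context names, from smallest to largest scaling.
context_names = ["paper", "notebook", "talk", "poster"]

# plotting_context returns a dict of matplotlib settings for the named context.
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in context_names}
print(font_sizes)

# Apply one of the contexts to all plots created afterwards.
sns.set_context("talk")
```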

All plots have whitegrid style. This was set separately.

Seaborn color palettes provide the set of colors used in the chart, which can give your plot the feel or context you want to convey to your audience.

Seaborn has dozens of curated palettes. They are loaded with

sns.set_palette(name_of_palette)

Four palettes are visualized in a kde plot.

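You can access the names of seaborn’s many palettes by deliberately setting an incorrect palette. In older seaborn versions the resulting error message lists the valid names; newer versions may only report that the name is invalid.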

sns.set_palette('a string deliberately entered to get an error')
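If the error message does not list the names, a possible alternative is to read them directly. The sketch below assumes two things that are not in the original article: that seaborn's own palettes are stored in the sns.palettes.SEABORN_PALETTES dictionary (this is not part of the documented API and could change), and that the remaining named palettes are matplotlib colormaps, whose names plt.colormaps() returns.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Assumption: seaborn keeps its own palettes (deep, muted, ...) in this dict.
seaborn_names = sorted(sns.palettes.SEABORN_PALETTES)

# Matplotlib colormap names can also be passed to seaborn as palette names.
colormap_names = plt.colormaps()

print(seaborn_names)
print(len(colormap_names), "matplotlib colormaps available")
```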

Each palette can then be viewed with seaborn’s palplot (palette plot). The first item passed into seaborn’s color_palette builder is the name of the palette, and the second is the number of colors to display. In real plots, seaborn determines this number automatically, but you can control it in the palplot.

sns.palplot(sns.color_palette('GnBu', 15))
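To make the call above more concrete, the sketch below shows that color_palette returns a list-like object of RGB tuples, which can also be converted to hex codes. The variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import seaborn as sns

# Build a 15-color version of the 'GnBu' palette.
colors = sns.color_palette("GnBu", 15)

# Each entry is an (r, g, b) tuple; as_hex converts them to hex strings.
hex_codes = colors.as_hex()
print(len(colors), hex_codes[0])

# Draw the palette as a row of colored squares.
sns.palplot(colors)
```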

Seaborn color palettes can also be set manually by passing in hex codes.

sns.set_palette(['#ffffff', ...])
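A short sketch of a manual palette follows. The three hex codes are arbitrary values picked for this example, not colors from the original article.

```python
import seaborn as sns

# Hypothetical custom colors, chosen only for illustration.
custom_colors = ["#1b9e77", "#d95f02", "#7570b3"]
sns.set_palette(custom_colors)

# Calling color_palette with no arguments returns the palette currently in use.
active = sns.color_palette()
print(active.as_hex())
```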

Creating the Plot

Most plots in seaborn are created with sns.name_of_plot(x=x, y=y), with the arguments depending on how many dimensions the plot has. Recent seaborn versions expect x and y to be passed as keyword arguments. A one-dimensional plot like a distribution plot needs only an x, whereas a scatterplot needs an x and a y.

Distribution plots

Distribution plots usually show univariate data (data with only one dimension) and reveal where the data points are concentrated along a number line. Seaborn also has adaptations for two-dimensional distribution plots, which show the distributions of two variables simultaneously.

  • The distplot plots a one-dimensional kdeplot with a histogram. It is deprecated in recent seaborn versions in favor of histplot and displot.
  • The rugplot plots a tick for each data point to show clusters.
  • The kdeplot plots the curve of the distribution when given one dimension of data, and a contour plot when given two.
  • The jointplot plots a scatterplot with a histogram on each side showing the distribution of the respective dimension.
  • The pairplot, commonly used for exploratory data analysis (EDA), plots each dimension of data against each other, displaying a variable’s histogram when the variable is plotted against itself. This plot takes in a pandas DataFrame.

Quantitative and qualitative variable relationships

These plots combine two types of variables — quantitative (e.g. 13, 16.54, 94.004, continuous) and qualitative (e.g. red, blue, male, discrete).

  • The stripplot spreads the data points of each category horizontally (jitter) so that multiple data points of the same value can be seen. It takes a qualitative x and a quantitative y.
  • The swarmplot, similar to the stripplot, spreads data points horizontally, but in a structured way that eliminates overlapping points.
  • The violinplot plots a mirrored distribution on both sides of the quantitative axis and is often seen as a favorable alternative to the boxplot.
  • The boxplot plots a five-number summary of the data: the minimum, 1st quartile (25th percentile), median, 3rd quartile (75th percentile), and maximum. Unfortunately, it tends to hide irregular distributions.
  • The boxenplot expands on the boxplot with additional quantile boxes, showing the tails and giving a more accurate depiction of the distribution.
  • The standard barplot displays bars whose height corresponds to the value (by default, the mean of each category). The countplot expresses the same visualization but takes in only one variable and displays the number of items in each distinct value.
  • The pointplot shows a single point estimate (the mean by default) with error bars for each category. This plot is great for comparing a quantitative variable across qualitative categories.

Quantitative relationships

These plots show the relationship between two quantitative variables.

  • The scatterplot plots two quantitative variables against each other.
  • The lineplot plots a quantitative variable along a time variable, which may be quantitative or a date.

Statistical models

Statistical model visualizations take advantage of statistical models to visualize the nature of the data. Within many of the statistical model visualizations there are parameters to adjust the nature of the visualization.

  • The residplot displays the residuals of a linear regression (the vertical distance between each data point and the fitted regression line).
  • The lmplot displays a linear regression fit with confidence intervals on a scatterplot. This plot has several parameters (described at length in the seaborn documentation) that can be used to adjust the nature of the plot. For instance, setting logistic=True will assume the y-variable is binary and fit a logistic (sigmoid) regression model.
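To show what the residplot visualizes, the sketch below first computes the residuals by hand with a NumPy least-squares line fit and then draws both plots. The data is synthetic, and using np.polyfit for the manual fit is a choice made for this example rather than a description of seaborn's internals.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
df["y"] = 3 * df["x"] + 1 + rng.normal(scale=2.0, size=50)

# A residual is the observed y minus the y predicted by the fitted line.
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
residuals = df["y"] - (slope * df["x"] + intercept)

# residplot draws these residuals against x.
plt.figure(figsize=(6, 4))
resid_ax = sns.residplot(data=df, x="x", y="y")

# lmplot creates its own figure and returns a FacetGrid.
grid = sns.lmplot(data=df, x="x", y="y")
```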

Customize the plot

Customizing the plot after it is created involves adding features on top of the plot to increase readability or information.

x and y labels can be added with two commands: plt.xlabel('X Label') and plt.ylabel('Y Label').

Title labels can be added with the command plt.title('Title').

Tick mark rotations can be added with plt.xticks(rotation=90) (and yticks for y-axis tick labels), where 90 can be substituted with any suitable rotation degree.

Axis value ranges can be specified with plt.xlim(lower_limit, upper_limit) and plt.ylim(lower_limit, upper_limit). All values displayed for that dimension will be in between the specified limits. These can also be used to set appropriate y-axis baselines for figures.

A legend, if it is not included by default, can be added with plt.legend(). The loc parameter dictates where the legend should be placed. By default, matplotlib picks the location that overlaps least with the data.

Displaying the plot is as easy as plt.show(). Although not completely necessary, it gets rid of some of the text matplotlib and seaborn print out and finalizes the plot.
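The customization commands from this section can be combined as in the sketch below. The data points, label text, axis limits, and rotation angle are placeholder values chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
plt.plot([0, 2, 4, 6, 8, 10], [0, 4, 9, 13, 18, 22], label="series A")

# Axis labels and title.
plt.xlabel("X Label")
plt.ylabel("Y Label")
plt.title("Title")

# Axis ranges and tick label rotation.
plt.xlim(0, 10)
plt.ylim(0, 25)
plt.xticks(rotation=45)

# Legend at an explicit location instead of the default 'best'.
plt.legend(loc="upper left")

ax = plt.gca()  # handle to the current axes, useful for inspection
# plt.show()  # uncomment in an interactive session to display the plot
```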

Be sure to bookmark this page for easy reference! If you enjoyed, you may also enjoy the Ultimate Data Mining and Machine Learning Cheat Sheet, a field where you can put your visualization skills to good use, and the Ultimate Data Manipulation & Cleaning Cheat Sheet — skills needed to transform data into a form ready for visualizing.