A Beginner’s Guide to Plotting Your Data (Python & R)

Choosing the Right Graphs for Your Feature Variables

Christina

When analyzing your data to decide, say, which type of regression to use, it is important to first figure out what kind of data you actually have. In fact, we should always explore our data before proceeding with any form of analysis, as it could save us a great deal of work later on. For example, we might choose the wrong regression model because of an unforeseen interaction between variable one and variable five, something we could have caught by taking a closer look at our data beforehand. Data analysis depends heavily on the types of your feature variables, how they are distributed, how they are related to each other, etc.

Before proceeding with any data analysis, we must first make a distinction between quantitative and qualitative/categorical variables.

Quantitative variables can be measured and are expressed numerically. Categorical variables, on the other hand, are descriptive and typically take on values such as names or labels. Qualitative data can be grouped based on similar characteristics, which is why it is also called categorical.
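As a rough first pass at this distinction, you can look at the column types of a dataframe. The sketch below assumes pandas and a small made-up dataframe; the column names (species, sex, weight_g, length_mm) are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Made-up example data; all column names here are hypothetical.
df = pd.DataFrame({
    "species": ["A", "B", "A", "C"],                 # labels -> categorical
    "sex": ["F", "M", "M", "F"],                     # labels -> categorical
    "weight_g": [3200.0, 4100.0, 3650.0, 5000.0],    # measured -> quantitative
    "length_mm": [180, 195, 188, 210],               # measured -> quantitative
})

# Numeric columns are candidates for quantitative variables ...
quantitative = df.select_dtypes(include="number").columns.tolist()
# ... and everything else is treated as categorical in this sketch.
categorical = df.select_dtypes(exclude="number").columns.tolist()

print("quantitative:", quantitative)
print("categorical:", categorical)
```

Keep in mind that this is only a heuristic: a numeric column such as a zip code or a group ID is still categorical, so the final call is yours.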

Summary for Graph Selection

All the graphs mentioned can easily be plotted in Python with the Seaborn library (you can do this with Matplotlib as well if you wish), or in R with ggplot2.

We start by loading our data into Python as a dataframe. Here, I am loading it from a CSV file in the same directory.
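A minimal sketch of this step with pandas is shown below. To keep it runnable without a file on disk, an in-memory string stands in for the CSV; the contents and column names are made up, and with a real file you would pass its path to read_csv instead.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (made-up contents). With a real file in
# the same directory you would write something like
# pd.read_csv("my_data.csv"), where the file name is hypothetical.
csv_text = io.StringIO(
    "species,sex,weight_g,length_mm\n"
    "A,F,3200,180\n"
    "B,M,4100,195\n"
    "A,M,3650,188\n"
    "C,F,5000,210\n"
)

df = pd.read_csv(csv_text)

print(df.head())    # first rows, to check the data loaded as expected
print(df.dtypes)    # column types, a first hint at variable types
```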

Or load it into R as a dataframe.

Bar Chart

If you wish to visualize a single categorical variable, use a bar chart, with the variable's categories on the x-axis and the number of observations in each category on the y-axis.

In Python:
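A minimal sketch using Seaborn's countplot, which counts the rows in each category for you. The dataframe and its species column are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data with one hypothetical categorical column.
df = pd.DataFrame({"species": ["A", "B", "A", "C", "A", "B"]})

fig, ax = plt.subplots()
# One bar per category; bar height is the number of rows in that category.
sns.countplot(data=df, x="species", ax=ax)
ax.set_title("Observations per species")

# Call plt.show() to display the figure when running as a script.
```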

In R:

Grouped Bar Charts

If we have two categorical variables, we will proceed with a grouped bar chart. The bars are grouped by the second categorical variable, usually the one with fewer categories.

In Python:
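A minimal sketch, again with Seaborn's countplot: the hue parameter splits each bar by a second categorical variable. The species and sex columns are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data with two hypothetical categorical columns.
df = pd.DataFrame({
    "species": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "sex":     ["F", "M", "M", "F", "M", "F", "F", "M"],
})

fig, ax = plt.subplots()
# x holds the variable with more categories; hue groups the bars by the
# variable with fewer categories (here, sex).
sns.countplot(data=df, x="species", hue="sex", ax=ax)

# Call plt.show() to display the figure when running as a script.
```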

In R:

Histogram

Histograms are great for visualizing a quantitative variable. Here, we want to make sure we choose an appropriate number of bins to best represent the data. This number can be chosen based on past experience, by experimenting with different numbers of bins, or by using an objective bin-selection formula such as Sturges' rule, which suggests roughly 1 + log2(n) bins for n observations.

In Python:

In R:

Side-by-side Boxplots

When we have one quantitative and one qualitative variable, we will use a side-by-side boxplot to best showcase the data.

In Python:
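A minimal sketch with Seaborn's boxplot, placing the categorical variable on the x-axis and the quantitative one on the y-axis. The dataframe and its column names are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: one hypothetical categorical and one quantitative column.
df = pd.DataFrame({
    "species":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "weight_g": [3200, 3450, 3600, 4100, 4300, 4050, 5000, 4800, 4650],
})

fig, ax = plt.subplots()
# One box per category, drawn side by side on a shared y-axis.
sns.boxplot(data=df, x="species", y="weight_g", ax=ax)

# Call plt.show() to display the figure when running as a script.
```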

In R:

Grouped Boxplots

Grouped boxplots are used when we have two categorical variables and a single quantitative one. Group by the categorical variable that has fewer groups.

In Python:
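A minimal sketch with Seaborn's boxplot, using hue for the categorical variable with fewer groups. All column names and values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: two hypothetical categorical columns, one quantitative.
df = pd.DataFrame({
    "species":  ["A"] * 6 + ["B"] * 6,
    "sex":      ["F", "F", "F", "M", "M", "M"] * 2,
    "weight_g": [3200, 3300, 3450, 3600, 3700, 3850,
                 4050, 4100, 4250, 4500, 4650, 4800],
})

fig, ax = plt.subplots()
# Within each species, one box per sex is drawn next to the other.
sns.boxplot(data=df, x="species", y="weight_g", hue="sex", ax=ax)

# Call plt.show() to display the figure when running as a script.
```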

In R:

Scatterplot

Scatterplots are used to visualize one quantitative variable against another. They are commonly used to evaluate the type of relationship between a quantitative feature (explanatory) variable and a quantitative response variable, with the response variable conventionally placed on the y-axis.

In Python:
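A minimal sketch with Seaborn's scatterplot. Here the hypothetical length_mm column plays the role of the explanatory variable and weight_g the response, so the latter goes on the y-axis; the values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: two hypothetical quantitative columns.
df = pd.DataFrame({
    "length_mm": [180, 184, 188, 191, 195, 200, 204, 210],
    "weight_g":  [3200, 3400, 3650, 3800, 4100, 4350, 4600, 5000],
})

fig, ax = plt.subplots()
# Explanatory variable on x, response variable on y.
sns.scatterplot(data=df, x="length_mm", y="weight_g", ax=ax)

# Call plt.show() to display the figure when running as a script.
```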

In R:

Scatterplot by Group

If we are trying to visualize two quantitative variables and one categorical one, we will use a scatterplot with its points grouped by the categorical variable.

In Python:
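A minimal sketch with Seaborn's scatterplot, where hue colors each point by its category. The dataframe and its column names are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: two hypothetical quantitative columns and one categorical.
df = pd.DataFrame({
    "length_mm": [180, 184, 188, 191, 195, 200, 204, 210],
    "weight_g":  [3200, 3400, 3650, 3800, 4100, 4350, 4600, 5000],
    "species":   ["A", "A", "A", "B", "B", "B", "C", "C"],
})

fig, ax = plt.subplots()
# Same scatterplot as before, with points colored by the species column.
sns.scatterplot(data=df, x="length_mm", y="weight_g", hue="species", ax=ax)

# Call plt.show() to display the figure when running as a script.
```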

In R: