Skip to content
Search
Generic filters
Exact matches only

3 Easy Tricks to Get Started with Python (and Ditch Excel!)

So you’ve taken the mental leap and want to learn Python — that’s awesome! But where to start?

Let me guide you through tasks you already know how to do in Excel and how to do them in Python!

You’ll be using Pandas, the primary data analysis library in Python.

If you need to install Pandas, you can do this using pip or conda:

Pandas loads data into dataframes, which you can think of as Excel sheets.

Let’s load a dataset into a datagram. You can accomplish this by using the following code. We’ll also explore the first five rows by using the Pandas head method:

Generating our dataframe with Pandas. Source: Nik Piepenbreier

To follow along with the tutorial in Excel as well, the file can be found here.

There are a number of differences between Excel sheets and Pandas dataframes. Let’s take a quick look!

Comparing Excel worksheets and Pandas dataframes. Source: Nik Piepenbreier

Ok, now let’s get started!

Excel is a much more visual tool, which makes it easy to click a button that abstracts the function behind what you want to accomplish.

Sort by a Single Column

Sorting a single columns in Excel. Source: Nik Piepenbreier

For example, if you wanted to sort a column in Excel, you could simply:

  • Select the Data tab,
  • Highlight the column you want to sort, and
  • Click Sort A to Z or Sort Z to A.

To do this in Pandas, you would write:

Sorting by a single column in Pandas. Source: Nik Piepenbreier

Sort by Multiple Columns

Sometimes you might want to sort by multiple columns.

Sorting by multiple columns in Excel. Source: Nik Piepenbreier

Again, Excel makes this easy:

Click on the Data tab

  • Click on Sort
  • Enter the columns you want to sort by

To do this with Pandas, simply add a list to the “by” argument:

Sorting by multiple columns in Pandas. Source: Nik Piepenbreier

Filtering Columns

Filtering columns is an easy task in Excel! Simply click on the Data tab, then Filter. This creates arrows on all the column headers. When you click on these, simply fill in your selection:

Filtering columns in Excel. Source: Nik Piepenbreier

In Pandas, this is just as easy:

Filtering a column in Pandas. Source: Nik Piepenbreier

What’s great about this, you can also use comparison operators to select more than 10 units, or select based on multiple conditions:

  • Comparison: You can use > (greater than), < (less than), == (equal to), >= (greater than or equal to), and <= (less than or equal to),
  • ‘And’ Conditions: wrap each condition in brackets and separate the brackets with an ampersand (&)
  • ‘Or’ Conditions: wrap each condition in brackets and separate the brackets with a pipe (|)
Different types of filters in Pandas. Source: Nik Piepenbreier

Reordering Columns

Reordering columns is more of a visual cue for yourself.

To drag a reorder a column in Excel, you’d select the column by click its header, hover on the side until the cursor changes to a four-pointed arrow, then hold the SHIFT key and drag the column to a new position:

Moving a column in Excel. Source: Nik Piepenbreier

To accomplish the same thing in Pandas, you simply write in the order of columns you want to have into a double set of square brackets:

Re-ordering columns in Pandas. Source: Nik Piepenbreier

Pivot tables are one of those things that take you to the next level in Excel.

They allow you to easily summarize data quickly, without needing to rely on complex formulas.

Say you wanted to know the total value of sales in each region, you would:

  1. Select your data and click PivotTable on the Insert Tab and click OK to create your table.
  2. Drag Region into the Rows box, and Sales into the values tab. Excel automatically assumes we want to add up the values.
Creating a pivot table in Excel. Source: Nik Piepenbreier

To accomplish the same thing in Pandas, you could simply use the pivot_table function:

Creating a pivot table in Pandas. Source: Nik Piepenbreier

Let’s break this down a little bit:

  • We create a new variable called pivot.
  • We use the pandas pivot_table function. Our first argument is the dataframe df.
  • The index argument is ‘region’, which tells Pandas to create rows based on the ‘Region’ column.
  • We assign the argument values the field ‘Sales’, to let Pandas know we want to calculate the Sales column.
  • Finally, we use the aggfunc (‘aggregation function’) argument to tell Pandas to sum up the values. The default value is ‘mean’.

For a deep dive into the Pandas Pivot Table function, check out my other post on Pivot Tables in Pandas.

Show Pivot Table Values as Percentages

You may want to show your values as percentages of the column total. Again, Excel makes this very easy:

Calculating percentages in pivot tables in Excel. Source: Nik Piepenbreier
  • Simply right-click on a value,
  • Select Show Values as → % of column total

It’s just as easy to do this in Pandas. The easiest way is to create a new column for this:

Calculating pivot table percentages in Pandas. Source: Nik Piepenbreier

Let’s take a look at what’s happening:

  • We declare a new column by using pivot[‘% of column total’] — this assigns the name ‘% of column total’ to the column
  • We then divide each value in the row (pivot[‘sales’]) by the sum of the entire column (pivot[‘sales’]sum()) and multiply it by 100
Creating a chart in Excel. Source: Nik Piepenbreier

Now, what if you wanted to create some charts, this is incredibly easy in both Excel and in Python.

Let’s look at Excel first. If you want to plot this pivot table out as a column graph:

  • Place your pointer on one of the cells in the table,
  • Go to Insert → 2-D Column Chart

In Pandas, this is even easier. Pandas comes with one of Python’s top data visualization libraries functionality built-in. It’s as easy as adding .plot(kind = ‘bar’) to the end of your code:

Creating a chart in Pandas and Matplotlib. Source: Nik Piepenbreier

This might look a little daunting. Let’s break it down:

  • Import pyplot from matplotlib as plt
  • Re-write your earlier code (lines 2–3)
  • Now plot out the ‘Sales’ column and assign a kind = ‘bar’ as an argument
  • Finally, save the file by using the savefig method.

Note: if you’re using Jupyter notebooks, you can display the chart inline by writing this code after importing your library:

While you’re working with data, it may be helpful to format your values properly.

Formatting your data in Excel. Source: Nik Piepenbreier

For example, formatting currencies as dollars (etc.) or percentages as percentages.

To do this in Excel, you would:

  • Select the values you want to format,
  • Under the Home tab in the Number section, select your desired type

Pandas hides this away a little bit, which is something that can stump newcomers quite a bit.

An easy way to accomplish this would be using the apply() function. What this function does is take a series and apply another function to it. The function being applied would be the one formatting the values.

If you wanted to format the Sales column in the pivot dataframe, you could write:

Formatting values in Pandas. Source: Nik Piepenbreier

This might seem a little round-up (and it is), but it does give you a ton of flexibility in terms of how you style your data. Let’s take a closer look:

  • We define a function called format() that takes one argument (x)
  • The function only serves to return a formatted value using string formats in a particular format.
  • The ${:,.2f} part represents the actual formatting. The colon (:) is used to signify the beginning of the formatting, the comma (,) is used to signal comma separators for thousands, and the .2 signals two decimal places.
  • This notation can be a little tricky to get used to and I tend to google the style I want and copy and paste it.

Similarly, if you wanted to stay percentages, you could write:

Formatting percentages in Pandas. Source: Nik Piepenbreier

In this code, we created another function called format_percent() and went through similar steps as above.

Note: The ‘% of column total’ column has been amended to not multiply the value by 100. This is because the formatter does this automatically.

Thank you so much for reading this article! I hope you found it useful!

I have written a number of articles that explain how to take on common and advanced Excel tasks in Python, if you want to check them out!

If you’re ready to take the plunge into Python and Pandas, I have also written an eBook that provides a complete introduction to Python, Pandas, and Matplotlib and will have you up and running in no time!

You can find it by clicking this link.

Have a great day!