Using pd.loc to change a subset of your data based on conditions.
Have you ever needed to edit each row in a table based on some conditions?
It’s a common task for data analysis projects. I used to search StackOverflow for a solution every time I needed to do this. Finally, I read the Pandas documentation and created a template that works every time I need to edit data row by row.
Here’s a look at how you can use the pandas .loc method to select a subset of your data and edit it if it meets a condition.
Note, before trying any of the code below, don’t forget to import pandas.
import pandas as pd
The Pandas documentation has this description for “.loc”:
Access a group of rows and columns (in a
.DataFrame) by label(s) or a boolean array.
For our case, we’re going to use the method like this:
Here are some sample data and an illustration to what the
column_indexer are referring to. (For an explanation of how to get that sample data, check out my other piece on easily getting tables from websites.)
df = pd.read_html('https://finance.yahoo.com/quote/TSLA/profile?p=TSLA')
To set a
row_indexer , you need to select one of the values in blue. These numbers in the leftmost column are the “row indexes”, which are used to identify each row. a
column_indexer , you need to select one of the values in red, which are the column names of the DataFrame.
If we wanted to select the text “Mr. Elon R. Musk”, we would need to do the following:
This gives us our desired output:
Now that we’ve got the basics down, we’re ready to start editing our tables!
Now, say we wanted to edit the “Title” value for some of the people in the table. Let’s change Elon Musk to “The Boss Man”, Zachary Kirkhorn to “The Money Man” and everyone else to “Another Dude”.
We need to go through each row in the table and check what the “Name” value is, then edit the “Title” value based on the change we specified.
To go through the data row by row, we’re going to use df.index, which selects the “row indexes” from the DataFrame. To see how that works, we can print the index from our sample table in a basic “for” loop:
for index in df.index:
This gives us the same values you can see on the leftmost column in the first screenshot.
To search and edit the right subset of data for every row in the DataFrame, we use the following code:
for index in df.index:
if df.loc[index,'Name']=='Mr. Elon R. Musk':
df.loc[index,'Title'] = 'The Boss Man'
elif df.loc[index,'Name']=='Mr. Zachary J. Kirkhorn':
df.loc[index,'Title'] = 'The Money Man'
df.loc[index,'Title'] = 'Another Dude'
This loop will go through every row of the DataFrame, check what the “Name” is, and then edit the “Title” based on which condition it meets. Our conditions are whether the “Name” of value each row is with “Mr. Elon R. Musk”, “Mr. Zachary J. Kirkhorn” and everyone else.
In this case, because the first row has “Mr. Elon R. Musk” as the “Name”, the script will change this first row’s “Title” value to “The Boss Man”. It knows which row to perform this change because we specified the row index using df.loc.
The resulting DataFrame looks like this:
We have successfully changed the “Title” values of each row based on the conditions we specified earlier!