Skip to content
Search
Generic filters
Exact matches only

A gentle introduction to Python for Tableau Developers (Part 3) | by Elliott Stam | Aug, 2020

EXPLORING DATA WITH PYTHON

Translating familiar data exploration from Tableau to Python

Elliott Stam

Setting the stage

Sales by Sub-Category, colored by Profit Ratio

Step 1: deciding to look at our most impactful customers

Step 2: getting the sales per order for our customers

orders_per_customer_df = store_df
.groupby('Customer ID')
.agg({
'Order ID': pd.Series.nunique,
'Sales': 'sum'
})
.reset_index()
.rename(columns={'Order ID': 'order_count'})
.sort_values('Sales', ascending=False)

Step 3: calculate the total sales per order for each customer

orders_per_customer_df['avg_sales_per_order'] = 
orders_per_customer_df['Sales'] / orders_per_customer_df['order_count']
You can use the .head() function on a dataframe to get a preview like this!

Step 4: taking a look at the order count distribution

sns.distplot(orders_per_customer_df['order_count'])
Distribution of order counts for all customers

Step 4: filter our data to only consider the top 10% of customers

top_customers_df = 
store_df[store_df['Customer ID']
.isin(orders_per_customer_df.head(159)['Customer ID'])]

Step 5: preview the data for our top customers

subcat_top_customers_df = top_customers_df
.groupby('Sub-Category')
.agg({
'Sales': 'sum',
'Discount': 'mean',
'Profit': 'sum',
'Order ID': pd.Series.nunique
})
.rename(columns={
'Discount': 'Avg Discount',
'Order ID': 'Num Orders'
})
.sort_values('Profit', ascending=False)
.reset_index()

Step 6: visualize the results for the top 10% of customers

Step 7: understand how the visuals came together

fig, axs = plt.subplots(1, 2, figsize=(16, 8), sharey=True)sns.barplot(data=subcat_top_customers_df, 
x='Profit', y='Sub-Category', ax=axs[0],
palette=cm.RdBu(subcat_top_customers_df['Profit']), ci=False)
sns.barplot(data=subcat_top_customers_df,
x='Avg Discount', y='Sub-Category', ax=axs[1],
palette=cm.RdBu(subcat_top_customers_df['Avg Discount'] * 5.5), ci=False)
axs[0].tick_params(axis='both', which='both', length=0)
axs[1].tick_params(axis='both', which='both', length=0)
axs[1].set_ylabel('')
sns.despine(left=True, bottom=True)