Airbnb Part A (Python-Visualization, Comparative Study, Regression)

In Seattle, travelers prefer a real bed, or at least a pull-out sofa, in the Entire home, Private room and Hotel room categories, which explains why these are the most expensive bed types in those categories. Surprisingly, guests’ expectations drop dramatically in the Shared Room category, where a majority are satisfied with an Airbed even though it can be as expensive as, if not more expensive than, the other bed types across all the other room types (up to $500/night). One possible rationale for this strange phenomenon is the abundance of seasonal festivals focused on art, culture, music, etc. throughout the summer (Bell et al., 2019). Participants are often in their 20s and 30s and more adaptable, so for them an expensive Airbed in a Shared Room during high season is better than having no accommodation. In Washington DC, by contrast, the Airbed is among the top two choices across all the room types except the Hotel room. Bed types’ price ranges are otherwise indistinguishable between Seattle and Washington DC, except that the Hotel Room’s Real bed in Washington DC is about three times as expensive ($600/night on average, versus around $200/night in Seattle) and has a much wider spread ($50-$1200/night).

(Image by author)
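A comparison like the one above can be approximated numerically by tabulating a typical price for each combination of bed type and room type. The following is a minimal sketch, assuming a pandas DataFrame with `room_type`, `bed_type` and `price` columns; the toy rows are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy listings; values are illustrative only, not real Airbnb data.
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Shared room",
                  "Shared room", "Private room", "Private room"],
    "bed_type": ["Real Bed", "Pull-out Sofa", "Airbed",
                 "Real Bed", "Real Bed", "Airbed"],
    "price": [200.0, 150.0, 90.0, 40.0, 80.0, 60.0],
})

# Median nightly price per (bed type, room type) pair.
# Combinations with no listings show up as NaN.
price_table = listings.pivot_table(
    index="bed_type", columns="room_type", values="price", aggfunc="median"
)
print(price_table)
```

The median is used here rather than the mean because a handful of very expensive listings would otherwise dominate each cell; that is a design choice of this sketch, not necessarily what the original figure shows.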

In general, the number of beds and the number of bathrooms are both positively correlated with price: the more beds and bathrooms guests need, the higher the price. However, beyond a certain number of beds (11) and bathrooms (6–6.5), the correlation becomes negative. One possible explanation is that guests who book more than 12 beds or 6 bathrooms receive group discounts. The relationship between beds and price in Seattle is much more stable and evenly distributed than in Washington DC. Plausibly, local hoteliers and hosts there are more used to hosting a large number of guests in one check-in, given the city’s love of festivals mentioned above. Equally important, bathrooms have a heavier impact on price than beds do. The distributions are left-skewed (negatively skewed) and uneven, with large fluctuations over small ranges. For example, in Seattle the price drops from $600/night to $100/night between 4.5 and 5 bathrooms before bouncing back to $1000/night at 6 bathrooms.

(Image by author)
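To see where a price trend rises or reverses, one option is to group listings by their bed or bathroom count and compare a typical price per group. This is a minimal sketch, assuming a DataFrame with `bathrooms` and `price` columns; the helper `price_by_count` and the toy values are hypothetical.

```python
import pandas as pd

# Toy listings; values are illustrative only, not real Airbnb data.
listings = pd.DataFrame({
    "bathrooms": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    "price": [80.0, 120.0, 150.0, 250.0, 300.0, 500.0],
})

def price_by_count(df: pd.DataFrame, count_col: str) -> pd.Series:
    """Median nightly price for each distinct value of count_col."""
    return df.groupby(count_col)["price"].median().sort_index()

median_price = price_by_count(listings, "bathrooms")
print(median_price)
```

Groups at the high end of the range usually contain very few listings, so a sharp drop or jump there may reflect one or two properties rather than a real pricing pattern.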

On the other hand, the correlations of bedrooms and accommodates with price in Seattle and Washington DC are strictly positive, and the distributions are left-skewed. The price ranges are also similar in the two cities: $100-$2000/night for Bedrooms and $100-$800/night for Accommodates. Interestingly, the highest number of Bedrooms is 8 in Seattle but 27 in Washington DC, while the highest number of Accommodates is 25 in Seattle but 16 in Washington DC. This is consistent with the idea that visitors who come to Seattle for summer festivals only need a place to shower before going out again, rather than somewhere to stay in, so they do not mind sleeping on uncomfortable Airbeds in shared apartments. The drastic $600 increase in room price between 7 and 8 bedrooms points in the same direction.

(Image by author)

2) What effects do Airbnb home properties have on prices on both the west and east coasts?

To answer this question, we will create heat maps showing the correlations between the review and host-behavior features of Airbnb listings in each city. There are four steps:

· Step 1: Turn all six categorical columns into dummy variables so they can be evaluated later: ‘host_response_time’, ‘host_has_profile_pic’, ‘host_identity_verified’, ‘host_is_superhost’, ‘instant_bookable’, ‘cancellation_policy’

· Step 2: Merge the dummy variable columns together

· Step 3: Keep only the relevant behavior-related columns from these groups. Some factors take a wide array of values, so we also drop the rarest levels, such as ‘host_response_time_in a week’, because they apply to only a small minority of the listings.

· Step 4: Create the heat map for Seattle and Washington DC using the seaborn.heatmap() function
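The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's actual notebook: the toy DataFrame uses only two of the six categorical columns, the values are made up, and the helper `plot_heatmap` is a hypothetical name.

```python
import pandas as pd

# Toy data standing in for the listings table; values are illustrative only.
listings = pd.DataFrame({
    "host_is_superhost": ["t", "f", "t", "f", "t", "f"],
    "instant_bookable": ["f", "f", "t", "t", "f", "t"],
    "new_review_metric": [9.5, 7.0, 9.0, 6.5, 8.5, 7.5],
})
categorical_cols = ["host_is_superhost", "instant_bookable"]

# Step 1: one 0/1 dummy column per category level.
dummies = pd.get_dummies(listings[categorical_cols], dtype=int)

# Step 2: merge the dummy columns with the remaining numeric columns.
merged = pd.concat([listings.drop(columns=categorical_cols), dummies], axis=1)

# Step 3: keep only the columns of interest (here, drop the redundant "_f" levels).
keep = [c for c in merged.columns if not c.endswith("_f")]
corr = merged[keep].corr()

def plot_heatmap(corr_matrix: pd.DataFrame, title: str):
    """Step 4: draw the correlation matrix with seaborn.heatmap()."""
    # Imported here so the correlation steps above run without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title(title)
    plt.tight_layout()
    return ax
```

Calling `plot_heatmap(corr, "Seattle")` would then draw the annotated heat map; fixing `vmin` and `vmax` to -1 and 1 keeps the color scale comparable between the two cities.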

For Seattle’s Airbnb listings, the correlations are relatively low, with the highest being 0.39 between ‘host_is_superhost’ and ‘new_review_metric’. This suggests that in Seattle a listing’s review score is moderately associated with whether the host is a superhost. For Washington DC’s Airbnb listings, on the other hand, the positive correlations are relatively higher and the negative ones are similar, with the highest being 0.55 between ‘host_response_rate’ and ‘host_acceptance_rate’. This means that hosts in Washington DC who respond to more messages also tend to accept more booking requests; note that 0.55 is a correlation coefficient, not a 55% chance that a request will be accepted.

(Image by author)
(Image by author)
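For reference, each cell in such a heat map is a Pearson correlation coefficient, which measures the strength of a straight-line relationship between two columns on a scale from -1 to 1. Here is a minimal sketch of computing one value by hand; the two arrays are made-up host rates, not values from the dataset.

```python
import numpy as np

# Made-up host rates (fractions between 0 and 1), for illustration only.
response_rate = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
acceptance_rate = np.array([0.3, 0.2, 0.7, 0.6, 0.9])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(response_rate, acceptance_rate)[0, 1]

# r squared is the share of variance in one variable that a straight-line
# fit on the other would account for.
r_squared = r ** 2
print(f"Pearson r = {r:.2f}, r^2 = {r_squared:.2f}")
```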

3) Which Airbnb listing property is the most important for reviews? (Regression Analysis)

To assess how each factor influences the final review score, we carry out a regression analysis using the Random Forest method and then visualize the results. The data preprocessing includes dropping irrelevant variables and rows with missing review values, filling missing values in numeric columns with their means, and creating dummy variables for the categorical columns. X holds all the independent factors, while y holds the dependent variable ‘new_review_metric’; both are split into training and test sets with a 0.75/0.25 ratio. For the regressors, we use the parameters (n_estimators=100, criterion=’mse’, random_state=42, n_jobs=-1). The final model accuracies and validation results are shown below. The results for the two cities are very similar, with a low mean squared error and a high R2, which indicates that there is only a minor difference between the predicted and actual values of the dependent variable, and that more than 90% of the observed variation is explained by the model’s inputs.

(Image by author)
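The pipeline described above can be sketched as follows. This is a minimal illustration on synthetic data, not the author's code: the column values and the formula that generates the target are assumptions made up for the example, so the scores it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the listings table (illustrative only).
rng = np.random.default_rng(42)
n = 400
data = pd.DataFrame({
    "price": rng.uniform(50, 500, n),
    "host_response_rate": rng.uniform(0, 1, n),
    "host_is_superhost": rng.choice(["t", "f"], n),
})
# Hypothetical target: driven by response rate and superhost status, plus noise.
data["new_review_metric"] = (
    5 + 3 * data["host_response_rate"]
    + 1.5 * (data["host_is_superhost"] == "t")
    + rng.normal(0, 0.1, n)
)
data.loc[::50, "price"] = np.nan  # simulate a few missing numeric values

# Preprocessing: drop rows without a review score, fill numeric gaps with
# column means, then dummy-encode the categorical columns.
data = data.dropna(subset=["new_review_metric"])
numeric_cols = data.select_dtypes("number").columns
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].mean())
data = pd.get_dummies(data, drop_first=True, dtype=int)

X = data.drop(columns="new_review_metric")
y = data["new_review_metric"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Newer scikit-learn releases renamed criterion="mse" to "squared_error",
# which is also the default, so the argument is simply omitted here.
model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)

pred = model.predict(X_test)
mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MSE = {mse:.3f}, R2 = {r2:.3f}")
```

Both metrics are computed on the held-out 25% of rows, which is what makes them a check on how well the model generalizes rather than how well it memorized the training data.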

In the visualization of feature importance for the review score, the gap between the most important and second most important features is larger in Washington DC than in Seattle. Additionally, visitors to the “Queen City” put more emphasis on the quality of the hosts themselves than on the properties or rooms when booking an Airbnb, in contrast to the pattern in the nation’s capital. ‘Price’ also ranks lower in Seattle, thanks to the city’s moderate average rental price, as discussed above.

(Image by author)
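A ranking like the one in the figure can be read off a fitted random forest through its `feature_importances_` attribute. The sketch below uses synthetic data in which the target is deliberately built to depend mostly on a host feature; the feature names and coefficients are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features (illustrative only, not real Airbnb data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "host_response_rate": rng.uniform(0, 1, n),
    "price": rng.uniform(50, 500, n),
    "beds": rng.integers(1, 6, n),
})
# Hypothetical target that depends mostly on the host feature.
y = 5 + 4 * X["host_response_rate"] + 0.002 * X["price"] + rng.normal(0, 0.1, n)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

# Gap between the top two features, the quantity compared across cities above.
gap_top_two = importances.iloc[0] - importances.iloc[1]
print(importances)
print(f"gap between first and second: {gap_top_two:.3f}")
```

These impurity-based importances describe how much each feature helped the forest reduce error, not the direction of its effect, so a high rank for a host feature does not by itself say whether it raises or lowers the review score.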

*** To be continued in part B ***

GitHub: https://github.com/Lukastuong123/Python-Projects/tree/master/Project-%20Airbnb%20(Python-%20Interactive%20Map%2C%20Natural%20Language%20Processing%2C%20Comparative%20Study%2C%20Regression)

Reference & Sources:

Bell, J., Friedman, E., Selling, K., & Zelman, J. (2019, July 17). 46 Festivals to Check Out in the Seattle Area This Weekend. The Stranger. https://www.thestranger.com/things-to-do/2019/07/17/40789078/46-festivals-to-check-out-in-seattle-this-weekend

Carville, O. (2020, June 8). Airbnb sees a surge in vacation-rental demand. Los Angeles Times. https://www.latimes.com/business/story/2020-06-07/airbnb-coronavirus-demand

Grind, K. (2020, April 8). Airbnb’s Coronavirus Crisis: Burning Cash, Angry Hosts and an Uncertain Future. The Wall Street Journal. https://www.wsj.com/articles/airbnbs-coronavirus-crisis-burning-cash-angry-hosts-and-an-uncertain-future-11586365860

— — —

Dataset: http://insideairbnb.com/get-the-data.html

Inspired by: https://www.kaggle.com/xichenlou/seattle-and-boston-airbnb-data-comparison