
Amazon’s Complex Network of Hidden Data Science Systems | by Andre Ye | May, 2020

Even before Amazon gets you to buy a product, it has signed deals with authors, publishers, seller networks, and affiliates for prioritization on its platform. After Amazon built its powerful recommendation algorithm into nearly every part of the purchasing process, the company reported a 29% sales increase to $12.83 billion in its second fiscal quarter. The recommendation algorithm is one of the more visible applications of data science: it appears not only on the site but also in email campaigns, which Amazon’s own tests found to be significantly more effective than on-site recommendations. With the algorithm on one side and third parties paying hefty sums for prioritization on the other, Amazon has already established two very solid revenue boosters.
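The article does not say how the recommendation algorithm works internally. Amazon has publicly described an item-to-item collaborative filtering approach, and the sketch below illustrates that general idea in miniature, not Amazon’s production system: two items count as similar when the same customers buy both, and a customer is recommended the unowned items most similar to what they already own. The purchase data and the function names (`item_vectors`, `cosine`, `recommend`) are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical purchase history: customer -> set of purchased item IDs.
purchases = {
    "alice": {"book_a", "book_b", "lamp"},
    "bob": {"book_a", "book_b"},
    "carol": {"book_b", "lamp", "mug"},
    "dave": {"book_a", "mug"},
}

def item_vectors(purchases):
    """Invert customer -> items into item -> set of customers who bought it."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    return buyers

def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(customer, purchases, k=2):
    """Score each unowned item by its summed similarity to owned items."""
    buyers = item_vectors(purchases)
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for candidate in buyers:
            if candidate not in owned:
                scores[candidate] += cosine(buyers[item], buyers[candidate])
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

print(recommend("bob", purchases))  # ['lamp', 'mug']
```

Because item similarities depend only on purchase overlap, they can be precomputed offline, which is what makes the item-to-item formulation attractive at large scale.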

Amazon’s product prices are carefully optimized by a predictive model that looks for the price that increases each customer’s value to Amazon without causing them to refuse the purchase on price. This is far from a simple ‘find the tipping point’ problem, in which the price is raised to one cent below the point at which the customer would refuse to buy, because higher prices generally correlate with lower purchase counts. A model that sets the price of a product must consider not only a customer’s likelihood of purchasing it but also how its price, combined with the prices of many other products, will affect the customer’s future buying patterns.

Balancing the customer’s likelihood of purchase against the price’s future impact on the customer when determining the price of a product.
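The trade-off described above can be sketched as a small optimization: expected profit now (margin times purchase probability) minus a penalty standing in for damage to future buying patterns. Everything here is an assumption for illustration: the logistic demand curve, the constants, and the linear “future cost” for pricing above a reference price are hypothetical, not Amazon’s model.

```python
import math

# All parameters are illustrative assumptions, not Amazon's values.
REFERENCE_PRICE = 20.0  # price the customer considers "fair"
SENSITIVITY = 0.4       # how sharply demand falls as price rises
UNIT_COST = 16.0        # cost to Amazon of one unit

def purchase_probability(price):
    """Logistic demand curve: a 50% chance of purchase at the reference price."""
    return 1.0 / (1.0 + math.exp(SENSITIVITY * (price - REFERENCE_PRICE)))

def score(price, future_weight):
    """Expected profit now, minus a hypothetical penalty for pricing above
    the reference price (a stand-in for eroding future purchases)."""
    immediate = (price - UNIT_COST) * purchase_probability(price)
    future_cost = future_weight * max(0.0, price - REFERENCE_PRICE)
    return immediate - future_cost

def best_price(candidates, future_weight):
    """Grid search over candidate prices for the highest score."""
    return max(candidates, key=lambda p: score(p, future_weight))

candidates = [10.0 + 0.5 * i for i in range(41)]  # $10.00 to $30.00
print(best_price(candidates, future_weight=0.0))  # 20.5: short-term optimum
print(best_price(candidates, future_weight=0.5))  # 20.0: penalty pulls price down
```

Ignoring the future term, the model charges slightly more; once future impact carries weight, the best price drops back to the reference price, which is the balance the figure describes.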

Before you decide on a product, you may glance at its five-star rating, but you get the most information from user reviews. In general, any product’s review section is plagued by bias: people usually write reviews only after exceptionally good or exceptionally bad experiences, whereas most people settle for a quick star rating. Because the reviews are so polarized and often disagree with the star rating, customers may decide not to buy the product at all.

Providing accurate information to customers is in Amazon’s long-term interest. If Amazon rigged reviews so that every product was rated highly, short-term sales might boom, but the company’s brand image and customer trust would be spoilt. On the other hand, Amazon wants to let users know about products that are genuinely good but have a tarnished image because of biased reviews. Fake reviews are also a lucrative business: fake-review operations can be hired to post glowing praise or cold, negative comments from hundreds of accounts, boosting a client’s product and knocking out competitors. Amazon has addressed this problem in part by labeling verified purchases and by deploying models that estimate the trustworthiness of an account and the helpfulness of a review, which determine the order in which reviews are shown.
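One way to combine those signals into a display order is sketched below. It is a hypothetical scheme, not Amazon’s: helpfulness uses the Wilson score lower bound (a common technique for ranking by up-votes that distrusts reviews with very few votes), multiplied by an assumed account-trust score in [0, 1] and an assumed bonus for verified purchases. The `Review` fields and the `verified_bonus` value are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class Review:
    text: str
    helpful_votes: int
    total_votes: int
    verified_purchase: bool
    account_trust: float  # hypothetical 0..1 output of a trust model

def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the 95% Wilson score interval for the helpful fraction.
    A lucky 1-of-1 vote scores low; 80-of-100 scores high."""
    if total == 0:
        return 0.0
    p = positive / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom

def rank_score(review, verified_bonus=1.2):
    """Hypothetical ranking: helpfulness x account trust x verified bonus."""
    score = wilson_lower_bound(review.helpful_votes, review.total_votes)
    score *= review.account_trust
    if review.verified_purchase:
        score *= verified_bonus
    return score

reviews = [
    Review("Five stars!!!", 1, 1, False, 0.2),
    Review("Solid, detailed review", 80, 100, True, 0.9),
    Review("Decent but brief", 30, 50, True, 0.8),
]
for r in sorted(reviews, key=rank_score, reverse=True):
    print(f"{rank_score(r):.3f}  {r.text}")
```

The low-trust, unverified one-line review with a single vote sinks to the bottom even though 100% of its votes were “helpful.”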

When you purchased your product, you were likely an Amazon Prime member. The subscription service has proven valuable to Amazon and has become one of its major revenue streams. To advertise this attractive, high-value service, Amazon uses data analysis to identify customer segments and target them with specific ads. A key reason Amazon has risen so quickly is that it played to the digital era’s emphasis on customer centricity: instead of traditional one-size-fits-all, ‘spray-and-pray’ marketing, Amazon tailors its message according to how effective it expects that message to be for you.

Perhaps Amazon’s models determine that you love to read (maybe using information from Goodreads, which Amazon acquired) and point out that with Prime you can read a selection of books for free on Kindle. (Amazon also tracks and stores the passages users highlight on Kindle to make book recommendations.) If you are not yet a Prime member but routinely pay to have products shipped within one day or a few hours, Amazon’s model may mention that many fast shipping options are free with Prime. The company constantly runs A/B tests to refine its advertising and acquire new customers.
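Deciding whether one ad variant really beats another is the statistical core of an A/B test. The sketch below uses a standard two-sided two-proportion z-test; the experiment (a generic Prime ad versus a Kindle-focused ad shown to customers a model has flagged as frequent readers) and its counts are hypothetical, and Amazon’s actual testing methodology is not described in the article.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    ad variant A and ad variant B. Returns (z, p_value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 10,000 customers see each variant.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here a lift from 2.0% to 2.6% sign-up rate gives z of roughly 2.8, so the difference is unlikely to be noise at the usual 5% significance level.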

During your purchase, Amazon runs fraud detection algorithms on credit card transactions, which may take irregular purchasing behavior into account. Similar predictive models can detect system intrusions and hacking attempts, preventing the theft of data such as credit card numbers or employee IDs.
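“Irregular purchasing behavior” can be made concrete with a simple per-customer anomaly check. This is a minimal rule-based sketch, not a trained fraud model: it flags an amount more than three standard deviations above the customer’s own history and a purchase from a country other than the customer’s usual one. The features, the threshold, and the `fraud_flags` function are hypothetical.

```python
import statistics

def fraud_flags(history, amount, country, home_country, z_threshold=3.0):
    """Return the reasons a transaction looks irregular for this customer.
    history: the customer's past transaction amounts (hypothetical feature)."""
    reasons = []
    if len(history) >= 2:  # stdev needs at least two data points
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev > 0 and (amount - mean) / stdev > z_threshold:
            reasons.append("amount far above usual spending")
    if country != home_country:
        reasons.append("purchase from unusual country")
    return reasons

history = [25.0, 40.0, 32.5, 28.0, 35.0]
print(fraud_flags(history, 30.0, "US", "US"))   # []
print(fraud_flags(history, 900.0, "RO", "US"))
```

In practice such hand-written rules would be features feeding a learned model rather than the decision itself; they are used here only because they are easy to inspect.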

After you have purchased a product, Amazon must move it through four stages: the warehouse, the cargo plane, the ‘last mile’ delivery truck, and your house.

Hypothetical product route mapping.

Leveraging Amazon’s 90 warehouses, 50 cargo planes, and hundreds of thousands of delivery trucks, data science algorithms must plan the most efficient logistics operation possible:

  • Suppose a cargo airplane is loaded to 75% capacity, and all of those items are one-day deliveries, so the plane must leave within one hour to arrive on time. The remaining cargo is late and will arrive in eight hours. Does the plane leave or not? The data science algorithm must put a number on how much value customers will lose if the one-day guarantee is broken and compare it with the transportation costs of each option to make a decision.
  • Amazon’s anticipatory shipping model uses your purchase data to predict which products you are likely to buy, when you may buy them, and where you might need them. Combining this with the preferences of people living near you, Amazon sends products it is confident you will buy to a nearby distribution center so they are ready when you need them.
  • How do you plan the optimal delivery truck route through a city, taking into account weather, heavy traffic, and unforeseen events, while minimizing distribution costs and choosing when and where to buy fuel? At Amazon’s massive scale, it cannot afford the hundreds of milliseconds needed to query the Google Maps API; instead, it must build its own route optimization system based on traffic prediction. This is one of the hardest problems in data science: planning a route through many stops is a variant of the traveling salesman problem from graph theory, which is NP-hard.
  • Many of Amazon’s customers settle for a longer, 4–5 day delivery window (either because they are not Prime members or because they receive some form of compensation for the wait). Amazon deploys inventory forecasting models to determine how many units of each product each warehouse should hold at any time, so that the product is still in stock when the 4–5 day window nears its end, while minimizing transportation costs and product decay.
  • Within the warehouse, Amazon is replacing workers in product retrieval with robots, which can work nonstop to fit the company’s 24/7 business model, work faster, and don’t need to be paid. Within each fulfillment center, Amazon’s warehouse robots chart the fastest route to the items assigned to them while staying aware of other robots’ routes. To build a fleet of intelligent robots that don’t naively follow the shortest straight-line path but take into account the intentions of others, Amazon trains its warehouse robots with reinforcement learning. This machine learning field trains a model through feedback from its environment, which lets the model keep improving and adapting even while it is in service. These warehouse robots significantly boost Amazon’s fulfillment efficiency.
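The cargo-plane question in the first bullet above reduces to comparing the total cost of each option once lost customer value is given a price. The sketch assumes, hypothetically, that leaving now means the late 25% of the cargo needs a second flight, while waiting means every one-day package on board arrives late; the flight cost, package count, and per-package penalty are invented numbers, and the `Option`, `total_cost`, and `decide` names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    flight_cost: float  # hypothetical cost of the flights this option needs
    late_packages: int  # one-day packages that would miss the guarantee

def total_cost(option, cost_per_late_package):
    """Flight costs plus the estimated customer value lost when the
    one-day guarantee is broken."""
    return option.flight_cost + option.late_packages * cost_per_late_package

def decide(options, cost_per_late_package):
    """Pick the option with the lowest total cost."""
    return min(options, key=lambda o: total_cost(o, cost_per_late_package)).name

# Hypothetical numbers: each flight costs $20,000; 3,000 one-day packages
# are already on board.
options = [
    Option("leave now", flight_cost=2 * 20_000, late_packages=0),
    Option("wait for remaining cargo", flight_cost=20_000, late_packages=3_000),
]
print(decide(options, cost_per_late_package=5.0))   # wait for remaining cargo
print(decide(options, cost_per_late_package=10.0))  # leave now
```

With these numbers the break-even point is about $6.67 of lost value per late package, which shows why the hard part is estimating that figure rather than the comparison itself.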
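A full delivery planner solves a vehicle routing problem over many stops, which is far beyond a short example. A core building block, though, is a shortest-path search over a road graph whose edge weights are predicted travel times. The sketch below uses Dijkstra’s algorithm with Python’s `heapq`; the road graph, the travel times, and the per-edge congestion multipliers (a stand-in for a traffic-prediction model’s output) are all hypothetical.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm. graph: node -> list of (neighbor, minutes).
    Returns (total_minutes, path), or (inf, []) if goal is unreachable."""
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry: a shorter way here was already found
        for neighbor, cost in graph.get(node, []):
            candidate = minutes + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor, path + [neighbor]))
    return float("inf"), []

def apply_traffic(base_graph, congestion):
    """Scale base travel times by a predicted congestion factor per edge
    (hypothetical output of a traffic-prediction model; default 1.0)."""
    return {
        node: [(nbr, mins * congestion.get((node, nbr), 1.0)) for nbr, mins in edges]
        for node, edges in base_graph.items()
    }

roads = {
    "depot": [("A", 10), ("B", 15)],
    "A": [("customer", 10)],
    "B": [("customer", 8)],
}
print(shortest_route(roads, "depot", "customer"))   # (20.0, ['depot', 'A', 'customer'])
jammed = apply_traffic(roads, {("A", "customer"): 2.5})
print(shortest_route(jammed, "depot", "customer"))  # (23.0, ['depot', 'B', 'customer'])
```

When predicted congestion on one road rises, the same search reroutes the truck without any external API call, which is the kind of in-house capability the bullet describes.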
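For the inventory bullet, a textbook reorder-point rule shows the basic shape of “keep enough stock that the product is still available when the delivery window closes.” This is a standard safety-stock formula swapped in for illustration, not Amazon’s forecasting model; it assumes independent, roughly normal daily demand and ignores the transportation costs and product decay the article mentions. The demand figures and the `reorder_point` name are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def reorder_point(daily_demand, lead_time_days, service_level=0.95):
    """Stock level at which to reorder: expected demand over the lead time
    plus safety stock, z * stdev * sqrt(lead time), where z is chosen so
    demand is covered with probability `service_level`."""
    mean = statistics.mean(daily_demand)
    stdev = statistics.stdev(daily_demand)
    z = NormalDist().inv_cdf(service_level)
    return math.ceil(mean * lead_time_days + z * stdev * math.sqrt(lead_time_days))

# Hypothetical daily sales of one product at one warehouse over a week.
demand = [20, 24, 18, 22, 26, 19, 21]
print(reorder_point(demand, lead_time_days=5))  # 118
```

Raising the service level raises the reorder point; the gap between it and expected demand (about 107 units here) is the cost of insuring against stockouts.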
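The reinforcement learning idea in the last bullet can be illustrated with tabular Q-learning on a toy problem: a single robot in a one-dimensional aisle learns, from reward feedback alone, to move toward its assigned item. This is a deliberate simplification: it omits the other robots whose intentions the article mentions, and the environment, rewards, and hyperparameters are hypothetical, not Amazon’s training setup.

```python
import random

N_CELLS = 5          # aisle cells 0..4; the assigned item sits in cell 4
ACTIONS = (-1, +1)   # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    """Toy environment: reward 1 for reaching the item, 0 otherwise."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Best-known action for a state, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            future = 0.0 if done else GAMMA * max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update toward the reward plus discounted future value.
            q[(state, action)] += ALPHA * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS - 1)]
print(policy)  # [1, 1, 1, 1]: from every cell, head toward the item
```

Nothing in the code tells the robot where the item is; the policy emerges from the rewards, which is why the same approach can, in richer simulations, learn to account for other robots’ movements.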