3 Deep Learning Algorithms in under 5 minutes — Part 2 (Deep Sequential Models)
by Thushan Ganegedara, Sep 2020

Simple recurrent neural networks (also referred to as RNNs) are to time-series problems what CNNs are to computer vision. In a time-series problem, you feed a sequence of values to a model and ask it to predict the next n values of that sequence. An RNN goes through each value of the sequence while building up a memory of what it has seen, which helps it predict what the future will look like. (Learn more about RNNs: [1] [2])
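If you like seeing the idea in code, here’s a minimal sketch of that recurrence in plain NumPy (the sizes and weights are made up for illustration; a real model learns the weights by backpropagation):

```python
import numpy as np

# Hypothetical sizes: 1 input feature per time step, 4 hidden units
input_size, hidden_size = 1, 4
rng = np.random.default_rng(0)

W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> state
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # state -> state
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Mix the newest value with the memory of everything seen so far."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

sequence = [np.array([0.1]), np.array([0.4]), np.array([0.3])]  # a toy time series
h = np.zeros(hidden_size)      # the memory starts out empty
for x_t in sequence:
    h = rnn_step(x_t, h)       # the state is carried from one step to the next

# h now summarises the whole sequence; a small dense layer on top of it
# would produce the prediction for the next value(s).
print(h)
```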

Analogy: New and improved secret train

I’ve played this game as a kid, and you might know it by a different name. Kids stand in a line, and you whisper a random word to the first kid in the line. That kid adds an appropriate word to it and whispers the result to the next kid, and so on. By the time the message reaches the last kid, you should have an exciting story brewed up by the kids’ imagination.

How an RNN works in a sentiment analysis problem. It goes from one word to the next while producing a state (red ball). Finally, a fully-connected network (FCN) takes the last state and produces a label (positive/negative/neutral).
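Here’s a rough sketch of that pipeline in Keras (the vocabulary size and layer sizes are made-up choices, not anything prescribed):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumes reviews have been converted to sequences of integer word IDs
# drawn from a hypothetical 10,000-word vocabulary.
model = tf.keras.Sequential([
    layers.Embedding(input_dim=10_000, output_dim=32),  # word IDs -> word vectors
    layers.SimpleRNN(64),                   # goes word to word, keeps a state,
                                            # and returns only the last state
    layers.Dense(3, activation="softmax"),  # FCN: positive / negative / neutral
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(review_ids, labels, epochs=3)   # given padded integer-encoded reviews
```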

Applications

  • Time series prediction (e.g. weather / sales predictions)
  • Sentiment analysis — Given a movie/product review (a sequence of words), predict whether it’s negative/positive/neutral.
  • Language modelling — Given a part of a story, imagine the rest of the story / Generate code from descriptions

LSTM is the cool new kid in RNN-ville. An LSTM is a more complicated beast than an RNN and is able to remember things for longer. Like an RNN, an LSTM goes through each value of the sequence while building up a memory of what it has seen, which helps it predict what the future will look like. But remember how an RNN had a single state (that represented memory)? LSTMs have two states (one long-term and one short-term), hence the name. (Learn more: LSTMs)
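You can actually peek at those two states by asking a Keras LSTM layer to return them (a toy example with arbitrary shapes):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A made-up batch of 2 sequences, each with 5 time steps of 3 features
x = tf.random.normal((2, 5, 3))

outputs, short_term_h, long_term_c = layers.LSTM(8, return_state=True)(x)
print(short_term_h.shape, long_term_c.shape)  # (2, 8) and (2, 8): two separate states
# With return_state=True (and no return_sequences), `outputs` is the same
# tensor as `short_term_h`.
```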

Analogy: Fast-food chain

All this explaining is making me hungry! So let’s go to a fast-food chain. This is a literal chain because, if you order a meal, one shop makes the burger, another makes the chips, and so on. At this fast-food drive-through, you give your order to the first shop, and the meal gets built up as it is passed down the chain.

The fast-food chain. There are three people: green (input), red (cell state) and blue (output state). They can discard certain information from the inputs you provide, as well as discard information while processing it internally.

Each shop receives two things from the shop before it:

  • an output state h(t-1) (the blue person from the previous shop) and
  • a cell state c(t-1) (the red person from the previous shop),

and passes two things of its own to the next shop:

  • an output state h(t) (the blue person in this shop) and
  • a cell state c(t) (the red person in this shop).

To decide what to keep and what to throw away, each shop uses gates:

  • An input gate (part of the green person) — discards information that’s not useful in the current input
  • A forget gate (part of the red person) — discards information that’s not useful in the previous cell state
  • An output gate (part of the blue person) — discards information that’s not useful from the cell state when generating the output state

An LSTM maintains two states (an output — short-term state and a cell state — long-term state) and uses gating to discard information when computing final and interim outputs.
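For the curious, here’s a rough NumPy sketch of one such “shop”, following the standard LSTM cell equations (made-up sizes, biases omitted for brevity; real implementations fuse these weights for speed):

```python
import numpy as np

hidden, inputs = 4, 3                       # hypothetical sizes
rng = np.random.default_rng(0)

def W():                                    # a fresh weight matrix over [h_prev, x_t]
    return rng.normal(scale=0.1, size=(hidden, hidden + inputs))

W_f, W_i, W_c, W_o = W(), W(), W(), W()     # forget, input, candidate, output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z)                    # forget gate: drop stale parts of c_prev
    i = sigmoid(W_i @ z)                    # input gate: decide what new info to store
    c_tilde = np.tanh(W_c @ z)              # candidate (interim) values for the cell state
    c = f * c_prev + i * c_tilde            # new long-term cell state
    o = sigmoid(W_o @ z)                    # output gate: decide what to expose
    h = o * np.tanh(c)                      # new short-term output state
    return h, c

h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(np.array([0.1, 0.2, 0.3]), h, c)   # one "shop" in the chain
```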

Here’s what an LSTM would look like.

LSTM in the real world. You can see that it’s a complex labyrinth of connections, so don’t try to understand how they all connect at this point; just understand the various entities involved. The red dashed ball represents an interim output computed by the LSTM cell.

Applications

Phew! LSTMs really took a toll on the time I have left. GRU is a successor to LSTMs that simplifies the mechanics of the LSTM without jeopardising performance too much. (Learn more: GRUs [1] [2])

Analogy: Fast-food chain v2.0

Not to be a food critic, but the fast-food chain we saw earlier looks pretty inefficient. Is there a way to make it more efficient? Here’s one way.

The new and improved fast-food chain. We no longer have the red person. This causes fewer delays and helps you get your food quicker.
  1. There’s only an input gate and an output gate (i.e. no forget gate)
  2. There’s only one state (an output state); the separate cell state, and with it the red person, is gone
GRU in the real world. Though it’s not as complex as an LSTM, it can still be quite a bit to swallow, so don’t try to understand how they all connect at this point; just understand the various entities involved. The red dashed ball represents an interim output computed by the GRU cell.
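The simplification shows up directly in code: ask a Keras GRU layer for its state and you get a single tensor back, where the LSTM gave two (toy shapes again):

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((2, 5, 3))              # the same made-up batch as before

gru_out, state = layers.GRU(8, return_state=True)(x)     # one state only
lstm_out, h, c = layers.LSTM(8, return_state=True)(x)    # versus the LSTM's two

print(state.shape)                           # (2, 8)
```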

Applications:

We looked at simple RNNs, LSTMs and GRUs. Here are the main take-aways.

  • Simple RNNs — The simplest of the three. They keep a single state that acts as the memory of what has been seen so far in the sequence.
  • LSTMs — Quite complicated. They have two states: a cell state (long-term) and an output state (short-term). They also have a gating mechanism to control how much information flows through the model.
  • GRUs — A compromise between RNNs and LSTMs. They have only one output state but still have the gating mechanism.

Part 1: Feedforward models