A Framework For Contrastive Self-Supervised Learning And Designing A New Approach | by William Falcon | Sep, 2020

The third way to characterize these methods is by the strategy they employ to extract representations. This is arguably where the “magic” happens in all of these methods and where they differ the most.

CPC “future” prediction task
From pixel space to latent space
CPC representation extraction
References [1], [2].
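CPC moves the prediction task from pixel space to latent space. Given a context vector c_t that summarizes the sequence so far, it scores a later latent z_{t+k} and is trained with the InfoNCE loss, which has to pick the true future latent out of a set of negatives. A minimal pure-Python sketch follows, under assumptions. The function names, the prediction matrix `W_k` and the toy 2-D vectors are hypothetical. The encoder and autoregressive model that would produce `context` and the latents are omitted. The score is the bilinear form z^T W_k c_t, computed as a dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    # Hypothetical linear prediction head W_k applied to the context vector c_t.
    return [dot(row, v) for row in W]

def info_nce(prediction, positive, negatives):
    """-log of the softmax probability assigned to the true future latent."""
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    m = max(scores)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]

def cpc_loss(context, W_k, future_latent, negative_latents):
    """Score the prediction for step t+k against the real z_{t+k} and distractors."""
    prediction = matvec(W_k, context)
    return info_nce(prediction, future_latent, negative_latents)

# Toy 2-D latents: the true future latent points the same way as the prediction.
context = [1.0, 0.0]
W_k = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, for illustration only
loss = cpc_loss(context, W_k, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
print(round(loss, 4))  # 0.4076
```

If the true future latent is swapped with a negative, the loss rises, which is the signal that pushes the encoder to keep information that is predictive of the future.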
AMDIM representation extraction: AMDIM uses the same encoder to extract three sets of feature maps and then makes comparisons across those feature maps.
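One way to picture "comparisons across feature maps": treat each map as a set of feature vectors (one per spatial location), score a vector from a map of one augmented view against the vectors of a map from the second view, and use vectors from other images as negatives. This is a simplified sketch, not AMDIM's implementation. The dict layout keyed by map size, the function names, and the example pairs (global 1x1 against 5x5, 5x5 against 5x5) are assumptions for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, positive, negatives):
    """-log softmax probability of the positive among positive + negatives."""
    scores = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    m = max(scores)  # stabilize exp()
    return m + math.log(sum(math.exp(s - m) for s in scores)) - scores[0]

def cross_map_loss(maps_a, maps_b, negative_maps, pairs):
    """Average InfoNCE over comparisons between feature maps of two views.

    maps_a, maps_b: hypothetical layout, dict {map_size: [vector per location]}
        for two augmented views of the SAME image, produced by one shared encoder.
    negative_maps: list of dicts with the same layout, from OTHER images.
    pairs: which maps to compare, e.g. [(1, 5), (5, 5)] means
        "1x1 of view A vs 5x5 of view B" and "5x5 vs 5x5".
    """
    losses = []
    for size_a, size_b in pairs:
        # Every location of the matching map in other images serves as a negative.
        negatives = [v for other in negative_maps for v in other[size_b]]
        for anchor in maps_a[size_a]:
            for positive in maps_b[size_b]:
                losses.append(info_nce(anchor, positive, negatives))
    return sum(losses) / len(losses)

# Toy 2-D features; a real 5x5 map would hold 25 vectors, here it holds 2.
view_a = {1: [[1.0, 0.0]], 5: [[1.0, 0.0], [0.9, 0.1]]}
view_b = {1: [[1.0, 0.0]], 5: [[1.0, 0.0], [0.8, 0.2]]}
other = {1: [[-1.0, 0.0]], 5: [[-1.0, 0.0], [0.0, -1.0]]}
print(cross_map_loss(view_a, view_b, [other], [(1, 5), (5, 5)]))
```

Because a single encoder produces all the maps, comparing a global vector with local ones forces the global summary to agree with what is present at individual locations of the other view.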
Credit: Original MoCo authors (source)
Credit: DeepMind (source)
  1. BYOL does not use negative samples. Instead, it relies on rolling (exponential moving average) updates of its target network's weights to provide a contrastive signal during training. However, a recent ablation found that this may not be necessary, and that adding batch normalization is in fact what keeps the system from generating trivial solutions.
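The two ingredients named in this item can be sketched in a few lines: a rolling (exponential moving average) update that moves a target network's weights toward the online network's, and a loss that only compares two views of the same image, with no negatives. This is a minimal pure-Python illustration under assumptions: weights are flat lists, the function names are hypothetical, and `tau=0.996` is an illustrative default. The projection and prediction networks, and batch normalization, are left out.

```python
import math

def ema_update(target_weights, online_weights, tau=0.996):
    """Rolling update: target <- tau * target + (1 - tau) * online.

    The target network is not trained by gradients; it only trails the online
    network. Weights are flat lists of floats here for simplicity.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_weights, online_weights)]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def byol_loss(online_prediction, target_projection):
    """Squared error between L2-normalized vectors (= 2 - 2 * cosine similarity).

    Only the positive pair appears: there are no negative samples in this loss.
    """
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))

# One rolling step with a small tau so the movement is visible.
target = ema_update([0.0, 0.0], [1.0, 1.0], tau=0.9)
print(target)  # roughly [0.1, 0.1]
print(byol_loss([1.0, 0.0], [0.0, 3.0]))  # orthogonal vectors -> 2.0
```

If both networks output the same constant vector for every input, `byol_loss` is zero. That is the trivial solution the item refers to, and nothing in this sketch prevents it; batch normalization, which the ablation credits, is not modeled here.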
Credit: SwAV authors (source)