Contrastive Learning, Revisited
Since my recent work requires a contrastive loss, this post organizes some materials I have read lately. I previously did a preliminary study of contrastive learning; see that earlier note for details.
This post is mainly a study of the following three resources:
- SimCSE: Simple Contrastive Learning of Sentence Embeddings [2021, EMNLP]
- Prototypical Contrastive Learning of Unsupervised Representations [2021, ICLR]
- Tutorial: Contrastive Data and Learning for Natural Language Processing [2022, NAACL]
SimCSE
Offers both supervised and unsupervised contrastive learning, with standard dropout used as noise:
- Unsupervised approach:
- Feed the same sentence through the encoder twice; the two dropout-noised embeddings serve as positive pairs for each other.
- Unsupervised SimCSE essentially improves uniformity while avoiding degenerated alignment via dropout noise, thus improving the expressiveness of the representations.
- Supervised approach:
- Adds annotated pairs to contrastive learning (NLI entailment pairs as positives, contradiction pairs as hard negatives).
- Uses alignment between semantically related positive pairs and uniformity of the whole representation space to measure the quality of learned embeddings.
- They also compare other positive-pair construction strategies:
- crop / word deletion / synonym replacement
- On using both supervised and unsupervised CL:
- At a glance, the supervised setting seems to already subsume the unsupervised one?
- In their own analysis, the dropout-based approach preserves both uniformity and alignment (a minimal loss sketch follows below).
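As a concrete reference, here is a minimal sketch of the unsupervised SimCSE objective. The `encoder(input_ids, attention_mask)` signature is a hypothetical stand-in for any model that returns pooled sentence embeddings and applies dropout internally (as a BERT-style model does in training mode):

```python
import torch
import torch.nn.functional as F

def simcse_unsup_loss(encoder, input_ids, attention_mask, temperature=0.05):
    # Two forward passes over the same batch: the encoder's internal
    # dropout produces two different "views" of each sentence.
    z1 = encoder(input_ids, attention_mask)  # (batch, dim); hypothetical signature
    z2 = encoder(input_ids, attention_mask)  # (batch, dim)

    # NT-Xent: cosine similarity on normalized embeddings, scaled by temperature.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature            # (batch, batch) similarity matrix

    # Diagonal entries (i, i) are positive pairs; every other sentence
    # in the batch serves as an in-batch negative.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

Note the loss here is exactly the NT-Xent form discussed in the NAACL tutorial below; only the positive-pair construction (dropout twice) is SimCSE-specific.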
Prototypical Contrastive Learning
Combines unsupervised contrastive representation learning with clustering.
- It encodes semantic structures discovered by clustering into the learned embedding space.
- Proposes the ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes.
- Includes an InfoNCE term to retain the property of local smoothness and help bootstrap clustering.
- Also includes a cluster-level loss whose form closely resembles InfoNCE, except that the augmented examples are replaced with cluster centroids. This can be viewed as a kind of semi-/unsupervised contrast, since it essentially relies on pseudo-labels (the prototype term is sketched below).
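Here is a minimal sketch of just the prototype term, under assumed shapes and names (`prototypes` and `assignments` would come from an offline k-means step). The full ProtoNCE additionally averages this term over several clustering granularities and replaces the fixed temperature with a per-cluster concentration estimate:

```python
import torch.nn.functional as F

def proto_nce_prototype_term(z, prototypes, assignments, temperature=0.1):
    # z           : (batch, dim) L2-normalized sample embeddings
    # prototypes  : (k, dim) L2-normalized centroids from offline k-means
    # assignments : (batch,) LongTensor pseudo-label = index of each
    #               sample's assigned centroid
    #
    # Same form as InfoNCE, but the positive is the assigned centroid
    # and the remaining centroids act as negatives.
    logits = z @ prototypes.T / temperature  # (batch, k)
    return F.cross_entropy(logits, assignments)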
Contrastive Data and Learning for NLP
This is a tutorial presented at NAACL; here I record some points I consider important:
- Contrastive Learning = Contrastive Data Creation + Contrastive Objective Optimization
- Noise Contrastive Estimation (NCE) is just one family of losses used in contrastive learning.
- Normalized Temperature-scaled Cross-Entropy (NT-Xent) is simply InfoNCE with cosine similarity on normalized embeddings.
- Hard negative mining: find hard negative examples. We want the anchor-negative distance to exceed the anchor-positive distance by at least a margin (a margin-based sketch follows this list).
- Larger batch sizes usually lead to better performance (more in-batch negatives).
- Two geometric forces on the hypersphere: alignment & uniformity (both metrics are sketched below).
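To make the margin condition concrete, here is an illustrative triplet-style loss (my own sketch, not taken from the tutorial); negatives that still violate the margin are the "hard" ones worth mining:

```python
import torch.nn.functional as F

def margin_loss(anchor, positive, negative, margin=0.2):
    # Cosine distances: smaller means more similar.
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)  # (batch,)
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)  # (batch,)
    # Penalize triplets where the negative is not farther from the
    # anchor than the positive by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```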
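And the two geometric measures, following the standard definitions from Wang & Isola (2020), which the SimCSE analysis also uses. Here `x` and `y` are assumed to be L2-normalized embeddings of positive pairs:

```python
import torch

def align_loss(x, y, alpha=2):
    # Alignment: expected distance between positive pairs
    # (x[i], y[i]); lower is better.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # Uniformity: log of the average pairwise Gaussian potential over
    # all embeddings; lower means a more even spread on the hypersphere.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```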
My Questions
I read these papers mainly to answer two questions of my own:
- How should positive/negative examples be constructed for text? (Quite a few construction methods for images were already covered in my previous note.)
- crop / word deletion / synonym replacement / back-translation / cut-off / mixup (a toy word-deletion helper is sketched after this list)
- Adding a supervised and an unsupervised contrastive loss together sometimes works well; how can this be explained?
- The first two papers both combine supervised and unsupervised losses, and both achieve strong performance.
- Perhaps the combination better guarantees the alignment & uniformity of the learned representations.
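For the first question, here is a toy sketch of one such strategy, random word deletion; the helper name and defaults are mine, purely illustrative:

```python
import random

def word_deletion(sentence, p=0.1, rng=random):
    # Drop each word independently with probability p; the surviving
    # sentence is treated as a positive view of the original.
    words = sentence.split()
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept) if kept else sentence
```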