Contrastive Learning, Revisited
Since my recent work requires a contrastive loss, this post organizes some materials I have read lately. I previously did a preliminary study of contrastive learning; see that earlier note for details.
This post is mainly a study of the following three resources:
- SimCSE: Simple Contrastive Learning of Sentence Embeddings [2021, EMNLP]
- Prototypical Contrastive Learning of Unsupervised Representations [2021, ICLR]
- Tutorial: Contrastive Data and Learning for Natural Language Processing [2022, NAACL]
SimCSE
Offers both supervised and unsupervised contrastive learning, with standard dropout used as noise:
- Unsupervised approach:
- Feed the same sentence through the encoder twice; the two dropout-noised embeddings serve as positive pairs for each other.
- Unsupervised SimCSE essentially improves uniformity while avoiding degenerated alignment via dropout noise, thus improving the expressiveness of the representations.
- Supervised approach:
- Adds annotated pairs to contrastive learning (NLI entailment pairs as positives, contradiction pairs as hard negatives).
- Uses alignment between semantically related positive pairs and uniformity of the whole representation space to measure the quality of learned embeddings.
- They also compare other positive-pair construction strategies:
- crop / word deletion / synonym replacement
- On using both supervised and unsupervised CL:
- At a glance, the supervised setting seems to already subsume the unsupervised one?
- In their own analysis, the dropout-based approach preserves both uniformity and alignment (a minimal loss sketch follows below).
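As a concrete reference, here is a minimal sketch of the unsupervised SimCSE objective. The `encoder(input_ids, attention_mask)` signature is a hypothetical stand-in for any model that returns pooled sentence embeddings and applies dropout internally (as a BERT-style model does in training mode):

```python
import torch
import torch.nn.functional as F

def simcse_unsup_loss(encoder, input_ids, attention_mask, temperature=0.05):
    # Two forward passes over the same batch: the encoder's internal
    # dropout produces two different "views" of each sentence.
    z1 = encoder(input_ids, attention_mask)  # (batch, dim); hypothetical signature
    z2 = encoder(input_ids, attention_mask)  # (batch, dim)

    # NT-Xent: cosine similarity on normalized embeddings, scaled by temperature.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature            # (batch, batch) similarity matrix

    # Diagonal entries (i, i) are positive pairs; every other sentence
    # in the batch serves as an in-batch negative.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```

Note the loss here is exactly the NT-Xent form discussed in the NAACL tutorial below; only the positive-pair construction (dropout twice) is SimCSE-specific.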
Prototypical Contrastive Learning
Combines unsupervised contrastive representation learning with clustering.
- It encodes semantic structures discovered by clustering into the learned embedding space.
- Proposes the ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes.
- Includes an InfoNCE term to retain the property of local smoothness and help bootstrap clustering.
- Also includes a cluster-level loss whose form closely resembles InfoNCE, except that the augmented examples are replaced with cluster centroids. This can be viewed as a kind of semi-/unsupervised contrast, since it essentially relies on pseudo-labels (the prototype term is sketched below).
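Here is a minimal sketch of just the prototype term, under assumed shapes and names (`prototypes` and `assignments` would come from an offline k-means step). The full ProtoNCE additionally averages this term over several clustering granularities and replaces the fixed temperature with a per-cluster concentration estimate:

```python
import torch.nn.functional as F

def proto_nce_prototype_term(z, prototypes, assignments, temperature=0.1):
    # z           : (batch, dim) L2-normalized sample embeddings
    # prototypes  : (k, dim) L2-normalized centroids from offline k-means
    # assignments : (batch,) LongTensor pseudo-label = index of each
    #               sample's assigned centroid
    #
    # Same form as InfoNCE, but the positive is the assigned centroid
    # and the remaining centroids act as negatives.
    logits = z @ prototypes.T / temperature  # (batch, k)
    return F.cross_entropy(logits, assignments)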
Contrastive Data and Learning for NLP
This is a tutorial presented at NAACL; here I record some points I consider important:
- Contrastive Learning = Contrastive Data Creation + Contrastive Objective Optimization
- Noise Contrastive Estimation (NCE) is just one family of losses used in contrastive learning.
- Normalized Temperature-scaled Cross-Entropy (NT-Xent) is simply InfoNCE with cosine similarity on normalized embeddings.
- Hard negative mining: find hard negative examples. We want the anchor-negative distance to exceed the anchor-positive distance by at least a margin (a margin-based sketch follows this list).
- Larger batch sizes usually lead to better performance (more in-batch negatives).
- Two geometric forces on the hypersphere: alignment & uniformity (both metrics are sketched below).
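To make the margin condition concrete, here is an illustrative triplet-style loss (my own sketch, not taken from the tutorial); negatives that still violate the margin are the "hard" ones worth mining:

```python
import torch.nn.functional as F

def margin_loss(anchor, positive, negative, margin=0.2):
    # Cosine distances: smaller means more similar.
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)  # (batch,)
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)  # (batch,)
    # Penalize triplets where the negative is not farther from the
    # anchor than the positive by at least `margin`.
    return F.relu(d_pos - d_neg + margin).mean()
```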
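And the two geometric measures, following the standard definitions from Wang & Isola (2020), which the SimCSE analysis also uses. Here `x` and `y` are assumed to be L2-normalized embeddings of positive pairs:

```python
import torch

def align_loss(x, y, alpha=2):
    # Alignment: expected distance between positive pairs
    # (x[i], y[i]); lower is better.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # Uniformity: log of the average pairwise Gaussian potential over
    # all embeddings; lower means a more even spread on the hypersphere.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```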
My Questions
I read these papers mainly to answer two questions of my own:
- How should positive/negative examples be constructed for text? (Quite a few construction methods for images were already covered in my previous note.)
- crop / word deletion / synonym replacement / back-translation / cut-off / mixup (a toy word-deletion helper is sketched after this list)
- Adding a supervised and an unsupervised contrastive loss together sometimes works well; how can this be explained?
- The first two papers both combine supervised and unsupervised losses, and both achieve strong performance.
- Perhaps the combination better guarantees the alignment & uniformity of the learned representations.
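For the first question, here is a toy sketch of one such strategy, random word deletion; the helper name and defaults are mine, purely illustrative:

```python
import random

def word_deletion(sentence, p=0.1, rng=random):
    # Drop each word independently with probability p; the surviving
    # sentence is treated as a positive view of the original.
    words = sentence.split()
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept) if kept else sentence
```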