Smart Paper Notes

Revisiting a kNN-based Image Classification System with High-capacity Storage

Kengo Nakata , Youyang Ng , Daisuke Miyashita , Asuka Maki , Yu-Chieh Lin , Jun Deguchi

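Categories: Computer Vision

2022-04-03

In existing image classification systems that use deep neural networks, the knowledge needed for image classification is stored implicitly in the model parameters. If users want to update this knowledge, they need to fine-tune the model parameters. Moreover, users cannot verify the validity of inference results or evaluate how much the knowledge contributed to them. In this paper, we investigate a system that stores knowledge for image classification, such as image feature maps, labels, and original images, not in model parameters but in external high-capacity storage. Our system refers to the storage like a database when classifying input images. To add knowledge, our system updates the database instead of fine-tuning model parameters, which avoids catastrophic forgetting in incremental learning scenarios. We revisit the kNN (k-Nearest Neighbor) classifier and employ it in our system. By analyzing the neighborhood samples referenced by the kNN algorithm, we can interpret how knowledge learned in the past is used for inference results. Our system achieves 79.8% top-1 accuracy on the ImageNet dataset without fine-tuning model parameters after pretraining, and 90.8% accuracy on the Split CIFAR-100 dataset in the task-incremental learning setting.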
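
The database-style lookup described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `FeatureStore` class, its method names, the cosine similarity measure, and the toy feature vectors are all assumptions made for illustration. It shows how adding knowledge amounts to appending entries to a store, and how the retrieved neighbors can be inspected to explain a prediction.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class FeatureStore:
    """Hypothetical external store of (feature, label) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, feature, label):
        # Adding knowledge means appending to the store;
        # no model parameters are changed.
        self.entries.append((list(feature), label))

    def classify(self, query, k=3):
        # Rank stored entries by similarity to the query feature.
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query, entry[0]),
            reverse=True,
        )
        neighbors = ranked[:k]
        # Majority vote over the k nearest labels.
        votes = Counter(label for _, label in neighbors)
        predicted = votes.most_common(1)[0][0]
        # Return the neighbors too, so the decision can be inspected.
        return predicted, neighbors
```

After a few calls to `add`, `classify(query, k=3)` returns both the majority label and the stored entries that produced it, which is what makes the result inspectable in this kind of design.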

A soft nearest-neighbor framework for continual semi-supervised learning

Zhiqi Kang , Enrico Fini , Moin Nabi , Elisa Ricci , Karteek Alahari

Categories: Computer Vision | Machine Learning

2022-12-09

Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual semi-supervised learning -- a setting where not all the data samples are labeled. An underlying issue in this scenario is the model forgetting representations of unlabeled data and overfitting the labeled ones. We leverage the power of nearest-neighbor classifiers to non-linearly partition the feature space and learn a strong representation for the current task, as well as distill relevant information from previous tasks. We perform a thorough experimental evaluation and show that our method outperforms all the existing approaches by large margins, setting a strong state of the art on the continual semi-supervised learning paradigm. For example, on CIFAR100 we surpass several others even when using at least 30 times less supervision (0.8% vs. 25% of annotations).

Continual Contrastive Learning for Image Classification

Zhiwei Lin , Yongtao Wang , Hongxiang Lin

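Categories: Computer Vision | Artificial Intelligence

2021-07-05

For artificial learning systems, continual learning over time from a stream of data is essential. Emerging research on supervised continual learning has made great progress, while catastrophic forgetting in unsupervised learning remains unexplored. Among unsupervised learning methods, self-supervised learning shows tremendous potential for visual representation without requiring large-scale labeled data. Improving the visual representations learned by self-supervision requires larger and more varied data. In the real world, unlabeled data is generated all the time, which is a great advantage for self-supervised methods. However, under the current paradigm, packing previous and current data together and training again wastes time and resources. A continual self-supervised learning method is therefore badly needed. In this paper, we make a first attempt at continual contrastive self-supervised learning by proposing a rehearsal method that keeps a few exemplars from the previous data. Instead of directly combining the saved exemplars with the current dataset for training, we use self-supervised knowledge distillation to transfer contrastive information to the current network by mimicking the similarity score distribution inferred by the old network over a set of saved exemplars. Moreover, we build an extra sample queue to help the network distinguish between previous and current data and to prevent mutual interference while it learns their respective feature representations. Experimental results show that our method performs well on CIFAR100 and ImageNet-Sub. Compared with baselines that learn the tasks without any such technique, we improve image classification performance by 1.60% on CIFAR100, 2.86% on ImageNet-Sub, and 1.29% on ImageNet-Full under the 10-incremental-step setting.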

Continually Learning Self-Supervised Representations with Projected Functional Regularization

Alex Gomez-Villa , Bartlomiej Twardowski , Lu Yu , Andrew D. Bagdanov , Joost van de Weijer

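Categories: Computer Vision

2021-12-30

Recent self-supervised learning methods are able to learn high-quality image representations and are closing the gap with supervised methods. However, these methods cannot acquire new knowledge incrementally; in fact, they are mostly used only as a pre-training phase on IID data. In this work, we investigate self-supervised methods in continual learning regimes without additional memory or replay. To prevent forgetting of previous knowledge, we propose the use of functional regularization. We show that naive functional regularization, also known as feature distillation, leads to low plasticity and therefore severely limits continual learning performance. To address this problem, we propose Projected Functional Regularization, in which a separate projection network ensures that the newly learned feature space preserves the information of the previous feature space while still allowing new features to be learned. This lets us prevent forgetting while maintaining the plasticity of the learner. Evaluation against other incremental learning approaches applied to self-supervision shows that our method obtains competitive performance in different scenarios and on multiple datasets.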
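
To make the contrast between plain feature distillation and the projected variant concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the method as implemented in the paper: the mean-squared-error distance, the linear `projector` matrix, and all function names are hypothetical choices. The point it illustrates is that a projector lets new features differ from old ones as long as the old ones remain recoverable.

```python
def mse(u, v):
    """Mean squared error between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def feature_distillation_loss(new_feat, old_feat):
    # Naive functional regularization: the new features themselves
    # are pulled toward the old features, which limits plasticity.
    return mse(new_feat, old_feat)


def projected_regularization_loss(new_feat, old_feat, projector):
    # Projected variant (assumed linear projector): only the projection
    # of the new features has to match the old features, so the new
    # feature space is free to change.
    return mse(matvec(projector, new_feat), old_feat)
```

With old features `[1.0, 2.0]` and new features `[2.0, 1.0]` (the same information with the two dimensions swapped), the naive loss is nonzero, while the projected loss with the swap matrix `[[0.0, 1.0], [1.0, 0.0]]` is zero.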

How Well Does Self-Supervised Pre-Training Perform with Streaming Data?

Dapeng Hu , Shipeng Yan , Qizhengqiu Lu , Lanqing Hong , Hailin Hu , Yifan Zhang , Zhenguo Li , Xinchao Wang , Jiashi Feng

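Categories: Machine Learning | Computer Vision

2021-04-25

Prior work on self-supervised pre-training focuses on the joint training scenario, in which massive unlabeled data is assumed to be given as input all at once, and only then is a learner trained. Unfortunately, this problem setting is often impractical, if not infeasible, since many real-world tasks rely on sequential learning, for example when data is decentralized or collected in a streaming fashion. In this paper, we conduct the first thorough and dedicated investigation of self-supervised pre-training with streaming data, aiming to shed light on model behavior under this overlooked setup. Specifically, we pre-train over 500 models on four categories of pre-training streaming data from ImageNet and DomainNet, and evaluate them on three types of downstream tasks and 12 different downstream datasets. Our study shows that, somewhat beyond our expectations, with simple data replay or parameter regularization, sequential self-supervised pre-training turns out to be an efficient alternative to joint pre-training, as the performance of the former is mostly on par with that of the latter. Moreover, catastrophic forgetting, a common issue in sequential supervised learning, is greatly alleviated in sequential self-supervised learning (SSL), which is well supported by our comprehensive empirical analysis of the representations and of the sharpness of minima in the loss landscape. Our findings therefore suggest that, in practice, cumbersome joint training can largely be replaced by sequential learning for SSL, which in turn enables a much broader range of potential application scenarios.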

e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Wonyoung Shin , Jonghun Park , Taekang Woo , Yongwoo Cho , Kwangjin Oh , Hwanjun Song

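Categories: Machine Learning | Computer Vision

2022-07-01

Understanding the vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms, and inspired by the recent success of representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present the techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges. We study performance using our pre-trained model as the backbone for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms both single-modality and multi-modality baselines in every downstream task.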

PIVOT: Prompting for Video Continual Learning

Andrés Villa , Juan León Alcázar , Motasem Alfarra , Kumail Alhamoud , Julio Hurtado , Fabian Caba Heilbron , Alvaro Soto , Bernard Ghanem

Categories: Computer Vision | Artificial Intelligence

2022-12-09

Modern machine learning pipelines are limited due to data availability, storage quotas, privacy regulations, and expensive annotation processes. These constraints make it difficult or impossible to maintain a large-scale model trained on growing annotation sets. Continual learning directly approaches this problem, with the ultimate goal of devising methods where a neural network effectively learns relevant patterns for new (unseen) classes without significantly altering its performance on previously learned ones. In this paper, we address the problem of continual learning for video data. We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.

Dissecting Continual Learning a Structural and Data Analysis

Francesco Pelosin

Categories: Computer Vision | Machine Learning

2023-01-03

Continual Learning (CL) is a field dedicated to devising algorithms able to achieve lifelong learning. Overcoming the knowledge disruption of previously acquired concepts, a drawback affecting deep learning models that goes by the name of catastrophic forgetting, is a hard challenge. Currently, deep learning methods can attain impressive results when the data modeled does not undergo a considerable distributional shift in subsequent learning sessions, but whenever we expose such systems to this incremental setting, performance drops very quickly. Overcoming this limitation is fundamental, as it would allow us to build truly intelligent systems showing stability and plasticity. Secondly, it would allow us to overcome the onerous limitation of retraining these architectures from scratch with the new updated data. In this thesis, we tackle the problem from multiple directions. In a first study, we show that in rehearsal-based techniques (systems that use a memory buffer), the quantity of data stored in the rehearsal buffer is a more important factor than the quality of the data. Secondly, we propose one of the early works on incremental learning for ViT architectures, comparing functional, weight, and attention regularization approaches, and propose an effective novel asymmetric loss. Finally, we present a study on pretraining and how it affects performance in Continual Learning, raising some questions about the effective progression of the field. We then conclude with some future directions and closing remarks.

Don't Stop Learning: Towards Continual Learning for the CLIP Model

Yuxuan Ding , Lingqiao Liu , Chunna Tian , Jingyuan Yang , Haoxuan Ding

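Categories: Computer Vision

2022-07-19

The Contrastive Language-Image Pre-training (CLIP) model is a recently proposed large-scale pre-trained model that is attracting increasing attention in the computer vision community. Benefiting from its gigantic image-text training set, the CLIP model has learned outstanding capabilities in zero-shot learning and image-text matching. To boost the recognition performance of CLIP on certain target visual concepts, it is often desirable to further update the CLIP model by fine-tuning on extra training data for some classes of interest. This operation, however, raises an important concern: will the update hurt CLIP's zero-shot learning or image-text matching capability, i.e., cause catastrophic forgetting? If so, can existing continual learning algorithms be adapted to alleviate the risk of catastrophic forgetting? To answer these questions, this work conducts a systematic study of the continual learning problem for the CLIP model. We construct evaluation protocols to measure the impact of fine-tuning updates and explore different ways of upgrading existing continual learning methods to mitigate forgetting in the CLIP model. Our study reveals the particular challenges of continual learning for CLIP and lays a foundation for further research. Moreover, we propose a new algorithm, dubbed Learning without Forgetting via Replayed Vocabulary (VR-LwF), which proves effective at alleviating forgetting in the CLIP model.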

S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning

Yabin Wang , Zhiwu Huang , Xiaopeng Hong

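Categories: Computer Vision | Machine Learning

2022-07-26

State-of-the-art deep neural networks still struggle with the catastrophic forgetting problem in continual learning. In this paper, we propose a simple paradigm (named S-Prompting) and two concrete approaches that greatly reduce forgetting in one of the most typical continual learning scenarios, namely domain incremental learning (DIL). The key idea of the paradigm is to learn prompts independently across domains with pre-trained transformers, avoiding the use of the exemplars that commonly appear in conventional methods. This results in a win-win game in which the prompting can achieve the best result for each domain. The independent prompting across domains requires only a single cross-entropy loss for training and a simple K-NN operation as the domain identifier at inference. The learning paradigm yields an image prompt learning approach and a brand-new language-image prompt learning approach. With excellent scalability (a 0.03% parameter increase per domain), the best of our approaches achieves a remarkable relative improvement (about 30% on average) over the best state-of-the-art exemplar-free methods on three standard DIL tasks, and even surpasses the best of them when they use exemplars.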

FLUID: A Unified Evaluation Framework for Flexible Sequential Data

Matthew Wallingford , Aditya Kusupati , Keivan Alizadeh-Vahid , Aaron Walsman , Aniruddha Kembhavi , Ali Farhadi

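Categories: Computer Vision | Machine Learning

2020-07-06

Modern ML methods excel when training data is IID, large-scale, and well labeled. Learning in less ideal conditions remains an open challenge. The sub-fields of few-shot, continual, transfer, and representation learning have made substantial strides in learning under adverse conditions, each affording distinct advantages through its methods and insights. These methods address different challenges, such as data arriving sequentially or scarce training examples; however, the difficult conditions an ML system will face often cannot be anticipated before deployment. General ML systems that can handle the many challenges of learning in practical settings are therefore needed. To foster research toward the goal of general ML methods, we introduce a new unified evaluation framework, FLUID (Flexible Sequential Data). FLUID integrates the objectives of few-shot, continual, transfer, and representation learning while enabling comparison and integration of techniques across these sub-fields. In FLUID, a learner faces a stream of data and must make sequential predictions while choosing how to update itself, adapting quickly to novel classes, and dealing with changing data distributions, all while accounting for the total amount of compute. We conduct experiments on a broad set of methods, which shed new light on the advantages and limitations of current solutions and indicate new research problems to solve. As a starting point toward more general methods, we present two new baselines that outperform the other evaluated methods on FLUID. Project page: https://raivn.cs.washington.edu/projects/fluid/.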

Patching open-vocabulary models by interpolating weights

Gabriel Ilharco , Mitchell Wortsman , Samir Yitzhak Gadre , Shuran Song , Hannaneh Hajishirzi , Simon Kornblith , Ali Farhadi , Ludwig Schmidt

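Categories: Computer Vision | Machine Learning

2022-08-10

Open-vocabulary models such as CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Toward this goal, we introduce PAINT, a patching method that interpolates between the weights of a model before fine-tuning and its weights after fine-tuning on the task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while keeping accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks, such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.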
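
The weight interpolation at the heart of the patching idea can be sketched in a few lines of Python. This is a minimal illustration, assuming weights are stored as a dict mapping parameter names to flat lists of floats; the function name `interpolate_weights` and the mixing coefficient `alpha` are hypothetical names, and the abstract does not say how the coefficient is chosen.

```python
def interpolate_weights(before, after, alpha):
    """Linearly interpolate between two sets of model weights.

    before: weights prior to fine-tuning (name -> list of floats)
    after:  weights after fine-tuning on the task to be patched
    alpha:  mixing coefficient in [0, 1]; 0 keeps the original model,
            1 gives the fully fine-tuned model
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if before.keys() != after.keys():
        raise ValueError("both weight sets must have the same parameters")
    return {
        name: [
            (1.0 - alpha) * b + alpha * a
            for b, a in zip(before[name], after[name])
        ]
        for name in before
    }
```

Intermediate values of `alpha` trade off accuracy on the patched task against accuracy on tasks where the original model already performed well, which is the balance the abstract describes.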

Saliency-Augmented Memory Completion for Continual Learning

Guangji Bai , Chen Ling , Yuyang Gao , Liang Zhao

Categories: Machine Learning

2022-12-26

Continual Learning is considered a key step toward next-generation Artificial Intelligence. Among various methods, replay-based approaches that maintain and replay a small episodic memory of previous samples are one of the most successful strategies against catastrophic forgetting. However, since forgetting is inevitable given bounded memory and unbounded tasks, how to forget is a problem continual learning must address. Therefore, beyond simply avoiding catastrophic forgetting, an under-explored issue is how to reasonably forget while ensuring the merits of human memory, including 1. storage efficiency, 2. generalizability, and 3. some interpretability. To achieve these simultaneously, our paper proposes a new saliency-augmented memory completion framework for continual learning, inspired by recent discoveries in memory completion separation in cognitive neuroscience. Specifically, we innovatively propose to store the part of the image most important to the tasks in episodic memory by saliency map extraction and memory encoding. When learning new tasks, previous data from memory are inpainted by an adaptive data generation module, which is inspired by how humans complete episodic memory. The module's parameters are shared across all tasks and it can be jointly trained with a continual learning classifier as bilevel optimization. Extensive experiments on several continual learning and image classification benchmarks demonstrate the proposed method's effectiveness and efficiency.

Learning Representations for New Sound Classes With Continual Self-Supervised Learning

Zhepei Wang , Cem Subakan , Xilin Jiang , Junkai Wu , Efthymios Tzinis , Mirco Ravanelli , Paris Smaragdis

Categories: Machine Learning

2022-05-15

In this paper, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framework where the model can be updated without relying on labeled data. For this purpose, we propose adopting representation learning, where an encoder is trained using unlabeled data. This learning framework enables the study and implementation of a practically relevant use case where only a small amount of the labels is available in a continual learning context. We also make the empirical observation that a similarity-based representation learning method within this framework is robust to forgetting even if no explicit mechanism against forgetting is employed. We show that this approach obtains similar performance compared to several distillation-based continual learning methods when employed on self-supervised representation learning methods.

Continual Semi-Supervised Learning through Contrastive Interpolation Consistency

Matteo Boschini , Pietro Buzzega , Lorenzo Bonicelli , Angelo Porrello , Simone Calderara

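Categories: Machine Learning (Statistics) | Machine Learning

2021-08-14

Continual Learning (CL) investigates how to train deep networks on a stream of tasks without incurring forgetting. The CL settings proposed in the literature assume that every incoming example is paired with a ground-truth annotation. However, this conflicts with many real-world applications. This work explores Continual Semi-Supervised Learning (CSSL), in which only a small fraction of labeled input examples is shown to the learner. We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER) perform in this novel and challenging scenario, where overfitting is entangled with forgetting. We then design a novel CSSL method that exploits metric learning and consistency regularization to leverage unlabeled examples while learning. We show that our proposal exhibits higher resilience to diminishing supervision and, even more surprisingly, that relying on only 25% supervision suffices to outperform SOTA methods trained under full supervision.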

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark

Categories:

2021-02-26

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
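
The pre-training task of predicting which caption goes with which image can be written as a symmetric cross-entropy over a batch similarity matrix. The Python sketch below is a simplified illustration, not the released CLIP code: it assumes the embeddings are plain lists that are already L2-normalized, uses a fixed hypothetical `temperature` of 0.07, and all function names are chosen for illustration.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    image_emb[i] and text_emb[i] are assumed to be a matching pair
    and to be L2-normalized already.
    """
    n = len(image_emb)
    # logits[i][j]: scaled similarity of image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature
         for txt in text_emb]
        for img in image_emb
    ]
    # Image-to-text: for row i the correct "class" is column i.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: the same over columns of the matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (image_to_text + text_to_image) / 2.0
```

When matching pairs have the highest similarity in their row and column, the loss is close to zero; when captions are shuffled relative to their images, it grows large.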

Learning to Prompt for Continual Learning

Zifeng Wang , Zizhao Zhang , Chen-Yu Lee , Han Zhang , Ruoxi Sun , Xiaoqi Ren , Guolong Su , Vincent Perot , Jennifer Dy , Tomas Pfister

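Categories: Machine Learning | Computer Vision

2021-12-16

The mainstream paradigm behind continual learning has been to adapt model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or on known task identity at test time to retrieve learned knowledge and address forgetting. This work instead presents a new paradigm for continual learning that aims to train a more succinct memory system without accessing task identity at test time. Our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions. In our proposed framework, prompts are small learnable parameters that are maintained in a memory space. The objective is to optimize the prompts to instruct the model's predictions and to explicitly manage task-invariant and task-specific knowledge while maintaining model plasticity. We conduct comprehensive experiments on popular image classification benchmarks under different challenging continual learning settings, where L2P consistently outperforms prior state-of-the-art methods. Surprisingly, L2P achieves competitive results even without a rehearsal buffer and is directly applicable to challenging task-agnostic continual learning. Source code is available at https://github.com/google-Research/l2p.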

Florence: A New Foundation Model for Computer Vision

Lu Yuan , Dongdong Chen , Yi-Ling Chen , Noel Codella , Xiyang Dai , Jianfeng Gao , Houdong Hu , Xuedong Huang , Boxin Li , Chunyuan Li

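Categories: Computer Vision | Artificial Intelligence | Machine Learning

2021-11-22

Automated visual understanding of our diverse and open world requires computer vision models that generalize well with minimal customization for specific tasks, similar to human vision. Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical to this mission of solving real-world computer vision applications. While existing vision foundation models such as CLIP, ALIGN, and Wu Dao 2.0 focus mainly on mapping image and text representations to a cross-modal shared representation, we introduce a new computer vision foundation model, Florence, to expand the representations from coarse (scene) to fine (object), from static (images) to dynamic (videos), and from RGB to multiple modalities (caption, depth). By incorporating universal visual-language representations from Web-scale image-text data, our Florence model can be easily adapted to various computer vision tasks, such as classification, retrieval, object detection, VQA, image captioning, video retrieval, and action recognition. Moreover, Florence demonstrates outstanding performance in many types of transfer learning: fully sampled fine-tuning, linear probing, few-shot transfer, and zero-shot transfer for novel images and objects. All of these properties are critical for our vision foundation model to serve general-purpose vision tasks. Florence achieves new state-of-the-art results on the majority of 44 representative benchmarks, e.g., ImageNet-1K zero-shot classification with a top-1 accuracy of 83.74 and a top-5 accuracy of 97.18, 62.4 mAP on COCO fine-tuning, 80.36 on VQA, and 87.8 on Kinetics-600.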

Mitigating Forgetting in Online Continual Learning via Contrasting Semantically Distinct Augmentations

Sheng-Feng Yu , Wei-Chen Chiu

Categories: Computer Vision

2022-11-10

Online continual learning (OCL) aims to enable a model to learn from a non-stationary data stream, continuously acquiring new knowledge while retaining what it has learnt, under the constraints of limited system size and computational cost. The main challenge comes from the "catastrophic forgetting" issue -- the inability to remember learnt knowledge well while learning new knowledge. With a specific focus on the class-incremental OCL scenario, i.e. OCL for classification, recent advances incorporate contrastive learning to learn more generalised feature representations and achieve state-of-the-art performance, but are still unable to fully resolve catastrophic forgetting. In this paper, we follow the strategy of adopting contrastive learning but further introduce a semantically distinct augmentation technique, which leverages strong augmentation to generate more data samples. We show that treating these samples as semantically different from their original classes (and thus related to out-of-distribution samples) in the contrastive learning mechanism helps alleviate forgetting and facilitates model stability. Moreover, in addition to contrastive learning, the typical classification mechanism and objective (i.e. softmax classifier and cross-entropy loss) are included in our model design for faster convergence and to utilise the label information, equipped in particular with a sampling strategy to tackle the tendency to favour new classes (i.e. model bias towards the recently learnt classes). In extensive experiments on the CIFAR-10, CIFAR-100, and Mini-Imagenet datasets, our proposed method is shown to achieve superior performance against various baselines.

Reproducible scaling laws for contrastive language-image learning

Mehdi Cherti , Romain Beaumont , Ross Wightman , Mitchell Wortsman , Gabriel Ilharco , Cade Gordon , Christoph Schuhmann , Ludwig Schmidt , Jenia Jitsev

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-12-14

Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale experiments are becoming increasingly expensive. However, previous work on scaling laws has primarily used private data & models or focused on uni-modal language or vision learning. To address these limitations, we investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository. Our large-scale experiments involve models trained on up to two billion image-text pairs and identify power law scaling for multiple downstream tasks including zero-shot classification, retrieval, linear probing, and end-to-end fine-tuning. We find that the training distribution plays a key role in scaling laws as the OpenAI and OpenCLIP models exhibit different scaling behavior despite identical model architectures and similar training recipes. We open-source our evaluation workflow and all models, including the largest public CLIP models, to ensure reproducibility and make scaling laws research more accessible. Source code and instructions to reproduce this study will be available at https://github.com/LAION-AI/scaling-laws-openclip
