Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Although a promising route toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference, and it also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves 95% performance on average with only 16% of the parameters of 15 task-finetuned models, showcasing the performance reliability of multi-modal task scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
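The abstract does not spell out the instruction syntax, so the following is only a toy illustration of the declarative idea: a hypothetical one-line instruction binds modality-typed slots to a task's inputs and outputs, and a toy parser derives a task plan skeleton from it. The slot syntax, `CAPTION_INSTRUCTION`, and `define_task` are all illustrative stand-ins, not the actual OFASys API.

```python
# Hypothetical one-line instruction for image captioning: bracketed slots name
# the modality and field of each example; everything else is a prompt.
CAPTION_INSTRUCTION = "[IMAGE:img] what does the image describe? -> [TEXT:caption]"

def define_task(name: str, instruction: str):
    """Parse a declarative instruction into input/output slot specs (toy parser)."""
    source, target = (part.strip() for part in instruction.split("->"))
    parse = lambda s: [tok.strip("[]").split(":") for tok in s.split() if tok.startswith("[")]
    return {"name": name, "inputs": parse(source), "outputs": parse(target)}

task = define_task("caption", CAPTION_INSTRUCTION)
print(task)
# {'name': 'caption', 'inputs': [['IMAGE', 'img']], 'outputs': [['TEXT', 'caption']]}
```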
The security of artificial intelligence (AI) is an important research area for building safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.
The problem of covariate-shift generalization has attracted intensive research attention. Previous stable learning algorithms employ sample reweighting schemes to decorrelate the covariates when no explicit domain information about the training data is available. However, with finite samples, it is difficult to obtain weights that ensure perfect independence and thereby eliminate the unstable variables. Moreover, decorrelating within the stable variables may induce high variance in the learned models because of an over-reduced effective sample size, so a tremendous sample size is required for these algorithms to work. In this paper, with theoretical justification, we propose SVI (Sparse Variable Independence) for the covariate-shift generalization problem. We introduce a sparsity constraint to compensate for the imperfectness of sample reweighting under the finite-sample setting of previous methods. Furthermore, we organically combine independence-based sample reweighting and sparsity-based variable selection in an iterative way, avoiding decorrelation within the stable variables and increasing the effective sample size to alleviate variance inflation. Experiments on both synthetic and real-world datasets demonstrate the improvement in covariate-shift generalization performance brought by SVI.
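A minimal sketch of the alternation described above, assuming a simple least-squares-style decorrelation surrogate and a weighted Lasso for the sparsity step; this is not the authors' implementation, only the shape of the iteration (reweight samples, then select variables, repeat until the selected set stabilizes).

```python
import numpy as np
from sklearn.linear_model import Lasso

def svi_sketch(X, y, n_iters=5, alpha=0.1, tol=1e-6):
    n, p = X.shape
    selected = np.arange(p)          # indices of currently kept covariates
    w = np.ones(n) / n               # uniform sample weights to start
    for _ in range(n_iters):
        Xs = X[:, selected]
        # (1) Reweighting surrogate: downweight samples whose cross-terms
        #     contribute most to pairwise correlation among selected covariates.
        Xc = Xs - np.average(Xs, axis=0, weights=w)
        cross = np.abs(Xc).sum(axis=1) ** 2 - (Xc ** 2).sum(axis=1)
        w = 1.0 / (1.0 + cross)
        w /= w.sum()
        # (2) Sparsity step: weighted Lasso keeps only variables with
        #     non-negligible coefficients under the new weights.
        model = Lasso(alpha=alpha).fit(Xs, y, sample_weight=w * n)
        keep = np.abs(model.coef_) > tol
        if not keep.any() or keep.all():   # degenerate or stabilized selection
            break
        selected = selected[keep]
    return selected, w
```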
Product ranking is the core problem for revenue-maximizing online retailers. To design proper product ranking algorithms, various consumer choice models have been proposed to characterize consumers' behaviors when they are presented with a list of products. However, existing works assume that each consumer purchases at most one product or keeps viewing the product list after purchasing a product, which does not agree with common practice in real scenarios. In this paper, we assume that each consumer can purchase multiple products at will. To model consumers' willingness to view and purchase, we set a random attention span and purchase budget, which determine the maximal numbers of products that a consumer views and purchases, respectively. Under this setting, we first design an optimal ranking policy for the case where the online retailer can precisely model consumers' behaviors. Based on this policy, we further develop the Multiple-Purchase-with-Budget UCB (MPB-UCB) algorithms with $\tilde{O}(\sqrt{T})$ regret that estimate consumers' behaviors and maximize revenue simultaneously in online settings. Experiments on both synthetic and semi-synthetic datasets prove the effectiveness of the proposed algorithms.
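A hedged sketch of the UCB flavor the abstract describes, not the paper's exact MPB-UCB algorithm: keep optimistic estimates of each product's purchase probability, rank by optimistic expected revenue, then update counts from what the consumer actually viewed and bought (limited by the attention span and purchase budget).

```python
import numpy as np

def ucb_ranking_round(t, prices, purchases, views, delta=0.1):
    """Return a product ranking for round t from optimistic revenue estimates."""
    p_hat = purchases / np.maximum(views, 1)                     # empirical purchase rate
    bonus = np.sqrt(np.log(max(t, 2) / delta) / np.maximum(views, 1))
    ucb = np.minimum(p_hat + bonus, 1.0)                         # optimistic probability
    return np.argsort(-prices * ucb)                             # rank by optimistic revenue

def update_stats(ranking, n_viewed, bought_idx, purchases, views):
    """The consumer viewed the first n_viewed ranked products (attention span)
    and bought the products in bought_idx (capped by the purchase budget)."""
    views[ranking[:n_viewed]] += 1
    purchases[bought_idx] += 1
```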
Few-shot class-incremental learning (FSCIL) aims to incrementally learn novel classes from a few labeled samples while avoiding overfitting and catastrophic forgetting. The current FSCIL protocol is built by mimicking the general class-incremental learning setting, which is not entirely appropriate due to the different data configuration, i.e., novel classes arrive under a limited-data regime. In this paper, we rethink the configuration of FSCIL with an open-set hypothesis by reserving that possibility in the first session. To give the model better closed-set and open-set recognition performance, a Hyperbolic Reciprocal Point Learning module (Hyper-RPL) is built upon reciprocal point learning (RPL) with hyperbolic neural networks. Moreover, to learn novel categories from limited labeled data, we incorporate a Hyperbolic Metric Learning (Hyper-Metric) module into a distillation-based framework to alleviate overfitting and better handle the trade-off between preserving old knowledge and acquiring new knowledge. Comprehensive evaluations of the proposed configuration and modules on three benchmark datasets validate their effectiveness with respect to three evaluation metrics.
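A minimal sketch of the hyperbolic ingredient both modules rely on: geodesic distance in the Poincaré ball, on which reciprocal-point and metric-learning losses can be built. This is the standard formula, not the authors' code.

```python
import torch

def poincare_distance(x, y, eps=1e-5):
    """Geodesic distance between points inside the unit Poincare ball."""
    sq_norm_x = x.pow(2).sum(dim=-1)
    sq_norm_y = y.pow(2).sum(dim=-1)
    sq_dist = (x - y).pow(2).sum(dim=-1)
    denom = (1 - sq_norm_x).clamp_min(eps) * (1 - sq_norm_y).clamp_min(eps)
    return torch.acosh(1 + 2 * sq_dist / denom)
```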
In clinical practice, a segmentation network often needs to continually learn on a sequential data stream from multiple sites rather than on a consolidated dataset, due to storage costs and privacy restrictions. However, during continual learning, existing methods are usually restricted in either the network's memorizability on previous sites or its generalizability to unseen sites. This paper aims to tackle the challenging problem of Synchronous Memorizability and Generalizability (SMG) and to simultaneously improve performance on both previous and unseen sites with a novel SMG-learning framework. First, we propose a Synchronous Gradient Alignment (SGA) objective, which not only promotes the network's memorizability by enforcing coordinated optimization on a small exemplar set from previous sites (called the replay buffer), but also enhances generalizability by encouraging site-invariance under simulated domain shift. Second, to simplify the optimization of the SGA objective, we design a Dual-Meta algorithm that approximates the SGA objective as a dual meta-objective for optimization without costly computational overhead. Third, for efficient rehearsal, we configure the replay buffer comprehensively, considering extra site diversity to reduce redundancy. Experiments on prostate MRI data sequentially acquired from six institutes demonstrate that our method can simultaneously achieve higher memorizability and generalizability than state-of-the-art methods. Code is available at https://github.com/jingyzhang/SMG-learning.
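A rough PyTorch sketch of the gradient-alignment idea behind an SGA-style objective: encourage the gradient on the current site's batch to agree with the gradient on the replay buffer and with the gradient under a simulated domain shift. The loss weighting and the way the shifted batch is produced are placeholders, not the paper's exact Dual-Meta recipe.

```python
import torch

def sga_style_loss(model, loss_fn, cur_batch, replay_batch, shifted_batch, lam=0.1):
    def loss_and_grads(batch):
        x, y = batch
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        return loss, grads

    cur_loss, g_cur = loss_and_grads(cur_batch)        # current site
    rep_loss, g_rep = loss_and_grads(replay_batch)     # exemplars from previous sites
    shf_loss, g_shf = loss_and_grads(shifted_batch)    # simulated domain shift

    # Inner products between gradients; maximizing them aligns the update directions.
    dot = lambda ga, gb: sum((a * b).sum() for a, b in zip(ga, gb))
    align = dot(g_cur, g_rep) + dot(g_cur, g_shf)

    return cur_loss + rep_loss + shf_loss - lam * align
```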
In recent years, Mixture-of-Experts (MoE) has emerged as a promising deep learning technique that can scale model capacity to more than a trillion parameters while reducing computational cost through sparse computation. While MoE opens a new frontier of exceedingly large models, its implementation over thousands of GPUs has been limited by the mismatch between the dynamic nature of MoE and the static parallelism/pipelining of the system. We present Tutel, a highly scalable stack design and implementation of MoE with dynamically adaptive parallelism and pipelining. Tutel provides adaptive parallelism switching and adaptive pipelining at runtime, achieving up to 1.74x and 2.00x single-MoE-layer speedup, respectively. We also propose a novel two-dimensional hierarchical algorithm for MoE communication speedup that outperforms the previous state-of-the-art over 2,048 GPUs. Aggregating all techniques, Tutel finally delivers 4.96x and 5.75x speedup of a single MoE layer on 16 GPUs and 2,048 GPUs, respectively, over Fairseq, Meta's Facebook AI Research sequence-to-sequence toolkit (Tutel is now partially adopted by Fairseq). Tutel source code is publicly available at https://github.com/microsoft/tutel. Our evaluation shows that Tutel efficiently and effectively runs a real-world MoE-based model, named SwinV2-MoE, built upon Swin Transformer V2, a state-of-the-art computer vision architecture. On efficiency, Tutel accelerates SwinV2-MoE, achieving up to 1.55x and 2.11x speedup in training and inference over Fairseq, respectively. On effectiveness, the SwinV2-MoE model achieves superior accuracy over its dense counterpart in both pre-training and downstream computer vision tasks such as COCO object detection, indicating the readiness of Tutel for end-to-end real-world model training and inference. SwinV2-MoE is open-sourced at https://github.com/microsoft/Swin-Transformer.
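A toy top-2 gating sketch in plain PyTorch (not the Tutel API) that illustrates why MoE workloads are dynamic: the token-to-expert assignment, and hence the per-expert compute and communication volume, changes with every batch, which is precisely the mismatch Tutel's adaptive parallelism and pipelining are designed to absorb.

```python
import torch
import torch.nn.functional as F

def top2_dispatch(tokens, gate_weight):
    """tokens: [num_tokens, model_dim]; gate_weight: [model_dim, num_experts]."""
    logits = tokens @ gate_weight                      # [num_tokens, num_experts]
    probs = F.softmax(logits, dim=-1)
    top_p, top_e = probs.topk(2, dim=-1)               # two experts per token
    # Per-expert token counts differ from batch to batch -> unbalanced, shifting load.
    load = torch.bincount(top_e.flatten(), minlength=gate_weight.shape[1])
    return top_e, top_p / top_p.sum(dim=-1, keepdim=True), load
```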
Challenging illumination conditions in the real world (low light, underexposure, and overexposure) not only produce an unpleasant visual appearance but also degrade computer vision tasks. Existing light-adaptive methods usually handle each condition separately; moreover, most of them operate on RAW images or over-simplify the camera image signal processing (ISP) pipeline. By decomposing the light transformation pipeline into local and global ISP components, we propose a lightweight and fast Illumination Adaptive Transformer (IAT), which comprises two transformer-style branches: a local estimation branch and a global ISP branch. While the local branch estimates pixel-wise local components related to illumination, the global branch defines learnable queries that attend to the whole image to decode the ISP parameters. Our IAT can also conduct both object detection and semantic segmentation under various light conditions simultaneously. We have extensively evaluated IAT on multiple real-world datasets across 2 low-level tasks and 3 high-level tasks. With only 90K parameters and a 0.004s processing speed (excluding the high-level modules), our IAT consistently achieves superior performance. Code is available at https://github.com/cuiziteng/illumination-adaptive-transformer
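A schematic PyTorch sketch of the two-branch layout described above; the module sizes and the choice of decoded ISP parameters (a gamma value plus a 3x3 color matrix) are illustrative guesses, not the released IAT architecture.

```python
import torch
import torch.nn as nn

class TwoBranchISP(nn.Module):
    def __init__(self, dim=16, n_queries=10):
        super().__init__()
        # Local branch: per-pixel illumination-related maps (multiplicative + additive).
        self.local = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, 6, 3, padding=1))
        # Global branch: learnable queries attend to coarse image features.
        self.embed = nn.Conv2d(3, dim, 8, stride=8)
        self.queries = nn.Parameter(torch.randn(n_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.to_params = nn.Linear(dim, 1 + 9)        # gamma + 3x3 color matrix

    def forward(self, x):                              # x: [B, 3, H, W], H and W divisible by 8
        b = x.shape[0]
        mul, add = self.local(x).chunk(2, dim=1)
        x_local = x * torch.sigmoid(mul) + add         # pixel-wise correction
        feat = self.embed(x).flatten(2).transpose(1, 2)           # [B, HW/64, dim]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        glob, _ = self.attn(q, feat, feat)
        params = self.to_params(glob.mean(dim=1))                 # [B, 10]
        gamma = torch.sigmoid(params[:, :1]) * 2 + 0.5            # gamma in (0.5, 2.5)
        color = params[:, 1:].view(b, 3, 3)
        corrected = x_local.clamp(0, 1) ** gamma.view(b, 1, 1, 1)
        return torch.einsum('bij,bjhw->bihw', color, corrected)
```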
Video moment retrieval aims to find the start and end timestamps of a moment (a part of a video) described by a given natural language query. Fully supervised methods require complete temporal boundary annotations to achieve promising results, which is costly since annotators need to watch the whole moment. Weakly supervised methods rely only on paired videos and queries, but their performance is relatively poor. In this paper, we look closely at the annotation process and propose a new paradigm called "glance annotation". This paradigm requires only one timestamp of a single random frame, which we refer to as a "glance", within the temporal boundary of the fully supervised counterpart. We argue that this is beneficial because it adds only trivial cost compared to weak supervision while offering greater potential. Under the glance annotation setting, we propose a contrastive-learning-based method called Video moment retrieval via Glance Annotation (ViGA). ViGA cuts the input video into clips and contrasts clips with queries, where a glance-guided Gaussian-distributed weight is assigned to every clip. Our extensive experiments indicate that ViGA achieves better results than state-of-the-art weakly supervised methods by a large margin, and even performs comparably to fully supervised methods in some cases.
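A sketch of the glance-guided weighting: clips near the glanced frame receive larger Gaussian weights, which then scale their contribution to a clip-versus-query contrastive objective. The InfoNCE-style loss below is a stand-in, not necessarily ViGA's exact formulation.

```python
import torch
import torch.nn.functional as F

def glance_gaussian_weights(num_clips, glance_clip_idx, sigma=2.0):
    """Gaussian weights over clips, centered on the clip containing the glance."""
    idx = torch.arange(num_clips, dtype=torch.float32)
    w = torch.exp(-0.5 * ((idx - glance_clip_idx) / sigma) ** 2)
    return w / w.sum()

def weighted_clip_query_loss(clip_emb, query_emb, weights, temperature=0.07):
    """clip_emb: [num_clips, d]; query_emb: [d]; weights: [num_clips]."""
    sim = F.cosine_similarity(clip_emb, query_emb.unsqueeze(0), dim=-1) / temperature
    log_p = F.log_softmax(sim, dim=0)          # distribution over clips for this query
    return -(weights * log_p).sum()            # pull high-weight clips toward the query
```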
Image restoration algorithms such as super-resolution (SR) are indispensable pre-processing modules for object detection in degraded images. However, most of these algorithms assume that the degradation is fixed and known a priori. When the real degradation is unknown or differs from the assumption, both the pre-processing module and the subsequent high-level task such as object detection will fail. Here, we propose a novel framework, RestoreDet, to detect objects in degraded low-resolution images. RestoreDet utilizes the downsampling degradation as a kind of transformation for self-supervised signals to explore equivariant representations against various resolutions and other degradation conditions. Specifically, we learn this intrinsic visual structure by encoding and decoding the degradation transformation from a pair of original and randomly degraded images. The framework can further take advantage of an advanced SR architecture with an arbitrary-resolution restoring decoder to reconstruct the original correspondence from the degraded input image. Both representation learning and object detection are jointly optimized in an end-to-end training fashion. RestoreDet is a generic framework that can be implemented on any mainstream object detection architecture. Extensive experiments show that our framework based on CenterNet achieves superior performance compared with existing methods when facing variant degradation situations. Our code will be released soon.
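A loose sketch of the self-supervision signal described above: randomly degrade (downsample) an image, encode both versions, and ask a small head to recover the applied transformation from the feature pair, so the encoder's representation stays informative about the degradation. The encoder/head shapes and the scale set are placeholders, not the RestoreDet architecture.

```python
import random
import torch
import torch.nn.functional as F

SCALES = [1.0, 0.5, 0.25]   # candidate downsampling factors (illustrative)

def degradation_equivariance_loss(encoder, head, images):
    """encoder: images -> feature maps; head: concatenated pooled features -> scale logits."""
    scale_idx = random.randrange(len(SCALES))
    degraded = F.interpolate(images, scale_factor=SCALES[scale_idx],
                             mode='bicubic', align_corners=False)
    f_orig = encoder(images).mean(dim=(2, 3))          # global-pooled features
    f_deg = encoder(degraded).mean(dim=(2, 3))
    logits = head(torch.cat([f_orig, f_deg], dim=1))   # classify which scale was applied
    target = torch.full((images.size(0),), scale_idx, device=images.device)
    return F.cross_entropy(logits, target)
```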