Intelligent Paper Notes

Artificial Intelligence Security Competition (AISC)

Yinpeng Dong , Peng Chen , Senyou Deng , Lianji L , Yi Sun , Hanyu Zhao , Jiaxing Li , Yunteng Tan , Xinyu Liu , Yangyi Dong

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-12-07

The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.

translated by Google Translate

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

Yi Wang , Kunchang Li , Yizhuo Li , Yinan He , Bingkun Huang , Zhiyu Zhao , Hongjie Zhang , Jilan Xu , Yi Liu , Zun Wang

Categories: Computer Vision

2022-12-06

Foundation models have recently shown excellent performance on a variety of downstream tasks in computer vision. However, most existing vision foundation models simply focus on image-level pretraining and adaptation, which are limited for dynamic and complex video-level understanding tasks. To fill the gap, we present general video foundation models, InternVideo, by taking advantage of both generative and discriminative self-supervised video learning. Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications. Without bells and whistles, InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications. Especially, our methods can obtain 91.1% and 77.2% top-1 accuracy on the challenging Kinetics-400 and Something-Something V2 benchmarks, respectively. All of these results effectively show the generality of our InternVideo for video understanding. The code will be released at https://github.com/OpenGVLab/InternVideo.
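
As a rough illustration of the video-language contrastive objective mentioned in the abstract, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired embeddings. It is not taken from the InternVideo code: the function names, the plain-list embedding format, and the temperature of 0.07 are all assumptions made for this example, and embeddings are assumed to be nonzero.

```python
import math


def _normalize(vector):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i should pick column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; pair i of video_emb and text_emb match.

    The temperature value is an assumed default, not a reported setting.
    """
    videos = [_normalize(v) for v in video_emb]
    texts = [_normalize(t) for t in text_emb]
    # Cosine similarity of every video with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in texts]
        for v in videos
    ]
    transposed = [list(column) for column in zip(*logits)]
    video_to_text = _diagonal_cross_entropy(logits)
    text_to_video = _diagonal_cross_entropy(transposed)
    return 0.5 * (video_to_text + text_to_video)
```

The loss is small when each video embedding is closest to its own caption and grows when pairs are mismatched, which is the behavior a contrastive pretraining objective relies on.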

A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models

Chao Yan , Yao Yan , Zhiyu Wan , Ziqi Zhang , Larsson Omberg , Justin Guinney , Sean D. Mooney , Bradley A. Malin

Categories: Machine Learning | Artificial Intelligence

2022-08-02

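Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. Modern approaches to data generation based on machine learning, generative adversarial network (GAN) methods in particular, continue to evolve and show remarkable potential. Yet there is a lack of a systematic assessment framework to benchmark these methods and determine which are most appropriate. In this work, we introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health record (EHR) data from two large academic medical centers. The results show that there is a utility-privacy tradeoff in sharing synthetic EHR data. The results further indicate that no method is unequivocally the best on all criteria in each use case, which makes clear why synthetic data generation methods need to be assessed in context.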

Less is More: Consistent Video Depth Estimation with Masked Frames Modeling

Yiran Wang , Zhiyu Pan , Xingyi Li , Zhiguo Cao , Ke Xian , Jianming Zhang

Categories: Computer Vision

2022-07-31

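Temporal consistency is the key challenge of video depth estimation. Previous works rely on additional optical flow or camera poses, which is time-consuming. By contrast, we obtain consistency with less information. Since videos inherently contain heavy temporal redundancy, a missing frame can be recovered from neighboring ones. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer network that predicts the depth of masked frames based on their neighboring frames. By reconstructing masked temporal features, FMNet learns intrinsic inter-frame correlations, which leads to consistency. Compared with prior art, experimental results show that our approach achieves comparable spatial accuracy and higher temporal consistency without any additional information. Our work provides a new perspective on consistent video depth estimation.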
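
The frame-masking idea can be made concrete with a small sketch of the input-masking step only: a random subset of per-frame features is replaced by a placeholder, and a model would then be trained to predict the hidden frames from their neighbors. This is an illustration under assumptions, not FMNet itself: `mask_frames`, the `"[MASK]"` token, and the 25% ratio are hypothetical choices, and the transformer that does the reconstruction is omitted.

```python
import random


def mask_frames(frames, mask_ratio, mask_token, rng):
    """Replace a random subset of frames with mask_token.

    Returns the masked sequence and the sorted indices that were hidden.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    num_masked = int(len(frames) * mask_ratio)
    masked_indices = sorted(rng.sample(range(len(frames)), num_masked))
    hidden = set(masked_indices)
    masked = [
        mask_token if index in hidden else frame
        for index, frame in enumerate(frames)
    ]
    return masked, masked_indices


# Example: hide 2 of 8 frames; a model would be asked to recover them.
clip = [f"frame_{i}" for i in range(8)]
masked_clip, hidden_indices = mask_frames(clip, 0.25, "[MASK]", random.Random(0))
```

Because the training target is the content of the hidden frames, the model can only succeed by using information from adjacent frames, which is the source of the temporal consistency the abstract describes.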

GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation

Yifan Zhang , Qijian Zhang , Zhiyu Zhu , Junhui Hou , Yixuan Yuan

Categories: Computer Vision

2022-07-06

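The inherent ambiguity in ground-truth annotations of 3D bounding boxes, caused by occlusion, signal loss, or manual annotation errors, can confuse deep 3D object detectors during training and thus degrade detection accuracy. However, existing methods overlook such issues to some extent and treat the labels as deterministic. In this paper, we propose GLENet, a generative label uncertainty estimation framework adapted from conditional variational autoencoders, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables. The label uncertainty generated by GLENet is a plug-and-play module that can be conveniently integrated into existing deep 3D detectors to build probabilistic detectors and supervise the learning of localization uncertainty. In addition, we propose an uncertainty-aware quality estimator architecture in probabilistic detectors to guide the training of the IoU branch with the predicted localization uncertainty. We incorporate the proposed methods into various popular base 3D detectors and observe that their performance is significantly boosted to the current state of the art on the Waymo Open Dataset and the KITTI dataset.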

CVFNet: Real-time 3D Object Detection by Learning Cross View Features

Jiaqi Gu , Zhiyu Xiang , Pan Zhao , Tingming Bai , Lingxuan Wang , Xijun Zhao , Zhiyuan Zhang

Categories: Computer Vision

2022-03-13

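In recent years, 3D object detection from LiDAR point clouds has made great progress thanks to the development of deep learning technologies. Although voxel-based or point-based methods are popular in 3D object detection, they usually involve time-consuming operations such as 3D convolutions on voxels or ball queries among points, making the resulting networks unsuitable for time-critical applications. On the other hand, 2D view-based methods offer high computational efficiency but usually obtain lower performance than voxel-based or point-based methods. In this work, we present a real-time view-based single-stage 3D object detector, namely CVFNet, to fulfill this task. To strengthen cross-view feature learning under demanding efficiency constraints, our framework extracts the features of different views and fuses them in an efficient progressive way. We first propose a novel Point-Range feature fusion module that deeply integrates point and range view features in multiple stages. Then, a special Slice Pillar is designed to preserve the 3D geometry well when transforming the obtained deep point-view features into bird's-eye view. To better balance the ratio of samples, a sparse pillar detection head is presented to focus the detection on the non-empty grids. We conduct experiments on the popular KITTI and NuScenes benchmarks and achieve state-of-the-art performance in terms of both accuracy and speed.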
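
To make the notion of "non-empty grids" in the bird's-eye view concrete, the sketch below bins LiDAR points into a 2D grid and keeps only the cells that contain at least one point. It shows plain pillar-style binning under assumed parameters, not the paper's Slice Pillar or its detection head; `nonempty_pillars`, `cell_size`, `x_min`, and `y_min` are hypothetical names introduced for this example.

```python
import math
from collections import defaultdict


def nonempty_pillars(points, cell_size, x_min=0.0, y_min=0.0):
    """Group (x, y, z) points by bird's-eye-view grid cell.

    Only cells that receive at least one point appear in the result,
    so downstream work can skip the empty part of the grid.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    pillars = defaultdict(list)
    for x, y, z in points:
        column = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        pillars[(column, row)].append((x, y, z))
    return dict(pillars)


# Example: three points fall into two cells of a 1 m grid.
cloud = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (1.5, 0.1, 0.0)]
occupied = nonempty_pillars(cloud, cell_size=1.0)
```

LiDAR scans leave most bird's-eye-view cells empty, so restricting later computation to the occupied cells is one plausible way to save time and to avoid swamping training with empty negatives.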

PDE-Based Optimal Strategy for Unconstrained Online Learning

Zhiyu Zhang , Ashok Cutkosky , Ioannis Paschalidis

Categories: Machine Learning

2022-01-19

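Unconstrained Online Linear Optimization (OLO) is a practical problem setting for studying the training of machine learning models. Existing works have proposed a number of potential-based algorithms, but in general the design of these potential functions relies heavily on guessing. To streamline this workflow, we present a framework that generates new potential functions by solving a Partial Differential Equation (PDE). Specifically, when the losses are 1-Lipschitz, our framework produces a novel algorithm with the anytime regret bound $C\sqrt{T}+\|u\|\sqrt{2T}\,[\sqrt{\log(1+\|u\|/C)}+2]$, where $C$ is a user-specified constant and $u$ is any comparator, unknown and unbounded a priori. Such a bound attains an optimal loss-regret trade-off without the impractical doubling trick. Moreover, a matching lower bound shows that the leading-order term, including the constant multiplier $\sqrt{2}$, is tight. To our knowledge, the proposed algorithm is the first to achieve such optimalities.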
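
For a feel of how the stated regret bound scales, the sketch below evaluates it numerically, assuming the bound reads C·sqrt(T) + ||u||·sqrt(2T)·[sqrt(log(1 + ||u||/C)) + 2] with natural logarithm. The function name and argument names are hypothetical; this evaluates the formula only and does not implement the algorithm.

```python
import math


def regret_bound(T, u_norm, C):
    """Evaluate C*sqrt(T) + ||u||*sqrt(2T)*(sqrt(log(1 + ||u||/C)) + 2).

    T is the number of rounds, u_norm is the comparator norm ||u||,
    and C is the user-specified positive constant.
    """
    if T < 0 or u_norm < 0 or C <= 0:
        raise ValueError("need T >= 0, u_norm >= 0 and C > 0")
    comparator_term = math.sqrt(math.log(1.0 + u_norm / C)) + 2.0
    return C * math.sqrt(T) + u_norm * math.sqrt(2.0 * T) * comparator_term
```

Two properties are easy to read off: with the zero comparator the bound reduces to C·sqrt(T), which is the cumulative-loss side of the trade-off, and for fixed ||u|| and C the bound grows proportionally to sqrt(T), so quadrupling the horizon doubles it.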

Benign Adversarial Attack: Tricking Models for Goodness

Jitao Sang , Xian Zhao , Jiaming Zhang , Zhiyu Lin

Categories: Artificial Intelligence | Computer Vision

2021-07-26

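Despite successful applications in many fields, today's machine learning models suffer from notorious problems such as vulnerability to adversarial examples. Rather than falling into the cat-and-mouse game between adversarial attack and defense, this paper offers an alternative perspective on adversarial examples and explores whether they can be exploited in benign applications. We first attribute adversarial examples to the human-model disparity in the use of non-semantic features. Although largely ignored in classical machine learning mechanisms, non-semantic features have three interesting characteristics: they are (1) exclusive to the model, (2) critical to inference, and (3) usable as features. Inspired by this, we present the new idea of benign adversarial attack, which exploits adversarial examples for good in three directions: (1) adversarial Turing test, (2) rejecting malicious model application, and (3) adversarial data augmentation. Each direction is presented with a motivation, a justification analysis, and prototype applications to showcase its potential.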

Conditional Predictive Behavior Planning with Inverse Reinforcement Learning for Human-like Autonomous Driving

Zhiyu Huang , Haochen Liu , Jingda Wu , Chen Lv

Categories: Robotics

2022-12-17

Making safe and human-like decisions is an essential capability of autonomous driving systems and learning-based behavior planning is a promising pathway toward this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. Concretely, a behavior generation module first produces a diverse set of candidate behaviors in the form of trajectory proposals. Then the proposed conditional motion prediction network is employed to forecast other agents' future trajectories conditioned on each trajectory proposal. Given the candidate plans and associated prediction results, we learn a scoring module to evaluate the plans using maximum entropy inverse reinforcement learning (IRL). We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results reveal that the conditional prediction model is able to forecast multiple possible future trajectories given a candidate behavior and the prediction results are reactive to different plans. Moreover, the IRL-based scoring module can properly evaluate the trajectory proposals and select close-to-human ones. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Moreover, we find that the conditional prediction model can improve both prediction and planning performance compared to the non-conditional model, and learning the scoring module is critical to correctly evaluating the candidate plans to align with human drivers.
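
The scoring step can be illustrated with the standard maximum entropy IRL formulation over a finite set of candidate plans: each plan's probability is proportional to the exponential of its reward, and the log-likelihood gradient for the human-chosen plan is its feature vector minus the expected feature vector. The sketch assumes a reward that is linear in hand-picked plan features, which is a simplification for this example and not necessarily how the paper's scoring module is built; all names are hypothetical.

```python
import math


def plan_probabilities(features, weights):
    """Softmax over linear rewards, one feature vector per candidate plan."""
    rewards = [sum(w * f for w, f in zip(weights, feat)) for feat in features]
    top = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp(r - top) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood_gradient(features, weights, human_index):
    """Gradient of log P(human plan): human features minus expected features."""
    probs = plan_probabilities(features, weights)
    dim = len(weights)
    expected = [
        sum(p * feat[d] for p, feat in zip(probs, features))
        for d in range(dim)
    ]
    return [features[human_index][d] - expected[d] for d in range(dim)]


# Example: two candidate plans described by two assumed features each.
candidates = [[1.0, 0.0], [0.0, 1.0]]
probs = plan_probabilities(candidates, [0.0, 0.0])
grad = log_likelihood_gradient(candidates, [0.0, 0.0], human_index=0)
```

Following this gradient raises the weight of features that the human-chosen plan has more of than the average candidate, so plans that resemble human driving end up with higher scores.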

EBHI-Seg: A Novel Enteroscope Biopsy Histopathological Haematoxylin and Eosin Image Dataset for Image Segmentation Tasks

Liyu Shi , Xiaoyan Li , Weiming Hua , Haoyuan Chen , Jing Chen , Zizhen Fan , Minghe Gao , Yujie Jing , Guotao Lu , Deguo Ma

Categories: Computer Vision

2022-12-01

Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers the assessment accuracy when computer technology is used to aid in diagnosis. Methods: The present study provides a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, the experimental results for EBHI-Seg are evaluated using classical machine learning methods and deep learning methods. Results: The experimental results show that deep learning methods achieve better image segmentation performance on EBHI-Seg. The maximum value of the Dice evaluation metric for the classical machine learning methods is 0.948, while that for the deep learning methods is 0.965. Conclusion: This publicly available dataset contains 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can provide researchers with new segmentation algorithms for medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients.
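
The Dice metric quoted in the results is commonly defined for binary masks as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for masks given as flat sequences of 0/1 values; the function name, the flat-list format, and the convention of returning 1.0 when both masks are empty are assumptions for this example, not details taken from the paper's evaluation code.

```python
def dice_coefficient(prediction, ground_truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks."""
    if len(prediction) != len(ground_truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, g in zip(prediction, ground_truth) if p and g)
    total = sum(1 for p in prediction if p) + sum(1 for g in ground_truth if g)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total
```

A score of 1.0 means the predicted mask matches the ground truth exactly and 0.0 means no overlap, so the reported values of 0.948 and 0.965 both indicate close agreement with the annotated masks.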
