尽管近年来从CT/MRI扫描中自动腹部多器官分割取得了很大进展,但由于缺乏各种临床方案的大规模基准,对模型的能力的全面评估受到阻碍。收集和标记3D医学数据的高成本的限制,迄今为止的大多数深度学习模型都由具有有限数量的感兴趣或样品器官的数据集驱动,这仍然限制了现代深层模型的力量提供各种方法的全面且公平的估计。为了减轻局限性,我们提出了AMO,这是一个大规模,多样的临床数据集,用于腹部器官分割。 AMOS提供了从多中心,多供应商,多模式,多相,多疾病患者收集的500 CT和100次MRI扫描,每个患者均具有15个腹部器官的体素级注释,提供了具有挑战性的例子,并提供了挑战性的例子和测试结果。在不同的目标和场景下研究健壮的分割算法。我们进一步基准了几种最先进的医疗细分模型,以评估此新挑战性数据集中现有方法的状态。我们已公开提供数据集,基准服务器和基线,并希望激发未来的研究。信息可以在https://amos22.grand-challenge.org上找到。
translated by 谷歌翻译
本文介绍了一个简单的MLP架构,CycleMLP,这是一种多功能骨干,用于视觉识别和密集的预测。与现代MLP架构相比,例如MLP混合器,RESMLP和GMLP,其架构与图像尺寸相关,因此在物体检测和分割中不可行,与现代方法相比具有两个优点。 (1)它可以应对各种图像尺寸。 (2)通过使用本地窗口,它可以实现对图像大小的线性计算复杂性。相比之下,由于完全空间连接,以前的MLP具有$ O(n ^ 2)$计算。我们构建一系列模型,超越现有的MLP,甚至最先进的基于变压器的模型,例如,使用较少的参数和拖鞋。我们扩展了类似MLP的模型的适用性,使它们成为密集预测任务的多功能骨干。 CycleMLP在对象检测,实例分割和语义细分上实现了竞争结果。特别是,Cyclemlp-tiny优于3.3%Miou在Ade20K数据集中的速度较少,具有较少的拖鞋。此外,CycleMLP还在Imagenet-C数据集上显示出优异的零射鲁布利。代码可以在https://github.com/shoufachen/cyclemlp获得。
translated by 谷歌翻译
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which requires the models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To our best knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
translated by 谷歌翻译
We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain-of-edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance in three value alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease for interactive error correction. Extensive human evaluations further confirm its effectiveness.
translated by 谷歌翻译
Three-dimensional (3D) ultrasound imaging technique has been applied for scoliosis assessment, but current assessment method only uses coronal projection image and cannot illustrate the 3D deformity and vertebra rotation. The vertebra detection is essential to reveal 3D spine information, but the detection task is challenging due to complex data and limited annotations. We propose VertMatch, a two-step framework to detect vertebral structures in 3D ultrasound volume by utilizing unlabeled data in semi-supervised manner. The first step is to detect the possible positions of structures on transverse slice globally, and then the local patches are cropped based on detected positions. The second step is to distinguish whether the patches contain real vertebral structures and screen the predicted positions from the first step. VertMatch develops three novel components for semi-supervised learning: for position detection in the first step, (1) anatomical prior is used to screen pseudo labels generated from confidence threshold method; (2) multi-slice consistency is used to utilize more unlabeled data by inputting multiple adjacent slices; (3) for patch identification in the second step, the categories are rebalanced in each batch to solve imbalance problem. Experimental results demonstrate that VertMatch can detect vertebra accurately in ultrasound volume and outperforms state-of-the-art methods. VertMatch is also validated in clinical application on forty ultrasound scans, and it can be a promising approach for 3D assessment of scoliosis.
translated by 谷歌翻译
To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video dataset for fine-tuning. However, such paradigm is computationally expensive. Humans have the amazing ability to learn new visual concepts from just one single exemplar. We hereby study a new T2V generation problem$\unicode{x2014}$One-Shot Video Generation, where only a single text-video pair is presented for training an open-domain T2V generator. Intuitively, we propose to adapt the T2I diffusion model pretrained on massive image data for T2V generation. We make two key observations: 1) T2I models are able to generate images that align well with the verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we propose Tune-A-Video with a tailored Sparse-Causal Attention, which generates videos from text prompts via an efficient one-shot tuning of pretrained T2I diffusion models. Tune-A-Video is capable of producing temporally-coherent videos over various applications such as change of subject or background, attribute editing, style transfer, demonstrating the versatility and effectiveness of our method.
translated by 谷歌翻译
We introduce \textsc{PoliteRewrite} -- a dataset for polite language rewrite which is a novel sentence rewrite task. Compared with previous text style transfer tasks that can be mostly addressed by slight token- or phrase-level edits, polite language rewrite requires deep understanding and extensive sentence-level edits over an offensive and impolite sentence to deliver the same message euphemistically and politely, which is more challenging -- not only for NLP models but also for human annotators to rewrite with effort. To alleviate the human effort for efficient annotation, we first propose a novel annotation paradigm by a collaboration of human annotators and GPT-3.5 to annotate \textsc{PoliteRewrite}. The released dataset has 10K polite sentence rewrites annotated collaboratively by GPT-3.5 and human, which can be used as gold standard for training, validation and test; and 100K high-quality polite sentence rewrites by GPT-3.5 without human review. We wish this work (The dataset (10K+100K) will be released soon) could contribute to the research on more challenging sentence rewrite, and provoke more thought in future on resource annotation paradigm with the help of the large-scaled pretrained models.
translated by 谷歌翻译
Summary quality assessment metrics have two categories: reference-based and reference-free. Reference-based metrics are theoretically more accurate but are limited by the availability and quality of the human-written references, which are both difficulty to ensure. This inspires the development of reference-free metrics, which are independent from human-written references, in the past few years. However, existing reference-free metrics cannot be both zero-shot and accurate. In this paper, we propose a zero-shot but accurate reference-free approach in a sneaky way: feeding documents, based upon which summaries generated, as references into reference-based metrics. Experimental results show that this zero-shot approach can give us the best-performing reference-free metrics on nearly all aspects on several recently-released datasets, even beating reference-free metrics specifically trained for this task sometimes. We further investigate what reference-based metrics can benefit from such repurposing and whether our additional tweaks help.
translated by 谷歌翻译
Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while keeping photorealism. In this paper, we think it's the summary statistics matching scheme in existing algorithms that leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework, which contains a style removal part and a style restoration part. The style removal network removes the original image styles, and the style restoration network recovers image styles in a supervised manner. Meanwhile, to address the problems in current feature transformation methods, we propose decoupled instance normalization to decompose feature transformation into style whitening and restylization. It works quite well in ColoristaNet and can transfer image styles efficiently while keeping photorealism. To ensure temporal coherency, we also incorporate optical flow methods and ConvLSTM to embed contextual information. Experiments demonstrates that ColoristaNet can achieve better stylization effects when compared with state-of-the-art algorithms.
translated by 谷歌翻译
The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages. This promotes the potential of utilizing models pretrained with data more than 3D as teachers for cross-modal knowledge transferring. In this paper, we revisit masked modeling in a unified fashion of knowledge distillation, and we show that foundational Transformers pretrained with 2D images or natural languages can help self-supervised 3D representation learning through training Autoencoders as Cross-Modal Teachers (ACT). The pretrained Transformers are transferred as cross-modal 3D teachers using discrete variational autoencoding self-supervision, during which the Transformers are frozen with prompt tuning for better knowledge inheritance. The latent features encoded by the 3D teachers are used as the target of masked point modeling, wherein the dark knowledge is distilled to the 3D Transformer students as foundational geometry understanding. Our ACT pretrained 3D learner achieves state-of-the-art generalization capacity across various downstream benchmarks, e.g., 88.21% overall accuracy on ScanObjectNN. Codes will be released at https://github.com/RunpeiDong/ACT.
translated by 谷歌翻译