该技术报告描述了surreyaudioteam22s Dcase 2022 ASC任务1,低复杂性声学场景分类(ASC)。该任务有两个规则,(a)ASC框架应具有最大128K参数,并且(b)每个推理最多应有3000万次多功能操作(MAC)。在本报告中,我们为ASC提供了遵循该任务规则的ASC的低复杂系统。
translated by 谷歌翻译
鉴于对计算资源的限制(例如,模型大小,跑步内存)的限制,不断学习新课程而没有灾难性遗忘是一个具有挑战性的问题。为了解决这个问题,我们提出了一种简单有效的持续学习方法。我们的方法通过测量按样本分类不确定性来选择培训的历史数据。具体而言,我们通过观察数据的分类概率如何与添加到分类器嵌入中的平行扰动相比如何波动来测量不确定性。通过这种方式,与将扰动添加到原始数据相比,计算成本可以大大降低。 DCASE 2019任务1和ESC-50数据集的实验结果表明,我们所提出的方法优于基准的分类准确性和计算效率的基线连续学习方法,表明我们的方法可以有效,可以逐步学习新的课程,而无需用于灾难性环境的灾难性遗忘问题声音分类。
translated by 谷歌翻译
心脏磁共振(CMR)图像的多类分割,将数据分离为具有已知结构和构型的解剖组分。最流行的基于CNN的方法是使用像素明智的损失函数优化的,对表征解剖结构的空间扩展特征一无所知。因此,尽管与地面真理共享高空间重叠,但推断的基于CNN的分割可能缺乏连贯性,包括伪造的连接组件,孔和空隙。这样的结果令人难以置信,违反了预期的解剖拓扑。作为响应,已经提出了基于持续的同源性损失功能(单级)以捕获全局解剖特征。我们的工作将这些方法扩展到多级分割任务。我们的损失功能构建了所有类标签和类标签对的丰富拓扑描述,我们使用基于CNN的后处理框架可以预测和统计学上的分割拓扑改进。我们还基于立方复合物和并行执行,提出(并提供)高效的实现,这是第一次在高分辨率3D数据中实现实际应用。我们证明了我们在2D短轴和3D全心CMR细分方面的方法,对两个公开可用数据集的性能进行了详细而忠实的分析。
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
translated by 谷歌翻译
Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.
translated by 谷歌翻译
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
translated by 谷歌翻译
Grasping is an incredible ability of animals using their arms and limbs in their daily life. The human hand is an especially astonishing multi-fingered tool for precise grasping, which helped humans to develop the modern world. The implementation of the human grasp to virtual reality and telerobotics is always interesting and challenging at the same time. In this work, authors surveyed, studied, and analyzed the human hand-grasping behavior for the possibilities of haptic grasping in the virtual and remote environment. This work is focused on the motion and force analysis of fingers in human hand grasping scenarios and the paper describes the transition of the human hand grasping towards a tripod haptic grasp model for effective interaction in virtual reality.
translated by 谷歌翻译
We present a Machine Learning (ML) study case to illustrate the challenges of clinical translation for a real-time AI-empowered echocardiography system with data of ICU patients in LMICs. Such ML case study includes data preparation, curation and labelling from 2D Ultrasound videos of 31 ICU patients in LMICs and model selection, validation and deployment of three thinner neural networks to classify apical four-chamber view. Results of the ML heuristics showed the promising implementation, validation and application of thinner networks to classify 4CV with limited datasets. We conclude this work mentioning the need for (a) datasets to improve diversity of demographics, diseases, and (b) the need of further investigations of thinner models to be run and implemented in low-cost hardware to be clinically translated in the ICU in LMICs. The code and other resources to reproduce this work are available at https://github.com/vital-ultrasound/ai-assisted-echocardiography-for-low-resource-countries.
translated by 谷歌翻译
Multivariate time series forecasting with hierarchical structure is pervasive in real-world applications, demanding not only predicting each level of the hierarchy, but also reconciling all forecasts to ensure coherency, i.e., the forecasts should satisfy the hierarchical aggregation constraints. Moreover, the disparities of statistical characteristics between levels can be huge, worsened by non-Gaussian distributions and non-linear correlations. To this extent, we propose a novel end-to-end hierarchical time series forecasting model, based on conditioned normalizing flow-based autoregressive transformer reconciliation, to represent complex data distribution while simultaneously reconciling the forecasts to ensure coherency. Unlike other state-of-the-art methods, we achieve the forecasting and reconciliation simultaneously without requiring any explicit post-processing step. In addition, by harnessing the power of deep model, we do not rely on any assumption such as unbiased estimates or Gaussian distribution. Our evaluation experiments are conducted on four real-world hierarchical datasets from different industrial domains (three public ones and a dataset from the application servers of Alipay's data center) and the preliminary results demonstrate efficacy of our proposed method.
translated by 谷歌翻译