The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
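In LiDAR-based 3D object detection for autonomous driving, the ratio of object size to input scene size is significantly smaller than in 2D detection. Overlooking this difference, many 3D detectors directly follow the common practice of 2D detectors and downsample the feature maps even after quantizing the point clouds. In this paper, we start by rethinking how this multi-stride stereotype affects LiDAR-based 3D object detectors. Our experiments show that the downsampling operations bring few advantages and lead to inevitable information loss. To remedy this issue, we propose the Single-stride Sparse Transformer (SST), which maintains the original resolution from the beginning to the end of the network. Armed with transformers, our method addresses the problem of an insufficient receptive field in single-stride architectures. It also cooperates well with the sparsity of point clouds and naturally avoids expensive computation. Eventually, our SST achieves state-of-the-art results on the large-scale Waymo Open Dataset. It is worth mentioning that, owing to its single-stride design, our method achieves exciting performance (83.8 LEVEL 1 AP) on small-object (pedestrian) detection. Code will be released at https://github.com/tusimple/sst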
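Previous online 3D multi-object tracking (3DMOT) methods terminate a tracklet when it has not been associated with new detections for a few frames. But if an object merely goes dark, for example because it is temporarily occluded by other objects or simply leaves the field of view (FOV), terminating the tracklet prematurely results in an identity switch. We reveal that premature tracklet termination is the main cause of identity switches in modern 3DMOT systems. To address this, we propose Immortal Tracker, a simple tracking system that uses trajectory prediction to maintain tracklets for objects that have gone dark. We employ a simple Kalman filter for trajectory prediction and preserve the tracklet by prediction when the target is not visible. With this method, we avoid 96% of the vehicle identity switches that result from premature tracklet termination. Without any learned parameters, our method achieves a mismatch ratio at the 0.0001 level and competitive MOTA for the Vehicle class on the Waymo Open Dataset test set. Our mismatch ratio is tens of times lower than that of any previously published method. Similar results are reported on nuScenes. We believe the proposed Immortal Tracker offers a simple yet powerful solution for pushing the limits of 3DMOT. Our code is available at https://github.com/immortaltracker/immortaltracker.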
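The idea of keeping a tracklet alive on Kalman prediction alone can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1D constant-velocity state, and the class name `Track1D`, the noise values `q` and `r`, and the `missed` counter are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Track1D:
    """Constant-velocity Kalman filter over a 1D position (hypothetical helper)."""
    pos: float
    vel: float = 0.0
    # Entries of the symmetric 2x2 covariance P = [[p00, p01], [p01, p11]].
    p00: float = 1.0
    p01: float = 0.0
    p11: float = 1.0
    q: float = 0.01   # process noise added to each diagonal entry (assumed value)
    r: float = 0.1    # measurement noise variance (assumed value)
    missed: int = 0   # consecutive frames without an associated detection

    def predict(self, dt: float = 1.0) -> None:
        # x <- F x with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        # P <- F P F^T + Q
        p00 = self.p00 + dt * (2.0 * self.p01 + dt * self.p11) + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed.
        s = self.p00 + self.r                  # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s    # Kalman gain
        y = z - self.pos                       # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P <- (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
        self.missed = 0

    def step(self, detection=None) -> float:
        """Advance one frame; without a detection the track survives on prediction."""
        self.predict()
        if detection is None:
            self.missed += 1   # never terminated, only counted
        else:
            self.update(detection)
        return self.pos


# An object moving at 1 unit per frame, visible for six frames, then occluded for three.
track = Track1D(pos=0.0)
for z in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    track.step(z)
occluded = [track.step(None) for _ in range(3)]
```

During the occluded frames the track keeps advancing along its estimated velocity, so a detection that reappears near the predicted position can be re-associated with the same identity instead of starting a new track.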
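3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years, especially those under the "tracking-by-detection" paradigm. Despite their progress and usefulness, an in-depth analysis of their strengths and weaknesses is not yet available. In this paper, we summarize current 3D MOT methods by decomposing them into four constituent parts: pre-processing of detections, association, motion model, and life cycle management. We then ascribe the failure cases of existing algorithms to each component and investigate them in detail. Based on these analyses, we propose corresponding improvements that lead to a strong yet simple baseline: SimpleTrack. Comprehensive experimental results on the Waymo Open Dataset and nuScenes demonstrate that our final method achieves new state-of-the-art results with minor modifications. Furthermore, we take additional steps and rethink whether current benchmarks authentically reflect the ability of algorithms to handle real-world challenges. We delve into the details of existing benchmarks and find some intriguing facts. Finally, we analyze the distribution and causes of the remaining failures in SimpleTrack and propose future directions for 3D MOT. Our code is available at https://github.com/tusimple/simpletrack.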
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation for our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
Forecasts by the European Centre for Medium-Range Weather Forecasts (ECMWF; EC for short) can provide a basis for the establishment of maritime-disaster warning systems, but they contain some systematic biases. The fifth-generation EC atmospheric reanalysis (ERA5) data have high accuracy, but are delayed by about 5 days. To overcome this issue, a spatiotemporal deep-learning method could be used for nonlinear mapping between EC and ERA5 data, which would improve the quality of EC wind forecast data in real time. In this study, we developed the Multi-Task-Double Encoder Trajectory Gated Recurrent Unit (MT-DETrajGRU) model, which uses an improved double-encoder forecaster architecture to model the spatiotemporal sequence of the U and V components of the wind field; we designed a multi-task learning loss function to correct wind speed and wind direction simultaneously using only one model. The study area was the western North Pacific (WNP), and real-time rolling bias corrections were made for 10-day wind-field forecasts released by the EC between December 2020 and November 2021, divided into four seasons. Compared with the original EC forecasts, after correction using the MT-DETrajGRU model the wind speed and wind direction biases in the four seasons were reduced by 8-11% and 9-14%, respectively. In addition, the proposed method modelled the data uniformly under different weather conditions. The correction performance under normal and typhoon conditions was comparable, indicating that the data-driven model constructed here is robust and generalizable.
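How wind speed and direction follow from the U and V components, and how one loss can penalize both, can be sketched as below. The conversion uses the standard meteorological convention (direction the wind blows from, 0° = north). The loss itself is only an assumed illustrative form: the function `multitask_loss` and its weights `w_speed` and `w_dir` are hypothetical and are not the loss defined in the paper.

```python
import math


def wind_speed(u: float, v: float) -> float:
    """Wind speed from the eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_direction(u: float, v: float) -> float:
    """Meteorological direction in degrees: where the wind blows FROM (0 = north)."""
    return (180.0 + math.degrees(math.atan2(u, v))) % 360.0


def angular_error(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def multitask_loss(pred_uv, true_uv, w_speed=1.0, w_dir=1.0 / 180.0):
    """Weighted sum of speed error and circular direction error (assumed form)."""
    pu, pv = pred_uv
    tu, tv = true_uv
    speed_err = abs(wind_speed(pu, pv) - wind_speed(tu, tv))
    dir_err = angular_error(wind_direction(pu, pv), wind_direction(tu, tv))
    return w_speed * speed_err + w_dir * dir_err
```

Treating direction as a circular quantity matters here: a forecast of 350° against a truth of 10° is off by 20°, not 340°.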
Despite a sea of interpretability methods that can produce plausible explanations, the field has also empirically seen many failure cases of such methods. In light of these results, it remains unclear for practitioners how to use these methods and choose between them in a principled way. In this paper, we show that for even moderately rich model classes (easily satisfied by neural networks), any feature attribution method that is complete and linear--for example, Integrated Gradients and SHAP--can provably fail to improve on random guessing for inferring model behaviour. Our results apply to common end-tasks such as identifying local model behaviour, spurious feature identification, and algorithmic recourse. One takeaway from our work is the importance of concretely defining end-tasks. In particular, we show that once such an end-task is defined, a simple and direct approach of repeated model evaluations can outperform many other complex feature attribution methods.
Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to this method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.
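The second step of the pipeline, reasoning over a linearized table with a one-shot prompt, might look roughly like the sketch below. It assumes the plot-to-table step has already produced the table contents; the cell separator, the prompt layout, and the function names `linearize_table` and `build_one_shot_prompt` are hypothetical and not taken from DePlot. No model is called.

```python
def linearize_table(header, rows):
    """Flatten a table into text, one row per line, cells separated by ' | '."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def build_one_shot_prompt(example_table, example_q, example_a, table, question):
    """Assemble a prompt with one solved example followed by the new query."""
    return (
        f"Table:\n{example_table}\nQuestion: {example_q}\nAnswer: {example_a}\n\n"
        f"Table:\n{table}\nQuestion: {question}\nAnswer:"
    )


# One solved example (the "shot") and one new chart, both already converted to tables.
demo_table = linearize_table(["Year", "Sales"], [[2020, 10], [2021, 15]])
query_table = linearize_table(["City", "Rainfall (mm)"], [["Oslo", 80], ["Rome", 60]])
prompt = build_one_shot_prompt(
    demo_table, "Which year had higher sales?", "2021",
    query_table, "Which city had more rainfall?",
)
```

The resulting string would be sent to whichever LLM is in use; because the chart has become plain text, no chart-specific finetuning of the language model is involved in this step.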
Robust Markov decision processes (RMDPs) are promising models that provide reliable policies under ambiguities in model parameters. As opposed to nominal Markov decision processes (MDPs), however, the state-of-the-art solution methods for RMDPs are limited to value-based methods, such as value iteration and policy iteration. This paper proposes Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs with a global convergence guarantee in tabular problems. Unlike value-based methods, DRPG does not rely on dynamic programming techniques. In particular, the inner-loop robust policy evaluation problem is solved via projected gradient descent. Finally, our experimental results demonstrate the performance of our algorithm and verify our theoretical guarantees.
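Projected gradient descent, the technique named for the inner loop, can be illustrated on a toy problem. This is not DRPG: it assumes a single state whose ambiguity set is the whole probability simplex, and it searches for the next-state distribution that minimizes expected value. The function names and the step size `lr` are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t   # the last j satisfying the condition sets the threshold
    return [max(x - theta, 0.0) for x in v]


def worst_case_distribution(values, steps=200, lr=0.1):
    """Minimize the expected value p . values over the simplex by projected gradient descent."""
    n = len(values)
    p = [1.0 / n] * n   # start from the uniform distribution
    for _ in range(steps):
        # The gradient of p . values with respect to p is `values` itself.
        p = project_to_simplex([p_i - lr * g for p_i, g in zip(p, values)])
    return p


next_state_values = [1.0, 0.2, 0.7]
p_worst = worst_case_distribution(next_state_values)
```

With an unrestricted simplex the adversary ends up placing all probability on the lowest-value next state; a realistic ambiguity set would need its own projection step in place of `project_to_simplex`.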
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures, and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.