
After building a classifier with modern machine learning tools, we typically have a black box at hand that predicts well on unseen data. We thus get an answer to the question of what the most likely label of a given unseen data point is. However, most methods provide no answer as to why the model predicted a particular label for a single instance, or which features were most influential for that instance. Decision trees are currently the only method able to provide such explanations. This paper proposes a procedure which (based on a set of assumptions) makes it possible to explain the decisions of any classification method.


ScanQA: 3D Question Answering for Spatial Scene Understanding

Daichi Azuma, Taiki Miyanishi, Shuhei Kurita, Motoki Kawanabe

Category: Computer Vision

2021-12-20

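We propose a new 3D spatial understanding task: 3D question answering (3D-QA). In the 3D-QA task, a model receives visual information from the entire 3D scene of a rich RGB-D indoor scan and answers a given textual question about that scene. Unlike the 2D question answering of VQA, conventional 2D-QA models struggle with the spatial understanding of object alignment and direction, and they fail to localize objects from the textual questions in 3D-QA. We propose a baseline model for 3D-QA, named the ScanQA model, which learns a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor correlates language expressions with the underlying geometric features of the 3D scan and facilitates the regression of 3D bounding boxes to identify the objects described in textual questions. We collected human-edited question-answer pairs with free-form answers that are grounded to 3D objects in each 3D scene. Our new ScanQA dataset contains over 41K question-answer pairs from 800 indoor scenes drawn from the ScanNet dataset. To the best of our knowledge, ScanQA is the first large-scale effort to perform object-grounded question answering in 3D environments.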
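
The fusion idea in the abstract can be illustrated with a small, purely hypothetical sketch. It is not the ScanQA architecture: the abstract does not specify layers or dimensions, so everything below (concatenation as the fusion step, a single linear scoring layer, a six-value box head, the names `fuse`, `linear`, `ground_question`, and the random untrained weights) is an assumption chosen only to show how per-proposal features and a sentence embedding might be combined to pick and regress a box.

```python
import math
import random


def fuse(proposal_feats, sentence_emb):
    """Assumed fusion: concatenate each proposal feature with the sentence embedding."""
    return [feat + sentence_emb for feat in proposal_feats]


def linear(vec, weights, bias):
    """Plain dense layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ground_question(proposal_feats, sentence_emb, score_w, score_b, box_w, box_b):
    """Score every fused descriptor, pick the best proposal, regress a box for it."""
    fused = fuse(proposal_feats, sentence_emb)
    scores = [linear(f, score_w, score_b)[0] for f in fused]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Hypothetical box layout: center (x, y, z) followed by size (dx, dy, dz).
    box = linear(fused[best], box_w, box_b)
    return best, probs, box


# Toy inputs with random, untrained weights (illustration only).
rng = random.Random(0)
num_proposals, feat_dim, sent_dim = 4, 8, 5
fused_dim = feat_dim + sent_dim

proposal_feats = [[rng.uniform(-1, 1) for _ in range(feat_dim)]
                  for _ in range(num_proposals)]
sentence_emb = [rng.uniform(-1, 1) for _ in range(sent_dim)]
score_w = [[rng.uniform(-1, 1) for _ in range(fused_dim)]]
score_b = [0.0]
box_w = [[rng.uniform(-1, 1) for _ in range(fused_dim)] for _ in range(6)]
box_b = [0.0] * 6

best, probs, box = ground_question(
    proposal_feats, sentence_emb, score_w, score_b, box_w, box_b
)
print("selected proposal:", best)
print("proposal probabilities:", [round(p, 3) for p in probs])
print("regressed box:", [round(v, 3) for v in box])
```

With trained weights, the selected proposal and its regressed box would correspond to the object the question refers to; here the outputs are meaningless beyond demonstrating the data flow.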
