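Since their introduction in 2020, Vision Transformers (ViT) have been steadily breaking records on many vision tasks and are often described as "all-you-need" to replace ConvNets. However, ViTs are unfriendly to embedded devices. In addition, recent research shows that a standard ConvNet, if redesigned and trained appropriately, can compete with ViT in accuracy and scalability. In this paper, we adopt the modernized ConvNet structure to design a new backbone for action recognition. In particular, our main goal is to serve industrial product deployment, such as FPGA boards that support only standard operations. Our network therefore consists only of 2D convolutions, without any 3D convolution, long-range attention plug-in, or Transformer block. While being trained for far fewer epochs (5x-10x), our backbone surpasses methods based on (2+1)D and 3D convolutions and achieves results comparable to ViT on two benchmark datasets.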
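Multiview detection uses multiple calibrated cameras with overlapping fields of view to locate occluded pedestrians. In this field, existing methods typically adopt a "human modeling - aggregation" strategy. To find robust pedestrian representations, some intuitively use the locations of detected 2D bounding boxes, while others use entire frame features projected onto the ground plane. However, the former does not consider human appearance and leads to many ambiguities, while the latter suffers from projection errors because accurate heights of the human torso and head are unavailable. In this paper, we propose a new pedestrian representation scheme based on human point cloud modeling. Specifically, using ray tracing for holistic human depth estimation, we model pedestrians as upright, thin cardboard point clouds. We then aggregate the point clouds of the pedestrian cardboards across multiple views for a final decision. Compared with existing representations, the proposed method explicitly leverages human appearance and significantly reduces projection errors through relatively accurate height estimation. On two standard evaluation benchmarks, the proposed method achieves very competitive results.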
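In legged robot locomotion, performing highly agile dynamic motions on stepping stones, such as jumping or running, remains a challenging problem. This paper presents a framework that combines trajectory optimization and model predictive control to perform robust, continuous jumping on stepping stones. In our approach, we first use trajectory optimization based on the full nonlinear dynamics of the robot to generate periodic jumping trajectories for various jumping distances. A jumping controller based on model predictive control (MPC) is then designed to realize smooth jumping transitions, enabling the robot to achieve continuous jumps on stepping stones. Thanks to the incorporation of MPC as a real-time feedback controller, the proposed framework is also validated to be robust to uneven platforms with height perturbations and to model uncertainty in the robot dynamics.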
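Multi-view detection incorporates multiple camera views to alleviate occlusion in crowded scenes, and state-of-the-art approaches adopt homography transformations to project multi-view features onto the ground plane. However, we find that these 2D transformations do not take the object's height into account. Because of this neglect, features along the vertical direction of the same object are likely not projected onto the same ground-plane point, which leads to impure ground-plane features. To solve this problem, we propose VFA, voxelized 3D feature aggregation, for feature transformation and aggregation in multi-view detection. Specifically, we voxelize the 3D space, project the voxels onto each camera view, and associate 2D features with these projected voxels. This allows us to identify and then aggregate 2D features along the same vertical line, alleviating projection distortions to a large extent. In addition, because different kinds of objects (human vs. cattle) have different shapes on the ground plane, we introduce oriented Gaussian encoding to match such shapes, which improves accuracy and efficiency. We perform experiments on multiview 2D detection and multiview 3D detection problems. Results on four datasets (including the newly introduced MultiviewC dataset) show that our system is very competitive with state-of-the-art approaches. Code and MultiviewC are released at https://github.com/robert-mar/vfa.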
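The height problem described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy pinhole camera, its parameters, and the helper names `project`, `ground_from_pixel`, and `column_pixels` are all assumptions made for illustration. It shows that a head pixel back-projected onto the ground plane lands far from the person's feet, whereas projecting the voxels stacked above one ground location yields the pixels whose features belong to that same vertical line.

```python
# Toy pinhole camera; every value here is an illustrative assumption.
# World frame: x right, y forward, z up. The camera looks along +y.
CAM = (0.0, 0.0, 3.0)               # camera centre, 3 m above the ground
F, CX, CY = 1000.0, 960.0, 540.0    # focal length and principal point (pixels)


def project(point):
    """Project a 3D world point to pixel coordinates (u, v)."""
    x, y, z = point
    depth = y - CAM[1]
    u = CX + F * (x - CAM[0]) / depth
    v = CY + F * (CAM[2] - z) / depth   # image v grows downward
    return u, v


def ground_from_pixel(u, v):
    """Back-project a pixel onto the plane z = 0, as a ground-plane homography does."""
    dx = (u - CX) / F        # ray direction is (dx, 1, dz) in world coordinates
    dz = -(v - CY) / F
    if dz >= 0:
        raise ValueError("pixel is at or above the horizon")
    s = -CAM[2] / dz         # ray parameter where the ray reaches z = 0
    return CAM[0] + s * dx, CAM[1] + s


def column_pixels(x, y, heights):
    """Pixels hit by the voxels stacked above ground location (x, y)."""
    return [project((x, y, h)) for h in heights]


foot, head = (1.0, 10.0, 0.0), (1.0, 10.0, 1.7)
foot_on_ground = ground_from_pixel(*project(foot))   # close to (1, 10)
head_on_ground = ground_from_pixel(*project(head))   # far behind the person
pixels = column_pixels(1.0, 10.0, [0.0, 0.85, 1.7])  # one pixel per voxel
print(foot_on_ground, head_on_ground, pixels)
```

In a voxel-based scheme, the features sampled at `pixels` would all be assigned to the ground cell at (1, 10), instead of being smeared along the viewing ray.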
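Highly agile acrobatic motions with long flight phases require perfect timing, high accuracy, and coordination of the full-body motion. To address these challenges, this paper presents a unified timing and trajectory optimization framework for legged robots performing aggressive 3D jumps. In our approach, we first use an efficient optimization framework based on simplified rigid body dynamics to solve for the contact timings and a reference trajectory of the robot body. The solution of this module is then used to formulate a full-body trajectory optimization based on the full nonlinear dynamics of the robot. This combination allows us to optimize the contact timings efficiently while guaranteeing the accuracy of a jumping trajectory that can be realized on hardware. We validate the proposed framework on the A1 robot model for various 3D jumping tasks, such as double backflips and double barrel rolls from heights of 2 m and 0.8 m, respectively. Experimental validation was also conducted successfully for different 3D jumping motions, such as a barrel roll from a box and diagonal jumps.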
Here, we demonstrate how machine learning enables the prediction of comonomer reactivity ratios from the molecular structure of the monomers. We combined multi-task learning, multiple inputs, and a Graph Attention Network to build a model capable of predicting reactivity ratios from the monomers' chemical structures.
Modern deep neural networks have achieved superhuman performance in tasks ranging from image classification to game play. Surprisingly, these varied and complex systems, with massive numbers of parameters, exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have shown theoretically that the global solutions of the network training problem under a simplified "unconstrained feature model" exhibit this phenomenon. We take a step further and prove that Neural Collapse occurs in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses. Furthermore, we extend our analysis to imbalanced data for the MSE loss and present the first geometric analysis of Neural Collapse in this setting.
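For readers unfamiliar with the structure in question: in the balanced setting studied by Papyan et al., the centered class means converge to a simplex equiangular tight frame (ETF), meaning equal norms and pairwise cosine -1/(K-1) for K classes. The sketch below builds such a frame and checks both properties. The helper names `simplex_etf` and `cosine` are assumptions for illustration, and the sketch shows only the balanced geometry, not the imbalanced MSE analysis of the paper.

```python
import math


def simplex_etf(num_classes):
    """Return K unit vectors in R^K forming a simplex equiangular tight frame."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    # Rows of sqrt(K / (K - 1)) * (I - (1/K) * ones * ones^T).
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]


def cosine(a, b):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


K = 4
means = simplex_etf(K)
norms = [math.sqrt(sum(x * x for x in m)) for m in means]
pairwise = [cosine(means[i], means[j])
            for i in range(K) for j in range(i + 1, K)]
print(norms)      # all approximately 1
print(pairwise)   # all approximately -1 / (K - 1)
```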
Machine Reading Comprehension has become one of the most advanced and popular research topics in Natural Language Processing in recent years. Classifying whether a question is answerable is a significant sub-task of machine reading comprehension, yet it has received little study. Retro-Reader is one of the studies that has addressed this problem effectively. However, the encoders of most traditional machine reading comprehension models, and of Retro-Reader in particular, do not fully exploit the semantic information of the context. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, and PhoBERT. This experiment was conducted to compare the influence of semantics on answerability classification for Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder of the Retro-Reader model's Sketchy Reading Module. The improved, semantics-aware Retro-Reader encoder was applied to the Vietnamese Machine Reading Comprehension task for the first time and obtained positive results.
Recognizing textual entailment (RTE) is a significant problem with a reasonably active research community. The approaches proposed for this problem are diverse and follow many different directions. For Vietnamese, the RTE problem is relatively new, but it plays a vital role in natural language understanding systems. Currently, methods based on contextual word representation learning models give outstanding results on this problem. However, Vietnamese is a semantically rich language. Therefore, in this paper, we present an experiment that combines semantic word representations obtained through the SRL task with the contextual representations of BERT-related models for the RTE problem. The experimental results lead to conclusions about the influence and role of semantic representation in Vietnamese natural language understanding. They show that the semantic-aware contextual representation model performs about 1% better than the model that does not incorporate semantic representation. In addition, the effects on the data domain in Vietnamese are also higher than those in English. This result also shows the positive influence of SRL on the RTE problem in Vietnamese.
To the best of our knowledge, this paper makes the first attempt to answer whether word segmentation is necessary for Vietnamese sentiment classification. To do this, we present five pre-trained monolingual S4-based language models for Vietnamese: one model without word segmentation, and four models that use the RDRsegmenter, uitnlp, pyvi, or underthesea toolkit in the data pre-processing phase. Based on comprehensive experimental results on two corpora, the VLSP2016-SA corpus of technical article reviews from the news and social media and the UIT-VSFC corpus of the educational survey, we have two suggestions. Firstly, with traditional classifiers such as Naive Bayes or Support Vector Machines, word segmentation may not be necessary for a Vietnamese sentiment classification corpus that comes from the social domain. Secondly, word segmentation is necessary for Vietnamese sentiment classification when it is applied before the BPE method and the result is fed into a deep learning model. In this setting, RDRsegmenter is the most stable word segmentation toolkit compared with uitnlp, pyvi, and underthesea.
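To make the pre-processing step concrete, the sketch below shows what Vietnamese word segmentation produces before subword tokenization: the syllables of a multi-syllable word are joined with underscores so that later stages treat them as one token. It is a toy longest-match segmenter over a hypothetical two-entry lexicon, and does not reproduce RDRsegmenter, uitnlp, pyvi, or underthesea, which rely on learned models. The name `segment` and the example sentence are assumptions for illustration.

```python
# Hypothetical toy lexicon of multi-syllable words (illustration only).
LEXICON = {("sinh", "viên"), ("đại", "học")}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(sentence):
    """Greedy longest-match segmentation; joins matched syllables with '_'."""
    syllables = sentence.split()
    words, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first, down to two syllables.
        for n in range(min(MAX_WORD_LEN, len(syllables) - i), 1, -1):
            if tuple(syllables[i:i + n]) in LEXICON:
                words.append("_".join(syllables[i:i + n]))
                i += n
                break
        else:
            # No multi-syllable match: keep the single syllable as a word.
            words.append(syllables[i])
            i += 1
    return " ".join(words)


segmented = segment("tôi là sinh viên đại học")
print(segmented)  # tôi là sinh_viên đại_học
```

A BPE tokenizer applied afterwards would then see `sinh_viên` as a single unit to split into subwords, rather than two unrelated syllables.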