Social recommender systems (SocialRS) leverage both user-to-item interactions and user-to-user social relations to generate item recommendations for users. Exploiting social relations in addition to interactions is effective in understanding users' tastes because of the effects of homophily and social influence. For this reason, SocialRS has increasingly attracted attention. In particular, with the advance of Graph Neural Networks (GNN), many GNN-based SocialRS methods have been developed recently. We therefore conduct a comprehensive and systematic review of the literature on GNN-based SocialRS. In this survey, we first identify 80 papers on GNN-based SocialRS after annotating 2151 papers by following the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). Then, we comprehensively review them in terms of their inputs and architectures to propose a novel taxonomy: (1) the input taxonomy includes 5 groups of input type notations and 7 groups of input representation notations; (2) the architecture taxonomy includes 8 groups of GNN encoder, 2 groups of decoder, and 12 groups of loss function notations. We classify the GNN-based SocialRS methods into several categories as per the taxonomy and describe their details. Furthermore, we summarize the benchmark datasets and metrics widely used to evaluate the GNN-based SocialRS methods. Finally, we conclude this survey by presenting some future research directions.
translated by 谷歌翻译
As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matters. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics. Each rationale is a set of supporting statements connecting the question to the answer. WikiWhy serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit rationales for each answer to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer & explain condition, leaving significant room for future improvements.
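Analogical reasoning problems challenge both connectionist and symbolic AI systems, because they require a combination of background knowledge, reasoning, and pattern recognition. Symbolic systems ingest explicit domain knowledge and perform deductive reasoning, but they are sensitive to noise and require inputs to be mapped to preset symbolic features. Connectionist systems, on the other hand, can directly ingest rich input spaces such as images, text, or speech and recognize patterns even with noisy inputs. However, connectionist models struggle to use explicit domain knowledge for deductive reasoning. In this paper, we propose a framework that combines the pattern recognition abilities of neural networks with symbolic reasoning and background knowledge to solve a class of analogical reasoning problems in which the set of attributes and the possible relations are known. We take inspiration from the "neural algorithmic reasoning" approach [DeepMind 2020] and proceed by (i) learning a distributed representation based on a symbolic model of the problem, (ii) training neural network transformations that reflect the relations involved in the problem, and finally (iii) training a neural network encoder from images to the distributed representation in (i). These three elements enable us to perform search-based reasoning using neural networks as elementary functions that manipulate distributed representations. We test this on visual analogy problems in Raven's Progressive Matrices and achieve accuracy competitive with human performance and, in certain cases, superior to initial end-to-end neural network approaches. Although recent neural models trained at scale yield SOTA results, our novel neuro-symbolic reasoning approach is a promising direction for this problem and is arguably more general, especially for problems where domain knowledge is available.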
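In recent years, immersive displays (e.g., VR headsets, AR glasses, multiview displays, free-viewpoint TV) have emerged as a new class of display technologies, offering a better visual experience and greater viewer engagement than conventional displays. With the evolution of 3D video and display technologies, the consumer market for high dynamic range (HDR) cameras and displays is growing rapidly. The lack of appropriate experimental data is a critical obstacle to major research efforts in the field of 3D HDR video technology. Likewise, the unavailability of sufficient real-world multi-exposure experimental datasets is a major bottleneck for HDR imaging research, which in turn limits the quality of experience (QoE) for viewers. In this paper, we introduce a diversified stereoscopic multi-exposure dataset captured within the campus of the Indian Institute of Technology Madras, which is home to diverse flora and fauna. The dataset is captured using a ZED stereoscopic camera and provides complex scenes of outdoor locations such as gardens, roadside views, festival venues, and buildings, and of indoor locations such as academic and residential areas. The proposed dataset accommodates a wide depth range, complex depth structure, complicated object movement, illumination variations, rich color dynamics, and texture discrepancy, in addition to significant randomness introduced by the moving camera and background motion. The proposed dataset is made publicly available to the research community. Furthermore, the procedure for capturing, aligning, and calibrating multi-exposure stereo videos and images is described in detail. Finally, we discuss the progress, challenges, potential use cases, and future research opportunities with respect to HDR imaging, depth estimation, consistent tone mapping, and 3D HDR coding.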
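Technology plays an important role in the field of rehabilitation, improving patient outcomes and reducing healthcare costs. However, existing approaches lack clinical validation, robustness, and ease of use. We propose Tele-EventNet, a novel system consisting of two components: a live feedback model and an overall performance evaluation model. The live feedback model provides feedback on exercise correctness through easy-to-understand instructions highlighted with color markers. The overall performance evaluation model learns a mapping from joint data to the scores that clinicians assign to the performance. The model does this by extracting clinically approved features from the joint data. These features are then encoded into a lower-dimensional space with an autoencoder. A novel multi-scale CNN-LSTM network is proposed to learn the mapping from performance data to scores by leveraging features extracted at multiple scales. The proposed system shows a substantial improvement in score prediction and outperforms state-of-the-art rehabilitation models.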
Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GloVe is a popular unsupervised model incorporating corpus-wide word co-occurrence statistics. Such word embeddings have significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner. The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee," and are thus human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embedding co-locates similar words.
Large training data and expensive model tweaking are standard features of deep learning for images. As a result, data owners often utilize cloud resources to develop large-scale complex models, which raises privacy concerns. Existing solutions are either too expensive to be practical or do not sufficiently protect the confidentiality of data and models. In this paper, we study and compare novel image disguising mechanisms, DisguisedNets and InstaHide, aiming to achieve a better trade-off among the level of protection for outsourced DNN model training, the expenses, and the utility of data. DisguisedNets are novel combinations of image blocktization, block-level random permutation, and two block-level secure transformations: random multidimensional projection (RMT) and AES pixel-level encryption (AES). InstaHide is an image mixup and random pixel flipping technique \cite{huang20}. We have analyzed and evaluated them under a multi-level threat model. RMT provides a better security guarantee than InstaHide under the Level-1 adversarial knowledge, with well-preserved model quality. In contrast, AES provides a security guarantee under the Level-2 adversarial knowledge, but it may affect model quality more. The unique features of image disguising also help us to protect models from model-targeted attacks. We have done an extensive experimental evaluation to understand how these methods work in different settings for different datasets.
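The blocktization and block-level random permutation steps named in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`split_blocks`, `join_blocks`, `disguise`, `restore`), the integer `key` used as a seed, and the requirement that the image sides be divisible by the block size are all hypothetical, and the RMT and AES transformations are not shown. It is not the authors' implementation, and a seeded `random.Random` is not a cryptographic primitive.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def split_blocks(img: Image, b: int) -> List[Image]:
    """Cut an image into b x b blocks in row-major order (sides divisible by b)."""
    h, w = len(img), len(img[0])
    return [[row[c:c + b] for row in img[r:r + b]]
            for r in range(0, h, b) for c in range(0, w, b)]


def join_blocks(blocks: List[Image], h: int, w: int, b: int) -> Image:
    """Reassemble row-major b x b blocks into an h x w image."""
    per_row = w // b
    img: Image = []
    for r in range(h // b):
        row_blocks = blocks[r * per_row:(r + 1) * per_row]
        for i in range(b):
            img.append([px for blk in row_blocks for px in blk[i]])
    return img


def disguise(img: Image, b: int, key: int) -> Tuple[Image, List[int]]:
    """Permute the blocks with an order derived from a secret seed (assumption)."""
    h, w = len(img), len(img[0])
    blocks = split_blocks(img, b)
    order = list(range(len(blocks)))
    random.Random(key).shuffle(order)
    return join_blocks([blocks[i] for i in order], h, w, b), order


def restore(disguised: Image, b: int, order: List[int]) -> Image:
    """Invert the permutation: position pos holds the block that came from order[pos]."""
    h, w = len(disguised), len(disguised[0])
    blocks = split_blocks(disguised, b)
    original: List[Image] = [[] for _ in blocks]
    for pos, src in enumerate(order):
        original[src] = blocks[pos]
    return join_blocks(original, h, w, b)


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    hidden, perm = disguise(image, b=2, key=7)
    print(restore(hidden, 2, perm) == image)
```

Permutation alone keeps every pixel value intact, which is why the abstract pairs it with a block-level transformation such as RMT or AES.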
Recent advances in deep learning have made it possible to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of these approaches has led to methods for solving high-dimensional PDEs, opening the door to a variety of real-world problems ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. To tackle these shortcomings, we demonstrate that Tensor Neural Networks (TNN) can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). We also show how TNN can be trained faster than DNN for the same accuracy. Besides TNN, we introduce Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance for an equivalent parameter count as compared to a DNN. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
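To give a feel for where parameter savings of this kind can come from, the sketch below counts weights for a dense layer and for a weight matrix written as a Kronecker product of small factors. The abstract does not describe the TNN layer structure, so the Kronecker form, the function names, and the layer sizes are assumptions chosen purely for illustration and are not necessarily the authors' construction.

```python
from typing import Sequence, Tuple


def dense_weights(n_in: int, n_out: int) -> int:
    """Number of weights in a fully connected layer (bias omitted)."""
    return n_in * n_out


def kron_weights(factors: Sequence[Tuple[int, int]]) -> int:
    """Number of weights when W is a Kronecker product of the given
    (rows, cols) factors; only the small factors are stored."""
    return sum(rows * cols for rows, cols in factors)


def kron_shape(factors: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Shape of the full matrix represented by the factors."""
    rows, cols = 1, 1
    for r, c in factors:
        rows, cols = rows * r, cols * c
    return rows, cols


if __name__ == "__main__":
    factors = [(8, 8), (8, 8)]          # hypothetical factor sizes
    n_out, n_in = kron_shape(factors)   # represents a 64 x 64 weight matrix
    print(dense_weights(n_in, n_out), kron_weights(factors))
```

The factorized form restricts which matrices can be represented, so whether accuracy is preserved is an empirical question that the abstract reports on for the Heston PDE.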
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
The Tsetlin Machine (TM) has been gaining popularity as an inherently interpretable machine learning method that achieves promising performance with low computational complexity on a variety of applications. The interpretability and the low computational complexity of the TM are inherited from the Boolean expressions it uses to represent various sub-patterns. Despite these favorable properties, the TM has not been the go-to method for AI applications, mainly due to its conceptual and theoretical differences compared with perceptrons and neural networks, which are more widely known and well understood. In this paper, we provide detailed insights into the operational concept of the TM and try to bridge the gap in theoretical understanding between the perceptron and the TM. More specifically, we study the operational concept of the TM following the analytical structure of perceptrons, showing the resemblance between perceptrons and the TM. Through this analysis, we indicate that the TM's weight update can be considered a special case of the gradient weight update. We also perform an empirical analysis of the TM by showing the flexibility in determining the clause length, visualizing decision boundaries, and obtaining interpretable Boolean expressions from the TM. In addition, we discuss the advantages of the TM in terms of its structure and its ability to solve more complex problems.
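The Boolean sub-patterns mentioned here can be made concrete with a small inference-only sketch: each clause is a conjunction of input literals, and the class decision is a vote between positive and negative clauses. The clauses below are hand-written for XOR rather than learned, and the names `clause_output` and `classify` are hypothetical; the Tsetlin automata that learn which literals to include are not shown.

```python
from typing import Sequence, Tuple

# A clause is (included, negated): indices of inputs that must be 1 and
# indices of inputs that must be 0. Hand-written here, not learned.
Clause = Tuple[Sequence[int], Sequence[int]]


def clause_output(x: Sequence[int], clause: Clause) -> int:
    """Evaluate a conjunction of literals over a binary input vector."""
    included, negated = clause
    return int(all(x[i] == 1 for i in included)
               and all(x[i] == 0 for i in negated))


def classify(x: Sequence[int],
             positive: Sequence[Clause],
             negative: Sequence[Clause]) -> int:
    """Positive clauses vote for class 1, negative clauses vote against it."""
    votes = (sum(clause_output(x, c) for c in positive)
             - sum(clause_output(x, c) for c in negative))
    return int(votes >= 0)


# XOR written as readable Boolean sub-patterns.
POSITIVE = [((0,), (1,)), ((1,), (0,))]   # x0 AND NOT x1, x1 AND NOT x0
NEGATIVE = [((0, 1), ()), ((), (0, 1))]   # x0 AND x1, NOT x0 AND NOT x1

if __name__ == "__main__":
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, classify(x, POSITIVE, NEGATIVE))
```

Because every clause is a plain conjunction, the decision for any input can be read off directly from the clauses that fired, which is the source of the interpretability the abstract refers to.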