Purpose: Trans-oral robotic surgery (TORS) using the da Vinci surgical robot is a new minimally-invasive surgery method to treat oropharyngeal tumors, but it is a challenging operation. Augmented reality (AR) based on intra-operative ultrasound (US) has the potential to enhance the visualization of the anatomy and cancerous tumors to provide additional tools for decision-making in surgery. Methods: We propose and carry out preliminary evaluations of a US-guided AR system for TORS, with the transducer placed on the neck for a transcervical view. Firstly, we perform a novel MRI-transcervical 3D US registration study. Secondly, we develop a US-robot calibration method with an optical tracker and an AR system to display the anatomy mesh model in the real-time endoscope images inside the surgeon console. Results: Our AR system reaches a mean projection error of 26.81 and 27.85 pixels for the projection from the US to stereo cameras in a water bath experiment. The average target registration error for MRI to 3D US is 8.90 mm for the 3D US transducer and 5.85 mm for freehand 3D US, and the average distance between the vessel centerlines is 2.32 mm. Conclusion: We demonstrate the first proof-of-concept transcervical US-guided AR system for TORS and the feasibility of trans-cervical 3D US-MRI registration. Our results show that trans-cervical 3D US is a promising technique for TORS image guidance.
translated by 谷歌翻译
Graph convolutional neural networks have shown significant potential in natural and histopathology images. However, their use has only been studied in a single magnification or multi-magnification with late fusion. In order to leverage the multi-magnification information and early fusion with graph convolutional networks, we handle different embedding spaces at each magnification by introducing the Multi-Scale Relational Graph Convolutional Network (MS-RGCN) as a multiple instance learning method. We model histopathology image patches and their relation with neighboring patches and patches at other scales (i.e., magnifications) as a graph. To pass the information between different magnification embedding spaces, we define separate message-passing neural networks based on the node and edge type. We experiment on prostate cancer histopathology images to predict the grade groups based on the extracted features from patches. We also compare our MS-RGCN with multiple state-of-the-art methods with evaluations on both source and held-out datasets. Our method outperforms the state-of-the-art on both datasets and especially on the classification of grade groups 2 and 3, which are significant for clinical decisions for patient management. Through an ablation study, we test and show the value of the pertinent design features of the MS-RGCN.
translated by 谷歌翻译
Deep learning, especially convolutional neural networks, has triggered accelerated advancements in computer vision, bringing changes into our daily practice. Furthermore, the standardized deep learning modules (also known as backbone networks), i.e., ResNet and EfficientNet, have enabled efficient and rapid development of new computer vision solutions. Yet, deep learning methods still suffer from several drawbacks. One of the most concerning problems is the high memory and computational cost, such that dedicated computing units, typically GPUs, have to be used for training and development. Therefore, in this paper, we propose a quantifiable evaluation method, the convolutional kernel redundancy measure, which is based on perceived image differences, for guiding the network structure simplification. When applying our method to the chest X-ray image classification problem with ResNet, our method can maintain the performance of the network and reduce the number of parameters from over $23$ million to approximately $128$ thousand (reducing $99.46\%$ of the parameters).
translated by 谷歌翻译
我们提出了一个基于变压器的NERF(Transnerf),以学习在新视图合成任务的观察视图图像上进行的通用神经辐射场。相比之下,现有的基于MLP的NERF无法直接接收具有任意号码的观察视图,并且需要基于辅助池的操作来融合源视图信息,从而导致源视图与目标渲染视图之间缺少复杂的关系。此外,当前方法分别处理每个3D点,忽略辐射场场景表示的局部一致性。这些局限性可能会在挑战现实世界应用中降低其性能,在这些应用程序中可能存在巨大的差异和新颖的渲染视图之间的巨大差异。为了应对这些挑战,我们的Transnerf利用注意机制自然地将任意数量的源视图的深层关联解码为基于坐标的场景表示。在统一变压器网络中,在射线铸造空间和周围视图空间中考虑了形状和外观的局部一致性。实验表明,与基于图像的最先进的基于图像的神经渲染方法相比,我们在各种场景上接受过培训的Transnf可以在场景 - 敏捷和每个场景的燃烧场景中获得更好的性能。源视图与渲染视图之间的差距很大。
translated by 谷歌翻译
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
translated by 谷歌翻译
This paper proposes a novel observer-based controller for Vertical Take-Off and Landing (VTOL) Unmanned Aerial Vehicle (UAV) designed to directly receive measurements from a Vision-Aided Inertial Navigation System (VA-INS) and produce the required thrust and rotational torque inputs. The VA-INS is composed of a vision unit (monocular or stereo camera) and a typical low-cost 6-axis Inertial Measurement Unit (IMU) equipped with an accelerometer and a gyroscope. A major benefit of this approach is its applicability for environments where the Global Positioning System (GPS) is inaccessible. The proposed VTOL-UAV observer utilizes IMU and feature measurements to accurately estimate attitude (orientation), gyroscope bias, position, and linear velocity. Ability to use VA-INS measurements directly makes the proposed observer design more computationally efficient as it obviates the need for attitude and position reconstruction. Once the motion components are estimated, the observer-based controller is used to control the VTOL-UAV attitude, angular velocity, position, and linear velocity guiding the vehicle along the desired trajectory in six degrees of freedom (6 DoF). The closed-loop estimation and the control errors of the observer-based controller are proven to be exponentially stable starting from almost any initial condition. To achieve global and unique VTOL-UAV representation in 6 DoF, the proposed approach is posed on the Lie Group and the design in unit-quaternion is presented. Although the proposed approach is described in a continuous form, the discrete version is provided and tested. Keywords: Vision-aided inertial navigation system, unmanned aerial vehicle, vertical take-off and landing, stochastic, noise, Robotics, control systems, air mobility, observer-based controller algorithm, landmark measurement, exponential stability.
translated by 谷歌翻译
Recent advances in upper limb prostheses have led to significant improvements in the number of movements provided by the robotic limb. However, the method for controlling multiple degrees of freedom via user-generated signals remains challenging. To address this issue, various machine learning controllers have been developed to better predict movement intent. As these controllers become more intelligent and take on more autonomy in the system, the traditional approach of representing the human-machine interface as a human controlling a tool becomes limiting. One possible approach to improve the understanding of these interfaces is to model them as collaborative, multi-agent systems through the lens of joint action. The field of joint action has been commonly applied to two human partners who are trying to work jointly together to achieve a task, such as singing or moving a table together, by effecting coordinated change in their shared environment. In this work, we compare different prosthesis controllers (proportional electromyography with sequential switching, pattern recognition, and adaptive switching) in terms of how they present the hallmarks of joint action. The results of the comparison lead to a new perspective for understanding how existing myoelectric systems relate to each other, along with recommendations for how to improve these systems by increasing the collaborative communication between each partner.
translated by 谷歌翻译
A "heart attack" or myocardial infarction (MI), occurs when an artery supplying blood to the heart is abruptly occluded. The "gold standard" method for imaging MI is Cardiovascular Magnetic Resonance Imaging (MRI), with intravenously administered gadolinium-based contrast (late gadolinium enhancement). However, no "gold standard" fully automated method for the quantification of MI exists. In this work, we propose an end-to-end fully automatic system (MyI-Net) for the detection and quantification of MI in MRI images. This has the potential to reduce the uncertainty due to the technical variability across labs and inherent problems of the data and labels. Our system consists of four processing stages designed to maintain the flow of information across scales. First, features from raw MRI images are generated using feature extractors built on ResNet and MoblieNet architectures. This is followed by the Atrous Spatial Pyramid Pooling (ASPP) to produce spatial information at different scales to preserve more image context. High-level features from ASPP and initial low-level features are concatenated at the third stage and then passed to the fourth stage where spatial information is recovered via up-sampling to produce final image segmentation output into: i) background, ii) heart muscle, iii) blood and iv) scar areas. New models were compared with state-of-art models and manual quantification. Our models showed favorable performance in global segmentation and scar tissue detection relative to state-of-the-art work, including a four-fold better performance in matching scar pixels to contours produced by clinicians.
translated by 谷歌翻译
Increasing popularity of deep-learning-powered applications raises the issue of vulnerability of neural networks to adversarial attacks. In other words, hardly perceptible changes in input data lead to the output error in neural network hindering their utilization in applications that involve decisions with security risks. A number of previous works have already thoroughly evaluated the most commonly used configuration - Convolutional Neural Networks (CNNs) against different types of adversarial attacks. Moreover, recent works demonstrated transferability of the some adversarial examples across different neural network models. This paper studied robustness of the new emerging models such as SpinalNet-based neural networks and Compact Convolutional Transformers (CCT) on image classification problem of CIFAR-10 dataset. Each architecture was tested against four White-box attacks and three Black-box attacks. Unlike VGG and SpinalNet models, attention-based CCT configuration demonstrated large span between strong robustness and vulnerability to adversarial examples. Eventually, the study of transferability between VGG, VGG-inspired SpinalNet and pretrained CCT 7/3x1 models was conducted. It was shown that despite high effectiveness of the attack on the certain individual model, this does not guarantee the transferability to other models.
translated by 谷歌翻译
Spectrum coexistence is essential for next generation (NextG) systems to share the spectrum with incumbent (primary) users and meet the growing demand for bandwidth. One example is the 3.5 GHz Citizens Broadband Radio Service (CBRS) band, where the 5G and beyond communication systems need to sense the spectrum and then access the channel in an opportunistic manner when the incumbent user (e.g., radar) is not transmitting. To that end, a high-fidelity classifier based on a deep neural network is needed for low misdetection (to protect incumbent users) and low false alarm (to achieve high throughput for NextG). In a dynamic wireless environment, the classifier can only be used for a limited period of time, i.e., coherence time. A portion of this period is used for learning to collect sensing results and train a classifier, and the rest is used for transmissions. In spectrum sharing systems, there is a well-known tradeoff between the sensing time and the transmission time. While increasing the sensing time can increase the spectrum sensing accuracy, there is less time left for data transmissions. In this paper, we present a generative adversarial network (GAN) approach to generate synthetic sensing results to augment the training data for the deep learning classifier so that the sensing time can be reduced (and thus the transmission time can be increased) while keeping high accuracy of the classifier. We consider both additive white Gaussian noise (AWGN) and Rayleigh channels, and show that this GAN-based approach can significantly improve both the protection of the high-priority user and the throughput of the NextG user (more in Rayleigh channels than AWGN channels).
translated by 谷歌翻译