SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning

Baihan Lin
Columbia University
New York, NY, USA
baihan.lin@columbia.edu

Abstract

We propose a recommendation system that suggests treatment strategies to a therapist in real time during a psychotherapy session. Our system uses a turn-level rating mechanism that predicts the therapeutic outcome by computing a similarity score between the deep embedding of a scoring inventory and that of the sentence the patient is currently speaking. The system automatically transcribes a continuous audio stream and separates it into patient turns and therapist turns using an online registration-free diarization method. The dialogue pairs, along with their computed ratings, are then fed into a deep reinforcement learning recommender where the sessions are treated as users and the topics are treated as items. In addition to evaluating the empirical advantages of the core components on existing datasets, we demonstrate the effectiveness of this system in a web app.

1 Introduction

Mental illness is not only a severe healthcare problem in the US (an estimated 1 in 5, according to the National Institute of Mental Health) but also a major global issue Patel et al. (2018). However, most countries, including the United States, face a severe shortage of mental health practitioners, such as psychiatrists and clinical psychologists Satiani et al. (2018). In the past two years, this demand gap has been amplified by the toll of the COVID-19 pandemic on everyone’s mental health Wang et al. (2020b). Current education systems and training programs cannot keep up with this trend because each licensed therapist requires years of continual learning and supervised training. Even when therapists are ready for independent practice, many still seek weekly supervision from "supervisors", who are usually more senior therapists with many more years of experience with patients and who help address "a crucial triad of learning difficulties that tend to confront beginning therapists in their training" Watkins Jr (2013). These supervisors provide necessary guidance and periodic feedback to junior therapists with respect to their development of mindedness, their psychotherapist identities, and the treatment roadblocks they face in their own cases.

Figure 1: Analytical Framework of SupervisorBot System. (Left) Major components of the recommendation systems. (Right) Reinforcement learning framework of the psychotherapy recommendation system problem.

Figure 2: Example inventory items and scales

Figure 3: Empirical results on MiniVox and AlexStreet datasets

Figure 4: Evaluation of deep reinforcement learning recommenders. (Left three) loss functions for the sub-networks for DDPG, TD3, and BCQ. (Right) Pearson’s r of the actual actions taken in the test set with their predicted actions.

Figure 5: State screenshots of SupervisorBot web app: inventory inputs, diarization training, state annotation, strategy recommendation.

In this work, we present SupervisorBot, a virtual AI companion that provides real-time feedback and recommends treatment strategies to therapists while they conduct their own psychotherapy sessions. Like a supervisor, SupervisorBot offers feedback and guidance that are case-dependent. Like a supervisor, SupervisorBot has seen thousands of historical therapy sessions and case studies.

Our recommendation system is built on a rating system that evaluates how good a treatment strategy is. As the mental state of a patient can be complicated to characterize, we gravitate towards well-defined clinical outcomes. The working alliance is one such psychological concept, shown to be highly predictive of the success of psychotherapy in clinical settings Wampold (2015). It describes several important cognitive and emotional components of the relationship between the two agents in conversation, including the agreement on the goals to be achieved and the tasks to be carried out, and the bond, trust and respect to be established over the course of the dialogue Bordin (1979). It measures the tendency of communication partners to align with each other in both their verbal and non-verbal behaviors. We developed a natural language processing (NLP) approach to infer this quantity in real time and use it as the rating.

Our recommendation system transcribes the session in real time, predicts the therapeutic outcome as a turn-level rating, and recommends the treatment strategy best suited to the current context and state of the psychotherapy. It is web-based, interactive, informative, and learns continually. We believe this system is a first step towards addressing the global issue of mental health by augmenting the treatment and education of clinical practitioners with a recommendation system for therapeutic strategies.

2 Methods

Fig 1 outlines the analytical framework. A continuous audio stream is fed into the system. First, we perform speaker diarization, training the system for a few rounds through interaction with sparse feedback from the user.

Online speaker diarization. In research settings, existing diarization methods often start with a registration stage that trains deep learning models on a pool of speaker profiles. In real-world settings, on the other hand, it is impractical for a deployed system to register voiceprints for all users before an arbitrary multi-user Zoom meeting. A lightweight, deployable speaker diarization system should instead register a user only when necessary rather than rely on pre-registering all users. To adapt efficiently to complex environments, such a system should also run without large-scale data access and be able to cold-start new user profiles using prior knowledge. This requires the model to learn how to label user profiles from scratch, with the user interacting with the system and providing feedback to correct it on the fly. To meet these criteria for our real-time system, we follow Lin and Zhang (2021, 2020) and use BerlinUCB Lin (2020), an online semi-supervised learning bandit algorithm, to perform diarization. It separates the audio into doctor-patient dyads, which are then transcribed into natural language turns for real-time downstream analyses.
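The feedback-driven labeling described above can be illustrated with a plain LinUCB contextual bandit. This is a minimal sketch under stated assumptions, not BerlinUCB itself: it omits the semi-supervised steps of Lin (2020), and the class name LinUCBDiarizer, the three-dimensional toy features, the feedback schedule and the value of alpha are all hypothetical. Each arm is a speaker, the context is a feature vector for the current audio window, and a reward arrives only when the user confirms or corrects a label.

```python
import numpy as np

class LinUCBDiarizer:
    """Disjoint LinUCB with one arm per speaker (illustrative, not BerlinUCB)."""

    def __init__(self, n_speakers, dim, alpha=1.0):
        self.alpha = alpha
        # Per-speaker ridge-regression statistics: A = I + sum(x x^T), b = sum(r x).
        self.A = [np.eye(dim) for _ in range(n_speakers)]
        self.b = [np.zeros(dim) for _ in range(n_speakers)]

    def predict(self, x):
        """Return the speaker with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, x, speaker, reward):
        """Incorporate user feedback for one (window, speaker) pair."""
        self.A[speaker] += np.outer(x, x)
        self.b[speaker] += reward * x

# Toy stream: two speakers with distinct (made-up) feature centers.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
model = LinUCBDiarizer(n_speakers=2, dim=3, alpha=0.3)
for t in range(200):
    true_speaker = t % 2
    x = centers[true_speaker] + 0.05 * rng.normal(size=3)
    guess = model.predict(x)
    if t % 3 == 0:  # sparse feedback: the user reacts to every third window
        model.update(x, guess, float(guess == true_speaker))
        if guess != true_speaker:
            # The user's correction reveals the right speaker.
            model.update(x, true_speaker, 1.0)
```

No speaker is registered in advance here: both arms start from the same prior and are shaped only by the occasional corrections, which is the property the paragraph above asks of a deployable system.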

Therapeutic quality ratings. Once we obtain a reasonably good diarization result, we can configure the quality assessment by specifying an appropriate inventory. In this demonstration, we use the Working Alliance Inventory (WAI), a self-report questionnaire that quantifies the therapeutic bond, task agreement, and goal agreement Horvath (1981); Tracey and Kokotovic (1989); Martin et al. (2000). Operationally, our goal is to derive three alliance scales from its 36 items: the task scale, the bond scale and the goal scale. They measure the three major themes of psychotherapy outcomes: (1) the collaborative nature of the dialogue participants’ relationship; (2) the affective bond between them; and (3) their capability to agree on treatment-related short-term tasks and long-term goals. The score for each of the three scales comes from a key table (Fig 2), which specifies the sign weight (positive or negative) applied to each questionnaire answer when summing. The key table thus acts as a weighting matrix that specifies the directionality of each item within its scale. The full scale is simply the sum of the scores of the three scales.
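The signed summation described above can be sketched as follows. This is a minimal illustration, assuming a key table stored as a mapping from item number to a (scale, sign) pair; the four rows shown are made up for the example and are not the actual WAI key, which covers all 36 items.

```python
from collections import defaultdict

# Hypothetical excerpt of a key table: item id -> (scale, sign weight).
KEY_TABLE = {
    1: ("task", +1),
    2: ("task", -1),
    3: ("bond", +1),
    4: ("goal", -1),
}

def score_scales(answers, key_table=KEY_TABLE):
    """Sum signed item answers into per-scale scores plus the full scale."""
    scores = defaultdict(float)
    for item, answer in answers.items():
        scale, sign = key_table[item]
        scores[scale] += sign * answer
    result = dict(scores)
    # The full scale is the sum of the three scale scores.
    result["full"] = sum(result.values())
    return result

example = score_scales({1: 6, 2: 2, 3: 5, 4: 3})
```

Here the task scale is 6 - 2 = 4, the bond scale is 5, the goal scale is -3, and the full scale is 6.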

Transcription and real-time rating assessment. We are now ready for real-time quality annotation. Given the audio stream for a given user, we first transcribe the diarized audio stream with a standard automatic speech recognition module Adorf (2013). Following the approach proposed in Lin et al. (2022b); Lin (2022b, f), we embed both the dialogue turns and the inventory items with deep sentence or paragraph embeddings (in this case, SentenceBERT Reimers and Gurevych (2019)), and then compute the cosine similarity between the embedding vector of each turn and the embedding vectors of the inventory items. As a result, for each turn (by either the patient or the therapist), we obtain a 36-dimensional working alliance score, which we may save in a relational database as in Lin (2022c).
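The similarity step can be sketched in a few lines. This is an illustration only: the random vectors below are stand-ins for sentence embeddings (in practice they would come from a SentenceBERT encoder), and the embedding dimension of 384 and the function name cosine_scores are assumptions made for the example.

```python
import numpy as np

def cosine_scores(turn_vec, inventory_vecs):
    """Cosine similarity between one turn embedding and each inventory item embedding."""
    turn = turn_vec / np.linalg.norm(turn_vec)
    items = inventory_vecs / np.linalg.norm(inventory_vecs, axis=1, keepdims=True)
    return items @ turn

# Stand-in embeddings: 36 inventory items and one dialogue turn.
rng = np.random.default_rng(0)
inventory_vecs = rng.normal(size=(36, 384))
# Make the turn a scaled copy of item 4, so its similarity to that item is 1.
turn_vec = 2.0 * inventory_vecs[4]
scores = cosine_scores(turn_vec, inventory_vecs)  # one score per inventory item
```

Each turn thus yields one score per inventory item, which matches the 36-dimensional score described above. Because cosine similarity ignores vector length, the scaled copy of item 4 still scores 1 against it.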

Topic modeling as recommendation items. First, we define the “items”, “users”, “contents” and “ratings” in our recommendation system. Here, the “items” the system recommends are treatment strategies. In this example, we represent each strategy as a topic that the therapist should initiate or continue in the next turn. Given a large text corpus of many psychotherapy sessions, as in Lin et al. (2022a), we can first perform topic modeling (e.g. Miao et al. (2016, 2017); Nan et al. (2019); Dieng et al. (2020); Wang et al. (2020a)) to extract the main concepts discussed in the psychotherapy. We use the Embedded Topic Model (ETM) Dieng et al. (2020) in this work because it was shown to produce the most diverse concepts in psychological corpora Lin et al. (2022a). One can also adopt a symbolic approach to topic modeling to gain further insights into the causalities and relationships between these topical concepts, as in Lin (2022d). In this study, we annotate each turn with its most likely topic and identify seven unique topics: Topic 0 is about figuring out, self-discovery and reminiscence; Topic 1 is about play; Topic 2 is about anger, scare and sadness; Topic 3 is about counts; Topic 6 is about explicit ways to deal with stress, such as keeping busy and reaching out for help; Topic 7 is about numbers; and Topic 8 is about continuation and keeping going.

Recommendation system setting. Next, we pair these “items” with the “users” and “contents”, which in our case would be the patient ID, his or her previous turns, their aggregated formats and other metadata. For instance, each session contains many pairs of turns, all of which belong to the same “user”. Alternatively, one can treat all turns that belong to one clinical label, or all turns related to a certain topic, as one “user”. In this example, we choose the session IDs as users. Lastly, the “ratings” are the patient’s inferred alliance scores, which are predictive of the therapeutic outcomes. Having created this database from historical data, we can train our system. With users, items, contents and ratings defined, the recommendation engine can be readily built with content-based Pazzani and Billsus (2007); Basu et al. (1998); Aggarwal and others (2016) and collaborative filtering Sarwar et al. (2001); He et al. (2017); Koren et al. (2022); Su and Khoshgoftaar (2009) methods. Since our session turns are sequential and can specify a state or timestamp, the problem is also suited to reinforcement learning (RL) Zheng et al. (2018); Wang et al. (2014); Zou et al. (2020) and session-based approaches Li et al. (2017); Wu et al. (2019); Ludewig and Jannach (2018), which can be neuroscience- or psychiatry-inspired Lin et al. (2019, 2020a, 2021, 2020b) to provide further interpretable clinical insights.
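The mapping from annotated turn pairs to recommendation records can be sketched as below. The record layout, the function name and the dictionary keys (patient_text, therapist_topic, patient_alliance) are hypothetical choices made for illustration; the released code may organize the data differently.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    user: str      # session ID, treated as the "user"
    item: int      # topic ID of the therapist's turn, treated as the "item"
    rating: float  # inferred alliance score of the patient's turn
    content: str   # patient turn text, kept as "content"

def build_interactions(session_id, turn_pairs):
    """Convert one session's annotated turn pairs into recommendation records."""
    return [
        Interaction(
            user=session_id,
            item=pair["therapist_topic"],
            rating=pair["patient_alliance"],
            content=pair["patient_text"],
        )
        for pair in turn_pairs
    ]

# Made-up example data for one session.
pairs = [
    {"patient_text": "I feel stuck.", "therapist_topic": 2, "patient_alliance": 0.41},
    {"patient_text": "I called my sister.", "therapist_topic": 6, "patient_alliance": 0.58},
]
rows = build_interactions("session_001", pairs)
```

Grouping by clinical label or by topic instead of by session would only change what is passed as the first argument.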

Deep reinforcement learning recommendation approaches. Reinforcement learning approaches have been applied effectively in language and speech tasks (as reviewed in Lin (2022e)), among which recommendation is an important use case. As shown in the right panel of Figure 1, the reinforcement learning environment is formulated such that the recommendation agent takes an action by recommending a strategy (say, a discussion topic). The therapist then interacts with the patient, taking that suggestion into account. The dialogue interaction, in turn, receives a quality evaluation of some sort (say, the therapeutic working alliance score). This serves as a reward that the recommendation agent uses to update its weights. Meanwhile, the state progresses to the next therapeutic state. As a first step, we evaluate three popular deep RL algorithms. Based on the deterministic policy gradient in an actor-critic architecture, Deep Deterministic Policy Gradient (DDPG) Lillicrap et al. (2015) is a model-free algorithm for continuous action spaces, and one of the first successful algorithms to learn policies end-to-end. Building upon Double Q-Learning Hasselt (2010), Twin Delayed DDPG (TD3) Fujimoto et al. (2018) is a related solution proposed to correct the value overestimation issue, and it yields more competitive results in various game settings. As online data collection for RL models is usually time-consuming, in real-world industrial settings these models are usually trained on previously collected data. As a result, offline reinforcement learning methods are growing in popularity Levine et al. (2020). Among them, Batch-Constrained Q-Learning (BCQ) Fujimoto et al. (2019) is the first continuous control deep RL algorithm that yields competitive results in off-policy evaluations by restricting the agent’s exploration in the action space.
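To make the overestimation correction concrete, the following sketch computes the TD3 critic target, which takes the minimum of two target critics and adds clipped noise to the target action. It is a toy illustration in NumPy: the actor and critic functions are made-up closed forms standing in for neural networks, and the hyperparameter defaults are common choices rather than the settings used in this work. Dropping the noise and using a single critic gives the DDPG target.

```python
import numpy as np

def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0, rng=None):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2')."""
    if rng is None:
        rng = np.random.default_rng()
    action = actor_target(next_state)
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = np.clip(rng.normal(0.0, noise_std, size=action.shape), -noise_clip, noise_clip)
    next_action = np.clip(action + noise, -max_action, max_action)
    # Taking the minimum of the two critics counteracts value overestimation.
    q_min = np.minimum(q1_target(next_state, next_action),
                       q2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_min

# Toy stand-ins for the target networks (hypothetical closed forms).
actor = lambda s: np.tanh(s.sum(axis=1, keepdims=True))
q1 = lambda s, a: s.sum(axis=1) + a[:, 0]
q2 = lambda s, a: s.sum(axis=1) + a[:, 0] - 1.0

states = np.array([[0.0, 0.0], [1.0, 1.0]])
y = td3_target(reward=np.array([1.0, 1.0]), next_state=states,
               done=np.array([0.0, 1.0]), actor_target=actor,
               q1_target=q1, q2_target=q2, noise_std=0.0)
```

In the recommendation setting above, the reward would be the alliance score of the resulting dialogue turn, and the state and action would be encodings of the session context and the recommended strategy.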

Ethical considerations. Following the ethical guidelines in Matthews et al. (2017); Graham et al. (2019) and the operational suggestions in Lin (2022a), we ensure that all examples we train and evaluate on are properly anonymized with pre- and post-processing techniques. We also emphasize that these investigations are a proof of concept and require extensive caution to avoid the pitfall of over-interpretation.

3 Operational validation of components

We validated the performance of the speaker diarization component on the MiniVox benchmark Lin and Zhang (2021) and report a robust cumulative diarization accuracy at each time step (Fig 3A).

For the rating computation, since there is no ground truth, we analyze the Alex Street Psychotherapy dataset, which consists of transcribed recordings of over 950 therapy sessions between multiple anonymized therapists and patients. We observe that our alliance scores significantly predict suicidality in the patients (Fig 3B) and produce interesting and interpretable trajectories during the therapy sessions for different psychiatric conditions in both the alliance space (Fig 3B) and the topic space (Fig 3C). We also notice that the treatment strategies adopted by experienced therapists differ when facing patients with different disorders.

To evaluate the recommendation systems, we preprocess the Alex Street dataset into a recommendation format (219,999 recommendation actions) and then split it 95/5 into train and test sets. To set up batch training for reinforcement learning, we cut the turns into frames of 10 turn pairs and use a batch size of 32. We train each of the three agents for 100 epochs, over which their losses drop consistently and converge in a stable way (Figure 4, left panels). To compare them, we compute the Pearson’s r between the recommended actions and their corresponding ground truth actions in the test set. We observe that BCQ is the best-performing model with a correlation of 0.2843, followed by DDPG (0.2712) and TD3 (0.2192). BCQ’s slight advantage might be due to the extrapolation error that methods not designed for offline training incur when learning from a fixed dataset. This evaluation provides a proof of concept. Future work will focus on systematically comparing a broader spectrum of deep reinforcement learning algorithms and model architectures.
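The evaluation protocol can be sketched as follows. This is an illustration under assumptions the text does not settle: frames are taken as non-overlapping, the split is a random shuffle, and the function names are hypothetical.

```python
import numpy as np

def make_frames(turn_pairs, frame_len=10):
    """Cut a session's turn pairs into non-overlapping frames (assumed layout)."""
    n_frames = len(turn_pairs) // frame_len
    return [turn_pairs[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def train_test_split(items, test_frac=0.05, seed=0):
    """Random split; a 95/5 split corresponds to test_frac=0.05."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    n_test = max(1, int(round(test_frac * len(items))))
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test

def pearson_r(predicted, actual):
    """Pearson correlation between predicted and ground truth actions."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return float(np.corrcoef(predicted, actual)[0, 1])

frames = make_frames(list(range(25)))            # 25 turn pairs -> 2 full frames
train, test = train_test_split(list(range(100)))
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # perfectly correlated toy actions
```

Trailing turn pairs that do not fill a frame are dropped in this sketch; padding them would be an equally reasonable choice.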

4 Interactive System: SupervisorBot

“SupervisorBot” is an interactive web-based system (Fig 5). We first give users instructions on how to use the system. They are then led to input the inventory they wish to use to evaluate the dialogue quality. In this case, we provide the working alliance inventory as a default. They are guided to input the score scale corresponding to each inventory item and click on “Analyze” to finalize. In the speaker diarization part, we compute and visualize the Mel Frequency Cepstral Coefficients (MFCC) in a sliding window fashion given the real-time audio input from the microphone, with the MFCC bands color-coded on the page. Once these two preparation steps are finished, the system is running, and the therapist can sit back and go on with the session. The app then moves to the annotation panel, where the therapist can see the transcript along with who is speaking. The computed alliance scores on the three scales are also displayed dynamically in real time according to the content of each dialogue turn. This information helps the therapist monitor the session. The last panel provides the recommendation guidance. The candidate topics are ranked and the top N are displayed. The therapist can use a top recommendation as a hint when initiating their response. The system transcribes the response, highlights the topic the therapist most likely ended up choosing in the last round, and saves that information as part of the historical data. The system refreshes its parameters at the end of each session to fit the new data.

5 Conclusions

In this work, we provide a practical example of how a real-time recommendation system can help therapists better treat their patients in psychotherapy sessions with informative clinical annotations and recommendations of treatment strategies with deep reinforcement learning. Although in this example, the strategies are the topics for the therapist to initiate or continue, the same approach can be extended to more complex and nuanced treatment suggestions. For instance, in the ABC approach of cognitive behavioral therapy (CBT), our system can suggest a belief (B) to guide the patients to better understand the causality between the activating event (A) and its consequence (C).

Before we conclude, we note another interesting perspective on this line of research that is implicit in Figure 1: while the recommendation agent is driven by reinforcement learning, the therapist (and even the patient) has agency of their own, which also updates under reinforcement learning principles. For instance, the patient can directly offer feedback to the therapist. Given that feedback, the therapist may adjust his or her internal model of how much weight to give the suggestions made by the recommendation agent. Next steps include modeling these theories of mind and confidence levels in this multi-participant human-computer interaction setting.

6 Resources

Website: https://www.baihan.nyc/viz/SupervisorBot

Video: https://youtu.be/X3bFx5bF95s

Code: https://github.com/doerlbh/PsychiatryNLP

References

J. Adorf (2013) Web speech api. KTH Royal Institute of Technology, pp. 1–11. Cited by: §2.
C. C. Aggarwal et al. (2016) Recommender systems. Vol. 1, Springer. Cited by: §2.
C. Basu, H. Hirsh, W. Cohen, et al. (1998) Recommendation as classification: using social and content-based information in recommendation. In Aaai/iaai, pp. 714–720. Cited by: §2.
E. S. Bordin (1979) The generalizability of the psychoanalytic concept of the working alliance. Psychotherapy: Theory, research & practice 16 (3), pp. 252. Cited by: §1.
A. B. Dieng, F. J. Ruiz, and D. M. Blei (2020) Topic modeling in embedding spaces. Transactions of the Association for Computational Linguistics 8, pp. 439–453. Cited by: §2.
S. Fujimoto, H. Hoof, and D. Meger (2018) Addressing function approximation error in actor-critic methods. In International conference on machine learning, pp. 1587–1596. Cited by: §2.
S. Fujimoto, D. Meger, and D. Precup (2019) Off-policy deep reinforcement learning without exploration. In International conference on machine learning, pp. 2052–2062. Cited by: §2.
S. Graham, C. Depp, E. E. Lee, C. Nebeker, X. Tu, H. Kim, and D. V. Jeste (2019) Artificial intelligence for mental health and mental illnesses: an overview. Current psychiatry reports 21 (11), pp. 1–18. Cited by: §2.
H. Hasselt (2010) Double q-learning. Advances in neural information processing systems 23. Cited by: §2.
X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web, pp. 173–182. Cited by: §2.
A. O. Horvath (1981) An exploratory study of the working alliance: its measurement and relationship to therapy outcome. Cited by: §2.
Y. Koren, S. Rendle, and R. Bell (2022) Advances in collaborative filtering. Recommender systems handbook, pp. 91–142. Cited by: §2.
S. Levine, A. Kumar, G. Tucker, and J. Fu (2020) Offline reinforcement learning: tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643. Cited by: §2.
J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma (2017) Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1419–1428. Cited by: §2.
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. Cited by: §2.
B. Lin, D. Bouneffouf, G. Cecchi, and R. Tejwani (2022a) Neural topic modeling of psychotherapy sessions. arXiv preprint arXiv:2204.10189. Cited by: §2.
B. Lin, D. Bouneffouf, and G. Cecchi (2019) Split Q Learning: Reinforcement Learning with Two-Stream Rewards. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 6448–6449. External Links: Document, Link Cited by: §2.
B. Lin, G. Cecchi, D. Bouneffouf, J. Reinen, and I. Rish (2020a) A story of two streams: reinforcement learning models from human behavior and neuropsychiatry. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pp. 744–752. Cited by: §2.
B. Lin, G. Cecchi, D. Bouneffouf, J. Reinen, and I. Rish (2020b) Unified models of human behavioral agents in bandits, contextual bandits and rl. arXiv preprint arXiv:2005.04544. Cited by: §2.
B. Lin, G. Cecchi, D. Bouneffouf, J. Reinen, and I. Rish (2021) Models of human behavioral agents in bandits, contextual bandits and rl. In International Workshop on Human Brain and Artificial Intelligence, pp. 14–33. Cited by: §2.
B. Lin, G. Cecchi, and D. Bouneffouf (2022b) Deep annotation of therapeutic working alliance in psychotherapy. arXiv preprint arXiv:2204.05522. Cited by: §2.
B. Lin and X. Zhang (2020) VoiceID on the fly: a speaker recognition system that learns from scratch. In INTERSPEECH, Cited by: §2.
B. Lin and X. Zhang (2021) Speaker diarization as a fully online bandit learning problem in minivox. In ACML, pp. 1660–1674. Cited by: §2, §3.
B. Lin (2020) Online semi-supervised learning in contextual bandits with episodic reward. In AJCAI, pp. 407–419. Cited by: §2.
B. Lin (2022a) Computational inference in cognitive science: operational, societal and ethical considerations. arXiv preprint. Cited by: §2.
B. Lin (2022b) Dynamic inference of personality-specific patient-doctor interactions in support of psychotherapy in clinical setting. arXiv preprint. Cited by: §2.
B. Lin (2022c) Knowledge management system with nlp-assisted annotations: a brief survey and outlook. arXiv preprint arXiv:2206.07304. Cited by: §2.
B. Lin (2022d) Neuro-symbolic topic modeling. arXiv preprint. Cited by: §2.
B. Lin (2022e) Reinforcement learning and bandits for language and speech: tutorial, review and perspectives. arXiv preprint. Cited by: §2.
B. Lin (2022f) Voice2Alliance: automatic speaker diarization and quality assurance of conversational alignment. In INTERSPEECH, Cited by: §2.
M. Ludewig and D. Jannach (2018) Evaluation of session-based recommendation algorithms. User Modeling and User-Adapted Interaction 28 (4), pp. 331–390. Cited by: §2.
D. J. Martin, J. P. Garske, and M. K. Davis (2000) Relation of the therapeutic alliance with outcome and other variables: a meta-analytic review. Journal of consulting and clinical psychology 68 (3), pp. 438. Cited by: §2.
T. Matthews, K. O’Leary, A. Turner, M. Sleeper, J. P. Woelfer, M. Shelton, C. Manthorne, E. F. Churchill, and S. Consolvo (2017) Stories from survivors: privacy & security practices when coping with intimate partner abuse. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 2189–2201. Cited by: §2.
Y. Miao, E. Grefenstette, and P. Blunsom (2017) Discovering discrete latent topics with neural variational inference. In International Conference on Machine Learning, pp. 2410–2419. Cited by: §2.
Y. Miao, L. Yu, and P. Blunsom (2016) Neural variational inference for text processing. In International conference on machine learning, pp. 1727–1736. Cited by: §2.
F. Nan, R. Ding, R. Nallapati, and B. Xiang (2019) Topic modeling with wasserstein autoencoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6345–6381. Cited by: §2.
V. Patel, S. Saxena, C. Lund, G. Thornicroft, F. Baingana, P. Bolton, D. Chisholm, P. Y. Collins, J. L. Cooper, J. Eaton, et al. (2018) The lancet commission on global mental health and sustainable development. The lancet 392 (10157), pp. 1553–1598. Cited by: §1.
M. J. Pazzani and D. Billsus (2007) Content-based recommendation systems. In The adaptive web, pp. 325–341. Cited by: §2.
N. Reimers and I. Gurevych (2019) Sentence-bert: sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084. Cited by: §2.
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl (2001) Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pp. 285–295. Cited by: §2.
A. Satiani, J. Niedermier, B. Satiani, and D. P. Svendsen (2018) Projected workforce of psychiatrists in the united states: a population analysis. Psychiatric Services 69 (6), pp. 710–713. Cited by: §1.
X. Su and T. M. Khoshgoftaar (2009) A survey of collaborative filtering techniques. Advances in artificial intelligence 2009. Cited by: §2.
T. J. Tracey and A. M. Kokotovic (1989) Factor structure of the working alliance inventory. Psychological Assessment: A journal of consulting and clinical psychology 1 (3), pp. 207. Cited by: §2.
B. E. Wampold (2015) How important are the common factors in psychotherapy? an update. World Psych 14, pp. 270–277. Cited by: §1.
R. Wang, X. Hu, D. Zhou, Y. He, Y. Xiong, C. Ye, and H. Xu (2020a) Neural topic modeling with bidirectional adversarial training. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 340–350. Cited by: §2.
X. Wang, S. Hegde, C. Son, B. Keller, A. Smith, F. Sasangohar, et al. (2020b) Investigating mental health of us college students during the covid-19 pandemic: cross-sectional survey study. Journal of medical Internet research 22 (9), pp. e22817. Cited by: §1.
X. Wang, Y. Wang, D. Hsu, and Y. Wang (2014) Exploration in interactive personalized music recommendation: a reinforcement learning approach. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 11 (1), pp. 1–22. Cited by: §2.
C. E. Watkins Jr (2013) Being and becoming a psychotherapy supervisor: the crucial triad of learning difficulties. American Journal of Psychotherapy 67 (2), pp. 134–150. Cited by: §1.
S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan (2019) Session-based recommendation with graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33, pp. 346–353. Cited by: §2.
G. Zheng, F. Zhang, Z. Zheng, Y. Xiang, N. J. Yuan, X. Xie, and Z. Li (2018) DRN: a deep reinforcement learning framework for news recommendation. In Proceedings of the 2018 world wide web conference, pp. 167–176. Cited by: §2.
L. Zou, L. Xia, P. Du, Z. Zhang, T. Bai, W. Liu, J. Nie, and D. Yin (2020) Pseudo dyna-q: a reinforcement learning framework for interactive recommendation. In Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 816–824. Cited by: §2.