Computerized Speech Analysis – An Objective Marker for Psychiatry?
Sir William Osler may have been predicting the future when he said: “Listen to your patient, he is telling you the diagnosis.” Analyzing what our patients say, and how they say it, may become the next game-changer in mental health diagnosis and treatment.
WHY ANALYZE SPEECH?
The features of our speech, such as how many pauses we take, how we articulate our words, and the quality of our voice can reveal important clues about our emotional and psychological state. Speech and language involve interactions between many areas of the brain along with a complex series of motor movements across a group of muscles.
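To make a feature such as "how many pauses we take" concrete, here is a minimal sketch of one way a pause count might be derived from raw audio samples. It is an illustration only, not the method of any product or study cited here. The function names, the 20 ms frame size, the energy threshold, and the 200 ms minimum pause length are all assumptions chosen for the example.

```python
import math

def frame_rms(samples, frame_len):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def count_pauses(samples, sample_rate, frame_ms=20, threshold=0.02, min_pause_ms=200):
    """Count runs of low-energy frames lasting at least min_pause_ms.

    All parameter defaults are illustrative assumptions, not clinical values.
    Leading and trailing silence is counted as a pause as well.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("sample_rate too low for the chosen frame size")
    min_frames = math.ceil(min_pause_ms / frame_ms)

    pauses, run = 0, 0
    for energy in frame_rms(samples, frame_len):
        if energy < threshold:
            run += 1          # still inside a quiet stretch
        else:
            if run >= min_frames:
                pauses += 1   # quiet stretch was long enough to count
            run = 0
    if run >= min_frames:     # quiet stretch that runs to the end
        pauses += 1
    return pauses

def tone(n, sample_rate=1000, freq=100, amplitude=0.5):
    """Synthetic stand-in for voiced speech: a plain sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

# Speech, a 300 ms gap, speech, a 100 ms gap (too short to count), speech.
signal = tone(500) + [0.0] * 300 + tone(500) + [0.0] * 100 + tone(500)
print(count_pauses(signal, sample_rate=1000))
```

Real recordings would need noise handling and a calibrated threshold. The sketch only shows that a clinically familiar observation can be reduced to a number.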
Even minor changes in the body’s physiologic or mental state can result in obvious changes to the acoustic characteristics of speech. Several commercial companies are already attempting to utilize advances in computerized speech analysis to help identify mental health conditions. Changes in speech have been described as being defining characteristics for many mental disorders, including schizophrenia, depression, autism and bipolar disorder.
Psychiatry residents and medical students have long been taught to note the quieter or slowed speech of depressed patients, and the rapid pressured speech of bipolar patients experiencing a “manic” episode. Clinical experience rapidly highlights the deficits in social communication and disorganized speech that are hallmarks of disorders such as autism and schizophrenia respectively.
Some researchers believe that the burgeoning understanding of speech and language through neuroscience and computer science will make these areas an incredibly important part of the future classification of mental illness. They have called for the National Institute of Mental Health’s Research Domain Criteria (RDoC) Initiative, a program designed to support the development of new ways of classifying psychopathology based on dimensions of observable behavior, to adopt a less reductionistic construct of language in the RDoC matrix [1].
SPEECH ANALYTICS - ALREADY IN COMMERCIAL USE
Can technologies that automatically analyze speech offer psychiatrists a new way of diagnosing and monitoring patients? The idea of automated speech analysis for mental health may sound farfetched to some, but outside of medicine, commercial call centers are rapidly scaling their speech analytics programs. For them, it is about selling, supporting and improving their product offering to customers.
Speech analytics allows companies to accurately understand their customers’ concerns, assess their emotional state and stress levels, and glean insights into how their staff are interacting with their clients [2]. Companies then use this information to match customers with the most appropriate call agents based on personality, provide customized coaching to their staff, and identify ways of improving customer satisfaction. These approaches ultimately pave the way for predictive assessment of customer behavior. So what will it take to move this technology from the call center to clinics, and ultimately to consumers’ own homes?
SPEECH ANALYTICS CAN BRING OBJECTIVITY TO PSYCHIATRIC ASSESSMENTS
Assessment of the quality of speech and the content of what is said is a fundamental part of the psychiatric interview. Psychiatrists and other mental health professionals undertake such assessments in a subjective manner, usually as part of a semi-structured clinical interview. Assessment of speech in mental health continues to rely on face-to-face interactions between patients and clinicians.
It would be incredibly useful to have an objective quantitative marker of a patient’s clinical condition, especially one that could be remotely and continuously monitored. Despite considerable investment in neuroscientific research, such biologic markers have been elusive. We continue to have no biochemical or neuroimaging tests in routine clinical psychiatric use.
Automated speech analysis has the potential to have a dramatic effect on clinical psychiatry. Almost three out of four people in the US already own a smartphone [3], a device that can collect high-quality audio data in an unintrusive manner. Smartphones are powerful technological devices, with a plethora of sensors that can also track movement and usage patterns across a range of applications.
Being able to perform speech analysis on smartphone-collected data may allow for a rapidly scalable and accessible approach to assessing mental health disorders such as depression and bipolar disorder. Speech analysis could be an inexpensive, unintrusive, non-invasive, and objective way of assessing a patient’s mental state. Furthermore, it may offer us a chance to remotely and continuously monitor those patients who are at greatest risk, dramatically enhancing our ability to intervene at the earliest signs of deterioration, as opposed to the next clinic appointment.
EMERGING RESEARCH
A growing number of research reports have examined the use of speech analysis in mental health conditions. One study analyzed the transcripts of interviews conducted with 34 youths (aged 14-27) at high risk of psychosis [4]. By identifying specific speech features, including semantic coherence and two syntactic markers of speech complexity, the authors were able to develop a predictive system that identified those individuals who would later develop psychosis with 100% accuracy, outperforming the classification from traditional clinical interviews also performed in the study. While this research requires replication with a much larger sample, being able to identify individuals who will likely progress to a psychotic disorder would allow for early intervention, and perhaps achieve one of our Holy Grails, the prevention of mental illness.
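As a rough illustration of what a semantic coherence feature can look like, the sketch below scores how similar each sentence of a transcript is to the one that follows it. It is not the pipeline used in the study [4]. As a deliberately simple stand-in for learned word representations, it uses plain word-count vectors and cosine similarity, and the function names and the whitespace tokenization are assumptions made for the example.

```python
import math
from collections import Counter

def bag_of_words(sentence):
    """Naive tokenization: lowercase and split on whitespace (an assumption)."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def coherence_scores(sentences):
    """Similarity of each sentence to the next one in the transcript."""
    vectors = [bag_of_words(s) for s in sentences]
    return [cosine(u, v) for u, v in zip(vectors, vectors[1:])]

transcript = [
    "the dog chased the ball",
    "the dog caught the ball",
    "purple ideas sleep",
]
scores = coherence_scores(transcript)
# A low minimum flags an abrupt jump in topic somewhere in the transcript.
print(scores, min(scores))
```

Word counts only capture literal word overlap, so a real system would substitute vectors that reflect meaning. The structure of the computation, comparing adjacent stretches of speech and summarizing the result, stays the same.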
Another recent study analyzed the voice data of people with bipolar disorder. The voice data was gathered from their smartphones over a 12-week period [5]. The authors found that by analyzing the voice data, they could accurately identify whether a patient with bipolar disorder was experiencing a depressed or a mixed/manic state, as determined by relevant scores on the Hamilton Depression Rating Scale and Young Mania Rating Scale. Interestingly, the authors found that they could identify mixed/manic states with a higher level of sensitivity and specificity than depressed states.
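For readers less familiar with the two measures, sensitivity is the proportion of true cases a classifier detects, and specificity is the proportion of non-cases it correctly rules out. The sketch below computes both from paired labels. The function name and the example labels are made up for illustration and are not data from the study [5].

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for paired binary labels.

    actual and predicted are equal-length sequences of 0/1 values,
    where 1 means the state of interest (for example, a manic state).
    """
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")

    tp = sum(1 for a, p in zip(actual, predicted) if a and p)          # detected cases
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)      # missed cases
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)  # correct rule-outs
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)      # false alarms

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: clinician rating versus a voice-based classifier.
clinician = [1, 1, 1, 0, 0, 0, 0, 1]
classifier = [1, 1, 0, 0, 0, 1, 0, 1]
print(sensitivity_specificity(clinician, classifier))
```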
Although this article focuses on speech analysis, text-based analysis may also yield important insights that could advance psychiatric care. One study analyzed the SMS text messages of over 80,000 therapeutic counseling conversations from the Crisis Text Line (www.crisistextline.org) [6]. Through detailed analysis, the researchers were able to identify the strategies that would most likely lead to successful counseling regardless of the presenting complaint (depression, self-harm, suicidal thoughts, anxiety, etc.). These strategies included counselor adaptability (being able to identify and react to a conversation that is going well or badly), dealing with ambiguity (reflecting back to check understanding, clarifying situations better, using affirmation to help users), counselor creativity (using less generic or templated responses, responding back in creative and personalized ways), making progress (more quickly identifying the core issue and moving towards collaborative problem solving), and finally changing the perspective (moving away from present struggles to focus on positive aspects, the future, and others). These findings may be important for future generations of psychotherapists who are aiming to use data-based strategies to improve clinical care.
THE FUTURE
While there is growing evidence that speech and other communication modalities from patients may someday become useful clinical tools, the current research remains relatively limited. It should also be noted that the mechanisms that underlie the generation of speech and language are extremely complicated, and analyzing speech may involve many factors that are only briefly considered here, for example the prosodic, formant, source, and spectral features of speech.
We should also be concerned, and actively engage in discussion about the confidentiality and privacy associated with gathering speech data or monitoring conversations. We should also remember that while research is needed to ensure safety and effectiveness, the end goal should be to deliver a product to our patients and fellow clinicians that will enhance the diagnosis, monitoring, and treatment of mental health conditions. Such technology must not only be safe and effective but must be suitably well-designed and engaging for both patients and clinicians.
References
[1] Elvevåg, B., Cohen, A. S., Wolters, M. K., Whalley, H. C., Gountouna, V. E., Kuznetsova, K. A., … & Nicodemus, K. K. (2016). An examination of the language construct in NIMH’s research domain criteria: Time for reconceptualization! American Journal of Medical Genetics Part B: Neuropsychiatric Genetics.
[2] Tech Target. Top Five Benefits of Speech Analytics for the Call Center. Retrieved 9/7/2016. Webpage: http://searchcrm.techtarget.com/report/Top-five-benefits-of-speech-analytics-for-the-call-center
[3] Poushter, J. (2016). Smartphone Ownership and Internet Usage Continues to Climb in Emerging Economies. Pew Research Center: Global Attitudes & Trends.
[4] Bedi, G., et al. (2015). Automated analysis of free speech predicts psychosis onset in high-risk youths. npj Schizophrenia, 1, 15030.
[5] Faurholt-Jepsen, M., et al. (2016). Voice analysis as an objective state marker in bipolar disorder. Translational Psychiatry, 6(7), e856.
[6] Althoff, T., Clark, K., & Leskovec, J. (2016). Large-scale Analysis of Counseling Conversations: An Application of Natural Language Processing to Mental Health. Transactions of the Association for Computational Linguistics, 4, 463-476.