The Doctor Will Hear You Soon

A smartphone handset can enable voice analytics.

Academic spin-offs and boutique startups are positioning themselves for a remote diagnostic market whose possibilities have rapidly expanded in the socially isolating dynamics of the COVID-19 pandemic.

Tal Wenderow knows what it takes to make it big with a technology startup. Wenderow is CEO of Newton, MA-based Vocalis Health, which is developing an artificial intelligence- and machine learning-based platform capable of discerning changes in health through subtle changes in a person's voice.

His last venture, robotic surgery vendor Corindus Vascular Robotics, was acquired by Siemens in 2019 for $1.1 billion. But it was hardly an overnight success story. "It took 17 years," he said. "People forget how long it takes sometimes."

When he signed up with Vocalis Health, he said, his wife asked him whether he was certain he didn't want to take a breather. Yet he went ahead anyway: "I liked its disruptive nature."

The disruption is starting to be noticed by the clinical and funding communities. Vocalis Health is about a year removed from receiving $9 million in venture funding, and spent 2020 establishing or cementing research relationships with the Mayo Clinic and the Geisinger Health System in the U.S. Its technology was also the linchpin of an Israeli study of vocal characteristics of congestive heart failure.

Wenderow is not alone in positioning new voice-based analytic technology for wider acceptance. Elsewhere, academic spin-offs and boutique startups are positioning themselves for a remote diagnostic market whose possibilities have rapidly expanded in the socially isolating dynamics of the COVID-19 pandemic. For example:

Aural Analytics, an Arizona State University spin-off, received $4.3 million in venture funding in September 2019 to fund the expansion of its clinical trial platform, the global release of its next-generation integration portal to support speech collection and analytics, and several new clinical-grade mobile and Web products serving the continuum of care in neurology.
Boston-based Sonde Health completed the acquisition of vocal analytics startup NeuroLex in August 2020; the merger combined Sonde's 300,000 voice samples from 50,000 individuals with NeuroLex's hallmark SurveyLex platform. SurveyLex enabled researchers to create and distribute voice surveys in less than a minute as URL links through Web browsers. It contained a biobank of more than 500,000 voice samples from more than 30,000 individuals.

"Things are leaving academia because the cost is reasonable," said researcher Reza Hosseini Ghomi, M.D., a professor in the University of Washington's neurology department and Institute for Neuroengineering. Hosseini Ghomi was NeuroLex's chief medical officer prior to the sale to Sonde. He predicts there will be more such emerging projects in the near future.

"You can take $5,000 to $10,000 in seed money and really go somewhere. We were able to start NeuroLex because Google Cloud said, 'Here's $100,000 of credit,' and we could start doing a lot of voice analysis. I do think realistically in the next five to 10 years, we will see something in the market that is patient-facing, just like we're seeing with Apple Watch technologies or the automated retinal scanning. Voice will get there."

Hold your hoarseness!

The pioneers of vocal biomarker research are among the most cautious when talking about how quickly the technology will move from research labs and limited releases to large-scale commercial viability. Macro trends such as the increased availability of data and the lower costs of cloud computing have helped researchers scale projects more quickly. However, they say, there is a large disconnect between the public's perception of what voice analytics can do (based on speech recognition engines such as Apple's Siri and Amazon's Alexa) and the highly specialized needs of clinical voice researchers, who rely on minuscule changes in voice acoustics rather than words alone.

The community is still small and collegial, and its members talk with each other regularly. Such communication helps them balance the hype around artificial intelligence (AI) against the reality of creating a new field.

"One of the main reasons I like talking to other folks in this space is that in some sense they are the only ones who know how hard it is," said Visar Berisha, Aural Analytics' chief analytics officer and co-founder. "From the outside looking in, it seems so compelling and easy; you can collect speech so easily, and it's cognitively taxing to produce, so if there are disturbances, hopefully you should be able to pick something up. But it's really very hard."

For example, Vocalis Health's Wenderow said, the company's researchers recently compared the signals a patient with COVID-19 produced with those of a patient with chronic obstructive pulmonary disease (COPD). While intuition may suggest the signals would be similar if not identical, Wenderow said they were not.

"If you have an upper respiratory condition, you would think it should be the exact same, and it was not. So, rather than having one overall vocal biomarker like a fever, from the data we have collected there appear to be different ones for different diseases."

When researchers begin designing vocal machine learning models, then, they need to weigh the most likely route to quicker market adoption (the creation of a rather broad "check engine light" signal that might alert a physician to the symptoms of any of several underlying conditions) against a more specific signal of one or just a few possible diseases.

Of course, the more specific the signal one looks for, the more one gets caught in the dilemma of modeling for an ever-increasing number of variables: the curse of dimensionality, Berisha said.

"That is really probably the most challenging part of it," he said. "If you are measuring hundreds or thousands of speech features, your sample size has to grow exponentially. It's difficult because you have to figure out which variables to measure, and then you have to have enough data to make sure you're really measuring what you think you're measuring."
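The scaling Berisha describes can be made concrete with a toy calculation. The sketch below is not Aural Analytics' method; it assumes, purely for illustration, that each speech feature is cut into 10 equal bins and that samples are spread uniformly at random. The function names and the bin count are hypothetical. Under those assumptions, the number of distinct feature combinations grows as 10 to the power of the number of features, so a fixed pool of recordings covers a rapidly shrinking share of the space.

```python
import random


def cells_to_cover(num_features: int, bins_per_feature: int = 10) -> int:
    """Number of grid cells when every feature is split into equal bins."""
    return bins_per_feature ** num_features


def occupied_fraction(num_samples: int, num_features: int,
                      bins_per_feature: int = 10, seed: int = 0) -> float:
    """Fraction of grid cells hit by at least one uniformly random sample."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    occupied = set()
    for _ in range(num_samples):
        # Each sample lands in one cell, identified by its bin index per feature.
        cell = tuple(rng.randrange(bins_per_feature)
                     for _ in range(num_features))
        occupied.add(cell)
    return len(occupied) / cells_to_cover(num_features, bins_per_feature)


if __name__ == "__main__":
    # Same 1,000 hypothetical recordings, increasing number of features.
    for d in (1, 2, 3, 6):
        print(d, cells_to_cover(d), occupied_fraction(1000, d))
```

With six features, 1,000 samples can touch at most 1,000 of the one million cells, which is the sense in which the required sample size grows exponentially with the number of features measured.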

Defining and then finding "enough data," however, is currently a chicken-and-egg problem in voice research. In a cautionary essay about the hype surrounding AI and voice research, Berisha noted an ongoing antagonism between the "best practice" approach to creating the most accurate speech recognition model possible (with a seemingly endless amount of voice data with which to work) and what voice biomarker researchers face: "For some applications, particularly those focused on rarely occurring diseases, the patient population simply does not exist to generate data at this scale," he wrote. "If it exists, it is not available in a single repository, and there are complex challenges associated with combining across data sources. More importantly, even if the speech data exists, the clinical data required for training the models is very expensive and difficult to generate."

Sonde Health chief operating officer and co-founder Jim Harper said the data barriers are being addressed to some extent by mergers and acquisitions, such as his firm's purchase of NeuroLex.

"We have more than a million audio files from more than 80,000 individuals," Harper said. "Where you will commonly see research publications and peer-reviewed literature, like proof of concept studies that work with 50-200 individuals, we are now training models with 5,000 to 10,000 individuals speaking multiple languages. Our bigger obstacles now, honestly, are in matching the use-case and value proposition and commercializing the technology."

Multiple ways forward

The severity of the COVID-19 pandemic gave the healthcare industry a swift lesson in remote patient monitoring. It is an industry where regulatory safety measures often mandate long lag times between the introduction of a treatment or technology and its widespread adoption, and where tradition also slows the use of disruptive solutions. Primary care visits over video calls, which inhabited the slimmest margins of use-cases prior to the pandemic, became routine. Remote symptom monitoring technologies, and new reimbursement approvals for them from the U.S. government, have opened the door to platforms like vocal biomarkers.

"The convenience and ability to do the majority of primary care visits remotely will mean demand for what we are building will increase," Harper said, "and the companies that are successful there will probably be our best partners in the future."

Harper said one of the reasons he sees vocal analytics gaining traction in remote encounters is the loss of the additional cues an in-person encounter allows: "You're compressing audio, you can't see everything in the body language. So providing some of that back from what we can do with our technology will catch on, I think."

The pioneers of vocal analytics do seem to have reached a rough consensus on how to proceed.

Berisha said Aural Analytics' approach is to "first, tax an individual so we can measure a domain of interest using speech. If we are interested in measuring cognitive function, we provide tasks that elicit speech that are cognitively taxing. If we are interested in measuring respiratory function, we ask for tasks that tax respiratory function.

"All of these are in service of developing a platform that enables others and ourselves to build clinical tools on top of. It is not in service of developing a singular Alzheimer's Disease diagnostic, or a marker for depression."

Vocalis Health's Wenderow sketched out a similar philosophy: "We are building the platform so that from one voice recording, you will have multiple vocal biomarkers. Think of vocal biomarkers as a customizable service."

For example, Wenderow said, a customer could ask for a COVID-19 biomarker to pinpoint either a positive or negative indicator: "One use-case is you want to catch the positive results; that's one configuration. Another use-case is to ascertain that whomever is indicated as negative is truly negative. Either use-case has its place. Finding the positive is how you control the disease, and ascertaining the negative is how you can open businesses back up."
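One way to picture the two configurations is as two operating thresholds on the same model score. That reading is an assumption made here for illustration; the article does not describe how Vocalis Health configures its biomarkers. The scores, labels, and function name below are made up. The sketch computes the standard screening rates (sensitivity, specificity, positive predictive value, and negative predictive value) at a low and a high threshold.

```python
from typing import Dict, List


def screening_metrics(scores: List[float], labels: List[int],
                      threshold: float) -> Dict[str, float]:
    """Screening rates when a score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined rates (no cases in the denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases flagged
        "specificity": ratio(tn, tn + fp),  # share of non-cases cleared
        "ppv": ratio(tp, tp + fp),          # share of positive calls correct
        "npv": ratio(tn, tn + fn),          # share of negative calls correct
    }


# Made-up scores and labels, for illustration only (1 = has the condition).
SCORES = [0.95, 0.80, 0.70, 0.55, 0.45, 0.30, 0.20, 0.10]
LABELS = [1, 1, 0, 1, 0, 1, 0, 0]

low = screening_metrics(SCORES, LABELS, threshold=0.25)
high = screening_metrics(SCORES, LABELS, threshold=0.75)

if __name__ == "__main__":
    print("low threshold: ", low)
    print("high threshold:", high)
```

On this toy data, the low threshold misses no true cases and every negative call is correct, at the cost of false alarms; the high threshold makes every positive call correct but misses half the cases. Which trade-off suits a given customer depends on the use-case, as Wenderow suggests.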

Hosseini Ghomi, who has already been through a startup cycle in vocal analytics, said he prefers to call the technologies being developed "vocal features" rather than biomarkers; he sees the concept's immediate path forward as adjunct to established clinical technologies. Now the chief medical officer of cognitive analytics vendor BrainCheck, he said a recent study showed how voice can add to a true biomarker.

"We were able to use a partner to get the brain scans and other testing we could add the voice platform to," he said. "That was the first step toward a true biomarker, where we can say this voice feature lines up with this thing in your brain. We're starting to see that. That's the work I really want to see happen."

Gregory Goth is an Oakville, CT-based writer who specializes in science and technology.
