Sketch, 2018. Flickr/Whinger. Some rights reserved.
'You sound a bit depressed' we might say to a friend,
Not only because of what they say but how they say it.
Perhaps their speech is duller than usual, tailing off between words,
Lacking their usual lively intonation.
There are many ways to boil a voice down into data points;
Low-level spectral features, computed from snippets as short as twenty milliseconds
That quantify the dynamism of amplitude, frequency and energy,
And those longer range syllabic aspects that human ears are tuned to,
Such as pitch and intensity.
A voice distilled into data
Becomes the training material for machine learning algorithms,
And there are many efforts being made to teach machines
To deduce our mental states from voice analysis.
The bet is that the voice is a source of biomarkers,
Distinctive data features that correlate to health conditions,
Especially the emergence of mental health problems
Such as depression, PTSD, Alzheimer's and others.
And of course there are the words themselves;
We've already trained machines to recognise them.
Thanks to the deep neural network method called Long Short-Term Memory (LSTM)
We can command our digital assistants to buy something on Amazon.
Rules-based modelling never captured the complexity of speech,
But give neural networks enough examples,
They will learn to parrot and predict any complex pattern,
And voice data is plentiful.
So perhaps machines can be trained to detect symptoms
Of different kinds of stress or distress,
And this can be looped into an appropriate intervention
To prevent things from getting worse.
As data, the features of speech become tables of numbers;
Each chunk of sound becomes a row of digits,
Perhaps sixteen numbers from a Fourier Transform
And others for types of intensity and rhythmicity.
For machine learning to be able to learn
Each row must end in a classification; a number that tags a known diagnosis.
Presented with enough labelled examples it will produce a probabilistic model
That predicts the likelihood of a future speaker developing the same condition.
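As a rough illustration of what "a row of digits" might look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example rather than any vendor's actual pipeline: the 8 kHz sampling rate, the choice of sixteen equal-width frequency bands, the use of RMS amplitude as the single intensity figure, the function names, and the synthetic tone standing in for a recording. It only builds the labelled table; fitting a model to it is a separate step not shown here.

```python
import cmath
import math

SAMPLE_RATE = 8000   # assumed sampling rate in Hz
FRAME_MS = 20        # snippet length mentioned in the text
N_BANDS = 16         # "perhaps sixteen numbers from a Fourier Transform"


def dft_magnitudes(frame):
    """Magnitudes of a naive discrete Fourier transform (lower half of the spectrum)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def frame_to_row(frame, label):
    """Turn one chunk of sound into a row: 16 band energies, one intensity, one label."""
    mags = dft_magnitudes(frame)
    band_size = len(mags) // N_BANDS
    bands = [
        sum(m * m for m in mags[b * band_size:(b + 1) * band_size])
        for b in range(N_BANDS)
    ]
    intensity = math.sqrt(sum(x * x for x in frame) / len(frame))  # RMS amplitude
    return bands + [intensity, label]


def signal_to_table(signal, label):
    """Cut a signal into non-overlapping 20 ms frames and build one row per frame."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000
    return [
        frame_to_row(signal[start:start + frame_len], label)
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


# Synthetic stand-in for a recording: 0.1 seconds of a 440 Hz tone.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(800)]

# label=1 is a hypothetical tag for "known diagnosis"; 0 would mean "no diagnosis".
table = signal_to_table(tone, label=1)
print(len(table), "rows of", len(table[0]), "numbers each")
```

The final number in each row is the tag the text describes: without it the table is just sound, and with it the table becomes a set of claims about people.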
It's very clever to model the hair cells in the human ear as forced damped oscillators
And to apply AI algorithms that learn models through backpropagation,
But we should ask why we want machines to listen out for signs of distress;
Why go to all this trouble when we could do the listening ourselves?
One reason is the rise in mental health problems
At the same time as available services are contracting.
Bringing professional and patient together costs time and money,
But we can acquire and analyse samples of speech via our network infrastructures.
Machine listening offers the prospect of early intervention
Through a pervasive presence beyond anything that psychiatry could have previously imagined.
Machine learning's skill at pattern finding means it can be used for prediction;
As Thomas Insel says, "We are building digital smoke alarms for people with mental illness".
Insel is a psychiatrist, neuroscientist and former Director of the US National Institute of Mental Health,
Where he prioritised the search for a preemptive approach to psychosis
By "developing algorithms to analyze speech as an early window into the disorganization of thought".
He jumped ship to Google to pursue a big data approach to mental health, then founded a startup called Mindstrong
Which uses smartphone data to 'transform brain health' and 'detect deterioration early'.
The number of startups looking for traction on mental states,
Through the machine analysis of voice,
Suggests a restructuring of the productive forces of mental health,
Such that illness will be constructed by a techno-psychiatric complex.
HealthRhythms, for example, was founded by psychiatrist David Kupfer,
Who chaired the task force that produced DSM-5, the so-called 'bible of psychiatry',
Which defines mental disorders and the diagnostic symptoms for them.
The HealthRhythms app uses voice data to calculate a "log of sociability" to spot depression and anxiety.
Sonde Health screens acoustic changes in the voice for mental health conditions
With a focus on post-natal depression and dementia;
"We're trying to make this ubiquitous and universal" says the CEO.
Their app is not just for smartphones but for any voice-based technology.
Meanwhile Sharecare scans your calls and reports if you seemed anxious;
Founder Jeff Arnold describes it as 'an emotional selfie'.
Like Sonde Health, the company works with health insurers
While HealthRhythms' clients include pharmaceutical companies.
It's hardly a surprise that Silicon Valley sees mental health as a market ripe for Uber-like disruption;
Demand is rising, orthodox services are being cut, but data is more plentiful than it has ever been.
There's a mental health crisis that costs economies millions
So it must be time to 'move fast and break things'.
But as Simondon and others have tried to point out,
The individuation of subjects, including ourselves, involves a certain technicity,
Stabilising a new ensemble of AI and mental health
Will change what it is to be considered well or unwell.
There's little apparent concern among the startup-funder axis
That all this listening might silence voices.
Their enthusiasm is not haunted by the story of the Samaritans Radar
When an organisation which should have known better got carried away by enthusiasm for tech surveillance.
This was a Twitter app developed in 2014 by the Samaritans,
The UK organisation which runs a 24 hour helpline for anyone feeling suicidal.
You signed up for the app and it promised to send you email alerts
Whenever someone you follow on Twitter appeared to be distressed.
If any of their tweets matched a list of key phrases
It invited you to get in touch with them.
In engineering terms, this is light years behind the sophistication of Deep Learning,
But it's a salutary tale about unintended impacts.
Thanks to the inadequate involvement of service users in its production,
It ignored the fact that the wrong sort of well-meaning intervention at the wrong time might actually make things worse,
Or that malicious users could use the app to target and troll vulnerable people.
Never mind the consequences of false positives
When the app misidentified someone as distressed,
Or the concept of consent,
Given that the people being assessed were not even made aware that this was happening;
All riding roughshod over the basic ethical principle of 'do no harm'.
Although Twitter is a nominally public space,
People with mental health issues had been able to hold supportive mutual conversations
With a reasonable expectation that this wouldn't be put in a spotlight,
Allowing them to reach out to others who might be experiencing similar things.
One consequence of the Samaritans Radar was that many people with mental health issues,
Who had previously found Twitter a source of mutual support,
Declared their intention to withdraw
Or simply went silent.
As with the sorry tale of the Samaritans Radar,
Without the voices of mental health users and survivors
The hubris that goes with AI has the potential to override the Hippocratic oath.
Fairness and Harm
The ubiquitous application of machine learning's predictive power
In areas with real world consequences, such as policing and the judicial system,
Is stirring an awareness that its oracular insights
Are actually constrained by complexities that are hard to escape.
The simplest of which is data discrimination;
A programme that only knows the data it is fed,
And which is only fed data containing a racist bias,
Will make racist predictions.
This should already be a red flag for our mental health listening machines.
Diagnoses of mental health are already skewed with respect to race;
A disproportionately high number of people from black and ethnic minority backgrounds get diagnosed,
And the questions about why are still very much open and contested.
But surely, proponents will say, one advantage of automation in general
Is to encode fairness and bypass the fickleness of human bias;
To apply empirical and statistical knowledge directly
And cut through the subjective distortions of face-to-face prejudice.
Certainly, as the general dangers of reproducing racism and sexism have become clear,
There have been conscientious efforts from engineers in one corner of machine learning
To automate ways to de-bias datasets.
But here's the rub;
Even when you know there's the potential for discrimination
It's mathematically impossible to produce all-round fairness.
If you're designing a parole algorithm to predict whether someone will reoffend,
You can design it so that the accuracy for high risk offenders is the same for white and black.
But if the overall base rates are different
There will be more false positives of black people, which can be considered a harm,
Because more black people who would not go on to reoffend will be refused parole than white people.
Machine learning's probabilistic predictions are the result of a mathematical fit,
The parameters of which are selected to optimise on specific metrics,
There are many mathematical ways to define fairness (perhaps twenty-one of them)
And you can't satisfy them all at the same time.
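The arithmetic behind this trade-off can be sketched in a few lines. The figures below are invented purely for illustration and describe no real system or population: two hypothetical groups with different base rates of reoffending, scored by a model that has been tuned so that its "high risk" label is equally reliable in both (the same positive predictive value) and catches the same share of actual reoffenders (the same true positive rate). The false positive rate then follows from those three numbers, and it cannot come out equal.

```python
def false_positive_rate(base_rate, ppv, tpr):
    """False positive rate implied by the base rate, PPV and true positive rate.

    With N people and base rate p there are p*N actual reoffenders, so
    true positives  = tpr * p * N
    false positives = true positives * (1 - ppv) / ppv
    and dividing by the (1 - p) * N non-reoffenders gives the rate below.
    """
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * tpr


PPV = 0.7   # hypothetical: 70% of those flagged "high risk" do reoffend, in both groups
TPR = 0.6   # hypothetical: 60% of actual reoffenders are flagged, in both groups

fpr_a = false_positive_rate(base_rate=0.3, ppv=PPV, tpr=TPR)  # hypothetical group A
fpr_b = false_positive_rate(base_rate=0.5, ppv=PPV, tpr=TPR)  # hypothetical group B

print(f"Group A false positive rate: {fpr_a:.3f}")
print(f"Group B false positive rate: {fpr_b:.3f}")
```

Under these assumed numbers the group with the higher base rate ends up with more than twice the false positive rate, even though the model looks equally accurate for both groups by the other two measures.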
Proponents might argue that with machinic reasoning,
We should be able to disentangle the reasons for various predictions,
So we can make policy choices
About the various trade-offs.
But there's a problem with artificial neural networks,
Which is that their reasoning is opaque,
Obscured by the multiplicity of connections across their layers,
Where the weightings are derived from massively parallel calculations.
If we apply this deep learning to reveal what lies behind voice samples,
Taking different tremors as proxies for the contents of consciousness,
The algorithm will be tongue-tied
If asked to explain its diagnosis.
And we should ask who these methods will be most applied to,
Since to apply machinic methods we need data.
Data visibility is not evenly distributed across society;
Institutions will have much more data about you if you are part of the welfare system
Than if you are from a comfortable middle class family.
What's apparent from the field of child protection,
Where algorithms are also seen as promising objectivity and pervasive preemption,
Is that the weight of harms from unsubstantiated interventions
Will fall disproportionately on the already disadvantaged,
With the net effect of 'automating inequality'.
If only we could rely on institutions to take a restrained and person-centred approach.
But certainly, where the potential for financial economies is involved,
The history of voice analysis is not promising.
Local authorities in the UK were still applying Voice Stress Analysis to detect housing benefit cheats
Years after solid scientific evidence showed that its risk predictions were 'no better than horoscopes'.
Machine learning is a leap in sophistication from such crude measures,
But as we've seen it also brings new complexities,
As well as an insatiable dependency on more and more data.
Getting mental health voice analysis off the ground faces the basic challenge of data;
Most algorithms only perform well when there's a lot of it to train on.
They need voice data labelled as being from people who are unwell and those who are not,
So that the algorithm can learn the patterns that distinguish them.
The uncanny success of Facebook's facial recognition algorithms
Came from having huge numbers of labelled faces at hand,
Faces that we, the users, had kindly labelled for them
As belonging to us, or by tagging our friends,
Without realising we were also training a machine;
"if the product is free, you are the training data".
One approach to voice analysis is the kind of clever surveillance trick
Used by a paper investigating 'The Language of Social Support in Social Media And its Effect on Suicidal Ideation Risk',
Where they collected comments from Reddit users in mental health subreddits like
r/depression, r/mentalhealth, r/bipolarreddit, r/ptsd, r/psychoticreddit,
And tracked how many could be identified as subsequently posting in
A prominent suicide support community on Reddit called r/SuicideWatch.
Whether or not the training demands of voice algorithms
Are solved by the large scale collection of passive data,
The strategies of the Silicon Valley startups make it clear
That the application of these apps will have to be pervasive,
To fulfill the hopes for scaling and early identification.
Society is already struggling to grapple
With the social effects of pervasive facial recognition,
Whether mainstreamed in China's system of social credit,
Or marketed to US police forces by Amazon,
Where it has at least led to some resistance from employees themselves.
The democratic discourse around voice analysis seems relatively hushed,
And yet we are increasingly embedded in a listening environment,
With Siri and Alexa and Google Assistant and Microsoft's Cortana
And Hello Barbie and My Friend Cayla and our smart car,
And apps and games on our smartphones that request microphone access.
Where might our voices be analysed for signs of stress or depression
In a way that can be glossed as legitimate under the General Data Protection Regulation;
On our work phone? our home assistant? while driving? when calling a helpline?
When will using an app like HealthRhythms, which 'wakes up when an audio stream is detected',
Become compulsory for people receiving any form of psychological care?
Let's not forget that in the UK we already have Community Treatment Orders for mental health.
Surveillance is the inexorable logic of the data-diagnostic axis,
Merging with the beneficent idea of Public Health Surveillance,
With its agenda of epidemiology and health & safety,
But never quite escaping the long history of sponsorship of speech recognition research
By the Defense Advanced Research Projects Agency (DARPA).
A history that Apple doesn't often speak of,
That it acquired Siri from SRI International,
Who'd developed it through a massive AI contract with DARPA.
As the Samaritans example made clear,
We should pause before embedding ideas like 'targeting' in social care;
Targeting people for preemptive intervention is fraught with challenges,
And forefronts the core questions of consent and 'do no harm'.
Before we imagine that "instead of waiting for traditional signs of dementia and getting tested by the doctor
The smart speakers in our homes could be monitoring changes in our speech as we ask for the news, weather and sports scores
And detecting the disease far earlier than is possible today",
We need to know how to defend against the creation of a therapeutic Stasi.
It might seem far fetched to say that snatches of chat with Alexa
Might be considered as significant as a screening interview with a psychiatrist or psychologist,
But this is to underestimate the aura of scientific authority
That comes with contemporary machine learning.
What algorithms offer is not just an outreach into daily life,
But the allure of neutrality and objectivity,
That by abstracting phenomena into numbers that can be statistically correlated
In ways that enable machines to imitate humans,
Quantitative methods can be applied to areas that were previously the purview of human judgement.
Big data seems to offer sensitive probes of signals beyond human perception,
Of vocal traits that are "not discernable to the human ear",
Nor 'degraded' by relying on self-reporting;
It doesn't seem to matter much that this use of voice
Pushes the possibility of mutual dialogue further away,
Turning the patient's opinions into noise rather than signal.
Machinic voice analysis of our mental states
Risks becoming an example of epistemic injustice,
Where an authoritative voice comes to count more than our own;
The algorithmic analysis of how someone speaks causing others to "give a deflated level of credibility to a speaker's word".
Of course we could appeal to the sensitivity and restraint of those applying the algorithms;
Context is everything when looking at the actual impact of AI,
Knowing whether it is being adopted in situations where existing relations of power
Might indicate the possibility of overreach or arbitrary application.
The context of mental health certainly suggests caution,
Given that the very definition of mental health is historically varying;
The asymmetries of power are stark, because treatment can be compulsory and detention is not uncommon,
And the life consequences of being in treatment or missing out on treatment can be severe.
Mental health problems can be hugely challenging for everyone involved,
And in the darkest moments of psychosis or mania
People are probably not going to have much to say about how their care should be organised,
But, in between episodes, who is better placed to help shape specific ideas for their care
Than the person who experiences the distress;
They have the situated knowledge.
The danger with all machine learning
Is the introduction of a drone-like distancing from messy subjectivities,
With the danger that this will increase thoughtlessness
Through the outsourcing of elements of judgement to automated and automatising systems.
The voice as analysed by machine learning
Will become a technology of the self in Foucault's terms,
Producing new subjects of mental health diagnosis and intervention,
Whose voice spectrum is definitive but whose words count for little.
The lack of user voice in mental health services has been a bone of contention since the civil rights struggles of the 1960s,
With the emergence of user networks that put forward alternative views,
Seeking to be heard over the stentorian tones of the psychiatric establishment,
Forming alliances with mental health workers and other advocates;
Groups like Survivors Speak Out, The Hearing Voices Network, the National Self-Harm Network and Mad Pride.
Putting forward demands for new programmes and services,
Proposing strategies such as 'harm minimization' and 'coping with voices',
Making the case for consensual, non-medicalised ways to cope with their experiences,
And forming collective structures such as Patients' Councils.
While these developments have been supported by some professionals,
And some user participation has been assimilated as the co-production of services,
The validity of user voice, especially the collective voice, is still precarious within the mental health system
And is undermined by coercive legislation and reductionist biomedical models.
The introduction of machinic listening,
That dissects voices into quantifiable snippets,
Will tip the balance of this wider apparatus,
Towards further objectification and automaticity,
Especially in this era of neoliberal austerity.
And yet, ironically, it's only the individual and collective voices of users
That can rescue machine learning from talking itself into harmful contradictions;
That can limit its hunger for ever more data in pursuit of its targets,
And save classifications from overshadowing uniquely significant life experiences.
Designing for justice and fairness not just for optimised classifications
Means discourse and debate have to invade the spaces of data science;
Each layer of the neural networks must be balanced by a layer of deliberation,
Each datafication by caring human attentiveness.
If we want the voices of the users to be heard over the hum of the data centres,
They have to be there from the start;
Putting the incommensurability of their experiences
Alongside the generalising abstractions of the algorithms.
And asking how, if at all,
The narrow intelligence of large-scale statistical data-processing machines
Could support more Open Dialogue, where speaking and listening aim for shared understanding,
More Soteria type houses based on a social model of care,
The development of progressive user-led community mental health services,
And an end to the cuts.
Computation and Care
As machine learning expands into real world situations,
It turns out that interpretability is one of its biggest challenges;
Even DARPA, the military funder of so much research in speech recognition and AI,
Is panicking that targeting judgements will come without any way to interrogate the reasoning behind them.
Experiments to figure out how AI image recognition actually works,
Probed the contents of intermediary layers in the neural networks
By recursively applying the convolutional filters to their own outputs,
Producing the hallucinatory images of 'Inceptionism'.
We are developing AI listening machines that can't explain themselves,
That hear things of significance in their own layers,
Which they can't articulate to the world but that they project outwards as truths;
How would these AI systems fare if diagnosed against DSM-5 criteria?
And if objectivity, as some post-Relativity philosophers of science have proposed,
Consists of invariance under transformation,
What happens if we transform the perspective of our voice analysis,
Looking outwards at the system rather than inwards at the person in distress?
To ask what our machines might hear in the voices of the psychiatrists who are busy founding startups,
Or in the voices of politicians justifying cuts in services because they paid off the banks,
Or in the voice of the nurse who tells someone forcibly detained under the Mental Health Act,
"This ain't a hotel, love".
It's possible that prediction is not a magic bullet for mental health,
And can't replace places of care staffed by people with time to listen,
In a society where precarity, insecurity and austerity don't fuel generalised distress,
Where everyone's voice is not analysed but heard,
In a context which is collective and democratic.
The dramas of the human mind have not been scientifically explained,
And the nature of consciousness still slips the net of neuroscience,
Still less should we restructure the production of truths about the most vulnerable
On computational correlations.
The real confusion behind the Confusion Matrix,
That table of machine learning accuracy that includes percentages of false positives and negatives,
Is that the impact of AI in society doesn't pivot on the risk of false positives
But on the redrawing of boundaries that we experience as universal fact.
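For readers who have not met one, a confusion matrix is a small thing. The sketch below builds one from ten invented cases, using hypothetical labels where 1 means "flagged as unwell" and 0 means "not flagged"; the data and function name are assumptions for illustration only.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pairing occurs."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(1, 1)],
        "false_negative": counts[(1, 0)],
        "false_positive": counts[(0, 1)],
        "true_negative": counts[(0, 0)],
    }


# Ten invented cases: what was recorded as true, and what the model said.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]

matrix = confusion_matrix(actual, predicted)
fp_rate = matrix["false_positive"] / (
    matrix["false_positive"] + matrix["true_negative"]
)

for name, count in matrix.items():
    print(f"{name}: {count}")
print(f"false positive rate: {fp_rate:.2f}")
```

Note what the table takes for granted: the "actual" column is itself a diagnosis made by someone, which is exactly the boundary the matrix cannot question.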
The rush towards listening machines tells us a lot about AI,
And the risk of believing it can transform intractable problems
By optimising dissonance out of the system.
If human subjectivities are intractably co-constructed with the tools of their time,
We should ask instead how our new forms of calculative cleverness
Can be stitched into an empathic technics,
That breaks with machine learning as a mode of targeting,
And wreathes computation with ways of caring.