Automated Language Analysis: Post-Christmas FAQ Special

Recently our research received some media coverage. Tom Whipple from The Times wrote an article about language in dementia, published on December 22nd. As a result of the article, Richard Hamilton contacted me to talk about our work for a piece for BBC World Service (follow this link and jump to the ten minute mark). We appreciate the attention not only to our project, but also to research on language in dementia in general (including its clinical potential). However, some more details and a bit of framing don’t hurt, so I put together this layman-friendly FAQ to add information and also credit colleagues who contributed but were not mentioned.

Why look at language in dementia?
Our ability to produce and understand language is complex and uses much of our brain. It is therefore sensitive to neurological change. We try to find out how changes in an individual’s language system may help us detect and track neurological disorders such as dementia. Detecting dementia early after it starts is one of the big clinical challenges.

Your project is on language production in dementia?
It goes beyond that. The project, which is led by Rosemary Varley (principal investigator) with me as co-investigator, looks at both language production and comprehension. Ultimately, it tries to find out how communication can be improved for people living with dementia. We are funded by the Alzheimer’s Society and McKesson UK.

So you developed a computer programme? What does it do? Is it a “vocabulary test”?
We call it the Frequency in Language Analysis Tool (FLAT). It was conceived by me and programmed by Michael Coleman (University College London) and Mark Wibrow (formerly UCL, now Memberoo). It does a number of things, but its main contribution is that it checks an individual’s ability to produce rare words as well as rare (or novel) word combinations.
I would rather not use the term “vocabulary test”. We don’t explicitly test knowledge; rather, we analyse what people say spontaneously. Also, FLAT extends beyond single words.

How does this relate to dementia?
Alzheimer’s disease has been associated with a reduction in language creativity, meaning that the individual is more restricted to common words and word combinations (formulaic language). This loss of creativity can be the result of a breakdown of systems underlying word processing, grammatical processing and/or general cognitive capacities. Using FLAT, we have detected this kind of change in other dementia types as well.
Because common language forms are, well, common, these changes are difficult to detect intuitively. People living with Alzheimer’s disease initially continue to produce correct, normal sentences. The difference is that words and phrases which someone without dementia says often are produced even more often by a person living with dementia. This is a true communication impairment since someone who can say only common words and phrases will have difficulties with new situations.

And the FLAT…
… quantifies this phenomenon. It extracts from any text (e.g. a transcript) each word and word combination and determines their usage frequency and collocation strength (to use more technical terms). The FLAT is very fast, currently operating at about 100 words per second.
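For readers who like to see the idea in code, here is a minimal sketch of what "usage frequency" and "collocation strength" can mean. It is not the FLAT itself. It assumes a tiny made-up reference text in place of a large language corpus, looks only at adjacent word pairs, and uses pointwise mutual information (PMI) as one possible measure of collocation strength. The function names (tokenize, build_reference, pmi, analyse) are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter

# Toy stand-in for a large reference corpus (an assumption for illustration).
REFERENCE = (
    "i don't know what you mean i don't know where it is "
    "you know what i mean"
)

def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())

def build_reference(corpus_text):
    """Count words and adjacent word pairs (bigrams) in the reference text."""
    tokens = tokenize(corpus_text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(tokens)

def pmi(pair, unigrams, bigrams, total):
    """PMI of a word pair; None if the pair never occurs in the reference."""
    joint = bigrams.get(pair, 0)
    if joint == 0:
        return None
    w1, w2 = pair
    p_joint = joint / (total - 1)  # a text of n tokens has n - 1 adjacent pairs
    p1 = unigrams[w1] / total
    p2 = unigrams[w2] / total
    return math.log2(p_joint / (p1 * p2))

def analyse(sample_text, unigrams, bigrams, total):
    """Return (word, reference frequency) and (pair, PMI) lists for a sample."""
    tokens = tokenize(sample_text)
    word_freqs = [(w, unigrams.get(w, 0)) for w in tokens]
    pair_scores = [
        (pair, pmi(pair, unigrams, bigrams, total))
        for pair in zip(tokens, tokens[1:])
    ]
    return word_freqs, pair_scores

unigrams, bigrams, total = build_reference(REFERENCE)
word_freqs, pair_scores = analyse("I don't know it", unigrams, bigrams, total)
for word, freq in word_freqs:
    print(word, freq)
for pair, score in pair_scores:
    print(pair, score)
```

In this sketch, a pair that is common in the reference text relative to its individual words (such as "don't know") gets a high score, while a pair that never occurs there gets no score at all. A speaker who relies heavily on formulaic language would show many high-frequency words and strongly associated pairs.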

This doesn’t sound like proper linguistics to me. Would Noam Chomsky approve?
I don’t think he’d approve. That said, I am not claiming that these variables are all there is to language. They are proxies for more complex phenomena. If you are interested in the broader theoretical framework within which I place our project, check out work by Adele Goldberg, Elena Lieven and Ewa Dąbrowska.

No one has done this before?
Not in this form, although word frequency in dementia has been investigated in the past and formulaic language has been researched using other means. I consider Diana Van Lancker Sidtis (New York University) and Alison Wray (Cardiff University) to be the leading experts on formulaic language in clinical populations.

And it works?
Yes. FLAT can detect differences between groups of speakers with dementia (different types) and speakers without dementia.

You said that FLAT is 90% accurate in detecting dementia?
The number is correct, but context is important. In a sample provided to us by UCL’s Dementia Research Centre (we have been working closely with Jason Warren and Chris Hardy), a machine learning classifier trained on FLAT values matched the general diagnosis (dementia vs. no dementia) in 90% of cases. While this result is very encouraging, for many reasons it is unlikely that we would currently achieve this accuracy “in the field”. I have several ideas for improving the methods, and the more speakers we test, the better the model will become.

Can FLAT detect early signs of dementia?
In some data sets, FLAT measures correlate with measures of disease progression, but this depends on the type of dementia. We have to carry out more work to find out how well we can detect early signs.

Did you find out about how Alzheimer’s changed the writing of Agatha Christie and Iris Murdoch?
No. Research on Agatha Christie’s language was carried out by a team including Ian Lancashire and Graeme Hirst from the University of Toronto. Work on Iris Murdoch was carried out by a team led by Peter Garrard, now at St George’s University. Peter has done work on other famous individuals.

It sounds like FLAT can be used for other purposes.
I agree! We are a clinical lab which looks at dementia, stroke aphasia and schizophrenia (among other populations), but a tool like the FLAT has potential for language acquisition research, forensic linguistics and other fields.

Where is the download link?
The FLAT is a work in progress, and while we have working versions which others have used, they are not as user-friendly as I want them to be. Also, there is no good manual. For these reasons I have not uploaded it for the public; I fear that if I did, I’d have to spend more time than I have explaining how to use it. Drop me an e-mail (v.zimmerer@ucl.ac.uk) if you want the current version and I will upload it for you. I am aiming for a proper release in the first half of 2019.

Where can I read more about the FLAT?
To date there are three publications based at least partially on analyses using FLAT:

Bruns, C., Varley, R., Zimmerer, V.C., Carragher, M., Brekelmans, G., & Beeke, S. (2018). “I don’t know”: a usage-based approach to familiar collocations in non-fluent aphasia. Aphasiology, 33(2), 140-162. doi: 10.1080/02687038.2018.1535692

Zimmerer, V.C., Newman, L., Thomson, R., Coleman, M., & Varley, R.A. (2018). Automated analysis of language production in aphasia and right-hemisphere damage: frequency and collocation strength. Aphasiology, 32(11), 1267-1283. doi: 10.1080/02687038.2018.1497138

Zimmerer, V.C., Wibrow, M., & Varley, R.A. (2016). Formulaic language in people with probable Alzheimer’s disease: a frequency-based approach. Journal of Alzheimer’s Disease, 53(3), 1145-1160. doi: 10.3233/JAD-160099

There is a new manuscript we are almost ready to submit. Please find it here.