Formulaic language in people with probable Alzheimer's disease: a frequency-based approach.

Zimmerer, V.C., Wibrow, M., & Varley, R.A. (2016). Formulaic language in people with probable Alzheimer's disease: a frequency-based approach. Journal of Alzheimer's Disease, 53, 1145-1160.

The claim that "language is a window into the mind" has been made in many contexts. We all use language to show what's in our minds. Steven Pinker and others argue that language shows how the human mind is structured. My own focus is mostly on the clinical side: we look at language to see whether an individual's mind is working as it should. Brain lesions, psychosis, intellectual disabilities and dementia are all likely to affect how we produce and understand language.

In Alzheimer's and other dementias, many believe that language can help us identify the disease earlier. The results of this research can have a real impact on individuals and health services. Dementias can start years before they are detected. Earlier diagnosis allows for earlier intervention, which may help slow the progression of the disease. We just need to find out which aspects of language to look at. So we are on a big hunt for variables.

There are many aspects to language. The most obvious language symptoms of Alzheimer's disease are disruptions of narrative and of word production. Individuals with Alzheimer's find it harder to retrieve the right word, or they use the wrong ones. Their language output becomes more "empty" and, much later, loses coherence. In contrast, grammar appears less affected. People with Alzheimer's make more grammatical errors, but the difference from healthy individuals is not that large. According to the traditional "words and rules" (lexicon and grammar) frameworks, we would therefore need to look at which words are produced and understood, not at how they are combined.

This work, which I started in 2013 when I joined UCL, takes a different view. The basic idea is that many word combinations behave like single words. They come prefabricated, so we don't need to assemble them from their individual words. Expressions that are used very often, such as "I don't know" or "it's alright", may be stored as whole units and not involve any combinatorial processes. We call these expressions "formulas". Formulas are useful because they take less effort to produce than novel sentences, whose individual words must be retrieved and combined.

Some previous work shows that people with Alzheimer's disease rely more on formulas when they communicate, which suggests that formula use could be a marker of the disease. With the help of Mark Wibrow, who at that time was a PhD student at UCL's Department of Speech, Hearing and Phonetic Sciences, I designed a computer programme that automatically determines the degree to which an individual's output is formulaic. It does so by checking how common each word combination is in everyday language (technically, it looks up these combinations in the British National Corpus). The more an individual relies on common phrases and sentences, the more formulaic his or her output. We call this tool the FLAT ("Frequency in Language Analysis Tool").
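To make the idea of a frequency-based measure concrete, here is a minimal sketch. It is not the FLAT itself. It assumes a tiny, invented table of two-word counts in place of the British National Corpus, looks only at word pairs, and scores an utterance by the average log frequency of its pairs. The names `TOY_BIGRAM_COUNTS`, `tokenize` and `formulaicity_score` are hypothetical and chosen for illustration only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy counts standing in for the British National Corpus (BNC).
# These numbers are invented for illustration; the real tool queries the BNC.
TOY_BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("i", "don't"): 5000,
    ("don't", "know"): 8000,
    ("it's", "alright"): 1200,
    ("the", "cat"): 300,
    ("purple", "cat"): 2,
}


def tokenize(utterance: str) -> List[str]:
    # Naive tokenizer: split on whitespace, lowercase, strip surrounding punctuation.
    tokens = [w.strip('.,!?"').lower() for w in utterance.split()]
    return [t for t in tokens if t]


def bigrams(tokens: List[str]) -> List[Tuple[str, str]]:
    # Adjacent word pairs, e.g. ["i", "don't", "know"] -> [("i", "don't"), ("don't", "know")].
    return list(zip(tokens, tokens[1:]))


def formulaicity_score(utterance: str,
                       counts: Dict[Tuple[str, str], int] = TOY_BIGRAM_COUNTS) -> float:
    # Mean log frequency of the utterance's word pairs.
    # log1p means a pair never seen in the table contributes 0 rather than failing.
    pairs = bigrams(tokenize(utterance))
    if not pairs:
        return 0.0
    return sum(math.log1p(counts.get(p, 0)) for p in pairs) / len(pairs)


if __name__ == "__main__":
    for u in ["I don't know.", "The purple cat sat."]:
        print(u, round(formulaicity_score(u), 2))
```

Taking logarithms keeps a handful of extremely common pairs from swamping the average, so a stock phrase such as "I don't know" scores clearly higher than a novel combination such as "the purple cat sat". A fuller tool would also consider longer combinations and a real corpus.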

Using the FLAT, we confirmed that people with an Alzheimer's diagnosis are more formulaic than people without. Further, our formulaicity variables were the only language measures (of those we selected for this study) that correlated with time since symptom onset: those who had been living with the disease for longer were more formulaic. The FLAT is a new way of analysing language.

I call this work "my second PhD". Formulaic language, Alzheimer's disease and computerised language analysis were new areas for me, and I had to learn most of what was relevant to the project from scratch. I also learned that designing such an analysis tool is not the end of a project, but the very beginning. I spent most of the time trying to figure out what our creation was actually doing, and what we were really learning about pathological language. At times I suffered from "analysis paralysis": there were so many ways of approaching the data that it was hard to take the first step.

In the near future I will write an entire post about formulaic language.