Chomsky et al. on ChatGPT. How important is statistical learning for language acquisition?

The way language scientists discuss artificial intelligence highlights a question that divides the field. How much do we rely on statistical learning in language acquisition?

For those who don’t know, statistical learning is the ability to track patterns of occurrence or co-occurrence of stimuli - in language, these can be speech sounds (or movements in sign language), syllables, words, and phrases, but also patterns of communicated meaning and intention - and to have these patterns shape your behaviour, for example how you produce and understand language.
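
To make “tracking co-occurrence” concrete, here is a minimal sketch - my own illustration, not anyone’s published model: it counts how often one syllable follows another in a continuous stream and turns the counts into transitional probabilities, the kind of statistic that statistical-learning experiments tap into.

```python
import random
from collections import Counter

random.seed(0)

# Artificial "language" of four trisyllabic words, concatenated into a
# continuous stream with no pauses, as in classic statistical-learning studies.
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"], ["pa", "do", "ti"]]
stream = [syl for _ in range(200) for syl in random.choice(words)]

pair_counts = Counter(zip(stream, stream[1:]))   # how often syllable B follows A
base_counts = Counter(stream[:-1])               # how often A occurs (non-finally)

# Transitional probability P(B | A) = count(A followed by B) / count(A)
tp = {(a, b): n / base_counts[a] for (a, b), n in pair_counts.items()}

for (a, b), p in sorted(tp.items(), key=lambda x: -x[1]):
    print(f"P({b} | {a}) = {p:.2f}")

# Syllable pairs inside a "word" (tu->pi, pi->ro, ...) come out at 1.0, while
# pairs spanning a word boundary (ro->go, bu->pa, ...) hover around 0.25 -
# counting alone reveals where the units are.
```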

Our collective discovery of the (marvelous) “ChatGPT” chatbot by OpenAI went through several stages. First, there was the utter astonishment and adventure as we learnt what the tool can do. ChatGPT answers all sorts of questions in a satisfying manner, and, if you ask it to, in the style of William Shakespeare, Charles Bukowski, or Trent Reznor. It handles follow-ups and references to previous answers. It can write your code for data analysis, or produce a choose-your-own-adventure game. It can answer an e-mail from your student, or be your writing partner when preparing an article.

Then followed the demonstrations of its stupidity: how bad its jokes and poems are, or how it can’t explain the solutions to riddles. How pathetically servile it is, retracting a correct statement only because you asked it to and replacing it with any nonsense you suggest. Linguists discussed how it cannot interpret garden path sentences like “The horse raced past the barn fell”, and how it doesn’t understand what a verb is. And finally, discussions about how the technology will change our lives, and how we can embrace it in the right way, are becoming more concrete.

It’s funny how this process of understanding ChatGPT unfolded over weeks of interaction with the tool, even though its creators (and the bot itself) were, from the start, very clear about what it is: a sophisticated algorithm which uses statistical properties of language to estimate which text to generate as the appropriate answer to a request. ChatGPT has no understanding of the world, or even of a single word; it has no sensory experiences and no personality. Most of us understand that.
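
The core of that description can be sketched with a toy model. The snippet below is my own illustration (ChatGPT itself is a large transformer network, not a word-pair table, but the underlying idea is the same): estimate from a corpus which word tends to follow which, then generate text by sampling from those statistics.

```python
import random
from collections import Counter, defaultdict

# A tiny toy corpus standing in for the web-scale text ChatGPT was trained on.
corpus = (
    "colorless green ideas sleep furiously . "
    "green ideas are everywhere . "
    "the ideas sleep . "
    "the horse raced past the barn ."
).split()

# Count, for every word, which words follow it and how often.
followers = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    followers[a][b] += 1

def generate(start, n=8):
    """Emit up to n further words by repeatedly sampling the next word in
    proportion to how often it followed the current one in the corpus."""
    out = [start]
    for _ in range(n):
        options = followers.get(out[-1])
        if not options:
            break
        next_words, counts = zip(*options.items())
        out.append(random.choices(next_words, weights=counts)[0])
    return " ".join(out)

print(generate("green"))
print(generate("the"))
```

The real system conditions on far more context and learns far richer representations, but it is still, at bottom, estimating what text should come next.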

Now Ian Roberts, Jeffrey Watumull and Noam Chomsky have written an opinion essay in the New York Times titled “The False Promise of ChatGPT”, ultimately making that very point - that we cannot expect ChatGPT and similar systems to be human-like, or even intelligent in any meaningful sense. The response has been mixed, ranging from full or partial agreement (here, Gary Marcus’ writeup) to the quite antagonistic (which, sadly, is not unusual. I am certainly not defending Chomsky in that regard; he has made his own contributions to several “Linguistics Wars”).

I won’t go into all the details here; I want to focus on the aspect of statistical learning, which links to decades-old and ongoing debates. Language can be described both as generated by rules and as complex statistical patterns, which is of course why we are having these arguments. Here are the teams whose members have fought many battles:

Loving ChatGPT’s take on the sentence “Colorless green ideas sleep furiously.”, which Chomsky used to suggest separation between grammatical structure and meaning.

Team Generativism: Chomsky and many colleagues’ famous position, which was the mainstream for some decades, is that humans have genetically determined grammatical knowledge which helps us master our native language(s) despite insufficient input (“poverty of stimulus”). Importantly, statistical learning generally plays no significant role in these theories. All language use relies on rule-based, hierarchical representations. If the theory is correct, ChatGPT has nothing to do with human language processing.

Team Connectionism / Team Usage-based Construction Grammar: Sensitivity to statistical patterns plays a huge role in associating words and phrases with meaning, and in determining which units can appear together. The connections between units that frequently appear together become strengthened, which links to Hebbian learning, long-term potentiation (“what fires together, wires together”) and similar concepts in neuroscience. Our language experience is sufficient for picking up all the important patterns. If this is true, well, ChatGPT can explain it itself (prompt: “Write an article arguing that you process language like a human.”)

“[M]y processing capabilities are designed to emulate the way humans understand and generate language. One of the key ways that I process language like a human is through the use of neural networks [which are] modeled after the structure of the human brain […] . Another way that I process language like a human is through the use of probability distributions. When generating language, I take into account the likelihood of certain words or phrases following others in a given context. This allows me to generate responses that are coherent and natural-sounding, just like a human might do. […] In conclusion, while I may not be a conscious being like a human, I process language in a way that is designed to emulate the way that humans understand and generate language.”

- ChatGPT

(edited because ChatGPT is dumb and produced a lot of redundant statements)
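
Staying with the connectionist picture for a moment, the Hebbian strengthening mentioned above can be sketched in a few lines. This is a toy illustration of the principle, not a neuroscientific model: units that are co-activated in the same experience get the connection between them strengthened, so frequent co-occurrence alone builds strong associations.

```python
import itertools

# Toy "lexicon" of units and a zero-initialised connection matrix.
units = ["coffee", "cup", "hot", "dog"]
w = {(a, b): 0.0 for a in units for b in units if a != b}

# Simple Hebbian update: whenever two units are co-activated in an experience,
# strengthen the connection between them by a small learning rate.
experiences = [
    {"coffee", "cup"}, {"coffee", "hot"}, {"coffee", "cup"},
    {"hot", "dog"}, {"coffee", "cup", "hot"},
]
rate = 0.1
for episode in experiences:
    for a, b in itertools.permutations(episode, 2):
        w[(a, b)] += rate

for (a, b), strength in sorted(w.items(), key=lambda x: -x[1]):
    if strength > 0:
        print(f"{a} -> {b}: {strength:.1f}")

# "coffee" and "cup" were co-activated most often, so their link ends up
# strongest: frequency of co-occurrence alone has shaped the associations.
```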

Here is my experience with this debate: in the lab, it is very, very easy to demonstrate statistical learning. We pick up patterns immediately, no matter whether the stimuli are visual, auditory, or tactile, and regardless of age (i.e. infants can do it). We also do it without instruction. In comparison, it is devilishly hard to get participants to learn rules governing hierarchical structures; that requires a lot more feedback and other guidance. Given how important language is to us, I’d be very surprised if we didn’t use the quick solution where possible. While this wouldn’t mean that hierarchical, more rule-like representations don’t exist, their role would be smaller than generative theories assume.

Importantly, theories incorporating statistical learning explain, and even predict, some patterns of language use that pose serious challenges to rule-based generativist theories. This is true for language acquisition, but also (if I may plug my own work) for patterns of language impairment caused by brain damage, such as in people with aphasia or dementia.

As for the comparison with AI, we speak the way we do after some years of maturation and cognitive-sensory experience. ChatGPT speaks the way it does because it has been trained on a language corpus containing more than 10 billion word tokens, which, based on estimates by Harald Baayen and others, is more than a human will encounter in their lifetime. Regardless of what your theory of language acquisition is, ChatGPT is a pretty brute-force solution. However, while AI systems have no insight, they have fascinating breadth. I cannot write your Python code, or list all the moons of Jupiter. I’d be stumped if you asked me to list and summarize works of classical Egyptian literature. ChatGPT can do all of these things. Many of the billions of words in its training corpus are there to enable it to spew out almost encyclopedic knowledge.
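
A quick back-of-the-envelope comparison makes the scale of that gap visible; the figures below are my own illustrative assumptions, not actual published estimates.

```python
# Back-of-the-envelope only: illustrative assumptions, not published estimates.
words_per_day = 30_000            # generous guess for speech heard plus text read
years = 70
lifetime_words = words_per_day * 365 * years   # ~766 million words

training_tokens = 10_000_000_000  # "more than 10 billion", as stated above

print(f"lifetime exposure ~ {lifetime_words:,} words")
print(f"training corpus   > {training_tokens:,} tokens")
print(f"ratio             > {training_tokens / lifetime_words:.0f}x")  # > 13x
```

Even with a generous guess at daily exposure, the corpus is more than an order of magnitude larger than what a person could take in over a lifetime.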

To understand how close AI solutions are to nature, it would be useful to know how much training and computational power ChatGPT and other systems need to chat like one normal human, with the normal limitations of one person’s knowledge.

Are there generative (“Chomskyan”) alternatives to AI? I don’t think so. As far as I know, such attempts were mostly abandoned in favour of big-data solutions, of which ChatGPT (and others) are the latest iterations. If AI is a testing ground for cognitive theory, statistical learning theories are ahead.

As for the current gap between AI and humans, we need to consider the whole range of human features completely unavailable to ChatGPT. We don’t know whether statistical learning would give rise to something we’d identify as intuition, or creativity, if these systems had sensory experiences, survival needs, communities and reward systems closer to our own. Consider intention as a primary motivator for communication: some years ago, in a conversation with an engineer from Amazon, it occurred to me that an AI will never be human-like if all it wants is for you to watch ads or buy products. As Adele Goldberg tweeted, ChatGPT may be ahead of previous systems because it tries to be relevant and cooperative. We consider these human motivations. What if we added more?

I tried.