Ulrike Hahn. “Stochastic parrot” is a misleading metaphor for LLMs. April 1, 2023.
https://write.as/ulrikehahn/stochastic- ... r-for-llms. Hahn writes:
> Metaphors are hugely important both to how we think about things and how we structure debate, as a long research tradition within cognitive science attests [1]. Metaphors, as tools, can make us think better about an issue, but they can also lead us astray, depending on what relevant characteristics metaphors make clear and what they obscure. The notion that large language models (LLMs) are, in effect, “stochastic parrots” currently plays a central role in debate on LLMs. What follows are my thoughts on ways in which the metaphor is (now) creating confusion and hindering progress.
> ... what is the stochastic parrot metaphor? According to Bender and colleagues (2021),
>
> > Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that.
>
> In short, LLMs lack, either partly or wholly, “situationally embedded meaning”. In line with this, I take the phrase “stochastic parrot” to make salient three main things. Like the ‘speech’ of a parrot, the output of LLMs 1) involves repetition without understanding, 2) albeit with some probabilistic, generative component, and, in that, it is very much 3) unlike what humans do or produce.
> ... the metaphor seems both useful and undoubtedly effective.
>
> Beyond that, however, I now see it giving rise to the following problems in the wider debate:
> 1. confusion between what’s ‘in the head’ and ‘in the world’
> 2. a false sense of confidence regarding how LLMs and human cognition ‘work’
> 3. an illusion of explanatory depth
> 4. a misdirection of evaluative effort
> 5. a misdirection of discussion about risks and harms
Ad 1:
> ... when Polly the pet parrot says “Polly has a biscuit” that (in some sense) ‘means’ something, and it can be true or false, regardless of whether Polly *herself* has any idea whatsoever what those sounds she produces ‘mean’, let alone a concept of ‘language’, ‘meaning’, or ‘truth’.
>
> This follows simply from the fact that this aspect of meaning doesn’t rest on any single head, artificial or otherwise, but rather on the practice of a community. And whether “Polly has a biscuit” is true depends not on Polly’s grasp of human language, but on whether she actually has a biscuit.
> My pocket calculator doesn’t ‘have a grasp of meaning’. It doesn’t ‘understand’ that 2+2 = 4. But that doesn’t stop it being useful. That utility ultimately rests on there being a semantic mapping somewhere; the calculator would be of no use if 2 and 4 didn’t ‘mean’ something, at least to me. But that doesn’t require that mapping to be internal to the calculator or in any way accessible to it. It simply isn’t something the calculator itself has to ‘know’.
Ad 2:
> The framework of cognitive science, which tries to understand human thought as ‘computation’ or information processing, itself exists as a discipline precisely because we *do not* understand (fully) how human language or thought actually work.
> ... we very much don’t know how LLMs ‘work’ either.
> Neither ‘just repetition’ nor ‘next token prediction’ suffices to explain the production of a made-up reference to a fictitious author, with a fictitious title, complete with fake DOI, as this reference was never *in the input* nor (for the same reason) actually ever “the most likely next token” in any straightforward way.
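To make the ‘next token prediction’ point concrete, here is a minimal toy sketch (an illustration added here, not something from Hahn's post): a character-level bigram model is fitted on a few DOI-like strings and then sampled from. The corpus and all names in the code are invented for illustration; the point is only that even this trivially simple next-token sampler routinely emits strings that never occurred in its training data, so "it just predicts the next token" does not, by itself, settle what such a system will or will not produce.

```python
# Toy illustration (assumed, not from the original post): a character-level
# bigram "language model" trained on a few DOI-like strings, then sampled from.
import random
from collections import defaultdict, Counter

# Tiny illustrative training corpus of DOI-like strings.
corpus = "doi:10.1000/182 doi:10.1093/ajae/aaq063 doi:10.1126/science.169.3946.635 "

# Count, for each character, which characters follow it in the corpus.
following = defaultdict(Counter)
for current_char, next_char in zip(corpus, corpus[1:]):
    following[current_char][next_char] += 1

def sample_next(char, rng):
    """Draw the next character in proportion to how often it followed `char`."""
    counts = following[char]
    return rng.choices(list(counts.keys()), weights=list(counts.values()), k=1)[0]

def generate(seed="d", length=30, rng=None):
    """Autoregressively extend `seed` one character at a time."""
    rng = rng or random.Random(0)
    out = seed
    while len(out) < length:
        out += sample_next(out[-1], rng)
    return out

fake = generate()
print(fake)            # a plausible-looking but invented DOI-like string
print(fake in corpus)  # typically False: the output never appeared in the training data
```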
Ad 3:
> ... not a single computational system we built in the past has, arguably, had “access to situationally embedded meaning” in the sense Bender et al. described above. This includes any simple script or computer programme I have ever written and run (functional or not), through the basic computational devices such as a pocket calculator, through a wide range of now essential systems such as electronic databases, on to computational systems that, by whatever design approach, manage to far exceed aspects of human performance...
> ... it seems hard to sustain the notion... explains anything specifically about LLMs themselves either.
Ad 4:
> ... falsely, in my view, suggests that we know something about the possible behaviour of such systems, without having to look in any detail at their actual behaviour and performance.
> To understand what LLMs can do, what they can do well, and where they fail, we have to look at and evaluate the behaviour of actual systems. Those capabilities are an empirical question. They vary across LLMs to date, and those varying capabilities in turn determine what useful functions such systems could perform. None of that work can be short circuited by an in principle consideration...
Ad 5:
> If one takes the lack of ‘situationally embedded meaning’ to fundamentally restrict what a computational system can do, then it might also make sense to take that fact to limit what harms such a system could do now or in future.
>
> It should, by now, be clear that ‘lack of situationally embedded meaning’ patently does not (in my view) sufficiently restrict function for that argument to go through.
> ... there is an inductive argument to be made that there is additional cause for concern, beyond present risks and harms, based on the empirically observed rapid improvement of performance as a function of increase in scale in language models to date...
Conclusions:
> Whether one cares more about current, or more about potential future problems, both or neither, is a value judgment. The extent to which absence of situationally embedded meaning restricts future performance, and hence risk, by contrast, is a causal, empirical claim.
>
> It is an empirical issue what LLMs can do, and it is an empirical issue how they (or human beings) actually work, and what role situationally embedded meaning might play in that. The ‘stochastic parrots’ metaphor conveys something about an otherwise complex and opaque bit of technology, and to that extent, it has been helpful.
>
> But my impression is that it is now a red herring that misleads and distracts. It blocks and derails conversation unintentionally by pointing our thoughts in the wrong direction if we care about how these systems work and what they can do. Even worse, I think it now also functions to block conversation intentionally, with increasingly exasperated restatements (i.e., “they are just stochastic parrots” — why don’t you get that?).
>
> I think our discourse around LLMs would improve if we shifted our focus. So I would suggest that we put the metaphor to rest, at least for a bit.
References
[1] Lakoff, G., & Johnson, M. (2008). Metaphors we live by. University of Chicago Press.
[2] Labov, W. (1973). The boundaries of words and their meanings. In New ways of analyzing variation in English.
[3] Putnam, H. (1975). The meaning of “meaning”. Mind, language, and reality: Philosophical papers, Vol. 2, 215–271.
[4] Katz, D. M., Bommarito, M. J., Gao, S., & Arredondo, P. (2023). GPT-4 passes the bar exam. Available at SSRN:
https://ssrn.com/abstract=4389233
[5] Bowman, S. (2023). Eight things to know about language models.
https://cims.nyu.edu/~sbowman/eightthings.pdf