BLOG@CACM
Architecture and Hardware

Why Are the Critical Value and Emergent Behavior of Large Language Models (LLMs) Fake?

Posted by Mario Antoine Aoun

Why there are no emergent properties in Large Language Models.

We heard a lot about emergent properties of Large Language Models (LLMs) last year. I will share my thoughts, and those of some other scientists, on why there are no emergent properties and, especially, why the assumed critical value that these so-called emergent properties rest upon is not substantial.

The excitement about emergent properties started with a paper by Wei et al. [1], in which the authors show that scaling LLMs beyond a specific size (which they claim is critical) makes the system exhibit unexpected behavior: unexpected in that the model was not thought capable of it, such as ‘doing’ arithmetic. In support of their claim, the graphs the authors provide display a sharp jump in the LLM’s accuracy. The problem with their demonstration is the following: they use logarithmic charts in which the x axis represents the number of weights (i.e., the parameters of the neural network of the LLM in use) and is divided into equally spaced units 10^1, 10^2, 10^3, …, 10^10, 10^11. The sharp jump occurs between 10^10 and 10^11 on the chart. But this single-unit shift from 10^10 to 10^11 is in fact a multiplication of 10 billion by 10, which means an increase (i.e., shift) of 90 billion parameters! Such a representation should have been drawn on a linear scale to avoid any misunderstanding of the rate of change of the system’s behavior. If we redraw the graph in [1] on a linear scale, the rate of change appears almost constant [2]. The system then appears to evolve normally, as expected, and 10^10 no longer looks like a critical, alarming boundary.

Besides, expanding the system by 90 billion weights means supporting it with far more training data than an increase from 1,000 parameters to 10,000 (i.e., an increase of 9K) or from 10,000 to 100,000 (i.e., an increase of 90K) would require. For example, the LLM was able to pretend to do addition of two numbers (considered an emergent behavior) by giving the result of the addition because it had seen similar addition operations, and/or sentences containing addition operations and their results, once its training data had grown enormously. A counterexample would be giving it very large, complicated numbers (e.g., 126541478975317 + 97631257998631): it will not give the correct result, because there is little chance that these numbers exist in its training data, however huge, since such numbers are essentially unique and their occurrence is extremely rare, or even impossible, despite a very large corpus.
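As an aside on the scale argument above, the short Python sketch below plots the same smoothly growing accuracy curve on a logarithmic and on a linear x axis. The accuracy values are hypothetical, invented purely for illustration; they are not measurements from [1] or [2].

# A minimal sketch (not the analysis from [1] or [2]): the accuracy values
# are hypothetical, chosen only to illustrate the plotting argument.
import numpy as np
import matplotlib.pyplot as plt

# Model sizes at the equally spaced tick marks of a log-scale x axis.
params = np.array([10.0**k for k in range(1, 12)])  # 10^1 .. 10^11

# Hypothetical accuracy that grows smoothly with the absolute number of
# parameters (illustration only; not data from any real LLM).
accuracy = params / params.max()

fig, (ax_log, ax_lin) = plt.subplots(1, 2, figsize=(10, 4))

# Log-scale x axis: each tick is 10x the previous one, so the step from
# 10^10 to 10^11 (an increase of 90 billion parameters) occupies the same
# width as the step from 10^1 to 10^2 (an increase of 90 parameters),
# making the last step look like a sudden jump.
ax_log.plot(params, accuracy, marker="o")
ax_log.set_xscale("log")
ax_log.set_title("Log-scale x axis: looks like a sharp jump")
ax_log.set_xlabel("parameters (log scale)")
ax_log.set_ylabel("accuracy (hypothetical)")

# Linear x axis: the very same points show a roughly constant rate of change.
ax_lin.plot(params, accuracy, marker="o")
ax_lin.set_title("Linear x axis: no critical threshold")
ax_lin.set_xlabel("parameters (linear scale)")
ax_lin.set_ylabel("accuracy (hypothetical)")

plt.tight_layout()
plt.show()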

One could easily assume that, in the near future, the problem of adding two huge numbers will be solved in LLMs, for instance by lexically catching the occurrence of two numbers in the input, passing them to a ‘cognitive’ agency software module connected to the LLM that performs basic logical and arithmetic operations, and relaying the result back to the LLM; a minimal sketch of this idea follows below. However, this would be called “implemented behavior,” not “emergent behavior.”
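The sketch below illustrates this routing idea under stated assumptions: the names (route_prompt, call_llm) and the regular expression are hypothetical and are not part of any real LLM API; call_llm is only a placeholder.

# A minimal sketch of the "implemented behavior" idea described above.
# route_prompt, call_llm, and the regex are hypothetical illustrations,
# not part of any real LLM framework.
import re

ADDITION_PATTERN = re.compile(r"(\d+)\s*\+\s*(\d+)")

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an actual language model."""
    return f"[LLM response to: {prompt!r}]"

def route_prompt(prompt: str) -> str:
    """Lexically catch an addition expression and hand it to exact
    arithmetic instead of the model; everything else goes to the LLM."""
    match = ADDITION_PATTERN.search(prompt)
    if match:
        a, b = (int(g) for g in match.groups())
        return str(a + b)  # exact result from the calculator module
    return call_llm(prompt)

print(route_prompt("What is 126541478975317 + 97631257998631?"))
# -> 224172736973948, computed by the module, not "emerging" from the LLM

The correct answer comes from the attached module doing ordinary arithmetic, which is precisely why such a capability would be implemented rather than emergent.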

Last but not least, I add the following remarks from [3, 4] in support of my argument. In a series of more than 1,000 experiments, the authors of [3] found no evidence of emergent reasoning abilities in LLMs, and the authors of [4] argue that the metrics used to evaluate LLMs are the source of the emergence claim.

Finally, as a small parenthesis about AI-based program generation as emergent behavior: I am not saying that an AI system left to keep analyzing and generating new programs will never write a magnificent piece of code; in a slight and ironic comparison to the Infinite Monkey Theorem, one day it will.

References

1. Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., … & Fedus, W. (2022). Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.

2. Carter, D. (2023). There are no “emergent abilities” in LLMs. Better Programming. https://betterprogramming.pub/there-are-no-emergent-abilities-in-llms-2bb42e17ce7e (retrieved January 23, 2024).

3. Lu, S., Bigoulaeva, I., Sachdeva, R., Madabushi, H. T., & Gurevych, I. (2023). Are emergent abilities in Large Language Models just in-context learning? arXiv preprint arXiv:2309.01809.

4. Schaeffer, R., Miranda, B., & Koyejo, S. (2023). Are emergent abilities of Large Language Models a mirage? arXiv preprint arXiv:2304.15004.

Mario Antoine Aoun is an ACM Professional member who has been a Reviewer for ACM Computing Reviews since 2006. He has more than 25 years of computer programming experience and holds a Ph.D. in Cognitive Informatics from the Université du Québec à Montréal. His main research interest is memory modelling based on chaos theory and spiking neurons.

©2024 ACM 0001-0782/24/1
