Animal, vegetable, or mineral? For readers outside American culture: That question is the common start of the game 20 Questions, in which the objective is to guess a target object. Those three categories, in which the last contains objects such as "your car," "this spoon," and maybe "the Laramide Orogeny," circumscribe the search space for the next 19 questions. You can look up this game on Wikipedia [20Qs]. A common second question, "Is it bigger than a breadbox?", is often more useful than it sounds, although most generations now need to be shown the size of a breadbox. Yet modern science and ontology render these conceptions specious; no one thinks that the universe is somehow synthesized from base features such as animal/vegetable/mineral and size-relative-to-breadbox.
Although the wild discourse on modern AI and computing troubles me, seldom do I find an actual claim to contradict. So I thank Matt Welsh [Welsh2023], who states in the January 2023 CACM that "nobody actually understands how large AI models work," a point he reiterates in the Letters to the Editor of the July 2023 issue. See also Beer on the BBC [Beer2023], and... many others. Although current work examines explainability, the units are standard statistical measures or decision paths or saliency maps [Balasubramanian2022, Buckner2019], all of which require propositional treatment of extracted discrete features, which seems to beg Welsh's question, sidestepping an "understanding" of the artificial neural network that emerges from the simulation of annealing.
Let me offer a critique from a different angle, posing this question: What would it mean to actually understand what large AI models do? Let's focus on the adjustment of weights on network nodes that seem to be representations of something, where we're troubled because we can't tell what they represent. Consider a simple example of an image classifier for recognizing a dog, showing impressive success after training. We're puzzled because the nodes don't seem to obviously correspond to the features that we ourselves pinpoint as determining dog-ness—a wet nose, ears, fur, and so forth.
But why would they?
So the search for meaningful feature nodes is misguided. Because we are the standard of perception and cognition, we seem to assume that features salient to us humans are those that are salient in reality. Maybe we are not the measure of all things. (Under a different paradigm, dog-ness would be recognized by DNA, not open to our perceptual apparatus.) There is no reason to think that image computation grows from some fundamental set of human-favored features, even though it starts with pixels that are close to what the human visual system perceives (as far as I know). Image recognition's many pesky spurious associations between background and subject simply emphasize this detachment from human concepts.
I wrote about a related phenomenon [Hill2017], the structure of word vectors built in systems such as word2vec, and why its vectors for "king," "male," and "female" can be arithmetically composed with a result close to its vector for "queen." The result is very intriguing, yet how could it be otherwise? Since the input data is artifactual, gleaned directly or indirectly from dictionary definitions, it's no surprise that the structures built reflect the structures input, to put it crudely. Our semantics is not universal, but parochial, in a closed system.
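The arithmetic in question is nothing more than vector addition and subtraction followed by a nearest-neighbor search. A minimal sketch, using invented four-dimensional toy vectors rather than real word2vec embeddings (which have hundreds of dimensions learned from corpora), shows the mechanics:

```python
import numpy as np

# Toy embeddings, invented for illustration only; real word2vec
# vectors are learned from text, not hand-assigned.
vectors = {
    "king":   np.array([0.9, 0.8, 0.1, 0.6]),
    "male":   np.array([0.0, 0.9, 0.0, 0.1]),
    "female": np.array([0.0, 0.1, 0.9, 0.1]),
    "queen":  np.array([0.9, 0.0, 1.0, 0.6]),
}

def nearest(query, vocab):
    """Return the word whose vector is most cosine-similar to the query."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vocab[w], query))

# king - male + female lands nearest to queen in this toy space
result = nearest(vectors["king"] - vectors["male"] + vectors["female"], vectors)
print(result)  # queen
```

The "meaning" here lives entirely in the relative positions of the vectors, which were themselves built from human-produced text; the arithmetic only recovers structure that the input already carried.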
This is not to say that AI in the form of deep learning produces no useful tools. This is to say that our expectation of understanding is flawed. When we wallow around in data from people, such as word vectors, we erroneously inflate our results to the reality of nature; when we process data from nature, such as lightwaves captured in images, we compress the results into expression in human terms. I trust that researchers are already investigating our understanding of this "understanding." The Stanford Encyclopedia of Philosophy calls for more attention to these issues [Buckner2019], noting "rich opportunities for philosophical research." I hope others join me in looking forward to that, and maybe contributing to it.
Meanwhile, we know exactly how deep learning works. It's not a black box. It's a complex data-gobbling number-crunching white box. We just can't force the results into our familiar concepts. When we try, and fail, it all starts to look mysterious.
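The white-box claim can be made concrete with a toy example: every weight update in training is explicit, inspectable arithmetic. The sketch below fits a single weight to the function y = 2x by gradient descent; the same mechanics, repeated across millions of weights, is all that deep learning does. (The numbers and learning rate are invented for illustration.)

```python
import numpy as np

# A minimal "white box": one weight, trained by gradient descent.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs          # target function the weight should learn

w = 0.0                # initial weight
lr = 0.01              # learning rate
for _ in range(200):
    pred = w * xs
    grad = 2 * np.mean((pred - ys) * xs)   # d(mean squared error)/dw
    w -= lr * grad                         # the entire "learning" step

print(round(w, 3))     # w converges toward 2.0
```

Nothing in the loop is opaque; what resists us is not the mechanism but the translation of the resulting weights into our familiar concepts.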
[20Qs] Wikipedia contributors. 2023. Twenty questions. In Wikipedia, The Free Encyclopedia. Retrieved July 28, 2023.
[Balasubramanian2022] Vineeth N. Balasubramanian. 2022. Toward explainable deep learning. Commun. ACM 65, 11 (November 2022), 68–69.
[Beer2023] David Beer. 2023. Why humans will never understand AI. BBC Future, Machine Minds. 7 April 2023.
[Buckner2019] Cameron Buckner and James Garson. Connectionism. The Stanford Encyclopedia of Philosophy (Fall 2019 Edition), Edward N. Zalta (ed.).
[Hill2017] Robin K. Hill. 2017. Deep Dictionary. Blog@CACM. June 20, 2017.
[Welsh2023] Matt Welsh. 2023. The End of Programming. Commun. ACM 66, 1 (January 2023), 34–35.
Robin K. Hill is a lecturer in the Department of Computer Science and an affiliate of both the Department of Philosophy and Religious Studies and the Wyoming Institute for Humanities Research at the University of Wyoming. She has been a member of ACM since 1978.