Animal, vegetable, or mineral? For readers outside American culture: That question is the common start of the game 20 Questions, in which the objective is to guess a target object. Those three categories, in which the last contains objects such as "your car," "this spoon," and maybe "the Laramide Orogeny," circumscribe the search space for the next 19 questions. You can look up this game on Wikipedia [20Qs]. A common second question, "Is it bigger than a breadbox?", is often more useful than it sounds, although most generations now need to be shown the size of a breadbox. Yet modern science and ontology render these conceptions specious; no one thinks that the universe is somehow synthesized from base features such as animal/vegetable/mineral and size-relative-to-breadbox.
Although the wild discourse on modern AI and computing troubles me, seldom do I find an actual claim to contradict. So I thank Matt Welsh [Welsh2023], who states in the January 2023 CACM that "nobody actually understands how large AI models work," a point he reiterates in the Letters to the Editor of the July 2023 issue. See also Beer on the BBC [Beer2023], and... many others. Although current work examines explainability, the units are standard statistical measures or decision paths or saliency maps [Balasubramanian2022, Buckner2019], all of which require propositional treatment of extracted discrete features, which seems to beg Welsh's question, sidestepping an "understanding" of the artificial neural network that emerges from the simulation of annealing.
Let me offer a critique from a different angle, posing this question: What would it mean to actually understand what large AI models do? Let's focus on the adjustment of weights on network nodes that seem to be representations of something, where we're troubled because we can't tell what they represent. Consider a simple example of an image classifier for recognizing a dog, showing impressive success after training. We're puzzled because the nodes don't seem to obviously correspond to the features that we ourselves pinpoint as determining dog-ness—a wet nose, ears, fur, and so forth.
But why would they?
So the search for meaningful feature nodes is misguided. Because we are the standard of perception and cognition, we seem to assume that features salient to us humans are those that are salient in reality. Maybe we are not the measure of all things. (Under a different paradigm, dog-ness would be recognized by DNA, not open to our perceptual apparatus.) There is no reason to think that image computation grows from some fundamental set of human-favored features, even though it starts with pixels that are close to what the human visual system perceives (as far as I know). Image recognition's many pesky spurious associations between background and subject simply emphasize this detachment from human concepts.
I wrote about a related phenomenon [Hill2017], the structure of word vectors built in systems such as word2vec, and why its vectors for "king," "male," and "female" can be arithmetically composed with a result close to its vector for "queen." The result is very intriguing, yet how could it be otherwise? Since the input data is artifactual, gleaned directly or indirectly from dictionary definitions, it's no surprise that the structures built reflect the structures input, to put it crudely. Our semantics is not universal, but parochial, in a closed system.
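The arithmetic in question is nothing more than vector addition and subtraction followed by a nearest-neighbor search. A minimal sketch, using invented four-dimensional toy vectors rather than real word2vec embeddings (which have hundreds of dimensions learned from corpora), shows the mechanics:

```python
import numpy as np

# Toy embeddings, invented for illustration only; real word2vec
# vectors are learned from text, not hand-assigned.
vectors = {
    "king":   np.array([0.9, 0.8, 0.1, 0.6]),
    "male":   np.array([0.0, 0.9, 0.0, 0.1]),
    "female": np.array([0.0, 0.1, 0.9, 0.1]),
    "queen":  np.array([0.9, 0.0, 1.0, 0.6]),
}

def nearest(query, vocab):
    """Return the word whose vector is most cosine-similar to the query."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vocab[w], query))

# king - male + female lands nearest to queen in this toy space
result = nearest(vectors["king"] - vectors["male"] + vectors["female"], vectors)
print(result)  # queen
```

The "meaning" here lives entirely in the relative positions of the vectors, which were themselves built from human-produced text; the arithmetic only recovers structure that the input already carried.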
This is not to say that AI in the form of deep learning produces no useful tools. This is to say that our expectation of understanding is flawed. When we wallow around in data from people, such as word vectors, we erroneously inflate our results to the reality of nature; when we process data from nature, such as lightwaves captured in images, we compress the results into expression in human terms. I trust that researchers are already investigating our understanding of this "understanding." The Stanford Encyclopedia of Philosophy calls for more attention to these issues [Buckner2019], noting "rich opportunities for philosophical research." I hope others join me in looking forward to that, and maybe contributing to it.
Meanwhile, we know exactly how deep learning works. It's not a black box. It's a complex data-gobbling number-crunching white box. We just can't force the results into our familiar concepts. When we try, and fail, it all starts to look mysterious.
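The white-box claim can be made concrete with a toy example: every weight update in training is explicit, inspectable arithmetic. The sketch below fits a single weight to the function y = 2x by gradient descent; the same mechanics, repeated across millions of weights, is all that deep learning does. (The numbers and learning rate are invented for illustration.)

```python
import numpy as np

# A minimal "white box": one weight, trained by gradient descent.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs          # target function the weight should learn

w = 0.0                # initial weight
lr = 0.01              # learning rate
for _ in range(200):
    pred = w * xs
    grad = 2 * np.mean((pred - ys) * xs)   # d(mean squared error)/dw
    w -= lr * grad                         # the entire "learning" step

print(round(w, 3))     # w converges toward 2.0
```

Nothing in the loop is opaque; what resists us is not the mechanism but the translation of the resulting weights into our familiar concepts.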
[20Qs] Wikipedia contributors. 2023. Twenty questions. In Wikipedia, The Free Encyclopedia. Retrieved July 28, 2023.
[Balasubramanian2022] Vineeth N. Balasubramanian. 2022. Toward explainable deep learning. Commun. ACM 65, 11 (November 2022), 68–69.
[Beer2023] David Beer. 2023. Why humans will never understand AI. BBC Future, Machine Minds. 7 April 2023.
[Buckner2019] Cameron Buckner and James Garson. Connectionism. The Stanford Encyclopedia of Philosophy (Fall 2019 Edition), Edward N. Zalta (ed.).
[Hill2017] Robin K. Hill. 2017. Deep Dictionary. Blog@CACM. June 20, 2017.
[Welsh2023] Matt Welsh. 2023. The End of Programming. Commun. ACM 66, 1 (January 2023), 34–35.
Robin K. Hill is a lecturer in the Department of Computer Science and an affiliate of both the Department of Philosophy and Religious Studies and the Wyoming Institute for Humanities Research at the University of Wyoming. She has been a member of ACM since 1978.