
Echoes of Intelligence

Textual interpretation and large language models.


What counts is not what people actually know, but what people believe that everyone knows, and which is thus taken as a common background.
    — Patrizia Violi

The rising popularity of AI systems over the past few months is remarkable. Large language models (LLMs), once little more than curiosities confined to AI labs or the subject of research papers, have now been deployed by corporations and made available for public consumption in the form of various chat-like applications.

In many cases, users of these AI-powered applications are greeted by an easy-to-use interface that lets them send prompts to the LLM to generate a response. The quality of the text produced by these recent models is impressive compared with past attempts; often it’s almost impossible to tell whether the result was written by a human or generated by a chat app. This has led the tech sector, and the public in general, to speculate about possible uses of AI—from generating poetry, fiction, and restaurant recommendations, to extracting questions and answers from a text corpus.

It also raises the question: How does an LLM work?

An LLM can function, generically, as a “plausibility machine”: it considers some input sent by the user—the prompt—and generates the text that would most probably follow. (For a deep dive into the technical aspects of how an LLM works, see Stephen Wolfram’s “What Is ChatGPT Doing … and Why Does It Work?”18)
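
To make the idea of a plausibility machine concrete, here is a minimal, purely illustrative sketch of next-token generation. The tiny probability table and two-token context window are invented for illustration; a real LLM derives these distributions from billions of learned parameters over a vocabulary of tens of thousands of tokens.

```python
import random

# Toy table of next-token probabilities. Real models compute these
# with a neural network; this hand-written table is only a prop.
NEXT_TOKEN_PROBS = {
    ("once", "upon"): {"a": 0.95, "the": 0.05},
    ("upon", "a"): {"time": 0.9, "midnight": 0.1},
    ("a", "time"): {"there": 0.7, ",": 0.3},
}

def generate(prompt, steps=3):
    """Extend the prompt with the tokens that would most plausibly follow."""
    tokens = list(prompt)
    for _ in range(steps):
        context = tuple(tokens[-2:])  # a tiny two-token context window
        dist = NEXT_TOKEN_PROBS.get(context)
        if dist is None:  # nothing plausible to say; stop
            break
        # Sample in proportion to plausibility rather than always taking
        # the most probable token, as LLMs do via a "temperature" setting.
        words, weights = zip(*dist.items())
        tokens.append(random.choices(words, weights=weights)[0])
    return " ".join(tokens)

print(generate(["once", "upon"]))  # e.g., "once upon a time there"
```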

Since the LLM is trained with signs alone, it cannot have a concept of understanding, as explained by Emily Bender et al. in “On the Dangers of Stochastic Parrots.”1 Meaning is beyond words and their syntactical arrangement. Meaning is communal, produced, and agreed upon by language users16—something to which the LLM has no access.

In the 2017 book Language in Our Brain, Angela Friederici explains: “… for each word there are two types of semantic representations: a linguistic-semantic representation on the one hand and a conceptual-semantic representation on the other hand.”10

Take, for example, the sentence: He picked a rose. Having a linguistic representation of a rose—a certain type of flower—is more than enough to process the sentence on a linguistic level. On a conceptual-semantic level, however, the human brain can link the word rose to its aroma, to the Valentine’s Day centerpiece on the restaurant table, or to the pain felt when trying to grab one from a garden, unaware of its thorns. It’s clear there is a second layer of meaning that goes beyond matching words to a dictionary.

Whereas modern apps demand attention, LLMs demand interpretation. When presented with information, people tend to try to assign it some meaning. This issue is presented by the Stochastic Parrots authors: “… the tendency of human interlocutors to impute meaning where there is none can mislead both NLP (natural language processing) researchers and the general public into taking synthetic text as meaningful.”1

Where and how does a human interlocutor impute meaning to synthetic text? Let’s echo literary theorist Terry Eagleton when he asks: What is involved in the act of reading?5


Of Dogs and Escalators

In Literary Theory: An Introduction, Eagleton proposes the following situation: Imagine you see a sign in the London Underground system that says: Dogs must be carried on the escalator. While the sentence might sound simple, Eagleton wonders:

  • Does it mean you must carry a dog on the escalator?
  • Are you going to be banned from the escalator unless you find a stray dog to carry?
  • Is “carried” meant to be taken metaphorically to help dogs get through life?
  • How do you know this is not a decoration?

Also, you are expected to understand:

  • The sign has been placed there by some authority.
  • “Escalator” means this escalator and not some escalator in Paraguay.
  • “Must be” means “must be now.”

This example illustrates how a simple sentence lends itself to multiple interpretations. Humans understand the multiple codes in place to perform the correct reading of the sign: If you bring a dog to the London Underground, carry it while you use the escalator.

This brings us to the idea of the two levels of interpretation, as described by Umberto Eco in The Limits of Interpretation.8 (Since this article borrows from Eco’s work, the key terms found there are italicized.)

On the first level, there is a semantic interpretation: the process whereby readers go through the linear manifestation of the text and fill it with meaning. On the second level, there is a critical interpretation, whose goal is to describe, from a metalinguistic point of view, the reasons why a text produces a certain response among its readers.

Let’s look at some of the codes used by readers to insert meaning into a text. (For the full detailed discussion, see The Role of the Reader.6)


The Role of the LLM Reader

Eco presents a framework in The Role of the Reader,6,9 explaining a series of codes readers use as they transform the text’s expression into content. These codes are built on top of the text itself. English, with its dictionary and its syntactic rules, is but one example of a code. Traffic signals, with their red, yellow, and green lights, are another code, used to signify who has the right of way at an intersection. The layout of a book—chapter title at the top of the page, text separated into paragraphs, footnotes at the bottom, page numbers above or below—is also a code, one humans have learned in order to read a book. Readers do not necessarily read the chapter title at every page turn, despite it appearing at the top of every odd page, because they understand the code presented by a book’s layout and typography.

When they see text such as, “Once upon a time, there was a young princess called Snow White. She was very pretty,” readers first use a simple dictionary understanding to detect the most basic properties of the words. For example, since Snow White is a princess, she’s probably a woman. Woman activates ideas like human, having certain body parts, and so on. At this stage, readers are unaware of which properties are relevant to comprehending the text unless they continue reading. Would it be important to know that a human body can get severely ill if it ingests some sort of potion?

Then there are rules of co-reference. In the Snow White example, readers can decide that the she mentioned in the second sentence refers to the princess from the first one. Again, none of these instructions are explicit in the text; the connections are made by the readers.

The next set of codes relates to contextual and circumstantial selections. When people understand that the escalator in the initial example refers to the escalator in the current Tube station, then, as Eco says, they are making a circumstantial selection that connects the act of utterance with the extraverbal environment. The same sign hung in a bedroom has a completely different meaning.

With contextual selections, readers are expected to go from a basic dictionary understanding of each word to that of an encyclopedia. While the word princess might appear in many contexts, readers are expected to understand that in a children’s story, a lot of information that pertains to princesses isn’t relevant to the story, unless the author explicitly makes it so. A real-world princess might be part of a monarchy, with all its implications, while a fairy-tale princess is not. More importantly, an encyclopedia moves interpretation from the matching rules offered by a code like the dictionary to a “system of possible inferences,” which introduces interpretative freedom.16
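
The distinction can be caricatured in code. In this invented sketch, the dictionary is a closed table of matching rules, while the encyclopedia is a graph of associations whose traversal yields an open-ended set of possible inferences; every entry is a placeholder, not a claim about any real lexicon.

```python
# Dictionary: a closed code of matching rules—one lookup, one result.
DICTIONARY = {"princess": "the daughter of a monarch"}

# Encyclopedia: a web of cultural associations (all entries invented).
ENCYCLOPEDIA = {
    "princess": ["woman", "fairy tale", "monarchy"],
    "woman": ["human"],
    "human": ["has organs", "can be poisoned"],
    "fairy tale": ["aimed at children", "magic"],
}

def possible_inferences(term, depth=3):
    """Walk the association graph, collecting what the term *might* entail."""
    inferred, frontier = set(), [term]
    for _ in range(depth):
        frontier = [link for t in frontier for link in ENCYCLOPEDIA.get(t, [])]
        inferred.update(frontier)
    return inferred

print(DICTIONARY["princess"])           # the single matching rule
print(possible_inferences("princess"))  # open-ended: 'can be poisoned' stays
                                        # latent until the plot needs it
```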

Thanks to the readers’ own encyclopedic competence, they might know what a princess could be, in the whole sense of the concept, but that is not necessarily what the text needs it to be. Everything the text doesn’t mention is left as a possibility that could be actualized later or be left as is.

As mentioned earlier, a princess might activate the idea of a woman, and therefore a human. Neither the text nor the author might tell readers which properties of being a human or a princess are relevant for the rest of the story—whether having organs or dressing in a particular manner—but because of encyclopedic knowledge, these properties remain latent from the moment they are read, and they might become relevant once the princess in the fairy tale is poisoned. Why would poison affect a fictional princess? Because fictional worlds are parasites of the real world: if alternate properties are not spelled out, readers assume those of the real world (that is, poison harms a princess).7

Now, you might ask: Why does the previous paragraph refer to a fairy tale? Nowhere in the Snow White example does the text explicitly talk about a children’s story. This brings us to the next code: rhetorical and stylistic overcoding.

In this scenario, “once upon a time” is a figure of speech that tells readers to expect a fictional account of events that don’t relate to the real world, and that this story is most likely targeted at kids, since that’s a literary convention of fairy tales. Many of those types of expressions in daily life help people contextualize the remainder of the text. Think of when someone addresses a group of people at the start of a speech as “Ladies and gentlemen,” regardless of the presence of ladies and gentlemen in the crowd, or if the speaker considers them such. The meaning of such expressions is taken from a code that interprets the figures of speech, instead of word by word.

Another form of literary convention is when a reader understands the I in many stories isn’t necessarily the empirical author of the book. When author Jorge Luis Borges starts a short story with: “My first recollection of Funes is quite clear, I see him at dusk, sometime in March or February of the year ’84. That year, my father had taken me to spend the summer at Fray Bentos,”3 readers know, or are expected to know, the “I” does not refer to Borges, even though it is not improbable for Borges to have traveled with his father to Fray Bentos, Uruguay. Also, due to literary conventions, the reader understands that him in the text refers to Funes, since usually a story speaks about the character for which it is named.6 You can see how readers are doing a good deal of work for the text to function. Let’s examine the last code—intertextuality.

Literary critic Julia Kristeva introduced the notion of intertextuality into European semiotics, writing that “any text is constructed as a mosaic of quotations; any text is the absorption and transformation of another.”12 Eco says that by performing inferences based on intertextual frames, readers bring other texts to bear on the one they are reading. For example, in Don Quixote, Miguel de Cervantes expected his readers to know the chivalric romances of his time so they would understand the irony of the adventures of his unlikely hero, Alonso Quijano.

Authors are not always as explicit as Cervantes; they may latch onto archetypes such as “rags to riches,” “voyage and return,” “the quest,” and others described by Christopher Booker in The Seven Basic Plots.2 A text is a dialogue between texts. (While literary theory usually concerns itself with books, exposure to different platforms—today more than ever—feeds this intertextual knowledge, from social media with its memes, to streamed TV shows, to more classical media such as newspapers.)


These are just a few of the codes employed when readers are confronted with the task of interpreting a text, so it’s no wonder humans find text produced by an LLM sensible. Besides knowledge of a basic dictionary and an understanding of the codes of textual coherence, humans have access to a semantic encyclopedia to match against the words of the produced text, and to a real world from which to borrow properties not spelled out in the synthetic text. Additionally, intertextual knowledge kicks in to recognize genre motifs, even letting readers predict how the text will develop.


Model Authors and Model Readers

One difference between text written by a human and synthetic text generated by an LLM is that the former is produced by an author with intentions. Whether it’s a serious essay or ironic prose on an Internet forum, authors have intentions, and these intentions condition the text they produce. From the language chosen to express the message, to the type of encyclopedic knowledge expected from readers, an author makes many decisions to match the statistical semantic characteristics of the message to the semantic capacities of its receivers.15

In The Role of the Reader, Eco refers to this ideal reader as the “Model Reader,” with the counterpart being the “Model Author.” These terms do not refer to the empirical author or reader, mind you; they are textual strategies employed by both, with the goal of a successful interpretation of the text. Eco presents these two concepts to describe the co-operation between empirical author and empirical reader. Because, as he puts it: “A text is a lazy (or economic) mechanism that lives on the surplus value of meaning introduced by the recipient …”

To return to the initial example of a dog on an escalator, based on the Model Author image, a reader knows that a sign hung on the wall of the London Underground, with a specific color and typography, must have been placed there by a certain authority that will have the capacity to enforce what the sign says.

When interpreting text produced by an LLM, however, who is the Model Author? What semantic and encyclopedic competence does this author have? Are there intentions behind the LLM’s generated text?

Some of these questions can be answered by examining how the models have been trained. In “Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus,” Dodge et al. discuss the text corpora companies use to train the LLMs deployed to the public via chat apps, search engines, and similar applications.4 They explain how a corpus of unfiltered English text goes through many editing passes, from filtering out data that doesn’t resemble English to removing documents containing words from a banned list. The authors note that this type of filtering “disproportionately removes documents in dialects of English associated with minority identities.” Additionally, filters remove documents that contain language deemed obscene. While some of these filters might be regarded as appropriate, it’s in the public interest to be aware of this filtering because, despite the current fascination with LLM-generated text, users need as many clues as possible to frame synthetic text in an adequate way.
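
As a rough illustration of the kind of pipeline Dodge et al. document, consider the sketch below. The blocklist, the crude language heuristic, and the threshold are all invented stand-ins; the point is that each filter silently shapes which documents, and whose language, end up in the model’s training encyclopedia.

```python
# Hypothetical corpus-cleaning pass in the spirit of the pipeline
# documented by Dodge et al.; word lists and thresholds are placeholders.
BANNED_WORDS = {"badword1", "badword2"}  # stand-in for a real blocklist

COMMON_ENGLISH = {"the", "of", "and", "to", "a", "in", "is", "it", "on"}

def looks_like_english(text, threshold=0.05):
    """Crude stand-in for a language classifier: the share of common words."""
    words = text.lower().split()
    if not words:
        return False
    return sum(w in COMMON_ENGLISH for w in words) / len(words) >= threshold

def keep_document(text):
    words = set(text.lower().split())
    # Blocklist filtering: note that it can also drop benign documents
    # written in dialects that reclaim or discuss the listed words.
    if words & BANNED_WORDS:
        return False
    return looks_like_english(text)

corpus = ["Dogs must be carried on the escalator.", "zzz qqq 404 ###"]
print([doc for doc in corpus if keep_document(doc)])  # only the first survives
```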

How does encyclopedic competence differ from semantic competence, and why does it matter for LLMs, their programmers, and their users? Patrizia Violi explains: “… there are facts that, when ignored, denote a scarce or insufficient cultural knowledge but do not have any consequence upon our linguistic ability, and that there are facts, the ignorance of which, demonstrates a lack of linguistic competence.”17

Semantic competence allows people to become users of a language, while encyclopedic competence indicates those users belong to a particular culture. The Spanish-speaking world offers a familiar example: even though they share a common dictionary, two speakers from different Latin American countries might understand each other well on a lexical level yet miss the meaning of certain words in a specific context.


Therefore, an encyclopedia is an intersubjective construct that helps define a culture. This intersubjective agreement regulates what things can possibly mean, but it is an agreement that must be verified from time to time.17 Since the encyclopedia regulates meaning, work like that of Dodge et al. becomes crucial, as it documents how LLMs build their encyclopedias.


What Game Are You Playing, LLM?

The Austrian philosopher Ludwig Wittgenstein introduced the idea of language games to describe the way people talk to each other. He posited that in the same way there are rules for playing a game of chess, utterances can be defined according to the rules that specify how they should be used.13 Explicit or not, every conversation seems to carry rules.

In The Postmodern Condition, Jean-François Lyotard explains that these rules, which are not necessarily explicit or known by the players, can break communication if they are modified or ignored.13 “See you tomorrow” means one thing between two friends after school, something else from a school principal to a student after a disciplinary talk, and something else again from a friend boarding a plane for a six-month exchange trip.

The first is a phatic expression (used to maintain social relationships), the second an order, and the last a goodbye joke. Whether the friends see each other the next day does not matter, but the principal will be concerned if the student does not show up at the office the next day. If you look at these exchanges as language games, you can see how they set up certain expectations for each player.

So, what are the rules of a language game played with an LLM? What are the intentions of the synthetic text? Should the reader put all their encyclopedic knowledge into play to help the synthetic text work? Without explicit rules, it appears humans will end up making their own, with a tendency to humanize the interlocutor, known as the ELIZA effect, named after the ELIZA chatbot created by Joseph Weizenbaum of MIT in 1966. Douglas Hofstadter defines it as “the susceptibility of people to read far more understanding than is warranted into strings of symbols—especially words—strung together by computers.”11


Building the LLM Reader

Mitchell et al. proposed “Model Cards” as a way to attach information to machine-learning models, documenting their training details, performance, and so on.14 In a similar fashion, applications that bring content produced by LLMs to the public should provide enough clues, by way of tags and other user-interface features, for users to understand the provenance of the information presented to them.
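
A minimal sketch of what attaching such a record to a model might look like follows. The field names loosely echo the sections Mitchell et al. propose; all values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Fields loosely echo the sections proposed by Mitchell et al."""
    model_name: str
    training_data: str                 # provenance of the corpus
    intended_use: str
    limitations: list = field(default_factory=list)
    evaluation_results: dict = field(default_factory=dict)

# All values below are hypothetical, for illustration only.
card = ModelCard(
    model_name="example-llm-v1",
    training_data="Filtered web crawl; see data sheet for filters applied",
    intended_use="Drafting and summarization, with human review",
    limitations=[
        "Training data frozen in 2021",
        "Blocklist filtering may underrepresent some dialects",
    ],
    evaluation_results={"example-benchmark": 0.87},
)
print(card.model_name, card.limitations)
```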

If a text is a “syntactic-semantic-pragmatic” device whose foreseen interpretation is part of its generative process, as Eco says,6 then applications that present LLM-generated text should aid the pragmatic aspect of interpretation.

Think about a book cover and how it helps a reader contextualize the text. Usually, a cover provides clues about the type of book: fiction or textbook, among others. The back cover helps identify the author and lets readers place the book in a time period.

A newspaper has a certain shape and typography, clearly indicating that much of its content is time sensitive. A similar identification stamp should accompany text generated by an LLM, to facilitate a fair interpretation of the generated text and to avoid, among other things, its being taken as an oracle-like response to the world.

While industry-accepted guidelines for LLMs and their applications do not yet exist, a good starting point would be to expect these types of applications to disclose the text corpora used to train the model. Additionally, details of the process used for reinforcement learning from human feedback (RLHF) should be known, such as the diversity of the human group that provided feedback, or which languages they speak.

At the risk of drawing a parallel between LLMs and humans, it would be beneficial for users to understand which encyclopedia underlies the Model Author they project onto an LLM. When you read a book, for example, you may project an author with certain knowledge and a collection of pieces the author has written before. Based on those expectations, you form a strategy that helps you understand the text and what the author might have meant. For example, it’s impossible for a 14th-century Italian to know about the American continent, so a reader would not expect Dante Alighieri to include it in his Divina Commedia. On the other hand, if a present-day Italian author claims there is nothing beyond the Atlantic Ocean, you might think it’s a joke. While these examples may sound contrived, they clearly show that before any interpretation effort, it is important to be aware of the freshness of the encyclopedia available to any author, let alone an LLM producing synthetic text.

There are many ways in which a text can help a reader build the necessary context for its interpretation. In the case of LLM-generated text, providing citations for the generated response, together with information labeling the external systems consulted, aids the pragmatic response to the text. Having an LLM generate a response to a specific question is not the same as having an LLM parse a user’s prompt as a question and then produce an answer by summarizing the articles returned by a Web search. In the first case, the answer is generated by the LLM—remember, an LLM generates the next most probable token.18 In the second case, it’s a summary of human-produced sources. The presentation and labeling of the text should be clear enough for the user to tell which is which.
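
One way to keep that distinction visible is to attach provenance metadata to every response before it reaches the interface, so the label travels with the text. The structure below is a hypothetical sketch, not any product’s actual API.

```python
from dataclasses import dataclass, field
from enum import Enum

class Provenance(Enum):
    MODEL_GENERATED = "generated by the LLM (next-token prediction)"
    WEB_SUMMARY = "LLM summary of retrieved, human-written sources"

@dataclass
class LabeledResponse:
    text: str
    provenance: Provenance
    citations: list = field(default_factory=list)  # URLs of consulted sources

    def render(self) -> str:
        """The provenance label and citations travel with the text."""
        refs = "".join(f"\n  source: {url}" for url in self.citations)
        return f"[{self.provenance.value}]\n{self.text}{refs}"

# Hypothetical usage: the same answer reads differently once labeled.
answer = LabeledResponse(
    text="Dogs must be carried while riding the escalator.",
    provenance=Provenance.WEB_SUMMARY,
    citations=["https://example.org/underground-signage"],  # placeholder URL
)
print(answer.render())
```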


Conclusion

In this article, we have shown how much a reader contributes to help the lazy mechanism that is a text produce its meaning. An author, in turn, builds a Model Reader so that both can meet in the interpretation of the text. Now we are in the presence of a new medium disguised as good old text, but that text has been generated by an LLM, without authorial intention—an aspect that, if known beforehand, completely changes the expectations and response a human should have toward a piece of text. Should our interpretation capabilities be engaged? If so, under what conditions? The rules of the language game should be spelled out; they should not be passed over in silence.


Acknowledgments

Thanks to Silvana F., Daniel P., and Sergio S. Your discussions greatly helped me shape this text.

References

    1. Bender, E.M., Gebru, T., McMillan-Major, A., Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conf. on Fairness, Accountability, and Transparency, 610–623; https://dl.acm.org/doi/10.1145/3442188.3445922.

    2. Booker, C. The Seven Basic Plots. Bloomsbury Continuum, NY, 2004.

    3. Borges, J.L., Hurley, A. Funes the Memorious. Ficciones, Grove Press, 1962.

    4. Dodge, J., et al. Documenting large webtext corpora: a case study on the colossal clean crawled corpus. In Proceedings of the 2021 Conf. Empirical Methods in Natural Language Processing; https://doi.org/10.18653/v1/2021.emnlp-main.98.

    5. Eagleton, T. Literary Theory: An Introduction. Blackwell Publishing, Hoboken, NJ, 2015.

    6. Eco, U. Introduction. The Role of the Reader. Indiana University Press, Bloomington, IN, 1979.

    7. Eco, U. Small worlds. The Limits of Interpretation. Indiana University Press, Bloomington, IN, 1990, 74–75.

    8. Eco, U. Two levels of interpretation. The Limits of Interpretation. Indiana University Press, Bloomington, IN, 1990, 54–55.

    9. Eco, U. Lector in Fabula: La Cooperazione Interpretativa Nei Testi Narrativi. Bompiani, Editore, Milan, Italy, 2016.

    10. Friederici, A.D. Language as a specific cognitive system. Language in Our Brain: The Origins of a Uniquely Human Capacity. MIT Press, Cambridge, MA, 2017, 3–4.

    11. Hofstadter, D. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. Basic Books, New York, NY, 1995.

    12. Kristeva, J. Word, dialogue, and novel. Desire in Language. Columbia University Press, New York, NY, 1980, 66.

    13. Lyotard, J.-F. The method: language games. The Postmodern Condition: A Report on Knowledge. Les Éditions de Minuit, Paris, France, 1979.

    14. Mitchell, M., et al. Model cards for model reporting. In Proceedings of the 2019 Conf. on Fairness, Accountability, and Transparency (FAT* '19); https://doi.org/10.1145/3287560.3287596.

    15. Shannon, C.E., Weaver, W. The interrelationship of the three levels of communication problems. The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, 1998.

    16. Violi, P. Individual and communal encyclopedias. Umberto Eco's Alternative: The Politics of Culture and the Ambiguities of Interpretation. Peter Lang, New York, NY, 1998.

    17. Violi, P. Encyclopedic competence and semantic competence. Meaning and Experience. Indiana University Press, Bloomington, IN, 2001, 159–164.

    18. Wolfram, S. What Is ChatGPT doing … and why does it work? Stephen Wolfram Writings, 2023; https://bit.ly/44NwiDf.
