Language models are statistical techniques for learning a probability distribution over linguistic data: they observe samples of language use and encode the observed patterns in a large number of probability parameters. This probability distribution is a representation of human language. With increased processing power and well-designed process models, the resulting language model can be used as a component in a system that generates language for some purpose. Today’s generative language models are tasked with outputting the most probable or plausible sequence of strings given some window of relevant preceding discourse. Such a generative language model is not a communicative agent in its own right, but a representation of observations of general language use, much as a lexicon and a grammar would be, albeit one that can be interrogated with less expertise.
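As a purely illustrative sketch, the generative step described above can be pictured as choosing a continuation from a conditional probability distribution over string segments. The Python toy below uses invented two-word contexts and made-up probabilities; real models condition on far longer windows, over subword tokens, with billions of learned parameters.

```python
import random

# Toy "model": conditional probabilities of a next word given the preceding
# window of discourse. All contexts and numbers are invented for illustration.
toy_model = {
    ("the", "cat"): {"sat": 0.6, "slept": 0.3, "spoke": 0.1},
    ("cat", "sat"): {"on": 0.8, "quietly": 0.2},
    ("sat", "on"): {"the": 0.9, "a": 0.1},
}

def continue_text(words, steps=3):
    """Extend a word sequence by repeatedly sampling a plausible next word."""
    words = list(words)
    for _ in range(steps):
        context = tuple(words[-2:])       # the window of preceding discourse
        dist = toy_model.get(context)
        if dist is None:                  # no observations for this context
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(continue_text(["the", "cat"]))      # for example: "the cat sat on the"
```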
Probability Distributions Alone Do Not a Language Make
The variation inherent in the probability distributions of linguistic items is enormous, and there is no a priori single most plausible, most appropriate, and most useful path through the possible continuations that a preceding discourse makes available. To help manage this decision space, the probability distribution can be modulated by modifying the representation of the model with a smaller, selected sample of task-specific texts through fine-tuning,6 instruction training,9 and alignment,1 and through carefully curated human feedback sessions in which human assessors rate variant outputs for acceptability.8
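As a deliberately oversimplified sketch of what such modulation achieves: real alignment and instruction training adjust the model's parameters, for instance through reinforcement learning from human feedback, whereas the toy below merely reweights a fixed distribution over candidate outputs using hypothetical assessor ratings. It illustrates the effect of the process, not its mechanism.

```python
def reweight_by_feedback(base_dist, ratings, strength=1.0):
    """Nudge a distribution over candidate outputs toward those that human
    assessors rated as more acceptable (ratings in [-1, 1], hypothetical)."""
    scored = {cand: prob * (1.0 + strength * ratings.get(cand, 0.0))
              for cand, prob in base_dist.items()}
    total = sum(scored.values())
    return {cand: score / total for cand, score in scored.items()}

base = {
    "Sure, here is how you do it.": 0.5,
    "Figure it out yourself.": 0.3,
    "Perhaps the manual could help?": 0.2,
}
ratings = {
    "Sure, here is how you do it.": 0.9,
    "Figure it out yourself.": -0.8,
    "Perhaps the manual could help?": 0.4,
}
print(reweight_by_feedback(base, ratings))
```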
The intention of such additional fine-tuning, alignment, and instruction is to enable a generative language model to generate task- and situation-appropriate material to fit a discourse of interest, essentially to create a voice out of language. Doing so is not a value-neutral process, and imposes normative constraints on the linguistic capacity of language models.
We Are in an Eliza Moment
Humans practice language usage skills over their lifetime and put great value upon using language effectively. Educational institutions spend great effort in instructing us to follow conventions that are based on general virtues, largely arbitrary with respect to their details, external to the code of language itself, and culturally defined. When human language users encounter conversational counterparts that are able to produce fluent and coherent contributions to discourse, they will ascribe sophistication to such counterparts and view them in a positive light. Generative language models produce impressively fluent language, and human users of such models are prone to adopt an intentional stance2 toward such models, and to believe that a generative language model is an erudite and intelligent counterpart. This is, on balance, unfortunate, and results from conflating linguistic competence with other desirable human characteristics.
This deep-seated human instinct leads us to use terminology such as “trustworthiness” and “truthfulness” to describe qualities of the output of a generative language model, and to label some of that output “hallucination.” In fact, given the current architecture, representation scheme, and processing model, the entire output of a typical generative language model is hallucinated: language model output is not grounded in anything language-external. The target notion of a generative language model is to provide plausible strings as specified by its probability distribution over string segments, and “truthful” or “trustworthy” are not relevant concepts for describing such output. That those strings occasionally, or even mostly, constitute language that conforms with human experience does not change the fact that labels such as “truthful” and “untruthful” do not apply to the language model itself.
This is as it should be: if the linguistic capacity of a system were constrained by adherence to some notion of truth, it would be less versatile and useful, and our human language processing components are perfectly capable of prevarication, deception, and dishonesty, intentionally or not. This is a desirable property both of language and of language users. Language is not truthful or untruthful in itself: utterances are, by virtue of being pronounced in some context, for some purpose, by some language user.
Language models by themselves lack such purpose. Agents built on top of language models, however, might be implemented to have purpose. Consider an agent (or a bot) that fetches information from a database using input and output in natural language, and that uses a language model to manage the language interaction. If such an agent produces an output that is inconsistent with the information in the database, it would be relevant to talk about truthfulness, but this truthfulness would be a property of the entire system, and not of the language model.
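A minimal sketch of such an agent follows. The database, the flight number, and the generate function standing in for a language-model call are all hypothetical; the point is that the consistency check, and hence any claim to truthfulness, belongs to the surrounding system rather than to the language model.

```python
# Hypothetical "database" the agent is responsible for reporting faithfully.
FLIGHTS = {"SK1429": {"departure": "08:45", "gate": "F32"}}

def generate(prompt: str) -> str:
    # Placeholder for a language-model call: it produces fluent, plausible
    # phrasing, but has no notion of whether the content is correct.
    return "Flight SK1429 departs at 09:15 from gate F32."

def answer(query: str, flight_id: str) -> str:
    record = FLIGHTS[flight_id]
    draft = generate(f"Answer using {record}: {query}")
    # Truthfulness is enforced here, at the system level: the draft is checked
    # against the database before it is released to the user.
    if record["departure"] not in draft or record["gate"] not in draft:
        draft = (f"Flight {flight_id} departs at {record['departure']} "
                 f"from gate {record['gate']}.")
    return draft

print(answer("When does SK1429 leave?", "SK1429"))
```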
Behavioral Conventions
Human communicative behavior is governed by conventions and rules at widely varying levels of abstraction, and many of those conventions are fairly universal. There are human conversational principles that are easy to agree with: most everyone in most every culture will agree that one’s verbal behavior should be honest, relevant, and mindful of one’s counterparts’ feelings; in general, we wish to express ourselves in ways which are informative and which simultaneously establish our social position with an appropriate level of gloss and shine.a How these partially contradictory objectives are reconciled and realized varies across cultures. Linguistics, as a field of study, offers theoretical tools for the analysis of such behavioral choices and actions.
The British linguist Paul Grice formulated a Cooperative Principle for human-human conversation, which in short is “Make your contribution such as is required for the purposes of the conversation in which you are engaged,” and which is further specialized into four more specific Maxims of Conversation: the maxims of Quantity, Quality, Manner, and Relevance. Broadly, contributions to conversation should neither be too verbose nor too terse, should be truthful, should hold to the topic of the conversation at hand, and should be comprehensible and fluent.3
Elaborating on the aforementioned maxims, linguists study how social considerations and politeness modulate linguistic choices and how they facilitate or get in the way of efficient interaction.b One partial formalization is the set of Rules of Politeness, such as “Don’t impose,” “Give options,” and “Be friendly”;4 another is the Politeness Principle, “Minimize the expression of impolite beliefs,” operationalized by assessing relative costs and benefits and the extent of interlocutor praise and criticism.5
Most everyone will agree with the principles and maxims as stated. Adhering to such conversational conventions is good practice; departing from them is viewed as anomalous and carries social costs, whether done on purpose to achieve some desired effect or inadvertently. At the level of abstraction at which the Maxims of Conversation and Rules of Politeness are given, they seem eminently reasonable, to the point of being self-evident and thus unhelpful as a theoretical tool.
Yet we find in the lively research areas of pragmatics, sociolinguistics, and conversational behavior analysis that conversational style differs noticeably and significantly across situations and counterpart configurationsc and, more importantly, across cultures. How the principles are operationalized, and how their various goal notions are balanced against each other, varies from situation to situation, from language to language, and from culture to culture. In some cultural areas terseness is interpreted as rudeness; in others, verbosity is considered overbearing. Requests can in some cultural contexts be given as explicit imperatives (“Get the notes for me”), while in others they must be reformulated more indirectly as modal questions (“Could I borrow the notes from you?”).7
Conversational conventions are only incidentally explicitly accessible to participants in conversation; mostly they are acquired through interaction with others. Anecdotes abound of second-language learners blundering through a conversation in disregard of local conventions that are thought to be obvious. The language user must tread a careful path between brusqueness and obsequiousness: errors in either direction detract from the perceived trustworthiness and reliability of the conversational participant.
Conversational conventions are realized as linguistic and para-linguistic behavior, and traces of that behavior are present in any collection of language. Appropriate behavioral guidelines are thus accessible to a language model with sufficient training data, but how they are realized in the generative phase depends crucially on the fine-tuning, alignment, and instruction training mentioned earlier in this Opinion column. Those processes are not value-neutral, nor are they universal, even if the abstract principles they are based on are. This generalizes obviously to notions of bias, safety, harmfulness, and trustworthiness, all of which are generally accepted but encoded variously across situations, communities, and cultures.
Conclusion
Language is quite similar across populations, cultures, and individuals; the purpose of communication is approximately the same wherever and whoever you are. Our systems of expression involve choices related to modality, certainty, mindfulness, and courtesy in every language. Many of the features of language are hence quite amenable to transfer learning. But how these human universalia are instantiated across languages is governed by culturally specific conventions and preferences. Linguistic principles could be formalized further into cultural parameters, which would allow us to avoid culturally specific markers that may come across as insensitive, impertinent, or imperious if translated blindly.
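What such a formalization might look like is an open question; the sketch below simply illustrates the idea with invented parameter names and thresholds, echoing the request example given earlier.

```python
from dataclasses import dataclass

@dataclass
class CulturalParameters:
    """Hypothetical cultural parameters; the fields and the threshold used
    below are illustrative assumptions, not an established inventory."""
    directness: float   # 1.0 = bare imperatives acceptable, 0.0 = highly indirect
    verbosity: float    # preferred degree of elaboration

def formulate_request(item: str, culture: CulturalParameters) -> str:
    if culture.directness > 0.7:
        return f"Get the {item} for me."                      # explicit imperative
    return f"Could I perhaps borrow the {item} from you?"     # indirect modal question

print(formulate_request("notes", CulturalParameters(directness=0.9, verbosity=0.3)))
print(formulate_request("notes", CulturalParameters(directness=0.2, verbosity=0.6)))
```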
Today, most datasets for instruction tuning are translated from one language (almost always North American English) into others using automatic translation tools. Current translation tools are not built to modify instruction datasets to fit cultural patterns and sensitivities, but render training items rather literally, occasionally to the extent of making them nonsensical.d Such instruction sets need grounding in the communicative practices of a culture. The usefulness of instruction sets will outlast most models, and to ensure their quality, especially for smaller speaker communities less well served by large-scale technology, a community-based effort to formulate them to fit local concerns is a sustainable investment that will be valuable in the long run.
In general, fine-tuning and alignment of a model’s parameters is a non-reversible operation, executed through some cultural or ideological lens. A model that has been rendered acceptable for a certain cultural or ideological context is likely to be of less use in the general case, and such modifications should be made openly and transparently by properly recording the provenance of alignment and instruction training sets on model cards; models that claim to be open should be made available in both unaligned and aligned form. The analogy to grammars and lexica is appropriate here: a lexicon and a grammar unable to describe rude behavior, verbal aggression, or cursing are both less generally useful and less empirically accurate than ones that can.
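As an illustration of the kind of provenance record such a model card could carry, consider the sketch below; every field name and value in it is an assumption for the sake of the example, not an existing standard.

```python
# Hypothetical provenance entry for a model card; names and values are invented.
model_card_provenance = {
    "base_model": "example-base-7b",          # unaligned model, also released
    "aligned_variant": "example-chat-7b",
    "instruction_datasets": [
        {
            "name": "example-instructions-v1",
            "source_language": "en-US",
            "adaptation": "machine-translated",   # as opposed to community-authored
            "cultural_review": False,
        },
    ],
    "alignment_feedback": {
        "assessor_pool": "undisclosed",
        "guidelines_published": False,
    },
}
```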
Alignment, instruction training, and fine-tuning cannot, in general, be done by reference to universal values, however virtuous: while such values are easy to agree with, they mean different things to us, depending on a wide range of cultural and situational factors.