Opinion
Artificial Intelligence and Machine Learning

The Promethean Dilemma of AI at the Intersection of Hallucination and Creativity

Seeking to understand how we can better synthesize creative outputs from generative artificial intelligence.


In Greek mythology, despite the risks that fire poses, Prometheus stole it from the gods and gave it to humanity to advance civilization. In the contemporary world, the “Promethean dilemma” refers to advocating new technologies in the name of innovation without weighing their social ramifications.

For generative artificial intelligence (GenAI) models, the Promethean dilemma has so far been discussed starting with whether general access to GenAI systems should be permitted for public use, given their black-box nature and tendency to confabulate. In cognitive science, confabulation theory [10] holds that the brain fabricates false memories or impressions to bridge missing or incomplete ones. Similarly, in GenAI, output is confabulated owing to a lack of knowledge or inconsistency in learned patterns. Popularly, this behavior has been termed ‘hallucination’ in GenAI systems, and we adopt the same term for this column. Hallucination can be characterized as a model generating content it was not trained on, or was undertrained on, owing to restricted data and computing resources; it often manifests as unreal content or content unaligned with the prompt [5]. While hallucination points to a lack of authenticity and dependability in the output [3, 8], can it also be the key to understanding how we can synthesize creative outputs from GenAI [7]? In this regard, the existing scope of the ‘Promethean dilemma of AI’ can be extended to capture the degree of interaction (if any) between creative and hallucinated outputs composed by GenAI systems.

It should be noted that an AI model does not intend to be creative or to hallucinate. Rather, the training data, modeling paradigms, and learned parameters induce creativity and hallucination as by-products, with hallucination assumed to be an intrinsic characteristic [7] of these systems. Technically, hyperparameters such as decoding temperature control how deterministic the model is over multiple runs. Different models can produce varying hallucinations at identical temperature values due to variations in their training pipelines. Even at lower temperatures (higher determinism and reproducibility), a model can hallucinate if wrong output tokens are assigned higher probability. Consequently, randomness does not necessitate hallucination, leaving room for exploring concepts at the intersection of creativity and hallucination.
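To make this concrete, consider a minimal sketch of temperature-scaled sampling (the token set and logits below are hypothetical, chosen to depict a miscalibrated model): lowering the temperature sharpens the next-token distribution and improves reproducibility, but if a wrong token already holds the highest logit, the model repeats that error all the more reliably.

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Sample one token index after temperature-scaling the logits (softmax of logits/T)."""
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Hypothetical next-token logits where the *wrong* token ("Sydney") outranks
# the right one ("Canberra") for the prompt "The capital of Australia is ...".
tokens = ["Sydney", "Canberra", "Melbourne"]
logits = [3.1, 2.7, 1.0]  # miscalibrated by assumption

for t in (0.1, 1.0, 2.0):
    rng = np.random.default_rng(seed=42)
    picks = [tokens[sample_token(logits, t, rng)] for _ in range(1000)]
    print(f"T={t}: 'Sydney' sampled {picks.count('Sydney') / 10:.0f}% of the time")
# Low temperature yields near-total reproducibility of the miscalibrated answer:
# determinism buys consistency, not factuality.
```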

When a writer flouts grammar rules or a director does not accurately follow a historical timeline, it is said they are “breaking the fourth wall” and exercising their “creative liberty,” even if the final product is fictional. Can the same be true for GenAI models? A trade-off exists between how much of a generation can be derived or transferred from preexisting concepts and how much can be novel or hallucinatory. Thus, when a GenAI model begins producing abstract concepts, it raises the question: “To what extent can the hallucinations arising from GenAI models be considered creative?” This article highlights the Promethean dilemma of AI at the intersection of inventiveness and hallucination. The emergence of creativity via hallucination can not only lead to the production of new artistic themes but also aid in developing novel drugs and proteins [1].

Nature of Prompting

As the output of a GenAI system depends on the input prompt, evaluating the extent of creative or acceptable hallucination [9] is underpinned by the nature of prompting (see the accompanying figure). While subjective prompts with ambiguous context can increase the chances of hallucination, the systems can also fail on objective prompts. When given objective prompts, such as solving mathematical equations or listing the world’s capitals, the output can be vetted against a source of truth. Meanwhile, subjective cues correspond to results with a vague conception of truth. When prompted to “write a poem” or “draw like Dürer,” the outcomes can only be partially judged, as the perception of art is subjective. Then there is a class of prompts, like writing code snippets, where the creativity of a solution can always be vetted against the correctness of the answer via test cases (a sketch of this vetting follows the figure). Therefore, depending on the objectivity and novelty required by the task, we can decide the extent of acceptable hallucination.

Figure.  The x-axis represents the subjective vs. objective nature of cues: the more imaginative and emotional the prompting, the greater the difficulty in evaluating the creative output. The y-axis represents the extent of coherence or factuality the model adheres to vs. the novelty it incorporates in its production. In each quadrant, we highlight some generative tasks that best describe that combination of prompting and hallucination. Tasks with objective prompts, to the right of the y-axis, can be evaluated under a closed solution set. Conversely, there are no exact evaluation criteria for the tasks to the left of the y-axis.
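For such verifiable prompts, the vetting loop can be sketched in a few lines (a minimal sketch: the two candidate solutions are hypothetical stand-ins for model outputs, and a real deployment would sandbox the execution). Stylistic or creative variation in the generated code is acceptable so long as the closed set of test cases passes.

```python
def passes_tests(candidate_src: str, tests: list[tuple[int, int]]) -> bool:
    """Execute a generated function and vet it against a closed set of test cases."""
    namespace: dict = {}
    exec(candidate_src, namespace)  # runs model-generated code; sandbox in practice!
    fn = namespace["factorial"]
    return all(fn(x) == expected for x, expected in tests)

# Two hypothetical model outputs: stylistically different, both checkable.
iterative = """
def factorial(n):
    out = 1
    for i in range(2, n + 1):
        out *= i
    return out
"""
recursive = """
def factorial(n):
    return 1 if n < 2 else n * factorial(n - 1)
"""

tests = [(0, 1), (1, 1), (5, 120)]
print(passes_tests(iterative, tests), passes_tests(recursive, tests))  # True True
```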

Demarcating Creativity and Hallucination

Creativity is an amalgamation of resourcefully thinking outside the box while building on, and seeking inspiration from, the existing body of work. Can the same be true for AI? Is the creativity of GenAI models pinned to their ability to replicate existing art or to introduce ingenuity to it? The latter requires the augmentation of concepts that are less likely to be found in the real world. From the perspective of compositional abilities based on subjective cues, it is difficult to differentiate between creative and hallucinated outputs of GenAI systems. For example, returning to the prompt “draw like Dürer,” there are three possible scenarios. If the model generates an outcome that is not in the style of Dürer, it has hallucinated. On the other hand, if the model reproduces an output verbatim from the training set of Dürer’s works, it has not hallucinated. In the third case, the result may not be an original work but rather an inspired representation of Dürer’s artwork. In all three cases, creativity and degree of hallucination can be judged separately based on faithfulness to the input prompt. However, the impact of hallucination on creativity cannot be directly adjudged, reinforcing the dilemma in employing GenAI for subjective tasks. Existing literature assumes that creativity and hallucination can be directly attributed to each other, as well as to the predicted probabilities of output tokens [7]. We believe this conjecture should be extended to account for latent variables that could influence hallucination, creativity, and predicted probabilities. Departing slightly from existing assumptions, we also posit that, akin to zero kelvin, an oracle with zero hallucination is the only accurate reference against which to benchmark the emergence of creativity in GenAI systems. Further, even without hallucination, a model remains capable of creative, albeit predictable, generations. This is akin to evaluation metrics for the fluency and specificity of output text in natural language generation.

In the real world, producing creative outputs requires cognizance of existing social norms and a willingness to play with them. Even if social information is present in AI systems’ training data, it is one thing to replicate examples of socially acceptable behavior and quite another to comprehend the circumstances in which those behaviors might be deemed inappropriate or objectionable. We hypothesize that GenAI will be better able to distinguish between novelty and hallucination when and if it can be trained to develop a sense of these norms. While contextual and implicit reasoning comes naturally to humans, it has only been weakly observed in GenAI via chain-of-thought prompting setups [13], a sketch of which follows below. Eventually, being able to reason and pick up on implicit cues will improve the inventiveness of the output. Along similar lines, creativity often requires circumventing existing regulations enforced by agencies. Current GenAI systems lack the inherent ability to circumvent any human checkpoints. While adversarial prompting is one way of bypassing filters [12], it originates from humans, not the AI model. As GenAI exists only in relation to humans, with humans as both inventors and consumers of this technology, does it mean the creativity of GenAI will always be underpinned by human inventiveness and judged [2] by human standards?
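To make that setup concrete, here is a minimal sketch of chain-of-thought prompting in the style of Wei et al. [13]; the prompts are illustrative, and `generate` is a hypothetical placeholder for any text-completion call, not a specific API.

```python
# Minimal chain-of-thought (CoT) prompt: the exemplar spells out its reasoning,
# nudging the model to emit intermediate steps before the final answer.

COT_PROMPT = """\
Q: A library has 23 books and lends out 7. How many remain?
A: The library starts with 23 books. Lending out 7 leaves 23 - 7 = 16. The answer is 16.

Q: A baker makes 48 rolls and sells 29. How many remain?
A:"""

# The direct variant asks for the answer with no worked exemplar.
DIRECT_PROMPT = "Q: A baker makes 48 rolls and sells 29. How many remain?\nA:"

# Hypothetical usage with any text-completion function `generate`:
#   cot_answer = generate(COT_PROMPT)        # tends to include "48 - 29 = 19"
#   direct_answer = generate(DIRECT_PROMPT)  # bare, more error-prone answer
print(COT_PROMPT)
```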

Conclusion

One way to understand the extent of hallucination is to compare the variation in results when prompts are varied for open-ended/subjective tasks. This comparison should also be extended to analyzing the impact of hyperparameters like temperature. Here, the aim is not to determine accuracy but to observe which prompt setup for a given task leads to more hallucination, and whether humans consider those hallucinated outputs more creative; a minimal sketch of such a comparison follows below. To this end, determining whether hallucinations have creative merit for GenAI systems would be an exciting challenge. If the Reinforcement Learning from Human Feedback [4] pipeline were extended to include subjective prompts and social norms [6], it could lead to the development of novel artistic concepts and, collectively, an oeuvre of AI’s creativity. Another direction is to dynamically update the world knowledge of GenAI systems [11], which should indirectly improve their ability to assimilate existing norms. An improved understanding of how parameters gain knowledge [14], and how this knowledge is interpreted upon prompting, will help identify which gaps in knowledge lead to hallucination. Similarly, an improved understanding of social norms and hallucination should help reduce the Promethean dilemma of AI when weighing innovation against hallucination. Future designs of GenAI interfaces might let users specify their desired trade-off between coherence and novelty based on their goals.
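As a sketch of such a comparison (a minimal illustration: the sampled completions are hypothetical stand-ins for model outputs, and the lexical Jaccard measure is a deliberately crude proxy for semantic diversity), one could sample several completions per prompt-and-temperature setup, report their average pairwise dissimilarity, and then ask human raters whether the more variable outputs are also the more creative ones.

```python
from itertools import combinations

def dissimilarity(a: str, b: str) -> float:
    """Crude lexical dissimilarity: 1 - Jaccard overlap of word sets."""
    wa, wb = set(a.split()), set(b.split())
    return 1.0 - len(wa & wb) / len(wa | wb)

def variability(samples: list[str]) -> float:
    """Average pairwise dissimilarity across sampled completions."""
    pairs = list(combinations(samples, 2))
    return sum(dissimilarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical completions for one subjective prompt at two temperatures.
low_t  = ["the sea is calm tonight", "the sea is calm and dark", "the sea is calm tonight"]
high_t = ["moonlight stitches the waves", "salt hymns under a tin sky", "the sea is calm tonight"]

print(f"low T:  {variability(low_t):.2f}")   # lower spread across samples
print(f"high T: {variability(high_t):.2f}")  # higher spread -> rate for creative merit
```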

References

• 1. Anishchenko, I. et al. De novo protein design by deep network hallucination. Nature 600, 7889 (Dec. 2021); 10.1038/s41586-021-04184-w
• 2. Chakraborty, T. and Masud, S. Judging the creative prowess of AI. Nature Machine Intelligence 5, 6 (June 2023); 10.1038/s42256-023-00664-y
• 3. Dutta, S. and Chakraborty, T. Thus spake ChatGPT. Commun. ACM 66, 12 (Nov. 2023); 10.1145/3616863
• 4. Ge, Y. et al. OpenAGI: When LLM meets domain experts. In Advances in Neural Information Processing Systems 36 (2023); https://bit.ly/3SKqVRI
• 5. Guerreiro, N.M., Voita, E., and Martins, A. Looking for a needle in a haystack: A comprehensive study of hallucinations in neural machine translation. In Proceedings of the 17th Conf. of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia (2023); https://bit.ly/3yAl7Dy
• 6. Krishna, R. et al. Socially situated artificial intelligence enables learning from human interaction. Proceedings of the National Academy of Sciences 119, 39 (Sept. 2022); 10.1073/pnas.2115730119
• 7. Lee, M. A mathematical investigation of hallucination and creativity in GPT models. Mathematics 11, 10 (2023); 10.3390/math11102320
• 8. Menczer, F. et al. Addressing the harms of AI-generated inauthentic content. Nature Machine Intelligence 5, 7 (July 2023); 10.1038/s42256-023-00690-w
• 9. Mukherjee, A. and Chang, H.H. Managing the creative frontier of generative AI: The novelty-usefulness tradeoff. California Management Rev. (2023); https://bit.ly/46JGHSM
• 10. Nelson, R.H. Mechanization of confabulation. Springer, Berlin, Heidelberg (2007), 139–192; 10.1007/978-3-540-49605-2_7
• 11. Ramapuram, J., Gregorova, M., and Kalousis, A. Lifelong generative modeling. Neurocomputing 404 (2020); 10.1016/j.neucom.2020.02.115
• 12. Wallace, E. et al. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Intern. Joint Conf. on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China (2019); 10.18653/v1/D19-1221
• 13. Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems 35 (2022); https://bit.ly/3LZ00xx
• 14. Yang, W. et al. Survey on explainable AI: From approaches, limitations, and applications aspects. Human-Centric Intelligent Systems 3, 3 (Sept. 2023); 10.1007/s44230-023-00038-y
