BLOG@CACM
Artificial Intelligence and Machine Learning

The Challenge of Consistency in Generative AI: Will We Adapt or Fix the System?

The variability of GenAI responses poses challenges in disciplines where exact terminology is crucial, such as law, medicine, and academic writing.


We are all familiar with the well-known idiom: “Doing the same thing and expecting a different result—this is the definition of insanity.”1 While the Oxford English Dictionary defines insanity as “a state of mind that impedes the ability to think, reason, or behave in ways that are considered normal,”2 the sentiment behind the idiom is widely understood. Yet, paradoxically, this is precisely how interactions with generative AI (GenAI) unfold: when asking the same question multiple times, we get different results.

For example, when using a GenAI system like ScholarGPT to define the term stressor—that is, a condition that causes stress3—the system generates competing definitions with each iteration. GenAI defines the term first as a stimulus, event, or condition, then as a demand, event, or circumstance, and finally as anything that causes stress. While these paraphrased responses are subtly different, they can carry distinct conceptual implications, particularly in this common scholarly use case4 of defining central concepts where linguistic precision matters.5

This observation highlights both the strengths and weaknesses of GenAI. On the one hand, the ability to rephrase and recombine information is one of its most intriguing features, making it valuable for tasks like summarization and explanation.6 On the other hand, this variability poses challenges in disciplines where exact terminology is crucial, such as law, medicine, and academic writing. Moreover, this challenge seems to stem from an inherent property of generative AI, which generates responses through probabilistic recombination rather than deterministic retrieval.7,8
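The variability described above follows directly from how language models decode: each response is sampled from a probability distribution over possible continuations, rather than looked up in a fixed store. A minimal sketch of the contrast, using a toy, made-up distribution (the candidate wordings and probabilities are illustrative only, not drawn from any real model):

```python
import random
from collections import Counter

# Toy next-word distribution for the prompt "A stressor is a ...".
# Candidates and probabilities are illustrative assumptions.
CANDIDATES = ["stimulus", "demand", "condition"]
PROBS = [0.4, 0.35, 0.25]

def sample_continuation(rng: random.Random) -> str:
    """Stochastic decoding: draw a continuation from the distribution."""
    return rng.choices(CANDIDATES, weights=PROBS, k=1)[0]

def greedy_continuation() -> str:
    """Deterministic decoding: always take the most probable continuation."""
    return CANDIDATES[PROBS.index(max(PROBS))]

rng = random.Random()  # unseeded: varies across runs, like repeated prompts
runs = [sample_continuation(rng) for _ in range(10)]
print(Counter(runs))          # typically a mix of several wordings
print(greedy_continuation())  # always the same wording
```

Stochastic sampling produces a different mix of phrasings on each run, which is the behavior observed with ScholarGPT above; greedy (or temperature-zero) decoding removes that variability, but real systems rarely expose it by default, and it trades diversity for repetitiveness.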

However, this also raises the question: how can researchers rely on GenAI to assist with literature reviews, definitions, or conceptual frameworks? One proposed solution is assurance through data provenance,9 embedding citations and sources within AI-generated content so that information can be traced back to reliable references. Some emerging AI models attempt this, but it remains difficult given that large language models encode the recombinatorial logic of digital innovation.10,11 Instead of expecting GenAI to behave like a static database, we may need to embrace its fluid, generative nature while remaining vigilant about verification: relying only on peer-reviewed literature for final citations and definitions, and checking them against the original manuscript.
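The provenance idea can be made concrete with a small data-structure sketch. The class and field names below are hypothetical, chosen only to show the shape of the approach: every generated statement carries a pointer to the peer-reviewed source it paraphrases, so a reader can verify the wording against the original.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedClaim:
    """A generated statement paired with the reference it was drawn from.

    Fields are illustrative; real provenance schemes record far more
    (retrieval timestamps, document versions, confidence scores, etc.).
    """
    text: str    # the AI-generated wording
    source: str  # citation for the original, peer-reviewed work

def render(claim: SourcedClaim) -> str:
    """Emit the statement with its citation attached for verification."""
    return f"{claim.text} [{claim.source}]"

claim = SourcedClaim(
    text="A stressor is a condition that causes stress.",
    source="Lepine, Podsakoff & Lepine, AMJ 48, 2005",
)
print(render(claim))
```

The hard part, as the paragraph above notes, is not attaching the citation but guaranteeing that the generated `text` is actually faithful to the cited source rather than a probabilistic recombination of many sources.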

Ultimately, as GenAI becomes an integral part of research and knowledge work, researchers must become more cautious when interpreting its outputs. AI systems are not static reference materials but evolving conversational partners—valuable but requiring careful oversight. Whether we adapt to them or push for more structured AI development will shape the next phase of human-AI collaboration.12,13

References

  1. Brown, R.M. Sudden Death (Bantam Books, 1983).
  2. Oxford University Press, Insanity. Oxford English Dictionary (2024).
  3. Lepine, J.A., Podsakoff, N.P., and Lepine, M.A. A Meta-Analytic Test of the Challenge Stressor–Hindrance Stressor Framework: An Explanation for Inconsistent Relationships Among Stressors and Performance. Academy of Management Journal 48, 764–775 (2005).
  4. Wacker, J.G. A theory of formal conceptual definitions: Developing theory-building measurement instruments. Journal of Operations Management 22, 629–650 (2004).
  5. Schwartz, R.M. and Raphael, T.E. Concept of Definition: A Key to Improving Students’ Vocabulary. Reading Teacher 39 (2), 198–205 (1985).
  6. Dwivedi, Y.K. et al. ‘So what if ChatGPT wrote it?’ Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int J Inf Manage 71 (2023).
  7. Ding, N. et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nat Mach Intell 5, 220–235 (2023).
  8. Henfridsson, O. and Bygstad, B. The Generative Mechanisms of Digital Infrastructure Evolution. MIS Quarterly 37, 907–931 (2013).
  9. Werder, K., Ramesh, B., and Zhang, R. Establishing data provenance for responsible artificial intelligence systems. ACM Trans Manag Inf Syst 13, 1–23 (2022).
  10. Yoo, Y., Henfridsson, O., and Lyytinen, K. Research Commentary: The New Organizing Logic of Digital Innovation: An Agenda for Information Systems Research. Information Systems Research 21, 724–735 (2010).
  11. Baiyere, A., Grover, V., Lyytinen, K.J., Woerner, S., and Gupta, A. Digital “x”—Charting a Path for Digital-Themed Research. Information Systems Research 34, 463 (2023).
  12. Hillebrand, L., Raisch, S., and Schad, J. Managing with Artificial Intelligence: An Integrative Framework. Academy of Management Annals (2025). https://doi.org/10.5465/annals.2022.0072.
  13. Rahwan, I. et al. Machine behaviour. Nature 568, 477–486 (2019).
Karl Werder

Karl Werder is an associate professor in the Section Digital Business Innovation, IT University of Copenhagen, Denmark. His research interests focus on systems development for performance, artificial intelligence for decision making, and organizing for digital innovation.
