Long before hard drives and hyperlinks became symbols of modern life, humans etched stories onto cave walls and inked them onto parchment. Later, books, libraries, and the Internet filtered into daily life.
Only in the last quarter-century, with the rise of the Web, has it become possible to obtain near-instant and personalized search results. How to cook the perfect omelet? Check. Which astronauts rode Apollo 11 to the Moon? Check. The best towns to visit along the Amalfi Coast? Check.
“At its core, search technology and the Internet have profoundly changed how people access information. They’ve transformed how we learn, work, and communicate,” said Pramod Singh, Executive in Residence in the Engineering Graduate and Professional Programs at Duke University and co-founder of the AI firm Inquisite.
Now, the winds of disruption are blowing again. Large language models (LLMs) and generative artificial intelligence (GenAI) are rapidly transforming how we search. They deliver detailed answers to complex questions in a natural, conversational tone. There’s no need to click through page after page of search results and follow hyperlinks down rabbit holes to find the answer you want or need.
Yet AI-powered search may serve up nearly as many questions as answers. Are services like ChatGPT, Gemini, Claude, Perplexity, and Copilot fundamentally accurate and trustworthy? What is the quality of information they deliver? And do these chatbots ultimately enhance or detract from our ability to find information and engage in thoughtful analysis?
Definitive answers are not yet available. Nevertheless, AI search is rapidly going mainstream. According to a recent study conducted by investment banking firm Evercore, AI adoption could hit 88% by 2028.
Model Thinking
Over the last few decades, the ability to find information online has undergone a remarkable transformation. The world’s first search engine, Archie, appeared in 1990; it indexed files residing on FTP servers. After the Web opened to the public in the early 1990s, a slew of search engines popped up, including Excite, Lycos, AltaVista, Infoseek, Ask Jeeves, and Yahoo!. These and other search tools introduced more advanced capabilities, including Web crawling, vector space models, and natural language queries.
Still, a problem persisted, Singh said. “If you look at the early search engines, they relied on keyword searching, but there was little or no context. There was no way to elevate credible sites and deemphasize less-credible sites and content. This made it a lot more difficult to find the right content.”
It wasn’t until Google introduced PageRank in 1998 that a more advanced search framework began to take shape. The algorithm made it possible to evaluate Web pages based on their link structure, including the popularity and quality of the sites linking to them. The result was higher-quality search results. “Google was an overnight success,” said Chirag Shah, a professor in the School of Information at the University of Washington.
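The intuition behind PageRank can be sketched in a few lines: a page’s score reflects how likely a “random surfer” following links is to land on it, computed by repeatedly distributing each page’s score across its outbound links. The following is a minimal illustration on a toy four-page graph with the commonly cited 0.85 damping factor, not a description of Google’s production system:

```python
# Toy PageRank via power iteration over a tiny link graph.
links = {            # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
damping = 0.85
n = len(links)
rank = {page: 1.0 / n for page in links}  # start with equal scores

for _ in range(50):  # iterate until the scores stabilize
    new_rank = {page: (1.0 - damping) / n for page in links}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

# Pages with more (and better-ranked) inbound links score higher;
# here "c" collects links from a, b, and d.
print(max(rank, key=rank.get))  # -> c
```

The scores always sum to one, so they behave like probabilities: a link from a highly ranked page is worth more than one from an obscure page, which is what let PageRank elevate credible sites.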
Google has remained the top search engine for more than 25 years. Along the way, it has added numerous innovations, including the Multitask Unified Model (MUM) and AI-based semantic search, while an entire Search Engine Optimization (SEO) industry has grown up around its rankings. However, deep advertising integration and SEO come with some baggage. “At times, it is more difficult to find desired information. In some cases, the user experience has eroded,” Shah said.
When ChatGPT burst onto the scene in November 2022, it introduced a radically new way to search the Internet. There was no need to click through page after page of links. Instead, the LLM spit out responses in natural language, like a trusted professor. The tradeoff? Users no longer saw content from original sources. Instead, ChatGPT delivered a largely unverifiable overview drawn from its training data.
In March 2024, ChatGPT added retrieval-augmented generation (RAG). Since then, other LLMs, including Gemini and Claude, have also plugged in RAG. The technique enhances the core AI model by tapping into external sources, including Web pages, proprietary databases, and document repositories. Using methods such as data chunking and vector search, it delivers more accurate and up-to-date results.
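The chunking-and-retrieval step can be illustrated with a toy sketch: split a corpus into chunks, represent each as a vector, find the chunk closest to the query, and prepend it to the prompt the model sees. Here a simple bag-of-words counter stands in for the neural embedding model and vector database a real RAG pipeline would use; all names and the sample corpus are illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real RAG uses a neural encoder."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Chunk the corpus (here, one sentence per chunk).
chunks = [
    "Apollo 11 landed on the Moon in July 1969.",
    "The Amalfi Coast lies in southern Italy.",
    "An omelet cooks best over medium-low heat.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Vector search: pick the chunk most similar to the query.
query = "When did Apollo 11 reach the Moon?"
best_chunk, _ = max(index, key=lambda item: cosine(embed(query), item[1]))

# 3. Augment the prompt with the retrieved evidence.
prompt = f"Context: {best_chunk}\nQuestion: {query}\nAnswer using the context."
print(best_chunk)
```

Because the model now answers from retrieved text rather than only from memorized training data, its responses can be both fresher and easier to attribute to a source.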
To be sure, AI search is an appealing prospect. “One advantage to AI-based search is that it directly caters to your specific needs. It typically bypasses irrelevant and unwanted results,” Shah said. However, this convenience comes with a tradeoff. “We lose some agency in discovering information ourselves, and sometimes we don’t even know what context or other information might be missing from the response.”
Of course, the conversational and engaging nature of AI search adds to the charm. It’s easy to ask follow-up questions and pursue a topic without deep analysis. The system does the heavy lifting—or at least, it appears to do so. “The great strength of generative AI is its ability to summarize vast amounts of information and present these summaries in a fluent and well-organized manner,” said Marti A. Hearst, a professor in the School of Information and in the Computer Science Division at the University of California, Berkeley.
LLMs come with plenty of baggage, however. This includes biases based on how they were trained, the copyright and legality of content used to build the model, privacy concerns, and, perhaps most importantly, the accuracy of the results they deliver. “It is well-known and frequently stated that these systems are not always correct, and they can reflect the biases of and errors in the underlying data on which they are developed,” Hearst said.
Accuracy Matters
Low-quality information produced by GenAI systems is no trivial matter. A recent study conducted by the Tow Center for Digital Journalism at Columbia University found that 60% of chatbot responses to its test questions were incorrect to some degree. Ironically, paid services generated even higher error rates. A core problem, think of it as the Dunning-Kruger Effect for AI, was the inability of chatbots to recognize when they aren’t equipped to answer a question effectively.
Worse are hallucinations. It’s no secret that chatbots sometimes go off the rails. “Despite sounding authoritative, AI doesn’t understand things at an atomic level, and it doesn’t possess any real knowledge,” said C. Lee Giles, Emeritus David Reese Professor of Information Sciences and Technology at Pennsylvania State University. “It’s simply a mathematical representation of the most likely words to appear in response to another set of words.”
Even a low percentage of errors and hallucinations can be misleading. Users often don’t bother to verify answers, particularly when there are no accompanying links and source information. Yet citations alone won’t solve the problem. Chatbots sometimes fabricate links or cite content hidden behind paywalls or restricted sites. The Tow Center found that even with licensing agreements in place, accurate attribution was not guaranteed.
Inaccuracies and hallucinations are not going away anytime soon. Despite advances in AI search, including the provision of citations (a standard feature on Perplexity, for example), a basic problem remains, said Maarten de Rijke, a professor of Information Retrieval at the University of Amsterdam in the Netherlands. “In many cases, the models generate content first and then retroactively seek evidence. This post-rationalization approach increases the odds that the information will be inaccurate or fabricated while being attributed to actual sources,” he said.
Not all tasks are created equal, however. Traditional search engines like Google and Bing shine when it comes to plucking specific facts and information—along with things like flights, hotels, restaurants, and products. AI, on the other hand, excels at aggregating and summarizing vast amounts of information, generating questions, offering analysis, and spotting overlooked concepts or logic gaps.
Traditional search providers are adapting. Increasingly, complex queries trigger AI-generated overviews alongside links. Google’s AI Overviews feature, for example, offers brief summaries designed to complement links to full websites. “The advantage to this approach is that people can check the information,” Hearst said. “If it is done reliably, this kind of interaction can provide the best of both worlds: better search results and links to the original source materials.”
Beyond Words
There are also broader concerns about how AI search could impact critical thinking—and society as a whole. Relying too heavily on chatbots could discourage users from exploring topics in depth or critically evaluating the information they receive, Shah said. Diminished exposure to diverse perspectives and fewer serendipitous discoveries could result. Giles warns that as younger generations shift from traditional search engines to chatbots, they might miss out on developing essential search literacy skills.
Another key question is whether GenAI systems can learn to distinguish credible sources from questionable—or outright false—content. At the heart of the issue is the probabilistic nature of large language models, which generate text based on training data or retrieved content via RAG. They have no intrinsic way to know that information is accurate. “A black-box AI model is very different from a knowledge base like Wikipedia, which relies on human oversight and verified citations,” Giles noted.
In fact, researchers have already found ways to manipulate large language models using tactics like prompt injection and data poisoning. More subtly, LLM output can be influenced by shaping what appears on the Web, including on official government websites. In other words, scrubbing references to certain people, events, or ideas can distort an LLM’s ability to respond accurately.
“There are legitimate concerns about the manipulation of information and the suppression of diverse viewpoints, potentially impacting democratic processes,” de Rijke said. He points out that the issue is compounded by the fact that all major AI search providers are based in the United States, and these systems already tend to reflect a distinctly American worldview.
Hearst is concerned that AI search could diminish the quality of Web content. “If search engines do not direct people to actual Web pages or other primary sources, there will be little motivation for people to keep writing and maintaining Web pages,” she said. “A big reason for the success of the Web was the ability of a huge ecosystem of Web pages to co-exist, to be created by different individuals and organizations, on a largely level playing field.”
Money Counts
Amid all these challenges, AI search providers also face pressure to turn a profit and rein in the enormous amount of energy LLMs consume. As Google has demonstrated, the likely path forward is through advertising. However, this raises additional questions. “The integrity of output can easily become compromised” when the focus shifts from user benefits to attracting advertisers, de Rijke said. One key is transparency. “It’s important to clearly separate commercial content from organic or AI-generated answers in conversational interfaces, including with defined blocks of ads.”
The solution may also lie in more advanced hybrid frameworks that route simple queries to traditional search engines and send complex, resource-intensive tasks to GenAI. “This approach could provide an optimal balance between cost, performance, energy use and user experience,” de Rijke said. Other possibilities include a greater reliance on Small Language Models (SLMs) and agentic search, which leaps beyond basic document retrieval to actively plan, reason, and interact with different tools and data.
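One way such a hybrid framework could work is a lightweight router that sends short, fact-style lookups to a conventional search backend and reserves the expensive LLM for open-ended, analytical questions. The sketch below is a deliberately simplified heuristic; a real system would use a trained classifier, and the backend names are placeholders:

```python
# Heuristic query router: cheap keyword search vs. costly generative AI.
GENERATIVE_CUES = {"summarize", "compare", "explain", "why", "analyze"}

def route(query: str) -> str:
    """Return which backend should handle this query."""
    words = query.lower().split()
    # Long or analytical queries go to the LLM; short lookups
    # go to a traditional keyword index.
    if len(words) > 12 or GENERATIVE_CUES & set(words):
        return "llm_backend"         # placeholder name
    return "keyword_search_backend"  # placeholder name

print(route("flights to naples"))                        # -> keyword_search_backend
print(route("compare traditional and AI-based search"))  # -> llm_backend
```

Because simple lookups vastly outnumber analytical queries in typical traffic, even a crude router like this could cut the cost and energy of serving search while preserving the LLM experience where it adds value.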
One thing is clear: AI-driven search has arrived, and it is set to reshape the future of digital research and knowledge. “We are witnessing the next generation of search evolve before our eyes,” Singh concluded. “LLMs and chatbots will fundamentally change our relationship with information, in much the same way that search engines like Google have done so. AI will deliver an array of benefits along with new problems.”
Further Reading
- Neisarg, D., Kifer, D., Giles, C.L., and Mali, A. Investigating Symbolic Capabilities of Large Language Models. May 21, 2024; https://arxiv.org/pdf/2405.13209
- Shah, C. From Prompt Engineering to Prompt Science with Human in the Loop. May 10, 2024; https://arxiv.org/pdf/2401.04122
- Shah, C., and Bender, E.M. Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web? ACM Transactions on the Web, Vol. 18, Issue 3, Article 33, pp. 1–24; https://dl.acm.org/doi/full/10.1145/3649468
- Yang, D., Wu, S.T., and Hearst, M.A. Human-AI Interaction in the Age of LLMs. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2024, pp. 34–38; https://aclanthology.org/2024.naacl-tutorials.5.pdf
- Zhang, R., Guo, J., de Rijke, M., Fan, Y., and Cheng, X. Are Large Language Models Good at Utility Judgments? SIGIR ’24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2024, pp. 1941–1951; https://dl.acm.org/doi/abs/10.1145/3626772.3657784