Opinion
Artificial Intelligence and Machine Learning

Generative AI Degrades Online Communities

How large language models are influencing online communities.


Imagine you are at a crossroads in a complex project and need quick answers on how to grapple with a problem. It is quite likely you would turn to an online knowledge community for answers, one hosted by your company, or perhaps Stack Overflow, Quora, or Reddit. These communities have come to play a central role in knowledge exchange across many corners of the economy and society, but they depend on voluntary participation from users just like you and me.

Our recent research indicates an intriguing shift is now taking place: generative AI technologies, such as OpenAI’s large language model (LLM) ChatGPT, are disrupting the status quo. Increasingly, users are gravitating toward these new AI tools to obtain answers, bypassing traditional knowledge communities. In this column, we delve into recent work documenting ChatGPT’s influence on user participation in online communities. A key insight we offer is that communities lacking social fabric are suffering most. We then propose a research agenda for better understanding these evolving impacts of generative AI.

(Some) Online Knowledge Communities Are Struggling

In our recent research, we estimate that, by late March 2023, ChatGPT had driven an approximate 12% reduction in average daily Web visits to StackOverflow.com, the world’s largest online knowledge community for software developers. Further, among the 50 most popular topics on Stack Overflow, we estimate the average volume of questions posted per week had declined by more than 10%, per topic, and we find that the declines in community participation have in turn led to a significant degradation in the quality of answers the community provides. This combination of findings raises the prospect of a vicious cycle, with negative implications for the long-term health and sustainability of online knowledge communities.4,5 These concerns are not necessarily limited to StackOverflow.com; the potential exists for a similar dynamic to play out in any online knowledge community, including those that are private and firm-hosted, catering to employees.a That said, we have also found that ChatGPT’s negative effects depend crucially on context. So, when do these negative consequences emerge and what can we do about them?

ChatGPT Excels When It Has Training Data

ChatGPT generates believable text about nearly any subject, but there is a big difference between “believable” and “correct.” ChatGPT, similarly to other LLMs, is trained on large swaths of publicly available data, in large part scraped from online forums such as Stack Overflow and Reddit. Given differences in the volume of available data, ChatGPT’s performance naturally varies by topic and may in turn affect communities to different degrees.

We observed that ChatGPT’s impact on Stack Overflow participation varies significantly across topics, aligning with its expected performance based on available training data. Topics related to open-source tools and general-purpose programming languages (for example, Python, R) appeared to experience larger declines in participation and contribution than proprietary and closed technologies, such as those employed for enterprise server-side development (for example, Spring Framework, AWS, Azure).

For better or worse, the quality of output LLMs can produce based on such publicly available data appears to be peaking. Recent work has documented that GPT, for example, has begun to exhibit declines in the quality of its output.3 It has been suggested this decline in performance is the expected result of a feedback loop, wherein data collected for training is increasingly contaminated by GPT itself, as users leverage the technology to produce and post content online.12 In an ironic twist, this suggests that the incentives of generative AI companies may be aligned with those of society more broadly; they have a vested interest in encouraging users to continue contributing organic, unadulterated content.

ChatGPT Still Does Not Substitute for Human Social Connections

LLMs are better suited to some tasks than others. Users engage in online knowledge communities for a variety of reasons beyond the simple desire to obtain information. While generative AI can often serve as a useful source of information, its capacity to substitute for human social connection is much weaker. Many communities manage to foster a sense of solidarity and peer attachment among members.6,8 Consider that, whereas Stack Overflow is known for its focus on pure information exchange,b Reddit is comparatively social in nature.1,10 Repeating our analysis with data from Reddit communities focused on the same sets of technology topics we considered at Stack Overflow, we found virtually no evidence of any decline in participation following ChatGPT’s emergence (we depict these divergent effects graphically in the accompanying figure). It therefore appears that a robust social fabric will be crucial to the health and sustainability of online knowledge communities going forward.

Figure. Estimates of ChatGPT’s effect on Stack Overflow weekly question volumes (left) versus Reddit posting volumes (right).
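The Stack Overflow versus Reddit contrast described above follows the logic of a difference-in-differences comparison: the change in the affected platform's activity, net of the change in the comparison platform's activity. The sketch below illustrates that arithmetic with made-up weekly posting averages; the numbers and function name are illustrative only, not our actual data or estimation procedure.

```python
# Hypothetical illustration of a two-by-two difference-in-differences
# comparison. All numbers are made up for exposition.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: the change in the treated group
    minus the change in the comparison group."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Made-up averages of weekly posts before and after ChatGPT's release:
# a Stack Overflow topic (where we estimated ~10%+ declines) versus a
# matched Reddit community (where we observed essentially no change).
so_pre, so_post = 1000.0, 880.0   # ~12% decline
rd_pre, rd_post = 500.0, 498.0    # essentially flat

effect = did_estimate(so_pre, so_post, rd_pre, rd_post)
print(effect)  # -118.0: decline beyond what the Reddit trend would imply
```

The point of netting out the comparison group is to absorb shocks common to both platforms (seasonality, news events), attributing only the divergence to ChatGPT.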

Research Agenda: Knowledge Management in the Era of Generative AI

Our findings raise several important, open questions and issues social and computer scientists can and should look to address going forward. These questions collectively relate to the role of individuals in knowledge production and sharing, and how those roles change in the face of advancing AI.

Social interaction is key.

While our recent work suggests social connection can provide some protection against the eroding influence of generative AI for online knowledge communities, how one achieves social connection is rather open-ended.7 So, in the presence of generative AI, how can online communities be redesigned to facilitate an increased focus on social interaction, while maintaining the quality and efficiency of knowledge provision and search? How might platform features be adjusted to encourage users to engage with each other instead of, or as a complement to, AI-generated content? One useful prospect to consider is that peer experts can provide a helpful point of verification, to ensure the information supplied by an LLM is accurate and optimal. More generally, how can AI be leveraged to enhance, rather than replace, human interactions in online communities?

An important approach to consider is the incorporation of LLMs directly into the community interface, a prospect that obviously requires a thoughtful, well-considered design. Indeed, Stack Overflow has recently announced OverflowAI, an in-house LLM that integrates with the Stack Overflow user interface.c However, this strategy is likely to be less successful in an open setting, as other LLM alternatives exist for users outside of Stack Overflow’s domain. By contrast, the strategy is likely to be more successful inside a firm-hosted online knowledge community, if employees lack access to outside alternatives as a matter of policy (for example, in the presence of employer bans on employees’ use of third-party generative AI).
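As one illustration of what such an integration might look like, the sketch below implements a hypothetical "AI drafts, peers verify" workflow, in which an LLM-generated answer is published only after enough peer experts vouch for it. All names, classes, and thresholds here are our assumptions for exposition, not OverflowAI's actual design.

```python
# A minimal sketch of an "AI drafts, peers verify" community workflow.
# The Draft class and the review quorum are hypothetical design choices,
# not a description of any existing platform's implementation.

from dataclasses import dataclass

@dataclass
class Draft:
    question: str
    ai_answer: str
    approvals: int = 0
    rejections: int = 0

    def record_review(self, approved: bool) -> None:
        """A peer expert reviews the AI-generated draft."""
        if approved:
            self.approvals += 1
        else:
            self.rejections += 1

    def status(self, quorum: int = 2) -> str:
        """Publish only once enough peer experts have vouched for the
        draft; reject once enough have flagged it; otherwise hold it."""
        if self.approvals >= quorum:
            return "published"
        if self.rejections >= quorum:
            return "rejected"
        return "pending"

draft = Draft("How do I pin a dependency version?", "Use a lock file ...")
draft.record_review(True)
draft.record_review(True)
print(draft.status())  # published
```

The design choice worth noting is that human review sits between generation and publication, so the community retains the verification role discussed above rather than ceding it to the model.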

Generative AI usage policies.

What other strategies might be pursued to encourage continued user engagement and knowledge sharing in an online community? As suggested in this column, many organizations are presently considering acceptable use policies or outright bans on the use of generative AI in the workplace. Those initiatives have largely been motivated by information security concerns, but they nonetheless have the potential to help ensure sustained knowledge sharing. Samsung, as one example, has recently banned employees’ use of LLMs after discovering confidential company data had been ingested by GPT.d That said, anecdotal media reports indicate that employees continue to employ ChatGPT, despite workplace bans.e A lengthy literature on employee policy compliance speaks to whether and when employees abide by workplace regulations.9 Future work might explore these generative AI usage bans and acceptable use policies, to understand compliance and impacts.

Rewarding users for their contributions.

Users participate in online communities voluntarily and they face challenges internalizing the value of the content they contribute.2 As a result, content is often scarce and “under-provided.” Generative AI tools are exacerbating the problem by shrinking the audience a user can expect their contributions to reach. Further, generative AI tools are trained on data scraped from the Internet, which has prompted negative reactions from online community members. Several artists have recently filed class-action lawsuits against Stability AI and Midjourney,f and writers of popular fan fiction have begun to withhold their contributions, to prevent AI companies from profiting off their work.g So, how can we compensate users for their content? Reddit and Twitter have begun charging for API use, in part hoping to obtain payment from AI companies for their users’ data, a change that has only made matters worse, driving contributors to exit.h Reddit moderators recently went on strike, protesting the impact of the new payment policy on accessibility and moderation tools built around the free API, and raising questions about whether Reddit should be passing some revenues on to users.

Redesigning approaches to training and education.

In addition to devising interventions and strategies for managing employees, knowledge sharing, and online knowledge communities in the age of generative AI, scholars must also attend to the implications of generative AI tools as a novel source of information, in lieu of peers. This shift has potentially far-reaching implications for student and employee education and training. Online communities typically provide a rich source of learning opportunities, with users able to learn not only from the answers to their own questions but also from the questions and answers of peers. If such opportunities diminish and users come to rely increasingly on generative AI tools in isolation, this raises the question of how new knowledge will be produced, documented, and shared. A lengthy literature on knowledge hiding considers the antecedents and consequences of knowledge withholding in organizations (Serenko and Bontis 2016), and how to manage the impediments that give rise to it. Future work can explore the design of explicit policies, processes, and incentives around knowledge sharing, accounting for generative AI use, as a means of sustaining knowledge production. So, how might organizations incorporate generative AI tools into existing training and retraining efforts?

Conclusion

Generative AI is having large, negative impacts on user participation and contributions to online knowledge communities. As users depart, the average quality of contributions is also beginning to decline, raising the prospect of a vicious cycle. As we continue to navigate this new landscape, it is crucial that we develop an understanding of the consequences of generative AI. We must work to identify strategies and information system designs that can ensure the health and sustainability of online knowledge communities, and of knowledge sharing more broadly.
