
Communications of the ACM

Contributed articles

Language Models: Past, Present, and Future

[Illustration: physical representation of a human-robot interaction. Credit: Andrij Borys Associates; Shutterstock]

Natural language processing (NLP) has undergone revolutionary changes in recent years. Thanks to the development and use of pre-trained language models, remarkable achievements have been made in many applications. Pre-trained language models offer two major advantages. One advantage is that they can significantly boost the accuracy of many NLP tasks. For example, one can exploit the BERT model to achieve performance higher than that of humans in language understanding.8 One can also leverage the GPT-3 model to generate texts that resemble human writing.3 A second advantage of pre-trained language models is that they are universal language processing tools. To conduct a machine-learning-based task in traditional NLP, one had to label a large amount of data to train a model. In contrast, one currently needs to label only a small amount of data to fine-tune a pre-trained language model, because it has already acquired much of the knowledge necessary for language processing.
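The pre-train/fine-tune pattern described above can be sketched in miniature. In this library-free toy, a count-based bigram model stands in for a neural language model such as BERT or GPT-3: "pre-training" builds statistics from a larger unlabeled corpus, and "fine-tuning" adapts the same model with a small amount of in-domain data. All corpora, names, and the `weight` parameter here are illustrative assumptions, not part of any real model's API.

```python
from collections import Counter, defaultdict

class BigramLM:
    """Toy count-based bigram model (a stand-in for a neural language model)."""

    def __init__(self):
        # For each word, counts of the words observed to follow it.
        self.counts = defaultdict(Counter)

    def train(self, corpus, weight=1):
        """Accumulate bigram counts; `weight` lets later data adapt the model."""
        for sentence in corpus:
            words = sentence.split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += weight

    def predict(self, prev):
        """Return the most likely next word after `prev`, or None if unseen."""
        if not self.counts[prev]:
            return None
        return self.counts[prev].most_common(1)[0][0]

# "Pre-training" on a comparatively large unlabeled corpus.
pretrain_corpus = [
    "the model reads the text",
    "the model generates the text",
    "the model reads the book",
]
lm = BigramLM()
lm.train(pretrain_corpus)

# "Fine-tuning": a small amount of task-specific data shifts the model's behavior.
finetune_corpus = ["the model answers the question"]
lm.train(finetune_corpus, weight=5)

print(lm.predict("model"))  # prints "answers"
```

Before fine-tuning, the model predicts "reads" after "model" (seen twice in pre-training); a single weighted in-domain example is enough to change that prediction, mirroring how a small labeled set can adapt a pre-trained model to a new task.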


This article offers a brief introduction to language modeling, particularly pre-trained language modeling, from the perspectives of historical development and future trends, for general readers in computer science. It is not a comprehensive survey but an overview, highlighting the basic concepts, intuitive explanations, technical achievements, and fundamental challenges. While positioned as an introduction, the article may also help knowledgeable readers deepen their understanding and spark new ideas. References on pre-trained language models for beginners are also provided.


