Computing Applications China Region special section: Hot topics

Knowledgeable Machine Learning for Natural Language Processing


Over the past decades, one thread has run through the entire research spectrum of natural language processing (NLP): knowledge. With various kinds of knowledge, such as linguistic knowledge, world knowledge, and commonsense knowledge, machines can understand complex semantics at different levels. In this article, we introduce a framework named “knowledgeable machine learning” to revisit existing efforts to incorporate knowledge into NLP, especially the recent breakthroughs in the Chinese NLP community.

Since knowledge is closely related to human language, the ability to capture and utilize knowledge is crucial for machines to understand language. As shown in the accompanying figure, symbolic knowledge formalized by humans was widely used by NLP researchers before 1990, for example, in grammar rules grounded in linguistic theories3 and in knowledge bases for expert systems.1 Since 1990, statistical learning and, later, deep learning methods have been widely explored in NLP; in these methods, knowledge is captured automatically from data and stored implicitly in model parameters. The success of recent pretrained language models (PLMs)4,13 on a wide range of NLP tasks demonstrates the effectiveness of this implicit knowledge. It has gradually become a consensus among NLP researchers that making full use of knowledge, including both human-friendly symbolic knowledge and machine-friendly model knowledge, is essential for better language understanding.

Figure. A historical glimpse of the NLP research spectrum and the whole framework of knowledgeable machine learning.

The spectrum depicted in the figure shows how knowledge was used for machine language understanding in different historical periods. The framework shows how to inject knowledge into different parts of machine learning.


Knowledgeable ML for NLP

To clearly show how knowledge can be utilized for NLP tasks, we introduce knowledgeable machine learning. A machine learning method can be decomposed into four components: input, model, objective, and parameters. As shown in the figure, knowledgeable machine learning covers methods that apply knowledge to enhance each of these four components. According to which component is enhanced, existing methods that utilize knowledge for NLP tasks fall into four categories:

Knowledge augmentation enhances the input of models with knowledge. There are two mainstream approaches: one directly adds knowledge to the input, and the other designs special modules that fuse the embeddings of the original input with those of related knowledge. So far, knowledge augmentation has achieved promising results on various tasks, such as information retrieval,11,18 question answering,10,15 and reading comprehension.5,12
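
The two augmentation routes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited system: the toy knowledge base `KB_DESCRIPTIONS`, the functions `augment_text` and `fuse_embeddings`, the `[SEP]` separator, and the fixed scalar gate are all hypothetical choices made for clarity.

```python
from typing import Dict, List

# Hypothetical toy knowledge base mapping entity names to short descriptions.
KB_DESCRIPTIONS: Dict[str, str] = {
    "Beijing": "capital of China",
    "Yangtze": "longest river in Asia",
}


def augment_text(sentence: str, kb: Dict[str, str]) -> str:
    """Route 1: append descriptions of entities mentioned in the sentence to the input text."""
    # Naive substring matching stands in for a real entity linker (assumption).
    facts = [f"{name}: {desc}" for name, desc in kb.items() if name in sentence]
    if not facts:
        return sentence
    return sentence + " [SEP] " + " ; ".join(facts)


def fuse_embeddings(token_vec: List[float], entity_vec: List[float], gate: float = 0.5) -> List[float]:
    """Route 2: mix a token embedding with the embedding of its linked entity.

    A fixed scalar gate is used for illustration; learned fusion modules
    usually compute the mixing weights from the inputs themselves.
    """
    if len(token_vec) != len(entity_vec):
        raise ValueError("token and entity embeddings must have the same dimension")
    return [(1 - gate) * t + gate * e for t, e in zip(token_vec, entity_vec)]


print(augment_text("I visited Beijing last year.", KB_DESCRIPTIONS))
print(fuse_embeddings([1.0, 0.0], [0.0, 1.0]))
```

Running it prints `I visited Beijing last year. [SEP] Beijing: capital of China` followed by `[0.5, 0.5]`. The first route leaves the model untouched and only changes what it reads; the second requires the model to accept an extra embedding stream, which is why it usually comes with dedicated fusion modules.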

Knowledge support aims to bolster the processing procedure of models with knowledge. On the one hand, knowledgeable layers can be placed at the bottom of a model to preprocess input features and make them more informative; for example, knowledge memory modules6 can inject informative memorized features. On the other hand, knowledge can serve as an expert at the top layers, post-processing intermediate results to produce more accurate and effective outputs, as in improving language generation with knowledge bases.7
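
The two placements of knowledge support can be illustrated with a hedged sketch: `memory_read` performs soft attention over a key-value knowledge memory at the bottom of a model, and `knowledge_rerank` combines the model's output distribution with knowledge-derived scores as a simple product of experts at the top. Both functions, the toy vectors, and the scores are hypothetical; real memory modules and expert layers, including those in the cited work, are learned and considerably more elaborate.

```python
import math
from typing import Dict, List, Sequence


def memory_read(query: Sequence[float], keys: List[Sequence[float]], values: List[Sequence[float]]) -> List[float]:
    """Bottom-layer support: read a knowledge feature by soft attention over memory slots."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    # Attention-weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def knowledge_rerank(model_probs: Dict[str, float], kb_scores: Dict[str, float]) -> Dict[str, float]:
    """Top-layer support: multiply model probabilities by knowledge scores and renormalize.

    Candidates absent from kb_scores get a neutral score of 1.0 (an assumption).
    """
    combined = {w: p * kb_scores.get(w, 1.0) for w, p in model_probs.items()}
    total = sum(combined.values())
    return {w: s / total for w, s in combined.items()}


# Hypothetical next-word candidates for "The capital of China is ___".
model_probs = {"Beijing": 0.5, "Shanghai": 0.5}
kb_scores = {"Beijing": 0.9, "Shanghai": 0.1}
print(knowledge_rerank(model_probs, kb_scores))
```

Here the model alone is undecided between the two candidates, and the knowledge expert shifts the renormalized probabilities to roughly 0.9 for "Beijing" and 0.1 for "Shanghai".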

Knowledge regularization aims to enhance objective functions with knowledge. One approach is to build extra objectives and regularization functions. For example, distantly supervised learning uses knowledge to heuristically annotate corpora, which then serve as new objectives; it is widely used for NLP tasks such as relation extraction,8 entity typing,17 and word sense disambiguation.9 The other approach is to use knowledge to build extra prediction targets: ERNIE,20 CoLAKE,14 and KEPLER,16 for example, use knowledge bases to build extra pretraining objectives for language modeling.
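
Distant supervision can be sketched as a labeling heuristic, and an extra knowledge-derived objective as a weighted term added to the task loss. The sketch below assumes a toy triple store (`TRIPLES`), naive substring matching for entity mentions, and a hypothetical weighting parameter `weight`; it is not the procedure of any specific cited system.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy knowledge base of (head, relation, tail) triples.
TRIPLES: List[Triple] = [
    ("Beijing", "capital_of", "China"),
    ("Yangtze", "located_in", "China"),
]


def distant_label(sentences: List[str], triples: List[Triple]) -> List[Tuple[str, str, str, str]]:
    """Label a sentence with a triple's relation whenever it mentions both the head and the tail."""
    labeled = []
    for sent in sentences:
        for head, rel, tail in triples:
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, rel))
    return labeled


def regularized_loss(task_loss: float, knowledge_loss: float, weight: float = 0.1) -> float:
    """Add a weighted knowledge-derived term to the task objective."""
    return task_loss + weight * knowledge_loss


corpus = [
    "Beijing is the capital of China.",
    "Beijing and China were both mentioned in the report.",  # also labeled: a noisy example
    "Paris is in France.",
]
for example in distant_label(corpus, TRIPLES):
    print(example)
print(regularized_loss(1.0, 2.0, weight=0.5))
```

Both Beijing sentences receive the `capital_of` label, although only the first actually expresses that relation. This is the well-known noise of distant supervision, and it is one reason such heuristic labels are usually combined with denoising or attention over multiple sentences.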

Knowledge transfer aims to obtain a knowledgeable hypothesis space, making it easier to reach effective models. Transfer learning and self-supervised learning focus on transferring knowledge from labeled and unlabeled data, respectively. As a typical paradigm of transferring model knowledge, fine-tuning PLMs has shown promising results on almost all NLP tasks. Chinese PLMs such as CPM21 and PanGu-alpha19 have recently been proposed and have shown strong performance on Chinese NLP tasks. CKB2 has further been proposed to build a universal continuous knowledge base that stores and transfers model knowledge from various neural networks trained for different tasks.
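
The core intuition of knowledge transfer, that parameters learned on a related task place a model in a better region of the hypothesis space, can be shown with a deliberately tiny analogue of fine-tuning. This is an assumption-laden toy, not PLM fine-tuning: a one-variable linear model `y = w*x + b` is "pretrained" on a hypothetical source task and then fine-tuned for a few gradient steps on a related target task, and its target error is compared with a model trained from scratch for the same number of steps. The helpers `train` and `mse` and both datasets are illustrative.

```python
from typing import List, Tuple

Data = List[Tuple[float, float]]


def train(data: Data, w: float, b: float, lr: float = 0.1, steps: int = 100) -> Tuple[float, float]:
    """Full-batch gradient descent on mean squared error for the model y = w*x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(data: Data, w: float, b: float) -> float:
    """Mean squared error of the linear model on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Hypothetical source task (y = 2x + 1), trained to convergence.
source = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
# Hypothetical related target task (y = 2x + 1.5) with a tiny update budget.
target = [(0.0, 1.5), (1.0, 3.5)]

w_pre, b_pre = train(source, 0.0, 0.0, steps=500)   # "pretraining"
w_ft, b_ft = train(target, w_pre, b_pre, steps=5)   # fine-tuning from pretrained parameters
w_sc, b_sc = train(target, 0.0, 0.0, steps=5)       # training from scratch

print(f"fine-tuned MSE:   {mse(target, w_ft, b_ft):.4f}")
print(f"from-scratch MSE: {mse(target, w_sc, b_sc):.4f}")
```

With these settings the fine-tuned model ends with a target error more than ten times lower than the model trained from scratch for the same five steps. The gap shrinks as the update budget grows, loosely mirroring the common observation that pretrained initialization helps most when target-task data or training budget is limited.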


Besides the studies mentioned here, many researchers in the Chinese NLP community are committed to using knowledge to enhance NLP models. We believe all these efforts will advance the development of NLP toward better language understanding.


Conclusion

In this article, we introduced a knowledgeable machine learning framework to organize existing efforts to utilize knowledge for language understanding, especially typical works from the Chinese NLP community. We hope this framework can inspire more efforts to use knowledge for better language understanding.

References

    1. Barr, A. and Feigenbaum, E.A. The Handbook of Artificial Intelligence, 1981.

    2. Chen, G., Sun, M. and Liu, Y. Towards a universal continuous knowledge base. 2020; arXiv:2012.13568.

    3. Chomsky, N. Syntactic Structures, De Gruyter, 1957.

    4. Devlin, J., Chang, M-W., Lee, K. and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL'2019, 4171–4186.

    5. Ding, M., Zhou, C., Chen, Q., Yang, H. and Tang, J. Cognitive graph for multi-hop reading comprehension at scale. In Proceedings of ACL'2019, 2694–2703.

    6. Ding, M., Zhou, C., Yang, H. and Tang, J. CogLTX: Applying BERT to long texts. In Proceedings of NeurIPS'2020, 12792–12804.

    7. Gu, Y., Yan, J., Zhu, H., Liu, Z., Xie, R., Sun, M., Lin, F. and Lin, L. Language modeling with sparse product of sememe experts. In Proceedings of EMNLP'2018, 4642–4651.

    8. Han, X., Liu, Z. and Sun, M. Neural knowledge acquisition via mutual attention between knowledge graph and text. In Proceedings of AAAI'2018, 4832–4839.

    9. Huang, L., Sun, C., Qiu, X. and Huang, X-J. GlossBERT: BERT for word sense disambiguation with gloss knowledge. In Proceedings of EMNLP-IJCNLP'2019, 3500–3505.

    10. Liu, K., Zhao, J., He, S. and Zhang, Y. Question answering over knowledge bases. IEEE Intelligent Systems 30, 5 (2015), 26–35.

    11. Liu, Z., Xiong, C., Sun, M. and Liu, Z. Entity-Duet Neural Ranking: Understanding the role of knowledge graph semantics in neural information retrieval. In Proceedings of ACL'2018, 2395–2405.

    12. Qiu, D., Zhang, Y., Feng, X., Liao, X., Jiang, W., Lyu, Y., Liu, K. and Zhao, J. Machine reading comprehension using structural knowledge graph-aware network. In Proceedings of EMNLP-IJCNLP'2019, 5898–5903.

    13. Radford, A., Narasimhan, K., Salimans, T. and Sutskever, I. Improving language understanding by generative pre-training. OpenAI Blog, 2018.

    14. Sun, T., Shao, Y., Qiu, X., Guo, Q., Hu, Y., Huang, X-J. and Zhang, Z. CoLAKE: Contextualized language and knowledge embedding. In Proceedings of COLING'2020, 3660–3670.

    15. Wang, L., Zhang, Y. and Liu, T. A deep learning approach for question answering over knowledge base. Natural Language Understanding and Intelligent Applications, Springer, 2016, 885–892.

    16. Wang, X., Gao, T., Zhu, Z., Zhang, Z., Liu, Z., Li, J. and Tang, J. KEPLER: A unified model for knowledge embedding and pre-trained language representation. TACL 9, 2021, 176–194.

    17. Xin, J., Lin, Y., Liu, Z. and Sun, M. Improving neural fine-grained entity typing with knowledge attention. In Proceedings of AAAI'2018, 5997–6004.

    18. Xiong, C., Power, R. and Callan, J. Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of WWW'2017, 1271–1279.

    19. Zeng, W. et al. PanGu-alpha: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. 2021; arXiv:2104.12369.

    20. Zhang, Z., Han, X., Liu, Z., Jiang, X., Sun, M. and Liu, Q. ERNIE: Enhanced language representation with informative entities. In Proceedings of ACL'2019, 1441–1451.

    21. Zhang, Z. et al. CPM: A large-scale generative Chinese pre-trained language model. 2020; arXiv:2012.00413.

