There has been a phenomenal increase in the use of online social media (OSM) services in India, including Facebook, Twitter, Instagram, LinkedIn, and YouTube. In addition to these services, the one-to-one messaging service WhatsApp has 200 million users in India, the highest of any country in the world. Of the 462 million people in India who access the Internet, Facebook has 250+ million users, LinkedIn 42+ million, and Twitter 23+ million. The majority of users access these services through their mobile phones.
These services have had a profound impact in India. Overall digital literacy has increased, people are more connected, more local-language content is being disseminated, substantial information is exchanged during crises, and more. The deep penetration of social media services also has negative effects. These include the propagation of false information and hate, an increase in spammers and phishers, and a decline in users' social skills. The newness of the technology and of mobile phones, low literacy rates, and cheaper mobile data rates are cited as factors that contribute to these negative impacts on society.
Research has been mainly directed toward regulation of content generated on OSM. It can be classified in the following categories:
Note that local and code-mixed languages (that is, a local language + English) are increasingly used to generate content, and a lot of research is therefore also directed toward mining such content.6 Selfies form a substantial part of social media image content, and taking selfies has been found to lead to accidents in many cases. Hence, another line of research aims to accurately communicate to users the risks of a location they choose for taking selfies, as with the Saftie and Saftie Camera apps.b
The research enumerated earlier gives an overview of some ongoing work on social media by Indian scientists, but it is by no means exhaustive. Here, we elaborate on some of this work, focusing specifically on work that helps users get access to 'useful' and 'sanitized' content. We also discuss the issues related to code-mixed text and the research undertaken to identify dangerous spots for taking selfies.
Search and recommendation systems over OSM. Search and recommendation systems over OSMs require accurate methods for inferring users' topical interests and expertise and for searching for experts on specific topics. Researchers proposed novel crowdsourcing-based methodologies for these tasks. For example, the topics of a user's expertise are inferred from how other users describe that user.
The proposed methodologies are far more accurate than content-based techniques at inferring a wide range of topics of interest and expertise and at identifying topical experts. It was earlier thought that OSMs like Twitter were used only for casual conversation among friends. However, several works1,11 showed that Twitter is actually a treasure trove of information on thousands of topics. These range from popular topics like politics and sports to specialized topics like neurology and forensics. The research identified thousands of groups of Twitter users interested in these diverse topics. Along with the novel algorithms, the effort produced several publicly deployed Web-based systems on the Twitter platform that build on them. Examples include topical search systems,c systems for inferring users' topical interests and expertise,d and so on. These systems are currently used by hundreds of users worldwide.
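The minimal Python sketch below makes the crowdsourcing idea concrete. It assumes that, for each user, we already have the free-text labels other users have attached to that user. On Twitter, for instance, these could be the names of lists the user has been added to. The function names, the stop-word list, and the count-based scoring are hypothetical illustrations, not the researchers' published algorithm. Each describer casts at most one vote per topic, so a single prolific describer cannot dominate a user's profile.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical stop-list: words in labels that carry no topical signal.
STOPWORDS = {"my", "the", "and", "of", "list", "people", "folks"}


def tokenize(label: str) -> list[str]:
    """Lower-case a label, split on non-letters, and drop stop words."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in label)
    return [w for w in cleaned.split() if w not in STOPWORDS]


def infer_topics(describers: dict[str, list[str]], min_support: int = 2) -> Counter:
    """Infer a user's topics from labels that *other* users applied to them.

    `describers` maps each describing user to the labels they used. Each
    describer votes at most once per topic; topics backed by fewer than
    `min_support` distinct describers are discarded.
    """
    support: Counter = Counter()
    for labels in describers.values():
        # A set, so repeated labels from one describer count only once.
        support.update({t for label in labels for t in tokenize(label)})
    return Counter({t: n for t, n in support.items() if n >= min_support})


def find_experts(users: dict[str, dict[str, list[str]]], topic: str, k: int = 3) -> list[str]:
    """Return up to k users, ranked by how many distinct describers tag them with `topic`."""
    scores = {}
    for user, describers in users.items():
        votes = infer_topics(describers, min_support=1)[topic]  # 0 if absent
        if votes:
            scores[user] = votes
    # Sort by descending vote count, breaking ties alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))[:k]


users = {
    "@neuro_doc": {"a": ["Neurology experts"], "b": ["neurology", "medicine"], "c": ["Medicine"]},
    "@cricket_fan": {"a": ["Cricket"], "d": ["cricket news"]},
}
print(infer_topics(users["@neuro_doc"]))  # Counter({'neurology': 2, 'medicine': 2})
print(find_experts(users, "cricket"))     # ['@cricket_fan']
```

The sketch counts distinct describers rather than raw label occurrences to make the scores more robust. The deployed systems may weight this evidence quite differently.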
Efficient utilization of social media during disasters. Research has shown that microblogging sites like Twitter have become important sources of real-time information during disasters. These sites carry a significant amount of valuable situational information, meaning updates about the current situation. However, this information is buried among hundreds of thousands of tweets that mostly express the sentiments and opinions of people posting during such events. A series of studies by CNeRG, IIT Kharagpur, set out to use microblogging sites effectively during disasters. This work extracted situational information from the mass of sentiment and opinion. It also classified tweets into humanitarian categories such as 'infrastructure damage,' 'missing or found people,' and 'relief required.' Finally, it summarized the situational information in real time to support decision making when time is critical.
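The deliberately simple Python sketch below illustrates the classify-then-summarize idea. The cue-word lists and the scoring rule are hypothetical stand-ins for a trained classifier. The greedy selection is one common way to approximate coverage-based summarization. At each step it picks the tweet that adds the most not-yet-covered content words per word of length, subject to a word budget. This is not claimed to be the exact formulation used in the work described above.

```python
from __future__ import annotations

import re

# Hypothetical cue lists; a real system would learn these from labeled tweets.
SUBJECTIVE = {"pray", "praying", "sad", "omg", "hope", "feel", "terrible", "shocked"}
SITUATIONAL = {"dead", "injured", "missing", "rescue", "relief", "collapsed",
               "flooded", "shelter", "stranded", "damaged", "road", "bridge"}
STOP = {"the", "a", "an", "in", "of", "to", "at", "and", "is", "are", "for", "on", "near"}


def words(tweet: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", tweet.lower())


def is_situational(tweet: str) -> bool:
    """Heuristic: numbers and situational terms count for, subjective terms against."""
    ws = words(tweet)
    score = sum(w in SITUATIONAL or w.isdigit() for w in ws)
    score -= sum(w in SUBJECTIVE for w in ws)
    return score > 0


def content_words(tweet: str) -> set[str]:
    return {w for w in words(tweet) if w not in STOP}


def summarize(tweets: list[str], max_words: int) -> list[str]:
    """Greedily select situational tweets that add the most uncovered content
    words per word of length, skipping any tweet that would exceed the budget."""
    pool = [t for t in tweets if is_situational(t)]  # every kept tweet has >= 1 word
    covered: set[str] = set()
    summary, used = [], 0
    while pool:
        best = max(pool, key=lambda t: len(content_words(t) - covered) / len(words(t)))
        pool.remove(best)
        if not content_words(best) - covered:
            break  # the best remaining tweet adds nothing new
        if used + len(words(best)) <= max_words:
            summary.append(best)
            covered |= content_words(best)
            used += len(words(best))
    return summary
```

Dividing the gain by tweet length favors short, information-dense tweets, which suits a tight summary budget.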
Another important observation is that people post situational updates not only in English but also in their local languages (predominantly Hindi in India). Hence, the classification-summarization framework was extended to Hindi tweets as well as code-mixed tweets (for example, part Hindi, part English). It has also been observed that some people exploit the panic during a disaster to post offensive content targeting specific religious communities. Such communal posts threaten law and order. Unfortunately, this phenomenon has been observed to be prevalent on the Indian subcontinent even during natural disasters. Methods were developed to detect such communal tweets and to characterize the users who initiate and/or propagate them.
Elections and social media. Researchers in India have studied in detail the use of social media during the April/May 2019 elections in India and made several observations.f Misleading messages and suspected fake or bot accounts were widely used, as is now observed in almost all elections. Beyond these, the elections had several distinctive features. There was a substantial amount of satirical video. Female verified handles showed more engagement than male verified handles. An important trending hashtag was #MainBhiChowkidar (#IAmAlsoAWatchman), which prompted around 5,000 users to add Chowkidar (Watchman) to the names on their social media handles.
Code mixing on social media. It is widespread practice to write Indian languages in Roman script and to mix them with English when writing or speaking.g This phenomenon is referred to as linguistic code-mixing or code-switching. Any analysis of social media content from India must process code-mixed text correctly. However, traditional natural language processing (NLP) modules such as language identifiers, POS taggers, translators, and word aligners treat code-switched data either as noise or as a new language (for example, Hinglish for Hindi-English code mixing). Both views are limited. The former does not recognize the complexity and socio-pragmatics of the phenomenon. The latter does not use the fact that code mixing is a grammatically informed combination of two languages. Further, bilingual speakers show different language preferences depending on the topic of discussion and the sentiment expressed. It follows that ignoring code-mixed patterns, or analyzing content only in the predominant language on social media (usually English), can lead to misleading conclusions. Such analysis is also bound to miss social and discourse-level nuances in the data. Several researchers in India have worked on different aspects of code-switching, and Microsoft Research India has largely led the initiative under Project Melange.h Several semi-supervised10 techniques are being developed to automatically produce large, annotated code-mixed datasets, helping the community perform downstream supervised NLP tasks efficiently.
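The toy Python sketch below shows why code-mixed text needs special handling by tagging each romanized token as Hindi or English. The two lexicons are tiny and hypothetical. The context rule is a simple heuristic chosen for illustration, not one of the semi-supervised methods mentioned above. Under this rule, an ambiguous or unknown word inherits the nearest unambiguous tag to its left, or failing that, to its right. Note that romanized Hindi words such as "to," "me," and "hi" are also English words, so dictionary lookup alone cannot decide them.

```python
from __future__ import annotations

import re

# Tiny hypothetical lexicons. The overlap is deliberate: romanized Hindi
# "to", "me", and "hi" are also English words.
HI_LEX = {"aj", "aaj", "patakhe", "to", "me", "mein", "hi", "phutenge", "hai", "nahi"}
EN_LEX = {"sure", "it", "would", "be", "to", "me", "hi", "the", "is", "will"}


def lookup(word: str) -> str | None:
    """Return 'hi' or 'en' if exactly one lexicon contains the word, else None."""
    in_hi, in_en = word in HI_LEX, word in EN_LEX
    if in_hi != in_en:
        return "hi" if in_hi else "en"
    return None  # ambiguous (in both lexicons) or unknown (in neither)


def tag_languages(text: str, default: str = "en") -> list[tuple[str, str]]:
    """Tag each token; unresolved tokens take the nearest unambiguous tag to
    their left, else to their right, else `default`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    seed = [lookup(t) for t in tokens]
    tags = []
    for i, label in enumerate(seed):
        if label is None:
            left = next((s for s in reversed(seed[:i]) if s), None)
            right = next((s for s in seed[i + 1:] if s), None)
            label = left or right or default
        tags.append(label)
    return list(zip(tokens, tags))
```

Consider the tweet quoted in footnote g. This heuristic tags the first seven tokens (through "phutenge") as Hindi and the last four as English. The word "india" is in neither lexicon, so it is tagged from context. A real system might instead give it a separate named-entity label.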
Killfies for social media. In recent years, posting selfies (digital self-portraits) on social media websites such as Facebook, Instagram, and Snapchat has become part of mainstream culture. People often display their adventurousness by posting dangerous selfies (aka killfies). Since March 2014, 238 people have reportedly been killed while taking selfies,i and India dominates these statistics with 141 deaths. This problem is particularly relevant in India, given the increasing penetration of mobile technology, high usage statistics, and the harm caused by such behavior. Research conducted by Precog@IIIT Delhij identifies dangerous selfies. In this context, the researchers have created datasets, classifiers, apps, and location-marker tools. A convolutional neural network-based classifier identifies dangerous selfies posted on social media using only the image (no metadata) and achieves an accuracy of 98%. The Saftie Camerak app, based on this classifier, works in real-world settings and warns users when their location is potentially dangerous.
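The text does not describe how location warnings are computed. One simple possibility is a great-circle proximity check, sketched below in Python. It assumes that a list of previously flagged spots with coordinates is available, for example from a location-marker tool. The `Spot` type, the 200 m default radius, and the function names are hypothetical. The Saftie Camera app is described above as relying on the image classifier, and this sketch is not its implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass(frozen=True)
class Spot:
    """A hypothetical flagged location."""
    name: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_dangers(lat: float, lon: float, spots: list[Spot],
                   radius_m: float = 200.0) -> list[Spot]:
    """Return flagged spots within radius_m of the user, nearest first."""
    hits = [(haversine_m(lat, lon, s.lat, s.lon), s) for s in spots]
    return [s for d, s in sorted(hits, key=lambda h: h[0]) if d <= radius_m]
```

The haversine formula models the Earth as a sphere. The resulting error is small compared with a warning radius of a few hundred meters.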
Important funding initiatives. Both government and non-government agencies have funded many initiatives to promote social media research. Among them is the Indo-German Max Planck Center for Computer Science's five-year project on Understanding, leveraging and deploying online social networks, jointly funded by the Indian Department of Science and Technology and the Max Planck Society. Another is a five-year project on Post disaster situation analysis and resource management, funded by Media Lab Asia and the Information Technology Research Academy (ITRA). This project supported research on the role of social media in disaster management.
Challenges. The world is presently witnessing several negative impacts of OSMs. It is therefore important for the computing community, with intense research input from scientists all over the world, to mitigate these impacts. The specific problems are many: fake news, hate speech, and the shaming of individuals or groups. It is now clear that companies, political parties, and individuals constantly manipulate these systems under the guise of spontaneity. They do so to produce trending topics and thereby control discussions on social media. In India, the problems are compounded by the unprecedented rise in the use of local and code-mixed languages, hence the need for special attention from Indian researchers. A diametrically opposite area of research is leveraging social media for social good, as in the post-disaster management work reported here. Future directions include using social media content to devise better governance mechanisms and supporting individuals and groups with health-related issues. Another direction is making quality education accessible to the huge population by connecting teachers with students located in different places.
Acknowledgments. The authors thank Sunita Sarawagi, Abir De, and the anonymous reviewers for providing constructive feedback.
1. Bhattacharya, P. et al. Deep Twitter diving: Exploring topical groups in microblogs at scale. In Proceedings of the 17th ACM Conf. Computer Supported Cooperative Work and Social Computing, 2014, 197–210.
2. Chakraborty, A., Messias, J., Benevenuto, F., Ghosh, S., Ganguly, N. and Gummadi, K.P. Who makes trends? Understanding demographic biases in crowdsourced recommendations. In Proceedings of the 11th Intern. AAAI Conf. Web and Social Media, 2017.
4. De, A., Valera, I., Ganguly, N., Bhattacharya, S. and Gomez-Rodriguez, M. Learning and forecasting opinion dynamics in social networks. In Proceedings of the 30th Intern. Conf. Neural Information Processing Systems, 2016, 397–405.
5. Maity, S.K., Chakraborty, A., Goyal, P. and Mukherjee, A. Opinion conflicts: An effective route to detect incivility in Twitter. Proc. ACM Hum.-Comput. Interact., Article 117 (2018), 117:1–117:27.
6. Pratapa, A., Bhat, G., Choudhury, M., Sitaram, S., Dandapat, S. and Bali, K. Language modeling for code-mixing: The role of linguistic theory based synthetic data. In Proceedings of the 56th Annual Meeting of the Assoc. Computational Linguistics, Vol. 1 (Melbourne, Australia, 2018), 1543–1553; https://www.aclweb.org/anthology/P18-1143
8. Rudra, K., Goyal, P., Ganguly, N., Mitra, P. and Imran, M. Identifying sub-events and summarizing disaster-related information from microblogs. In Proceedings of the 41st Intern. ACM SIGIR Conf. Research and Development in Info. Retrieval, 2018, 265–274.
9. Sachdeva, N. and Kumaraguru, P. Call for service: Characterizing and modeling police response to serviceable requests on Facebook. In Proceedings of the ACM Conf. Computer-Supported Cooperative Work and Social Computing, 2017.
11. Zafar, M.B., Bhattacharya, P., Ganguly, N., Ghosh, S. and Gummadi, K.P. On the wisdom of experts vs. crowds: Discovering trustworthy topical news in microblogs. In Proceedings of the ACM Conf. Computer-Supported Cooperative Work and Social Computing, 2016, 438–451.
g. For example, a bilingual Hindi/English speaker posts on Twitter: "aj patakhe to india me hi phutenge, sure it would be." Here the segment "aj patakhe to india me hi phutenge" ("today fireworks will occur in India only") is Hindi written in Roman script.
©2019 ACM 0001-0782/19/11
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.