There has been a phenomenal increase in the use of online social media (OSM) services in India, including Facebook, Twitter, Instagram, LinkedIn, and YouTube. In addition to these services, the one-to-one messaging service WhatsApp has 200 million users, the highest in the world. Of the 462 million people in India who access the Internet, more than 250 million use Facebook, more than 42 million use LinkedIn, and more than 23 million use Twitter; the majority access these services through their mobile phones.
These services have had a profound impact in India: overall digital literacy has increased, people are more connected, dissemination of local-language content has increased, a substantial amount of information is exchanged during crises, and more. The deep penetration of social media services also has negative effects: the propagation of false information and hate, an increase in spammers and phishers, a loss of social skills among users, and more. The newness of the technology and of mobile phones, low literacy rates, and cheaper mobile data rates are cited as reasons for these negative impacts on society.
Research has been mainly directed toward regulation of content generated on OSM. It can be classified into the following categories:
- Identifying topical interests and expertise of the users in online behavior [1, 11] and efficiently matching the consumers and producers of content;
- Mining useful content from social media, for example, finding actionable information from the OSM to help law-enforcement agencies [9] and relief and rescue teams during disasters [7, 8];
- Identifying harmful content, namely analyzing hate and spam content on YouTube and Twitter [5], and analyzing the spread of misinformation/fake content on social media (TweetCred, Facebook Inspector, WhatsFarzi);
- Identifying bias in content recommendation of news to users of social media [2];
- Impact of content on determining the dynamics of opinion over social networks [3, 4].
Note that with the rising usage of local and code-mixed (that is, local language + English) languages in content generation, a lot of research is also directed toward mining in the presence of such content [6]. Selfies form a substantial part of social media image content, and taking selfies has been found to lead to accidents in many cases. Hence, another line of research aims to accurately communicate to users the risks associated with a location chosen for taking a selfie, as with the Saftie and Saftie Camera apps.
The research enumerated above provides an overview of some of the ongoing work on social media conducted by Indian scientists, but it is by no means exhaustive. Here, we elaborate on some of this work, focusing on research that helps users get access to ‘useful’ and ‘sanitized’ content. We also discuss the issues related to code-mixed text and the research undertaken to identify dangerous spots for taking selfies.
Search and recommendation systems over OSM. To develop search and recommendation systems over OSMs, it is critical to have accurate methodologies for tasks such as inferring the topical interests and expertise of users and searching for experts on specific topics. Researchers proposed novel crowdsourcing-based methodologies for these tasks; for example, the topics of expertise of a user are inferred from how other users describe that user.
The proposed methodologies are far more accurate than content-based techniques in inferring a wide range of topics of interest and expertise and in identifying topical experts. It was earlier thought that OSMs like Twitter are used only for casual conversation among friends. However, several works [1, 11] showed that Twitter is actually a treasure trove of information on thousands of topics, ranging from popular topics like politics and sports to specialized topics like neurology and forensics. The research has identified thousands of groups of Twitter users interested in these diverse topics. Along with proposing novel algorithms, the endeavor has resulted in the development and public deployment of several Web-based systems on the Twitter platform based on the proposed algorithms, for example, topical search systems and systems for inferring the topical interests and expertise of users. These systems are currently being used by hundreds of users worldwide.
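The crowdsourcing idea described above can be illustrated with a minimal sketch. It is not the researchers' published algorithm: the stopword list, the `infer_topics` function, its `min_support` threshold, and the sample descriptions are all hypothetical, chosen only to show how topics might be read off from the words other users apply to a given account.

```python
from collections import Counter
import re

# Hypothetical stopword list; a real system would need a far larger one.
STOPWORDS = {"the", "and", "of", "my", "a", "in", "on", "for", "to", "who"}

def infer_topics(descriptions, min_support=2, top_k=3):
    """Return up to top_k terms used by at least min_support describers."""
    support = Counter()
    for text in descriptions:
        # Count each term once per description so that a single
        # verbose describer cannot dominate the result.
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        support.update(terms)
    ranked = sorted(
        ((term, n) for term, n in support.items() if n >= min_support),
        key=lambda pair: (-pair[1], pair[0]),  # most supported first, then A-Z
    )
    return [term for term, _ in ranked[:top_k]]

# Hypothetical labels that five different users attached to one account.
descriptions = [
    "Neurology experts",
    "brain science and neurology",
    "Medicine: neurology",
    "science writers",
    "my friends",
]
print(infer_topics(descriptions))  # ['neurology', 'science']
```

Requiring agreement among several describers is the point of the sketch: a term used by one person (such as "friends") is dropped, while terms that recur across independent descriptions are kept.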
Efficient utilization of social media during disasters. Research has shown that microblogging sites like Twitter have become important sources of real-time information during disaster events. A significant amount of valuable situational information (updates about the current situation) is available from these sites. However, this information is buried among hundreds of thousands of tweets, most of which contain the sentiments and opinions of the masses posting during such events. To use microblogging sites effectively during disaster events, a series of research works conducted by CNeRG, IIT Kharagpur has extracted the situational information from among the large amounts of sentiment and opinion; assigned tweets to humanitarian categories such as ‘infrastructure damage,’ ‘missing or found people,’ or ‘relief required’; and summarized the situational information in real time to help decision-making when time is critical.
Another important observation is that, apart from English, people also post situational updates in their local languages (predominantly Hindi in India); hence the classification-summarization framework was extended to Hindi as well as code-mixed (for example, part Hindi, part English) tweets. It has also been observed that some people take advantage of a panic situation by posting offensive content targeting specific religious communities during a disaster. Such communal posts undermine law and order, and unfortunately this phenomenon has been observed to be prevalent on the Indian subcontinent even during natural disasters. Methods were developed to detect such communal tweets and to characterize the users who initiate and/or propagate them.
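The classify-then-summarize pipeline described in this section can be sketched as follows. This is a deliberately simple keyword-based stand-in, not the method used in the cited work: the `CATEGORY_KEYWORDS` lexicon, the function names, and the sample tweets are all invented for illustration, and a deployed system would rely on trained classifiers and a proper summarization step.

```python
import re

# Hypothetical keyword lexicon, invented for this sketch only.
CATEGORY_KEYWORDS = {
    "infrastructure damage": {"collapsed", "damaged", "bridge", "road", "building"},
    "missing or found people": {"missing", "found", "trapped", "rescued"},
    "relief required": {"need", "needed", "food", "water", "medicine", "shelter"},
}

def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def categorize(tweet):
    """Return the best-matching category, or None if no keyword matches."""
    tokens = tokenize(tweet)
    scores = {cat: len(tokens & words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def summarize(tweets, per_category=1):
    """Group situational tweets by category, keeping the most keyword-rich."""
    grouped = {}
    for tweet in tweets:
        cat = categorize(tweet)
        if cat is not None:  # unmatched tweets are treated as sentiment/opinion
            grouped.setdefault(cat, []).append(tweet)
    return {
        cat: sorted(
            items, key=lambda t: -len(tokenize(t) & CATEGORY_KEYWORDS[cat])
        )[:per_category]
        for cat, items in grouped.items()
    }

# Hypothetical tweets mixing opinion with situational updates.
tweets = [
    "Praying for everyone affected, stay strong",
    "Bridge collapsed near the station, road damaged",
    "Urgent: food and water needed at the relief camp",
    "Two children missing since morning near the river",
    "So sad to see this happening again",
]
summary = summarize(tweets)
for category, items in summary.items():
    print(category, "->", items)
```

The two opinion tweets match no keyword and are filtered out, while each of the three situational tweets lands in one humanitarian category. An English-only lexicon like this one would miss Hindi and code-mixed updates entirely, which is the gap the extended framework is meant to address.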
Election and social media. Researchers in India have studied in detail the use of social media during the April/May 2019 elections in India and made several observations. Besides the widespread use of misleading messages and suspected (fake/bot) accounts, which are now observed in almost all elections, there were several distinctive features: a substantial amount of satire video; more engagement from female verified handles than from male verified handles; and an important trending hashtag, #MainBhiChowkidar (#IamtheWatchMan), which prompted around 5,000 users to add Chowkidar (Watchman) to the name in their social media handle.
Code mixing on social media. There is a widespread practice of writing Indian languages in Roman script and of mixing them with English when writing or speaking, a phenomenon referred to as linguistic code-mixing or code-switching. For any analysis of social media content from India, correct processing of code-mixed text is an absolute necessity; however, traditional natural language processing (NLP) modules such as language identifiers, POS taggers, translators, and word aligners treat code-switched data either as noise or as a new language (for example, Hinglish for Hindi-English code mixing). Both views are limited: the former does not recognize the complexity and socio-pragmatics of the phenomenon, whereas the latter does not exploit the fact that code mixing is a grammatically informed combination of two languages. Further, bilingual speakers show different language preferences depending on the topic of discussion and the sentiment expressed. This implies that ignoring code-mixed patterns, or conducting content analysis only for the predominant language on social media (usually English), can lead to misleading conclusions and is bound to miss social and discourse-level nuances in the data. Several researchers from India have worked to address different aspects of code-switching; Microsoft Research India, under Project Melange, has largely led the initiative. Several semi-supervised techniques [10] to automatically produce large annotated code-mixed datasets are being developed to help the community efficiently perform downstream supervised NLP tasks.
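To make the contrast with the "noise" and "new language" views concrete, the following sketch tags each word of a Romanized Hindi-English sentence with a language label and counts the switch points. It is only an illustration: the two word lists and the function names are hypothetical, and lexicon lookup is a stand-in for the trained word-level language identifiers that practical systems would use.

```python
# Tiny hypothetical word lists, for illustration only.
HINDI_ROMAN = {"main", "bhi", "hai", "nahi", "kal", "bahut", "accha", "tha", "yaar"}
ENGLISH = {"the", "movie", "was", "good", "is", "very", "match", "today", "but", "boring"}

def tag_tokens(sentence):
    """Tag each token as 'hi', 'en', or 'unk' by simple lexicon lookup."""
    tags = []
    for token in sentence.lower().split():
        word = token.strip(".,!?#")
        if word in HINDI_ROMAN:
            tags.append((word, "hi"))
        elif word in ENGLISH:
            tags.append((word, "en"))
        else:
            tags.append((word, "unk"))
    return tags

def count_switches(tags):
    """Count language changes between consecutive tokens of known language."""
    langs = [lang for _, lang in tags if lang != "unk"]
    return sum(1 for prev, cur in zip(langs, langs[1:]) if prev != cur)

tags = tag_tokens("Movie bahut accha tha but ending boring hai")
print(tags)
print("switch points:", count_switches(tags))  # switch points: 3
```

Even this toy version shows why lookup alone is insufficient: a Romanized word such as "main" is valid in both Hindi ("I") and English, so resolving it requires the surrounding context, which is one reason annotated code-mixed datasets are needed.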
Killfies for social media. In recent years, the posting of selfies (or digital self-portraits) on social media websites such as Facebook, Instagram, and Snapchat has become a part of mainstream culture. Often people portray their adventurousness by posting dangerous selfies (aka killfies). Since March 2014, 238 people are reported to have been killed while taking selfies, with India dominating these statistics with 141 deaths. Given the increasing penetration of mobile technology, high usage statistics, and the disturbances caused by such behavior, India is one of the prime regions where this problem is particularly relevant. Research conducted by Precog@IIIT Delhi identifies dangerous selfies. The researchers have created datasets, classifiers, apps, and location-marker tools in this context. A convolutional neural network-based classifier that identifies dangerous selfies posted on social media using only the image (no metadata) achieves an accuracy of 98%. The Saftie Camera app, based on the developed classifier, works in real-world settings and detects and warns a user if the location is potentially dangerous.
Important funding initiatives. There have been many funding initiatives from both government and non-government agencies to promote social media research. Among them is the Indo-German Max Planck Center for Computer Science, a five-year project on ‘Understanding, leveraging and deploying online social networks,’ jointly funded by the Indian Department of Science and Technology and the Max Planck Society. Another is the five-year project on ‘Post disaster situation analysis and resource management,’ funded by Media Lab Asia and the Information Technology Research Academy (ITRA), which supported research on the role of social media in disaster management.
Challenges. Presently, the world is witnessing several negative impacts of OSMs. Hence, it is important for the computing world, with intense research input from scientists all over the world, to mitigate these impacts. The specific problems are many: fake news, hate speech, and the shaming of individuals or groups. It is now clear that, in the garb of spontaneity, companies, political parties, and individuals are constantly manipulating the systems to produce trending topics and thus control discussions on social media. The problems are compounded in India by the unprecedented rise in the use of local or code-mixed languages; hence the need for special attention from Indian researchers. A diametrically opposite area of research would be to leverage social media for social good. This includes the work on post-disaster management reported here, as well as future directions such as using social media content to devise better governance mechanisms, supporting individuals and groups with health-related issues, and making quality education accessible to the huge population by connecting teachers with students located in different places.
Acknowledgments. The authors thank Sunita Sarawagi, Abir De, and the anonymous reviewers for providing constructive feedback.