On the morning of November 9, 2016, the world woke up to the shocking outcome of the U.S. Presidential election: Donald Trump was the 45th President of the United States of America. An unexpected event that still has tremendous consequences all over the world. Today, we know that a minority of social bots—automated social media accounts mimicking humans—played a central role in spreading divisive messages and disinformation, possibly contributing to Trump's victory.16,19
In the aftermath of the 2016 U.S. elections, the world started to realize the gravity of widespread deception in social media. Following Trump's exploit, we witnessed to the emergence of a strident dissonance between the multitude of efforts for detecting and removing bots, and the increasing effects these malicious actors seem to have on our societies.27,29 This paradox opens a burning question: What strategies should we enforce in order to stop this social bot pandemic?
In these times—during the run-up to the 2020 U.S. elections—the question appears as more crucial than ever. Particularly so, also in light of the recent reported tampering of the electoral debate by thousands of AI-powered accounts.a
What struck social, political, and economic analysts after 2016—deception and automation—has been a matter of study for computer scientists since at least 2010. In this work, we briefly survey the first decade of research in social bot detection. Via a longitudinal analysis, we discuss the main trends of research in the fight against bots, the major results that were achieved, and the factors that make this never-ending battle so challenging. Capitalizing on lessons learned from our extensive analysis, we suggest possible innovations that could give us the upper hand against deception and manipulation. Studying a decade of endeavors in social bot detection can also inform strategies for detecting and mitigating the effects of other—more recent—forms of online deception, such as strategic information operations and political trolls.
Social bots coexist with humans since the early days of online social networks. Yet, we still lack a precise and well-agreed definition of what a social bot is. This is partly due to the multiple communities studying them and to the multifaceted and dynamic behavior of these entities, resulting in diverse definitions each focusing on different characteristics. Computer scientists and engineers tend to define bots from a technical perspective, focusing on features such as activity levels, complete or partial automation, use of algorithms and AI. The existence of accounts that are simultaneously driven by algorithms and by human intervention led to even more fine-grained definitions and cyborgs were introduced as either bot-assisted humans or human-assisted bots.3 Instead, social scientists are typically more interested in the social or political implications of the use of bots and define them accordingly.
Figure. Facebook's War Room in Menlo Park, CA, on Oct. 17, 2018, ahead of Brazil's runoff election. The company has worked to assuage public concern about the fake accounts, misinformation, and foreign interference that cloud discussion about elections on its site.
Social bots are actively used for both beneficial and nefarious purposes.13 Regarding the detection of benign or malicious social bots, the majority of existing works focused on detecting the latter. The reason is straightforward if we take into account the categorization proposed by Stieglitz et al.30 Bots were categorized according to their intent and to their capacity of imitating humans, with the majority of existing specimen being either benign bots that do not aim to imitate humans (for example, news and recruitment bots, bots used in emergencies) or malicious ones relentlessly trying to appear as human-operated. The detection of the former category of bots does not represent a challenge, and scholars devoted the majority of efforts to spot the latter, also because of their tampering with our online ecosystems. Indeed, the wide array of actions that social bots perform and the negligible cost for creating and managing them en masse, open up the possibility to deploy armies of bots for information warfare, for artificially inflating the popularity of public characters and for manipulating opinions.
On the onset of the sudden surge of interest around automation and deception, several studies measured the extent of the social bot pandemic. Results are nothing less than worrying. The average presence of bots was estimated to be in the region of 15% of all active Twitter accounts in 2017,31 and 11% of all Facebook accounts in 201938—a considerable share indeed. Even more worrisome, when strong political or economic interests are at stake, the presence of bots dramatically increases. A 2019 study reported that 71% of Twitter users mentioning trending U.S. stocks, are likely to be bots.8 Similar results were obtained about the presence of bots in online cryptocurrency discussions24 and as part of the "infodemics" about the COVID-19 pandemic.14 Other studies specifically focused on political activity, concluding that bots played a role in strategic information operations orchestrated ahead of numerous worldwide events, as shown in Figure 1. Despite taking part in political discussions about all countries highlighted in figure, bots did not always have a real impact. In fact, scholars still lack a widespread consensus on the impact of social bots, with some studies reporting on their pivotal role for increasing disinformation's spread, polarization, and hateful speech,27,29 and competing results claiming that bots do not play a significant role in these processes.32 The ubiquity of social bots is also partly fueled by the availability of open source code, for which Bence Kollanyi reported an exponential growth that led in 2016 to more than 4,000 GitHub repositories containing code for deploying Twitter bots.22 Other investigations demonstrated this trend has not halted yet. In fact, by 2018, scholars found more than 40,000 public bot repositories.1 The looming picture is one where social bots are among the weapons of choice for deceiving and manipulating crowds. These results are backed by the same platforms where information operations took place—namely, Facebook,b Twitterc and Redditd—that banned tens of thousands accounts involved in coordinated activities since 2016.
Given the reported role of bots in several of the ailments that affect our online ecosystems, many techniques were proposed for their detection and removal—adding to the great coverage from news outlets—contributing to the formation of a steeply rising publication trend. Today, new studies on the characterization, detection, and impact estimation of bots are published at an impressive rate, as shown in Figure 2. Should this skyrocketing trend continue, by 2021 there will be more than one new paper published per day, which poses a heavy burden on those trying to keep pace with the evolution of this thriving field. Perhaps even more importantly, the rate at which new papers are published implies that a huge worldwide effort is taking place in order to stop the spread of the social bot pandemic. But where is all this effort leading? To answer this question, we first take a step back at the early days of social bot detection.
The first work that specifically addressed the detection of automated accounts in online social networks dates back to January 2010.37 In the early days, the vast majority of attempts at bot detection featured two distinctive characteristics: they were based on supervised machine learning, and on the analysis of individual accounts. In other words, given a group of accounts to analyze, detectors were separately applied to each account of the group, to which they assigned a binary label (either bot or legitimate). This approach to bot detection is schematized in panel A of Figure 3. Here, the key assumption is that bots and humans are clearly separable and that each malicious account has individual features that make it distinguishable from legitimate ones. This approach to the task of social bot detection also revolves around the application of off-the-shelf, general-purpose classification algorithms on the accounts under investigation and on designing effective machine learning features for separating bots from legitimate accounts.
For example, Cresci et al. developed a set of supervised machine learning classifiers for detecting so-called fake followers, a type of automated accounts commonly used to artificially boost the popularity of the public characters that buy them.4 Fake followers can be bought for as low as $12 per 1,000 followers in the surface Web. As a result, they are fairly common.e The authors analyzed some 3,000 fake followers obtained from different vendors and revealed that the simplistic nature of these accounts renders their detection rather easy, even when leveraging only 19 data- and computation-inexpensive features.4 After all, fake followers need not perform complex tasks such as producing content or engaging in conversations. Other detection systems make use of large numbers of machine learning features to spot social bots. By leveraging more than 1,200 features of an account, Botometer evaluates possible bots based on their profile characteristics, social network structure, the content they produce, their sentiment expressions, and the timings of their actions.35 Instead of focusing on a specific type of bots, as Cresci et al. did, Botometer represents a "general purpose" bot detector. The generality and ease of deployment of this detector are however counterbalanced by a reduced bot detection accuracy.5,17 The two previous detectors simultaneously analyze multiple dimensions of suspicious accounts in order to spot possible bots. Instead, other systems solely focus on network characteristics, textual content of shared messages, or profile information. These systems are typically easier to game, since they only analyze a single facet of the complex behavior of bots.
Newer bots often feature advanced characteristics that make them much more difficult to detect with respect to older ones.
Despite achieving promising initial results, these early approaches have a number of drawbacks. The first challenge in developing a supervised detector is related to the availability of a ground truth dataset to use in the training phase of the classifier. In most cases, a real ground truth is lacking, and the labels are simply given by human operators that manually analyze the data. Critical issues arise as a consequence of the diverse definitions of social bots, resulting in different labeling schemes.18 Moreover, humans have been proven to suffer from several annotation biases and to largely fail at spotting recent sophisticated bots, with only 24% bots correctly labeled as such by humans in a recent experiment.5 Furthermore, these approaches typically output binary classifications. In many cases however, malicious accounts feature a mixture of automated and human-driven behaviors that cannot be accounted for with simple binary labels. To make matters worse, another major drawback of individual detectors is caused by the evolutionary nature of social bots.
Initial success at social bot detection forced bot developers to put in place sophisticated countermeasures. Because of this, newer bots often feature advanced characteristics that make them much more difficult to detect with respect to older ones. This vicious circle leads to the development of always more sophisticated social bots and is commonly referred to as bot evolution.
Noteworthy works published by Chao Yang et al. between 2011 and 2013 provided the first evidence and the theoretical foundations to study social bot evolution.34 The first wave of social bots that populated OSNs until around 2011 was made of rather simplistic bots. Accounts with very low reputation due to few social connections and posted messages and featuring clear signs of automation as shown in panel A of Figure 4. Conversely, the social bots studied by Chao Yang et al. appeared as more popular and credible, given the relatively large number of their social connections. In addition, they were no longer spamming the same messages over and over again. Leveraging these findings, authors developed a supervised classifier that was specifically designed for detecting evolving bots. Initially, the classifier proved capable of accurately detecting this second wave of bots. Time went by and new studies acknowledged the rise of a third wave of bots that spread through online social networks from 2016 onward,5,13 as shown in panel C of Figure 4. Unfortunately, Yang's classifier for detecting evolving bots was no longer successful at spotting the third wave of malicious accounts.6 The previous example serves as anecdotal evidence of bot evolution and of the detrimental effect it has on detectors. Additional quantitative evidence is reported in other studies that evaluated the survivability of different bots—that is, their capability of continually evading detection and avoiding being removed from social platforms—and the ability of humans in spotting bots in-the-wild. Results showed that only 5% of newer bots are removed from social platforms, whereas older ones are removed 60% of the times.5 Moreover, hundreds of tech-savvy social media users that participated in a crowd-sourcing experiment were able to tell apart newer bots from legitimate users only 24% of the times. The same users were instead able of spotting older bots 91% of the times.5
The previous anecdotal and quantitative results tell us that current sophisticated bots are hardly distinguishable from legitimate accounts if analyzed one at a time, as supervised classifiers and crowdsourcing participants did. In fact, newer bots are more similar to legitimate human-operated accounts than to other older bots. Among the reasons for the human-like appearance of many bots is an increased hybridization between automated and human-driven behaviors. These cyborgs exist and operate halfway between the traditional concepts of bots and humans, resulting in weakened distinctions and overlapping behaviors between the two. Moreover, they are now using the same technological weapons as their hunters, such as powerful AI techniques for generating credible texts (for example, via the GPT-2 and 3 deep learning models)f and profile pictures (for example, via StyleGANs deep learning models).g Indeed, the possibility for malicious accounts to leverage deepfake texts, profile pictures, and videos is worrying, and worthy of increased attention.10 Kate Starbird recently discussed a related issue in an inspiring piece on Nature.28 Similar to the hazy duality between "bots" and "humans," she posits that the boundaries between what is "fake" and what is "real," are blurring. To this end, human-like bots and cyborgs are just the tip of the iceberg, with other newer forms of deception—such as political trolls and "unwitting humans"—that are bound to make the online information landscape an even grimmer place. Figure 4 provides some examples of Twitter profiles that demonstrate how real-world bots evolved over the course of the years. As one form of "social Web virus," bots mutated thus becoming more resistant to our antibodies. The social bot pandemic gradually became much more difficult to stop. Within this global picture, dichotomous classifications—such as human vs bot, fake vs real, coordinated vs not coordinated—might represent oversimplifications, unable to grasp the complexity of these phenomena and unlikely to yield accurate and actionable results.
Ultimately, the findings about the evolution of online automation and deception tell us the naive assumption of early, supervised bot detection approaches—according to which bots are clearly separable from legitimate accounts—is no longer valid.
The difficulties at detecting sophisticated bots with early approaches rapidly gave rise to a new research trend. Since 2012–2013, several different teams independently proposed new systems that, despite being based on different techniques and implementations, shared the same concepts and philosophy. As schematized in Figure 3 (panel B), the primary characteristic of these new systems, is that of targeting groups of accounts as a whole, rather than individual accounts. The rationale for this design choice is that bots act in coordination with other bots, forming botnets to amplify their effects.40 The existence of botnets does not necessarily imply that accounts are explicitly connected in the social network, but rather that they are maneuvered by a single entity and that they share common goals. As such, botnets leave behind more traces of their automation and coordination than those left behind by sophisticated single bots.5
Devising techniques for spotting suspiciously coordinated and synchronized behaviors is thus likely to yield better results than analyzing individual accounts. In addition, by analyzing large groups of accounts, detectors also have access to more data for fueling powerful—yet data-hungry—AI algorithms. In 2018, approximately five after the emergence of the group approach to bot detection, also Facebookh and Twitteri acknowledged the importance of focusing on coordinated and inauthentic behaviors. The second common feature to the majority of group detectors is the proposal of important algorithmic contributions, thus shifting from general-purpose machine learning algorithms such as support vector machines and decision trees, to ad-hoc algorithms that are specifically designed for detecting bots, in an effort to boost detection performance. Finally, many group detectors are also based on unsupervised or semi-supervised approaches. Here the idea is to overcome the generalization deficiencies of supervised detectors that are severely limited by the availability of exhaustive and reliable training datasets.11
To quantitatively demonstrate the rise of group approaches to bot detection, Figure 5 illustrates the results of an extensive longitudinal classification. We surveyed more than 230 papers that proposed a bot detection technique and we manually classified each detector along two orthogonal dimensions. The first dimension (panel A) highlights whether detectors target individual accounts or groups of accounts. Then in panel B, we classify detectors according to their high-level approach to the task. In particular, we classified detectors as either based on: heuristics—that is, based on simple rules; crowdsourcing—that is, relying on the judgement of experts; supervised machine learning—such as those based on classification and requiring a labeled training dataset; unsupervised machine learning—such as those based on clustering that do not necessitate of labeled training data; or adversarial approaches—including adversarial machine learning. To better explain our methodology, we provide a couple of examples showing how well-known bot detectors were classified. The system proposed in Ruan et al.26 is designed for detecting compromised accounts—originally legitimate accounts that have been taken over by an attacker. It initially builds a behavioral profile for each investigated account. Then, the system is able to detect compromised accounts via anomaly detection when a behavior diverges significantly with respect to its associated profile. This system is classified as an individual detector (since the behavioral profile of an account depends solely on its own actions) and as an unsupervised detector (since it leverages an anomaly detection technique). Conversely, another system looks for suspiciously large similarities between the sequence of activities of vast groups of accounts.6 The activity of each account is encoded as a character's string and similarities between account activities are computed by applying the longest common subsequence metric to such strings. Suspiciously long subsequences between activity strings are identified via peak detection, and all those accounts that share the long activity subsequence are labeled as bots. Given these characteristics, this work contributes to group-based bot detectors (since it analyzes a group of accounts looking for similar activity sequences) as well as to unsupervised machine learning approaches (since it leverages an unsupervised peak detection algorithm). Generalizing the two previous examples, we note a few interesting patterns that derive from our classification. The vast majority of techniques that perform network analyses, for instance by considering the social or interactions graph of the accounts, are naturally classified as group based. More often than not, they also propose unsupervised approaches. Contrarily, all techniques based on the analysis of the textual content of posted messages, such as those works that exclusively employ natural language processing techniques, are supervised detectors that analyze individual accounts.
By leveraging classification results reported in Figure 5, we can also derive a number of additional insights. First of all, the rising publication trend of bot detectors follows the general trend of interest around social bots, previously shown in Figure 2. Indeed, since 2015 there has been a steadily increasing number of bot detectors published every year. From the trends shown in panel A it is also strikingly evident that group-based approaches, revolving around the analysis of collective behaviors, are increasingly frequent. In fact, in 2018 the number of newly proposed group-based detectors surpassed for the first time that of detectors based on the analysis of individual accounts. From panel B we note that bot detection approaches based on heuristics and crowdsourcing received very little attention. This is probably due to the many challenges involved in the development of these systems, which ultimately limit their applicability, scalability and detection performance. Instead, the number of new supervised detectors has been constantly increasing since 2012, despite their severe generalization issues.11 The adoption of unsupervised machine learning started in 2013 with the rise of group approaches, and now appears to be stationary. Interestingly, the plateau hit by unsupervised approaches co-occurred with the rise of adversarial ones, which might take their place in the coming years. Although the exact number of new bot detectors per type can slightly vary by analyzing a different set of papers, the big picture that emerges from Figure 5—documenting the trends of individual, group and adversarial approaches—is clear, reliable, and insightful.
Devising techniques for spotting suspiciously coordinated and synchronized behaviors is likely to yield better results than analyzing individual accounts.
As a consequence of this paradigm-shift, group-based detectors are particularly effective at identifying evolving, coordinated, and synchronized accounts. For instance, several group detectors implement graph-based approaches and aim at spotting suspicious account connectivity patterns.20,24 These techniques are suitable for studying both users interacting with content (for example, retweets to someone else's tweets) or with other users (for example, becoming followers of other users). Coordinated and synchronized behaviors appear as near-fully connected communities in graphs, dense blocks in adjacency matrices, or peculiar patterns in spectral subspaces.21 Other techniques adopted unsupervised approaches for spotting anomalous patterns in the temporal tweeting and retweeting behaviors of groups of accounts.2,23 One way to spot accounts featuring suspiciously synchronized behaviors is by computing metrics of distance out of the accounts time series, and by subsequently clustering the accounts. The rationale behind this approach is based on evidence suggesting that human behaviors are intrinsically more heterogeneous than automated ones.7 Consequently, a large cluster of accounts with highly similar behaviors might indicate the presence of a botnet, even in the absence of explicit connections between the accounts. Distance between accounts time series was computed as a warp-correlation coefficient based on dynamic time warping,2 or as the Euclidean distance between the feature vectors computed by an LSTM autoencoder,23 a type of deep neural network that is particularly suitable for extracting latent features from sequential data.
As the switch from individual to group detectors demonstrates, the overall approach to the task of bot detection can have serious repercussions on detection performance. At the same time, some scientific communities tend to favor and stick to a specific approach. For instance, works published within the natural language processing community, quite naturally focus on textual content, thus resulting in a multitude of supervised classifiers that analyze accounts individually and that yield binary labels. In contrast, the complex networks community favors graph-based approaches. As a consequence, some combinations of approaches—above all, text-based detectors that perform unsupervised, group analyses—are almost unexplored and definitely underrepresented in the landscape of existing bot detectors. In the future, it would be advisable for multiply efforts to follow the directions that have been mostly overlooked until now.
So far, we highlighted that a shift took place from individual to group detectors in an effort to contrast social bot evolution. Now, we review the latest advances in the field for gaining possible insights into the future of deception detection. We ground this analysis on two observations:
Firstly, we observe that both the individual and the group-based approaches to social bot detection follow a reactive schema. In practice, when scholars and OSN administrators identify a new group of accounts that misbehave and that cannot be effectively detected with existing techniques, they react and begin the development of a new detection system. Hence, the driving factor for the development of new and better detectors have always been bot mischiefs. A major implication of this approach is that improvements in the detection of bad actors typically occur only sometime after having collected evidence of new mischiefs. Bad actors such as bots, cyborgs, and trolls thus benefit from a long time span—the time needed to design, develop, and deploy a new effective detector—during which they are essentially free to tamper with our online environments. In other words, scholars and OSN administrators are constantly one step behind of malicious account developers. This lag between observations and countermeasures possibly explains the current situation with our online social ecosystems: Despite the increasing number of existing detection techniques, the influence of bots and other bad actors on our online discussions did not seem to decrease.
Our second observation is related to the use of machine learning for the task of social bot detection. The vast majority of machine learning algorithms are designed for operating within environments that are stationary and neutral, if not even benign. When the stationarity and neutrality assumptions are violated, algorithms yield unreliable predictions that result in dramatically decreased performances.15 Notably, the task of social bot detection is neither stationary nor neutral. The stationarity assumption is violated by the mechanism of bot evolution that results in accounts exhibiting different behaviors and characteristics over time. Also, the neutrality assumption is clearly violated, since bot developers are actively trying to fool detectors. As a consequence, the very same algorithms that we have been relying upon for a decade, and for which we reported excellent detection results in our studies, are actually seeing their chances to detect bots in-the-wild severely limited.
Recent developments in machine learning may however come to our rescue and may possibly mitigate both issues emerging from the previous observations. Adversarial machine learning is a paradigm specifically devised for application in those scenarios presenting adversaries motivated in fooling learned models.15 Its high-level goal is to study vulnerabilities of existing systems and possible attacks to exploit them, before such vulnerabilities are effectively exploited by the adversaries. Early detection of vulnerabilities can in turn contribute to the development of more robust detection systems. One practical way to implement this vision is by generating and experimenting with adversarial examples—that is, input instances specifically created to induce errors in machine learning systems.
All tasks related to the detection of online deception, manipulation and automation are intrinsically adversarial. As such, they represent favorable application domains for adversarial machine learning. This intuition resulted in the first papers published in 2018–2019 that initiated the development of an adversarial approach to bot detection, as shown in panel B of Figure 5. In the so-called adversarial bot detection, scholars experiment with meaningful adversarial examples with which they extensively test the capabilities of current bot detectors.9 Within this context, adversarial examples might be sophisticated types of existing bots and trolls that manage to evade detection by current techniques; or even bots that do not exist yet, but whose behaviors and characteristics are simulated, as done by Cresci et al.;9 or bots developed ad-hoc for the sake of experimentation, as done by Grimme et al.17 Finding good adversarial examples can help scholars understand the weaknesses of existing bot detection systems. As a result, bot hunters need not wait anymore for new bot mischiefs in order to adapt their techniques, but instead they can proactively (instead of reactively) test them, in an effort that could quickly make them more robust. In addition, this paradigm accounts for adversaries by design, thus providing higher guarantees for deception detection, which violates the stationarity and neutrality assumptions.
The previous analysis highlights that initial efforts toward adversarial bot detection were driven by the creativity of some researchers and only covered few cases with limited applicability.9,17 In the near future they could instead be powered by the latest developments in AI. Generative adversarial networks (GANs) are a powerful machine learning framework where two competing deep learning networks are jointly trained in a game-theoretic setting.15 In particular, a GAN is composed of a generator network that creates data instances and a discriminator network that classifies data instances, combined as shown in Figure 6 where a GAN is instantiated for a generic task of deception detection. The goal of the generator is that of creating synthetic data instances that resemble the properties of real organic data, while the typical goal of the discriminator is to classify input data instances as either synthetic or organic. The discriminator is evaluated based on its binary classification performance, while the generator is evaluated in terms of its capacity to induce errors in the discriminator, hence the competition between the two networks.
Originally, GANs were proposed as a form of generative model—that is, the focus was posed on the generator network. A notable example of this kind is represented by the GAN trained in Wu et al.33 for creating adversarial examples of social bots that improved the training of downstream detectors. However, with the end-goal of providing even larger improvements on deception detection, we could envision the adoption of GANs for training better discriminator networks. In particular, the generator of a GAN could be used as a generative model for creating many plausible adversarial examples, thus overcoming the previously mentioned limitations in this task and the scarcity of labeled datasets. Then, the whole GAN could be used to test the discriminator against the adversarial examples and to improve its detection performances. This paradigm has never been applied to the task of social bot detection, but it was tested with promising results for related tasks, such as that of fake news generation/detection.39 The adversarial framework sketched in Figure 6 is general enough to be applied to a wide set of deception detection tasks, comprising the detection of social bots, cyborgs, trolls and mis/disinformation. Furthermore, in contrast with existing adversarial approaches for bot detection, it is grounded on an established and successful machine learning framework, rather than on ad-hoc solutions lacking broad applicability.
Despite the high hopes placed on adversarial approaches for detecting deception and automation, this research direction is still in its infancy and, probably due to its recency, is still lagging behind more traditional approaches. As such, efforts at adversarial detection can only be successful if the scientific community decides to rise to the many open challenges. Among them is the development of techniques for creating many different kinds of adversarial examples and to evaluate whether these examples are realistic and representative of future malicious accounts. In spite of these challenges, our analysis and the promising results obtained so far strongly motivate future endeavors in this direction, as also testified by the sparking adversarial trend in Figure 5.
The exponentially growing body of work on social bot detection shown in Figure 2, somehow reassures us that much effort is bound to be devoted in the coming years to the fight of this crucial issue. However, at the same time it also poses some new challenges. Firstly, it is becoming increasingly important to organize this huge body of work. Doing so would not only contribute to a better exploitation of this knowledge but would also allow researchers to more efficiently provide new solutions by avoiding exploring paths that already proved unsuccessful. To this end, this survey aims to provide a contribution to the critical review and analysis of the vast literature on and beyond this topic.
Secondly, the foreseen increase in publications inevitably implies that more bot detectors will be proposed. With the growing number of disparate detection techniques, it is becoming increasingly important to have standard means, such as benchmarks, frameworks works and reference datasets, with which to evaluate and compare them. The present situation is one where we have a suitcase filled with all kinds of tools. Unfortunately, we do not really know how to use them profitably, what the differences are between them, and ultimately, what they are really worth! Buying us yet another tool would not help much. Instead, a few targeted investments aimed at extensively evaluating and comparing our current tools would tremendously increase the usefulness of all our suitcase.
One aspect often overlooked when evaluating bot detectors is their generalizability, that is, their capacity of achieving good detection results also for types of bots that have not been originally considered. To this regard, our analysis lays the foundations of a bi-dimensional generalizability space, sketched in Figure 7. A desirable scenario for the near future would involve the possibility to evaluate any new bot detector against many different types of social bots, thus moving along the y-axis of Figure 7, following the promising approaches recently developed in Echieverria et al.11 and Yang et al.36 It would also be profitable to evaluate detectors against different versions of current bots, thus somehow simulating the evolving characteristics of bots. This could be achieved by applying the adversarial approach previously described for creating many adversarial examples, opening up experimentation along the x-axis of the generalizability space. Combining these two evaluation dimensions, thus extensively exploring the generalizability space, would allow a much more reliable assessment of the detection capabilities of present and future techniques, thus avoiding overestimates of detection performance. In order to reach this ambitious goal, we must first create reference datasets that comprise several different kinds of malicious accounts, including social bots, cyborgs and political trolls, thus significantly adding to the sparse resources existing as of today.j Here, challenges include the limited availability of data itself, missing or ambiguous ground truth and the obsolescence of existing datasets that hardly cope with the rapid evolution of malicious accounts. To this regard, continuous data-sharing initiatives such as that of Twitter for accounts involved in information operations,k are extremely welcome as they can enable the next wave of research on these issues. Then, we should also devise additional ways for creating a broad array of diverse adversarial examples. Doing so would also require quantitative means to estimate the contributions brought by the different adversarial examples, for instance in terms of their novelty and diversity with respect to existing malicious accounts. These challenges currently stand as largely unsolved and call for the highest effort of our scientific community.
Our longitudinal analysis of the first decade of research in social bot detection revealed some interesting trends. Early days were characterized by simple supervised detectors analyzing accounts individually. Unsupervised detectors emerged in 2012–2013 and shifted the target to groups of misbehaving accounts. Finally, we highlighted the new rising trend of adversarial approaches. Our analysis revealed that for more than a decade we fought each of the menaces posed by sophisticated social bots, cyborgs, trolls, and collusive humans, separately. Now, thanks to the rise of AI-enabled deception techniques such as deepfakes, the most sophisticated of these malicious actors are bound to become indistinguishable from one another, and likely also from legitimate accounts. It is thus becoming increasingly necessary to focus on spotting the techniques used to deceive and to manipulate, rather than trying to classify individual accounts by their nature. Inauthentic coordination is an important piece of the deception puzzle, since it is exploited by bad actors for obtaining visibility and impact. Moreover, it is oblivious to the different types of bad actors. In other words, both our findings and recent reflections17,28 suggest that we should keep on moving away from simple supervised approaches focusing on individual accounts and producing binary labels. We should instead take on the challenging task of embracing the complexity of deception, manipulation and automation by devising unsupervised techniques for spotting suspicious coordination. In addition, future techniques should not provide oversimplistic binary labels as often done and just as often criticized,l but should instead produce multifaceted measures of the extent of suspicious coordination.
Our in-depth analysis revealed the emergence of group-based approaches several years before "coordinated inauthentic behavior" was acknowledged as the main threat to our online social ecosystems by the general public and by the social platforms themselves. Among the most pressing challenges along this line of research is the problem of scalability of group-based detectors and the intrinsic fuzziness of "inauthentic coordinaton." In fact, the scalable and generalizable detection of coordination is still a largely open challenge, with only few contributions proposed so far.12,25 Similarly, computational means to discriminate between authentic and inauthentic coordination are yet to be proposed and evaluated. Interestingly, the same analysis that anticipated worldwide interest in inauthentic coordination, is now suggesting that adversarial approaches might give us an edge in the long-lasting fight against online deception.
Summarizing the main suggestions stemming for our extensive analysis, future deception detection techniques should: focus on identifying suspicious coordination independently of the nature of individual accounts; avoid providing binary labels in favor of fuzzier and multifaceted indicators; favor unsupervised/semi supervised approaches over supervised ones; and account for adversaries by design. In addition, part of the massive efforts we dedicated to the task of detection should also be reallocated to measure (human) exposure to these phenomena and to quantify the impact they possibly have. Only through enacting these changes we will be able to develop tools that better represent the existing reality, thus providing actionable results to the many scientific communities and stakeholders looking at AI and Big Data tools as a compass to adventure in the perilous landscape of online information. These guiding lights stand in front of us as an exciting and rare opportunity, one that we did not have in the past. Acting upon and capitalizing on this opportunity is now exclusively on our shoulders.
Acknowledgments. This research is supported in part by the EU H2020 Program under the scheme INFRAIA-01-2018-2019: Research and Innovation action grant agreement #871042 SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics.
1. Assenmacher, D., Clever, L., Frischlich, L., Quandt, T., Trautmann, H. and Grimme, C. Demystifying Social Bots: On the Intelligence of Automated Social Media Actors. Social Media + Society. SAGE, 2020.
5. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. In Proceedings of the 26th Intern. World Wide Web Conf. Companion (2017). IW3C2.
6. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. Social fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Transactions on Dependable and Secure Computing 15, 4 (2017), 561–576.
7. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., and Tesconi, M. Emergent properties, models, and laws of behavioral similarities within groups of Twitter users. Computer Communications 150 (2020), 47–61.
9. Cresci, S., Petrocchi, M., Spognardi, A., and Tognazzi, S. Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection. In Proceedings of the 11th Intern. Conf. Web Science (2019). ACM.
10. Da San Martino, G., Cresci, S., Barrón-Cedeño, A., Yu, S., Di Pietro, R., and Nakov, P. 2020. A survey on computational propaganda detection. In Proceedings of the 29th Intern. Joint Conf. Artificial Intelligence (2020).
11. Echeverrìa, J., De Cristofaro, E., Kourtellis, N., Leontiadis, I., Stringhini, G., and Zhou, S. LOBO: Evaluation of generalization deficiencies in Twitter bot classifiers. In Proceedings of the 34th Annual Computer Security Applications Conf. ACM, 137–146.
16. Gorodnichenko, Y., Pham, T., and Talavera, O. 2018. Social Media, Sentiment and Public Opinions: Evidence from #Brexit and #USElection. Working Paper 24631. National Bureau of Economic Research, 2018.
23. Mazza, M., Cresci, S., Avvenuti, M., Quattrociocchi, W., and Tesconi, M. RTbust: Exploiting temporal patterns for botnet detection on Twitter. In Proceedings of the 11th Intern. Conf. Web Science (2019). ACM.
29. Stella, M., Ferrara, E., and De Domenico, M. Bots increase exposure to negative and inflammatory content in online social systems. In Proceedings of the National Academy of Sciences 115, 49 (2018), 12435–12440.
30. Stieglitz, S., Brachten, F., Ross, B., and Jung, A. Do social bots dream of electric sheep? A categorization of social media bot accounts. In Proceedings of the 17th Australasian Conf. Information Systems (2017).
31. Varol, O., Ferrara, E., Davis, C., Menczer, F., and Flammini, A. Online human-bot interactions: Detection, estimation, and characterization. In Proceedings of the 11th Intern. Conf. Web and Social Media. AAAI, 2017.
35. Yang, K., Varol, O., Davis, C., Ferrara, E., Flammini, A., and Menczer, F. Arming the public with artificial intelligence to counter social bots. Human Behavior and Emerging Technologies 1, 1 (2019), 48–61.
38. Zago, M., Nespoli, P., Papamartzivanos, D., Gil Perez, M., Gomez Marmol, F., Kambourakis, G., and Martinez Perez, G. Screening out social bots interference: Are there any silver bullets? IEEE Communications Mag. 57, 8 (2019), 98–104.
39. Zellers, R., Holtzman, A., Rashkin, H., Bisk, Y., Farhadi, A., Roesner, F., and Choi, Y. Defending against neural fake news. In Proceedings of the 33rd Conf. Neural Information Processing Systems (2019). 9051–9062.
j. One of the few publicly available bot repositories is hosted at: https://botometer.iuni.iu.edu/bot-repository/datasets.html
Copyright held by author/owner. Publication rights licensed to ACM.
Request permission to publish from email@example.com
The Digital Library is published by the Association for Computing Machinery. Copyright © 2020 ACM, Inc.
No entries found