Architecture and Hardware Research highlights

FANG: Leveraging Social Context for Fake News Detection Using Graph Representation

By Van-Hoang Nguyen, Kazunari Sugiyama, Preslav Nakov, and Min-Yen Kan

Posted Apr 1 2022

Abstract

We propose Factual News Graph (FANG), a novel graphical social context representation and learning framework for fake news detection. Unlike previous contextual models that have targeted performance, our focus is on representation learning. Compared to transductive models, FANG is scalable in training as it does not have to maintain the social entities involved in the propagation of other news and is efficient at inference time, without the need to reprocess the entire graph. Our experimental results show that FANG is better at capturing the social context into a high-fidelity representation, compared to recent graphical and nongraphical models. In particular, FANG yields significant improvements for the task of fake news detection and is robust in the case of limited training data. We further demonstrate that the representations learned by FANG generalize to related tasks, such as predicting the factuality of reporting of a news medium.

1. Introduction

Social media have emerged as an important source of information for many worldwide. Unfortunately, not all information they publish is true. During critical events such as political elections or pandemic outbreaks, disinformation with malicious intent,²¹ commonly known as “fake news,” can disturb social behavior, public fairness, and rationality. Many sites and social media have devoted efforts to identify disinformation. For example, Facebook encourages users to report noncredible posts and employs professional fact checkers to expose the news in question. Manual fact-checking is also used by fact-checking websites such as Snopes, FactCheck, PolitiFact, and Full Fact. In order to scale with the increasing amount of information, automated news verification systems consider external knowledge databases as evidence.²³ Evidence-based approaches achieve high accuracy and offer potential explainability, but they also take considerable human effort. Moreover, fact-checking approaches for textual claims based on textual evidence are not easily applicable to claims about images or videos.

Recent work has taken a different tack by exploring the contextual features of the news-dissemination process. These studies observed distinctive engagement patterns when social users face fake versus factual news.^{6, 13} For example, the fake news example shown in Table 1 had many engagements shortly after its publication. These are mainly verbatim recirculations of the original post with negative sentiment, which can be explained by the typically appalling content of fake news. After that short time window, we see denial posts questioning the validity of the news, and the stance distribution stabilizes afterwards with virtually no support. In contrast, the real news example in Table 1 leads to moderate engagement, mainly composed of supportive posts with neutral sentiment that stabilize quickly. Such temporal shifts in user perception serve as important signals to distinguish fake from real news.

Table 1. Engagement of social media users with respect to fake and real news articles.

Previous work proposed partial representations of social context with (i) news, sources, and users as major entities and (ii) stances, friendship, and publication as major interactions.^{5, 16, 17, 22} However, they did not put much emphasis on the quality of the representation, on modeling the entities and their interactions, and on minimally supervised settings.

Naturally, the social context of news dissemination can be represented as a heterogeneous network where nodes and edges represent the social entities and the interactions between them, respectively. Network representations have several advantages over some existing Euclidean-based methods^{11, 18} in terms of structural modeling capability for several phenomena such as echo chambers of users or polarized networks of news media. Graphical models also allow entities to exchange information via (i) homogeneous edges, that is, user–user relationships and source–source citations; (ii) heterogeneous edges, that is, user–news stance expression and source–news publication; as well as (iii) high-order proximity (such as between users who consistently support or deny certain sources, as illustrated in Figure 1). This allows the representations of heterogeneous entities to depend on one another, benefiting not only fake news detection but also related tasks such as malicious user detection and source factuality prediction. Here, we focus on improving contextual fake news detection by enhancing the representations of social entities.

Figure 1. Graph representation of social context.

Our contributions can be summarized as follows:

We propose a novel graph representation that models all major social actors and their interactions (see Figure 1).
We propose the Factual News Graph (FANG), an inductive graph learning framework that effectively captures social structure and engagement patterns, thus improving representation quality.
We report significant improvement in fake news detection when using FANG, and we further show that our model is robust in the case of limited training data.
We show that the representations learned by FANG generalize to related tasks such as predicting the factuality of reporting of a news medium.
We demonstrate FANG’s explainability thanks to the attention mechanism of its recurrent aggregator.

2. Related Work

2.1. Contextual fake news detection

Previous work on contextual fake news detection can be categorized based on the approach used to represent and learn its social context.

Euclidean approaches represent the social context as a flat vector or a matrix of real numbers. They typically learn a Euclidean transformation of the social entity features that best approximates the fake news prediction.¹⁶

However, given our formulation of social context as a heterogeneous network, Euclidean representations are less expressive. Although pioneering work used user attributes such as demographics, news preferences, and social features, for example, the number of followers and friends,²¹ such work did not capture the user interaction landscape, that is, what kind of social figures they follow, which news topics they favor or oppose, and so forth. Moreover, in terms of FANG’s graphical representation, node variables are no longer constrained by the independent and identically distributed assumption, and thus they can reinforce each other’s representation via edge interactions.

Having acknowledged the above limitations, researchers have started exploring non-Euclidean or geometric approaches. In particular, they generalized the idea of using the social context by modeling the network around a target user or news source and by developing representations that capture structural features of the entity.

The Capture, Score, and Integrate (CSI) model¹⁸ used linear dimensionality reduction on the user cosharing adjacency matrix and combined it with news engagement features obtained from a recurrent neural network (RNN).

The Tri-Relationship Fake News (TriFN) detection framework²²—although similar to our approach—neither differentiated user engagements in terms of stance and temporal patterns nor modeled source–source citations. Also, matrix decomposition approaches, such as CSI,¹⁸ can be expensive in terms of graph node counts and ineffective for modeling high-order proximity.

Other work on citation source network,⁹ propagation network,¹⁴ and rumor detection² proposed models optimized solely for the objective of fake news detection, without accounting for representation quality, and therefore they are not robust to limited training data and cannot be generalized to other downstream tasks, as we show in Section 5.

2.2. Graph Neural Networks (GNNs)

GNNs have successfully generalized deep learning methods to model complex relationships and interdependencies on graphs and manifolds. Graph Convolutional Networks (GCNs) are among the first methods that effectively approximate convolutional filters.⁷ However, GCNs impose a substantial memory footprint in storing the entire adjacency matrix. They are also not easily adaptable to our heterogeneous graph, where nodes and edges with different labels exhibit different information propagation patterns. Furthermore, GCNs do not guarantee generalizable representations and are transductive, requiring the inferred nodes to be present at training time. This is especially challenging for contextual fake news detection or general social network analysis, as their structure is constantly evolving.

With these considerations in mind, we build our work on GraphSage, which generates embeddings by sampling and aggregating features from a node’s local neighborhood.⁴ GraphSage offers substantial flexibility in defining the information propagation pattern with parameterized random walks and recurrent aggregators. It is well-suited for representation learning with an unsupervised node proximity loss and generalizes well in minimal supervision settings. Moreover, it uses a dynamic inductive algorithm that allows the creation of unseen nodes and edges at inference time.
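As a minimal sketch of the sample-and-aggregate idea (not GraphSage's actual implementation), the snippet below samples a fixed number of neighbors and averages their feature vectors together with the node's own features. The toy graph, the feature values, and the function name `sample_and_aggregate` are hypothetical; the real model additionally applies learned weight matrices and nonlinearities after aggregation.

```python
import random

def sample_and_aggregate(node, neighbors, features, num_samples, rng):
    """Mean-aggregate the features of a node and a sample of its neighbors."""
    neigh = neighbors.get(node, [])
    # Sample without replacement when there are more neighbors than needed.
    sampled = rng.sample(neigh, num_samples) if len(neigh) > num_samples else list(neigh)
    vectors = [features[node]] + [features[n] for n in sampled]
    dim = len(features[node])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Hypothetical toy graph: user u1 is connected to u2 and u3.
neighbors = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": ["u1"]}
features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
z_u1 = sample_and_aggregate("u1", neighbors, features, num_samples=2, rng=random.Random(0))
```

Because the function only needs a node's features and its local neighborhood, it can be applied to a node that was not present during training, which is the inductive property discussed above.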

3. Methodology

3.1. Fake news detection using social context

Let us first define the social context graph G with its entities and interactions as shown in Figure 1:

A = {a₁, a₂, …} is the list of news articles in question, where each a_i (i = 1, 2, …) is modeled as a feature vector x_a.
S = {s₁, s₂, …} is the list of news sources, where each source s_j (j = 1, 2, …) has published at least one article in A and is modeled as a feature vector x_s.
U = {u₁, u₂, …} is the list of social users, where each user u_k (k = 1, 2, …) has engaged in spreading an article in A or is connected with another user; u_k is modeled as a feature vector x_u.
E = {e₁, e₂, …} is the list of interactions, where each e = {v₁, v₂, t, x_e} is modeled as a relation between two entities v₁, v₂ ∈ A ∪ S ∪ U at time t; t is absent in time-insensitive interactions. The interaction type of e is given as a label x_e.
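The four components of G can be pictured as a small container type. The following is only an illustrative sketch: the class names `Interaction` and `SocialContextGraph`, the identifiers, and the feature values are hypothetical and are not taken from the released FANG code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Interaction:
    v1: str                      # id of the first entity (article, source, or user)
    v2: str                      # id of the second entity
    label: str                   # interaction type x_e, e.g. "publication"
    t: Optional[float] = None    # timestamp; None for time-insensitive interactions

@dataclass
class SocialContextGraph:
    articles: dict = field(default_factory=dict)   # A: article id -> feature vector x_a
    sources: dict = field(default_factory=dict)    # S: source id -> feature vector x_s
    users: dict = field(default_factory=dict)      # U: user id -> feature vector x_u
    interactions: List[Interaction] = field(default_factory=list)  # E

    def add_interaction(self, v1, v2, label, t=None):
        # Both endpoints must already be known entities in A, S, or U.
        known = self.articles.keys() | self.sources.keys() | self.users.keys()
        if v1 not in known or v2 not in known:
            raise KeyError("both endpoints must be entities in A, S, or U")
        self.interactions.append(Interaction(v1, v2, label, t))

g = SocialContextGraph()
g.articles["a1"] = [0.1, 0.2]
g.sources["s1"] = [0.3, 0.4]
g.users["u1"] = [0.5, 0.6]
g.add_interaction("s1", "a1", "publication", t=0.0)
g.add_interaction("u1", "a1", "deny", t=3.5)
```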

Table 2 summarizes the characteristics of different types of interactions, both homogeneous and heterogeneous. Stance is a special type of interaction, as it is not only characterized by edge labels and source/destination nodes but also by temporality as shown in the examples in Table 1. Recent work has highlighted the importance of incorporating temporality not only for fake news detection¹⁸ but also for modeling online information dissemination.

Table 2. Interactions in FANG’s social context network.

We can now formally define our task as follows:

Definition 3.1. Context-based fake news detection: Given a social context graph G = (A, S, U, E) constructed from news articles A, news sources S, social users U, and social engagements E, context-based fake news detection is defined as the binary classification task to predict whether a news article a ∈ A is fake or real, in other words, F_c : a → {0, 1} such that F_c(a) = 0 if a is a fake article, and F_c(a) = 1 otherwise.

3.2. Graph construction from social context

News articles. Textual²² and visual²⁴ features have been widely used to model news article contents, by feature extraction, unsupervised semantics encoding, or learned representation. We use unsupervised textual representations as they are relatively efficient to construct and optimize. For each article a ∈ A, we construct a TF.IDF¹⁹ vector from the text body of the article. We enrich the representation of news by weighting the pretrained embeddings from GloVe¹⁵ of each word by its TF.IDF score, forming a semantic vector. Finally, we concatenate the TF.IDF and the semantic vector to form the news article feature vector x_a.
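The construction of x_a can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the smoothed IDF variant, the two-dimensional toy embeddings standing in for GloVe, and the function names are all hypothetical choices, since the exact preprocessing is not specified here.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab):
    """Return one TF.IDF vector per tokenized document over a fixed vocabulary."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    # Smoothed IDF (an assumption; other variants are equally plausible).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[w] / len(doc) * idf[w] for w in vocab])
    return vectors

def article_features(doc_tfidf, vocab, embeddings, dim):
    """Concatenate the TF.IDF vector with the TF.IDF-weighted sum of word embeddings."""
    semantic = [0.0] * dim
    for weight, word in zip(doc_tfidf, vocab):
        # Words without a pretrained embedding contribute a zero vector.
        for i, value in enumerate(embeddings.get(word, [0.0] * dim)):
            semantic[i] += weight * value
    return doc_tfidf + semantic

docs = [["vaccine", "causes", "harm"], ["vaccine", "approved"]]
vocab = sorted({w for d in docs for w in d})
toy_embeddings = {"vaccine": [1.0, 0.0], "harm": [0.0, 1.0]}  # stand-in for GloVe
x_a = article_features(tfidf_vectors(docs, vocab)[0], vocab, toy_embeddings, dim=2)
```

The resulting vector has one entry per vocabulary word followed by one entry per embedding dimension; the same recipe would apply to source and user descriptions.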

News sources. We focus on characterizing news media sources using the textual content of their websites.⁹ Similar to article representations, for each source s, we construct the source feature vector x_s as the concatenation of its TF.IDF vector and its semantic vector derived from the words in the Homepage and the About Us section, as some fake news websites openly declare their content to be satirical or sarcastic.

Social users. Online users have been studied extensively as the main propagator of fake news and rumors in social media. Shu et al.²² conducted feature analysis of user profiles and pointed out the importance of signals derived from profile description and timeline content. A text description such as “American mom fed up with anti american leftists and corruption. I believe in U.S. constitution, free enterprise, strong military and Donald Trump #maga” strongly indicates the user’s political bias and suggests the tendency to promote certain narratives. We construct the user vector x_u as a concatenation of a TF.IDF vector and a semantic vector derived from the textual description in the user profile.

Social interactions. For each pair of social actors (v_i, v_j) ∈ A ∪ S ∪ U, we add an edge e = {v_i, v_j, t, x_e} to the list of social interactions E if they are linked via interaction type x_e. Specifically, for the followership interaction, we examine whether user u_i follows user u_j; for the publication interaction, we check whether news article a_i was published by source s_j; for the citation interaction, we examine whether the Homepage of source s_i contains a hyperlink to source s_j. In the case of time-sensitive interactions, that is, publication and stance, we record their relative timestamp with respect to the article’s earliest time of publication.

Stance detection. The task of characterizing the viewpoint of a text with respect to another one is known as stance detection. In the context of fake news detection, we are interested in the stance of a user reply with respect to the title of a news article in question. We consider four stances: support with neutral sentiment or neutral support, support with negative sentiment or negative support, deny, and report.

We classify a post as verbatim reporting of the news article if it matches the article title after cleaning the text from emojis, punctuation, stop words, and URLs. We train a stance detector to classify the remaining posts as support or deny using our own dataset for stance detection between social media posts and news articles, which contains 2527 labeled source–target sentence pairs from 31 news events. For each event with a reference headline, the annotators were given a list of related headlines and posts, and they labeled whether each related headline or post supports or denies the claim made by the reference headline. Aside from the reference headline–related headline or the headline–related post sentence pairs, we further made second-order inferences for related headline–related post sentence pairs. If such a pair expressed a similar stance with respect to the reference headline, we inferred a support stance for the related headline–related post, and deny otherwise. Table 3 shows statistics about the dataset. The interannotator agreement is substantial, with a Cohen’s Kappa of 0.78. We fine-tuned a RoBERTa-large transformer¹⁰ on this data, achieving Accuracy of 0.8857, F₁ score of 0.8379, Precision of 0.8365, and Recall of 0.8395.
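The rule for detecting verbatim reports can be sketched as a text-normalization step followed by an equality check. The stop-word list, the ASCII-based emoji removal, and the function names below are simplifying assumptions made for illustration; they are not the authors' preprocessing pipeline.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}
URL_PATTERN = re.compile(r"https?://\S+")

def normalize(text):
    """Lowercase, then drop URLs, non-ASCII symbols, punctuation, and stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    # Crude emoji removal: this also drops legitimate non-ASCII letters.
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def is_report(post, title):
    """Treat a post as a verbatim report if it matches the title after cleaning."""
    return normalize(post) == normalize(title)

title = "Senator Caught in the Scandal"
reported = is_report("Senator caught in scandal!!! 😱 https://t.co/abc", title)
```

Posts for which `is_report` returns False would then be passed to the trained support/deny stance detector.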

Table 3. Statistics about our stance-annotated dataset.

To further subclassify support posts into those with neutral sentiment and those with negative sentiment, we fine-tuned a RoBERTa-large-based sentiment classifier on the Yelp ReviewPolarity dataset.^a Altogether, the stance of a user–article engagement e is given as stance(e).

3.3. Factual News Graph (FANG) framework

We now describe our FANG learning framework on the social context graph described in Section 3.2. Figure 2 shows an overview of our FANG model. Although optimizing for the fake news detection objective, FANG also learns generalizable representations for the social entities. This is achieved by optimizing three concurrent losses: (i) unsupervised Proximity Loss, (ii) self-supervised Stance Loss, and (iii) supervised Fake News Detection Loss.

Figure 2. Overview of our FANG framework.

Representation learning. We first discuss how FANG derives the representation of each social entity. Previous representation learning frameworks such as node2vec³ computed a node embedding by sampling its neighborhood, as defined by the graph structure, and then optimizing for the proximity loss, similar to word2vec. These methods use the neighborhood structure only, and they are suitable when the auxiliary node features are unavailable or incomplete, that is, when optimizing for each entity’s structural representation separately. Recently, GraphSage⁴ was proposed to overcome this limitation by allowing auxiliary node features to be used jointly with proximity sampling as part of the representation learning.

Let GraphSage(·) be GraphSage’s node-encoding function. Thus, we can now obtain the structural representation z_r ∈ ℝ^d of any user or source node r as z_r = GraphSage(r), where d is the structural embedding dimension. For news nodes, we further enrich their structural representation with user engagement temporality, which we showed to be distinctive for fake news detection in Section 1. This can be formulated as learning an aggregation function F(a, U) that maps a news article a in question and its engaged users U to a temporal representation that captures a’s engagement pattern. Therefore, the aggregating model (that is, the aggregator) has to be time-sensitive. RNNs fulfill this requirement: specifically, the Bidirectional LSTM (Bi-LSTM) with attention can capture long-term dependencies in the information sequence in both the forward and backward directions.¹² By examining the model’s attention, we learn which social profiles influence the decision, thus mimicking human analytic capability.

Our proposed LSTM input is a user-article engagement sequence {e₁, e₂, …, e_|U|}. Let meta(e_i) = (time(e_i), stance(e_i)) ∈ ℝ^l be the concatenation of e_i’s elapsed time since the news publication and a one-hot stance vector. Each engagement e_i has its representation x_{e_i} = (z_{U_i}, meta(e_i)), where z_{U_i} = GraphSage(U_i).

A Bi-LSTM encodes the engagement sequence and outputs two sequences of hidden states: (i) a forward one, H^f = {h^f_1, h^f_2, …, h^f_|U|}, which starts from the beginning of the engagement sequence, and (ii) a backward one, H^b = {h^b_1, h^b_2, …, h^b_|U|}, which starts from the end of the engagement sequence.

Let w_i be the attention weight paid by our Bi-LSTM encoder to the forward hidden state h^f_i and to the backward hidden state h^b_i. This attention should be derived from the similarity of the hidden state and the news features, that is, how relevant the engaging users are to the discussed content, and the particular time and stance of the engagement. Therefore, we formulate the attention weight w_i as follows:

w_i = exp(z_a^struct M_e h_i + meta(e_i) M_m) / Σ_{j=1}^{|U|} exp(z_a^struct M_e h_j + meta(e_j) M_m)

where z_a^struct = GraphSage(a) ∈ ℝ^d is the structural representation of article a, l is the meta dimension, e is the encoder dimension, and M_e ∈ ℝ^d×e and M_m ∈ ℝ^l×1 are the optimizable projection matrices for engagement and the meta features, which are shared across all engagements. We use w_i to compute the forward and the backward weighted feature vectors as h^f = Σ_i w_i h^f_i and h^b = Σ_i w_i h^b_i, respectively.

Finally, we concatenate the forward and backward representation vectors to obtain the overall temporal representation z_a^temp = (h^f, h^b) ∈ ℝ^2e for article a. By explicitly setting 2e = d, we can then combine the temporal and the structural representations as z_a = z_a^temp + z_a^struct.
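To make the aggregation step concrete, the sketch below computes attention weights over a toy engagement sequence and forms the weighted forward and backward vectors. It assumes a softmax over bilinear scores of the form (article vector) · M_e · h_i + meta(e_i) · M_m, which is consistent with the stated dimensions of M_e and M_m but is an assumption of this sketch; all numeric values and function names are hypothetical, and the hidden states would normally come from a trained Bi-LSTM.

```python
import math

def attention_weights(z_a, hidden_states, metas, M_e, M_m):
    """Softmax over scores s_i = z_a^T M_e h_i + meta_i . M_m (assumed scoring form)."""
    scores = []
    for h, meta in zip(hidden_states, metas):
        # M_e has shape d x e: project the e-dim hidden state into d dimensions.
        projected = [sum(M_e[r][c] * h[c] for c in range(len(h))) for r in range(len(z_a))]
        content = sum(z * p for z, p in zip(z_a, projected))
        scores.append(content + sum(m * w for m, w in zip(meta, M_m)))
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def weighted_sum(weights, hidden_states):
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]

# Toy setting with d = 2, e = 2, l = 2 and two engagements.
z_a = [1.0, 0.0]
forward = [[1.0, 0.0], [0.0, 1.0]]
backward = [[0.5, 0.5], [1.0, 0.0]]
metas = [[0.5, 1.0], [2.0, 0.0]]          # (elapsed time, one stance flag)
M_e = [[1.0, 0.0], [0.0, 1.0]]
M_m = [0.0, 0.0]

weights = attention_weights(z_a, forward, metas, M_e, M_m)
z_temp = weighted_sum(weights, forward) + weighted_sum(weights, backward)  # concatenation
```

Inspecting `weights` per engagement is what makes the aggregator interpretable: larger weights mark the engagements that contribute most to the article's temporal representation.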

Unsupervised proximity loss. We derive the Proximity Loss from the hypothesis that closely connected social entities often behave similarly. This is motivated by the echo chamber phenomenon, where social entities tend to interact with other entities of common interest to reinforce and to promote their narratives. This echo chamber phenomenon encompasses intercited news media sources publishing news of similar content or factuality, as well as social friends expressing similar stance with respect to news article(s) of similar content. Therefore, FANG should assign such nearby entities to a set of proximal vectors in the embedding space. From our observation that social entities are highly polarized, we also hypothesize that loosely connected social entities often behave differently. Thus, we want FANG to enforce that the representations of these disparate entities are distinctive.

The social interactions that define the above characteristics the most are user–user friendship, source–source citation, and news–source publication. As these interactions are either (a) between sources and news or (b) between users, we divide the social context graph into two subgraphs, namely the news–source subgraph and the user subgraph. Within each subgraph G′, we formulate the following Proximity Loss function:

L_prox = −Σ_{r∈G′} [ Σ_{r_p∈P_r} log σ(z_r^⊤ z_{r_p}) + q · Σ_{r_n∈N_r} log σ(−z_r^⊤ z_{r_n}) ]

where σ(·) is the sigmoid function, z_r ∈ ℝ^d is the representation of entity r, P_r is the set of nearby nodes or positive set of r, N_r is the set of disparate nodes or negative set of r, and q is a weighting factor. P_r is obtained using our fixed-length random walk, and N_r is derived using negative sampling.⁴
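A proximity objective of this kind can be sketched as follows, in the style of GraphSage's unsupervised loss: nearby pairs are rewarded for a high dot product and sampled negative pairs for a low one. The exact functional form used here (log-sigmoid terms, with the negative term scaled by q) is an assumption of this sketch, and the embeddings and positive/negative sets are toy values rather than the output of random walks.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proximity_loss(z, positives, negatives, q=1.0):
    """Sum over nodes r of -[sum_p log s(z_r.z_p) + q * sum_n log s(-z_r.z_n)]."""
    loss = 0.0
    for r, z_r in z.items():
        pos = sum(log_sigmoid(dot(z_r, z[p])) for p in positives.get(r, []))
        neg = sum(log_sigmoid(-dot(z_r, z[n])) for n in negatives.get(r, []))
        loss -= pos + q * neg
    return loss

# Toy embeddings: s1 and s2 point the same way, s3 points the opposite way.
z = {"s1": [1.0, 0.0], "s2": [1.0, 0.1], "s3": [-1.0, 0.0]}
good = proximity_loss(z, positives={"s1": ["s2"]}, negatives={"s1": ["s3"]})
bad = proximity_loss(z, positives={"s1": ["s3"]}, negatives={"s1": ["s2"]})
```

The loss is lower when the positive set contains the similar node and the negative set the dissimilar one, which is the behavior the hypothesis above calls for.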

Self-supervised stance loss. We also propose an analogous hypothesis for the user–news interaction, in terms of stance. If a user expresses a stance with respect to a news article, their respective representations should be close. For each stance c, we first learn a user projection function α_c(u) = A_c z_u and a news article projection function β_c(a) = B_c z_a that map a node representation in ℝ^d to a representation in the stance space c, which has dimension d_c. Given a user u and a news article a, we compute their similarity score in the stance space c as α_c(u)^⊤ β_c(a). If u expresses stance c with respect to a, we maximize this score, and we minimize it otherwise. This is the stance classification objective, optimized using the Stance Loss:

L_stance = −Σ_{u,a,c} y_{u,a,c} log f(u, a, c)

where f(u, a, c) = softmax(α_c(u)^⊤ β_c(a)) and y_{u,a,c} = 1 if u expresses stance c with respect to a, and 0 otherwise.

Supervised fake news loss. We directly optimize the main learning objective of fake news detection via the supervised Fake News Loss. In order to predict whether an article a is false, we obtain its contextual representation as the concatenation of its representation and the structural representation of its source, that is, v_a = (z_a, z_s).

This contextual representation is then input into a fully connected layer whose outputs are computed as o_a = Wv_a + b, where W ∈ ℝ^2d×1 and b ∈ ℝ are the weights and the biases of the layer. The output value o_a ∈ ℝ is finally passed through a sigmoid activation function σ(·) and trained using the cross-entropy–based Fake News Loss L_news, which we define as follows:

L_news = −(1/T) Σ_a [ y_a log σ(o_a) + (1 − y_a) log(1 − σ(o_a)) ]

where the sum runs over the articles a in a batch, T is the batch size, and y_a = 0 if a is fake, and 1 otherwise.

We define the total loss by linearly combining these three component losses: L_total = L_prox + L_stance + L_news.
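The supervised term and the combination can be sketched as follows. The sketch assumes the standard mean binary cross-entropy over a batch, with the label convention stated above (0 for fake, 1 for real), and an unweighted sum of the three losses; the function names and the example logits are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fake_news_loss(outputs, labels):
    """Mean binary cross-entropy over a batch; label 0 = fake, 1 = real."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for o, y in zip(outputs, labels):
        p = sigmoid(o)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(outputs)

def total_loss(l_prox, l_stance, l_news):
    """Unweighted sum of the three component losses."""
    return l_prox + l_stance + l_news

# Two articles: the first is fake (0), the second is real (1).
confident = fake_news_loss([-4.0, 4.0], [0, 1])   # logits agree with the labels
wrong = fake_news_loss([4.0, -4.0], [0, 1])       # logits contradict the labels
combined = total_loss(0.5, 0.25, confident)
```

In an actual training loop the three terms would be computed on the same mini-batch and backpropagated together, so that the shared node representations are shaped by all three objectives.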

4. Experiments

We conducted our experiments on a Twitter dataset collected by related work on rumor classification^{8, 13} and fake news detection.²⁰ For each article, we collected its source, a list of engaged users, and their tweets if they were not already available in the previous dataset. This dataset also includes Twitter profile description and the list of Twitter profiles of the users that a given target user follows. We further crawled additional data about media sources, such as the content of their Homepage and their About us page, together with their frequently cited sources on their Homepage.

The truth value of the articles—namely, whether they are fake or real news—is based on two fact-checking websites: Snopes and PolitiFact. We release the source code of FANG and the stance detection dataset.^b Table 4 shows some statistics about our dataset.

Table 4. Statistics about our dataset.

4.1. Fake news detection results

We benchmark the performance of FANG on fake news detection against several competitive models: (i) a content-only model, (ii) a Euclidean contextual model, and (iii) another graph learning model.

In order to compare our FANG model with the content-only model, we used a Support Vector Machine (SVM) model on TF.IDF feature vectors constructed from the news content (see Section 3.2). We also compared to a Euclidean model, CSI,¹⁸ a fundamental yet effective recurrent encoder that aggregates the user features, the news content, and the user–news engagements. We reimplemented the CSI model with source features by concatenating the overall score for the users and the article representation with our formulated source description to obtain the result vector for CSI’s integrated module mentioned in the original paper. Lastly, we compared against the GCN graph learning framework.⁷ First, we represented each of the k social interaction types in a separate adjacency matrix. We then concatenated GCN’s outputs on the k adjacency matrices as the final representation of each node, before passing the representation through a linear layer for classification.

We also studied the importance of modeling temporality by experimenting on two variants of CSI and FANG: (i) temporally insensitive CSI(-t) and FANG(-t) without time(e) in the engagement e’s representation x_e, and (ii) time-sensitive CSI and FANG with time(e). Table 5 shows the macroscopic results. As an evaluation measure, we use the area under the Receiver Operating Characteristic curve (AUC ROC; hereafter, just AUC).
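For readers less familiar with the measure, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below implements this pairwise definition directly; the labels and scores are made-up toy values, and a library routine would normally be used for real evaluations.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            # A tie between a positive and a negative counts as half a win.
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = auc_roc(labels, scores)
```

Here three of the four positive-negative pairs are ordered correctly, so the toy AUC is 0.75; a value of 0.5 corresponds to random ranking.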

Table 5. Comparison between FANG and baseline models on fake news detection, evaluated with AUC score.

All context-aware models, that is, CSI(-t), CSI, GCN, FANG(-t), and FANG improve over the context-unaware baseline by 0.1153 absolute with CSI(-t) and by 0.1993 absolute with FANG in terms of AUC score. This shows that considering the social context is helpful for fake news detection. We further observe that both time-sensitive CSI and FANG improve over their time-insensitive variants, CSI(-t) and FANG(-t) by 0.0233 and 0.0339, respectively. These results demonstrate the importance of modeling the temporality of news spreading. Finally, the two graph-based models, FANG(-t) and GCN, perform consistently better than the Euclidean CSI(-t) by 0.0501 and 0.0386, respectively: this demonstrates the effectiveness of our social graph representation. Overall, we can conclude that our FANG model outperforms the other context-aware, temporally-aware, and graph-based models.

5. Discussion

We now answer the following research questions (RQs) to better understand FANG’s performance under different scenarios:

RQ1: Does FANG work well with limited training data?
RQ2: Does FANG differentiate between fake and real news based on their characteristic patterns in temporal engagement?
RQ3: How effective is FANG’s representation learning?

5.1. Limited training data (RQ1)

To address RQ1, we conducted the experiments described in Section 4.1 using different sizes of the training dataset. We observed consistent improvements over the baselines under both limited and sufficient data conditions. Figure 3 (left) further visualizes the experimental results. We can see that FANG consistently outperforms the two baselines for all training sizes: 10%, 30%, 50%, 70%, and 90% of the data. In terms of AUC score at decreasing training size, among the graph-based models, GCN’s performance drops by 16.22% from 0.7064 at 90% to 0.5918 at 10%, whereas FANG’s performance drops by 11.11% from 0.7518 at 90% to 0.6683 at 10%. We further observe that CSI’s performance drops the least by only 7.93% from 0.6911 at 90% of the training data, to 0.6363 at 10% of the data. Another result from an ablated baseline, FANG(-s), where we removed the stance loss, highlights the importance of this self-supervised objective. At 90% of the training data, the relative underperforming margin of FANG(-s) compared to FANG is only 1.42% in terms of AUC score. However, this relative margin increases as the availability of training data decreases, to at most 6.39% at 30% of the training data. Overall, the experimental results emphasize our model’s effectiveness even under scenarios with limited training data compared to the ablated version. This confirms a positive answer for RQ1.
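The relative drops quoted above follow from the reported AUC scores at 90% and 10% of the training data, as the short check below shows. Only the AUC values come from the text; the helper name `relative_drop` is introduced here for illustration.

```python
def relative_drop(high, low):
    """Relative decrease from high to low, in percent."""
    return (high - low) / high * 100.0

# (AUC at 90% of the training data, AUC at 10%) as reported in the text.
auc_by_model = {"GCN": (0.7064, 0.5918), "FANG": (0.7518, 0.6683), "CSI": (0.6911, 0.6363)}
drops = {name: round(relative_drop(at_90, at_10), 2)
         for name, (at_90, at_10) in auc_by_model.items()}
```

Note that although CSI has the smallest relative drop, its absolute AUC remains below FANG's at both ends of the range (0.6363 versus 0.6683 at 10%, and 0.6911 versus 0.7518 at 90%).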

Figure 3. FANG’s performance against baselines (AUC score) for varying training data sizes (left), and attention distribution across time windows for fake versus real news (right).

5.2. Engagement temporality study (RQ2)

To address RQ2 and to verify whether our model makes its decisions based on the distinctive temporal patterns between fake and real news, we examined FANG’s attention mechanism. We accumulated the attention weights produced by FANG within each time window and then compared them across time windows. Figure 3 (right) shows the attention distribution over time for fake and for real news.

We can see that, for fake news, FANG pays 68.08% of its attention to the user engagement that occurred in the first 12 h after a news article has been published. Its attention then sharply decreases to 18.83% for the next 24 h, then to 4.14% from 36 h to 2 weeks after publication, and finally to approximately 9.04% from the second week onward. However, for real news, FANG places only 48.01% of its attention on the first 12 h, which then decreases to 17.59% and to 12.85% in the time windows of 12–36 h, and 36 h to 2 weeks, respectively. We also observe that FANG maintains 21.53% attention even after 2 weeks.
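The accumulation of attention within time windows can be sketched as a simple bucketing step. The window boundaries below mirror the ones used in the text (12 h, 36 h, and 2 weeks), while the engagement list and the function name `attention_by_window` are hypothetical.

```python
# Window boundaries in hours after publication: 0-12 h, 12-36 h, 36 h-2 weeks, beyond.
WINDOW_EDGES = [12.0, 36.0, 14 * 24.0]
WINDOW_NAMES = ["0-12h", "12-36h", "36h-2w", ">2w"]

def attention_by_window(engagements):
    """Sum attention weights per time window and normalize to shares that add up to 1."""
    totals = [0.0] * len(WINDOW_NAMES)
    for hours, weight in engagements:
        index = sum(hours >= edge for edge in WINDOW_EDGES)  # number of edges passed
        totals[index] += weight
    overall = sum(totals)
    return {name: total / overall for name, total in zip(WINDOW_NAMES, totals)}

# Hypothetical (elapsed hours, attention weight) pairs for one article.
shares = attention_by_window([(1.0, 0.4), (5.0, 0.3), (20.0, 0.2), (400.0, 0.1)])
```

Averaging such per-article shares separately over fake and real articles would give a distribution of the kind plotted in Figure 3 (right).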

Our model’s characteristics are consistent with the general observation that the appalling nature of fake news generates the most engagements within a short period of time after its publication. Therefore, it is reasonable that the model places much emphasis on these crucial engagements. On the other hand, genuine news attracts fewer engagements, but it is circulated for a longer period of time, which explains FANG’s persistent attention even after 2 weeks since the publication. Overall, the temporality study here highlights the transparency of our model’s decision, largely thanks to the incorporated attention mechanism.

5.3. Representation learning (RQ3)

In the intrinsic evaluation, we verify how generalizable the minimally supervised news representations are for the fake news detection task. We first optimize both GCN and FANG on 30% of the training data to obtain news representations. We then cluster these representations using an unsupervised clustering algorithm, OPTICS,¹ and measure the homogeneity of the resulting clusters with respect to the factuality labels. The higher the homogeneity score, the more closely news articles with the same factuality label (i.e., fake or real) are grouped together, which indicates a higher-quality representation.
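As a reference point for interpreting the scores, the sketch below computes homogeneity using the common entropy-based definition, 1 − H(labels | clusters) / H(labels). That the evaluation uses exactly this definition is an assumption of this sketch, and the label and cluster assignments are toy values rather than OPTICS output.

```python
import math
from collections import Counter

def entropy(counts, total):
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def homogeneity(labels, clusters):
    """Homogeneity = 1 - H(labels | clusters) / H(labels); 1.0 means pure clusters."""
    n = len(labels)
    h_labels = entropy(Counter(labels).values(), n)
    if h_labels == 0.0:
        return 1.0  # a single class is trivially homogeneous
    joint = Counter(zip(clusters, labels))
    cluster_sizes = Counter(clusters)
    # H(C|K) = -sum over (cluster k, label c) of n_kc / n * log(n_kc / n_k)
    h_cond = -sum(n_kc / n * math.log(n_kc / cluster_sizes[k])
                  for (k, _), n_kc in joint.items())
    return 1.0 - h_cond / h_labels

pure = homogeneity(["fake", "fake", "real", "real"], [0, 0, 1, 1])
mixed = homogeneity(["fake", "real", "fake", "real"], [0, 0, 1, 1])
```

Clusters that each contain a single factuality label score 1, whereas clusters that mirror the overall label mix score 0, so even small positive values indicate some alignment between cluster membership and factuality.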

In the extrinsic evaluation, we verify how generalizable the supervised source representations are for a new task: source factuality prediction. We first train FANG on 90% of the training data to obtain the representation of each source s as z_s = GraphSage(s), and the total representation as v_s = (z_s, x_s, Σ_{a∈publish(s)} x_a), where x_s, publish(s), and x_a denote the content representation of source s, the list of all articles published by s, and their content representations, respectively.

We propose a baseline representation that does not consider the context of the source s, v′_s = (x_s, Σ_{a∈publish(s)} x_a). Finally, we train two separate SVM models for v_s and v′_s on the source factuality dataset, consisting of 129 sources of high factuality and 103 sources of low factuality, obtained from Media Bias/Fact Check^c and PolitiFact.^d

For intrinsic evaluation, the Principal Component Analysis (PCA) plot of labeled FANG representation (see Figure 4, top left) shows moderate collocation for the groups of fake and real news, whereas the PCA plot of labeled GCN representation (see Figure 4, bottom left) shows little collocation within either the fake or the real news groups. Quantitatively, FANG’s OPTICS clusters (as shown in Figure 4, top right) achieve a homogeneity score of 0.051 based on news factuality labels, compared to a homogeneity score of 0.0006 for the GCN OPTICS clusters. This intrinsic evaluation demonstrates FANG’s strong representation closeness within both the fake and the real news groups, indicating that FANG yields improved representations over another fully supervised graph neural framework.

Figure 4. 2D PCA plot of FANG’s representations with factuality labels (top left) and OPTICS clustering labels (top right), and GCN’s news representations with factuality labels (bottom left) and OPTICS clustering labels (bottom right).

For the extrinsic evaluation on downstream source factuality classification, our context-aware model achieves an AUC score of 0.8049 (versus 0.5842 for the baseline). We further examined the FANG representations for sources to explain this 0.2207 absolute improvement. Figure 5 shows the source representations obtained from the textual features, GCN, and FANG with their factuality labels, that is, high, mixed, low, and the citation relationship. In the left subfigure, we can observe that the textual features are insufficient to differentiate the factuality of media, as a fake news site such as cnsnews could mimic factual media in terms of web design and news content.

Figure 5. Plots for source representations using textual features (left), GCN (middle), and FANG (right) with factuality labels.

However, the citation frequency between a low-factuality website and high-factuality sites would not be as high, and this signal is effectively used by the two graph learning frameworks: GCN and (especially) FANG. Yet, GCN fails to differentiate low-factuality sites with higher citations, such as jewsnews.co.il and cnsnews, from high-factuality sites. On the other hand, sources such as news.yahoo, despite being textually different, as shown in Figure 5 (left), should still cluster with other credible media because of their high intercitation frequency. FANG, with much more emphasis on contextual representation learning, makes these sources more distinguishable. Its representation space gives us a glance into the landscape of news media, where there is a large central cluster of high-factuality intercited sources such as nytimes, washingtonpost, and news.yahoo. At the periphery lie less connected media outlets, including both high- and low-factuality ones.

We also see cases where all models failed to differentiate mixed-factuality media, such as buzzfeednews and nypost, which have high citation counts with high-factuality media. Overall, the results from intrinsic and extrinsic evaluation, as well as the observations, confirm RQ3 on the improvement of FANG’s representation learning.
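For readers less familiar with the AUC metric used in the extrinsic evaluation, here is a minimal sketch of its pairwise-ranking interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores are made-up toy values, not data from the experiments; only the two reported AUC figures in the last lines come from the text.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = high factuality, 0 = low factuality (hypothetical scores).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly

# Absolute improvement reported in the text: 0.8049 versus 0.5842.
improvement = round(0.8049 - 0.5842, 4)
```

An AUC of 0.5 corresponds to random ranking, which puts the baseline's 0.5842 only slightly above chance.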

5.4. Scalable inductiveness

FANG overcomes the transductive limitation of previous approaches when inferring the credibility of unseen nodes. MVDAM⁹ has to randomly initialize an embedding and optimize it iteratively using node2vec³ for any unseen node, whereas FANG directly infers the embedding with its learned feature aggregator.

Other graphical approaches using matrix factorization²² or graph convolutional layers^{2, 14} learn parameters whose dimensionality is fixed to the network size N, and their inference time can be as expensive as O(N³).² FANG infers the embeddings of unseen nodes without the adjacency matrix, and its inference time only depends on the neighborhood size of the unseen nodes.
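The inductive idea can be sketched with a single GraphSage-style mean-aggregation layer: an unseen node's embedding is computed from its own features and those of its sampled neighbors, using weights learned beforehand. This is a simplified illustration under stated assumptions, not FANG's actual aggregator; the weight matrix, feature values, and function names are hypothetical.

```python
def mean_aggregate(neighbor_features):
    """Element-wise mean of the neighbors' feature vectors."""
    count = len(neighbor_features)
    return [sum(column) / count for column in zip(*neighbor_features)]


def matvec(weights, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def infer_embedding(node_features, neighbor_features, weights):
    """One layer: ReLU(W * concat(x_v, mean of neighbor features)).

    Only the node and its neighbors are touched, so the cost grows with
    the neighborhood size, not with the size of the whole graph.
    """
    combined = list(node_features) + mean_aggregate(neighbor_features)
    return [max(0.0, h) for h in matvec(weights, combined)]


# Hypothetical "learned" weights and features for an unseen node.
W = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, -1.0, 0.0],
]
unseen_node = [1.0, 0.0]
neighbors = [[0.0, 2.0], [2.0, 4.0]]

embedding = infer_embedding(unseen_node, neighbors, W)
```

No adjacency matrix and no per-node embedding table appear anywhere in this computation, which is what allows new nodes to be embedded without retraining or reprocessing the graph.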

5.5. Limitations

We note that the entity and the interaction features are constructed before passing to FANG, and thus errors from upstream tasks, such as textual encoding or stance detection, propagate to FANG. Future work can address this in an end-to-end framework, where textual encoding and stance detection can be jointly optimized.

Another limitation is that the dataset for contextual fake news detection can quickly become obsolete as hyperlinks and social media traces at the time of publication might no longer be retrievable.

6. Conclusion and Future Work

We have demonstrated the importance of modeling the social context for the task of fake news detection. We further proposed FANG, a graph learning framework that enhances representation quality by capturing the rich social interactions between users, articles, and media, thereby improving both fake news detection and source factuality prediction. We have demonstrated the efficiency of FANG with limited training data and its capability of capturing distinctive temporal patterns between fake and real news with a highly explainable attention mechanism. In future work, we plan more analysis of the representations of social users. We further plan to apply multitask learning to jointly address the tasks of fake news detection, source factuality prediction, and echo chamber discovery.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

DOI

10.1145/3517214

April 2022 Issue

Published: April 1, 2022

Vol. 65 No. 4

Pages: 124-132
