Span identification and technique classification of propaganda in news articles

  • Original Article
  • Open access
  • Published: 08 May 2021
  • Volume 8, pages 3603–3612 (2022)


  • Wei Li   ORCID: orcid.org/0000-0003-2738-4350 1 ,
  • Shiqian Li 1 ,
  • Chenhao Liu 1 ,
  • Longfei Lu 1 ,
  • Ziyu Shi 1 &
  • Shiping Wen 2  


Propaganda is a rhetorical technique designed to serve a specific agenda; because of its psychological effect, it is often used deliberately in news articles to advance the author's intended purpose. It is therefore important to identify where propaganda appears in a news article and which techniques are used, so that readers can grasp the article's theme efficiently. Some recent research on propaganda detection has been proposed, but its results remain unsatisfactory, and detection of propaganda techniques in news articles still needs further study. In this paper, we introduce our systems for detecting propaganda techniques in news articles, a problem split into two tasks: Span Identification (SI) and Technique Classification (TC). For each task, we design a system based on the popular pre-trained BERT model. Furthermore, we adopt over-sampling and EDA strategies and propose a sentence-level feature concatenating method. Experiments on the dataset of about 550 news articles offered by SemEval show that our systems achieve state-of-the-art performance.


Introduction

Recently, with the development of models in the field of natural language processing, research on propaganda detection has also advanced: it originated at the document level [ 1 ], then moved to the sentence level [ 6 , 21 ], and now operates at the fragment level [ 13 , 26 ]. The main tasks of fragment-level propaganda detection are identifying the specific fragments that contain at least one propaganda technique and identifying which propaganda technique is applied in each fragment. As an extension of text classification in natural language processing, the task can draw on many relevant advanced algorithms [ 8 , 10 , 12 , 19 , 27 ].

figure 1

The corpus of news articles, retrieved with the newspaper3k library; sentence splitting was performed automatically with the NLTK sentence splitter

SemEval, the largest and most influential semantic evaluation competition in the world, provides a news article corpus in which fragments containing one of 14 propaganda techniques [ 14 , 18 ] have been annotated, as shown in Fig. 1 . Based on this dataset, numerous researchers have proposed a large number of algorithms to locate the usages of propaganda techniques. Most of these algorithms are based on popular language models such as ELMo [ 16 ], GPT [ 17 ] and especially BERT [ 3 ]. As shown in Fig.  2 , the BERT model proposed by Google outperformed previous methods on 11 NLP tasks and has achieved state-of-the-art performance on multiple NLP benchmarks [ 22 ]. We choose BERT as the basic model of our systems as well.

In this work, we introduce our systems for span identification and technique classification of propaganda in news articles. For the span identification task, we present two architectures: the first is a BERT-based binary classifier, and the other is a BERT-based three-token-type classifier, which is our best-performing system. Besides building on the popular BERT model, we optimize the sampling [ 2 ] process, combine EDA [ 23 , 24 ] to prevent overfitting, and adopt sentence-level feature concatenating (SLFC) so that the model can learn characteristics better. For the technique classification task, we design a BERT-based architecture with a dimensionality-reducing fully connected (FC) layer and a linear classifier. As in the SI task, we apply the EDA strategy in the data loading process. The results in “Experiment and results” show that our optimization and improvement of the pre-trained BERT model are effective: both of our systems for SI and TC surpass most existing models.

The contributions of our paper are as follows:

We fine-tune BERT with linear layers and devise two accurate systems for the span identification and technique classification of propaganda in news articles.

We turn the binary sequence tagging task SI into a three-way classification task by adding an ‘invalid’ token type, and we compare the binary tagging method with the three-token-type method.

We propose the SLFC approach in our SI system. To the best of our knowledge, this is the first work to integrate sentence-level classification features into each word.

For our systems, we have obtained the optimal network parameters through experiments and comparative analysis.

Related work

The following reviews the history of, and relevant approaches to, propaganda detection in news articles.

  • Propaganda detection

Propaganda technique detection grew out of fake news detection. Some earlier work judged whether a news article was authentic only according to its origin, which is clearly unreliable. Recently, with the rise of artificial intelligence and machine learning, propaganda detection has attracted researchers’ attention and has become a standalone research field.

In the early days of neural networks for NLP, the bidirectional long short-term memory (BiLSTM) [ 5 ] layer was proposed to capture the semantic features of human language, and it was gradually applied to detect the use of propaganda in news articles. A corpus of annotated news articles was created together with a novel multi-granularity neural network that outperforms several strong BERT-based baselines [ 14 ]. At the same time, Proppy [ 1 ], a system to unmask propaganda in online news, appeared for document-level propaganda detection; it works by analysing various representations, from writing style and readability level to the presence of certain keywords. Later, to further improve detection accuracy, researchers turned to sentence-level detection. Hou et al. proposed a sentence-level detection model that better captures the semantic features of language by constructing context-dependent input pairs (sentence-title pair and sentence-context pair) [ 6 ]. After the NLP4IF workshop, fragment-level classification (FLC) of propaganda emerged; for instance, different neural architectures (e.g., CNN, LSTM-CRF, and BERT) have been explored to further improve results [ 15 ].

BERT-based model

In our experiments, we design models for the SI and TC tasks based on the BERT [ 3 , 4 ] architecture, which incorporates the strengths of other language models. As shown in Fig. 2 , BERT uses the transformer’s attention mechanism [ 20 ] to encode the input word vectors. Unlike earlier recurrent NLP models, BERT can process a sequence in parallel. More distinctively, BERT is pre-trained on two tasks, Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), which make the model well suited to downstream NLP tasks. After suitable pre-training and task-specific fine-tuning, BERT has made great progress on many NLP tasks. Many researchers have recognized the potential of this two-stage paradigm (pre-training followed by fine-tuning), and in recent years many improved models based on BERT have emerged, such as MT-DNN [ 7 ], XLNet [ 25 ] and ALBERT [ 11 ].

figure 2

The architecture of the pre-trained BERT model, with a word embedding layer that produces word vectors and 12 encoder layers made up of parallel transformers that fuse semantics

In our systems, BERT is used mainly for word feature extraction, since it adopts the powerful transformer as its feature extractor and implements a bidirectional language model. A core idea of BERT is to convert each word into a word-vector input formed by summing the Token Embedding, Segment Embedding, and Position Embedding, which integrates the semantics of the whole sentence into each word of that sentence. For the SI task, we process the feature vectors produced by the BERT generator by incorporating sentence-level features into each word vector and then feed them to a three-class (prop, non-prop, invalid) classifier layer. For the TC task, we truncate the valid fragments and pad them for the subsequent FC layer and classifier. Of the two released BERT sizes, we use the 12-layer pre-trained model as the basis for both the SI and TC tasks.

In this section, we will introduce the details of our solutions and show the model architectures designed for the span identification (SI) and technique classification (TC) tasks.

Data process

Only a small portion of the text uses propaganda in SI, and some of the techniques rarely appear in the given fragments in TC, both of which lead to an imbalanced dataset. We propose two methods aimed at these two problems.

Over-sampling In the SI task, we use over-sampling (OS) [ 2 , 26 ] to obtain a more balanced and suitable training set. Since sentences containing propaganda techniques are relatively few, we sample them with a higher probability, so that the number of non-propaganda sentences seen over the whole training process is correspondingly reduced. However, our experiments show that if over-sampling is overused, labeled spans become over-represented in the sample, the model overfits, and the F1-score drops to an undesirable level. Therefore, when training our model, we use over-sampling for the first half of the epochs and sequential sampling for the rest. TC, by contrast, is purely a classification task in which each fragment in the given dataset corresponds to a specific propaganda technique, so over-sampling is unnecessary there.
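A minimal sketch of this sampling schedule, assuming a PyTorch-style dataset whose items expose a has_propaganda flag (the dataset layout, the flag name and the boost factor are illustrative, not taken from the paper):

```python
import torch
from torch.utils.data import DataLoader, SequentialSampler, WeightedRandomSampler

def make_loader(dataset, epoch, num_epochs, batch_size=32, boost=4.0):
    """Over-sample propaganda sentences in the first half of training,
    then fall back to plain sequential sampling."""
    if epoch < num_epochs // 2:
        # Sentences that contain propaganda get a higher sampling weight.
        weights = torch.tensor(
            [boost if ex["has_propaganda"] else 1.0 for ex in dataset],
            dtype=torch.double,
        )
        sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
    else:
        sampler = SequentialSampler(dataset)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```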

Data augmentation Since the pre-trained BERT model overfits easily, we adopt a data augmentation scheme to improve the generalization ability and robustness [ 24 ] of the model. In the SI task, we apply the EDA operations Synonym Replacement (SR) [ 9 ] and Random Swap (RS) [ 23 ]: each word has an equal probability of being swapped or replaced by one of its synonyms without changing the label. Compared with short sentences, long sentences absorb more noise, which helps balance the dataset. The augmented sentences, now different from the originals, are added to the training dataset. In the TC task, the data augmentation strategy is initially the same random process as in SI. However, some techniques, such as Appeal_to_Authority, Bandwagon, Reductio_ad_hitlerum and Black-and-White_Fallacy, still cannot be detected. To address this, on top of the random data augmentation we force instances of these techniques into the set to be augmented. In this way we increase the useful noise in the training dataset, and the training time is shortened as well.
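A simplified sketch of the two EDA operations used here, with WordNet as the synonym source; the probability values and the WordNet lookup are illustrative choices, and the reference EDA implementation [ 23 ] differs in its details:

```python
import random
from nltk.corpus import wordnet  # requires a prior nltk.download('wordnet')

def synonym_replacement(words, p=0.1):
    """Replace each word with a random WordNet synonym with probability p."""
    out = []
    for w in words:
        if random.random() < p:
            lemmas = {l.name().replace("_", " ")
                      for s in wordnet.synsets(w) for l in s.lemmas()} - {w}
            out.append(random.choice(sorted(lemmas)) if lemmas else w)
        else:
            out.append(w)
    return out

def random_swap(words, p=0.1):
    """Swap each position with another random position with probability p."""
    out = list(words)
    for i in range(len(out)):
        if random.random() < p and len(out) > 1:
            j = random.randrange(len(out))
            out[i], out[j] = out[j], out[i]
    return out
```

Because only the word order or surface form changes, the span-level labels of the original sentence can be carried over unchanged.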

Approach of span identification (SI)

figure 3

The architecture of the Span Identification task, adopting over-sampling, data augmentation and sentence-level feature concatenating. Concat adds the classification feature of the sentence to each of its word vectors

To deal with the SI task, we first defined it as a binary classification task [ 13 ], but experiments showed that the Precision and F1-score of this solution were lower than expected. After analyzing the cause, we propose a three-class model that assigns each word in a news article to one of three token types; the concrete architecture of our model is shown in Fig. 3 . Two of the types are ‘prop’ and ‘non-prop’; the third is ‘invalid’, the label for invalid tokens such as ‘CLS’, ‘SEP’ [ 3 ] and the padding tokens used to give every input sentence the same length. Classifying these invalid tokens into a separate ‘invalid’ type reduces noise and improves Precision and F1-score. Furthermore, we use the sampling strategy and EDA to optimize the dataset.

Since the labels of the plain-text documents offered by SemEval are at the character level, the first step is to convert them to the word level for word embedding in the pre-trained BERT model. Before feeding the word vectors to the classifier, we combine the word vectors in each sentence with the feature vector of the sentence they belong to. The word vectors enriched with the sentence semantics are then normalized for the final classifier layer. As shown on the right of Fig. 3, the concatenating step generates the new concatenated vectors by placing the sentence vector in front of the word vectors. Formula ( 1 ) shows the concatenating process mathematically:

\[ \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \oplus \begin{bmatrix} w_{1,1} & \cdots & w_{1,200} \\ \vdots & \ddots & \vdots \\ w_{768,1} & \cdots & w_{768,200} \end{bmatrix} = \begin{bmatrix} s_1 & \cdots & s_1 \\ s_2 & \cdots & s_2 \\ w_{1,1} & \cdots & w_{1,200} \\ \vdots & \ddots & \vdots \\ w_{768,1} & \cdots & w_{768,200} \end{bmatrix} \qquad (1) \]

where \(s_1,s_2\) are the elements of the sentence vector, which holds the sentence-level classification result, \(\oplus\) denotes prepending the sentence vector to every column, and the \(768\times 200\) matrix contains the 200 word vectors (768 dimensions each) of one sentence. The concatenation yields the \(770\times 200\) input matrix of the final classifier on the right-hand side. This step is reasonable because sentence-level prediction is easier and more accurate than word-level prediction, and the results confirm that it plays a key role in improving word-level prediction accuracy. Finally, by merging consecutive words tagged as propaganda, the specific fragments that contain at least one propaganda technique are identified.
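The following is a minimal PyTorch sketch of the SLFC step under the dimensions above (768-dimensional word vectors, 200 tokens per sentence, a 2-dimensional sentence-level prediction); the tensor names are illustrative rather than taken from the authors' code:

```python
import torch

def sentence_level_feature_concat(word_feats: torch.Tensor,
                                  sent_logits: torch.Tensor) -> torch.Tensor:
    """Prepend the sentence-level prediction to every word vector.

    word_feats:  (batch, 200, 768) token features from BERT
    sent_logits: (batch, 2) sentence-level propaganda scores
    returns:     (batch, 200, 770) concatenated features
    """
    batch, seq_len, _ = word_feats.shape
    # Broadcast the 2-dim sentence prediction across all token positions.
    sent_expanded = sent_logits.unsqueeze(1).expand(batch, seq_len, -1)
    return torch.cat([sent_expanded, word_feats], dim=-1)

# Example with random tensors:
feats = sentence_level_feature_concat(torch.randn(4, 200, 768), torch.randn(4, 2))
assert feats.shape == (4, 200, 770)
```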

Approach of technique classification (TC)

For the multi-class classification task TC, we place a fully connected (FC) layer and a linear classifier on top of the BERT model, as shown in Fig. 4 . Since the dimension of the valid fragment vector is large, the FC layer performs dimensionality reduction, and the linear classifier then maps the result into 14 classes. We handle the propaganda techniques that rarely appear in the dataset with EDA, so as to mitigate the dataset imbalance. Compared with our model without EDA, the model with EDA gains around 4 points of F1-score, as shown in Table 3 .

figure 4

The architecture of Technique Classification task with segmentation and padding operations, an FC layer and a linear classifier layer

In detail, the TC task takes the given text fragment identified as propaganda, together with its document context, as the input of the pre-trained BERT generator. Unlike SI, which is a word-level classification task, TC is fragment-level, so incorporating sentence-level features into each word vector is ineffective here. For a fragment that spans several sentences, we split it into its component sentences during training but treat it as a whole during evaluation. To obtain the valid fragment carrying propaganda techniques, we segment the output of BERT and pad it with invalid zero vectors to a fixed length (120). After dimensionality reduction by the FC layer, a linear classifier performs the 14-class classification.
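A minimal sketch of such a classification head, covering fragment truncation/zero-padding to a fixed length, a dimensionality-reducing FC layer and a linear classifier; the hidden size of 256 is an illustrative choice, not a value reported in the paper:

```python
import torch
import torch.nn as nn

class TCHead(nn.Module):
    """Classify a propaganda fragment (BERT token features) into one of 14 techniques."""

    def __init__(self, max_len=120, bert_dim=768, hidden=256, num_classes=14):
        super().__init__()
        self.max_len = max_len
        self.reduce = nn.Linear(bert_dim, hidden)        # dimensionality reduction
        self.classify = nn.Linear(hidden * max_len, num_classes)

    def forward(self, fragment_feats: torch.Tensor) -> torch.Tensor:
        # fragment_feats: (batch, frag_len, 768) BERT outputs for the fragment tokens
        batch, frag_len, dim = fragment_feats.shape
        if frag_len >= self.max_len:                     # truncate long fragments
            fragment_feats = fragment_feats[:, : self.max_len]
        else:                                            # pad short ones with zero vectors
            pad = torch.zeros(batch, self.max_len - frag_len, dim,
                              device=fragment_feats.device)
            fragment_feats = torch.cat([fragment_feats, pad], dim=1)
        reduced = torch.relu(self.reduce(fragment_feats))
        return self.classify(reduced.flatten(start_dim=1))

logits = TCHead()(torch.randn(2, 37, 768))               # shape (2, 14)
```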

Experiment and results

In this section, we present the experimental details and the results obtained by comparing our best systems for SI and TC with several other attempts.

Experiment details

In our experiments, we trained our models in parallel on 4 Nvidia GTX 1080Ti GPUs to reduce the required time. Based on the PyTorch framework and the cross-entropy loss [ 28 ] (after also trying the focal loss), we fine-tuned the pre-trained BERT model for our SI and TC tasks.
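As a hedged sketch of this fine-tuning setup for the SI task, assuming the HuggingFace transformers library and the bert-base-uncased checkpoint (the paper does not name a specific BERT implementation, and the SLFC step is omitted here for brevity):

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")        # 12-layer base model

classifier = nn.Linear(bert.config.hidden_size, 3)           # prop / non-prop / invalid
optimizer = torch.optim.AdamW(
    list(bert.parameters()) + list(classifier.parameters()), lr=3e-5)
criterion = nn.CrossEntropyLoss()

def training_step(texts, token_labels):
    """One fine-tuning step; token_labels is a (batch, 200) tensor of class ids."""
    enc = tokenizer(texts, padding="max_length", truncation=True,
                    max_length=200, return_tensors="pt")
    token_feats = bert(**enc).last_hidden_state              # (batch, 200, 768)
    logits = classifier(token_feats)                          # (batch, 200, 3)
    loss = criterion(logits.view(-1, 3), token_labels.view(-1))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```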

Dataset The datasets for both the SI and TC tasks are news articles in plain-text format, including train articles, dev articles and the label files. To begin with, we split each article into individual sentences to reduce the number of model parameters. Before the experiments, we divided the annotated corpus of about 550 articles into an 80% train set for model training and a 20% test set for model evaluation. Counting the instances of each technique shows that the dataset for TC is imbalanced, as shown in Table 1 . Some techniques, such as “Loaded_Language”, account for as much as 34.64% of the instances, while others, such as “Black-and-White_Fallacy”, “Slogans” and “Whataboutism, Straw_Men, Red_Herring”, show up far less often. Worse, neither “Bandwagon, Reductio_ad_hitlerum” nor “Thought-terminating_Cliches” has more than 80 instances, which may badly affect the training process. During training, in order to enhance the generalization capability of our model, we used EDA to extend the train set and make it more evenly distributed. Besides, particularly for the SI task, we adopted the over-sampling strategy for tagged sentences.

figure 5

The comparison of SI training process between our systems with and without SLFC

Evaluation metric To make fair comparisons, we use different evaluation criteria in different comparison experiments. For both the SI and TC tasks, we adopt the F1-score (F1) as the main metric; the general Precision (P) and Recall (R) are secondary metrics for the SI task. The F1-score is given by the following formula:

\[ F1 = \frac{2 \times P \times R}{P + R} \]

Results: span identification (SI)

To achieve the SI task, we present two different architectures and optimize one of them with over-sampling, EDA, and sentence-level feature concatenating (SLFC). As shown in Table 2 , our top-performing system is the three-token-type classification system with over-sampling, EDA and SLFC. We compare this SI system with the BERT-based binary classifier model and the plain BERT-based three-token-type classifier model.

The BERT-based three-token-type classifier, reaching 40.8815% F1-score, 40.1099% Precision and 41.6834% Recall, behaves better than both the baseline, which is BERT with no fine-tuning, and the binary classifier model. We attribute this success to the ‘invalid’ token type, which reduces the noise of the invalid tokens by classifying the irrelevant words separately. Moreover, with EDA it takes only about two epochs to reach the peak Recall, whereas without it the peak takes about six epochs and the results fall short of expectations. Ultimately, our full SI system, based on the three-token-type classifier and using over-sampling, EDA and SLFC, prevails over the others, scoring 44.1732% F1 on the test set.

Next, we analyze how SLFC benefits our system in terms of Recall and F1-score, as shown in Fig. 5 . Generally speaking, word-level prediction requires finer-grained detection and has a larger margin of error than sentence-level prediction, which is why SLFC gives each word extra information about whether the sentence it belongs to contains propaganda: the sentence-level prediction provides a reference for the word-level prediction. If a sentence is propagandistic, it is highly likely to contain propaganda fragments; conversely, if a sentence is non-propagandistic, the words in it are unlikely to be propagandistic either. Based on this observation, we apply SLFC to our model, which increases the F1-score by around three points and the Recall by around four points, while Precision does not decrease significantly. In short, compared with the system without SLFC, our system identifies propaganda spans more accurately, which in turn raises the F1-score and Recall.

Results: technique classification (TC)

To better complete the TC task, we present two architectures, one without EDA and one with EDA. Compared with the baseline, which is BERT with no fine-tuning, both of our systems approximately double the F1-score, as shown in Table 3 . We also experimentally compared and analyzed the strategy of applying EDA in the data loading process of our TC system: the system with EDA improves the F1-score by around 3% over the system without it. Our final TC system thus reaches the state of the art, scoring 57.5729% F1 on the test set.

The per-technique F1 improvements brought by the EDA strategy are shown in Table 4 . Compared with the model without EDA, although three techniques (‘Doubt’, ‘Flag-Waving’, and ‘Whataboutism, Straw_Men, Red_Herring’) decrease slightly, more than half of the techniques improve in F1-score. Most of them gain about eight points on average, including ‘Appeal_to_fear-prejudice’, ‘Exaggeration, Minimisation’ and ‘Repetition’; notably, ‘Causal_Oversimplification’ and ‘Thought-terminating_Cliches’ gain about 14 points. Overall, our TC system benefits considerably from EDA, which enriches the dataset, helps the model converge faster, and improves its generalization ability and robustness.

Parameter analysis

After a series of experiments, we have given a set of optimal parameters [epoch, learning rate (lr), sentence length (sent-len)] for the models of the two tasks. The optimal parameter combinations are shown in bold in Table 5 .

The sentence length is the length of a single input to BERT and is usually set to 256; we set it to 200 and 210 for the SI and TC tasks, respectively. For SI, no sentence in the dataset exceeds 200 tokens, and too much padding would increase the classification error. For TC, the maximum length of valid fragments in the dataset is 210, so we choose it as the padding limit. For the learning rate, both tasks use 3 \(\times 10^{-5}\) because our valid dataset is small. From the analysis of the SLFC method for the SI task in Sect. 4.2 , the model begins to converge around epoch 7, so we set the number of training epochs to 8 to prevent overfitting in our SI system. In the TC experiments, we found that epoch values greater than 15 caused the F1-score to drop, so we take 15 as the best choice for our TC system. With these optimal parameters, our SI and TC systems obtain F1-scores of 0.441732 and 0.575729, and each training run takes around 2.5 h on 4 Nvidia GTX 1080Ti graphics cards (i.e. around 10 GPU hours).
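Collected in one place, the optimal hyperparameters reported above (the values are as stated in the text; the dictionary layout itself is only illustrative):

```python
# Optimal hyperparameters for the two systems, as stated in the parameter analysis.
HYPERPARAMS = {
    "SI": {"epochs": 8,  "lr": 3e-5, "sent_len": 200},
    "TC": {"epochs": 15, "lr": 3e-5, "sent_len": 210},
}
```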

Conclusion and future work

Based on the BERT model, we have presented two systems, one for Span Identification and one for Technique Classification of propaganda in news articles. In the data loading process, we applied two strategies, over-sampling in the SI task and EDA in both the SI and TC tasks, to deal with the imbalance between tagged and untagged data and to enlarge the training dataset. For the SI task, we redefined the problem as a three-token-type sequence tagging task and adopted the sentence-level feature concatenating method. For the TC task, we devised a BERT-based system with a dimensionality-reducing FC layer and a linear classifier. The result is two efficient and accurate systems for propaganda detection in news articles, and the final experiments confirm that our task-specific adaptations improve on the pre-trained BERT model.

In the future, we plan to improve Precision, Recall and F1-score further by drawing on the SpanBERT model, which may perform better: in the masking process, we would cover consecutive words rather than scattered words. We also intend to search for a more suitable BERT architecture using the popular Neural Architecture Search (NAS). Besides, we hope to compress our model to some extent, for instance by pruning the classifier layer or by quantizing and sharing model parameters, so that it can be applied more widely and conveniently.

Barron-Cedeno A, Da San Martino G, Jaradat I, Nakov P (2019) Proppy: a system to unmask propaganda in online news. In: Proceedings of the AAAI conference on artificial intelligence, pp 9847–9848

Corney D, Albakour D, Martinez-Alvarez M, Moussa S (2016) What do a million news articles look like? In: NewsIR@ ECIR, pp 42–47

Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. North American chapter of the association for computational linguistics

Fadel A, Tuffaha I, Al-Ayyoub M (2019) Pretrained ensemble learning for fine-grained propaganda detection. In: Proceedings of the second workshop on natural language processing for internet freedom: censorship, disinformation, and propaganda, pp 139–142

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput:1735–1780

Hou W, Chen Y (2019) Caunlp at nlp4if 2019 shared task: context-dependent bert for sentence-level propaganda detection. In: Proceedings of the second workshop on natural language processing for internet freedom: censorship, disinformation, and propaganda, pp 83–86

Huang Y, Wang W, Wang L, Tan T (2013) Multi-task deep neural network for multi-label learning. In: 2013 IEEE International conference on image processing, pp 2897–2900

Khalid A, Khan FA, Imran M, Alharbi M, Khan M, Ahmad A, Jeon G (2019) Reference terms identification of cited articles as topics from citation contexts. Comput Electr Eng 74:569–580


Kobayashi S (2018) Contextual augmentation: data augmentation by words with paradigmatic relations. North American chapter of the association for computational linguistics, pp 452–457

Kurian D, Sattari F, Lefsrud L, Ma Y (2020) Using machine learning and keyword analysis to analyze incidents and reduce risk in oil sands operations. Saf Sci

Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2020) Albert: a lite bert for self-supervised learning of language representations. ICLR

Liu S, Lee K, Lee I (2020) Document-level multi-topic sentiment classification of email data with bilstm and data augmentation. Knowl Based Syst:105918

Mapes N, White A, Medury R, Dua S (2019) Divisive language and propaganda detection using multi-head attention transformers with deep learning bert-based language models for binary classification. In: Proceedings of the second workshop on natural language processing for internet freedom: censorship, disinformation, and propaganda, pp 103–106

Da San Martino G, Yu S, Barron-Cedeno A, Petrov R, Nakov P (2019) Fine-grained analysis of propaganda in news articles. EMNLP/IJCNLP 1:5635–5645


Gupta P, Saxena K, Yaseen U, Runkler T, Schütze H (2019) Neural architectures for fine-grained propaganda detection in news. In: Proceedings of the second workshop on natural language processing for internet freedom: censorship, disinformation, and propaganda

Peters E.M, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer S.L (2018) Deep contextualized word representations. North American chapter of the association for computational linguistics

Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pretraining

Da San Martino G, Barron-Cedeno A, Nakov P (2019) Findings of the nlp4if-2019 shared task on fine-grained propaganda detection. In: Proceedings of the second workshop on natural language processing for internet freedom: censorship, disinformation, and propaganda

Tchiehe N.D, Gauthier F (2017) Classification of risk acceptability and risk tolerability factors in occupational health and safety. Saf Sci:138–147

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez N.A, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems 30 (NIPS 2017), pp 5998–6008

Vlad G.A, Tanase M.A, Onose C, Cercel D.C (2019) Sentence-level propaganda detection in news articles with transfer learning and bert-bilstm-capsule model. In: Proceedings of the second workshop on natural language processing for internet freedom: censorship, disinformation, and propaganda, pp 148–154

Wang A, Singh A, Michael J, Hill F, Levy O, Bowman R.S (2018) Glue: A multi-task benchmark and analysis platform for natural language understanding. In: International conference on learning representations

Wei WJ, Zou K (2019) Eda: easy data augmentation techniques for boosting performance on text classification tasks. EMNLP/IJCNLP 1:6381–6387

Xie Z, Wang I.S, Li J, Lévy D, Nie A, Jurafsky D, Ng Y.A (2017) Data noising as smoothing in neural network language models. ICLR

Yang Z, Dai Z, Yang Y, Carbonell GJ, Salakhutdinov R, Le VQ (2019) Xlnet: generalized autoregressive pretraining for language understanding. In: Advances in neural information processing systems 32 (NIPS 2019), pp 5754–5764

Yoosuf S, Yang Y (2019) Fine-grained propaganda detection with fine-tuned bert. In: Proceedings of the second workshop on natural language processing for internet freedom: censorship, disinformation, and propaganda, pp 87–91

Zhan Z, Hou Z, Yang Q, Zhao J, Zhang Y, Hu C (2020) Knowledge attention sandwich neural network for text classification. Neurocomputing:1–11

Zhang Z, Sabuncu M (2018) Generalized cross entropy loss for training deep neural networks with noisy labels. In: Advances in neural information processing systems, pp 8778–8788


Author information

Authors and Affiliations

University of Electronic Science and Technology of China, Chengdu, Sichuan, China

Wei Li, Shiqian Li, Chenhao Liu, Longfei Lu & Ziyu Shi

University of Technology Sydney, Ultimo, Australia

Shiping Wen


Corresponding author

Correspondence to Wei Li .

Ethics declarations

Conflict of interest.

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Li, W., Li, S., Liu, C. et al. Span identification and technique classification of propaganda in news articles. Complex Intell. Syst. 8 , 3603–3612 (2022). https://doi.org/10.1007/s40747-021-00393-y


Received : 18 March 2021

Accepted : 30 April 2021

Published : 08 May 2021

Issue Date : October 2022

DOI : https://doi.org/10.1007/s40747-021-00393-y


Keywords

  • Neural network
  • Span identification
  • Technique classification

Characterizing networks of propaganda on twitter: a case study

  • Open access
  • Published: 04 September 2020

  • Stefano Guarino 1 ,
  • Noemi Trino 2 ,
  • Alessandro Celestini 1 ,
  • Alessandro Chessa 2 , 3 &
  • Gianni Riotta 2  

Applied Network Science, volume 5, Article number: 59 (2020)


The daily exposure of social media users to propaganda and disinformation campaigns has reinvigorated the need to investigate the local and global patterns of diffusion of different (mis)information content on social media. Echo chambers and influencers are often deemed responsible for both the polarization of users in online social networks and the success of propaganda and disinformation campaigns. This article adopts a data-driven approach to investigate the structuration of communities and propaganda networks on Twitter in order to assess the correctness of these imputations. In particular, the work aims at characterizing networks of propaganda extracted from a Twitter dataset by combining the information gained by three different classification approaches, focused respectively on (i) using Tweets content to infer the “polarization” of users around a specific topic, (ii) identifying users having an active role in the diffusion of different propaganda and disinformation items, and (iii) analyzing social ties to identify topological clusters and users playing a “central” role in the network. The work identifies highly partisan community structures along political alignments; furthermore, centrality metrics proved to be very informative to detect the most active users in the network and to distinguish users playing different roles; finally, the polarization and clustering structure of the retweet graphs provided useful insights about relevant properties of users' exposure, interactions, and participation in different propaganda items.

Introduction

The 2016 US presidential election veritably marked the transition from an age of ‘post-trust’ ( Löfstedt 2005 ), to an era of ‘post-truth’ ( Higgins 2016 ), with contemporary advanced democracies experiencing a rise of anti-scientific thinking and reactionary obscurantism, ranging from online conspiracy theories to the much-discussed “death of expertise” ( Nichols 2017 ). The long-standing debate about the relationship between media and public good has been reinvigorated: the initial euphoria about the “openness” of the Internet ( Lévy 2002 ) has been taken over by a widespread concern that social media may instead be undermining the quality of democracy ( Tucker et al. 2018 ). Media outlets, public officials and activists are supplying citizens with different, often contradictory “alternative facts” ( Allcott and Gentzkow 2017 ). In this context, social media platforms would be fostering “selective exposure to information”, with widespread diffusion of “echo chambers” and “filter bubbles” ( Sunstein 2001 ; Pariser 2011 ). Propaganda actions may be now more effective than ever, representing a major global risk, possibly able to influence public opinion enough to alter election outcomes ( Van der Linden et al. 2017 ; Shao et al. 2018 ; Guess et al. 2019 ).

As a first step towards the disruption of these networks of propaganda, researchers have been trying to model the social mechanisms that make users fall prey to partisan and low-quality information. From a psychological point of view, news consumption is mainly governed by so-called “informational influence”, “social credibility”, “confirmation bias” and “heuristic frequency” ( Shu et al. 2017 ; Del Vicario et al. 2017 ). This means that social media users tend to shape their attitude, belief or behavior based on arguments provided in online group discussions, using popularity as a measure of credibility, privileging information that confirms their own prior beliefs and/or that they hear regularly. These phenomena are exacerbated by the general incapability of making good use of the great amount of available information, a problem which can be modeled relying on the dualism of information overload vs. limited attention ( Qiu et al. 2017 ), or on the principles of information theory and (adversarial) noise decoding ( Brody and Meier 2018 ). However, there is still a lack of evidence in the literature regarding the processes that lead to the structuration of digital ecosystems where polarized and unverified claims are especially likely to propagate virally. Are these a natural consequence of the existence of communities with homogeneous beliefs – i.e., echo chambers – and of the organized actions of “propaganda agents”, or are we missing a piece?

To provide a first answer to this and other related questions, the present paper takes a data-driven approach. Specifically, we aim at demonstrating the importance of characterizing networks of propaganda on Twitter by combining the information gained by three different classification approaches: (i) using the content of tweets to determine users’ “polarization” with respect to a main theme of interest; (ii) telling apart users having an active role in the diffusion of different propaganda and disinformation items related to that theme; (iii) analyzing social ties to identify topological clusters and users playing a “central” role in the network. Our main goal is addressing the following research questions:

Is modularity-based network clustering “stable”, or are the patterns of cohesion among users dependent on the topics of discussion? In other terms, is the exposure/participation to propaganda of a given user a direct consequence of his/her own global interactions with other users?

Can we use centrality metrics for detecting users playing specific roles in the production-diffusion chain of propaganda? If yes, what metrics should we mostly rely on? And are these users “consistently” involved in the diffusion of related yet different propaganda items?

What is the role of polarization in the analysis? How shall we use the available information about the political/social “goal” of a propaganda item to enrich the graph-based analysis of the corresponding network of propaganda?

Our methodology will be applied to a case study concerning the constitutional referendum held on December 4, 2016 in Italy, by means of a dataset composed of over 1.3 million tweets. As a side result, we will provide insights regarding the reasons for the success of specific propaganda items and the existence of “propaganda hubs” and “authorities”, i.e., accounts that are critical in fostering propaganda and spreading disinformation campaigns.

Related work

As reported by a recent Science Policy Forum article ( Lazer et al. 2018 ), stemming the viral diffusion of fake news largely remains an open problem. The body of research work on fake news detection is vast and heterogeneous: linguistics-based techniques ( Markowitz and Hancock 2014 ; Feng et al. 2012 ; Feng and Hirst 2013 ) coexist with network-based techniques ( Ciampaglia et al. 2015 ; Papacharissi and de Fatima Oliveira 2012 ; Karadzhov et al. 2017 ) as well as machine-learning-based approaches ( Castillo et al. 2011 ; Zubiaga et al. 2018 ). Yet, (semi-)automatic debunking seems not an adequate response if considered alone ( Margolin et al. 2018 ; Shin and Thorson 2017 ). Experimental evidence confirms the general perception that, on average, fake news get diffused farther, faster, deeper and more broadly than true news ( Silverman and Singer-Vine 2016 ). Users are more likely to share false and polarized information and to share it rapidly, especially when related to politics ( Vosoughi et al. 2018 ), while the sharing of fact-checking content typically lags that of fake news by at least 10 h ( Shao et al. 2016 ). Furthermore, debunking is often associated to counter-propaganda and disseminated online through politically-oriented outlets, thus reinforcing selective exposure and reducing consumption of counter-attitudinal fact-checks ( Shin and Thorson 2017 ). Besides the technical setbacks, the existence of the so-called “continued influence effect of misinformation” is widely acknowledged among socio-political scholars ( Skurnik et al. 2005 ), thus questioning the intrinsic potential of debunking in contrasting the proliferation of fake news.

In this regard, the efforts deployed by major social media platforms seem insufficient. As of 2017, Twitter – the most widely studied of such platforms – expressed an alarmingly shallow stance towards disinformation, stating that bots are a “positive and vital tool” and that Twitter is by nature “a powerful antidote to the spreading of false information” where “journalists, experts and engaged citizens can correct and challenge public discourse in seconds” ( Crowell 2017 ). In the meanwhile, based on two million retweets produced by hundreds of thousands of accounts in the six months preceding the 2016 US presidential election, researchers were coming to the conclusion that the core of Twitter’s interaction network was nearly fact-checking-free while densely populated with social bots and fake news ( Shao et al. 2018 ).

Characterizing misinformation and propaganda networks on social media thus recently emerged as a primary research trend ( Subrahmanian et al. 2016 ; Shao et al. 2018 ; Bovet and Makse 2019 ). Data collected on social media are paramount for understanding disinformation disorders ( Bovet and Makse 2019 ): they are instrumental to analyze the global and local patterns of diffusion of unreliable news stories ( Allcott and Gentzkow 2017 ) and, to a broader level, to understand the relevance of propaganda on public opinion, possibly incorporating thematic, polarity or sentiment classification ( Vosoughi et al. 2018 ), thus unveiling the structure of social ties and their impact on (dis)information flows ( Bessi and Ferrara 2016 ). Investigating the relation between polarization and information spreading has also been shown to be instrumental for both uncovering the role of disinformation in a country’s political life ( Bovet and Makse 2019 ) and predicting potential targets for hoaxes and fake news ( Vicario et al. 2019 ). Finally, recent work used network-based features as instruments to describe, classify and compare the diffusion networks of different disinformation stories as opposed to “main-stream” news, making a promising step towards text-independent fake news detection ( Pierri et al. 2020 ).

A relevant issue emerging from the literature is quantifying the representativeness of data extracted from real-time social media in general, and more specifically from Twitter, when these data are used to forecast opinion trends and vote shares in elections. In particular, the socio-demographic composition of Twitter users may not be representative of the overall population and may thus manifest different political preferences from non-Twitter users ( Bakker and De Vreese 2011 ; Burckhardt et al. 2016 ). This potential mismatch could be accompanied by a self-selection bias: as some scholars showed ( Ceron et al. 2016 ), the largest number of comments is often produced by the more active and politically mobilized users, while the vast majority of accounts has a limited activity ( Gayo-Avello et al. 2011 ). Nonetheless, the main goal of this paper is making one step forward in the understanding of the role of propaganda in shaping the political debate in Italy. To this end, Twitter is extremely representative: it is in fact the reference social media in Italy to discuss political issues. Investigating to which extent our findings may be extended to the Italian population at large is left to future work.

After the crucial 2013 election, which had imposed an unprecedented tri-polar equilibrium in the Italian political scenario, the 2016 referendum determined the collapse of the entire political scene, with the defeat of the center-left “Democratic Party” and the successive resignation of its leader and head of government, Matteo Renzi, architect of the consultation. The government reform was in fact strongly defeated, with “NO” at 59.12% and “YES” at 40.88%. Offline trends showed how political polarisation and divisions among party leaders fostered the grassroots activism of the YES and NO front committees, reinforcing opposite views regarding the reform. The NO faction was a composite formation supported by both left-wing and right-wing parties, with alternative yet sometimes overlapping political justifications. Subsequently, the 2018 elections sanctioned the major rise of two euro-skeptic and populist formations, “5 Stars Movement” and “The Northern League”, which were the main actors of opposition to the 2016 referendum.

The constitutional referendum offered these rising parties an extraordinary window of opportunity in propaganda building, by imposing carefully selected instrumental news-frames and narratives and using social media as strategic resources for community-building and alternative agenda setting. Social media – and Twitter in particular – have in fact constituted a strategic tool for newly born political parties, which, through the activation of two-way street mediatization, could incorporate their proposals into conventional media while still maintaining a critical, even conspiratorial attitude towards traditional media ( Alonso-Muñoz and Casero-Ripollés 2018 ; Schroeder 2018 ). More generally, the dichotomous structure of the referendum offered both political alignments the chance to align the various issues along a pro/anti-status-quo spectrum. The cleavage was strategically used by both coalitions, which adopted opposite frames to stress their position:

on the one hand, the referendum was framed as a tool of “rottamazione”, the process of political renovation at the center of Renzi’s political agenda;

on the other hand, on the NO front, it was inserted into the broader cleavage between anti-parties and traditional parties, and portrayed as an expression of old interests and privileges.

Data collection

For data collection we relied on Twitter’s Streaming API, scraping tweets containing any combination of the following hashtags: “#ReferendumCostituzionale”, “#IoVotoNO”, “#SIcambia”, “#SIRiforma”, “#Italiachedicesì”, “#Italiachedicesi”, “#bastaunsi”, “#referendum”, “#costituzione”, “#riformacostituzionale”, “#famiglieperilno”, “#bastaunsì”, “#bastaunsi”, “#referendumsociali”. These are a mix of “trending” hashtags, official hashtags of the referendum campaign, and popular hashtags used by the supporters of the two fronts. Data was collected for the six months preceding the referendum, that is, from July 05, 2016, to December 04, 2016, but we only consider the tweets dated from November 01 in this paper in order to focus on the most relevant part of the campaign.
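A hedged sketch of such a collection pipeline, assuming the tweepy 3.x streaming client (the paper only states that Twitter's Streaming API was used; the credentials and file name are placeholders):

```python
import json
import tweepy  # tweepy 3.x-style streaming API

HASHTAGS = ["#ReferendumCostituzionale", "#IoVotoNO", "#SIcambia", "#SIRiforma",
            "#Italiachedicesì", "#Italiachedicesi", "#bastaunsi", "#referendum",
            "#costituzione", "#riformacostituzionale", "#famiglieperilno",
            "#bastaunsì", "#referendumsociali"]

class TweetCollector(tweepy.StreamListener):
    def on_status(self, status):
        # Append every matching tweet as one JSON object per line.
        with open("referendum_tweets.jsonl", "a") as f:
            f.write(json.dumps(status._json) + "\n")

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth, TweetCollector())
stream.filter(track=[h.lstrip("#") for h in HASHTAGS])  # keyword/hashtag tracking
```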

Propaganda items

Following the literature, in order to identify the main topics and themes of disinformation of the political campaigning we relied on the activity of fact-checking and news agencies who reported lists of (dis)information news stories that went viral during the referendum campaign. Mostly based on the work by fact-checking web portal Bufale.net ( Mastinu 2016 ), online newspaper Il Post ( Post 2016 ), and political fact-checking agency Pagella Politica ( Politica 2016 ), we were able to identify twelve main stories, including both general theories and very specific news pieces. To widen the scope of the analysis, we considered news, theories and topics of discussion that could be associated with information disorders in the broader sense. This includes factual (i.e., verifiably true/false) claims as well as stories (e.g., hearsay, rumors and conspiracy theories) that cannot be deemed true/false with certainty, with no distinction between deliberate and organized disinformation/propaganda and unintentionally propagated misinformation.

Unlike related work ( Pierri et al. 2020 ) that used the presence of a specific URL to collect tweets associated with a news story of interest, we set up a custom query in order to search our dataset for tweets that discuss a given topic in a broader sense. For each of the twelve propaganda items considered, we manually selected relevant textual content related to that story – news pieces, tweets, work of debunking agencies – from which we extracted a suitable keyword-based query. An example of such a query is the following (corresponding to what will later be denoted PI2):

(’illegittimo’ OR ’illeggittimo’ OR ’illegal’ OR ’non eletto’) AND (’parlamento’ OR ’governo’ OR ’renzi’ OR ’presidente’)

The query is enriched with synonyms – as in (’illegittimo’ OR ’illeggittimo’ OR ’illegal’ OR ’non eletto’) – that take into account singular/plural forms, different jargon, and, possibly, frequent spelling errors. In the terminology of information retrieval, these synonyms are expected to increase the recall of our filters. On the other hand, to assess the precision of the filters we manually verified a sample of 200 tweets per filter, finding that all of them were somewhat related to the corresponding propaganda item. The size of this sample, albeit limited, must be commensurate with the total number of tweets matching each filter, which is in the order of a few thousands. It is worth noticing that we do not aim at perfect accuracy; rather, as with any query-based filter, the goal was collecting a sufficiently large and significant sample of tweets for each propaganda item.
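A simple sketch of how such a boolean keyword query can be applied to tweet texts; the function and field names are illustrative, not taken from the authors' pipeline:

```python
def matches_pi2(text: str) -> bool:
    """Mirror the PI2 query: (illegittimo OR illeggittimo OR illegal OR 'non eletto')
    AND (parlamento OR governo OR renzi OR presidente)."""
    t = text.lower()
    first = any(k in t for k in ("illegittimo", "illeggittimo", "illegal", "non eletto"))
    second = any(k in t for k in ("parlamento", "governo", "renzi", "presidente"))
    return first and second

print(matches_pi2("Il parlamento non eletto vuole cambiare la Costituzione"))  # True
print(matches_pi2("Domani si vota per il referendum costituzionale"))          # False
```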

In a previous work we classified these stories into four categories ( Guarino et al. 2019 ), by distinguishing entirely fabricated content from manipulated items and broader propaganda pieces. Here, we decided to focus on the four most shared Propaganda Items (PI), namely:

  • PI1: a news piece about alleged vote rigging organized by government forces;
  • PI2: an item framing the referendum as the political product of an illegitimately elected parliament and/or government;
  • PI3: a news piece claiming that a victory of the YES would make Italy yield national sovereignty to EU institutions (especially referring to a hidden clause in art. 117);
  • PI4: a more general piece supporting the claim that a victory of the YES would have caused a shift towards authoritarianism.

All the most diffused news items can be broadly located along the spectrum of arguments typical of conspiracy theories, traditionally driven by the belief that a powerful group of people is manipulating the public while concealing their activities. As some scholars have demonstrated ( Castanho Silva et al. 2017 ), conspirationism is associated with different sub-dimensions of populist attitudes (people-centrism, anti-elitism, and a good-versus-evil view of politics), with alleged coup d’état attempts and secret plots organized by political élites to gain further power or consolidate their privileges, and with explicit plots to undermine the integrity of the electoral process by gaining unauthorized access to voting machines and altering voting results.

Classification of tweets and users

After having identified the most relevant news pieces in our dataset, we aimed at gaining a better understanding of the users in our dataset and of the relation between polarization and disinformation. To classify the stance of each tweet with respect to the referendum question, we adopted a semi-automatic self-training process, described in more detail in ( Guarino et al. 2019 ). The underlying idea is that political exchanges in social-media platforms exhibit “a highly partisan community structure” with “homogeneous clusters of users who tend to share the same political identity” ( Conover et al. 2011 ). This is reflected on Twitter by the usage of different patterns of hashtags by supporters of opposite factions ( Becatti et al. 2019 ). We therefore built a hashtag graph, selecting the top 30 hashtags by weighted degree (i.e., with the greatest number of co-occurrences with other hashtags). Among them, we identified a set of generic and/or out-of-context hashtags that could have been detrimental to identifying clear and meaningful clusters, namely: “#referendum”, “#referendumcostituzionale”, “#photo”, “#riformacostituzionale”, “#costituzione”, “#4dicembre”, “#trendingtopic” and “#1w1l”. Pruning these hashtags indeed increased the modularity of the clustering. The rationale was to mimic the removal of stopwords or very frequent words in order to improve the quality of topic modeling. Louvain’s algorithm was then applied to cluster such hashtags based on their mutual co-occurrence patterns. We found that the two largest clusters clearly identify the YES and NO fronts, so we used the hashtags in these clusters to extract a training set composed of tweets labelled as follows: −1 (NO) if the tweet only contains hashtags from the NO cluster; +1 (YES) if the tweet only contains hashtags from the YES cluster; 0 (UNK) if the tweet contains a mix of hashtags from the two clusters.
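A hedged sketch of this hashtag-clustering step using networkx and the python-louvain package; the tweet data structure (a dict with a "hashtags" field) is an assumption, and the exact ordering of the top-30 selection and pruning steps is approximate:

```python
import itertools
import networkx as nx
import community as community_louvain   # python-louvain package

GENERIC = {"#referendum", "#referendumcostituzionale", "#photo",
           "#riformacostituzionale", "#costituzione", "#4dicembre",
           "#trendingtopic", "#1w1l"}

def hashtag_clusters(tweets, top_k=30):
    """Build a hashtag co-occurrence graph, prune generic tags, cluster with Louvain."""
    g = nx.Graph()
    for tw in tweets:
        tags = sorted({h.lower() for h in tw["hashtags"]})
        for a, b in itertools.combinations(tags, 2):
            prev = g.get_edge_data(a, b, {"weight": 0})["weight"]
            g.add_edge(a, b, weight=prev + 1)            # co-occurrence count
    ranked = sorted(g.degree(weight="weight"), key=lambda x: -x[1])
    keep = [h for h, _ in ranked if h not in GENERIC][:top_k]
    # The partition maps each hashtag to a cluster id; the two largest clusters
    # are expected to correspond to the YES and NO fronts.
    return community_louvain.best_partition(g.subgraph(keep), weight="weight")
```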

To extend the labeling to all tweets in the dataset, we defined a text-based classifier. The classifier may be tuned to represent tweets using tf-idf vectors, doc2vec ( Le and Mikolov 2014 ), or a combination of both, and to use either Logistic Regression or a Gradient Boosting Classifier. We tested every possible combination and selected the overall best performing one, namely a Gradient Boosting Classifier using doc2vec feature vectors. As classification score we used the mean accuracy on 10K tweets of test data and corresponding labels, with 10-fold cross-validation. Significantly, the obtained accuracy was very high (above 90%), which is why we did not investigate more advanced and recent classification methods. Our explanation for this excellent accuracy is that the dichotomic nature of the referendum fostered the emergence of sets of highly partisan hashtags, rarely used in a mix. Although the classifier uses the whole text of the tweets, it takes advantage of the presence of such hashtags to obtain remarkable performance. Unfortunately, we cannot guarantee equal accuracy of our classifier on other datasets – defining a high-quality and general purpose classifier is well beyond the scope of this paper.
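A minimal sketch of the selected configuration (doc2vec features with a Gradient Boosting Classifier and 10-fold cross-validation), using gensim and scikit-learn; the vector size, epochs and tokenization are illustrative choices, not the authors' settings:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def train_stance_classifier(texts, labels):
    """Embed tweets with doc2vec, then classify YES/NO stance with gradient boosting."""
    docs = [TaggedDocument(words=t.lower().split(), tags=[i])
            for i, t in enumerate(texts)]
    d2v = Doc2Vec(docs, vector_size=100, min_count=2, epochs=20)
    X = [d2v.infer_vector(t.lower().split()) for t in texts]
    clf = GradientBoostingClassifier()
    scores = cross_val_score(clf, X, labels, cv=10)    # 10-fold cross-validation
    clf.fit(X, labels)
    return clf, d2v, scores.mean()
```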

On the whole, UNK tweets were substantially negligible – although this may be due to limitations of the classifier ( Guarino et al. 2019 ) – while NO tweets were almost 1.5x more frequent than YES tweets, supporting the diffused belief that the NO front was significantly more active than its counterpart in the social debate. Significantly, we also obtained a continuous score in [-1,1] for users, since a user can be classified with the average score of his/her tweets. These user-level scores are used in the following sections for correlating polarization with other network properties of our corpus of users.

For the sake of clarity and completeness, the hashtag graph and its cluster-graph – wherein each cluster is contracted into a single node – are shown in Fig.  1 . We see that: (i) hashtags used by the NO and YES supporters are strongly clustered; (ii) “neutral” hashtags (such as those used by international reporters) also cluster together; (iii) a few hashtags are surprisingly high-ranked, such as “#ottoemezzo”, a popular political talk-show being central in the NO cluster – thus confirming regular patterns of behavior in the “second-screen” use of social network sites to comment on television programs ( Trilling 2015 ). In particular, the two largest clusters of hashtags clearly characterize the two sides: the YES cluster is dominated by the hashtags “#bastaunsì” (“a yes is enough”) and “#iovotosi” (“I vote yes”), whereas the NO cluster by “#iovotono” (“I vote no”), “#iodicono” (“I say no”) and “#renziacasa” (“Renzi go home”). In this perspective, the jargon of both communities shows clear segregation and high levels of clustering along political alignments, as expected.

figure 1

The hashtag graph and the associated cluster graph

Polarized retweet graphs

The main objects of analysis of this paper are a set of interaction networks extracted from a Twitter dataset of more than 1.3 million tweets. Each of these networks is formally represented as a graph G = (V, E), whose vertex set V models a corpus of social media users. Specifically, as often done in the literature [49, 34], we consider directed and weighted retweet graphs, wherein nodes are Twitter users and an edge e = (u, v) means that user u retweeted user v at least once in the considered corpus of tweets. In our graphs, edges are weighted by a parameter w_e equal to the number of retweets between a given pair of users. Nodes are instead endowed with a “polarization” attribute p_u ∈ [−1,1] – defined in the previous section – equal to the average polarization of the tweets and retweets of that user (a construction sketch is given after the list below). Specifically, in this paper we consider the following six graphs:

The whole retweet graph, obtained from the entire dataset. Footnote 2

The P/D (Propaganda/Disinformation) retweet graph, obtained from the set of all tweets that matched any of the queries defined in the “Data collection” section, i.e., tweets related to any of the 12 news stories.

The PI1, PI2, PI3 and PI4 retweet graphs, induced by the set of tweets that satisfied each of the four selected propaganda items, taken individually.
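
The sketch below is an illustration of the graph definition above, not the authors' implementation; it assumes that the retweet pairs and per-tweet polarization scores have already been extracted from the dataset.

```python
# Sketch: build a directed, weighted retweet graph with per-node polarization attributes.
import networkx as nx
from collections import defaultdict

def build_retweet_graph(retweet_pairs, tweet_scores):
    """retweet_pairs: iterable of (retweeter, retweeted_user) pairs;
    tweet_scores: dict user -> list of polarization scores in [-1, 1] for that user's tweets."""
    weights = defaultdict(int)
    for u, v in retweet_pairs:
        weights[(u, v)] += 1                      # w_e = number of retweets from u to v
    G = nx.DiGraph()
    for (u, v), w in weights.items():
        G.add_edge(u, v, weight=w)
    for u in G.nodes:                             # p_u = average polarization of the user's tweets
        scores = tweet_scores.get(u, [])
        G.nodes[u]["polarization"] = sum(scores) / len(scores) if scores else 0.0
    return G
```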

The subgraph of the whole graph composed of the 1000 vertices with the greatest PageRank is shown in Fig. 2. We can clearly see a few features of the graph that will be discussed in more detail in the following: a general prevalence of NO edges (i.e., tweets), multiple NO-leaning clusters and a single main YES-leaning cluster.

figure 2

The top 1000 vertices of the whole graph by pagerank. Vertex size is by pagerank, vertex and edge color is by polarization (YES=red, NO=blue), the 20 top users are annotated

A first relevant perspective on our dataset is obtained by considering how the vertex set of the P/D graph may be decomposed based on which individual PI graphs its users belong to:

67.61% of all users in the P/D graph (i.e., 3666 users) are only involved in one of PI1, PI2, PI3, PI4;

16.17% of all users in the P/D graph (i.e., 877 users) are involved in two of PI1, PI2, PI3, PI4;

6.79% of all users in the P/D graph (i.e., 368 users) are involved in three of PI1, PI2, PI3, PI4;

only 1.44% of all users in the P/D graph (i.e., 78 users) are involved in all four of PI1, PI2, PI3, PI4;

7.99% of all users in the P/D graph (i.e., 433 users) are only involved in propaganda items other than PI1, PI2, PI3 and PI4.

Summing up, the four propaganda items that we selected involve approximately 92% of all users of the P/D graph, with users only involved in other propaganda/fake news stories adding up to just 7.99%. We can thus safely focus on these four items without a significant loss of generality in our results. At the same time, the fact that most users of the P/D graph were only involved in a single PI, and that only a negligible fraction was involved in all four PIs, warns us of the pitfalls of considering disinformation as a whole.
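
A simple counting sketch of this decomposition, assuming the node sets of the P/D graph and of the four PI graphs are available, could look as follows (illustrative, not the authors' code):

```python
# Sketch: for each P/D user, count in how many of the four selected PI graphs he/she appears.
# A count of 0 corresponds to users only involved in other propaganda items.
from collections import Counter

def pi_membership_breakdown(pd_users, pi_node_sets):
    """pd_users: set of P/D-graph users; pi_node_sets: list of node sets for PI1..PI4."""
    counts = Counter(sum(u in nodes for nodes in pi_node_sets) for u in pd_users)
    total = len(pd_users)
    return {k: (v, round(100 * v / total, 2)) for k, v in sorted(counts.items())}
```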

A second aspect to consider is the distribution of the polarization attribute p_u across the six graphs. For each of the six considered graphs, Fig. 3 shows the histogram and a kernel density estimate obtained considering the value of p_u for all users of the graph. Recall that p_u ∈ [−1,+1] expresses the stance of user u with respect to the referendum in the range [NO, YES]. Overall, users appear to be strongly polarized, with two huge spikes at −1 and +1 for the whole graph. When we switch to the networks of propaganda, however, users seem to be generally less polarized. This apparently counter-intuitive phenomenon is a consequence of our scoring method and of the much higher average activity of users involved in these networks. Indeed, a user’s polarization is well defined when that user has a single tweet and gets blurrier as the number of tweets increases, because many tweets contribute to the score and not all of them are necessarily equally polarized. The average number of tweets per user in our propaganda networks is 8 to 14 times greater than the average computed over the whole graph. At the same time, users with a single tweet make up 37% of the whole graph, but just 1% to 5% of the P/D and PI graphs.
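
A plotting sketch of these per-graph distributions (histogram plus kernel density estimate), assuming graphs built as above and seaborn/matplotlib as the plotting stack (our assumption about tooling), might look like this:

```python
# Sketch: histogram + KDE of the node polarization attribute for one graph, as in Fig. 3.
import matplotlib.pyplot as plt
import seaborn as sns

def plot_polarization_distribution(G, title):
    values = [G.nodes[u]["polarization"] for u in G.nodes]
    sns.histplot(values, bins=40, kde=True)          # histogram with kernel density estimate
    plt.xlabel("user polarization p_u")
    plt.title(title)
    plt.show()
```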

figure 3

Polarization of users involved in propaganda

The distributions of PI1, PI2 and PI3 follow the overall trend of the P/D graph, that is, a general prevalence of NO users over YES users. Since all four selected items, as well as most of the 12 items, are pro-NO, this may be interpreted as a prevalence of propaganda over counter-propaganda. In that sense, PI4 is the exception: a clear example of a topic mostly used by one side (the YES coalition) to accuse the other of using deceptive propaganda. This is a first indication of the importance of accounting for polarization when characterizing these propaganda networks and their users.

Clustering structure

The clustering structure of a retweet graph highlights relevant properties of how users and groups of users interact with each other, and of how easily information flows through the graph. Along this line, recent work provided clear evidence that modularity-based clustering applied to retweet graphs brings to light communities of users with strong homophily/affiliation, within which propaganda and polarized information spread especially well (Aragón et al. 2013; Becatti et al. 2019). By characterizing and comparing the clustering structure obtained for our six graphs through the well-known Louvain algorithm, we expect to better understand the emergence of networks and sub-networks of propaganda and to measure their persistence. To start, in Fig. 4 we show the size distribution of communities for the P/D and PI retweet graphs: we rank the communities of each graph based on their size and plot the size of each community on a log scale. At a high level, we see that the distributions of all PI graphs are somewhat similar – especially for PI1 and PI3 – and that in all cases only a few clusters have a relevant size.
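
A sketch of this clustering step, using the Louvain implementation available in recent versions of networkx (an assumption about tooling; the original analysis may have used a different implementation), is shown below.

```python
# Sketch: Louvain communities on the undirected view of a retweet graph, ranked by size.
import networkx as nx

def louvain_cluster_sizes(G, seed=0):
    undirected = G.to_undirected()                   # Louvain operates on undirected graphs
    communities = nx.community.louvain_communities(undirected, weight="weight", seed=seed)
    communities = sorted(communities, key=len, reverse=True)
    return communities, [len(c) for c in communities]
```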

figure 4

Comparison of the clustering structure (cluster size distribution) for different propaganda graphs

We now assess whether modularity-based clustering detects communities of users with a clear attitude towards the referendum. To obtain a single polarization score for a given cluster c, we computed the number of YES users in c, denoted Y_c, and the number of NO users in c, denoted N_c, and defined \(p_c = \frac{Y_c - N_c}{Y_c + N_c}\). This definition guarantees that p_c = +1 if N_c = 0, p_c = −1 if Y_c = 0 and p_c = 0 if Y_c = N_c. Compared with simply taking the average polarization of the users in c, this measure is more robust with respect to classification accuracy – under the assumption that telling apart YES and NO users is easier than measuring the exact polarization of each user. In Fig. 5 we consider the 10 largest clusters of each graph ranked by size and, for each such cluster, we plot the polarization score p_c. The marker size is proportional to the cluster size, whereas the marker color is also descriptive of the polarization, in a range from blue (NO) to red (YES). We can clearly see that the clusters of the networks of propaganda are generally and significantly more polarized than the clusters of the whole graph. We also see that the overall prevalence of NO users in the P/D, PI1, PI2 and PI3 graphs that already emerged in Fig. 3 is reflected in a greater number of NO clusters – the same happening in PI4 for the YES front.
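
The cluster score defined above can be computed directly from the node polarization attribute; the sketch below assumes a user counts as YES when p_u > 0 and as NO when p_u < 0 (the exact thresholding is our assumption).

```python
# Sketch: cluster polarization p_c = (Y_c - N_c) / (Y_c + N_c).
def cluster_polarization(G, cluster):
    yes = sum(1 for u in cluster if G.nodes[u]["polarization"] > 0)   # Y_c
    no = sum(1 for u in cluster if G.nodes[u]["polarization"] < 0)    # N_c
    return (yes - no) / (yes + no) if (yes + no) else 0.0
```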

figure 5

The polarization of the top 10 clusters for each of the six graphs. The marker size is proportional to the cluster size. The polarization is also visible from the marker color

The main clusters of the whole graph deserve special attention. As already observed in the literature ( Becatti et al. 2019 ), in fact, they quite clearly reflect political affiliation:

Cluster 5 (≈16K members) appears to group together members and supporters of the “Democratic Party”, including Government members (such as PM Matteo Renzi and the Minister of Reforms Maria Elena Boschi), the official YES Committee and Renzi’s foundation ‘Leopolda’, among the others.

Cluster 1 (≈11K members) is expressive of the “5 Star Movement” community. Only two of the most active users (Minister Danilo Toninelli and Senator Elio Lannutti) are official party members, however, whereas the most influential actors belong to the militant base.

Cluster 0 (≈7.5K members) groups the members of the souverainist right, including the two politicians Matteo Salvini and Giorgia Meloni, their political parties, and a number of supporters.

Cluster 2 (≈3.5K members) clearly involves the “Forza Italia” members and advocates.

In this context, three large and barely polarized clusters come to light. On the one hand, cluster 3 (≈10K members) seems to validate the claim that “structure segregation and opinion polarization share no apparent causal relationship” (Prasetya and Murata 2020). It includes left-wing opponents to the referendum as well as several media accounts and has very low polarization (−0.04), probable evidence of the willingness of the left-wing members of the NO alignment to maintain a cross-partisan interaction with the democrats. On the other hand, clusters 11 and 6 (≈6K and ≈4K members, respectively) completely escape the party-affiliation logic. Apart from @europeelects, which produces poll aggregation and election analysis in the European Union, we only found evidence of accounts belonging to international militants of the souverainist and anti-globalization movement: they are Brexit supporters, Italian pro-Trump advocates, or journalists covering such topics in their reporting activities.

We now aim to assess to what extent the obtained clusters are influenced by the choice of a specific PI, that is, whether the patterns of cohesion among different users are coherent across different topics of discussion. In Fig. 6 we use the Adjusted Mutual Information (AMI) to compare the clusters that emerged in different graphs. Specifically, for each graph we draw a polyline showing the AMI between that graph’s clustering and all other graphs’ clusterings. It is worth recalling that the AMI of two partitions is 1 if the two partitions are identical, it is 0 if the mutual information of the two partitions equals the expected mutual information of two random partitions, Footnote 3 and it is negative if the mutual information of the two partitions is worse than the expected one. Of course, when comparing the partitions obtained for any two graphs, we only consider the users that are common to both graphs. In addition, in Fig. 7 we provide a more fine-grained analysis of the 10 largest communities of each PI graph, showing how users of these clusters distribute over the 30 largest communities of the whole and P/D graphs. Precisely, in each heatmap the cell at the intersection of row i and column j measures the proportion of users of cluster i in the considered PI graph that lie in cluster j of the compared graph.
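
A sketch of the pairwise AMI computation, restricted to the users common to the two graphs and using scikit-learn's implementation (an assumption about tooling), is given below.

```python
# Sketch: AMI between two clusterings, computed only on the users present in both graphs.
from sklearn.metrics import adjusted_mutual_info_score

def pairwise_ami(clustering_a, clustering_b):
    """clustering_a, clustering_b: dict user -> community id, one per graph."""
    common = sorted(set(clustering_a) & set(clustering_b))
    labels_a = [clustering_a[u] for u in common]
    labels_b = [clustering_b[u] for u in common]
    return adjusted_mutual_info_score(labels_a, labels_b)
```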

figure 6

Pairwise Adjusted Mutual Information of graphs’ clusterings

figure 7

Cluster-to-cluster intersection size: top 10 clusters of each PI graph vs. top 30 clusters of the whole retweet graph and the P/D graph clusters. The intersection size is normalized by the considered PI cluster size (i.e., each row is individually normalized and sums up to (almost) 1)

The two figures together provide clear evidence that users are clustered in a rather unstable way, especially when we compare networks generated by individual PIs with the whole retweet graph and with each other. The topological organization of the NO front clearly reflects the different ideological affiliations of NO supporters, but these differences are not clearly visible in the participation in clusters of the PI1, PI2 and PI3 networks. Assuming that selective exposure and social validation are core mechanisms driving polarization (Prasetya and Murata 2020), two main interpretations are possible: either (i) being generally NO-leaning is enough to trigger the exposure to these three PIs, with the actual political community a user belongs to playing a marginal role; or (ii) the interactions occurring globally on Twitter – and, as such, global information flows – are only partially responsible for the tendency of users to diffuse propaganda and disinformation items. In PI4, on the other hand, most clusters are de facto sub-clusters of a macro-community of the whole retweet graph. This is easily explained by the different polarization that emerged in Fig. 5: the macro-community is cluster 5, which we already identified as the “YES community”. The YES cluster seems to be driven both by an effort of community building and by the attempt to de-legitimate the NO front by debunking its news claims and propaganda items. As a consequence, YES users stay together in the P/D and in the other PI graphs, while they split into sub-communities in PI4, showing a stronger degree of internal homogeneity and highlighting a polarized conversational archetype, with partisan actors and a segregated community structure and discussion.

Users’ centrality

In this section we study the role played by users in the four propaganda items selected and discussed in the previous sections. To assess the activity of each user we compute the following centrality measures on the retweet graphs: PageRank, In-Degree, Out-Degree, Authority Score and Hub Score (Kleinberg 1999). These centrality measures are commonly used in network analysis, and their interpretation depends on the phenomenon modeled by the network. In our graphs, the In-Degree tells us which users are retweeted most often, i.e., the users creating content that spreads across the network. The PageRank tells us which users are most likely to be “visited”, i.e., the users whose content is most probably read if the retweet graph is used to surf the network. The Out-Degree tells us which users retweet most often, i.e., the users playing a major role in information diffusion. Finally, the Hub Score and the Authority Score are interconnected: the former identifies the users that most often retweet content created by an authority – we call these users hubs – while the latter identifies the users creating the main content about a discussion topic – we call these users authorities. The main difference between the Hub Score and the Out-Degree lies in the content of a user’s retweets: in the former case the user retweets authoritative content, in the latter the user shows no preference about the tweet’s origin. Similarly, the main difference between the Authority Score and the In-Degree lies in the type of users that usually retweet the content produced by a given user: in the former case the users retweeting that content are hubs, in the latter there is no distinction among users. Thus, a user has a high Authority Score if she has a high in-degree and the users retweeting her content are hubs. Tables 1, 2, 3, 4 and 5 show, for each propaganda item, the top 10 users according to each metric. The color of each cell denotes the polarization of the user: blue for NO supporters and red for YES supporters. The intensity of the color shows how strong the polarization is, i.e., the darker the color, the more polarized the user.
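
The five measures can be computed with standard graph libraries; the following networkx sketch (our illustration, not the authors' code) assumes the weighted retweet graphs built as above.

```python
# Sketch: the five centrality measures on a weighted retweet graph.
# Here an edge u -> v means "u retweeted v", so in-degree reflects being retweeted
# and out-degree reflects retweeting others.
import networkx as nx

def centrality_measures(G):
    hubs, authorities = nx.hits(G, max_iter=1000)     # Hub and Authority scores (Kleinberg 1999)
    return {
        "pagerank": nx.pagerank(G, weight="weight"),
        "in_degree": dict(G.in_degree(weight="weight")),
        "out_degree": dict(G.out_degree(weight="weight")),
        "hub": hubs,
        "authority": authorities,
    }
```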

Our results show a few relevant aspects. First, if we look at the whole retweet network, the most active users are almost all NO supporters, as shown in Table 1, although the number of NO and YES users is quite balanced in the network – as shown in Fig. 3. Indeed, we find only a few accounts belonging to YES supporters in the top positions of the five metrics. Additionally, by analyzing individual propaganda items we observe that the set of most active users, their ranking and their polarization change depending on the considered PI and metric. In PI1, PI2 and PI3 the most active accounts are NO supporters, while in PI4 the YES supporters are the most active and numerous, in accordance with Fig. 3 and despite PI4 also being a pro-NO item. The analysis of users’ polarization is thus essential to understand the role played by the main actors inside each network: if we only considered the centrality of the users, without looking at their polarization, we would not be able to distinguish between accounts that contribute to the diffusion of fake news and accounts that work against it, i.e., the debunkers. We also see, again in accordance with the analysis presented in the “Clustering structure” section, that different PIs see different users take on different roles. Well-known public figures – such as “mattosalvinimi”, “giorgiameloni” and “matteorenzi” – are a minority with respect to grassroots activists, and users playing a central role in a specific network of propaganda and/or with respect to a specific metric are absent or not as relevant in other cases – such as “cinmir89” or “proudman811”.

To further investigate the persistence of these rankings across different networks of propaganda, in Fig. 8 we present a set of correlation matrices that broadly corroborate the previous findings. Specifically, for each centrality measure we report the pairwise correlation between the rankings produced by that measure on different graphs, in order to better understand the role of the users that were active in more than one propaganda item. We rely on Spearman’s rank correlation coefficient, rather than the widely used Pearson’s, because we are neither especially interested in verifying linear dependence, nor do we expect to find it. We are more interested in the possible monotonic relationship between centrality measures, which is what Spearman’s correlation captures.
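
A sketch of this comparison for a single centrality measure across two graphs, restricted (as an assumption) to the users present in both rankings and using scipy, is shown below.

```python
# Sketch: Spearman rank correlation between the same centrality measure on two graphs.
from scipy.stats import spearmanr

def rank_correlation(scores_a, scores_b):
    """scores_a, scores_b: dict user -> centrality score, one per graph."""
    common = sorted(set(scores_a) & set(scores_b))
    rho, _pvalue = spearmanr([scores_a[u] for u in common], [scores_b[u] for u in common])
    return rho
```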

figure 8

Spearman’s rank correlation coefficients

As already observed, combining the centrality and polarization data, we notice that in the PI4 network the community of users that is active and spreading information differs from that of the other PI networks. This behaviour is clearly visible from the Hub Score matrix (Fig. 8d) and partially from the Out-Degree matrix (Fig. 8b). In the former, the anti-correlation in row PI4 shows the existence in PI4 of a community of spreaders different from those of the other PIs – a community composed of users that are absent, less active or play a different role in the other networks. In the Out-Degree matrix we have almost no correlation in row PI4 except for PI2. This difference is due to the presence of a small but non-negligible community of YES supporters in PI2, as shown in Fig. 5. The PageRank and In-Degree matrices, instead, show that the relevance of the accounts creating content is more stable than that of the accounts diffusing information. Finally, the Authority Score matrix shows that, although the content-creator accounts are stable, their role changes from network to network: the same account is considered more authoritative in one network and less so in another.

What happens in the other networks can be better understood by looking at Fig. 8f–i, where we computed the correlations separately for NO and YES supporters for the Authority and Hub Scores, which overall appear to be the most informative. To better focus our analysis we excluded the PI4 row. Our results show that among YES supporters the content-creator accounts are not stable and their role changes depending on the selected propaganda item. On the other hand, the role of the accounts spreading information is more stable, meaning that different networks have different authorities, but the hubs are the same. As for the NO supporters, the relevance of both authorities and hubs changes depending on the propaganda network. There is thus probably a more efficient synergy between authority and hub accounts among NO supporters.

Conclusions

This paper aimed at providing new insights into the dynamics of propaganda networks on Twitter. The results of our study are partly in line with existing research. Modularity-based clustering, applied to retweet graphs, pictured a wide panorama of communities of users with strong homophily/affiliation and polarized positions. As expected, the clusters of propaganda networks were generally and significantly more polarized than the clusters of the whole graph, and the topological organization proved to be highly representative of the ideological affiliation of users. The comparison between clusters in different graphs reveals that users’ clusters are rather dynamic, particularly when comparing networks generated by individual propaganda items with the whole retweet graph and with each other. It seems that global clusters, often associated with information exposure, are only partially responsible for the tendency of users to diffuse propaganda and disinformation items. When it comes to taking a position on a controversial topic, users tend to group with different people from those they usually connect to in the whole graph, and the “high-level” polarization of a user – such as the NO vs. YES leaning in our case – may have a more prominent role than his/her political affiliation. This is especially visible for users involved in propaganda – as opposed to counter-propaganda.

The combined analysis of cluster-to-cluster intersections and centrality metrics additionally indicates how different propaganda items are associated with different users with authoritative roles. The correlation of centrality metrics across different networks provides further insights: (i) the Authority and Hub Scores seem to be the most informative metrics for studying networks of propaganda, thanks to their ability to tell apart content creators and spreaders; (ii) the role of content creator is taken by different users for different propaganda items, independently of cluster polarization; (iii) spreaders are instead generally more “consistent”. Overall, the propaganda community depicted in this study, far from being monolithic, has a considerable degree of internal variability in terms of central actors, topics and opinion polarization. Polarization with respect to a main theme, transversal to the considered propaganda items, emerged as a fundamental parameter governing user behavior. A side result of the present paper is the identification of a few expedients and precautions to be used in practice. For instance, we showed that the authority and hub scores unveil different players of a propaganda network, and that real-time detection of propaganda and disinformation campaigns must be built on top of a reliable polarization measure. To this end, it must be kept in mind that users’ polarization (on a specific issue) and political partisanship do not always coincide: we showed that the topic of debate may significantly alter the community structure of an interaction network, and thus the perceived affiliation of its users. Further directions of research could involve other clustering algorithms as well as dynamic influence metrics, in order to gain deeper knowledge of the relationship between exposure to propaganda and the general structure of user interactions.

Another issue we explicitly chose not to cover is the set of determinants of user centrality in a debate (why or how did a user gain a central role?), nor did we attempt to detect coordinated bot attacks that may have boosted the centrality of a Twitter profile. We focus instead on the perceived centrality of a user, regardless of what caused it, to show that: (i) centrality itself is of limited use if not accompanied by a polarization analysis, e.g., to distinguish propaganda from counter-propaganda/debunking; (ii) using different metrics makes it possible to detect different roles in the network, and such roles vary from one disinformation item to another. That said, the analysis highlighted evidence of a major coordination effort in the NO front, which is where the considered propaganda and disinformation items were more prevalent. Understanding whether this coordination was supported by bots is left to future work.

Availability of data and materials

Part of the code used in this paper will be included in the network analysis toolbox DisInfoNet, currently under development by the partners of the Project “SOMA” at https://gitlab.com/s.guarino/disinfonet . DisInfoNet is presented in a previous conference paper ( Guarino et al. 2019 ) and will be released by the end of the SOMA Project. The entire dataset used during the current study is not publicly available due to Twitter’s policies. The ids of the tweets are available from the corresponding author on reasonable request.

Footnote 1: Pagella Politica is a partner of the EU H2020 SOMA Project.

Footnote 2: Precisely, we only consider the giant weakly connected component of this graph, which contains 92.55% of all vertices and 99.13% of all edges of the complete retweet graph.

Footnote 3: Here, the meaning of “random” depends on the choice of a distribution over the set of all possible partitions (Vinh et al. 2009).

Abbreviations

AMI: Adjusted Mutual Information

P/D: Propaganda/Disinformation

PI: Propaganda Item

PM: Prime Minister

Allcott, H, Gentzkow M (2017) Social media and fake news in the 2016 election. J Econ Perspect 31(2):211–36.


Alonso-Muñoz, L, Casero-Ripollés A (2018) Communication of european populist leaders on twitter: Agenda setting and the ’more is less’ effect. El profesional de la información 27(6):1193–02.

Aragón, P, Kappler KE, Kaltenbrunner A, Laniado D, Volkovich Y (2013) Communication dynamics in twitter during political campaigns: The case of the 2011 spanish national election. Policy Internet 5(2):183–206.

Bakker, TP, De Vreese CH (2011) Good news for the future? young people, internet use, and political participation. Commun Res 38(4):451–470.

Becatti, C, Caldarelli G, Lambiotte R, Saracco F (2019) Extracting significant signal of news consumption from social networks: the case of twitter in italian political elections. Palgrave Commun 5(1):1–16.

Bessi, A, Ferrara E (2016) Social bots distort the 2016 us presidential election online discussion. First Monday 21(11–7).

Bovet, A, Makse HA (2019) Influence of fake news in twitter during the 2016 us presidential election. Nat Commun 10(1):7.

Brody, DC, Meier DM (2018) How to model fake news. arXiv preprint arXiv:1809.00964.

Burckhardt, P, Duch R, Matsuo A (2016) Tweet as a tool for election forecast: UK 2015. General election as an example. [online]. http://asiapolmeth.princeton.edu/sites/default/files/polmeth/files/uk_election_tweets_asia_polmeth.pdf .

Castanho Silva, B, Vegetti F, Littvay L (2017) The elite is up to something: Exploring the relation between populism and belief in conspiracy theories. Swiss Polit Sci Rev 23(4):423–443.

Castillo, C, Mendoza M, Poblete B (2011) Information credibility on twitter In: Proceedings of the 20th International Conference on World Wide Web, 675–684.. ACM, New York.

Ceron, A, Curini L, Iacus SM (2016) Politics and big data: nowcasting and forecasting elections with social media. Taylor & Francis.

Ciampaglia, GL, Shiralkar P, Rocha LM, Bollen J, Menczer F, Flammini A (2015) Computational fact checking from knowledge networks. PloS ONE 10(6):0128193.

Conover, M, Ratkiewicz J, Francisco MR, Gonçalves B, Menczer F, Flammini A (2011) Political polarization on twitter. Icwsm 133:89–96.

Crowell, C (2017) Our approach to bots & misinformation. Twitter public policy.

Del Vicario, M, Scala A, Caldarelli G, Stanley HE, Quattrociocchi W (2017) Modeling confirmation bias and polarization. Sci Rep 7:40391.

Feng, S, Banerjee R, Choi Y (2012) Syntactic stylometry for deception detection In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2, 171–175.. Association for Computational Linguistics, Jeju Island.

Feng, VW, Hirst G (2013) Detecting deceptive opinions with profile compatibility In: Proceedings of the Sixth International Joint Conference on Natural Language Processing, 338–346.. Asian Federation of Natural Language Processing, Nagoya.

Gayo-Avello, D, Metaxas PT, Mustafaraj E (2011) Limits of electoral predictions using twitter In: Fifth International AAAI Conference on Weblogs and Social Media.. The AAAI Press, Menlo Park, California.

Guarino, S, Trino N, Chessa A, Riotta G (2019) Beyond fact-checking: Network analysis tools for monitoring disinformation in social media In: International Conference on Complex Networks and Their Applications, 436–447.. Springer, Lisbon.

Guess, A, Nagler J, Tucker J (2019) Less than you think: Prevalence and predictors of fake news dissemination on facebook. Sci Adv 5(1):4586.

Higgins, K (2016) Post-truth: a guide for the perplexed. Nat News 540(7631):9.

Karadzhov, G, Nakov P, Màrquez L, Barron-Cedeno A, Koychev I (2017) Fully automated fact checking using external sources. arXiv preprint arXiv:1710.00341.

Kleinberg, JM (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632.


Lazer, DM, Baum MA, Benkler Y, Berinsky AJ, Greenhill KM, Menczer F, Metzger MJ, Nyhan B, Pennycook G, Rothschild D, et al (2018) The science of fake news. Science 359(6380):1094–1096.

Le, Q, Mikolov T (2014) Distributed representations of sentences and documents In: International Conference on Machine Learning, vol. 32, 1188–1196.. JMLR: W&CP, Beijing.

Lévy, P (2002) Cyberdémocratie: essai de philosophie politique In: A Inteligência Coletiva.. Odile Jacob, Paris.

Löfstedt, R (2005) Risk management in post-trust societies. Springer, New York: Palgrave Macmillan.

Margolin, DB, Hannak A, Weber I (2018) Political fact-checking on twitter: when do corrections have an effect?. Polit Commun 35(2):196–219.

Markowitz, DM, Hancock JT (2014) Linguistic traces of a scientific fraud: The case of diederik stapel. PloS ONE 9(8):105937.

Mastinu, L (2016) TOP 10 Bufale e disinformazione sul Referendum. www.bufale.net/top-10-bufale-e-disinformazione-sul-referendum/ . Accessed 05 July 2019.

Nichols, T (2017) The death of expertise: The campaign against established knowledge and why it matters. Wiley Online Library.

Papacharissi, Z, de Fatima Oliveira M (2012) Affective news and networked publics: The rhythms of news storytelling on# egypt. J Commun 62(2):266–282.

Pariser, E (2011) The filter bubble: what the internet is hiding from you. Penguin UK.

Pierri, F, Artoni A, Ceri S (2020) Investigating italian disinformation spreading on twitter in the context of 2019 european elections. PloS ONE 15(1):0227821.

Politica, RP (2016) La notizia più condivisa sul referendum? È una bufala. https://pagellapolitica.it/blog/show/148/la-notizia-pi%C3%B9-condivisa-sul-referendum-%C3%A8-una-bufala . Accessed 05 July 2019.

Post, RI (2016) Nove bufale sul referendum. www.ilpost.it/2016/12/02/bufale-referendum/ . Accessed 05 July 2019.

Prasetya, HA, Murata T (2020) A model of opinion and propagation structure polarization in social media. Comput Soc Networks 7(1):1–35.

Qiu, X, Oliveira DF, Shirazi AS, Flammini A, Menczer F (2017) Limited individual attention and online virality of low-quality information. Nat Hum Behav 1(7):0132.

Schroeder, R (2018) Digital media and the rise of right-wing populism. Soc Theory Internet Media Technol Glob:60–81.

Shao, C, Ciampaglia GL, Flammini A, Menczer F (2016) Hoaxy: A platform for tracking online misinformation In: Proceedings of the 25th International Conference Companion on World Wide Web, 745–750.. International World Wide Web Conferences Steering Committee, Montréal.

Shao, C, Ciampaglia GL, Varol O, Yang K-C, Flammini A, Menczer F (2018) The spread of low-credibility content by social bots. Nat Commun 9(1):4787.

Shao, C, Hui P-M, Wang L, Jiang X, Flammini A, Menczer F, Ciampaglia GL (2018) Anatomy of an online misinformation network. PloS ONE 13(4):0196087.

Shin, J, Thorson K (2017) Partisan selective sharing: The biased diffusion of fact-checking messages on social media. J Commun 67(2):233–255.

Shu, K, Sliva A, Wang S, Tang J, Liu H (2017) Fake news detection on social media: A data mining perspective. ACM SIGKDD Explor Newsl 19(1):22–36.

Silverman, C, Singer-Vine J (2016) Most americans who see fake news believe it, new survey says. BuzzFeed News 6. https://www.buzzfeednews.com/article/craigsilverman/fake-news-survey .

Skurnik, I, Yoon C, Park DC, Schwarz N (2005) How warnings about false claims become recommendations. J Consum Res 31(4):713–724.

Subrahmanian, V, Azaria A, Durst S, Kagan V, Galstyan A, Lerman K, Zhu L, Ferrara E, Flammini A, Menczer F, et al (2016) The darpa twitter bot challenge. arXiv preprint arXiv:1601.05140.

Sunstein, CR (2001) Republic.com. Princeton university press.

Trilling, D (2015) Two different debates? investigating the relationship between a political debate on tv and simultaneous comments on twitter. Soc Sci Comput Rev 33(3):259–276. https://doi.org/10.1177/0894439314537886 .

Tucker, J, Guess A, Barberá P, Vaccari C, Siegel A, Sanovich S, Stukal D, Nyhan B (2018) Social media, political polarization, and political disinformation: A review of the scientific literature. Political polarization, and political disinformation: a review of the scientific literature (March 19, 2018).

Van der Linden, S, Leiserowitz A, Rosenthal S, Maibach E (2017) Inoculating the public against misinformation about climate change. Glob Challenges 1(2):1600008.

Vicario, MD, Quattrociocchi W, Scala A, Zollo F (2019) Polarization and fake news: Early warning of potential misinformation targets. ACM Trans Web (TWEB) 13(2):10.

Vinh, NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th Annual International Conference on Machine Learning, 1073–1080.. Association for Computing Machinery (ACM), Montréal.

Vosoughi, S, Roy D, Aral S (2018) The spread of true and false news online. Science 359(6380):1146–1151.

Zubiaga, A, Aker A, Bontcheva K, Liakata M, Procter R (2018) Detection and resolution of rumours in social media: A survey. ACM Comput Surv (CSUR) 51(2):32.


Acknowledgements

Not applicable.

Funding

This work was supported in part by the Project “SOMA”, funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825469. The European Commission had no role in the design of the study, the collection, analysis, and interpretation of data, or the writing of the manuscript. Any opinions, findings, and conclusions expressed in this paper reflect only the views of the authors.

Author information

Authors and affiliations.

Institute for Applied Mathematics, National Research Council, Rome, Italy

Stefano Guarino & Alessandro Celestini

Luiss “Guido Carli” University, Rome, Italy

Noemi Trino, Alessandro Chessa & Gianni Riotta

Linkalab, Cagliari, Italy

Alessandro Chessa


Contributions

SG, NT, ACe and ACh designed the study. ACh acquired the data. SG and ACe created the software used to perform the data analysis. SG, NT and ACe interpreted the results and wrote the paper. All author(s) revised the work and read and approved the final manuscript.

Corresponding author

Correspondence to Stefano Guarino .

Ethics declarations

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Guarino, S., Trino, N., Celestini, A. et al. Characterizing networks of propaganda on twitter: a case study. Appl Netw Sci 5 , 59 (2020). https://doi.org/10.1007/s41109-020-00286-y


Received : 28 February 2020

Accepted : 17 July 2020

Published : 04 September 2020

DOI : https://doi.org/10.1007/s41109-020-00286-y


Keywords

  • Propaganda networks
  • Polarization


Special Issue: Propaganda

This essay was published as part of the Special Issue “Propaganda Analysis Revisited”, guest-edited by Dr. A. J. Bauer (Assistant Professor, Department of Journalism and Creative Media, University of Alabama) and Dr. Anthony Nadler (Associate Professor, Department of Communication and Media Studies, Ursinus College).

Peer Reviewed

Overlooking the political economy in the research on propaganda


Historically, scholars studying propaganda have focused on its psychological and behavioral impacts on audiences. This tradition has roots in the unique historical trajectory of the United States through the 20th century. This article argues that this tradition is quite inadequate to tackle propaganda-related issues in the Global South, where a deep understanding of the political economy of propaganda and misinformation is urgently needed.

School of Journalism and Mass Communication, University of Wisconsin-Madison, USA


Research Questions

  • What is the dominant approach to studying online propaganda in the context of the Global South?
  • What are the limitations of this approach?

Essay Summary

  • During the 20th century, funding from the U.S. government and corporate entities oriented the field of mass communication towards studying psychological and behavioral impacts of media on audiences because such research was beneficial to these funders. This orientation is the dominant paradigm in the field. 
  • The bulk of research on misinformation and propaganda since 2016 is situated in this intellectual legacy. 
  • A thematic meta-analysis – of recent literature on online propaganda in the context of the Global South, and 20 Facebook-funded research projects in 2018 – shows that research is overwhelmingly focused on the psychological and behavioral impacts of propaganda. This research advocates for promoting “media literacy” and helping citizens “inoculate” themselves against propaganda.
  • This approach has limited use in tackling propaganda in the Global South. It not only oversimplifies “media literacy,” but also fails to examine, quite crucially, how the state, corporations, and media institutions interact – the political economy of propaganda.
  • Further, scholars need to reflect on how entities such as Facebook fund such research to deflect scrutiny of their institutional role in propaganda-related violence in the Global South.

Implications

Researchers studying mass communication, including social media, often focus on the media’s psychological and behavioral impacts on audiences to the detriment of an institution-level study of mass communication. The reason for this orientation lies in the history of mass communication research in the United States amidst the global wars and social turmoil of the twentieth century. During this period, research agendas took on an “administrative” approach in order to aid those in governments and corporations in using communication channels to better influence their audiences (Lazarsfeld, 1941). Lazarsfeld, a “founding father” of the field, describes the administrative paradigm as “carried through in the service of some kind of administrative agency of public or private character” (Lazarsfeld, 1941, p. 8), with a purpose to “sell goods, or to raise the intellectual standards of the population, or secure an understanding of government policies” (Lazarsfeld, 1941, p. 2). The administrative research paradigm (henceforth “administrative paradigm”) conceptualizes communication as “a sender sending a message through a channel to affect audiences.” In this linear, mechanical, and mathematical formulation, each of the pieces – “sender,” “message,” “channel,” and “audiences” – is an object of study. “Media effects” research combines these pieces by studying how various types of messages and channels influence audiences. The administrative paradigm is the dominant paradigm within the field.

Although the research outside of the administrative paradigm is vast (as described later in this section), the dominant paradigm overlooks the political economy of communication or an institution-level analysis – how the state and corporate interests influence media institutions and shape “communication.” These priorities have significant implications, especially regarding propaganda in the Global South. For example, scholars have rarely examined Facebook’s institutional role in the genocide in Myanmar. But before describing these implications further, it is useful to briefly historicize the administrative paradigm.

Historicizing the administrative paradigm

Over the course of the twentieth century, support from two entities was crucial to solidify administrative research as the dominant paradigm within communication research in the U.S. – corporate interests such as the Rockefeller Foundation, and the U.S. government (Pooley, 2008). After observing widespread public susceptibility to propaganda during WWI, many prominent scholars in the U.S. envisioned a technocratic or “managed” democracy (Gary, 1999, pp. 15–54; Glander, 2000, pp. 1–37; Sproule, 1997, pp. 54–74). With corporate funding and governmental aid, scholars started conducting media-effects research, which laid the foundations of the modern advertising and public relations industry as well as more sophisticated propaganda efforts by the U.S. government (Pooley, 2008). By WWII, a network of administrative paradigm scholars had emerged which helped mobilize U.S. citizens for the war and countered Nazi propaganda on behalf of the Roosevelt administration (Glander, 2000, pp. 38–55). Prominent paradigm scholars made the utility of media-effects research clear in the statement that “[public] consent will require unprecedented knowledge of the public mind and of the means by which leadership can secure consent” (Gary, 1996, pp. 139–140). The result was a proliferation of media-effects research and the inception of prominent journals such as  Public Opinion Quarterly , funded by the Rockefeller Foundation (Gary, 1996).

The Cold War was momentous in establishing mass communication as a distinct field, with administrative research as its dominant paradigm (Simpson, 1994). As the two global superpowers raced to control the “Third World,” the U.S. government poured unprecedented funding into media-effects research, which back then was more appropriately called “psychological warfare” research. Over time, terms such as “propaganda” and “psychological warfare” were replaced by more innocuous-sounding words such as “persuasion,” even though the underlying research orientation has remained unchanged (Simpson, 1994; Sproule, 1997). In 1952, for example, 96% of the reported funding in communication research was from the U.S. military (Simpson, 1994, p. 52). This research directly shaped U.S. psychological warfare efforts in “Third World” countries such as Iran (Simpson, 1994, p. 56). In addition, the U.S. government’s massive support expanded the aforementioned network of administrative paradigm scholars, helped place these scholars in high academic positions with control over scholarly publishing and rank and tenure decisions, and supported the establishment of key academic journals (Simpson, 1994, p. 9). 

Turning to the present, the key point is that contemporary research is shaped by this history: which texts are considered “classics,” who the “founding fathers” are, the top academic journals and their research orientations, and the research questions that generations of scholars have deemed important. That is, academic inquiry operates on a terrain shaped by history and contemporary political economy, rather than in a vacuum. This is, of course, not unique to the field of communication; the same holds for psychology, area studies, anthropology, development studies, and the social sciences in general (see the “Introduction” in Universities and Empire; Simpson, 1998). Furthermore, states’ and corporations’ desire to fund administrative research is not just a phenomenon of the past. Rather, it is fundamental to their interests – they want to use mass communication for political campaigns, advertising, etc.; to educate, persuade, manipulate, or control their subjects.

Limitations of current research

It is in this context that propaganda research has boomed since the 2016 U.S. election. The thematic meta-analysis presented in this paper provides evidence of multiple limitations in the bulk of this research in the context of the Global South:

  • Theoretical: The research is primarily focused on media-effects – it investigates individuals’ susceptibility to propaganda. 
  • Policy: The research largely conceptualizes online platforms as neutral, and therefore the policy recommendations place the burden of responsibility onto individuals – suggesting that governments should conduct “media literacy” campaigns and recommending how to “inoculate” citizens against propaganda. (To quote Roozenbeek & van der Linden (2020) from this journal, “inoculation theory” – which has a plethora of literature on it – uses a medical analogy to posit that “preemptively exposing someone to a weakened version of a particular misleading argument prompts a process that is akin to the production of ‘mental antibodies,’ which make it less likely that a person is persuaded by the ‘real’ manipulation later on.” Their research was funded by the U.S. State Department and U.S. Department of Homeland Security and conducted on U.S. citizens, illustrating the historical patterns described in this article quite aptly; see Roozenbeek et al. (2020) for another example.) Entities such as Facebook have generously funded such research when facing scrutiny of their institutional role in propaganda-fueled violence in the Global South. 
  • Methodological: The research is methodologically limited by the administrative paradigm – primarily to conducting media-effects experiments and secondarily to content analysis of online propaganda. (The history of content analysis and its relationship with the administrative paradigm is interesting: the U.S. government promoted content analysis within the administrative paradigm during the Cold War so that it could identify propaganda from rival institutions and get insights into those institutions. Interestingly, researchers working for the U.S. government seriously questioned whether content analysis can help in making inferences about institutions at all (Simpson, 1994, p. 88; Sproule, 1997, pp. 193–196).) Largely absent are methodologies such as policy analysis, interviews, ethnographies, historical analysis, etc.
  • Geographical: Research is overwhelmingly focused on the Global North.

The following two cases illustrate the alarming situation in the Global South, along with some key institution-level issues therein, which the administrative paradigm is theoretically unequipped to tackle. In Myanmar, the UN fact-finding mission declared that Facebook played a “determining role” in the Rohingya genocide, which led to the exodus of more than 800,000 Rohingya Muslims and a massive humanitarian crisis in South Asia (Miles, 2018). Yet the key issue of Facebook’s institutional role is rarely examined by communication scholars (some scholarship breaks this trend; see Fink, 2018; Sablosky, 2021). 

In India, Hindu nationalist propaganda on Facebook and Facebook-owned WhatsApp has led to harassment, hate speech, lynchings, and pogroms against Muslims and lower castes (Muslim Advocates & GPAHE, 2020; Soundararajan et al., 2019). Here, a key dynamic is that India is the largest market for Facebook – 328 million and 400 million Indians use Facebook and WhatsApp, respectively (Perrigo, 2020) – and Facebook appears to be reluctant to curb propaganda at the expense of its profits. For instance, to avoid upsetting the ruling Hindu-nationalist Bharatiya Janata Party (BJP), Facebook has allowed hate speech by BJP politicians to remain on its platform in violation of its own policies (Purnell & Horwitz, 2020). It is difficult to expect institutional accountability when BJP-linked people run the Facebook India office (ibid.). 

Suggested interventions

This paper suggests the following interventions. First, the dominance of the administrative paradigm exists alongside a theoretically and methodologically vast and rich literature that adopts other approaches to studying mass communication in the Global North, such as: online content moderation (Gillespie, 2018), regulation of social media companies (Napoli, 2019), propaganda ecosystems (Benkler et al., 2018), the role of algorithms (Caplan & boyd, 2018; Noble, 2018), and the political economy of communication (Hindman, 2008; McChesney, 2015; Pickard, 2019). But it is the dominant paradigm – its theoretical orientation and methodology – that is most frequently applied to study online propaganda in the Global South. Less common are approaches dealing with questions that are important in the Global South: content moderation and hate speech (Sablosky, 2021), the political economy of the internet (Athique & Parthasarathi, 2020), internet regulation (Parthasarathi & Agarwal, 2020; Rajkhowa, 2015), internet access (Moyo & Munoriyarwa, 2021; Mukherjee, 2019; Nothias, 2020), and theorizations of what “fake news” might mean in the Global South (Wasserman, 2020). Scholars need to fill these gaps urgently.

Second, “media literacy” in the context of the Global South needs to be defined with a democratic ethos. There is a tendency in existing research to persistently connect individuals’ characteristics (education, income, identity, etc.) to their susceptibility to online propaganda. This tendency extends the discipline’s historic legacy, described above, of facilitating imperialism and statist–corporatist control. Instead, scholars need to explore how individuals can be empowered to participate democratically in policymaking on these issues (Hasebrink, 2012; Horowitz & Napoli, 2014; also see the corresponding special issue of Medijske Studije). 

Third, scholars need to be cognizant of the political economy of knowledge production – how institutions such as Facebook promote administrative research, which takes scholarly attention away from Facebook’s institutional role in propaganda-related violence. As explained earlier, this is part of a larger historic trend; another example is the Rockefeller Foundation awarding a $67,000 grant in 1937 to study media effects in the U.S. – the grant came during the formative period of the discipline and explicitly forbade research on questions of the political economy of communication (Pooley, 2008, p. 51). The prevalence of a research paradigm need not be in proportion to its normative value or theoretical sophistication.

Finding 1: Agendas of research projects funded by Facebook-owned WhatsApp to study misinformation.

In July 2018, Facebook-owned WhatsApp came under heavy scrutiny globally after a series of WhatsApp-fueled lynchings occurred in India (Gowen, 2018). WhatsApp responded with a massive public relations campaign – on radio, television, print media, and the internet – projecting WhatsApp/Facebook as a neutral actor that was committed to helping the community fight misinformation (Creech, 2020; Indo-Asian News Service, 2018; The Indian Express, 2018). Further, WhatsApp invited proposals for “WhatsApp Research Awards for Social Science and Misinformation,” which was widely publicized in the news media. In November 2018, WhatsApp awarded $50,000 to 20 research proposals, totaling $1 million. 

A thematic meta-analysis was done to understand the research focus of this $1 million grant using: (1) WhatsApp’s definitions of the “high priority” areas under which it invited proposals, and which thereby outlined its research agenda(s) (the initial announcement also included a category called “Detecting problematic behavior within encrypted systems,” defined as “Examine technical solutions to detecting problematic behavior within the restrictions of and in keeping with the principles of encryption,” but no research awards were given under it); and (2) the abstracts of all 20 accepted proposals (and every proposal’s stated “high priority” area), from which the more specific research and geographic focus of individual proposals could be ascertained.

Overall, the geographic focus is encouraging; 75% of proposals are on the Global South. However, each “high priority” area (listed below) and the proposals therein are confined to the administrative paradigm:

  • Information processing of problematic content: WhatsApp defined this as examining how “social, cognitive, and information processing variables relate to the content’s credibility, and the decision to share that content with others;” that is, media-effects research. For example, one study will compare how users react to fake news in various formats (text-only, audio-only, and video) in India.
  • Election-related misinformation: Proposals under this category predominantly undertake media-effects research but emphasize electoral politics. For example, studies will focus on how users get impacted by political information, how they have political conversations, etc.
  • Digital literacy and misinformation: WhatsApp defines this as “the relation between digital literacy and vulnerability to misinformation on WhatsApp.” Although this is media-effects research, the focus is more on application (literacy) rather than on advancing theory. One study will examine “how vulnerability to fake news is affected by socioeconomic, demographic, or geographical factors” across nine states in India. Another will test the effectiveness of a game-based intervention to “inoculate” WhatsApp users against fake news.
  • Network effects and virality: WhatsApp defined this as “the spread of information through WhatsApp networks.” One study will explore how users process religious information, and another will examine how WhatsApp users and their networks coevolve as misinformation diffuses through the network. 

In sum, when Facebook-owned WhatsApp faced intense scrutiny on its institutional role in propaganda-related violence in the Global South, it responded by funding administrative research – framing the issue as users’ susceptibility to propaganda and lack of media literacy. 


This approach overlooks WhatsApp’s far more complex role in the Global South. For example, “encrypted” platforms such as WhatsApp do not protect everyone equally – while Hindu nationalist groups in India organize violence on WhatsApp with impunity (Purohit, 2019), peaceful protestors are prosecuted (Mahaprashasta, 2020). (Although the Indian government and other actors have hacked into WhatsApp in the past, the issue is typically not related to breaking WhatsApp’s encryption – police reports are filed using screenshots of WhatsApp messages, phones are seized, etc.) Right-wing authoritarians have leveraged WhatsApp extensively for their political campaigns (Avelar, 2019; Perrigo, 2019; Vaidhyanathan, 2018, pp. 186–195). There is also a larger issue of Facebook pushing its services, such as WhatsApp, without establishing a functional internal infrastructure for handling civil rights issues (Facebook’s failure in this regard is perhaps best documented in the Global North; see Murphy and Cacace (2020)). For instance, Facebook makes its website/services cheaper to access compared to other websites in the Global South (this project – called “Internet.org” or “Free Basics” – violates net neutrality, see Nothias (2020); recently, Facebook appointed the vice-president of Internet.org to run WhatsApp), which has made Facebook/WhatsApp ubiquitous in countries such as Myanmar (Vaidhyanathan, 2018, pp. 194–195). The issues of internet access, digital literacy, and propaganda are interconnected and need to be studied holistically (Moyo & Munoriyarwa, 2021; Mukherjee, 2019; Nothias, 2020). 

Lastly, the political economy of knowledge production should be noted here. Following a series of WhatsApp-fueled lynchings in India in 2018, Facebook provided $1 million for administrative research. In contrast, when an investigative report in 2020 revealed the collusion between Facebook and India’s Hindu nationalist government (Purnell & Horwitz, 2020), no research grant was awarded to study similar institution-level phenomena in the Global South. Through research funding, Facebook appears to have strategically marginalized the institutional dimensions within these contexts, with obvious consequences for knowledge production.

Finding 2: Agendas of published research on Facebook and WhatsApp.

A thematic meta-analysis of the research published in the 10 communication journals with the highest Journal Impact Factors shows that research on Facebook and WhatsApp is limited (Fig. 2):

  • Geographically: Out of 590 articles on Facebook, only 4 are on India, and none on Myanmar. This is despite the majority of Facebook users being in the Global South (and India), and despite the genocide in Myanmar – which has been shockingly understudied in comparison to the 2016 U.S. election. A mere 22 articles are on WhatsApp, which has 1.3 billion users located largely in the Global South.
  • Theoretically: The most highly cited articles come from journals with an administrative research orientation, notably the Journal of Computer-Mediated Communication. The three newer journals – New Media & Society; Information, Communication & Society; and Digital Journalism – do provide room for non-administrative approaches to studying Facebook and WhatsApp. Overall, however, in keeping with the discipline’s historical legacy, the older journals generally adopt the administrative research tradition, and this research is cited the most (7 out of the 10 most-cited Facebook-related articles were from the Journal of Computer-Mediated Communication).

Past, more extensive reviews and meta-analyses of social media research have also noted this geographical disparity, as well as the dominance of administrative-paradigm theories and methodologies (Caers et al., 2013; Kapoor et al., 2018; Ngai et al., 2015; Zhang & Leung, 2015). Crucially, these reviews and meta-analyses do not critique the administrative paradigm; their recommendations for “future directions” of research are almost always confined to it.


Finding 3: Agendas of published research in Misinformation Review (MR).

A thematic meta-analysis of all the research articles published in this journal during its first year shows similar patterns. MR’s first issue, in January 2020, stated its goal as envisioning a response strategy to deal with misinformation and propaganda, noting that “such a strategy requires interventions at many levels: legal, political, financial, infrastructural, cultural, and social,” made possible by research on “all aspects of misinformation” (Pasquetto, 2020). However, the published research is limited (Fig. 3):

  • Theoretically: Administrative paradigm research constitutes 71% of all articles. The focus is to understand audience behavior via media-effects experiments, surveys, and content analysis of people’s social media posts.
  • Methodologically: The tools of administrative research – psychological studies, content analysis, and surveys – are the dominant methods in MR. Even when scholars probe institutions, they mostly conduct content analysis of those institutions’ postings on social media. Policy analysis, historical approaches, interviews, ethnographies, and similar methods are almost completely absent.
  • Geographically: Only 19% of the articles are on the Global South.


Methods

(A) Agendas of research projects funded by Facebook-owned WhatsApp to study misinformation

On July 3, 2018, WhatsApp announced the “WhatsApp Research Awards for Social Science and Misinformation.” It announced the 20 winners on September 14, 2018. To assess the research orientation, a thematic meta-analysis was done using:

  • Definitions of the “high priority” areas, obtained from WhatsApp’s awards launch webpage: https://www.whatsapp.com/research/awards/ (archived at https://web.archive.org/web/20180713090830/https://www.whatsapp.com/research/awards/).
  • The title, abstract, and stated “high priority” category for each accepted proposal, listed on WhatsApp’s announcement webpage: https://www.whatsapp.com/research/awards/announcement/ (archived at https://web.archive.org/web/20190203020712/https://www.whatsapp.com/research/awards/announcement/).

The research proposals were categorized by geography – Global North/South – based on the above information (see Appendix I).
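To make the coding procedure concrete, the sketch below shows one way such a categorization could be tabulated. It is a minimal illustration rather than the author’s actual workflow, and the proposal records in it are hypothetical placeholders standing in for the 20 entries coded from WhatsApp’s webpages.

```python
# Minimal sketch (not the author's actual procedure): tallying the funded
# proposals by their stated "high priority" category and coded geography.
# The records below are hypothetical placeholders, not real proposals.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Proposal:
    title: str
    category: str  # stated "high priority" area from the awards webpage
    region: str    # "Global North" or "Global South", coded from title/abstract

proposals = [
    Proposal("Hypothetical study of electoral WhatsApp use",
             "Election-related misinformation", "Global South"),
    Proposal("Hypothetical inoculation-game evaluation",
             "Digital literacy and misinformation", "Global North"),
    # ... one entry per funded proposal, coded from the announcement webpage
]

by_category = Counter(p.category for p in proposals)
by_region = Counter(p.region for p in proposals)

print("By category:", dict(by_category))
print("By region:  ", dict(by_region))
```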

(B) Agendas of published research on Facebook and WhatsApp

First, the list of the top 10 communication journals, by their 2019 Journal Impact Factor, was obtained from Clarivate Analytics (https://jcr.clarivate.com). Second, under “advanced search” on Web of Science (https://apps.webofknowledge.com/):

  • “Topic” was set to “Facebook,” 
  • “Publication name” to the aforementioned 10 journals, 
  • “Timespan” to 1900–2020,
  • “Document type” to “article.”
  • The above steps were repeated with “Topic” set to “WhatsApp.”

These search terms do not restrict results to the Global South or the topic to propaganda; the goal was to assess how these platforms are studied in general. The evidence presented is based on a thematic meta-analysis of the 30 most-cited articles on Facebook and all 22 articles on WhatsApp (see Appendix II for the list). Each journal’s featured and highly cited articles, as well as its stated “aims and scope,” also informed the thematic meta-analysis.
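For illustration, the sketch below assembles the advanced-search string described above and trims an exported record list to the most-cited items. It is a sketch under assumptions, not the author’s script: the TS/SO/DT field tags follow Web of Science advanced-search syntax, the timespan filter is assumed to be applied in the web interface, and the exported CSV is assumed to contain “Title” and “Times Cited” columns.

```python
# Minimal sketch (assumptions noted in the text above): building the
# Web of Science advanced-search query and selecting the most-cited records
# from an exported CSV. Column names and export format are assumptions.
import csv

TOP_10_JOURNALS = [
    "Journal of Computer-Mediated Communication",
    "New Media & Society",
    "Information, Communication & Society",
    "Digital Journalism",
    # ... remaining titles from the 2019 Journal Impact Factor top-10 list
]

def build_query(topic):
    """Compose an advanced-search string: topic, source journals, document type."""
    sources = " OR ".join(f'"{j}"' for j in TOP_10_JOURNALS)
    return f'TS=("{topic}") AND SO=({sources}) AND DT=(Article)'

def most_cited(csv_path, n=30):
    """Return the n most-cited rows of an exported record list."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    rows.sort(key=lambda r: int(r["Times Cited"]), reverse=True)
    return rows[:n]

print(build_query("Facebook"))  # repeated with topic "WhatsApp"
```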

(C) Agendas of published research in Misinformation Review (MR)

A thematic meta-analysis of all 31 peer-reviewed research articles published in MR during 2020 (excluding the article types “commentary” and “research note”) was conducted, focusing on three themes:

  • Theory: Classified as “administrative research” when focused on understanding audiences, or “institution-level research” when focused on understanding institutional actors (state or non-state). This was determined based on the entirety of the article.
  • Methodology: Classified as content analysis, media-effects experiment, or survey, based on the methodology described in the article.
  • Geography: Classified as Global North or Global South based on the research context or the study’s subjects.

See Appendix III for classifications.
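As a concrete illustration of this coding scheme, the sketch below shows how the three-way classification could be recorded and summarized. The article records in it are hypothetical placeholders; with the full set of 31 coded articles from Appendix III, the computed shares would correspond to the figures reported in Finding 3.

```python
# Minimal sketch (hypothetical records): recording the three coding dimensions
# for each Misinformation Review article and summarizing the shares.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Article:
    theory: str     # "administrative" or "institution-level"
    method: str     # "content analysis", "media-effects experiment", "survey", ...
    geography: str  # "Global North" or "Global South"

# Placeholder entries; the real coding covers all 31 articles (Appendix III).
articles = [
    Article("administrative", "survey", "Global North"),
    Article("institution-level", "content analysis", "Global South"),
]

def share(counter, key):
    """Percentage of articles carrying a given code."""
    return 100 * counter[key] / sum(counter.values())

theory = Counter(a.theory for a in articles)
geography = Counter(a.geography for a in articles)

print(f"Administrative research: {share(theory, 'administrative'):.0f}%")
print(f"Global South focus:      {share(geography, 'Global South'):.0f}%")
```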

Topics: Mainstream Media, Media Literacy, Propaganda

Cite this Essay

Abhishek, A. (2021). Overlooking the political economy in the research on propaganda. Harvard Kennedy School (HKS) Misinformation Review . https://doi.org/10.37016/mr-2020-61

Bibliography

Athique, A., & Parthasarathi, V. (Eds.). (2020). Platform capitalism in India (1st ed.). Palgrave Macmillan.

Avelar, D. (2019, October 30). WhatsApp fake news during Brazil election ‘favoured Bolsonaro.’ The Guardian . https://www.theguardian.com/world/2019/oct/30/whatsapp-fake-news-brazil-election-favoured-jair-bolsonaro-analysis-suggests

Benkler, Y., Faris, R., & Roberts, H. (2018). Network propaganda: Manipulation, disinformation, and radicalization in American politics . Oxford University Press.

Caers, R., De Feyter, T., De Couck, M., Stough, T., Vigna, C., & Du Bois, C. (2013). Facebook: A literature review. New Media & Society , 15 (6), 982–1002. https://doi.org/10.1177/1461444813488061

Caplan, R., & boyd, d. (2018). Isomorphism through algorithms: Institutional dependencies in the case of Facebook. Big Data & Society , 5 (1). https://doi.org/10.1177/2053951718757253

Creech, B. (2020). Fake news and the discursive construction of technology companies’ social power. Media, Culture & Society , 42 (6), 952–968. https://doi.org/10.1177/0163443719899801

Fink, C. (2018). Dangerous speech, anti-Muslim violence, and Facebook in Myanmar. Journal of International Affairs , 71 (1.5), 43–52. https://jia.sipa.columbia.edu/dangerous-speech-anti-muslim-violence-and-facebook-myanmar

Gary, B. (1996). Communication research, the Rockefeller Foundation, and mobilization for the war on words, 1938–1944. Journal of Communication , 46 (3), 124–148. https://doi.org/10.1111/j.1460-2466.1996.tb01493.x

Gary, B. (1999). The nervous liberals: Propaganda anxieties from World War I to the Cold War . Columbia University Press.

Gillespie, T. (2018). Custodians of the internet: Platforms, content moderation, and the hidden decisions that shape social media . Yale University Press.

Glander, T. (2000). Origins of mass communications research during the American Cold War: Educational effects and contemporary implications (1st ed.). Lawrence Erlbaum.

Gowen, A. (2018, July 4). As rumors fuel mob lynchings in India, WhatsApp offers $50,000 grants to curb fake news. The Washington Post . https://www.washingtonpost.com/world/as-rumors-fuel-mob-lynchings-in-india-whatsapp-offers-50000-grants-to-curb-fake-news/2018/07/04/93b098f7-edb2-44de-945e-81f59bf16e96_story.html

Hasebrink, U. (2012). The role of the audience within media governance: The neglected dimension of media literacy. Medijske Studije , 3 (6), 58–73. https://hrcak.srce.hr/ojs/index.php/medijske-studije/article/view/6067

Hindman, M. (2009). The myth of digital democracy . Princeton University Press.

Horowitz, M. A., & Napoli, P. M. (2014). Diversity 2.0: A framework for audience participation in assessing media systems. Interactions: Studies in Communication & Culture , 5 (3), 309–326. https://doi.org/10.1386/iscc.5.3.309_1

The Indian Express. (2018, July 10). WhatsApp begins ad campaign, lists easy tips for users to fight fake news. The Indian Express . https://indianexpress.com/article/technology/tech-news-technology/whatsapp-begins-ad-campaign-lists-easy-tips-for-users-to-fight-fake-news-5252903/

Indo-Asian News Service. (2018, September 5). WhatsApp starts second phase of radio ad campaign in India. NDTV Gadgets 360. https://gadgets.ndtv.com/apps/news/whatsapp-radio-ads-india-to-fight-misinformation-fake-news-1911699

Kapoor, K. K., Tamilmani, K., Rana, N. P., Patil, P., Dwivedi, Y. K., & Nerur, S. (2018). Advances in social media research: Past, present and future. Information Systems Frontiers , 20 (3), 531–558. https://doi.org/10.1007/s10796-017-9810-y

Lazarsfeld, P. F. (1941). Remarks on administrative and critical communications research. Zeitschrift für Sozialforschung , 9 (1), 2–16. https://doi.org/10.5840/zfs1941912

Mahaprashasta, A. A. (2020, August 3). How Delhi police turned anti-CAA WhatsApp group chats into riots “conspiracy.” The Wire . https://thewire.in/communalism/delhi-riots-police-activists-whatsapp-group

McChesney, R. W. (2015). Rich media, poor democracy: Communication politics in dubious times (new ed.). The New Press.

Miles, T. (2018, March 12). U.N. investigators cite Facebook role in Myanmar crisis. Reuters. https://www.reuters.com/article/us-myanmar-rohingya-facebook-idUSKCN1GO2PN

Moyo, D., & Munoriyarwa, A. (2021). ‘Data must fall’: Mobile data pricing, regulatory paralysis and citizen action in South Africa. Information, Communication & Society , 24 (3), 365–380. https://doi.org/10.1080/1369118X.2020.1864003

Mukherjee, R. (2019). Jio sparks Disruption 2.0: Infrastructural imaginaries and platform ecosystems in ‘Digital India.’ Media, Culture & Society , 41 (2), 175–195. https://doi.org/10.1177/0163443718818383

Murphy, L. W., & Cacace, M. (2020). Facebook’s civil rights audit – Final report . https://about.fb.com/wp-content/uploads/2020/07/Civil-Rights-Audit-Final-Report.pdf

Muslim Advocates, & GPAHE. (2020). Complicit: The human cost of Facebook’s disregard for Muslim life . https://muslimadvocates.org/wp-content/uploads/2020/10/Complicit-Report.pdf

Napoli, P. M. (2019). Social media and the public interest: Media regulation in the disinformation age . Columbia University Press.

Ngai, E. W. T., Tao, S. S. C., & Moon, K. K. L. (2015). Social media research: Theories, constructs, and conceptual frameworks. International Journal of Information Management , 35 (1), 33–44. https://doi.org/10.1016/j.ijinfomgt.2014.09.004

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism . NYU Press.

Nothias, T. (2020). Access granted: Facebook’s free basics in Africa. Media, Culture & Society , 42 (3), 329–348. https://doi.org/10.1177/0163443719890530

Parthasarathi, V., & Agarwal, S. (2020). Rein and laissez faire: The dual personality of media regulation in India. Digital Journalism , 8 (6), 797–819. https://doi.org/10.1080/21670811.2020.1769493

Pasquetto, I. (2020). Volume 1, Issue 1 Editorial. Harvard Kennedy School (HKS) Misinformation Review , 1 (1). https://misinforeview.hks.harvard.edu/article/editorial-volume-1-issue-1/

Perrigo, B. (2019, January 25). How WhatsApp is fueling fake news ahead of India’s elections. Time . https://time.com/5512032/whatsapp-india-election-2019/

Perrigo, B. (2020, August 27). Facebook’s ties to India’s ruling party complicate its fight against hate speech. Time . https://time.com/5883993/india-facebook-hate-speech-bjp/

Pickard, V. (2019). Democracy without journalism?: Confronting the misinformation society . Oxford University Press.

Pooley, J. (2008). The new history of mass communication research. In D. W. Park & J. Pooley (Eds.), The history of media and communication research: Contested memories (pp. 43–70). Peter Lang.

Purnell, N., & Horwitz, J. (2020, August 14). Facebook’s hate-speech rules collide with Indian politics. The Wall Street Journal . https://www.wsj.com/articles/facebook-hate-speech-india-politics-muslim-hindu-modi-zuckerberg-11597423346

Purohit, K. (2019, December 18). Post CAA, BJP-linked WhatsApp groups mount a campaign to foment communalism. The Wire . https://thewire.in/media/cab-bjp-whatsapp-groups-muslims

Rajkhowa, A. (2015). The spectre of censorship: Media regulation, political anxiety and public contestations in India (2011–2013). Media, Culture & Society , 37 (6), 867–886. https://doi.org/10.1177/0163443715584099

Roozenbeek, J., & van der Linden, S. (2020). Breaking Harmony Square: A game that “inoculates” against political misinformation. Harvard Kennedy School (HKS) Misinformation Review . https://doi.org/10.37016/mr-2020-47

Roozenbeek, J., van der Linden, S., & Nygren, T. (2020). Prebunking interventions based on “inoculation” theory can reduce susceptibility to misinformation across cultures. Harvard Kennedy School (HKS) Misinformation Review , 1 (2). https://doi.org/10.37016//mr-2020-008

Sablosky, J. (2021). Dangerous organizations: Facebook’s content moderation decisions and ethnic visibility in Myanmar. Media, Culture & Society . https://doi.org/10.1177/0163443720987751

Simpson, C. (1994). Science of coercion: Communication research and psychological warfare, 1945–1960. Oxford University Press.

Simpson, C. (Ed.). (1998). Universities and empire: Money and politics in the social sciences during the Cold War . The New Press.

Soundararajan, T., Kumar, A., Nair, P., & Greely, J. (2019). Facebook India: Towards the tipping point of violence: Caste and religious hate speech . Equality Labs.

Sproule, J. M. (1997). Propaganda and democracy: The American experience of media and mass persuasion . Cambridge University Press.

Vaidhyanathan, S. (2018). Antisocial media: How Facebook disconnects us and undermines democracy . Oxford University Press.

Wasserman, H. (2020). Fake news from Africa: Panics, politics and paradigms. Journalism , 21 (1), 3–16. https://doi.org/10.1177/1464884917746861

Zhang, Y., & Leung, L. (2015). A review of social networking service (SNS) research in communication journals from 2006 to 2011. New Media & Society , 17 (7), 1007–1024. https://doi.org/10.1177/1461444813520477

Funding

The author received no specific funding for this work.

Competing Interests

No potential conflicts of interest.

Ethics

No institutional review board or ethics committee review for human or animal experiments was required.

Copyright

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are properly credited.

Data Availability

All materials needed to replicate this study are available via the Harvard Dataverse: https://doi.org/10.7910/DVN/ZAFIEN , https://doi.org/10.7910/DVN/SRGKQA , https://doi.org/10.7910/DVN/LA2NLZ .
