Article, 2022

Tweet and user validation with supervised feature ranking and rumor classification

Multimedia Tools and Applications, ISSN 1380-7501, Volume 81, Issue 22, Pages 31907-31927, DOI 10.1007/s11042-022-12616-6

Contributors

  1. Sailunaz K. 0000-0001-8751-4108 [1]
  2. Kawash J. [1]
  3. Alhajj R. 0000-0001-6657-9738 (Corresponding author) [1] [2] [3]

Affiliations

  1. [1] University of Calgary [NORA names: Canada; America, North; OECD]
  2. [2] Istanbul Medipol University [NORA names: Turkey; Asia, Middle East; OECD]
  3. [3] University of Southern Denmark [NORA names: SDU University of Southern Denmark; University; Denmark; Europe, EU; Nordic; OECD]

Abstract

Filtering fake news from social network posts and detecting the users responsible for generating and propagating rumors have become two major issues with the increased popularity of social networking platforms. Because any user can post anything on social media and that post can instantly propagate all over the world, it is important to recognize whether a post is a rumor. Twitter is one of the most popular social networking platforms used for news broadcasting, mostly as tweets and retweets. Hence, validating tweets and users based on their posts and behavior on Twitter has become a social, political and international issue. In this paper, we propose a method to classify rumor and non-rumor tweets by applying a novel tweet and user feature ranking approach with Decision Tree and Logistic Regression to both tweet and user features extracted from the benchmark rumor dataset ‘PHEME’. The effect of the ranking model was then shown by classifying the dataset with the ranked features and comparing the results with the basic classifications using various combinations of features. Both supervised classification algorithms (namely, Support Vector Machine, Naïve Bayes, Random Forest and Logistic Regression) and deep learning algorithms (namely, Convolutional Neural Network and Long Short-Term Memory) were used for rumor detection. The classification accuracy showed that the feature ranking classification results were comparable to the original classification performances. The ranking models were also used to list the topmost tweets and users under different conditions, and the results showed that even though the features were ranked differently by LR and RF, the topmost tweets and users for both rumors and non-rumors were the same.
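
The ranking-then-classification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes synthetic data in place of PHEME, uses hypothetical feature names (`retweet_count`, `follower_count`, `has_url`, `account_age`), and ranks features only by the absolute value of logistic regression weights fitted by plain gradient descent. The Decision Tree ranking and the other classifiers named in the abstract are not reproduced here.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(rows):
    # Scale each column to zero mean and unit variance so weights are comparable.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def fit_logistic(rows, labels, lr=0.1, epochs=300):
    # Batch gradient descent on the log-loss; returns (weights, bias).
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank_features(names, weights):
    # Order feature names by decreasing absolute weight.
    pairs = sorted(zip(names, weights), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, _ in pairs]

# Hypothetical feature names and synthetic data (assumption: not PHEME).
FEATURES = ["retweet_count", "follower_count", "has_url", "account_age"]
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    x = [rng.gauss(0, 1) for _ in FEATURES]
    # The label depends strongly on feature 0, weakly on feature 1,
    # and not at all on the remaining two.
    score = 2.0 * x[0] + 0.5 * x[1] + rng.gauss(0, 0.1)
    rows.append(x)
    labels.append(1 if score > 0 else 0)

X = standardize(rows)
weights, _ = fit_logistic(X, labels)
ranking = rank_features(FEATURES, weights)

# Retrain on the two top-ranked features only and measure training accuracy.
cols = [FEATURES.index(name) for name in ranking[:2]]
X_top = [[r[j] for j in cols] for r in X]
w_top, b_top = fit_logistic(X_top, labels)
preds = [1 if sigmoid(sum(wi * xi for wi, xi in zip(w_top, x)) + b_top) >= 0.5 else 0
         for x in X_top]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Here `ranking` lists the feature names from most to least influential under this weight-magnitude criterion, and `accuracy` is the training accuracy of a classifier restricted to the two top-ranked features. Comparing it against a model trained on all features mirrors, in miniature, the comparison the abstract describes.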

Keywords

Classification, CNN, Logistic regression, LSTM, Naïve Bayes, Random forest, Ranking, Rumors, Social media, Support vector machine, Twitter

Data Provider: Elsevier