---
title: "Introduction to CooRTweet"
author: "Nicola Righetti & Paul Balluff"
date: "2024-11-15"
output: rmarkdown::github_document
vignette: >
  %\VignetteIndexEntry{Introduction to CooRTweet}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---
## Overview
`CooRTweet` is an R package for detecting and analyzing coordinated behavior on social media. It is named after Twitter, a quintessential site for coordinated message amplification through features such as hashtags and trending topics, but it applies to any social media platform and supports mono-platform, multi-platform, and cross-platform datasets. It is also content-independent: the shared objects can be hashtags, URLs, images, or any other object of interest to the researcher. The package allows flexible thresholds for identifying coordination while also accounting for the uncoordinated network in which the coordination is embedded. CooRTweet is one of the first software tools for coordination detection to have undergone rigorous validation. Its output lets researchers explore networks of coordinated activity.
## Installation
You can install the `CooRTweet` package from CRAN or GitHub:
``` r
# Install from CRAN
install.packages("CooRTweet")
# Or install the development version from GitHub
devtools::install_github("username/CooRTweet") # Replace with actual GitHub repository
```

- `igraph` objects for further analysis and visualization.
- `simulate_data` function to generate synthetic coordinated networks.

The input dataset should include the following columns:

- `object_id`: A unique identifier for the shared content (data type: character).
- `account_id`: The user account identifier (data type: character).
- `content_id`: The unique ID of the post (data type: character).
- `timestamp_share`: The timestamp when the content was shared (data type: integer, UNIX format).

Example:

``` r
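# A minimal sketch (not taken from the package documentation) of assembling an
# input table with the four required columns, using base R only. The raw
# column names (url, user, post, posted_at) and all values are hypothetical.
raw_posts <- data.frame(
  url = c("https://example.org/a", "https://example.org/a"),
  user = c("acc_1", "acc_2"),
  post = c("p_001", "p_002"),
  posted_at = c("2021-06-16 22:04:51", "2021-06-16 22:05:20"),
  stringsAsFactors = FALSE
)
toy_input <- data.frame(
  object_id = raw_posts$url,
  account_id = raw_posts$user,
  content_id = raw_posts$post,
  # Convert date-time strings to integer UNIX seconds (UTC is assumed here)
  timestamp_share = as.integer(as.POSIXct(raw_posts$posted_at, tz = "UTC")),
  stringsAsFactors = FALSE
)
str(toy_input)
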
library(CooRTweet)
head(russian_coord_tweets)
#> object_id account_id
#> 1 85d2d12251a735ce05255061f7f231e2 0fb4232d1b7b37069c13ee17579bd10e
#> 2 89864519cd34cabd6f5a801b6857fea6 7d0e462c10d52c4ec1db5af953bf9b26
#> 3 6f04c951961f5cf8c05df6f284fc7c17 2e1140330c02584ffabaf1da362f8e10
#> 4 025b1b3dc82df1cc6c6b766e9c651251 38b5eea36ec86e4a9aac60980ebf6526
#> 5 025b1b3dc82df1cc6c6b766e9c651251 536783d1fd886a85ab697f299f153d3b
#> 6 280d86b602da34926d3797b94d0a7e15 285ddd6b5c9b35ea56257071d4cb6b4d
#> content_id timestamp_share
#> 1 114e22f1b2b648528277b76f0b0224e7 1623881091
#> 2 60e1901aa670f6c05a2d7f7e74cadaf3 1623879483
#> 3 85d2d12251a735ce05255061f7f231e2 1623865591
#> 4 321677c8cae729398e86c5c044b3fbba 1623864621
#> 5 707b5dc150f7405711f992341d0dd32f 1623864188
#> 6 c2592fc64962853976bb75d183a4301b 1623656233
```

Use the `detect_groups()` function to find groups of accounts coordinating within a specified time window (`time_window`, in seconds).

``` r
result <- detect_groups(
  x = russian_coord_tweets,
  min_participation = 2,
  time_window = 60
)
head(result)
#> object_id content_id
#> <char> <char>
#> 1: 4e04165c28ea7dd3cf4c8512c3f490d7 2944a8d714bba22c120aad58fca851d8
#> 2: 4e04165c28ea7dd3cf4c8512c3f490d7 7b92b077328ca532e5d1c4781484ce35
#> 3: 4e04165c28ea7dd3cf4c8512c3f490d7 7b92b077328ca532e5d1c4781484ce35
#> 4: 4e04165c28ea7dd3cf4c8512c3f490d7 e9a4787f1d970dbe4c3868a783ba1535
#> 5: 4e04165c28ea7dd3cf4c8512c3f490d7 1283205f0e23fda216c4de27bec4df80
#> 6: bc46f3cae46cd00726c4c1992145ae20 54cecb686722d539dd556a9c6e8d72e0
#> content_id_y time_delta account_id
#> <char> <num> <char>
#> 1: 354126c2d9e2e69676c9dbdcc167d3d7 36 d9fe8e4d34b6dcfba8cbcf1be4b28717
#> 2: 354126c2d9e2e69676c9dbdcc167d3d7 59 94b2413eb4e850246c07ba1bd55625c2
#> 3: 2944a8d714bba22c120aad58fca851d8 23 94b2413eb4e850246c07ba1bd55625c2
#> 4: c5b0dcb930f979202600a59bf64db452 48 7289281c087ccc0342d96604243d0069
#> 5: e9a4787f1d970dbe4c3868a783ba1535 41 6c051ab25467ae690fb24cd2c6c3ad99
#> 6: cf2b35e31413d2da92940bc571c2c6a2 5 f442b084eb6be4c7f66dffa386c01e4b
#> account_id_y
#> <char>
#> 1: 0fb4232d1b7b37069c13ee17579bd10e
#> 2: 0fb4232d1b7b37069c13ee17579bd10e
#> 3: d9fe8e4d34b6dcfba8cbcf1be4b28717
#> 4: 47a750359d66201ddefe7f7efbfed0b9
#> 5: 7289281c087ccc0342d96604243d0069
#> 6: 3dced626839250a9c9bf41a381234214
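
# Conceptual sketch only (base R, not CooRTweet's implementation): how a time
# window turns shares of the same object into candidate account pairs.
# `toy_shares`, `window_sec`, and all values are hypothetical.
toy_shares <- data.frame(
  object_id = c("obj_1", "obj_1", "obj_1", "obj_2"),
  account_id = c("acc_a", "acc_b", "acc_c", "acc_a"),
  timestamp_share = c(100L, 130L, 400L, 500L),
  stringsAsFactors = FALSE
)
window_sec <- 60
# Self-join on object_id to enumerate pairs of shares of the same object
pairs <- merge(toy_shares, toy_shares, by = "object_id", suffixes = c("", "_y"))
# Keep each unordered pair of distinct accounts once
pairs <- pairs[pairs$account_id < pairs$account_id_y, ]
# Time elapsed between the two shares, in seconds
pairs$time_delta <- abs(pairs$timestamp_share - pairs$timestamp_share_y)
# Only pairs that shared the object within the window remain
toy_pairs <- pairs[pairs$time_delta <= window_sec, ]
toy_pairs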
```

Convert detected groups into a coordination network using `generate_coordinated_network()`.

``` r
graph <- generate_coordinated_network(
  result,
  edge_weight = 0.5
)
graph
#> IGRAPH f08c98f UNW- 2110 3721 --
#> + attr: name (v/c), weight (e/n), avg_time_delta (e/n), n_content_id
#> | (e/n), n_content_id_y (e/n), edge_symmetry_score (e/n),
#> | weight_threshold (e/n)
#> + edges from f08c98f (vertex names):
#> [1] 0fb4232d1b7b37069c13ee17579bd10e--d9fe8e4d34b6dcfba8cbcf1be4b28717
#> [2] 0fb4232d1b7b37069c13ee17579bd10e--94b2413eb4e850246c07ba1bd55625c2
#> [3] 94b2413eb4e850246c07ba1bd55625c2--d9fe8e4d34b6dcfba8cbcf1be4b28717
#> [4] 47a750359d66201ddefe7f7efbfed0b9--7289281c087ccc0342d96604243d0069
#> [5] 6c051ab25467ae690fb24cd2c6c3ad99--7289281c087ccc0342d96604243d0069
#> [6] 3dced626839250a9c9bf41a381234214--f442b084eb6be4c7f66dffa386c01e4b
#> + ... omitted several edges
```

To analyze multiple types of content (e.g., URLs, hashtags), run `detect_groups()` separately for each type and combine the results.

``` r
# Example datasets for different content types
head(german_elections)
#> # A data frame: 6 × 7
#> account_id post_id url_id hashtag_id domain_id phash_id timestamp
#> * <chr> <int> <int> <int> <int> <int> <dbl>
#> 1 fb_12670 129235 23678 NA 3498 NA 1629589836
#> 2 fb_5966 84441 NA NA 6756 NA 1629589069
#> 3 fb_5966 84443 29871 NA 5534 NA 1629589050
#> 4 fb_5966 84445 NA NA NA 9280 1629589022
#> 5 fb_5966 84446 30435 NA 5639 NA 1629589009
#> 6 fb_9045 104337 13609 NA 5804 NA 1629588823
# Prepare data --------------------
# URLs
urls_data <- prep_data(german_elections,
                       object_id = "url_id",
                       account_id = "account_id",
                       content_id = "post_id",
                       timestamp_share = "timestamp")
urls_data <- unique(urls_data,
                    by = c("object_id", "account_id", "content_id", "timestamp_share"))
urls_data <- urls_data[!is.na(object_id)]
urls_data$object_id <- paste0("url_", urls_data$object_id)
# images (pHash)
img_data <- prep_data(german_elections,
                      object_id = "phash_id",
                      account_id = "account_id",
                      content_id = "post_id",
                      timestamp_share = "timestamp")
img_data <- unique(img_data,
                   by = c("object_id", "account_id", "content_id", "timestamp_share"))
img_data <- img_data[!is.na(object_id)]
img_data$object_id <- paste0("hash_", img_data$object_id)
# Detect coordinated groups for URLs and images --------------------
result_urls <- detect_groups(urls_data, time_window = 30,
                             min_participation = 2)
result_images <- detect_groups(img_data, time_window = 30,
                               min_participation = 2)
# Combine results --------------------
library(data.table)
combined_results <- rbindlist(
  list(result_urls, result_images),
  use.names = TRUE,
  fill = TRUE
)
# Generate the coordinated multi-modal network --------------------
graph <- generate_coordinated_network(combined_results, edge_weight = 0.5)
graph
#> IGRAPH 0d2f829 UNW- 671 1732 --
#> + attr: name (v/c), weight (e/n), avg_time_delta (e/n), n_content_id
#> | (e/n), n_content_id_y (e/n), edge_symmetry_score (e/n),
#> | weight_threshold (e/n)
#> + edges from 0d2f829 (vertex names):
#> [1] fb_5761 --fb_7103 fb_12065--fb_4039 fb_11199--fb_3297 fb_11202--fb_18649
#> [5] fb_2258 --fb_4039 fb_21069--fb_9754 fb_11202--fb_21069 fb_11202--fb_4039
#> [9] fb_11202--fb_9754 fb_21069--fb_18649 fb_18649--fb_4039 fb_14401--fb_8707
#> [13] fb_18649--fb_9754 fb_4039 --fb_7548 fb_12670--fb_17402 fb_4039 --fb_8027
#> [17] fb_17402--fb_8027 fb_4039 --fb_8900 fb_8027 --fb_8900 fb_17402--fb_8900
#> [21] fb_17326--fb_7548 fb_17326--fb_3732 fb_3732 --fb_7548 fb_17326--fb_20736
#> + ... omitted several edges
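
# A hedged sketch of exploring a coordinated network with igraph. It builds a
# hypothetical toy graph (`toy_edges`, `toy_graph`) so that it runs on its own;
# the same igraph calls should also work on an igraph object such as `graph`.
library(igraph)
toy_edges <- data.frame(
  from = c("acc_a", "acc_a", "acc_b", "acc_d"),
  to = c("acc_b", "acc_c", "acc_c", "acc_e"),
  weight = c(3, 1, 2, 5)
)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)
# Connected components can be read as candidate coordinated clusters
comp <- components(toy_graph)
comp$no
# Rank accounts by their number of coordinated partners
sort(degree(toy_graph), decreasing = TRUE)
# Drop weak ties: remove edges whose weight is below an arbitrary cutoff of 2
strong <- delete_edges(toy_graph, E(toy_graph)[weight < 2])
ecount(strong)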
```

The `CooRTweet` package includes several additional functions and features that enable refined exploration of coordinated networks, as detailed in the package documentation.

The `CooRTweet` package enables researchers to study coordinated behaviors with a high degree of flexibility and precision. Its generalized architecture makes it adaptable to various contexts and datasets, empowering social media research and analysis.