In this analysis we’ll look at whether replying to comments on X posts is associated with higher engagement.
What We Found
When we compare each X account to itself over time, posts with comments that have been replied to outperform the account’s own baseline number of engagements by around 8%, though this effect is only marginally statistically significant.
This finding is consistent with what we saw on other platforms, though the effect size is smaller than for most others and the statistical evidence is weaker. It’s worth noting that X has different account types (free, Basic, Premium, and Premium+), each of which may receive different algorithmic treatment. Our fixed effects model controls for these differences by comparing each account to itself, holding constant the account type and all other profile characteristics.
It’s also worth noting that, as of November 2025, the sample size we’re working with is very small: we only have around 2,100 X posts in our database with comments that have been replied to.
Data Collection
The SQL below returns X posts that have received at least one comment. On X, engagements typically include likes, retweets, and replies.
Code
sql <-" select up.id as post_id , up.profile_id , up.user_id , up.sent_at , up.engagements , up.likes , up.retweets , count(distinct c._id) as total_comments , count(distinct case when c.status = 'replied' then c._id end) as replied_comments , count(distinct case when c.status = 'unreplied' then c._id end) as unreplied_comments from dbt_buffer.publish_updates as up inner join dbt_buffer.community_comments as c on up.id = c.post_id and c.service_type = 'twitter' and not c.is_own_reply where up.sent_at >= '2025-01-01' and up.profile_service = 'twitter' and up.engagements > 0 group by 1,2,3,4,5,6,7"# get data from BigQueryposts <-bq_query(sql = sql)
Data Preparation
First we’ll construct a couple of key indicators, then filter out posts with no engagement data. We’ll then calculate the median number of engagements for posts that did and did not have comments that have been replied to, as sketched below.
A quick note on the use of log transformations: engagement counts are long-tailed, meaning most posts get a small number of engagements while a few take off. The log scale helps reduce variance and the influence of outliers, and makes effects easier to read as approximate percent changes.
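Here is a minimal sketch of this preparation step. The indicator definitions (has_any_comments, has_replied_comments) and the log transform are inferred from the column names used later in the analysis, so treat the exact construction as illustrative.

Code

# construct indicators and the log-scale outcome
# (definitions inferred from the columns used later in this analysis)
library(dplyr)

posts <- posts %>%
  mutate(
    has_any_comments = total_comments > 0,
    has_replied_comments = replied_comments > 0,
    log_engagements = log(engagements)
  ) %>%
  filter(!is.na(engagements), engagements > 0)

# directional comparison: median engagements by replied status
posts %>%
  group_by(has_replied_comments) %>%
  summarise(
    n_posts = n(),
    median_engagements = median(engagements, na.rm = TRUE)
  )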
The sample size is quite small: there are only around 2,100 posts with replied-to comments in our database. Interestingly, those posts received slightly less engagement on average.

We should note that these summary comparisons are directional only. We still need to compare each profile to itself over time to make a fair comparison.
Z-Score Analysis
A Z‑score analysis is a simple way to see how a given post performed relative to the account’s typical performance. Instead of comparing different accounts to each other, which can be unfair because some accounts naturally get more engagement, we compare each account to itself over time. A positive Z‑score means the post performed above that account’s baseline, and a negative Z‑score means it performed below.
For each account, we take the log of engagements for every post and calculate that account’s average and standard deviation. Then, for every post, we compute a Z‑score: the post’s log‑engagement minus the account’s average log‑engagement, divided by the account’s standard deviation.
The resulting metric tells us how far above or below an account’s typical post engagement that specific post landed. This approach is more appropriate than using simple summary statistics because it focuses on within‑account differences.
Code
# profile-level baseline on the log scale
profile_stats <- posts %>%
  group_by(profile_id) %>%
  summarise(
    mean_log_eng = mean(log_engagements, na.rm = TRUE),
    sd_log_eng = sd(log_engagements, na.rm = TRUE),
    n_posts = n()
  ) %>%
  filter(n_posts >= 3, sd_log_eng > 0)

posts_z <- posts %>%
  inner_join(profile_stats, by = "profile_id") %>%
  mutate(z_log_engagements = (log_engagements - mean_log_eng) / sd_log_eng)

# mean Z among posts that received any comments
z_any_comments <- posts_z %>%
  filter(has_any_comments) %>%
  group_by(has_replied_comments) %>%
  summarise(mean_z = mean(z_log_engagements, na.rm = TRUE))

z_any_comments
The Z-scores show us that posts with replied-to comments tend to sit slightly above the account’s typical performance level, while posts without replied-to comments sit just below.
Visualizing the Lift
The distributions below help us visualize the shift in engagement. For posts with replied-to comments (the blue distribution), the shape looks roughly the same as for posts without replied-to comments, consistent with the small effect we found. Interestingly, both distributions look bimodal, suggesting that some posts get very little engagement while others get quite a bit more. This could reflect a separation between free and paid accounts on X.
Code
# Z-score density by replied status (restrict to posts with any comments)
posts_z %>%
  filter(has_any_comments) %>%
  ggplot(aes(x = z_log_engagements, fill = has_replied_comments)) +
  geom_density(alpha = 0.45) +
  scale_x_continuous(limits = c(-5, 10)) +
  labs(
    x = "Within-profile Z (log scale)",
    y = NULL,
    fill = "Replied to Comment",
    title = "Distribution of Performance by Replying to Comments",
    subtitle = "Posts with replied comments tend to perform better (higher Z-score)"
  )
The plot below shows the distribution of per‑profile differences. Basically, it visualizes how much better or worse each profile tends to perform when it replies to comments versus when it doesn’t.
The distribution is centered very close to zero (median difference of 0.065), indicating that only a slight majority of accounts tend to perform better when they reply to comments. About 51% of accounts have a positive difference, suggesting the effect is quite small and mixed across accounts.
Code
# Per-profile difference: mean Z (reply) - mean Z (no reply)
profile_pair <- posts_z %>%
  group_by(profile_id, has_replied_comments) %>%
  summarise(mean_z = mean(z_log_engagements, na.rm = TRUE), .groups = 'drop') %>%
  tidyr::pivot_wider(names_from = has_replied_comments, values_from = mean_z)

diff_df <- profile_pair %>%
  mutate(diff = `TRUE` - `FALSE`)

share_pos <- mean(diff_df$diff > 0, na.rm = TRUE)
med_diff <- median(diff_df$diff, na.rm = TRUE)

ggplot(diff_df, aes(x = diff)) +
  geom_histogram(bins = 50, fill = '#2c7fb8', alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = 2, color = 'grey50') +
  geom_vline(xintercept = med_diff, color = '#d95f0e') +
  labs(
    x = "Per-profile mean Z difference (reply - no reply)",
    y = NULL,
    title = "Most Profiles Perform Better When They Reply",
    subtitle = paste0(
      "Median difference ", round(med_diff, 3), "; ",
      round(100 * share_pos, 1), "% of profiles > 0"
    )
  )
Fixed Effects Regression
Next we’ll use fixed effects regression to build within‑profile models that compare each account to itself across posts. Fixed effects hold constant all time‑invariant differences across profiles (things like audience size, niche, or brand strength) by comparing each profile to its own baseline.
Instead of asking whether accounts that reply more get more engagement (which would mix large and small accounts), we’re calculating how engagement changes for each individual account when it replies to comments versus when it doesn’t. Modeling on the log scale also makes coefficients easy to read as approximate percent differences.
Code
# FE on log engagements, clustered by profile
fe_model <- feols(
  log_engagements ~ has_replied_comments | profile_id,
  data = posts,
  cluster = "profile_id"
)

summary(fe_model)
The results tell us that posts with comments that have been replied to receive approximately 8% (exp(0.081) − 1) more engagements on average than posts without replied comments. However, this effect is only marginally statistically significant (p = 0.097), suggesting we should interpret this finding with caution.
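For reference, the conversion from the log-scale coefficient to an approximate percent lift looks like the snippet below. The coefficient name assumes the logical has_replied_comments flag used in the model above; check names(coef(fe_model)) against your own model object.

Code

# back out the approximate percent lift from the FE coefficient
# (coefficient name assumed for a logical predictor; verify with names(coef(fe_model)))
coef_reply <- coef(fe_model)[["has_replied_commentsTRUE"]]
pct_lift <- exp(coef_reply) - 1
pct_lift  # roughly 0.08, i.e. an ~8% lift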
The effect size is notably smaller than what we observed on other platforms we analyzed, and the weaker statistical significance indicates less certainty about the relationship. This could be partly explained by the small sample size, with only around 2,100 posts having replied comments out of 30,000 total.
It’s also worth noting that X has multiple account types (free, Basic, Premium, and Premium+), each of which may receive different levels of treatment from X’s algorithm. Our fixed effects approach controls for these differences by comparing each profile to its own baseline, which means we’re holding constant the account type and all other time-invariant profile characteristics. This makes our estimates more reliable by ensuring we’re not confounding the effect of replying to comments with the effect of having a Premium account, for example.
Caveats
The main limitation of this analysis is that we don’t have timestamps for when engagements occurred relative to when comment replies were made. It’s possible that posts receiving high engagement early on are more likely to generate comments, and creators may be more motivated to reply to comments on posts that are already performing well. This means we can’t definitively say that replying to comments causes higher engagement. The relationship could go in the opposite direction, or both could be driven by other factors like content quality or timing. Despite this limitation, the consistent positive association we see across both fixed effects models and Z-score analyses suggests that comment engagement and post performance tend to move together on X.