Bluesky: Engagement and Replies

Overview

In this analysis we’ll look at whether replying to comments on Bluesky posts is associated with higher engagement.

What We Found

When we compare each Bluesky account to itself over time, posts with replied-to comments receive roughly 5% more engagements than the same account’s posts whose comments went unanswered. Because every post in the sample received at least one comment, this comparison already holds constant whether a post received any comments at all.

This finding is consistent with what we saw on other platforms, though the effect is smaller here. When an account replies to comments on some posts and not on others, the posts with replies tend to see modestly higher engagement, holding constant the account’s stable characteristics such as audience size and niche.

Data Collection

The SQL below returns around 73 thousand Bluesky posts, sent on or after January 1, 2025, that received at least one comment from someone other than the account itself. On Bluesky, engagements include likes, replies (comments), and reposts.

sql <- "
  select
    up.id as post_id,
    up.profile_id,
    up.user_id,
    up.sent_at,
    up.reposts + up.likes + up.comments as engagements,
    count(distinct c._id) as total_comments,
    count(distinct case when c.status = 'replied' then c._id end) as replied_comments,
    count(distinct case when c.status = 'unreplied' then c._id end) as unreplied_comments
  from dbt_buffer.publish_updates as up
  inner join dbt_buffer.community_comments as c
    on up.id = c.post_id
    and c.service_type = 'bluesky'
    and not c.is_own_reply
  where up.sent_at >= '2025-01-01'
    and up.profile_service = 'bluesky'
    and up.reposts + up.likes + up.comments > 0
  group by 1,2,3,4,5
"

# get data from BigQuery
posts <- bq_query(sql = sql)
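The query uses a join-and-aggregate pattern, shown below as a minimal dplyr sketch on hypothetical toy tables. The names toy_posts, toy_comments, toy_result, and comment_id are illustrative stand-ins, not real tables or columns. The sketch mirrors the engagement sum, the inner join, and the distinct counts of replied and unreplied comments. For brevity it omits the service_type, is_own_reply, date, and zero-engagement filters.

```r
library(dplyr)

# Hypothetical toy stand-ins for publish_updates and community_comments
toy_posts <- tibble(
  id       = c(1, 2, 3),
  reposts  = c(1, 0, 4),
  likes    = c(10, 3, 20),
  comments = c(2, 0, 1)
)
toy_comments <- tibble(
  comment_id = c("c1", "c2", "c3"),   # hypothetical stand-in for c._id
  post_id    = c(1, 1, 3),
  status     = c("replied", "unreplied", "unreplied")
)

toy_result <- toy_posts %>%
  # same engagement definition as the SQL: reposts + likes + comments
  mutate(engagements = reposts + likes + comments) %>%
  # inner join: post 2 has no comment rows, so it drops out entirely
  inner_join(toy_comments, by = c("id" = "post_id")) %>%
  group_by(id, engagements) %>%
  summarise(
    total_comments     = n_distinct(comment_id),
    replied_comments   = n_distinct(comment_id[status == "replied"]),
    unreplied_comments = n_distinct(comment_id[status == "unreplied"]),
    .groups = "drop"
  )

toy_result
```

Post 2 never appears in the result. This is why every post in the analysis has at least one comment.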

Data Preparation

First we’ll flag whether each post has any replied-to comments and compute log engagements with log1p, i.e. log(1 + engagements). We’ll then calculate the median number of engagements for posts with and without replied-to comments.

A quick note on the log transformation: engagement counts are long-tailed, meaning most posts get a small number of engagements while a few take off. The log scale reduces variance and the influence of outliers, and it makes effects easier to read as approximate percent changes.
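Both points can be seen in a minimal R sketch on made-up engagement counts that are not drawn from the Bluesky data. The sketch shows how log1p compresses a long tail. It also shows how a difference on the log1p scale converts back into a percent change in one plus engagements.

```r
# Made-up engagement counts: most posts small, one viral outlier
engagements <- c(0, 2, 5, 13, 15, 40, 5000)

# log1p(x) = log(1 + x), so a zero-engagement post maps to 0 instead of -Inf
log_eng <- log1p(engagements)

# Raw scale: the outlier is hundreds of times the median post
raw_ratio <- max(engagements) / median(engagements)

# Log scale: the outlier sits only about six units above the median
log_gap <- max(log_eng) - median(log_eng)

# A difference d on the log1p scale is the ratio exp(d) between
# (1 + engagements) values, so exp(d) - 1 reads as a percent change
d <- log1p(15) - log1p(13)
pct_change <- exp(d) - 1  # equals 16/14 - 1, about 0.143
```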

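library(dplyr)

# flag posts with at least one replied-to comment and compute log(1 + engagements)
posts <- posts %>% 
  mutate(
    has_replied_comments = replied_comments > 0,
    # always TRUE here, because the SQL inner join keeps only posts with comments
    has_any_comments = total_comments > 0,
    log_engagements = log1p(engagements)
  ) %>% 
  filter(!is.na(engagements))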

# calculate summary statistics
posts %>% 
  group_by(has_replied_comments) %>% 
  summarise(
    n = n(),
    median_engagements = median(engagements, na.rm = TRUE),
    median_log_engagements = median(log_engagements, na.rm = TRUE)
  )

# A tibble: 2 × 4
  has_replied_comments     n median_engagements median_log_engagements
  <lgl>                <int>              <dbl>                  <dbl>
1 FALSE                67514                 13                   2.64
2 TRUE                  5119                 15                   2.77

These summary statistics show that posts with replied-to comments have slightly higher median engagement (15 vs. 13). However, only about 5,100 posts have replied-to comments, and this raw comparison mixes large and small accounts, so it should be treated as directional only. A fairer comparison looks at each profile relative to itself over time.

Z-Score Analysis

A Z‑score analysis is a simple way to see how a given post performed relative to the account’s typical performance. Instead of comparing different accounts to each other, which can be unfair because some accounts naturally get more engagement, we compare each account to itself over time. A positive Z‑score means the post performed above that account’s baseline, and a negative Z‑score means it performed below.

For each account, we take the log of engagements (log1p) for every post and calculate that account’s mean and standard deviation. Then, for every post, we compute a Z-score: the post’s log-engagement minus the account’s mean log-engagement, divided by the account’s standard deviation. The result tells us how far above or below the account’s typical engagement that specific post landed. Accounts with fewer than three posts are dropped because their baseline is too noisy. Accounts whose posts all received the same engagement (a standard deviation of zero) are also dropped.
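A tiny example makes the Z-score recipe concrete. This minimal R sketch uses two hypothetical accounts, and toy and toy_z are illustrative names only. It computes the baseline with a grouped mutate instead of the summarise-and-join used in the analysis code. For accounts that pass the three-post filter, the two approaches are equivalent.

```r
library(dplyr)

# Hypothetical toy data: account "a" is small, account "b" is large
toy <- tibble(
  profile_id  = c("a", "a", "a", "b", "b", "b"),
  engagements = c(5, 10, 20, 500, 1000, 2000)
)

toy_z <- toy %>%
  mutate(log_eng = log1p(engagements)) %>%
  group_by(profile_id) %>%
  mutate(
    mean_log = mean(log_eng),                 # the account's own baseline
    sd_log   = sd(log_eng),                   # the account's typical spread
    z        = (log_eng - mean_log) / sd_log  # distance from baseline in SDs
  ) %>%
  ungroup()

toy_z
```

Account a’s best post (20 engagements) gets a positive Z-score. Account b’s weakest post (500) gets a negative one, even though 500 is far larger in raw terms. This is the within-account comparison the Z-score is meant to capture.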

# profile-level baseline on the log scale
profile_stats <- posts %>%
  group_by(profile_id) %>%
  summarise(
    mean_log_eng = mean(log_engagements, na.rm = TRUE),
    sd_log_eng = sd(log_engagements, na.rm = TRUE),
    n_posts = n()
  ) %>%
  filter(n_posts >= 3, sd_log_eng > 0)

posts_z <- posts %>%
  inner_join(profile_stats, by = "profile_id") %>%
  mutate(z_log_engagements = (log_engagements - mean_log_eng) / sd_log_eng)

# mean Z among posts that received any comments
z_any_comments <- posts_z %>%
  filter(has_any_comments) %>%
  group_by(has_replied_comments) %>%
  summarise(mean_z = mean(z_log_engagements, na.rm = TRUE))

z_any_comments

# A tibble: 2 × 2
  has_replied_comments   mean_z
  <lgl>                   <dbl>
1 FALSE                -0.00426
2 TRUE                  0.0558

On average, posts with replied-to comments sit about 0.06 standard deviations above their account’s typical performance. Posts without replied-to comments sit essentially at the baseline (-0.004).

Visualizing the Lift

The density plot below shows this shift. The distribution for posts with replied-to comments (the blue curve) sits slightly to the right of the distribution for posts without them. This is consistent with those posts performing somewhat above their accounts’ baselines.

library(ggplot2)

# Z-score density by replied status (restrict to posts with any comments)
posts_z %>% 
  filter(has_any_comments) %>%
  ggplot(aes(x = z_log_engagements, fill = has_replied_comments)) +
  geom_density(alpha = 0.45) +
  # limits drop any posts with Z outside [-5, 10] before the density is estimated
  scale_x_continuous(limits = c(-5,10)) +
  labs(x = "Within-profile Z (log scale)", y = NULL, fill = "Replied to Comment",
       title = "Distribution of Performance by Replying to Comments",
       subtitle = "Posts with replied comments tend to perform better (higher Z-score)")

The next plot shows the distribution of per-profile differences. For each profile that has both kinds of posts, the difference is the mean Z-score on posts with replied-to comments minus the mean Z-score on posts without them.

The distribution is centered slightly above zero, with a median difference of 0.063 standard deviations, and about 52% of accounts have a positive difference. In other words, only slightly more than half of accounts perform better on posts where they replied. The average effect is small relative to the account-to-account spread.

# Per-profile difference: mean Z (reply) - mean Z (no reply)
profile_pair <- posts_z %>%
  group_by(profile_id, has_replied_comments) %>%
  summarise(mean_z = mean(z_log_engagements, na.rm = TRUE), .groups = 'drop') %>%
  tidyr::pivot_wider(names_from = has_replied_comments, values_from = mean_z)

# profiles with only one kind of post get an NA difference; drop them
diff_df <- profile_pair %>%
  mutate(diff = `TRUE` - `FALSE`) %>%
  filter(!is.na(diff))

share_pos <- mean(diff_df$diff > 0)
med_diff <- median(diff_df$diff)

ggplot(diff_df, aes(x = diff)) +
  geom_histogram(bins = 50, fill = '#2c7fb8', alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = 2, color = 'grey50') +
  geom_vline(xintercept = med_diff, color = '#d95f0e') +
  labs(x = "Per-profile mean Z difference (reply - no reply)", y = NULL,
       title = "Slightly More Than Half of Profiles Perform Better When They Reply",
       subtitle = paste0("Median difference ", round(med_diff, 3), "; ",
                         round(100*share_pos, 1), "% of profiles > 0"))
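A profile that always replied, or never replied, contributes no per-profile difference. This toy R sketch uses made-up Z-scores for three hypothetical profiles (toy_scores, toy_diff, and the toy_ prefixed summaries are illustrative names). It shows how the pivot produces an NA for such a profile, and how the median and share of positive differences are then computed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy Z-scores: profile "c" never replied, so it has
# no within-profile comparison and should drop out
toy_scores <- tibble(
  profile_id           = c("a", "a", "a", "b", "b", "c", "c"),
  has_replied_comments = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  z_log_engagements    = c(0.8, -0.2, -0.6, -0.3, 0.3, 0.5, -0.5)
)

toy_diff <- toy_scores %>%
  group_by(profile_id, has_replied_comments) %>%
  summarise(mean_z = mean(z_log_engagements), .groups = "drop") %>%
  # one column per replied status; "c" gets NA in the TRUE column
  pivot_wider(names_from = has_replied_comments, values_from = mean_z) %>%
  mutate(diff = `TRUE` - `FALSE`) %>%
  filter(!is.na(diff))

toy_share_pos <- mean(toy_diff$diff > 0)  # 0.5: one of the two remaining profiles
toy_med_diff  <- median(toy_diff$diff)    # about 0.3: midpoint of 1.2 and -0.6
```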

Fixed Effects Regression

Next we’ll use fixed effects regression, which compares each account to itself across posts. Fixed effects hold constant all stable differences across profiles, such as audience size, niche, or brand strength, by measuring each post against its own profile’s baseline.

Instead of asking whether accounts that reply more get more engagement (which would mix large and small accounts), we estimate how engagement differs, on average, between posts where the same account replied to comments and posts where it didn’t. Modeling on the log scale also makes the coefficient easy to read as an approximate percent difference.
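The within-account logic can be checked on simulated data. This minimal sketch uses base R rather than fixest, and every number in it is an assumption: 200 hypothetical accounts, a true effect of 0.05, and larger accounts replying more often. It compares a naive pooled regression with two equivalent fixed effects estimators. The first uses one dummy per account, and the second regresses on account-demeaned variables. feols with `| profile_id` computes the same within estimate while absorbing the dummies more efficiently.

```r
set.seed(42)

# Simulated (hypothetical) data: 200 accounts x 10 posts each
n_profiles <- 200
n_per      <- 10
profile_id <- rep(seq_len(n_profiles), each = n_per)

# Each account has its own baseline (audience size, niche, ...)
baseline <- rnorm(n_profiles, mean = 3, sd = 1)[profile_id]

# Assumption: bigger accounts reply more often, confounding a naive comparison
replied <- as.numeric(rbinom(length(profile_id), 1, plogis(baseline - 3)))

# Assumed true within-account effect of replying on log engagements: 0.05
log_eng <- baseline + 0.05 * replied + rnorm(length(profile_id), sd = 0.5)

# Naive pooled regression mixes large and small accounts
naive <- coef(lm(log_eng ~ replied))["replied"]

# Fixed effects via one dummy per account (least-squares dummy variables)
lsdv <- coef(lm(log_eng ~ replied + factor(profile_id)))["replied"]

# Equivalent "within" estimator: subtract each account's mean from both variables
demean <- function(v) v - ave(v, profile_id)
within <- coef(lm(demean(log_eng) ~ demean(replied) - 1))[1]

c(naive = unname(naive), lsdv = unname(lsdv), within = unname(within))
```

In this simulation larger accounts reply more, so the naive slope lands far above 0.05. Both fixed effects estimates land near 0.05 and agree with each other to numerical precision.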

library(fixest)

# FE on log engagements: profile_id fixed effects absorb each account's baseline,
# and standard errors are clustered by profile
fe_model <- feols(
  log_engagements ~ has_replied_comments | profile_id,
  data = posts,
  cluster = "profile_id"
)
summary(fe_model)

OLS estimation, Dep. Var.: log_engagements
Observations: 72,633
Fixed-effects: profile_id: 18,732
Standard-errors: Clustered (profile_id) 
                         Estimate Std. Error t value  Pr(>|t|)    
has_replied_commentsTRUE  0.04968   0.015454 3.21466 0.0013082 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.680879     Adj. R2: 0.712747
                 Within R2: 2.022e-4

The results tell us that posts with replied-to comments receive approximately 5% more engagements (exp(0.0497) − 1 ≈ 0.051) than the same account’s posts without replied-to comments.
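The conversion from log points to percent can be reproduced from the reported estimate. This short R sketch also adds an approximate 95% interval, using a normal approximation with the clustered standard error. That interval is our own illustrative addition rather than part of the model output above. With the fitted model available, confint(fe_model) should report essentially the same interval on the log scale.

```r
# Coefficient and clustered standard error copied from the fixest output above
b  <- 0.04968
se <- 0.015454

# Point estimate as a percent change in (1 + engagements)
pct <- exp(b) - 1  # about 0.051, i.e. roughly 5%

# Approximate 95% interval on the log scale (normal approximation),
# then converted to the percent scale
ci_log <- b + c(-1, 1) * qnorm(0.975) * se
ci_pct <- exp(ci_log) - 1  # roughly 0.02 to 0.083
```

Even the lower end of this interval is positive, but the whole range sits in the low single digits.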

This effect is statistically significant (p ≈ 0.001) but small, and notably smaller than what we observed on the other platforms we analyzed. The within R² of 0.0002 also shows that reply status explains very little of the variation in engagement within an account. Because only about 5 thousand posts have replied-to comments, these results should be viewed as early evidence that we’ll continue to validate as we collect more data.

Caveats

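The main limitation of this analysis is that we don’t have timestamps for when engagements occurred relative to when comment replies were made. Posts that receive high engagement early on may be more likely to attract comments, and creators may be more motivated to reply on posts that are already performing well. This means we can’t definitively say that replying to comments causes higher engagement. The relationship could run in the opposite direction, or both could be driven by other factors such as content quality or timing. A related concern is mechanical. If the comments count used in the engagement total includes the account’s own replies, every reply adds directly to engagements. At the median of about 13 engagements, one extra engagement raises log1p(engagements) by roughly 0.07, which is comparable to the estimated effect of 0.05.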

Additionally, the sample size for Bluesky is small compared to other platforms. With only around 5 thousand of the 73 thousand posts having replied-to comments, these results should be viewed as early directional evidence. Despite these limitations, the fixed effects model and the Z-score analysis both show a positive association. This suggests that replying to comments and post performance tend to move together on Bluesky.