LinkedIn: Engagement and Replies

Overview

In this analysis we’ll look at whether replying to comments on LinkedIn posts is associated with higher engagement.

What We Found

When we compare each LinkedIn account to itself over time, posts on which the account replied to at least one comment receive roughly 30% more engagement than the same account’s posts with no replied comments. Every post in the sample received at least one comment, so this comparison already holds the presence of comments constant.

The direction and magnitude are consistent with what we found in the Threads analysis: engaging in the comments tends to correlate with more engagement on LinkedIn posts.

Data Collection

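The SQL below returns approximately 72,000 LinkedIn posts, sent on or after January 1, 2025, that have received at least one comment from someone other than the account itself. On LinkedIn, engagements include reactions, comments, shares, and clicks. When the total engagement count is missing, the query falls back to likes plus comments.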

Code

sql <- "
  select
    up.id as post_id,
    up.profile_id,
    up.user_id,
    up.sent_at,
    up.likes,
    up.comments,
    up.shares,
    coalesce(up.engagements, up.likes + up.comments) as engagements,
    count(distinct c._id) as total_comments,
    count(distinct case when c.status = 'replied' then c._id end) as replied_comments,
    count(distinct case when c.status = 'unreplied' then c._id end) as unreplied_comments
  from dbt_buffer.publish_updates as up
  inner join dbt_buffer.community_comments as c
    on up.id = c.post_id
    and c.service_type = 'linkedin'
    and not c.is_own_reply
  where up.sent_at >= '2025-01-01'
    and up.profile_service = 'linkedin'
  group by 1,2,3,4,5,6,7,8
"

# get data from BigQuery
posts <- bq_query(sql = sql)

Data Preparation

First we’ll construct a few core metrics and indicators, then drop posts with missing engagement data. We’ll then calculate a couple of summary statistics to get a sense of what the data look like.

A quick note on the log transformation: engagement counts are long‑tailed, meaning most posts get a modest number of engagements while a few take off. Working on the log scale compresses that tail, which reduces the influence of outliers and makes effects easier to read as approximate percent changes.
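To make the log scale concrete, here is a small base-R check. The analysis uses `log1p(x)`, which is log(1 + x), so posts with zero engagements map to 0 instead of `-Inf`. A difference of d on this scale corresponds to roughly exp(d) − 1 more engagement (strictly, more of 1 + engagements). The values 25 and 32 are the two group medians reported in the summary table, and the helper `pct_change` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: convert a difference on the log1p scale
# into a percent change in (1 + engagements)
pct_change <- function(d) 100 * (exp(d) - 1)

# log1p() handles zero-engagement posts gracefully
log1p(0)   # 0, whereas log(0) is -Inf

# Median engagements for posts without and with replied comments
med <- c(no_reply = 25, reply = 32)
log_med <- log1p(med)
round(log_med, 2)   # 3.26 and 3.50, matching the summary table

# The gap on the log scale translates to roughly a 27% difference
gap <- unname(log_med["reply"] - log_med["no_reply"])
round(pct_change(gap), 1)
```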

Code

library(dplyr)

# add comment/reply indicators and log-transformed engagements,
# then drop posts with missing engagement data
posts <- posts %>% 
  mutate(
    has_replied_comments = replied_comments > 0,
    has_any_comments = total_comments > 0,
    log_engagements = log1p(engagements)
  ) %>% 
  filter(!is.na(engagements))

# calculate summary statistics
posts %>% 
  group_by(has_replied_comments) %>% 
  summarise(
    n = n(),
    median_engagements = median(engagements, na.rm = TRUE),
    median_log_engagements = median(log_engagements, na.rm = TRUE)
  )

# A tibble: 2 × 4
  has_replied_comments     n median_engagements median_log_engagements
  <lgl>                <int>              <dbl>                  <dbl>
1 FALSE                66552                 25                   3.26
2 TRUE                  5659                 32                   3.50

These summary statistics show that relatively few posts have at least one replied comment so far: only 5,659 of the 72,211 posts with engagement data. Those posts do show higher engagement, with a median of 32 engagements versus 25 for posts without replied comments.

We should note that these comparisons are directional only. They pool large and small accounts together, so we still need to compare each profile to itself over time to make a fair comparison.

Z-Score Analysis

A Z‑score analysis is a simple way to see how a given post performed relative to the account’s typical performance. Instead of comparing different accounts to each other, which can be unfair because some accounts naturally get more engagement, we compare each account to itself over time. A positive Z‑score means the post performed above that account’s baseline, and a negative Z‑score means it performed below.

For each account, we take the log of engagements for every post and calculate the account’s mean and standard deviation. Then, for every post, we compute a Z‑score: the post’s log‑engagement minus the account’s mean log‑engagement, divided by the account’s standard deviation. We keep only accounts with at least three posts and some variation in engagement, because otherwise the standard deviation is undefined or zero.
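A toy base-R version of the same calculation may help. The engagement counts and the profile names `a`, `b`, and `c` below are made up for illustration. `ave()` computes a per-profile statistic and returns it aligned with each row, mirroring the `group_by()`/`summarise()`/`inner_join()` steps in the analysis code. Profile `c` has a single post, so it is dropped by the same rule the analysis applies (`n_posts >= 3`, `sd_log_eng > 0`).

```r
# Made-up engagement counts for three hypothetical profiles
toy <- data.frame(
  profile_id  = c("a", "a", "a", "a", "b", "b", "b", "c"),
  engagements = c(10, 20, 40, 80, 100, 100, 300, 5)
)
toy$log_eng <- log1p(toy$engagements)

# Per-profile baseline, repeated on every row of that profile
toy$n_posts  <- ave(toy$log_eng, toy$profile_id, FUN = length)
toy$mean_log <- ave(toy$log_eng, toy$profile_id, FUN = mean)
toy$sd_log   <- ave(toy$log_eng, toy$profile_id, FUN = sd)

# Keep profiles with >= 3 posts and some variation (sd is NA for a single post)
toy <- toy[toy$n_posts >= 3 & !is.na(toy$sd_log) & toy$sd_log > 0, ]

# Z-score: distance from the profile's own mean, in units of its own sd
toy$z <- (toy$log_eng - toy$mean_log) / toy$sd_log
toy
```

By construction, each remaining profile’s Z-scores have mean 0 and standard deviation 1, which is what makes them comparable across accounts of very different sizes.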

The resulting metric tells us how far above or below an account’s typical post that specific post landed. This approach is more appropriate than using simple summary statistics because it focuses on within‑account differences.

Code

# profile-level baseline on the log scale
profile_stats <- posts %>%
  group_by(profile_id) %>%
  summarise(
    mean_log_eng = mean(log_engagements, na.rm = TRUE),
    sd_log_eng = sd(log_engagements, na.rm = TRUE),
    n_posts = n()
  ) %>%
  filter(n_posts >= 3, sd_log_eng > 0)

posts_z <- posts %>%
  inner_join(profile_stats, by = "profile_id") %>%
  mutate(z_log_engagements = (log_engagements - mean_log_eng) / sd_log_eng)

# mean Z among posts that received any comments
z_any_comments <- posts_z %>%
  filter(has_any_comments) %>%
  group_by(has_replied_comments) %>%
  summarise(mean_z = mean(z_log_engagements, na.rm = TRUE))

z_any_comments

# A tibble: 2 × 2
  has_replied_comments  mean_z
  <lgl>                  <dbl>
1 FALSE                -0.0208
2 TRUE                  0.236

The Z-scores tell us that, on average, posts with replied comments sit about 0.24 standard deviations above the account’s typical log‑engagement, while posts without replied comments sit just below it (about -0.02).

Visualizing the Lift

The distributions below help us visualize the shift in engagement. The distribution for posts with replied comments (TRUE in the legend) is shifted to the right, meaning those posts tend to land above their account’s baseline.

Code

library(ggplot2)

# Z-score density by replied status (restrict to posts with any comments)
posts_z %>% 
  filter(has_any_comments) %>%
  ggplot(aes(x = z_log_engagements, fill = has_replied_comments)) +
  geom_density(alpha = 0.45) +
  # zoom with coord_cartesian() so no posts are dropped before the densities are estimated
  coord_cartesian(xlim = c(-5, 10)) +
  labs(x = "Within-profile Z (log scale)", y = NULL, fill = "Replied to Comment",
       title = "Distribution of Performance by Replying to Comments",
       subtitle = "Posts with replied comments tend to perform better (higher Z-score)")

The plot below shows the distribution of per‑profile differences: for each profile, the mean Z‑score of its posts with replied comments minus the mean Z‑score of its posts without them. Only profiles that have both kinds of posts contribute.

The distribution is centered above 0, with a median difference of about 0.87 standard deviations, indicating that most of these accounts perform better on posts where they reply to comments. Roughly 83% of them have a positive difference.

Code

# Per-profile difference: mean Z (reply) - mean Z (no reply)
profile_pair <- posts_z %>%
  group_by(profile_id, has_replied_comments) %>%
  summarise(mean_z = mean(z_log_engagements, na.rm = TRUE), .groups = 'drop') %>%
  tidyr::pivot_wider(names_from = has_replied_comments, values_from = mean_z)

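# keep only profiles that have posts both with and without replied comments
diff_df <- profile_pair %>%
  mutate(diff = `TRUE` - `FALSE`) %>%
  filter(!is.na(diff))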

share_pos <- mean(diff_df$diff > 0, na.rm = TRUE)
med_diff <- median(diff_df$diff, na.rm = TRUE)

ggplot(diff_df, aes(x = diff)) +
  geom_histogram(bins = 50, fill = '#2c7fb8', alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = 2, color = 'grey50') +
  geom_vline(xintercept = med_diff, color = '#d95f0e') +
  labs(x = "Per-profile mean Z difference (reply - no reply)", y = NULL,
       title = "Most Profiles Perform Better When They Reply",
       subtitle = paste0("Median difference ", round(med_diff, 3), "; ",
                         round(100*share_pos, 1), "% of profiles > 0"))
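The per-profile difference can be sketched in base R on made-up Z-scores. The key detail, which also holds for the `pivot_wider()` code above, is that a profile only gets a difference if it has posts both with and without replied comments. Otherwise one side is missing, and the profile drops out of both the median and the share of positive differences. The profile names and values here are hypothetical.

```r
# Made-up within-profile Z-scores for four hypothetical profiles
toy_z <- data.frame(
  profile_id = c("a", "a", "b", "b", "b", "c", "c", "d"),
  replied    = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  z          = c(0.9, -0.3, 0.2, 0.1, -0.4, -0.5, 0.4, 0.0)
)

# Mean Z per profile and reply status, as a profile x status matrix
m <- tapply(toy_z$z, list(toy_z$profile_id, toy_z$replied), mean)

# Difference (reply - no reply); NA when a profile lacks one of the groups
reply_gap <- m[, "TRUE"] - m[, "FALSE"]
reply_gap <- reply_gap[!is.na(reply_gap)]   # profile "d" has no replied posts

median(reply_gap)
mean(reply_gap > 0)   # share of profiles that do better when they reply
```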

Fixed Effects Regression

Next we’ll use fixed‑effects regression to build a within‑profile model that compares each account to itself across posts. Profile fixed effects absorb every stable difference between profiles, such as audience size, niche, or brand strength, by measuring each post against its own profile’s baseline. They do not account for factors that change from post to post, such as content quality or timing.

Instead of asking whether accounts that reply more get more engagement (which would mix large and small accounts), we ask how engagement changes for each individual account when it replies to comments versus when it doesn’t. Modeling on the log scale also makes coefficients easy to read as approximate percent differences.
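One way to see what profile fixed effects do is the "within" transformation. Subtracting each profile’s mean from both the outcome and the predictor and then running plain OLS gives the same slope as including a dummy variable for every profile (the Frisch-Waugh-Lovell result). The simulation below is a made-up illustration in base R, not the `fixest` internals, and the true effect of 0.26 is an arbitrary choice. Profiles with larger baselines are simulated to reply more often, so a pooled regression that ignores profiles is biased upward, while both fixed-effects versions recover the within-profile effect.

```r
set.seed(42)

# Simulate 200 hypothetical profiles with 10 posts each
n_prof   <- 200
n_post   <- 10
prof_idx <- rep(seq_len(n_prof), each = n_post)
profile  <- factor(prof_idx)

# Profile baselines (audience size, niche, ...) vary a lot across profiles
baseline <- rnorm(n_prof, mean = 3, sd = 1)
# Profiles with bigger baselines also reply more often (a confounder)
p_reply <- plogis(baseline[prof_idx] - 3)
replied <- as.numeric(rbinom(n_prof * n_post, 1, p_reply))

# Log engagement: profile baseline + assumed true reply effect of 0.26 + noise
y <- baseline[prof_idx] + 0.26 * replied + rnorm(n_prof * n_post, sd = 0.6)

# 1) Pooled OLS ignores profiles and mixes large and small accounts
pooled_est <- coef(lm(y ~ replied))["replied"]

# 2) Dummy-variable fixed effects: one intercept per profile
lsdv_est <- coef(lm(y ~ replied + profile))["replied"]

# 3) Within transformation: demean y and x by profile, then plain OLS
y_dm <- y - ave(y, profile)
x_dm <- replied - ave(replied, profile)
within_est <- coef(lm(y_dm ~ x_dm - 1))["x_dm"]

round(c(pooled = unname(pooled_est), lsdv = unname(lsdv_est),
        within = unname(within_est)), 3)
```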

Code

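library(fixest)

# FE on log engagements, clustered by profile.
# has_any_comments is TRUE for every post (the query only returns commented posts),
# so feols() drops it as collinear with the profile fixed effects.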
fe_model <- feols(
  log_engagements ~ has_replied_comments + has_any_comments | profile_id,
  data = posts,
  cluster = "profile_id"
)

The variable 'has_any_commentsTRUE' has been removed because of collinearity (see $collin.var).

Code

summary(fe_model)

OLS estimation, Dep. Var.: log_engagements
Observations: 72,211
Fixed-effects: profile_id: 24,963
Standard-errors: Clustered (profile_id) 
                         Estimate Std. Error t value  Pr(>|t|)    
has_replied_commentsTRUE 0.264874   0.014581 18.1653 < 2.2e-16 ***
... 1 variable was removed because of collinearity (has_any_commentsTRUE)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.629011     Adj. R2: 0.640445
                 Within R2: 0.007206

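The results tell us that, within the same profile, posts with replied comments get around 30% (exp(0.265) − 1) more engagement on average than posts without replied comments. Because the model uses log(1 + engagements), this is an approximate percent difference. The has_any_comments control was dropped because every post in the sample already has at least one comment, so the comparison is among commented posts by construction.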
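The conversion from the log-scale coefficient to a percent difference, along with an approximate 95% interval from the clustered standard error, can be reproduced directly from the summary output. This sketch uses a normal approximation (`qnorm(0.975)`, about 1.96), which should be very close to the t critical value given roughly 25,000 clusters. With the fitted model object, `confint(fe_model)` should give the log-scale interval directly.

```r
# Coefficient and clustered SE copied from summary(fe_model)
est <- 0.264874
se  <- 0.014581

# Approximate 95% interval on the log scale (normal approximation)
ci_log <- est + c(-1, 1) * qnorm(0.975) * se

# Convert log-scale differences into percent differences
pct <- 100 * (exp(c(estimate = est, lower = ci_log[1], upper = ci_log[2])) - 1)
round(pct, 1)
```

The point estimate works out to about 30%, with an interval of roughly 27% to 34%.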

Caveats

The main limitation of this analysis is that we don’t have timestamps for when engagements occurred relative to when comment replies were made. It’s possible that posts receiving high engagement early on attract more comments, which creates more opportunities to reply, and creators may be more motivated to reply to comments on posts that are already performing well. This means we can’t definitively say that replying to comments causes higher engagement. The relationship could run in the opposite direction, or both could be driven by other factors like content quality or timing.

Additionally, posts with replied comments make up a small share of the sample: only 5,659 of 72,211 posts. These results should be viewed as directional evidence that we’ll continue to validate as we collect more comment data over time. Our engagement metric also includes comments, which couples the outcome to the behavior we’re studying. Posts with more comments both score higher on engagement and offer more chances to reply, and the account’s own replies may themselves be counted as comments. The fixed-effects design does not remove this coupling, because it only absorbs stable differences between profiles. Future analyses should therefore exclude comments from the engagement calculation for a cleaner test.
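Here is a minimal sketch of the cleaner outcome suggested above. It assumes that `engagements` includes the `comments` count, as the definition of LinkedIn engagements implies, so that subtracting it leaves reactions, shares, and clicks. The toy data frame is made up, and `engagements_ex_comments` is a hypothetical column name. Clamping at zero guards against rows where the two fields disagree.

```r
# Made-up rows standing in for the query results
toy_posts <- data.frame(
  engagements = c(40, 12, 5),
  comments    = c(6, 2, 7)
)

# Hypothetical outcome that leaves out comments; pmax() avoids negative counts
toy_posts$engagements_ex_comments <- pmax(toy_posts$engagements - toy_posts$comments, 0)
toy_posts$log_engagements_ex_comments <- log1p(toy_posts$engagements_ex_comments)
toy_posts
```

The fixed-effects model could then be refit with `log_engagements_ex_comments` as the dependent variable.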

Despite these limitations, the consistent positive association across the fixed-effects model and the Z-score analyses suggests that comment engagement and post performance tend to move together on LinkedIn. About 83% of profiles with both kinds of posts also show a positive difference.