Skip to contents

pTRAPPING is an R package that provides a streamlined, reproducible workflow for analysing TRAP-seq and PhosphoTRAP RNA-sequencing data. It wraps the statistical power of edgeR and limma behind a compact set of purpose-built functions that take you from a raw counts matrix to publication-ready tables and figures in just a few lines of code.

What is TRAP-seq? Translating Ribosome Affinity Purification followed by sequencing (TRAP-seq) captures mRNAs that are actively translated in a specific cell type, tagged by a transgenic ribosomal subunit. The key comparison is between the IP fraction (immunoprecipitated, actively translated transcripts) and the INPUT fraction (total RNA — all transcripts). Genes significantly enriched in IP relative to INPUT are translated in the tagged cell type.

PhosphoTRAP adds an activity-marking phosphorylation step, so only recently activated neurons are captured — combining cell-type specificity with behavioural state information in a single experiment.

Installation

# install.packages("devtools")
devtools::install_github("laurenoconnelllab/pTRAPPING")

Functions at a glance

Function What it does
ptrap_de() IP vs INPUT DE analysis for TRAP-seq (edgeR LRT/QLF, limma-voom, or t-test)
ptrap_volcano() Volcano plot from a single ptrap_de() result
ptrap_volcano2() Dual scatter comparing two treatment conditions side by side

Simulated dataset

Throughout this README we simulate a mouse TRAP-seq experiment studying pair-bonding in the preoptic area (POA) of the hypothalamus — a brain region well-known for its role in social behaviour.

Mice are divided into two groups:

  • nb – non-bonded control mice
  • pb – pair-bonded mice (48 h co-housing with a partner)

Each group has 3 animals (biological replicates). Each animal contributes one IP and one INPUT library, giving 12 samples total. The column-naming convention <treatment><replicate><fraction> (e.g. nb1INPUT, pb2IP) lets ptrap_de() parse sample metadata automatically — no sample_df needed.

library(pTRAPPING)
library(ggplot2)

set.seed(42)

n_genes  <- 2000
gene_ids <- paste0("Gene", seq_len(n_genes))

# ── Replace early positions with real mouse gene names ──────────────────────
# Neuronal markers — expected to be enriched in IP in BOTH groups
neuronal   <- c("Snap25", "Rbfox3", "Camk2a", "Vglut1", "Syn1",
                "Map2",   "Nrgn",   "Bdnf",   "Pvalb",  "Sst")
# Pair-bonding markers — enriched in IP only in pb
pb_markers <- c("Oxtr", "Avpr1a", "Fos", "Arc", "Egr1")
# Glial markers — NOT enriched in neuronal IP (control transcripts)
glial      <- c("Gfap", "Aldh1l1", "S100b", "Cx3cr1", "Tmem119")

gene_ids[seq_along(neuronal)]                                       <- neuronal
gene_ids[length(neuronal) + seq_along(pb_markers)]                  <- pb_markers
gene_ids[length(neuronal) + length(pb_markers) + seq_along(glial)]  <- glial

# ── True IP / INPUT enrichment ratios ───────────────────────────────────────
base_enrich        <- rep(1.0, n_genes)
base_enrich[1:10]  <- 8   # neuronal markers — constitutively enriched in both groups
base_enrich[11:15] <- 1   # pair-bonding markers — baseline (no enrichment in nb)

pb_extra           <- rep(1.0, n_genes)
pb_extra[11:15]    <- 8   # pair-bonding markers strongly enriched only in pb

# ── Helper: simulate one IP/INPUT pair for all genes ────────────────────────
sim_pair <- function(enrich, mu = 200, size = 10, sd_noise = 0.15) {
  input <- rnbinom(n_genes, mu = mu, size = size)
  ip    <- pmax(round(input * enrich * exp(rnorm(n_genes, 0, sd_noise))), 0L)
  list(input = input, ip = ip)
}

# ── Build counts data frame with auto-parseable column names ────────────────
cols <- lapply(1:3, function(i) {
  nb <- sim_pair(base_enrich)
  pb <- sim_pair(base_enrich * pb_extra)
  setNames(
    data.frame(nb$input, nb$ip, pb$input, pb$ip),
    c(paste0("nb", i, "INPUT"), paste0("nb", i, "IP"),
      paste0("pb", i, "INPUT"), paste0("pb", i, "IP"))
  )
})

counts_mat <- data.frame(Gene = gene_ids, do.call(cbind, cols))

# Preview — neuronal marker, pair-bonding marker, glial marker
counts_mat[c(1, 11, 16), 1:7]
#>      Gene nb1INPUT nb1IP pb1INPUT pb1IP nb2INPUT nb2IP
#> 1  Snap25      274  2231      150   910      229  1467
#> 11   Oxtr      217   260       83   630      290   333
#> 16   Gfap      151   117      143   123      222   307

The first column holds gene identifiers; the remaining 12 columns are raw counts. Snap25 (neuronal) already shows high IP counts in both groups, while Oxtr (pair-bonding marker) shows balanced IP/INPUT — it will only be enriched in pb.


ptrap_de() — differential expression

Simplest call: auto-parse column names

When sample_df is omitted, ptrap_de() reads treatment, replicate number, and fraction directly from the column names. Providing treatment_name tells the function which group to analyse: 1

res_nb <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "nb",
  test_method    = "LRT"        # edgeR likelihood ratio test
)
#> Auto-parsed sample metadata from column names.
#>   Treatments : nb, pb
#>   Blocks     : 1, 2, 3
#>   Fractions  : INPUT, IP

# Top enriched genes in nb
head(res_nb[res_nb$diffexpressed != "NO",
            c("Gene", "logFC", "LR", "PValue", "FDR", "diffexpressed")])
#> # A tibble: 6 × 6
#>   Gene   logFC    LR    PValue       FDR diffexpressed
#>   <chr>  <dbl> <dbl>     <dbl>     <dbl> <chr>        
#> 1 Sst     2.95  695. 3.27e-153 6.54e-150 UP           
#> 2 Syn1    3.00  693. 1.10e-152 1.10e-149 UP           
#> 3 Snap25  2.90  656. 1.40e-144 9.31e-142 UP           
#> 4 Vglut1  3.01  631. 3.12e-139 1.56e-136 UP           
#> 5 Bdnf    2.92  624. 1.13e-137 4.50e-135 UP           
#> 6 Rbfox3  2.79  616. 5.50e-136 1.83e-133 UP
res_pb <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "pb",
  test_method    = "LRT"
)
#> Auto-parsed sample metadata from column names.
#>   Treatments : nb, pb
#>   Blocks     : 1, 2, 3
#>   Fractions  : INPUT, IP

# Top enriched genes in pb
head(res_pb[res_pb$diffexpressed != "NO",
            c("Gene", "logFC", "LR", "PValue", "FDR", "diffexpressed")])
#> # A tibble: 6 × 6
#>   Gene  logFC    LR    PValue       FDR diffexpressed
#>   <chr> <dbl> <dbl>     <dbl>     <dbl> <chr>        
#> 1 Egr1   3.23  629. 6.88e-139 1.38e-135 UP           
#> 2 Map2   3.12  600. 1.63e-132 1.63e-129 UP           
#> 3 Arc    3.13  591. 1.60e-130 1.07e-127 UP           
#> 4 Syn1   2.98  550. 1.36e-121 6.78e-119 UP           
#> 5 Sst    2.94  543. 4.18e-120 1.67e-117 UP           
#> 6 Nrgn   2.99  542. 5.61e-120 1.87e-117 UP

The neuronal markers (Snap25, Camk2a, etc.) are enriched in IP in both groups. The pair-bonding markers (Oxtr, Avpr1a, Fos, Arc, Egr1) only appear in the pb result — exactly as expected for activity-dependent transcripts of pair-bonded neurons.


Choosing the right test_method

Method When to use
"LRT" edgeR likelihood ratio test — robust with ≥ 4 replicates
"QLF" edgeR quasi-likelihood F-test — more conservative, better FDR control
"voom" limma-voom — precision weights on log-CPM; useful when library sizes vary
"paired.ttest" Per-gene paired t-test on IP − INPUT differences (Tan et al. 2016) — a single treatment vs INPUT
"unpaired.ttest" Per-gene Welch t-test comparing log₂(FE) between two treatment groups (Knight et al.)
# Quasi-likelihood F-test — more conservative than LRT
res_qlf <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "nb",
  test_method    = "QLF"
)

# limma-voom — accounts for mean-variance trend with precision weights
res_voom <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "nb",
  test_method    = "voom"
)

# Paired t-test — recommended for real PhosphoTRAP data with n = 3 animals
# Follows the approach in Tan et al. 2016 (Cell 167, 47-59)
res_pt <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "nb",
  test_method    = "paired.ttest"
)

Note on "paired.ttest": The GLM-based methods ("LRT" / "QLF") estimate gene-wise dispersion from the data and work best with ≥ 4 replicates. With only n = 3, dispersion estimates are unreliable. The paired t-test follows Tan et al. (2016) exactly: “p value for each gene was calculated as the paired t test between input and immunoprecipitated RPKM values from the three experimental repeats.” Set norm.method = "RPKM" (and supply gene.length) to reproduce the original analysis. logFC is log₂(mean_IP / mean_INPUT).


Long-format paired data with return_long = TRUE

return_long = TRUE (only with "paired.ttest") returns the per-gene, per-animal paired table used internally — useful for plotting individual data points or performing additional quality checks:

res_long <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "nb",
  test_method    = "paired.ttest",
  return_long    = TRUE
)
#> Auto-parsed sample metadata from column names.
#>   Treatments : nb, pb
#>   Blocks     : 1, 2, 3
#>   Fractions  : INPUT, IP

# DE results tibble
head(res_long$results[, c("Gene", "logFC", "t_statistic", "PValue", "FDR")])
#> # A tibble: 6 × 5
#>   Gene      logFC t_statistic    PValue    FDR
#>   <chr>     <dbl>       <dbl>     <dbl>  <dbl>
#> 1 Gene1833  0.142       251.  0.0000159 0.0317
#> 2 Pvalb     2.91         38.6 0.000670  0.504 
#> 3 Gene68    0.125        36.4 0.000755  0.504 
#> 4 Gene51    0.132        25.1 0.00158   0.653 
#> 5 Gene1017 -0.118       -23.4 0.00182   0.653 
#> 6 Gene1529  0.196        22.6 0.00196   0.653

# Per-gene, per-animal paired table (showing Snap25):
# ip_count / input_count are in the norm.method scale; FE = IP/INPUT per animal
res_long$long_data[res_long$long_data$Gene == "Snap25", ]
#> # A tibble: 3 × 5
#>   Gene   block ip_count input_count    FE
#>   <chr>  <chr>    <dbl>       <dbl> <dbl>
#> 1 Snap25 1        5522.        679.  8.12
#> 2 Snap25 2        3580.        556.  6.43
#> 3 Snap25 3        3462.        440.  7.86

Custom thresholds

Adjust lfc_threshold and fdr_threshold to control how strict the DE classification is:

# Relaxed: any enrichment above 0.5 log2FC at FDR < 0.10
res_relaxed <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "pb",
  test_method    = "LRT",
  lfc_threshold  = 0.5,
  fdr_threshold  = 0.10
)
#> Auto-parsed sample metadata from column names.
#>   Treatments : nb, pb
#>   Blocks     : 1, 2, 3
#>   Fractions  : INPUT, IP

table(res_relaxed$diffexpressed)
#> 
#>   NO   UP 
#> 1984   16

HTML table with kable.out = TRUE

Set kable.out = TRUE to get a formatted HTML table of top genes — ideal for Quarto / R Markdown reports:

ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "pb",
  test_method    = "LRT",
  kable.out      = TRUE,
  ngenes.out     = 10
)

ptrap_volcano() — single volcano plot

A classic volcano plot for one treatment condition, with significant genes colour-coded and labelled:

ptrap_volcano(
  res_nb,
  title = "Non-bonded (nb) — POA"
)

Volcano plot for non-bonded mice

Customise colours and thresholds:

ptrap_volcano(
  res_pb,
  lfc_threshold = 1,
  fdr_threshold = 0.05,
  colors        = c("UP" = "#E69F00", "DOWN" = "#56B4E9"),
  title         = "Pair-bonded (pb) — POA"
)

Customised volcano plot for pair-bonded mice

Oxtr, Avpr1a, Fos, Arc, and Egr1 appear only in the pb plot — consistent with their known roles in pair-bond formation and neuronal immediate-early gene expression.


ptrap_volcano2() — dual scatter comparing two treatments

ptrap_volcano2() overlays both treatment conditions on a single scatter plot. Each axis shows the log₂ fold-change (IP / INPUT) for one group, and genes are colour-coded by their DE status across conditions:

ptrap_volcano2(
  res_nb, res_pb,
  treatment_col = "Treatment",
  title         = "POA — nb vs pb"
)

Dual scatter comparing nb and pb treatments

How to read this plot * Top-right quadrant: enriched in IP in both groups → constitutive neuronal translation (e.g. Snap25, Camk2a, Rbfox3). * Bottom-right: enriched only in pb → activity-dependent translation in pair-bonded animals (Oxtr, Fos, Arc, Egr1, Avpr1a). * Top-left: enriched only in nb → non-bonded-specific. * The dotted diagonal is the identity line — equal enrichment in both groups.

Customise colours and thresholds:

ptrap_volcano2(
  res_nb, res_pb,
  lfc_threshold = 1,
  fdr_threshold = 0.05,
  treatment_col = "Treatment",
  title         = "POA — nb vs pb",
  colors        = c(
    "DE in both"   = "#D55E00",
    "DE only nb"   = "#0072B2",
    "DE only pb"   = "#009E73"
  )
)

Customised dual scatter plot


test_method = "voom" — limma-voom

"voom" applies the limma-voom pipeline: counts are converted to log-CPM and precision weights are estimated from the mean-variance trend before fitting the same IP vs INPUT linear model. This is particularly useful when library sizes vary substantially across samples, as the weights down-weight unreliable observations automatically.

res_nb_voom <- ptrap_de(
  counts_mat     = counts_mat,
  treatment_name = "nb",
  test_method    = "voom"
)
#> Auto-parsed sample metadata from column names.
#>   Treatments : nb, pb
#>   Blocks     : 1, 2, 3
#>   Fractions  : INPUT, IP

# Top enriched genes
head(res_nb_voom[res_nb_voom$diffexpressed != "NO",
                 c("Gene", "logFC", "t", "PValue", "FDR", "diffexpressed")])
#> # A tibble: 6 × 6
#>   Gene   logFC     t   PValue      FDR diffexpressed
#>   <chr>  <dbl> <dbl>    <dbl>    <dbl> <chr>        
#> 1 Sst     2.95  28.9 1.31e-55 1.85e-52 UP           
#> 2 Syn1    3.00  28.8 1.85e-55 1.85e-52 UP           
#> 3 Vglut1  3.02  27.8 6.39e-54 4.26e-51 UP           
#> 4 Snap25  2.90  27.6 1.57e-53 7.84e-51 UP           
#> 5 Bdnf    2.91  27.2 5.11e-53 2.04e-50 UP           
#> 6 Map2    2.96  27.1 7.69e-53 2.56e-50 UP

The output has the same structure as "LRT" / "QLF" results (Gene, logFC, FDR, diffexpressed, …), so it drops straight into ptrap_volcano():

ptrap_volcano(
  res_nb_voom,
  title = "Non-bonded (nb) — POA [voom]"
)

Volcano plot from limma-voom analysis


test_method = "unpaired.ttest" — comparing two treatment groups

The unpaired t-test method follows the PhosphoTRAP approach of Knight et al. (2012): per-gene fold enrichments (FE = IP / INPUT) are computed for every animal in each group, and a Welch t-test is used to compare log₂(FE) between the two treatment groups. This is the right choice when you want to ask:

“Is the IP enrichment itself different between group A and group B?”

rather than testing IP vs INPUT within a single group. When neither treatment_name nor control_name is supplied and the counts matrix contains exactly two groups, they are assigned automatically (alphabetically; first = control, second = treatment):

# Auto-assigns: nb = control, pb = treatment
res_unpaired <- ptrap_de(
  counts_mat  = counts_mat,
  test_method = "unpaired.ttest"
)
#> Auto-parsed sample metadata from column names.
#>   Treatments : nb, pb
#>   Blocks     : 1, 2, 3
#>   Fractions  : INPUT, IP
#> Auto-assigned treatments alphabetically: control = 'nb', treatment = 'pb'.
#> Low-power warning ('unpaired.ttest'): 'pb' has n = 3 and 'nb' has n = 3 replicates.
#>   A Welch t-test with 3 replicates per group yields only 2 degree(s) of freedom per group,
#>   making BH correction over 2000 genes highly conservative -- few or no genes may reach
#>   significance at the default thresholds. Consider:
#>     * relaxing fdr_threshold (e.g. 0.10 or 0.20) or lfc_threshold;
#>     * increasing biological replicates (n >= 4 recommended);
#>     * using test_method = 'LRT', 'QLF', 'voom', or 'deseq', which
#>       borrow information across genes to improve power at small n.

# Results include mean FE per group and their ratio (diff_FE)
head(res_unpaired[res_unpaired$diffexpressed != "NO",
                  c("Gene", "logFC", "diff_FE",
                    "mean_FE_nb", "mean_FE_pb",
                    "PValue", "FDR", "diffexpressed")])
#> # A tibble: 4 × 8
#>   Gene  logFC diff_FE mean_FE_nb mean_FE_pb    PValue    FDR diffexpressed
#>   <chr> <dbl>   <dbl>      <dbl>      <dbl>     <dbl>  <dbl> <chr>        
#> 1 Egr1   3.24    9.47      1.00        9.48 0.0000166 0.0210 UP           
#> 2 Oxtr   2.89    7.42      1.12        8.28 0.0000287 0.0210 UP           
#> 3 Arc    3.14    8.80      0.997       8.78 0.0000408 0.0210 UP           
#> 4 Fos    2.99    7.92      1.03        8.13 0.0000420 0.0210 UP

Interpretation: diff_FE = mean_FE_pb / mean_FE_nb — the ratio of mean IP/INPUT fold enrichments between treatment and control. logFC is log2(diff_FE). Genes with logFC > 0 are more enriched in the pair-bonded group.

Pair-bonding markers (Oxtr, Avpr1a, Fos, Arc, Egr1) should show up here as genes whose neuronal IP enrichment is higher in pb than nb:

ptrap_volcano(
  res_unpaired,
  title = "pb vs nb — POA [unpaired t-test]"
)

Volcano plot from unpaired t-test comparing pb vs nb


Providing sample metadata explicitly

For more complex designs — multiple brain regions, custom blocking variables, or non-standard column naming — supply sample_df directly:

# Example: two brain regions, two treatments
sample_df <- data.frame(
  sample    = c("POA_nb1_IP", "POA_nb1_INPUT", "POA_pb1_IP", "POA_pb1_INPUT",
                "AMY_nb1_IP", "AMY_nb1_INPUT", "AMY_pb1_IP", "AMY_pb1_INPUT"),
  region    = rep(c("POA", "AMY"), each = 4),
  treatment = rep(c("nb", "nb", "pb", "pb"), 2),
  block     = rep("1", 8),
  fraction  = rep(c("IP", "INPUT", "IP", "INPUT"), 2)
)

res_poa_pb <- ptrap_de(
  counts_mat     = my_counts,
  sample_df      = sample_df,
  gene_ids       = my_gene_ids,
  region_name    = "POA",
  treatment_name = "pb",
  region_col     = "region",
  treatment_col  = "treatment",
  block_col      = "block",
  fraction_col   = "fraction",
  test_method    = "paired.ttest"
)

Citation

If you use pTRAPPING in your research, please cite:

Rodríguez, C. (2024). pTRAPPING: A Suite of Functions for the Analyses of PhosphoTRAP Data in R. https://github.com/laurenoconnelllab/pTRAPPING

For the paired t-test method (test_method = "paired.ttest"):

Tan, C.L., Cooke, E.K., Leib, D.E., Lin, Y.C., Daly, G.E., Zimmerman, C.A., and Knight, Z.A. (2016). Warm-Sensitive Neurons that Control Body Temperature. Cell 167, 47–59. https://doi.org/10.1016/j.cell.2016.08.028

For the unpaired t-test method (test_method = "unpaired.ttest"):

Knight, Z.A., Tan, K., Birsoy, K., Schmidt, S., Garrison, J.L., Wysocki, R.W., Emiliano, A., Ekstrand, M.I., and Bhatt, D.L. (2012). Molecular profiling of activated neurons by phosphorylated ribosome capture. Cell 151, 1126–1137. https://doi.org/10.1016/j.cell.2012.10.039