Comparing DE methods in delnxΒΆ

delnx provides several differential expression testing methods, each suited to different data types and analysis scenarios:

Method

Function

Data type

Test

Best for

ANOVA

dx.tl.de(method="anova")

Log-normalized

F-test

Fast screening, continuous data

Logistic regression

dx.tl.de(method="lr")

Log-normalized / scaled

LR test

Covariate control, classification-style DE

Binomial

dx.tl.de(method="binomial")

Binary (0/1)

LR test

Proportion-based DE on binarized expression

AUROC ranking

dx.tl.rank_de()

Any

Mann-Whitney U

Fast multi-group markers

Negative binomial

dx.tl.nb_fit() + dx.tl.nb_test()

Raw counts

Quasi-likelihood F

Pseudobulk, perturbation screens

This notebook benchmarks their runtime scaling across three axes:

  1. Sample scaling β€” how each method scales with increasing cell count

  2. 1-vs-all marker detection β€” scaling across many groups (clusters)

  3. 1-vs-1 pairwise comparisons β€” scaling with number of pairwise tests

import time

import anndata as ad
import delnx as dx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def simulate_data(n_samples, n_genes, n_groups, seed=42):
    """Simulate count data with multiple groups."""
    rng = np.random.default_rng(seed)
    counts = rng.negative_binomial(5, 0.5, size=(n_samples, n_genes)).astype(np.float64)
    adata = ad.AnnData(X=counts)
    adata.var_names = [f"gene_{i}" for i in range(n_genes)]
    adata.obs_names = [f"cell_{i}" for i in range(n_samples)]
    adata.obs["group"] = [f"g{i % n_groups}" for i in range(n_samples)]
    # Log-normalized layer for anova/lr
    adata.layers["counts"] = counts.copy()
    adata.layers["lognorm"] = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)
    return adata


print("Simulation function defined")
Simulation function defined
# Warm up all JIT caches
a = simulate_data(50, 100, 2)
a_ln = a.copy(); a_ln.X = a_ln.layers["lognorm"]
_ = dx.tl.de(a_ln, condition_key="group", reference="g0", method="anova", verbose=False)
_ = dx.tl.de(a_ln, condition_key="group", reference="g0", method="lr", verbose=False)
_ = dx.tl.rank_de(a, condition_key="group", verbose=False)
fit = dx.tl.nb_fit(a, condition_key="group", reference="g0", verbose=False)
_ = dx.tl.nb_test(a, fit, contrast="group[T.g1]")
print("JIT warmup complete")
JIT warmup complete

1. Sample scalingΒΆ

We vary the number of cells/samples from 100 to 10,000, covering case-control scenarios in (pseudo) bulk and single cell RNA-seq.

Here, the ANOVA and logistic regression methods scale well, while the negative binomial method is much slower due to its iterative fitting procedure.

N_GENES = 10000
sample_sizes = [100, 500, 1000, 2000, 5000, 10000]

sample_results = []

for n in sample_sizes:
    print(f"\n--- {n:,} cells ---")
    adata = simulate_data(n, N_GENES, 2)
    adata_ln = adata.copy()
    adata_ln.X = adata_ln.layers["lognorm"]

    # ANOVA
    t0 = time.time()
    _ = dx.tl.de(adata_ln, condition_key="group", reference="g0", method="anova", verbose=False)
    t = time.time() - t0
    sample_results.append({"n_cells": n, "method": "ANOVA", "time": t})
    print(f"  ANOVA:   {t:.1f}s")

    # LR
    t0 = time.time()
    _ = dx.tl.de(adata_ln, condition_key="group", reference="g0", method="lr", verbose=False)
    t = time.time() - t0
    sample_results.append({"n_cells": n, "method": "LR", "time": t})
    print(f"  LR:      {t:.1f}s")

    # rank_de
    t0 = time.time()
    _ = dx.tl.rank_de(adata, condition_key="group", verbose=False)
    t = time.time() - t0
    sample_results.append({"n_cells": n, "method": "rank_de", "time": t})
    print(f"  rank_de: {t:.1f}s")

    # NB
    t0 = time.time()
    fit = dx.tl.nb_fit(adata, condition_key="group", reference="g0", verbose=False)
    _ = dx.tl.nb_test(adata, fit, contrast="group[T.g1]")
    t = time.time() - t0
    sample_results.append({"n_cells": n, "method": "NB", "time": t})
    print(f"  NB:      {t:.1f}s")

sample_df = pd.DataFrame(sample_results)
--- 100 cells ---
  ANOVA:   0.0s
  LR:      0.0s
  rank_de: 0.7s
  NB:      0.6s

--- 500 cells ---
  ANOVA:   0.1s
  LR:      0.1s
  rank_de: 1.0s
  NB:      1.0s

--- 1,000 cells ---
  ANOVA:   0.1s
  LR:      0.1s
  rank_de: 1.2s
  NB:      1.0s

--- 2,000 cells ---
  ANOVA:   0.2s
  LR:      0.2s
  rank_de: 1.6s
  NB:      2.8s

--- 5,000 cells ---
  ANOVA:   0.3s
  LR:      0.3s
  rank_de: 2.6s
  NB:      2.9s

--- 10,000 cells ---
  ANOVA:   0.5s
  LR:      0.5s
  rank_de: 4.8s
  NB:      5.8s
fig, ax = plt.subplots(figsize=(8, 5))

colors = {"ANOVA": "#2ca02c", "LR": "#ff7f0e", "rank_de": "#9467bd", "NB": "#1f77b4"}
markers = {"ANOVA": "^", "LR": "s", "rank_de": "D", "NB": "o"}

for method in ["ANOVA", "LR", "rank_de", "NB"]:
    sub = sample_df[sample_df["method"] == method].sort_values("n_cells")
    ax.plot(sub["n_cells"], sub["time"], label=method, linewidth=2, markersize=8,
            color=colors[method], marker=markers[method])

ax.set_xlabel("Number of cells")
ax.set_ylabel("Time (seconds)")
ax.set_title(f"Single 2-group DE test ({N_GENES:,} genes)")
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
plt.close()
../_images/08e2754511804ea6edefa26db3ae88358403f658844c00fb9bc208b718c905bf.png

2. One-vs-all marker detectionΒΆ

Here we vary the number of groups: [5, 10, 20, 50], testing each group against all others. This is a common scenario in single cell RNA-seq, where we want to find markers for each cluster.

rank_de scales super well here because it tests all groups in a single pass (vectorized AUROC) while other methods require a loop of independent 1-vs-all tests.

N_CELLS_PER_GROUP = 100
group_counts = [5, 10, 20, 50]

marker_results = []


def run_onevsall_loop(adata, method, use_lognorm=False):
    """Run 1-vs-all DE for each group using a loop."""
    a = adata.copy()
    if use_lognorm:
        a.X = a.layers["lognorm"]
    for g in adata.obs["group"].unique():
        a.obs["is_group"] = (a.obs["group"] == g).astype(str)
        _ = dx.tl.de(a, condition_key="is_group", reference="False", method=method, verbose=False)


def run_onevsall_nb(adata):
    """Run 1-vs-all NB for each group using a loop."""
    for g in adata.obs["group"].unique():
        a = adata.copy()
        a.obs["is_group"] = (a.obs["group"] == g).astype(str)
        fit = dx.tl.nb_fit(a, condition_key="is_group", reference="False", verbose=False)
        _ = dx.tl.nb_test(a, fit, contrast="is_group[T.True]")


for n_groups in group_counts:
    n_total = n_groups * N_CELLS_PER_GROUP
    print(f"\n--- {n_groups} groups, {n_total:,} cells ---")
    adata = simulate_data(n_total, N_GENES, n_groups)

    # rank_de (all groups at once)
    t0 = time.time()
    _ = dx.tl.rank_de(adata, condition_key="group", verbose=False)
    t = time.time() - t0
    marker_results.append({"n_groups": n_groups, "method": "rank_de", "time": t})
    print(f"  rank_de: {t:.1f}s")

    # ANOVA (1-vs-all loop)
    t0 = time.time()
    run_onevsall_loop(adata, "anova", use_lognorm=True)
    t = time.time() - t0
    marker_results.append({"n_groups": n_groups, "method": "ANOVA", "time": t})
    print(f"  ANOVA:   {t:.1f}s")

    # LR (1-vs-all loop)
    t0 = time.time()
    run_onevsall_loop(adata, "lr", use_lognorm=True)
    t = time.time() - t0
    marker_results.append({"n_groups": n_groups, "method": "LR", "time": t})
    print(f"  LR:      {t:.1f}s")

    # NB (1-vs-all loop)
    t0 = time.time()
    run_onevsall_nb(adata)
    t = time.time() - t0
    marker_results.append({"n_groups": n_groups, "method": "NB", "time": t})
    print(f"  NB:      {t:.1f}s")

marker_df = pd.DataFrame(marker_results)
--- 5 groups, 500 cells ---
  rank_de: 1.0s
  ANOVA:   0.5s
  LR:      0.5s
  NB:      5.2s

--- 10 groups, 1,000 cells ---
  rank_de: 1.2s
  ANOVA:   1.1s
  LR:      1.1s
  NB:      10.4s

--- 20 groups, 2,000 cells ---
  rank_de: 1.6s
  ANOVA:   3.3s
  LR:      3.1s
  NB:      56.9s

--- 50 groups, 5,000 cells ---
  rank_de: 2.9s
  ANOVA:   14.2s
  LR:      14.2s
  NB:      150.2s
fig, ax = plt.subplots(figsize=(8, 5))

for method in ["rank_de", "ANOVA", "LR", "NB"]:
    sub = marker_df[marker_df["method"] == method].sort_values("n_groups")
    ax.plot(sub["n_groups"], sub["time"] / 60, label=method, linewidth=2, markersize=8,
            color=colors[method], marker=markers[method])

ax.set_xlabel("Number of groups")
ax.set_ylabel("Total time (minutes)")
ax.set_title(f"1-vs-all marker detection ({N_GENES:,} genes, {N_CELLS_PER_GROUP} cells/group)")
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
plt.close()
../_images/7ecc8a2d3f5b539534900633e2fd1539559cfe21f7cf37211f6db028cff518a8.png

3. NB joint model vs separate fitsΒΆ

For count-based DE with multiple groups, there are two strategies when using the negative binomial method:

  • Joint model: fit one NB GLM with all groups as coefficients, then test each group via contrasts. Dispersion is estimated from all samples (better for small groups).

  • Separate fits: fit independent 2-coefficient models (group-vs-rest or group-vs-control). Each fit is fast but dispersion is estimated per subset.

For smaller numers of conditions, the joint fit is much faster since it only requires one model fit, while separate fits require a loop of independent fits. However for more than ~80 conditions, the joint model becomes slower than separate fits, likely due to the increasing complexity of fitting a model with many coefficients and estimating dispersion across many groups.

joint_group_counts = [5, 10, 20, 50, 80, 100]
N_CELLS_PER_TOTAL_JV = 1000

joint_results = []

for n_groups in joint_group_counts:
    print(f"\n--- {n_groups} groups, {N_CELLS_PER_TOTAL_JV:,} cells ---")
    adata = simulate_data(N_CELLS_PER_TOTAL_JV, N_GENES, n_groups)

    # Joint model: fit once with all groups, test each via contrasts
    t0 = time.time()
    fit_joint = dx.tl.nb_fit(adata, condition_key="group", reference="g0", verbose=False)
    contrasts = [c for c in fit_joint.design_column_names if c != "Intercept"]
    for c in contrasts:
        _ = dx.tl.nb_test(adata, fit_joint, contrast=c)
    t_joint = time.time() - t0
    joint_results.append({"n_groups": n_groups, "strategy": "Joint model", "time": t_joint})
    print(f"  Joint ({n_groups} coefs, 1 fit + {len(contrasts)} tests): {t_joint:.1f}s")

    # Separate fits: one 2-coef model per group (1-vs-all)
    t0 = time.time()
    for g in adata.obs["group"].unique():
        a = adata.copy()
        a.obs["is_group"] = (a.obs["group"] == g).astype(str)
        fit_sep = dx.tl.nb_fit(a, condition_key="is_group", reference="False", verbose=False)
        _ = dx.tl.nb_test(a, fit_sep, contrast="is_group[T.True]")
    t_sep = time.time() - t0
    joint_results.append({"n_groups": n_groups, "strategy": "Separate fits", "time": t_sep})
    print(f"  Separate ({n_groups} x 2-coef fits):              {t_sep:.1f}s")
    print(f"  Ratio (joint/separate): {t_joint/t_sep:.2f}x")

joint_df = pd.DataFrame(joint_results)
--- 5 groups, 1,000 cells ---
  Joint (5 coefs, 1 fit + 4 tests): 1.3s
  Separate (5 x 2-coef fits):              5.2s
  Ratio (joint/separate): 0.25x

--- 10 groups, 1,000 cells ---
  Joint (10 coefs, 1 fit + 9 tests): 2.1s
  Separate (10 x 2-coef fits):              10.3s
  Ratio (joint/separate): 0.21x

--- 20 groups, 1,000 cells ---
  Joint (20 coefs, 1 fit + 19 tests): 4.0s
  Separate (20 x 2-coef fits):              20.7s
  Ratio (joint/separate): 0.20x

--- 50 groups, 1,000 cells ---
  Joint (50 coefs, 1 fit + 49 tests): 23.5s
  Separate (50 x 2-coef fits):              51.7s
  Ratio (joint/separate): 0.45x

--- 80 groups, 1,000 cells ---
  Joint (80 coefs, 1 fit + 79 tests): 80.5s
  Separate (80 x 2-coef fits):              83.0s
  Ratio (joint/separate): 0.97x

--- 100 groups, 1,000 cells ---
  Joint (100 coefs, 1 fit + 99 tests): 162.0s
  Separate (100 x 2-coef fits):              103.7s
  Ratio (joint/separate): 1.56x
fig, ax = plt.subplots(figsize=(8, 5))

for strategy, style in [("Joint model", dict(color="#d62728", marker="o")),
                         ("Separate fits", dict(color="#1f77b4", marker="^"))]:
    sub = joint_df[joint_df["strategy"] == strategy].sort_values("n_groups")
    ax.plot(sub["n_groups"], sub["time"] / 60, label=strategy, linewidth=2, markersize=8, **style)

ax.set_xlabel("Number of groups")
ax.set_ylabel("Total time (minutes)")
ax.set_title(f"NB joint model vs separate 1-vs-all fits ({N_GENES:,} genes, {N_CELLS_PER_TOTAL_JV} total cells)")
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
plt.close()
../_images/02ecc7588852d8e0c07a1d63135d00a7e1a9c378f3f548cfa1c37fc6b3070edc.png

SummaryΒΆ

Scenario

Recommended method

Why

Quick marker screening

rank_de

Tests all groups in one pass, scales to hundreds of clusters

Pseudobulk DE (counts)

nb_fit + nb_test

Proper count model with dispersion shrinkage

Lognorm continuous data

de(method="anova")

Fast closed-form F-test

Covariate control

de(method="lr")

Logistic regression with flexible design matrix

Binary expression (0/1)

de(method="binomial")

Binomial GLM on binarized expression

Many pairwise comparisons

rank_de or de(method="anova")

Low per-test overhead

LR vs binomial: Both use likelihood ratio tests but model different things. LR predicts the condition label from gene expression (classification-style); binomial models binary expression as a function of condition (GLM-style).

Joint vs separate NB models: Use separate 2-coefficient fits for single-cell data (many cells per group, fast, parallelizable). Use a joint model for pseudobulk with few replicates per condition, where pooling dispersions across all samples matters.