delnx.pp.pseudobulk

delnx.pp.pseudobulk(adata, sample_key='batch', group_key=None, n_pseudoreps=None, layer=None, min_cells=5, min_counts=5000, mode='sum', **kwargs)

Create pseudobulk samples from single-cell RNA-seq data.

This function aggregates single-cell RNA-seq data into pseudobulk samples based on specified sample and group identifiers. It can optionally create random pseudoreplicates.

Parameters:
  • adata (AnnData) – AnnData object containing single-cell expression data.

  • sample_key (str (default: 'batch')) – Column name in adata.obs that contains the sample identifiers. Each unique value will become a separate pseudobulk sample (or multiple samples if n_pseudoreps is specified).

  • group_key (str | None (default: None)) – Column name in adata.obs that contains group identifiers like cell types. If provided, pseudobulk samples will be generated separately for each combination of sample and group, enabling cell type-specific analysis.

  • n_pseudoreps (int | None (default: None)) – Number of pseudoreplicates to create per sample. If None, uses the original sample structure without resampling. If specified, creates n_pseudoreps pseudoreplicates per sample by randomly sampling cells with replacement.

  • layer (str | None (default: None)) – Layer in adata.layers to use for aggregation. If None, uses adata.X. Should contain raw or normalized counts, not log-transformed values.

  • mode (str (default: 'sum')) –

    Method to aggregate cell-level data into pseudobulk samples:
    • "sum": Sum of counts (recommended for RNA-seq data)

    • "mean": Average of counts across cells

  • min_cells (int | None (default: 5)) – Minimum number of cells required in a pseudobulk sample to retain it. Samples with fewer cells will be discarded.

  • min_counts (int | None (default: 5000)) – Minimum total counts required in a pseudobulk sample to retain it. Samples with fewer total counts will be discarded.

  • **kwargs – Additional arguments to pass to decoupler.pp.pseudobulk()

Return type:

AnnData

Returns:

AnnData object containing the pseudobulk data. The structure changes from cell-level to sample-level, with each row representing a pseudobulk sample. Original sample and group identifiers are preserved in the observations.
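The change from cell-level to sample-level rows, together with the min_cells and min_counts filters, can be pictured with a small standard-library sketch. This is only an illustration of the semantics described above, not the delnx or decoupler implementation: the toy data, the helper name toy_pseudobulk, and the deliberately small thresholds are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: one tuple per cell, holding
# (sample id, cell type, raw counts for three genes).
cells = [
    ("s1", "T", [3, 0, 2]),
    ("s1", "T", [1, 4, 0]),
    ("s1", "B", [0, 2, 2]),
    ("s2", "T", [5, 1, 1]),
    ("s2", "T", [2, 2, 3]),
]


def toy_pseudobulk(cells, min_cells=2, min_counts=10):
    """Sum counts per (sample, group) and drop small or shallow profiles."""
    sums = {}
    n_cells = defaultdict(int)
    for sample, group, counts in cells:
        key = (sample, group)
        if key not in sums:
            sums[key] = [0] * len(counts)
        # Gene-wise sum, corresponding to mode="sum".
        sums[key] = [a + b for a, b in zip(sums[key], counts)]
        n_cells[key] += 1
    # Keep a profile only if it meets both thresholds.
    return {
        key: profile
        for key, profile in sums.items()
        if n_cells[key] >= min_cells and sum(profile) >= min_counts
    }


profiles = toy_pseudobulk(cells)
for key, profile in profiles.items():
    print(key, profile)
```

In this toy input, the ("s1", "B") combination contains a single cell and is therefore dropped by the min_cells filter, while the two T-cell profiles are retained.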

Notes

  • Wrapper around the decoupler (https://github.com/scverse/decoupler) pseudobulk function that adds support for pseudoreplicates.

  • It's generally recommended to aggregate counts using the "sum" mode and then re-normalize, rather than using "mean" directly on log-normalized data.
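A short numeric sketch may help show why averaging log-transformed values is discouraged. It uses made-up counts for a single gene and only the standard library. The log1p transform is an assumption standing in for whatever log normalization was applied to the data.

```python
import math

# Hypothetical raw counts of one gene in three cells of the same sample.
counts = [0, 3, 97]

# Per-cell values as they would appear after a log1p transform (assumed).
log_values = [math.log1p(c) for c in counts]

# Route 1: average the raw counts.
arithmetic_mean = sum(counts) / len(counts)

# Route 2: average the log values, then undo the transform.
mean_of_logs = sum(log_values) / len(log_values)
back_transformed = math.expm1(mean_of_logs)

print(f"arithmetic mean of counts:     {arithmetic_mean:.2f}")
print(f"back-transformed mean of logs: {back_transformed:.2f}")
```

Because the logarithm is concave, the back-transformed mean of logs never exceeds the arithmetic mean, and the gap grows when counts are unevenly spread across cells. Summing raw counts first and normalizing the aggregate avoids this distortion.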

Examples

Basic pseudobulking by sample:

>>> import scanpy as sc
>>> import delnx as dx
>>> adata = sc.read_h5ad("single_cell_data.h5ad")
>>> # Assuming adata.obs["sample"] contains sample identifiers
>>> pseudobulk_adata = dx.pp.pseudobulk(adata, sample_key="sample")

Pseudobulking by sample and cell type:

>>> # Create cell type-specific pseudobulk samples
>>> pseudobulk_adata = dx.pp.pseudobulk(adata, sample_key="sample", group_key="cell_type")

Creating pseudoreplicates for assessing technical variation:

>>> # Generate 5 pseudoreplicates per original sample
>>> pseudobulk_adata = dx.pp.pseudobulk(adata, sample_key="sample", n_pseudoreps=5)
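What "sampling cells with replacement" means for a single sample can be sketched without any dependencies. The helper below is hypothetical and is not how delnx builds its pseudoreplicates. In particular, drawing as many cells as the sample contains for each replicate is an assumption made for illustration.

```python
import random


def make_pseudoreplicates(cell_counts, n_pseudoreps, seed=0):
    """Build summed profiles from cells resampled with replacement.

    cell_counts: list of per-cell count vectors belonging to one sample.
    Assumption: each replicate draws len(cell_counts) cells.
    """
    rng = random.Random(seed)
    n_cells = len(cell_counts)
    n_genes = len(cell_counts[0])
    replicates = []
    for _ in range(n_pseudoreps):
        # A cell may be drawn several times or not at all.
        drawn = rng.choices(cell_counts, k=n_cells)
        # Gene-wise sum over the drawn cells.
        replicates.append(
            [sum(cell[g] for cell in drawn) for g in range(n_genes)]
        )
    return replicates


# Hypothetical sample with three cells and two genes.
sample_cells = [[1, 0], [2, 5], [0, 3]]
replicates = make_pseudoreplicates(sample_cells, n_pseudoreps=5)
for i, profile in enumerate(replicates):
    print(f"pseudoreplicate {i}: {profile}")
```

Each replicate is derived from the same pool of cells, so differences between replicates reflect resampling noise within a sample rather than differences between independent biological samples.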