delnx.pp.pseudobulk
- delnx.pp.pseudobulk(adata, sample_key='batch', group_key=None, n_pseudoreps=None, layer=None, min_cells=5, min_counts=5000, mode='sum', **kwargs)
Create pseudobulk samples from single-cell RNA-seq data.
This function aggregates single-cell RNA-seq data into pseudobulk samples based on specified sample and group identifiers. It can optionally create random pseudoreplicates.
- Parameters:
adata (AnnData) – AnnData object containing single-cell expression data.
sample_key (str (default: 'batch')) – Column name in adata.obs that contains the sample identifiers. Each unique value will become a separate pseudobulk sample (or multiple samples if n_pseudoreps is specified).
group_key (str | None (default: None)) – Column name in adata.obs that contains group identifiers like cell types. If provided, pseudobulk samples will be generated separately for each combination of sample and group, enabling cell type-specific analysis.
n_pseudoreps (int | None (default: None)) – Number of pseudoreplicates to create per sample. If None, uses the original sample structure without resampling. If specified, creates n_pseudoreps pseudoreplicates per sample by randomly sampling cells with replacement.
layer (str | None (default: None)) – Layer in adata.layers to use for aggregation. If None, uses adata.X. Should contain raw or normalized counts, not log-transformed values.
mode (str (default: 'sum')) – Method to aggregate cell-level data into pseudobulk samples:
  - 'sum': Sum of counts (recommended for RNA-seq data)
  - 'mean': Average of counts across cells
min_cells (int | None (default: 5)) – Minimum number of cells required in a pseudobulk sample to retain it. Samples with fewer cells will be discarded.
min_counts (int | None (default: 5000)) – Minimum total counts required in a pseudobulk sample to retain it. Samples with fewer total counts will be discarded.
**kwargs – Additional arguments to pass to decoupler.pp.pseudobulk()
- Return type:
AnnData
- Returns:
AnnData object containing the pseudobulk data. The structure changes from cell-level to sample-level, with each row representing a pseudobulk sample. Original sample and group identifiers are preserved in the observations.
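The effect of mode, min_cells, and min_counts described above can be illustrated with a small NumPy-only sketch. This is not the delnx or decoupler implementation: toy_pseudobulk is a hypothetical helper, and it assumes that min_cells is checked against the number of cells in a sample and min_counts against the sample's summed counts.

```python
import numpy as np


def toy_pseudobulk(counts, sample_ids, mode="sum", min_cells=5, min_counts=5000):
    """Aggregate a cells x genes count matrix per sample (illustration only)."""
    counts = np.asarray(counts)
    sample_ids = np.asarray(sample_ids)
    pseudobulk = {}
    for sample in np.unique(sample_ids):
        block = counts[sample_ids == sample]  # all cells of this sample
        # Assumption: min_cells counts cells, min_counts uses summed counts.
        if min_cells is not None and block.shape[0] < min_cells:
            continue
        if min_counts is not None and block.sum() < min_counts:
            continue
        if mode == "sum":
            pseudobulk[str(sample)] = block.sum(axis=0)
        elif mode == "mean":
            pseudobulk[str(sample)] = block.mean(axis=0)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return pseudobulk


# Four cells, two genes; sample "b" has a single cell.
counts = np.array([[1, 2], [3, 4], [5, 6], [10, 0]])
sample_ids = ["a", "a", "a", "b"]

summed = toy_pseudobulk(counts, sample_ids, mode="sum", min_cells=2, min_counts=10)
averaged = toy_pseudobulk(counts, sample_ids, mode="mean", min_cells=2, min_counts=10)
```

In this toy input, sample "b" is dropped because it has fewer than min_cells cells, while sample "a" is kept and reported either as per-gene totals or per-gene averages.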
Notes
This function is a wrapper around the decoupler (https://github.com/scverse/decoupler) pseudobulk function that adds support for pseudoreplicates.
It's generally recommended to aggregate counts using the 'sum' mode and then re-normalize rather than using 'mean' directly on log-normalized data.
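A small numeric sketch may help show why the order of operations matters. It uses only NumPy; log_normalize is a hypothetical helper that scales each row to a fixed total and applies log1p, which is one common normalization and an assumption here, not something this function performs.

```python
import numpy as np


def log_normalize(x, target_sum=1e4):
    """Scale each row to target_sum total counts, then apply log1p."""
    x = np.asarray(x, dtype=float)
    return np.log1p(x / x.sum(axis=1, keepdims=True) * target_sum)


# Two cells, two genes. The first cell is sequenced far more deeply.
raw = np.array([[90.0, 10.0],
                [1.0, 9.0]])

# Option A: average the per-cell log-normalized values.
mean_of_logs = log_normalize(raw).mean(axis=0)

# Option B: sum the raw counts first, then normalize the pseudobulk profile.
pooled = raw.sum(axis=0, keepdims=True)
log_of_sum = log_normalize(pooled)[0]
```

Averaging log-normalized values gives every cell equal weight regardless of depth, so the two genes come out equal in option A, whereas the pooled counts in option B (91 versus 19) keep the first gene clearly higher.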
Examples
Basic pseudobulking by sample:
>>> import scanpy as sc
>>> import delnx as dx
>>> adata = sc.read_h5ad("single_cell_data.h5ad")
>>> # Assuming adata.obs["sample"] contains sample identifiers
>>> pseudobulk_adata = dx.pp.pseudobulk(adata, sample_key="sample")
Pseudobulking by sample and cell type:
>>> # Create cell type-specific pseudobulk samples
>>> pseudobulk_adata = dx.pp.pseudobulk(adata, sample_key="sample", group_key="cell_type")
Creating pseudoreplicates for assessing technical variation:
>>> # Generate 5 pseudoreplicates per original sample
>>> pseudobulk_adata = dx.pp.pseudobulk(adata, sample_key="sample", n_pseudoreps=5)
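Conceptually, a pseudoreplicate is built by resampling the cells of one sample with replacement and aggregating the draw. The NumPy sketch below illustrates that idea only; toy_pseudoreplicates is a hypothetical helper, and the choice to draw as many cells as the sample contains is an assumption, since the number of cells delnx draws per replicate is not stated here.

```python
import numpy as np


def toy_pseudoreplicates(counts, n_pseudoreps, seed=0):
    """Build summed pseudoreplicates from one sample's cells x genes matrix."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n_cells = counts.shape[0]
    replicates = []
    for _ in range(n_pseudoreps):
        # Assumption: each replicate draws n_cells cells with replacement.
        idx = rng.integers(0, n_cells, size=n_cells)
        replicates.append(counts[idx].sum(axis=0))
    return np.vstack(replicates)  # shape: (n_pseudoreps, n_genes)


# One sample with four identical cells and three genes.
sample_counts = np.ones((4, 3))
reps = toy_pseudoreplicates(sample_counts, n_pseudoreps=5)
```

With real data the replicates differ from one another because each draw picks a different multiset of cells; the spread between them reflects resampling of the same cells, not independent biological replication.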