delnx.tl.rank_de
- delnx.tl.rank_de(adata, condition_key, layer=None, use_ties=False, multitest_method='fdr_bh', n_cpus=None, min_samples=2, batch_size=512, verbose=True)
Perform rank-based differential expression analysis using AUROC statistics.
This function identifies differentially expressed features between conditions by computing the Area Under the ROC Curve (AUROC) on ranked expression values, comparing each condition against all others. It handles sparse matrices by processing features in memory-bounded batches, and it can optionally apply tie correction to the p-values.
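For a single feature, the one-vs-all AUROC equals the Mann-Whitney U statistic of the condition's samples divided by n1 * n2, where n1 and n2 are the numbers of samples inside and outside the condition. The sketch below illustrates that identity with NumPy/SciPy using midranks, so tied values (such as zeros) count as half. `auroc_one_vs_rest` is a hypothetical helper for illustration, not part of delnx, and it is not the library's Numba/JAX implementation.

```python
import numpy as np
from scipy.stats import rankdata


def auroc_one_vs_rest(x, in_group):
    """AUROC of values `x` for samples in `in_group` versus all other samples.

    Hypothetical helper: uses average (mid) ranks, so ties contribute 0.5.
    """
    x = np.asarray(x, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    n1 = in_group.sum()
    n2 = in_group.size - n1
    ranks = rankdata(x)  # average ranks for tied values
    # Mann-Whitney U for the in-group: its rank sum minus the smallest possible rank sum
    u1 = ranks[in_group].sum() - n1 * (n1 + 1) / 2
    return u1 / (n1 * n2)


# Toy expression values for one gene; the last four samples form the condition
x = np.array([0.0, 0.0, 1.0, 3.0, 0.0, 2.0, 5.0, 4.0])
group = np.array([False, False, False, False, True, True, True, True])
print(auroc_one_vs_rest(x, group))  # 0.75
```

Swapping the groups gives 1 - AUROC (here 0.25), which is why values below 0.5 indicate downregulation.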
- Parameters:
  - adata (AnnData) – AnnData object containing expression data and metadata.
  - condition_key (str) – Column name in adata.obs containing condition labels.
  - layer (str | None, default: None) – Layer in adata.layers to use for expression data. If None, uses adata.X.
  - use_ties (bool, default: False) – Whether to apply tie correction to the p-value calculation. Recommended for sparse data with many identical values (especially zeros).
  - multitest_method (str, default: 'fdr_bh') – Method for multiple testing correction. Accepts any method supported by statsmodels.stats.multitest.multipletests(). Common options include:
    - "fdr_bh": Benjamini-Hochberg FDR correction
    - "bonferroni": Bonferroni correction
  - n_cpus (int | None, default: None) – Number of CPU cores for parallel processing. If None, uses the available threads. Parallel processing is enabled when n_cpus >= 4 or use_ties=True.
  - min_samples (int, default: 2) – Minimum number of samples required per condition. Conditions with fewer samples are excluded.
  - batch_size (int | None, default: 512) – Number of features per batch. Larger values use more memory but may be more efficient. If None, all features are processed at once.
  - verbose (bool, default: True) – Whether to show progress and algorithm information.
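The use_ties option concerns the variance of U under the null hypothesis: large blocks of tied values (typically zeros) shrink that variance, and ignoring them makes the normal-approximation p-values conservative. The exact formula delnx uses is not stated here; the sketch below assumes the standard tie-adjusted normal approximation without continuity correction, which is what SciPy's `mannwhitneyu(..., method="asymptotic", use_continuity=False)` computes. `mwu_pvalue` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import norm, rankdata


def mwu_pvalue(x, in_group, use_ties=True):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Hypothetical helper; assumes no continuity correction.
    """
    x = np.asarray(x, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    n1 = int(in_group.sum())
    n2 = in_group.size - n1
    n = n1 + n2
    ranks = rankdata(x)
    u1 = ranks[in_group].sum() - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 * (n + 1) / 12
    if use_ties:
        # Sizes of each block of tied values, e.g. the block of zeros
        _, counts = np.unique(x, return_counts=True)
        var -= n1 * n2 * np.sum(counts**3 - counts) / (12 * n * (n - 1))
    z = (u1 - mu) / np.sqrt(var)
    return 2 * norm.sf(abs(z))


x = np.array([0.0, 0.0, 1.0, 3.0, 0.0, 2.0, 5.0, 4.0])
group = np.array([False, False, False, False, True, True, True, True])
print(mwu_pvalue(x, group, use_ties=True), mwu_pvalue(x, group, use_ties=False))
```

With ties present, the corrected variance is smaller, so the tie-corrected p-value is the smaller of the two.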
- Return type:
  DataFrame
- Returns:
  pd.DataFrame of results with columns:
  - "feature": Feature/gene names
  - "condition": Condition label (one-vs-all comparison)
  - "auroc": AUROC values (0.5 = random, > 0.5 = upregulated, < 0.5 = downregulated)
  - "pval": Two-tailed p-values from the Mann-Whitney U test
  - "tie_corrected": Whether tie correction was applied
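The default multitest_method='fdr_bh' refers to the Benjamini-Hochberg procedure. The sketch below shows that adjustment on a plain vector of p-values, such as a pval column. `bh_adjust` is a hypothetical helper. In practice, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` returns the same adjusted values as its second element. Whether rank_de corrects per condition or across all rows is not stated here.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the ascending-sorted p-values, i = 1..m
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward, then cap at 1
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    # Scatter back to the original order
    out = np.empty(m)
    out[order] = adjusted
    return out


print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```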
Examples
Basic one-vs-all differential expression:
>>> import delnx as dx
>>> results = dx.tl.rank_de(adata, condition_key="cell_type")
With tie correction for improved p-values:
>>> results = dx.tl.rank_de(adata, condition_key="treatment", use_ties=True)
Custom batch size for memory optimization:
>>> results = dx.tl.rank_de(adata, condition_key="condition", batch_size=1024)
Notes
This implementation uses several optimizations:
- Numba JIT compilation for fast ranking algorithms
- JAX acceleration for AUROC calculations
- Memory-efficient batch processing to handle large datasets
- Automatic algorithm selection based on data characteristics
- Optional tie correction for statistical accuracy with sparse data
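The batch-processing idea can be illustrated without Numba or JAX. The sketch densifies only `batch_size` feature columns of a SciPy sparse matrix at a time, ranks each column, and computes one-vs-rest AUROC for every condition, with batch_size=None meaning all features at once. `batched_auroc` is hypothetical and written in plain NumPy/SciPy. delnx's actual kernels, tie handling, and min_samples filtering are not reproduced.

```python
import numpy as np
import scipy.sparse as sp
from scipy.stats import rankdata


def batched_auroc(X, labels, batch_size=512):
    """One-vs-rest AUROC for every (condition, feature) pair (hypothetical).

    X: (n_samples, n_features) dense array or SciPy sparse matrix.
    Only `batch_size` columns are densified at a time.
    """
    labels = np.asarray(labels)
    conditions = np.unique(labels)
    n_features = X.shape[1]
    step = batch_size or max(n_features, 1)  # None -> all features in one batch
    out = np.empty((conditions.size, n_features))
    for start in range(0, n_features, step):
        stop = min(start + step, n_features)
        block = X[:, start:stop]
        block = block.toarray() if sp.issparse(block) else np.asarray(block)
        # Average ranks per column; ties (e.g. zeros) share a midrank
        ranks = rankdata(block, axis=0)
        for i, cond in enumerate(conditions):
            mask = labels == cond
            n1 = mask.sum()
            n2 = mask.size - n1
            u1 = ranks[mask].sum(axis=0) - n1 * (n1 + 1) / 2
            out[i, start:stop] = u1 / (n1 * n2)
    return conditions, out


# Random sparse toy matrix: 60 samples x 1000 features, three conditions of 20
X = sp.random(60, 1000, density=0.1, format="csc", random_state=0)
labels = np.repeat(np.array(["a", "b", "c"]), 20)
conds, auc = batched_auroc(X, labels, batch_size=256)
print(conds, auc.shape)  # ['a' 'b' 'c'] (3, 1000)
```

Peak memory scales with n_samples * batch_size rather than with the full dense matrix, which is the trade-off the batch_size parameter describes.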