| name | single-cell-downstream-analysis |
| title | Single-cell downstream analysis |
| description | Checklist-style reference for OmicVerse downstream tutorials covering AUCell scoring, metacell DEG, and related exports. |
Single-cell downstream analysis quick-reference
This skill sheet distills the OmicVerse single-cell downstream tutorials into an executable checklist. Each module highlights prerequisites, the core API entry points, interpretation checkpoints, resource planning notes, and any optional validation or export steps surfaced in the notebooks.
AUCell pathway scoring (t_aucell.ipynb)
- Prerequisites
  - Download pathway collections (GO, KEGG, or custom) that match the organism under study before running the tutorial.
  - Ensure an `AnnData` object with clustering/embedding (`adata.obsm['X_umap']`) is prepared.
- Core calls
  - `ov.single.geneset_aucell` for one pathway; `ov.single.pathway_aucell` for multiple pathways.
  - `ov.single.pathway_aucell_enrichment` to score all pathways in a library (set `num_workers` for parallelism).
- Result checks
  - Interpret AUCell scores as expression-like values (0–1). Use `sc.pl.embedding` to confirm pathway activity patterns.
  - Run `sc.tl.rank_genes_groups` on the AUCell `AnnData` to find cluster-enriched pathways and visualize with `sc.pl.rank_genes_groups_dotplot`.
- Resources
  - Library-wide scoring can be CPU-intensive; allocate workers (`num_workers=8` in tutorial) and sufficient memory for the dense AUCell matrix.
- Optional validation / exports
  - Persist scores with `adata_aucs.write_h5ad('...')` for reuse.
  - Plot enriched pathways via `ov.single.pathway_enrichment` and `ov.single.pathway_enrichment_plot` heatmaps.
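
To make the 0–1 interpretation of AUCell scores concrete, the following is a simplified, pure-Python sketch of a recovery-curve score for a single cell. It is an illustration of the idea only, not the OmicVerse or pySCENIC implementation; the function name `aucell_like_score`, the `top_fraction` parameter, and the toy expression values are all assumptions.

```python
def aucell_like_score(expression, gene_set, top_fraction=0.05):
    """Toy AUCell-style score for one cell (illustrative, not the library code).

    expression:   dict mapping gene name -> expression value in this cell.
    gene_set:     iterable of gene names forming the pathway.
    top_fraction: share of the ranking that is scanned (assumed value).
    """
    # Rank genes from highest to lowest expression (ties broken by name).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    cutoff = max(1, round(top_fraction * len(ranked)))
    members = set(gene_set) & set(ranked)
    if not members:
        return 0.0

    # Recovery curve: number of pathway genes seen within the top i genes.
    hits = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in members:
            hits += 1
        area += hits

    # Best possible area: every pathway gene sits at the very top.
    best = sum(min(i, len(members)) for i in range(1, cutoff + 1))
    return area / best


# Hypothetical cell with high T-cell marker expression.
cell = {"CD3E": 9.0, "CD3D": 8.0, "GAPDH": 5.0,
        "MS4A1": 0.5, "CD79A": 0.2, "LYZ": 0.1}
t_cell_score = aucell_like_score(cell, {"CD3E", "CD3D"}, top_fraction=0.5)
b_cell_score = aucell_like_score(cell, {"MS4A1", "CD79A"}, top_fraction=0.5)
print(t_cell_score, b_cell_score)
```

Because the area is divided by the best achievable area, the score stays between 0 and 1, which is why it can be plotted on an embedding like an expression value.
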
scRNA-seq DEG (bulk-style meta cell) (t_scdeg.ipynb)
- Prerequisites
  - Run quality control and preprocessing (`ov.pp.qc`, `ov.pp.preprocess`, `ov.pp.scale`, `ov.pp.pca`).
  - Retain raw counts in `adata.raw` before HVG filtering.
- Core calls
  - Construct differential objects with `ov.bulk.pyDEG(test_adata.to_df(...).T)` for full-cell and metacell views.
  - Build metacells via `ov.single.MetaCell(..., use_gpu=True)` when GPU is available for acceleration.
- Result checks
  - Inspect volcano plots (`dds.plot_volcano`) and targeted boxplots (`dds.plot_boxplot`) for top DEGs.
  - Map DEG markers back to UMAP embeddings using `ov.utils.embedding` to confirm localization.
- Resources
  - Metacell construction benefits from GPU but can fall back to CPU; ensure enough memory for transposed dense matrices passed to `pyDEG`.
- Optional validation / exports
  - Save metacell embeddings with matplotlib figures; adjust `legend_*` settings for publication-ready visuals.
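
The bulk-style workflow above rests on two data-shape steps: collapsing many cells into fewer metacells, and transposing the cells x genes table into genes x samples before it is handed to `pyDEG`. The sketch below shows both on toy data in plain Python. The helper names, the summing rule, and the precomputed `assignment` labels are assumptions for illustration; `ov.single.MetaCell` derives its own groupings and may aggregate differently.

```python
from collections import defaultdict


def aggregate_metacells(counts, genes, assignment):
    """Sum raw counts of cells that share a metacell label.

    counts:     dict cell_id -> list of counts aligned with `genes`.
    assignment: dict cell_id -> metacell label (hypothetical input; the
                library computes these groupings itself).
    Returns a dict metacell -> summed count vector (metacells x genes).
    """
    summed = defaultdict(lambda: [0] * len(genes))
    for cell_id, vector in counts.items():
        target = summed[assignment[cell_id]]
        for j, value in enumerate(vector):
            target[j] += value
    return dict(summed)


def to_genes_by_samples(matrix, genes):
    """Transpose samples x genes into genes x samples (the `.T` step)."""
    samples = list(matrix)
    return {gene: {s: matrix[s][j] for s in samples}
            for j, gene in enumerate(genes)}


genes = ["Gata1", "Cebpa"]
counts = {"c1": [4, 0], "c2": [6, 1], "c3": [0, 5], "c4": [1, 7]}
assignment = {"c1": "mc_A", "c2": "mc_A", "c3": "mc_B", "c4": "mc_B"}

metacells = aggregate_metacells(counts, genes, assignment)
gene_table = to_genes_by_samples(metacells, genes)
print(gene_table)
```

The transposed table is dense, so its memory footprint grows with genes times samples; this is the matrix the resource note refers to.
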
scRNA-seq DEG (cell-type & composition) (t_deg_single.ipynb)
- Prerequisites
  - Annotated `adata` with `condition`, `cell_label`, and optional `batch` metadata.
  - Initialize mixed CPU/GPU resources when using graph-based differential abundance (DA) methods (`ov.settings.cpu_gpu_mixed_init()`).
- Core calls
  - `ov.single.DEG(..., method='wilcoxon'|'t-test'|'memento-de')` with `deg_obj.run(...)` to target cell types.
  - `ov.single.DCT(..., method='sccoda'|'milo')` for differential composition testing.
  - Graph setup for Milo: `ov.pp.preprocess`, `ov.single.batch_correction`, `ov.pp.neighbors`, `ov.pp.umap`.
- Result checks
  - Review DEG tables from `deg_obj` (Wilcoxon / memento) and adjust capture rate / bootstraps for stability.
  - For scCODA, tune FDR via `sim_results.set_fdr()`; interpret boxplots with condition-level shifts.
  - Milo diagnostics: histogram of P-values, logFC vs -log10 FDR scatter, beeswarm of differential abundance.
- Resources
  - Memento and Milo require multiple CPUs (`num_cpus`, `num_boot`, high `k`); ensure adequate compute time.
  - Harmony/scVI batch correction needs GPU memory when enabled; plan for VRAM usage.
- Optional validation / exports
  - Visual diagnostics include UMAP overlays (`ov.pl.embedding`), Milo beeswarm plots, and custom color palettes.
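
The P-value histogram and the logFC vs -log10 FDR scatter both depend on how raw P-values become FDR values. As a minimal sketch, the block below implements the plain Benjamini-Hochberg adjustment in pure Python. Milo's own FDR additionally accounts for overlapping neighbourhoods, so this is the generic procedure rather than Milo's exact method; the function name and the example P-values are made up for illustration.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-neighbourhood (or per-gene) P-values.
p_values = [0.001, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)

# y-axis values for a logFC vs -log10 FDR scatter.
neg_log10_fdr = [-math.log10(q) for q in fdr]
print(fdr)
```

A roughly uniform P-value histogram with a spike near zero is the usual sign that the test is well calibrated; after adjustment, only the spike should survive a typical FDR cut-off.
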
scDrug response prediction (t_scdrug.ipynb)
- Prerequisites
  - Fetch tumor-focused dataset (e.g., `infercnvpy.datasets.maynard2020_3k`).
  - Download reference assets before running predictions:
    - Gene annotations via `ov.utils.get_gene_annotation` (requires GTF from GENCODE or T2T-CHM13).
    - `ov.utils.download_GDSC_data()` and `ov.utils.download_CaDRReS_model()` for drug-response models.
    - Clone CaDRReS-Sc repo (`git clone https://github.com/CSB5/CaDRReS-Sc`).
- Core calls
  - Tumor resolution detection: `ov.single.autoResolution(adata, cpus=4)`.
  - Drug response runner: `ov.single.Drug_Response(adata, scriptpath='CaDRReS-Sc', modelpath='models/', output='result')`.
- Result checks
  - Inspect clustering and IC50 outputs stored under `output`; cross-reference with inferred CNV states.
- Resources
  - Requires external CaDRReS-Sc environment (Python/R dependencies) and storage for model downloads.
  - Running inferCNV preprocessing may need multiple CPUs and substantial RAM.
- Optional validation / exports
  - Persist intermediate `AnnData` (`adata.write('scanpyobj.h5ad')`) to reuse for downstream analyses or re-runs.
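
Since the drug-response runner depends on several external assets, a small pre-flight check can catch a missing download before a long run starts. The sketch below uses only the standard library; `missing_assets` is a hypothetical helper, its default paths mirror the `scriptpath` and `modelpath` values quoted above, and the optional `gtf_path` argument is an assumption about where the annotation file is kept.

```python
from pathlib import Path


def missing_assets(scriptpath="CaDRReS-Sc", modelpath="models/", gtf_path=None):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not Path(scriptpath).is_dir():
        problems.append(f"CaDRReS-Sc checkout not found at {scriptpath!r}")
    model_dir = Path(modelpath)
    # The model directory must exist and contain at least one entry.
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append(f"no model files under {modelpath!r}")
    if gtf_path is not None and not Path(gtf_path).is_file():
        problems.append(f"GTF annotation missing at {gtf_path!r}")
    return problems


# Report anything that still needs to be downloaded or cloned.
for problem in missing_assets():
    print("missing:", problem)
```

The check only verifies that paths exist; it does not validate file contents or the CaDRReS-Sc Python/R environment.
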
SCENIC regulon discovery (t_scenic.ipynb)
- Prerequisites
  - Mouse hematopoiesis dataset loaded via `ov.single.mouse_hsc_nestorowa16()` (or provide preprocessed data with raw counts).
  - Download cisTarget ranking databases (`*.feather`) and motif annotations (`motifs-*.tbl`) for the species; allocate at least 3 GB of disk space and verify paths (`db_glob`, `motif_path`).
- Core calls
  - Initialize analysis: `ov.single.SCENIC(adata, db_glob=..., motif_path=..., n_jobs=12)`.
  - Run RegDiffusion-based GRN inference, regulon pruning, and AUCell scoring via the SCENIC object methods.
- Result checks
  - Examine regulon activity matrices (`scenic_obj.auc_mtx.head()`), regulon specificity scores (RSS), and embeddings colored by regulon activity.
  - Use RSS plots, dendrograms, and AUCell distributions to interpret TF specificity and activity thresholds.
- Resources
  - Multi-core CPU recommended (`n_jobs` matches available cores); ensure enough RAM for motif enrichment.
  - Large downloads and intermediate objects (pickle/h5ad) require disk space.
- Optional validation / exports
  - Save `scenic_obj` (`ov.utils.save`) and regulon AnnData (`regulon_ad.write`).
  - Optional plots: RSS per cell type, regulon embeddings, AUC histograms with threshold lines, GRN network visualizations.
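
To help read RSS plots, here is a minimal sketch of a regulon specificity score computed as one minus the square root of the Jensen-Shannon divergence between a regulon's normalized activity and a cell-type indicator. It is an illustration, not the code used by the tutorial: the function names and toy values are assumptions, base-2 logarithms are used so the score spans 0 to 1, and library implementations may use a different log base and therefore a different scale.

```python
import math


def _normalize(values):
    # Assumes a positive total (some activity, cell type present).
    total = sum(values)
    return [v / total for v in values]


def _kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regulon_specificity(activity, labels, cell_type):
    """1 - sqrt(JSD) between regulon activity and a cell-type indicator."""
    p = _normalize(activity)
    q = _normalize([1.0 if lab == cell_type else 0.0 for lab in labels])
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return 1.0 - math.sqrt(max(jsd, 0.0))


# Hypothetical regulon that is active only in the first two cells.
activity = [0.8, 0.8, 0.0, 0.0]
labels = ["HSC", "HSC", "Ery", "Ery"]
rss_hsc = regulon_specificity(activity, labels, "HSC")
rss_ery = regulon_specificity(activity, labels, "Ery")
print(rss_hsc, rss_ery)
```

A score near 1 means the regulon's activity is concentrated in that cell type; a score near 0 means its activity lies almost entirely elsewhere.
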
cNMF program discovery (t_cnmf.ipynb)
- Prerequisites
  - Preprocess with HVG selection (`ov.pp.preprocess`), scaling (`ov.pp.scale`), PCA, and have UMAP embeddings for inspection.
  - Select component range (e.g., `np.arange(5, 11)`) and iterations; ensure output directory exists.
- Core calls
  - Instantiate analysis: `ov.single.cNMF(..., output_dir='...', name='...')`.
  - Factorization workflow: `cnmf_obj.factorize(...)`, `cnmf_obj.combine(...)`, `cnmf_obj.k_selection_plot()`, `cnmf_obj.consensus(...)`.
  - Extract results: `cnmf_obj.load_results(...)`, `cnmf_obj.get_results(...)`, optional RF classifier via `get_results_rfc`.
- Result checks
  - Evaluate stability via K-selection plot and local density histogram; confirm chosen K with consensus heatmaps.
  - Inspect topic usage embeddings (`ov.pl.embedding`), cluster labels, and dotplots of top genes.
- Resources
  - Multiple iterations and components are CPU-heavy; consider distributing workers (`total_workers`) and verifying disk space for intermediate factorization files.
- Optional validation / exports
  - Visualizations include Euclidean distance heatmaps, density histograms, UMAP overlays for topics/clusters, and dotplots.
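
When inspecting topic usage and cluster labels, it helps to know how a per-cell usage vector can turn into a single label. The sketch below normalizes each cell's usage to fractions and picks the dominant program, in plain Python. It illustrates the general idea only; the helper name, program names, and usage values are hypothetical, and the library's own result extraction may differ in detail.

```python
def dominant_programs(usage, program_names):
    """Normalize each cell's usage to sum to 1 and pick the top program.

    usage: dict cell_id -> list of non-negative usage weights, one per program.
    Returns dict cell_id -> (dominant program name or None, usage fractions).
    """
    results = {}
    for cell_id, weights in usage.items():
        total = sum(weights)
        if total == 0:
            # No program is used at all; leave the cell unassigned.
            results[cell_id] = (None, [0.0] * len(weights))
            continue
        fractions = [w / total for w in weights]
        top = max(range(len(fractions)), key=fractions.__getitem__)
        results[cell_id] = (program_names[top], fractions)
    return results


# Same candidate values as np.arange(5, 11): 5, 6, 7, 8, 9, 10.
candidate_ks = list(range(5, 11))

usage = {"cell_1": [3.0, 1.0], "cell_2": [0.5, 1.5]}
assigned = dominant_programs(usage, ["program_1", "program_2"])
print(candidate_ks, assigned)
```

Cells whose top fraction is only slightly above the rest are worth a second look on the embedding, since a hard label hides mixed program usage.
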
NOCD overlapping communities (t_nocd.ipynb)
- Prerequisites
  - Prepare AnnData via `ov.single.scanpy_lazy` (automated preprocessing) before running NOCD.
  - Note: the tutorial warns that the NOCD implementation is under active development, so expect variability.
- Core calls
  - Pipeline wrapper: `scbrca = ov.single.scnocd(adata)` followed by chained methods (`matrix_transform`, `matrix_normalize`, `GNN_configure`, `GNN_preprocess`, `GNN_model`, `GNN_result`, `GNN_plot`, `cal_nocd`, `calculate_nocd`).
- Result checks
  - Compare standard Leiden clusters versus NOCD outputs on UMAP embeddings to identify multi-fate cells.
- Resources
  - Graph neural network stages can be GPU-accelerated; ensure CUDA availability or be prepared for longer CPU runtimes.
  - Track memory usage when constructing large adjacency matrices.
- Optional validation / exports
  - Generate multiple UMAP overlays (`sc.pl.umap`) for `nocd`, `nocd_n`, and Leiden labels using shared color maps.
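
The difference between Leiden clusters and overlapping communities can be shown with a toy example: a cell belongs to every community whose affiliation strength passes a threshold, so some cells end up in more than one. The sketch below is a generic illustration in plain Python, not the NOCD implementation; the function names, the 0.5 threshold, and the affiliation values are assumptions.

```python
def overlapping_memberships(affiliation, threshold=0.5):
    """Map each cell to every community whose strength reaches the threshold."""
    return {
        cell_id: [k for k, strength in enumerate(strengths) if strength >= threshold]
        for cell_id, strengths in affiliation.items()
    }


def multi_community_cells(memberships):
    """Cells assigned to more than one community (candidate multi-fate cells)."""
    return sorted(c for c, comms in memberships.items() if len(comms) > 1)


# Hypothetical affiliation strengths for three communities.
affiliation = {
    "cell_a": [0.9, 0.1, 0.0],
    "cell_b": [0.6, 0.7, 0.0],
    "cell_c": [0.2, 0.1, 0.8],
}
memberships = overlapping_memberships(affiliation)
shared = multi_community_cells(memberships)
print(memberships, shared)
```

A hard clustering would force `cell_b` into a single group; keeping both memberships is what lets overlapping methods flag cells that sit between fates.
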
Lazy pipeline & reporting (t_lazy.ipynb)
- Prerequisites
  - Install OmicVerse ≥1.7.0 with lazy utilities; supported species are currently human and mouse.
  - Prepare batch metadata (`sample_key`) and optionally initialize hybrid compute (`ov.settings.cpu_gpu_mixed_init()`).
- Core calls
  - Turnkey preprocessing: `ov.single.lazy(adata, species='mouse', sample_key='batch', ...)` with optional `reforce_steps` and module-specific kwargs.
  - Reporting: `ov.single.generate_scRNA_report(...)` to build HTML summary; `ov.generate_reference_table(adata)` for citation tracking.
- Result checks
  - Inspect generated embeddings (`ov.pl.embedding`) for quality and annotation alignment.
  - Review HTML report for QC metrics, normalization, batch correction, and embeddings.
- Resources
  - Steps like Harmony or scVI may invoke GPU; confirm hardware availability or adjust `reforce_steps` accordingly.
  - Report generation writes to disk; ensure output path is writable.
- Optional validation / exports
  - Customize embeddings by color key; store HTML report and reference table alongside project documentation.