name

scvi-tools

description

Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.

scvi-tools Deep Learning Skill

This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, a PyTorch-based framework for probabilistic models in single-cell genomics.

How to Use This Skill

Identify the appropriate workflow from the model/workflow tables below
Read the corresponding reference file for detailed steps and code
Use scripts in scripts/ to avoid rewriting common code
For installation or GPU issues, consult references/environment_setup.md
For debugging, consult references/troubleshooting.md

When to Use This Skill

When scvi-tools, scVI, scANVI, or related models are mentioned
When deep learning-based batch correction or integration is needed
When working with multi-modal data (CITE-seq, multiome)
When reference mapping or label transfer is required
When analyzing ATAC-seq or spatial transcriptomics data
When learning latent representations of single-cell data

Model Selection Guide

Data Type	Model	Primary Use Case
scRNA-seq	scVI	Unsupervised integration, DE, imputation
scRNA-seq + labels	scANVI	Label transfer, semi-supervised integration
CITE-seq (RNA+protein)	totalVI	Multi-modal integration, protein denoising
scATAC-seq	PeakVI	Chromatin accessibility analysis
Multiome (RNA+ATAC)	MultiVI	Joint modality analysis
Spatial + scRNA reference	DestVI	Cell type deconvolution
RNA velocity	veloVI	Transcriptional dynamics
Cross-technology	sysVI	System-level batch correction

Workflow Reference Files

Workflow	Reference File	Description
Environment Setup	`references/environment_setup.md`	Installation, GPU, version info
Data Preparation	`references/data_preparation.md`	Formatting data for any model
scRNA Integration	`references/scrna_integration.md`	scVI/scANVI batch correction
ATAC-seq Analysis	`references/atac_peakvi.md`	PeakVI for accessibility
CITE-seq Analysis	`references/citeseq_totalvi.md`	totalVI for protein+RNA
Multiome Analysis	`references/multiome_multivi.md`	MultiVI for RNA+ATAC
Spatial Deconvolution	`references/spatial_deconvolution.md`	DestVI spatial analysis
Label Transfer	`references/label_transfer.md`	scANVI reference mapping
scArches Mapping	`references/scarches_mapping.md`	Query-to-reference mapping
Batch Correction	`references/batch_correction_sysvi.md`	sysVI for strong cross-technology batch effects
RNA Velocity	`references/rna_velocity_velovi.md`	veloVI dynamics
Troubleshooting	`references/troubleshooting.md`	Common issues and solutions

CLI Scripts

Modular scripts for common workflows. Chain them together or modify them as needed.

Pipeline Scripts

Script	Purpose	Usage
`prepare_data.py`	QC, filter, HVG selection	`python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch`
`train_model.py`	Train any scvi-tools model	`python scripts/train_model.py prepared.h5ad results/ --model scvi`
`cluster_embed.py`	Neighbors, UMAP, Leiden	`python scripts/cluster_embed.py adata.h5ad results/`
`differential_expression.py`	DE analysis	`python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden`
`transfer_labels.py`	Label transfer with scANVI	`python scripts/transfer_labels.py ref_model/ query.h5ad results/`
`integrate_datasets.py`	Multi-dataset integration	`python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad`
`validate_adata.py`	Check data compatibility	`python scripts/validate_adata.py data.h5ad --batch-key batch`

Example Workflow

# 1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

# 2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

# 3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

# 4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

# 5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
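
The five steps above can also be chained from Python. The following is a minimal sketch, not part of the skill's scripts: `build_pipeline` and `run_pipeline` are hypothetical helper names, and the intermediate file name `prepared.h5ad` simply mirrors the example. The script paths and flags are copied from the commands above.

```python
import subprocess


def build_pipeline(raw, outdir="results", batch_key="batch", n_hvgs=2000, resolution=0.8):
    """Return the example workflow as a list of argument lists (hypothetical helper)."""
    prepared = "prepared.h5ad"  # intermediate file name taken from the example above
    return [
        ["python", "scripts/validate_adata.py", raw, "--batch-key", batch_key, "--suggest"],
        ["python", "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", str(n_hvgs)],
        ["python", "scripts/train_model.py", prepared, f"{outdir}/",
         "--model", "scvi", "--batch-key", batch_key],
        ["python", "scripts/cluster_embed.py", f"{outdir}/adata_trained.h5ad", f"{outdir}/",
         "--resolution", str(resolution)],
        ["python", "scripts/differential_expression.py", f"{outdir}/model",
         f"{outdir}/adata_clustered.h5ad", f"{outdir}/de.csv", "--groupby", "leiden"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline("raw.h5ad"), dry_run=True)
```

Running with `dry_run=True` only prints the commands, which is a cheap way to confirm paths before starting a long training run.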

Python Utilities

The module scripts/model_utils.py provides importable functions for custom workflows:

Function	Purpose
`prepare_adata()`	Data preparation (QC, HVG, layer setup)
`train_scvi()`	Train scVI or scANVI
`evaluate_integration()`	Compute integration metrics
`get_marker_genes()`	Extract DE markers
`save_results()`	Save model, data, plots
`auto_select_model()`	Suggest best model
`quick_clustering()`	Neighbors + UMAP + Leiden

Critical Requirements

Raw counts required: scvi-tools models are fit to raw (unnormalized) integer counts, so store them in a layer before normalizing or log-transforming adata.X

import scvi

# Copy the raw counts into a layer before any normalization overwrites adata.X
adata.layers["counts"] = adata.X.copy()
scvi.model.SCVI.setup_anndata(adata, layer="counts")

HVG selection: Use 2000-4000 highly variable genes

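import scanpy as sc

# flavor="seurat_v3" expects raw counts (hence layer="counts") and needs the scikit-misc package
sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
adata = adata[:, adata.var['highly_variable']].copy()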

Batch information: Specify batch_key for integration

scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
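
The three requirements above can be combined into a single guarded training call. This is a minimal sketch under stated assumptions: `counts_look_raw` and `train_scvi_sketch` are hypothetical names (they are not the `train_scvi()` helper from scripts/model_utils.py, whose signature is not shown here), the count check assumes a dense array or a CSR/CSC/COO sparse matrix, and `X_scVI` is just a conventional key for the latent embedding. The scvi-tools calls use default settings.

```python
import numpy as np


def counts_look_raw(X):
    """Return True if every stored value in X is a non-negative whole number."""
    # Sparse matrices (CSR/CSC/COO) expose their stored non-zeros as `.data`;
    # dense arrays are flattened instead.
    values = X.data if hasattr(X, "tocsr") else X
    values = np.asarray(values).ravel()
    if values.size == 0:
        return True
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


def train_scvi_sketch(adata, batch_key="batch", layer="counts"):
    """Validate counts, register the AnnData object, train scVI, store the latent space."""
    import scvi  # imported lazily so counts_look_raw works without scvi-tools installed

    if not counts_look_raw(adata.layers[layer]):
        raise ValueError(f"adata.layers[{layer!r}] does not look like raw integer counts")

    scvi.model.SCVI.setup_anndata(adata, layer=layer, batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()
    # "X_scVI" is a conventional key name, chosen here for illustration
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model
```

Failing early on normalized input is cheaper than discovering the problem after training, since the model will still train on non-integer data but its likelihood assumptions no longer hold.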

Quick Decision Tree

Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)

Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)

Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)

Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
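
The decision tree can be encoded as a simple lookup. The sketch below is illustrative only: `MODEL_GUIDE`, its task keys, and `suggest_model` are hypothetical names, and this is not the `auto_select_model()` function from scripts/model_utils.py. The model names and reference paths are taken from the tree above.

```python
# Task keys are hypothetical labels; models and reference files come from the decision tree.
MODEL_GUIDE = {
    "scrna_labeled": ("scANVI", "references/label_transfer.md"),
    "scrna_unlabeled": ("scVI", "references/scrna_integration.md"),
    "cite_seq": ("totalVI", "references/citeseq_totalvi.md"),
    "multiome": ("MultiVI", "references/multiome_multivi.md"),
    "scatac": ("PeakVI", "references/atac_peakvi.md"),
    "spatial_deconvolution": ("DestVI", "references/spatial_deconvolution.md"),
    "reference_mapping": ("scArches", "references/scarches_mapping.md"),
    "rna_velocity": ("veloVI", "references/rna_velocity_velovi.md"),
    "cross_technology": ("sysVI", "references/batch_correction_sysvi.md"),
}


def suggest_model(task):
    """Return (model name, reference file) for a task key, or raise on unknown keys."""
    try:
        return MODEL_GUIDE[task]
    except KeyError:
        known = ", ".join(sorted(MODEL_GUIDE))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


if __name__ == "__main__":
    model, reference = suggest_model("cite_seq")
    print(f"{model}: see {reference}")
```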