| name | scvitools |
| description | Comprehensive skill for scvi-tools - Deep probabilistic models for single-cell omics analysis. Use for scVI, scANVI, totalVI, MultiVI models, single-cell RNA-seq integration, batch correction, differential expression, and multimodal data analysis. |
Scvitools Skill
Comprehensive assistance with scvi-tools development and single-cell omics analysis using deep probabilistic models.
When to Use This Skill
This skill should be triggered when:
Core scvi-tools Tasks:
- Working with scvi-tools models (scVI, scANVI, totalVI, MultiVI, etc.)
- Setting up AnnData objects for scvi-tools analysis
- Performing batch correction and data integration
- Running differential expression analysis
- Analyzing single-cell RNA-seq, ATAC-seq, or multimodal data
- Implementing custom model classes or modules
Analysis and Visualization:
- Getting latent representations and embeddings
- Creating UMAP/tSNE visualizations from scvi-tools outputs
- Interpreting model results and biological insights
- Working with spatial transcriptomics data
Development and Advanced Tasks:
- Building custom scvi-tools models
- Hyperparameter tuning with scvi.autotune
- Model evaluation and benchmarking
- Integration with Scanpy workflows
- Debugging scvi-tools code and installation issues
Data Processing:
- Preprocessing single-cell data for scvi-tools
- Setting up data registration with setup_anndata
- Handling batch effects and covariates
- Working with count data and normalization
Quick Reference
Core Model Setup and Training
Basic scVI Model Setup
import scvi
import scanpy as sc
# Setup AnnData for scVI
scvi.model.SCVI.setup_anndata(
adata,
batch_key="batch",
labels_key="cell_type"
)
# Create and train model
model = scvi.model.SCVI(adata)
model.train()
# Get latent representation
latent = model.get_latent_representation()
adata.obsm["X_scVI"] = latent
Differential Expression Analysis
# 1-vs-1 DE test
de_results = model.differential_expression(
groupby="cell_type",
group1="T-cell",
group2="B-cell"
)
# 1-vs-all DE test
de_results_all = model.differential_expression(
groupby="cell_type",
mode="change"
)
Data Integration and Batch Correction
Multimodal Data with totalVI
import scvi
from scvi.external import TOTALVI
# Setup for RNA + protein data
TOTALVI.setup_anndata(
adata,
batch_key="batch",
protein_expression_obsm_key="protein_expression"
)
model = TOTALVI(adata)
model.train()
# Get normalized RNA and protein
rna_norm = model.get_normalized_expression()
protein_norm = model.get_protein_foregrounds()
Spatial Transcriptomics with GIMVI
from scvi.external import GIMVI
# Setup spatial and seq data
spatial_adata = ... # spatial data
seq_adata = ... # single-cell seq data
model = GIMVI(seq_adata, spatial_adata)
model.train()
# Get latent representations for both modalities
spatial_latent = model.get_latent_representation(spatial_adata)
seq_latent = model.get_latent_representation(seq_adata)
Advanced Model Customization
Custom Model Class
from scvi.model.base import BaseModelClass, UnsupervisedTrainingMixin
from scvi.module import VAE
class CustomModel(UnsupervisedTrainingMixin, BaseModelClass):
def __init__(self, adata, n_latent=30):
super().__init__(adata)
self.module = VAE(
n_input=self.summary_stats["n_vars"],
n_batch=self.summary_stats["n_batch"],
n_latent=n_latent,
)
self._model_summary_string = f"CustomModel with n_latent: {n_latent}"
self.init_params_ = self._get_init_params(locals())
@classmethod
def setup_anndata(cls, adata, batch_key=None, layer=None):
setup_method_args = cls._get_setup_method_args(**locals())
anndata_fields = [
LayerField(REGISTRY_KEYS.X_KEY, layer, is_count_data=True),
CategoricalObsField(REGISTRY_KEYS.BATCH_KEY, batch_key),
]
adata_manager = AnnDataManager(fields=anndata_fields, setup_method_args=setup_method_args)
adata_manager.register_fields(adata, **kwargs)
cls.register_manager(adata_manager)
Hyperparameter Tuning
Automated Hyperparameter Search
import ray
from ray import tune
from scvi import autotune
# Define search space
search_space = {
"model_params": {
"n_hidden": tune.choice([64, 128, 256]),
"n_layers": tune.choice([1, 2, 3])
},
"train_params": {
"max_epochs": 100,
"plan_kwargs": {"lr": tune.loguniform(1e-4, 1e-2)}
}
}
# Run tuning
results = autotune.run_autotune(
scvi.model.SCVI,
data=adata,
mode="min",
metrics="validation_loss",
search_space=search_space,
num_samples=5,
resources={"cpu": 10, "gpu": 1}
)
ATAC-seq Analysis
scBasset for scATAC-seq
from scvi.external import ScBasset
# Setup ATAC data
ScBasset.setup_anndata(adata, batch_key="batch")
# Create and train model
model = ScBasset(adata)
model.train()
# Get latent representation
latent = model.get_latent_representation()
adata.obsm["X_scBasset"] = latent
# Score TF activity
tf_activities = model.score_tf_activity("motif_library_path")
Spatial Data Analysis
ResolVI for Spatial Transcriptomics
from scvi.external import RESOLVI
# Setup spatial data
RESOLVI.setup_anndata(
adata,
batch_key="slice",
labels_key="cell_type"
)
# Train model
model = RESOLVI(adata)
model.train()
# Get corrected counts
corrected_counts = model.get_corrected_counts()
# Differential abundance in spatial neighborhoods
da_results = model.differential_abundance(
groupby="cell_type",
group1="neuron_layer1",
group2="neuron_layer2"
)
Model Evaluation and Visualization
Model Quality Assessment
# Get ELBO (reconstruction quality)
elbo_score = model.get_elbo()
# Get reconstruction error
recon_error = model.get_reconstruction_error()
# Visualize latent space
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.pl.umap(adata, color=["cell_type", "batch"])
Installation and Setup
Installation with GPU Support
# Basic CPU installation
pip install scvi-tools
# GPU support for Linux
pip install scvi-tools[cuda]
# Apple Silicon (MPS) support
pip install scvi-tools[metal]
# Full installation with all dependencies
pip install scvi-tools[all,tutorials,jax]
Environment Setup
import scvi
import torch
# Set seed for reproducibility
scvi.settings.seed = 0
# Configure GPU settings
torch.set_float32_matmul_precision("high")
# Check version
print(f"scvi-tools version: {scvi.__version__}")
Key Concepts
Core Models:
- scVI: Single-cell Variational Inference for batch correction and integration
- scANVI: Semi-supervised scVI for cell type annotation
- totalVI: Total Variational Inference for joint RNA + protein (CITE-seq) data
- MultiVI: Multimodal Variational Inference for paired + unpaired data
- GIMVI: Generative Integrative Modeling for spatial + single-cell data
Data Structures:
- AnnData: Core data structure for single-cell data
- AnnDataManager: scvi-tools data registry and validation system
- setup_anndata(): Required preprocessing step to register data with scvi-tools
Model Architecture:
- BaseModelClass: Abstract base for all scvi-tools models
- BaseModuleClass: Abstract base for neural network modules
- VAEMixin: Provides VAE-specific methods (get_latent_representation, etc.)
Reference Files
references/api_reference.md
Comprehensive API Documentation - Complete reference for all classes and methods:
- Developer API for custom model building
- Data registration utilities (AnnDataManager, AnnDataFields)
- Model base classes and mixins
- Module building blocks (encoders, decoders)
- Training plans and utilities
references/getting_started.md
Installation and Tutorials - Entry points for learning scvi-tools:
- Complete installation guide (CPU, GPU, dependencies)
- Introduction to scvi-tools workflow
- gimVI tutorial for spatial transcriptomics
- Basic setup and data preparation examples
references/tutorials.md
In-depth Tutorial Collection - 60+ pages of detailed tutorials:
- Topic modeling with Amortized LDA
- scBasset for scATAC-seq analysis
- ResolVI for spatial transcriptomics correction
- SHAP and IntegratedGradients for model interpretability
- Advanced use cases and specialized analyses
references/user_guide.md
Comprehensive User Guide - Detailed workflow documentation:
- Complete scvi-tools workflow overview
- Data loading and preprocessing best practices
- Model creation, training, and saving
- Integration with Scanpy for downstream analysis
- Visualization and interpretation techniques
Working with This Skill
For Beginners
Start Here:
- Read the installation guide in
references/getting_started.md - Follow the basic scvi-tools tutorial for data setup
- Practice with the Quick Reference examples above
- Focus on basic scVI workflows first
Recommended Learning Path:
- Install scvi-tools and verify setup
- Load a sample dataset and run
setup_anndata() - Create and train a basic scVI model
- Extract latent representations and create visualizations
- Perform simple differential expression analysis
For Intermediate Users
Expand Your Skills:
- Explore multimodal models (totalVI, MultiVI)
- Learn hyperparameter tuning with
scvi.autotune - Practice batch correction with complex datasets
- Implement custom model classes
- Work with spatial transcriptomics data
Common Tasks:
- Setting up covariates and batch effects
- Choosing appropriate model parameters
- Evaluating model quality and convergence
- Integrating with existing Scanpy workflows
For Advanced Users
Advanced Features:
- Build custom model architectures
- Implement new modules and training plans
- Use Pyro-based models for Bayesian analysis
- Develop specialized analysis pipelines
- Contribute to scvi-tools development
Expert Resources:
- Developer API documentation in
references/api_reference.md - Advanced tutorials for specialized applications
- Model architecture and extension guides
Common Workflows
Standard scVI Analysis
# 1. Data preparation
sc.pp.filter_genes(adata, min_counts=3)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
# 2. Setup with scvi-tools
scvi.model.SCVI.setup_anndata(adata, batch_key="batch")
# 3. Model training
model = scvi.model.SCVI(adata)
model.train()
# 4. Downstream analysis
adata.obsm["X_scVI"] = model.get_latent_representation()
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
Quality Control and Troubleshooting
Model Training Issues:
- Check data preprocessing (use raw counts, not normalized)
- Verify
setup_anndata()was called correctly - Monitor training loss and convergence
- Adjust learning rate and architectural parameters
Data Integration Problems:
- Ensure proper batch key registration
- Check for sufficient shared features across batches
- Consider using scANVI for semi-supervised integration
- Validate integration quality with biological markers
Resources
Installation and Environment
- Virtual environment setup recommended
- GPU support available for CUDA and Apple Silicon
- Optional dependencies for specialized features
Community and Support
- Official scvi-tools documentation
- GitHub repository for issues and contributions
- Community forums and discussion boards
- Tutorial notebooks and examples
Performance Optimization
- GPU acceleration for model training
- Memory-efficient data loading
- Distributed training options
- Hyperparameter tuning best practices
Notes
- This skill covers scvi-tools v1.3+ features
- Always use raw count data as input to scvi-tools models
setup_anndata()must be called before model initialization- Models train faster with GPU acceleration when available
- Integration with Scanpy provides seamless downstream analysis workflows