| name | single-cell-multi-omics-integration |
| title | Single-cell multi-omics integration |
| description | Quick-reference sheet for OmicVerse tutorials spanning MOFA, GLUE pairing, SIMBA integration, TOSICA transfer, and StaVIA cartography. |
Single-Cell Multi-Omics Tutorials Cheat Sheet
This cheat sheet summarizes the OmicVerse notebooks that cover paired and unpaired multi-omics integration, multi-batch embedding, reference transfer, and trajectory cartography.
MOFA on paired scRNA + scATAC (t_mofa.ipynb)
- Data preparation: Load preprocessed AnnData objects for RNA (`rna_p_n_raw.h5ad`) and ATAC (`atac_p_n_raw.h5ad`) with `ov.utils.read`, and initialise `pyMOFA` with matching `omics` and `omics_name` lists.
- Model training: Call `mofa_preprocess()` to select highly variable features and run the factor model with `mofa_run(outfile=...)`, which exports the learned MOFA+ factors to an HDF5 model file.
- Result inspection: Reload downstream AnnData, append factor scores via `ov.single.factor_exact`, and explore factor–cluster associations using `factor_correlation`, `get_weights`, and the plotting helpers in `pyMOFAART` (`plot_r2`, `plot_cor`, `plot_factor`, `plot_weights`, etc.).
- Export workflow: Persist factors and weights through the MOFA HDF5 artifact and reuse them by instantiating `pyMOFAART(model_path=...)` for later annotation or visualisation sessions (see the sketch after this list).
- Dependencies & hardware: Requires `mofapy2`; plots optionally rely on `pymde`/`scvi-tools` but run on CPU.
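A minimal sketch of this workflow, assuming the `ov.single.pyMOFA`/`pyMOFAART` entry points named above (module paths and exact signatures may differ across OmicVerse releases):

```python
import omicverse as ov

# Load the paired modalities (filenames from the tutorial).
rna = ov.utils.read('rna_p_n_raw.h5ad')
atac = ov.utils.read('atac_p_n_raw.h5ad')

# Build and train the MOFA+ model over matching omics/omics_name lists.
mofa = ov.single.pyMOFA(omics=[rna, atac], omics_name=['RNA', 'ATAC'])
mofa.mofa_preprocess()                           # select highly variable features
mofa.mofa_run(outfile='models/rna_atac.hdf5')    # fit factors and export the HDF5 model

# Downstream: append factor scores and inspect the trained model.
rna = ov.single.factor_exact(rna, hdf5_path='models/rna_atac.hdf5')
art = ov.single.pyMOFAART(model_path='models/rna_atac.hdf5')
art.plot_r2()                                    # variance explained per factor and view
```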
MOFA after GLUE pairing (t_mofa_glue.ipynb)
- Data preparation: Start from GLUE-derived embeddings (`rna-emb.h5ad`, `atac-emb.h5ad`), build a `GLUE_pair` object, and run `correlation()` to align unpaired cells before subsetting to highly variable features.
- Model training: Instantiate `pyMOFA` with the aligned AnnData objects, run `mofa_preprocess()`, and save the joint factors through `mofa_run(outfile='models/chen_rna_atac.hdf5')`.
- Result inspection: Use `pyMOFAART` plus the AnnData that now contains the GLUE embeddings to compute factors (`get_factors`) and visualise variance explained, factor–cluster correlations, and ranked feature weights.
- Export workflow: Reuse the saved MOFA HDF5 model for downstream inspection; GLUE embeddings can be embedded with `scvi.model.utils.mde` (GPU-accelerated MDE is optional; `sc.tl.umap` works on CPU). A code sketch follows this list.
- Dependencies & hardware: Requires both `mofapy2` and the GLUE tooling (`scglue`, `scvi-tools`, `pymde`); GPU acceleration only affects optional MDE visualisation.
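A hedged sketch of the pairing-plus-MOFA chain; `GLUE_pair`, `correlation()`, and `get_factors` follow the tutorial wording above, while the constructor arguments are assumptions to verify against your OmicVerse version:

```python
import omicverse as ov

# GLUE-derived embeddings for the two modalities (filenames from the tutorial).
rna = ov.utils.read('rna-emb.h5ad')
atac = ov.utils.read('atac-emb.h5ad')

# Pair unpaired cells by correlating their GLUE embeddings.
pair = ov.single.GLUE_pair(rna, atac)
pair.correlation()

# Train MOFA on the aligned AnnData objects and save the joint factors.
mofa = ov.single.pyMOFA(omics=[rna, atac], omics_name=['RNA', 'ATAC'])
mofa.mofa_preprocess()
mofa.mofa_run(outfile='models/chen_rna_atac.hdf5')

# Inspect factors on the GLUE-embedded AnnData.
art = ov.single.pyMOFAART(model_path='models/chen_rna_atac.hdf5')
art.get_factors(rna)    # append factor scores to the AnnData
art.plot_r2()           # variance explained per factor and view
```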
SIMBA batch integration (t_simba.ipynb)
- Data preparation: Fetch the concatenated AnnData (`simba_adata_raw.h5ad`) derived from multiple pancreas studies and pass it, alongside a results directory, to `pySIMBA`.
- Model training: Execute `preprocess(...)` to bin features and build a SIMBA-compatible graph, then call `gen_graph()` followed by `train(num_workers=...)` to launch PyTorch-BigGraph optimisation (training scales with CPU workers), and `load(...)` to resume trained checkpoints.
- Result inspection: Apply `batch_correction()` to obtain the harmonised AnnData with SIMBA embeddings (`X_simba`) and visualise using `mde`/`sc.tl.umap` coloured by cell type or batch.
- Export workflow: Training outputs reside in the workdir (e.g., `result_human_pancreas/pbg/graph0`); reuse them with `simba_object.load(...)` for later analyses (see the sketch below).
- Dependencies & hardware: Requires installing `simba` and `simba_pbg` (the PyTorch-BigGraph backend). GPU is optional; make sure adequate CPU threads and memory are available for graph training.
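A compact sketch, assuming the `ov.single.pySIMBA` wrapper and the method names listed above; the `preprocess(...)` keyword arguments shown are illustrative rather than tutorial-verified values:

```python
import omicverse as ov

# Concatenated multi-batch pancreas data (filename from the tutorial).
adata = ov.utils.read('simba_adata_raw.h5ad')

# Point SIMBA at a working directory for graph and checkpoint outputs.
simba_object = ov.single.pySIMBA(adata, workdir='result_human_pancreas')

simba_object.preprocess(batch_key='batch',      # illustrative arguments; adjust to your data
                        min_n_cells=3,
                        method='lib_size',
                        n_top_genes=3000,
                        n_bins=5)
simba_object.gen_graph()                        # build the SIMBA entity graph
simba_object.train(num_workers=6)               # PyTorch-BigGraph optimisation on CPU workers
# simba_object.load('result_human_pancreas/pbg/graph0')  # or resume a trained checkpoint

adata = simba_object.batch_correction()         # harmonised AnnData with obsm['X_simba']
```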
TOSICA reference transfer (t_tosica.ipynb)
- Data preparation: Download the demo AnnData references (`demo_train.h5ad`, `demo_test.h5ad`) and the required gene-set GMT files via `ov.utils.download_tosica_gmt()`; confirm the datasets are log-normalised before training.
- Model training: Create `pyTOSICA` with the reference AnnData, chosen pathway mask, label key, project directory, and batch size; train with `train(epochs=...)`, then persist weights with `save()` and optionally reload via `load()`.
- Result inspection: Generate predictions on query AnnData through `predicted(pre_adata=...)`, embed with OmicVerse preprocessing and GPU-enabled `mde` (a UMAP fallback is available), and explore pathway attention to interpret the transformer heads.
- Export workflow: The saved project folder keeps model checkpoints and attention summaries; reuse the exported assets to annotate future datasets without retraining from scratch (a sketch follows this list).
- Dependencies & hardware: Needs TOSICA (a PyTorch transformer) plus the downloaded gene-set masks; avoid setting `depth=2` if memory is constrained. GPU acceleration improves embedding (`mde`), but training runs on standard PyTorch (CPU or GPU depending on environment).
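A minimal sketch of reference training and query annotation with `pyTOSICA`; the `gmt_path` and `label_name` values below are placeholders for the demo data, and constructor arguments should be checked against your installed OmicVerse:

```python
import omicverse as ov

# Log-normalised reference and query AnnData from the demo download.
ref = ov.utils.read('demo_train.h5ad')
query = ov.utils.read('demo_test.h5ad')

ov.utils.download_tosica_gmt()                  # fetch the gene-set (pathway) masks

tosica = ov.single.pyTOSICA(adata=ref,
                            gmt_path='genesets/GO_bp.gmt',  # placeholder pathway mask
                            label_name='Celltype',          # placeholder label key
                            project_path='hGOBP_demo',
                            batch_size=8)
tosica.train(epochs=5)
tosica.save()                                   # checkpoints land in project_path
# tosica.load()                                 # reload the saved project later

pred = tosica.predicted(pre_adata=query)        # AnnData with predicted labels and attention
```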
StaVIA trajectory cartography (t_stavia.ipynb)
- Data preparation: Load the example dentate gyrus velocity data via `scvelo.datasets.dentategyrus()`, then preprocess with OmicVerse (`preprocess`, `scale`, `pca`, neighbours, UMAP) to populate the AnnData matrices used by VIA.
- Model training: Configure the VIA hyperparameters (components, neighbours, seeds, root selection) and instantiate/run `VIA.core.VIA` on the chosen representation (`adata.obsm['scaled|original|X_pca']`); see the sketch below.
- Result inspection: Store outputs such as pseudotime (`single_cell_pt_markov`), cluster-graph abstractions, trajectory curves, atlas views, and stream plots through the VIA plotting helpers.
- Export workflow: Persist derived visualisations and animations (e.g., `animate_streamplot_ov`, `animate_atlas`) to files (`.gif`) for reporting; recompute edge bundles via `make_edgebundle_milestone` when needed.
- Dependencies & hardware: Relies on `scvelo`, `pyVIA`, and OmicVerse plotting; computations are CPU-bound, though producing large stream/animation outputs benefits from ample memory.
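A sketch of the VIA setup, assuming OmicVerse wraps `VIA.core.VIA` as `ov.single.pyVIA`; the preprocessing mode, neighbour counts, and root choice below are illustrative assumptions, not tutorial-verified settings:

```python
import scvelo as scv
import omicverse as ov

# Dentate gyrus demo data plus OmicVerse preprocessing to build the PCA space VIA uses.
adata = scv.datasets.dentategyrus()
adata = ov.pp.preprocess(adata, mode='shiftlog|pearson', n_HVGs=2000)  # illustrative settings
ov.pp.scale(adata)
ov.pp.pca(adata, layer='scaled', n_pcs=50)

# Configure and run VIA on the scaled PCA representation.
v0 = ov.single.pyVIA(adata=adata,
                     adata_key='scaled|original|X_pca',  # representation named in the tutorial
                     adata_ncomps=80,                    # illustrative hyperparameters below
                     basis='X_umap',
                     clusters='clusters',
                     knn=30,
                     random_seed=4,
                     root_user=['nIPC'])
v0.run()

# Pull pseudotime (single_cell_pt_markov) back into the AnnData for plotting.
v0.get_pseudotime(adata)
```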