| name | bulk-rna-seq-deconvolution-with-bulk2single |
| title | Bulk RNA-seq deconvolution with Bulk2Single |
| description | Turn bulk RNA-seq cohorts into synthetic single-cell datasets using omicverse's Bulk2Single workflow for cell fraction estimation, beta-VAE generation, and quality control comparisons against reference scRNA-seq. |
Bulk RNA-seq deconvolution with Bulk2Single
Overview
Use this skill when a user wants to reconstruct single-cell profiles from bulk RNA-seq together with a matched reference scRNA-seq atlas. It follows t_bulk2single.ipynb, which demonstrates how to harmonise PDAC bulk replicates, train the beta-VAE generator, and benchmark the output cells against dentate gyrus scRNA-seq.
Instructions
- Load libraries and data
- Import
omicverse as ov,scanpy as sc,scvelo as scv,anndata, andmatplotlib.pyplot as plt, then callov.plot_set()to match omicverse styling. - Read the bulk counts table with
ov.read(...)/ov.utils.read(...)and harmonise gene identifiers viaov.bulk.Matrix_ID_mapping(<df>, 'genesets/pair_GRCm39.tsv'). - Load the reference scRNA-seq AnnData (e.g.,
scv.datasets.dentategyrus()) and confirm the cluster labels (stored inadata.obs['clusters']).
- Import
- Initialise the Bulk2Single model
- Instantiate
ov.bulk2single.Bulk2Single(bulk_data=bulk_df, single_data=adata, celltype_key='clusters', bulk_group=['dg_d_1', 'dg_d_2', 'dg_d_3'], top_marker_num=200, ratio_num=1, gpu=0). - Explain GPU selection (
gpu=-1forces CPU) and howbulk_groupnames align with column IDs in the bulk matrix.
- Instantiate
- Estimate cell fractions
- Call
model.predicted_fraction()to run the integrated TAPE estimator, then plot stacked bar charts per sample to validate proportions. - Encourage saving the fraction table for downstream reporting (
df.to_csv(...)).
- Call
- Preprocess for beta-VAE
- Execute
model.bulk_preprocess_lazy(),model.single_preprocess_lazy(), andmodel.prepare_input()to produce matched feature spaces. - Clarify that the lazy preprocessing expects raw counts; skip if the user has already log-normalised data and instead provide aligned matrices manually.
- Execute
- Train or load the beta-VAE
- Train with
model.train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_vae', generate_save_dir='...', generate_save_name='dg'). - Mention early stopping via
patienceand how to resume by reloading weights withmodel.load('.../dg_vae.pth'). - Use
model.plot_loss()to monitor convergence.
- Train with
- Generate and filter synthetic cells
- Produce an AnnData using
model.generate()and reduce noise throughmodel.filtered(generate_adata, leiden_size=25). - Store the filtered AnnData (
.write_h5ad) for reuse, noting it contains PCA embeddings inobsm['X_pca'].
- Produce an AnnData using
- Benchmark against the reference atlas
- Plot cell-type compositions with
ov.bulk2single.bulk2single_plot_cellprop(...)for both generated and reference data. - Assess correlation using
ov.bulk2single.bulk2single_plot_correlation(single_data, generate_adata, celltype_key='clusters'). - Embed with
generate_adata.obsm['X_mde'] = ov.utils.mde(generate_adata.obsm['X_pca'])and visualise viaov.utils.embedding(..., color=['clusters'], palette=ov.utils.pyomic_palette()).
- Plot cell-type compositions with
- Troubleshooting tips
- If marker selection fails, increase
top_marker_numor provide a curated marker list. - Alignment errors typically stem from mismatched
bulk_groupnames—double-check column IDs in the bulk matrix. - Training on CPU can take several hours; advise switching
gputo an available CUDA device for speed.
- If marker selection fails, increase
Examples
- "Estimate cell fractions for PDAC bulk replicates and generate synthetic scRNA-seq using Bulk2Single."
- "Load a pre-trained Bulk2Single model, regenerate cells, and compare cluster proportions to the dentate gyrus atlas."
- "Plot correlation heatmaps between generated cells and reference clusters after filtering noisy synthetic cells."
References
- Tutorial notebook:
t_bulk2single.ipynb - Example data and weights:
omicverse_guide/docs/Tutorials-bulk2single/data/ - Quick copy/paste commands:
reference.md