| name | bulk-rna-seq-differential-expression-with-omicverse |
| title | Bulk RNA-seq differential expression with omicverse |
| description | Guide Claude through omicverse's bulk RNA-seq DEG pipeline, from gene ID mapping and DESeq2 normalization to statistical testing, visualization, and pathway enrichment. Use when a user has bulk count matrices and needs differential expression analysis in omicverse. |
Bulk RNA-seq differential expression with omicverse
Overview
Follow this skill to run the end-to-end differential expression (DEG) workflow showcased in t_deg.ipynb. It assumes the user provides a raw gene-level count matrix (e.g., from featureCounts) and wants to analyse bulk RNA-seq cohorts inside omicverse.
Instructions
- Set up the session
  - Import `omicverse as ov`, `scanpy as sc`, and `matplotlib.pyplot as plt`.
  - Call `ov.plot_set()` so downstream plots adopt omicverse styling.
- Prepare ID mapping assets
  - When gene IDs must be converted to gene symbols, instruct the user to download mapping pairs via `ov.utils.download_geneid_annotation_pair()` and store them under `genesets/`.
  - Mention the available prebuilt genomes (T2T-CHM13, GRCh38, GRCh37, GRCm39, danRer7, danRer11) and that users can generate their own mapping from GTF files if needed.
- Load the raw counts
  - Read tab-delimited featureCounts output with `ov.pd.read_csv(..., sep='\t', header=1, index_col=0)`.
  - Strip trailing `.bam` segments from column names using list comprehension so sample IDs are clean.
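A minimal sketch of the loading step, using plain pandas on an in-memory toy table rather than a real file. The sample names, gene IDs, and counts are invented for illustration, and the toy table keeps only the `Geneid` column plus the sample columns.

```python
import io

import pandas as pd

# Toy stand-in for a featureCounts file: line 0 is the program comment,
# line 1 is the real header. All names and numbers here are invented.
raw = (
    "# Program:featureCounts (toy example)\n"
    "Geneid\tctrl_1.bam\tctrl_2.bam\ttreat_1.bam\n"
    "ENSMUSG00000000001\t120\t98\t310\n"
    "ENSMUSG00000000003\t0\t2\t1\n"
)

# header=1 skips the comment line and takes column names from the second line.
counts = pd.read_csv(io.StringIO(raw), sep="\t", header=1, index_col=0)

# Remove a trailing ".bam" only; the rest of each sample name is untouched.
counts.columns = [c[:-4] if c.endswith(".bam") else c for c in counts.columns]

print(counts.columns.tolist())
```

Checking `endswith(".bam")` before slicing avoids truncating sample names that happen to contain `.bam` elsewhere in the string.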
- Map gene identifiers
  - Run `ov.bulk.Matrix_ID_mapping(counts_df, 'genesets/pair_<GENOME>.tsv')` to replace `gene_id` entries with gene symbols.
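To show what an ID-to-symbol mapping does conceptually, here is a pandas-only sketch. It is not the implementation of `Matrix_ID_mapping`: the two-column layout (`gene_id`, `symbol`) of the pair table is an assumption, and keeping unmapped IDs unchanged is a choice made for this example only.

```python
import pandas as pd

# Toy count matrix indexed by Ensembl-style IDs (values are invented).
counts = pd.DataFrame(
    {"ctrl_1": [120, 0, 45], "treat_1": [310, 1, 50]},
    index=["ENSMUSG00000000001", "ENSMUSG00000000003", "ENSMUSG99999999999"],
)

# Hypothetical mapping table; the column names are assumptions.
pairs = pd.DataFrame(
    {
        "gene_id": ["ENSMUSG00000000001", "ENSMUSG00000000003"],
        "symbol": ["Gnai3", "Pbsn"],
    }
).set_index("gene_id")

# Look up each ID; IDs absent from the table fall back to the original ID.
mapped = counts.copy()
mapped.index = counts.index.map(lambda gid: pairs["symbol"].get(gid, gid))

print(mapped.index.tolist())
```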
- Initialise the DEG object
  - Create `dds = ov.bulk.pyDEG(mapped_counts)`.
  - Handle duplicate gene symbols with `dds.drop_duplicates_index()` to keep the highest expressed version.
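The idea of keeping the highest expressed copy of a duplicated symbol can be sketched in pandas as follows. Ranking by total counts across samples is an assumption about the criterion; omicverse's own method may rank rows differently.

```python
import pandas as pd

# Toy matrix in which "Lef1" appears twice after ID mapping (values invented).
mapped = pd.DataFrame(
    {"ctrl_1": [10, 200, 5], "treat_1": [12, 260, 7]},
    index=["Lef1", "Lef1", "Actb"],
)

# Order rows positionally by total counts, highest first. Positions are used
# because the index labels are not unique.
totals = mapped.sum(axis=1).to_numpy()
order = totals.argsort(kind="stable")[::-1]
ranked = mapped.iloc[order]

# Keep the first (highest total) row for every symbol.
deduped = ranked[~ranked.index.duplicated(keep="first")]

print(deduped)
```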
- Normalise and estimate size factors
  - Execute `dds.normalize()` to calculate DESeq2 size factors, which correct for differences in sequencing depth (library size) between samples.
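For intuition about what a size factor is, the following NumPy sketch implements the generic median-of-ratios method that DESeq2 uses. It is an illustration on invented counts, not omicverse's code, and the variable names are chosen for this example.

```python
import numpy as np

# Rows are genes, columns are samples. Sample 2 was sequenced twice as deeply
# as sample 1, and sample 3 four times as deeply (invented numbers).
counts = np.array(
    [
        [100, 200, 400],
        [50, 100, 200],
        [30, 60, 120],
        [0, 10, 20],  # has a zero, so it is left out of the reference
    ],
    dtype=float,
)

# Build the reference only from genes with nonzero counts in every sample.
expressed = (counts > 0).all(axis=1)
log_counts = np.log(counts[expressed])

# Per-gene geometric mean across samples, computed in log space.
log_geo_mean = log_counts.mean(axis=1, keepdims=True)

# Size factor per sample: median over genes of count / geometric mean.
size_factors = np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Dividing each column by its size factor puts samples on a common scale.
normalized = counts / size_factors

print(size_factors)
```

Because the median is taken over genes, a handful of strongly changed genes has little influence on the factor, which is the main advantage over scaling by total counts.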
- Run differential testing
  - Collect treatment and control replicate labels into lists.
  - Call `dds.deg_analysis(treatment_groups, control_groups, method='ttest')` for the default Welch t-test.
  - Offer optional alternatives: `method='edgepy'` for edgeR-like tests and `method='limma'` for limma-style modelling.
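As a reminder of what a Welch t-test computes for a single gene, here is a standard-library sketch of the statistic and its Welch-Satterthwaite degrees of freedom. The helper `welch_t` and the expression values are hypothetical; which transformation omicverse applies before testing is not covered here.

```python
import math
from statistics import mean, variance


def welch_t(treated, control):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(treated), len(control)
    # Squared standard error of each group mean (sample variance / n).
    se1 = variance(treated) / n1
    se2 = variance(control) / n2
    t = (mean(treated) - mean(control)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
    return t, df


# Invented normalized expression values for one gene.
treated = [10.0, 12.0, 14.0]
control = [5.0, 6.0, 7.0]

t, df = welch_t(treated, control)
print(round(t, 3), round(df, 3))
```

Unlike Student's t-test, the two group variances are not pooled, so the test stays valid when treated and control replicates differ in spread.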
- Filter and threshold results
  - Note that lowly expressed genes are retained by default; filter using `dds.result.loc[dds.result['log2(BaseMean)'] > 1]` when needed.
  - Set dynamic fold-change and significance cutoffs via `dds.foldchange_set(fc_threshold=-1, pval_threshold=0.05, logp_max=6)` (`fc_threshold=-1` auto-selects based on log2FC distribution).
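The filtering and labelling logic can be sketched on a toy results table. Only the `log2(BaseMean)` column and the `sig` value `'normal'` come from this guide; the `log2FC` and `pvalue` column names, the `'up'`/`'down'` labels, and the fixed cutoffs are assumptions for illustration.

```python
import pandas as pd

# Toy results table (all values invented).
result = pd.DataFrame(
    {
        "log2(BaseMean)": [0.4, 5.2, 7.1, 6.3],
        "log2FC": [3.0, 2.1, -1.8, 0.2],  # assumed column name
        "pvalue": [0.001, 0.002, 0.01, 0.6],  # assumed column name
    },
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)

# Step 1: drop lowly expressed genes.
expressed = result.loc[result["log2(BaseMean)"] > 1].copy()

# Step 2: label genes with fixed illustrative cutoffs.
fc_cut, p_cut = 1.0, 0.05
expressed["sig"] = "normal"
up = (expressed["log2FC"] > fc_cut) & (expressed["pvalue"] < p_cut)
down = (expressed["log2FC"] < -fc_cut) & (expressed["pvalue"] < p_cut)
expressed.loc[up, "sig"] = "up"
expressed.loc[down, "sig"] = "down"

# Step 3: everything that is not "normal" counts as differentially expressed.
deg_genes = expressed.loc[expressed["sig"] != "normal"].index.tolist()

print(deg_genes)
```

GeneA has a large fold change but is removed by the expression filter, which is why filtering before thresholding changes the final gene list.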
- Visualise differential expression
  - Produce volcano plots with `dds.plot_volcano(title=..., figsize=..., plot_genes=... or plot_genes_num=...)` to highlight key genes.
  - Generate per-gene boxplots using `dds.plot_boxplot(genes=[...], treatment_groups=..., control_groups=..., figsize=..., legend_bbox=...)`; adjust y-axis tick labels if required.
- Perform pathway enrichment (optional)
  - Download curated pathway libraries through `ov.utils.download_pathway_database()`.
  - Load genesets with `ov.utils.geneset_prepare(<path>, organism='Mouse'|'Human'|...)`.
  - Build the DEG gene list from `dds.result.loc[dds.result['sig'] != 'normal'].index`.
  - Run enrichment with `ov.bulk.geneset_enrichment(gene_list=deg_genes, pathways_dict=..., pvalue_type='auto', organism=...)`. Encourage users without internet access to provide a `background` gene list.
  - Visualise single-library results via `ov.bulk.geneset_plot(...)` and combine multiple ontologies using `ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=...)`.
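To illustrate why the background gene list matters, the sketch below uses a generic hypergeometric over-representation test written with the standard library. This is a common approach to enrichment, not necessarily what `ov.bulk.geneset_enrichment` runs internally; the helper `hypergeom_pvalue` and all numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, deg_size, pathway_size, background_size):
    """P(X >= overlap) when deg_size genes are drawn without replacement."""
    upper = min(deg_size, pathway_size)
    total = comb(background_size, deg_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, deg_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Same DEG list and pathway, two different background sizes (invented numbers).
p_large_background = hypergeom_pvalue(5, 50, 100, 20000)
p_small_background = hypergeom_pvalue(5, 50, 100, 2000)

print(p_large_background < p_small_background)
```

With a smaller background the same overlap is less surprising, so the p-value grows. Passing the set of genes actually measured in the experiment as the background keeps the test from overstating enrichment.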
- Document outputs
  - Suggest exporting `dds.result` and enrichment tables to CSV for downstream reporting.
  - Encourage users to save figures generated by matplotlib (`plt.savefig(...)`) when running outside notebooks.
- Troubleshooting tips
  - Ensure sample labels in `treatment_groups`/`control_groups` exactly match column names post-cleanup.
  - Verify required packages (`omicverse`, `pyComplexHeatmap`, `gseapy`) are installed for enrichment visualisations.
  - Remind users that internet access is required the first time they download gene mappings or pathway databases.
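A quick pre-flight check for the first tip might look like the sketch below. The helper `missing_labels` is hypothetical and not part of omicverse; it only compares the requested labels with the matrix columns.

```python
def missing_labels(columns, treatment_groups, control_groups):
    """Return requested sample labels that are absent from the matrix columns."""
    known = set(columns)
    requested = list(treatment_groups) + list(control_groups)
    return [label for label in requested if label not in known]


# Column names after cleanup, and group lists with one stale ".bam" label.
columns = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
problems = missing_labels(
    columns,
    treatment_groups=["treat_1", "treat_2.bam"],
    control_groups=["ctrl_1", "ctrl_2"],
)

print(problems)  # ['treat_2.bam']
```

Running a check like this before `dds.deg_analysis(...)` surfaces label mismatches as a readable list instead of a key error deep inside the analysis.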
Examples
- "I have a featureCounts matrix for mouse tumour samples—normalize it with DESeq2, run t-test DEG, and highlight the top 8 genes in a volcano plot."
- "Use omicverse to compute edgeR-style differential expression between treated and control replicates, then run GO enrichment on significant genes."
- "Guide me through converting Ensembl IDs to symbols, performing limma DEG, and plotting boxplots for Krtap9-5 and Lef1."
References
- Detailed walkthrough notebook: `t_deg.ipynb`
- Sample count matrix for testing: `sample/counts.txt`
- Quick copy/paste commands: `reference.md`