Foundation models for single-cell genomics
Generative AI, not just for language anymore.

Hi there,
Use single-cell genomics for your research? Time to walk through our most powerful app for genomics analysis.
Last year, we collaborated with Haotian Cui (Bo Wang lab, U of T) on developing a user-friendly interface for their most advanced AI model yet: scGPT. Trained on a giant repository of over 33 million cells, scGPT is a large generative pre-trained model that achieves state-of-the-art performance on single-cell genomics analysis tasks.
We’ve been floored by what our users have been able to do with this model, and are thrilled to illustrate its capabilities below 👇
🧪 Characterize new samples using cell embeddings

Fig 1e, Cui et al. (2024), Nature Methods.
scGPT is a generative AI model that learns representations of cells and genes simultaneously, capturing the statistical relationships between them. Above, its creators share a UMAP visualization of the learned cell embeddings for a random subset of the training data, covering ~3 million cells from 51 tissues.
Using the model, users can easily transfer learned metadata on cell type and disease condition to a new cell sample - enabling rapid characterization. Simply drop in an .h5ad file of interest and map your sample here.
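For the curious, here is a rough idea of how label transfer in embedding space can work. This is a minimal sketch, not the app's actual implementation: it assumes each cell has already been turned into an embedding vector, uses tiny made-up 2-D vectors in place of real scGPT embeddings, and picks a k-nearest-neighbour majority vote with cosine similarity as the transfer rule. The function names and the value of k are our own illustrative choices.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def transfer_labels(ref_embeddings, ref_labels, query_embeddings, k=3):
    """Give each query cell the majority label of its k most similar reference cells."""
    predictions = []
    for query in query_embeddings:
        # Rank reference cells from most to least similar to this query cell.
        ranked = sorted(
            range(len(ref_embeddings)),
            key=lambda i: cosine_similarity(query, ref_embeddings[i]),
            reverse=True,
        )
        votes = Counter(ref_labels[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy stand-ins for embeddings of annotated reference cells (hypothetical values).
ref_embeddings = [
    [1.0, 0.1], [0.9, 0.2], [0.8, 0.0],   # annotated as T cells
    [0.1, 1.0], [0.2, 0.9], [0.0, 0.8],   # annotated as B cells
]
ref_labels = ["T cell"] * 3 + ["B cell"] * 3

# Toy stand-ins for embeddings of cells from a new, unlabelled sample.
query_embeddings = [[0.95, 0.05], [0.05, 0.95]]

predicted = transfer_labels(ref_embeddings, ref_labels, query_embeddings)
print(predicted)
```

The same vote works for any metadata column attached to the reference cells, such as disease condition, as long as the new sample is embedded by the same model.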
🧫 Use transfer learning for cell type annotation

Fig 2a, Cui et al. (2024), Nature Methods.
Cui et al. (Nature Methods, 2024) demonstrate the power of transfer learning several times over. While the pre-trained all-human scGPT model performs well on multiple analysis tasks, fine-tuning on matched tissue samples significantly improves cell type prediction across multiple datasets: pancreas, immune-MS, and tumor-infiltrating lymphocytes.
Superbio has implemented an scGPT fine-tuning app for cell type annotation, and provides the same multiple sclerosis data (with source linked) as demo data. We recommend finding two files from the same tissue - a control sample for fine-tuning, and a test sample for annotation prediction. Infer cell type annotations here.
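To make the two-file workflow concrete, here is a deliberately simplified sketch of the transfer-learning idea. It is not how the app or the paper fine-tunes scGPT: full fine-tuning also updates the pre-trained model's own weights, whereas this example keeps the embeddings frozen and only trains a small softmax classifier on top of them. The feature vectors, labels, learning rate, and function names are all hypothetical placeholders.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, weights, biases):
    """Linear score for each class given one feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def train_head(features, labels, classes, epochs=200, lr=0.5):
    """Fit a softmax classification head with plain stochastic gradient descent."""
    n_features = len(features[0])
    weights = [[0.0] * n_features for _ in classes]
    biases = [0.0] * len(classes)
    for _ in range(epochs):
        for x, label in zip(features, labels):
            probs = softmax(class_scores(x, weights, biases))
            for c, name in enumerate(classes):
                # Gradient of the cross-entropy loss with respect to the class score.
                error = probs[c] - (1.0 if name == label else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * error * x[j]
                biases[c] -= lr * error
    return weights, biases


def predict_labels(features, weights, biases, classes):
    """Pick the highest-scoring class for each feature vector."""
    predictions = []
    for x in features:
        scores = class_scores(x, weights, biases)
        predictions.append(classes[scores.index(max(scores))])
    return predictions


# "Control" sample: frozen embeddings with known annotations (toy values).
control_features = [
    [1.0, 0.1], [0.9, -0.1],
    [0.1, 1.0], [-0.1, 0.9],
    [-1.0, -0.9], [-0.9, -1.0],
]
control_labels = ["T cell", "T cell", "B cell", "B cell", "monocyte", "monocyte"]
classes = sorted(set(control_labels))

# "Test" sample: embeddings from the same tissue, annotations unknown (toy values).
test_features = [[0.8, 0.0], [0.0, 0.8], [-0.8, -0.8]]

weights, biases = train_head(control_features, control_labels, classes)
predicted = predict_labels(test_features, weights, biases, classes)
print(predicted)
```

The split mirrors the recommendation above: the annotated control sample teaches the classifier, and the test sample from the same tissue is only ever used for prediction.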
🧬 Discover gene regulatory networks

Fig 5a and b, Cui et al. (2024), Nature Methods.
Epigenetic regulation involves a complex web of interactions between transcriptional machinery, regulatory elements, and their genetic targets. Fine-mapping of this interplay has been challenging due to the highly cell-type- and cell-state-specific nature of such programs. Using gene token embeddings, Cui et al. show that scGPT is capable of gene regulatory network (GRN) inference, uncovering cell-type-specific programs.
Though the model is capable of zero-shot predictions, fine-tuned models come out ahead once again. Superbio provides a set of scGPT-GRN models, each fine-tuned on one of blood, brain, heart, kidney, lung, or pan-cancer tissue - we recommend choosing the one that matches your own data. Try launching scGPT-GRN now.
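If you are wondering how embeddings can turn into a network, one simple approach is to connect genes whose embedding vectors point in similar directions. The sketch below illustrates that idea under stated assumptions: the gene names and 3-D vectors are invented placeholders (real gene embeddings are much longer and come from the model), and the 0.9 cosine-similarity cutoff is an arbitrary choice for the example rather than a value taken from the paper or the app.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_edges(gene_embeddings, threshold=0.9):
    """Return (gene, gene, similarity) for every pair at or above the threshold."""
    edges = []
    for gene_1, gene_2 in combinations(sorted(gene_embeddings), 2):
        score = cosine_similarity(gene_embeddings[gene_1], gene_embeddings[gene_2])
        if score >= threshold:
            edges.append((gene_1, gene_2, round(score, 3)))
    return edges


# Hypothetical gene embeddings; names and values are placeholders only.
gene_embeddings = {
    "GENE_A": [1.0, 0.0, 0.1],
    "GENE_B": [0.9, 0.1, 0.0],
    "GENE_C": [0.0, 1.0, 0.1],
    "GENE_D": [0.1, 0.9, 0.0],
}

edges = similarity_edges(gene_embeddings)
for gene_1, gene_2, score in edges:
    print(f"{gene_1} -- {gene_2} (cosine similarity {score})")
```

In this toy case the genes fall into two connected pairs, which is the kind of grouping you would then inspect as a candidate gene program. Similarity in embedding space suggests shared context, not a proven regulatory link, so any edge is a hypothesis to follow up.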
That’s a wrap on our most exciting single-cell app to date. Give it a go with some data from CELLxGENE, and let us know what you think!
Stay curious,
Berke from Superbio
P.S. - Want to learn more about Superbio’s features?
Find detailed scGPT tutorials here.