Cellina is a dual-encoder variational autoencoder for predicting how a cell's gene expression changes under altered spatial contexts — a class of queries we call tissue graph counterfactuals.
In tissues, a cell's transcriptional state is shaped by its local neighborhood: the composition of nearby cells and the signals they emit. Existing perturbation methods typically treat cells as independent and apply perturbations uniformly. Cellina addresses this gap by explicitly separating a cell's intrinsic state (z, encoding cell identity) from its spatial context (s, encoding microenvironmental influence), then uses s as a conditioning input to render counterfactual predictions under two types of intervention:
- Edge perturbation — rewire a cell's neighborhood (replace neighbors with those from a different domain)
- Node perturbation — modify the expression of existing neighbors (e.g. pathway activation or knockout)
Follow the worked tutorials for end-to-end examples on colorectal cancer tissue: Cellina and Cellina-GAT (or run them locally from docs/tutorial.ipynb and docs/tutorial_gat.ipynb).
Generative model. Cellina is a VAE with two latent variables. An MLP encoder
Supervised disentanglement. Optimizing the ELBO alone does not prevent
- A cell-type classifier on $z$ anchors it to transcriptional identity.
- An adversarial discriminator is trained to predict spatial domain from $z$; the encoder is then trained to fool it, routing microenvironmental variation to $s$ by elimination.
- A graph-supervised contrastive loss on $s$ (CellinaGCN only, optional), as a biologically grounded inductive bias that promotes similarity within local neighborhoods. Enabled by setting link_prediction_weight > 0.
Training alternates between a discriminator step (encoder frozen) and a VAE step (discriminator frozen), following a standard adversarial schedule.
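The alternating schedule can be illustrated with a toy NumPy sketch. Everything in it is an assumption made for illustration: a linear encoder `W` stands in for the intrinsic-state encoder, a logistic regression `(v, b)` stands in for the domain discriminator, and the reconstruction, KL and classifier terms of the real VAE step are omitted. Gradient ascent on the discriminator loss is one common way to "fool" a discriminator and is not necessarily the objective Cellina uses.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce(z, v, b, y):
    """Mean binary cross-entropy of a logistic domain discriminator on z."""
    p = sigmoid(z @ v + b)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def grad_logits(z, v, b, y):
    """Gradient of the mean cross-entropy with respect to the logits."""
    return (sigmoid(z @ v + b) - y) / len(y)

def discriminator_step(x, y, W, v, b, lr=0.1):
    """Descend the discriminator loss in (v, b); the encoder W is frozen."""
    z = x @ W
    g = grad_logits(z, v, b, y)
    return v - lr * (z.T @ g), b - lr * g.sum()

def encoder_step(x, y, W, v, b, lr=0.1):
    """Ascend the discriminator loss in W; the discriminator (v, b) is frozen."""
    g = grad_logits(x @ W, v, b, y)
    return W + lr * (x.T @ np.outer(g, v))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5))        # toy expression matrix (hypothetical data)
y = (x[:, 0] > 0).astype(float)     # toy binary spatial-domain label
W = 0.1 * rng.normal(size=(5, 2))   # toy linear encoder producing z
v, b = np.zeros(2), 0.0             # toy logistic discriminator

for _ in range(20):
    v, b = discriminator_step(x, y, W, v, b)  # discriminator step, encoder frozen
    W = encoder_step(x, y, W, v, b)           # encoder step, discriminator frozen
```

Each function returns new parameters only for the component being trained, which mirrors freezing the other component during that step.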
Two variants differ in how the spatial encoder is implemented:
| Code class | Paper name | Spatial encoder |
|---|---|---|
| Cellina | Cellina | Degree-normalized weighted pseudobulk aggregation of neighbor expression → MLP |
| CellinaGCN | Cellina-GAT | Multi-layer GATv2 on the local subgraph; self-loops excluded so |
The two variants perform on par. Cellina decouples neighborhood construction from training and scales similarly to non-spatial baselines; CellinaGCN learns attention over each subgraph at additional cost per step.
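To make the pseudobulk encoder input concrete, here is a minimal NumPy sketch of degree-normalized weighted aggregation of neighbor expression. The function name `pseudobulk_context` is hypothetical, and two choices are assumptions rather than documented behavior: normalization divides each row by its weighted degree, and the cell itself is excluded from its own neighborhood. The downstream MLP is not shown.

```python
import numpy as np

def pseudobulk_context(X, A):
    """Weighted mean of neighbor expression for every cell.

    X: (n_cells, n_genes) expression matrix.
    A: (n_cells, n_cells) nonnegative edge weights; row i lists the neighbors of cell i.
    """
    A = np.array(A, dtype=float)          # copy, so the caller's graph is untouched
    np.fill_diagonal(A, 0.0)              # assumption: a cell is not its own neighbor
    deg = A.sum(axis=1, keepdims=True)    # weighted degree per cell
    deg[deg == 0.0] = 1.0                 # isolated cells get an all-zero context
    return (A / deg) @ np.asarray(X, dtype=float)

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
A = np.array([[0.0, 1.0, 3.0],    # cell 0: neighbors 1 (weight 1) and 2 (weight 3)
              [1.0, 0.0, 0.0],    # cell 1: neighbor 0
              [0.0, 0.0, 0.0]])   # cell 2: isolated
context = pseudobulk_context(X, A)
# context[0] is (1 * X[1] + 3 * X[2]) / 4 = [1.5, 1.75]
```

Because this aggregation depends only on the graph and the expression matrix, it can be computed once before training, which is consistent with the decoupling of neighborhood construction from training noted above.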
Cellina supports two post-training interventions on the spatial graph:
Edge perturbation replaces a cell's spatial neighborhood with donors sampled from a target tissue domain, while keeping the cell's own expression fixed.
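The donor-sampling idea behind edge perturbation can be sketched in plain NumPy. This is not the Cellina API: `sample_donor_neighbors`, the domain labels and the unweighted mean used to summarize the new neighborhood are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_donor_neighbors(domains, cell, target_domain, k, rng):
    """Sample up to k donor cells from `target_domain`, never the query cell itself."""
    candidates = np.flatnonzero(domains == target_domain)
    candidates = candidates[candidates != cell]
    if candidates.size == 0:
        raise ValueError(f"no donor cells in domain {target_domain!r}")
    return rng.choice(candidates, size=min(k, candidates.size), replace=False)

domains = np.array(["tumor", "tumor", "stroma", "stroma", "stroma", "tumor"])
X = np.arange(12, dtype=float).reshape(6, 2)   # toy expression matrix
rng = np.random.default_rng(0)

cell = 0                                        # a tumor cell to be "moved" into stroma
donors = sample_donor_neighbors(domains, cell, "stroma", k=2, rng=rng)
x_self = X[cell]                                # the cell's own expression is unchanged
context_cf = X[donors].mean(axis=0)             # counterfactual neighborhood summary
```

In the real model the rewired neighborhood would be passed through the spatial encoder to obtain a counterfactual $s$, while $z$ is still inferred from the cell's unchanged expression.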
Node perturbation modifies the feature vectors of
See the Cellina and Cellina-GAT tutorials for full worked examples.
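For orientation before opening the tutorials, node perturbation (modifying the expression of existing neighbors, e.g. a knockout or pathway activation) can be sketched as a scaling of selected genes in the neighbor rows only. The helper `node_perturb` and the multiplicative `factor` are hypothetical and only illustrate the idea; the actual interface is the one shown in the tutorials.

```python
import numpy as np

def node_perturb(X, neighbors, gene_idx, factor):
    """Return a copy of X with genes `gene_idx` scaled by `factor` in `neighbors` rows.

    factor = 0 mimics a knockout; factor > 1 mimics activation.
    """
    X_cf = np.array(X, dtype=float, copy=True)     # never modify the observed data
    X_cf[np.ix_(neighbors, gene_idx)] *= factor    # touch neighbor rows only
    return X_cf

X = np.ones((4, 3))                 # toy expression: 4 cells, 3 genes
neighbors = [1, 2]                  # neighbors of the query cell 0
X_ko = node_perturb(X, neighbors, gene_idx=[0], factor=0.0)   # knock out gene 0
X_act = node_perturb(X, neighbors, gene_idx=[1, 2], factor=2.0)  # activate genes 1 and 2
```

The query cell's own row is left untouched, so any predicted change in its expression comes only through the altered spatial context.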
See the changelog.
Cellina ships two conda environments: environment.yml for the full GPU/CUDA setup, and env_minimal.yml for a lightweight CPU-only install. Create one with conda env create -f environment.yml (or -f env_minimal.yml), then follow the tutorials above.
Citation coming soon.
Built on scvi-tools.
If you found a bug, please use the issue tracker.
Copyright (c) 2026, PMBio