GEMDiff: A Diffusion workflow bridges between breast normal and tumor gene expression states
Gene expression perturbation from tumor state back to normal.
How it works
Using tumor-state gene expression as input, we apply a transformer-based diffusion model, conditioned on normal-state gene expression, to transform tumor gene expression back to a normal state. GEMDiff can also generate synthetic bulk RNA gene expression data starting from Gaussian noise.
Model pipeline:
1. Data processing: replace NA values, apply a log transformation, and normalize.
2. Cluster quality assessment: use silhouette scores to select well-clustered gene sets.
3. Model training: train the diffusion model on the selected gene sets.
4. Gene augmentation / gene perturbation: run data augmentation or gene perturbation, depending on the task type, and display the outcome with UMAP plots.
5. Evaluation: validate the core genes with gene function enrichment analysis.
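Steps 1 and 2 of the pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not GEMDiff's actual code: the function names `preprocess` and `silhouette_score` are hypothetical, and the specific choices (per-gene mean imputation, `log2(x + 1)`, min-max scaling to [-1, 1], Euclidean distance) are assumptions, since the README does not state which variants are used.

```python
import numpy as np


def preprocess(expr):
    """Clean a samples x genes expression matrix (assumed layout)."""
    expr = np.asarray(expr, dtype=float)
    # Replace NA values. Assumption: impute with the per-gene mean
    # (0 if a gene has no observed values at all).
    observed = ~np.isnan(expr)
    counts = observed.sum(axis=0)
    sums = np.nansum(expr, axis=0)
    gene_means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    expr = np.where(observed, expr, gene_means)
    # Log transformation. Assumption: log2(x + 1) on non-negative values.
    expr = np.log2(expr + 1.0)
    # Normalization. Assumption: per-gene min-max scaling to [-1, 1].
    lo, hi = expr.min(axis=0), expr.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return 2.0 * (expr - lo) / span - 1.0


def silhouette_score(x, labels):
    """Mean silhouette over samples; needs at least two distinct labels."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between samples.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: score 0 by convention
        a = dist[i, same].mean()  # mean distance within own cluster
        b = min(dist[i, labels == c].mean()  # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())
```

A candidate gene set could then be scored with `silhouette_score(preprocess(expr)[:, gene_idx], labels)`, where `labels` marks each sample as normal or tumor and `gene_idx` is a hypothetical list of column indices; gene sets with higher scores separate the two states more cleanly.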
Augmentation Results
Gene feature counts range from 8 to 256.
Core genes found by GEMDiff are related to breast cancer
We identified the 307 "core genes" that were the most perturbed in the transition from the tumor state back to the normal state (more than 2 SD from the mean in either direction). To determine whether the collective function of the 307 "core genes" is associated with breast cancer, we performed functional enrichment analysis. Note the enrichment of breast cancer 'Disease' terms.
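The +/- 2 SD selection rule can be illustrated as follows. This is a hedged sketch rather than the authors' implementation: `select_core_genes` is a hypothetical helper, and it assumes the perturbation of a gene is measured as its mean expression change across paired tumor and perturbed samples, with the mean and SD taken across genes.

```python
import numpy as np


def select_core_genes(tumor, perturbed, gene_names, n_sd=2.0):
    """Return genes whose mean shift lies more than n_sd SDs from the mean shift.

    tumor, perturbed: samples x genes arrays with matching rows (assumed layout).
    """
    tumor = np.asarray(tumor, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    # Per-gene mean change from the tumor input to the model output.
    shift = (perturbed - tumor).mean(axis=0)
    mu, sd = shift.mean(), shift.std()
    # Keep genes in either tail of the shift distribution.
    mask = np.abs(shift - mu) > n_sd * sd
    return [name for name, keep in zip(gene_names, mask) if keep]
```

The returned gene list is what would then be passed to a functional enrichment tool.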
Method overview
Acknowledgements
We would like to thank Dr. Siyu Huang for guidance on this work.