Project Title: Gene set analysis for the development of pan-cancer radiotherapy response scores.
Supervisor: Dr. Shirin A. Enger
PhD Student Supervisor: Harry Glickman (HG)
Hypothesis:
The response to radiotherapy depends on highly complex pan-cancer genomic pathways. Precision radiotherapy prediction modelling studies generally focus on relatively small samples of single cancer types, without sufficient sample size to reliably model this complexity. We hypothesize that we can increase effective sample size and prediction accuracy by incorporating pan-cancer information.
Research Gap:
HG’s first manuscript, currently under review, explored the predictive value of pan-cancer gene expression information. It applied the Molecular Signatures Database (MSigDB) hallmark gene sets to radiotherapy patients in The Cancer Genome Atlas (TCGA), using the singscore algorithm to generate gene set scores.
Other gene sets and other gene set analysis methods could have been used instead. This project asks how the results change when those choices are varied.
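To make the idea of a single-sample gene set score concrete, the following is a minimal sketch of a rank-based score, written in Python purely for illustration (the project work itself is planned in R). It loosely follows the rank-based idea behind singscore: rank the genes within one sample, average the ranks of the gene set members, and rescale by the smallest and largest mean rank that a set of that size could attain. It is not the singscore package's implementation, and the function names and gene labels (G1 to G6) are hypothetical.

```python
def rank_genes(expression):
    """Map each gene to its within-sample rank (1 = lowest expression).

    Tied expression values share the average of the ranks they span.
    """
    ordered = sorted(expression, key=expression.get)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of genes tied with ordered[i].
        while j + 1 < len(ordered) and expression[ordered[j + 1]] == expression[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    return ranks


def gene_set_score(expression, gene_set):
    """Rank-based score in [0, 1] for one sample and one gene set.

    0 means the set members are the lowest-ranked genes in the sample,
    1 means they are the highest-ranked genes.
    """
    ranks = rank_genes(expression)
    n_total = len(ranks)
    members = [gene for gene in gene_set if gene in ranks]
    if not members:
        raise ValueError("no gene in the set was measured in this sample")
    n_set = len(members)
    if n_set == n_total:
        raise ValueError("gene set must be a strict subset of the measured genes")
    mean_rank = sum(ranks[gene] for gene in members) / n_set / n_total
    # Smallest and largest attainable scaled mean rank for a set of this size.
    lowest = (n_set + 1) / (2 * n_total)
    highest = (2 * n_total - n_set + 1) / (2 * n_total)
    return (mean_rank - lowest) / (highest - lowest)


# Hypothetical expression values for one sample.
sample = {"G1": 0.5, "G2": 2.0, "G3": 7.5, "G4": 1.0, "G5": 9.0, "G6": 3.0}
top_set_score = gene_set_score(sample, ["G3", "G5"])
middle_set_score = gene_set_score(sample, ["G2", "G6"])
```

Because the score depends only on ranks within a single sample, it can be computed patient by patient without reference to the rest of the cohort, which is the property that makes this family of methods convenient for prediction modelling.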
Learning Objectives:
A) Scientific Writing and Reporting:
● Scientific writing, scientific presentation, LaTeX.
● Scientific reporting requirements, TRIPOD.
B) Data Analysis in R:
● Downloading and manipulating large datasets in R, tidyverse.
● Exploratory analysis, plotting, ggplot2.
● Handling of missing data, multicollinearity.
C) Gene Set Analysis:
● Gene sets, gene set databases, MSigDB, GO.
● Gene set analysis methods, GSEA, singscore, others.
D) Prediction Modelling:
● Survival modelling, coxph, rms.
● Sample size calculations and requirements, dimensionality reduction.
● Model development and validation, performance metrics.
Work Plan:
● Biweekly meetings with HG to present progress and work through obstacles.
● Progress to be documented in PowerPoint slides and on Overleaf.
Work to proceed in 3 phases, each with interim goals and deliverables.
1) Literature/Code Review:
Interim Goal: identify candidate gene sets, gene set algorithms, and R packages for prediction modelling.
● Identify existing gene sets, especially those present in the MSigDB & GO databases, and explore related R packages.
● Identify gene set analysis methodologies and explore related R packages.
● Explore how these gene sets might relate to radiotherapy response based on literature.
2) Exploratory Analysis:
Interim Goal: exploratory analysis of transcriptomic information in TCGA.
● Download TCGA transcriptomic & clinical data.
● Apply chosen gene set analysis R packages.
● Exploratory analysis of gene set results.
● Sample size calculations for prediction modelling.
● Optional: DNA mutations, CNV, miRNA.
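As a rough illustration of the sample size step in this phase, the sketch below evaluates one of the criteria described by Riley et al. (2019): the sample size needed so that the expected uniform shrinkage factor is at least a target value S, given the number of candidate parameters p and an anticipated Cox-Snell R-squared. The formula is n = p / ((S - 1) ln(1 - R2_CS / S)). It is written in Python for illustration only; the function name is hypothetical, the input values are illustrative rather than estimates for TCGA, and the papers listed in the readings describe further criteria that this sketch does not cover.

```python
import math


def min_n_for_shrinkage(n_parameters, r2_cs, shrinkage=0.9):
    """Sample size meeting the shrinkage criterion of Riley et al. (2019).

    n_parameters: number of candidate predictor parameters (p)
    r2_cs:        anticipated Cox-Snell R-squared of the model
    shrinkage:    target expected shrinkage factor (S), 0.9 by default
    """
    if not 0 < r2_cs < shrinkage < 1:
        raise ValueError("require 0 < r2_cs < shrinkage < 1")
    return n_parameters / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))


# Illustrative inputs only: 20 candidate parameters, anticipated R2_CS of 0.1.
required_n = min_n_for_shrinkage(n_parameters=20, r2_cs=0.1)
```

With many gene set scores as candidate predictors, p grows quickly, which is why the work plan pairs sample size calculations with dimensionality reduction.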
3) Gene Set Analyses:
Goal: develop multivariable prediction models based on identified gene sets.
● Develop CoxPH prediction models.
● Design validation framework, explore predictive performance.
● Optional: develop parametric survival models, ML models.
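One performance metric commonly reported for survival prediction models is the concordance index (Harrell's C). The sketch below is a minimal Python version for illustration only; the project itself would rely on R packages. It assumes right-censored data, treats a pair of patients as comparable only when the patient with the shorter follow-up time had an observed event, and skips pairs with tied times. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C for right-censored survival data.

    times:       follow-up time for each patient
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data: four patients, the third is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
c_index = concordance_index(times, events, [0.9, 0.1, 0.3, 0.7])
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking. When estimated on the same data used to fit the model the statistic is optimistic, which is one reason the plan calls for a validation framework.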
Related Readings and R Packages:
Gene Sets:
● Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database hallmark gene set collection. Cell Systems. 2015 Dec 23;1(6):417-25.
● https://www.gsea-msigdb.org/gsea/msigdb
● https://bioconductor.posit.co/packages/release/data/experiment/html/msigdb.html
● Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, Ebert D, Feuermann M, Gaudet P, Harris NL, Hill DP, et al. The gene ontology knowledgebase in 2023. Genetics. 2023 May 2;224(1):iyad031.
● https://geneontology.org/
● https://cran.r-project.org/web/packages/GOxploreR/index.html
● https://bioconductor.org/packages/release/bioc/html/topGO.html
Gene Set Analysis:
● Foroutan M, Bhuva DD, Lyu R, Horan K, Cursons J, Davis MJ. Single sample scoring of molecular phenotypes. BMC Bioinformatics. 2018 Nov 6;19(1):404.
● https://www.bioconductor.org/packages//release/bioc/html/singscore.html
● Mathur R, Rotroff D, Ma J, Shojaie A, Motsinger-Reif A. Gene set analysis methods: a systematic comparison. BioData Mining. 2018 May 31;11(1):8.
● https://github.com/alserglab/fgsea
Regression Modeling Strategies Textbook:
● https://hbiostat.org/rmsc/
● https://cran.r-project.org/web/packages/rms/index.html
TRIPOD reporting requirements:
● Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, Vickers AJ, Ransohoff DF, Collins GS. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Annals of Internal Medicine. 2015 Jan 6;162(1):W1-73.
● https://www.equator-network.org/reporting-guidelines/tripod-statement/
Sample Size Calculations:
● Riley RD, Snell KI, Ensor J, Burke DL, Harrell Jr FE, Moons KG, Collins GS. Minimum sample size for developing a multivariable prediction model: PART II - binary and time-to-event outcomes. Statistics in Medicine. 2019 Mar 30;38(7):1276-96.
● Riley RD, Ensor J, Snell KI, Harrell FE, Martin GP, Reitsma JB, Moons KG, Collins G, Van Smeden M. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020 Mar 18;368.
● https://cran.r-project.org/web/packages/pmsampsize/index.html
(Optional) Missing data:
● Heymans MW, Twisk JW. Handling missing data in clinical research. Journal of Clinical Epidemiology. 2022 Nov 1;151:185-8.
● https://cran.r-project.org/web/packages/mice/index.html
