This repository contains the complete reproducibility pipeline and quality-control re-analysis for the RNA-seq dataset GSE286438. This project was conducted as part of the DH607 (Introduction to Computational Multiomics) course under Professor Saket Choudhary, Koita Centre for Digital Health (KCDH), IIT Bombay.
The objective of this project is to forensically audit the gene expression evidence supporting the claim that Epigallocatechin gallate (EGCG) reverses cellular senescence.
The original study claimed that EGCG treatment reverses senescence-associated transcriptional signatures. This re-analysis independently evaluates the robustness of that claim by examining data integrity, sample clustering, and the statistical significance of differential expression.
- Source: NCBI GEO GSE286438 / PMC11790594
- Data Type: Bulk RNA-seq
- Experimental Groups:
  - NT (Normal/Control): Healthy, proliferating cells.
  - SEN (Senescent): Cells induced into a senescent state.
  - ST (Senescent + EGCG Treatment): Senescent cells treated with EGCG to test for reversal.
- Normalization: Log-transformation and library size normalization.
- Sample Integrity: Evaluated using Pearson correlation matrices and Principal Component Analysis (PCA) to assess group clustering and potential outliers.
- Pipeline: Implemented using PyDESeq2 for robust statistical modeling.
- Thresholds: Adjusted $p$-value $< 0.05$ and $|\log_2 \text{Fold Change}| \geq 1$.
- Contrasts:
  - SEN vs NT (Quantifying the senescent phenotype)
  - ST vs SEN (Evaluating treatment efficacy)
  - ST vs NT (Assessing how closely treated cells resemble normal cells)
- Targeted analysis of canonical markers including SASP (Senescence-Associated Secretory Phenotype) factors and proliferation-associated genes ($p21$, $p16$, $Ki67$).
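
The normalization and sample-integrity checks listed above (library-size normalization, log-transformation, Pearson correlation, PCA) can be sketched roughly as follows. This is a minimal illustration using NumPy on synthetic toy counts, not the code from the notebooks: the function names, the counts-per-million scaling, the pseudocount of 1, and the simulated group effects are all assumptions made for the example.

```python
import numpy as np

def log_cpm(counts, pseudocount=1.0):
    """Scale each sample to counts per million, then log2-transform.

    counts: genes x samples array of raw counts (assumed layout).
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)            # total reads per sample
    cpm = counts / lib_sizes * 1e6            # library-size normalization
    return np.log2(cpm + pseudocount)         # pseudocount avoids log2(0)

def sample_correlation(log_expr):
    """Pearson correlation between samples (returns samples x samples)."""
    # np.corrcoef treats rows as variables, so pass samples as rows.
    return np.corrcoef(log_expr.T)

def pca_scores(log_expr, n_components=2):
    """PCA via SVD of the gene-centered samples x genes matrix."""
    X = log_expr.T
    X = X - X.mean(axis=0)                    # center each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / (S ** 2).sum()     # variance fraction per component
    return scores, explained[:n_components]

# Synthetic toy data (NOT GSE286438): 200 genes, 3 samples per group.
rng = np.random.default_rng(0)
groups = ["NT"] * 3 + ["SEN"] * 3 + ["ST"] * 3
base = rng.gamma(shape=2.0, scale=50.0, size=200)
effect = {"NT": 1.0, "SEN": 4.0, "ST": 2.0}   # hypothetical fold effects
means = np.tile(base[:, None], (1, len(groups)))
for j, g in enumerate(groups):
    means[:40, j] *= effect[g]                # only the first 40 genes respond
depth = rng.uniform(0.5, 2.0, size=len(groups))  # unequal sequencing depth
counts = rng.poisson(means * depth)

log_expr = log_cpm(counts)
corr = sample_correlation(log_expr)
scores, explained = pca_scores(log_expr)

for g, (pc1, pc2) in zip(groups, scores):
    print(f"{g}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
```

In a real run, `counts` would come from the raw counts matrix in the repository. Samples that sit far from their own group in the PC1/PC2 plane, or that correlate poorly with their replicates, would be the candidates to inspect as outliers.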
Our forensic re-analysis revealed several critical points that challenge or qualify the original study's narrative:
- Incomplete Transcriptional Reversal: The EGCG-treated group (ST) showed only partial movement back toward the control state (NT).
- Persistent Senescence Markers: Several key biomarkers associated with senescence remained significantly upregulated even after EGCG treatment.
- Condition Mixing: PCA and correlation analysis revealed partial mixing between ST and SEN samples, suggesting that the treatment effect may not be strong enough to clearly separate the groups transcriptomically.
- Statistical Scrutiny: Re-analysis via PyDESeq2 identified fewer significant genes in the ST vs SEN contrast than might be expected for a "complete reversal" of phenotype.
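
How many genes count as significant in a contrast such as ST vs SEN depends directly on the stated thresholds (adjusted $p$-value $< 0.05$ and $|\log_2 \text{Fold Change}| \geq 1$). The sketch below shows one way to apply them to a results table with pandas. It assumes the table has `padj` and `log2FoldChange` columns, which is how PyDESeq2 names them in its results; the columns in the repository's CSV files may differ. The helper name `classify_genes`, the gene names, and the values are made up for illustration.

```python
import pandas as pd

def classify_genes(results, padj_max=0.05, lfc_min=1.0):
    """Label each gene 'up', 'down', or 'ns' (not significant).

    A gene is significant when padj < padj_max and |log2FC| >= lfc_min.
    Genes with a missing padj fail the comparison and stay 'ns'.
    """
    lfc = results["log2FoldChange"]
    significant = (results["padj"] < padj_max) & (lfc.abs() >= lfc_min)
    calls = pd.Series("ns", index=results.index, name="call")
    calls.loc[significant & (lfc > 0)] = "up"
    calls.loc[significant & (lfc < 0)] = "down"
    return calls

# Hypothetical toy results table (not taken from the actual analysis).
toy = pd.DataFrame(
    {
        "log2FoldChange": [2.5, -1.0, 3.0, 0.4, -2.0],
        "padj": [0.001, 0.03, 0.20, 0.01, float("nan")],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
)

calls = classify_genes(toy)
print(calls.value_counts())
```

Applying the same function to each contrast's results makes the counts of up- and down-regulated genes directly comparable across SEN vs NT, ST vs SEN, and ST vs NT.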
Forensic-QC-Audit-EGCG-Senescence/
├── Deliverables/
│ ├── DH607_Poster_24B2176.pdf # Research poster for presentation
│ └── DH607_Project_Report_24B2176.pdf # Comprehensive final project report
├── Notebooks/
│ ├── Differential Gene Expression.ipynb # Main analysis notebook (DGE)
│ └── phase1.ipynb # Initial data processing and exploration
├── Reference/
│ ├── fcvm-11-1506360.pdf # Primary reference literature
│ └── supplementary_fcvm-11-1506360.pdf # Supplementary material for reference
├── Results/
│ ├── DGE_Results_EGCG_vs_Senescence.csv # Output data from the DGE analysis
│ └── Enrichment_UpRegulated.csv # Gene set enrichment analysis results
├── GSE286438_Counts_matrix_Patel_... # Raw counts matrix (Dataset)
├── LICENSE # Repository licensing information
└── README.md # Project documentation (this file)
- Clone the repository:
git clone https://github.com/[Your-Username]/Forensic-QC-Audit-EGCG-Mediated-Senescence-Reversal.git
- Install dependencies:
pip install pydeseq2 gprofiler-official pandas matplotlib seaborn numpy
- Execute the analysis: Open and run the Jupyter notebooks in the following order: phase1.ipynb → Differential Gene Expression.ipynb.