Analysis of breast cancer progression using principal component analysis and clustering

J Biosci. 2007 Aug;32(5):1027-39. doi: 10.1007/s12038-007-0102-4.

Abstract

We develop a new technique for analysing microarray data that combines principal component analysis with consensus ensemble k-clustering to find robust clusters and gene markers. We apply the method to a public microarray breast cancer dataset containing gene expression levels in normal samples and in three pathological stages of disease: atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). The method averages over clustering techniques and data perturbations, so that the clusters and gene markers it reports are stable. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (at the early hyperplasia, or ADH, stage) and find that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each were a distinct disease.
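
The general idea of combining principal component analysis with consensus clustering under data perturbation can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a samples-by-genes matrix, uses a plain k-means as the only clustering technique (the abstract describes averaging over several), and perturbs the data by random subsampling of samples. All function names and parameters (`pca_scores`, `kmeans_labels`, `consensus_matrix`, `frac`, `n_runs`) are hypothetical choices made for illustration.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # centre each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with random initial centres; returns one label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave an empty cluster's centre unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def consensus_matrix(X, k, n_components=2, n_runs=100, frac=0.8, seed=0):
    """Fraction of runs in which each pair of samples lands in the same cluster.

    Assumption: perturbation is done by subsampling a fraction `frac` of the
    samples without replacement in each run.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))                  # times a pair was co-clustered
    sampled = np.zeros((n, n))                   # times a pair was drawn together
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(pca_scores(X[idx], n_components), k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)
```

Entries of the resulting matrix close to 1 indicate pairs of samples that cluster together regardless of the perturbation, while entries close to 0 indicate pairs that are consistently separated. Intermediate values flag unstable assignments.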

Publication types

  • Validation Study

MeSH terms

  • Biomarkers, Tumor / genetics
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / metabolism*
  • Breast Neoplasms / pathology
  • Cluster Analysis
  • Disease Progression
  • Female
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic / physiology
  • Humans
  • Neoplasm Invasiveness / genetics
  • Oligonucleotide Array Sequence Analysis
  • Predictive Value of Tests
  • Principal Component Analysis*
  • Signal Transduction / genetics

Substances

  • Biomarkers, Tumor