GWAS and T2DM
Over the last decade, geneticists have devoted considerable effort to the search for type 2 diabetes (T2D) genes. The most successful early approaches focussed on fine-mapping candidate genes, yielding a small number of robustly associated common variants (in or near KCNJ11, PPARG, TCF2 and WFS1). Common variants in TCF7L2, a gene with no prior biological candidacy for diabetes, were mapped by deCODE genetics while following up a linkage peak. These findings, along with rapidly evolving technologies for assaying genetic variation on a large scale, fuelled the subsequent explosion of large-scale, genome-wide searches for T2D genes.
What is a Genome-wide Association Study (GWAS)?
A genome-wide association study is a study design in which several hundred thousand SNPs (single nucleotide polymorphisms) are measured from a single DNA sample in a single experiment. Microarrays made by a few companies, such as Illumina and Affymetrix, first came on the market in 2005-2007 and “capture” information from the majority of the common genetic variation in human populations. Common is defined here as an allele frequency of 5% or more (so the allele is carried by about 10% of us or more). Although there are several million common genetic variants in the human genome, many of these variants are highly correlated with each other, a phenomenon called linkage disequilibrium. This means that a few hundred thousand carefully selected SNPs assayed on a single microarray capture information from a much larger number of polymorphisms. Once the study samples have been genotyped on these arrays, statistical tests can be performed to identify significant differences in allele frequencies between cases and controls. For example, the ‘T’ allele of the TCF7L2 SNP ‘rs7903146’ has a frequency of ~36% in cases vs ~28% in controls, which corresponds to roughly a two-fold difference in disease risk between the two homozygote groups. Given the large number of statistical tests performed genome-wide, very stringent levels of statistical evidence are required to distinguish true associations from chance findings.
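The case-control comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cohort size (1,000 cases and 1,000 controls) is hypothetical and chosen to reproduce the ~36% vs ~28% allele frequencies quoted for rs7903146, the test is a plain allelic chi-square test rather than whatever model a given study actually used, the carrier calculation assumes Hardy-Weinberg proportions, the homozygote comparison assumes a multiplicative (per-allele) model, and the Bonferroni threshold assumes roughly one million independent tests, which is a commonly used convention rather than something stated in this article.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic 2x2 test on allele counts.

    Returns (odds_ratio, chi_square, p_value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (1 degree of freedom,
    # no continuity correction).
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


def carrier_frequency(allele_freq):
    """Fraction of people with at least one copy of the allele,
    assuming Hardy-Weinberg proportions."""
    return 1 - (1 - allele_freq) ** 2


# Hypothetical cohort: 1,000 cases and 1,000 controls, so 2,000 alleles
# per group. 720/2000 = 36% 'T' in cases, 560/2000 = 28% in controls.
odds_ratio, chi_square, p_value = allelic_association(720, 1280, 560, 1440)

# Under a multiplicative model, the odds ratio between the two
# homozygote groups is the per-allele odds ratio squared.
homozygote_or = odds_ratio ** 2

# Bonferroni correction for an assumed ~1,000,000 independent tests.
bonferroni_threshold = 0.05 / 1_000_000

print(f"per-allele OR = {odds_ratio:.2f}")
print(f"homozygote OR = {homozygote_or:.2f}")
print(f"chi-square = {chi_square:.2f}, p = {p_value:.2e}")
print(f"carriers at 5% allele frequency = {carrier_frequency(0.05):.4f}")
print(f"genome-wide threshold = {bonferroni_threshold:.0e}")
```

With cohorts this small, even an effect as large as that of TCF7L2 sits close to the corrected threshold, which is one way to see why the smaller effects at other loci required the much larger combined samples described later.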
The first genome-wide association (GWA) studies for type 2 diabetes were published in 2007, collectively identifying six new gene regions. After these initial studies were published, groups began to collaborate and meta-analyse their datasets. This process has evolved over time, with the most recent meta-analysis performed in up to 34,840 T2D cases and 114,981 controls. Studies are now harmonised to common genetic reference panels via imputation techniques, yielding ~2.4M autosomal HapMap variants that capture a large proportion of common variation. Currently there are ~70 loci robustly associated with T2D risk, and we’ve learnt several important lessons from the identification of these variants:
The observed effect sizes are much smaller than we might have originally imagined. The locus with the largest effect size is still TCF7L2 (per-allele odds ratio ~1.4); the remaining loci exert smaller effects, with typical odds ratios (ORs) of ~1.1. Collectively these variants explain only ~6% of the variance in disease susceptibility, and the remaining genome-wide data suggest that there are likely to be many more common variants of diminishing effect size to be found.
In few cases can we pinpoint the functional variants or genes involved. The association signals signpost regions of the genome important for T2D susceptibility, but correlation between variants (“linkage disequilibrium”) limits our ability to isolate a specific causal variant (which in many cases will not have been assayed) from its correlated neighbours. Few associated SNPs have a known functional impact on the coding sequence (notably those in SLC30A8 and GCKR), with the majority residing in the non-coding genome. Genes are often annotated to association signals based on either physical proximity or biological candidacy, but rarely on the basis of a functional link to the sequence variation.
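To make the idea of correlation between variants concrete, the sketch below computes r², a standard measure of linkage disequilibrium between two biallelic variants, from haplotype and allele frequencies. The function name and all frequencies are hypothetical and chosen only for illustration.

```python
def ld_r_squared(hap_ab, freq_a, freq_b):
    """r^2 between two biallelic variants.

    hap_ab: frequency of the haplotype carrying allele A at the first
            variant and allele B at the second.
    freq_a, freq_b: frequencies of alleles A and B.
    """
    # D measures how far the observed haplotype frequency departs from
    # what independence between the two variants would predict.
    d = hap_ab - freq_a * freq_b
    return d * d / (freq_a * (1 - freq_a) * freq_b * (1 - freq_b))


# Hypothetical frequencies: both alleles at 30%.
independent = ld_r_squared(0.09, 0.3, 0.3)  # haplotype as expected by chance
strong_ld = ld_r_squared(0.27, 0.3, 0.3)    # alleles mostly travel together
perfect_ld = ld_r_squared(0.30, 0.3, 0.3)   # alleles always travel together

print(f"independent: {independent:.2f}")
print(f"strong LD:   {strong_ld:.2f}")
print(f"perfect LD:  {perfect_ld:.2f}")
```

When r² is close to 1, the two variants produce nearly identical association statistics, so the association data alone cannot say which of them, if either, is causal.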
The majority of loci affect beta-cell function. Parallel physiological studies have assessed the impact of T2D-associated variants on various parameters of insulin secretion and insulin resistance. A large proportion of these variants have a primary effect on insulin secretion, with only a handful acting through insulin resistance. Interestingly, there is only partial overlap between T2D-associated variants and those which influence glucose homeostasis in healthy populations.
Several biological pathways and processes are beginning to emerge. In addition to identifying individual genes, data mining approaches can interrogate GWAS data to identify trends across defined biological pathways. Emerging associated pathways include CREBBP-related transcription, adipocytokine signalling and cell cycle regulation.
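The meta-analysis step mentioned earlier, in which groups combine their datasets, can be sketched as a fixed-effect, inverse-variance weighted average of per-study log odds ratios. This is one common way of pooling association results and is shown only as an illustration; the article does not say which method the T2D consortia used, and the three study estimates below are hypothetical.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled_log_or, pooled_se, two_sided_p).
    """
    # More precise studies (smaller standard error) get more weight.
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    # Two-sided p-value from a standard normal z statistic.
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Hypothetical per-study results for one SNP: odds ratios near 1.1,
# each with its own standard error on the log-odds scale.
study_log_ors = [math.log(1.12), math.log(1.08), math.log(1.10)]
study_ses = [0.04, 0.03, 0.05]

pooled, pooled_se, p_value = fixed_effect_meta(study_log_ors, study_ses)
print(f"pooled OR = {math.exp(pooled):.3f}")
print(f"pooled SE = {pooled_se:.4f}")
print(f"p = {p_value:.2e}")
```

The pooled standard error is smaller than that of any single study, which is why combining cohorts makes it possible to detect odds ratios of ~1.1 that individual studies could not resolve.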
The search for T2D-associated genes and variants continues, with many scientists questioning where the ‘missing heritability’ lies. Future efforts will focus on assaying larger amounts of genetic variation through re-sequencing and extended imputation studies, with a particular focus on rare and complex variation not well captured by previous GWAS. Increased study sizes, investigation of additional ancestry groups, stratification of T2D cases and integration with additional functional and physiological data will all contribute to this search.
Grant et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nature Genetics (2006).
Zeggini et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nature Genetics (2008).
Voight et al. Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nature Genetics (2010).
Morris et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics (2012).
Dupuis et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nature Genetics (2010).