Defining the complete repertoire of mutations driving cancer development and progression
Next-generation sequencing technology has heralded new opportunities for cancer genomics research. It is now feasible to survey the entire sequence content of an individual tumour and define its accumulated somatic mutations and structural variations. We are systematically surveying transcriptome complexity, genome sequence content and structure, and epigenomic signatures in a large cohort of individual pancreatic cancers (in collaboration with A. Biankin, Garvan Institute) and ovarian cancers (in collaboration with D. Bowtell, PeterMac Cancer Institute) as part of the International Cancer Genome Consortium.
Over the last 18 months we have established multi-gigabase scale next-generation sequencing technology and demonstrated its utility for studying gene activity, identifying which transcripts are made from each locus, and surveying the sequence content of ES cell and HeLa transcriptomes. We are now comprehensively surveying RNA abundance, sequence content and complexity (for both mRNA and miRNA) in tumours. We have also developed the computational pipelines and experimental methods to create mate-pair libraries for high resolution genome scanning for structural variations (SVs): insertions, homozygous and heterozygous deletions, translocations and inversions. These studies rely on genomic libraries built from the terminal sequences of genomic fragments of uniform length (i.e. clones of the 25-50 bp terminal sequences of 3 kb genomic fragments). When the two tags of a "mate-pair" are independently mapped to the reference genome, they should lie the expected uniform distance apart (i.e. 3 kb). Structural variations in the genome perturb the observed distance, order or orientation of mate-pairs that span an altered region:
Large Insertions: can be identified by mate-pairs mapping closer together than expected.
Large Deletions: can be identified by mate-pairs mapping further apart than expected.
Tandem Duplications: can be identified by mate-pairs mapping in the reverse order.
Inversions: can be identified by mate-pairs mapping in the incorrect orientation.
Translocations: can be identified by each mate-pair mapping to a different chromosome.
Chromosomal Copy Number Variations: can be identified by changes in tag coverage and depth.
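
The rules listed above can be sketched as a small classifier. This is only an illustrative sketch, not our production pipeline: the MatePair record, the classify_mate_pair function, the 500 bp tolerance and the convention that a concordant pair has tag 1 on the '+' strand upstream of tag 2 on the '-' strand are all assumptions made for the example (the expected orientation depends on the sequencing platform and library protocol). Copy number changes are not covered, because they are inferred from tag depth rather than from individual pairs.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """Mapped positions of the two tags of one mate-pair (hypothetical layout)."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_mate_pair(mp, expected=3000, tolerance=500):
    """Label one mate-pair by comparing its mapping with the expected layout.

    Assumed concordant layout: same chromosome, tag 1 on '+', tag 2 on '-',
    tag 2 mapping roughly `expected` bp downstream of tag 1.
    """
    # Tags on different chromosomes suggest a translocation.
    if mp.chrom1 != mp.chrom2:
        return "translocation"
    # Any strand combination other than the assumed one suggests an inversion.
    if (mp.strand1, mp.strand2) != ("+", "-"):
        return "inversion"
    distance = mp.pos2 - mp.pos1
    # Correct strands but reversed order suggests a tandem duplication.
    if distance < 0:
        return "tandem_duplication"
    # Tags closer than expected: extra sequence in the sample (insertion).
    if distance < expected - tolerance:
        return "insertion"
    # Tags further apart than expected: sequence missing in the sample (deletion).
    if distance > expected + tolerance:
        return "deletion"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        MatePair("chr1", 1000, "+", "chr1", 4000, "-"),
        MatePair("chr1", 1000, "+", "chr1", 9000, "-"),
        MatePair("chr1", 1000, "+", "chr2", 4000, "-"),
    ]
    for mp in pairs:
        print(mp, classify_mate_pair(mp))
```

A single discordant pair is weak evidence, since mapping errors and chimeric library fragments produce the same signatures; in practice a call would normally require several independent pairs supporting the same event.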
In addition to structural variant analysis, genome and transcriptome sequences can be screened at single nucleotide resolution. Overlapping tags can be used to discern sequence variations such as SNPs, substitution mutations, and insertions and deletions. We are refining pipelines to identify these events, determine whether they are known SNPs in the population, and summarize the likely pathogenicity of novel events (synonymous versus non-synonymous changes, splice junction mutations, and the likelihood that a variant drives a cancer phenotype, using tools like CanPredict).
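
As an illustration of the synonymous versus non-synonymous distinction, the sketch below classifies a single-base substitution within one codon using the standard genetic code. The function name classify_substitution and its arguments are hypothetical, and the example assumes the codon is given on the coding strand and in frame; it does not handle splice junctions, indels or driver prediction.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO_ACIDS
    )
}


def classify_substitution(ref_codon, offset, alt_base):
    """Classify a single-base change inside one in-frame, coding-strand codon.

    ref_codon: three-letter reference codon, e.g. "GAA"
    offset:    0-based position of the changed base within the codon
    alt_base:  the substituted base
    Returns "synonymous", "missense" or "nonsense"; the last two are both
    non-synonymous changes.
    """
    ref_codon = ref_codon.upper()
    alt_base = alt_base.upper()
    if len(ref_codon) != 3 or offset not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("expected a 3-base codon, an offset of 0-2 and a DNA base")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # amino acid change


if __name__ == "__main__":
    print(classify_substitution("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu
    print(classify_substitution("GAA", 0, "A"))  # GAA -> AAA, Glu -> Lys
    print(classify_substitution("GAA", 0, "T"))  # GAA -> TAA, Glu -> stop
```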


