Studying mammalian transcriptomes at single nucleotide resolution
The mammalian transcriptome is far more complex than previously thought. We have previously used bioinformatic approaches to review the transcriptional output of all loci in mouse and man, providing novel insights into how alternative transcription expands the repertoire of the proteome and provides higher levels of control of biological systems. We have also studied the consequences of transcriptional complexity for the mammalian phosphoregulator network (all protein kinases and phosphatases) and for the genes encoding extracellular space proteins. We are now actively pursuing the role of this transcriptional complexity in specific biological states using next-generation sequencing approaches.
Recent advances in nanotechnology have opened new opportunities for transcriptome sequencing and genomic research. Applied Biosystems Inc. (AB) has developed a massively parallel sequencing approach based on Church's "polony" sequencing concept. Known as Supported Oligo Ligation Detection Sequencing (SOLiD), this technology can sequence short DNA fragments at a throughput of more than 300,000,000 sequences per run.
We have recently used this platform to perform massive-scale RNA sequencing (RNAseq) on random directional cDNA libraries from mouse embryonic stem (ES) cells (Cloonan et al., Nature Methods 2008). RNAseq is robust and highly quantitative (Pearson correlations of 0.99 have been reported for replicate RNAseq runs), and raw tag counts correlate well with qRT-PCR results. Additionally, the dynamic range of RNAseq is potentially unlimited, as it uses tag counts rather than image-derived intensities to determine relative abundance. When sequencing depths of 10-100 million reads per biological sample are compared with expression arrays, many genes whose activity is below the array's limit of detection can be readily observed by RNAseq. Importantly, this sensitivity is tunable by altering sequencing depth.
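As a rough illustration of working with tag counts, the sketch below normalizes raw counts to counts per million (one common library-size normalization, assumed here rather than taken from the study), computes the Pearson correlation between two replicate runs, and models how detection sensitivity scales with depth. The detection model is an assumption: tags for a transcript are treated as Poisson-sampled with mean equal to the transcript's fraction of all tags times the sequencing depth. Gene names and counts are hypothetical.

```python
import math


def counts_per_million(tag_counts: dict[str, int]) -> dict[str, float]:
    """Scale raw tag counts by library size so runs of different depth are comparable."""
    total = sum(tag_counts.values())
    return {gene: count * 1e6 / total for gene, count in tag_counts.items()}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient (assumes neither series is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def detection_probability(fraction: float, depth: int, min_tags: int = 1) -> float:
    """P(at least min_tags tags) under an assumed Poisson model, mean = fraction * depth."""
    lam = fraction * depth
    p_below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_tags))
    return 1.0 - p_below


# Hypothetical replicate runs sequenced to different depths.
rep1 = {"GeneA": 500, "GeneB": 120, "GeneC": 30, "GeneD": 0}
rep2 = {"GeneA": 980, "GeneB": 250, "GeneC": 55, "GeneD": 2}
genes = sorted(rep1)
cpm1, cpm2 = counts_per_million(rep1), counts_per_million(rep2)
r = pearson([cpm1[g] for g in genes], [cpm2[g] for g in genes])
print(f"replicate Pearson r = {r:.3f}")
```

Under this model, a transcript contributing one tag in every 10 million has roughly a 63% chance of being seen at least once at 10 million reads, and better than a 99.99% chance at 100 million reads, which is one way to see why sensitivity is tunable by depth.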
In addition to monitoring gene activity, RNAseq can be used to study alternative splicing events, promoter usage, and 3' UTR usage. These events can be detected by counting tags that match the portions of sequence that are unique to each transcript. These so-called "diagnostic" sequences may correspond to cassette exons or to the junction sequences arising from specific exon combinations. RNAseq also identifies novel complexity, in the form of new transcriptionally active regions and novel splicing events. We recently identified approximately 200,000 actively expressed retrotransposable elements in ES cells, 30,000 of which are dynamically expressed during the differentiation of ES cells into embryoid bodies (EBs). Finally, RNAseq is not just a means of measuring the relative abundance of transcripts; it is also a massive-scale survey of sequence content. This allows gene expression to be analyzed and sequence variation to be screened simultaneously.
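To make the idea of "diagnostic" sequences concrete, here is a minimal sketch that treats each isoform as an ordered list of exon labels, derives its exon-exon junctions, keeps only the exons and junctions found in exactly one isoform, and counts tags that hit those features. It assumes reads have already been aligned and assigned to hypothetical exon or junction labels such as `E1` or `E1-E3`; it is an illustration of the counting idea, not the pipeline used in the study.

```python
from collections import Counter


def features(exons: list[str]) -> set[str]:
    """Exons plus the junctions formed by adjacent exons in one isoform.

    Junction labels join exon labels with '-', so exon labels are assumed hyphen-free.
    """
    junctions = {f"{a}-{b}" for a, b in zip(exons, exons[1:])}
    return set(exons) | junctions


def diagnostic_features(isoforms: dict[str, list[str]]) -> dict[str, set[str]]:
    """For each isoform, keep only the features that occur in no other isoform."""
    per_isoform = {name: features(exons) for name, exons in isoforms.items()}
    occurrences = Counter(f for feats in per_isoform.values() for f in feats)
    return {
        name: {f for f in feats if occurrences[f] == 1}
        for name, feats in per_isoform.items()
    }


def count_diagnostic_tags(isoforms: dict[str, list[str]], read_hits: list[str]) -> dict[str, int]:
    """Count tags landing on diagnostic features; tags on shared features are ignored."""
    diag = diagnostic_features(isoforms)
    owner = {f: name for name, feats in diag.items() for f in feats}
    counts = {name: 0 for name in isoforms}
    for hit in read_hits:
        if hit in owner:
            counts[owner[hit]] += 1
    return counts


# Hypothetical cassette-exon event: E2 is either included or skipped.
isoforms = {"inclusion": ["E1", "E2", "E3"], "skipping": ["E1", "E3"]}
hits = ["E1", "E2", "E2-E3", "E1-E3", "E3", "E2"]
print(count_diagnostic_tags(isoforms, hits))
```

Here the cassette exon `E2` and its two junctions are diagnostic for the inclusion isoform, the `E1-E3` junction is diagnostic for the skipping isoform, and tags on the shared exons `E1` and `E3` say nothing about which isoform produced them.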
Other models currently being studied by whole transcriptome sequencing:
- Creation of a mammalian body expression atlas
- Human ES and iPS cells
- Human cell-cycle-synchronized cells
- Human cancer cell lines

