Quantification of Gene Expression

Encyclopedia of DNA Elements - Gene Expression Atlas - Brain Atlas - Human Proteome Maps ... and more ...
mRNA Transcript Analysis

The inclusion of a transcriptome analysis dataset in the Human Protein Atlas database makes it even more comprehensive
Posted by RNA-Seq Blog -- from Health Canal

A new version of the Human Protein Atlas was launched in April 2016. It includes data from different sources, which makes it possible to compare tissue profiles at both the RNA and protein levels.  =>  Visit your favorite gene to browse our new layout on the Tissue Atlas!
Looking closely at healthy and diseased states in human tissues and cells makes it possible to learn more and thereby improve healthcare. To make these comparisons, researchers first need to know how the human body is built. One way is to analyze the transcriptome, that is, to determine which genes are active, and therefore available to produce protein, in a tissue or cell sample.
Human biology proceeds in three steps: from DNA to RNA and then to protein. The DNA code is first copied into RNA, which is then read to make a protein. Detecting and counting protein molecules is very complicated, and scientists need several different methods to make sure the data are valid. One alternative is to analyze the transcriptome, counting the RNA molecules copied from the DNA of a given gene, which gives a picture of which proteins to expect in the sample. Counting RNA molecules is easier than counting protein molecules.
The Human Protein Atlas includes proteome analysis based on more than 25 000 antibodies targeting more than 17 000 unique proteins, combined with transcriptome analysis covering all 20 000 human protein-coding genes. The new atlas, launched on April 11, also includes primary data from several sources, which allows for comparisons.
“The new version of the Human Protein Atlas is significantly advancing in terms of mapping the transcriptome in different human tissues. These data have been the basis for much of the metabolic modelling we are doing here at Chalmers. I am therefore very excited about the progress, and the Human Protein Atlas will certainly be an important resource in our aims to advance towards better diagnostics and precision medicine,” says Professor Jens Nielsen at the Department of Biology and Biological Engineering.

Global transcriptomics analysis of human tissues and organs

Overview of the tissues and organs analyzed using RNA-seq by the Human Protein Atlas consortium (HPA, green), tissues studied with cap analysis gene expression (CAGE) within the FANTOM consortium (purple), and tissues analyzed using RNA-seq by the Genotype-Tissue Expression consortium (GTEx, orange). Altogether, 22 tissues and organs were studied with both the HPA and FANTOM datasets, while 21 tissues overlapped between the HPA and GTEx datasets.

The launch is accompanied by an article in Molecular Systems Biology describing transcriptome resources with a focus on the comparison between the datasets generated from the Broad Institute, Boston, US (GTEx) and the Human Protein Atlas. The GTEx dataset includes more than 1600 samples from mostly overlapping, but in some cases unique, tissues compared to the Human Protein Atlas. RNA-seq data from 28 of the GTEx tissues with a corresponding tissue in Human Protein Atlas have been included to allow for direct comparisons between the Human Protein Atlas and GTEx data sets.

“The inclusion of the GTEx dataset to the Human Protein Atlas database makes it even more comprehensive and it is reassuring that there is a significant overlap in the tissue classification of the genes based on the two independent datasets,” says Professor Mathias Uhlén, program director for the Human Protein Atlas project.
The article published in Molecular Systems Biology discusses publicly available human transcriptome resources and the possible use of these databases for various applications, such as building genome-scale metabolic models used for analyzing cell and tissue functions in both health and disease contexts.

Transcriptomics resources of human tissues and organs
Mathias Uhlén, Björn M Hallström, Cecilia Lindskog, Adil Mardinoglu, Fredrik Pontén, Jens Nielsen
Molecular Systems Biology 12: 862 | 2016

Quantifying the differential expression of genes in various human organs, tissues, and cell types is vital to understand human physiology and disease. Recently, several large-scale transcriptomics studies have analyzed the expression of protein-coding genes across tissues. These datasets provide a framework for defining the molecular constituents of the human body as well as for generating comprehensive lists of proteins expressed across tissues or in a tissue-restricted manner. Here, we review publicly available human transcriptome resources and discuss body-wide data from independent genome-wide transcriptome analyses of different tissues. Gene expression measurements from these independent datasets, generated using samples from fresh frozen surgical specimens and postmortem tissues, are consistent. Overall, the different genome-wide analyses support a distribution in which many proteins are found in all tissues and relatively few in a tissue-restricted manner. Moreover, we discuss the applications of publicly available omics data for building genome-scale metabolic models, used for analyzing cell and tissue functions both in physiological and in disease contexts.

Welcome to MTD database

MTD -- A mammalian transcriptomic database to explore gene expression and regulation.
Sheng X, Wu J, Sun Q, Li X, Xian F, Sun M, Fang W, Chen M, Yu J, Xiao J
Brief Bioinform. 2016 Jan 27. pii: bbv117

A systematic transcriptome survey is essential for the characterization and comprehension of the molecular basis underlying phenotypic variations. Recently developed RNA-seq methodology has facilitated efficient data acquisition and information mining of transcriptomes in multiple tissues. Current mammalian transcriptomic databases are either tissue specific or species specific, and they lack in-depth comparative features across tissues and species. Here, we present a MTD that is focused on mammalian transcriptomes with a current version that contains data from humans, mice, rats and pigs. Regarding the core features, the MTD browses genes based on their neighboring genomic coordinates or joint KEGG pathway and provides expression information on exons, transcripts, and genes by integrating them into a genome browser. We developed a novel nomenclature for each transcript that considers its genomic position and transcriptional features. The MTD allows a flexible search of genes or isoforms with user-defined transcriptional characteristics and provides both table-based descriptions and associated visualizations. To elucidate the dynamics of gene expression regulation, the MTD also enables comparative transcriptomic analysis in both intraspecies and interspecies manner. The MTD thus constitutes a valuable resource for transcriptomic and evolutionary studies. The MTD is freely accessible at 
Further connected papers => http://bigd.big.ac.cn/publications

Nature Editor's summary

More than a decade after publication of the draft human genome sequence, there is no direct equivalent for the human proteome. But in this issue of Nature two groups present mass spectrometry-based analysis of human tissues, body fluids and cells mapping the large majority of the human proteome.
  • Akhilesh Pandey and colleagues identified 17,294 protein-coding genes and provide evidence of tissue- and cell-restricted proteins through expression profiling. They highlight the importance of proteogenomic analysis by identifying translated proteins from annotated pseudogenes, non-coding RNAs and untranslated regions. The data set is available on http://www.humanproteomemap.org
  • Bernhard Kuster and colleagues have assembled protein evidence for 18,097 genes in ProteomicsDB (available on https://www.proteomicsdb.org) and highlight the utility of the data, for example the identification of hundreds of translated lincRNAs, drug-sensitivity markers and discovering the quantitative relationship between mRNA and protein levels in tissues. Elsewhere in this issue, Vivien Marx reports on a third major proteomics project, the antibody-based Human Protein Atlas programme http://www.proteinatlas.org
Mass-spectrometry-based draft of the human proteome
Wilhelm M, Schlegl J, Hahne H, Moghaddas Gholami A, Lieberenz M, Savitski MM, Ziegler E, Butzmann L, Gessulat S, Marx H, Mathieson T, Lemeer S, Schnatbaum K, Reimer U, Wenschuh H, Mollenhauer M, Slotta-Huspenina J, Boese JH, Bantscheff M, Gerstmair A, Faerber F, and Kuster B.
Nature. 2014 509(7502): 582-587

Proteomes are characterized by large protein-abundance differences, cell-type- and time-dependent expression patterns and post-translational modifications, all of which carry biological information that is not accessible by genomics or transcriptomics. Here we present a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB. The information assembled from human tissues, cell lines and body fluids enabled estimation of the size of the protein-coding genome, and identified organ-specific proteins and a large number of translated lincRNAs (long intergenic non-coding RNAs). Analysis of messenger RNA and protein-expression profiles of human tissues revealed conserved control of protein abundance, and integration of drug-sensitivity data enabled the identification of proteins predicting resistance or sensitivity. The proteome profiles also hold considerable promise for analysing the composition and stoichiometry of protein complexes. ProteomicsDB thus enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology.


Extended Data Figure 7: Protein- versus mRNA-expression analysis

a, Comparison of mRNA and protein expression of 12 human tissues showing the general rather poor correlation of protein and mRNA levels, implying the widespread application of transcriptional, translational and post-translational control mechanisms of protein-abundance regulation. Spearman correlation coefficients vary from 0.41 (thyroid gland) to 0.55 (kidney). ‘Corner proteins’ (0.5 logs to either side of zero) are marked in colours.
b, Clustering of mRNA expression (left triangle) and protein expression (right triangle) across the 12 tissues does not reveal tissues with common profiles suggesting that the transcriptomes and proteomes of human tissues are quite different from each other.
c, The ratio of protein and mRNA level for a protein is approximately constant across many tissues. The heat map shows proteins and tissues clustered according to their protein/mRNA ratio.
d, Protein abundance can be predicted from mRNA levels. Using the median ratio of protein/mRNA across 12 tissues, it is possible to predict protein levels from mRNA levels for every tissue with a good correlation coefficient, underscoring the importance of the translation rate (and mRNA levels) on protein expression.
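
The two quantitative ideas in this legend, a rank correlation between mRNA and protein levels (panel a) and a per-gene median protein/mRNA ratio used to predict protein levels (panel d), can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy numbers and the choice of plain Python are all assumptions made for illustration only.

```python
from statistics import median


def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def median_ratio(protein_by_tissue, mrna_by_tissue):
    """Median protein/mRNA ratio of one gene across tissues with mRNA > 0."""
    ratios = [
        protein_by_tissue[tissue] / mrna_by_tissue[tissue]
        for tissue in protein_by_tissue
        if mrna_by_tissue.get(tissue, 0) > 0
    ]
    return median(ratios)


def predict_protein(mrna_level, ratio):
    """Predict a protein level from an mRNA level and a gene-specific ratio."""
    return mrna_level * ratio


# Hypothetical values for a single gene (arbitrary units, not real data).
protein = {"kidney": 10.0, "liver": 30.0, "thyroid": 20.0}
mrna = {"kidney": 1.0, "liver": 2.0, "thyroid": 2.0}
ratio = median_ratio(protein, mrna)
predicted_liver = predict_protein(mrna["liver"], ratio)
```

In practice the ratio would be estimated per gene across all 12 tissues, and the agreement between predicted and measured protein levels within a tissue would then be summarised with a correlation coefficient such as `spearman`.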

A draft map of the human proteome
Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S, Thomas JK, Muthusamy B, Leal-Rojas P, Kumar P, Sahasrabuddhe NA, Balakrishnan B, Advani J, George B, Renuse S, Selvan LD, Patil AH, Nanjappa V, Radhakrishnan A, Prasad S, Subbannayya T, Raju R, Kumar M, Sreenivasamurthy SK, Marimuthu A, Sathe GJ, Chavan S, Datta KK, Subbannayya Y, Sahu A, Yelamanchi SD, Jayaram S, Rajagopalan P, Sharma J, Murthy KR, Syed N, Goel R, Khan AA, Ahmad S, Dey G, Mudgal K, Chatterjee A, Huang TC, Zhong J, Wu X, Shaw PG, Freed D, Zahari MS, Mukherjee KK, Shankar S, Mahadevan A, Lam H, Mitchell CJ, Shankar SK, Satishchandra P, Schroeder JT, Sirdeshmukh R, Maitra A, Leach SD, Drake CG, Halushka MK, Prasad TS, Hruban RH, Kerr CL, Bader GD, Iacobuzio-Donahue CA, Gowda H, Pandey A.
Nature. 2014 509(7502): 575-581

The availability of human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. Here we present a draft map of the human proteome using high-resolution Fourier-transform mass spectrometry. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary haematopoietic cells, resulted in identification of proteins encoded by 17,294 genes accounting for approximately 84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, which includes translated pseudogenes, non-coding RNAs and upstream open reading frames. This large human proteome catalogue (available as an interactive web-based resource at http://www.humanproteomemap.org) will complement available human genome and transcriptome data to accelerate biomedical research in health and disease.

Tissue-based map of the human proteome
Mathias Uhlén, Linn Fagerberg, Björn M. Hallström, Cecilia Lindskog, Per Oksvold, Adil Mardinoglu, Åsa Sivertsson, Caroline Kampf, Evelina Sjöstedt, Anna Asplund, IngMarie Olsson, Karolina Edlund, Emma Lundberg, Sanjay Navani, Cristina Al-Khalili Szigyarto, Jacob Odeberg, Dijana Djureinovic, Jenny Ottosson Takanen, Sophia Hober, Tove Alm, Per-Henrik Edqvist, Holger Berling, Hanna Tegel, Jan Mulder, Johan Rockberg, Peter Nilsson, Jochen M. Schwenk, Marica Hamsten, Kalle von Feilitzen, Mattias Forsberg, Lukas Persson, Fredric Johansson, Martin Zwahlen, Gunnar von Heijne, Jens Nielsen, Fredrik Pontén
Science 23 January 2015, Vol. 347 no. 6220

INTRODUCTION: Resolving the molecular details of proteome variation in the different tissues and organs of the human body would greatly increase our knowledge of human biology and disease. Here, we present a map of the human tissue proteome based on quantitative transcriptomics on a tissue and organ level combined with protein profiling using microarray-based immunohistochemistry to achieve spatial localization of proteins down to the single-cell level. We provide a global analysis of the secreted and membrane proteins, as well as an analysis of the expression profiles for all proteins targeted by pharmaceutical drugs and proteins implicated in cancer.
RATIONALE: We have used an integrative omics approach to study the spatial human proteome. Samples representing all major tissues and organs (n = 44) in the human body have been analyzed based on 24,028 antibodies corresponding to 16,975 protein-encoding genes, complemented with RNA-sequencing data for 32 of the tissues. The antibodies have been used to produce more than 13 million tissue-based immunohistochemistry images, each annotated by pathologists for all sampled tissues. To facilitate integration with other biological resources, all data are available for download and cross-referencing.
RESULTS: We report a genome-wide analysis of the tissue specificity of RNA and protein expression covering more than 90% of the putative protein-coding genes, complemented with analyses of various subproteomes, such as predicted secreted proteins (n = 3171) and membrane-bound proteins (n = 5570). The analysis shows that almost half of the genes are expressed in all analyzed tissues, which suggests that the gene products are needed in all cells to maintain “housekeeping” functions such as cell growth, energy generation, and basic metabolism. Furthermore, there is enrichment in metabolism among these genes, as 60% of all metabolic enzymes are expressed in all analyzed tissues. The largest number of tissue-enriched genes is found in the testis, followed by the brain and the liver. Analysis of the 618 proteins targeted by clinically approved drugs unexpectedly showed that 30% are expressed in all analyzed tissues. An analysis of metabolic activity based on genome-scale metabolic models (GEMS) revealed liver as the most metabolically active tissue, followed by adipose tissue and skeletal muscle.
CONCLUSIONS: A freely available interactive resource is presented as part of the Human Protein Atlas portal (www.proteinatlas.org), offering the possibility to explore the tissue-elevated proteomes in tissues and organs and to analyze tissue profiles for specific protein classes. Comprehensive lists of proteins expressed at elevated levels in the different tissues have been compiled to provide a spatial context with localization of the proteins in the subcompartments of each tissue and organ down to the single-cell level.
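
The distinction drawn above between genes expressed in all analyzed tissues and tissue-enriched genes can be made concrete with a small sketch. The detection cutoff and the fold-change threshold below are hypothetical placeholders chosen for illustration; the abstract does not state the criteria actually used, and the function and category names are likewise assumptions.

```python
def classify_gene(expr_by_tissue, detect_cutoff=1.0, fold=5.0):
    """Assign a coarse expression category to one gene.

    expr_by_tissue maps tissue name -> expression level (arbitrary units).
    detect_cutoff and fold are illustrative thresholds, not published values.
    """
    levels = sorted(expr_by_tissue.values(), reverse=True)
    if levels[0] < detect_cutoff:
        return "not detected"
    # Highest tissue is at least `fold` times the second-highest tissue.
    if len(levels) > 1 and levels[0] >= fold * levels[1]:
        return "tissue enriched"
    if all(level >= detect_cutoff for level in levels):
        return "expressed in all"
    return "mixed"


# Hypothetical expression profiles (not real measurements).
profiles = {
    "GENE_A": {"liver": 100.0, "brain": 2.0, "testis": 3.0},
    "GENE_B": {"liver": 10.0, "brain": 12.0, "testis": 9.0},
    "GENE_C": {"liver": 0.1, "brain": 0.2, "testis": 0.0},
    "GENE_D": {"liver": 10.0, "brain": 8.0, "testis": 0.2},
}
categories = {gene: classify_gene(profile) for gene, profile in profiles.items()}
```

Counting how many genes fall into each category across a full dataset would give the kind of summary reported in the RESULTS paragraph, such as the share of genes expressed in all analyzed tissues.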


The Gene Expression Atlas Project

The Expression Atlas provides information on gene expression patterns under different biological conditions. Gene expression data is re-analysed in-house to detect genes showing interesting baseline and differential expression patterns.

The Gene Expression Atlas (ArrayExpress Atlas) is a semantically enriched database of meta-analysis based summary statistics which serves queries for condition specific gene expression patterns (e.g. genes over-expressed in a particular tissue or disease state) as well as broader exploratory searches for biologically interesting genes/samples. It is based on a subset of the ArrayExpress data.

Gene Expression Atlas goals:

  1. Provision of a statistically robust framework for integration of gene expression experiment results across different platforms at a meta-analytical level
  2. A simple interface for identifying strong differential expression candidate genes in conditions of interest
  3. Integration of ontologies for high quality annotation of gene and sample attributes
  4. Construction of new gene expression summarized views, with a view to analysis of putative signalling pathway targets, discovery of correlated gene expression patterns and the identification of condition/tissue-specific patterns of gene expression.

About the Expression Atlas

The Expression Atlas provides information on gene expression patterns under different biological conditions such as a gene knock out, a plant treated with a compound, or in a particular organism part or cell. It includes both microarray and RNA-seq data. The data is re-analysed in-house to detect interesting expression patterns under the conditions of the original experiment. There are two components to the Expression Atlas, the Baseline Atlas and the Differential Atlas:

Baseline Atlas - The Baseline Atlas displays information about which gene products are present (and at what abundance) in "normal" conditions (e.g. tissue, cell type). It aims to answer questions such as:
  • which genes are specifically expressed in kidney?
  • what is the expression pattern for gene SAA4 in normal tissues?
This component of the Expression Atlas consists of highly-curated and quality-checked RNA-seq experiments from ArrayExpress. It has data for several mammalian species as well as opossum and rice. New experiments are added as they become available.

See the Baseline Atlas help page for information about how to search and interpret the results in the Baseline Atlas.

Differential Atlas - The Differential Atlas allows users to identify genes that are up- or down-regulated in a wide variety of different experimental conditions such as yeast mutants, cadmium treated plants, cystic fibrosis or the effect on gene expression of mind-body practice.

Both microarray and RNA-seq experiments are included in the Differential Atlas. Experiments are selected from ArrayExpress and groups of samples are manually identified for comparison e.g. those with wild type genotype compared to those with a gene knock out. Each comparison is called a contrast. Each experiment is processed through our in-house differential expression statistical analysis pipeline to identify genes with a high probability of differential expression.

The Differential Atlas help page has more information about how to search and interpret the results in the Differential Atlas.

Searches enhanced by ontology-based query expansion
The Expression Atlas interface allows searches by gene, splice variant and protein attribute. Individual genes or gene sets can be searched for. Both the Baseline and Differential Atlas are queried by default.

Sample attributes and experimental conditions can also be searched for. Experimental conditions e.g. cerebellum or breast carcinoma, are mapped to ontology terms from the Experimental Factor Ontology (EFO). Mappings are made either manually by curators or automatically using EBI-developed software called Zooma. The ontology mappings allow for ontology-driven query expansion e.g. searching for 'cancer' will return matches to the keyword and also results for different types of cancer such as 'breast carcinoma' and 'acute myeloid leukemia'.
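
Ontology-driven query expansion, as described above, amounts to replacing a search term with the term plus all of its descendants in the ontology. The sketch below uses a tiny hand-made hierarchy and made-up experiment identifiers; it does not reproduce the real structure of EFO, the behaviour of Zooma, or the Expression Atlas search code.

```python
# Toy parent -> children hierarchy (an assumption, not the actual EFO).
TOY_ONTOLOGY = {
    "cancer": ["carcinoma", "leukemia"],
    "carcinoma": ["breast carcinoma"],
    "leukemia": ["acute myeloid leukemia"],
}


def expand_term(term, children=TOY_ONTOLOGY):
    """Return the term together with all of its descendants."""
    expanded = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.add(current)
        stack.extend(children.get(current, []))
    return expanded


def search(term, annotations):
    """Return experiments whose annotated condition matches the expanded term."""
    wanted = expand_term(term)
    return sorted(exp for exp, condition in annotations.items() if condition in wanted)


# Hypothetical experiment identifiers mapped to their annotated condition.
annotations = {
    "EXP-1": "breast carcinoma",
    "EXP-2": "acute myeloid leukemia",
    "EXP-3": "cerebellum",
}
cancer_hits = search("cancer", annotations)
```

With this toy data, a search for 'cancer' matches the experiments annotated with 'breast carcinoma' and 'acute myeloid leukemia', while a search for 'cerebellum' matches only the experiment annotated with that exact term.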


ENCODE Overview

The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003, to carry out a project to identify all functional elements in the human genome sequence. The project started with two components - a pilot phase and a technology development phase.

The pilot phase tested and compared existing methods to rigorously analyze a defined portion of the human genome sequence (See: ENCODE Pilot Project). The conclusions from this pilot project were published in June 2007 in Nature and Genome Research [genome.org]. The findings highlighted the success of the project to identify and characterize functional elements in the human genome. The technology development phase also has been a success with the promotion of several new technologies to generate high throughput data on functional elements.

With the success of the initial phases of the ENCODE Project, NHGRI funded new awards in September 2007 to scale the ENCODE Project to a production phase on the entire genome along with additional pilot-scale studies. Like the pilot project, the ENCODE production effort is organized as an open consortium and includes investigators with diverse backgrounds and expertise in the production and analysis of data (See: ENCODE Participants and Projects). This production phase also includes a Data Coordination Center [genome.ucsc.edu] to track, store and display ENCODE data along with a Data Analysis Center to assist in integrated analyses of the data. All data generated by ENCODE participants will be rapidly released into public databases (See: Accessing ENCODE Data) and available through the project's Data Coordination Center.

About ENCODE data

The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). 
The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

ENCODE data are now available for the entire human genome. All ENCODE data are free and available for immediate use.

To search for ENCODE data related to your area of interest and set up a browser view, use the UCSC Experiment Matrix or Track Search tool (Advanced features). The Experiment List (Human) and Experiment List (Mouse) links provide comprehensive listings of ENCODE data that is released or in preparation.

All ENCODE data is freely available for download and analysis. However, before publishing research that uses ENCODE data, please read the ENCODE Data Release Policy, which places some restrictions on publication use of data for nine months following data release.

About ENCODE Data at UCSC
ENCODE investigators employ a variety of assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing RNA from a diverse range of sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNase hypersensitivity assays, assays of DNA methylation, and chromatin immunoprecipitation (ChIP) of proteins that interact with DNA, including modified histones and transcription factors, followed by sequencing (ChIP-Seq).

To access the human ENCODE data, open the Genome Browser, select the February 2009 assembly (GRCh37/hg19) or the March 2006 assembly (NCBI36/hg18) of the human genome, and go to your region of interest. The bulk of the ENCODE data can be found in the Expression and Regulation track groups, with a few in the Mapping, Genes, and Variation groups. Although most participating research groups have provided several tracks, generally only selected data from each research group are displayed by default. Click the hyperlinked name of a particular track to display a page containing configuration options and details about the methods used to generate the data. See the Genome Browser User's Guide for further information about displaying tracks and navigating in the Genome Browser.

Data from the earlier ENCODE project pilot phase, which covered approximately 1% of the genome, are available on the March 2006 (NCBI36/hg18), May 2004 (NCBI35/hg17), and July 2003 (NCBI34/hg16) human genome assemblies. The ENCODE Pilot Project web pages provide convenient browser access to these regions.

Credits: Darryl Leja (NHGRI), Ian Dunham (EBI), Michael Pazin (NHGRI)

ArrayExpress - functional genomics data

ArrayExpress is a database of functional genomics experiments that can be queried and the data downloaded. It includes gene expression data from microarray and high throughput sequencing studies. Data is collected to MIAME and MINSEQE standards. Experiments are submitted directly to ArrayExpress or are imported from the NCBI GEO database.

What is ArrayExpress?
ArrayExpress is a core EBI database delivered by the Functional Genomics group. The database is a repository for functional genomics data from both microarray and high-throughput sequencing studies, many of which are supported by peer-reviewed publications. Data sets are either submitted directly to ArrayExpress and curated by a team of specialist biological curators, or are imported systematically from the NCBI Gene Expression Omnibus database on a weekly basis. Whichever the source, data are collected in conformity to the "Minimum Information About A Microarray Experiment" (MIAME) and "Minimum Information About a Sequencing Experiment" (MINSEQE) standards.

Publications / how to cite
Citing ArrayExpress database: please use the following publication: Rustici G et al., 2013. ArrayExpress update--trends in database growth and links to data analysis tools. Nucleic Acids Res
Citing ArrayExpress submission in your manuscript: please include your experiment accession number and the URL to ArrayExpress home page, www.ebi.ac.uk/arrayexpress e.g. "Microarray data are available in the ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession number E-MEXP-12345."
Citing ArrayExpress BioConductor package: please refer to this BioConductor page for more information about the authors and maintainer of the package.


FANTOM is an international research consortium established by Dr. Hayashizaki and his colleagues in 2000 to assign functional annotations to the full-length cDNAs that were collected during the Mouse Encyclopedia Project at RIKEN. FANTOM has since developed and expanded over time to encompass the fields of transcriptome analysis. The object of the project is moving steadily up the layers in the system of life, progressing thus from an understanding of the ‘elements’ - the transcripts - to an understanding of the ‘system’ - the transcriptional regulatory network, in other words the ‘system’ of an individual life form.


Simultaneously with producing data, FANTOM established the FANTOM database and the FANTOM full-length cDNA clone bank, which are available worldwide. The FANTOM resources have been used in several important research projects. For instance, the full-length cDNA database was used in a computer prediction of the genomic position (transcriptional unit) of genes by the International Human Genome Sequencing Consortium. They have also been used by a research group led by Dr. Shinya Yamanaka at Kyoto University, Japan, for establishing induced pluripotent stem (iPS) cells. In that study, 24 transcription factors were selected from the FANTOM database as candidate initiation factors. Furthermore, the Allen Institute for Brain Science in the United States has created a digital atlas that encompasses the whole brain, and has made it publicly available. The atlas graphically illustrates the expression of genes within the mouse brain using Informatix software. This project has also made use of the FANTOM database.

Link to all FANTOM publications from phase 1 - 5

FANTOM5 releases atlas of human gene expression

FANTOM, a large international consortium led by RIKEN, releases today the first comprehensive map of gene activity across the human body, and provides the first holistic view of the complex networks that regulate gene expression across the wide variety of cell types that make up a human being. These findings will help in the identification of genes involved in disease and the development of personalized and regenerative medicine.

After many years of concerted effort to systematically analyze the expression of genes in all human cells and tissues, RIKEN and the FANTOM consortium publish the findings today in two landmark Nature reports, and 16 related articles in ten other scholarly journals (ref.1,2,3).

The papers published in Nature describe maps of promoters and enhancers – short regions of DNA that influence the activity of genes - encoded in the human genome, and their activity across the vast wealth of human cell types and tissues of the human body. Together with the other studies published by FANTOM5, this data provides the first complete view of the networks regulating transcription across all cell types.

The FANTOM project (for Functional Annotation of the Mammalian genome) is a RIKEN initiative launched in 2000 to build a complete library of human genes using the capabilities offered by new, state-of-the-art cDNA technologies. Over 250 experts in primary cell biology and bioinformatics from 114 institutions based in more than 20 countries and regions worked as part of FANTOM 5, the 5th edition of the project, to produce the 18 studies published today.

Using a highly sensitive technique called Cap Analysis of Gene Expression (CAGE), developed at RIKEN (ref. 4, 5), the researchers monitored the activity of promoters and enhancers across more than 180 human primary cell types. They identified 180,000 promoters and 44,000 enhancers in the genome and found that the activity of the large majority of these transcriptional regulatory regions is highly specific to cell type.

“Humans are complex multicellular organisms composed of at least 400 distinct cell types. This beautiful diversity of cell types allows us to see, think, hear, move and fight infection, yet all of this is encoded in the same genome. The difference between all these cells is what parts of the genome they use – for instance, brain cells use different genes than liver cells, and therefore they work very differently. In FANTOM5, we have for the first time systematically investigated exactly what genes are used in virtually all cell types across the human body, and the regions which determine where the genes are read from the genome,” explains Dr. Alistair Forrest, scientific coordinator of FANTOM5.

Unlike other large-scale genomics projects, FANTOM5 focused on identifying gene expression in normal primary cells rather than cell lines derived from cancers. “In FANTOM5 we made the decision early on that we should include a large focus on normal primary cells and tissues. Although cell lines are easy to use, they are seldom good models of normal cells,” said Dr. Forrest.

A key discovery on the way was that by employing CAGE, the technology used to find active genes, the team could identify the additional DNA regions that regulate the activity of genes in every cell type, called enhancers. “We found that CAGE is a lot more specific than competing methods, and still can be used on small cell samples – this has a huge potential, because it opens up the door for analyzing tissue samples from people suffering from disease and find out what is wrong on a molecular level,” said Professor Albin Sandelin, one of the coordinators for the enhancer project.

“What is written in the genome? Answering this question has been the consortium’s ultimate goal since the beginning. The basic library of cell definition that was produced during FANTOM5 is a remarkable step to manipulating cells. The library will be an essential resource for developing a wide range of technologies for the life sciences, that will lead to the development of regenerative and personalized medicine in the near future,” Dr. Yoshihide Hayashizaki, the general Director of FANTOM said.

“Omics science, the study and systematic mapping of the molecules that make up an organism, has yielded one insightful surprise after another. Life, however, remains largely elusive. We will continue to search for the basic molecular mechanism underlying the wide diversity of cells, to provide deeper insights into life science that will lead to improved medical treatment,” Dr. Hayashizaki added.
  1. Forrest A.R.R. et al. A promoter level mammalian expression atlas. Nature (2014) http://dx.doi.org/10.1038/nature13182
  2. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature (2014) http://dx.doi.org/10.1038/nature12787
  3. Kanamori-Katayama, M. et al. Unamplified cap analysis of gene expression on a single-molecule sequencer. Genome Res. 21, 1150–1159 (2011)

Research Highlight - A map of mRNA localization

Nature Reviews Molecular Cell Biology 8, 946 (2007)


About Fly-FISH

This database documents the expression patterns of Drosophila mRNAs at the subcellular level during early embryogenesis. A high-resolution, high-throughput fluorescence detection method is used to detect expressed mRNAs (PDF). The overall findings and implications of the work performed thus far are summarized in Lecuyer et al (2007). The data can be accessed by searching the localization categories, searching for specific genes or browsing the list of tested genes.

Contrary to expectations, many Drosophila melanogaster mRNA transcripts appear to be localized prior to translation, report Henry Krause and colleagues in Cell. Localization of mRNA before translation offers some advantages over immediate translation; for example, several rounds of translation can occur at a subcellular location, bypassing the energetically costly need to move proteins individually. However, estimates had predicted that only 1–10% of mRNA transcripts are localized prior to translation. To test these estimates, researchers used high-resolution fluorescent in situ hybridization (FISH) to analyse 2,314 embryonically expressed D. melanogaster mRNAs. Strikingly, 71% of the embryonically expressed mRNAs were specifically localized. mRNA transcripts localized into 35 categories, including subembryonic categories and subcellular categories. Given the diversity and frequency of the localization patterns observed, and given the close correlation between mRNA localization and protein translation, the authors propose that many, if not most, cellular functions are regulated by mRNA localization. For example, the fact that the temporal localization of anillin mRNA, which encodes an actin-interacting protein, resembles subsequent actin-filament distribution suggests that mRNA localization is involved in the organization of cytoskeletal networks. To make the most of their extensive data, the Krause team catalogued their findings on Fly-FISH, which is searchable by genes and localization categories. As such, it offers promise for various lines of inquiry. For instance, the spatio–temporal data it contains may provide insight into gene regulatory networks, and the ability to assess mRNA localization with colocalization of other mRNAs and proteins may help to uncover the functions of uncharacterized genes.


A growing collection of online public resources integrating extensive gene expression and neuroanatomical data, complete with a novel suite of search and viewing tools. Get started with tutorials offering introductory overviews and guided tours!

WELCOME to EMAP - The e-Mouse Atlas Project

EMA, the e-Mouse Atlas
A 3-D anatomical atlas of mouse embryo development including detailed histology. EMA includes the EMAP ontology of anatomical structure

EMAGE, the e-Mouse Atlas of Gene Expression
A database of mouse gene expression where, uniquely, the gene expression is mapped into the EMA 3-D space and can be queried spatially

RNA and the Regulation of Gene Expression - A Hidden Layer of Complexity
Publisher: Caister Academic Press
Edited by: Kevin V. Morris The Scripps Research Institute, La Jolla, USA
Publication date: March 2008
ISBN: 978-1-904455-25-7

The role of RNA in regulating gene expression has become a topic of intense interest. In this book internationally recognized experts in RNA research explore and discuss the methods whereby RNA can regulate gene expression with examples in yeast, Drosophila, mammals, and viral infection, and highlight the application of this knowledge in therapeutics and research. Topics include: gene silencing and gene activation, the hammerhead ribozyme, epigenetic regulation, RNAi, microRNA, and pyknons. This comprehensive publication is intended for readers with teaching or research interests in RNA, the regulation of gene expression, genetics, genomics or molecular biology.

"the contributions in this book do provide informative and well-structured overviews of current understanding of the roles of non-coding RNAs, short interfering RNAs, microRNAs and retrotransposons in eukaryotic organisms ... cutting edge studies on the potential role of RNA species in the epigenetic regulation of gene expression and on the existence of previously unidentified classes of intergenic and intronic short regulatory RNAs (pyknons) ... a useful purchase for specialist workers in the field as well as for many institutional libraries." from Microbiology Today (2008)

"This book is a well-selected compilation of 14 mostly review-style articles, written by experts in the field ... a well-written, successful endeavour to present the field of eukaryal RNA-mediated regulation of gene expression. It has its major strength in providing an extremely well structured, up-to-date, comprehensive overview that skillfully zooms the reader into each topic from a general introduction to a high degree of detail ... suited for a broad range of readers, from advanced students to researchers in the field. Personally, we very much enjoyed reading it." from ChemBioChem (2008) 9: 2005-2007


Chapter 1 The Hammerhead Ribozyme Revisited: New Biological Insights for the Development of Therapeutic Agents and for Reverse Genomics Applications
Justin Hean and Marc S. Weinberg

Chapter 2 Epigenetic Regulation of Gene Expression
Kevin V. Morris

Chapter 3 The Role of RNAi and Noncoding RNAs in Polycomb Mediated Control of Gene Expression and Genomic Programming
Manuela Portoso and Giacomo Cavalli

Chapter 4 Heterochromatin Assembly and Transcriptional Gene Silencing under the Control of Nuclear RNAi: Lessons from Fission Yeast
Aurélia Vavasseur, Leila Touat-Todeschini and André Verdel

Chapter 5 RNA-Mediated Gene Regulation in Drosophila
Harsh H. Kavi, Harvey R. Fernandez, Weiwu Xie and James A. Birchler

Chapter 6 MicroRNA-Mediated Regulation of Gene Expression
Lena J. Chin and Frank J. Slack

Chapter 7 Viral Infection-Related MicroRNAs in Viral and Host Genomic Evolution
Yoichi R. Fujii and Nitin K. Saksena

Chapter 8 Regulation of Mammalian Mobile DNA by RNA-Based Silencing Pathways
Harris Soifer

Chapter 9 The Role of Non-Coding RNAs in Controlling Mammalian RNA Polymerase II Transcription
Stacey D. Wagner, Jennifer F. Kugel and James A. Goodrich

Chapter 10 Pyknons as Putative Novel and Organism-Specific Regulatory Motifs
Isidore Rigoutsos

Chapter 11 RNA-Mediated Recognition of Chromosomal DNA
David R. Corey

Chapter 12 RNA Mediated Transcriptional Gene Silencing: Mechanism and Implications in Writing the Histone Code
Kevin V. Morris

Chapter 13 Small RNA-Mediated Gene Activation
Long-Cheng Li

Chapter 14 Therapeutic Potential of RNA-mediated Control of Gene Expression: Options and Designs
Lisa Scherer and John J. Rossi

Global Analysis of mRNA Localization Reveals a Prominent Role
in Organizing Cellular Architecture and Function

Eric Lécuyer, Hideki Yoshida, Neela Parthasarathy, Christina Alm, Tomas Babak, Tanja Cerovina, Timothy R. Hughes, Pavel Tomancak and Henry M. Krause

Although subcellular mRNA trafficking has been demonstrated as a mechanism to control protein distribution, it is generally believed that most protein localization occurs subsequent to translation. To address this point, we developed and employed a high-resolution fluorescent in situ hybridization procedure to comprehensively evaluate mRNA localization dynamics during early Drosophila embryogenesis. Surprisingly, of the 3370 genes analyzed, 71% of those expressed encode subcellularly localized mRNAs. Dozens of new and striking localization patterns were observed, implying an equivalent variety of localization mechanisms. Tight correlations between mRNA distribution and subsequent protein localization and function, indicate major roles for mRNA localization in nucleating localized cellular machineries. A searchable web resource documenting mRNA expression and localization dynamics has been established and will serve as an invaluable tool for dissecting localization mechanisms and for predicting gene functions and interactions.

RNA Quality Control in Eukaryotes
Meenakshi K. Doma, and Roy Parker
Department of Molecular and Cellular Biology and Howard Hughes Medical Institute, University of Arizona, Tucson, AZ 85721, USA
HMI and Division of Biology, Mail Code 156-29, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125
Cell 131, November 16, 2007

Eukaryotic cells contain numerous RNA quality-control systems that are important for shaping the transcriptome of eukaryotic cells. These systems not only prevent accumulation of nonfunctional RNAs but also regulate normal mRNAs, repress viral and parasitic RNAs, and potentially contribute to the evolution of new RNAs and hence proteins. These quality-control circuits can be viewed as a series of kinetic competitions between steps in normal RNA biogenesis or function and RNA degradation pathways. These RNA quality-control circuits depend on specific adaptor proteins that target aberrant RNAs for degradation as well as the coupling of individual steps in mRNA biogenesis and function.

Nature - January 2007 - Focus on RNA

RNA has occupied a pivotal position in the 'central dogma' of molecular biology, which states that information flows from DNA through RNA to proteins. In this issue, we feature a collection of articles that discuss the diverse functional roles of RNA in biological systems and highlight recent discoveries in RNA chemical biology, including advances in transcription, RNA structural biology, RNA interference and RNA engineering.
  • Chemical crosshairs on the central dogma - Aseem Z Ansari
  • RNA learns from antisense - David R Corey
  • RNA at Santa Cruz - Mirella Bucci
  • Synthetic RNA circuits - Eric A Davidson and Andrew D Ellington
  • Natural expansion of the genetic code - Alexandre Ambrogelly, Sotiria Palioura and Dieter Söll
  • Slicer and the Argonautes - Niraj H Tolia and Leemor Joshua-Tor

RNA Analysis Tools


developed by
Kristi L. Holmes, Ph.D.
The Bernard Becker Medical Library
Washington University in St. Louis School of Medicine
Bioinformatics @ Becker
Rana C. Morris, Ph.D.
National Center for Biotechnology Information
National Institutes of Health
NCBI Home Page
Nicola Gaedeke, Ph.D.
Berlin, Germany
Donna Messersmith, Ph.D.
Scientist and President
Labs-Now LLC (Learn About Biomedical Science-Now)

Quantitative Analysis of Nucleic Acids - the Last Few Years of Progress
Chunming Ding* and Charles R. Cantor*,†

*Bioinformatics Program and Center for Advanced Biotechnology, Boston University, Boston, Massachusetts 02215, USA
†SEQUENOM Inc., San Diego, California 92121, USA
Journal of Biochemistry and Molecular Biology, Vol. 37, No. 1, January 2004, pp. 1-10

DNA and RNA quantifications are widely used in biological and biomedical research. In the last ten years, many technologies have been developed to enable automated and high-throughput analyses. In this review, we first give a brief overview of how DNA and RNA quantifications are carried out. Then, five technologies (microarrays, SAGE, differential display, real time PCR and real competitive PCR) are introduced, with an emphasis on how these technologies can be applied and what their limitations are. The technologies are also evaluated in terms of a few key aspects of nucleic acids quantification such as accuracy, sensitivity, specificity, cost and throughput.


Intra- and Interspecific Variation in Primate Gene Expression Patterns
  Enard et al.  (2002)   Science 296 (5566): 340-343

Although humans and their closest evolutionary relatives, the chimpanzees, are 98.7% identical in their genomic DNA sequences, they differ in many morphological, behavioral, and cognitive aspects. The underlying genetic basis of many of these differences may be altered gene expression. We have compared the transcriptome in blood leukocytes, liver, and brain of humans, chimpanzees, orangutans, and macaques using microarrays, as well as protein expression patterns of humans and chimpanzees using two-dimensional gel electrophoresis. We also studied three mouse species that are approximately as related to each other as are humans, chimpanzees, and orangutans. We identified species-specific gene expression patterns indicating that changes in protein and gene expression have been particularly pronounced in the human brain.


Gene expression

by  M.Tevfik Dorak, MD, PhD


Genes are transcribed by RNA polymerases, and the RNA is synthesized in the 5' to 3' direction. It is actually the antisense (template) strand that is read (3' to 5'), which gives an RNA strand identical in sequence to the sense strand (with U in place of T). It is possible that one gene is encoded on the sense strand and another on the antisense strand in the opposite direction (example). Overlapping genes are also possible and are frequent in viruses, plasmids and phages. Regions relevant in gene expression:
Enhancers: A sequence on either side of the gene (cis-acting = on the same chromosome) that stimulates a specific promoter. It is not transcribed.
Promoter: A sequence (or sequences) in the close vicinity of the transcription initiation site, 5' (upstream) to the gene. It may be a general (cis-acting) or tissue/cell-specific one (cis-, or trans-acting = on a different chromosome). Initial binding site for RNA polymerase. Transcription factors bind to the promoters and allow RNA polymerase to act. The promoter is not transcribed itself. The common promoter elements, the TATA and CAAT boxes, are found about 30 bp and 75 bp, respectively, upstream of the transcription initiation site.
Transcription initiation (cap) site: This is where the transcription of DNA to immature (precursor) pre-mRNA (nuclear RNA or nRNA) starts. It is immediately 5' to the gene. A 7-methylated GTP cap is added to the beginning of the mRNA made from this site (to protect it against the activity of 5'-exonuclease). From here to the translation initiation site, the sequence codes for the 5'-untranslated or UT (ribosome-binding) region and the signal peptide. The 5'-UT region is transcribed but not translated. It contains the site (the leader sequence) at which ribosomes initially bind to mRNA to start translation. The signal sequence is translated at the N-terminal and directs the protein to its correct cellular location (endoplasmic reticulum, Golgi apparatus, cell membrane, etc) or outside the cell through the cell membrane, and is finally removed at the final destination. The events that occur to mRNA before it leaves the nucleus are collectively called RNA processing or post-transcriptional modification (capping, polyadenylation and splicing).
Translation initiation site (ATG): This sequence represents the beginning (N-terminal) of translated protein. [5' of DNA codes for N-terminal of a polypeptide]. It codes for a methionine but methionine is subject to post-translational elimination most of the time. Thus, each mature mRNA's first codon is for methionine (AUG) but not all polypeptides start with methionine. Translation takes place in the ribosome in the cytoplasm. 
Exon-intron boundaries: Each intron starts with GpT and ends with ApG. The introns are subject to splicing out (a post-transcriptional modification, which also includes 5' capping and 3' polyadenylation). Although they are not represented in the resultant polypeptide, they may contain some regulatory sequences.
Stop codon: One of three codons (TAA, TAG or TGA) marks the end of translation. The triplet before this codes for the last amino acid of a polypeptide chain (C-terminal). [3' of DNA codes for C-terminal of a polypeptide.]
Untranslated Regions (UTRs): 5' UTR usually contains gene- or developmental stage-specific and common regulators of expression (motifs, boxes, response or binding elements), and 3' UTR is also involved in gene expression although it does not contain well-known transcription control sites. 3' UTR sequences (called cytoplasmic polyadenylation elements or adenylation control elements) can control the nuclear export, polyadenylation status, subcellular targeting and rates of translation and degradation of mRNA. The involvement of 3' UTR is well documented in controlling male and female gametogenesis and in early embryonic development. Myotonic dystrophy is a disease caused by the expansion of the triplet repeats in the 3' UTR of a protein kinase gene.
Polyadenylation signal: This sequence lies downstream of the stop codon, in the 3' untranslated region, and directs the addition of a poly-A tail, which varies in length (histone mRNAs lack a poly-A tail). Poly(A) tail is believed to stimulate translation initiation whereas its shortening triggers entry of mRNA into the decay pathway.

Position effect or tissue/cell-specific expression of genes in gene therapy depends on the effects of enhancers and promoters. The triplets on the DNA are transcribed to codons on mRNA [in the nucleus] and after splicing out intronic sequences, the codons are read by anti-codons of tRNA to be translated to amino acids. After various post-translational modifications (which may include phosphorylation, glycosylation, etc), a protein is made. Thus, the stages of protein synthesis are: transcription, splicing (nuclear processing), translation and post-translational modifications.

See also paramutation (paramutation is an allelic interaction that results in meiotically heritable changes in gene expression), methylation, genomic imprinting and allelic exclusion (glossary). Such epigenetic changes, and especially their heritability, have been among the most hotly debated topics of recent years (see Hidden Inheritance by G Vines in New Scientist, 28 Nov 1998, pp.27-30; Epigenetics: Special Issue of Science, 2001). The National Fragile X Foundation website explains the molecular basis of a well-known epigenetic disease, fragile X syndrome.

Common techniques to measure gene expression are Northern blotting, ribonuclease protection assay and reverse-transcription (RT) PCR. These are briefly described below.
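The landmarks listed above (5' UTR, translation initiation site, stop codon, 3' UTR, polyadenylation signal) can be located on a sequence with a few lines of Python. This is only a minimal sketch: the sequence is made up for illustration, the function names are hypothetical, and taking the first AUG as the start site is a simplifying assumption.

```python
# Stop codons as they appear on mRNA.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transcribe_sense(sense_dna: str) -> str:
    """Return the mRNA sequence: identical to the sense strand, with U for T."""
    return sense_dna.upper().replace("T", "U")


def find_orf(mrna: str):
    """Return (start, end) from the first AUG through the first in-frame
    stop codon, or None if either is missing. 'end' is exclusive."""
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


# Invented sense-strand sequence (not a real gene).
sense = "GGCACCATGGCTTGGAAATAAGCAATAAAGC"
mrna = transcribe_sense(sense)
orf_start, orf_end = find_orf(mrna)

five_utr = mrna[:orf_start]         # transcribed but not translated
coding = mrna[orf_start:orf_end]    # AUG ... stop codon
three_utr = mrna[orf_end:]          # downstream of the stop codon
has_polya_signal = "AAUAAA" in three_utr  # canonical polyadenylation signal

print(five_utr, coding, three_utr, has_polya_signal)
```

Real genes need more care than this (introns, alternative start sites, reading-frame context), so the sketch should be read as an illustration of the terminology rather than as a gene-finding method.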

Overview: Gene Structure

Talk presentations from David Wishart, Departments of Computing Science and Biological Sciences,
University of Alberta, Edmonton, Alberta, Canada:
Genes and Gene Expression

The gene is the fundamental unit of inheritance and the ultimate determinant of all phenotypes. The DNA of a normal human cell contains an estimated 50,000 to 100,000 genes, but only a fraction of these are used (or “expressed”) in any particular cell at any given time. For example, genes specific for erythroid cells, such as the hemoglobin genes, are not expressed in brain cells. According to the “central dogma” of molecular biology, a gene exerts its effects by having its DNA “transcribed” into a messenger RNA (mRNA), which is, in turn, “translated” into a protein, the final effector of the gene's action. Thus, molecular biologists often investigate gene “expression” or “activation,” by which is meant the process of transcribing DNA into RNA, or translating RNA into protein. The process of transcription involves creating a perfect RNA copy of the gene using the DNA of the gene as a “template.” Translation of mRNA into protein is a somewhat more complex process, since the structure of the gene's protein is “encoded” in the mRNA, and that structural message must be decoded during translation.


Functional Components of the Gene

Every gene consists of several functional components, each involved in a different facet of the process of gene expression (Figure A). Broadly speaking, however, there are two main functional units: the “promoter” region and the “coding” region. The promoter region controls when and in what tissue a gene is expressed. For example, the promoter of the hemoglobin gene is responsible for its expression in erythroid cells and not in brain cells. How is this tissue-specific expression achieved? In the DNA of the gene's promoter region, there are specific structural elements, “nucleotide sequences” (see “Structural Considerations” below), that permit the gene to be expressed only in an appropriate cell. These are the elements in the hemoglobin gene that instruct an erythroid cell to transcribe hemoglobin mRNA from that gene. These structures are referred to as “cis”-acting elements because they reside on the same molecule of DNA as the gene. In some cases, other tissue type-specific “cis”-acting elements, called “enhancers,” reside on the same DNA molecule, but at great distances from the coding region of the gene. In the appropriate cell, the “cis”-acting elements bind protein factors that are physically responsible for transcribing the gene. These proteins are called “trans”-acting factors because they reside in the cell's nucleus separate from the DNA molecule bearing the gene. For example, brain cells would not have the right “trans”-acting factors that bind to the hemoglobin promoter, and therefore brain cells would not express hemoglobin. They would, however, have “trans”-acting factors that bind to neuron-specific gene promoters.

Figures: Gene expression. A gene's DNA is transcribed into mRNA which is, in turn, translated into protein. The functional components of a gene are schematically diagramed here. Areas of the gene destined to be represented in mature mRNA are called exons, and intervening areas of DNA between exons are called introns. The portion of the gene that controls transcription, and therefore expression, is the promoter. This control is exerted by specific nucleotide sequences in the promoter region (so-called “cis”-acting factors) and by proteins (so-called “trans”-acting factors) that must interact with promoter DNA and/or RNA polymerase II in order for transcription to occur. The primary transcript is the RNA molecule made by RNA polymerase II that is complementary to the entire stretch of DNA containing the gene. Before leaving the nucleus, the primary transcript is modified by splicing together exons (thus removing intron sequences), adding a cap to the 5´ end, and adding a poly-A tail to the 3´ end. Once in the cytoplasm, mature mRNA undergoes translation to yield a protein.


The structure of a gene's protein is specified by the gene's “coding” region. The coding region contains the information that directs an erythroid cell to assemble amino acids in the proper order to make the hemoglobin protein. How is this order of amino acids specified? As described in detail below, DNA is a linear polymer consisting of four distinguishable subunits called nucleotides. In the coding region of a gene, the linear sequence of nucleotides “encodes” the amino acid sequence of the protein. This genetic code is in triplet form so that every group of three nucleotides encodes a single amino acid. The 64 triplets that can be formed by four nucleotides exceed the number of amino acids used to make proteins (20). This makes the code degenerate and allows some amino acids to be encoded by several different triplets. The nucleotide sequence of any gene can now be determined (see below). By translating the code, one can derive a predicted amino acid sequence for the protein encoded by a gene.
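The arithmetic behind the triplet code can be checked directly. The sketch below builds the standard genetic code from its usual compact string form (bases in the order T, C, A, G; "*" marks a stop codon), counts how many triplets encode each amino acid, and translates a short invented coding sequence. The names CODON_TABLE and translate are illustrative choices, not part of any library.

```python
from collections import Counter
from itertools import product

# Standard genetic code in compact form: codons enumerated with the
# first base varying slowest, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a sense-strand coding sequence, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE[coding_dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# How many triplets encode each amino acid (degeneracy of the code).
degeneracy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))          # 4 ** 3 triplets
print(len(degeneracy) - 1)       # amino acids, excluding the stop symbol
print(degeneracy["L"], degeneracy["M"], degeneracy["*"])
print(translate("ATGGCTTGGAAATAA"))
```

Leucine, serine and arginine each have six codons, while methionine and tryptophan have only one, which is what "degenerate" means in practice.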

Structural Considerations

Fine Structure 

The basic repeating units of the DNA polymer are nucleotides (Figure B). Nucleotides consist of an invariant portion, a five-carbon deoxyribose sugar with a phosphate group, and a variable portion, the “base.” Of the four bases that appear in the nucleotides of DNA, two are purines, adenine (A) and guanine (G), and two are pyrimidines, cytosine (C) and thymine (T). Nucleotides are connected to each other in the polymer through their phosphate groups, leaving the bases free to interact with each other through hydrogen bonding. This “base pairing” is specific, so that A interacts with T, and C interacts with G. DNA is ordinarily double-stranded, that is, two linear polymers of DNA are aligned so that the bases of the two strands face each other. Base pairing makes this alignment specific so that one DNA strand is a perfectly complementary copy of the other.

In every strand of a DNA polymer, the phosphate substitutions alternate between the 5´ and 3´ carbons of the deoxyribose molecules. Thus, there is a directionality to DNA: the genetic code reads in the 5´ to 3´ direction. In double-stranded DNA, the strand that carries the translatable code in the 5´ to 3´ direction is called the “sense” strand, while its complementary partner is the “antisense” strand.
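Because base pairing is specific and the two strands run in opposite directions, the antisense strand can be derived from the sense strand by complementing each base and reversing the order. A minimal sketch, with an invented sequence and a hypothetical function name:

```python
# Watson-Crick pairing: A with T, C with G.
PAIRS = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return strand.upper().translate(PAIRS)[::-1]


sense = "ATGGCTTGG"                    # written 5' to 3'
antisense = reverse_complement(sense)  # also written 5' to 3'

print(sense)
print(antisense)
# Applying the operation twice recovers the original strand.
print(reverse_complement(antisense) == sense)
```

Both strings are written 5´ to 3´, which is why the complement has to be reversed: the 5´ end of one strand pairs with the 3´ end of the other.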

Figures: Structure of base-paired, double-stranded DNA. Each strand of DNA consists of a backbone of 5-carbon deoxyribose sugars connected to each other through phosphate bonds. Note that as one follows the sequence down the left-hand strand (A to C to G to T), one is also following the carbons of the deoxyribose ring, going from the 5´ carbon to the 3´ carbon. This is the basis for the 5´ to 3´ directionality of DNA. The 1´ carbon of each deoxyribose is substituted with a purine or pyrimidine base. In double-stranded DNA, bases face each other in the center of the molecule and base-pair via hydrogen bonds (dotted lines). Base-pairing is specific so that adenine pairs with thymine, and guanine pairs with cytosine.

Gross Structure

In eukaryotes, the coding regions of most genes are not continuous. Rather, they consist of areas that are transcribed into mRNA, the “exons,” which are interrupted by stretches of DNA that do not appear in mature mRNA, the “introns” (see Figures above). The functions of introns are not known with certainty. A purpose of some sort is implied by their conservation in evolution. However, their overall physical structure might be more important than their specific nucleotide sequences, since the nucleotide sequences of introns diverge more rapidly in evolution than do the sequences of exons. Overall, DNA that contains genes comprises a minority of total DNA. Between genes, there are vast stretches of untranscribed DNA that are assumed to play an important structural role. In the nucleus, DNA is not present as naked nucleic acid. Rather, DNA is found in close association with a number of accessory proteins, such as the histones, and in this form is called chromatin. Although many of DNA's accessory proteins have no known specific function, they generally appear to be involved in the correct packaging of DNA. For example, DNA's double helix is ordinarily twisted on itself to form a supercoiled structure. This structure must unwind partially during DNA replication and transcription. Some of the accessory proteins, for example, topoisomerases and histone acetylases, are involved in regulating this process. 


Genes specify the structure of proteins that are responsible for the phenotype associated with a particular gene. While the nucleus of every human cell contains 30,000 to 40,000 genes, only a fraction of them are expressed in any given cell at any given time. The “promoter” (with or without an “enhancer”) is the part of the gene that determines when and where it will be expressed. The “coding region” is the part of the gene that dictates the amino acid sequence of the protein encoded by the gene. DNA is a linear polymer of nucleotides. Ordinarily, the nucleotide bases of one strand of DNA interact with those of another strand (A with T, C with G) to make double-stranded DNA. In the cell's nucleus, DNA is associated with accessory proteins to make the structure called chromatin.

mRNA Transcript Analysis

Structural Considerations

The first step in gene expression is transcription of the genetic information in DNA into RNA. The individual building blocks of RNA, ribonucleotides, have the same structure as the deoxyribonucleotides in DNA, except that (1) the 2' carbon of the ribose sugar is substituted with an OH group instead of H; and (2) there are no thymine bases in RNA, only uracil (demethylated thymine), which also pairs with adenine by hydrogen bonding. Just like the DNA polymerases described above, the enzyme RNA polymerase II uses the nucleotide sequence of the gene's DNA as a template to form a polymer of ribonucleotides with a sequence complementary to the DNA template.

In order for transcription to be “correct,” RNA polymerase II must use the antisense strand of DNA as a template, begin transcription at the start of the gene, and end transcription at the end of the gene. The signals that ensure correct transcription are provided to the RNA polymerase II by DNA in the form of specific nucleotide sequences in the promoter of the gene. After reading and interpreting these signals, the RNA polymerase generates a primary RNA transcript that extends from the initiation site to the termination site in a perfect complementary match to the DNA sequence used as a template. However, not all transcribed RNA is destined to arrive in the cytoplasm as mRNA. Rather, by an incompletely understood process, sequences complementary to introns (see above) are excised from the primary transcript, and the ends of exon sequences are joined together in a process termed “splicing.”

In addition to splicing, the primary transcript is further modified by the addition of a methylated GTP “cap” at the 5´ end, and the addition of a stretch of anywhere from 20 to 40 A bases at the 3´ end. These modifications appear to promote the “translatability” and relative stability of mRNAs and help direct the subcellular localization of mRNAs destined for translation.
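The processing steps described here (removing introns, joining exons, adding a 3´ poly(A) stretch) can be sketched on a toy primary transcript. Everything below is an assumption made for illustration: the sequence and exon coordinates are invented, the function names are hypothetical, the tail length of 30 simply falls within the range quoted above, and the 5´ cap is left out because it is a chemical modification rather than part of the base sequence.

```python
def splice(primary: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as (start, end) pairs, end exclusive."""
    return "".join(primary[start:end] for start, end in exons)


def introns_between(primary: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return the sequences lying between consecutive exons."""
    return [
        primary[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def add_poly_a(spliced: str, tail_length: int = 30) -> str:
    """Append a poly(A) tail of the given length to the 3' end."""
    return spliced + "A" * tail_length


# Toy primary transcript: exon 1, one intron, exon 2.
primary = "AUGGCU" + "GUAAGUCCUAG" + "UGGAAAUAA"
exons = [(0, 6), (17, 26)]

introns = introns_between(primary, exons)
# On RNA, introns start with GU and end with AG (GT ... AG on DNA).
assert all(i.startswith("GU") and i.endswith("AG") for i in introns)

mature = add_poly_a(splice(primary, exons))
print(introns)
print(mature)
```

In the cell the exon boundaries are recognized by the splicing machinery itself; here they are supplied as coordinates, which is the main simplification.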

Northern Blotting

The fundamental question in the analysis of gene expression at the RNA level is whether RNA sequences derived from a gene of interest are present in cells or tissues. Detecting specific RNA sequences can be accomplished by Northern blotting, the whimsically named analogue of Southern blotting, when applied to RNA analysis. RNA can be isolated from cells in its intact form, free from significant amounts of DNA. Messenger RNA is much smaller than genomic DNA, so it can be analyzed by agarose gel electrophoresis without the enzymatic digestion steps that are necessary for the analysis of high molecular weight DNA. RNA is single stranded and has a tendency to fold back on itself. This allows complementary bases on the same stretch of RNA to base-pair with each other and form what is termed “secondary structure.” Because secondary structure can lead to aberrant electrophoretic behavior, RNA is electrophoretically separated by size in the presence of a denaturing agent, such as formaldehyde or glyoxal/DMSO. After electrophoresis through a denaturing agarose gel, the RNA is transferred to a nitrocellulose or nylon-based membrane in the same manner as DNA for Southern blotting (see Figure 1). Hybridization schemes and blot washing are essentially the same for Northern blotting as for Southern blotting. In this manner, specific RNA sequences corresponding to those in cloned DNA probes can easily be identified.

Figure 1: Genomic Southern blotting. Genomic DNA is digested with a single restriction endonuclease resulting in a complex mixture of DNA fragments of different sizes, that is, molecular weights. Digested DNA is arrayed by size using electrophoresis through a semisolid agarose gel. Because DNA is negatively charged, fragments will migrate toward the anode, but their progress is variably impeded by interactions with the agarose gel. Small fragments interact less and migrate farther; large fragments interact more and migrate less. The arrayed fragments are then transferred to a sheet of nitrocellulose or nylon-based filter paper by forcing buffer through the gel as shown. The DNA fragments are carried by capillary action and can be made to bind irreversibly to the filter. Now the DNA fragments, still arrayed by size on the filter, can be probed for specific nucleotide sequences using a 32P-radiolabeled nucleic acid probe. The probe will hybridize to complementary sequences in the DNA, and the position of the fragment that contains these sequences can be revealed by exposing the filter to x-ray film.


There is a lower limit to the sensitivity of Northern blotting, so that only moderately abundant mRNAs can be detected using this technique. One way to increase the sensitivity of Northern blotting is to enrich the RNA preparation for mRNA. Ordinarily, mRNA makes up less than 10% of the total RNA content of a cell or tissue. When RNA is isolated from these sources, all RNA species are being isolated, that is, ribosomal and transfer RNA as well as mRNA. As noted above, most mRNAs destined for the cytoplasm and translation are modified by the addition of a 3´ poly(A) tract. An RNA preparation can, therefore, be greatly enriched for mRNA species by removing all RNA molecules that lack the 3´ poly(A) tail. This can be done by exposing the RNA preparation to a tract of poly(U) or poly(T) bound to an immobilized support, such as a plastic bead. The poly(A) portion of mRNA will bind to the poly(U) or poly(T) material, and non–poly(A)-containing RNA can be washed away. After washing, the poly(A)-containing mRNA can be recovered from the solid support and used in Northern blot analysis. This procedure improves the sensitivity of Northern blotting by nearly two orders of magnitude.

A dramatic use of Northern blotting in cancer research has been the demonstration of oncogene expression in some human tumors. RNA was isolated from human tumor samples and analyzed by Northern blotting using cloned DNA probes derived from various oncogenes. The earliest observations included expression of c-abl and c-myc in human tumor cell lines and leukemic blasts. Since then, however, a large number of proto-oncogenes have been shown to be transcribed in primary human tumor tissue.

Nuclease Protection Assays (RPA)

Another technique used in the analysis of mRNA is the nuclease protection assay. This assay differs from Northern blotting in two general respects: (1) it is more sensitive than Northern blotting and is therefore used for the detection of rare mRNA species; and (2) it provides detailed structural information about the mRNA being analyzed, and is thus often referred to as “transcript mapping.”

Nuclease protection assays (Figure 2) use a single-stranded radioactive DNA or RNA probe. The nucleotide sequence of the probe contains at least some nucleotides that are complementary to the mRNA being analyzed. The probe is annealed to the target mRNA by base-pairing, and the regions of the probe that are complementary to the target mRNA now become double-stranded, while the noncomplementary regions of the probe remain single-stranded. The annealed mixture is then subjected to digestion with an enzyme specific for single-stranded DNA (usually S1 nuclease), when using a DNA probe, or RNA (usually a mixture of RNase A and RNase T1), when using an RNA probe. The double-stranded annealed areas resist digestion, while all the single-stranded noncomplementary parts of the probe are digested away. In essence, areas in the probe that anneal to the mRNA are “protected” from digestion by the nucleases. The surviving, undigested parts of the probe can then be analyzed by electrophoresis through an agarose or polyacrylamide gel. The amount of radiolabeled probe resistant to digestion is proportional to the amount of target mRNA in the sample.

Figure 2: Nuclease protection assay. In this example, an mRNA containing a point mutation (indicated by the inverted triangle in the mRNA on the right) is distinguished from its normal, non-mutated counterpart (mRNA on the left). The mRNA is mixed with a single-stranded 32P-labeled DNA or RNA probe that (1) has sequences perfectly complementary to the nonmutated region of interest in the mRNA, and (2) extends for some length beyond the mRNA. The mixture is heated and then cooled to allow the probe to anneal to its complementary sequences in the mRNA. The annealed mixture is then treated with single-strand specific nucleases (S1 nuclease for a DNA probe, or RNases for an RNA probe). This results in digestion of the probe at all single-stranded areas: the extension beyond the mRNA sequences, and the single base-pair mismatch overlying the mutation (right). The radioactive digestion products are then separated by electrophoresis through a urea-containing polyacrylamide gel. The probe that annealed to normal, nonmutated mRNA is smaller than the undigested probe (by the length of the extended region not complementary to the mRNA) and will therefore migrate farther than undigested probe. The probe that annealed to the mutated mRNA will have been digested into two fragments whose summed length will equal that of the digested probe that annealed to nonmutated mRNA.


Nuclease protection assays can also provide structural information about target mRNA sequences. If there are any mismatches in the sequence of the target mRNA compared with the probe, the areas corresponding to the mismatches will generate small single-stranded loops (see Figure 2). Since the nucleases that digest the annealed probe/mRNA hybrid are specific for single-stranded nucleotides, any mismatches between probe and target are susceptible to digestion. Thus a mismatch can be detected if the nuclease-digested radiolabeled probe is smaller than would have been expected, or when the probe has been digested into multiple fragments. In fact, by careful measurement of the length of the digested probe, one can determine exactly where the mismatch has occurred in the target mRNA.
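
The fragment-length reasoning above can be sketched in a few lines of Python. This is only an illustration under idealized assumptions that are not part of the original text: the nuclease is taken to cut cleanly at each mismatch without removing any nucleotides (as in the idealization of Figure 2), and the function name and parameters are hypothetical.

```python
def protected_fragments(probe_len, overhang_len, cut_offsets=()):
    """Lengths of probe fragments expected after single-strand nuclease digestion.

    probe_len    -- total probe length in nucleotides
    overhang_len -- probe nucleotides that extend beyond the mRNA (digested away)
    cut_offsets  -- distances (in nucleotides) from the start of the protected
                    region to each mismatch; ASSUMPTION: the nuclease cuts
                    exactly there and no nucleotides are lost
    """
    protected_len = probe_len - overhang_len
    if protected_len <= 0:
        raise ValueError("the probe must contain a region complementary to the mRNA")

    boundaries = [0]
    for offset in sorted(set(cut_offsets)):
        if not 0 < offset < protected_len:
            raise ValueError("each mismatch must lie inside the protected region")
        boundaries.append(offset)
    boundaries.append(protected_len)

    # Each fragment spans the distance between two neighbouring boundaries.
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Hypothetical 300-nt probe with a 50-nt non-complementary extension.
    print(protected_fragments(300, 50))          # normal mRNA: one protected band
    print(protected_fragments(300, 50, [100]))   # mutated mRNA: two smaller bands
```

Read in reverse, the same arithmetic locates the mismatch: the length of one fragment gives the distance from one end of the protected region to the mutation, although without an end-labeled probe the two ends cannot be told apart.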

This technique has been used to detect single base mutations or small deletions in cellular mRNAs. For example, the proposed pathogenetic role of tumor suppressor genes, such as p53, in cancer depends on the inactivation of these genes, for example, by point mutation. Nuclease protection assays have been used to demonstrate the presence of point mutations in the mRNA for p53 in primary human lung cancer samples.


The flow of genetic information usually runs from DNA to RNA to protein, according to the so-called “central dogma” of molecular biology. There are, however, exceptions to this rule, the most prominent of which involves the life cycle of retroviruses. These viruses encode their genetic information in RNA rather than DNA. When they invade a susceptible host cell, they direct the synthesis of a DNA intermediate that is a complementary copy of their genomic RNA. The enzyme that accomplishes this task, reverse transcriptase, is a DNA polymerase (see above) that uses RNA, rather than DNA, as a template to form a complementary DNA (cDNA) copy of the RNA. This enzyme can be used in vitro to make cDNA copies of any available RNA.

One important application of cDNA synthesis has been the construction of cDNA libraries, analogous to the genomic libraries described above (see Figures 3 and 4). A valuable tool for the analysis of gene expression would be a gene library that consisted only of the genes that were expressed in a cell or tissue of interest. Most of the time, one is really not concerned with all the DNA in the genome, for example, intron sequences, promoters, and vast regions of “uninformative” DNA that lie between genes. Furthermore, if one were interested in analyzing the genes expressed in a brain cell, why bother making a library that contained sequences for the hemoglobin gene? One way to construct a library comprising only tissue-specific expressed genes would be to clone all the mRNA in a specific cell or tissue of interest. Unfortunately, there is no way to ligate single-stranded RNA to a double-stranded DNA cloning vector. However, one can use all the mRNA in a cell as a template for making double-stranded cDNA, which can then be inserted into a cloning vector.

Figure 3: Constructing a genomic library. Genomic DNA and plasmid DNA are cut with EcoRI in preparation for cloning, as in Figure 4. (The vector DNA could also be bacteriophage DNA rather than plasmid DNA.) In this case, all of the variously sized EcoRI-produced genomic DNA fragments are cloned individually into the EcoRI site of the plasmid, and the recombinant DNA is introduced into E. coli by transformation. Transformed bacteria are selected by growth in the presence of ampicillin, as in Figure 4. Since each bacterium can be transformed by only one recombinant plasmid, and since each colony on the agar plate arose from a single transformed bacterium, each colony (or clone) contains amplified plasmid bearing a single genomic EcoRI fragment. Taken together, all the bacterial colonies represent the entire genetic complement of the organism from which the original genomic DNA was isolated. Thus, all of the clones on all of the plates can be thought of as a genomic library, with each individual clone representing one volume.

Figure 4: Gene cloning. In this example, a small amount of foreign DNA (a few nanograms) is digested with EcoRI. This foreign DNA can come from any source, the only requirement being that it contains the same restriction endonuclease recognition sites as the vector. Plasmid vector is also digested with EcoRI to create a linear DNA molecule. The “sticky” single-stranded ends of the foreign DNA can align and base-pair with the complementary “sticky ends” of the plasmid, after which DNA ligase covalently bonds foreign DNA to plasmid DNA. This recombinant DNA is introduced into E. coli by a process called transformation. Since the bacteria themselves are not resistant to ampicillin, growth in ampicillin will select only those bacteria that have taken up the plasmid DNA (which carries an ampicillin resistance gene). The plasmid contains a bacterial origin of replication so that as the bacterial culture grows, plasmids replicate, resulting in several copies in each bacterium. When the culture has grown to sufficient size, plasmid DNA can be isolated biochemically, foreign DNA can be cut from the plasmid using EcoRI, and the resulting yield will often be milligrams of DNA, that is, greater than a 10^6-fold amplification.


To make a cDNA library, one isolates all the mRNA from a cell or tissue. Then, using this mRNA as a template, reverse transcriptase makes cDNA copies of each mRNA molecule in the mixture. The cDNA is ligated into a plasmid or phage vector as described above for genomic libraries, and the recombinant vectors are introduced into bacteria. After growth on agar plates, each bacterial colony or phage plaque of a cDNA library houses a unique recombinant vector containing the cDNA copy of a single mRNA. Desired clones can be detected by nucleic acid hybridization to the plaques or colonies using a radiolabeled gene probe. Alternatively, if the vector containing the cDNA molecules can direct transcription of mRNA by host bacterial cells, mRNA will be synthesized, and that mRNA will be translated. In this case, each bacterial colony or plaque will produce a different protein, and each protein will have been encoded by an mRNA from the original cell or tissue being investigated. If an antibody directed against a protein of interest is available, the cDNA clone corresponding to the mRNA that encodes that protein can be identified by binding the antibody to the colonies or plaques of the cDNA library. This technique, called “expression cloning,” often employs the bacteriophage λgt11 as the cloning vector.

cDNA libraries can be used to clone cDNA for a known gene to discover the sequence of the mRNA it encodes. Alternatively, these libraries can be used to identify previously unknown genes. In a process called “differential screening,” cDNAs can be discovered that owe their existence to a particular differentiation or activation state in the cell of origin. For example, this technique has been used to identify genes whose expression is turned on by hormones or by growth factors. A rapid modification of this technique using PCR (called “differential display”) is described below, in the section on PCR.

DNA Microarray Analysis

Another approach to comparative gene expression profiling employs the use of DNA microarrays, often referred to as DNA “chips.” Two basic types of DNA microarrays are currently available: oligonucleotide arrays and cDNA arrays. Both approaches involve the immobilization of DNA sequences in a gridded array on the surface of a solid support, such as a glass microscope slide or silicon wafer. In the case of oligonucleotide arrays, 25-nucleotide long fragments of known DNA sequence are synthesized in situ on the surface of the chip using a series of light-directed coupling reactions similar to photolithography. Using this method, as many as 300,000 distinct sequences representing over 6,000 genes can be synthesized on a single 1.3 cm × 1.3 cm microarray. In the case of cDNA microarrays, cDNA fragments are deposited onto the surface of a glass slide using a robotic spotting device. For both microarray approaches, the next step involves the purification of RNA from the source of interest (e.g., from a tumor), enzymatic fluorescent labeling of the RNA, and hybridization of the fluorescently labeled material to the microarray. Hybridization events are then captured by scanning the surface of the microarray with a laser scanning device and measuring the fluorescence intensity at each position in the microarray. The fluorescence intensity of each spot on the array is proportional to the level of expression of the gene represented by that spot. This process is illustrated in Figure 5.

Figure 5: DNA microarray analysis. In this example, RNA extracted from a tumor is end-labeled with a fluorescent marker, then allowed to hybridize to a chip derivatized with cDNAs or oligonucleotides as described in the text. The precise location of RNA hybridization to the chip can be determined using a laser scanner. Since the position of each unique cDNA or oligonucleotide is known, the presence of a cognate RNA for any given unique sequence can be determined.
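
Because spot intensity is taken to be proportional to expression level, two hybridizations can be compared spot by spot. The following Python sketch shows one simple way to do this. It is an illustration only: the tumor-versus-normal design, the median scaling step, the intensity floor, and all names and values are assumptions that do not come from the text.

```python
import math
from statistics import median


def log2_ratios(tumor, normal, floor=1.0):
    """Per-gene log2(tumor / normal) after scaling both arrays to a common median.

    tumor, normal -- dicts mapping a gene name to its spot fluorescence intensity
    floor         -- ASSUMPTION: intensities below this value are raised to it,
                     so that very weak spots do not produce extreme ratios
    """
    genes = sorted(set(tumor) & set(normal))
    if not genes:
        raise ValueError("the two arrays share no genes")

    t = {g: max(tumor[g], floor) for g in genes}
    n = {g: max(normal[g], floor) for g in genes}

    # ASSUMPTION: overall labeling and scanning differences are removed by
    # equalizing the median intensity of the two arrays.
    scale = median(t.values()) / median(n.values())

    return {g: math.log2(t[g] / (n[g] * scale)) for g in genes}


if __name__ == "__main__":
    # Hypothetical intensities for three spots.
    tumor = {"geneA": 1600.0, "geneB": 400.0, "geneC": 100.0}
    normal = {"geneA": 200.0, "geneB": 200.0, "geneC": 200.0}
    for gene, ratio in log2_ratios(tumor, normal).items():
        print(gene, ratio)
```

Working on a log2 scale makes a doubling and a halving symmetric (+1 and -1), which is why expression ratios are usually reported this way.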


DNA microarray technology is evolving rapidly, with improvements in miniaturization, reproducibility, production capabilities, and the development of alternative approaches to microarray synthesis. The application of gene expression profiling methods to important questions in biology and medicine is also emerging. For example, DNA microarrays have been recently demonstrated to be useful in understanding the cell cycle, hematopoietic differentiation, responses to serum stimulation, interferon gamma treatment, and cancer classification. The ability to monitor the expression levels of thousands of genes simultaneously offers the potential opportunity to expand the analysis of cancer genetics beyond single–candidate gene approaches, toward considering genetic networks. It is becoming increasingly clear that while some tumors appear to be caused by mutations in a single gene (e.g., oncogene or tumor suppressor gene), most cancers likely arise through the collaboration of multiple genes, none of which, when considered alone, are sufficient for transformation. Until recently, the analysis of such genetic networks has been impractical, in that methods for measuring the expression levels of multiple genes in parallel have not been available. The development of DNA microarrays may, in large part, have solved this problem. Microarrays capable of monitoring the expression levels of the entire human genome (estimated to contain approximately 100,000 genes) are likely to become available in the near future.

The challenge now is not so much how to generate complex gene expression data, but rather how to interpret them. The key is to develop methods for recognizing meaningful gene expression patterns and distinguishing those patterns from noise. Such noise (random gene expression levels) can be generated by (1) variability among microarrays, (2) variability in RNA labeling and hybridization methods, and perhaps most importantly, (3) biological variability among samples. It is likely that all of the above sources of variability are significant. It has become clear that the successful elucidation of genetic networks through expression profiling will require the expertise of a new generation of scientists, namely, computational biologists. Improvements in DNA microarray fabrication will only become valuable if pattern recognition algorithms are similarly developed. Nonetheless, it is likely that the future of cancer diagnostics will include the analysis of gene expression profiles, which might help guide treatment planning of individual patients.

Polymerase Chain Reaction  (PCR  &  RT-PCR)

Another important use of cDNA technology has allowed PCR to be applied to RNA. Since the Taq polymerase is a DNA polymerase (see above), it cannot use RNA as a template. Simply adding primers and Taq polymerase to an RNA preparation will not result in amplification. However, if an RNA of interest could be made into DNA, then PCR would proceed as usual. The first step in this analysis is generating a cDNA copy of the mRNA of interest using reverse transcriptase. This can be done using a primer consisting of Ts (complementary to the poly(A) tail) or of a sequence complementary to some portion of the 3´ region of the mRNA. The 5´ primer can then be added along with Taq polymerase, and the single-stranded cDNA made in the first step will be amplified as described above (see Figure 6). In one of the first applications of this technique, Philadelphia chromosome (Ph)-positive leukemias were diagnosed by identifying chimeric bcr-abl mRNA species in clinical material using PCR. Since then, so-called reverse transcriptase (RT) PCR has come into widespread use.

Figure 6: Polymerase chain reaction (PCR). DNA is mixed with short (10–20 base) single-stranded oligonucleotide primers that are complementary to the 5´ and 3´ ends of the sequence to be amplified. The mixture is heated to dissociate or “melt” all double-stranded DNA, and then cooled to permit the primers to anneal to their complementary sequences on the DNA to be amplified. Note that the 5´ primer will anneal to the “lower” strand, and the 3´ primer will anneal to the “upper” strand. A heat-resistant (thermostable) DNA polymerase (Taq polymerase, see text) was present in the original mixture, and it now synthesizes DNA by starting at the primers and using the strands to which the primers have annealed as templates. There are now two double-stranded DNA copies for every molecule of double-stranded DNA in the original mixture. The reaction is then heated to melt double-stranded DNA, cooled to allow reannealing, and the polymerase makes new double-stranded DNA again. There are now four double-stranded DNA copies for each original DNA molecule. This process can be repeated n times (usually 20–50) to result in 2^n copies of double-stranded DNA.


One inherent problem in using PCR to monitor mRNA expression is quantitation of the amplified PCR products. In Northern blotting or nuclease protection analysis, the intensity of the hybridization signal is directly proportional to the amount of target RNA in the sample. Thus, one can compare the number of RNA molecules in one sample with another. With PCR, a slight change in the efficiency of polymerization in an early cycle in one sample will lead to a geometrically increasing discrepancy between the amount of amplified product in that sample compared with another sample. Fortunately, a number of techniques have been described for normalizing the products of PCR reactions to allow quantitative comparisons. In general, they involve amplifying an easily distinguishable control RNA template in the same reaction as the RNA of interest. Normalization of the amplified experimental PCR products to the control products then allows comparisons to be made.

One application of RT-PCR is a simple method for differential screening (see above) called differential display. Two cell populations to be compared are identified, and mRNA is isolated from both. Reverse transcription and PCR are performed using a poly-T primer, which will anneal to the 3´ poly-A tail of all the mRNA species, and a set of primers with random sequences, which by chance will anneal to sequences upstream of the poly-A tail in all the mRNA species. Since the upstream primer will anneal at random to different mRNA species, the lengths of the PCR products will vary for nearly every mRNA. If the amplification is performed in the presence of radiolabeled nucleotides, the products from the two reactions can be separated on a high-resolution gel. Bands that are much darker in one lane compared with another represent mRNA species that were overexpressed in one cell population compared with another. The cDNA representing this band can be recovered from the gel for further analysis and identification.
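
The quantitation problem described above, and the reason a co-amplified control helps, can be made concrete with a small Python sketch. It rests on assumptions that are not in the text: a constant per-cycle efficiency (each cycle multiplies the template by 1 + efficiency), and a control template that amplifies with the same efficiency as the target because it shares the same tube. The function names and the numbers are hypothetical.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected template copies after `cycles` rounds of PCR.

    ASSUMPTION: every cycle multiplies the template by (1 + efficiency),
    where efficiency = 1.0 means perfect doubling (2**cycles overall).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def normalized_signal(target_copies, control_copies):
    """Target product expressed relative to the co-amplified control product."""
    return target_copies / control_copies


if __name__ == "__main__":
    cycles = 30
    # Two tubes that start with the same 500 target copies but differ slightly
    # in efficiency end up several-fold apart after 30 cycles.
    tube_a = pcr_copies(500, cycles, efficiency=1.0)
    tube_b = pcr_copies(500, cycles, efficiency=0.9)
    print("raw fold difference:", tube_a / tube_b)

    # Each tube also contains 1000 copies of a control template, which is
    # ASSUMED to amplify with the same efficiency as the target in that tube.
    control_a = pcr_copies(1000, cycles, efficiency=1.0)
    control_b = pcr_copies(1000, cycles, efficiency=0.9)
    print("normalized, tube A:", normalized_signal(tube_a, control_a))
    print("normalized, tube B:", normalized_signal(tube_b, control_b))
```

Under these assumptions the efficiency term cancels in the ratio, so both tubes report the same normalized value even though their raw yields differ. In practice the control and target rarely amplify with identical efficiency, so the correction is approximate.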

Serial Analysis of Gene Expression (SAGE)

Every cell type is thought to have a unique pattern of gene expression, the analysis of which can reveal the underlying mechanism of disease. The most straightforward way to display this unique pattern of gene expression would be to construct a cDNA library from the tissue of interest and sequence every clone. This is obviously an impossible task. Rather, a technique called “serial analysis of gene expression (SAGE)” achieves the same end in a practical manner. In SAGE, the investigator sequences a small and unique fragment of each expressed gene (called a SAGE tag) and quantifies the number of times it appears (called the SAGE tag number). The SAGE tag numbers, therefore, directly reflect the abundance of the corresponding transcript.

The sensitivity and the quantitative accuracy of SAGE are theoretically unlimited. The generation of a SAGE library does not require any prior knowledge of what genes are expressed in the cell of interest. Therefore, unlike DNA chip analysis, SAGE is able to detect and quantify the expression of previously uncharacterized genes. SAGE is based on two fundamental principles:

1. A short (10–11 bp) oligonucleotide fragment (SAGE tag) is sufficient to uniquely identify a specific mRNA transcript or its cognate cDNA. A 10-bp oligonucleotide sequence has a complexity of 4^10 (1,048,576) different combinations. Because the human genome encodes far fewer genes than this (about 100,000 by early estimates, and closer to 20,000 protein-coding genes by current ones), a 10-bp sequence tag corresponding to a defined position of a cDNA is sufficient to uniquely identify any transcribed human gene.

2. Multiple 10-base-pair SAGE tags can be concatenated in a single plasmid, thereby greatly compressing the number of actual plasmid preparations and DNA sequencing reactions that are required to analyze a large number of genes. In practice, a single sequencing reaction can provide information on 30 to 35 different SAGE tags, and therefore 30 to 35 different genes.
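
The arithmetic behind the first principle, and the counting step that turns tags into expression levels, can be sketched briefly in Python. The tag sequences below are invented for illustration, and the helper name is hypothetical.

```python
from collections import Counter

TAG_LENGTH = 10

# Four possible bases at each of ten positions.
possible_tags = 4 ** TAG_LENGTH
print(possible_tags)  # 1048576


def tag_abundance(tags):
    """Map each tag to (count, fraction of all tags), most frequent first."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical tags read from sequenced concatemers.
    observed = ["ACGTTGCAAG", "GATTCCGGAA", "ACGTTGCAAG", "ACGTTGCAAG"]
    for tag, (count, fraction) in tag_abundance(observed).items():
        print(tag, count, fraction)
```

The count for each tag is the SAGE tag number described above; dividing by the total number of tags sequenced gives a relative abundance that can be compared between libraries of different sizes.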

The generation of a SAGE library is a technically demanding, multi-step procedure that has been described in detail. Figure 7 outlines the essence of the method. SAGE has been used to characterize the yeast “transcriptome” (transcriptome is defined as the identity and expression level of all the genes expressed in a cell population at any given time), and to monitor alterations in gene expression patterns following ionizing radiation and during apoptosis induced by the p53 and the APC tumor suppressor proteins. In all of these cases, the ability to measure the expression levels of thousands of different transcripts simultaneously was extremely useful for the understanding of these physiologic processes. For example, in the case of p53, the analysis of over 100,000 SAGE tags identified not only several novel genes transcriptionally induced by p53, but also the concurrent induction of a group of genes involved in the regulation of cellular redox status. This led the authors to propose a novel mechanism of p53-induced cell death. The application of SAGE to the comparison of the expression profiles of normal and tumor tissues is probably the most attractive one, since by comparing the expression profiles of normal and cancer cells in a comprehensive way, it is possible to identify genes or subsets of genes that could be used as potential diagnostic/prognostic markers or therapeutic targets.

Figure 7: Construction and analysis of SAGE libraries. In step 1, a cDNA library is constructed from the cells or tissue of interest, and the cDNAs immobilized on magnetic beads at their 3' ends. In step 2, the cDNAs are subjected to restriction enzyme digestion with a so-called “anchoring enzyme.” This anchoring enzyme is a “frequent cutter” restriction endonuclease (usually NlaIII) that ensures that all the cDNAs are cut at least once. In step 3, cleaved cDNAs are divided into two pools. Short oligonucleotide linkers are ligated to the newly cut 5' ends of the tags. A different linker is used for each pool (linkers A and B as shown). These oligonucleotide linkers contain a recognition site for a “tagging enzyme.” This tagging enzyme is a type IIS restriction endonuclease (usually BsmFI) that cuts at some distance to the 3' side of the actual recognition site. In step 4, “SAGE tags” are released by cleavage with the tagging enzyme and further processed to yield blunt ends. In step 5, the free, blunt-ended SAGE tags are dimerized to yield ditags. These ditags are amplified by PCR using linker A and B primers. Note that each ditag is flanked by the recognition site for the frequent cutter anchoring enzyme used in step 2. These flanking recognition sites serve as “punctuation marks” for sequence analysis of the concatenated SAGE tags. Once sufficient amounts of ditags are generated, they are ligated together in linear arrays containing several ditags (i.e., they are concatemerized) and subcloned into a plasmid that can be used as template for automated sequencing. Each sequenced plasmid can yield data on 30 to 35 SAGE tags. Data are analyzed by using the SAGE software, which reads the sequence obtained, derives the SAGE tags, matches them to their cognate cDNA, and gives the gene expression profile in a numeric format.
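
The tag-derivation step performed by the analysis software can be illustrated with a simplified Python sketch. This is not the actual SAGE software. It assumes the anchoring enzyme is NlaIII (recognition sequence CATG), that every ditag holds one 10-base tag at each end, and that no CATG occurs inside a ditag. The function names and the example sequence are hypothetical.

```python
ANCHOR_SITE = "CATG"   # NlaIII recognition sequence, the "punctuation mark"
TAG_LENGTH = 10        # ASSUMPTION: fixed 10-base tags

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written in A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer, tag_length=TAG_LENGTH):
    """Split a sequenced concatemer at anchoring sites and return its SAGE tags.

    Each ditag contributes two tags: the first `tag_length` bases read forward,
    and the last `tag_length` bases read as their reverse complement, because
    the second tag lies on the opposite strand in a tail-to-tail ditag.
    """
    tags = []
    for ditag in concatemer.upper().split(ANCHOR_SITE):
        if len(ditag) < 2 * tag_length:
            continue  # skip empty pieces and fragments too short for two tags
        tags.append(ditag[:tag_length])
        tags.append(reverse_complement(ditag[-tag_length:]))
    return tags


if __name__ == "__main__":
    # Hypothetical concatemer carrying two ditags.
    concatemer = (
        "CATG" + "ACGTTGCAAG" + "TTCCGGAATC"
        + "CATG" + "GATTCCGGAA" + "CTTGCAACGT"
        + "CATG"
    )
    print(extract_tags(concatemer))
```

Taking tags from the two ends of each ditag, rather than cutting at a fixed midpoint, tolerates ditags whose total length varies by a base or two.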


One of the more surprising discoveries of the past decade was that some RNA molecules have enzymatic activity. These RNAs, called “ribozymes,” can cleave RNA at sequence-specific sites. They were originally discovered in Tetrahymena, when it appeared that some of the primary RNA molecules in that species were capable of splicing out their introns without the aid of any protein enzymes. Ribozymes have also recently been described in higher organisms, and it is likely that they will be found to play a universal and important role in RNA processing. Sequence-specific ribozymes which will destroy specific mRNAs can be synthesized. One application of this technology is the introduction into malignant cells of ribozymes directed against activated oncogenes. In the laboratory, this technique can reverse the malignant phenotype of some cancer cells.

Gene expression profiling using a novel method:
amplified differential gene expression (ADGE)

Zhijian J. Chen, Hongxie Shen & Kenneth D. Tew
     Nucleic Acids Res. 2001; 29: e46

Amplified differential gene expression (ADGE) is a novel technique, designed to profile gene expression of the whole transcriptome or to compare expression of a set of genes between two samples. ADGE employs hybridization to quadratically amplify the ratio of an expressed gene between control and tester samples before displaying. The subtle structures of adapters and primers are designed for displaying the amplified ratio of an expressed gene between two samples. Four selective nucleotides at the 3' end of primers are used to increase PCR efficiency for targeted molecules and to improve detection of PCR products. Double PCR with the same pair of primers expands the detection range, especially for genes of low abundance. Integration of these steps makes ADGE sensitive and accurate. Application to drug resistant human tumor cell lines showed that ADGE accurately profiled expression levels for induced, repressed or unchanged genes. The qualitative expression patterns for ADGE were verified with RT–PCR. 
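
One way to see how hybridization could amplify an expression ratio quadratically is a simple random-reassociation model, sketched here in Python. The model is an assumption made for illustration and is not taken from the abstract: tester and control strands of a gene are mixed, denatured, and allowed to reanneal at random, so the frequency of each duplex type is proportional to the product of the abundances of its two strands. The function name is hypothetical.

```python
def reassociation_products(tester_copies, control_copies):
    """Fractions of tester/tester, control/control and hybrid duplexes.

    ASSUMPTION: strands pair at random, so a duplex forms with probability
    proportional to the product of the abundances of its two strands.
    """
    if tester_copies <= 0 or control_copies <= 0:
        raise ValueError("both samples must contain the gene")
    total = tester_copies + control_copies
    p_tester = tester_copies / total
    p_control = control_copies / total
    return {
        "tester/tester": p_tester * p_tester,
        "control/control": p_control * p_control,
        "hybrid": 2 * p_tester * p_control,
    }


if __name__ == "__main__":
    # A gene present at a 3:1 ratio (tester:control) before hybridization.
    products = reassociation_products(3, 1)
    amplified = products["tester/tester"] / products["control/control"]
    print(products)
    print("ratio after reassociation:", amplified)
```

Under this model a 3:1 starting ratio becomes 9:1 among the homoduplexes, which is the sense in which the ratio is squared. Only duplexes carrying the same adapter on both strands would then be displayed.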


The genetic information in DNA is copied, or “transcribed,” into mRNA by the enzyme RNA polymerase II. Before being transported to the cytoplasm, primary transcripts in the nucleus are modified by splicing out introns, adding a 5´ cap and adding a 3´ poly(A) tract. Cytoplasmic mRNA can be detected by Northern blotting, nuclease protection assays, or by modified PCR. Although nuclease protection assays are technically somewhat more demanding than Northern blotting, they are more sensitive and can provide structural information about mRNA transcripts. A retroviral enzyme called reverse transcriptase can make cDNA copies of mRNA transcripts. These cDNAs can be cloned into cDNA libraries, which are useful for isolating and analyzing expressed genes. In the future, ribozymes may be useful for the selective elimination of specific mRNA species.

 ©  editor@gene-quantification.info