Identification of Four Soybean Reference Genes for Gene Expression Normalization
- M. Libault, S. Thibivilliers, and G. Stacey, National Center for Soybean Biotechnology, Divisions of Plant Science and Biochemistry, Dep. of Molecular Microbiology and Immunology, Christopher S. Bond Life Sciences Center, Univ. of Missouri, Columbia, MO 65211; D.D. Bilgin, O. Radwan, M. Benitez, and S.J. Clough, Dep. of Crop Sciences, Univ. of Illinois, Urbana, IL 61801; S.J. Clough, USDA-ARS Soybean/Maize Germplasm, Pathology and Genetics Research Unit, Urbana, IL 61801.
Abstract
Gene expression analysis requires the use of reference genes constitutively expressed independently of tissues or environmental conditions. Housekeeping genes (e.g., actin, tubulin, ribosomal, polyubiquitin, and elongation factor 1-α) are commonly used as reference genes with the assumption that they are uniformly expressed. In many cases, however, this assumption was shown to be incorrect. To provide reliable reference genes in soybean [Glycine max (L.)], we surveyed a set of genes that showed little variation in a nodulation study across a series of soybean microarray experiments. More than 200 putative reference genes were identified. We focused on 18 for further analysis using additional cDNA and Affymetrix arrays and quantitative reverse-transcription polymerase chain reactions. Taken together, these experiments allowed us to test the expression stability of these genes in 130 different conditions, confirming four soybean genes as new reference genes (annotated as ATP-binding cassette [ABC] transporter, F-box protein family, metalloprotease, and CDPK-related protein kinase). These genes should be useful for normalization of gene expression studies in soybean, an important crop plant.
DNA microarray hybridization and quantitative reverse-transcription polymerase chain reaction (qRT-PCR) are two complementary techniques commonly used to measure gene expression levels (Clarke and Zhu, 2006). The qRT-PCR technique was shown to be the more sensitive of these two methods (Czechowski et al., 2004) and is often used to confirm DNA microarray hybridization results and to perform more in-depth studies of small sets of genes across many treatments or tissue types.
When using microarrays consisting of a low number of spotted elements, or when using qRT-PCR, quantification of gene expression requires normalization against reference genes. Most studies recommend the use of at least two or three different reference genes as internal controls, to avoid errors that might arise from unstable expression of only one reference gene (Thellin et al., 1999; Vandesompele et al., 2002). A variety of housekeeping genes, such as actin (Li et al., 2005; Williams et al., 2005; Domoki et al., 2006), tubulin (Jeong et al., 2006), glyceraldehyde-3-phosphate dehydrogenase (GAPDH; Sekalska et al., 2006), ribosomal RNA, polyubiquitin, and elongation factor 1-α (EF1-α) genes have been commonly used as reference controls. In many cases, these genes were used without actual testing, on the assumption that they would be constitutively expressed due to their role in basic cellular processes. However, a variety of genes were specifically tested for their usefulness as constitutive reference genes. For example, a number of studies identified useful reference genes including the ones listed above to normalize human (Vandesompele et al., 2002; de Kok et al., 2005; Saviozzi et al., 2006; Willems et al., 2006; Toegel et al., 2007) and animal gene expression (e.g., mouse [Willems et al., 2006; Mamo et al., 2007], cow [Goossens et al., 2005; Robinson et al., 2007]; dolphin [Spinsanti et al., 2006], dog [Etschmann et al., 2006], horse [Bogaert et al., 2006]). Reference genes in human parasites (e.g., Leishmania major; Ouakad et al., 2007) and viruses (Radonic et al., 2005) were also identified. In plants, EF1-α in potato (Solanum tuberosum L.; Nicot et al., 2005), GAPDH, actin and EF1-α in grape (Vitis spp.; Reid et al., 2006), ubiquitin in poplar (Populus spp.; Brunner et al., 2004), and ubiquitin, EF-1α, 18S and 25S rRNA in rice (Oryza sativa L.; Kim et al., 2003; Jain et al., 2006) are considered as useful reference genes. However, not all of these genes showed constitutive expression over all conditions, especially when assayed using qRT-PCR (Thellin et al., 1999; Sturzenbaum and Kille, 2001; Lee et al., 2002; Czechowski et al., 2005; Radonic et al., 2005).
The growing availability of plant genome sequences and functional genomic platforms now makes possible a genomewide approach to identifying useful reference genes, such as previously done in tomato (Lycopersicon esculentum Mill.) (Coker and Davies, 2003). Czechowski et al. (2005) used a similar approach to mine a large number of Affymetrix (Santa Clara, CA) DNA microarray studies available for the model plant Arabidopsis thaliana. Expression of the genes identified was confirmed by qRT-PCR, resulting in the identification of 22 reference genes, which have already been used successfully in other studies (Bethke et al., 2007; Libault et al., 2007).
Plant genomics is revolutionizing our understanding of crop plants. The genome of soybean, the foremost legume crop worldwide, is currently being sequenced by the USDOE's Joint Genome Institute (http://www.jgi.doe.gov/sequencing/DOEprojseqplans.html). In addition, a variety of DNA microarray resources is available for this plant, with several published studies (Thibaud-Nissen et al., 2003; Moy et al., 2004; Vodkin et al., 2004; Zou et al., 2005; Zabala et al., 2006; van de Mortel et al., 2007). Given the increasing interest in the functional genomics of soybean, we sought to identify and experimentally verify a set of genes that can be used as a constitutive reference gene for this plant. Such genes, used together, represent a important molecular biology tool to accurately analyze the expression of soybean genes independent of the method used (array hybridization and qRT-PCR).
Materials and Methods
Affymetrix Data Mining
To identify reliable reference genes, we mined the available gene expression data, as well as results from our own laboratory.
Initial Affymetrix arrays analysis was performed using cDNA reverse transcribed from root hair cells RNA. Soybean (cv. Williams 82) seeds were surface sterilized (twice, 10 min in 20% commercial bleach, followed by three sterile water rinses, a 10-min treatment in 0.1 M HCl, and another three rinses in sterile water) then placed on nitrogen-free B & D agar medium (Broughton and Dilworth, 1971). After 3 d germination in growth chambers (dark conditions, 80% humidity, 27°C), a Bradyrhizobium japonicum suspension (strain USDA110; OD600 = 0.8) or water was sprayed on seedlings. Root hair cells were isolated 3, 6, 12, 18, 24, 36, 48, and 72 h after spraying according to Wan et al. (2005) and stored at –80°C (Supplementary Table 1, indicated in yellow).
Data used to screen the expression of our candidate constitutive control genes was obtained from the following published and unpublished soybean cDNA microarray studies (45 conditions tested; Supplementary Table 1, indicated in green): susceptible (cvs. AC-Colibri and Williams 82) and partially resistant soybean plants [80(30)-1 (Cober et al., 2003) and PI 194639] infected with Sclerotinia sclerotiorum; Williams 82 and PI 194639 treated with oxalic acid; Williams 82 (RPG1 dominant) inoculated with Pseudomonas syringae pv. glycinea with or without the avirulence gene avrB (Zou et al., 2005); Williams 82 treated with herbicides Atrazine and Bentazon; and the hypernodulated mutant SS2-2 uninoculated or inoculated with B. japonicum. In addition, the Affymetrix array analysis performed by van de Mortel et al. (2007) was included in our analysis (40 conditions tested). This study focused on Asian soybean rust (caused by Phakopsora pachyrhizi) and examined the response of resistant and susceptible soybean plants to the disease (Supplementary Table 1, indicated in gray).
Plant Material Used for qRT-PCR Reactions
Soybean Tissues
Seed of Williams 82 was grown in the greenhouse under long-day conditions (16/8 h, day/night), at 27°C on Promix Bx soil (Premier Horticulture, Dorval, QC, Canada). Fourteen-day-old shoot apexes (including apical meristem and early leaves), trifoliate leaves, stem and roots of 18-d-old plants, flowers, seeds, and pods (R6 stage) were harvested. Root hair cells, naked root (roots with root hairs removed), and root tips was isolated from 3-d-old seedlings grown in vitro as described previously (Wan et al., 2005). (See Supplementary Table 1, indicated in blue.)
Nodulation
Seed of Williams 82 was surface sterilized according to Wan et al. (2005) and grown for 3 d between moist Whatman filter paper in the dark, in 80% humidity, at 27°C. Seedlings were transferred to a vermiculite–perlite mix (3:1), and 1 mL of B. japonicum suspension (OD600 = 0.1) was used to inoculate the plants. The control plants were similarly treated with an equal amount of water. Roots were harvested 4, 8, and 24 d after inoculation. Root apexes of treated and mock conditions, where nodulation did not occur, were removed before freezing in liquid nitrogen. (See Supplementary Table 1, indicated in blue.)
Wounding and Hormonal Treatments
Seed of Williams 82 was grown in the greenhouse under long-day conditions (16/8 h, day/ night), at 30°C on Promix Bx soil (Premier Horticulture) for 2 wk before treatment. The first true leaves were wounded by perforation (12–15 wounds per leaf). Untreated leaves were used as control. For hormone treatments, 3 mM SA or 400 μM JA in 0.45% ethanol were sprayed on the leaves of each plant until fully covered. As a control, leaves were also sprayed with an ethanol solution at 0.45%. Plants were covered and grown for 12 h, uncovered, and then tissue harvested 24 and 72 h after treatment. (See Supplementary Table 1, indicated in red.)
Asian Soybean Rust Inoculation
The central leaflet of trifoliate soybean leaves (cv. Boggs; Boerma et al., 2000) was harvested in the field from 3-month-old plants (R5/R6 stage). The leaves were classified as being low, medium, and highly infected by Phakopsora pachyrhizi on the basis of the number of pustules found on the leaves. (See Supplementary Table 1, indicated in red.)
RNA Extraction, DNase Treatments, and Reverse Transcription
Total RNA was isolated from the collected seedlings using Trizol Reagent (Invitrogen, Carlsbad, CA) and subsequently purified using chloroform extraction. Purified RNA (250 ng μL−1 final concentration) was further treated with TURBO DNase (Ambion, Austin, TX) to remove any contaminating genomic DNA according to the manufacturer's instruction. Approximately 2.5 μg of DNA-free RNA were used for first-strand cDNA synthesis using the M-MLV for reverse-transcription-polymerase chain reaction (RT-PCR) (Promega, Madison WI) according to the manufacturer's instruction. The lack of genomic DNA contamination was verified by qRT-PCR using primers able to amplify genomic DNA.
DNA Microarray Data Sets Analyzed
RNA was labeled following the manufacturer's protocol (Affymetrix, Santa Clara, CA) and hybridized to Affymetrix GeneChip Soybean Genome Arrays. Chips were hybridized and scanned at the Keck Center for Functional Genomics on the University of Illinois, Urbana, campus to obtain intensity values. Raw intensities were background corrected and normalized using the justGCRMA module of BioConductor for R. Analysis of variance and multiple testing false discovery rates (FDR) p-values were calculated in SAS (SAS Institute, Cary, NC). For the cDNA projects, images and intensity values were obtained largely as in Zou et al. (2005).
Quantitative PCR Primer Design
Primers used for qRT-PCR were designed using Primer3 software (Rozen and Skaletsky, 1998) with the following criteria: TM of 60°C, polymerase chain reaction (PCR) amplicon lengths of 80 to 120 bp and yielding primer sequences with lengths of 19 to 23 nucleotides with an optimum at 21 nucleotides and guanine-cytosine contents of 40 to 60%.
Quantitative PCR Reaction Conditions and Data Analysis
The qRT-PCR reactions were performed in a 384-well plate format (7900 HT Sequence detection System; Applied Biosystems, Foster City, CA) for tissue and nodulation analysis. The reaction set-up was performed on the robotic Evolution P3 liquid handling system (PerkinElmer, Wellesley, MA). Wounding, hormonal treatment, and Asian soybean rust inoculation analyses were performed with a 96-well plate qRT-PCR machine (7500 Real-Time PCR System; Applied Biosystems). SYBR Green PCR Master Mix (Applied Biosystems) was used to quantify cDNA synthesis. Primer sets (0.2 μM final concentrations for each primer) were used for a final volume of 5 and 10 μL, respectively, with the 384-well and the 96-well plate qRT-PCR machine. The thermal profile of the qRT-PCR reactions was 50°C for 2 min, 95°C for 10 min, 40 cycles of 95°C for 15 sec, and 60°C for 1 min. To analyze dissociation curve profiles, the following program was added after the 40 PCR cycles: 95°C for 15 sec followed by a constant increase of the temperature between 60 to 95°C. The data were analyzed with two different software packages: qRT-PCR results were analyzed with the SDS 2.2.1 software and the 7500 System v.1.3.0 (Applied Biosystems), respectively, when the 384-well and the 96-well plate qRT-PCR machines were used. An Rn threshold of 0.1 was selected to obtain the cycle threshold (Ct) values (automatic background subtraction). Polymerase chain reaction efficiencies (Peff) were calculated according to a linear regression analysis with the LinRegPCR software (R 2 value > 0.995) (Ramakers et al., 2003). As proposed by Vandesompele et al. (2002), we used the geometric mean of the Ct values of the 14 genes (Ctgene) analyzed as reference (Ctref) for each condition tested. The Ctref value was subtracted from Ctgene value for each gene analyzed (ΔCt). The expression levels (E) of each gene were calculated according to the equation: E = Peff (-ΔCt). When technical replicates were performed using the same template, we calculated the average expression level between the technical replicates. The Peff, Ctgene, Ctref values are detailed in Supplementary Table 2.
Results
Identification of Candidate Soybean Reference Genes
In the process of conducting soybean DNA microarray hybridizations, it became important to identify useful reference genes to confirm the microarray results by qRT-PCR. In search of constitutively expressed reference genes, we screened a set of available Affymetrix microarray hybridization results from an experiment examining the interaction between soybean root hair cells and B. japonicum, a nitrogen-fixing symbiont of soybean. This initial set examined 16 different conditions (Supplementary Table 1, indicated in yellow) and provided a list of putative constitutively expressed genes. The screen was conducted by sorting differential expression data by multiple-testing FDR corrected p-values (Benjamini and Hochberg, 1995) of significance associated with inoculation effect after running an ANOVA on array probe set intensities. We took all genes with an FDR corrected p-value > 0.99 (i.e., genes showing no significant differential expression at alpha level = 0.99 in response to inoculation throughout the study), which totaled 217 genes (Supplementary Table 3).
From this list of 217 genes, 18 putative constitutive genes (cons1 to 18; Table 1) were further selected for detailed analysis. This selection was based on the array probe set intensity: cons1 to 10 were strongly expressed, while cons11 to 18 were expressed at a lower level (Table 1). Moreover, the expression level stability of the polyubiquitin (SUBI2; GB ID D26092) and actin (ACT; GB ID AW350943) soybean genes were compared to the new reference genes.
Selection of 18 putative soybean reference genes from Affymetrix array hybridization. The National Center for Biotechnology Information gene identification number (geneID) and putative functions based on the BLAST of full-length gene sequences. false discovery rate (FDR) adjusted p values, fold-changes, and probe set intensity averages are provided. For convenience, these genes are named cons1 to cons18.
The expression levels of the 18 cons genes were analyzed by mining the data available from various soybean Affymetrix (40 conditions tested; Supplementary Table 1, identified by gray color) and cDNA microarray experiments (45 additional conditions; Supplementary Table 1, green color) derived from a number of studies. Using geNorm, we identified the best reference genes in both sets of array experiments. geNorm calculates the mean pairwise variation (M) for a gene in comparison to the other genes tested. Genes with low expression stability will have a high M value. Moreover, this software provides a ranking of the genes tested according to their expression stability based on a stepwise exclusion process. Finally, geNorm calculates the optimal number of reference genes to run to perform an accurate normalization of the gene expression levels.
According to this analysis, the most uniformly expressed genes were cons4, 6, 7, 8 and cons4, 5, 7, and 8 based on the Affymetrix (Fig. 1A) and cDNA array analysis, respectively (Fig. 1B). By comparison, the cDNA array analysis clearly showed that SUBI2 gene was not as stably expressed as cons4, 5, 7, 8 or ACT genes (Fig. 1B). Moreover, among the six more stably expressed genes based on these Affymetrix and cDNA array studies, only cons11 was expressed at a low level (Table 1). This result is likely a reflection of the inability of DNA array studies to accurately measure genes with low gene expression. To address this issue, we also analyzed gene expression using the more sensitive qRT-PCR method.
Ranking of soybean genes based on their expression stability after Affymetrix and cDNA array hybridization. Average of (A) Affymetrix and (B) cDNA array probe intensities of 18 cons genes and 20 putative reference genes (18 cons genes, SUBI2, and ACT) tested, respectively, in 40 and 45 different conditions, which allowed us to identify the most stably expressed genes by using geNorm software. Genes with the most constitutive expression are indicated on the right of each graphic; the less stably expressed are on the left. The y-axis indicates the average of expression stability, M.
Confirmation of the Expression Stability of the Four New Reference Genes by Quantitative PCR Analysis
Specific primer sets were designed for qRT-PCR for each of the 18 cons, SUBI2, and ACT genes (Table 2). To test the specificity of these primers, standard RT-PCR and agarose gel electrophoresis was performed using the 18 primer sets (Supplementary Fig. 1A). Only the cons18 primer set did not yield the expected amplicon.
Sequences of quantitative reverse-transcription polymerase chain reaction primers used for each of the 18 cons, ACT, and SUBI2 genes. Forward and reverse primer sequences are provided. The length of each amplicon is indicated.
Quantitative PCR reactions were performed on 48 different templates (16 conditions): 10 different soybean tissues (root tip, naked root, root hair cells, whole root, stem, leaf, apical meristem, flower, pod, and seed) and roots uninoculated or inoculated with B. japonicum and harvested 4, 8, and 24 d after inoculation (6 conditions). For each condition, three independent biological materials were analyzed (Supplementary Table 1, blue color). Based on qRT-PCR, four of the primer sets (cons1, 12, 17 and 18) did not produce PCR products in at least one of the conditions tested (Supplementary Fig. 2). Although standard PCR showed amplification for the other 14 genes tested, qRT-PCR reactions revealed nonspecific amplification after dissociation curve analysis for cons2 and 17 (Supplementary Fig. 2). Similar analysis on the 13 other genes tested (cons3 to 11, cons13 to 16) did not show nonspecific amplifications (Supplementary Fig. 1B–E and Supplementary Fig. 2). We confirmed the specificity of all primer sets by sequencing each amplicon (data not shown). Complementary to the specificity of the primers used, we used LinRegPCR software (Ramakers et al., 2003) to quantify the efficiency of the primer sets. Ideally, the maximum value of Peff is 1 (i.e., 100% efficient). Based on the analysis of Peff on the 48 templates, a very high qPCR primer efficiency was found for each primer set tested (average > 0.96), except the cons9 primer set (Peff average = 0.91) (Supplementary Fig. 1F).
Using the geNorm software, we initially analyzed the expression stability of each of the target genes in 10 tissues (3 biological replicates; 30 templates; Fig. 2A), then during nodulation (6 conditions, 3 biological replicates for each condition; 18 templates; Fig. 2B), and finally in the 16 conditions combined together (48 templates; Fig. 2C). Four genes (cons4, 6, 7, and 15) were the most stably expressed independent of the data set analyzed (Fig. 2C). The expression of the other genes tested varied more compared with the reference genes. For example, compared with the other putative reference genes, SUBI2 and cons5 ranking showed that they are better reference genes in the different soybean tissues than during nodulation. On the other hand, cons16 was more uniformly expressed compared with the other reference genes tested during nodulation when compared with expression in various soybean tissues (Fig. 2A and 2B).
Ranking of soybean genes based on their expression stability after quantitative reverse-transcription polymerase chain reaction. Expression levels of 15 putative reference genes (13 cons genes, SUBI2, and ACT) (A) in 10 tissues (30 cDNA samples; (B) during nodulation (18 cDNA samples); and (C) in both experiments together (48 cDNA samples). All conditions were considered to identify the best reference genes in soybean. Genes with the most stable expression are indicated on the right of each graphic; the less uniformly expressed are on the left. The y-axis indicates the average of expression stability, M. Complementary to the ranking of the 15 reference genes tested, geNorm calculates an optimal number of reference genes for normalization of gene expression levels (D) (x axis). This number is calculated using a pairwise variation (V) analysis (y axis). Our analysis indicates that four to five reference genes should be used together to perform a reliable normalization (V values under 0.15 threshold).
The geNorm analysis of the cons gene expression levels based on the 16 conditions suggests the use of four to five different constitutive genes for soybean gene expression normalization (Fig. 2D). The genes cons4, 6, 7, and 15 were most stably expressed in all conditions tested (Fig. 2C).
Having reference genes with differing levels of absolute expression provides for better normalized expression of genes with high or low expression levels. Of the identified soybean reference genes, three (cons4, 6, and 7) are highly expressed, while cons15 will be useful for normalization of low gene expression levels (Fig. 3).
Expression levels of four new reference genes in 48 different conditions. Expression levels of four putative reference genes cons4 (black diamonds), 6 (white squares), 7 (black triangles), and 15 (white circles) in 10 different tissues (root tip [1–3], naked root [4–6], root hair cells [7–9], root (10–12), stem (13–15), leaves (16–18), apical meristem [19–21], flower [22–24], pods [25–27], and seeds [28–30]), and during nodulation (4, 8, and 24 d after inoculation [31–33: 4 d inoculated; 34–36: 4 d uninoculated; 37–39: 4 d inoculated; 40–42: 8 d uninoculated; 43–45: 4 d inoculated; 46–48: 24 d uninoculated]). Expression levels are normalized against the average of the cycle threshold (Ct) values for the 14 genes analyzed (y axis; log10 scale).
Confirmation of Stable Expression of the Four Soybean Reference Genes
To test cons4, 6, 7, and 15 as reference genes, their expression patterns were analyzed by qRT-PCR in three other experiments: leaves harvested 24 and 72 h after wounding, after JA and SA treatment, and plants infected by Asian soybean rust with low, medium, and high lesion density (Supplementary Table 1, red color). Three independent biological replicates were analyzed for each experiment. As described above, the expression level of each gene was calculated using the geometric mean of the Ct values of cons4, 6, 7, and 15 as Ctref and analyzed by geNorm.
Reference gene expression levels were very similar through the different conditions tested, with the exception of cons4, which was expressed at a lower level in soybean leaves with low Asian soybean rust lesions (Fig. 4A). geNorm analysis performed on this second data set confirmed the higher stability of cons 6, 7, and 15 compared with cons4 (Fig. 4B and 4C). Lower stability of cons4 gene expression compared with cons6, 7, and 15 was previously described (Fig. 2C, nodulation and tissues qRT-PCR data analyzed together), but cons4 was found more uniformly expressed than cons15 when nodulation and tissues qRT-PCR data were analyzed separately (Fig. 2A and 2B).
Confirmation of cons4, 6, 7, and 15 as reference genes. (A) Expression levels of four putative reference genes cons4 (black diamonds), 6 (white squares), 7 (black triangles), and 15 (white circles) in soybean leaves after different stresses: plants infected by Asian soybean rust (low [1–3], medium [4–6], and high lesions density [7–9]), harvested 24 and 72 h after jasmonic acid and SA treatment (24 h: 10–18; 72 h: 25–33), and wounding (24 h: 19–24, 72 h: 34–39). Expression levels were normalized against the average of the cycle threshold (Ct) values for the four genes analyzed (y axis; log10 scale). (B and C) geNorm analysis of gene expression stability of cons4, 6, 7, and 15 including (B) or not including (C) the low-density rust-inoculated samples.
Discussion
The analysis of gene expression is one of the most common strategies used to identify genes involved in a specific biological process. A critical step in gene expression analysis is data normalization. Different strategies are used to normalize the data: use of an internal reference gene, spiking RNA (also called “external control”), and averaging the expression of a very large number of genes (mainly used in microarray analysis involving a large number of spotted elements). Statistical analysis is essential to provide confidence in the results obtained. Quantitative RT-PCR analysis of gene expression using an internal control is commonly used and is highly sensitive and accurate. Identification and validation of the stability of gene expression is essential for the appropriate use of this method. However, in many cases, the choice of reference genes was based more on assumption than on hard data.
The QRT-PCR in a study of soybean leaves infected with P. syringae (Zou et al., 2005) was standardized to an actin gene (GB ID AW350943) based on the lack of fluctuation of this gene during that specific microarray analysis. Likewise, SUBI2, encoding a polyubiquitin, was used to normalize soybean gene expression data in a whole-root nodulation study (Brechenmacher et al., 2008). However, although these two genes served those specific studies well, they are not the best choice to analyze soybean gene expression over a wider range of conditions. In the present study, we mined multiple soybean microarray hybridization datasets to identify additional putative reference genes that exhibited constitutive expression levels over a large number of conditions. Subsequent analysis of these genes by qRT-PCR using gene specific primers and an additional list of 29 conditions identified four soybean genes (cons4, 6, 7, and 15) as better candidate reference genes. However, the expression of cons4 was later found to vary under differing conditions of fungal infection.
Comparison of Standard and New Soybean Reference Genes
Genes like ubiquitin and actin are commonly used as references to normalize plant gene expression data. The current study shows SUBI2 and ACT genes are not as stably expressed across all conditions compared to the reference genes identified. For example, both genes were not identified as a stably expressed gene by our initial mining of the available soybean microarray data (Supplementary Table 3). Our more detailed analysis of the cDNA array hybridization data, as well as our qRT-PCR reactions, also identified SUBI2 as the 9th and the 8th most uniformly expressed gene out of the 20 and 15 genes tested (Fig. 1B and 2C). Although the ACT gene is one of the most stably expressed genes based on cDNA array analysis (7th most uniformly expressed gene in our set of 20), our qRT-PCR analysis did not support the use of ACT as a constitutive gene (13th most uniformly expressed gene in our set of 15; Fig. 1B and 2C). Moreover, cons4, 6, 7, and 15 were clearly more stably expressed than SUBI2 and ACT. The available annotation for these four genes suggests that they encode genes similar to an ATP-binding cassette transporter (cons4), an F-box protein family (cons6), an insulin-degrading enzyme (cons7) and a CDPK-related protein kinase (cons15). ATP-binding cassette transporters are known to play a variety of cellular roles such as auxin transport, plant defense system, heavy metal tolerance, and stomatal function (Rea, 2007). It should be noted that annotations are based on percentage similarity to other sequences in databases of which only a small percentage have actually been functionally characterized and proven to have that designated function. Therefore, annotations based on BLAST searches only serve to give possible function, not necessarily true function. Constitutive expression of such a gene suggests a general role in cellular metabolism. Cons7, annotated as an insulin-degrading enzyme, appears to encode a general metalloprotease and constitutive expression of such a gene is not surprising. Cons6 (F-box protein family) and cons15 (CDPK-related protein kinase) are likely involved in signal transduction and, therefore, it is a bit surprising to see that they are so uniformly expressed. Clearly, they must exert their regulatory roles through other means besides transcriptional regulation. Regardless of the annotations or putative functions of these four genes, the data indicate that they are very stably expressed over a wide range of conditions.
Interestingly, an F-box protein was identified in Arabidopsis as a reference gene (Czechowski et al., 2005). Looking for the likely orthologs of cons4, 6, 7, and 15 in A. thaliana, we identified AT1G06110 as the likely cons6 ortholog (BLASTX; score: 170; e-value: 7e-43). No obvious orthologs were identified in Arabidopsis for the other three soybean reference genes. According to the Genevestigator database (Zimmermann et al., 2004), AT1G06110 expression is very stable under different biotic and abiotic stresses (Supplementary Fig. 3A). However, AT1G06110 expression varied approximately twofold when different plant organs or developmental stages are compared (Supplementary Fig. 3B and C).
Selection of Valuable Reference Genes for qRT-PCR and Array Hybridization
The same genes (cons4, 7, and, in a less measure, cons8) were identified as the most uniformly expressed genes by comparing both the microarray and qRT-PCR results. Cons6, whose expression level was very stable based on qRT-PCR reactions, did not appear to be as stably expressed based on the results of cDNA hybridizations. Soybean is an ancient tetraploid with some paralogous genes showing a high level of sequence identify. Therefore, it seems reasonable that cross-hybridization on the microarrays by paralogous genes may be the reason for the varying results obtained with cons6. The qRT-PCR reactions using gene specific primers are less likely to be affected by a highly homologous sequence.
Cons 15, identified as a reference gene by qRT-PCR, was also not as uniformly expressed based on array hybridization data (Fig. 1). The expression level of cons15 is significantly lower than the other reference genes analyzed. It is well known that DNA microarray hybridizations are not as sensitive as qRT-PCR and are less accurate at measuring low abundance mRNA (Coker and Davies, 2003; Czechowski et al., 2004). Therefore, we recommend cons15 as a reference control for experiments focused on measuring the expression of genes expressed at a low level.
Cross-Comparison of the Sensitivity of Array Hybridization and qRT-PCR Reactions
When comparing expression levels of the four reference genes between the different experiments, cons6 and cons7 were expressed at very similar levels independent of the technique used (cDNA array hybridization and qRT-PCR reactions; Supplementary Table 4). Cons15 was consistently expressed at lower levels than cons7 (6.5 to 168 times according to the technique used; Supplementary Table 4). Interestingly, when cons15 probe set intensity was very low in an array study, we could still easily detect its gene expression by qRT-PCR (25.8 PCR cycles based on the first set of qRT-PCR experiments). Because of the global low expression of cons15, we suggest to use this gene as reference gene when gene expression is quantified by qRT-PCR.
Relative to the expression levels of cons6, 7, and 15, cons4 expression varied significantly between the different experiments. Although our initial analysis (based on DNA microarray and qRT-PCR analysis) suggested cons4 as a useful reference gene, subsequent analysis in response to wounding, JA, SA, and Asian soybean rust inoculation showed more variation in its expression when compared with cons6, 7, and 15 (Fig. 4A). The results obtained with cons4 serve as a reminder that users of any reference gene should still validate its constitutive expression under the specific conditions being analyzed.
Conclusions
We identified four new constitutively expressed reference genes that should be useful for normalization of soybean gene expression data. The recommended use of these reference genes is not based on assumptions about the level of expression or annotation but on actual analysis of their expression under 130 different conditions. Given the growing resources for soybean genomics and the soon-to-be-released genome sequence, these reference genes should be useful for furthering soybean functional genomic research.
Acknowledgments
This work was funded by a grant from the National Science Foundation, Plant Genome Program, #DBI-0421620. We thank Laurent Brechenmacher, Bernarda Calla, and Jin Zhu for providing unpublished microarray expression data to survey, and Roger Boerma for providing the soybean leaves infected by the Asian soybean rust. We thank Germán Bollero for his advice on the statistical analysis that led to the original identification of 217 candidate soybean reference genes. We thank Andrea Hurley-Sommer for critical reading of the manuscript.
Footnotes
-
↵ * Corresponding author (staceyg{at}missouri.edu).
-
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.
-
- Received February 13, 2008.
- Accepted July 16, 2008.
- Copyright © 2008 Crop Science Society of America




