Total RNA samples from shoot apex (SA) and young leaf (YL) were loaded on a formaldehyde denaturing agarose gel (1.2%) containing SYBR Green II RNA Gel Stain (0.1 µl/ml; Cat. #S7564; Invitrogen, Carlsbad, CA) and run in 1X MOPS buffer along with RNA Millennium Size Markers (M; Cat. #7151; Ambion, Austin, TX).
Figure 3-6. The effect of the log transformation upon the distribution of the spot intensity values.
The left four panels show histograms of the background-corrected mean spot intensity values without the log transformation. The spot intensities spanned a very large range, and the distributions were right-skewed, with a very long tail toward high spot intensity values. The right four panels show the distributions of the same values after the log transformation, in which the distribution of the data became symmetrical and almost normal. The plots were generated in Microsoft Excel 2007.
Figure 3-5. Portions of composite images of scanned arrays.
Among the 32 blocks (8x4) of array spots, 8 blocks (2x4) are shown here. In array 1, cDNA targets derived from shoot apices were labeled with green dye (Alexa Fluor 555) and cDNA targets derived from young leaves were labeled with red dye (Alexa Fluor 647). In array 2, the dyes were swapped. The red fluorescence from the young leaf cDNA targets was more intense than the green from the shoot apex cDNA targets in array 1, whereas in array 2 the red fluorescence from the shoot apex cDNA targets was obscured by the higher intensity of the green fluorescence from the young leaf cDNA targets. To prepare the images, each of the two microarray slides was scanned twice: once for excitation of the green channel (Alexa Fluor 555) and once for the red channel (Alexa Fluor 647). The images captured from each excitation channel were combined to produce a composite image for each array.
Figure 3-7. Scatter plots showing the effect of array spot intensity data normalization.
The upper panel represents the intensities from the shoot apex sample vs. the young leaf sample before normalization, and the bottom panel represents them after normalization. The intensities from the young leaf sample are plotted on the horizontal axis, while those from the shoot apex sample are plotted on the vertical axis, after log2 transformation. Points above the diagonal (y = x) represent spots with higher intensities in the shoot apex, whereas points below the diagonal represent spots with higher intensities in the young leaf. Intensity differences increase with distance from the diagonal line, reflecting the gene expression levels in the two tissue types. Before normalization, most spots appeared to have lower intensities in the shoot apex than in the young leaf, since most of the spots were plotted below the diagonal. However, when the data were normalized, most points moved toward the diagonal. The color legend indicates the approximate scale of the log2 intensity level; blue is lower and red is higher. The two green lines above and below the y = x diagonal line indicate a fold change of 2.0. Thus, the points that fall above the upper green line represent the genes that were expressed at least 2-fold higher in the shoot apex than in the young leaf. The points below the lower green line represent the genes that were expressed at least 2-fold higher in the young leaf than in the shoot apex.
Figure 3-11. Gene counts involved in manually categorized biological processes.
The numbers of genes in each category are given in Table S-3 on page 142 in "Supplementary Data". The manual categorization of the genes, based on the TAIR GO slim, TAIR Gene Ontology and other protein databases ("2.9.2 Functional Annotation" on page 51 in "Materials and Methods"), revealed some biological processes that involved only genes preferentially expressed in one tissue type (shoot apex or young leaf). Those specific to the shoot apex include cytoskeletal organization, hormone metabolism and protein sorting/targeting, whereas those specific to the young leaf include cell cycle control/DNA repair, energy/respiration, photosynthesis dark reaction, carbohydrate metabolism and chlorophyll degradation. Many genes that were preferentially expressed in the shoot apex or in the young leaf fell into the same functional categories, including photorespiration, photosynthesis light reaction, amino acid metabolism, lipid metabolism, protein translation, post-translational modification, protein folding, proteolysis, transcription and post-transcriptional processing, inter-/intra-cellular transportation and signal transduction. The genes categorized in 'response to abiotic stimulus' or 'response to biotic stimulus' have not been annotated in further detail. The categories 'other metabolism' and 'other cellular process' mean that the gene is involved in a known biological process that is not included in this categorization. There were 29 genes (10 from the shoot apex and 19 from the young leaf) that have no annotated functions.
Figure 3-8. MvA plot showing the effects of normalization.
The distribution of spot intensity data was visualized on an MvA plot before (upper panel) and after normalization (lower panel). M on the vertical axis is the log2-transformed spot intensity ratio between the two channels (shoot apex vs. young leaf), and A on the horizontal axis is the average spot intensity of the two channels after the log2 transformation. MvA plots reveal intensity-dependent biases as well as extra variation at low-intensity spots. In these MvA plots, which were generated in the GeneSpring GX 10 software, high variation at low-intensity spots was observed.
Figure 3-9. Box plot representing summary statistics of normalized spot intensity data.
Each box and its tails represent one data set from the four channels on the two microarrays. The upper and lower boundaries of the box show the location of the upper quartile (UQ) and lower quartile (LQ), which are the 75th and 25th percentiles, respectively. The central line in the box shows the position of the median (50th percentile) of each data set. Thus, the box represents the interval that contains the central 50% of the data. The length of each tail (the blue lines attached to UQ and LQ) is 1.5 times the interquartile distance (IQD). The data represented as red bars fall beyond [UQ+1.5 IQD] or [LQ-1.5 IQD] and are considered outliers. After quantile normalization, the median and the percentiles became uniform across the channels and arrays. The box plot was generated in the GeneSpring GX 10 software.
Figure 3-4. The amounts of fluorescently labeled cDNA.
To calculate the total amounts of the fluorescently labeled cDNA in the purified target samples, absorbance was measured at 550 nm, 650 nm and 750 nm using the 'Multiple Wavelength Mode' of a Beckman DU-600 spectrophotometer (Beckman, Fullerton, CA). The amounts of cDNA targets labeled with Alexa Fluor 555 and Alexa Fluor 647 were converted into picomoles using the formulas given in Equation 2-4 and Equation 2-4 on page 39, respectively.
Figure 3-2. Agarose gel electrophoresis of first-strand cDNA products.
First-strand cDNA samples, synthesized from the total RNA templates, were loaded on a standard agarose gel (1%) containing SYBR Gold Nucleic Acid Gel Stain (Cat. #S11494; Invitrogen, Carlsbad, CA) and run in 1X TBE buffer along with 1 Kb Plus DNA Ladder (Cat. #10787-018; Invitrogen, Carlsbad, CA). SA, shoot apex; YL, young leaf; M, size marker.
Figure 3-12. The representative electronic fluorescent pictographs (eFPs).
The upper panel shows the change in expression level over the course of development for the gene AT2G22610, which was expressed predominantly in the shoot apex and is involved in cytoskeletal organization. The lower panel represents the expression pattern of the gene AT3G62410, which was expressed preferentially in the young leaf and is involved in the dark reaction of photosynthesis. Red indicates the highest expression level and blue the lowest.
Figure 3-10. A 'volcano' plot and a scatter plot showing differentially expressed genes.
The volcano plot (upper panel) represents the genes that passed or failed the assigned criteria for screening. In the volcano plot, the negative log10 of the p-value [-log10(p)] was plotted against the normalized log ratio [log2(ratio)]. The vertical green lines represent a 2-fold difference in the expression levels between the shoot apex and the young leaf tissues. The horizontal green line represents the t-test p-value of 0.05. Genes with large fold differences and low p-values are easily identifiable graphically. The genes that were differentially expressed while satisfying the p-value cutoff of 0.05 are shown in red and the remainder in grey. The red spots in the volcano plot are also depicted in a scatter plot (lower panel). The spots above the upper diagonal represent the genes that were preferentially expressed in the shoot apex at 2-fold or more at the 0.05 p-value cutoff, while those below the lower diagonal represent the genes that were expressed predominantly in the young leaf under the same criteria. The volcano plot and the scatter plot were generated in the GeneSpring GX 10 software.
Figure 3-3. Total amounts of amino-modified cDNA.
A260 and A320 for each of the purified target samples from the shoot apex and the young leaf were measured using a Beckman DU-600 spectrophotometer (Beckman, Fullerton, CA). The absorbance readings were converted into the total amounts of amino-modified cDNA in picomoles using the formula given in Equation 2-2 on page 39. The yields of amino-modified cDNA in the shoot apex sample were much lower than those in the young leaf, regardless of the fluor used.
Part 3. Results
3.1 Yield and Quality of Total RNA and Amino-Modified cDNA
The yield and purity of genomic DNA-free total RNA isolated from the shoot apex and the young leaf samples were determined by spectrometry (Table 3-1). The concentration of total RNA obtained from shoot apices was approximately 1.7 times higher than that of young leaves. The ratios between absorbance readings measured at 260 nm and 280 nm were 2.1 in both samples, indicating that the purity of the isolated total RNA samples was adequate for use in the microarray application (Kleber and Kehr, 2006; Qiagen, 2001).
The quality and integrity of the total RNA were also assessed using formaldehyde denaturing agarose gel electrophoresis (Figure 3-1). The electrophoretogram of the total RNA samples clearly showed heavy, high-concentration bands corresponding to the 25S and 18S ribosomal RNAs, and the 25S rRNA band appeared more abundant than the 18S rRNA band, demonstrating that the samples were intact and non-degraded (Imbeaud et al., 2005). Also, the smear of mRNAs of various molecular weights showed no signs of degradation of the total RNA. The signal at the lower end of each lane possibly represents small RNAs. Mitochondrial and chloroplast ribosomal RNAs, which were visible in the mature leaf samples (Figure S-1 on page 121 in "Supplementary Data"), were not clearly detectable in the shoot apex or the young leaf sample. The 5S rRNA, tRNA, and other low-molecular-weight RNAs (<200 nucleotides), which make up 15-20% of total RNA, are not retained by the RNeasy column (Cat. #75142; Qiagen, Valencia, CA).
The quality of the amino-modified first-strand cDNA synthesized from total RNA templates was assessed by standard agarose gel electrophoresis (Figure 3-2). On the agarose gel, first-strand cDNA products appeared as a smear with some distinctive bands, indicating the presence of relatively abundant mRNA species at certain sizes. Although equal amounts of total RNA template were added to both reactions, the bands of first-strand cDNA derived from the young leaf total RNA appeared brighter on the gel, indicating that the reverse-transcriptase reaction yielded more first-strand cDNA from the young leaf total RNA than from the shoot apex total RNA. This result was confirmed when the concentrations of the amino-modified cDNA samples were measured by spectrophotometry (Figure 3-3).
3.2 Labeling Efficiency of Amino-Modified cDNA
By comparing the amounts of fluorescently labeled cDNA to total amounts of amino-modified first-strand cDNA, labeling efficiency was assessed (Figure 3-4). As the total yield of amino-modified cDNA derived from shoot apex was lower than that from young leaf, the amounts of fluorescently labeled cDNA derived from shoot apex were also low regardless of which fluor was used. While the labeling efficiencies were significantly different between source tissue types, the difference between the two fluors was negligible.
3.3 Scanned Arrays
The composite images of the scanned microarrays are shown in Figure 3-5. The fluorescence intensities of the spots that were hybridized with cDNA targets derived from the shoot apex were low compared to those hybridized with cDNA targets from the young leaf, regardless of the fluorescent dye used. This was because the hybridization mixture contained less fluorescently labeled cDNA from the shoot apex than from the young leaf.
3.4 Statistical Microarray Data Analysis
3.4.1 Log Transformation of Intensity Data
Among the 12,288 spots on the composite image of each scanned array, intensity data were extracted from 11,255 spots for data analysis using GeneSpring GX 10 software, excluding those regarded as missing values (Hovatta et al., 2005). Missing values included 336 empty spots that were marked as 'BLANK' in the GAL file and 697 spots that were flagged as 'bad', which might have contained imperfections and/or artifacts such as scratches and dust particles. Among the 11,255 readable spots, 2,597 produced signals greater than background levels. The background-corrected mean spot intensity data were transformed to their log2 values, after which the distribution of the spot intensity data became symmetrical. The effects of the log transformation are illustrated in Figure 3-6.
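As a minimal sketch of why the log2 transformation makes the distribution symmetrical, the Python snippet below compares the skewness of a small set of intensities before and after transformation. The intensity values and the helper names (`log2_transform`, `skewness`) are hypothetical and chosen only for illustration; they are not taken from the array data or from GeneSpring GX.

```python
import math

def log2_transform(intensities):
    """Return log2 of each background-corrected intensity.

    Non-positive values cannot be log-transformed and are returned as None.
    """
    return [math.log2(v) if v > 0 else None for v in intensities]

def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical intensities that double at each step (long tail toward high values)
raw = [50, 100, 200, 400, 800, 1600, 3200]
logged = log2_transform(raw)

raw_skew = skewness(raw)        # clearly positive: long right tail
log_skew = skewness(logged)     # approximately zero: symmetrical
```

On the raw scale the few bright spots dominate the spread, whereas on the log2 scale each doubling of intensity contributes one unit, so the values are evenly spaced.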
3.4.2 Normalization of Log-transformed Data
Quantile normalization in GeneSpring GX gave the log-transformed data values the same distribution across the channels and arrays. The data also underwent centralization, in which the distribution was shifted so that it was centered over the expected mean, balancing the two channels to remove intensity-dependent variation due to dye bias and/or tissue-specific labeling efficiency.
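The general idea of quantile normalization can be sketched as follows. This is a textbook version written for illustration, assuming that each channel is a list of log2 intensities of equal length and that ties are broken by position; the exact procedure implemented in GeneSpring GX is not described here and may differ. The function name `quantile_normalize` and the sample values are hypothetical.

```python
def quantile_normalize(channels):
    """Give every channel the same distribution of values.

    channels: list of equal-length lists of log2 intensities.
    Each value is replaced by the mean, across channels, of the values
    that share its rank.
    """
    n = len(channels[0])
    sorted_channels = [sorted(ch) for ch in channels]
    # Reference distribution: mean of the k-th smallest value over all channels
    reference = [
        sum(ch[k] for ch in sorted_channels) / len(channels) for k in range(n)
    ]
    normalized = []
    for ch in channels:
        # Spot indices ordered from the lowest to the highest intensity
        order = sorted(range(n), key=lambda i: ch[i])
        out = [0.0] * n
        for rank, spot in enumerate(order):
            out[spot] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical log2 intensities for two channels, three spots each
example = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(example)
```

After this step the sorted values of every channel are identical, which is why the median and percentiles line up across channels in a box plot.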
The effect of normalization is illustrated in the scatter plots (Figure 3-7) and the MvA plots (Figure 3-8). In the scatter plots, the majority of the genes, which had similar expression levels in both tissue types, appeared along the diagonal after normalization. An MvA plot is a scatter plot of the difference (M) versus the average (A) of the log spot intensity values of two samples:
$$M = \log_2\left(\frac{R}{G}\right) \qquad \text{(Eq. 3-1)}$$
$$A = \frac{1}{2}\log_2(R \cdot G) \qquad \text{(Eq. 3-2)}$$
where R and G represent the fluorescence intensities in the red and green channels, respectively.
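As a small worked example of Eq. 3-1 and Eq. 3-2, the sketch below computes M and A for a single spot. The function name `m_and_a` and the intensity values are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def m_and_a(red, green):
    """Return (M, A) for one spot from its red and green intensities.

    M = log2(R / G) is the log ratio (Eq. 3-1);
    A = log2(R * G) / 2 is the average log intensity (Eq. 3-2).
    """
    m = math.log2(red / green)
    a = math.log2(red * green) / 2
    return m, a

# Hypothetical spot: red channel four times brighter than green
m, a = m_and_a(8.0, 2.0)   # M = log2(4) = 2, A = log2(16) / 2 = 2
```

A spot with equal intensity in both channels gives M = 0, so well-balanced data scatter around the horizontal axis of the MvA plot.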
The MvA plot is generally used to visualize the intensity ratios between red and green fluorescence in experiments using two-color spotted arrays. It is useful for assessing not only the relation between samples but also data quality, because it makes it easy to identify the intensity-dependent variation among low-intensity spots. A common source of technical variation in microarray data is photomultiplier tube (PMT) settings that are incorrectly balanced to compensate for the differential excitation properties of the two channels during scanning, which results in a shift of the data from the x-axis (M = 0) of the ideal MvA plot (Petri et al., 2004). This random variation, which was caused by technical errors, was removed after screening the spots based on the statistical test criteria (described in the next section). After normalization, several descriptive statistics of the array data were represented graphically in a box plot, which is also called a box-and-whisker plot (Figure 3-9).
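The summary statistics drawn in such a box plot (quartiles, median, and the limits at 1.5 times the interquartile distance described for Figure 3-9) can be sketched as below. The quartile interpolation method is an assumption, since different programs compute quartiles slightly differently, and the function name `box_plot_summary` and the sample values are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles, 1.5 x IQD limits and outliers for one data set."""
    # 'inclusive' interpolation is an assumption; GeneSpring GX may differ
    lq, median, uq = statistics.quantiles(values, n=4, method="inclusive")
    iqd = uq - lq                      # interquartile distance
    lower_limit = lq - 1.5 * iqd
    upper_limit = uq + 1.5 * iqd
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "LQ": lq,
        "median": median,
        "UQ": uq,
        "IQD": iqd,
        "limits": (lower_limit, upper_limit),
        "outliers": outliers,
    }

# Hypothetical data set with one extreme value
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

Values outside the two limits are the ones that would be drawn as separate marks beyond the tails.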
3.4.3 Filtering Differentially Expressed Genes
A Student's t-test was performed to filter the genes that were differentially expressed between the shoot apex and the young leaf at a significance level of p = 0.05. Genes whose expression in one tissue type was two-fold or more higher than in the other were considered differentially expressed. The results of the statistical test are represented in a 'volcano' plot (upper panel in Figure 3-10) and a scatter plot (lower panel in Figure 3-10), both illustrating the genes satisfying the criteria. Under the given criteria, 174 genes were identified as statistically significantly differentially expressed at two-fold or more when the transcriptomes of the shoot apex and the young leaf of H. helix cv. Goldheart were compared. Among them, 60 genes were preferentially expressed in the shoot apex, while 114 were preferentially expressed in the young leaf. A full list of the differentially expressed genes with expression level and fold ratio is presented in "Supplementary Data" (Table S-1 on page 122).
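The two screening criteria can be sketched as a simple filter. The sketch assumes that the t-test p-values have already been computed elsewhere, that the log2 ratio is expressed as shoot apex over young leaf, and that values exactly at a cutoff pass; the gene identifiers, values and function names are hypothetical and do not come from the array data.

```python
import math

def screen_genes(records, fold_cutoff=2.0, p_cutoff=0.05):
    """Split genes into shoot-apex- and young-leaf-preferential lists.

    records: iterable of (gene_id, log2_ratio, p_value), where log2_ratio
    is log2(shoot apex / young leaf) after normalization.
    """
    log2_cutoff = math.log2(fold_cutoff)   # 2-fold corresponds to |log2 ratio| = 1
    shoot_apex, young_leaf = [], []
    for gene_id, log2_ratio, p_value in records:
        if p_value > p_cutoff:
            continue                       # fails the significance criterion
        if log2_ratio >= log2_cutoff:
            shoot_apex.append(gene_id)
        elif log2_ratio <= -log2_cutoff:
            young_leaf.append(gene_id)
    return shoot_apex, young_leaf

def volcano_point(log2_ratio, p_value):
    """Coordinates of one gene on a volcano plot: (log2 ratio, -log10 p)."""
    return log2_ratio, -math.log10(p_value)

# Hypothetical records for illustration only
records = [
    ("gene_a", 1.8, 0.01),    # passes both criteria, higher in shoot apex
    ("gene_b", -2.3, 0.03),   # passes both criteria, higher in young leaf
    ("gene_c", 2.5, 0.20),    # large fold change but not significant
    ("gene_d", 0.4, 0.001),   # significant but below 2-fold
]
shoot_apex, young_leaf = screen_genes(records)
```

On a volcano plot, the genes that pass both criteria are those lying outside the two vertical fold-change lines and above the horizontal p-value line.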
3.5 Functional Analysis of Microarray Data
3.5.1 GO Slim Functional Classification
Genes that had unique AGI locus identifiers were categorized according to the GO slim to obtain a broad summary of gene types. The GO slim provided only high-level categories and its categorization scheme took all of the associated terms into account, which were often redundant. The results were generally uninformative. The gene counts associated with GO slim terms are based on the biological processes of the gene, the molecular functions of the gene product, and its cellular localization (Table S-2 on page 128 and Figure S-2 on page 130 in "Supplementary Data").
3.5.2 Cluster Analysis
To identify and group the genes that were similarly expressed in each tissue type and to infer any biological significance of the groups of genes, hierarchical and K-means clustering analyses were carried out using GeneSpring GX. The hierarchical clustering dendrograms of the genes that were preferentially expressed in the shoot apex and the young leaf are presented in Figure S-4 on page 134 and Figure S-6 on page 138, respectively, in "Supplementary Data". Also, the genes in each cluster were subjected to GO slim functional categorization as in the previous section ("3.5.1 GO Slim Functional Classification"), and the results are shown in Figure S-5 on page 136 and Figure S-7 on page 140 in "Supplementary Data".
There was no clear relationship between the functional categories and the clustering patterns of the genes. When other distance methods were applied to the hierarchical clustering along with various parameters, the results were similar except for minor variations in the numbers of genes falling in each cluster (data not shown). K-means clustering with various parameters revealed no significant patterns related to the biological function of the genes (data not shown). Although cluster analyses have generally been useful for identifying patterns in gene expression in relation to biological function by grouping similarly expressed genes (Drăghici, 2003), their usefulness was unclear in this study, where only two tissue types were compared: the shoot apex and the young leaf.
3.5.3 Manual Categorization of Gene Function
Since the TAIR GO slim scheme did not lead to a meaningful classification for the gene sets screened in this experiment, the genes were classified manually based on the functional categories provided in both TAIR GO slim and TAIR Gene Ontology as well as other protein databases as described in "2.9.2 Functional Annotation" on page 51 in "Materials and Methods". To assign each gene to an appropriate category, the functional categories were grouped into a higher level category, divided into lower level categories, and/or combined into a category of the same level.
Excluding those that were not mapped to AGI (Arabidopsis Genome Initiative) locus identifiers, 163 genes represented by unique locus identifiers were categorized into 24 biological processes, including 'Unknown' (Figure 3-11). Fifteen biological processes, including 'Unknown', contained genes that were preferentially expressed in either tissue type. Meanwhile, there were 9 biological processes that involved only genes preferentially expressed in one type of tissue (Table 3-2). For example, the genes involved in cell cycle control/DNA repair, energy/respiration, carbohydrate metabolism, chlorophyll degradation and photosynthesis dark reaction were preferentially expressed in the young leaf. On the other hand, the genes involved in cytoskeletal organization and hormone metabolism were preferentially expressed in the shoot apex.
3.6 Pathway Analysis
The metabolic pathway analysis revealed some biologically meaningful inter-relations among the genes that were statistically differentially expressed in the shoot apex and in the young leaf, respectively. Fifty-nine of the 60 shoot apex gene products and 102 of the 103 young leaf gene products matched curated protein entities in the pathway database implemented in GeneSpring GX 10 software (Agilent Technologies Inc., 2008). There were no direct inter-relations among the matched gene products, except for two gene products from the young leaf, namely EBF2 (EIN3-BINDING F-BOX PROTEIN 2) and SKP1, a component of the Skp1-Cullin-F-box-protein (SCF) complex, which were encoded by AT5G25350 and AT1G75950, respectively (Figure S-8 on page 147). When the relations were expanded to include the immediate neighbors of the curated gene products, a total of 30 gene products established one or more new relations: 9 from the shoot apex (Table 3-3 on page 76) and 21 from the young leaf (Table 3-4 on page 77). The gene products from the shoot apex or the young leaf that were thought to be involved in regulatory pathways are discussed in a later chapter.
Diagrams showing the relations among the shoot apex gene products and other pathway entities are shown in Figure S-9 on page 148 in "Supplementary Data". Their cellular localizations are also provided in Figure S-10 on page 150 in "Supplementary Data".
3.7 Verification of Expression Level
The expression levels of the genes that were predominantly expressed in either type of tissue were verified using two microarray data exploration tools available from The Bio-Array Resource for Plant Functional Genomics (BAR; http://www.bar.utoronto.ca/): the electronic Northern (e-Northern) with Expression Browser and the Arabidopsis electronic Fluorescent Pictograph (eFP) Browser.
Using the e-Northern with Expression Browser, the expression levels obtained in this study were compared with the gene expression data sets accumulated in the BAR database and the public data sets from the AtGenExpress Consortium (Goda et al., 2008; Kilian et al., 2007; Schmid et al., 2005). When the expression levels of the 60 shoot apex genes were explored, 55 genes were mapped in the data sets of the BAR and the AtGenExpress Consortium. Among them, 31 genes were confirmed to be expressed preferentially in the shoot apex (Table S-4 on page 143 in "Supplementary Data"). All 103 young leaf genes had matches, and 41 of them were confirmed to be expressed predominantly in the young leaf (Table S-5 on page 145 in "Supplementary Data"). The Arabidopsis eFP Browser created 'electronic fluorescent pictographic' representations of the expression patterns of the genes that were preferentially expressed in either type of tissue, based on the AtGenExpress Consortium data (Goda et al., 2008; Kilian et al., 2007; Schmid et al., 2005; Winter et al., 2007). The representative output images from the eFP Browser (Figure 3-12 on page 79) show the expression patterns of AT2G22610 from the shoot apex and AT3G62410 from the young leaf across a developmental series.