AN EST APPROACH TO UNDERSTANDING ENDOSYMBIOTIC GENE TRANSFER
NSF-MCB 02-36631
 

INTRODUCTION: This grant has 4 specific aims that are described below.

Aim 1. To construct individually tagged non-normalized and normalized cDNA libraries from the dinoflagellate alga, Alexandrium tamarense, and the haptophyte alga, Emiliania huxleyi.
Starter libraries will be constructed from exponentially growing uni-algal cultures of Amphidinium and Emiliania (available from the CCMP culture collection). A total of two non-normalized and two normalized libraries will be constructed from algal cDNA. Each library will be tagged with a unique 10 bp identifier, which makes identification of the source organism readily apparent.
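As an illustration of how a per-library tag makes the source organism apparent, the following is a minimal sketch in Python. The tag sequences, the library names used as labels, and the idea of scanning the whole read for a tag are assumptions made for the example; the actual tag sequences and their position in the clones are not given in this report.

```python
# Hypothetical 10 bp tags; the real tag sequences are not stated in this report.
LIBRARY_TAGS = {
    "ACGTACGTAC": "A. tamarense non-normalized",
    "TGCATGCATG": "A. tamarense normalized",
    "GATTACAGAT": "E. huxleyi non-normalized",
    "CCATGGTTAA": "E. huxleyi normalized",
}
TAG_LENGTH = 10


def identify_library(read, tags=LIBRARY_TAGS):
    """Return the library whose tag occurs in the read.

    Returns None when no tag is found or when tags from more than one
    library are found (an ambiguous read).
    """
    read = read.upper()
    hits = set()
    # Slide a 10 bp window along the read and look each window up in the tag table.
    for i in range(len(read) - TAG_LENGTH + 1):
        window = read[i:i + TAG_LENGTH]
        if window in tags:
            hits.add(tags[window])
    return hits.pop() if len(hits) == 1 else None


if __name__ == "__main__":
    print(identify_library("TTTTACGTACGTACGGGG"))
    print(identify_library("AAAAAAAAAAAAAAAA"))
```

Scanning every window is the simplest choice for a sketch; if the tag sits at a fixed position in each clone, checking only that position would reduce chance matches.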
 

Aim 2. To generate 2 serially subtracted normalized libraries increasingly enriched for the mRNAs of the complex frequency class (rare mRNAs). The subtracted libraries will be derived from a complex library mixture comprising the 2 normalized libraries constructed in specific aim 1, by an iterative process that we have developed and named Serial Subtraction of Normalized Libraries (Bonaldo et al. 1996, Soares 1997).
 

Aim 3. To submit to GenBank 30,000 3’ ESTs derived from the starting normalized and the serially subtracted libraries constructed in specific aims 1 and 2, and to conduct clustering analysis of the EST data to identify up to 20,000 unique ESTs (about 10,000 from each species [based on an expected 66% discovery rate]). At a minimum, we expect to isolate 12,000 unique ESTs from the 30,000 sequences. These predictions are realistic goals based on our extensive experience utilizing serial subtraction for a rat EST discovery project (see http://ratest.eng.uiowa.edu/).
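The arithmetic behind these targets can be checked with a short Python sketch. It assumes that the discovery rate is simply the fraction of sequenced ESTs that turn out to be unique, and that unique ESTs are split evenly between the two species; the variable names are illustrative.

```python
TOTAL_ESTS = 30_000              # 3' ESTs to be submitted to GenBank
EXPECTED_DISCOVERY_RATE = 0.66   # expected fraction of ESTs that are unique
MINIMUM_UNIQUE = 12_000          # minimum number of unique ESTs expected
SPECIES = 2                      # A. tamarense and E. huxleyi

# Expected number of unique ESTs at the stated discovery rate.
expected_unique = round(TOTAL_ESTS * EXPECTED_DISCOVERY_RATE)

# Even split between the two species (an assumption of this sketch).
per_species = expected_unique // SPECIES

# Discovery rate implied by the stated minimum.
minimum_rate = MINIMUM_UNIQUE / TOTAL_ESTS

if __name__ == "__main__":
    print(f"expected unique ESTs: {expected_unique}")      # 19800
    print(f"per species:          {per_species}")          # 9900
    print(f"minimum rate:         {minimum_rate:.0%}")     # 40%
```

The result, 19,800 unique ESTs, is consistent with the "up to 20,000" figure, and the 12,000 minimum corresponds to a 40% discovery rate.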
 

Aim 4. To make all sequence data, clones, and libraries promptly available for wide use in the scientific community. All validated sequences will be annotated and submitted to GenBank (dbEST) on a weekly basis.
 


Project activities and findings – Year 1
 

cDNA Library Construction.

We have generated a non-normalized (starter [GC0, see Fig. 1B]) and a normalized (GC1) cDNA library from the dinoflagellate Alexandrium tamarense. Sequencing of 3’-ends from 5013 clones has resulted in the identification of 3628 unique cDNAs. A web site has been set up to publish this information and to facilitate data exchange among the project members (http://genome.uiowa.edu/projects/dinoflagellate/). The most current EST cluster report is shown below (Fig. 1A), as is the gene discovery (i.e., novelty) rate (Fig. 1B).
As is apparent, A. tamarense does not have a highly complex genome. There are very few highly represented cDNA classes (e.g., cluster sizes of 22 and 25 are peridinin-binding protein and basic nuclear protein, respectively) and most cDNAs are represented by a single transcript. We are currently at a gene discovery rate of 75.63% for the normalized library, GC1, and are in the process of sequencing 3,200 more randomly chosen ESTs from this clone pool. The starter library, GC0, stands at a gene discovery rate of 78.05%, and we will sequence several hundred more clones from this pool.
Given our present results, we expect to have about 2000 more unique ESTs by mid-May. At that point (a total of ca. 5500 – 6000 unique ESTs), we will most likely do a subtraction of the dinoflagellate GC1 library, using as driver the complete uni-gene cDNA set that we have identified by then. Sequencing of the subtracted library is expected to result in the final unique cDNA set for A. tamarense. We are not certain whether that final count will be 10,000 ESTs or substantially less. If this dinoflagellate is a typical protist (as now appears to be the case), then it may not be realistic to expect 10,000 genes in this species.
In any case, the sequencing of the 3,200 clones that is now in progress will give us important insights into how many more unique ESTs we can expect to find, based on the behavior of the novelty rate and the observation that most cDNAs are unique sequences (Fig. 1A).
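To make the behavior of the novelty rate concrete, here is a minimal Python sketch. It assumes the novelty rate is the number of distinct clusters seen so far divided by the number of ESTs sequenced so far; this report does not spell out the exact definition behind Fig. 1B, and the function name and cluster labels are illustrative.

```python
def novelty_curve(cluster_ids):
    """Cumulative novelty rate after each EST, in sequencing order.

    cluster_ids: the cluster each EST was assigned to, one entry per EST.
    Returns a list whose n-th entry is (distinct clusters seen) / (ESTs sequenced).
    """
    seen = set()
    curve = []
    for n, cluster_id in enumerate(cluster_ids, start=1):
        seen.add(cluster_id)
        curve.append(len(seen) / n)
    return curve


if __name__ == "__main__":
    # Toy run: the third EST falls into an already known cluster.
    print(novelty_curve(["c1", "c2", "c1", "c3"]))

    # Pooled figures from this report: 3628 unique cDNAs from 5013 clones.
    print(f"pooled rate: {3628 / 5013:.1%}")  # about 72.4%
```

A curve that stays high as more clones are sequenced suggests many unique ESTs remain in the pool; a curve that drops quickly suggests the library is close to exhausted. The pooled figure combines both libraries and so need not match the per-library rates quoted above.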
We will start work on the Emiliania huxleyi cDNA libraries in June 2003. We have purchased from CCMP a frozen pellet derived from a 15 L culture of a calcifying strain of E. huxleyi. We plan to have a significant number of sequences completed from this library by the end of Summer or Fall 2003 (e.g., ca. 3000 – 5000 ESTs).

 

A) Cluster report

Cluster size    Frequency
 1              2849
 2               537
 3               117
 4                55
 5                27
 6                15
 7                 7
 8                 3
 9                 6
10                 2
11                 3
12                 1
13                 1
14                 3
22                 1
25                 1

Total # sequences = 5013
Total # redundant = 1385
Total # clusters = 3628

B) [Novelty rate plot]

Figure 1. Summary of the EST sequencing data for Alexandrium tamarense. A) Cluster report showing the frequency of the different cluster sizes found in the random sequencing of the non-normalized and normalized cDNA libraries. Note that the highest frequency of cDNAs (2849) is in the class of unique sequences. B) Novelty rate for the normalized cDNA library. The blue, broken line represents the discovery rate for this library.
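The totals in the cluster report follow directly from the size and frequency columns, as the Python sketch below shows. It assumes that "redundant" means total sequences minus total clusters, which reproduces the reported figure; the variable names are illustrative.

```python
# Cluster size -> number of clusters of that size, from the cluster report (Fig. 1A).
CLUSTER_REPORT = {
    1: 2849, 2: 537, 3: 117, 4: 55, 5: 27, 6: 15, 7: 7, 8: 3,
    9: 6, 10: 2, 11: 3, 12: 1, 13: 1, 14: 3, 22: 1, 25: 1,
}

# Each cluster counts once as a unique cDNA.
total_clusters = sum(CLUSTER_REPORT.values())

# A cluster of size s contributes s sequences.
total_sequences = sum(size * freq for size, freq in CLUSTER_REPORT.items())

# Assumed definition: sequences beyond the first member of each cluster.
total_redundant = total_sequences - total_clusters

# Share of clusters that are singletons (cluster size 1).
singleton_fraction = CLUSTER_REPORT[1] / total_clusters

if __name__ == "__main__":
    print(total_sequences, total_redundant, total_clusters)  # 5013 1385 3628
    print(f"singleton clusters: {singleton_fraction:.1%}")
```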


Analysis of the Dinoflagellate EST Set.
The 3628 unique cDNA sequences from A. tamarense have already provided remarkable new insights into dinoflagellate evolution and endosymbiotic gene transfer. The most significant is the finding of a split cox2 gene in A. tamarense, which is typical of apicomplexans and chlorophycean green algae (Funes et al. 2002) and provides strong evidence for a green algal gene transfer (Fig. 2A). Phylogenetic analysis of the nuclear-encoded plastid-targeted tufA and other genes in A. tamarense is weakly consistent with the presence of “green” genes in dinoflagellates (Fig. 2B). In addition, we have found many of the “missing” dinoflagellate plastid genes in the nuclear genome of A. tamarense. Thirty-eight plastid-targeted (i.e., containing an N-terminal targeting sequence) nuclear-encoded genes have been found in the A. tamarense EST data set. These are the non-minicircle-encoded photosynthetic genes that we postulated in the grant to have been transferred to the nucleus of dinoflagellates. Our results confirm this hypothesis. This work is provisionally accepted in Current Biology (see below). A comparative genomics analysis of the now 6,480 unique ESTs is being prepared for submission to Genome Research by early May 2004.
 


   
Figure 2. Neighbor-joining trees of COXII (A) and tufA (B) using amino acid distances calculated with the JTT + Γ model. The results of 2000 bootstrap replicates using the JTT + Γ model with neighbor-joining and 100 bootstrap replicates using this model and the protein maximum likelihood method are shown above and below the branches, respectively (only bootstrap values >50% are indicated). The dashes indicate groups that were not present in the maximum likelihood bootstrap consensus tree. The thick branches are supported with >95% Bayesian posterior probability (WAG + Γ model). Members of the green algae and land plants are shown in green text, whereas apicomplexans are in red, the dinoflagellate is in brown, the outgroup species are in black, and all other taxa are in blue.


  Migration of the Plastid Genome to the Nucleus in a Peridinin Dinoflagellate  
  Jeremiah D. Hackett1, Hwan Su Yoon1, M. Bento Soares2,3, Maria F. Bonaldo2, Thomas L. Casavant4, Todd E. Scheetz5, Tetyana Nosenko1, and Debashish Bhattacharya1,*  
  1Department of Biological Sciences and Center for Comparative Genomics, 2Department of Pediatrics, 3Departments of Biochemistry, Orthopaedics, Physiology, and Biophysics, 4Department of Electrical and Computer Engineering, 5Department of Ophthalmology and Center for Bioinformatics and Computational Biology.  
  The University of Iowa, Iowa City, Iowa 52242, United States.
*Author for correspondence
 
 
Abstract: Dinoflagellate algae are important primary producers and of significant ecological and economic impact because of their ability to form "red tides" [1]. They are also models for evolutionary research because of an unparalleled ability to capture photosynthetic organelles (plastids) through endosymbiosis [2]. The location and extent of the plastid genome in the dominant peridinin-containing dinoflagellates remain, however, two of the most intriguing issues in plastid evolution. The plastid genome in these taxa is reduced to single-gene minicircles [3, 4] encoding an incomplete (until now 15) set of plastid proteins. The location of the remaining photosynthetic genes is unknown. We generated a data set of 6,480 unique expressed sequence tags (ESTs) from the toxic dinoflagellate Alexandrium tamarense (for details, see Experimental Procedures in the Supplemental Data) to find the missing plastid genes and to understand the impact of endosymbiosis on genome evolution. Here we identify 48 of the non-minicircle-encoded photosynthetic genes in the nuclear genome of A. tamarense, accounting for the majority of the photosystem. Fifteen genes that are always found on the plastid genome of other algae and plants have been transferred to the nucleus in A. tamarense. The plastid-targeted genes have red and green algal origins. These results highlight the unique position of dinoflagellates as the champions of plastid gene transfer to the nucleus among photosynthetic eukaryotes.
 

Data Release.
The EST data from A. tamarense has been clustered and is in a form that can be released. It is being released at a rate of 500 ESTs per week until all have been made public.
 

Related Projects.
In related work, we have started a collaborative project with computer scientist Sriram Pemmaraju at the Department of Computer Science at the University of Iowa to develop a program for detecting gene transfer in genomic data sets. The project is led by grant-supported graduate student Shenglan Li in the PI’s lab, who has a BS in computer science from the University of Iowa. The goal of this program is to identify a chosen set of homologous gene sequences (e.g., ATP synthase genes) among different genomic data sets, align them, and conduct rigorous phylogenetic analyses to test for departure from the null expectation of vertical evolution of the genes among the species under study. The program (to run under Windows) will incorporate the most modern and freely available programs for doing sequence homology searches (FASTA, BLAST), sequence alignment (CLUSTALX), and phylogenetic analysis (Phylip, MrBayes), and will integrate a tree comparison metric (e.g., Robinson-Foulds) to assess the probability that particular query genes qualify for endosymbiotic gene transfer. The novelty of our approach will be to automate various decisions made by biologists working manually with genomic data, thereby substantially speeding up the process of identifying horizontal gene transfer candidates. This will require that the program be trained to deal with the complexity of biological data (e.g., duplicated genes, introns in coding regions) and to make intelligent decisions to filter out spurious information such as alignments misled by artifactual sequence matches. The program will substantially aid our genomics program and form the framework for future grant applications to do EST sequencing and analysis.
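The Robinson-Foulds metric mentioned above counts the bipartitions (splits) that appear in one tree but not in the other; a gene tree that departs from the expected species tree scores above zero. The following is a minimal Python sketch, not the planned program. It assumes unrooted trees encoded as nested tuples of leaf names, an encoding chosen here for illustration, and the taxon labels A to E are placeholders.

```python
def leaf_set(tree):
    """All leaf names in a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tree, each stored as the side
    that does not contain a fixed reference leaf."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Normalize so the same split is stored the same way however the tree is rooted.
        side = all_leaves - clade if reference in clade else clade
        # Skip trivial splits (a single leaf versus all the rest) and the root.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    species_tree = ((("A", "B"), "C"), ("D", "E"))   # expected vertical history
    same_topology = (("A", "B"), ("C", ("D", "E")))  # same tree, drawn differently
    gene_tree = ((("A", "D"), "C"), ("B", "E"))      # conflicting gene tree
    print(robinson_foulds(species_tree, same_topology))  # 0
    print(robinson_foulds(species_tree, gene_tree))      # 4
```

A nonzero distance alone does not establish gene transfer, since weakly supported branches also produce conflicting splits; in practice the comparison would be combined with bootstrap or posterior support such as that shown in Figure 2.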
 

References.
Bonaldo MF, Lennon G, Soares MB (1996) Normalization and subtraction: two approaches to
  facilitate gene discovery. Genome Research 6: 791–806
Funes S, Davidson E, Reyes-Prieto A, Magallón S, Herion P, King MP, González-Halphen D
  (2002) A green algal apicoplast ancestor. Science 298: 2155
Kohler S, Delwiche CF, Denny PW, Tilney LG, Webster P, Wilson RJ, Palmer JD, Roos DS
  (1997) A plastid of probable green algal origin in Apicomplexan parasites. Science 275:
  1485–1489
Palmer JD (2003) The symbiotic birth and spread of plastids: how many times and whodunit?
  Journal of Phycology 39: 4–12
Soares MB (1997) Identification and cloning of differentially expressed genes. Current Opinion
  in Biotechnology 8: 542–546
