AN EST APPROACH TO UNDERSTANDING ENDOSYMBIOTIC GENE TRANSFER
NSF-MCB 02-36631
 

INTRODUCTION: This grant has 4 specific aims that are described below.

Aim 1. To construct individually tagged non-normalized and normalized cDNA libraries from the dinoflagellate alga, Alexandrium tamarense, and the haptophyte alga, Emiliania huxleyi.
Starter libraries will be constructed from exponentially growing uni-algal cultures of Alexandrium and Emiliania (available from the CCMP culture collection). A total of two non-normalized and two normalized libraries will be constructed from algal cDNA. Each library will be tagged with a unique 10 bp identifier, which makes identification of the source organism readily apparent.
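
As an illustration of how a 10 bp library tag lets a read be traced back to its source library, here is a minimal Python sketch. The tag sequences are hypothetical (the report does not list the real identifiers), and the sketch assumes the tag can simply be searched for anywhere in the read; the library names GC0/GC1 and HG0/HG1 are the ones used later in this report.

```python
# Hypothetical 10 bp tags: the real identifiers are not given in the report.
LIBRARY_TAGS = {
    "ACGTTGCAAG": "GC0 (A. tamarense, non-normalized)",
    "TTGACCGTAC": "GC1 (A. tamarense, normalized)",
    "GGATCCTAAC": "HG0 (E. huxleyi, non-normalized)",
    "CATGGTTCAG": "HG1 (E. huxleyi, normalized)",
}


def assign_library(read, tags=LIBRARY_TAGS):
    """Return the library whose tag occurs in the read.

    Returns None when no tag, or more than one tag, is found, so that
    ambiguous reads are never assigned to a library.
    """
    read = read.upper()
    hits = [library for tag, library in tags.items() if tag in read]
    return hits[0] if len(hits) == 1 else None


# Toy read: vector-like prefix, a tag, a poly(T) stretch, then insert sequence.
toy_read = "GCGGCCGC" + "TTGACCGTAC" + "TTTTTTTTTTTT" + "AGCTAGCTAG"
print(assign_library(toy_read))
print(assign_library("AGCTAGCTAGCTAGCT"))
```

Returning None for reads with zero or multiple tag hits is a conservative choice for a mixed library; a real pipeline would more likely look for the tag at a fixed position relative to the cloning site.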

 

Aim 2.
To generate 2 serially subtracted normalized libraries increasingly enriched for the mRNAs of the complex frequency class (rare mRNAs). The subtracted libraries will be derived from a complex library mixture comprising the 2 normalized libraries constructed in specific aim 1, by an iterative process that we have developed and named Serial Subtraction of Normalized Libraries.
 

Aim 3.
To submit to GenBank 30,000 3' ESTs derived from the starting normalized and the serially subtracted libraries constructed in specific aims 1 and 2, and to conduct clustering analysis of the EST data to identify up to 20,000 unique ESTs (about 10,000 from each species [based on an expected 66% discovery rate]). At a minimum, we expect to isolate 12,000 unique ESTs from the 30,000 sequences. These predictions are realistic goals based on our extensive experience utilizing serial subtraction for a rat EST discovery project (see http://ratest.eng.uiowa.edu/).
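
The figures in Aim 3 can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated above; the variable names are illustrative.

```python
TOTAL_ESTS = 30_000              # 3' ESTs to be submitted (Aim 3)
EXPECTED_DISCOVERY_RATE = 0.66   # expected fraction of ESTs that are unique
MINIMUM_UNIQUE = 12_000          # minimum number of unique ESTs expected

# Expected number of unique ESTs at a 66% discovery rate.
expected_unique = round(TOTAL_ESTS * EXPECTED_DISCOVERY_RATE)

# Discovery rate implied by the stated minimum.
minimum_rate = MINIMUM_UNIQUE / TOTAL_ESTS

print(f"Expected unique ESTs: {expected_unique}")
print(f"Per species (two species): {expected_unique // 2}")
print(f"Rate implied by the minimum: {minimum_rate:.0%}")
```

At a 66% discovery rate the 30,000 sequences yield 19,800 unique ESTs, consistent with "up to 20,000", and the stated minimum of 12,000 corresponds to a 40% discovery rate.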
 

Aim 4.
To make all sequence data, clones, and libraries promptly available for wide use in the scientific community. All validated sequences will be annotated and submitted to GenBank (dbEST) on a weekly basis.
 

Project activities and findings in Year 2

 
Alexandrium tamarense cDNA Library Construction.
We have finished sequencing the non-normalized (starter) and normalized cDNA libraries from the dinoflagellate Alexandrium tamarense. Thus far we have 3'-sequence from 6,480 unique clusters out of 10,632 completed sequences. This gives a novelty rate of 60.95%. We are in the process of subtracting this library and will continue sequencing until the gene discovery rate falls to about 50%. We hope to generate ca. 8,000 - 9,000 unique clusters from this species by the end of Fall 2004. A web site has been set up to publish this information and to facilitate data exchange among the project members (http://genome.uiowa.edu/projects/dinoflagellate/). The most current EST cluster report is shown below (Fig. 1A), as is the gene discovery (i.e., novelty) rate (Fig. 1B). There are several highly represented cDNA classes (e.g., the clusters of size 29 and 45 are luciferin-binding protein and major basic nuclear protein, respectively) and most cDNAs (4,498) are represented by a single transcript. We are currently at a gene discovery rate of 60.95% for the normalized library, GC1, and are in the process of sequencing 2,000 more randomly chosen ESTs from this clone pool. The starter library, GC0, stands at a gene discovery rate of 78.05% and we are sequencing several hundred more clones from this pool. Sequencing of the subtracted library is expected to result in the final unique cDNA set for A. tamarense. We are not yet certain whether that final count will be 10,000 ESTs or somewhat less.
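
The rule of sequencing until the discovery rate falls to about 50% can be sketched in Python as follows. This is an illustration only: it assumes the novelty rate is the cumulative ratio of unique clusters to sequences read, that each sequence already carries a cluster ID from the clustering step, and it uses toy data rather than project data.

```python
def cumulative_novelty(cluster_ids):
    """Yield (n_sequences, novelty_percent) after each sequence is added.

    Novelty is taken here as unique clusters seen so far divided by
    sequences read so far (an assumption about how the rate is defined).
    """
    seen = set()
    for n, cluster_id in enumerate(cluster_ids, start=1):
        seen.add(cluster_id)
        yield n, 100.0 * len(seen) / n


def sequences_until_threshold(cluster_ids, threshold=50.0):
    """Return how many sequences were read when novelty first reaches the
    threshold or drops below it; None if that never happens."""
    for n, novelty in cumulative_novelty(cluster_ids):
        if novelty <= threshold:
            return n
    return None


# Toy data: cluster assignments in sequencing order (not project data).
toy_run = ["c1", "c2", "c1", "c3", "c1", "c2", "c1", "c1"]
for n, novelty in cumulative_novelty(toy_run):
    print(f"{n} sequences: novelty {novelty:.1f}%")
print("Stop after", sequences_until_threshold(toy_run), "sequences")
```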
 
-------------------------------------
A
Cluster size   Frequency   Annotation
-------------------------------------
1              4498
2              1187
3              398
4              161
5              74
6              43
7              29
8              18
9              14
10             16
11             10
12             6
13             4
14             5
15             2
16             4
17             1
18             1
19             2
22             1
23             2           peridinin-chlorophyll a protein
24             2           atpH, unknown
29             1           luciferin-binding protein
45             1           major basic nuclear protein
-------------------------------------
Total # sequences = 10632
Total # redundant = 4152
Total # clusters = 6480
With respect to this cluster file only:
Novelty = 60.95%
-------------------------------------

Figure 1. Summary of the EST sequencing data for Alexandrium tamarense. A) Cluster report showing the frequency of the different cluster sizes found in the random sequencing of the non-normalized and normalized cDNA libraries. Note that the highest frequency of cDNAs (4,498) is in the class of unique sequences. B) Novelty rate for the normalized cDNA library. The blue, broken line represents the discovery rate for this library.
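
The summary totals in the A. tamarense cluster report follow directly from the size/frequency pairs. The Python sketch below recomputes them, taking novelty to be clusters divided by sequences, which matches the reported 60.95%; the function and variable names are illustrative.

```python
# Cluster size -> number of clusters of that size, copied from the
# A. tamarense cluster report.
CLUSTER_HISTOGRAM = {
    1: 4498, 2: 1187, 3: 398, 4: 161, 5: 74, 6: 43, 7: 29, 8: 18,
    9: 14, 10: 16, 11: 10, 12: 6, 13: 4, 14: 5, 15: 2, 16: 4,
    17: 1, 18: 1, 19: 2, 22: 1, 23: 2, 24: 2, 29: 1, 45: 1,
}


def summarize(histogram):
    """Return (total_sequences, redundant, total_clusters, novelty_percent)."""
    total_clusters = sum(histogram.values())
    # A cluster of size k accounts for k sequences.
    total_sequences = sum(size * count for size, count in histogram.items())
    # Every sequence beyond the first in a cluster is redundant.
    redundant = total_sequences - total_clusters
    novelty = 100.0 * total_clusters / total_sequences
    return total_sequences, redundant, total_clusters, novelty


total_sequences, redundant, total_clusters, novelty = summarize(CLUSTER_HISTOGRAM)
print(f"Total # sequences = {total_sequences}")
print(f"Total # redundant = {redundant}")
print(f"Total # clusters = {total_clusters}")
print(f"Novelty = {novelty:.2f}%")
```

The same function applied to the E. huxleyi size/frequency pairs should give that report's totals in the same way.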


Analysis of the Dinoflagellate EST Set.
The 6,480 unique cDNA sequences from A. tamarense have provided remarkable new insights into dinoflagellate evolution and endosymbiotic gene transfer. The most significant of these have been published in a manuscript in Current Biology (see below). In addition, a detailed characterization of the existing EST set is being prepared for submission to Genome Biology within one month, and a commissioned review on dinoflagellates that focuses on their genome(s) has been accepted for publication in a special issue of the American Journal of Botany. In the past 6 months we have placed great emphasis on publishing our analyses of the dinoflagellate ESTs because of high competition in this area, and we will complete the sequencing portion of our grant this Summer and Fall. This strategy has delayed the raw sequencing work but has significantly improved our understanding of dinoflagellates and has made possible important new collaborations and outreach on functional genomics with Alexandrium. To ensure, however, that we fully reach our aims, we will likely request a 1-year no-cost extension of the grant. We are certain that the sequencing of the subtracted Alexandrium library will be completed by the end of Fall 2004.
 


Hackett, J.D., H.S. Yoon, M.B. Soares, M.F. Bonaldo, T.L. Casavant, T.E. Scheetz, T. Nosenko, and D. Bhattacharya. 2004. Migration of the plastid genome to the nucleus in a peridinin dinoflagellate. Curr. Biol. 14:213-218.

Hackett, J.D., H.S. Yoon, M.B. Soares, M.F. Bonaldo, T.L. Casavant, T.E. Scheetz, and D. Bhattacharya. EST analysis of the toxic dinoflagellate Alexandrium tamarense. Genome.

Hackett, J.D., D.M. Anderson, D. Erdner, and D. Bhattacharya. Accepted. Dinoflagellates: a remarkable evolutionary experiment. Am. J. Bot. Special Issue, "The Plant Tree of Life" (eds. J.D. Palmer, M. Chase, and D. Soltis).

 
   
Abstract (Current Biology)
Dinoflagellate algae are important primary producers and of significant ecological and economic impact because of their ability to form "red tides". They are also models for evolutionary research because of an unparalleled ability to capture photosynthetic organelles (plastids) through endosymbiosis. The nature and extent of the plastid genome in the dominant peridinin-containing dinoflagellates remain, however, two of the most intriguing issues in plastid evolution. The plastid genome in these taxa is reduced to single-gene minicircles encoding an incomplete (until now 15) set of plastid proteins. The location of the remaining photosynthetic genes is unknown. We generated a data set of 6,480 unique expressed sequence tags (ESTs) from the toxic dinoflagellate Alexandrium tamarense to find the missing plastid genes and to understand the impact of endosymbiosis on genome evolution. Here we identify 48 of the non-minicircle-encoded photosynthetic genes in the nuclear genome of A. tamarense, accounting for the majority of the photosystem. Fifteen genes that are always found on the plastid genome of other algae and plants have been transferred to the nucleus in A. tamarense. The plastid-targeted genes have red and green (see Fig. 2) algal origins. These results highlight the unique position of dinoflagellates as the champions of plastid gene transfer to the nucleus among photosynthetic eukaryotes.
Figure 2. Phylogenetic analysis of the green algal mitochondrial cox2 in dinoflagellates that likely originated through lateral gene transfer. The tree of highest likelihood identified in the posterior distribution of post burn-in trees in the Bayesian inference of cox2 is shown with the results of neighbor joining (500 replicates) and maximum parsimony (2000 replicates) bootstrap analyses shown above and below the branches, respectively. Thick branches indicate > 95% posterior probability (from Bayesian analysis) for groups to the right. Taxon abbreviations are the same as in Figure 2, with the addition of chlorophycean green algae (=Green chl.), non-chlorophycean green algae and land plants (=Green non-chl.), and Acanthamoebidae (=Acanth.).

Background (Genome Biology)
The dinoflagellates are a diverse group of protists that play an important role in marine ecosystems and have a significant impact on coastal waters through the production of toxic algal blooms, or "red tides." These organisms also have unique aspects of nuclear biology and a complicated plastid evolutionary history. Until recently, little genetic information was available for these organisms. We constructed and sequenced clones from non-normalized and normalized cDNA libraries from the toxic dinoflagellate Alexandrium tamarense to gain insights into the nuclear gene content of dinoflagellates.
Results
We generated an EST database of 6,480 unique tags from A. tamarense. We were able to putatively annotate approximately 20% of the ESTs using BLAST searches against GenBank. We have identified several putative dinoflagellate-specific mRNAs, including one potential plastid protein. Analyses of these ESTs show that dinoflagellate genes, similar to those of other eukaryotes, have a high G+C content that is reflected in the amino acid codon usage. Highly represented transcripts in our library include those encoding histone-like proteins and luciferin-like proteins. Several genes are present in gene families that encode virtually identical proteins.
 
Conclusion
This collection of ESTs is the most extensive genomic resource for a toxic dinoflagellate species to date and provides a glimpse into the genome of this toxic dinoflagellate. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for understanding the method of toxin production in this species. It also appears that we have not yet exhausted the gene discovery potential of this library.
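
To illustrate how G+C content and its effect on codon usage can be measured, the Python sketch below computes the overall G+C fraction of a sequence and the G+C fraction at third codon positions (GC3). It is not the analysis used in the manuscript: it assumes an in-frame coding sequence, whereas a 3' EST would first need its reading frame determined, and the sequence shown is a toy example.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_content(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    # Positions 3, 6, 9, ... of the sequence (0-based indices 2, 5, 8, ...).
    return gc_content(cds[2::3])


# Toy in-frame coding sequence (not project data): ATG GCC GAG CTG AAG TAA
toy_cds = "ATGGCCGAGCTGAAGTAA"
print(f"G+C content: {gc_content(toy_cds):.2f}")
print(f"GC3 content: {gc3_content(toy_cds):.2f}")
```

A GC3 value well above the overall G+C fraction is one common sign that base composition is shaping synonymous codon choice.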


 
Data Release.
The existing EST data from A. tamarense has been clustered and released to dbEST (http://www.ncbi.nlm.nih.gov/dbEST/) in GenBank.


 
Emiliania huxleyi cDNA Library Construction.
We have initiated sequencing of the non-normalized (starter) and normalized cDNA libraries from the haptophyte Emiliania huxleyi. Thus far we have 3'-sequence from 2,188 unique clusters out of 3,749 completed sequences. This gives a novelty rate of 58.36%. We are in the process of sequencing several thousand clones from both the starter and normalized libraries and will generate up to 6,000 unique ESTs prior to subtraction in Fall 2004. We hope to generate ca. 8,000 - 9,000 unique clusters from this species by the end of Fall 2004 or early Winter 2005 and therefore will request a one-year no-cost extension to this grant. A web site has been set up to publish this information and to facilitate data exchange among the project members (http://genome.uiowa.edu/projects/dinoflagellate/). The most current EST cluster report is shown below (Fig. 3A), as is the gene discovery (i.e., novelty) rate (Fig. 3B).
 
-------------------------------------
A
Cluster size   Frequency
-------------------------------------
1              1471
2              410
3              156
4              65
5              31
6              21
7              9
8              5
9              2
10             3
11             3
12             2
13             1
16             1
17             2
18             1
20             1
25             1
26             1
43             1
46             1
-------------------------------------
Total # sequences = 3749
Total # redundant = 1561
Total # clusters = 2188
With respect to this cluster file only:
Novelty = 58.36%
-------------------------------------

Figure 3. Summary of the EST sequencing data for Emiliania huxleyi. A) Cluster report showing the frequency of the different cluster sizes found in the random sequencing of the non-normalized (HG0) and normalized (HG1) cDNA libraries. B) Novelty rate for the normalized cDNA library. The blue, broken line represents the discovery rate for this library.

Data Release.
The existing EST data from E. huxleyi has been clustered and will be released to dbEST (http://www.ncbi.nlm.nih.gov/dbEST/) in GenBank at the end of Fall 2004 after the first genome-level publication of this data set has been submitted for review.