Eukaryotic Cell, February 2002, p. 44-55, Vol. 1, No. 1
1535-9778/02/$04.00+0 DOI: 10.1128/EC.01.1.44-55.2002
Copyright © 2002, American Society for Microbiology. All Rights Reserved.
Department of Biological Sciences, Delaware State University, Dover, Delaware 19901,¹ Laboratory of Gene Regulation and Development, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892²
Received 26 September 2001/ Accepted 9 November 2001
| INTRODUCTION |
|---|
A wealth of information has been generated that documents the biochemical behavior of the INs of retroviruses and LTR retrotransposons under in vitro conditions (1). However, much less is known about the integration sites that LTR retroelements choose when influenced by the full spectrum of host factors. Ultimately, the impact of host factors on integration may best be observed in vivo. The selection of insertion sites in the genome of the host is of the highest importance. The disruption of coding sequences in the host genome can reduce the fitness of the host and, as a result, lower the ability of the transposon to propagate. It is therefore critical that a balance be struck between the fitness of the host and the ability of the transposon to integrate into the host genome.
The integration of the Ty elements into the genome of Saccharomyces cerevisiae serves as a model for how LTR elements populate a host genome without disrupting coding sequences. Perhaps the most specific targeting mechanism is that of Ty3. This element inserts one to four nucleotides upstream of polymerase III (pol III) promoter initiation sites, such as those responsible for tRNA transcription (11). This strategy allows Ty3 to amplify its copy number without risk of damaging coding sequences. In recent work, this mechanism was attributed to interactions with transcription factor IIIB (22, 41). In similar types of analyses, Ty1 was found to integrate within a window of 75 to 700 nucleotides upstream of the start sites of pol III promoters (14, 19). Here, too, this strategy avoids the disruption of coding sequences, since this integration window is gene poor (E. C. Bolton and J. D. Boeke, unpublished data). It is interesting that although Ty1 and Ty3 are unrelated transposons, they have converged on the same type of target selection. On the other hand, both Ty1 and Ty5 are members of the copia family, yet they use very different targeting mechanisms. Ty5 specifically inserts into regions of silent chromatin, such as telomeres and the silent sequences of the mating type cassette (43, 44). This targeting behavior is due to the ability of the IN to recognize the SIR complex of silent chromatin (42). The insertion of Ty5 into regions of silent chromatin is yet another effective method for avoiding the disruption of important host genes.
Tf1 is an active LTR retrotransposon found in the genome of the fission yeast Schizosaccharomyces pombe. Tf1 contains a single open reading frame encoding 1,331 amino acids that constitute the capsid, PR, RT, and IN proteins (3, 28, 29). These proteins are functional and are required for efficient transposition in vivo (2, 25-27). Analysis of the conserved residues in RT has demonstrated that Tf1 belongs to the gypsy group of LTR retroelements, as does Ty3 of S. cerevisiae (15, 39, 40). The similarity of Tf1 to Ty3 suggests the interesting possibility that Tf1 may also integrate into specific types of target sites.
The availability of a transposition system in S. pombe provides the unique opportunity to compare the interactions between LTR retrotransposons and the host genome in two yeasts that diverged 10⁹ years ago (38). This is a particularly important comparison, because much of what is known about the selection of target sites by LTR retroelements is derived from studies of S. cerevisiae. An understanding of target selection in S. pombe will reveal whether the types of target selection in S. cerevisiae are general. However, not much is known about the target site preferences of Tf1. What is known is the result of an analysis of 27 insertion events that were identified by screening for cells of S. pombe that become resistant to G418 due to the insertion of Tf1 tagged with neo (Tf1-neo) (6). Behrens et al. (6) reported an interesting pattern of integration into intergenic sequences near the 5′ ends of genes. Although this is an intriguing observation, the technique used to isolate the insertion events did not guard against the possibility that transposition might specifically target regions of the genome that would block the expression of neo. This situation would bias the detection of target sites. In fact, what is known about the insertion preferences of LTR retrotransposons is the result of studies that rely on identification within the host genome of the insertion of transposons tagged with genetic markers. Should transposition occur in a class of target sites that block the transcription of the marker gene, these sites would go undetected.
We report a genome-wide study of Tf1 transposition in vivo that was designed to avoid any biases due to the transcription environment of the target sites. Our results confirmed the preference of Tf1 integration for intergenic sequences that was reported by Behrens et al. (6). Interestingly, 8 of the 51 insertion sites were isolated multiple times from genetically independent cultures, suggesting that specific sites in intergenic regions are targeted by Tf1. Perhaps the most significant result was that the analysis of our data and the reexamination of those reported by Behrens et al. (6) led to the surprising observation that Tf1 has a significant preference for integration into chromosome 3.
| MATERIALS AND METHODS |
|---|
Construction of strains and plasmids. The yeast and bacterial strains and plasmids used in this study are described in Table 1. The oligonucleotides used are shown in Table 2. Yeast strain YHL6488, derived from strain YHL5661, is a diploid strain carrying the assay plasmid pTS1559-9. Plasmid pHL411, containing Tf1 under the control of an nmt1 promoter, was constructed as described previously (28) and was the backbone of all assay plasmids. PCR primers HL423 and HL424 produced an 850-bp product that contained the bacterial p15A origin of replication (ori) amplified from pACYC184 (36). The 850-bp product was digested with BamHI/BglII and ligated into a unique BglII site of pHL411 in a noncoding region of Tf1 in both orientations, generating pHL1555-7 and pHL1555-4. pHL1558-14 and pHL1558-10 were generated by insertion of a 1-kb neo-containing BamHI fragment upstream of the ori gene into the BglII site of pHL1555-4 in the direct and reverse orientations, respectively. pHL1559-14 and pHL1559-9 were generated by insertion of a 1-kb neo-containing BamHI fragment downstream of the ori gene into the BglII site of pHL1555-7 in the direct and reverse orientations, respectively.
Isolation of DNA with insertion sites. Large patches of cells (YHL6488) were grown for 4 days on agar containing EMM and dropout mix but lacking vitamin B1. These patches were then replica printed onto plates containing EMM and 5-FOA. After 2 days of growth on EMM-5-FOA plates, the cells were scraped off and inoculated into separate 1-liter cultures of 5-FOA-EMM containing uracil and leucine but lacking vitamin B1. These cultures were allowed to reach an optical density at 600 nm of 2.0. Cells were collected at 4,000 rpm for 10 min and resuspended in 0.1 M NaCl. After pelleting, cells were resuspended in 20 ml of solution I (1 M sorbitol, 50 mM Na2PO4 [pH 7.5]). Beta-mercaptoethanol (14 mM; Sigma) and 3 mg of Zymolyase (100T; ICN) per g of cells were added and incubated at 30°C for 2 h. Spheroplasts were collected for 10 min at 3,000 rpm and gently resuspended in 20 ml of solution II (50 mM Tris-HCl [pH 8.0], 50 mM EDTA [pH 8.0], 1% Sarkosyl). An RNase A solution (10 mg/ml; 20 µl) was added and incubated for 1 h at 37°C. A proteinase K solution (10 mg/ml; 500 µl) was added and incubated for 1 h at 65°C. Extensive extractions (two chloroform-isoamyl alcohol [24:1], one phenol-chloroform-isoamyl alcohol, and one chloroform-isoamyl alcohol) were gently performed. Genomic DNA was precipitated by using 2x ethanol and was resuspended in 1 mM EDTA-10 mM Tris-HCl buffer.
Gel fractionation of genomic DNA with transposon insertions. The genomic DNA (10 µg) was digested with either BamHI/BglII, SpeI/NheI, or BglII and gel fractionated. Digested DNA was allowed to migrate through a 0.5% low-melting-temperature agarose gel in Tris-acetate-EDTA at 4°C for 16 h at 22 V. Fragments within the 7- to 23-kb size range were excised and purified by using beta-agarase (New England Biolabs). Beta-agarase-digested fragments were allowed to migrate through a second 0.5% low-melting-temperature gel, and fragments within the 7- to 23-kb size range were excised, digested with beta-agarase, and extracted by using phenol-chloroform-isoamyl alcohol (25:24:1).
Recovery and DNA sequencing of Tf1 targets. The restriction-digested and gel-purified DNA (50 ng) was allowed to self-ligate for 16 h at 12°C and was transformed into competent DH10B bacterial cells (17) by electroporation with a BRL electroporation apparatus. Cells were electroporated by using the parameters 1.75 V, 25 µF, and 100 Ω. Transformed cells were selected for kanamycin resistance. Because of the presence of endogenous Tf2 elements and the probability of recombination events between Tf1 and Tf2, kanamycin-resistant colonies were screened by bacterial colony hybridization with an EcoRI fragment that was specific for the Gag sequence of Tf1 (29). Clones were analyzed by digestion with SnaBI and EcoRI to identify intact Tf1 elements and any deletions, rearrangements, or recombination events. Primers JB54.3 (anneals just 3′ to the 5′ LTR) and HL273 (anneals just 5′ to the 3′ LTR) were used to sequence the genomic DNA flanking Tf1-ori/neo. The flanking sequence was then used to search the Sanger Centre S. pombe sequence database to identify the positions of the insertions. The sequence from the database was compared to that of the isolated DNA to determine whether the introduction of Tf1-ori/neo was due to true integration events or recombination into preexisting transposon sequences. Flanking sequence information was also used to design primers for PCR amplification of the empty target site in parental strain YHL5661.
PCR. To avoid problems related to inadvertent mutations that could arise during PCR, assay plasmids were constructed in duplicate from independent PCRs, and the properties of each plasmid were determined in parallel. The high-fidelity enzyme Pfu DNA polymerase (Stratagene) was used for all cloning PCRs. Taq DNA polymerase (Perkin-Elmer Cetus) was used for all other PCRs. The Pfu cycling conditions were 95°C for 5 min with polymerase added; 95°C for 1 min, 54°C for 2 min, and 72°C for 2 min (30 cycles); and an extension cycle at 72°C for 10 min. PCR products were analyzed on a 1% agarose gel in Tris-borate-EDTA.
| RESULTS |
|---|
The existing assay for Tf1 transposition is based on resistance to G418 that is due to the integration of Tf1-neo. Transposition is induced by the overexpression of Tf1 mRNA from a heterologous promoter, nmt1, located on a high-copy-number plasmid (4, 13). Transposition events were generated in this study by using the same system of overexpressing Tf1 mRNA. However, G418 was not used to select for strains of S. pombe that contained transposed copies of Tf1-neo. Instead, bacterial p15A ori was inserted into Tf1-neo adjacent to neo as shown in Fig. 1A. Insertion sites were identified from pools of genomic DNA extracted from cultures of S. pombe that were induced for transposition. To isolate the sites of insertion, the pools of DNA were digested with restriction enzymes and ligated into circles, and the DNA was transformed into bacteria. Colonies that were resistant to kanamycin contained transposed copies of Tf1-ori/neo flanked by the portions of the S. pombe genome that were disrupted by the insertion.
Isolation of target sites for Tf1-ori/neo in the genome of S. pombe. To avoid a bias for insertion into nonessential genes of S. pombe, all transposition events were carried out with YHL6488, a strain of S. pombe that was a stable diploid. Another important feature of this strain was that it contained no endogenous copies of Tf1. The strategy used to generate a collection of target sites relied on screening pools of DNA from cells induced for the transposition of Tf1-ori/neo. The transcription of Tf1-ori/neo was induced in five independent patches of S. pombe. The patches were replica printed onto agar containing 5-FOA, and cells from these plates were grown in liquid medium for two to four divisions. The cells were subsequently extracted to isolate genomic DNA. The pools of DNA were digested with either BamHI/BglII, SpeI/NheI, or BglII alone. Different sets of restriction enzymes were chosen to avoid favoring the isolation of target sites that were fortuitously located near a specific restriction site. The restricted DNAs were subjected to two rounds of agarose gel electrophoresis, and all fragments within a range of 7 to 30 kb were excised and circularized with T4 DNA ligase. The batches of DNA were separately transformed into DH10B, a strain of Escherichia coli that was genetically altered to take up large fragments of DNA. The transformants were selected on medium containing kanamycin to enrich for circularized copies of Tf1-ori/neo containing target sequences. Kanamycin-resistant transformants were screened by colony hybridization by using a probe specific for Tf1 Gag. The initial step in the characterization of each transformant identified by colony hybridization was to determine the restriction patterns produced by SnaBI and EcoRI digestion. The appearance of 3.0- and 3.2-kb SnaBI fragments indicated the integration of an intact copy of Tf1.
The sequences flanking each end of Tf1-ori/neo in the plasmids isolated from the bacteria were determined, and these data were used to identify the positions of the insertions within the genome of S. pombe. The sequences flanking the insertions were also used to evaluate whether the Tf1-ori/neo elements were introduced into the DNA of S. pombe as the result of true transposition events. The sequences flanking Tf1-ori/neo in the plasmids were compared to their counterpart sequences from the Sanger Centre genome database. In all, we found 51 plasmids that resulted from the simple insertion of Tf1-ori/neo and the duplication of five nucleotides of the target site. The five nucleotide duplications are the result of IN-mediated cleavages and demonstrated that the introduction of Tf1-ori/neo into the S. pombe sequence was the result of true transposition events (27).
Integration sites for Tf1-ori/neo. The 51 insertions in Table 3 are listed with their coordinates relative to those of cosmids sequenced for the S. pombe genome project. The coordinates were translated into the chromosome coordinates assigned by the Sanger Centre for the "publication-freeze" version of the genome data. Figure 2 is a plot of the positions of the insertions throughout the genome. Each of the three chromosomes received substantial numbers of insertions, and the positions of the target sites were distributed throughout their lengths. The one exception is that no insertions occurred in the 1.2 Mb of ribosomal DNA (rDNA) repeats located on either end of chromosome 3 (not shown in Fig. 2).
We examined whether the integration sites that we isolated multiple times were in regions more likely to serve as insertion sites than other locations of the genome. One way in which we did this was to compare the positions of the endogenous Tf2 LTR retrotransposons to the sites that we isolated multiple times (Fig. 2). Since Tf2 is closely related to Tf1 and encodes the same IN, the positions of Tf2 elements in the genome are likely to represent sites that are active for Tf1 insertion. Although the integration sites were not immediately adjacent to copies of Tf2, we found that of the eight sites, one was 21 kb (pTS372) and another was 10.6 kb (pTS747) from endogenous copies of Tf2. Another indication that the sites with multiple insertions represent regions with high transposition potential was that seven of the eight were within 76 kb of another site of insertion that was described in this study and in another report (6) (Fig. 2). The average distance from the duplicate sites to the nearest neighboring sites was 54 kb. For these seven sites, the distances from adjacent insertions were 25.3 kb (pTS372), 10.6 kb (pTS747), 29.2 kb (pTS512), 47.9 kb (pTS116), 4.2 kb (pTS118), 9.7 kb (pTS311), and 75.9 kb (pTS910). Most of these distances were significantly shorter than 98 kb, the average of the distances between all the insertions and their nearest neighbors. This analysis suggests that the integration sites that we isolated multiple times were in domains of the genome that were more likely to serve as insertion sites than other regions.
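The comparison of neighbor distances can be checked with a few lines of arithmetic. The sketch below uses only the seven distances and the 98-kb genome-wide average quoted in the paragraph; the variable names are illustrative. Note that the mean of these seven values is lower than the 54 kb quoted for the duplicate sites, which presumably also includes the eighth site, whose distance is not listed here; that reading is an assumption.

```python
from statistics import mean

# Distance (kb) from each of seven duplicate sites to its nearest
# neighboring insertion, as listed in the text.
neighbor_kb = {
    "pTS372": 25.3,
    "pTS747": 10.6,
    "pTS512": 29.2,
    "pTS116": 47.9,
    "pTS118": 4.2,
    "pTS311": 9.7,
    "pTS910": 75.9,
}

# Average nearest-neighbor distance over all insertions, from the text.
GENOME_WIDE_MEAN_KB = 98.0

# Sites whose nearest neighbor is closer than the genome-wide average.
closer_than_average = [
    name for name, kb in neighbor_kb.items() if kb < GENOME_WIDE_MEAN_KB
]
mean_of_seven = mean(neighbor_kb.values())

print(f"{len(closer_than_average)} of {len(neighbor_kb)} sites are below "
      f"{GENOME_WIDE_MEAN_KB:.0f} kb; mean of the seven = {mean_of_seven:.1f} kb")
```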
An analysis of all the insertion sites that we isolated revealed a strong bias for insertion into intergenic regions. Although the coding sequence of S. pombe represents 60.2% of the genome (European Consortium and Sanger Centre, unpublished data), only 1 of the 51 insertions disrupted a coding sequence, as indicated by the data from the Sanger Centre.
An analysis of the intergenic regions disrupted by the insertions revealed a strong preference for sequences between genes that are transcribed in either divergent or tandem directions. While 18 insertions occurred between divergent genes and 32 occurred between tandem genes, none of the insertions occurred between genes transcribed in convergent directions. Given that in the genome of S. pombe 1,299 gene pairs are divergent, 1,302 are convergent, and 2,289 are tandem (European Consortium and Sanger Centre, unpublished), we predicted that based on just these ratios, 13.3 of our inserts should be in divergent regions, 13.8 should be in convergent regions, and 24 should be in tandem regions. The lack of any insertion between a convergent pair of genes was not expected. However, these calculations do not reflect the fact that the average regions between divergent and tandem genes are larger than the average space between convergent genes. If we assume that the average size of the region between divergent genes is 1.34 kb and that the average sizes between tandem and convergent genes are 0.97 and 0.56 kb, respectively, then unbiased insertion into intergenic regions would be expected to produce 18.9 insertions between divergent pairs, 24.0 between tandem pairs, and 7.6 between convergent pairs for the 51 events. Since no insertions were found between convergent genes, the insertion of Tf1-ori/neo demonstrated a strong bias against this type of intergenic region.
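The two expectations described above (proportional to the number of gene pairs, and proportional to pairs weighted by average intergenic length) can be reproduced with the following sketch. It uses only the counts and average sizes quoted in the paragraph; the function name `expected_counts` is hypothetical. The computed values land close to, but not exactly on, the figures quoted in the text, presumably because the published inputs are rounded.

```python
def expected_counts(n_events, weights):
    """Distribute n_events across categories in proportion to their weights."""
    total = sum(weights.values())
    return {name: n_events * w / total for name, w in weights.items()}

N_INSERTIONS = 51

# Number of gene pairs of each orientation in S. pombe (from the text).
gene_pairs = {"divergent": 1299, "convergent": 1302, "tandem": 2289}

# Assumed average intergenic size in kb for each orientation (from the text).
mean_intergenic_kb = {"divergent": 1.34, "tandem": 0.97, "convergent": 0.56}

# Model 1: every gene pair is an equally likely target.
by_pair_count = expected_counts(N_INSERTIONS, gene_pairs)

# Model 2: every kilobase of intergenic sequence is an equally likely target.
intergenic_kb = {k: gene_pairs[k] * mean_intergenic_kb[k] for k in gene_pairs}
by_length = expected_counts(N_INSERTIONS, intergenic_kb)

for name in ("divergent", "tandem", "convergent"):
    print(f"{name:>10}: by pair count {by_pair_count[name]:5.1f}, "
          f"by intergenic length {by_length[name]:5.1f}")
```

Under either model several insertions between convergent pairs would be expected, which is why observing none is notable.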
We examined whether the insertion of Tf1-ori/neo occurred at any specific positions within the intergenic regions. To test whether insertions occurred near the 5′ or 3′ ends of genes, the distance from the integration site to the closest end of a gene was determined. In Fig. 3A these events were plotted upstream of the open reading frame if the insertions were closest to the 5′ end of a gene and downstream of the open reading frame if they were closest to the 3′ end. In this analysis, 74% of the insertions were associated with the 5′ ends of genes. Although the association with the 5′ ends of genes occurred with distances of up to 1.54 kb, there was significant clustering within 300 nucleotides of the start of translation.
The nucleotide compositions at the sites of insertion were analyzed to determine whether Tf1 recognized specific sequence patterns. Figure 3B is a compilation of the five nucleotides that were duplicated during each of the transposition events. Although no consensus sequence existed, the target site duplications were particularly rich in AT. Positions 2, 3, and 4 were 94, 76, and 82% AT, respectively. These levels were higher than the 65% AT content of the genome of S. pombe and suggested that the IN of Tf1 had a bias for insertion sites that were AT rich. In addition, the lack of any G's at position 2 indicated a specific bias against this nucleotide at this position.
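A per-position AT tabulation of this kind can be sketched as follows. The function name is hypothetical, and the four sequences are made-up placeholders used only to exercise the code; they are not the target site duplications compiled in Fig. 3B.

```python
def at_fraction_by_position(duplications):
    """Return the fraction of A or T at each position of equal-length sequences."""
    if not duplications:
        raise ValueError("at least one sequence is required")
    length = len(duplications[0])
    if any(len(seq) != length for seq in duplications):
        raise ValueError("all sequences must have the same length")
    seqs = [seq.upper() for seq in duplications]
    return [
        sum(seq[i] in "AT" for seq in seqs) / len(seqs)
        for i in range(length)
    ]

# Placeholder 5-nucleotide sequences (NOT data from the study).
toy_duplications = ["ATTAC", "GAATT", "CTATA", "TAAAG"]

fractions = at_fraction_by_position(toy_duplications)
for position, fraction in enumerate(fractions, start=1):
    print(f"position {position}: {fraction:.0%} AT")
```

Applied to the real duplications, each positional value would then be compared with the 65% AT content of the genome.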
The integration of Tf1-ori/neo exhibited a significant preference for chromosome 3. The transposon sequences throughout the genome sequence of S. pombe were examined by using the genome data from the Sanger Centre (European Consortium and Sanger Centre, unpublished); it was found that the density of endogenous Tf LTRs was twofold higher in chromosome 3 than in the other two chromosomes (N. J. Bowen and H. L. Levin, unpublished data). We therefore examined whether the enrichment of endogenous Tf sequences in chromosome 3 could be due to a bias in integration. Each insertion reported here was tabulated by the chromosome that was disrupted. These numbers were then normalized to the portions of the chromosomes that are nonrepetitive and associated with pol II transcription. To do this, the 1.2 Mb of rDNA repeats on chromosome 3 and the telomeres were excluded from the calculation. One justification for excluding the rDNA repeats is that this region is not a target for Tf1 integration. In a recent study of Tf1 transposition, 27 insertions were generated (6). When these data were pooled with ours, a total of 78 insertions of Tf1 were characterized. Even though the 1.2 Mb of rDNA repeats constitute 35% of chromosome 3, none of the 25 insertions in chromosome 3 occurred in rDNA repeats. Thus, by considering the insertions reported in our study and only the region of the genome associated with pol II transcription and not the telomeres or rDNA repeats, we found the surprising result that, per unit length, chromosome 3 was significantly more likely to be the target of transposition than either of the other two chromosomes (Table 4). When all the insertions were tabulated, chromosomes 1 and 2 received 3.3 and 3.9 inserts of Tf1-ori/neo per Mb, respectively, while chromosome 3 had 6.7 per Mb. This value represents a frequency of transposition into the DNA of chromosome 3 that was about twofold higher than expected based on its length.
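The "about twofold" figure follows directly from the per-megabase densities quoted above. The sketch below uses only those three densities; the insertion counts and chromosome lengths behind them are in Table 4 and are not reproduced here, and `insertions_per_mb` is a hypothetical helper showing how such densities would be derived.

```python
def insertions_per_mb(n_insertions, length_mb):
    """Density of insertions per megabase of targetable sequence."""
    return n_insertions / length_mb

# Densities of Tf1-ori/neo insertions per Mb, as stated in the text.
density_per_mb = {"chr1": 3.3, "chr2": 3.9, "chr3": 6.7}

# Compare chromosome 3 with the mean of the other two chromosomes.
mean_chr1_chr2 = (density_per_mb["chr1"] + density_per_mb["chr2"]) / 2
fold_enrichment = density_per_mb["chr3"] / mean_chr1_chr2

print(f"chromosome 3 vs. mean of chromosomes 1 and 2: {fold_enrichment:.2f}-fold")
```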
We examined whether the preference for insertion into chromosome 3 could be due to its having a higher proportion of tandem and divergent gene pairs. This was not the case. The fractions of the intergenic sequences in chromosomes 1, 2, and 3 that were located between divergent gene pairs were 26.5, 27, and 27%, respectively. For intergenic sequences located between tandem gene pairs, the fractions in chromosomes 1, 2, and 3 were 47.5, 46.0, and 48.5%, respectively (Wood, personal communication). The results of these tabulations indicate that the gene densities and polarities are nearly equivalent for all three chromosomes.
| DISCUSSION |
|---|
The techniques used in this study were designed to detect sites of insertion whether or not they disrupted essential genes or contained silent chromatin structures. Nevertheless, our collection of 51 insertions showed similarities to a set of 27 insertions of Tf1-neo that were recently reported (6). Behrens et al. (6) identified sites of integration by direct screening of haploid cells of S. pombe for the G418 resistance that resulted from the transposition of Tf1-neo. The collection of Behrens et al. (6) showed a preference for integration into intergenic regions, just as we observed. They also reported a preference for integration into spaces between divergent or tandem gene pairs and found that insertions occurred in intergenic regions that were larger than average. The positions of their insertions were also associated with the 5′ ends of pol II genes, suggesting an interaction with transcription factors. Despite these similarities, there were important differences between the two studies. These include our surprising isolation of duplicate insertions and the unique observation that Tf1 has a preference for integration into chromosome 3.
The association of Ty1 and Ty3 insertions with specific transcription units has been observed. In each instance, the target site specificity protects host genes from being disrupted by integration events (7). The Ty3 element integrates just a few base pairs upstream of pol III genes (11), and this positioning is due to an interaction with the pol III transcription factors in TF-IIIB (22, 41). Ty1 also inserts upstream of pol III genes, but the integration window extends 75 to 700 bp upstream of the transcription start site (14). The examples of Ty1 and Ty3 indicate that for host genomes with a dense coding sequence, targeting strategies are required to prevent the disruption of genes. The genome of S. pombe has a dense coding sequence, and the preference for integration into intergenic regions indicates that this mechanism is also designed to avoid the disruption of host genes. Since Tf1 integration is associated with pol II promoters, it is possible that insertion alters the expression of pol II genes. However, eight different insertions of Tf1-neo were tested, and none was found to change significantly the expression of the adjacent genes (6). This result suggests that the insertions occurred upstream of the sequences critical for transcription.
A dramatic bias was represented by the duplicate insertions that occurred at eight positions in the genome. Each of the duplicate events was obtained from a separate culture of S. pombe and was therefore genetically independent. The possibility that the duplicate events were the result of cross-contaminated pools of DNA was unlikely, since at least one of the pools of transposition events was generated and characterized 6 months later at a different research institution. The close association of the duplicate sites with other sites of insertion provided additional evidence of high frequencies of integration. The lack of any obvious similarities between the duplicate sites makes it difficult to identify a mechanism for the high frequencies of repetition. The lack of sequence similarity between the eight multiply isolated sites suggests that they could represent landmarks in the superstructure of the chromosomes. Possible examples include sites of unique chromosomal structures, such as folds or regions of contact with other nuclear structures, such as the nuclear matrix. It is likely that the eight repeated sites do not represent the complete set of such hot spots. Their isolation was likely subject to biases, such as the efficiency of transformation in bacteria of the circularized DNA. Nevertheless, the repeated and independent isolation of the same insertion sites indicates that the integration of Tf1 cDNA was mediated by factors with surprising specificity.
An important question is why we did not identify insertions closer than 2 kb to the duplicate sites. This class of events would presumably retain their selective advantages in bacteria as well as their relationship to the restriction sites used in gel purification and circularization. The answer may be that for any given intergenic region, there is one dominant site with the potential for Tf1 integration. This notion would imply that integration is precisely controlled by DNA binding proteins, as is the case for Ty3.
Perhaps the most significant bias that we observed was that when normalized for size, the nonrepetitive portion of chromosome 3 was more likely to be targeted for integration than the sequences of chromosomes 1 and 2. In fact, we observed about twice as many insertions per kilobase for chromosome 3 as for chromosomes 1 and 2. It was also interesting to observe that chromosomes 1 and 2 received equal frequencies of insertions per kilobase. Although Behrens et al. (6) did not report a preference for chromosome 3, their data exhibited the same pattern as ours. When we pooled both data sets and normalized the data for chromosome size, we observed the same twofold preference for insertion into chromosome 3. This similarity in the data sets demonstrates that the bias for integration into chromosome 3 is not likely due to an artifact of either isolation procedure. Additional documentation of this bias was obtained when the genome sequence of S. pombe was analyzed for the locations of the 230 Tf LTRs. The nonrepetitive sequences of chromosome 3 had twice the density of insertions as the other two chromosomes (Bowen and Levin, unpublished).
The association with a specific chromosome is unusual for a transposon, but some examples have been observed. The endogenous retrovirus gypsy of Drosophila has potentially active elements that are predominantly located on the Y chromosome (12, 20, 34). However, the high numbers of gypsy elements on the Y chromosome are not due to preferences of integration but instead result from the loss of gypsy elements from the other chromosomes. The presence of gypsy elements on chromosomes other than Y is selected against because damaging levels of transposition occur only in females of the permissive strains. Since the Y chromosome is not present in females, active copies of gypsy elements can reside on the Y chromosome without resulting in high levels of transposition. In the human genome, similar enrichments of the endogenous retroviruses HERV K and HERV L have been observed on the Y chromosome, and these may also be due to selective pressure against their presence in other chromosomes (24).
A comprehensive analysis of all the transposons in S. cerevisiae revealed that Ty1, Ty2, Ty3, and Ty4 are tightly linked to tRNA genes (21). Further analysis revealed that the density of insertions per kilobase of DNA is higher for the smaller chromosomes. The three smallest chromosomes, I, III, and VI, have an average of one insertion per 25.2 kb. In contrast, the three largest chromosomes, VII, XV, and IV, have an average of one insertion per 39.4 kb. Interestingly, when we normalized the data for the chromosomes of S. cerevisiae for number of tRNAs, chromosomes I, III, and VI had densities of Ty1 elements per tRNA at least sixfold higher than those of chromosomes VII, XV, and IV. Since chromosome 3 of S. pombe is the smallest chromosome, it is possible that chromosome size may in some way contribute to the selection of insertion sites. However, it is also possible that the chromosome bias observed in the genome of S. cerevisiae is not due to selective integration but instead is similar to the situation for gypsy, in that it may result from selective pressure that favors the loss of Ty1 from the larger chromosomes. To our knowledge, the preference of Tf1 for insertion into chromosome 3 sequences represents the first example of an LTR retroelement that has an integration mechanism with a chromosome-specific bias.
One important observation was that the existing copies of Tf2 identified by the S. pombe genome project do not show a bias for chromosome 3. Despite the high concentration of Tf2 LTRs on chromosome 3, the full-length elements are more numerous on chromosomes 1 and 2. In fact, eight are located on chromosome 1, and two are located on chromosome 3 (Fig. 2). The explanation may be found in the mechanism used by the current copies of Tf2 to mobilize (18). Tf2 mobilizes primarily through homologous recombination with preexisting sequences. Despite the high number of Tf2 LTRs on chromosome 3, this process of homologous recombination appears to favor the other two chromosomes.
Just how Tf1 favors integration into chromosome 3 remains unknown. One interesting issue is that the 1.2 Mb of rDNA repeats are located on chromosome 3. Perhaps all sequences of chromosome 3 are more tightly associated with the nucleolus than the sequences of the other chromosomes and this localization somehow favors the interaction with the preintegration complexes of Tf1. Another possibility is that the composition of the chromatin on chromosome 3 is distinct from that on the other chromosomes. For example, an alternative histone might be found in higher concentrations on chromosome 3 and could be recognized by the Tf1 IN. We suggest this possibility because the IN of Tf1 contains a chromodomain (30) and because chromodomains were recently found to interact with modified versions of histone proteins (5, 23, 33). Another explanation for the preference of chromosome 3 accounts for the observation that this bias is almost exactly twofold. Perhaps integration is linked to replication and chromosome 3 is replicated later than the other two chromosomes. If transposition were to occur after the replication of chromosomes 1 and 2 but before that of chromosome 3, then the number of Tf1 elements inserted into chromosome 3 would be effectively amplified by a factor of 2. The higher density of insertions in chromosome 3 also could be due to the distribution of hot spots instead of being a global feature of the entire chromosome. If the duplicate insertions that we observed were the result of local chromatin environments, then these conditions might be more prevalent on chromosome 3.
Another important question about the integration of Tf1 into chromosome 3 is what function this bias serves. Is it the result of a process that favors the transmission of the transposon or the viability of the host? One possibility is that the overall levels of transcription of genes in chromosome 3 are low so that higher densities of Tf1 in chromosome 3 are required to offset this difference. This scenario could be similar to dosage compensation in Drosophila melanogaster or Caenorhabditis elegans, where the transcription of genes in the X chromosomes is adjusted by a factor of 2 in one sex (16, 31).
A different and more compelling explanation centers on the possibility that the integration mechanism targets all the chromosomes with equal probabilities. Because the nonrepetitive sequence on chromosome 3 is half the size of the sequences on chromosomes 1 and 2, transposition events would be twice as likely to insert into a kilobase of sequence in chromosome 3 as into a kilobase in the other two chromosomes. This model would also explain the sixfold-higher density of Ty1 on the smaller chromosomes of S. cerevisiae than on the larger chromosomes. Chromosomes I, III, and VI are three- to sixfold smaller than chromosomes VII, XV, and IV.
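The arithmetic behind this equal-probability model can be made explicit with a minimal sketch. The symbols below are introduced here for illustration only and are not part of the original analysis: N is the total number of insertions, n the number of chromosomes, p_i the probability that an insertion lands on chromosome i, L_i the length of nonrepetitive sequence on chromosome i, and d_i the expected insertion density on chromosome i.

```latex
% Expected insertion density on chromosome i
d_i = \frac{N \, p_i}{L_i}

% Assumption of the model: every chromosome is targeted with equal probability
p_i = \frac{1}{n}
\quad\Longrightarrow\quad
\frac{d_i}{d_j} = \frac{L_j}{L_i}

% S. pombe case described in the text: L_3 = L_1 / 2
\frac{d_3}{d_1} = \frac{L_1}{L_3} = \frac{L_1}{L_1 / 2} = 2
```

Under these assumptions, the density ratio depends only on the ratio of chromosome lengths, so a three- to sixfold difference in chromosome size would likewise predict a three- to sixfold difference in insertion density.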
The explanation for why chromosomes might be targeted with equal probabilities may lie in the method likely used by LTR retrotransposons in the wild to populate the genomes of strains that lack them. Since LTR retrotransposons are not infectious in the classical sense, the only way in which they can integrate into genomes that lack copies of an element is through mating and subsequent transposition into the vacant chromosomes. To avoid damage to the element that could occur during recombination or perhaps to avoid host mechanisms that would inhibit the recombination of transposons, integration may take place after meiosis is complete. If this is the case, the most effective way of dispersing the transposon into naive genomes is by mediating integration into chromosomes with equal probabilities. Thus, with a random assortment of chromosomes, the chance is the greatest that each product of meiosis will carry the transposon.
Because the propagation of transposons ultimately depends on the fitness of the host, it is critical for the elements to develop mechanisms of integration that avoid the disruption of coding sequences. The ability to study the transposition of LTR retrotransposons in S. pombe provides a unique opportunity to compare the strategies that transposons use to select insertion sites in two yeasts that diverged from each other 10⁹ years ago (38). In addition, the analysis of both of these yeasts is particularly informative because their coding sequences represent the majority of their genomes. Our analysis of the sites chosen by Tf1 for integration revealed interactions with the host genome that are entirely different from the strategies used by the Ty elements in the genome of S. cerevisiae. Ty1, Ty2, Ty3, and Ty4 avoid damaging host genes by inserting into gene-poor regions upstream of pol III genes (7, 11, 14, 21, 37). Ty5 does not disrupt essential coding sequences because it specifically inserts into regions of silent chromatin (42, 43). In contrast to the strategies used by the Ty elements in S. cerevisiae, the integration of Tf1 shows no preference for tRNA genes or any other pol III transcribed unit. Nor does Tf1 insert into regions of silent chromatin. Instead, we found that Tf1 specifically inserts in regions between genes that are likely transcribed by pol II. Although this strategy is significantly different from that used by the Ty elements, the result is the same: the disruption of coding sequences is avoided.
The detailed information about the interaction of transposons with two greatly divergent genomes may contribute to the understanding of how these types of elements propagate in the genomes of more complex eukaryotes. Now that the human genome project is complete, it may be particularly interesting to examine the patterns of integration throughout the genome and compare these results to those for the two yeasts.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
| REFERENCES |
|---|