
T. cruzi repeated genes (V)

6.2 Applications

6.2.3 T. cruzi repeated genes (V)

The false positive rate, indicated by probes showing opposite patterns for males and females, was between 6% and 16%. Even with the stricter requirement of two or more probes and/or multiple populations, many regions of potential CNV remained. The regions seem to be short in comparison to regions of CNV found in human, mouse or chimpanzee. The false positive rate was low for regions where more than one probe indicated CNV and for regions where more than one breed indicated CNV. We thus believe those regions to have true differences in copy number.

Comparison of the potential CNVs discovered in the read depth study and the array CGH study showed overlap only in the longest regions of the array CGH study, further indicating a high false positive rate in the read depth study. Layer reads from the read depth and SNP study were used to verify regions of CNV indicated in the array CGH study. The result of this analysis supported the CNV discovered in the chicken genome.

Regions of CNV seem to be depleted of genes. We found more deletions than duplications. That could, however, be an artifact of the array CGH methodology, where deletions give relatively stronger signals than duplications. We also found more deletions in domestic breeds than in RJF. This might be a result of selective breeding, where a chicken population quickly adopts a new trait and where deletion might be a more rapid means of genome alteration.

In conclusion, we suggest that copy number variation does exist in the chicken genome, although not to the extent found in mammals and not involving regions as large. There might be both false negatives and false positives in our results. That we cannot fully detect chromosome Z as a duplication between males and females is an indication of the false negative rate. Both methods used were based on the genome sequence of RJF and cannot, as a result, detect full deletions in RJF. This is a common problem in CNV discovery. Re-sequencing of genomes as a means to detect CNV would remove this bias.

more than 50% of all annotated genes.

The aim of our analysis was to provide the users of the T. cruzi genome sequence with information beyond that of the assembly. We provide information on the coverage of the annotated genes and approximate their copy number. The annotated genes are grouped to show which of the predicted copies are present in the assembled genome sequence.

Copy number estimation

We only investigate the protein-coding genes and their pseudogenes. Other genes are short enough to be assembled properly, as a read spans more than one repeat copy. By annotation, we refer to an annotated protein-coding gene or pseudogene in the assembled genome. By gene, we refer to a specific protein with a specific function, or its pseudogene. By gene copies, we refer to the number of times the gene exists in the actual T. cruzi genome. Similarly, a gene copy is one instance of that gene in the genome.

The copy number of a gene was calculated through assessing the coverage of an alignment formed by aligning shotgun reads to an annotated copy of the gene. Genes not annotated in the assembled genome are thus not investigated.

The multiple alignments were built using GRAT, with an accepted rate of differences between repeat copies of 5%. Reads could participate in more than one alignment, to show the maximum coverage of all annotations. All annotations were also aligned to each other to investigate how many copies of each gene were present in the assembly. This collapse of annotated gene copies was also made using GRAT with 5% differences allowed. So, for a specific annotation, the results show both the predicted copy number of the gene and the number of those copies present in the assembled genome sequence.
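The collapse of annotations at a 5% difference cutoff can be illustrated with a minimal sketch. This is not GRAT's implementation, which is not described here. It assumes that the sequences are already aligned to equal length and that grouping is single-linkage, so two annotations end up in the same group if a chain of pairwise comparisons within the cutoff connects them. The function names `difference_rate` and `collapse` are hypothetical.

```python
from itertools import combinations

MAX_DIFF = 0.05  # accepted rate of differences between repeat copies (from the text)


def difference_rate(a: str, b: str) -> float:
    """Fraction of differing columns between two pre-aligned sequences.

    Assumption: both sequences are already aligned to the same length.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def collapse(annotations: dict[str, str], max_diff: float = MAX_DIFF) -> list[set[str]]:
    """Group annotations by single-linkage clustering (an assumption)."""
    parent = {name: name for name in annotations}

    def find(x: str) -> str:
        # Follow parent pointers to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(annotations, 2):
        if difference_rate(annotations[a], annotations[b]) <= max_diff:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in annotations:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Illustrative, made-up sequences: "a" and "b" differ in 1 of 40 columns.
example = {"a": "ACGT" * 10, "b": "ACGT" * 9 + "ACGA", "c": "TTTT" * 10}
collapsed = collapse(example)
```

With these made-up sequences, `a` and `b` fall in one group and `c` forms its own, so the number of groups approximates the number of distinct annotated copies under the stated assumptions.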

The predicted copy number of each annotation was calculated from the coverage of its multiple alignment relative to the mean coverage of the assembly. This mean coverage was calculated in regions where the assembly had very high quality and contained no repeats. The results show the predicted copy number and a sampling of the coverage across the gene sequence. The standard deviation of the sampled coverage is also presented. In an additional attempt to investigate the repeated genes, the repeat unit boundaries were determined.
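The coverage-based estimate can be sketched as a simple ratio. The sketch below assumes that the baseline is the mean coverage measured in high-quality, repeat-free regions and that it corresponds to the coverage expected for one copy. The exact normalization used in the study is not stated here, and the function name and numbers are illustrative only.

```python
from statistics import mean, stdev


def predict_copy_number(sampled_coverage: list[float], baseline_coverage: float) -> dict:
    """Estimate copy number as alignment coverage relative to a baseline.

    sampled_coverage: coverage sampled at positions along the annotation.
    baseline_coverage: mean coverage in unique, high-quality regions
        (assumed here to represent a single copy).
    """
    if baseline_coverage <= 0:
        raise ValueError("baseline coverage must be positive")
    if not sampled_coverage:
        raise ValueError("at least one coverage sample is required")
    avg = mean(sampled_coverage)
    # The sample standard deviation needs two or more points.
    sd = stdev(sampled_coverage) if len(sampled_coverage) > 1 else 0.0
    return {
        "mean_coverage": avg,
        "coverage_sd": sd,
        "copy_number": avg / baseline_coverage,
    }


# Illustrative numbers, not taken from the study.
estimate = predict_copy_number([70, 80, 90], baseline_coverage=8.0)
```

A large standard deviation relative to the mean suggests that coverage is uneven along the annotation, in which case a single copy number summarizes the gene poorly.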

When a multiple alignment is optimized, the scores of columns in the alignment decrease. These scores were used to determine regions where the level of diversity increases. Such an increase is likely to represent the transition from the repeat unit into unique sequence. This turned out to be a difficult task, as the repeat unit lengths were variable and the increase in column scores was not easy to determine. For a fraction of the repeated annotations, repeat boundaries are, however, reported.
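One simple way to look for such a transition is to compare the mean column score in a window before and after each position. The sketch below is not the method used in the study. It assumes that a higher column score means more diversity, and the window size and `min_jump` threshold are hypothetical parameters. It detects only a low-to-high transition, and it will miss gradual increases, which is in line with the difficulty described above.

```python
from typing import Optional


def find_boundary(scores: list[float], window: int = 5, min_jump: float = 1.0) -> Optional[int]:
    """Return the index where the mean column score rises the most.

    Compares the mean of `window` columns before each position with the
    mean of `window` columns starting at it. Returns None when the largest
    rise is below `min_jump`. Assumption: higher score = more diversity.
    """
    best_pos: Optional[int] = None
    best_jump = float("-inf")
    for i in range(window, len(scores) - window + 1):
        left = sum(scores[i - window:i]) / window
        right = sum(scores[i:i + window]) / window
        jump = right - left
        if jump > best_jump:  # strict comparison keeps the first maximum
            best_pos, best_jump = i, jump
    if best_pos is None or best_jump < min_jump:
        return None
    return best_pos


# Illustrative scores: ten conserved columns followed by ten diverse ones.
boundary = find_boundary([0.0] * 10 + [3.0] * 10)
```

For the illustrative input, the largest rise is at index 10, the first column of the more diverse region.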

Almost all annotations formed alignments. The information gathered is stored in a database, accessible at http://cruzi.cgb.ki.se/ek/cruzi/main.html.

576 annotations, or less than 3%, are missing from the database. The majority of the annotations that did not form alignments were surface antigens. Their alignments were too large to fit in computer RAM. There were some exceptions, for example DnaJ homolog subfamily A member 2, which was too short to form an alignment, as all reads overlapping the annotation completely enclosed it. This is, as discussed above, one of the drawbacks of GRAT. On the other hand, this is the only occurrence in any of the three presented studies. Another annotation, eukaryotic translation initiation factor 1A, had multiple insertions in comparison to the aligning reads, which thus did not fall within the 5% difference cutoff.

12% of annotations were predicted to have only one copy. These might be true singleton genes, or they might simply have unusually low coverage, which would cause them to be mis-predicted as singletons. 37% of annotations were predicted to have two copies, which is the 'normal' diploid gene arrangement. The remaining annotations are repeated genes. 45% of these were predicted to have ten or more copies.

Figure 6.7: A screenshot of the T. cruzi repeat database showing predicted copy number for a gene with locus id Tc00.1047053506551.10.

The database holds information on all annotations that formed alignments.

A user can check whether the presumed single-copy gene he or she is working on really is a single-copy gene. Users interested in repeated genes can get an idea of how repeated the gene is and how many of its copies are annotated. Besides copy number estimation and collapse with other annotations, the database holds information on COGs as predicted in the genome paper, the repeat unit if it has been located, and additional information such as gene function, allele and haplotype, if that information is available. The user can see a sampling of the coverage along the annotation and get the standard deviation of this sampling.

A list of reads participating in the multiple alignment is also provided. Unfortunately, saving the alignments themselves would be too memory-demanding.

Links to GenBank [163] and GeneDB [164] are also provided. A screenshot of the database is shown in Figure 6.7.

In-depth analysis

To show examples of repeated genes and to show users of the database how the information in the database can be used, we performed in-depth studies on five genes. They were all predicted to have 15 copies or more. For each gene, alignments were built using GRAT from a few highly similar annotated copies with similar repeat boundaries, and the reads were assembled using phrap (http://www.phrap.org). The reads were assembled without quality values to ensure that all were included in one assembly. The resulting assemblies were viewed and edited in DNPTrapper [158]. Their diversity was investigated and the multiple alignments were divided into groups using DNPs. The different genes show very different characteristics.

Figure 6.8: A screenshot of DNPTrapper. A section of an alignment of reads from the trans-sialidase locus is shown. The colored dots represent DNPs in the reads. A mutation rendering a group of reads inactive is circled.

The levels of divergence varied from hardly any at all, as in the heat shock protein 85, to very divergent, as in the surface antigen trans-sialidase. Tyrosine aminotransferase and flagellar calcium binding protein both showed a pattern that seems common in T. cruzi: highly similar tandem repeated copies within an array, but differing between arrays. A hypothetical protein showed an unexpected number of copies with differences in predicted transmembrane regions.

Both trans-sialidase and the hypothetical protein showed erroneous consensus sequences in the annotated copies. Figure 6.8 shows a section of the alignment of trans-sialidase in DNPTrapper. The horizontal boxes represent the reads and the colored positions represent DNPs. A mutation in a group of reads that makes the gene copy lose its trans-sialidase activity is circled.

The database and the examples in the in-depth study provide the user of the T. cruzi genome with a framework for similar in-depth analysis of their gene of interest. The specific examples can serve as a manual on how to use the database to extract additional information on individual copies of a gene. If a gene is repeated, the correct sequence of each individual copy can be determined. The correct sequences of gene copies can be important for further experiments, such as cloning or primer design for polymerase chain reaction (PCR).

Acknowledgements

Thank you to all who have helped make my PhD program fun and fruitful!

Special thanks to:

Björn Andersson, for the years of support and guidance.

Martti Tammi, for tutelage and the never-ending stream of ideas.

Leif Andersson, for insight and enthusiasm.

Erik Arner, for collaborations and making me feel less isolated.

Daniel Nilsson, for answering every question I ever had.

Daryoush Rahmani, for much needed IT firefighting.

Shane McCarthy, for always taking the time.

Marcela Ferella, for encouragement and always being on my side.

All past and present group members.

The CGB bioinformatics group, the FunChick consortium and all past and present colleagues at the former CGB.

My thanks also to my family, for consistently trying to figure out what, exactly, it is that I do. And last but not least, to Jody, who always knows what to do.


