
Whole genome sequencing and assembly of an avian genome, the European crow Corvus corone spec.


MSc BIOINF 11 001

A reduced representation library based approach

Nagarjun Vijay

Degree project in bioinformatics, 45 hp (master's degree), 2011
Biology Education Centre and Dept. of Evolutionary Biology, EBC, Uppsala University
Supervisor: Dr Jochen Wolf


Popular science summary

Nagarjun Vijay

The complete hereditary information of an organism, which largely determines its characters, is stored in its genome as different combinations of nucleotide bases in the DNA. To understand the characters of an organism, and the genes and functional elements responsible for them, it is useful to sequence its genome. Sequencing involves finding the order in which the nucleotide bases are arranged. Available sequencing methods can determine the order of the bases in a stretch of 100 to about 1000 bases, with an upper limit of about 1.5 kb of consecutive bases. Hence, for organisms with large genomes it is not possible to sequence the entire genome directly.

To overcome this limitation 'shotgun' genome sequencing has been utilised. A sidewalk will eventually be completely covered by randomly falling raindrops. Similarly, the entire genome can be covered by randomly sequencing smaller fragments of the genome. This is done by first breaking down the DNA into a number of random fragments of a length suitable for sequencing. These fragments are then sequenced individually. Enough fragments are sequenced to cover the genome multiple times. The sequenced pieces are put together into a single continuous sequence of DNA using a computer program called a "genome assembler". The genome assembler looks for overlapping regions between the sequenced fragments and uses this information to place the fragments with respect to each other. This method of genome assembly is known as 'shotgun' genome sequencing.

In this project we tested the benefits of an alternative method for genome assembly called the reduced representation library approach. In this approach the genome is first partitioned into smaller reproducible fractions which are then individually subjected to shotgun genome sequencing. These individually assembled fractions are then put together to obtain a complete genome assembly. Partitioning the genome into smaller fractions is expected to reduce the number of incorrect assemblies and to allow faster assembly of the data. Using in silico simulation on the zebra finch genome, the reduced representation library approach was found to provide assemblies that are more contiguous than those obtained from shotgun genome sequencing.

"Hyperboloids of wondrous Light
Rolling for aye through Space and Time
Harbour those Waves which somehow Might
Play out God's holy pantomime"
- Turing

Contents

1 Abstract
2 Introduction
  2.1 Whole genome shotgun assembly
    2.1.1 Traditional Sanger sequencing
    2.1.2 NGS: Next-generation sequencing
    2.1.3 Genome assembly and its challenges
  2.2 Partitioning the genome to reduce the problem of assembly
    2.2.1 BAC clones
    2.2.2 Reduced representation library based genome assembly
  2.3 Sequencing the crow genome
3 Materials and methods
  3.1 In silico genome sequencing
    3.1.1 Restriction enzyme and fragment size selection
    3.1.2 de novo assembly
  3.2 Laboratory methods
    3.2.1 DNA extraction and quality check
    3.2.2 Reduced representation library construction
4 Results
  4.1 Simulated assemblies
    4.1.1 Comparison of WGS vs RRL strategy
    4.1.2 Evaluation of RRL library preparation in the laboratory
5 Discussion
6 Conclusion
7 References
8 Supplementary material
  8.1 Source code for various scripts used in the project
  8.2 DNA extraction from blood (minimal fragmentation)
  8.3 Reduced representation library construction
  8.4 List of figures
  8.5 List of tables
  8.6 List of acronyms
9 Acknowledgments

2 Introduction

By the turn of the century, DNA sequencing had scaled up remarkably and the publishing of the human genome in 2001 heralded the post-genomic era (Venter et al., 2001). Being able to read genomes has helped us understand the functional aspects of genes and regulatory networks (Anna Sharman et al., 2001). Sequencing of multiple closely related organisms has allowed for the comparative analysis of genomes, which is very useful in evolutionary studies.

While most of the manually maintained genomes available now have been produced with the di-deoxy chain termination method, several novel methods of DNA sequencing have been developed in subsequent decades (Michael L. M et al., 2010). Improvements in sequencing technology have made it possible to sequence larger genomes at lower costs. This now opens up the possibility to investigate organisms for which only scarce genetic data is available.

While it is now technically possible to produce multiple fold coverage of a genome using the latest sequencing technology, many algorithmic and computational challenges in obtaining a fully assembled high quality draft remain (Sante et al., 2001, Miller et al., 2010). We will here first review the most common WGS approach and then describe a variant (RRL) that will be explored in more detail. The aim of this study was to compare the two with respect to contiguity, accuracy and recovery, and to include some experimental tests on library preparation in the laboratory.

2.1 Whole genome shotgun assembly

Sequencing or reading the entire genome is limited by the maximum length of 'continuous' DNA that can be sequenced. Smaller genomes could be sequenced as smaller parts and put together into a genetic map based on the overlap shared by the sequenced regions. For example, the "Epstein-Barr Virus Genome" was compiled using previously published features that had been annotated (Baer et al., 1984). However, sequencing larger genomes is more challenging. The first larger genome for which this could be accomplished was that of the bacterium Haemophilus influenzae.

The genome of Haemophilus influenzae was sequenced by random sequencing followed by assembly, giving the first complete genome sequence of a free living organism (Fleischmann et al., 1995). This strategy of sequencing whole genomes by randomly fragmenting DNA into smaller fragments which are sequenced separately is known as shotgun sequencing. In the whole genome shotgun assembly strategy (Figure 1), genomic DNA is randomly broken into numerous smaller pieces. These smaller pieces are sequenced to obtain reads. Large numbers of such reads are generated so as to obtain overlapping reads. Based on the overlap shared between the different reads, computer programs known as 'assemblers' build these reads into a continuous sequence known as a 'contig'.

Finding the correct overlap between millions of short sequence reads is a difficult algorithmic task that has so far not found an analytical solution. Heuristic methods, which use experience based rules to speed up problem solving, are continuously being developed (Daniel et al., 2004, Scheibye-Alsing et al., 2009, Miller et al., 2010) to improve the quality of the assemblies. The problem is exacerbated by errors in reading the genomic sequence that result from limitations of the sequencing technology, and by polymorphism in the genomic DNA of diploid organisms. Repeat regions which cannot be distinguished or placed in the correct location make genome sequencing and assembly more complicated in practice.

Figure 1: Flowchart of shotgun genome sequencing.
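The idea of joining reads by their shared overlap, as described in Section 2.1, can be sketched as follows. This is only a minimal illustration written for this text, not the algorithm of any particular assembler: the function names `overlap_length` and `merge_reads` and the `min_overlap` threshold are assumptions, and real assemblers must also handle sequencing errors, reverse complements and repeats.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two error-free reads into one contig, or return None if they do not overlap."""
    length = overlap_length(left, right, min_overlap)
    if length == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[length:]


if __name__ == "__main__":
    print(merge_reads("ATGCGTAC", "GTACCTGA"))
```

With millions of reads, comparing every pair in this way quickly becomes expensive, which is one reason heuristic methods are needed.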

A way to improve the assembly across repeat regions, which constitute a particular challenge (Daniel R et al., 2010), and to join contigs with gaps in between, is the use of fragments sequenced from both ends with gaps of a known size in between. These special types of reads are generated by randomly fragmenting DNA, size-selecting it and sequencing it from both ends, and are usually referred to as 'mate pairs' or jump libraries (Pop et al., 2004, Sante et al., 2010). 'Mate pairs' provide additional information about the relative position of the pair of reads, which can be very useful in resolving repeat regions. Different sizes of DNA fragments (generally 2, 5, 10, 50, 100 and 150 kb) are selected and sequenced from both ends. These 'mate pairs' of different sizes are used by the 'assembler' to position the reads based on estimates of the distance between the read pair. Using the information available in the reads and 'mate pairs', the computer program organizes 'contigs' into 'scaffolds' according to the relative position of the contigs, as inferred from connections between mate pairs. The gaps present in the scaffolds are filled by a process known as genome finishing. Gaps smaller than 10 kb are covered by PCR amplification and sequencing of the region. Gaps bigger than 10 kb are generally filled using other strategies such as BAC clone sequencing. BAC clones target a specific reduced region of the genome that can be assembled without having to resolve reads from other regions of the genome.

2.1.1 Traditional Sanger sequencing

The chain termination method (Sanger et al., 1975) of DNA sequencing has been widely used due to its reliability and relative ease of use. In the chain termination method a single stranded DNA sample is elongated in a DNA replication reaction using a DNA primer, DNA polymerase, deoxynucleotide triphosphates (dNTPs) and modified dideoxynucleotide triphosphates (ddNTPs), which lack the 3' OH group required for establishing phosphodiester bonds between adjacent nucleotides. DNA elongation is interrupted when a ddNTP is incorporated. Each of the many hundred different reactions is interrupted at a different base along the DNA sequence. Radioactively or fluorescently labelled ddNTPs can be visualized after denaturing into single stranded DNA followed by size separation. Each of the four ddNTPs (A, T, G and C) is labelled with a different colored dye so that the base present at each position along the sequence can be identified.

Reads (continuous sequences of DNA) of about 1 kb can be sequenced with the traditional Sanger sequencing methodology. Data obtained from this method is of poor quality at both ends of the sequenced read. Hence, quality values are generated for the sequenced reads to aid the assembly process. Depending on the software used for post processing of reads based on these quality scores, error rates of 0.001 to more than 1% have been reported (Hoff 2009).

2.1.2 NGS: Next-generation sequencing

Subsequent to the sequencing of the human genome, the need for a cost effective and fast method of sequencing has led to the development of various new sequencing technologies. These newer methods are collectively known as next-generation sequencing (NGS) (Michael L. M 2010) and provide a cost effective alternative to traditional Sanger sequencing. NGS generates shorter reads of 25 up to 400 base pairs in length (Paul Flicek et al., 2009) depending on the sequencing technology used.

Table 1: Cost comparison of DNA sequencing

Sequencing technology | Cost per Mb of DNA sequence | Cost per human-sized genome | Date of cost estimate | Reference
First generation platforms (Sanger and other capillary based methods) | $397.09 | $7,147,571 | October 2007 | Wetterstrand KA (Reference 41)
Second-generation platforms (454, SOLiD, Illumina, etc.) | $102.13 | $3,063,820 | January 2008 | Wetterstrand KA (Reference 41)
Second-generation platforms (454, SOLiD, Illumina, etc.) | $0.23 | $20,963 | January 2011 | Wetterstrand KA (Reference 41)

Due to the significant cost benefits (Table 1) associated with NGS technologies, they have become a popular tool not only for genome re-sequencing but also for de novo genome assembly. Their popularity will arguably increase further, since sequencing costs have been decreasing drastically since the introduction of the second generation sequencing methods and are expected to continue to do so for a while.

2.1.3 Genome assembly and its challenges

Advances in sequencing technology have been reflected in the methods for assembling genomes. Before the advent of NGS technologies, assembly programs relied on "overlap-layout-consensus" methods for genome assembly. Long read lengths and an error rate of 1% made it possible to assemble genomes based on overlap graphs (Denisov et al., 2008).

Shorter read lengths, an exponential increase in the number of reads and higher error rates are not suitable for these algorithms. Hence, various assembly programs (Daniel R et al., 2010) based on de Bruijn graphs have been created. A de Bruijn graph is a type of directed graph used to represent overlaps between sequences of symbols.

These approaches treat the data as words of k nucleotide bases, or k-mers, instead of reads. The higher redundancy generated by NGS methods is handled by this method because reads are broken into the 'k-mers' that make up each read. The de Bruijn graph is constructed by representing a series of overlapping k-mers as nodes in the graph. After construction of the graph and 'hashing' of the reads based on the k-mers that make up each read, the graph is simplified without losing data using string graph based methods. A string graph is an intersection graph of curves in the plane, where each curve is called a string. Representing the intersection of the different k-mers allows the use of string graph simplification methods. Simplification of the graph expedites the subsequent error removal, repeat resolution and scaffolding steps.

Improvements in error handling and repeat resolution algorithms have made it possible to assemble whole genomes using just short-read sequencing (Ruiqiang Li et al., 2010). The overlap consensus based approach, on the other hand, relies entirely on finding overlaps between reads for performing the assembly. The presence of redundant information makes the overlap based method cumbersome, as memory requirements are very high.
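To make the k-mer decomposition concrete, the following minimal sketch breaks reads into k-mers and links consecutive k-mers, which overlap by k-1 bases. It is an illustration under simplifying assumptions (error-free reads, a single strand, no graph simplification), and the helper names `kmers` and `de_bruijn_graph` are made up for this example rather than taken from any assembler.

```python
from collections import defaultdict
from typing import Dict, List, Set


def kmers(read: str, k: int) -> List[str]:
    """Return all overlapping words of length k contained in a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads: List[str], k: int) -> Dict[str, Set[str]]:
    """Build a directed graph with k-mers as nodes.

    An edge a -> b is added whenever k-mer b directly follows k-mer a in some
    read, so the two k-mers overlap by k-1 bases. Identical k-mers coming from
    different reads collapse into one node, which absorbs read redundancy.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        words = kmers(read, k)
        for current, following in zip(words, words[1:]):
            graph[current].add(following)
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn_graph(["ACGTC", "CGTCA"], k=3).items():
        print(node, "->", sorted(successors))
```

Because both reads contribute the k-mers CGT and GTC, the graph stores them once. This is why the size of the graph depends on the genome rather than on the number of reads.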

Despite the difficulties associated with short read data, sequencing of gigabase sized genomes using short-read sequencing has been demonstrated with considerable success (Ruiqiang Li et al., 2010, Schatz et al., 2010).

The higher error rate in short-read sequencing, along with the polymorphism present in the genome, imposes a limit on the maximum contiguity that can be assembled. It has been shown that the contiguity of the assembled genome, measured in terms of N50 (the length X such that 50% of all assembled bases lie in sequences of length X or longer), reaches a maximum value at a coverage of around 50X and actually decreases at higher coverage with 1% sequencing error and 1% SNPs in the raw data used for assembly (Daniel 2009). Depending on the dataset used, the maximum contiguity is obtained at a coverage between 40 and 60X.

Traditionally repeat regions have been resolved by scaffolding using mate pairs of various insert sizes. Libraries of different insert sizes have been used with varying degrees of utility. Comparisons of the contiguity, cost benefits, accuracy and genome representation obtained by the different methods have shown that each method has its own advantages and disadvantages (Liang Ye et al., 2011). The repeat content and GC content of the genomes being sequenced also have a considerable impact on library preparation, sequencing and assembly methods.

Recently, high quality assemblies of mammalian genomes were produced from massively parallel sequence data (Sante et al., 2010) with a quality very similar to that obtained from Sanger sequencing. Using specific types of libraries and new assembly programs (ALLPATHS-LG), high quality assemblies have been obtained.

2.2 Partitioning the genome to reduce the problem of assembly

2.2.1 BAC clones

One way to reduce the problem of assembly is to reduce the assembly target size. Genomic libraries have been divided into smaller parts through the use of bacterial artificial chromosomes (BACs) or yeast artificial chromosomes (YACs). BACs are constructed by partially digesting and fragmenting genomic DNA to obtain large chunks of the genome that are then preserved in bacterial colonies. Traditionally BAC libraries have been constructed following the "divide and conquer" strategy to obtain high quality genome assemblies using Sanger sequencing. Partitioning the genome into smaller fragments that can be assembled separately simplifies the assembly problem and provides more contiguous assemblies. It has been found that both long and short range mis-assemblies can be reduced in BAC based approaches because sequencing information is restricted to the genomic region covered by the BAC clone (Marra et al., 1998). The genome can be split into BAC clones and different parts can be sequenced by different groups simultaneously, as was done while sequencing the human genome.

BAC libraries have been pooled and sequenced using NGS technologies to obtain assemblies that bring together BAC and NGS methodologies. It has also been shown (Niina et al., 2011) that a pooled BAC strategy for whole genome sequencing has many benefits, such as lower cost and better assembly quality, compared to whole genome shotgun sequencing. However, BAC libraries are expensive to construct and maintain. BAC libraries suffer from some form of variable cloning bias, which leads to overrepresentation of regions that are cloned better at the expense of regions that are not cloned. Since the BAC and YAC strategies involve random fragmentation, the same libraries cannot be reproduced. Contamination from various sources, such as the cloning vector used to produce the clone, can also be a major problem while using bacteria or yeasts.

2.2.2 Reduced representation library based genome assembly

Construction of a series of reduced representation libraries by size fractionation has been seen as another possible way of partitioning the genome. The genome is fragmented by digesting it with a restriction enzyme to obtain a reproducible set of fragments. These fragments are separated by size into distinct libraries to obtain reproducible genomic partitions.

Whole genome assembly using reduced representation libraries and short reads has been demonstrated to be effective (Young et al., 2010) with the Drosophila genome, in comparison with a reference genome sequenced by traditional Sanger sequencing. Better quality assemblies were obtained using restriction enzymes to create reduced representation libraries than using whole genome library assemblies.

Restriction enzyme based fragmentation of genomes produces reproducible fragments. Hence, reduced representation libraries have been used for obtaining SNP maps (Altshuler et al., 2000). It has been suggested (Young et al., 2010) that reduced representation libraries are easier and cheaper to produce than BAC or YAC clones.

Reduced representation libraries can be produced by digesting genomic DNA with a restriction enzyme followed by electrophoretic size separation of the genomic DNA. DNA from different size ranges is cut out from the gel and purified. Each library then consists of fragments of a particular size range. Individual libraries are sequenced separately, or sequenced together after tagging them with a unique sequence marker by a process known as barcoding. Assembling these libraries separately has benefits similar to those obtained by using BAC libraries.

It is expected that reduced representation libraries will reduce mis-assemblies without introducing the problems associated with the BAC approach. Apart from being expensive to construct and maintain, BAC libraries are prone to contamination and variable cloning bias, which leads to overrepresentation of some regions at the expense of other regions. Using single end sequencing for the Drosophila genome, Young et al. have shown that reads generated by the reduced representation approach give a better assembly when each of the libraries is assembled separately and the assemblies are put together hierarchically, rather than assembling all the reads together. However, reads generated by the reduced representation approach will lack overlapping reads at restriction enzyme cut sites, which could lead to poor performance of the whole genome assembly method. Hence, we simulate reads for the reduced representation approach and the whole genome approach separately.

2.3 Sequencing the crow genome

In this project we tried to ascertain the feasibility and utility of applying such a reduced representation library based approach to genome assembly in a large avian genome, that of one hooded crow (Corvus cornix) individual. The motivation for genome sequencing of the crow comes from its importance in evolutionary biology.

Carrion and hooded crows are a well studied example of incipient speciation, in which two sub-species rarely interbreed or only with little success. However, the underlying genetic mechanism of the hybrid zone is yet to be studied in detail. Although these species are morphologically and taxonomically different, previous studies (Wolf et al., 2010) have shown that molecular differentiation between the species is not pronounced, in stark contrast to the morphological differences in plumage coloration. Hence, a genome wide study could serve as a starting point for obtaining a better understanding of the putatively few genes responsible for the morphological differences.

We plan to use the crow genome assembly as a backbone for re-sequencing of several populations in a population genomic framework.

We compared the performance of the WGS assembly strategy with the RRL assembly strategy for sequencing the crow genome using in silico genome sequencing based on simulations. We also looked at the effectiveness and ease of constructing a reduced representation library using restriction enzymes in the laboratory.

3 Materials and methods

3.1 In silico genome sequencing

Using in silico simulations we compared the WGS and RRL methods of genome assembly. The WGS method (Figure 3) assembles short reads obtained randomly from the entire genome, while the RRL method (Figure 2) first partitions the genome into smaller fractions which are treated as 'mini-genomes'. Just as in the WGS method, short reads are randomly drawn from these 'mini-genomes' and assembled into contigs. In a second meta-assembly step the 'mini-genomes' are assembled into a complete genome.

For comparing the WGS and RRL methods we used the zebra finch genome for in silico simulations, to approximate the kind of results that would be obtained with the crow genome. The zebra finch genome is the closest available genome sequence. The latest version of the zebra finch draft genome assembly (WUGSC 3.2.4/taeGut1) was obtained from the UCSC website (http://genome.ucsc.edu/) for performing all simulations.

In the crow, samples are obtained from wild populations, as crows are difficult to breed in captivity and highly inbred individuals cannot be produced due to a suite of practical reasons. Hence, intra-individual polymorphisms need to be incorporated during the assembly process. NGS sequencing also has a high error rate (Dohm et al., 2008). It has been shown that an increase in polymorphism and/or sequencing error produces more fragmented assemblies (Daniel 2009) when using the WGS method of genome assembly.

Based on previous estimates of sequencing error and polymorphism (Wong et al., 2004), all reads were simulated using dwgsim version 0.1.2 (part of the DNA Analysis Package) with 0.1% sequencing error (base error rate) and 0.1% polymorphism (rate of mutations). The required coverage was obtained using the wrapper script "readsim.pl" (refer to the supplementary material for source code).

3.1.1 Restriction enzyme and fragment size selection

Reduced representation libraries are constructed so as to partition the genome into smaller reproducible fractions that are easier to handle in the assembly process. For each enzyme, it would be optimal to have equally sized partitions of the genome to ensure a uniform distribution of reads. When the genome is yet to be sequenced, enzymes which partition the genome into equally sized partitions can only be identified by using a closely related genome. We chose to use the chicken and zebra finch genomes, which are closely related to the crow genome, to ensure that the fragmentation pattern is robust despite the multiple gaps in the final draft genomes.

During library preparation in the laboratory, fragments <1 kb will be lost. Moreover, very large fragments (>20 kb) would make it difficult to partition the genome into 4 equally sized libraries. Therefore, an enzyme needed to be selected such that less than 5% of the genome was present in fragments <1 kb or >20 kb.
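The selection criterion stated above (less than 5% of the genome in fragments <1 kb or >20 kb) can be expressed as a small check on a list of fragment lengths. The sketch below is an illustration only and not the thesis's own script: the function names and the idea of passing plain lists of fragment lengths in base pairs are assumptions.

```python
from typing import Sequence


def fraction_outside(fragment_lengths: Sequence[int],
                     low: int = 1_000,
                     high: int = 20_000) -> float:
    """Fraction of all digested bases that lie in fragments shorter than `low` or longer than `high`."""
    total = sum(fragment_lengths)
    if total == 0:
        raise ValueError("no fragments given")
    outside = sum(n for n in fragment_lengths if n < low or n > high)
    return outside / total


def enzyme_passes(fragment_lengths: Sequence[int],
                  max_fraction: float = 0.05) -> bool:
    """True if less than `max_fraction` of the genome falls outside the 1 kb to 20 kb window."""
    return fraction_outside(fragment_lengths) < max_fraction


if __name__ == "__main__":
    toy_digest = [500, 9_500, 10_000]  # made-up fragment lengths in bp
    print(fraction_outside(toy_digest), enzyme_passes(toy_digest))
```

Note that the fraction is weighted by bases rather than by fragment count, since the criterion refers to the share of the genome, not the share of fragments.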

Figure 2: Reduced Representation Library (RRL) approach to genome assembly (approach taken in this thesis project).

Figure 3: Whole Genome Shotgun (WGS) approach to genome assembly.

Different enzymes can lead to differently sized fragments depending on how frequently the restriction recognition site occurs. To select enzymes that could be suitable for the preparation of reduced representation libraries, all 755 enzymes available in REBASE version 909 were tested against the chicken and zebra finch genomes. Among these, enzymes that were not sensitive to methylation or CpG sites were selected, based on their availability, in order to obtain the expected size distribution of fragments and to avoid heterogeneity in the DNA cleavage reaction. The following candidate enzymes (Figure 4) were selected based on these criteria.

Figure 4: Fragment size distribution for some of the short listed enzymes, namely BglII, BsrGI, EcoRI, PciI, PsiI and PstI. (Stacked bars from 0% to 100% per enzyme; fragment size classes in 500 bp bins from 0 to 17000, plus a class above 17000.)

A high correlation between the fragment size distributions of chicken and zebra finch suggests that a similar length profile can be found in the crow (Table 2).

Table 2: Correlation (r) between the fragment size distributions of the chicken and zebra finch genomes

Enzyme | Pearson correlation (r), calculated using the R function cor.test
BglII | 0.91
BsrGI | 0.92
EcoRI | 0.78
PciI | 0.79
PsiI | 0.82
PstI | 0.93
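The correlations in Table 2 were calculated with the R function cor.test. As a rough Python counterpart, the sketch below bins two lists of fragment lengths into the 500 bp size classes used in Figure 4 and computes Pearson's r between the binned counts. Whether the original analysis correlated fragment counts or the share of the genome per size class is not stated, so correlating counts is an assumption here, as are the function names.

```python
import math
from typing import List, Sequence


def bin_counts(lengths: Sequence[int],
               bin_width: int = 500,
               max_size: int = 17_000) -> List[int]:
    """Count fragments per size class; the last class collects all lengths >= max_size."""
    n_bins = max_size // bin_width + 1
    counts = [0] * n_bins
    for length in lengths:
        counts[min(length // bin_width, n_bins - 1)] += 1
    return counts


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    genome_a = [300, 700, 1_200, 1_300, 5_000, 18_000]  # made-up fragment lengths
    genome_b = [200, 800, 1_100, 1_400, 5_200, 30_000]
    print(pearson_r(bin_counts(genome_a), bin_counts(genome_b)))
```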

Figure 5.1: Variation in BglII fragment size distribution at genomic partitions in the human, mouse, chicken and zebra finch genomes. Fragment sizes were determined after removing all gaps represented by N's in the genome.

Figure 5.2: Variation in PciI fragment size distribution at genomic partitions in the human, mouse, chicken and zebra finch genomes. Fragment sizes were determined after removing all gaps represented by N's in the genome.

PciI and BglII were finally selected from the candidate set for the construction of the reduced representation libraries for the crow genome (Figure 5), due to the high degree of correlation in the fragment size distribution in simulations done using the chicken and zebra finch genomes.

Reduced representation libraries need to be constructed in such a way that the genome is approximately equally partitioned among the different libraries. Zebra finch is the most closely related species to the crow that has been sequenced and assembled. Hence, the size ranges which partition the zebra finch genome into four equal partitions have to be used to partition the RR libraries. However, many large gaps are present in the zebra finch genome. To obtain robust values for the fragment sizes, all analyses were repeated with the chicken, human and mouse genomes to get an idea of the variation in the fragment size distribution between the different genomes.

The fragment size distributions in the mouse, human, chicken and zebra finch genomes, with all gaps removed, were used to check for variation in fragment sizes. Based on the similarity of the fragment size distributions in the different genomes, the fragment size distribution in chicken was used to select the cut sites for PciI and the fragment size distribution in zebra finch was used to select the cut sites for BglII.

Restriction sites were identified on both the positive and negative strands of the DNA to get the fragment size distribution. Restriction recognition sites (for PciI, ACATGT) were used to identify the sites, and cut sites (for PciI, A'CATGT) were used to obtain the different fragments.
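The digestion step described above can be sketched in a few lines. This is a simplified stand-in for illustration, not the Perl module used in the project: it takes the PciI recognition site ACATGT with the cut after the first base, and the function names and the `cut_offset` parameter are assumptions. Because ACATGT is its own reverse complement, scanning one strand finds the sites on both strands. Gaps (runs of N) are not treated specially here.

```python
from typing import List


def cut_positions(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Return the positions at which the enzyme cuts, in ascending order.

    `cut_offset` is the number of bases of the recognition site that stay on
    the left-hand fragment (1 for a cut written as A'CATGT).
    """
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence: str, site: str = "ACATGT", cut_offset: int = 1) -> List[str]:
    """Cut a linear sequence at every occurrence of `site` and return the fragments."""
    sequence = sequence.upper()
    fragments = []
    previous = 0
    for cut in cut_positions(sequence, site, cut_offset):
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])  # remainder after the last cut
    return fragments


if __name__ == "__main__":
    pieces = digest("GGACATGTCCACATGTAA")
    print(pieces, [len(p) for p in pieces])
```

The list of fragment lengths produced this way is the input for the size distribution and library boundary calculations.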

The exact size range for each library was found at 75%, 50% and 25% of the total genome size after excluding fragments shorter than 1 KB. While cutting out the different libraries from the gel in the laboratory based on the position of the DNA ladder, fragment sizes can only be known approximately. Hence, we also checked the robustness of the library fragment size boundaries. Library sizes and changes in library fragment size boundaries with small changes in the library sizes (Table 3) provide the information needed to partition the genome.

| Library (Enzyme) | Lower bound | -5% error | Upper bound | +5% error |
|---|---|---|---|---|
| Library-1 (PciI) | 4.17% (<1kb) | 2639 (20%) | 3090 | 3541 (30%) |
| Library-2 (PciI) | 3091 | 4996 (45%) | 5549 | 6139 (55%) |
| Library-3 (PciI) | 5550 | 8357 (70%) | 9320 | 10548 (80%) |
| Library-4 (PciI) | 9321 | NA | 48000 | NA |
| Library-1 (BglII) | 4.09% (<1kb) | 2651 (20%) | 3106 | 3570 (30%) |
| Library-2 (BglII) | 3107 | 5077 (45%) | 5654 | 6290 (55%) |
| Library-3 (BglII) | 6291 | 8755 (70%) | 9956 | 11525 (80%) |
| Library-4 (BglII) | 11526 | NA | 48000 | NA |

Table 3: Size range for the different libraries digested by different enzymes.
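One way to derive such boundaries from a list of in-silico fragment sizes is sketched below in Python. The function name `size_boundaries` is hypothetical, and the sketch assumes that the percentages refer to cumulative bases in the fragments retained after the 1 kb filter; the thesis does not spell out this detail.

```python
def size_boundaries(fragment_sizes, fractions=(0.25, 0.50, 0.75), min_size=1000):
    """Return the fragment size at which each cumulative fraction of bases is reached."""
    kept = sorted(s for s in fragment_sizes if s >= min_size)
    total = sum(kept)
    targets = iter(sorted(fractions))
    target = next(targets, None)
    boundaries = []
    running = 0
    for size in kept:
        running += size
        # A single large fragment may cross more than one target at once.
        while target is not None and running >= target * total:
            boundaries.append(size)
            target = next(targets, None)
    return boundaries

# Toy example: the 500 bp fragment is excluded by the 1 kb filter.
print(size_boundaries([500, 1000, 1000, 2000, 4000]))  # [1000, 2000, 4000]
```

The robustness check in Table 3 corresponds to calling the same function with shifted fractions, for example `fractions=(0.20, 0.30)` around the first boundary.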

3.1.2 de novo assembly

WGS method of genome assembly:
For performing the WGS method of genome assembly, 100 bp reads with a 400 bp insert size were generated with coverage of 20, 30, 40, 50 and 60 X. Each of the data sets with different coverage was assembled separately using the 64 bit version of the SOAPdenovo (Li et al., 2010) genome assembler (with 31, 63 and 127 kmer), trying out all kmer sizes from 21 to 63 and selecting the kmer size with the best N50 value.

RRL method of genome assembly:
For the reduced representation library approach, the zebra finch genome was digested in-silico with PciI and BglII using the "Restriction Digestion" perl module (refer to the supplementary material for source code) to obtain the different fragments that would be obtained if genomic DNA was actually digested in the laboratory. These fragments were grouped into 4 libraries for each enzyme based on the size ranges that had been selected (Table 3). In each of the 8 libraries, 100 bp reads with a 400 bp insert were generated with 20, 25 and 30 X coverage. Each of the eight libraries was assembled separately using the SOAPdenovo assembler, probing all kmer sizes from 21 to 63. For each of the data sets the kmer with the best N50 value was selected. The best kmer assembly contigs from each of the 4 libraries for each enzyme were pooled to get enzyme contigs.
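The kmer selection rule used for both methods can be illustrated with a short Python sketch. It is not the "N50.pl" script from the supplementary material; the function names `n50` and `pick_best_kmer` are hypothetical, and the N50 definition used here (the contig length at which the running total of lengths, sorted from longest to shortest, first reaches half of all assembled bases) is the common convention, assumed to match the one used in the thesis.

```python
def n50(contig_lengths):
    """Length of the contig at which half of the assembled bases is reached."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

def pick_best_kmer(n50_by_kmer):
    """Return the kmer size whose assembly has the highest N50."""
    return max(n50_by_kmer, key=n50_by_kmer.get)

# Toy example with made-up contig lengths for three kmer sizes.
assemblies = {
    21: [400, 300, 300, 200],
    47: [900, 700, 200],
    63: [600, 500, 500, 100],
}
best = pick_best_kmer({k: n50(lengths) for k, lengths in assemblies.items()})
print(best)  # 47
```

Selecting on N50 alone favours contiguity; recovery and accuracy have to be checked separately against the reference.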

The 2 sets of enzyme contigs were merged by meta-assembly using the Minimus assembler (Sommer et al., 2007) from the AMOS package with the "-D REFCOUNT" option, specifying the two sets of enzyme contigs as two distinct assemblies. Both possible orders of assemblies were tried and the order with the higher N50 value was chosen. Minimus2 makes use of Nucmer from the MUMmer package. The MUMmer package was compiled with the 64 bit option to be able to assemble large data sets.

For each of the data sets the kmer with the best N50 value was selected. Various statistics such as N50, longest contig, total number of assembled bases and total number of contigs were calculated for these assemblies using the script "N50.pl" (refer to the supplementary material for source code). The assembled contigs were aligned against the reference genome using nucmer from the MUMmer package. The percent of the genome that was assembled (recovery) and the correctly assembled proportion of the contigs (accuracy) were calculated using 'show-tiling' from the MUMmer package.

3.2 Laboratory methods

Prior to sequencing, genomic DNA needs to be extracted and prepared into a form suitable for sequencing, depending on the sequencing technology being used and the sequencing strategy being adopted.

3.2.1 DNA extraction and quality check

DNA required for genomic sequencing was extracted from frozen blood samples preserved in Queen's Lysis buffer. Refer to the supplementary material for the detailed protocol used to obtain high quality DNA with minimal fragmentation. Purity and quantity of extracted DNA were measured by Nanodrop and Qubit (Invitrogen) measurements. For quality evaluation, DNA was run on a 1% gel beside a high molecular weight marker (GRHR+OR ladder, Fermentas) (for an example gel picture see Figure 7).

Figure 7: Crow genomic DNA samples fragmentation check after extraction from blood.

3.2.2 Reduced representation library construction

Reduced representation libraries were constructed using restriction enzymes selected using the zebra finch genome for finding fragment size distributions. Genomic DNA extracted from crow blood was checked for purity and quality as described above. Refer to the supplementary material for the detailed protocol used to construct the reduced representation libraries. Purified genomic DNA was fully digested with the pre-selected enzyme (e.g. BglII). The full digest was then run on a 1% agarose gel to separate the fragments based on size.

Figure 8: Crow genomic DNA was digested with the selected restriction enzyme and run on a 1% agarose gel for partitioning the genome into libraries based on size.

From the genomic DNA digest, four differently sized libraries (Table 3) were cut out and purified from the gel (refer to the supplementary information for the detailed protocol). Different methods (Qiagen, Gelase, freeze and squeeze) for extracting DNA from the gel were tried to select a method that provided the best yield with minimal fragmentation (data not shown).

Multiple gels were run to obtain more than 10 ug of DNA per library. The four libraries were again run on a 1% agarose gel to ensure proper separation of the fragments into the libraries of the required size.

4 Results

4.1 Simulated assemblies

We assembled 100 bp reads with a 400 bp insert size simulated from the zebra finch genome, for the entire genome and for the reduced representation libraries separately, with the goal of comparing the two approaches outlined above.

4.1.1 Comparison of WGS vs RRL strategy

The reduced representation library approach gave longer continuous contigs (Table 4), with an N50 about 1 kb higher than that of the WGS sequencing strategy. However, the recovery from the WGS method was about 2 to 3% higher, while the accuracy remained essentially the same. Since a dataset with 1% sequencing error and 1% polymorphism was used, an accuracy of 99% is expected.

4.1.2 Evaluation of RRL library preparation in the laboratory

The construction of the RRL libraries requires the separation of the digested DNA fragments based on size using gel electrophoresis into separate libraries. Resolution of the high molecular weight DNA requires the use of low concentration (0.5 to 1%) gels, which can be difficult to handle, especially for cutting out bands of various sizes.

| Library | Best kmer | Number of contigs | Longest contig (bp) | Total number of bases | N50 (bp) | Recovery (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| Merged with Minimus (20X + 20X) | BP (BglII PciI) | 81370 | 101091 | 193094868 | 8177 | 92.88 | 99.9 |
| Merged with Minimus (25X + 25X) | BP (BglII PciI) | 78843 | 87214 | 192987837 | 8443 | 92.9 | 99.9 |
| WGS (20X) | 47 | 313964 | 110286 | 203077063 | 6767 | 94.87 | 99.93 |
| WGS (25X) | 49 | 300744 | 107314 | 203467969 | 6964 | 95.19 | 99.93 |
| WGS (30X) | 49 | 302712 | 115974 | 203646071 | 7022 | 95.21 | 99.93 |
| WGS (35X) | 49 | 303677 | 103197 | 203751991 | 7054 | 95.23 | 99.93 |
| WGS (40X) | 49 | 305229 | 106435 | 203864219 | 7090 | 95.24 | 99.93 |
| WGS (45X) | 49 | 306161 | 105595 | 203940103 | 7099 | 95.28 | 99.93 |
| WGS (50X) | 49 | 307288 | 105596 | 204020912 | 7085 | 95.27 | 99.93 |

Table 4: Comparison of Whole Genome Sequencing with the reduced representation library strategy based on simulated paired end reads.

Correct size fractions can be cut out from the gel using high molecular weight markers as a reference. However, differences in the concentration of the DNA in the markers and the sample can lead to the marker and sample DNA running at different speeds. Repeated tests showed that the DNA marker was moving faster than the digested DNA sample due to its lower concentration. The use of high concentrations of DNA marker leads to smearing of the marker and results in loss of resolution. Ethidium bromide was used for staining due to its compatibility with the downstream sequencing steps. The bands of the appropriate size were cut out from the gel after visualizing them with ultraviolet light for a short duration. Exposure of DNA to UV leads to degradation and fragmentation of DNA. Using a glass plate to shield the gel from the UV will reduce the damage to the DNA. After cutting out the bands from the gel, the DNA has to be purified from the gel. Purification of DNA from gel results in fragmentation and loss of DNA. Hence, the method of purification which provides the best yield with the least fragmentation needs to be used.

Qiagen QIAEX II beads, Gelase and the freeze and squeeze method were tried for purification of DNA from the agarose gel. Qiagen provides the best yields (data not shown) for the low molecular weight libraries, while the Gelase method is better for the high molecular weight libraries. However, the ammonium acetate precipitation step in the Gelase method could lead to inactivation of the ligase enzyme when adding the adapters prior to sequencing. All the methods show a similar amount of fragmentation of the DNA after purification and provide yields of approximately 50%. Losing half the DNA in a single purification step requires the use of large amounts of starting material for the RRL approach.

The first step of separation of DNA based on size needs to be verified and validated by at least one more step of size separation. Each step of size separation using gel electrophoresis leads to further fragmentation and loss of DNA. Hence, to obtain a yield of 2 ug of DNA for each library, a total of 80 to 100 ug of DNA has to be digested. Digestion and phenol chloroform purification to remove the restriction enzyme leads to a loss of about 20 ug of DNA. The first step of size separation leaves approximately 20 ug of DNA (5 ug/library). The second step of size separation gives a yield of approximately 2 ug/library for sequencing. Hence, the RRL approach requires 50 times more DNA than the WGS approach to sequencing.
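The DNA budget described above can be tabulated with a small Python sketch. It only restates the approximate figures given in the text (about 20 ug lost during digestion and purification, about 5 ug per library after the first size separation, about 2 ug per library after the second); the function name `rrl_dna_budget` and the choice of 100 ug as starting amount in the example are illustrative assumptions.

```python
def rrl_dna_budget(start_ug, libraries=4):
    """List (step, remaining ug, fraction of input) for the RRL preparation steps."""
    steps = [
        ("digested input", start_ug),
        # About 20 ug is lost in digestion and phenol chloroform purification.
        ("after digestion and purification", start_ug - 20.0),
        # About 5 ug per library remains after the first size separation.
        ("first size separation", 5.0 * libraries),
        # About 2 ug per library remains after the second size separation.
        ("second size separation", 2.0 * libraries),
    ]
    return [(name, amount, amount / start_ug) for name, amount in steps]

for name, amount, fraction in rrl_dna_budget(100.0):
    print(f"{name}: {amount:.0f} ug ({fraction:.0%} of input)")
```

With 100 ug of input, the sketch shows that only about 8% of the digested DNA ends up in the four sequencing libraries.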
