
Several of the most promising techniques in molecular biology analyze samples on a large scale, and could potentially revolutionize healthcare and open up new biological applications. The leading method for large-scale protein studies is mass spectrometry. Today, thousands of proteins can routinely be identified and quantified in a single experiment using the technique known as shotgun proteomics.

One challenge with these experiments lies in the computations, and in how the mass spectra should be interpreted. A shotgun proteomics experiment can easily generate tens of thousands of spectra, each of which may represent a peptide from a protein. However, both biological and technical difficulties mean that the computational methods we use often interpret spectra incorrectly, so that the wrong peptide is identified. As a result, we must rely entirely on statistical estimates of the quality of the results in order to draw biological conclusions.

This thesis contains four papers from my research on the accuracy of the statistical confidence estimates that are calculated for shotgun proteomics results, and on how this accuracy can be measured. In the first two papers, we present a new method that uses already characterized protein samples to measure the accuracy. The third paper shows how to avoid estimation errors when machine learning is used to analyze the data. In the fourth paper, we present a new tool for analyzing shotgun proteomics results, and measure the accuracy of the statistical estimates it reports.

The results presented in this thesis can simplify the development of new and accurate computational tools in mass spectrometry-based proteomics. Such tools make the interpretation of spectra, and the biological conclusions that follow, more reliable.

Acknowledgements

I must thank my supervisor Lukas Käll for considering me and choosing me over the (presumably) hard competition for the Ph.D. position in his group.

I am very grateful for all the ideas, knowledge, honest feedback and general encouragement that I got, which made this project so much smoother. Many thanks to my fellow Ph.D. student Luminita Moruz for all the guidance and the great company during work, conferences and breaks. And thanks to Arne Elofsson for being my optimistic co-supervisor.

Erik Sjölund, thank you for teaching me what I should have already known about computers. I am still recovering from your win on 5 km though. I am also grateful to José Fernández Navarro for all your skilled programming advice, and for all the hilarious stories. I am very thankful to my knowledgeable co-authors abroad, William Stafford Noble, Sangtae Kim and Richard Smith, for great ideas and collaborations. And I owe many thanks to the other members of the research group, like Mattia Tomasoni, Magnus Rosenlund, Amin Saffari, Xiao Liang, David Menéndez Hurtado and Matthew The, among others, for help with the projects and the thesis. To Jorrit Boekel, for the quick and fun perspective on science and many other things, and to the other members of Janne Lehtiö’s group that I have had the luck to work with.

Almost needless to say, I thank the many friends and colleagues at the Arrhenius Laboratory for giving me such an inspiring start of my Ph.D.! You are too many to mention here, but I am especially grateful to Johannes Björnerås, although your research risks overturning the foundations of our society.

And to Axel Abelein for your kindness when Munich trashed Barcelona. To my Ph.D. twin Ye Weihua for your contagious enthusiasm, Sofia Unnerståle for your playfulness and to Patrik Björkholm for all the witty and true advice.

Thanks to members of Arne Elofsson’s group that I’ve been discussing, teaching and planning with during these years. And thanks to Torbjörn Astlind and Haidi Astlind for the help with everything from appliances to administration, like when I needed a saw(!) at work.

The last years of my Ph.D. I spent at the Science for Life Laboratory surrounded by an army of talented, friendly and helpful people. Special thanks to Joel Sjöstrand for never rejecting any new ideas or perspectives, regardless of their quality. To Hossein Farahani for the passion and the perfected cappuccinos, you taught me the scientific importance of Beethoven. To Mattias Frånberg and Kristoffer Sahlin for keeping it freq, and keeping me in competitive shape! Many thanks to Ikram Ullah, Muhammad Owais Mahmudi, Pekka Parviainen, Mehmood Khan, Auwn Muhammad Sayyed and Raja Hashim Ali for the riddles, discussions, cookies, cakes and the general education about the world. To the brilliant Rezin Dilshad of course, for countless reasons.

And to all other friends and colleagues at Scilifelab who really made me enjoy being there.

I must also thank Jonas Bengtsson, Tomas Taus, Jürgen Klevert, Edward Young and Leslei Nogueira for your great company and help during summer schools and conferences. And Johan Seijsing and Shahin Aeinehband for rewarding discussions and football games. Lastly, many greetings and thanks to friends and family!

