
Identification of obesity-associated SNPs in the human genome : Method development and implementation for SOLiD sequencing data analysis


Department of Clinical and Experimental Medicine

Master’s Thesis

Identification of Obesity-Associated SNPs in the

Human Genome

Method Development and Implementation for SOLiD Sequencing Data Analysis

Lilia Hedberg

LiU-IKE-EX-10/05


Supervisors: Helgi Schiöth

Department of Neuroscience, Uppsala University

Markus Sällman-Almén

Department of Neuroscience, Uppsala University

Examiner: Per-Eric Lindgren

Department of Clinical and Experimental Medicine, Linköping University

Division, Department

Department of Clinical and Experimental Medicine, Linköping University

SE-581 83 Linköping, Sweden

Date: 2010-06-07

Language: English

Report category: Master's thesis

ISRN: LiU-IKE-EX-10/05

Title

Identification of Obesity-Associated SNPs in the Human Genome – Method Development and Implementation for SOLiD Sequencing Data Analysis

Author

Lilia Hedberg

Abstract

Over the last few years, genome-wide association studies (GWAS) have been used to identify numerous obesity-associated SNPs in the human genome, and candidate obesity genes have been identified through linkage studies. When SNPs in the first intron of FTO were found to be associated with BMI, FTO became the first gene to be linked to common obesity. To look for causative explanations behind the associated SNPs, FTO had been re-sequenced on the SOLiD sequencing platform. An in-house candidate gene, SLCX, was also sequenced to evaluate a potential association with obesity. The purpose of this project was to analyse the sequences and to evaluate the quality of the SOLiD sequencing. Part of the project consisted of performing PCRs and selecting genomic regions for future sequencing projects. I developed and implemented a sequence analysis strategy to identify obesity-associated SNPs. I found 39 obesity-linked SNPs in FTO, the majority of which were located in introns 1 and 8. I also identified 3 associated intronic SNPs in SLCX. I found that SOLiD sequencing coverage differs between non-repetitive and repetitive genomic regions, and that it is highest near amplicon ends. Interestingly, coverage varies significantly between amplicons even after repetitive sequences have been removed, which indicates that it is affected by features inherent to the sequence. Still, the observed allele frequencies for known SNPs were highly correlated with the SNP frequencies documented in HapMap. In conclusion, I verify that SNPs in FTO are associated with obesity and identify a previously unassociated gene, SLCX, as a potential obesity gene. Re-sequencing of genomic regions on the SOLiD platform proved successful for SNP identification, although the variation in sequencing coverage may be problematic.


Acknowledgments

I would like to thank Professor Helgi Schiöth for giving me the opportunity to do my master's thesis in his group - it has been a lot of fun! I am very grateful to Markus for his encouragement, support and good company, and for always taking the time to help! A big hug to everyone at Helgilab - you are the best!


Contents

1 Introduction
2 Background
2.1 The FTO Gene
2.2 Potential Obesity Genes
2.2.1 SH2B1
2.2.2 KCTD15
2.2.3 TMEM18
2.2.4 NEGR1
2.2.5 SLCX
2.3 SOLiD Sequencing
3 Methods
3.1 Identification of Genomic Regions and Primer Design
3.2 PCR and SOLiD Sequencing
3.3 Sequence Analysis
3.4 Quality of SOLiD Sequencing
4 Results
4.1 Identification of Genomic Regions and Primer Design
4.2 Sequence Analysis - SLCX
4.3 Sequence Analysis - FTO
4.4 Quality of SOLiD Sequencing
4.4.1 Adherence to known SNP frequencies
4.4.2 Effect of repetitive regions on primers
4.4.3 Effect of distance to primer on coverage
4.4.4 Difference in coverage between amplicons
5 Discussion


Chapter 1

Introduction

Obesity is an increasing global health problem and a risk factor for type 2 diabetes, cardiovascular disease and certain types of cancer. The obesity epidemic is spreading throughout the world: in 2005, 400 million individuals were considered clinically obese [1], and if the current trend continues, 1.12 billion will be obese by 2030 [2]. Although excessive eating and low levels of physical exercise clearly matter, it has been estimated that about 50-90% of the variation in Body Mass Index (BMI) is due to genetic factors [3]. What these factors are remains to be investigated. The only gene so far to be definitively linked to common obesity is the fat mass and obesity associated gene, FTO. Individuals carrying two copies of the risk allele weigh on average 3 kg more than individuals homozygous for the low-risk allele [4], but a causative explanation for this observation is still lacking. In general, the search for causative mutations behind common obesity will require re-sequencing of large genomic regions, which has been made possible by second-generation sequencing technology such as Applied Biosystems' SOLiD sequencing platform. SOLiD sequencing is a rather new technology, and the development of sequence analysis software has lagged behind.

To evaluate a potential obesity association for the in-house candidate gene SLCX, and to search for causative explanations behind FTO's association with obesity, these genes had been re-sequenced on the SOLiD sequencing platform. The genes had been sequenced in a cohort of 547 overweight and 531 normal-weight children and adolescents. The goal of this project was to develop and implement a strategy for analysing the sequencing data and to evaluate the general quality of SOLiD sequencing. Part of the project consisted of performing PCRs for a candidate obesity gene, TMEM18, and selecting genomic regions for future sequencing projects.

Due to confidentiality, SLCX’s annotated name will not be given, nor will genomic positions for identified SNPs be revealed.


Chapter 2

Background

Obesity is an increasing global health problem and a major risk factor for type 2 diabetes, cardiovascular disease and certain types of cancer [1]. It increases overall mortality and morbidity [5] and carries heavy economic burdens for health care [6]. The so-called "obesity epidemic" started in the 1980s in the United States [7] and is currently spreading over the world [1]. In 2005, approximately 1.6 billion individuals were overweight and 400 million were clinically obese [1]. It has been estimated that if the current trend continues, 1.12 billion individuals will be obese by 2030 [2]. Attempts to stop the epidemic have so far been unsuccessful, and a better understanding of the problem is clearly needed.

A common measure of obesity is the Body Mass Index (BMI), calculated as body weight in kilograms divided by the square of body height in meters. Overweight is defined as a BMI of 25 or higher, and obesity as a BMI of 30 or higher [1]. Overweight and obesity are caused by an imbalance between energy intake and energy expenditure, and have both environmental and genetic risk factors [1], [8]. Although the ongoing obesity epidemic is assumed to be caused by societal changes affecting food intake and physical exercise [7], genetic factors contribute substantially: some individuals are simply more prone to weight gain than others. How much of the variance in BMI within a population is due to genetic factors is uncertain; results from twin studies indicate that 50 to 90% of BMI variation within a population can be attributed to genetic factors, although lower estimates have been reported [3]. Several genes involved in monogenic, severe cases of obesity have been identified, but these mutations are rare and cannot explain the genetic component of common obesity [9].
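The BMI definition and cut-offs above can be written as two short functions. This is a minimal sketch with illustrative function names. Note that the 25 and 30 cut-offs are the adult definitions; children and adolescents, such as the cohort studied here, are normally classified with age- and sex-specific reference values, which the sketch does not implement.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


def adult_bmi_category(value: float) -> str:
    """Classify an adult BMI with the cut-offs given in the text (>= 25, >= 30)."""
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "below overweight range"


# Example: an adult weighing 85 kg at a height of 1.75 m
value = bmi(85.0, 1.75)
print(f"{value:.1f} {adult_bmi_category(value)}")  # prints "27.8 overweight"
```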

In the search for genes associated with obesity, there are three strategies [8]. The first is to perform linkage studies in families where obesity is common and to identify genomic markers that co-segregate with the disease. The second is to screen candidate genes for variants that can be associated with obesity. Both strategies have been unsuccessful for determining genetic causes of common obesity, but rather successful in the search for causes of rare monogenic forms of obesity [8], [9]. The third, and newest, strategy is to conduct genome-wide association studies (GWASs) to identify single nucleotide polymorphisms (SNPs) that show association with obesity [10]. Together with linkage studies, these SNPs can be used to identify candidate genes for screening. Linkage disequilibrium (LD) blocks, which are mapped from population genotype data such as the HapMap data described below, are genomic regions that tend to be inherited as single units, rarely disrupted by recombination events. This means that SNPs and genes located in the same block are almost always inherited together. Therefore, if a SNP is found to be associated with obesity, the cause of the association may in fact be a mutation in a gene that co-segregates with the SNP.
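To make "strong linkage disequilibrium" concrete, the sketch below computes the two standard pairwise LD measures, r² and D', for two biallelic SNPs. It assumes phased haplotype counts are available; the function name and input layout are illustrative. In practice, haplotype frequencies are usually estimated from unphased genotypes by LD software rather than counted directly.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float]:
    """Return (r^2, |D'|) for two biallelic SNPs from phased haplotype counts.

    Alleles A/a at SNP 1 and B/b at SNP 2; both SNPs must be polymorphic.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B
    d = n_AB / n - p_A * p_B         # disequilibrium coefficient D
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    # D' scales D by its largest possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return r2, abs(d) / d_max


# Hypothetical counts for 100 haplotypes, mostly AB or ab
print(ld_stats(40, 10, 10, 40))
```

Two SNPs that are always inherited together give r² = D' = 1, while independent SNPs give values near 0.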

GWASs were made possible by the HapMap project [11], which aimed at identifying common polymorphisms in the human genome, determining their frequencies and constructing a map of linkage disequilibrium blocks. Using HapMap information, microarrays for large-scale SNP analysis could be developed, which was necessary for conducting genome-wide scans. Several GWASs have been conducted so far, and they have led to the identification of the first human gene to be linked to common obesity, FTO (fat mass and obesity associated gene).

2.1 The FTO Gene

The first gene to be linked to common obesity was FTO. Frayling et al [4] first reported that a cluster of SNPs in the first intron of this gene was associated with type 2 diabetes. All SNPs were in strong linkage disequilibrium, meaning that the rate of recombination between them is low and that they are almost always inherited together. However, after correcting for BMI, the association with diabetes was completely abolished, indicating that it was in fact mediated by an increase in body weight. The results were then replicated in 13 cohorts with a total of 38,759 individuals, where it was confirmed that individuals homozygous for the high-risk A allele of SNP rs9939609 weigh on average 3 kg more and have 1.67-fold increased odds of becoming obese compared to those homozygous for the low-risk T allele. The A allele was also associated with a larger waist circumference and higher subcutaneous fat mass, and the association was found to be almost entirely attributable to changes in fat mass. These effects are seen from the age of 7 onwards and have since been replicated in several studies.

FTO is a large gene consisting of 9 exons that span about 400 kb on chromosome 16. In populations of European descent, about 63% of individuals carry at least one A allele, and about 16% are homozygous for it [4]. It has been estimated that the population attributable risk of FTO for obesity is 20%, meaning that if the effect of FTO on body weight were eliminated, 20% of obesity cases would be prevented [12].
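As a consistency check on these population figures, the sketch below assumes Hardy-Weinberg equilibrium (an assumption, not something stated in the text). Under that assumption, a 16% frequency of AA homozygotes implies an A-allele frequency of about 0.40, so that roughly 64% of individuals carry at least one A allele.

```python
from math import sqrt


def genotype_frequencies(q: float) -> dict[str, float]:
    """Hardy-Weinberg genotype frequencies for an A allele with frequency q."""
    p = 1 - q                     # frequency of the T allele
    return {"TT": p * p, "AT": 2 * p * q, "AA": q * q}


# Assumption: the population is in Hardy-Weinberg equilibrium.
hom_aa = 0.16                     # ~16% homozygous for the A allele [4]
q = sqrt(hom_aa)                  # implied A-allele frequency
g = genotype_frequencies(q)
carriers = g["AT"] + g["AA"]      # individuals with at least one A allele
print(f"q = {q:.2f}, carriers = {carriers:.0%}")  # prints "q = 0.40, carriers = 64%"
```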


shares sequence motifs with Fe(II)- and 2-oxoglutarate-dependent oxygenases and that it can catalyse the demethylation of 3-methylthymine in single-stranded DNA [13]. Whether this is its physiologically relevant substrate remains to be discovered. FTO is conserved within vertebrates and, surprisingly, in two types of algae [14]. It is expressed throughout the body in both adult and fetal tissues, with the highest expression in the brain, especially in the hypothalamus, a region important in the regulation of energy balance. FTO localizes to the cell nucleus, which is consistent with its predicted demethylation activity [13], [14].

Several studies aiming at finding the signaling pathways and understanding the function of the FTO protein have been published. It has been shown that there is a link between feeding and Fto expression in mice and rats. In mice, Fto mRNA levels in the arcuate nucleus of the hypothalamus are decreased during fasting [13]. Interestingly, the effect of fasting in rats is opposite to that in mice [14].

The first study to demonstrate that the mouse ortholog of FTO is involved in energy homeostasis used Fto knockout mice. The Fto-null mice were found to weigh less than normal mice, almost entirely due to a decrease in fat mass, and to gain less weight when exposed to a high-fat diet. The low body weight was, however, not caused by a decrease in energy intake; Fto-null mice were actually found to eat more than normal mice, and they also had a decreased level of physical activity. Despite this, their energy expenditure was higher than normal, as were their plasma adrenaline levels [15]. In light of this, Tung et al [16] studied the effect of hypothalamic Fto levels on food intake in rats. By manipulating the expression of Fto in the arcuate nucleus, they found that overexpression of Fto leads to a decrease in food intake and that reduced expression results in an increased food intake. Body weight was, however, not affected.

Studies in humans support the involvement of FTO in energy homeostasis. Individuals homozygous for the A allele have been found to eat more [17], [18] and to have an impaired satiety response [19] compared to those homozygous for the T allele. No association with physical activity has, however, been found in humans [16], [17].

It seems clear that there is an association between FTO genotype and BMI, but FTO explains only about 1% of the genetic component of BMI, so many genes remain to be identified [12]. SNPs in several other regions have been linked to obesity, and these are described next.


2.2.1 SH2B1

Willer et al [20] found that several SNPs in a region containing the SH2B1 gene were associated with BMI. SH2B1 is located on chromosome 16 and encodes a cytoplasmic protein known to be involved in leptin and insulin signaling [21], [22]. SH2B1 is expressed in various tissues including the liver, skeletal muscle, fat, and brain. It binds to various protein tyrosine kinases among which are the insulin receptor (IR) and JAK2. JAK2 initiates cell signaling in response to leptin, which is an adiposity signal conveying information regarding peripheral energy status to the brain [21], [22]. SH2B1 has been found to enhance insulin and leptin signaling in vivo; mice with a systemic SH2B1 deletion are obese and develop hyperglycemia, hyperleptinemia and hyperinsulinemia [21], [22]. When SH2B1 expression is restored specifically in neural tissue, a normal phenotype is obtained, which suggests that neuronal SH2B1 controls energy balance and body weight [22].

A coding SNP in SH2B1 has been associated with serum leptin and body fat in humans. This SNP has no predicted effect on protein function or structure and is probably in LD with the causative mutation [23]. Deletions encompassing SH2B1 have been found to co-segregate with severe early-onset obesity [24].

2.2.2 KCTD15

KCTD15 is located on chromosome 19 and has been found to have a high level of expression in the brain and hypothalamus [20]. Several studies have found that SNPs downstream of the gene are associated with BMI [20], [25], [26]. The function of KCTD15 is however unknown.

2.2.3 TMEM18

TMEM18 is located on chromosome 2, and SNPs in close proximity to the gene have been strongly associated with obesity [20], [25], [26], [27]. The function of TMEM18 is unknown, although it has been shown to have a nuclear localization signal and to contain a potential DNA- or RNA-binding site. The gene is highly conserved in eukaryotes and is widely expressed. Most interestingly, it is highly expressed in the hypothalamus, which is known to be involved in feeding and body weight regulation. Its expression is, however, not affected by fasting [27].

2.2.4 NEGR1

NEGR1 is located on chromosome 1 and is thought to influence neuronal outgrowth. The gene is expressed at high levels in the brain and hypothalamus. Several studies have shown an association between BMI and a region upstream of NEGR1 [20], [25], [26]. Two segments in this region are copy number variable, and there are two deletion polymorphisms (10 kb and 45 kb) that segregate on distinct haplotype blocks. The larger deleted sequence contains conserved, noncoding elements. The SNPs most strongly associated with BMI by Willer et al [20] (rs2568958 and rs2815752) flank the 45 kb deletion, and it has been suggested that this deletion causes the association with BMI. Individuals carrying these BMI-associated SNP alleles are thus likely to carry the 45 kb deletion, which suggests that the deleted sequence has some "protective" function against becoming overweight.

2.2.5 SLCX

SLCX is an in-house candidate obesity gene. It is believed to be a solute carrier, and it is highly expressed throughout the brain, especially in the hypothalamus. Beyond that, nothing is known about its function, but since it was found to be highly expressed in brain regions important for weight control, SLCX was considered interesting for further investigation regarding association with BMI (personal communication).

2.3 SOLiD Sequencing

The approach of using GWASs to locate regions associated with BMI has led to the identification of many candidate obesity genes. To discover the causative mutations behind these associations, large genomic regions will need to be re-sequenced. Targeted re-sequencing has been greatly facilitated by high-throughput "next generation sequencing" techniques, one of which is SOLiD sequencing.

SOLiD sequencing (Applied Biosystems; California, USA) is one of the so-called next generation sequencing platforms which, unlike traditional sequencing methods, are not based on Sanger sequencing. Second generation sequencing methods all entail the following steps: DNA fragmentation to create a library, adaptor ligation to the library fragments, clustering of amplicons from each library fragment, and cyclic array sequencing (Fig 2.1) [28]. There are different approaches for accomplishing each of these steps; the ones used in SOLiD sequencing are described below.

The first step in the SOLiD sequencing protocol is to create a DNA library. DNA libraries can be created in any way that gives rise to short fragments, for instance by sonication, where the DNA molecules are sheared by ultrasound. After fragmentation, adaptor sequences are ligated to both ends of the fragments (Fig 2.1a), and an emulsion PCR with primers complementary to the adaptor sequences (Fig 2.1b) is then performed. An emulsion PCR is like a regular PCR, except that the reaction mixture is divided into microscopic aqueous droplets in oil, each ideally containing at most one DNA template, so that a very large number of miniature PCR reactions are performed in the emulsion [29]. This makes it possible to cluster the PCR amplicons belonging to each separate library fragment. In SOLiD sequencing, one of the PCR primers is linked to a 1 µm paramagnetic bead, so that as the PCR proceeds, the bead becomes covered with PCR products (Fig 2.1b). By keeping the template concentration low, each droplet will contain at most one bead and possibly a DNA template; thus, when the PCR is finished, each bead will be covered by amplicons from one specific library fragment. The point of using beads is that they enable the isolation of clusters of unique PCR products after the reaction has finished (Fig 2.1c). Beads carrying amplicons are separated from beads that did not take part in a PCR, and these beads are then attached to a slide [28].

[Figure 2.1: a) DNA fragmentation, adaptor ligation; b) emulsion PCR; c) enrichment of beads carrying amplicons, SOLiD sequencing]

Figure 2.1. Preparation of DNA for SOLiD sequencing. a) DNA is randomly fragmented, and adaptor sequences are ligated to the fragment ends. b) Emulsion PCR is performed, with one of the primers linked to a paramagnetic bead. Beads end up being covered with amplicons from one specific DNA fragment. c) Beads carrying amplicons are enriched and prepared for SOLiD sequencing. Modified from Shendure and Ji [28].
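Why a low template concentration gives clonal beads can be illustrated with a simple model. Assuming that templates are distributed randomly over the droplets, the number of templates per droplet follows a Poisson distribution with mean λ; the λ values below are hypothetical and are not taken from the SOLiD protocol. Lowering λ leaves more droplets empty, but reduces the share of DNA-containing droplets with more than one template, which would otherwise give beads carrying mixed amplicons.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet receives exactly k templates when the mean is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def droplet_loading(lam: float) -> tuple[float, float, float]:
    """Fractions of droplets with no template, exactly one, and more than one."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return empty, single, 1.0 - empty - single


# Hypothetical mean numbers of templates per droplet
for lam in (0.1, 0.3, 1.0):
    empty, single, multiple = droplet_loading(lam)
    # Among droplets that received DNA, the share that would give mixed (non-clonal) beads
    mixed_share = multiple / (1.0 - empty)
    print(f"lambda={lam}: empty={empty:.3f}, single={single:.3f}, mixed share={mixed_share:.3f}")
```

Empty droplets cost only throughput, since beads without amplicons are removed in the enrichment step.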

Unlike Sanger sequencing, which is based on elongation by a DNA polymerase, SOLiD uses a DNA ligase [28]. At the beginning of sequencing, a primer is hybridized to the adaptor sequence, and in each cycle the primer is extended by ligating a fluorescently labelled octamer that hybridizes to the amplicon (Fig 2.2). Four fluorescent labels are used, and each corresponds to the pair of bases at the fourth and fifth positions of the octamer according to the colour scheme in Fig 2.2. When an octamer binds to the amplicon, its colour is registered and the last three bases, which carry the fluorophore, are cleaved off. The primer extension then continues with the binding of another octamer. Because each cycle extends the primer by five bases, one round of primer extension queries the base pairs at positions 4 and 5, 9 and 10, 14 and 15, and so on. However, since only four colours are available for the 16 possible dinucleotides, a colour does not reveal the identity of a single base; it only narrows it down. Therefore, after a certain read length (50 bp), the strands are denatured and a new round of sequencing begins with a primer offset by one base. The process is iterated until every base has been queried twice. Because the first queried base lies in the known adaptor sequence, the sequence can then be decoded according to the principle described in Fig 2.2 [28], [29]. In practice, however, the data are aligned to a reference sequence that has been translated into colour space, and after the alignment the data are translated into base space [30].

The random fragmentation of the DNA strands gives overlapping sequences, which means that each genomic position is included in several fragments with different starting positions, so-called unique starting points (usps). The number of times a genomic position is covered by sequencing reads is referred to as its coverage.
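The difference between coverage and usps can be shown with a small sketch. Reads are given as hypothetical (start, length) pairs on the forward strand only. Coverage counts every read overlapping a position, whereas usps count only the distinct start positions among those reads. Duplicated fragments therefore add coverage but not usps.

```python
from collections import defaultdict


def coverage_and_usp(reads: list[tuple[int, int]]) -> dict[int, tuple[int, int]]:
    """Per-position (coverage, number of unique starting points) from (start, length) reads."""
    depth: dict[int, int] = defaultdict(int)
    starts_at: dict[int, set[int]] = defaultdict(set)
    for start, length in reads:
        for pos in range(start, start + length):
            depth[pos] += 1           # every overlapping read adds coverage
            starts_at[pos].add(start)  # but a repeated start adds no new usp
    return {pos: (depth[pos], len(starts_at[pos])) for pos in sorted(depth)}


reads = [(100, 5), (100, 5), (102, 5)]  # hypothetical aligned reads
stats = coverage_and_usp(reads)
print(stats[103])  # prints "(3, 2)": three reads cover 103, from two distinct starts
```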

[Figure 2.2: colour scheme for dinucleotides of A, T, G and C; octamer probes (NNNXXNNN) in cycle 1, cycle 2 and after 4 cycles; deciphering example]

Figure 2.2. The principles of SOLiD sequencing. Fluorescently labelled octamers bind to the DNA strand that is to be sequenced. Each fluorophore corresponds to the nucleotides at the fourth and fifth positions of the octamer according to the colour scheme. When an octamer binds to the DNA strand, its colour is registered, and the last three bases of the octamer are cleaved off. In this way, the fourth and fifth bases of every five-base step are queried. To obtain the entire sequence, successive rounds of primer extension are performed, each time with the primer displaced by one base, so that every base is interrogated twice. Because the first base is part of the adaptor sequence (pink), the code can be deciphered.

Chapter 3

Methods

The approach we use to identify SNPs linked to obesity roughly follows the process illustrated in Fig 3.1. The first step is to identify candidate genes, either through bioinformatic methods or literature studies. When a candidate gene has been identified, target-specific primers are designed so that the gene can be isolated and amplified by PCR. The PCR products are sequenced on the SOLiD platform and the sequencing data are analysed. SNPs associated with obesity are then genotyped to verify the results.

Figure 3.1. Schematic overview of the approach used to identify obesity associated SNPs

3.1 Identification of Genomic Regions and Primer Design

The choice of candidate genes was based on the GWAS by Willer et al. [20]. Due to limited resources, it was not possible to sequence the genes in their entirety, so regions of specific interest needed to be identified. My approach was to start from the SNPs with the strongest obesity association and use Haploview [31] to isolate a region in LD with these SNPs.
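One simple way to turn pairwise LD values into a region to sequence is sketched below. This is a hypothetical illustration, not the procedure Haploview itself uses (Haploview offers its own block definitions). The sketch takes the span of all SNPs whose r² with the lead SNP reaches a threshold; the 0.8 cut-off is a common convention assumed here, and the input values are invented.

```python
def ld_region(lead_pos: int, snps: list[tuple[int, float]], r2_min: float = 0.8) -> tuple[int, int]:
    """Span (start, end) of the lead SNP and all SNPs with r^2 >= r2_min to it.

    snps: hypothetical (position, r^2 with the lead SNP) pairs, e.g. exported from an LD tool.
    """
    linked = [pos for pos, r2 in snps if r2 >= r2_min] + [lead_pos]
    return min(linked), max(linked)


# Hypothetical lead SNP at position 5000 and four neighbouring SNPs
print(ld_region(5000, [(1000, 0.2), (3000, 0.85), (7000, 0.9), (9000, 0.5)]))  # prints "(3000, 7000)"
```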


For the deletion upstream of NEGR1, the deletion region was studied using the UCSC Genome Browser (http://genome.ucsc.edu) [32] and blastn (RefSeq database). Genomic sequences were downloaded from the UCSC Table Browser [33]. Primers were designed using Primer3Plus [34]. Two separate design strategies were used. The first strategy was to design one pool of overlapping amplicons of about 2000 base pairs (bp). However, due to previously observed problems with highly variable sequencing coverage and the loss of information in primer regions, a second strategy was developed: two almost completely overlapping pools of amplicons were designed. This makes it possible to remove the primer sequences from each pool while still retaining a large overlap between amplicons, which evens out the sequencing coverage.

3.2 PCR and SOLiD Sequencing

Two study groups were used for the sequencing of the candidate obesity genes: (1) a case group consisting of 547 children enrolled at the National Childhood Obesity Centre at Karolinska University Hospital, Huddinge, Sweden, and (2) a control group consisting of 531 normal-weight Swedish adolescents. The DNA from each group was pooled, giving two pools in which no individual genotype could be identified. This was done to decrease the number of PCR reactions needed. I performed PCRs for the TMEM18 gene. Primers had been designed to give one pool of five overlapping amplicons. Primers were optimised by testing three annealing temperatures (55, 63 and 68°C). PCRs were performed with the following reaction mixture (volumes given for one reaction): 7.5 µl primer mix (0.3 µM), 10 µl template DNA (5 ng/µl) and 7.5 µl master mix. The master mix was prepared from solutions in the KAPA HiFi PCR Kit (Kapa Biosystems; Massachusetts, USA) and consisted of 1.25 µl MilliQ water, 5 µl buffer (5x, KAPA HiFi Fidelity Buffer), 0.75 µl dNTP (10 mM, KAPA dNTP Mix) and 0.5 µl DNA polymerase (1 U/µl, KAPA HiFi polymerase). The PCR protocol was 2 minutes at 95°C, 30 seconds at 98°C, 15 seconds at the optimised annealing temperature and 1.20 min at 72°C.
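The per-reaction recipe can be checked and scaled with a few lines of arithmetic. The volumes are those given above; the 10% pipetting excess and the function names are assumptions for illustration, and the final concentrations assume that every component is diluted into the full 25 µl reaction.

```python
# Per-reaction volumes (µl) taken from the text; one reaction totals 25 µl.
MASTER_MIX_UL = {
    "MilliQ water": 1.25,
    "KAPA HiFi Fidelity Buffer (5x)": 5.0,
    "KAPA dNTP Mix (10 mM)": 0.75,
    "KAPA HiFi polymerase (1 U/µl)": 0.5,
}
REACTION_UL = {"primer mix": 7.5, "template DNA": 10.0, "master mix": 7.5}


def master_mix_for(n_reactions: int, excess: float = 0.1) -> dict[str, float]:
    """Volumes (µl) for a master mix covering n reactions plus a pipetting excess (assumed 10%)."""
    factor = n_reactions * (1 + excess)
    return {name: round(vol * factor, 2) for name, vol in MASTER_MIX_UL.items()}


def final_concentration(stock: float, volume_ul: float, total_ul: float = 25.0) -> float:
    """Concentration of a component after dilution into the full reaction volume."""
    return stock * volume_ul / total_ul


# Sanity checks on the recipe: the components add up and dilute to working strength
assert sum(MASTER_MIX_UL.values()) == REACTION_UL["master mix"]  # 7.5 µl
assert sum(REACTION_UL.values()) == 25.0                         # 25 µl in total
print(final_concentration(5, 5.0))    # buffer: 1.0, i.e. 1x
print(final_concentration(10, 0.75))  # dNTPs: 0.3 (mM)
print(master_mix_for(12))
```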

Gel electrophoresis was performed on the PCR products to verify correct amplification. DNA was extracted from the gel using the GeneJET Extraction Kit (Fermentas Life Sciences; Sweden). When there were no stutter bands, DNA was purified directly using the GeneJET PCR Purification Kit. Amplicon concentration was measured on a NanoDrop (Thermo Fisher Scientific; Michigan, USA). Sequencing on the SOLiD platform was performed at the Uppsala Genome Centre.

3.3 Sequence Analysis

When analysing SLCX, I developed a strategy that I implemented in Microsoft Office Excel 2007. The strategy was then slightly refined and implemented as two Python scripts for the analysis of FTO. A summary of the strategies is given in Fig 3.2.

Figure 3.2. Summary of the steps in the SOLiD sequencing analysis. For each genomic position, the coverage for the reference allele and for the alternative alleles is calculated. Together with the total coverage, this is used to calculate major and minor allele frequencies, which are then used to perform χ² tests and to calculate Odds Ratios.

The first step of both strategies was to calculate the coverage for each genomic position. Hits for the reference base were counted separately from hits for alternative bases. Major and minor allele frequencies were then calculated for each position with a total coverage exceeding 500, as the coverage of the individual base divided by the total coverage.
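The per-position step can be sketched as follows. The input, a dictionary of read counts per base at one position in one pool, is a hypothetical layout; the original Excel and Python implementations are not reproduced here. The most covered base is taken as the major allele and the second most covered as the minor allele, and positions at or below the coverage threshold of 500 are skipped.

```python
def allele_frequencies(counts: dict[str, int], ref: str, min_coverage: int = 500):
    """Major and minor allele frequencies at one position from per-base hit counts.

    counts: number of reads supporting each base at the position (hypothetical input).
    Returns None if the total coverage does not exceed min_coverage.
    """
    total = sum(counts.values())
    if total <= min_coverage:
        return None
    ref_hits = counts.get(ref, 0)
    # Rank bases by coverage: the most covered is the major allele, the next the minor
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    major, major_hits = ranked[0]
    minor, minor_hits = ranked[1] if len(ranked) > 1 else (None, 0)
    return {
        "ref_hits": ref_hits,
        "alt_hits": total - ref_hits,
        "major": major,
        "major_freq": major_hits / total,
        "minor": minor,
        "minor_freq": minor_hits / total,
    }


# Hypothetical pileup at one position in one pool, reference base A
print(allele_frequencies({"A": 900, "G": 100}, ref="A"))
```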

For SLCX, the usps for each position were also counted. Statistical analyses were performed for positions with a coverage exceeding 500, more than 30 usps and a minor allele frequency over 5%. For each position, the Odds Ratio (OR) (3.1) and Standard Error (3.3) were calculated as

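$$\mathrm{OR} = \frac{p_{ob}/(1 - p_{ob})}{p_{c}/(1 - p_{c})} \qquad (3.1)$$

$$L = \ln(\mathrm{OR}) \qquad (3.2)$$

$$SE = \sqrt{\frac{1}{n_{ob}\,p_{ob}} + \frac{1}{n_{ob}(1 - p_{ob})} + \frac{1}{n_{c}\,p_{c}} + \frac{1}{n_{c}(1 - p_{c})}} \qquad (3.3)$$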

least 95% after Bonferroni correction. Confidence intervals were constructed using Woolf’s method.

$$CI = L \pm 2.58\,SE \qquad (3.4)$$
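Equations (3.1) to (3.4) can be sketched directly in code. Here p_ob and p_c are taken as the minor allele frequencies in the obese and control pools, and n_ob and n_c are assumed to be the numbers of alleles in each pool (two per individual). Because the interval in (3.4) is built on the log scale L, it is exponentiated to give limits for the OR itself; the multiplier 2.58 is the two-sided 99% quantile of the normal distribution.

```python
from math import exp, log, sqrt


def odds_ratio_ci(p_ob: float, n_ob: int, p_c: float, n_c: int, z: float = 2.58):
    """OR (3.1), log OR (3.2), Woolf SE (3.3) and CI (3.4), back-transformed to the OR scale.

    p_ob, p_c: minor allele frequency in the obese and control pools.
    n_ob, n_c: number of alleles in each pool (assumed 2 x number of individuals).
    """
    odds_ratio = (p_ob / (1 - p_ob)) / (p_c / (1 - p_c))
    L = log(odds_ratio)
    se = sqrt(1 / (n_ob * p_ob) + 1 / (n_ob * (1 - p_ob))
              + 1 / (n_c * p_c) + 1 / (n_c * (1 - p_c)))
    return odds_ratio, exp(L - z * se), exp(L + z * se)


# Hypothetical frequencies; allele numbers from 547 cases and 531 controls
or_, lo, hi = odds_ratio_ci(0.30, 2 * 547, 0.20, 2 * 531)
print(f"OR = {or_:.2f}, 99% CI {lo:.2f}-{hi:.2f}")
```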

χ² tests were then performed for positions with an OR ≠ 1. The null hypothesis was that allele (minor or major) and group (obese or normal-weight) are independent, i.e. that the minor allele is equally likely to occur in both groups. Since the individuals were pooled, individual genotypes could not be considered. Therefore, instead of the number of people in each group, the number of alleles was used in the statistical calculations.
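The allele-based test can be sketched as a Pearson χ² test on a 2×2 table of allele counts. Converting pooled frequencies to counts by multiplying by the number of alleles is an assumption consistent with the description above, and the sketch applies no continuity correction (whether one was used is not stated; SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default). For one degree of freedom, the p-value can be obtained exactly from the complementary error function.

```python
from math import erfc, sqrt


def chi2_2x2(table: list[list[float]]) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 df, no continuity correction).

    Rows: obese pool, control pool; columns: minor allele count, major allele count.
    """
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical minor allele frequencies in the two pools, converted to allele counts
n_ob, n_c = 2 * 547, 2 * 531
minor_ob, minor_c = round(0.30 * n_ob), round(0.20 * n_c)
stat, p = chi2_2x2([[minor_ob, n_ob - minor_ob], [minor_c, n_c - minor_c]])
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```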

To assess the regions in which the identified positions were located, the UCSC Genome Browser was used. PhyloP [35] was used to study conservation at individual positions, and RepeatMasker [36] to examine whether the region contained repetitive elements.

For the sequence analysis of FTO, the method was slightly refined. Instead of selecting positions based on OR, χ² tests were performed for all genomic positions with a sequencing coverage exceeding 500 and a minor allele frequency higher than 5% in either the test or the control group. Positions were then evaluated based on the Bonferroni-corrected χ² p-value. OR was used only to indicate the direction of the effect of the minor allele, i.e. whether it had increased or decreased odds of being found in obese individuals. The number of usps was not counted, since it was considered less informative than coverage.
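The refined FTO selection can be sketched as a filter followed by a Bonferroni threshold. The record layout (the hypothetical `Position` class) and the use of the number of tested positions as the Bonferroni denominator are assumptions; the filters themselves are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical per-position record for the FTO analysis."""
    pos: int
    coverage: int
    maf_obese: float
    maf_control: float
    p_value: float      # raw (uncorrected) chi-square p-value
    odds_ratio: float


def select_positions(positions, alpha=0.05, min_coverage=500, min_maf=0.05):
    # Filters from the text: coverage > 500 and MAF > 5% in either group.
    tested = [p for p in positions
              if p.coverage > min_coverage
              and (p.maf_obese > min_maf or p.maf_control > min_maf)]
    m = len(tested)  # Bonferroni denominator: number of positions tested (assumption)
    hits = []
    for p in tested:
        if p.p_value * m <= alpha:  # equivalent to p <= alpha / m
            if p.odds_ratio > 1:
                direction = "higher odds in obese"
            elif p.odds_ratio < 1:
                direction = "lower odds in obese"
            else:
                direction = "no difference"
            hits.append((p.pos, direction))
    return hits
```

The OR is used only to label the direction of a hit, never to decide whether a position is selected, matching the refined strategy.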

3.4 Quality of SOLiD Sequencing

The observed minor allele frequencies for SLCX and FTO were compared to HapMap SNP frequencies using Mann-Whitney tests. For FTO, I examined whether the relative frequency discrepancy (3.5) was correlated with the minor allele frequency in the control and obese groups, and whether it differed between the groups. I also examined whether the observed minor allele was the same as the one reported by HapMap.

$$\frac{|\text{Observed frequency} - \text{HapMap frequency}|}{\text{Observed frequency}} \qquad (3.5)$$

The remaining analyses of sequencing quality were performed for the SLCX sequencing only. The sequencing coverage, the number of unique starting points and the difference in minor allele frequency between the test and control groups in repetitive regions (as defined by UCSC's RepeatMasker) were compared to those in non-repetitive regions using Mann-Whitney tests. The effect on coverage of the distance to the nearest primer, i.e. to the amplicon ends, was studied by correlation.


I examined whether sequencing coverage differed between amplicons, after removing repetitive regions. Since coverage was not normally distributed, the non-parametric Kruskal-Wallis test was used. A Friedman test, with the test and control groups as blocks, was used to examine the effect of amplicon while accounting for group.
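For readers who want to see what the two rank tests compute, here is a stdlib-only sketch of the Kruskal-Wallis H statistic (with the usual tie correction) and the Friedman statistic (without tie correction). Each function returns the statistic and its degrees of freedom, to be compared against a χ² distribution; in practice a library such as SciPy (`scipy.stats.kruskal`, `scipy.stats.friedmanchisquare`) would also supply the p-value. The data layout (one list per amplicon for Kruskal-Wallis, one list per block for Friedman) and the function names are assumptions.

```python
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected H for k groups (e.g. per-position coverage in each amplicon)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1  # compare with chi-square on k - 1 df


def friedman_q(blocks):
    """Friedman statistic without tie correction.

    blocks: one list per block (e.g. test and control), one value per amplicon.
    """
    b, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    q = 12 / (b * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * b * (k + 1)
    return q, k - 1
```

The Friedman test ranks amplicons *within* each block, so a consistent ordering of amplicons in both the test and control groups drives the statistic up even if the two groups differ in overall coverage.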


Chapter 4

Results

4.1 Identification of Genomic Regions and Primer Design

Because SH2B1 has previously been linked to obesity [20], I chose to design primers for the entire gene to look for causative SNPs. For KCTD15, the associated SNPs are located downstream of the gene [20], and a haplotype analysis was performed to choose which region to sequence. Parts of KCTD15 were found to be in LD with two of the obesity-associated SNPs (Fig 4.1), and I therefore chose to design primers for the entire KCTD15 gene and three haplotype blocks upstream of the gene, one of which contains the associated SNPs. In the case of TMEM18, the associated SNPs were tightly clustered in a region of about 57 kb downstream of the gene [20]. Because of the strong association in this region, I chose to sequence it in addition to the TMEM18 gene. The deletion close to NEGR1 was found to contain several highly conserved regions, with the highest degree of conservation at its ends. No human mRNAs were encoded in the region, but there was a hit for a cDNA clone from a mosquito. A blastn search with this sequence against the RefSeq RNA database revealed that it is similar to protein X (actual name will not be given) in many species, including humans. To confirm this, a blastn search with human protein X was performed, which gave many hits across the genome, indicating that this is a common pseudogene. One of the hits was in the deletion region upstream of NEGR1. The pseudogene has not been shown to be transcribed, but there are six human expressed sequence tags (ESTs) at its 3' end; these ESTs have, however, only been documented in cell lines. I used the EMBOSS pairwise alignment algorithm (www.ebi.ac.uk/Tools/emboss/align) to evaluate the similarity between the parental gene and its pseudogene. The pseudogene is 85.8% identical to the mRNA of protein X, and the greatest difference is in exon 3. Using BLAT searches with the protein X mRNA as bait, I scanned several mammalian genomes and discovered that the pseudogene is conserved in monkeys, but not in more distantly related organisms. To examine whether the pseudogene is expressed in humans, I chose to design primers for a qPCR which will be performed on human RNA from brain.

Figure 4.1. Haplotype view of the region around KCTD15. The SNPs most strongly associated with BMI according to Willer et al. [20] are shown. The arrow indicates the direction of transcription of KCTD15.
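The 85.8% identity quoted above comes from a pairwise alignment. As a hedged illustration, this sketch computes percent identity from two already-aligned sequences, counting identical non-gap columns over the full alignment length including gaps. That is the convention EMBOSS global alignment reports use, but the alignment itself (scoring matrix, gap penalties) is not reproduced here, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical non-gap columns divided by alignment length (gaps included), in %."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACGT-A", "ACTTGA"))  # 4 identical columns out of 6
```

Because gap columns count towards the length, the same pair of sequences can give a lower identity than a measure computed over aligned (non-gap) columns only.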

The success rate of the primers I designed was on average 70% (unsuccessful primers were redesigned).

4.2 Sequence Analysis - SLCX

Three positions in SLCX with an OR ≠ 1 were identified (Table 4.1, Fig 4.2). For two of them (positions 1 and 2, OR < 1), the minor allele has lower odds of being found in the obese group than in the control group, whereas for position 3 (OR > 1) it has higher odds. The null hypothesis of the χ² test could be rejected for all three positions (P ≤ 0.01), i.e. the minor allele frequency differs significantly between the test and control groups at these positions. None of the positions is located in a coding sequence or an obvious splice site. Two positions are located in introns, whereas the third is found upstream of exon 1.

The three identified SNPs were studied with UCSC’s PhyloP and RepeatMasker. One of the positions was found to be negatively conserved, whereas the other positions were evolutionarily neutral (Table 4.1). Two positions were located in repetitive regions.


Figure 4.2. Sequencing results for SLCX. a) Approximate exon locations and the locations of the three obesity-associated SNPs that were identified. b) Level of sequence conservation. A positive value indicates high conservation, while a negative value indicates that the sequence has changed many times during evolution. c) Difference in minor allele frequency between the obese and control groups. d) PCR amplicons of SLCX. e) Sequencing coverage.

Table 4.1. Positions associated with obesity in SLCX. LTR = long terminal repeat, SINE = short interspersed repetitive element.

Position | Region | Allele | Minor allele frequency difference | Odds Ratio | 95% Confidence Interval | p-value | Coverage | Unique starting points | Repetitive region | Conservation
1 | Intron 8 | C/A | 0.131 | 0.219 | 0.117–0.410 | 0.000 | 3046 | 100 | No | Neutral
2 | Intron 9 | T/C | 0.091 | 0.443 | 0.269–0.730 | 0.000 | 1039 | 95 | LTR | Negative
3 | Upstream of exon 1 | A/G | 0.065 | 1.576 | 1.030–2.414 | 0.006 | 34650 | 47 | SINE | Neutral

4.3 Sequence Analysis - FTO

positions had a p-value that met the requirements. Of these, positions where the minor allele could not be determined (N) were removed. This left 53 positions (Fig 4.3), which were then filtered for a minor allele frequency exceeding 5% in the control group, since we are interested in common SNPs. This left 39 positions, 31 of which were not documented in dbSNP [37]. Their distribution across FTO is shown in Table 4.2. SNPs considered significant have a p-value ≤ 0.05 and a minor allele frequency ≥ 0.05 in the control group; SNPs where the minor allele could not be determined were not included. Eleven of the associated SNPs were located in intron 1, and 6 of these are in LD with rs9939609. There were no significant SNPs in any exon. The association for the best-known FTO SNP, rs9939609, could not be replicated: its χ² p-value was 0.14.

Table 4.2. Distribution of obesity-linked SNPs in FTO

Intron | Number of significant SNPs
1 | 11
2 | 1
3 | 1
4 | 5
7 | 4
8 | 17

[Figure: FTO χ² p-values, -log(p) against genomic position]

Figure 4.3. Obesity-linked positions in FTO. Top graph: schematic view of FTO. Bottom graph: -log(p) plotted against genomic position. The top line (pink) shows the limit for significance at 1% (after Bonferroni correction), the second line (turquoise) shows the 5% limit, and the bottom line (green) shows the limit for significance at 5% without adjusting for multiple testing.

4.4 Quality of SOLiD Sequencing

4.4.1 Adherence to known SNP frequencies

The observed minor allele frequencies were found to agree fairly well with the HapMap frequencies (Fig 4.4). For SLCX, a Mann-Whitney test gave a point estimate of -0.00312 for the difference in minor allele frequency between HapMap and our results (95% CI [-0.01025; 0.01245]), and the null hypothesis of equal distributions could not be rejected (P=0.2869). For both FTO and SLCX, the Pearson correlation coefficient between the observed and HapMap frequencies was slightly over 0.8, indicating strong but imperfect agreement, with the HapMap allele frequencies consistently slightly higher than the allele frequencies in our cohort.
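The Mann-Whitney "point estimate" quoted above is, in most statistics packages, the Hodges-Lehmann estimator: the median of all pairwise differences between the two samples. Assuming that is what was reported, a minimal sketch is shown below; the function name is hypothetical. Its sign depends on which sample is passed first, which matters when interpreting a negative estimate such as -0.00312.

```python
import statistics


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x_i - y_j (Hodges-Lehmann estimate).

    The sign follows the argument order: a negative value means x tends to be
    lower than y.
    """
    return statistics.median(xi - yj for xi in x for yj in y)
```

Swapping the arguments flips the sign, e.g. `hodges_lehmann_shift([1, 2, 3], [0])` is 2 while `hodges_lehmann_shift([0], [1, 2, 3])` is -2.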

For FTO, I found a significant negative correlation between the minor allele frequency and the relative frequency discrepancy compared to HapMap (r = -0.383 and -0.210 for the control and obese groups, respectively; P < 0.001): the relative error decreases for higher minor allele frequencies. The discrepancy with HapMap frequencies was smaller in the obese group than in the control group (Fig 4.5). I also compared our minor alleles to those of HapMap. Out of 713 HapMap SNPs, 520 were detected in both groups. All but 3 of the detected SNPs had the same minor allele as in HapMap; for these 3 SNPs the nucleotide could not be identified and was reported as "N". Thus, whenever a minor allele was detected, it was consistent with HapMap.

Figure 4.5. Deviation between HapMap SNP frequencies and observed frequencies for FTO.


4.4.2 Effect of repetitive regions on primers

To examine whether repetitive regions affect coverage, and thus the quality of the sequencing, a Mann-Whitney test was performed. Coverage in repetitive regions differed significantly from coverage in non-repetitive regions (P < 0.0001), with a point estimate for the difference of -585.0 (95% CI: -657.0; -514.0); the median coverage was 2893.0 in repetitive and 3801.0 in non-repetitive regions. Repetitive regions had no significant effect on the number of unique starting points. The difference in minor allele frequency (MAF) between the test and control groups was slightly higher in repetitive regions; point estimate 0.00028 (P < 0.0001, 95% CI: 0.00023; 0.00034). Consistent with this, coverage was weakly negatively correlated with the difference in MAF between the case and control groups (Pearson coefficient -0.10).

4.4.3 Effect of distance to primer on coverage

Coverage is negatively correlated with the distance to the nearest primer (r = -0.37): the closer a position is to a primer, i.e. to the ends of the PCR-amplified sequences, the higher its coverage. This can also be seen in Fig 4.2d-e.

4.4.4 Difference in coverage between amplicons

The null hypothesis of the Kruskal-Wallis test, that median coverage is equal in all amplicons, could be rejected (P < 0.001) in both the control and obese groups. Friedman tests were performed for the median and average coverage values. The null hypothesis of this test is that all treatment effects, in this case amplicon effects, are zero, i.e. that coverage is not affected by the amplicon. The null hypothesis could be rejected at a level of P=0.010 in favour of the alternative hypothesis that coverage

Figure 4.6. Sequencing coverage in the amplicons of SLCX in the test and control groups, respectively.

Chapter 5

Discussion

SLCX is a gene that has not previously been associated with obesity. Three SNPs in SLCX have an OR significantly different from 1, indicating that these alleles are not found to an equal extent in obese and lean individuals and could therefore play a role in the development of obesity. SNP 1 has the lowest OR (Table 4.1), suggesting that carriers of its minor allele have lower odds of being obese than carriers of the major allele. SNP 2 has a slightly higher OR, but its minor allele is still more common in the control group. Interestingly, the OR of SNP 3 points in the opposite direction: while SNPs 1 and 2 seem to protect against obesity, the minor allele of SNP 3 is more often found in obese individuals and could potentially contribute to weight gain. SNP 2 was found to be negatively conserved, whereas the other positions were evolutionarily neutral. Negative conservation means that the position is fast-evolving and that alternative alleles have been introduced many times during evolution, possibly as an adaptation to changes in the environment. SNP 3 is found upstream of the first exon, whereas the other two SNPs are intronic. None of the SNPs is located at an obvious splice site or regulatory region, but they could exert their effect by altering splicing or gene expression through as yet unknown mechanisms.

Whereas SLCX was previously unlinked to obesity, SNPs in the first intron of FTO have shown a significant association in several studies. Here I identified 39 obesity-associated SNPs in FTO. All of them were located in introns, with clusters in introns 1 and 8 (Table 4.2). SNPs in intron 1 have previously been associated with obesity [4], and SNPs in intron 4 have been found to correlate with serum insulin levels and insulin resistance, but not with body weight [38]. Intron 8 has so far not been associated with obesity or any other medical condition. It should, however, be noted that intron 8 is the longest intron, which could explain the larger number of SNPs found in it.


regulation. Inactivation of Fto in mice leads to a reduction in adipose tissue and a higher tolerance to high-fat diets [15]. These results were strengthened by another study, which showed that food intake in rats is influenced by the expression level of Fto in the hypothalamus: the more Fto is expressed, the lower the energy intake [16]. Studies have also shown that variants in FTO intron 1 influence food intake in children [18].

Although variants in intronic regions of FTO have been linked to obesity, this does not necessarily mean that the FTO gene itself is behind the association. If FTO is indeed involved in common obesity, mutations causing a loss of FTO function would be expected to be enriched in either obese or lean individuals. This is, however, not the case: loss-of-function mutations are equally common in both groups [39]. It was recently reported that an autosomal recessive disorder in a consanguineous family was caused by a loss-of-function mutation in FTO. Affected individuals had multiple malformations, including growth retardation, functional brain deficiency and severe psychomotor delay, and some had structural brain malformations and cardiac defects. This points to an involvement of FTO in the development of the central nervous and cardiovascular systems. Heterozygous carriers were not clinically obese [40]. Thus, although studies point to an involvement of FTO in body weight regulation, it is clearly not its only function. The wide expression of FTO also fits with an involvement in several systems, one of which is likely to be energy homeostasis [14]. The intronic regions where the strongest obesity association has been observed are highly conserved in vertebrates. Their proximity to the genes IRX3, IRX5 and IRX6 is also highly conserved, which indicates that the region is a genomic regulatory block (GRB). GRBs control the expression of target genes, and their regulatory sequences are often found in introns of so-called bystander genes. In the case of FTO, this would mean that FTO is a bystander gene that has been mistaken for an obesity gene, and that the actual cause of the association is IRX3, IRX5 or IRX6. IRX3 is closest to FTO and has been found to be involved in regulating the ratio of pancreatic beta to epsilon cells, which means that it is involved in the control of insulin secretion and could be the underlying cause of the association with BMI [41]. The combination of GWAS and studies of conserved synteny could be a promising approach for identifying causative mutations behind common obesity. Here I identified additional intronic SNPs in FTO and SLCX, but whether these are located in GRBs has not been examined.

In the BMI-associated deletion upstream of NEGR1, I identified a processed pseudogene. Pseudogenes are non-functional sequences with a high similarity to one or more paralogous genes, and they can arise through either retrotransposition or duplication of genomic regions [42]. Processed pseudogenes are copies of the mRNA of their parental genes and are thus intron-free [43]. It has been shown that processed pseudogenes regulate gene expression by RNA interference in mouse oocytes [44], [45], and that the transcription level of a human ABC transporter is regulated by its pseudogene [43]. The expression of the ABC transporter gene was found to decrease when the pseudogene was knocked out. It was speculated that the genes compete for a cellular RNA degradation route, so that when the pseudogene is not expressed, the mRNA of the parental gene is degraded to a higher extent. Whether the pseudogene near NEGR1 explains the association with BMI will be studied in a future project.

To summarise, many genes are involved in the development of common obesity, and most of them are certainly still unknown. There are also several types of genetic variation that could contribute to an obesity-prone genotype, for instance copy number variations [46] and epigenetic modifications [47]. It is, however, important to stress that most people are not genetically destined to become obese. Even though some people are more prone to gaining weight, environmental factors still matter. This was demonstrated in a study by Andreasen et al. [48], which showed that the effect of the FTO SNP rs9939609 depends on the level of physical activity: physically active individuals carrying the risk allele weigh less than physically inactive carriers. A study in a Swedish cohort of elderly men [49] further indicates the importance of environmental factors. In this study, no association between FTO genotype and BMI was found. These men have lived most of their lives in a non-obesogenic environment, which suggests that the effect of FTO genotype depends on environmental factors.

Overall, the results of this pooled sequencing strategy should be seen as an indication of which SNPs could be obesity-linked, rather than as absolute proof. Pooling DNA carries several risks. There is no guarantee that DNA from all individuals in the cohort was amplified in the targeted PCR, and DNA from different individuals could be amplified to varying extents, leading to inaccurate allele frequency estimates and thus false positive SNPs. However, a recent study performed on the Roche 454 sequencer [50] found that pooled DNA strategies give accurate estimates of SNP frequencies, which is supported by my results (Fig 4.4). Still, to verify the results, all individuals in the cohort will be genotyped for the most significant SNPs in FTO and SLCX. It is also difficult to draw firm conclusions from these results: the test cohort may not be optimal, since psychological problems such as depression are likely to be overrepresented among the obese children, and the results will need to be verified in other cohorts as well. The clustering of SNPs in the first intron of FTO is, however, a sign of quality, as is the high correlation between observed SNP frequencies and HapMap frequencies.

SOLiD sequencing is cheaper and has a higher throughput than Sanger sequencing [51], but its coverage depth varies between regions (Fig 4.2). I found that coverage is significantly higher near amplicon ends, as previously reported by Harismendy et al. [51]. This was expected, since the PCR products are fragmented by sonication prior to sequencing. The sonication process is biased toward the ends of the amplicons (private communication), so there is more fragmentation near the ends of the amplified regions. As a result, the amplicon ends are overrepresented in the DNA library that is sequenced, which leads to a significantly higher coverage there. We hope that this issue can be resolved by a primer design strategy with two almost completely overlapping pools of primers.


could be due to problems with mapping short repetitive reads to a unique position. Since 45% of the human genome consists of repetitive regions [51], this is potentially problematic. Interestingly, coverage varies within non-repetitive regions as well: different amplicons are covered to a significantly varying extent (Fig 4.6). A possible explanation is that the amplicons were not present at equal concentrations in the initial DNA mix, for example due to random pipetting errors. If that were the case, the pattern of coverage across amplicons would be expected to differ between the test and control pools. However, the Friedman test, with the test and control groups as blocks, showed a significant amplicon effect: the same amplicons tended to be well or poorly covered in both groups, suggesting that the difference in coverage is due to some unknown property of the amplicons themselves. Since repetitive regions were removed prior to the statistical analysis, they cannot be the cause. Longer amplicons naturally have a higher total coverage, but since the Friedman test was performed on average coverage (coverage per bp), amplicon length should not influence the result. Something inherent to the DNA sequence, such as differences in GC content (private communication), could be the reason. Indeed, repetitive, AT-rich regions have been found to have the lowest coverage of all sequences [51].

The variance in coverage depth means that resources are spent sequencing some regions over and over again, while other regions are sequenced to a lower extent or, in some cases, not at all. The variance slightly affects the difference in minor allele frequency between the test and control groups, but this effect is very small and unlikely to have any practical impact. More importantly, variance in coverage influences which SNPs can be identified, since rare genetic variants are less likely to be detected at lower coverage [50]. A lower coverage also leads to a higher percentage of random base-calling errors, which are not influenced by local sequence characteristics [51]. With all of this in mind, the search for genetic explanations behind obesity continues. The purpose of this research is not primarily the development of a drug against obesity, but rather a greater understanding of the genetic mechanisms behind body weight control, which could lead to better recommendations and treatments in the future. It is clear that the approaches used so far have not been sufficient, and more research will be needed to stop the ongoing obesity epidemic.


Bibliography

[1] World Health Organization, “Fact sheet: obesity and overweight, http://www.who.int/mediacentre/factsheets/fs311/en/print.html,” 2006, Accessed April 28 2010.

[2] T. Kelly, W. Yang, C.-S. Chen, K. Reynolds, and J. He, “Global burden of obesity in 2005 and projections to 2030.,” Int J Obes (Lond), vol. 32, pp. 1431–1437, Sep 2008.

[3] H. H. Maes, M. C. Neale, and L. J. Eaves, “Genetic and environmental factors in relative body weight and human adiposity.,” Behav Genet, vol. 27, pp. 325–351, Jul 1997.

[4] T. M. Frayling, N. J. Timpson, M. N. Weedon, E. Zeggini, R. M. Freathy, C. M. Lindgren, J. R. B. Perry, K. S. Elliott, H. Lango, N. W. Rayner, B. Shields, L. W. Harries, J. C. Barrett, S. Ellard, C. J. Groves, B. Knight, A.-M. Patch, A. R. Ness, S. Ebrahim, D. A. Lawlor, S. M. Ring, Y. Ben-Shlomo, M.-R. Jarvelin, U. Sovio, et al., “A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity.,” Science, vol. 316, pp. 889–894, May 2007.

[5] K. M. Flegal, B. I. Graubard, D. F. Williamson, and M. H. Gail, “Cause-specific excess deaths associated with underweight, overweight, and obesity.,” JAMA, vol. 298, pp. 2028–2037, Nov 2007.

[6] R. Sturm, “The effects of obesity, smoking, and drinking on medical problems and costs.,” Health Aff (Millwood), vol. 21, no. 2, pp. 245–253, 2002.

[7] G. Taubes, “As obesity rates rise, experts struggle to explain why.,” Science, vol. 280, pp. 1367–1368, May 1998.

[8] A. J. Walley, J. E. Asher, and P. Froguel, “The genetic contribution to non-syndromic human obesity.,” Nat Rev Genet, vol. 10, pp. 431–442, Jul 2009.

[9] A. Hinney, C. I. G. Vogel, and J. Hebebrand, “From monogenic to polygenic obesity: recent advances.,” Eur Child Adolesc Psychiatry, vol. 19, pp. 297–310, Mar 2010.

[10] C. Bogardus, “Missing heritability and GWAS utility.,” Obesity (Silver Spring), vol. 17, pp. 209–210, Feb 2009.

[11] K. A. Frazer, D. G. Ballinger, D. R. Cox, D. A. Hinds, L. L. Stuve, R. A. Gibbs, J. W. Belmont, A. Boudreau, P. Hardenbol, S. M. Leal, S. Pasternak, D. A. Wheeler, T. D. Willis, F. Yu, H. Yang, C. Zeng, Y. Gao, H. Hu, W. Hu, C. Li, W. Lin, S. Liu, H. Pan, X. Tang, J. Wang, W. Wang, et al., “A second generation human haplotype map of over 3.1 million SNPs.,” Nature, vol. 449, pp. 851–861, Oct 2007.

[12] R. J. F. Loos and C. Bouchard, “FTO: the first gene contributing to common forms of human obesity.,” Obes Rev, vol. 9, pp. 246–250, May 2008.

[13] T. Gerken, C. A. Girard, Y.-C. L. Tung, C. J. Webby, V. Saudek, K. S. Hewitson, G. S. H. Yeo, M. A. McDonough, S. Cunliffe, L. A. McNeill, J. Galvanovskis, P. Rorsman, P. Robins, X. Prieur, A. P. Coll, M. Ma, Z. Jovanovic, I. S. Farooqi, B. Sedgwick, I. Barroso, T. Lindahl, C. P. Ponting, F. M. Ashcroft, S. O’Rahilly, and C. J. Schofield, “The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase.,” Science, vol. 318, pp. 1469–1472, Nov 2007.

[14] R. Fredriksson, M. Hägglund, P. K. Olszewski, O. Stephansson, J. A. Jacobsson, A. M. Olszewska, A. S. Levine, J. Lindblom, and H. B. Schiöth, “The obesity gene, FTO, is of ancient origin, up-regulated during food deprivation and expressed in neurons of feeding-related nuclei of the brain.,” Endocrinology, vol. 149, pp. 2062–2071, May 2008.

[15] J. Fischer, L. Koch, C. Emmerling, J. Vierkotten, T. Peters, J. C. Brüning, and U. Rüther, “Inactivation of the Fto gene protects from obesity.,” Nature, vol. 458, pp. 894–898, Apr 2009.

[16] Y.-C. L. Tung, E. Ayuso, X. Shan, F. Bosch, S. O’Rahilly, A. P. Coll, and G. S. H. Yeo, “Hypothalamic-specific manipulation of Fto, the ortholog of the human obesity gene FTO, affects food intake in rats.,” PLoS One, vol. 5, no. 1, p. e8771, 2010.

[17] J. R. Speakman, K. A. Rance, and A. M. Johnstone, “Polymorphisms of the FTO gene are associated with variation in energy intake, but not energy expenditure.,” Obesity (Silver Spring), vol. 16, pp. 1961–1965, Aug 2008.

[18] J. Wardle, C. Llewellyn, S. Sanderson, and R. Plomin, “The FTO gene and measured food intake in children.,” Int J Obes (Lond), vol. 33, pp. 42–45, Jan 2009.

[19] J. Wardle, S. Carnell, C. M. A. Haworth, I. S. Farooqi, S. O’Rahilly, and R. Plomin, “Obesity associated genetic variation in FTO is associated with diminished satiety.,” J Clin Endocrinol Metab, vol. 93, pp. 3640–3643, Sep 2008.

[20] C. J. Willer, E. K. Speliotes, R. J. F. Loos, S. Li, C. M. Lindgren, I. M. Heid, S. I. Berndt, A. L. Elliott, A. U. Jackson, C. Lamina, G. Lettre, N. Lim, H. N. Lyon, S. A. McCarroll, K. Papadakis, L. Qi, J. C. Randall, R. M. Roccasecca, S. Sanna, P. Scheet, M. N. Weedon, E. Wheeler, J. H. Zhao, L. C. Jacobs, I. Prokopenko, et al., “Six new loci associated with body mass index highlight a neuronal influence on body weight regulation.,” Nat Genet, vol. 41, pp. 25–34, Jan 2009.

[21] C. Duan, H. Yang, M. F. White, and L. Rui, “Disruption of the SH2-B gene causes age-dependent insulin resistance and glucose intolerance.,” Mol Cell Biol, vol. 24, pp. 7435–7443, Sep 2004.

[22] D. Ren, Y. Zhou, D. Morris, M. Li, Z. Li, and L. Rui, “Neuronal SH2B1 is essential for controlling energy and glucose homeostasis.,” J Clin Invest, vol. 117, pp. 397–406, Feb 2007.

[23] Y. Jamshidi, H. Snieder, D. Ge, T. D. Spector, and S. D. O’Dell, “The SH2B gene is associated with serum leptin and body fat in normal female twins.,” Obesity (Silver Spring), vol. 15, pp. 5–9, Jan 2007.

[24] E. G. Bochukova, N. Huang, J. Keogh, E. Henning, C. Purmann, K. Blaszczyk, S. Saeed, J. Hamilton-Shield, J. Clayton-Smith, S. O’Rahilly, M. E. Hurles, and I. S. Farooqi, “Large, rare chromosomal deletions associated with severe early-onset obesity.,” Nature, vol. 463, pp. 666–670, Feb 2010.

[25] C. Y. Y. Cheung, A. W. K. Tso, B. M. Y. Cheung, A. Xu, K. L. Ong, C. H. Y. Fong, N. M. S. Wat, E. D. Janus, P. C. Sham, and K. S. L. Lam, “Obesity susceptibility genetic variants identified from recent genome-wide association studies: implications in a Chinese population.,” J Clin Endocrinol Metab, vol. 95, pp. 1395–1403, Mar 2010.

[26] J. Zhao, J. P. Bradfield, M. Li, K. Wang, H. Zhang, C. E. Kim, K. Annaiah, J. T. Glessner, K. Thomas, M. Garris, E. C. Frackelton, F. G. Otieno, J. L. Shaner, R. M. Smith, R. M. Chiavacci, R. I. Berkowitz, H. Hakonarson, and S. F. A. Grant, “The role of obesity-associated loci identified in genome-wide association studies in the determination of pediatric BMI.,” Obesity (Silver Spring), vol. 17, pp. 2254–2257, Dec 2009.

[27] M. S. Almén, J. A. Jacobsson, J. H. A. Shaik, P. K. Olszewski, J. Cedernaes, J. Alsiö, S. Sreedharan, A. S. Levine, R. Fredriksson, C. Marcus, and H. B. Schiöth, “The obesity gene, TMEM18, is of ancient origin, found in majority of neuronal cells in all major brain regions and associated with obesity in severely obese children.,” BMC Med Genet, vol. 11, p. 58, 2010.

[28] J. Shendure and H. Ji, “Next-generation DNA sequencing.,” Nat Biotechnol, vol. 26, pp. 1135–1145, Oct 2008.

[29] J. Timmer, “DNA sequencing gets SOLiD with built-in error detection, http://arstechnica.com/science/guides/2009/12/dna-sequencing-gets-solid-with-built-in-error-detection.ars,” December 2009, Accessed February 23 2010.

http://marketing.appliedbiosystems.com/images/product_microsites/solid_knowledge_ms/pdf/solid_dibase_sequencing_and_color_space_analysis.pdf,” 2008, Accessed April 6 2010.

[31] J. C. Barrett, B. Fry, J. Maller, and M. J. Daly, “Haploview: analysis and visualization of LD and haplotype maps.,” Bioinformatics, vol. 21, pp. 263–265, Jan 2005.

[32] W. J. Kent, C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler, “The human genome browser at UCSC.,” Genome Res, vol. 12, pp. 996–1006, Jun 2002.

[33] D. Karolchik, A. S. Hinrichs, T. S. Furey, K. M. Roskin, C. W. Sugnet, D. Haussler, and W. J. Kent, “The UCSC table browser data retrieval tool.,” Nucleic Acids Res, vol. 32, pp. D493–D496, Jan 2004.

[34] A. Untergasser, H. Nijveen, X. Rao, T. Bisseling, R. Geurts, and J. A. M. Leunissen, “Primer3Plus, an enhanced web interface to Primer3.,” Nucleic Acids Res, vol. 35, pp. W71–W74, Jul 2007.

[35] K. S. Pollard, M. J. Hubisz, K. R. Rosenbloom, and A. Siepel, “Detection of nonneutral substitution rates on mammalian phylogenies.,” Genome Res, vol. 20, pp. 110–121, Jan 2010.

[36] H. R. Smit, AFA and P. Green, “Repeatmasker open-3.0, http://www.repeatmasker.org,” 1996-2007, Accessed May 6 2010.

[37] S. T. Sherry, M. Ward, and K. Sirotkin, “dbsnp-database for single nucleotide polymorphisms and other classes of minor genetic variation.,” Genome Res, vol. 9, pp. 677–679, Aug 1999.

[38] J. A. Jacobsson, J. Klovins, I. Kapa, P. Danielsson, V. Svensson, M. Ridderstråle, U. Gyllensten, C. Marcus, R. Fredriksson, and H. B. Schiöth, “Novel genetic variant in fto influences insulin levels and insulin resistance in severely obese children and adolescents.,” Int J Obes (Lond), vol. 32, pp. 1730–1735, Nov 2008.

[39] D. Meyre, K. Proulx, H. Kawagoe-Takaki, V. Vatin, R. Gutiérrez-Aguilar, D. Lyon, M. Ma, H. Choquet, F. Horber, W. Van Hul, L. Van Gaal, B. Balkau, S. Visvikis-Siest, F. Pattou, I. S. Farooqi, V. Saudek, S. O’Rahilly, P. Froguel, B. Sedgwick, and G. S. H. Yeo, “Prevalence of loss-of-function FTO mutations in lean and obese individuals.,” Diabetes, vol. 59, pp. 311–318, Jan 2010.

[40] S. Boissel, O. Reish, K. Proulx, H. Kawagoe-Takaki, B. Sedgwick, G. S. H. Yeo, D. Meyre, C. Golzio, F. Molinari, N. Kadhom, H. C. Etchevers, V. Saudek, I. S. Farooqi, P. Froguel, T. Lindahl, S. O’Rahilly, A. Munnich, and L. Colleaux, “Loss-of-function mutation in the dioxygenase-encoding FTO gene causes severe growth retardation and multiple malformations.,” Am J Hum Genet, vol. 85, pp. 106–111, Jul 2009.


[41] A. Ragvin, E. Moro, D. Fredman, P. Navratilova, Ø. Drivenes, P. G. Engström, M. E. Alonso, E. de la Calle Mustienes, J. L. G. Skarmeta, M. J. Tavares, F. Casares, M. Manzanares, V. van Heyningen, A. Molven, P. R. Njølstad, F. Argenton, B. Lenhard, and T. S. Becker, “Long-range gene regulation links genomic type 2 diabetes and obesity risk regions to HHEX, SOX4, and IRX3.,” Proc Natl Acad Sci U S A, vol. 107, pp. 775–780, Jan 2010.

[42] A. J. Mighell, N. R. Smith, P. A. Robinson, and A. F. Markham, “Vertebrate pseudogenes.,” FEBS Lett, vol. 468, pp. 109–114, Feb 2000.

[43] A. P. Piehler, M. Hellum, J. J. Wenzel, E. Kaminski, K. B. F. Haug, P. Kierulf, and W. E. Kaminski, “The human ABC transporter pseudogene family: Evidence for transcription and gene-pseudogene interference.,” BMC Genomics, vol. 9, p. 165, 2008.

[44] O. H. Tam, A. A. Aravin, P. Stein, A. Girard, E. P. Murchison, S. Cheloufi, E. Hodges, M. Anger, R. Sachidanandam, R. M. Schultz, and G. J. Hannon, “Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes.,” Nature, vol. 453, pp. 534–538, May 2008.

[45] T. Watanabe, Y. Totoki, A. Toyoda, M. Kaneda, S. Kuramochi-Miyagawa, Y. Obata, H. Chiba, Y. Kohara, T. Kono, T. Nakano, M. A. Surani, Y. Sakaki, and H. Sasaki, “Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes.,” Nature, vol. 453, pp. 539–543, May 2008.

[46] B.-Y. Sha, T.-L. Yang, L.-J. Zhao, X.-D. Chen, Y. Guo, Y. Chen, F. Pan, Z.-X. Zhang, S.-S. Dong, X.-H. Xu, and H.-W. Deng, “Genome-wide association study suggested copy number variation may be associated with body mass index in the Chinese population.,” J Hum Genet, vol. 54, pp. 199–202, Apr 2009.

[47] M. A. Haemer, T. T. Huang, and S. R. Daniels, “The effect of neurohormonal factors, epigenetic factors, and gut microbiota on risk of obesity.,” Prev Chronic Dis, vol. 6, p. A96, Jul 2009.

[48] C. H. Andreasen, K. L. Stender-Petersen, M. S. Mogensen, S. S. Torekov, L. Wegner, G. Andersen, A. L. Nielsen, A. Albrechtsen, K. Borch-Johnsen, S. S. Rasmussen, J. O. Clausen, A. Sandbaek, T. Lauritzen, L. Hansen, T. Jørgensen, O. Pedersen, and T. Hansen, “Low physical activity accentuates the effect of the FTO rs9939609 polymorphism on body fat accumulation.,” Diabetes, vol. 57, pp. 95–101, Jan 2008.

[49] J. A. Jacobsson, U. Risérus, T. Axelsson, L. Lannfelt, H. B. Schiöth, and R. Fredriksson, “The common FTO variant rs9939609 is not associated with BMI in a longitudinal study on a cohort of Swedish men born 1920-1924.,” BMC Med


[51] O. Harismendy, P. C. Ng, R. L. Strausberg, X. Wang, T. B. Stockwell, K. Y. Beeson, N. J. Schork, S. S. Murray, E. J. Topol, S. Levy, and K. A. Frazer, “Evaluation of next generation sequencing platforms for population targeted sequencing studies.,” Genome Biol, vol. 10, no. 3, p. R32, 2009.
