
Student Thesis Level: Master Title: Co-inheritance of breast and prostate cancer in a pedigree with large family data


Academic year: 2021



Author: Adrián Calvo Chozas Supervisor: Lars Rönnegård Examiner: Moudud Alam

Subject/main field of study: Microdata Analysis Course code: MI4001

Credits: 30 ECTS-credits

Date of examination: 06/08/2020

At Dalarna University it is possible to publish the student thesis in full text in DiVA. The publishing is open access, which means the work will be freely accessible to read and download on the internet. This will significantly increase the dissemination and visibility of the student thesis.

Open access is becoming the standard route for spreading scientific and academic information on the internet. Dalarna University recommends that both researchers as well as students publish their work open access.

I give my/we give our consent for full text publishing (freely accessible on the internet, open access):

Yes ☒ No ☐

Dalarna University – SE-791 88 Falun – Phone +4623-77 80 00


Abstract: The connection between breast and prostate cancer in relatives within a family is an intriguing question in the field since this information is valuable when diagnosing patients with either type of cancer as it may contribute to cancer prevention. The aim of the thesis is to ascertain whether these two cancers are inherited together. Markov Chain Monte Carlo estimation is used with the MCMCglmm package in R. The data used was the Minnesota Breast Cancer Study with up to five generations in families.

The data consist of 28081 individuals in 426 families. Results show that the heritability on the liability scale is 65% for prostate cancer and 34% for breast cancer, regardless of other factors that may increase the risk of these cancers. The odds of having breast cancer are 1.59 times higher when a brother has prostate cancer, whilst the odds of having prostate cancer are 1.58 times higher when a sister has breast cancer. This information can be useful to doctors, who can work to prevent the disease by bearing in mind the family history of both cancers.

Keywords: prostate cancer, breast cancer, simulations, MCMCglmm, heritability, genetic correlation.


Table of contents

1. Introduction
1.1. Problem description
1.2. Literature Review
1.3. Gap in the literature
1.4. Research questions
2. Methodology
2.1. Data Collection
2.2. Data description
2.2.1. Analyzed data
2.3. Model description
2.3.1. Metropolis-Hastings
2.3.2. Gibbs sampling
2.3.3. Deviance Information Criterion (DIC)
2.3.4. Pedigree information
2.3.5. Priors
2.3.6. Heritability
2.3.7. Bivariate models
2.3.8. Liability Threshold Model
2.4. Simulations
3. Results
3.1. Prostate cancer
3.2. Breast cancer
3.3. Bivariate model
3.4. Simulation results
4. Discussion
5. Conclusion


1. Introduction

This section addresses the research problem and its area, reviews the literature, highlights the gaps therein and posits the research questions.

1.1. Problem description

Cancer is an often-lethal disease present in our society and is defined as the uncontrolled growth of malignant cells in the body. When the individual is affected, the inherent mechanisms of the body stop working. Researchers have been studying genetic and non-genetic causes for the onset of the disease for decades.

Little is known regarding the inherited traits that predispose subjects to malignancies such as these. Therefore, a relevant question is whether there might be a connection between cancer types in a family with affected members. By identifying whether there is a connection between types of cancer in families, doctors can work to prevent the growth of the illness.

The importance of this study is not only to investigate first-degree relative members in a family but also to study more distant relatives. In this way, patients with a high risk of developing cancer can be closely monitored, thereby increasing the likelihood of early detection of cancer.

1.2. Literature Review

The research paper by Beebe-Dimmer et al. (2015) is based on the relationship between first-degree relatives with a family history of breast and prostate cancer and post-menopausal breast cancer. However, there is insufficient information regarding the relationship between prostate and breast cancer in families. Nonetheless, there is evidence that the risk of breast and prostate cancer is increased in families where those illnesses are present. The study claims that the risk is increased when first-degree family members are diagnosed with cancer. The analysis carried out by the researchers was restricted to first-degree relatives.

The results show that having either cancer increases the probability of having the other cancer. Moreover, women in families with prostate cancer have significantly more risk of developing breast cancer. Further investigation regarding whether another type of cancer can increase the probability of having breast cancer was carried out. The results showed that colorectal cancer slightly increased the risk of breast cancer.

In line with the previous study, and cited within it, Sellers et al. (1994) show that post-menopausal breast cancer risk is increased in families where a father or brother was affected with prostate cancer. When there are incidents of both cancers in a family, breast cancer risk is doubled.

A meta-analysis was performed by Ren et al. (2019) to determine the association between breast and prostate cancer risk in first-degree relatives. The study identified that approximately 35% of prostate cancer risk can be explained by the BRCA1 and BRCA2 genes, which are well known in the field. These genes are responsible for increasing the risk of breast and ovarian cancer. There is also strong evidence that mutations in those genes increase prostate cancer risk in families affected by breast cancer. It is reported that BRCA1 mutations increase the risk of prostate cancer more than 3-fold in men younger than 65 years old, whereas BRCA2 mutations increase said risk more than 8-fold. Other studies included in this meta-analysis suggest that the risk of lethal prostate cancer is increased in families where first-degree relatives were affected with breast cancer.

The review also indicates that the risk of prostate cancer in a family is increased when mothers and sisters only had breast cancer. Mothers only, but not sisters only, increase lethal prostate cancer risk. Previous meta-analyses claimed that prostate cancer risk is doubled among men in families where close relatives were affected with the same illness.

Lamy et al. (2018) also discovered that there is a strong correlation between both types of cancer when breast cancer was present in first-degree relatives who were women younger than 50. Their findings suggest that a prostate cancer diagnosis among first-degree relatives increases the likelihood of having breast cancer. Furthermore, it is known that BRCA2 is a gene that helps repair DNA. Thus, inherited mutations increase the risk of both types of cancer.

Foulkes (2008) carried out a review of the main types of cancer in the US. The investigation claims that mutations in genes are not the main reason for inheriting cancer, but it supports the importance of family history. Coupled with this, the study suggests that only a small proportion of breast cancer in the population is due to gene mutations. The researcher also states that BRCA1 and BRCA2 are the most important breast cancer genes and that BRCA2 plays an important role in prostate cancer, since it is the closest candidate gene for causing prostate cancer risk.

Barber et al. (2018) investigated the extent to which either of the described cancers could increase the risk of prostate cancer in first-degree relatives within a family. The study claims that a higher number of relatives affected with prostate cancer increases the risk, whilst early age at diagnosis decreases the risk. Furthermore, the probability of having prostate cancer is increased for men whose family is affected with both types of cancer in comparison to men whose family is unaffected. Therefore, the investigation suggests that genetic factors are a cause of prostate cancer risk.

Mehrgou et al. (2016) support the aforementioned investigations by claiming that BRCA1 and BRCA2 mutations are responsible for 90% of hereditary breast cancer and ovarian cancer incidents. It is also stated that mutations in these genes increase the risk of other types of cancer such as prostate, colon, pancreatic, gastric and skin cancer.

Fisher (1918) and Visscher et al. (2008) describe heritability as a concept that measures how much of the variation of a trait is due to genetic factors. It is represented as the ratio between the additive genetic variance (VA) and the phenotypic variance (VP).

Wu & Gu (2016) carried out a meta-analysis to identify causes of prostate cancer. The most important factor in the development of prostate cancer is age followed by race, smoking and obesity. They also found evidence of prostate cancer clustering in families.

Hjelmborg et al. (2008) claimed that the heritability of prostate cancer is the highest among cancer types. In this study, researchers estimated a prostate cancer heritability value of 58%.

Mucci et al. (2016) claimed that the risk of having cancer in a family is vital in the detection of future cancer incidents in the same family. Hence, in their study researchers tried to estimate the heritability of cancer. The data gathered is normally distributed and was analyzed using a Liability Threshold Model. Their findings suggest that in general the heritability of the cancer types studied was 33%. Prostate and breast cancer figure among the studied cancer types, their heritability being 57% and 31%, respectively.

1.3. Gap in the literature

As part of this thesis, thorough research has been performed to find studies that investigate the relationship between breast and prostate cancer beyond first-degree relatives. However, it was not possible to find any research paper that investigates this. Moreover, Therneau (2020) claimed that there is no study that has investigated the “inherited traits that predispose subjects to all or some subset of malignancies.”

The aim of this study is to analyze the connection between breast and prostate cancer in a family. The study shall focus on prostate and breast cancer incidents in a family and the connection between them and it will not be restricted to first-degree relatives.

1.4. Research questions

There are three lines of investigation in this thesis, which are:

• Research question one: Is there a tendency for the two diseases to be inherited together?

• Research question two: Can we draw any conclusions from the available data set?

• Research question three: What is the probability of a brother and sister developing prostate and breast cancer?

This investigation could potentially lead to a better diagnosis of the patient, given that within a family with incidents of either type of cancer, an increase in the other type may also be seen. Hence, doctors can take steps to prevent the illness from growing and give advice to those who are more at risk.


2. Methodology

In this section the data and methods are presented.

2.1. Data Collection

Between 1944 and 1952 a family study of breast cancer was initiated at the Dight Institute for Human Genetics at the University of Minnesota by Anderson (1958) with the aim of investigating whether relatives of breast cancer patients may see an increase in their risk of developing cancer. Thus, incidents of breast cancer in members of 544 families were analyzed where the proband, the first individual added to the study, was a female with breast cancer. Results from this study suggested that breast cancer tends to cluster in families.

Fifty years later, two follow-up studies were carried out by Sellers (1995) and Sellers et al. (1999) to examine the heredity of the risk of breast cancer.

Families were contacted to extend the number of individuals in the dataset so that more generations could be included in the analysis. Moreover, 118 families were excluded from the analysis due to little or no information regarding relatives. The subjects selected to be part of the study were the proband’s relatives including married women from the family.

Suspicions related to the connection between breast and prostate cancer led Grabrick et al. (2003) to collect, via questionnaire, prostate cancer data in a subset of families from the original breast cancer study. Those questionnaires were sent to men over the age of 40 and 118 incidents of prostate cancer were found. They distinguish three specific groups in the data: high-risk families where there are more than 3 incidents of breast cancer, low-risk families where only the proband was affected with breast cancer, and marry-ins where men married first- or second-degree relatives from the high-risk families. The number of families which met the criteria for high-risk was 60 and for low-risk 166. In order for them to include this sub-study into the original data, a variable called bcpc was created where the value 0 means that the individual is not part of the sub-study and 1 means that the individual is part of it.

2.2. Data description

The dataset contains information about the families in the study and the data is stored in the R package “kinship2” (Sinnwell et al. 2014).

This study presents 426 families in a total of 28081 observations where there are 1376 incidents of cancer, 7549 null values of cancer and 19156 incidents of no cancer. The data has 14 variables which are fully described in Annex 1. However, a description of the most important variables for this study can be found below:

Table 1: Description of the variables in the data set and their summary statistics.

Variable | Description | Summary statistics
Id | Identifier | N: 28081
FatherId | Identifier of the father, if the father is part of the data set; zero otherwise | Number of fathers: 4245
MotherId | Identifier of the mother, if the mother is part of the data set; zero otherwise | Number of mothers: 4215
Famid | Family identifier | Summary statistics of family size: Minimum: 4; 1st quartile: 159; Mean: 313; 3rd quartile: 464; Maximum: 605
Cancer | 1 = breast cancer (females) or prostate cancer (males), 0 = censored | Minimum: 0; 1st quartile: 0; Median: 0; Mean: 0.067; 3rd quartile: 0; Maximum: 1; NA's: 7549
Sex | M or F | Females: 12818; Males: 13502; NA's: 1761
Bcpc | Part of one of the families in the breast/prostate cancer substudy: 0 = no, 1 = yes. Note that subjects who were recruited to the overall study after the date of the BP substudy are coded as zero. | Minimum: 0; 1st quartile: 0; Median: 0; Mean: 0.3603; 3rd quartile: 1; Maximum: 1

Figure 1 represents a family example where the relationship between individuals is shown:

Figure 1: Family example. Individuals represented by a circle are females while squares are males. Coloured circles and squares show that the individual has the disease. Individual number 4 is marked with a slash since this is the proband, that is, the first member of this family added to the study and also the one with breast cancer.


2.2.1. Analyzed data

For the scope of this thesis, the analyzed data is a subset of the original data where bcpc is equal to 1. All the individuals who were part of the prostate cancer sub-study and relatives are added. Therefore, the subset contains 141 families and 11474 individuals whose members belong to the high- and low-risk families to avoid bias in the results.

2.3. Model description

This thesis shall focus on estimating the likelihood of having cancer as well as its heritability. Also, the relationship between various cancer types will be investigated. The model chosen is the mixed effect model (Pawitan, 2013), which has the form:

$$Y = \mu + Zb + e \qquad \text{(Eq. 1)}$$

where $Y$ is an $N$-vector of outcome data whose values are 0 or 1, $Z$ is an $N \times q$ matrix relating the genetic random effects to each individual, $q$ is the number of columns of $Z$ (the length of $b$), $b$ corresponds to the random effects, $e \sim N(0, \Sigma)$, $b \sim N(0, D)$, and $b$ and $e$ are independent.

$\Sigma$ and $D$ are variance matrices parametrized by $\theta$, which is an unknown variance component parameter. $\Sigma = \sigma^2 I_N$, where $\sigma^2$ is the residual variance, and $D = \sigma_b^2 A$, where $\sigma_b^2$ is the variance of the genetic effects and $A$ is a correlation matrix, referred to as the additive relationship matrix, constructed using the pedigree information (see Pawitan, 2013). An example of the matrix $A$ is given in Section 2.3.4 below. The ratio between the variance components, $\sigma_b^2$ and $\sigma^2$, controls how often cancer is inherited within families compared to between families. A measure of this ratio is “heritability”, which is explained in Section 2.3.6.

In order to fit this model the MCMCglmm package, created by Hadfield (2019), has been selected since it can deal with large pedigree data and the genetic correlation can be estimated. MCMCglmm stands for Markov Chain Monte Carlo method (MCMC) and Generalized Linear Mixed Models (glmm).

When the joint posterior distribution cannot be derived analytically, MCMC methods are used. Even though we cannot derive the complete posterior, the package can calculate the height of the posterior distribution at a given set of parameter values, which offers a good approximation (Hadfield, 2019). The MCMC moves stochastically through parameter space instead of going systematically through every likely combination of µ and σ.

For the algorithm to start, we need to initialize the chain by specifying a set of parameter values from which the chain can start moving. The package uses heuristic techniques for this initialization. MCMCglmm then uses a combination of Gibbs sampling, slice sampling and Metropolis-Hastings updates to move the chain.

2.3.1. Metropolis-Hastings

Once the chain is initialized, the algorithm needs to decide, based on two rules, where to go next. The first rule selects a candidate set of parameter values to which the chain might go, and the second rule decides whether the chain moves there or stays where it is.

To generate candidates, MCMCglmm picks random coordinates from a multivariate normal distribution and evaluates the posterior probability at those coordinates. The chain moves if the new set of parameters has a higher posterior probability than the old one, and an iteration is then said to be successfully completed. If the new set of parameters has a lower posterior probability than the old one, the chain may still move there, but not every time: the probability that the chain moves to such low-lying areas is given by the relative difference between the new and old posterior probabilities.
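The two rules can be illustrated with a minimal random-walk Metropolis sketch in Python. It is not the MCMCglmm implementation: it updates a single parameter with a univariate normal proposal, and the function name, the `step_sd` tuning parameter and the example target are assumptions made only for illustration.

```python
import math
import random


def metropolis_hastings(log_posterior, start, n_iter, step_sd=1.0, seed=1):
    """Random-walk Metropolis sampler for one parameter (illustrative only)."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current)
    chain = []
    for _ in range(n_iter):
        # Rule 1: propose a candidate from a normal centred on the current value.
        candidate = current + rng.gauss(0.0, step_sd)
        candidate_lp = log_posterior(candidate)
        # Rule 2: always move uphill; move downhill with probability equal
        # to the ratio of the new to the old posterior height.
        accept_prob = math.exp(min(0.0, candidate_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = candidate, candidate_lp
        chain.append(current)
    return chain


# Example target (an assumption): a standard normal log-density up to a constant.
chain = metropolis_hastings(lambda x: -0.5 * x * x, start=0.0, n_iter=20000, seed=3)
posterior_mean = sum(chain) / len(chain)
```

Working with log-posteriors avoids numerical underflow, and capping the exponent at zero means the acceptance probability never exceeds one.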

2.3.2. Gibbs sampling

Gibbs sampling is a special case of Metropolis-Hastings updating. MCMCglmm uses Gibbs sampling rather than Metropolis-Hastings wherever it can to update the parameters. Gibbs sampling can be used when the conditional distribution of a parameter, such as µ, given the other parameters is known, and in that case it is more efficient than Metropolis-Hastings updates (Hadfield, 2019).
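As a small illustration of updating each parameter from its known conditional distribution, the sketch below runs a Gibbs sampler for the mean and variance of normally distributed data. It is a textbook example rather than the sampler inside MCMCglmm: the flat prior on the mean, the inverse-gamma prior on the variance (the one-dimensional analogue of an inverse Wishart) and all names are assumptions.

```python
import random


def gibbs_normal(y, n_iter, prior_shape=1.0, prior_rate=1.0, seed=1):
    """Gibbs sampler for (mu, sigma2) of a normal sample (illustrative only)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, sigma2 = ybar, 1.0
    draws = []
    for _ in range(n_iter):
        # mu | sigma2, y ~ N(ybar, sigma2 / n) under a flat prior on mu.
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # sigma2 | mu, y ~ Inverse-Gamma(shape + n/2, rate + SS/2).
        ss = sum((v - mu) ** 2 for v in y)
        shape = prior_shape + n / 2.0
        rate = prior_rate + ss / 2.0
        # gammavariate takes a scale parameter, so pass 1/rate and invert the draw.
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        draws.append((mu, sigma2))
    return draws


# Hypothetical data centred on 2.0.
data = [2.0 + 0.1 * k for k in range(-10, 11)]
draws = gibbs_normal(data, n_iter=5000, seed=2)
```

Every draw is accepted, which is why Gibbs updates are more efficient than Metropolis-Hastings updates when the conditional distributions are available.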

2.3.3. Deviance Information Criterion (DIC)

The Deviance Information Criterion is a Bayesian measure of fit and is analogous to the Akaike Information Criterion (AIC). DIC values are not meaningful on their own; they are only useful for comparing models. The model with the smaller DIC is usually considered to be better, according to Spiegelhalter et al. (2002).
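For reference, the usual definition of the criterion can be written as follows. The symbols $D(\theta)$ for the deviance, $\bar{D}$ for its posterior mean, $\bar{\theta}$ for the posterior mean of the parameters and $p_D$ for the effective number of parameters are notation introduced here, not taken from the thesis.

```latex
D(\theta) = -2 \log p(y \mid \theta), \qquad
p_D = \bar{D} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \bar{D} + p_D = D(\bar{\theta}) + 2\,p_D
```

The term $p_D$ plays the role that the number of parameters plays in the AIC, which is why the two criteria are compared.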

2.3.4. Pedigree information

The model to be fitted is known as the animal model (Wilson et al. 2010) and it needs pedigree data to work. The pedigree information is extracted from the data. The new dataset has three columns in tabular form and must be ordered, that is, parents must appear before their offspring, and each row must represent one individual. Column 1 contains the identifier of the individual and columns 2 and 3 contain the identifiers of the individual's parents. If a parent is unknown, NA appears in the dataset.
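The ordering requirement can be met with a simple depth-first pass over the pedigree. The following Python sketch is one way to do it, under the assumptions that each row is a tuple `(id, parent1, parent2)`, that `None` stands for NA, and that parents without a row of their own are simply ignored; the function name and the toy rows are hypothetical.

```python
def order_pedigree(rows):
    """Return pedigree rows sorted so that parents precede their offspring."""
    by_id = {row[0]: row for row in rows}
    ordered = []
    placed = set()

    def place(ind):
        if ind in placed or ind not in by_id:
            return
        placed.add(ind)  # mark first, so a malformed cycle cannot loop forever
        _, parent1, parent2 = by_id[ind]
        for parent in (parent1, parent2):
            if parent is not None:
                place(parent)
        ordered.append(by_id[ind])

    for row in rows:
        place(row[0])
    return ordered


# Toy pedigree given in the wrong order: child 3 is listed before its parents.
toy = [(3, 1, 2), (1, None, None), (2, None, None), (4, 3, 5), (5, None, None)]
sorted_rows = order_pedigree(toy)
```

The recursion depth equals the number of generations, which is small for a pedigree with up to five generations.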

The pedigree information is included in the mixed effect model through the correlation matrix A. Figure 2 shows an example pedigree and the corresponding matrix A is given in Table 2.

Figure 2. Family example. Individuals represented by a circle are females while squares are males. Coloured circles and squares show that the individual has the disease. Question marks mean that there is no information about cancer for those individuals.

Table 2. Genetic correlation matrix with the individuals' ids for the pedigree in Figure 2.

id | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129
119 | 1 | 0 | 0.25 | 0.5 | 0.5 | 0.125 | 0.125 | 0.125 | 0.125 | 0 | 0
120 | 0 | 1 | 0.25 | 0.5 | 0.5 | 0.125 | 0.125 | 0.125 | 0.125 | 0 | 0
121 | 0.25 | 0.25 | 1 | 0.5 | 0.25 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0.5
122 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0
123 | 0.5 | 0.5 | 0.25 | 0.5 | 1 | 0.125 | 0.125 | 0.125 | 0.125 | 0 | 0
124 | 0.125 | 0.125 | 0.5 | 0.25 | 0.125 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.25
125 | 0.125 | 0.125 | 0.5 | 0.25 | 0.125 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.25
126 | 0.125 | 0.125 | 0.5 | 0.25 | 0.125 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.25
127 | 0.125 | 0.125 | 0.5 | 0.25 | 0.125 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.25
128 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0
129 | 0 | 0 | 0.5 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 1
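A matrix of this kind can be built from an ordered pedigree with the tabular method. The Python sketch below assumes non-inbred founders and rows of the form `(id, parent1, parent2)`. The example pedigree is an assumption: it was inferred from the coefficients in Table 2 (119 and 120 as parents of 122 and 123, 122 and 129 as parents of 121, 121 and 128 as parents of 124 to 127) and was not read from Figure 2.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A by the tabular method.

    pedigree: list of (id, parent1, parent2) with parents listed before
    offspring and None for an unknown parent. Returns (ids, A).
    """
    ids = [row[0] for row in pedigree]
    index = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, parent1, parent2) in enumerate(pedigree):
        s = index.get(parent1)  # None when the parent is unknown
        d = index.get(parent2)
        for j in range(i):
            # Relationship with an earlier individual: average over the parents.
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 plus the inbreeding coefficient of individual i.
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
    return ids, A


# Pedigree inferred from Table 2 (an assumption, see the text above).
EXAMPLE_PEDIGREE = [
    (119, None, None), (120, None, None), (129, None, None), (128, None, None),
    (122, 119, 120), (123, 119, 120), (121, 122, 129),
    (124, 121, 128), (125, 121, 128), (126, 121, 128), (127, 121, 128),
]
ids, A = relationship_matrix(EXAMPLE_PEDIGREE)
```

With this pedigree the sketch reproduces, for example, the coefficients between individual 121 and its relatives that are listed in row 121 of Table 2.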

2.3.5. Priors

Priors represent the researchers’ beliefs about the model parameters before the data are taken into account. MCMCglmm uses two different types of prior, one for the (co)variances and another for the fixed effects: an inverse Wishart prior and a normal prior, respectively (Hadfield 2019).

2.3.6. Heritability

From the model, the heritability must be computed. An estimate of the heritability can be obtained by applying the formula $h^2 = V_A / V_P$, where $V_A$ corresponds to the additive genetic variance, which is estimated using the information from the pedigree, and $V_P$ corresponds to the total phenotypic variance (Wilson et al. 2010). In this study the formula becomes $h^2 = \sigma_b^2 / (\sigma_b^2 + \sigma^2)$, where $\sigma_b^2$ is the genetic variance and $\sigma_b^2 + \sigma^2$ is the total variance.
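Because the model is fitted by MCMC, the heritability can be computed draw by draw and then summarized. The Python sketch below assumes that the posterior draws of the genetic and residual variances are available as two equally long sequences. The function name and arguments are hypothetical, and the interval is a simple equal-tailed one, which may differ from the interval type reported in the thesis.

```python
def heritability_summary(va_samples, vr_samples, level=0.95):
    """Posterior mean and equal-tailed interval of h2 = VA / (VA + VR)."""
    h2 = sorted(va / (va + vr) for va, vr in zip(va_samples, vr_samples))
    n = len(h2)
    mean = sum(h2) / n
    # Order statistics closest to the lower and upper tail probabilities.
    lower = h2[int((1.0 - level) / 2.0 * (n - 1))]
    upper = h2[int((1.0 + level) / 2.0 * (n - 1))]
    return mean, lower, upper


# Hypothetical draws, only to show the call.
mean_h2, lower_h2, upper_h2 = heritability_summary([1.0, 1.0, 1.0], [1.0, 3.0, 1.0])
```

Computing the ratio within each draw, rather than dividing the two posterior means, keeps the uncertainty in both variance components in the summary.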

2.3.7. Bivariate models

In order to fit a bivariate model in the MCMCglmm package, incidents of both breast and prostate cancer must be used, so a two-column matrix is created where the first column corresponds to one of the two cancers and the second column to the other. This gives $Y_{bp}$, which is a bivariate response variable. As above, a linear mixed model is fitted to this response variable, but with a vector of random effects $b_{bp}$ that is twice as long as the random effect $b$ used in the previous univariate model, and with residuals $e_{bp}$. These are assumed to be normally distributed with

$$\begin{bmatrix} b_{bp} \\ e_{bp} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} \right) \qquad \text{(Eq. 2)}$$

where $G$ and $R$ are the expected (co)variances of the random effects and residuals, respectively (Hadfield, 2019), with

$$G = \begin{bmatrix} A\sigma_{b1}^2 & A\rho\sigma_{b1}\sigma_{b2} \\ A\rho\sigma_{b1}\sigma_{b2} & A\sigma_{b2}^2 \end{bmatrix} \qquad \text{(Eq. 3)}$$

The correlation $\rho$ is a measure of how often breast cancer and prostate cancer are inherited together, whereas $\sigma_{b1}^2$ and $\sigma_{b2}^2$ are the genetic variances for prostate and breast cancer, respectively.
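The block structure of Eq. 3 can be written compactly as a Kronecker product. The symbol $G_0$ for the 2x2 trait-level covariance matrix is introduced here only for illustration and does not appear in the thesis.

```latex
G = G_0 \otimes A, \qquad
G_0 = \begin{bmatrix}
  \sigma_{b1}^2 & \rho\,\sigma_{b1}\sigma_{b2} \\
  \rho\,\sigma_{b1}\sigma_{b2} & \sigma_{b2}^2
\end{bmatrix}, \qquad
\rho = \frac{(G_0)_{12}}{\sqrt{(G_0)_{11}\,(G_0)_{22}}}
```

Read this way, the genetic correlation is the off-diagonal element of $G_0$ scaled by the two genetic standard deviations, while $A$ carries the relatedness between individuals.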

2.3.8. Liability Threshold Model

In the case where we have binary data and only the intercept as an explanatory variable, the liability threshold model can be used (Figure 3). The liability threshold model makes it possible to first analyze the data assuming that it is Gaussian and thereafter transform the results using the assumptions of a probit model (Lee et al. 2011).

The transformation has the following form:

$$l^2 = h_0^2 \frac{K(1-K)}{z^2} \qquad \text{(Eq. 4)}$$

where $l^2$ is the heritability on the liability scale, $h_0^2$ is the heritability obtained in the model, $K$ is the proportion of cancer in the dataset and $z$ is the height of the standard normal probability density function at the threshold. The genetic correlation $\rho$ is unaffected by the transformation (Lee et al. 2012, eq. 3).
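Eq. 4 can be evaluated with the Python standard library. In the sketch below the threshold is assumed to be the standard normal quantile that leaves a proportion K in the upper tail, and the function and argument names are hypothetical.

```python
from statistics import NormalDist


def liability_heritability(h2_observed, prevalence):
    """Transform an observed-scale heritability to the liability scale (Eq. 4)."""
    std_normal = NormalDist()
    # Threshold on the liability scale: a proportion K of individuals lies above it.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at the threshold.
    z = std_normal.pdf(threshold)
    return h2_observed * prevalence * (1.0 - prevalence) / z ** 2


# Observed-scale posterior means and prevalences reported in Sections 3.1 and 3.2.
prostate_liability = liability_heritability(0.1083, 0.033)
breast_liability = liability_heritability(0.117, 0.095)
```

With these inputs the function returns values close to the 64% and 35% reported on the liability scale in Sections 3.1 and 3.2.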

Figure 3. Liability Threshold Model

2.4. Simulations

Simulations were carried out to evaluate the probability of co-inheritance for the two cancers. The algorithm used is as follows:

1. Specify the value of the variance components $\sigma_{g1}^2$ and $\sigma_{g2}^2$ to be used, where $\sigma_{g1}^2 = \sqrt{\eta_p / (1 - \eta_p)}$ and $\sigma_{g2}^2 = \sqrt{\eta_b / (1 - \eta_b)}$, $\eta_p$ is the heritability of prostate cancer taken from the bivariate output and $\eta_b$ is the heritability of breast cancer taken from the bivariate output. Both heritability values are on the liability scale.

2. Simulate the genetic value for those individuals with no parents as $\upsilon_i \sim N(0, \sigma_{g1}^2)$.

3. Compute the genetic values for the next generations as $\upsilon_i = s_i + (\upsilon_{s_i} + \upsilon_{d_i})/2$, where $s_i \sim N(0, \sqrt{0.5}\,\sigma_{g1}^2)$ and the subscripts $s_i$ and $d_i$ are the sire and dam index of animal $i$.

4. Since this simulation is done for the bivariate model, two different traits, $\Upsilon_1$ and $\Upsilon_2$, need to be computed.

5. $\Upsilon_1$ is equal to $\mu_1 + \upsilon_i + \varepsilon_1$, whereas the genetic value for the other trait, $\Upsilon_2$, is $\vartheta_i = \lambda_{21}\upsilon_i + \lambda_{22}\alpha_i$, where $\lambda_{21}$ and $\lambda_{22}$ are two constants defined below and $\alpha_i$ is sampled from an iid standard normal distribution.

6. Compute the value of $\Upsilon_2$ as $\Upsilon_2 = \mu_2 + \vartheta_i + \varepsilon_2$, where $\mu_1$ and $\mu_2$ are two intercepts taken from the bivariate output and correspond to prostate and breast cancer respectively, and $\varepsilon_1$ and $\varepsilon_2$ are two error terms whose variance is equal to 1 in order for the Liability Threshold Model to work.

7. Obtain a binary trait as $I(\Upsilon_i > 0)$, where $I(\cdot)$ is an indicator function and $i = 1, 2$.

The values $\lambda_{21}$ and $\lambda_{22}$ are taken from $\lambda$, the Cholesky decomposition of the 2x2 matrix $G$, where

$$G = \begin{pmatrix} \sigma_{g1}^2 & \rho\sigma_{g1}\sigma_{g2} \\ \rho\sigma_{g1}\sigma_{g2} & \sigma_{g2}^2 \end{pmatrix} \qquad \text{(Eq. 5)}$$

and

$$\lambda = \begin{pmatrix} \sigma_{g1} & 0 \\ \rho\sigma_{g2} & \sigma_{g2}\sqrt{1 - \rho^2} \end{pmatrix} \qquad \text{(Eq. 6)}$$

where the elements $\lambda_{21}$ and $\lambda_{22}$ are used in the computation in Step 5.

The value of $\rho$ used in the simulation is

$$\rho = \frac{COV(\sigma_{g1}, \sigma_{g2})}{\sqrt{\sigma_{g1}^2 \sigma_{g2}^2}}$$

where $COV(\sigma_{g1}, \sigma_{g2})$ is taken from the bivariate output.
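The general idea of the algorithm, dropping genetic values through a pedigree and thresholding two correlated liabilities, can be sketched in Python as follows. This is not the thesis code and it departs from the steps as written in three labelled ways: the genetic variance is assumed to be η/(1−η), which is what a heritability η implies when the residual variance is 1; both standardized genetic components are passed through the pedigree and then combined with the Cholesky factor of Eq. 6, instead of drawing the second component iid; and the intercepts are plain arguments on the liability scale. All function names and the toy pedigree of unrelated nuclear families are hypothetical.

```python
import math
import random


def cholesky_2x2(var1, var2, rho):
    """Lower-triangular Cholesky factor of the 2x2 matrix G (form of Eq. 6)."""
    sd1, sd2 = math.sqrt(var1), math.sqrt(var2)
    return [[sd1, 0.0], [rho * sd2, sd2 * math.sqrt(1.0 - rho ** 2)]]


def unit_genetic_values(pedigree, rng):
    """Unit-variance genetic values for rows (id, sire, dam), parents first."""
    u = {}
    for ind, sire, dam in pedigree:
        if sire is None or dam is None:
            u[ind] = rng.gauss(0.0, 1.0)  # founder
        else:
            # Parental average plus a Mendelian sampling term with variance 0.5.
            u[ind] = 0.5 * (u[sire] + u[dam]) + rng.gauss(0.0, math.sqrt(0.5))
    return u


def simulate_two_traits(pedigree, eta1, eta2, rho, mu1, mu2, seed=1):
    """Simulate two binary traits with genetically correlated liabilities."""
    rng = random.Random(seed)
    # ASSUMPTION: genetic variance eta / (1 - eta), residual variance 1.
    lam = cholesky_2x2(eta1 / (1.0 - eta1), eta2 / (1.0 - eta2), rho)
    u1 = unit_genetic_values(pedigree, rng)
    u2 = unit_genetic_values(pedigree, rng)
    result = {}
    for ind, _, _ in pedigree:
        g1 = lam[0][0] * u1[ind]
        g2 = lam[1][0] * u1[ind] + lam[1][1] * u2[ind]
        y1 = mu1 + g1 + rng.gauss(0.0, 1.0)
        y2 = mu2 + g2 + rng.gauss(0.0, 1.0)
        result[ind] = {"g1": g1, "g2": g2,
                       "trait1": int(y1 > 0), "trait2": int(y2 > 0)}
    return result


def nuclear_families(n_families, n_children=2):
    """Hypothetical toy pedigree: unrelated couples with n_children each."""
    rows, next_id = [], 1
    for _ in range(n_families):
        father, mother = next_id, next_id + 1
        rows += [(father, None, None), (mother, None, None)]
        for c in range(n_children):
            rows.append((next_id + 2 + c, father, mother))
        next_id += 2 + n_children
    return rows


# Illustrative call; the intercepts of -1.0 are arbitrary, not thesis values.
sim = simulate_two_traits(nuclear_families(4000), 0.65, 0.34, 0.23, -1.0, -1.0, seed=7)
```

Passing both components through the pedigree keeps the genetic covariance between relatives consistent for the two traits, which is the design choice that differs most from Step 5.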

The simulation is performed for the whole dataset instead of the subset where the sub-study is obtained. Even though the sub-study is used in the data analysis due to the valuable information regarding prostate cancer, this is not necessary for the simulations as breast and prostate cancer incidents are computed in the simulation algorithm.

The simulations also give an opportunity to compute the probability of an individual having both cancers simply by hypothetically assuming that both cancers could be expressed in the same individual regardless of the gender of the individual.

Finally, the simulation algorithm is conceived to provide information regarding the likelihood of brother and sister developing cancer. Pairs of siblings will be selected from the data with the aim of computing the probability of:

$P_1 = P(\text{breast cancer} \mid \text{brother has prostate cancer})$,
$P_2 = P(\text{breast cancer} \mid \text{brother does not have prostate cancer})$,
$P_3 = P(\text{prostate cancer} \mid \text{sister has breast cancer})$,
$P_4 = P(\text{prostate cancer} \mid \text{sister does not have breast cancer})$,

and the results are to be presented as odds ratios as follows

$$\text{Odds Ratio}_1 = \frac{P_1(1 - P_2)}{P_2(1 - P_1)} \qquad \text{(Eq. 7)}$$

and

$$\text{Odds Ratio}_2 = \frac{P_3(1 - P_4)}{P_4(1 - P_3)} \qquad \text{(Eq. 8)}$$
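Eq. 7 and Eq. 8 share the same form, so a single helper covers both. The Python sketch below uses a hypothetical function name and, as inputs, the averaged probabilities reported in Table 4.

```python
def odds_ratio(p_given_affected, p_given_unaffected):
    """Odds ratio of two conditional probabilities (form of Eq. 7 and Eq. 8)."""
    odds_affected = p_given_affected / (1.0 - p_given_affected)
    odds_unaffected = p_given_unaffected / (1.0 - p_given_unaffected)
    return odds_affected / odds_unaffected


# Averaged probabilities from Table 4 (P1, P2) and (P3, P4).
odds_ratio_1 = odds_ratio(0.201, 0.136)
odds_ratio_2 = odds_ratio(0.190, 0.129)
```

Because the inputs are rounded to three decimals, the results may differ slightly in the second decimal from odds ratios computed on unrounded simulation output.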


3. Results

In this section the results are presented. First, the prostate and breast cancer cases are separately analyzed using the linear mixed model presented in Section 2.3. Second, a bivariate model is fitted for both cancers simultaneously using the model presented in Section 2.3.7. The models produce estimates assuming that the binary outcome is gaussian, but the results are subsequently transformed using the liability threshold model (Figure 3).

3.1. Prostate cancer

Before discussing the summary of the model, it is convenient to check whether the MCMC chain has converged by looking at the trace plots and autocorrelations (Figure 4). The model was run for 1 million iterations with a burn-in of 3000 iterations, and four chains were checked to assess convergence.

Figure 4. Plot of the Markov chain for the intercept (µ). The left plot is a trace of the sample posterior, a time series of the Markov chain. The right plot is a density estimate, and it is a smooth histogram that approximates the posterior.

Since no trend can be seen for the Markov chain in Figure 4 and four chains have been checked, the model can be claimed to have converged.

Figure 5. Variance components of the model. The density of animal is for the variance component of genetic effects ($\sigma_b^2$) and the density for units is for the residual variance ($\sigma^2$).

It is also important that no trend is seen in the variance components to guarantee the model has converged (Figure 5).

The autocorrelation in the chains also needs to be analyzed. Autocorrelation values are reasonable when they are less than 0.1 according to Villemereuil (2012). This was achieved with a lag of 1000.

The number of iterations used was 1 million. The chain is initialized and starts iterating according to the model heuristics. The MCMC sample size, after the thinning interval and removing the burn-in period, was 997. The burn-in was 3000.
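The stored sample size follows directly from the run settings, and the autocorrelation of a stored chain can be checked against the 0.1 guideline. The Python sketch below uses hypothetical function names; the sample-size formula is simply the one that reproduces the figures quoted in this chapter.

```python
def stored_samples(n_iterations, burn_in, thin):
    """Number of MCMC samples kept after burn-in and thinning."""
    return (n_iterations - burn_in) // thin


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    variance_sum = sum((x - mean) ** 2 for x in chain)
    covariance_sum = sum((chain[t] - mean) * (chain[t + lag] - mean)
                         for t in range(n - lag))
    return covariance_sum / variance_sum


# Settings quoted in the text: 1 million iterations, burn-in 3000, thinning 1000.
n_univariate = stored_samples(1_000_000, 3000, 1000)
# Settings of the bivariate model: 600,000 iterations, same burn-in and thinning.
n_bivariate = stored_samples(600_000, 3000, 1000)
```

The first call gives 997 and the second 597, matching the sample sizes reported for the univariate and bivariate models.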

The rest of the output is separated into three sections: the G-structure, which informs us about the variance estimates of the random effects (VA); the R-structure, which informs us about the residual variance (VR); and the Location effects, which give the results regarding the fixed effects, µ in the model.

The estimated location effect µ represents the proportion of individuals with prostate cancer, and the value is 3.3%. The 95% credible interval (i.e. inference in a bigger population) goes from 2.7% to 4%.

The 95% credible interval for the heritability for this model goes from 5.9% to 16% and the posterior mean is 10.83%. Finally, this value of the posterior mean corresponds to a heritability of 64% on the liability scale.

3.2. Breast cancer

Figure 6. Plot of the Markov chain for the intercept (µ). The left plot is a trace of the sample posterior, a time series of the Markov chain. The right plot is a density estimate, and it is a smooth histogram that approximates the posterior.

From Figure 6 and the analysis of four different chains, the model appears to have converged.

Figure 7. Variance components of the breast cancer model. The density of animal is for the variance component of genetic effects ($\sigma_b^2$) and the density for units is for the residual variance ($\sigma^2$).

From Figure 7 there is also no trend in the variance.

The number of iterations used was 1 million. The MCMC sample size after thinning (every 1000 iterations) and removing the burn-in period, which is 3000, was 997.

The location effect µ represents the proportion of individuals with breast cancer, and the posterior mean is 9.5%. The 95% credible interval goes from 8.5% to 10.7%.

The 95% credible interval for the heritability for this model goes from 7.6% to 16.2% and the posterior mean is 11.7%. Finally, on the liability scale the heritability value is approximately 35%.

3.3. Bivariate model

Figure 8. Plot of the Markov chain for the two intercepts $\mu_{prostate}$ and $\mu_{breast}$. The left plot is a trace of the sample posterior, a time series where the prostate and breast means are represented. The right plot is a density estimate, and it is a smooth histogram that approximates the posterior.

From Figure 8 one can see that the model has successfully converged, and four chains were run to confirm the convergence.

Figure 9. Variance components of the bivariate model. The density of “traitprostate:traitprostate.animal” is the density for the variance $\sigma_{b1}^2$, the density of “traitprostate:traitbreast.animal” is the density for the covariance $\rho\sigma_{b1}\sigma_{b2}$ and the density of “traitbreast:traitbreast.animal” is the density for the variance $\sigma_{b2}^2$ (see Section 2.3.7).

The number of iterations used was 600,000. The MCMC sample size after thinning (every 1000 iterations) and removing the burn-in period, which is 3000, was 597.

The location effects represent the proportions of individuals with prostate and breast cancer; their posterior means are approximately 3.1% and 9.5%, respectively. The corresponding credibility intervals, which describe the inference for a larger population, are approximately 2.4% to 3.7% and 8.5% to 10.6%.

The posterior mean of the heritability in this model is 10.7% for prostate cancer and 11.5% for breast cancer, and the covariance is 0.001422. On the liability scale, the heritability values are approximately 65% for prostate cancer and 34% for breast cancer.
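The conversion from the observed scale to the liability scale can be checked with a few lines of R. The sketch below applies the liability threshold formula used in Annex 2 to the input values listed there (observed-scale heritabilities 0.1074405 and 0.1150283, proportions 0.03128 and 0.09553); the function name `observed_to_liability` is introduced here for illustration only.

```r
# Convert heritability from the observed (0/1) scale to the liability scale.
# h2_obs: heritability on the observed scale; K: proportion of affected individuals
observed_to_liability <- function(h2_obs, K) {
  z <- dnorm(qnorm(K))  # standard normal density at the liability threshold
  h2_obs * K * (1 - K) / z^2
}

h2_prostate <- observed_to_liability(h2_obs = 0.1074405, K = 0.03128)
h2_breast <- observed_to_liability(h2_obs = 0.1150283, K = 0.09553)
```

The two results are roughly 0.66 and 0.35, in line with the values of approximately 65% and 34% reported above.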


3.4. Simulation results

As discussed in Section 2.4 the values taken to run the simulations are presented in Table 3 below.

Table 3: Simulation inputs where 𝜇1 and 𝜇2 are the posterior means for intercept terms for prostate and breast cancer, respectively. Furthermore, 𝜂𝑝 and 𝜂𝑏 are the heritabilities for prostate and breast cancer, respectively, and 𝜌 is the genetic correlation between prostate and breast cancer.

𝜇1        𝜇2        𝜂𝑝          𝜂𝑏          𝜌
0.03128   0.09553   0.6561839   0.3450741   0.23

The simulations were run 20 times to investigate the hypothetical situation of individuals being able to have both cancers regardless of their gender.

From these simulations 2.7% of the individuals had both cancers and the conditional probability 𝑃(𝑏𝑟𝑒𝑎𝑠𝑡 | 𝑝𝑟𝑜𝑠𝑡𝑎𝑡𝑒) was 17.7%.
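A conditional probability of this kind is the proportion of individuals with both cancers divided by the proportion with the conditioning cancer. The following sketch shows the calculation on a small, made-up set of simulated outcomes; the vectors `prostate` and `breast` are hypothetical and are not taken from the thesis simulations.

```r
# Hypothetical simulated outcomes (1 = affected, 0 = unaffected) for ten individuals
prostate <- c(1, 1, 0, 0, 0, 1, 0, 0, 0, 0)
breast   <- c(1, 0, 1, 0, 0, 0, 0, 1, 0, 0)

p_both <- mean(prostate == 1 & breast == 1)  # P(breast and prostate)
p_prostate <- mean(prostate == 1)            # P(prostate)

# P(breast | prostate) = P(breast and prostate) / P(prostate)
p_breast_given_prostate <- p_both / p_prostate
```

In this toy example one of the three individuals with prostate cancer also has breast cancer, so the conditional probability is one third.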

The probabilities used for the odds ratio computation are presented in Table 4 below.

Table 4: Averages of the probabilities obtained after running the simulation program 200 times, where 𝑃𝑛 (𝑛 = 1, 2, 3, 4) is described in Section 2.4.

𝑃1 𝑃2 𝑃3 𝑃4

0.201 0.136 0.190 0.129

To compute the probabilities 𝑃1, 𝑃2, 𝑃3 and 𝑃4 described in Section 2.4 the algorithm was run 200 times. An odds ratio is a measure of the strength of association between two events. In this simulation the probabilities correspond to having breast cancer given that the brother has prostate cancer, having breast cancer given that the brother does not have prostate cancer, having prostate cancer given that the sister has breast cancer, and having prostate cancer given that the sister does not have breast cancer.

Odds ratio results from these probabilities are 𝑂𝑑𝑑𝑠 𝑅𝑎𝑡𝑖𝑜1 equals 1.59 and 𝑂𝑑𝑑𝑠 𝑅𝑎𝑡𝑖𝑜2 equals 1.58.
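As a rough check, the odds ratios can be recomputed from the rounded probabilities in Table 4. The sketch below assumes that 𝑃1 is paired with 𝑃2 and 𝑃3 with 𝑃4, following the order in which the events are listed above; the exact definitions are in Section 2.4, and the helper names `odds` and `odds_ratio` are introduced here for illustration.

```r
# Odds of an event with probability p
odds <- function(p) p / (1 - p)

# Odds ratio comparing the group with an affected sibling to the group without
odds_ratio <- function(p_affected_sibling, p_unaffected_sibling) {
  odds(p_affected_sibling) / odds(p_unaffected_sibling)
}

or1 <- odds_ratio(0.201, 0.136)  # assumed pairing: P1 against P2
or2 <- odds_ratio(0.190, 0.129)  # assumed pairing: P3 against P4
```

With the rounded inputs the results are about 1.60 and 1.58; the small difference from the reported 1.59 is presumably due to rounding of the tabulated probabilities.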


4. Discussion

The simulation results show that only 2.7% of the individuals would have both cancers, where the simulations assumed that both prostate and breast cancer could be expressed in both females and males. This is a low number, and one might conclude that the two cancers do not tend to be inherited together. However, the probability of having breast cancer given the presence of prostate cancer in these simulations is almost 18%.

Furthermore, the posterior mean of 𝜌 was positive (0.23) indicating a genetic correlation between the two diseases.

The odds of having breast cancer are 1.59 times higher when a brother has prostate cancer, whereas the odds of having prostate cancer are 1.58 times higher when a sister has breast cancer. Given that the heritability of prostate and breast cancer is 65% and 34% on the underlying scale, and that the proportions of prostate and breast cancer are 3% and 9.5% respectively, it is reasonable to regard a family history of one cancer as a useful predictor of the other. Moreover, it is important to highlight that this study focuses on heritability, regardless of other risk factors such as lifestyle habits or obesity.

The original study in 1958 focused on the increase in cancer risk among relatives when a family member had developed breast cancer.

Therefore, the data presented breast cancer in all the 426 families since the study started with a woman affected by breast cancer, called the proband.

Fifty years later prostate cancer was included in a sub-study of the original data to investigate whether both cancers cluster in families or, at least, there was a connection between them. It is important to point out that this dataset contains information for five generations of relatives in most of the families.

In this study three separate analyses were carried out: the first one using the whole dataset where the results in the liability scale were inconclusive, the second using high-risk families and therefore the results were biased and the third where a subset of the families which took part in the sub-study was selected. This last approach avoids bias since it includes high- and low-risk families where the low-risk families were selected randomly. As randomness is present, reliable conclusions from the dataset can be obtained.

Furthermore, the whole dataset is used in the simulations whereas, as discussed above, the analysis is based on the sub-study. For the simulations it is suitable to use the whole dataset since both breast and prostate cancer are generated by an algorithm which considers only the pedigree data.

The literature review in this study shows that there are several studies which claim that prostate cancer can cluster in families. It is also stated that mutations in the genes BRCA1 and BRCA2 increase the risk of breast and prostate cancer. Therefore, there is evidence to support that breast cancer in the family can increase the risk of prostate cancer and that prostate cancer in the family can increase the risk of breast cancer. Other studies report heritability values close to the ones obtained in this study, also using a liability scale.

Specifically, one study carried out in the Nordic countries (Sweden, Denmark, and Finland) found a prostate cancer heritability of 57% and breast cancer heritability of 31% using the liability scale. The results from this thesis are very similar where prostate cancer presents a heritability of 65% and breast cancer 34% using the liability scale.

This study has focused on investigating up to five generations of relatives within families. The output suggests that there is a tendency to develop breast cancer when there are cases of prostate cancer in the family, owing to the high heritability of prostate cancer. Thus, it is important to ask patients not only about the breast cancer history in the family but also about cases of prostate cancer. In this way, patients with a high risk of developing cancer can be closely monitored, thereby increasing the likelihood of early detection of cancer.

Future research should focus on applying similar models to different data, aiming to find heritability patterns for other types of cancer such as skin melanoma, ovarian cancer and kidney cancer. Those cancers are known to be heritable as well, according to one of the studies mentioned in the literature review. It would also be interesting to analyze, by using simulated data, the probability of developing breast cancer or prostate cancer across generations, for instance by calculating the probability of prostate cancer given that the mother or grandmother had breast cancer. There are a few different settings that could potentially lead to an interesting result. Finally, this thesis focuses only on heritability, so future studies may add information related to other factors that can increase the probability of either breast cancer or prostate cancer, such as obesity, age, race and smoking.


5. Conclusion

This study shows a connection between prostate cancer and breast cancer: a case of one in a family is associated with an increased risk of developing the other. Moreover, cases of both prostate and breast cancer increase the risk of further cases of cancer in the family, and doctors should therefore pay special attention to high-risk individuals. When diagnosing patients, doctors could also ask about the patient’s family history of prostate or breast cancer so as to have a global overview of cancer risk within the family and thereafter take steps to prevent the illness.


References

Anderson, V. E. (1958). Variables Related to Human Breast Cancer. University of Minnesota Press. https://ebookcentral.proquest.com/lib/dalarna/detail.action?docID=345303

Barber, L., Gerke, T., Markt, S. C., Peisch, S. F., Wilson, K. M., Ahearn, T., Giovannucci, E., Parmigiani, G., & Mucci, L. A. (2018). Family history of breast or prostate cancer and prostate cancer risk. Clinical Cancer Research, 24(23), 5910–5917. https://doi.org/10.1158/1078-0432.CCR-18-0370

Beebe-Dimmer, J. L., Yee, C., Cote, M. L., Petrucelli, N., Palmer, N., Bock, C., Lane, D., Agalliu, I., Stefanick, M. L., & Simon, M. S. (2015). Familial clustering of breast and prostate cancer and risk of postmenopausal breast cancer in the Women’s Health Initiative Study. Cancer, 121(8), 1265–1272. https://doi.org/10.1002/cncr.29075

Fisher, R. A. (1918). The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans. Roy. Soc, 52, 399–433.

Foulkes, W. D. (2008). Inherited susceptibility to common cancers. New England Journal of Medicine, 359(20), 2143. https://doi.org/10.1056/NEJMra0802968

Grabrick, D. M., Cerhan, J. R., Vierkant, R. A., Therneau, T. M., Cheville, J. C., Tindall, D. J., & Sellers, T. A. (2003). Evaluation of familial clustering of breast and prostate cancer in the Minnesota Breast Cancer Family Study. Cancer Detection and Prevention, 27(1), 30–36. https://doi.org/10.1016/S0361-090X(02)00176-9

Hadfield, J. (2019). Markov chain Monte Carlo generalised linear mixed models - Course Notes. https://cran.r-project.org/web/packages/MCMCglmm/vignettes/CourseNotes.pdf

Hadfield, J. D. (2019). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software, 33(2), 1–22. https://doi.org/10.18637/jss.v033.i02

Jacob B. Hjelmborg, Thomas Scheike, Klaus Holst, Axel Skytthe, Kathryn L. Penney, Rebecca E. Graff, Eero Pukkala, Kaare Christensen, Hans-Olov Adami, Niels V. Holm, Elizabeth Nuttall, S. H. & M. H. (2014). The Heritability of Prostate Cancer in the Nordic Twin Study of Cancer. Cancer Epidemiol Biomarkers, 23(11), 2303–2310. https://doi.org/10.1158/1055-9965

Lamy, P. J., Trétarre, B., Rebillard, X., Sanchez, M., Cénée, S., & Ménégaux, F. (2018). Family history of breast cancer increases the risk of prostate cancer: Results from the EPICAP study. Oncotarget, 9(34), 23661–23669. https://doi.org/10.18632/oncotarget.25320

Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M., & Wray, N. R. (2012). Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics, 28(19), 2540–2542. https://doi.org/10.1093/bioinformatics/bts474

Lee, Sang Hong, Wray, N. R., Goddard, M. E., & Visscher, P. M. (2011). Estimating missing heritability for disease from genome-wide association studies. American Journal of Human Genetics, 88(3), 294–305. https://doi.org/10.1016/j.ajhg.2011.02.002

Mehrgou, A. A. (2016). The importance of BRCA1 and BRCA2 genes mutations in breast cancer development. Medical Journal of the Islamic Republic of Iran (MJIRI) Iran University of Medical Sciences The, 30:369. https://doi.org/10.1016/j.ecss.2017.04.019

Mucci, L. A., Hjelmborg, J. B., Harris, J. R., Czene, K., Havelick, D. J., Scheike, T., Graff, R. E., Holst, K., Möller, S., Unger, R. H., McIntosh, C., Nuttall, E., Brandt, I., Penney, K. L., Hartman, M., Kraft, P., Parmigiani, G., Christensen, K., Koskenvuo, M., … Author, C. (2016). Familial Risk and Heritability of Cancer Among Twins in Nordic Countries Critical revision of the manuscript for important intellectual content. Jama, 315(1), 68–76. https://doi.org/10.1001/jama.2015.17703

Pawitan, Y. (2013). In All Likelihood.

Ren, Z. J., Cao, D. H., Zhang, Q., Ren, P. W., Liu, L. R., Wei, Q., Wei, W. R., & Dong, Q. (2019). First-degree family history of breast cancer is associated with prostate cancer risk: A systematic review and meta-analysis. BMC Cancer, 19(1), 1–13. https://doi.org/10.1186/s12885-019-6055-9

Sellers, T A; Anderson, V E; Potter, J D; Bartow, S A; Chen, P L; Everson, L; King, R A; Kuni, C C; Kushi, L H; McGovern, P. G. (1995). Epidemiologic and genetic follow‐up study of 544 Minnesota breast cancer families: Design and methods. Genetic Epidemiology, 12(4), 417–429. https://onlinelibrary.wiley.com/doi/pdf/10.1002/gepi.1370120409

Sellers, T. A., King, R. A., Cerhan, J. R., Chen, P. L., Grabrick, D. M., Kushi, L. H., Oetting, W. S., Vierkant, R. A., Vachon, C. M., Couch, F. J., Therneau, T. M., Olson, J. E., Pankratz, V. S., Hartmann, L. C., & Anderson, V. E. (1999). Fifty-year follow-up of cancer incidence in a historical cohort of Minnesota Breast Cancer Families. Cancer Epidemiology Biomarkers and Prevention, 8(12), 1051–1057.

Sellers TA, Potter JD, Rich SS, et al. (1994). Familial clustering of breast and prostate cancers and risk of postmenopausal breast cancer. J Natl Cancer Inst., 86, 1860–1865.

Sinnwell, J. P., Therneau, T. M., & Schaid, D. J. (2014). The kinship2 R package for pedigree data. Human Heredity, 78(2), 91–93. https://doi.org/10.1159/000363105

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 64(4), 583–616. https://doi.org/10.1111/1467-9868.00353

Therneau, T. M. (2020). coxme: Mixed Effects Cox Models. R-Package Description., 1–14. https://doi.org/10.1111/oik.01149

Villemereuil, P. De. (2012). Estimation of a biological trait heritability using the animal model How to use the MCMCglmm R package.

Visscher, P. M., Hill, W. G., & Wray, N. R. (2008). Heritability in the genomics era - Concepts and misconceptions. Nature Reviews Genetics, 9(4), 255–266. https://doi.org/10.1038/nrg2322

Wilson, A. J., Réale, D., Clements, M. N., Morrissey, M. M., Postma, E., Walling, C. A., Kruuk, L. E. B., & Nussey, D. H. (2010). An ecologist’s guide to the animal model. Journal of Animal Ecology, 79(1), 13–26. https://doi.org/10.1111/j.1365-2656.2009.01639.x

Wu, X., & Gu, J. (2016). Heritability of prostate cancer A tale of rare variants and common single nucleotide polymorphisms. Annals of Translational Medicine, 4(10), 10–13. https://doi.org/10.21037/atm.2016.05.31


Annex 1

Table 1: Description of the variables in the data set and their summary statistics.

Variable | Description | Summary statistics
Id | Identifier |
Proband | If 1 it is the original person in the study |
FatherId | Identifier of the father, if the father is part of the data set; zero otherwise |
MotherId | Identifier of the mother, if the mother is part of the data set; zero otherwise |
Famid | Family identifier |
Endage | Age at last follow-up or incident cancer | Minimum: 18.21; 1st quartile: 52; Median: 65; Mean: 64.28; 3rd quartile: 76.50; Maximum: 116; NA’s: 14254
Cancer | 1 = breast cancer (females) or prostate cancer (males), 0 = censored | Minimum: 0; 1st quartile: 0; Median: 0; Mean: 0.067; 3rd quartile: 0; Maximum: 1; NA’s: 7549
Yob | Year of birth | Minimum: 1842; 1st quartile: 1909; Median: 1925; Mean: 1925; 3rd quartile: 1942; Maximum: 2001; NA’s: 7184
Education | Level of education: 1-8 years, 9-12 years, high school graduate, vocational education beyond high school, some college but did not graduate, college graduate, post-graduate education, refused to answer on the questionnaire. | Minimum: 1; 1st quartile: 3; Median: 3; Mean: 3.683; 3rd quartile: 5; Maximum: 9; NA’s: 21886
Marstat | Marital status: married, living with someone in a marriage-like relationship, separated or divorced, widowed, never married, refused to answer the questionnaire. | Minimum: 1; 1st quartile: 1; Median: 1; Mean: 1.847; 3rd quartile: 3; Maximum: 9; NA’s: 21886
Everpreg | Ever pregnant: never pregnant at the time of baseline survey, ever pregnant at the time of baseline survey | Minimum: 0; 1st quartile: 1; Median: 1; Mean: 0.904; 3rd quartile: 1; Maximum: 1; NA’s: 21899
Parity | Number of births | Minimum: 0; 1st quartile: 0; Median: 0; Mean: 1.37; 3rd quartile: 2; Maximum: 23; NA’s: 3327
Nbreast | Number of breast biopsies | Minimum: 0; 1st quartile: 0; Median: 0; Mean: 0.071; 3rd quartile: 0; Maximum: 1; NA’s: 24340
Sex | M or F | Females: 12818; Males: 13502; NA’s: 1761
Bcpc | Part of one of the families in the breast/prostate cancer substudy: 0 = no, 1 = yes. Note that subjects who were recruited to the overall study after the date of the BP substudy are coded as zero. | Minimum: 0; 1st quartile: 0; Median: 0; Mean: 0.3603; 3rd quartile: 1; Maximum: 1


Annex 2

This annex presents the simulation algorithm explained in Section 2.4.

library(kinship2)
library(MCMCglmm)
library(pedigree)
library(sqldf)
library(coda)
data("minnbreast")

## Input values for the simulations
mu1 <- 0.03128       # prostate: posterior mean of the intercept
mu2 <- 0.09553       # breast: posterior mean of the intercept
herit1 <- 0.1074405  # prostate: heritability on the observed scale
herit2 <- 0.1150283  # breast: heritability on the observed scale
rho <- 0.23          # genetic correlation

# Liability threshold model
h2_o_prostate <- herit1
K_prostate <- mu1
z_prostate <- dnorm(qnorm(K_prostate))
h2_l_prostate <- h2_o_prostate * K_prostate * (1 - K_prostate) / (z_prostate^2)
cat("Heritability for prostate cancer on the underlying liability scale:", h2_l_prostate, "\n")

h2_o_breast <- herit2
# K and 1 - K give the same result here, since K * (1 - K) and dnorm(qnorm(K)) are symmetric
K_breast <- mu2
z_breast <- dnorm(qnorm(K_breast))
h2_l_breast <- h2_o_breast * K_breast * (1 - K_breast) / (z_breast^2)
cat("Heritability for breast cancer on the underlying liability scale:", h2_l_breast, "\n")

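# Step 1:
# Calculating the parameters used in the algorithm.
# Genetic standard deviations, with the residual standard deviation fixed at 1
sigmag1 <- sqrt(h2_l_prostate / (1 - h2_l_prostate))
sigmag2 <- sqrt(h2_l_breast / (1 - h2_l_breast))
# Intercepts moved to the probit (standard normal quantile) scale
mu1 <- qnorm(mu1)
mu2 <- qnorm(mu2)
e1 <- 1
e2 <- 1
minnbreast$sigmaG <- rep(sigmag1, nrow(minnbreast))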

# Step 2:
geneticValue <- 0
for (i in 1:nrow(minnbreast)) {
  # Simulating genetic values for those individuals with no parents.
  minnbreast$geneticValue[i] <- ifelse(minnbreast$fatherid[i] == 0 &
                                         minnbreast$motherid[i] == 0,
                                       rnorm(1, mean = 0, sd = minnbreast$sigmaG[i]),
                                       NA)
}

# Step 3:
v <- 0
si <- 0
for (indiv in 1:nrow(minnbreast)) {
  # Taking fatherid
  FID <- minnbreast[minnbreast$id == indiv, "fatherid"]
  # Taking motherid
  MID <- minnbreast[minnbreast$id == indiv, "motherid"]
  # Using fatherid/motherid for the computation
  # If we have NA then we will write 0 so that there is no problem in the computations
  # Otherwise we take the genetic value of the father/mother
  GVf <- ifelse(is.na(minnbreast[minnbreast$id == FID, "geneticValue"]), 0,
                minnbreast[minnbreast$id == FID, "geneticValue"])
  GVm <- ifelse(is.na(minnbreast[minnbreast$id == MID, "geneticValue"]), 0,
                minnbreast[minnbreast$id == MID, "geneticValue"])
  # Checking if the id has parents
  if (FID != 0 & MID != 0) {
    # The id has parents so we calculate si as N(0, sqrt(0.5)*sigmaG)
    minnbreast$si[indiv] <- rnorm(1, mean = 0,
                                  sd = sqrt(0.5) * minnbreast[minnbreast$id == indiv, "sigmaG"])
    # Calculating v as si + (Vsi + Vdi)/2
    minnbreast$v[indiv] <- minnbreast$si[indiv] + (GVf + GVm) / 2
  } else {
    # The individual has no parents so si is 0 and v will be the individual's genetic value
    minnbreast$si[indiv] <- 0
    minnbreast$v[indiv] <- minnbreast[minnbreast$id == indiv, "geneticValue"]
  }
}

# Steps 4, 5, 6 and 7:
# Creating the matrices
G11 <- sigmag1^2
G12 <- rho * sigmag1 * sigmag2
G21 <- rho * sigmag1 * sigmag2
G22 <- sigmag2^2
G <- as.matrix(cbind(c(G11, G21), c(G12, G22)))

# Lower-triangular Cholesky factor of G, so that L %*% t(L) equals G
L11 <- sigmag1
L12 <- 0
L21 <- rho * sigmag2
L22 <- sigmag2 * sqrt(1 - rho^2)
L <- as.matrix(cbind(c(L11, L21), c(L12, L22)))

minnbreast$m_i <- rnorm(nrow(minnbreast), mean = 0, sd = 1)
minnbreast$u_i <- L21 * minnbreast$v + L22 * minnbreast$m_i

# Step 4
# Liabilities for prostate (Y1) and breast (Y2) cancer
minnbreast$Y1 <- mu1 + minnbreast$v + rnorm(nrow(minnbreast), 0, e1)
minnbreast$Y2 <- mu2 + minnbreast$u_i + rnorm(nrow(minnbreast), 0, e2)
# An individual is affected when the liability exceeds the threshold of 0
minnbreast$Y1_1 <- ifelse(minnbreast$Y1 > 0, 1, 0)
minnbreast$Y2_1 <- ifelse(minnbreast$Y2 > 0, 1, 0)

# Results
cat("Proportion of simulated individuals with prostate cancer:", mean(minnbreast$Y1_1), "\n")
cat("Proportion of simulated individuals with breast cancer:", mean(minnbreast$Y2_1), "\n")
cat("Proportion of simulated individuals that would have both cancers if not separately expressed in the two sexes: ")
cat(mean(1 * (minnbreast$Y1_1 == 1 & minnbreast$Y2_1 == 1)), "\n")

# Formula:
# P(breast | prostate) = P(breast & prostate) / P(prostate)
cat("P(breast | prostate): ")
cat(mean(1 * (minnbreast$Y1_1 == 1 & minnbreast$Y2_1 == 1)) / mean(minnbreast$Y1_1), "\n")

# P(prostate | breast) = P(prostate & breast) / P(breast)
cat("P(prostate | breast): ")
cat(mean(1 * (minnbreast$Y2_1 == 1 & minnbreast$Y1_1 == 1)) / mean(minnbreast$Y2_1), "\n")

# We know that the proband is a female and has breast cancer

# But for the simulation it does not matter if the individual has or has not cancer

# Since it will be simulated.

prob <- sqldf("SELECT * FROM minnbreast WHERE proband = '1'")

# The second step is to take all the siblings from the family
