Contributing factors to apartment pricing in Stockholm Vasastan: An analysis using multilinear regression


DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Contributing factors to apartment pricing in Stockholm

Vasastan - An analysis using multilinear regression

REZA DALFI

SEBASTIAN GIERLOWSKI CARLING

KTH


Contributing factors to apartment pricing in Stockholm:

Vasastan - An analysis using multilinear regression

Reza Dalfi

Sebastian Gierlowski Carling


Degree Projects in Applied Mathematics and Industrial Economics (15 hp)

Degree Programme in Industrial Engineering and Management (300 hp)

KTH Royal Institute of Technology, 2020

Supervisors at KTH: Mykola Shykula, Julia Liljegren

Examiner at KTH: Sigrid Källblad Nordin


TRITA-SCI-GRU 2020:122 MAT-K 2020:023

Royal Institute of Technology School of Engineering Sciences KTH SCI

SE-100 44 Stockholm, Sweden URL: www.kth.se/sci


Acknowledgements

We would like to express our gratitude to our supervisor Julia Liljegren, examiner Sigrid Källblad Nordin and the cooperation between KTH, Rosane Hungria Gunnelin, and Datscha, Julia Olsson, that enabled us to acquire data. We would also like to express our appreciation to our supervisor Mykola Shykula for his extensive assistance during the entire process.


Abstract

This thesis uses multilinear regression analysis to identify the variables and the magnitude of the variables affecting the housing market in Vasastan, a district of Stockholm, Sweden. We then make an attempt to generalize the results to the entire Stockholm area, and reason around why certain factors may be important drivers of price. The factors identified to affect the prices are the number of rooms, living area, floor number, fee and age of the building.

Some results we find are intuitive while others are less so. Some of the factors in our regression model can be changed without a massive change in construction price, which means there is a real world application of our thesis to increase the value of newly built apartments.


1. Introduction

1.1 Background

The Swedish housing market, and the Stockholm market in particular, is under constant debate, with discussions about a potential housing bubble, the relative lack of new apartments being built compared with the steadily increasing demand, and the difficulty young people face in establishing themselves independently and finding a home of their own. For example, in 2018 it was reported that the monthly salary required to buy the average 30 square meter one-room apartment in Stockholm, provided that one could fund the necessary deposit, was at least 36 thousand SEK. Meanwhile, the average monthly salary of 25-29 year olds in Stockholm was 40% lower than that requirement for men and 47% lower for women (StockholmDirekt 2018).

With the basic economic concept of supply and demand, it is easy to draw some conclusions about why the prices of condominiums in Stockholm keep increasing year by year. These conclusions can be based on the fact that Stockholm, as the capital city of Sweden, attracts more people, as well as on various business-related factors such as Stockholm ranking in different lists of tech-oriented cities for startups. During the short period of January 2020 to March 2020 a total of 2535 condominiums were sold in the Stockholm city centre (Svensk Mäklarstatistik 2020). Compared with the period of December 2013 to February 2015, when only 1521 apartments were sold, the frequency of sales in this market has increased substantially. Given this development, it is interesting to understand what makes an apartment valuable from a socio-economic perspective by applying mathematical models to quantify these factors, in this case regression modelling.


1.2 Purpose

The purpose of this thesis is to find which factors inherent in an apartment itself have the biggest effect on the apartment's valuation. We will not go into detail about how the sales process or similar external factors may affect the final valuation. Instead, we will see how factors such as the number of rooms in the apartment, area, construction year, monthly fee and combinations of these factors affect the prices in general.

This is of great interest to both individual housing companies, as well as city planners, as it sends a signal of how housing companies should plan the layouts of the apartments they produce to maximize their revenue. It is then up to them to do this in a cost-efficient way to maximize profits, and for city planners to maximize the utility of their inhabitants.

In some way, this is meant to help produce properties that to a greater degree meet the demands of the people living in them, where we let the final price of an apartment act as a proxy for how much in demand it was. We assume that with a large enough sample there will be no bias from, for example, much more luxurious decorating in three bedroom apartments affecting their prices. Instead, such differences will average out, and the estimated effect of each variable will be a close representation of how important that variable actually is.

The problem statement around which this thesis revolves can be narrowed down to these two questions:

1. What quantitative factors affect the value of sold condominiums in the Stockholm district of Vasastan?

2. To what extent do they influence the price?

1.3 Previous research

Every year, the county governments produce a compilation regarding that year's housing market. The data is complemented with a yearly survey sent out by Boverket, regional development programs, statistics from Statistiska centralbyrån (SCB) and various reports and investigations to ensure it is as comprehensive as possible. The aim of these analyses is to give regional governments and actors in the housing market support and a deeper understanding of the development. Some interesting conclusions from this work are:

● The municipalities lack resources, and statistics and methods for mapping the number of people affected by the lack of housing are missing.

● The supply increases but the demand remains, i.e. there is continued competition for the same kind of housing despite extensive housing production in several municipalities.

● The housing problems are not resolved by an increase in supply alone. Solutions to the existing issues also need to be found.

● National and local investments made to produce cheaper apartments are failing.

These are only some of the conclusions. It is important to understand that this analysis is based on data from all of Sweden, while our project is focused on one small district in the capital. An interesting topic that is brought up, but which we exclude, is demographic and surrounding factors. An example is how the elderly are particularly vulnerable because they are conservative and tend not to want to relocate. Those who want to or must relocate face situations where they, for example, have the ability to pay but not the willingness, or where they lack the ability to pay altogether (Boverket 2019).

In 2016, Ina Blind, Matz Dahlberg and Gustav Engström published an article on price trends in the housing market with a focus on geographical analysis. The authors acquired their data from Mäklarstatistik AB. The data was reported to cover 80% of all units sold during the period of their scope, 2011-2015. A price index was used to remove the effect of differences in the time of sale. What is interesting about the chosen price index is that it was based on hedonic price functions. This means that a unit can be viewed as a set of characteristics which together generate its value, which is closely related to how we build our own regression model. The authors then built a regression model with a logarithmic dependent variable and a set of regressors and dummy variables. The result of their work that is relevant for this thesis is that prices increased during this period in the region we are observing, Vasastan in Stockholm. We have no indications that this trend has not continued from 2015 until today.


To summarize this section, we note that the amount of data and research on the matter is extensive. We conclude that the subject is so widely debated and researched for two main reasons. The first is the degree of impact on the population, since everyone needs affordable housing, and the second is the broad scope. There are several entry points from which to analyse this subject, such as statistical, political and economic standpoints. All these entry points can also be broken down into smaller parts.

1.4 How our research contributes to the current academic literature

Over the past decade, the amount of data collected and used for statistical purposes has increased considerably. The applications of this work vary greatly; predicting and analyzing price trends and the volatility of demand and supply are some of the common denominators. This has also made it possible to observe the effects of political influence, such as the legislation on amortization requirements, which reduced demand (Riksgälden 2019). Beyond the research done on a national level, various theses on the subject have already been published. These previous dissertations also revolve around modelling apartment pricing in Stockholm using regression. However, what we aim to do is to eliminate external factors such as cultural effects on valuation. This means that we will, for example, narrow the city of Stockholm down to one district so that our results are not affected by the volatile price differences between, for example, Östermalm and Kungsholmen. By excluding these factors from the model we aim to see what actually has the greatest impact on apartment valuation. This separation also avoids creating an overly complex regression model.


2. Theory and hypothesis

2.1 Introduction to Multiple Linear Regression

The multiple linear regression model is written as

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i \tag{2.1}$$

where $y_i$ is the dependent variable and $x_1, x_2, \dots, x_k$ are the regressor variables. By observing data we can use this model to forecast the value of $y$ depending on the values of the covariates $x_i$.

The parameters $\beta_j$, $j = 1, 2, \dots, k$, are also referred to as the regression coefficients. A simple way of understanding their impact in the model is the following: when all of the remaining regressor variables $x_i$ ($i \neq j$) are held constant, the value of $\beta_j$ represents the expected change in the response variable $y$ per unit change in $x_j$. For this reason the $\beta_j$ are called partial regression coefficients. The last term, $\varepsilon_i$, is the error term, which is normally distributed (Montgomery, Peck, Vining, 2012, pp. 68).

For various reasons regarding modelling, handling data or displaying results the usage of matrix notation is more convenient, (Montgomery, Peck, Vining, 2012, pp. 72).
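For reference, the matrix form referred to above can be sketched as follows, using the standard textbook convention. The symbol $n$ for the number of observations is an assumption here, since it is not introduced in the text:

$$y = X\beta + \varepsilon, \qquad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Row $i$ of this system is exactly eq. 2.1, with the leading column of ones carrying the intercept $\beta_0$.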


2.2 Assumptions within the regression model

Our regression model is based on five important assumptions that must hold for the end result to be reliable and valid (Montgomery, Peck, Vining, 2012, pp. 129). These five assumptions are presented in this section.

1. The first assumption is based upon eq. 2.1. The dependent variable should be expressible as in that equation, i.e. as a linear function of the covariates. To avoid violating this assumption we must choose the correct regressors and make sure that there is no nonlinear relationship between the dependent variable and the covariates.

2. The error term $\varepsilon$ has zero mean, expressed mathematically as $E[\varepsilon] = 0$.

3. The error term $\varepsilon$ has constant variance $\sigma^2$. A violation of this assumption is called heteroscedasticity, which in short means that the variance of the error term is not constant. This is explained further in section 2.4.

4. The errors are uncorrelated, which means that they do not move together. If they did, this would imply that they are in fact not error terms, but rather something that should be incorporated into the model.

5. The errors are normally distributed. This means they are spread around the mean, which is zero, with values for the error terms being more likely to occur the closer they are to zero.(StatisticsSolutions n.d.)

2.3 Ordinary Least Square Estimation

The OLS method, ordinary least squares estimation, will be the optimal estimator for the parameters, which are unknown to us. The OLS estimate is denoted with a hat, in this case $\hat{\beta}$. To obtain it we minimize the sum of the squared errors. This is done in the traditional way of optimizing, by setting the derivative equal to zero. In our case the derivative of the sum of squared errors with respect to $\beta$ is set to zero, which finally provides the least squares estimator of $\beta$ as

$$\hat{\beta} = (X^T X)^{-1} X^T y \tag{2.3}$$


However, for this equation to be valid the inverse matrix $(X^T X)^{-1}$ must exist, which is true if the covariates are linearly independent, i.e. no column is a linear combination of the other columns. There are several properties of the least squares estimators. One of these is that the estimators are unbiased, i.e. $\hat{\beta}$ is an unbiased estimator of $\beta$, which means that $E(\hat{\beta}) = \beta$.
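As an illustration of eq. 2.3, the following minimal sketch computes the least squares estimate with NumPy. The data and the names `ols_estimate`, `area` and `rooms` are made up for this example and are not taken from the thesis data set. The response is generated without noise, so the estimate should reproduce the chosen coefficients.

```python
import numpy as np

def ols_estimate(X, y):
    """Least squares estimate (X^T X)^{-1} X^T y, obtained by solving the normal equations."""
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical toy data: y = 1.0 + 0.05 * area + 0.3 * rooms (no noise).
area = np.array([30.0, 45.0, 60.0, 75.0, 90.0, 120.0])
rooms = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(area), area, rooms])  # first column is the intercept
y = 1.0 + 0.05 * area + 0.3 * rooms

beta_hat = ols_estimate(X, y)
print(beta_hat)  # intercept, area coefficient, rooms coefficient
```

If the columns of `X` were linearly dependent, the system would be singular and the solve step would fail or become numerically unstable, which mirrors the existence condition for the inverse stated above.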

2.4 Homoscedasticity and Heteroscedasticity

Homoscedasticity and heteroscedasticity refer to whether the variance of the error term remains constant or varies. Both scenarios are explained and visualised below.

2.4.1 Homoscedasticity

Homoscedasticity is what we aim to see in the residual plot of the regression model. It means that the variance of the error term remains constant, which is one of the five assumptions mentioned above. Mathematically this is denoted by:

$$\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 \tag{2.4}$$

where $\varepsilon_i$ is the error term, $x_i$ is the value of a covariate and $\sigma^2$ is the variance. For a reliable result and model we aim to have homoscedasticity, because the response variable will then be predicted consistently regardless of which value the covariate takes. See figure 2.1.

2.4.2 Heteroscedasticity

Heteroscedasticity is the phenomenon where the variance of the error term is not constant but is instead a function of the covariate. In eq. 2.4, $\sigma^2$ is then replaced with $f(x_i)$, which gives the equation

$$\mathrm{Var}(\varepsilon_i \mid x_i) = f(x_i) \tag{2.5}$$

See figure 2.1.

To maintain a model without heteroscedasticity we must verify that this phenomenon does not occur for any covariate in our regression model. Since heteroscedasticity implies that the dependent variable is not predicted equally accurately for all input data, it gives us an undesirable model. Therefore we must validate that heteroscedasticity does not exist.

The issue with heteroscedasticity is that the estimated standard errors would be incorrect, and that the OLS estimator is no longer the best linear unbiased estimator, BLUE. The OLS estimator would still be linear and unbiased, but it would no longer have the smallest variance among such estimators. The smaller the sample size, the bigger the issue of heteroscedasticity. Due to our large sample, it is unlikely that it would have a large effect on our regression model. However, there are solutions to the issues of heteroscedasticity, which we cover below and apply in our regression model.

2.4.3 Detecting heteroscedasticity

There are numerous ways of checking whether the error variance in a model is constant, and by the same logic multiple ways of identifying heteroscedasticity. We present two of these approaches, but only the statistical testing method is used throughout this thesis.

1. Graphically

Creating a residual plot and inspecting it graphically gives a rough assessment of whether any heteroscedasticity is present. By plotting the least squares residuals against the explanatory variable you can rule out or detect heteroscedasticity by looking for any type of pattern. For the model to be homoscedastic you are looking for a random but even spread of residual points throughout the plot. However, even though a visual test can be revealing, and in many situations correct, it is better to use a more rigorous method (rstudio-pubs-static 2016).

Figure 2.1


2. Statistical testing

A more formal and mathematical way of detecting heteroscedasticity is the Breusch-Pagan test. It involves a variance function and a $\chi^2$ test. The null hypothesis of the Breusch-Pagan test is that there is no heteroscedasticity. This means that if the test returns a p-value below our chosen significance level, we reject the null hypothesis and conclude that heteroscedasticity is present.
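The idea behind the test can be sketched in a few lines of NumPy. The sketch below uses the studentized (Koenker) variant of the Breusch-Pagan statistic, $n \cdot R^2$ from an auxiliary regression of the squared OLS residuals on the regressors. This is an assumption made for illustration, since the thesis does not state which variant was computed. The simulated data, the function name `breusch_pagan_lm` and the seed are likewise hypothetical.

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression
    of squared OLS residuals on X. X must include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                      # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
    ss_res = np.sum((u2 - X @ gamma) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    return len(y) * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y_homo = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)        # constant error variance
y_hetero = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n) * x  # error sd grows with x

# 95% quantile of the chi-square distribution with 1 degree of freedom
# (one regressor besides the intercept).
CHI2_CRIT_DF1_5PCT = 3.841

lm_homo = breusch_pagan_lm(X, y_homo)
lm_hetero = breusch_pagan_lm(X, y_hetero)
print(lm_homo, lm_hetero, lm_hetero > CHI2_CRIT_DF1_5PCT)
```

Under the null hypothesis the statistic is approximately $\chi^2$ distributed with as many degrees of freedom as there are regressors besides the intercept, so a value far above the critical value corresponds to a p-value close to zero.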

2.4.4 Solutions to heteroscedasticity

As discussed previously, there are ways to combat the issues with heteroscedasticity in an OLS regression model. This is done by using a generalized least squares (GLS) estimator instead of the OLS estimator to perform the regression analysis.
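A minimal sketch of GLS for the heteroscedastic case, where the error covariance matrix is diagonal and GLS reduces to weighted least squares, is given below. The variance structure (error standard deviation proportional to the regressor) is assumed purely for illustration; the thesis does not specify the structure used, and the name `gls_diagonal` is hypothetical.

```python
import numpy as np

def gls_diagonal(X, y, variances):
    """GLS estimate (X^T W X)^{-1} X^T W y with W = diag(1 / variances)."""
    w = 1.0 / np.asarray(variances, dtype=float)
    Xw = X * w[:, None]                       # each row of X scaled by its weight (W X)
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
sd = 0.5 * x                                  # assumed: error sd proportional to x
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n) * sd

beta_gls = gls_diagonal(X, y, sd**2)          # observations with noisy errors get less weight
beta_ols = gls_diagonal(X, y, np.ones(n))     # equal weights reduce GLS to OLS
print(beta_gls, beta_ols)
```

In practice the variances are unknown and have to be estimated, which is usually referred to as feasible GLS.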

2.5 Multicollinearity

When two or more regressors are nearly linearly dependent, the problem of multicollinearity exists. The aim of regression modelling is to estimate the relationship between variables: there are parameters and independent variables that all affect the dependent variable. When there is no linear dependency between the regressors, i.e. no relationship between them, they are said to be orthogonal. In applied regression modelling, however, this is rare, and there is typically some linear relationship between the regressors. If the regressors are multicollinear, the variance of the OLS estimators of the affected parameters will be larger, and the model will thus be less applicable. Another effect of multicollinearity is that the least squares estimates $\hat{\beta}$ tend to be too large in absolute value. In short, this means that the vector $\hat{\beta}$ is longer than the vector $\beta$ (Montgomery, Peck, Vining, 2012, pp. 290).

2.5.1 Detecting Multicollinearity

There are different ways of detecting multicollinearity. Some of these are observing a linear dependency in a scatter plot, examining the correlation matrix, or using the variance inflation factor (VIF) method (Montgomery, Peck, Vining, 2012, pp. 292-296).

Because the correlation matrix only shows simple pairwise correlations, it can miss collinearity arising from more complex relationships. An example would be if the regressor $x_1$ is not collinear with $x_2$ but is collinear with $(x_2 + x_3)$. Because of this we proceed with the VIF method and complement it with a scatter plot.

VIF Method:

The diagonal elements of the inverse matrix $(X^T X)^{-1}$, with the regressors centered and scaled to unit length, are useful for detecting multicollinearity. Expressed mathematically, a VIF value is calculated for each regressor as

$$\mathrm{VIF}_k = \frac{1}{1 - R_k^2} = k\text{th diagonal element of } (X^T X)^{-1} \tag{2.6}$$

where $R_k^2$ is the coefficient of determination obtained when $x_k$, the $k$th predictor, is regressed against all other predictor variables. These other variables are treated as independent variables.

Depending on the VIF value we can draw certain conclusions.

VIF value > 5: further investigation is needed

VIF value > 10: harmful collinearity is certain

The mean of all VIF values is also of interest. If the mean is considerably larger than 1, one might worry about how reliably the model's coefficients can be interpreted.
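Eq. 2.6 can be illustrated with a short NumPy sketch that computes each VIF from the auxiliary regression and compares it with the diagonal of the inverse correlation matrix. The simulated regressors (`area`, `rooms`, `floor`) and the function name `vif` are hypothetical and only loosely mimic the kind of data used in this thesis.

```python
import numpy as np

def vif(X):
    """VIF_k = 1 / (1 - R_k^2) for each column of X (regressors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept and all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(7)
n = 500
area = rng.uniform(20.0, 150.0, n)
rooms = area / 30.0 + rng.normal(0.0, 0.5, n)   # strongly related to area
floor = rng.integers(1, 8, n).astype(float)     # unrelated to the other two
X_reg = np.column_stack([area, rooms, floor])

vif_values = vif(X_reg)
# Equivalent route: diagonal of the inverse of the regressors' correlation matrix.
vif_from_corr = np.diag(np.linalg.inv(np.corrcoef(X_reg, rowvar=False)))
print(vif_values, vif_from_corr)
```

In this toy setup the two related regressors receive clearly elevated values while the unrelated one stays close to 1, which is the pattern the thresholds above are meant to flag.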

Scatter plot:

A scatter plot is an easy and functional way of seeing indicators of multicollinearity. A scatter plot is constructed by taking the values of two different regressors and plotting them against each other. Depending on the output you can observe various results. If the points are scattered around a clear line, there is reason to suspect multicollinearity.


2.5.2 Solving Multicollinearity

If multicollinearity is detected it must be handled in some way. There are several approaches to remedy the issue and they all have their pros and cons. One way is to analyse the VIF values and eliminate regressors that show a high value. Thereby you also remove regressors that do not provide any new information. The problem with removing regressors is that the estimates for the remaining ones may become biased. But with a high enough VIF value, an elimination will not affect R² substantially. Another way to solve the issue is to add data that breaks the multicollinearity. Adding data can however be inappropriate, since the new data can extend beyond the analyst's region of interest (Montgomery, Peck, Vining, 2012, pp. 303-304). You can also run an experiment on the affected regressors to find a fixed coefficient and use that constant instead of estimating it from the data.

We use the following two quotes as additional guidelines when creating the model (Kennedy 2008, pp. 194-197):

1. “Don’t worry about multicollinearity if the R2 from the regression exceeds the R2 of any of the independent variable regressed on the other independent variables.”

2. “Don’t worry about multicollinearity if the t-statistics are all greater than 2.”

2.6 Model validation

2.6.1 R² and adjusted R²

R² and adjusted R² are the basic tools to validate a linear regression model. These values indicate how well the fitted model matches the data. R² can take on values between 0 and 1, where 0 means our model explains nothing of the variation in the dependent variable, and 1 means our model explains all of the variation in the dependent variable (Montgomery, Peck, Vining, 2012).


Adjusted R² is an extension of the R² value which penalises the use of too many independent variables to explain the dependent variable. This is because with the regular R² value, adding additional independent variables can either increase the explanatory power of the model or keep it the same, but never decrease it. This runs the risk of making people throw in a lot of variables with no real effect on the value of the dependent variable, just because they may be correlated in some way, and then claim that they have improved the explanatory power of the model. The adjusted R² instead motivates people to use the correct explanatory variables, as just adding independent variables with no real explanatory effect will reduce the adjusted R² (Montgomery, Peck, Vining, 2012).
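The penalty can be made concrete with a small sketch using the standard definitions $R^2 = 1 - SS_{res}/SS_{tot}$ and $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, where $n$ is the number of observations and $k$ the number of regressors excluding the intercept. These formulas are not written out in the text above, and the numbers below are made up for illustration.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """R^2 penalised for the number of regressors k (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observations and fitted values.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)
adj_one = adjusted_r_squared(r2, n=len(y), k=1)   # one regressor
adj_two = adjusted_r_squared(r2, n=len(y), k=2)   # same fit, one extra useless regressor
print(r2, adj_one, adj_two)
```

With the fit held fixed, the adjusted value drops when `k` increases, which is the mechanism described above.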

2.6.2 Hypothesis testing

Hypothesis testing is the act of using statistics to assess how well the data support a hypothesis. This is done by formulating a null hypothesis H0 and an alternative hypothesis H1.

The null hypothesis is then tested with a test statistic, and a p-value is calculated. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, under the assumption that the null hypothesis holds. If the p-value is equal to or lower than a given boundary α, which is typically set at 0.1, 0.05 or 0.01, the null hypothesis is rejected at the α significance level, and the result is said to be statistically significant in favour of H1.

2.6.3 F-Statistics & p-value

F-Statistics is a way to decide whether your null hypothesis should be rejected or not. If your result is significant, it means that the likelihood that it happened by chance is low. In an F test you have an F-Value and an F-Critical Value, which together are referred to as F-Statistics. A general rule is that if your calculated F-Value is larger than the F-Critical Value, you can reject the null hypothesis.

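You can compute the F-Value using:

$$F = \frac{(SSE_1 - SSE_2)/m}{SSE_2/(n - k)} \tag{2.7}$$

SSE = Residual sum of squares (index 1: restricted model, index 2: unrestricted model)

m = Number of restrictions

n = Number of observations

k = Number of independent variables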
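A direct transcription of eq. 2.7 into Python is sketched below. The interpretation of index 1 as the restricted model and index 2 as the unrestricted model, as well as the input numbers, are assumptions made for the example.

```python
def nested_f_value(sse_restricted, sse_unrestricted, m, n, k):
    """F-value of eq. 2.7: ((SSE1 - SSE2) / m) / (SSE2 / (n - k))."""
    numerator = (sse_restricted - sse_unrestricted) / m
    denominator = sse_unrestricted / (n - k)
    return numerator / denominator

# Hypothetical values: dropping m = 2 regressors raises the SSE from 100 to 120.
f_value = nested_f_value(sse_restricted=120.0, sse_unrestricted=100.0, m=2, n=105, k=5)
print(f_value)
```

The resulting value is then compared with the critical value of the F distribution for the chosen significance level.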


From the F-Value you can then calculate the F-Statistic:

$$F\ \text{statistic} = \frac{\text{variance of the group means}}{\text{mean of the within group variances}} \tag{2.8}$$

Having a significant result does not mean that all the regressors in the model are significant for the result. This is why you also compute a p-value. While the F-Statistic is giving information of the joint effect the p-value is a good indication of which individual variables are statistically significant. An approach is to set up an alpha level of for example 0.05. This will narrow down which variables you need to investigate, exclude etc.(Statistics How To n.d.)

2.7 Macroeconomic theory

The fundamental theory of macroeconomics revolves around AS and AD, aggregate supply and aggregate demand. When aggregate supply equals aggregate demand, the market is in equilibrium. The model of supply and demand is typically illustrated with quantity on the x-axis and price on the y-axis, where supply is an upward sloping function and demand is a downward sloping function. What this model tells us is that the higher the price of a good or service, the more people will strive to supply it, be it through production or through giving up a certain quantity of this good or service from their own use, and simultaneously, the fewer people will demand it. Equilibrium is the point in the model where the supply equals the demand of a good or service (Mankiw, 2012).

Figure 2.2


Figure 2.2 illustrates how the two factors influence each other while constantly converging towards new equilibria. The image is taken from a well known article on the subject, ”Tax subsidies to owner-occupied housing: an asset market approach” by James M. Poterba (1984). Even though we exclude some of the factors presented in that article that influence supply and demand, it is important to understand their significance. Poterba presents a model where demand is determined by the cost of ownership. This cost can be divided into two categories, one observable and one non-observable. The first is straightforward: costs that are easy to connect to the ownership, such as maintenance, repairs and operating costs. The latter category includes depreciation, taxes, interest etc.

Regardless of which costs affect the ownership, it is interesting to see how the curve and the market behave and adapt when a cost reduction emerges. Point A represents the first equilibrium state. Point B represents the new state after a cost reduction for the owners. This means that demand has increased, allowing for a higher price level.

What we research in this thesis is which factors affect the demand. As housing differs from most goods in that the vast majority of people will consume neither more nor less than 1 unit of housing, we do not research the quantity of housing demanded. This is more a question of how many people live in a country and how comfortable people are living together with others, neither of which is within our scope. While we do not investigate the quantity of housing demanded, we do investigate which factors affect housing demand on an aggregate level. With knowledge of this, large-scale builders would be able to adapt their production and supply apartments that are more in demand from the population, which would be beneficial to all parties involved. This becomes more relevant given the observation that, even though there has been extensive housing production in the last few years, the demand for the same kind of housing remains constant (Boverket 2019). This is an indication that the issue is not necessarily a low quantity of apartments, but rather a low quantity of the right kind of apartments.

The demand for apartments is a function of much more than is within the scope of this thesis. For example, factors such as the interest rate, limits to leverage and the general economic well-being of the country will all affect the demand for apartments, but this thesis is limited to investigating factors affecting the demand for apartments that are inherent in the apartments themselves. The supply in the real estate market can be described as more constant, not in the sense that it does not change, but in the sense that the supply curve tends to resist short-term impact. This, combined with sellers' resistance to going below a high buyout price, results in a phenomenon called ”downward price stickiness”, where the price resists decreasing even though there are fundamental reasons for it to do so (Adams & Füss, 2010). New production is almost always initiated on demand, i.e. research is done in several areas to see where, what and how much to build. The number of newly produced apartments relative to the existing stock of condominiums is very low, but their share of sales is different: 12-15% of all sold apartments are newly produced (Riksgälden 2019). Because this process can take several years, city planners must investigate and plan for the foreseeable future. Meanwhile, the demand can change over a much shorter time period. An example of how the demand changed rapidly is how the amortization requirement legislated by the government affected young people and first time buyers in Stockholm (Stockholms Handelskammare 2019).


3. Method and study design

3.1. Processing data

To prevent the results and the thesis in general from becoming too abstract, we narrow our scope to the best of our ability. This includes geographical, time and variable constraints, since the aim of this thesis is not to involve outside factors such as the cultural perception of a district. Our results shall present the physical factors that drive apartment valuation. Our time horizon is April 2019 to April 2020. A time period of one year not only gives us enough data to observe but also lets us assume that the time value of money does not need to be taken into account. Geographically, we have chosen to observe only data from the area with postal codes starting with 113, Vasastan. A total of 2651 observations were collected to develop the model. Out of these, 334 observations were found to contain missing values. These were removed in order to be able to fit the regression models, and we ended up with 2317 valid observations.
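The removal of incomplete observations can be sketched as follows. The records and field names are hypothetical and do not reflect the actual layout of the Datscha data; the last line only re-checks the arithmetic of the counts stated above.

```python
# Hypothetical records; None marks a missing value.
raw = [
    {"price": 3_500_000, "area": 42.0, "rooms": 2, "fee": 2100, "floor": 3, "age": 95},
    {"price": 5_200_000, "area": 61.0, "rooms": None, "fee": 3300, "floor": 1, "age": 110},
    {"price": 2_900_000, "area": None, "rooms": 1, "fee": 1800, "floor": 4, "age": 60},
    {"price": 7_800_000, "area": 88.0, "rooms": 3, "fee": 4100, "floor": 5, "age": 12},
]

def drop_incomplete(records):
    """Keep only the records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]

clean = drop_incomplete(raw)
print(len(raw), len(clean))   # number of records before and after filtering

# Counts reported in the text: 2651 observations, 334 of them incomplete.
remaining = 2651 - 334
print(remaining)
```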

After processing the data, the set of 2317 apartments left to analyse had the following properties:

Table 3.1

| Variable | Min | Max | Unit |
|---|---|---|---|
| Room | 1 | 7 | No. |
| Area | 16 | 311 | m² |
| Age | 2 | 158 | Years |
| Floor | 1 | 16 | No. |
| (Monthly) Fee | 0 | 9881 | SEK |
| Price | 712 500 | 38 000 000 | SEK |


The data we have analysed is gathered from Datscha and was provided to us thanks to a cooperation between KTH and Datscha.(Datscha 2020) This cooperation is made available to us for research purposes and contact persons at these institutes were:

- Rosane Hungria Gunnelin, KTH, Postdoc

- Julia Olsson, Datscha - Customer Success Manager.

The data contained information regarding: monthly fee, monthly fee/m², floor, total floors, construction year, price, price today, date of sale, housing cooperative, coordinates (longitude/latitude) and living area.

The first regression model we created was an OLS model. This model contained six independent variables: monthly fee, area in m², floor, total number of floors, age of the building, and number of rooms. Five of these six variables, all except the total number of floors, were found to have a significant effect on the price and were thus kept in the next iteration of the regression model, while the total number of floors was discarded.

3.2. Variable exclusion

There are different methods for deciding which variables to use, such as all possible regressions. However, this method was excluded relatively early in the process. The subject of the thesis is very wide, and there is therefore a risk of creating a model that is too complex, with too many regressors. By formulating a narrow problem statement we could ignore variables such as district, since we only collected data within our scope. For the variables we had data on, our method was to exclude variables with a p-value higher than 5%. By doing so we could eliminate variables such as coordinates, housing cooperative, monthly fee/m², date of sale and total number of floors, either because they would affect the result beyond our scope or because of a high p-value.

3.3. Analysing heteroscedasticity

The model with the five independent variables was then tested for heteroscedasticity with a Breusch-Pagan test. The test gave a p-value that was essentially zero. This means that our null hypothesis, which is that there is no heteroscedasticity, is rejected. When heteroscedasticity is present, the OLS estimator is no longer the best linear unbiased estimator, and we therefore re-estimated our regression model with a generalized least squares (GLS) estimator.
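To make the logic of the test concrete, the following is a minimal sketch of the studentized (Koenker) variant of the Breusch-Pagan test for a single regressor, written with the Python standard library only. It is not the R code used for the thesis; the function names and the toy data are hypothetical and serve only to show how a small p-value arises when the error spread grows with the regressor.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals, r_squared)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return a, b, residuals, 1.0 - sse / sst


def breusch_pagan_one_regressor(x, y):
    """Studentized Breusch-Pagan test with one regressor.

    Returns (lm_statistic, p_value); the null hypothesis is constant variance.
    """
    _, _, residuals, _ = simple_ols(x, y)
    squared = [r ** 2 for r in residuals]
    # Auxiliary regression: squared residuals on the regressor.
    _, _, _, r2_aux = simple_ols(x, squared)
    # Guard against a tiny negative R^2 caused by floating-point rounding.
    lm = max(0.0, len(x) * r2_aux)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Hypothetical toy data: in the first series the error spread grows with x.
x = list(range(1, 21))
y_hetero = [2 + 3 * xi + ((-1) ** xi) * xi for xi in x]
y_homo = [2 + 3 * xi + ((-1) ** xi) for xi in x]

lm_hetero, p_hetero = breusch_pagan_one_regressor(x, y_hetero)
lm_homo, p_homo = breusch_pagan_one_regressor(x, y_homo)
print(f"growing spread:  LM = {lm_hetero:.2f}, p = {p_hetero:.4f}")
print(f"constant spread: LM = {lm_homo:.2f}, p = {p_homo:.4f}")
```

With several regressors the auxiliary regression would include all of them and the degrees of freedom would equal the number of regressors; the single-regressor case is used here only to keep the sketch short.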

3.4. Analysing multicollinearity

The newly iterated regression model, which had a higher R-squared value than the OLS model, was then tested for multicollinearity using the Variance Inflation Factor, or VIF. The VIF identified one independent variable with a value slightly above five, which indicates that it should be looked into further. This was the area (m²) variable, which had a VIF of 5.408. We investigated this variable but found that removing it drastically lowered the R-squared value of the regression model, and hence elected to keep it. See the table below for all VIF values.

Table 3.2

Rooms       Area        Monthly fee   Age         Floor
4.338157    5.407848    2.192770      1.106539    1.086648
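As a reading aid for Table 3.2, the sketch below inverts the standard definition VIF_j = 1 / (1 - R_j²), where R_j² is the R-squared obtained when regressor j is regressed on the other regressors. The function name is hypothetical; the VIF values are those reported in the table, and the threshold of five is the one used in the text.

```python
def implied_r_squared(vif):
    """Invert VIF_j = 1 / (1 - R_j^2) to recover R_j^2."""
    return 1.0 - 1.0 / vif


# VIF values as reported in Table 3.2.
vif_table = {
    "rooms": 4.338157,
    "area": 5.407848,
    "fee": 2.192770,
    "age": 1.106539,
    "floor": 1.086648,
}

for name, vif in vif_table.items():
    r2 = implied_r_squared(vif)
    flag = "check" if vif > 5 else "ok"
    print(f"{name:6s} VIF = {vif:.3f}  implied R_j^2 = {r2:.3f}  [{flag}]")
```

Read this way, a VIF just above five means that roughly four fifths of the variation in area can be explained by the other regressors, which fits the intuition that living area and number of rooms move together.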

3.5. The initial regressors

The regressors chosen for the model are presented in the table below.

Table 3.3

Regressor   Notation   Unit    Comment
x₁          rooms      No.     No. of rooms in the condominium.
x₂          area       m²      Living area.
x₃          fee        SEK     The monthly fee in SEK.
x₄          age        Years   Years since construction instead of construction year.
x₅          floor      No.     Floor number of the condominium.


3.6. Model adjustments

After steps 3.1 to 3.5 we could present an initial model. However, by examining the residual analysis of this model (see section 4.2.2) we came to the conclusion that the model had to be reworked. This was accomplished by first identifying the factors that caused the problems, and then iterating the steps above on the new model.

First and foremost, we determined that the extreme observations needed to be handled. These outliers were dealt with using Cook’s distance. Moving forward, we also realised that the dependent variable price needed to be log-transformed, because it was not normally distributed. This eliminated a few of our problems with the residuals. The remaining issues could be resolved by adding the variable areasq, that is, area².

The new and final model showed similar behaviour and values with regard to heteroscedasticity and multicollinearity, which could be resolved without any major effort. The only noticeable deviation was an increased VIF value for the regressor area, which is understandable and acceptable since area and area² are strongly related. This was not seen as harmful enough to eliminate one of the variables, since both are necessary for a valid model with an acceptable residual analysis.

Table 3.4

Regressor   Notation
x₁          rooms
x₂          area
x₃          area²
x₄          fee
x₅          age
x₆          floor


4. Results

4.1 The initial model

The model presented below was estimated in the software R.

Here the estimates are presented in plain, non-logarithmic units. The regression equation given by this model is stated below. The output also provides the information needed to evaluate the model with validation measures such as p-values and R², which is necessary for our analysis in section 4.2.1. The regression equation is expressed in a manner based on the assumptions made in section 2.2.

Figure 4.1

Y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ (4.1)

The initial regression equation

Price = −751943.93 + rooms * 218679.65 + area * 88522.80 − fee * 178.63 + age * 9263.29 + floor * 105902.64 (4.2)


From equation 4.2 we can see that the only regressor with a negative coefficient is fee, meaning that for all other regressors a higher value means a higher price. For example, increasing the number of rooms by one would increase the price by 218679.65 SEK. The fact that an increase in fee decreases the price of the apartment is not surprising. A higher fee results in a higher ownership cost, which normally decreases demand and therefore the price.

4.2 Validation of the initial model

4.2.1 R-squared, p-values and t-statistic

Model validation is performed by evaluating the R², the p-values and the t-statistics. First of all, we consider the adjusted R² value of the regression, which tells us how much of the variation in the price of an apartment is explained by our model, given the set limitations. We find that the adjusted R² value of our model is 0.9007, which means it explains 90.07% of the price variation.
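The relationship between R² and adjusted R² can be sketched as follows, using the standard formula with n observations and p regressors. The function name and the input value of 0.90 are hypothetical and only illustrate how small the adjustment is for a sample of 2317 apartments and five regressors.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)


# n and p match the initial model; the R^2 of 0.90 is a hypothetical input.
adj = adjusted_r_squared(0.90, n_obs=2317, n_regressors=5)
print(f"adjusted R^2 = {adj:.5f}")
print(f"penalty      = {0.90 - adj:.5f}")
```

Because n is large relative to p, the penalty for the number of regressors is small here, so comparisons between models with five and six regressors are barely affected by the adjustment.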

Furthermore, all p-values are checked as described in the data processing, so that no variable without statistical significance is kept in the final model. All variables used in the final model have p-values below 0.01, meaning that they are all significant at the 0.01 level.

Lastly, we want to ensure that the t-statistics are all greater than 2 or less than -2. This gives us a second confirmation that all independent variables are significant. As can be seen in the output above, see figure 4.1, the lowest absolute t-statistic value of the model is 5.849.

4.2.2 Residual analysis

Figure 4.2 (fitted residual plot), Figure 4.3 (QQ-plot), Figure 4.4 (histogram)

The values of R-squared, the p-values and the t-statistics could be interpreted as showing that our model was a good representation of the data. However, the residual analysis indicates otherwise. Cross-checking the plots in the residual analysis against the assumptions mentioned in section 2.2 gives us new information regarding our initial model. The histogram, see figure 4.4, shows that the residuals are approximately normally distributed around zero, which in theory tells us that the second assumption can be regarded as valid. However, the fitted residual plot, see figure 4.2, is clearly funnel shaped. This is a bad indication, and we therefore have to work further with our dataset. The QQ-plot also shows undesirable behaviour. This is visible at the endpoints, where the plot follows a tan-like curve instead of a straight line; compare figure 4.3 with figure 4.7. This is normally an indication that the outliers have too much influence on the model.

4.3 Handling of outliers

Cook’s distance was used to identify and handle outliers. Observations with a Cook’s distance larger than 4/n, with n being 2317, were removed to increase model accuracy. The model with outliers removed included 2188 observations. To further improve the model accuracy, we took the logarithm of the dependent variable (price) and added the square of the independent variable area, which appears as areasq in the model in section 4.4. The effect can be seen by comparing the endpoints of the two QQ-plots, see figure 4.3 and figure 4.7.
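The 4/n rule can be illustrated with a small sketch. It computes Cook's distance for a simple regression with one regressor, using only the Python standard library, and flags the observations above the threshold. The function name and the toy data are hypothetical; the thesis applied the same rule to the full model in R.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in a simple regression y = a + b*x."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - b * x_mean
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(r ** 2 for r in residuals) / (n - p)
    distances = []
    for xi, ri in zip(x, residuals):
        leverage = 1.0 / n + (xi - x_mean) ** 2 / sxx
        distances.append(ri ** 2 / (p * mse) * leverage / (1.0 - leverage) ** 2)
    return distances


# Hypothetical toy data: the last point lies far off the line y = 2x + 1.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3, 5, 7, 9, 11, 13, 15, 17, 19, 40]

threshold = 4 / len(x)  # the 4/n rule of thumb used in the text
d = cooks_distances(x, y)
flagged = [i for i, di in enumerate(d) if di > threshold]
print("flagged indices:", flagged)

# Figures from the text: n = 2317 before and 2188 after removal.
print("thesis threshold 4/n =", 4 / 2317)
print("observations removed =", 2317 - 2188)
```

Cook's distance combines the size of the residual with the leverage of the observation, which is why a point at the edge of the data with a large residual is flagged first.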


4.4 The revised model

Output from R. The estimates correspond to the coefficients, which together form the regression equation used to predict the prices. An important note is that the dependent variable price is logarithmic. To transform a predicted value into SEK, one must therefore apply the exponential function, i.e. e^Ln(Price).

Figure 4.5

Ln(Y) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + β₆x₆ (4.3)

The final regression equation

Ln(Price) = 14.087273334 + rooms * 0.058474054 + area * 0.022737105 − areasq * 0.000066871 − fee * 0.000066871 + age * 0.001937448 + floor * 0.020424929 (4.4)


Compared to the previous regression equation, we can see that all the coefficients keep their signs. The new variable areasq obtained a negative coefficient even though it is strongly correlated with area, which has a positive coefficient. This means that an increase of one square meter in living area has less of an impact in a bigger condominium, because the negative contribution of areasq grows with the living area.
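The diminishing effect of area can be made explicit by differentiating the final regression equation with respect to area, which gives β_area + 2 * β_areasq * area. The sketch below evaluates this expression using the area and areasq coefficients as printed in the final regression equation; the function and variable names are hypothetical. Note that the printed areasq coefficient is identical to the printed fee coefficient, so the numbers should be read with that caveat in mind.

```python
# Coefficients of area and areasq as printed in the final regression equation.
BETA_AREA = 0.022737105
BETA_AREASQ = -0.000066871


def marginal_log_effect(area_m2):
    """d Ln(Price) / d area = beta_area + 2 * beta_areasq * area."""
    return BETA_AREA + 2.0 * BETA_AREASQ * area_m2


# Living area at which the marginal effect would reach zero.
turning_point = -BETA_AREA / (2.0 * BETA_AREASQ)

for area in (30, 60, 100, 150):
    effect = 100 * marginal_log_effect(area)
    print(f"{area:4d} m2: about {effect:.2f}% per extra m2")
print(f"marginal effect reaches zero near {turning_point:.0f} m2")
```

Under these printed values the effect of an extra square meter shrinks steadily with apartment size, and the fitted curve would flatten out for the very largest apartments in the data set.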

The decision to take the logarithm of the dependent variable had several benefits, presented in section 4.5.2, but the downside is that the coefficients and the regression equation are not as easy to interpret. In the initial regression equation we could see at a glance that one additional room would increase the value by 218679.65 SEK. In this equation we instead have a coefficient of 0.058474054 on the logarithmic scale. If we want to investigate the value change from one additional room, we now have to perform some simple mathematical operations based on the laws of logarithms.

Simply take e^(0.058474054 * rooms). For one room this yields e^0.058474054, which is equal to 1.060 and can be translated into 6%. An increase of one room will increase the value by 6%.

For the entire model you perform the same operation once all the regressor values have been inserted. For example, if the model gives an apartment Ln(Price) ≈ 15.32, then e^15.32 returns a value of about 4.5 MSEK.
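The two back-transformations described above can be written out as a short sketch. The function names are hypothetical; the coefficients and the value 15.32 are taken from the text.

```python
import math


def percent_change_per_unit(coefficient):
    """Percentage price change for a one-unit increase in a regressor."""
    return 100.0 * (math.exp(coefficient) - 1.0)


def ln_price_to_sek(ln_price):
    """Back-transform a predicted Ln(Price) to a price in SEK."""
    return math.exp(ln_price)


print(f"one extra room: {percent_change_per_unit(0.058474054):.2f}%")
print(f"one floor up:   {percent_change_per_unit(0.020424929):.2f}%")
print(f"Ln(Price) = 15.32 -> {ln_price_to_sek(15.32):,.0f} SEK")
```

For small coefficients the percentage change is close to 100 times the coefficient itself, which is why a coefficient of about 0.058 translates into roughly 6%.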

4.5 Validation of the revised model

4.5.1 R-squared, p-values and t-statistic

To validate our final model we used the same tools as described for the initial model, see section 4.2.1. With the new model we see an even higher adjusted R² of 0.9311, compared to the previous 0.9007. This means that the new model explains 3.04 percentage points more of the variation than the previous one. We also analysed the p-values and t-statistics, as in section 4.2.1, to ensure the significance of the variables in the model. We could see that the final model performed equally well or better in these tests, with the benefit of a stronger residual analysis, see below.

4.5.2 Residual analysis

Figure 4.6 (fitted residual plot), Figure 4.7 (QQ-plot), Figure 4.8 (histogram)

Figures 4.6 to 4.8 tell us that the changes made to the data gave the residuals a more desirable behaviour. The elimination of the outliers moved the QQ-plot away from a tan-like curve towards a straight line. The alterations to the initial model, i.e. the use of a logarithmic dependent variable and the addition of the squared area, both led to a more scattered and random fitted residual plot. Even though one could argue that our histogram was already valid, see figure 4.4, the one presented above, see figure 4.8, is a much better result.

By observing the individual residual plots of some of the regressors we get a deeper understanding of the effect of the new model, see figures 4.9 to 4.11.

In the plots below we can see that area was one of the regressors that had a substantial negative effect on the model. In the first plot, see figure 4.9, the residuals follow a clear function-like pattern. It resembles a quadratic function, more precisely a negative x² function. By adding a sixth regressor, areasq, we could eliminate this phenomenon, and the result is presented in the plot to the right. The plot from the final model shows the behaviour we sought from the regressors, i.e. it fulfills the assumptions presented in section 2.2.


Figure 4.9

The next plot we chose to demonstrate is for the regressor rooms, see figure 4.10, because it benefited the second most from our alterations from the initial to the final model. As with area, we can see behaviour which is not ideal, with a function-like pattern in the plot. However, in this case we did not need to add a regressor. Just by eliminating the outliers with Cook’s distance, combined with a logarithmic dependent variable, we saw an improvement. The denser concentration of points in the middle is solely due to the dataset, i.e. we have more data points at those values.

Figure 4.10

The third and final comparison concerns a regressor that did not show any remarkable change, age, see figure 4.11. Age, floor and fee were all variables whose residual plots in the initial model were good enough to be approved without further remodeling. Therefore we chose to demonstrate only one of these instead of all three. The plots do change, but not in their behaviour and implications.

Figure 4.11

The key information in all the plots above is tied to the assumptions mentioned in section 2.2. Throughout this comparison it is clear that the assumptions are now fulfilled in the revised model. We can now state that the error term has zero mean and constant variance. The error is also normally distributed, and the correlation that was present has been incorporated into the model; compare the behaviour of the initial area plot with the area plot when the variable area² is included, figure 4.9.

4.6 Application on real data

In order to validate our model, we gathered data from 23 new apartment sales, used the model to predict the prices, and plotted the predictions against the actual prices that the apartments sold for. We did this to ensure that our model is not only valid on the data that was used to create it, but also applicable for predicting what an apartment would cost, given the variables used in the model. Figure 4.12 below shows the actual sale prices on the x-axis and the prices predicted by our model on the y-axis. Our model predicts Ln(price), but this has been translated into SEK for ease of comparison. The dotted line indicates where the observations would be if our model predicted the prices with 100% accuracy for every observation.


Figure 4.12

As the plot shows, see figure 4.12, our model accurately predicts the price. We do not identify any observations with a large deviation from the predicted values. Our model predicts 19 of the 23 prices within a 10% error margin, with two observed prices being more than 10% above and two being more than 10% below the predicted value. All observed prices are within 16% of the predicted value.
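The accuracy measure used above, the share of sales whose actual price lies within a given margin of the predicted price, can be sketched as follows. The function name and the four example prices are hypothetical and are not the 23 validation sales; only the ratio 19 of 23 is taken from the text.

```python
def share_within(actual, predicted, tolerance=0.10):
    """Fraction of sales whose actual price is within `tolerance` of the prediction."""
    hits = sum(
        1 for a, p in zip(actual, predicted) if abs(a - p) / p <= tolerance
    )
    return hits / len(actual)


# Hypothetical prices in SEK, not the validation sales from the thesis.
actual = [3_000_000, 4_500_000, 6_200_000, 8_000_000]
predicted = [3_100_000, 4_400_000, 5_500_000, 8_300_000]

print(f"share within 10%: {share_within(actual, predicted):.0%}")
print(f"reported in the text: 19 of 23 = {19 / 23:.1%}")
```

The deviation is measured relative to the predicted value here, matching the wording in the text; measuring it relative to the actual price would give slightly different shares near the margin.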

Given that our model can accurately predict prices of apartments from the cheaper side of the spectrum to the more expensive side, we are satisfied that applying the model will give the user a good idea of what an apartment with certain characteristics should cost. Slight deviations are bound to occur due to the fact that we have left out variables that are harder to measure, such as the general quality of the apartment, and the performance of the real-estate agent. However, for predicting the value of what can be seen as a “standard” apartment in Vasastan, regardless of its price level, our model performs in a satisfactory way.

While we are satisfied with the quality of our model, a way to further test its accuracy would be to test it on a larger data sample. As it stands, we tested it on 23 observations and found that it had a good fit. If we had access to a larger data set, we would have preferred to test the model accuracy on a sample of at least a couple of hundred apartments. However, we are limited in the data we are able to access at this point in time, and all the observations we used to test the model had to be gathered manually, leading to the relatively small sample of model validation observations.


5. Discussion

5.1 Variable selection & exclusion

The variables were selected based on which features of the apartment itself may affect the price. We can only test for things that are measurable, which is why factors such as the layout and design of the apartment, despite being features of the apartment, are not taken into consideration. The one measurable variable we would have liked to include in the study, as a dummy variable in our regression model, is whether the apartment has a balcony. However, that particular variable was not present in the database, and we had to make do with what we had. The high R² value of our model indicates that it was not a huge loss, since the model is able to explain 93.11% of all price differences in our data.

A factor that was part of the first iteration of the model but was later discarded is the total number of floors in the building. The thought was that living in a high rise may add some form of status via the building itself, and therefore add value to an apartment even if it is on a lower floor. This variable did not, however, have a significant effect on the price. This may be due to several different reasons, such as status not being important when choosing a particular building to live in within Vasastan, or the lack of high rises in the district. It would be interesting to see what effect the status of a building could have on the price in an area where the concept of a building with a lot of status is more prominent, such as New York City. Seeing whether status differences between buildings affect the prices in such an area, and whether the total number of floors in turn affects the status of the building, would make a good topic for another study.

The variables left in the study are all prominent variables that one will see in apartment advertising. Since these variables explain so much of the price, we do not believe that any major quantifiable factor is generally left out in apartment marketing. The one variable that we believe could affect the price of an apartment, that is inherent to the apartment but that we were unable to measure because it was not present in our data, is whether the apartment has a balcony. However, our model is still able to explain over 93% of the price of an apartment, so being able to measure this variable would not make a big difference to our model. Whether this is due to very few apartments in Vasastan having balconies, most of the apartments in Vasastan having balconies, or balconies being unimportant to people buying apartments in Vasastan, is a topic for another thesis.

5.2 Indications of Covariate

In this section follows a short interpretation of what the covariates indicate.

First off, we look at the effect that the number of rooms has on the price. We can see that, on average, one additional room multiplies the value of an apartment by e^0.058474054 ≈ 1.06. This implies that every additional room increases the price by 6%. This variable is interesting because it is often possible to increase the number of rooms in an apartment without requiring a bigger plot to build on. If a builder can redesign an apartment, for example from three to four rooms, without changing the living area, they can increase the price of the apartment without having to spend more money on a bigger plot. In this case it simply comes down to the additional cost of building the extra walls required to make another room, and if this is lower than the increase in price, there is a profit to be made. This is quite intuitive given the situation explained earlier with the housing supply in Stockholm. Many people are struggling to find a place to live, so being able to fit one more person into a given apartment should increase its value.

Secondly, we have the living area. This is perhaps the most obvious variable, as it simply is a factor without downsides. Even though the coefficient for areasq is negative, that does not imply that an increase in the living area means a decrease in the price. The coefficient for area is positive and has a much greater absolute value than that of areasq, so for typical apartment sizes the combined effect of a larger living area on price is positive. However, the positive effect of an extra square meter is smaller the larger the apartment is, as an extra square meter of living area has a larger effect on the areasq variable if the apartment is larger.

Compare this with rooms, for example, where an additional room brings the benefit of the apartment being able to house more people but also makes each room smaller. An extra square meter does not detract from any other part of the apartment. Therefore an apartment should become more expensive the bigger it is, which it does.

The fee is not as obvious as the previous factors, despite looking very intuitive at first glance. A high fee for an apartment could signal many different things, such as a high level of debt due to investments, bad management of the housing cooperative's funds, or many amenities which have a high maintenance cost. An interesting topic would be to see if people regard the fee of an apartment differently depending on what the fee is financing, but that is not within the scope of this thesis. For now, we conclude that in general people become less inclined to pay for an apartment the higher the monthly fee is, which is what one would expect.

One factor we find interesting is how the age of the house affects the price. We find that the price increases for every year older the house is. This means that if one wants to buy an apartment in a house that is much older than the one they are currently living in, but otherwise identical in factors such as living area and number of rooms, they would have to pay a substantial amount extra. We find this quite puzzling, as we would perhaps expect newer buildings to carry a greater value. However, the monthly fee is strongly correlated with the housing cooperative's loan-to-value (LTV) ratio. Housing cooperatives in newly built properties tend to have a higher loan ratio and must therefore charge a higher monthly fee. As a result, the total monthly cost of a purchase is lower for an older condominium.

Finally we have the floor number. Previous research has shown that the price of an apartment increases for every floor up to floor five (Booli 2013). That report calculates the value of a higher floor in a different way than we do, which has implications above the fifth floor, where other things have a negative effect on the apartment price. The base effect, however, is that increasing the floor number also increases the price. We find the same effect, with each additional floor increasing the price of an apartment by 2% on average. We believe this effect to be associated with the improved view and the distance from the roads at ground level, especially in such a beautiful city as Stockholm.


5.3 Findings and Conclusions

Because we isolated just one district in Stockholm, the first model we created would likely not be directly applicable to data from outside Vasastan.

However, the updated model uses a logarithmic scale for the price, and hence the regressors drive a percentage difference in price rather than an absolute price change. This makes the model more useful in districts other than Vasastan, and in cities other than Stockholm as well, as the percentage difference in price for adding an extra room, for example, is likely to be rather similar in Stockholm and Gothenburg, while the actual price difference is likely very different, due to other factors. We would not, however, recommend using this model in places with a vastly different housing market than Stockholm, such as a smaller town in the countryside. This would cause a larger risk of an incorrect prediction, as demand is likely to be affected differently by these variables in places where it is easier to find housing.

If the model was to be used in another city district or city to predict the value of an apartment, the value our model gives would obviously have to be changed with an index between the districts. For example, apartments on Strandvägen in Stockholm may be 30% more expensive than in Vasastan, while apartments in Hornstull may be 20% less expensive, on average. To get an accurate value for an apartment using our model, one will of course have to adjust for this. However, if one wants to know how much an extra room or ten extra square meters will affect the price of their apartment, the model does not have to be adjusted.
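The index adjustment described above amounts to multiplying the Vasastan-level prediction by a relative price index. The sketch below uses the illustrative index values from the text (30% more expensive and 20% less expensive); the function name and the base price are hypothetical.

```python
def adjust_for_district(vasastan_price_sek, district_index):
    """Scale a Vasastan-level prediction by a relative district price index."""
    return vasastan_price_sek * district_index


# Index values follow the illustrative example in the text (Vasastan = 1.00).
DISTRICT_INDEX = {"Vasastan": 1.00, "Strandvägen": 1.30, "Hornstull": 0.80}

base_price = 4_500_000  # hypothetical model prediction in SEK
for district, index in DISTRICT_INDEX.items():
    adjusted = adjust_for_district(base_price, index)
    print(f"{district:12s} {adjusted:>12,.0f} SEK")
```

Since the revised model predicts Ln(Price), multiplying the price by an index is equivalent to adding the logarithm of the index to the intercept, which leaves the percentage effects of the regressors unchanged.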

In conclusion, we find both intuitive and less intuitive results from our regression model. One thing that we certainly find interesting, and would like to see a study on, is why older houses are more expensive in Vasastan, and whether this effect holds for all of Stockholm. We are very confident that the effects of the other variables will hold in a different district in Stockholm, with the reservation that predictions must be scaled according to an apartment price index should one want to predict the price of an apartment. The age effect, however, is the one factor that we are not sure would be unaffected if we looked at a different district, and we therefore urge any reader to consider this a topic for future research.


5.4. Further research

With the results of the thesis presented and analysed, there is still room in this subject for more in-depth investigation. For several reasons our access to data was limited, the main one being Covid-19. This put constraints both on which variables we could analyse and on our time. For future research there are various ways to either reinvestigate our problem statement or approach the matter in a completely different way.

Throughout this entire process we were clear that we wanted to quantify physical attributes of a condominium, as mentioned in the problem statement. A different approach could be to change or broaden the scope so that cultural variables are investigated more thoroughly. For example, districts in Stockholm are associated with various features. One could even push it further and see which variables have a great impact in Stockholm and benchmark these regressors against other cities. A good example is how the regressor total number of floors gave our model nothing, whereas in a city characterized by skyscrapers, such as New York City, that regressor could play a bigger role.

One could also implement dummy variables such as balcony, proximity to public transportation, the real estate agent's ability to increase the selling price, interest rates etc. These are all variables that can carry valuable information about the final price but only need to be measured as a dummy, i.e. with the value 0 or 1 to indicate their presence. This would not only yield a different result but also unlock new tools to analyse the data, such as the interplay between the dummy and the dependent variable. The outcome would be two different gradients, which both have explanatory value.


6. References

Printed sources:

Adams Z, Füss R (2010) Macroeconomic determinants of international housing markets. Journal of Housing Economics 19: 38-50

Montgomery D, Peck E, Vining G (2012) Introduction to Linear Regression Analysis. 5th edition, Wiley-Interscience.

Kennedy P (2008) A Guide to Econometrics. 6th edition, Malden: Wiley-Blackwell.

Mankiw N G (2012) Macroeconomics. 8th edition, Worth Publishers.

Poterba J M (1984) Tax subsidies to owner-occupied housing: An asset-market approach. The Quarterly Journal of Economics, November 1984.

Web-based sources:

Blind, Dahlberg & Engström 2016, “Prisutvecklingen på bostäder i Sverige – en geografisk analys”

Booli 2013, viewed 7 May 2020

<http://www.mynewsdesk.com/se/booli.se/pressreleases/vaaning-5-ger-hoegst-kvadratmeterpris-926142?fbclid=IwAR3dP9e7ByOr-C3K841aGOSRE7cL2oLqAycwlAvZ3hUzwRLcCVU038jACu4>

Boverket 2019, viewed 22 May 2020

<https://www.boverket.se/contentassets/6a2b012de53444a8a9c4f1a922d21443/regionala-bostadsmarknadsanalyser-2019.pdf>

Boverket 2019:14 “Kostnaden för att bo” viewed May 2020

<https://www.boverket.se/globalassets/publikationer/dokument/2019/kostnaden-for-att-bo.pdf>

Datscha 2020, viewed 21 April 2020 <https://system.datscha.com/Transaction/Sweden/Valueguard>

Bjellerup. Mårten , Majtorp. Lina, Riksgälden 2019, viewed 7 May 2020

<https://www.riksgalden.se/contentassets/123d8a09ad2a46d6b2f5024d959477ad/2019-05-28-fokusrapport-bostadsprisernas-utveckling.pdf>

rstudio-pubs-static 2016, viewed 7 May 2020

<https://rstudio-pubs-static.s3.amazonaws.com/187387_3ca34c107405427db0e0f01252b3fbdb.html>

Statisticssolutions n.d., viewed 7 May 2020

<https://www.statisticssolutions.com/assumptions-of-multiple-linear-regression/>


Statistics How To n.d. , viewed 7 May 2020

<https://www.statisticshowto.com/probability-and-statistics/f-statistic-value-test/>

StockholmDirekt 2018, viewed 7 May 2020

<https://www.stockholmdirekt.se/bostad/det-har-maste-du-tjana-for-att-fa-kopa-en-etta/reprfh!mio7Y9EDICURKhSbiCMzw/>

Svensson. Lars E.O, Stockholms Handelskammare 2019, viewed 7 May 2020

<https://larseosvensson.se/files/papers/amorteringskraven-felaktiga-grunder-och-negativa-effekter.pdf>

Svensk Mäklarstatistik 2020, viewed 7 May 2020

<https://www.maklarstatistik.se/omrade/riket/stockholms-lan/stockholm/#/bostadsratter>
