
Automated Bid Adjustments in Search Engine Advertising



DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING,
SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2017

Automated Bid Adjustments in Search Engine Advertising


Automated Bid Adjustments in Search Engine Advertising

Student name: Mazen Aly

Master's Degree Project

Royal Institute of Technology (KTH)

Sweden


Acknowledgements

I would like to acknowledge the help and support that I got from my examiner, Prof. Magnus Boman, and my academic supervisor, Prof. Sarunas Girdzijauskas. Many thanks to my supervisor at Precis Digital, Carl Regårdh, for his guidance and support throughout this project. Thank you Precis Digital, and especially the data science team (João Coelho, Marie Ericsson, Patrik Berggren and Pierre Rudolfsson), for the continuous support and for our discussions that helped shape this thesis. Special thanks go to my wife, Rewan, who is always there to support me. This would not have been possible without her.


Abstract

In digital advertising, major search engines allow advertisers to set bid adjustments on their ad campaigns in order to capture the valuation differences that are a function of query dimensions. In this thesis, a model that uses bid adjustments is developed in order to increase the number of conversions and decrease the cost per conversion. A statistical model is used to select campaigns and dimensions that need bid adjustments, along with several techniques to determine their values, which can lie between -90% and 900%. In addition, an evaluation procedure is developed that uses campaign historical data in order to evaluate the calculation methods as well as to validate different approaches. We study the problem of interactions between different adjustments and formulate a solution. Real-time experiments showed that our bid adjustments model improved the performance of online advertising campaigns with statistical significance: it increased the number of conversions by 9% and decreased the cost per conversion by 10%.

Keywords: Digital Advertising; Bid Adjustments; Optimization; Statistical Analysis; A/B Testing.

Sammanfattning

I digital marknadsföring tillåter de dominerande sökmotorerna en annonsör att ändra sina bud med hjälp av så kallade budjusteringar baserat på olika dimensioner i sökförfrågan, i syfte att kompensera för olika värden de dimensionerna medför. I det här arbetet tas en modell fram för att sätta budjusteringar i syfte att öka mängden konverteringar och samtidigt minska kostnaden per konvertering. En statistisk modell används för att välja kampanjer och dimensioner som behöver justeringar och flera olika tekniker för att bestämma justeringens storlek, som kan spänna från -90% till 900%, undersöks. Utöver detta tas en evalueringsmetod fram som använder en kampanjs historiska data för att utvärdera de olika metoderna och validera olika tillvägagångssätt. Vi studerar interaktionsproblemet mellan olika dimensioners budjusteringar och en lösning formuleras. Realtidsexperiment visar att vår modell för budjusteringar förbättrade prestandan i marknadsföringskampanjerna med statistisk signifikans. Konverteringarna ökade med 9% och kostnaden per konvertering minskade med 10%.

Nyckelord: Digital Marknadsföring; Budjusteringar; Optimering; Statistisk analys; A/B-testning.


Contents

1 Introduction 7

1.1 Research Problem and Motivation . . . 7

1.2 Aim and Scope . . . 8

1.3 Contributions . . . 8

1.4 Environment . . . 8

1.5 Methodology . . . 8

1.5.1 Data Collection Methods . . . 10

1.5.2 Data Analysis Methods . . . 10

1.5.3 Quality Assurance . . . 10

1.6 Thesis Outline . . . 11

2 Background and Literature Review 12

2.0.1 Early Models . . . 12

2.0.2 Advertisers Metric of Success . . . 12

2.0.3 Performance-Based Advertising . . . 13

2.1 Search Engine Perspective . . . 13

2.1.1 Google Adwords . . . 14

2.1.2 Search Advertising Terms . . . 14

2.1.3 Ad Auctions . . . 15

2.2 Advertiser Perspective . . . 15

2.3 Bid Adjustments . . . 17

3 Campaigns and Dimensions Selection Model 18

3.1 Motivation . . . 18

3.2 Metric of Optimization . . . 18

3.3 Diminishing Returns Law . . . 19

3.4 Hypothesis Testing . . . 20

3.5 Analysis of Variance (ANOVA) . . . 21

3.5.1 Assumptions Validation . . . 21

3.5.2 Post-hoc test . . . 23

3.6 Chi-Squared Test for Independence . . . 23

3.6.1 Motivation . . . 23

3.6.2 Assumptions . . . 23

3.6.3 Formulation . . . 23

4 Adjustments Calculations and Evaluation 25

4.1 Marginal ICPA Method . . . 25

4.2 Constrained Linear Regression Method . . . 26

4.3 Average of Slopes Method . . . 28

4.3.1 Formulation . . . 28

4.4 Total ICPA Method . . . 28

4.5 Cost Weighting . . . 29

4.6 Evaluation Procedure . . . 29

4.6.1 Traditional Evaluation Methods . . . 29


4.7 Adjustments Interactions . . . 31

4.7.1 Challenges . . . 31

4.7.2 Solution . . . 32

4.7.3 Minimization Procedure . . . 33

5 Real-time Experiments 35

5.1 A/B testing . . . 35

5.2 Adwords Experiments . . . 35

5.3 Experiment Design . . . 36

5.3.1 Binomial Test . . . 36

5.3.2 A Priori Power Analysis . . . 36

5.3.3 Experiments Campaigns . . . 37

6 Results 38

6.1 Aggregate Results . . . 38

6.2 In-depth Analysis . . . 39

7 Discussion 42

7.1 Statistical Significance Discussion . . . 42

7.2 Adjustments Calculations Discussion . . . 44

7.3 Risks and Recommendations . . . 45

7.3.1 Cross-device Conversions . . . 45

7.3.2 Broad vs Exact Keywords . . . 45

7.3.3 Conversions Time Lag . . . 46

8 Conclusions 47

8.1 Future Work . . . 48

References 49


Acronyms

CPA Cost Per Acquisition.
CPC Cost Per Click.
CPM Cost Per Mille.
CR Conversion Rate.
CTR Click-Through Rate.
GSP Generalized Second-Price auction.
ICPA Inverse of Cost Per Acquisition.
ROAS Return On Ad Spend.
SEM Search Engine Marketing.
SEO Search Engine Optimization.
SERP Search Engine Results Page.

Glossary

Campaign:

In our context, a campaign is just an organization of ads and keywords, as opposed to being a whole marketing campaign that lasts for a short duration of time.

Conversion:

A conversion is the action that the advertiser wants the user to take after clicking on an ad. Normally it is a purchase of a product or service, but it can also be another action such as a sign-up, a phone call, downloading a brochure, or installing an application. Some sources use the word acquisition to mean the same thing; in this thesis, we use the two words interchangeably.

Impression:

Impressions are defined as the number of times an ad is shown. An impression is counted when an ad is displayed on the search engine results of a user.

Keyword:

Keywords are words and phrases chosen by the advertiser to describe a product or service. They help determine when and where the ads appear in the search engine results.

Organic Search Results:

Organic search results are shown as a result of just being relevant to a query entered in a search engine, and they do not include any paid ads.


Chapter 1

Introduction

Search Engine Marketing (SEM) allows advertisers to take advantage of the millions of searches conducted on search engines each day by driving interested people to their websites. Advertisers create campaigns and ads for their businesses, and they pay search engines in order for their advertisements to be shown beside the organic search results. Search Engine Optimization (SEO) differs from SEM in that website owners do not pay to make their websites appear in the search engine results page (SERP); it is mainly the process of changing a web page and following best practices in order to be trusted by search engines and improve its visibility in organic results. This chapter intends to provide the reader with an overview of the field of search advertising as well as our research problem, motivation, methodology, outline and the limitations of this thesis project.

1.1 Research Problem and Motivation

In search advertising, advertisers bid on certain keywords in order for their clickable ads to appear in the search engine results. One of the most important challenges that advertisers face in SEM is determining the bid value for each keyword in an advertising campaign. This problem is interesting in the advertising industry as it affects the advertisers' marketing costs and profits. Bid calculations affect the marketing costs because a bid is the maximum cost that the advertiser is willing to pay when an ad gets clicked. Bidding on the right keywords with the right values can yield profitable clicks, as the users behind them may make conversions.

If we bid on a keyword that has a high probability of conversion with too small a bid, then our advertisement can be shown in a low position or not be shown at all, and thus we may lose a conversion and decrease our profits. On the other hand, high bids on keywords with poor performance can exhaust our marketing budgets without achieving our marketing goals. As shown in chapter 2, calculating the bid values is a challenging problem, and researchers tackle it from different angles using various sources of information, as there are many aspects that affect these calculations, like market changes, competitors' behavior, seasonality, keyword relevance to the search query, keyword match type, etc.

Our work builds on previous work in the sense that it uses more dimensions of the search query, like user device and day of the week, not to calculate the bids but rather to adjust the already calculated bids. These dimensions can have a significant effect on the value of the ad to the bidder, as well as on the market price of the ad placement.

In 2013, major search engines started to allow an advertiser to set bid adjustments or modifiers on their ad campaigns in order to account for differences in valuation that are a function of these types of dimensions [30]. The transition to this mode of bidding has been characterized as one of the most important recent changes to search engines [31]. Advertisers are allowed to submit adjustments along with their bids, and the adjustments can be made on features of the search query including time of day, day of the week, location, and device type. They allow advertisers to show their ads more or less frequently based on where, when, and how people search, and that is why we believe that applying bid adjustments on top of the already calculated bids can yield better results.
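To make the mechanics concrete, here is a minimal sketch of how multiplicative bid adjustments act on an already calculated bid. The function and dimension labels are our own illustration, not AdWords API calls; the clamp reflects the -90% to +900% range used in this thesis.

```python
def apply_adjustments(base_bid, adjustments):
    """Apply multiplicative bid adjustments to a base bid (illustrative sketch).

    `adjustments` maps a dimension label to a fractional adjustment,
    e.g. 0.20 for "+20%". Each value is clamped to the allowed
    range of -90% to +900%.
    """
    effective = base_bid
    for dimension, adj in adjustments.items():
        adj = max(-0.90, min(9.00, adj))  # allowed adjustment range
        effective *= 1.0 + adj
    return effective

# A bid of 10 with +20% on mobile devices and -10% on Mondays:
bid = apply_adjustments(10.0, {"device:mobile": 0.20, "day:monday": -0.10})
# 10 * 1.20 * 0.90 = 10.8
```

Because each dimension multiplies the bid independently, a +20% mobile adjustment and a -10% Monday adjustment combine to a factor of 1.08; this compounding of adjustments across dimensions is the interaction problem studied later in the thesis.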

1.2 Aim and Scope

When working on digital advertising, many goals can be set, like creating brand awareness, increasing revenues or increasing the online traffic to the advertiser. The aim of this thesis is to provide a scalable method for selecting AdWords campaigns that need bid adjustments in order to increase conversions and decrease the cost per conversion.

There are several dimensions that can be used in bid adjustments, like device, location, day of the week and time of day. Although the techniques developed in this thesis can be used for all dimensions, the focus is on the device and day-of-the-week dimensions, because incorporating all dimensions would make them problematic to test in real-time experiments. In other words, if an experiment does not yield the intended results, it would be difficult to know which change caused them. It is recommended in A/B testing and online experiments to make only one change at a time in order to understand the results [6]. In addition, each experiment takes more than one month, and for this project there is not enough time for multiple experiments. At the same time, our model addresses the challenges of interactions between multiple dimensions, so we have to work with more than one dimension in order to test and validate the interactions part.

1.3 Contributions

Several contributions are presented in this thesis. First, we provide a statistical model to select campaigns and dimension groups that need bid adjustments. Second, we propose several techniques for determining the values of bid adjustments and compare them. Third, we develop an evaluation framework that uses historical data of the campaigns in order to evaluate the different techniques as well as validate the adjustment calculations. Fourth, we discuss the problem of multiplicative bid adjustments and present a solution for it. Finally, we design real-time experiments to evaluate our bid adjustments and discuss the results.

1.4 Environment

This master's degree project is carried out during an internship within the data science team at Precis Digital, a digital marketing company in Stockholm. Precis Digital was founded in 2012 and takes a data-driven approach to maximizing the outcome of its clients' digital marketing investments. Precis Digital is the winner of the Best Large PPC Agency award in the European Search Awards 2017 [7].

In this project, Google AdWords is the online advertising platform used, because it is the main search advertising platform at Precis Digital and thus the source of the data used in this project. However, our work can be applied to bid adjustments in other major search engines, such as Bing.

1.5 Methodology

Although there is a large body of good and rigorous literature about methods and methodologies in academic research [2, 3, 4], we follow the framework of research methods and methodologies presented in Figure 1.1. This framework helps in selecting and applying the best suited methods that belong together, as well as avoiding picking methods that do not match. It contains the methodologies that are commonly used in information technology [18].

Figure 1.1: The portal of research methods and methodologies. Adapted from [18].

When selecting methods for our research, every layer in the portal, starting from the top, is investigated before entering the next layer towards the bottom. As recommended in [18], we select and apply at least one method from each layer on this project before moving to the next layer. The basic categories of research methods are quantitative methods and qualitative methods. These two are considered to be polar opposites [9], and they are applied to projects that are either numerical or non-numerical [18]. This project is of a quantitative research nature, as modeling, experiments and testing are done by measuring metrics to verify or falsify theories and hypotheses that are measurable and quantifiable.

We base our work on large data sets, and statistics is used to test hypotheses and evaluate the results. Although the project is mainly of a quantitative nature, we use a method called triangulation and borrow some methods that are actually qualitative, like exploratory data analysis, in order to get a complete view of the research area and to ensure correctness, credibility and validity of the results [18].

In this project we follow the positivism paradigm [10], which assumes that reality is objectively given and independent of the observer and instruments; concretely, our models are based on real-world historical data of search advertising campaigns. This assumption works in projects that are of an experimental and testing character. It dismisses or proves a phenomenon by drawing inferences from the sample to the population, quantifying measures of metrics, and testing hypotheses. The positivist assumption works well for testing performance within information technology.

In this project, we use a hybrid of two methods, namely descriptive research and applied research. The descriptive research method, also called statistical research, studies a phenomenon and describes its characteristics but not its causes. It can use either quantitative or qualitative methods [2]. We focus on finding facts in already existing data about the effect of several dimensions, like user device, on advertising campaign performance. Descriptive research can be used for all kinds of research or investigations in computer science that aim to describe phenomena or characteristics [18].

The second method is applied research, which involves answering specific questions or solving known and practical problems. The method examines a set of circumstances, and it often builds on existing research. In addition, applied research uses data directly from the real world and applies it to solve problems and develop practical applications, and that is the goal of this project. Applied research is used for all kinds of research or investigations, often based on basic research and with a particular application in mind.

In this project, a deductive approach [5, 11] is used to verify or falsify hypotheses; it is almost always used with quantitative methods and large data sets. Hypotheses are expressed in measurable terms, explaining what and how the metrics are to be measured. The outcome is a generalization based on the collected data, along with explanations of the results. The research strategies and designs are the guidelines for carrying out the research [18]. We use ex post facto research, which is similar to experimental research [18] but does not control or change the independent variable, since it is carried out after the data is already collected, which is the case in this project. Ex post facto means "after the fact": the method searches back in time to find plausible causal factors. The method also verifies or falsifies hypotheses and provides cause-and-effect relationships between variables [12].

1.5.1 Data Collection Methods

In this project, data collection is straightforward, as we use the historical data of many AdWords campaigns during the internship at Precis Digital. The task is automated using the AdWords API [20], which allows us to interact directly with the platform, vastly increasing the efficiency of analysing large AdWords accounts and campaigns. In addition, we use real-time data for running and evaluating online experiments.

1.5.2 Data Analysis Methods

Data analysis methods are used to analyze the collected material. Data analysis is the process of inspecting, cleaning, transforming and modelling data, and it supports decision-making and drawing conclusions [18]. In this project, the following methods are used for data analysis:

Statistics: Both descriptive and inferential statistics to analyze the collected data, infer information and evaluate the significance of the results.

Mathematics: Used for numerical methods, modelling and optimization.

Visualizations: For a deeper and better understanding of the characteristics of the collected data.

1.5.3 Quality Assurance

Quality assurance is the validation and verification of the research material. Since this is quantitative research with a deductive approach, we must apply and discuss validity, reliability and ethics [13].

Validity: In quantitative research, we must make sure that the test instruments actually are measuring what is expected to be measured [3, 13]. In this project, we assume that the data collected from Adwords are measured correctly.

Reliability: Refers to the stability of the measurements [3] and the consistency of the results across repeated tests; to ensure this, we use statistical significance tests.

Ethics: Throughout the work on this project, we maintain the privacy of the clients of Precis Digital. The collected data is treated with confidentiality and presented in the thesis after anonymization.


1.6 Thesis Outline

In chapter 2, we discuss the background of digital advertising, the literature review, and the context of this project as well as how it relates to previous work. Chapter 3 presents the metric that we optimize for, as well as the model for selecting the campaigns and dimension groups for bid adjustments. In chapter 4, we propose several techniques for adjustment calculations and discuss the problem of adjustment interactions. In addition, we discuss an evaluation procedure that is used in selecting, validating and evaluating the different methods. Chapter 5 presents the design of online experiments to test our model on real data, and in chapter 6, we analyze the results of the experiments. Chapter 7 contains the discussion, and in chapter 8, we present the conclusions and future work.


Chapter 2

Background and Literature Review

Online search is now ubiquitous, and internet search engines such as Google and Bing let companies and individuals advertise based on search queries posed by users [29]. In this chapter we discuss the context of our project through a literature study and by presenting an overview of the related work and the knowledge needed in order to build upon it. This chapter rests heavily on [26] and [35].

2.0.1 Early Models

Internet advertising started almost simultaneously with the inception of the Internet. The advertisements of the early days were only banner ads, graphical units shown on web pages. Popular websites charged a certain amount of money for every thousand impressions of the ad, the so-called CPM rate. This model of paying per impression was inspired by TV and magazine advertising, which are priced based on the circulation of a magazine or the number of viewers of a TV show. That was a good model to start with, but it did not make use of many features that the web offers and that are not available in TV and magazine advertising. Concretely, these advertisements were untargeted, so the same advertisement was shown to everyone who came to a website; thus, they can be good for branding or creating awareness, but they perform poorly in targeting the specific users who need the advertiser's product or service.

This model shifted to demographic targeting, which makes use of demographic information about the types of users who are likely to see a given web page. Although the model that uses demographic data is better than the initial one, in general these advertisements still do not perform well because they are broadly targeted.

2.0.2 Advertisers Metric of Success

One way advertisers measure the performance of their advertising efforts is by looking at how many users who viewed the ad actually clicked on it; in other words, they look at the click-through rate (CTR), which is the ratio between the number of clicks the ad receives and the number of impressions of that ad. It is important to point out that the impressions are what advertisers pay for, and the clicks are what they want. As a result, they measure the return on investment by looking at the ratio of clicks to impressions, and the untargeted early banner ads had very low click-through rates and a very low return on investment for advertisers.


2.0.3 Performance-Based Advertising

The model of online advertising changed with the development of a new form of advertising called performance-based advertising, introduced by Overture, a search engine that was later acquired by Yahoo!. Overture's innovation was to allow advertisers to bid on search keywords; when a user searched for such a keyword, the ad of the highest bidder would be shown, followed by the actual search results. Another important innovation that Overture introduced was charging advertisers only if the ad was clicked. In other words, advertisers do not pay for the impression but only for the click, and so this is called performance-based or cost-per-click advertising, to distinguish it from the impression-based or CPM advertising that preceded it.

There are many challenges and research questions around performance-based advertising, and they fall into two categories: the first addresses the challenges from the perspective of the search engine, and the second comes from the advertiser perspective [29].

2.1 Search Engine Perspective

The research presented in this thesis primarily addresses advertising from the perspective of advertisers. However, it is also important to consider the search engine point of view, since the work of advertisers stems from search engines. For this reason, the challenges faced in advertising are presented from a search engine point of view in this section.

Advertisers favour Overture's model of paying only for clicks over paying for impressions. Google, another search engine that was just getting started at roughly the same time, adopted a very similar model to Overture's around 2002 and introduced their advertising platform AdWords. Google introduced some important changes to the Overture model in terms of how advertisers bid and what ads get shown. AdWords receives a set of keywords from each advertiser with their respective bids. It also receives a stream of search queries from the users, and the challenge is to select and show a few ads from the many possible ads eligible to be shown for the same search query. It is worth mentioning that the goal of a search engine is to maximize its revenues by showing the appropriate set of ads, and it needs an online algorithm: the search engine can only see one query at a time, and it must make an irrevocable decision of which ad to show. It cannot go back and change the advertisements it showed in the past, nor does it know what queries are going to come in the future. Overture used a naive heuristic of sorting advertisers by bid, so the ad with the highest bid took position one. It turns out that this is not the best way, because ads behave very differently in terms of how often they get clicked. Therefore, placing the ad of the highest bidder in position one is not the optimal algorithm for maximizing the search engine revenues. The contribution of Google AdWords was introducing the use of the average CTR of each ad in computing the expected revenue for each advertiser, which is calculated by multiplying the bid and the ad's CTR. In other words, it sorted advertisers by expected revenue rather than by bids.
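The difference between the two rankings can be sketched with invented numbers (the advertiser names, bids and CTRs below are made up for illustration):

```python
advertisers = [
    {"name": "A", "bid": 5.0, "ctr": 0.01},
    {"name": "B", "bid": 3.0, "ctr": 0.04},
    {"name": "C", "bid": 4.0, "ctr": 0.02},
]

# Overture's heuristic: sort by bid alone.
by_bid = sorted(advertisers, key=lambda a: a["bid"], reverse=True)

# AdWords: sort by expected revenue per impression, bid * CTR.
by_expected_revenue = sorted(advertisers, key=lambda a: a["bid"] * a["ctr"], reverse=True)

print([a["name"] for a in by_bid])               # ['A', 'C', 'B']
print([a["name"] for a in by_expected_revenue])  # ['B', 'C', 'A']
```

The lowest bidder B tops the expected-revenue ranking because its high CTR makes its bid × CTR product (0.12) the largest, which is exactly why sorting by bid alone does not maximize revenue.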

If the CTR of each ad is known and advertisers have unlimited budgets, then the simple algorithm of sorting advertisers by expected revenue is actually optimal. In practice, however, advertisers do not have unlimited budgets and the CTR of an ad is unknown. The balance algorithm deals with the fact that advertisers have limited budgets. For estimating the click-through rate of an ad, one can think of a very simple solution: show the ad a large number of times and calculate the CTR historically. Although this is the right way to do it, there are two challenges in this approach. The first is that the click-through rate is position-dependent: search engines may show more than one ad for a given query, and an ad shown in the first position generally gets more clicks than an ad shown in the second position. Therefore, we have to measure the click-through rate of an ad for each position, and not just for one position. The second problem is the explore vs. exploit trade-off, the dilemma of whether to keep showing ads whose CTRs we already know or to show new ads whose CTRs we still need to learn. In other words, should we just exploit the known information of an ad, or should we explore new ads that can have a better or worse CTR? This problem is important and heavily studied [27].

It is worth noting that the current version of AdWords takes more parameters into consideration while ranking the advertisers, such as the relevance of the keywords to the search query and the landing page experience, as a search engine has to show relevant and useful ads in order for the users to keep using its services.

2.1.1 Google Adwords

Since we use Google AdWords in this project, it is very important to know how ad auctions work on this online advertising platform, since there are many factors that affect the visibility of each ad that participates in an auction, as well as the cost paid for each click on the ad. Google AdWords allows advertisers to show their ads on Google search results in response to search terms, which are the combinations of words typed by users on the Google page when they search for products or services.

All search ads on Google have the main structure of a header, two lines of description, the word "Ad" in green to make users aware that this is an ad, and a link to a web page, as shown in figure 2.1. There are additional, optional components that the advertiser may add, like site links, a phone number, a mobile application install link, etc.

Figure 2.1: An example of an ad. Source: http://www.google.se, search terms: precis digital, date: May 30, 2017.

2.1.2 Search Advertising Terms

Throughout this thesis, it is important to know the most commonly used definitions in order to understand the topics discussed in this project [36].

• Cost Per Click (CPC): The cost that the advertiser pays when an ad is clicked.

• Maximum CPC: The maximum cost the advertiser is willing to pay when the ad is clicked.

• Average CPC: The average cost that the advertiser pays per click. It is the ratio of the total costs to the total number of clicks.

• Cost: The total amount of money the advertiser has spent on clicks.

• Conversion Rate (CR): The ratio of conversions to ad clicks.

• Cost Per Acquisition (CPA): The average cost that the advertiser pays per conversion. It is the ratio of the total costs to the total number of conversions.

• Inverse of Cost Per Acquisition (ICPA): The ratio of the total number of conversions to the total costs.
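The ratio metrics above all follow directly from four campaign totals; a minimal sketch (the function name and example numbers are ours, chosen only to illustrate the definitions):

```python
def campaign_metrics(impressions, clicks, conversions, cost):
    """Compute the standard ratio metrics from campaign totals."""
    return {
        "ctr": clicks / impressions,    # click-through rate
        "avg_cpc": cost / clicks,       # average cost per click
        "cr": conversions / clicks,     # conversion rate
        "cpa": cost / conversions,      # cost per acquisition
        "icpa": conversions / cost,     # inverse CPA: conversions per unit cost
    }

m = campaign_metrics(impressions=10_000, clicks=200, conversions=10, cost=500.0)
# ctr=0.02, avg_cpc=2.5, cr=0.05, cpa=50.0, icpa=0.02
```

Note that ICPA is simply 1/CPA, so maximizing ICPA and minimizing CPA select the same campaigns.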


Working on increasing conversions entails optimizing for impressions and also for clicks. The conversion funnel consists of three phases, impressions, clicks and conversions, and each phase of the funnel is narrower than the one before it. Consequently, the process of increasing the number of conversions entails increasing the number of relevant impressions, which increases the number of relevant clicks, which in turn leads to more conversions.

2.1.3 Ad Auctions

In order to place an ad on a SERP, advertisers enter an auction that is carried out among all advertisers who bid for ads with keywords that match the user's query. There are many types of auctions, but the standard one, heavily used in the literature, is the generalized second-price auction (GSP) [25]. In GSP auctions, each advertiser pays a price equal to the bid value of the advertiser below them in the ranking.

Since we work with the AdWords platform in this project, it is important to describe how the Google auction (a variant of GSP) works. The motivation of the auction is to reconcile the interests of three parties: the advertiser, the user, and the search engine, namely Google. Each party has a concern or motivation for participating in an auction. The advertiser wants to show relevant ads for their products or services so that users click on them and possibly make a conversion. The users do not want to be bothered with spam or other irrelevant ads, and Google wants to generate revenues and provide a good experience for both the advertisers and the users so that they come back and use its services again in the future.

Each time a user makes a query on Google, the ads which are relevant to the search query (in terms of keyword similarity) participate in an ad auction. The auction determines whether the ad will be shown or not, and in what position in the SERP it will be shown. The first step in the auction is that Google ignores the ineligible ads, like the ones that target a different location from the user's or are disapproved. Then, Google calculates the ad rank for all the eligible ads; the rank is calculated based on a combination of the advertiser's bid and the ad quality score. Only those with a sufficiently high ad rank are shown in the SERP. The ad quality score is calculated based on the relevance of the keywords to the query text, the quality of the landing page, and the historic CTR of the ad. It is worth mentioning that an advertiser can get the top position even while bidding less than their competitors, by using highly relevant keywords and quality ads. To improve the ad position, one can increase the bids for the ads and improve the quality of the ads as well as the landing page experience [32]. However, the advertiser can see fluctuations in the position that they get in different auctions for the same keyword, as the competition can vary from one auction to another.

Now we know how Google runs an ad auction and ranks the advertisers, but what is the actual price that the advertiser pays when the ad is clicked? The answer is: just enough to beat the competition. In other words, each advertiser bids the maximum amount they are willing to pay, but the actual cost per click is determined by dividing the ad rank of the competitor below them by their own ad quality score, plus a small increment such as $0.01, just to beat the competition [33].
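This pricing rule can be sketched as follows. The bids and quality scores below are invented for illustration, and real Adwords pricing involves additional factors beyond this simplified rank-and-price step:

```python
def actual_cpc(next_ad_rank, own_quality_score, increment=0.01):
    """Price just enough to beat the competitor ranked directly below:
    their ad rank divided by our quality score, plus a small increment."""
    return next_ad_rank / own_quality_score + increment

# Hypothetical advertisers: (max bid in dollars, quality score).
advertisers = {"A": (4.00, 8.0), "B": (6.00, 4.0), "C": (8.00, 2.0)}

# Ad rank = bid * quality score; sort descending to get the SERP order.
ranked = sorted(advertisers.items(),
                key=lambda kv: kv[1][0] * kv[1][1], reverse=True)

# Each advertiser pays based on the ad rank of the one below them.
for (name, (bid, qs)), (_, (nbid, nqs)) in zip(ranked, ranked[1:]):
    print(name, round(actual_cpc(nbid * nqs, qs), 2))
```

Note that advertiser A wins the top slot (ad rank 32) yet pays only $3.01 per click, below both competitors' bids, because of its higher quality score.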

2.2 Advertiser Perspective

The challenge from an advertiser's point of view is to understand and interact with the auction mechanism. The advertiser determines a set of keywords of interest and then creates ads, sets bids for each keyword, and provides a total daily budget.

One of the challenges that the advertiser faces is the choice of keywords, and this problem is related to the domain knowledge of the advertiser, user behavior, and different strategic considerations. Search engines provide advertisers with information about the query traffic, which can be useful for optimizing the keyword choices. The choice of keywords is addressed in other papers [24]. Another major challenge is determining the bids for each keyword. This problem is heavily studied, for example in [29], in which the authors propose uniform bidding as a means for bid optimization in the presence of budget constraints in online ad auctions. However, to provide context for our work, we present a method proposed in [34].

In order to increase the conversions, the relative impressions and clicks have to increase. Indeed, it is possible to increase the clicks and still get fewer conversions if, for example, the campaigns, ad groups and keywords are not structured correctly. However, the model assumes that the account is already optimized, so that getting more clicks necessarily results in more conversions, as the clicks are relevant. Assuming the ad quality score is constant, changing the bid values affects the position of the ads, which affects the number of clicks received, which in turn affects the number of conversions. The model is formulated as a constrained integer programming problem. The constraints are the maximum CPC as well as the overall budget B the advertiser is willing to spend.

There are variables that need to be estimated before using the model, namely the average CTR and average CPC for every keyword (or keyword combination) at every position. The model suggests investing some money to experiment with the keywords in order to get these values, or using keyword performance predictions from Google Adwords.

Let us define a set K that contains m keywords, and a set P that contains n available positions. Then, for all i = 1, ..., m and all j = 1, ..., n:

\[ Clicks_{ij} = CTR_{ij} \cdot Impressions_i \]

\[ Cost_{ij} = Clicks_{ij} \cdot CPC_{ij} \]

We define x_{ij} as a decision variable that takes the value 0 or 1; it represents whether a keyword (or its combinations) is assigned to a certain position or not.

Maximize:
\[ \sum_{i \in K} \sum_{j \in P} x_{ij} \cdot clicks_{ij} \]

Subject to:
\[ \sum_{i \in K} \sum_{j \in P} x_{ij} \cdot cost_{ij} \le B \]

\[ \frac{\sum_{i \in K} \sum_{j \in P} x_{ij} \cdot CPC_{ij}}{\sum_{i \in K} \sum_{j \in P} x_{ij} \cdot clicks_{ij}} \le \text{maximum CPC} \]

\[ \sum_{j \in P} x_{ij} \le 1 \quad \forall i = 1, ..., m \]

\[ x_{ij} \in \{0, 1\} \]
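For intuition, the assignment problem can be brute-forced on a toy instance. All numbers below are invented, the CPC constraint is read here as a cap on the average CPC (total cost over total clicks), and a realistic number of keywords and positions would require an ILP solver rather than enumeration:

```python
from itertools import product

# Invented estimates per (keyword, position): expected clicks and cost.
clicks = {(0, 0): 100, (0, 1): 60, (1, 0): 80, (1, 1): 50}
cost = {(0, 0): 300, (0, 1): 120, (1, 0): 260, (1, 1): 100}
B, max_cpc = 400, 3.0  # budget and maximum average CPC

best, best_clicks = None, -1
# Each keyword is assigned to one position or left out (None),
# which enforces sum_j x_ij <= 1 for every keyword i.
for assign in product([None, 0, 1], repeat=2):
    chosen = [(i, j) for i, j in enumerate(assign) if j is not None]
    tot_clicks = sum(clicks[c] for c in chosen)
    tot_cost = sum(cost[c] for c in chosen)
    if tot_cost > B:
        continue  # budget constraint violated
    if chosen and tot_cost / tot_clicks > max_cpc:
        continue  # average-CPC constraint violated
    if tot_clicks > best_clicks:
        best, best_clicks = chosen, tot_clicks

print(best, best_clicks)
```

On this instance the best feasible assignment puts keyword 0 in position 0 and keyword 1 in position 1, for 150 clicks at exactly the 400 budget.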

It is important to note that this algorithm does not bid uniformly on keywords, as it uses the maximum CPC specified by the advertiser as a limit for bids on each keyword or combination of keywords. In addition, the existing algorithms that bid on keywords do not make use of the different features of the search query that could help in determining the bid values in order to increase conversions. In other words, these algorithms assume that a click that comes from a tablet on a Tuesday has the same value as a click that comes from a mobile on a Friday. This is the limitation that we address in this thesis using bid adjustments.


2.3 Bid Adjustments

The focus of this thesis is adjusting the already calculated bids for the selected keywords using the unique set of dimensions of each individual search query, such as geographic location, time of day, and device. Bid adjustments allow advertisers to show their ads more or less frequently based on where, when, and how people search. For instance, a click can sometimes be worth more if it comes from a mobile, at a certain time of day, and from a specific location. The advertiser can adjust their bids by percentages anywhere between -90% and 900% for each of these dimensions, based on how the click is valued. The bidding dimensions supported by the major search engines include weekday, time of day, geographic location, device and keyword targeting. In this project, we define the term "groups" as the possible values that a given "dimension" can take. For example, the groups for the device dimension are desktops, mobiles and tablets, and the groups for the weekday dimension are the seven days of the week.

There are two main limitations of bid adjustments that we need to point out. First, bid adjustments are not set at the keyword level, although bid values are set at that level. To be concrete, we cannot increase the bids of a specific keyword by 10% for a certain location. Instead, we can only increase the bids for the whole campaign that includes this keyword. Hence, in our work, we analyze the performance of a campaign in order to calculate, evaluate and test the bid adjustments. In addition, bid adjustments do not allow specifying valuations on combinations of features. For example, if an advertiser found that mobile searches were 30% more valuable than desktop searches in Gothenburg, but only 15% more valuable in Stockholm, then this would not be expressible in the language of bid adjustments. Such limitations are inevitable, as the space of possible combinations of these features is very large.
This problem of multiplicative bidding is addressed in [28], where the authors formulate it as a knapsack problem. In chapter 4 we discuss another formulation of the problem in a different setting. It is worth noting that the source of these bid adjustment limitations is Google Adwords.


Chapter 3

Campaigns and Dimensions Selection Model

In this chapter, we discuss our model for choosing the campaigns and dimension groups for bid adjustments. The model relies on the law of diminishing returns and statistical modeling.

3.1 Motivation

There are two main motivations for selecting the campaigns and dimension groups for bid adjustments. The first is that for many Adwords campaigns, we can see a difference between dimension group performances that might actually be due to chance and not statistically significant. In other words, the bid adjustments would be calculated based on noise, which worsens the current performance and may result in a loss on advertising investments. To avoid that, we compare the groups, and only if there is a significant difference between them do we modify or adjust the already existing bids to account for these value differences. It is not a surprise to see the recommendation in Adwords guides to use bid adjustments if one dimension is performing significantly better or worse than another [37]. The second motivation is that in many cases there is a statistical difference between the dimension groups; however, if the groups that perform well are already at their best performance, then bidding more would not help and can actually worsen the overall campaign performance, as we will see in section 3.3.

3.2 Metric of Optimization

Before getting into the details of how to select the campaigns and calculate the bid adjustments, we have to select a metric with which to compare the dimension group performances. A metric can be, for example, CTR, CR, ROAS or CPA. In the setting of this thesis, and as discussed in section 1.2, we want to significantly increase the conversions given the same costs, so the metric of comparison should be a ratio between value and cost. The reason is that we might get many conversions on, for instance, a specific day, while the cost for these conversions is also high (for example due to competition). In this case, we might decrease the bids for this day and increase them on another day that has cheap conversions. Although we want to increase conversions, it is even more desirable to increase the total conversion value given the same costs. Consequently, the two metrics of interest are CPA and ROAS. For some businesses, it does not matter whether CPA or ROAS is used, since the conversion values have low variance. For other businesses, ROAS is trickier to use, since the conversion value can vary a lot between conversions, which makes it more vulnerable to outliers. In this thesis, we use CPA as it is more robust to outliers than ROAS.


A practical problem with CPA is that for a group that does not have any conversion, it requires division by zero. To avoid that, we use the metric ICPA, the number of conversions divided by the cost, which we want to maximize. Note that the number of conversions reported by Adwords is assumed to be clean and final, although the actual number of conversions can differ from the reported one due to several potential situations, including but not limited to order returns, order changes via phone, and cancellations.

3.3 Diminishing Returns Law

One of the fundamental principles in economics is the law of diminishing returns. In our setting, depicted in figure 3.1, it is the decrease in the marginal or incremental number of conversions as the advertising costs are incrementally increased while other factors that affect the conversions stay constant. There can be many reasons for the diminishing returns, depending on the industry. In the case of search advertising, one reason can be that the average position of our ads in the search engine results page is already 1 (the top position), and we may already get an impression for every auction that we participate in (we can know that from a metric called impression share). In that case, increasing the costs would not be useful, as there is nothing more to gain by doing so.

Figure 3.1: Example of theoretical diminishing return curves for three devices. We can see that the incremental conversions decrease as we incrementally increase the cost.

In Google Adwords, we can know the average position of our ads in a specific campaign as well as the average impression share lost due to rank. If the average position is less than 1.5 but the lost impression share is more than 20%, we assume that there is room for improvement and that bid adjustments would be useful. On the other hand, if the average position of a campaign is already in the top positions (less than 1.5) and the lost impression share due to rank is small (less than 20%), we avoid adjusting these campaigns, as there would be diminishing returns. There is a caveat here: the metrics of average position and lost impression share returned from Adwords are not reliable, or in other words, they are not calculated in the way we want. For concreteness, say we have a campaign that includes three keywords with average positions 1, 1 and 3. In Adwords, the average position calculated for the campaign is just the unweighted average (about 1.67), regardless of the traffic of each keyword. As a result, if 99% of a campaign's traffic and conversions is captured by just one keyword with average position 1, the campaign average position is still about 1.67, although it should be almost 1.

To avoid that, we could calculate the actual average position of a campaign by weighting each keyword's average position by the number of impressions that the keyword has. However, we believe it is better to weight the average position (and also the impression share lost due to rank) by the number of conversions of each keyword, as in equation 3.1, in order to capture the case of two keywords, one with more impressions but fewer conversions and another with fewer impressions but more conversions.

\[ campaign\_avg\_position = \frac{\sum_{k \in keywords} position_k \cdot conv_k}{\sum_{k \in keywords} conv_k} \tag{3.1} \]

Although in this project we weight the metrics by the keywords' numbers of conversions, it is important to point out that if a campaign has a small number of conversions, then weighting by conversions can easily be affected by outliers. In that case it is better to weight the average position using a more robust metric, such as the number of clicks for each keyword.
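The weighting in equation 3.1 can be sketched as follows, with invented keyword data illustrating the fallback to click weighting:

```python
def weighted_avg_position(keywords, weight_key="conversions"):
    """Campaign average position, weighted by a chosen per-keyword metric."""
    total = sum(k[weight_key] for k in keywords)
    return sum(k["position"] * k[weight_key] for k in keywords) / total

# Invented keyword stats: one keyword dominates the conversions.
keywords = [
    {"position": 1.0, "conversions": 99, "clicks": 900},
    {"position": 3.0, "conversions": 1, "clicks": 100},
]

# Conversion-weighted: dominated by the keyword that actually converts.
print(weighted_avg_position(keywords))                       # 1.02
# For low-conversion campaigns, weight by clicks instead.
print(weighted_avg_position(keywords, weight_key="clicks"))  # 1.2
```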

3.4 Hypothesis Testing

Hypothesis testing is often used to determine whether observations hold in general, i.e., whether the results obtained from a sample can be generalized to the whole population. Say that for the devices dimension, we find that the ICPA of mobiles is better than the ICPA of desktops and tablets. Hypothesis testing determines whether this is true in general, so that we can adjust the bids based on it. In hypothesis testing, the null hypothesis is the assumption that there is no significant difference between the groups of interest, i.e., no difference between the ICPAs of the three devices. The alternative hypothesis is that there is a significant difference. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null hypothesis can be proven false, given the data used in the test.

Null hypothesis   Accepted           Rejected
True              Correct decision   Type I error
False             Type II error      Correct decision

Table 3.1: Statistical errors related to the null hypothesis.

In hypothesis testing, there are two types of errors: a Type I error, in which the null hypothesis is falsely rejected, giving a false positive, and a Type II error, in which the null hypothesis fails to be rejected despite an actual difference between the populations, giving a false negative. All statistical hypothesis tests have a probability of making Type I and Type II errors. A test's probability of making a Type I error is denoted by α, and its probability of making a Type II error is denoted by β. These error types are traded off against each other, as the effort to reduce one type of error generally increases the other. We have a number of choices related to the null hypothesis, as in table 3.1. The null hypothesis can be either true or false, and we can choose to accept or reject it. This results in four potential decisions, two of which are correct and two of which are incorrect.

Condition       Greek symbol   Meaning          Controlled using
Type I error    α              False positive   Significance level
Type II error   β              False negative   Statistical power

Table 3.2: Interpretation and control of statistical errors.

Indeed, we prefer to commit neither Type I nor Type II errors, but it is important to point out that the p-value is directly related only to the Type I error. Concretely, the significance level α is the probability of committing a Type I error that we accept in a given test. As a result, when we state that the results are significant (p < 0.05), we are saying that we are potentially committing a Type I error less than 5% of the time. In order to decrease the probability of committing a Type II error, we must design our test with sufficient statistical power, as in table 3.2.


3.5 Analysis of Variance (ANOVA)

For devices, we have the ICPA of three groups, namely desktops, mobiles and tablets, as shown in figure 3.2. Our goal is to determine whether the average daily ICPA of any group is significantly different from another group, so that we can adjust the bids for these groups, i.e., increase the bids for a group with a significantly higher ICPA and decrease them otherwise.

Figure 3.2: The daily device ICPA for one Adwords campaign.

One-way ANOVA is a technique that is useful in this case, as it is used to compare the means of three or more groups. It can be used only for numerical data. It tests the null hypothesis that states that the average ICPAs for each group are equal.

\[ H_0: ICPA_{desktop\text{-}avg} = ICPA_{tablet\text{-}avg} = ICPA_{mobile\text{-}avg} \]

To do this, we calculate a test statistic called the F-statistic, which is the ratio of the mean square between the groups to the mean square within the groups.

\[ F = \frac{\text{Variance between groups}}{\text{Variance within groups}} \]

If the group means are drawn from populations with the same mean values, the variance between the group means should be lower than the variance of the samples [38]. A higher ratio therefore implies that the samples were drawn from populations with different mean values. The results of a one-way ANOVA can be considered reliable as long as the following assumptions are met [39]:

• Independence of observations
• Normality of the residuals
• Homogeneity of variance
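A minimal pure-Python sketch of the F-statistic on invented daily ICPA samples follows; a real analysis would use a library routine such as scipy.stats.f_oneway, which also returns the p-value:

```python
def f_statistic(groups):
    """One-way ANOVA F: mean square between groups over mean square within."""
    n = sum(len(g) for g in groups)   # total observations
    k = len(groups)                   # number of groups
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented daily ICPA values for desktops, mobiles and tablets.
desktop, mobile, tablet = [1, 2, 3], [2, 3, 4], [5, 6, 7]
print(f_statistic([desktop, mobile, tablet]))  # ≈ 13.0
```

A large F (here about 13 with 2 and 6 degrees of freedom) suggests the group means differ more than the within-group noise can explain.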

3.5.1 Assumptions Validation

Independence of observations: This assumption simplifies the statistical analysis. In probability theory, two random variables are statistically independent if the occurrence of one does not affect the probability of occurrence of the other. In our analysis, each observation is the ICPA of a certain group on a certain day, and we assume that each observation is independent of the others.


Normality assumption: The second assumption is the normality of the residuals. The first way to check for normality is using Q−Q plots ("Q" stands for quantile). A Q−Q plot is a graphical method for comparing two probability distributions by plotting their quantiles against each other. For example, if we have n data points and we want to determine whether they can be assumed to be sampled from a certain distribution, we sort these points and plot them against the appropriate quantiles of the distribution of interest (the normal distribution in this case) [14]. If the plotted points lie along a line, then we can assume that they were sampled from this distribution, as in figure 3.3.

Figure 3.3: Q−Q plot of normally distributed residuals. This visualization is one method that can be used to verify the normality of residuals. However, it is difficult to automate and relies on the opinion of the viewer, i.e., it is a subjective method.

Although the Q−Q plot is a powerful method to check for normality, it has two disadvantages. The first is that it is hard to automate: it is difficult to write an algorithm that examines the plot and decides whether the data can be assumed to be normal. Moreover, the decision is subjective, since two analysts can examine the same plot and one may decide that the data can be assumed to be normal while the other does not. To overcome these problems, one can rely on the Shapiro-Wilk test, which is a test of normality [15]. Its null hypothesis is that the population is normally distributed. Thus, if the p-value is less than the chosen significance level, the null hypothesis is rejected and there is evidence that the data are not from a normally distributed population. On the contrary, if the p-value is greater than α, it is not rejected.

Homogeneity of variance assumption: This assumption states that all comparison groups have the same residual variance. To test for homogeneity of variance, we can rely on Bartlett's test; however, it is sensitive to departures from normality [19].

Levene's test is an alternative to Bartlett's test that is less sensitive to non-normality. It assesses the null hypothesis of homogeneity of variance. If the resulting p-value of the test is less than some significance level, the obtained differences in sample variances are unlikely to have occurred through random sampling from a population with equal variances. Thus, the null hypothesis of equal variances is rejected and it is concluded that there is a difference between the variances in the population.
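Levene's statistic can be sketched by running the one-way ANOVA machinery on absolute deviations of each observation from its group mean (mean-centering is the original form; the median-centered Brown-Forsythe variant is more robust). The data below are invented, and in practice one would use a library routine such as scipy.stats.levene:

```python
def levene_w(groups):
    """Levene's W: the ANOVA F-statistic computed on absolute deviations
    of each observation from its group mean."""
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    n = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in z)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in z)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Two invented groups with clearly different spreads.
print(levene_w([[1, 2, 3], [10, 20, 30]]))
```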


3.5.2 Post-hoc Test

ANOVA determines whether there is a difference between the groups, but it does not detect the exact groups that differ from each other. That is why it is important to follow ANOVA with a post-hoc test that does this task. There are two main ways to detect the groups with a significant difference. The first is multiple t-tests between the group combinations; in this case our Type I error increases, as every additional test has its own Type I error, and these aggregate over multiple tests. One solution is the Bonferroni correction, which uses a significance level of the original alpha divided by the number of tests [22].

Another way is to use the Tukey method, which is a one-step multiple comparison method. It finds means that are significantly different from each other. It compares all possible pairs of means, and is based on a studentized range distribution [23]. In this project, we use the Tukey method because it takes just one step and is thus faster in execution, especially for the weekday dimension, where the Bonferroni correction method would need 21 individual tests, since there are C(7,2) = 21 weekday pairs.
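As a quick sketch of why the weekday dimension is the expensive case for pairwise testing (the hours-of-day count is illustrative only):

```python
from math import comb

alpha = 0.05
# Number of groups per dimension.
dimensions = {"devices": 3, "weekdays": 7, "hours_of_day": 24}

for name, k in dimensions.items():
    pairs = comb(k, 2)                  # pairwise t-tests needed
    print(name, pairs, alpha / pairs)   # Bonferroni per-test alpha
```

With seven weekday groups there are 21 pairs, so each individual t-test would have to be run at a significance level of 0.05 / 21 ≈ 0.0024.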

3.6 Chi-Squared Test for Independence

3.6.1 Motivation

ANOVA was used to test the null hypothesis that the groups' daily ICPA values have equal means. However, when working on the calculation of the bid adjustments (discussed in the next chapter), we found that it is better to use the total ICPA of each group to represent its performance instead of the daily ICPA. As a result, ANOVA is no longer relevant, because we do not have group averages whose equality we want to test. Instead, we have total costs and total conversions for every group, and we want to test whether the total ICPA values depend on the dimension groups; that is the reason for the shift to the Chi-squared test of independence. This section rests on [40].

There are two types of Chi-squared tests, namely the goodness-of-fit test and the test of independence. They are closely related, but we are interested in the Chi-squared test of independence, because we want to determine whether there are dimension groups that affect the ICPA ratio between costs and conversions. The null hypothesis of the Chi-squared test of independence in our case is that the dimension of interest and the ICPA are independent. The alternative hypothesis is that they are not independent; in other words, that a specific dimension has an effect on costs and conversions.

3.6.2 Assumptions

There are three main assumptions about the data that we have to meet in order to get reliable results from the Chi-squared test. First, the sampling of the data should be random. This assumption is already met, since we are not sampling the data; in other words, we are using all the conversions and costs during the time period of interest. Second, the variables that we are testing should be counts or frequencies that are mutually exclusive and have a total probability equal to one. In our case, any conversion and its cost is attributed to exactly one dimension group. For example, we cannot have a conversion that is counted for both Mondays and Thursdays at the same time. Third, the expected frequency for each variable is at least 5, and in this project we make sure that our data also meets this assumption.

3.6.3 Formulation

The first step in using the test is to build the contingency tables for both observed and expected values; we then calculate the deviations between the expected and observed values, scaled by the expected values. The chi-square statistic, calculated as below, is one measure of these deviations:

\[ \chi^2 = \sum_{i \in cells} \frac{(observed_i - expected_i)^2}{expected_i} \]

The chi-square statistic is based on the Chi-squared distribution, which is a non-negative and asymmetric distribution. It is skewed to the right, and forms a family of distributions based on the number of degrees of freedom, as shown in figure 3.4. The statistic helps us answer the question of whether what we are observing is random or unlikely to be random. The Chi-squared table is then used to calculate the p-value using the statistic and the degrees of freedom, which in our case is the number of dimension groups minus one. The p-value gives the probability that this deviation is due to chance. If that probability is below the significance level, we deduce that it cannot be due to chance, and there must be an effect.

Figure 3.4: The theoretical Chi-squared distribution for different degrees of freedom. Figure is adapted from [40].

If we find statistically significant dependence, then we need to identify which groups contributed the most to this significance. To do that, we identify the cells with the largest residuals [17]. A residual is the difference between the observed and expected values for a cell. The larger the residual, the greater the contribution of the cell to the magnitude of the resulting Chi-squared value. As stated in [16], "a cell-by-cell comparison of observed and estimated expected frequencies helps us to better understand the nature of the evidence" and the cells with large residuals "show a greater discrepancy than we would expect if the variables were truly independent" (p. 38).


Chapter 4

Adjustments Calculations and Evaluation

This chapter presents the core model for calculating the adjustments. Different methods are investigated and the reasoning behind each method is discussed. In addition, we propose an evaluation procedure to validate these methods and select the best one for a real-time experiment. We present our cost weighting technique for controlling the costs after setting the adjustments. Finally, we discuss and propose a solution for the adjustment interactions problem. It is worth mentioning that, for the sake of simplicity, the examples and visualizations are for devices; however, the concepts apply to the other dimensions as well.

4.1 Marginal ICPA Method

The marginal ICPA method relies on the law of diminishing returns discussed in section 3.3. It increases the bid for a dimension group that has a marginal ICPA larger than the dimension average. Conversely, it decreases the bid when the marginal is lower than the average. These adjustments intend to equalize the marginal ICPA across all groups. Mathematically, the marginal is the slope of the diminishing return curve at a certain cost. However, if we plot the daily conversions for a certain campaign against the daily costs, as in figure 4.1, we hardly see the smooth theoretical diminishing return curve introduced before. To overcome this problem, we use linear regression to fit a line to each device's daily cost and conversion observations, and we assume that the slope of this line can represent the marginal ICPA. The calculation of the bid adjustment of each group is then straightforward, as it is the ratio between the marginal of a specific group and the average marginal of all groups.

\[ bid\_adjustment_{group} = \frac{marginal\ ICPA_{group}}{marginal\ ICPA_{average}} \]

Note that the average here does not mean calculating the marginal of each device and then taking the average; rather, we fit a line using linear regression to the observations of all devices, and the slope of this line is the average marginal ICPA.
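A minimal sketch of this method with invented daily observations (real data would be far noisier): we fit an ordinary least-squares slope per device and one pooled slope for all devices, then take their ratio.

```python
def ols_slope(xs, ys):
    """Slope of an ordinary least-squares line (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Invented daily (costs, conversions) observations per device.
desktop = ([1, 2, 3], [2, 4, 6])   # marginal ICPA = 2 conversions/unit cost
mobile = ([1, 2, 3], [1, 2, 3])    # marginal ICPA = 1

# Pooled fit over all devices (not the mean of the per-device slopes).
all_costs = desktop[0] + mobile[0]
all_convs = desktop[1] + mobile[1]
avg_marginal = ols_slope(all_costs, all_convs)

for name, (c, v) in [("desktop", desktop), ("mobile", mobile)]:
    print(name, ols_slope(c, v) / avg_marginal)  # bid adjustment multiplier
```

Here the pooled slope is 1.5, so desktops get a multiplier of about 1.33 (bid up roughly 33%) and mobiles about 0.67 (bid down roughly 33%).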


Figure 4.1: Marginal ICPAs for the three devices. For the duration of four weeks, each point in this graph represents the conversions and costs for a specific device in one day.

Using our evaluation procedure, we get poor results with the marginal ICPA method. The reason is the highly dynamic market of search advertising, due to several factors such as competition changes, daily traffic changes and seasonality. It is important to point out that the campaign performance is the aggregate performance of all the keywords in the campaign; consequently, these factors have different effects on every keyword, and thus we get unstable values of the marginal. That means these marginal ICPA values are highly dependent on the time interval used for the calculations. For instance, the bid adjustment calculated for desktops in the first two weeks of a month can be totally different from the adjustment calculated using the last two weeks of the same month.

4.2 Constrained Linear Regression Method

The constrained linear regression method is based on the fact that with zero costs, the number of conversions is zero. In the previous method, we calculated the adjustments based on the daily observations of costs and conversions, and we discussed why these observations are not robust enough to base our adjustments on: every observation can vary for the same cost and the same device. For instance, we can pay 2000 SEK in one day and receive x conversions, and on another day receive 2x (due to market changes). In contrast, for zero costs, we know that the total conversion value is zero. As a result, we base our adjustments on fitting a line that takes this fact into consideration, and a constrained least-squares regression is performed in order to get a line that minimizes the error and passes through the origin, as in figure 4.2.


Figure 4.2: Constrained regression through the origin for the three devices. For the duration of four weeks, each point in this graph represents the conversions and costs for a specific device in one day.

The fitted line no longer represents the marginal ICPA; rather, it represents an average slope of the diminishing return curve. In this method, the adjustments are calculated in a way similar to the previous one, but using the slope of the constrained fitted line instead of the marginal ICPA. If N is the set of observations and β is the slope of the fitted line, the model is:

\[ conv_i = \beta \cdot cost_i \]

We want to minimize

\[ S = \sum_{i \in N} (conv_i - \beta \cdot cost_i)^2 \]

\[ \frac{\partial S}{\partial \beta} = - \sum_{i \in N} 2 \, cost_i \cdot (conv_i - \beta \cdot cost_i) \]

To minimize the error, we set the derivative to zero to obtain the optimal β:

\[ - \sum_{i \in N} 2 \, cost_i (conv_i - \beta \, cost_i) = 0 \]

\[ \sum_{i \in N} cost_i \cdot conv_i = \beta \sum_{i \in N} cost_i^2 \]

\[ \beta = \frac{\sum_{i \in N} cost_i \cdot conv_i}{\sum_{i \in N} cost_i^2} \]

Thus, the bid adjustment for every group is:

\[ bid\_adjustment_{group} = \frac{\beta_{group}}{\beta_{all}} \]

The adjustments calculated using this method improved dramatically compared to the marginal ICPA method, as the adjustment values are more robust to market changes due to the constraint of passing through the origin, as shown in table 4.1.
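The closed form for β can be sketched directly on invented daily observations:

```python
def beta_through_origin(costs, convs):
    """Least-squares slope of a line constrained to pass through the origin:
    beta = sum(cost_i * conv_i) / sum(cost_i^2)."""
    return (sum(c * v for c, v in zip(costs, convs))
            / sum(c * c for c in costs))

# Invented daily (cost, conversions) observations.
desktop_costs, desktop_convs = [1, 2, 3], [2, 4, 6]
all_costs, all_convs = [1, 2, 3, 1, 2, 3], [2, 4, 6, 1, 2, 3]

adjustment = (beta_through_origin(desktop_costs, desktop_convs)
              / beta_through_origin(all_costs, all_convs))
print(adjustment)  # ≈ 1.33, i.e. bid up desktops by about 33%
```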

4.3 Average of Slopes Method

In the constrained linear regression method, there is a possibility that the observations with large costs dominate the regression, since these observations can have larger error values and thus dominate the calculation of the coefficient that minimizes the squared error. One solution to this problem is to scale the error, or penalty, so that it varies with the x-values (cost), in order to obtain a roughly constant relative error. To achieve that, weighted least squares can be used instead of the ordinary unweighted least squares used in section 4.2. There are many ways to scale the penalty; one way to obtain a constant relative error is to set the weighting factor w_i for every observation to 1/cost_i². Doing so reduces the solution to using the average of all the slopes through the origin. In other words, the slope of the fitted line will be the average ICPA of the daily observations.

4.3.1 Formulation

If N is the number of daily observations, and βweighted is the slope of the fitted line, with a

weighted penalty wi then the model is:

convi = βweighted∗ costi

We want to minimize S =P

i∈Nwi(convi− βweighted∗ costi) 2

∂S ∂βweighted

= −X

i∈N

2costi∗ wi(convi− βweighted∗ costi)

To minimize the error, we equalize the derivative by zero to obtain the optimum βweighted, and

thus:

−X

i∈N

2costi∗ wi(convi− βweightedcosti) = 0

X

i∈N

wi∗ costi∗ convi= βweighted∗

X

i∈N

wi∗ cost2i

βweighted =

P

i∈Nwi∗ costi∗ convi

P

i∈Nwi∗ cost2i

If we set our weighting factor to 1/cost_i², that yields the average of slopes solution:

β_weighted = (Σ_{i∈N} conv_i / cost_i) / N

Consequently, the bid adjustment for every group is:

bid_adjustment_group = β_weighted(group) / β_weighted(all)
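A small sketch, with made-up numbers, showing that the weighted slope with w_i = 1/cost_i² coincides with the average of the daily slopes (all names are illustrative):

```python
def weighted_origin_slope(costs, convs, weights):
    """Weighted least-squares slope of conv = beta * cost through the origin:
    beta = sum(w_i * cost_i * conv_i) / sum(w_i * cost_i**2)."""
    num = sum(w * c * v for w, c, v in zip(weights, costs, convs))
    den = sum(w * c ** 2 for w, c in zip(weights, costs))
    return num / den

def average_of_slopes(costs, convs):
    """Average daily ICPA: the mean of conv_i / cost_i over all observations."""
    return sum(v / c for c, v in zip(costs, convs)) / len(costs)

# Made-up daily observations spanning very different cost scales.
costs, convs = [5.0, 50.0, 500.0], [1.0, 5.0, 25.0]
weights = [1.0 / c ** 2 for c in costs]  # penalty weight w_i = 1/cost_i^2

beta_w = weighted_origin_slope(costs, convs, weights)
# With w_i = 1/cost_i^2, the weighted slope equals the average of daily slopes.
assert abs(beta_w - average_of_slopes(costs, convs)) < 1e-12
```

Note how the day with cost 500 no longer dominates: each day contributes its slope conv_i/cost_i with equal weight.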

4.4 Total ICPA Method

The total ICPA method takes a different approach: it works with the total, or aggregate, ICPA of each dimension group in the training interval, unlike the previous methods, which use the daily ICPA observations.

total_ICPA_group = total_conversions_group / total_cost_group


and the bid adjustments are calculated using the total ICPA, which uses all conversions and costs:

bid_adjustment_group = total_ICPA_group / total_ICPA_all_groups    (4.2)

Using the evaluation procedure described in 4.6, we find that the total ICPA method yields better results than the first three methods, as shown in Table 4.1, for two reasons. First, the first three methods calculate the bid adjustment based on an estimate of the daily ICPA; in other words, they predict the daily ICPA, whereas we are interested in the total ICPA of a dimension group over the whole interval for which the adjustments are set. Second, this method is more robust against outliers, since it works on the aggregate performance during the training interval, unlike the other methods, which are affected by abnormal behaviour on one or more days in the training period.
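A minimal sketch of equation 4.2 with made-up aggregate totals (the group names and values are illustrative, not real campaign data):

```python
# Made-up aggregate training-period totals per dimension group (device here).
totals = {
    "mobile":  {"cost": 400.0, "conversions": 20.0},
    "desktop": {"cost": 500.0, "conversions": 50.0},
    "tablet":  {"cost": 100.0, "conversions": 5.0},
}

total_cost = sum(g["cost"] for g in totals.values())
total_conversions = sum(g["conversions"] for g in totals.values())
icpa_all = total_conversions / total_cost  # total ICPA over all groups

# Equation 4.2: adjustment = group's total ICPA over the overall total ICPA.
adjustments = {
    name: (g["conversions"] / g["cost"]) / icpa_all
    for name, g in totals.items()
}
```

Here desktop converts above the blended rate and receives an adjustment above 1, while mobile and tablet fall below 1.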

4.5 Cost Weighting

The goal of the project is to increase conversions at the same cost. After calculating the adjustments using any of the previous techniques, there is a risk of overspending, that is, increasing the costs when we bid more for a certain group. This risk can be mitigated using the cost weighting technique, in which we modify the calculated bid adjustment for a certain group using the cost paid for this group in the training period. It is based on the assumption that the cost we will pay for every dimension group changes linearly with the adjustment of this group, as in equation 4.3:

new_cost_group = adjustment_group · training_period_cost_group    (4.3)

In that sense, we calculate a weighting factor α_w based on the calculated adjustments as well as the training period cost of each group.

α_w · adj_group1 · cost_group1 + α_w · adj_group2 · cost_group2 + … = cost_group1 + cost_group2 + …    (4.4)

α_w = (cost_group1 + cost_group2 + …) / (adj_group1 · cost_group1 + adj_group2 · cost_group2 + …)    (4.5)

The weighted adjustment for any group is just the calculated adjustment multiplied by α_w:

weighted_adjustment_group = α_w · adjustment_group    (4.6)
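The cost weighting step can be sketched as follows; all numbers and names are illustrative, and the spend projection relies on the linear-cost assumption stated above:

```python
# Made-up training-period costs and previously calculated adjustments per group.
costs = {"mobile": 400.0, "desktop": 500.0, "tablet": 100.0}
adjustments = {"mobile": 0.8, "desktop": 1.5, "tablet": 0.9}

# Equation 4.5: alpha_w rescales the adjustments so that the projected spend,
# under the linear-cost assumption, matches the training-period spend.
alpha_w = sum(costs.values()) / sum(adjustments[g] * costs[g] for g in costs)

# Equation 4.6: weighted adjustment per group.
weighted_adjustments = {g: alpha_w * adjustments[g] for g in adjustments}

# Projected spend under the weighted adjustments equals the original spend.
projected_spend = sum(weighted_adjustments[g] * costs[g] for g in costs)
```

The relative differences between groups are preserved; only the overall scale changes so that total spend stays constant.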

4.6 Evaluation Procedure

The best way to evaluate our bid adjustments is to run a real-time experiment, to make sure that the adjustments actually give better results. Before doing so, it is important to find a way to evaluate the adjustments offline, in order to try out different techniques and methods without real-time experiments, since those can be very costly.

4.6.1 Traditional Evaluation Methods

In the data science community, there are several ways to evaluate models using historic data. In this section we discuss the two main evaluation techniques and then develop an evaluation procedure that fits our research problem.


The first technique for evaluation is bootstrapping, which helps in many situations, including the validation of predictive model performance, which is our goal¹. It works by sampling with replacement from the original data and taking the “not chosen” data points as test cases. We do this several times and calculate the average score as an estimate of our model performance. Another well-known evaluation technique is cross-validation, which validates and evaluates model performance by splitting the training data set into k parts. We take k − 1 parts as our training set and use the held-out part as our test set. We repeat this k times, holding out a different part every time, and finally take the average of the k scores as our performance estimate.
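The k-fold splitting just described can be sketched as below; this is an illustrative helper, not the thesis implementation:

```python
def k_fold_splits(data, k):
    """Yield k (train, test) pairs; each fold is held out once as the test set."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Six data points, three folds: each point is held out exactly once.
data = list(range(6))
splits = list(k_fold_splits(data, 3))
```

Each of the three splits trains on four points and tests on the remaining two, and every point appears in exactly one test fold.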

4.6.2 Proposed Evaluation Method

Our evaluation procedure entails getting historic data of the campaign performance for every dimension of interest, like device and weekday, in the training interval. We split the collected data into training and testing sets. The training data is used to calculate the adjustments using any of the presented methods, and the testing period is then used to evaluate how good our estimates of the adjustments are. In the testing period, we predict the number of conversions for every dimension group given the costs paid for this group and the total ICPA (in the testing period). The baseline that the results are compared against is the default model of not using any bid adjustment for any group. It assumes that every dimension group has the same value and hence an ICPA equal to the total ICPA (of all groups). Since conversions = ICPA · cost, and the number of conversions for each group in the testing period is the ground truth, the estimate of the conversions under the baseline model is:

estimated_conversions_g = ICPA_total · cost_g

The baseline error is calculated as follows:

error_baseline = Σ_{g∈groups} |cost_g · ICPA_total − actual_conversions_g|    (4.7)

On the other hand, the bid adjustments model assumes that every dimension group has a different value, which is calculated by multiplying the total ICPA by the value of the adjustment of every group. Hence, the estimate of the conversions under the adjustments model is:

estimated_conversions_g = ICPA_total · adjustment_g · cost_g

The adjustments error is calculated as follows:

error_adjustments = Σ_{g∈groups} |cost_g · ICPA_total · adjustment_g − actual_conversions_g|    (4.8)
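The two error measures in equations 4.7 and 4.8 can be sketched with made-up testing-period data (the group names, costs, conversions, and adjustments are illustrative):

```python
# Made-up testing-period data: (cost paid, actual conversions) per group,
# and the adjustments calculated from the training period.
test_data = {"mobile": (200.0, 8.0), "desktop": (300.0, 32.0)}
adjustments = {"mobile": 0.6, "desktop": 1.3}

total_cost = sum(c for c, _ in test_data.values())
total_conv = sum(v for _, v in test_data.values())
icpa_total = total_conv / total_cost  # overall ICPA in the testing period

# Equation 4.7: baseline error, assuming every group has the overall ICPA.
error_baseline = sum(abs(c * icpa_total - v) for c, v in test_data.values())

# Equation 4.8: error when each group's ICPA is scaled by its adjustment.
error_adjustments = sum(
    abs(c * icpa_total * adjustments[g] - v)
    for g, (c, v) in test_data.items()
)
```

In this toy example the adjustments model predicts the per-group conversions far better than the baseline, so its total absolute error is much smaller.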

In traditional supervised machine learning, and more specifically in regression models used to predict a real number or an integer such as the number of conversions, the independent variables, or features, are used to predict the dependent variable of each data point after the training phase. The quality of the regression model's predictions can then be evaluated by different metrics using the data points in the testing data set. Our evaluation technique differs from the traditional ones because it works with aggregate values. In other words, the testing data does not consist of granular data points for which we want to predict a label. Instead, the costs, the total ICPA, and the calculated bid adjustments are used to predict the conversions of each dimension group as in equation 4.8. Moreover, in the general case of machine learning, more data is preferable, as that makes predictive models better [41]. However

¹ Bootstrapping has other uses in machine learning, as in the ensemble methods. For instance, we may build a predictive model like a decision tree using each bootstrap and aggregate these models in an ensemble like Random Forest. The prediction is done by majority voting over all of the bagged models.
