A systematic review and meta-analysis of age discrimination in recruitment.

Lucija Batinović & Marlon Howe

Author: Batinović, L., Howe, M.
Supervisor: Dr. Rickard Carlsson
Examiner: Dr. Jens Agerström
Term: Spring 2021
Subject: Psychology
Level: Master thesis
Course code: 5PS22E


Author’s Note

This study, along with the planned analyses, sample characteristics and research questions, was preregistered on OSF. All gathered data and conducted analyses are openly available at:

https://osf.io/puqb5/

Abstract

Correspondence and vignette experiments have been an important tool for measuring discrimination in hiring decisions for several decades, especially with regard to ethnic discrimination. Although the body of evidence is growing, no study has yet provided a systematic overview of age discrimination in recruitment. The present systematic review therefore investigates the effect of age on the level of discrimination experienced in the recruitment process, based on 14 correspondence and vignette studies from 12 distinct articles conducted between 2010 and 2019. We assess age discrimination by examining call-back rates or indicators of hiring/interview invitation likelihood. Data were analyzed in age groups, with 30- to 35-year-olds serving as comparators and 40- to 49-, 50- to 59-, 60- to 65- and over-65-year-olds as experimental groups. Based on log odds ratios calculated for these comparisons, we conclude that age discrimination in recruitment is an observable issue, with the greatest disparities apparent for applicants over the age of 60. Certain limitations of this review, such as restricted sample sizes and the identified risks of bias, will have to be addressed in future work.

Keywords: Age discrimination, recruitment, meta-analysis, systematic review


Introduction

Considering the current global demand for labor, age discrimination in recruitment could pose a serious problem for both economic and psychological reasons. To date, there has been no systematic attempt to determine the extent of age discrimination in hiring processes. This systematic review therefore aims to collect all available evidence on age discrimination in hiring and to examine it more thoroughly via meta-analyses. First, we introduce the reader to the current state of research and theory, then present our methodological approach, and finally set forth and discuss our results. We also dedicate part of our discussion to the limitations of this review.

Globally, the second half of the twentieth century was marked by an unparalleled rise in life expectancy at birth, from 46.5 years at the beginning of the 1950s to 65.4 years at the end of the 1990s (United Nations, 1999). The increasing number of retirees and the decreasing number of active workers contributing to social security systems have led to cutbacks in the provision of pensions internationally (Galasso & Profeta, 2004). Population aging and its financial demands on these social security systems call for new political and economic directives. In an attempt to prevent the collapse of these systems, many countries aim to incentivize the extension of working life past the retirement age (Neumark, Burn & Button, 2017). However, current research suggests that older workers are among the groups most targeted by age discrimination in the recruitment process. In a review based on the US-American Current Population Survey (CPS) and Panel Study of Income Dynamics (PSID), Rupert and Zanella (2015) found positive relationships between age and unemployment duration in their samples. According to their study, the average unemployment duration of a 35-year-old unemployed male in the United States is approximately 260 days, whereas an unemployed male of 55 years has already been unemployed for approximately 300 days at the time of measurement.


Carlsson and Eriksson (2019) argue that the motives for age discrimination are often explicit in nature. The researchers conducted a survey of employers' attitudes about how skill levels vary with age. Many of the participants believed that work task flexibility, adaptability and ambition decline sharply from the age of 40, which could be interpreted as a basis for discriminatory recruiting behavior.

Age Stereotypes and Older Workers

The observed increase in disfavoring older workers in personnel decision making has prompted a search for explanations of this behavior based on stereotypes. Age stereotypes, i.e., cognitive categories or schemas used to evaluate others based on their age (Ng & Feldman, 2012), have been researched in relation to organizational behavior toward older workers, with researchers focusing primarily on negative stereotypes. Older workers have been operationalized as workers older than 40 (Ng & Feldman, 2008). Posthuma and Campion (2007) listed common age stereotypes against older workers reported in the literature, e.g., generally lower performance (lower motivation, lower ability to learn), resistance to change (lower flexibility), shorter tenure, and higher cost. They also report positive stereotypes about older workers, e.g., stability, trustworthiness, and loyalty, and they provide findings that refute the stated negative stereotypes. Furthermore, Ng and Feldman (2012) conducted a systematic review and meta-analysis of six common age stereotypes which suggest that older workers are: less motivated, more resistant and less willing to change, less willing to participate in training and further develop their career, less trusting, less healthy, and more vulnerable to work–family imbalance. They found that the only stereotype consistent with the available empirical evidence was that older workers are less willing to participate in training and career development activities. The absence of evidence supporting the other stereotypes raises concerns about HR management practices and unjust organizational behavior toward older workers. Dordoni and Argentero (2015) conducted a systematic review of the literature on older workers' stereotypes.


They found that age stereotypes are prevalent in the workplace and affect older workers' well-being as well as their employment. Furthermore, they discussed other moderators involved in these relationships, such as the age of employers or the nature of job tasks.

Types of Discrimination

Within the discrimination literature in general, particular forms of discrimination can be differentiated. The most general distinction is between direct and indirect discrimination. According to Doyle (2007), direct discrimination occurs whenever two parties are deliberately and actively distinguished between, and the grounds on which this differentiation rests are considered illegal by law. In the US-American context specifically, this construct is defined as "disparate treatment" (Zafar et al., 2017). Indirect discrimination, on the other hand, does not require active differentiation, but instead occurs when a party's actions indirectly result in unequal outcomes for two or more distinct groups of people. In US-American law, this legal doctrine is named "disparate impact" and has developed into the predominant judicial paradigm for determining unintended cases of discrimination in the United States (Feldman et al., 2015). Doyle (2007) notes that the concept of indirect discrimination or disparate impact enables anti-discrimination laws to apply even when the discrimination is not based on a directly observable case of discriminatory behavior.

However, two further forms of discrimination are widely acknowledged in the current literature. As described by Bohren et al. (2019), especially in the economic field, discrimination is also grouped as either statistical or taste-based. Statistical discrimination emerges whenever two or more distinct groups are correctly perceived as unequally productive due to exogenous reasons, and the disfavored group therefore experiences disadvantages as a reaction to the perceived lack of productivity. However, the authors note that oftentimes there is no objective, empirical evidence of lower productivity of one specific group, and thus also introduce the concept of inaccurate statistical discrimination.


Taste-based discrimination, on the other hand, describes discrimination emerging from a personal distaste for individuals belonging to a certain group, which then leads to avoidance or punishing behavior towards that particular group. The authors conclude that both of these forms can be heavily influenced by inaccurate beliefs about certain groups. Carlsson and Eriksson (2019) discuss the role of age stereotypes and beliefs in the context of hiring discrimination, concluding that older workers are often perceived as less ambitious or flexible, neither of which is always empirically observable. It is therefore assumed that, overall, statistical and taste-based discrimination both play a significant role in the context of age discrimination.

Measuring Discrimination

In the scientific literature, various types of studies can be identified in the context of age discrimination, or discrimination in general. Frequently used study types in this area are, among others, audit studies (e.g., Farber, Silverman and von Wachter, 2017) or more traditional correlational studies (e.g., Han & Richardson, 2015). Audit studies in their current form made their scientific debut in the 1960s and make use of real-life job applicants who are supposed to be maximally similar to one another (Gaddis, 2018). Only one distinctive trait (such as ethnicity) is then implemented as the independent variable, and the dependent variable is the discriminatory behavior of the recruiters. However, it has been widely acknowledged that, beyond the manipulated trait, audit studies often fail to make their applicants appear identical to the recruiter (Neumark, 2012). Therefore, this review considers a more recent form of correspondence testing in which applicants are evaluated only on the basis of their resumes. This method, often referred to simply as the correspondence method, has the advantage of removing all confounders that could emerge from in vivo applicant screenings (Neumark, 2012). The second type of study considered in this review is the vignette study, in some cases also referred to as a scenario study (or factorial survey).


Vignette studies are often positively received for their realistic properties, especially in comparison to question-based, correlational studies. Among the advantages of vignette studies put forward by Hyman and Steiner (1996) are that they cover a broader range of situational or contextual factors, provide standardized stimuli and hence raise internal validity, improve construct validity, and enable a more critical distinction between ethical principles and eventual ethical behavior.

Overall, studies in these two formats show similar patterns in the way they examine age discrimination. Recent findings in the literature call on researchers to find a suitable baseline against which to compare discrimination targets. As stated by Carlsson and Eriksson (2019), there is an observable decline in positive hiring decisions from the age of 40, and work experience has a noticeable influence on hiring decisions as age increases. However, other authors such as Lee et al. (2015) argue that younger age might also lower hiring chances, especially if the specific position requires creativity and/or asks the decision maker to cooperate with the potential new employee in the future. In our literature search, most authors reconciled these findings by choosing applicants around the age of 30 as a comparator. Although discrimination against younger applicants has also been addressed in some cases, the majority of studies focuses on age discrimination towards older people. While there are some exceptions with intervention groups in the late 30s (e.g. van Borm et al., 2021), the vast majority of studies revolve around discrimination target groups starting at around age 50 (e.g. Fasbender & Wang, 2017; Neumark, 2019).

The main aim of this review is to answer how large the effect of age discrimination is in recruitment, as measured by correspondence testing and scenario-based experiments conducted between 2010 and 2019. It also aims to make it possible for the reader to consider potential pitfalls that arise when looking at the individual studies within this particular subject.


In general, current research generates a great deal of potentially insightful knowledge, but methodologically and conceptually, many studies are restricted in the way they operationalize their outcomes. This review is therefore intended to help interested readers recognize methodological concerns of individual studies and, moreover, to restructure the results so as to explore more clearly not only whether age discrimination occurs, but if so, how much.

Method

The following systematic review was conducted and reported in accordance with the official PRISMA guidelines on systematic reviews (Page et al., 2020).

Eligibility criteria

Inclusion criteria

Our inclusion criteria were derived from the PICOS guidelines (Higgins et al., 2021), a standardized scheme for formulating evidence-based research questions, usually in the medical field. PICOS is an acronym that stands for participants, intervention, comparator, outcome and study design.

In our systematic review, eligible participants were active recruiters who received applications sent out by researchers in correspondence studies (field experiments), or active recruiters who rated fictitious job applicants (scenario experiments).

The intervention required the selected studies to include fictitious applications used as a basis for either the decision on a callback (correspondence study) or the assessment of an applicant's employability, job suitability, desirability, hiring priority, recommendation of selection, likelihood of being invited to an interview, or similar (scenario study). These measures also served as the required outcomes of our studies.


Applicants' ages, i.e., the ages presented on the applications, were required to be randomly assigned to the recruiters (e.g., 30 vs. 50). We considered different levels of age manipulation based on the applicants' age group. These groups were formed as follows: applicants between the ages of 40 and 49, 50 and 59, 60 and 65 (retirement age), or above 65.

As for the comparator, studies were required to include applicants who did not signal any other type of minority group membership protected by anti-discrimination law (e.g. sexual orientation minority) and who were between 30 and 35 years old. We intended to exclude applicants younger than 30 because of the possibility of discrimination against young people, whereas the cut-off of 35 was intended to leave some distance to the 40-year category, in which studies found age discrimination to increase significantly. However, during the full-text analysis we discovered that a number of researchers used 29-year-old applicants as the cut-off age and control group; we therefore decided to include studies with 29-year-olds as controls as well, thus deviating from the pre-registration. A sensitivity analysis was conducted to inspect the possible impact of this change.

Lastly, the study designs we looked for were correspondence studies (field experiments), which involved sending fictitious applications to existing job posts and measuring callback rates, and scenario experiments (vignette studies), which included recruiters as participants who rated the employability of fictitious applicants. The inclusion criteria in the pre-registration also required occupation manipulation; however, we included studies that were eligible on all other criteria even if they did not specify different occupations to which the fictitious applicants applied, as this was not an integral part of our main meta-analysis and we did not want to lose crucial evidence.


Exclusion criteria

Inversely, studies were excluded if participants were not employed as active recruiters and did not have any personnel decision power (e.g., undergraduate students or other non-recruiting participants). Studies were also excluded if the outcomes consisted of participants' non-verbal behaviours (e.g. eye-tracking) or of applicant interviews, or if they did not focus on the selection stage, i.e., the stage in which recruiters choose potential interview and job offer candidates among the applicants based on their resumes. Moreover, studies were excluded if they only gathered information of a general nature (e.g. the hiring probability of older workers in general). Studies were also excluded if they were books, book chapters, or published in any language other than English. Finally, we excluded all studies conducted prior to 2010, a cut-off chosen to keep the included studies relatively recent, so that they would reflect today's working culture and ensure that most were conducted after countries had established anti-discrimination laws.

We grouped included studies for the syntheses based on their study design. Thus, we had four groups of possible designs: Correspondence studies with between-participants design, correspondence studies with within-participants design, scenario experiments with between-participants design, and scenario experiments with within-participants design.

Information sources

The primary databases used to conduct the searches were ERIC, PsycINFO, BASE, and Web of Science. The last searches were conducted on April 6th, 2021, April 5th, 2021, April 11th, 2021, and April 8th, 2021, respectively. Furthermore, we searched Google Scholar as our secondary source and conducted the final searches there on April 9th, 2021. In addition to searching the databases, we went through the reference lists of reviews that we found relevant to our topic, as well as the reference lists of studies included at the full-text stage, and extracted eligible references from two articles: Baert (2018) and Bertrand and Duflo (2017).

Search strategy

In general, the database search was limited to articles published between 2010 and 2021; however, the studies themselves must have been conducted no earlier than 2010 and no later than 2019. With the onset of the current global pandemic, we anticipated possible uncontrollable confounders that we wanted to exclude from our studies (e.g., callback rates might generally differ during the pandemic to avoid unnecessary meetings, and age being a risk factor for Covid-19 might increase discrimination in high-contact occupations).

Searches from all databases were downloaded and imported into the Zotero reference management software (Zotero v. 5.0.96.2, Roy Rosenzweig Center for History and New Media, 2021), where they were saved and filtered. This way, the search history was saved exactly as it was at the time of the search. For further information on our search strategy, such as search terms and search settings, please consult our study material; all of the search strategies and full texts have been uploaded to our OSF project (see Appendix). Keywords reported in the protocol were used and combined to create the searches, although the search strings were updated when conducting the searches in the databases. In particular, for ERIC and PsycINFO the searches included thesaurus terms ("Age discrimination", "Personnel selection", and "Recruitment" for ERIC; "Recruitment", "Employment Discrimination", "Age Discrimination", "Ageism", "Aged (Attitudes Toward)" and "Aging (Attitudes Toward)" for PsycINFO), which were added along with the keywords "labour market" and "age bias" to improve search precision.

Selection process

As a first step of the selection process, the searches were uploaded into the reference management software Zotero (Zotero v. 5.0.96.2, Roy Rosenzweig Center for History and New Media, 2021). Subsequently, both authors (Batinović, L. & Howe, M.) merged duplicates based on the availability of data in each version of the duplicate records. After excluding books and book chapters, on the assumption that they would not contribute to the content of interest, the remaining articles were moved into the article screening software Rayyan (Ouzzani et al., 2016), which was used to conduct the title/abstract screening. The authors of this review then independently screened all titles and/or abstracts using the blind feature in the software. After all articles had been screened by both authors, blind mode was turned off and conflicts were resolved collectively. At this stage, articles were screened against the PICOS criteria, publication date, conduction date, and language of publication to decide whether the studies should move on to the full-text screening phase or be excluded from further analyses. After finalizing the number of studies to be included in the full-text reading, reports were retrieved where possible, and the two authors then conducted independent full-text screenings of the retrieved articles. In addition, they independently extracted the PICOS data from the included articles and transferred them into a Google Sheets spreadsheet. This procedure was not formally blinded, but the authors extracted the PICOS data without checking each other's work and then collectively resolved any discrepancies. Using the PICOS framework, studies were either excluded or included in the content coding/data extraction stage, which was conducted collectively and simultaneously by the two researchers.


The extracted data were reported in the Content Coding Google Sheets spreadsheet. Apart from the duplicate recognition in Zotero, each part of the screening, including coding of the data, was conducted manually by the researchers.

Data collection process

The two authors collectively, simultaneously, and manually extracted data from each report and reported them in the Content Coding spreadsheet. Data coded included:

1) author(s),
2) country,
3) participants (n),
4) outcome measures,
5) study design,
6) study type,
7) discrimination (experimental) age,
8) control (comparator) age,
9) discrimination legality status in the study's respective country at the time of conduction,
10) peer-review status, and
11) focal tests.

If data were deemed unclear or missing, the authors of the original studies were contacted to obtain the data needed, and the data were reported as missing if they could not be retrieved. However, in order to meet the time constraints of this thesis, we postponed this stage except where quick responses could be expected.


Data items

Outcomes looked for in data extraction

In the correspondence studies, the main studied variable was recruitment discrimination, operationalized as the number of times the applicants were contacted by employers for further consideration in the recruitment process ("call-back"), while for the scenario experiments we looked for recruiters' assessments (i.e. ratings, decisions and judgements) of the applicants' employability, job suitability, desirability, hiring priority, recommendation of selection, or likelihood of being invited to an interview. Specifically, hiring discrimination was assessed by comparing applicants of older age to younger applicants functioning as comparators. Measurements could be dichotomous (called back vs. not called back) or continuous, using Likert-type scales to assess the level of employability, job suitability, and so on. As found during the full-text analysis, some studies distinguished between types of call-backs (e.g. a call-back as an invitation to interview versus a call-back merely requesting more information from the applicant). In these instances, we decided to consider only the call-backs reported as invitations to interviews or job offers for the quantitative analysis. Where scenario experiments reported multiple eligible outcomes, we prioritized, in order, outcomes referring to the likelihood of being hired/selection decisions/job offers, followed by the likelihood of being invited to an interview, the level of employability, and job suitability, when extracting data for synthesis.

Other variables sought after in data extraction

Participants in correspondence studies were supposed to be recruiters, and we specifically included only scenario experiments with recruiters or hiring managers as participants. We looked for interventions which manipulated age in fictitious applications and had at least one age serving as a comparator and fitting our control age group (pre-registered as 30–35), and at least one age serving as the experimental group (40+).

We deviated from the pre-registration by including studies which had control groups aged 29 (Neumark, 2019; 2016; 2019; Challe et al., 2015 (Study 1); Krings et al., 2010 (Study 3)), as well as one study whose comparators were applicants aged 35 and 36 (Challe et al., 2015 (Study 3)). In the latter case we included the entire result, without excluding the 36-year-old applicants from the synthesis. Whenever age groups spanned several ages, we averaged these ages and assigned the group to the age category corresponding to that average. In the studies by Neumark et al. (2016) and Neumark et al. (2019)1,2, the reported age groups were 49–51 and 64–66, so we coded these as the 50- and 60-year-old groups because we were unable to extract data for each age separately. In cases of missing data, the data were reported as missing and the authors of the original study were contacted in order to obtain them. Study designs had to be experiments, either vignette studies (scenario experiments) or correspondence studies (field experiments). Considering that we changed the inclusion criteria for the comparators at the content coding stage, we went back to the screening phase to check whether any studies with 29-year-old comparators had been excluded during that stage; we found no additional studies with this comparator.

Aside from PICOS, we extracted data about the countries in which the studies were conducted and the status of their age discrimination laws, as well as whether the article was peer-reviewed. Finally, for the pre-planned subgroup analysis, we also extracted, where available, the reported occupations to which applications were sent in the correspondence studies, or the types of jobs fictitious applicants were hypothetically applying to in the scenario experiments. All of the extracted data are reported in the Content Coding sheet.


Risk of Bias Assessment

We conducted our risk of bias assessment based on the pre-registered set of criteria. Because the original assessment tool from Cochrane was ill-suited for these types of studies, we relied on a version developed in the protocol for a review of age discrimination in Sweden (https://osf.io/x5tgh). The risk of bias assessment was limited to studies with available outcomes; therefore, elements of the pre-registered risk of bias assessment concerning the scenario-based studies were omitted. To produce a standardized, replicable assessment of bias, we systematically differentiated risk of bias indicators for (a) the factors within a study, (b) the studies as a whole and (c) the total risk of bias of this review. The factor indicators display the bias assessment for each single factor (e.g. factor 1: quality of age manipulation) within one study. A full listing of the factors and items can be found in Table 1. The systematic approach used to calculate our factor indices can be found in Table 2 in the Appendix.

Within our criteria, we specified certain items further to avoid a diffuse and non-strategic assessment of bias where needed. This was usually the case whenever items were not distinct in nature and left room for interpretation. The first factor (quality of age manipulation) contains three items. The first item (1.1) asks for studies to make age salient on the applicants' résumés. We decided to accept this only if age was either stated numerically on the application or provided via date of birth. We rejected the assumption of age salience whenever recruiters had to infer the applicants' age via indirect means such as the year of graduation. The second item (1.2) asks studies to have an acceptable difference between the comparators' age and the experimental group's age. We specified that age differences were sufficient if they fell within our pre-defined age groups (e.g. 30–35 for the comparator; 40–49 for the experimental group, etc.). The third item (1.3) will only be considered in a later extension of this study. It contains the condition that applications are reasonably adjusted for work experience, since we expect age and work experience to be correlated and we want to include studies with applications that are as realistic as possible. Based on the aforementioned systematic approach, we decided to rank this factor as high risk of bias only if both of its assessed items received a negative response.

The second factor concerns the quality of the studies' randomization. To assess whether age was randomly assigned (item 2.1), we automatically accepted this assumption if the different age groups were equal in size. Otherwise, studies were required to state directly that the applicants' age had been randomized. Item 2.2 encompasses the order of dispatching the application pairs (or triplets, quadruplets, …). We agreed that studies only received a positive response if the order of ages in the dispatched applications was randomized or counter-balanced, or if the applications were sent out simultaneously. If item 2.1 received a negative response, the whole factor was considered a high risk of bias factor.

The third factor assesses the quality of the callback procedure. The first item (3.1) addresses the adequacy of the applications in relation to the job posts applied to. An application was assumed to be well matched to a job post if the requirements of the offered job overlapped with the skills assigned in the CVs. This was automatically accepted if offered positions did not require any specific academic or other scholarly degree, or any other kind of specific skill (e.g. speaking a certain language). The second item (3.2) addresses callback reception. Studies were considered low risk of bias only if they used both phone and mail notifications as callback platforms.

The final factor deals with the quality of the applications. Item 4.1 concerns the distinctiveness of each application and considers whether applications were randomized beyond the discriminatory trait (age). However, this item will not be considered in the risk of bias assessment for now, since there is conflicting evidence in the literature about the meaningfulness of distinct applications (see the "Heckman critique"; Heckman, 1998). Item 4.2 assesses the completeness of the sent applications. CVs were only accepted as complete when, besides age, they at least included gender, name, schooling and/or work experience. Finally, the last item (4.3) considers the application formatting. It was assumed that if recruiters received more than one application from the total sum of applications sent per study, they could be influenced by recognizing similar or identical formatting patterns. Applications were accepted as distinct in format if the authors remarked, for example, that the applications were created from different types of templates. Factor 4 was automatically rated as a high risk of bias factor if item 4.2 received a negative response. However, item 4.2 is only applicable to within-subject designs, so this rule was waived for between-subject studies.

Due to the assessed relevance of the different factors, we considered a study to be automatically at high risk of bias whenever it scored high on the second factor (quality of the randomization). If factor two was not high risk, a study was assessed as high risk only if at least two of the other factors were rated as high risk of bias. Our general verdict across all included studies encompasses the average of the individual study indicators.
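To make this decision rule concrete, the following is a minimal R sketch of how a study-level verdict could be derived from the four factor ratings. The function name, the coding of the ratings, and the fallback rule distinguishing "moderate" from "low" are our own illustrative assumptions; only the two high-risk rules are taken from the description above.

```r
# Illustrative sketch only; not the authors' actual scoring script.
# Factor ratings are assumed to be coded as "low", "moderate" or "high".
study_verdict <- function(ratings) {
  # ratings: named character vector with one entry per factor, e.g.
  # c(age_manipulation = "low", randomization = "high",
  #   callback = "moderate", applications = "low")

  # Rule from the text: a high-risk rating on factor 2 (randomization)
  # automatically makes the whole study high risk.
  if (ratings["randomization"] == "high") return("high")

  # Rule from the text: otherwise, the study is high risk only if at least
  # two of the remaining factors are rated high.
  others <- ratings[names(ratings) != "randomization"]
  if (sum(others == "high") >= 2) return("high")

  # Assumed fallback (not spelled out in the text): "moderate" if any factor
  # raises some concern, otherwise "low".
  if (any(ratings != "low")) "moderate" else "low"
}

study_verdict(c(age_manipulation = "moderate", randomization = "low",
                callback = "low", applications = "moderate"))
# returns "moderate"
```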

Effect measures

Effects were calculated for the studies included in the quantitative analysis, all of which were correspondence studies with within- or between-subject designs. All of these studies reported the outcome as the call-back rate, generally presented as the proportion of invited and not-invited younger and older applicants, with some studies reporting the actual frequencies of individual call-backs. Reported proportions were transformed into frequencies where frequencies were not available, and for each study, odds ratios (OR) were calculated and used in the meta-analysis.
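As an illustration of this step, the sketch below reconstructs 2 × 2 call-back tables from reported proportions and computes per-study (log) odds ratios with the metafor package used for the syntheses (described in the next section). All numbers and column names are hypothetical; the sketch only mirrors the conversion described above, with the odds ratio oriented so that values above 1 indicate fewer call-backs for older applicants, matching the direction of the ORs reported in the Results.

```r
# Hypothetical example data; not taken from any included study.
library(metafor)

dat <- data.frame(
  study   = c("Study A", "Study B"),
  n_young = c(500, 300),   # applications sent for the comparator group
  p_young = c(0.25, 0.30), # reported call-back proportion, comparator group
  n_old   = c(500, 300),   # applications sent for the older group
  p_old   = c(0.10, 0.15)  # reported call-back proportion, older group
)

# Convert reported proportions into call-back frequencies
dat$cb_young    <- round(dat$n_young * dat$p_young)
dat$no_cb_young <- dat$n_young - dat$cb_young
dat$cb_old      <- round(dat$n_old * dat$p_old)
dat$no_cb_old   <- dat$n_old - dat$cb_old

# Per-study log odds ratios (yi) and sampling variances (vi)
es <- escalc(measure = "OR",
             ai = cb_young, bi = no_cb_young,
             ci = cb_old,   di = no_cb_old,
             data = dat, slab = study)
summary(es)  # exp(yi) gives the odds ratio for each study
```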


Synthesis methods

We provided a quantitative synthesis of the findings from the included studies, structured around the age groups and participant demographics. We also provided summaries of intervention effects for each study by calculating the odds ratios of older versus younger applicants receiving callbacks (for dichotomous outcomes) or standardized mean differences in the form of Hedges' g (for continuous outcomes). Studies were integrated into one meta-analysis only under the assumption of equal designs and study types (scenario and correspondence testing).

Using the data extracted in the content coding sheet, we applied the PICOS framework to decide which studies were eligible for which group in the analysis. Since all studies included in the content coding had eligible participants, interventions, comparators and outcomes, study designs and study types were used to divide the studies into groups for the analyses. We converted proportions into frequencies in order to calculate effect sizes using the R language (R Core Team, 2020); if the data could not be recreated or obtained, they were marked as missing.

Whenever studies used the same type of intervention and comparator (based on the content coding) with the same type of outcome measure, which, for instance, yielded a group of correspondence studies with a within-participants design, we pooled the results using a random-effects meta-analysis with the R package metafor (Viechtbauer, 2010), using odds ratios for binary outcomes (call-back rates), and calculated 95% confidence intervals. Heterogeneity was assessed by estimating tau and I².
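A minimal sketch of this pooling step is given below, continuing from the hypothetical es object created in the earlier example; it is not the authors' code. The DerSimonian-Laird estimator is specified here because the Results section states that this tau estimator was used.

```r
# Random-effects pooling of the log odds ratios (sketch, not the authors' script)
library(metafor)

res <- rma(yi, vi, data = es, method = "DL")  # DerSimonian-Laird tau estimator

summary(res)                          # pooled log OR, 95% CI, tau^2, I^2, Q test
predict(res, transf = exp)            # pooled OR and CI on the odds-ratio scale
c(tau = sqrt(res$tau2), I2 = res$I2)  # heterogeneity statistics reported in the review
```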

Reporting bias assessment

We further examined publication bias and questionable research practices (e.g., p-hacking) through an analysis of the focal tests. We used p-checker (http://shinyapps.org/apps/p-checker/) to calculate the R-index and the p-curve. A sensitivity analysis was planned in case there was a reasonable assumption of studies having a moderate to high risk of bias and a disproportionate weighting of studies. However, due to the time constraints of the thesis, this was left out for now and will be conducted later for the purposes of publishing the article based on this review.
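The R-index and p-curve were obtained with the p-checker web app rather than in R, so no code is shown for those. For the funnel plots reported in the Results, a minimal sketch of how such a plot could be produced from a fitted metafor model (here the hypothetical res object from the earlier examples) is:

```r
# Sketch only; the thesis does not document the exact plotting code used.
library(metafor)

funnel(res, xlab = "Log odds ratio")  # plots effect sizes against their standard errors

# An Egger-type regression test for funnel-plot asymmetry could complement the
# visual inspection (not reported in this review; requires at least three studies).
regtest(res)
```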

Certainty assessment

To determine our confidence in the evidence, we considered the 95% confidence intervals, the estimates of tau and I², and the risk of bias assessment.

Results

Study selection

Our searches yielded a total of n = 2,599 records, of which n = 1,728 can be attributed to our primary databases (PsycINFO, Web of Science, ERIC, and BASE) and n = 871 to secondary sources (Google Scholar and manually extracted references from reviews). After excluding duplicates and eliminating all books and certain book chapters based on title/abstract in Zotero (Zotero v. 5.0.96.2, Roy Rosenzweig Center for History and New Media, 2021), we transferred n = 1,733 records into the screening process in Rayyan (Ouzzani et al., 2016), with n = 1,253 coming from the primary and n = 480 from the secondary search results. Rayyan further identified n = 104 duplicates, leaving n = 1,629 unique articles; however, we decided to screen all of the imported articles without immediately excluding the duplicates suggested by Rayyan, to prevent potential errors made by the software. After screening the articles independently and discussing our decisions to accept or reject a paper for further consideration, we excluded n = 1,580 studies in the screening process and ended up with n = 51 studies to be considered in the full-text readings.


However, we only managed to retrieve 49 reports, which went on to the full-text reading phase. We were unable to find two reports online and received no response from their authors. These 49 reports comprised 53 studies, which were checked against the PICOS criteria. Based on these criteria, we included 14 studies, reported in 12 articles, in the analysis stage.

Figure 1

Study Selection Diagram.

From: Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. doi: 10.1136/bmj.n71. For more information, visit: http://www.prisma-statement.org/


Study characteristics

Fourteen studies from 12 articles were eligible for inclusion in the analyses based on the eligibility criteria. However, it proved unfeasible to extract data from certain studies for our data synthesis; thus, the studies included in the synthesis were: Ahmed et al. (2012), Carlsson and Eriksson (2019), Farber (2019), Jansons and Zukovs (2012), and Neumark et al. (2016; 2019; 2019). In the following text we provide a narrative summary of the 14 initially included studies.

Table 1

Studies excluded after the full-text reading stage and exclusion reasons (k = 37)

Ahmed et al. (2012). In this study, the authors conducted a within-subjects correspondence study comprising 466 job offers in Sweden. The applications were sent to job posts seeking either restaurant workers or sales assistants, with call-back rates as the dichotomous outcome variable. The intervention group consisted of fictitious applicants aged 46, whereas the comparator was a fictitious applicant aged 31.


Carlsson and Eriksson (2019) conducted a between-subjects correspondence study to test age discrimination in hiring in Sweden, by assessing call-back rates for the fictitious applicants. The authors included age as a continuous variable in the interval of 35 to 70 years, and gender which they indicated with applicants’ names. They assigned names and ages to fictitious applications randomly and sent out triplets of resumes to over 2,000 employers, generating a sample of 6,066 applications, which were sent to seven occupations: administrative assistants, truck drivers, chefs, food serving and waitresses, retail salespersons and cashiers, sales representatives, and cleaners.

Challe et al. (2015) conducted four studies, of which the first three, within-subjects correspondence studies conducted in France, were included in the review. The first study included fictitious applicants aged 29 as comparators, with the intervention group being fictitious 56-year-old applicants; these applicants were further divided into two groups based on their expected retirement age. Triplets of resumes (a 29-year-old, a 56-year-old closer to retirement, and a 56-year-old further from retirement) were sent out to job posts for two occupations: call center agent (n = 300) and sales assistant (n = 301). The second study aimed to investigate age discrimination in hiring in relation to technological skill obsolescence and involved sending three fictitious applications, with applicant ages of 32, 42, and 52, to the following occupations: IT project managers and IT developers (n = 302), and management accountants and accountants (n = 308). In the third study, they examined age discrimination in the context of gender stereotypes around certain occupations. They included fictitious applicants in their 50s (50 and 51) and applicants in their 30s (35 and 36), who were either male or female, applying to personal service occupations (specifically: home help, cleaning persons, and caretakers). The outcome variable for age discrimination was call-back rates.

Farber et al. (2019) conducted a between-subjects correspondence study in the United States to study age discrimination in hiring. Their intervention comprised manipulating the age of fictitious applicants (chosen from the age groups 22–23, 27–28, 33–34, 42–43, 51–52, or 60–61) and the length of the unemployment spell (4, 12, 24, or 52 weeks), and applying to either low-skill jobs (e.g. receptionist, office assistant) or high-skill jobs (e.g., executive assistant, office manager) (n = 2,122), with call-back rates as the outcome variable.

Jansons and Zukovs (2012) conducted a within-subjects correspondence study in Latvia in which they created fictitious resumes, with the intervention group being 55-year-olds and the comparator 35-year-olds. They applied to salesmen jobs (n = 529) and measured age discrimination as the difference in call-back rates between the younger and the older candidate.

Krings et al. (2010) conducted three studies in Switzerland, of which the third was included in this review. It was a between-subjects scenario experiment examining age discrimination in hiring. They manipulated the age of fictitious applicants (29 versus 50) and the skill requirements (person-oriented versus task-oriented). Participants were HR professionals (n = 88), and the dependent variables were the likelihood of being invited for an interview and of being hired, as well as perceived competence and warmth.

Montizaan and Fouarge (2019) conducted a within-subjects scenario (vignette) experiment in the Netherlands to examine age discrimination in hiring in relation to applicants' and employers' characteristics. The ages of the fictitious applicants were 35, 45, 55, or 60 years. Employers (n = 1,100) were presented with two fictitious applications and asked to choose which of the two applicants they would hire; the occupations to which the applicants applied are not reported. The outcome variable for age discrimination was the likelihood of being hired.

Neumark et al. (2016) conducted a between-subjects correspondence study in the United States. Fictitious applications differed in age (29–31, 64–66) and skill level (high or low) for each occupation they were sent to (sales, security and janitor). They sent out 7,161 applications, and the outcome variable was call-backs.

Neumark et al. (2019)1 conducted a between-subjects correspondence study in the United States. Fictitious applications differed in age (29–31, 49–51, 64–66), skill level (high or low), and gender. They sent out 40,223 applications to jobs in four occupations (administration, sales, security, and janitors), and the outcome variable was call-backs.

Neumark et al. (2019)2 conducted a between-subjects correspondence study in the United States. Fictitious applications differed in age (29–31, 64–66) and gender. They sent out 14,428 applications to 3,607 jobs, and the outcome variable was call-backs. They do not report the occupations to which the applications were sent.

Oesch (2020) conducted a between-subject scenario experiment in Switzerland. Participants (recruiters, n = 501) had to indicate the likelihood of inviting fictitious candidates to an interview (on a scale from 0 to 10) based on the given resume, and propose an adequate wage for each candidate. The ages of fictitious applicants were 35, 40, 45, 50, or 55 years old. The occupations fictitious candidates were applying to were expert accountant, human resources assistant and building caretaker.

Richardson et al. (2013) conducted a between-subjects scenario experiment in the United States. They recruited 154 participants (student and organization-based, n = 102 and 54, respectively). Participants had to assess the work-related competencies of fictitious applicants and indicate the likelihood of their being hired on a 9-point Likert scale. The ages of the applicants ranged from 33 to 66 years.


Table 2

Characteristics of studies included in the review (k = 14)


Risk of Bias Assessment

Ahmed et al. (2012). Guided by the aforementioned systematic approach, it was concluded that the overall risk of bias for this study is of some concern (henceforth: moderate). The quality of the age manipulation was sufficient and no deviations from our criteria could be observed for this factor. The quality of the randomization was only partly satisfactory: the applicants' age had indeed been randomized, but it was not clearly specified whether the applications had been sent out simultaneously. The quality of the callback procedure was in no way limited; job offers and applications were well matched and callbacks were collected via phone and mail. The quality of the applications was somewhat diminished by missing information about the randomization of the application formatting; however, the CVs were all complete.

Carlsson and Eriksson (2019). This study does not display any limitations with regard to the age manipulation, randomization or callback procedure. However, difficulties arose with regard to the sent applications. Variables such as employment status or willingness to participate in job training were varied across applications (p. 175). Also, although different templates were used for different occupation clusters, recruiters in charge of more than one job post would realistically still encounter one template more than once. The applications themselves were nevertheless highly diversified, which is why we assessed the risk of pattern recognition as very low and marked this factor with "some concern". With that said, the overall assessment for this study is a moderate risk of bias.

Challe et al. (2015) – Study 1. The first study in Challe et al. (2015) generated an overall low risk of bias. The study reported no deviations from our anticipated criteria and is therefore not believed to decrease the validity of our analysis in any significant manner.


Challe et al. (2015) – Study 2. In contrast to their first experiment, the second experiment in Challe et al. (2015) displays several instances of higher risk of bias. It could not be determined whether age was made salient in this experiment's applications. Although the assumption of sufficient age differences between the control and experimental groups could be confirmed, we consider age salience one of the most important criteria. We did not weight this differently in the risk of bias assessment; however, it receives particular examination in the discussion below. Other than that, it can be assumed that the applications were both age-randomized and time-randomized, so the overall risk of bias for randomization is low. Nonetheless, there are further restrictions in the quality of the callback procedure: the authors matched the applications to the job offers, but it is not specified which platforms they used to collect callbacks. Finally, we conclude a high risk of bias at the factor level for the quality of the applications, as there is no extractable information about the completeness and formatting of the applications. Overall, the study displayed a moderate level of potential risk of bias.

Challe et al. (2015) – Study 3. As in the second experiment, this third study is missing information about age salience and will therefore be considered more carefully later. However, the age differences between the control and experimental groups were sufficient. For the second factor, the study lacks evidence of age randomization, which leads us to automatically conclude a high risk of bias for this factor. The applications were nonetheless well matched to the job offers, but there is no information about the callback platforms the authors used in their experiment. As in the previous study, the factor-level assessment for the quality of the applications results in a high risk of bias, since the applications were varied by gender and no information about CV completeness and formatting is given. Given that the second factor is a high-risk factor, we conclude that this study displays a high level of potential risk of bias.

Farber et al. (2019). Overall, this study was assessed as a moderate risk of bias study. Although the quality of the randomization and of the callback procedure is unrestricted, Farber et al. (2019) did not state age directly on the applications. Moreover, the assessed factor risk for the quality of the applications was moderate: although the applications were complete, there was no information about their formatting.

Jansons and Zukovs (2012). As in the first study of Challe et al. (2015), there are no restrictions regarding our criteria in this article. We assume this article not to be a potential threat to the validity of our analysis and label it as a low risk of bias study.

Neumark et al. (2016). This study is considered a moderate risk of bias study. While there are no restrictions in terms of the quality of the randomization, the study yields a moderate risk of bias for the quality of the age manipulation and the quality of the callback procedure. More precisely, the authors provide no information about age salience, and the job skills mentioned in the applications are matched to the specific jobs only randomly, meaning that they do not fully reflect the requirements of the job post. The authors mention that five of seven skill indicators were randomly assigned to each application and that the applications were sent to job offers for sales, security and janitor positions (p. 305). This also implies that, besides age, the applications vary in the skill sets that the applicants were given. Moreover, the authors provide no information about either the formatting or the completeness of the resumes.


Neumark et al. (2019)1 & Neumark et al. (2019)2. These two studies are summarized together due to their equal distribution of restrictions across our assessment factors. Both studies show no restrictions with regard to the quality of the randomization, the callback procedure or the applications. However, there are concerns about the quality of the age manipulation: in both studies, the authors imply the applicants' age indirectly via the high school graduation year (Neumark et al. (2019)1, p. 380 & Neumark et al. (2019)2, p. 938). These studies are therefore labelled with "some concern".

Following the systematic approach we already used for the calculation of the study scores, we conclude that, with at least one study at high risk of bias, the overall risk of bias for this systematic review is high. Moreover, several studies carry a moderate risk of bias. This will be further acknowledged in the discussion below.


Figure 2

Risk of Bias Assessment Chart

Note. Y = Yes, PY = Probably Yes, N = No, PN = Probably No, NI = No Information. For the full set of bias criteria and coding map see Appendix.

Because of time constraints, item 4.1 was omitted from our assessment for this stage of the review.

1–3 Experiments 1, 2 and 3 of the 4 experiments included in the paper.

4 Neumark, D., Burn, I., Button, P. & Chehras, N. (2019). Do State Laws Protecting Older Workers from Discrimination Reduce Age Discrimination in Hiring? Evidence from a Field Experiment. Journal of Law and Economics, Vol. 62, 373 – 402.

5 Neumark, D., Burn, I. & Button, P. (2019). Is It Harder for Older Workers to Find Jobs? New and Improved Evidence from a Field Experiment. Journal of Political Economy, Vol. 127(2), 922 – 970.

Results of individual studies

We present the results of the syntheses below. Meta-analyses which included more than one study are accompanied by forest plots in the text. Other figures and code are available in the RMarkdown file in the appended materials. All meta-analyses were conducted using the random-effects model with the DerSimonian and Laird (1986) tau estimator.
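For reference, the sketch below shows how a forest plot like those accompanying the multi-study analyses could be drawn from a fitted model such as the hypothetical res object used in the earlier examples; the argument values are illustrative and not taken from the appended RMarkdown file.

```r
# Illustrative forest plot for a fitted random-effects model 'res' (sketch only)
library(metafor)

forest(res,
       transf  = exp,   # display odds ratios rather than log odds ratios
       refline = 1,     # OR = 1: no difference between older and comparator applicants
       xlab    = "Odds ratio (comparator vs. older applicants)",
       header  = "Study")
```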


Results of syntheses

As mentioned previously, this review intended to conduct four different types of meta-analyses, covering the four combinations of correspondence or vignette studies with between- or within-subjects designs. Due to the availability of our empirical material, we had to omit the meta-analyses of scenario-based experiments. Moreover, given that the included studies' outcomes were all dichotomous in nature, we did not need to make use of Hedges' g. The results of the meta-analyses are presented below.

Meta-analysis 1. Our first meta-analysis encompassed all within-subject correspondence studies measuring the hiring disparities between our comparator group and 40- to 49-year-old applicants. Via a random-effects meta-analysis, the single included study revealed an effect of applicant age on recruiters' hiring decisions, to the disadvantage of older applicants; OR = 3.45, 95% CI [2.0, 5.95], z = 4.46, p = 8.04 × 10−6. Since only one study was obtained, an analysis of heterogeneity is not further considered. Considering the aforementioned risk of bias assessment, we assess the certainty of evidence as moderate.

Meta-analysis 2. Our second meta-analysis was set to include all within-subject correspondence studies measuring the hiring disparities between our comparator group and 50- to 59-year-old applicants. Via a random-effects meta-analysis, the single included study revealed an effect of applicant age on recruiters' hiring decisions, to the disadvantage of older applicants (OR = 2.20, 95% CI [1.5, 3.23], z = 4.02, p = 5.81 × 10−5). Since only one study was obtained, an analysis of heterogeneity is not further considered. Considering the aforementioned risk of bias assessment, we assess the certainty of evidence as low.


Meta-analysis 3. Our third meta-analysis was set to include all between-subject correspondence studies measuring the hiring disparities between our comparator group and 40- to 49-year-old applicants. Via a random-effects meta-analysis, the two included studies revealed a non-significant effect of applicant age on recruiters' hiring decisions with respect to older applicants (OR = 1.06, 95% CI [0.85, 1.33], z = 0.52, p = 0.60; Figure 3). Because this analysis comprised only two studies, heterogeneity could not be meaningfully assessed. Considering the aforementioned risk of bias assessment, we assess the certainty of evidence as low.

Figure 3

Forest plot of the odds ratio effect sizes for between-subject correspondence studies with the 40- to 49-year-old target age group (k = 2).


Meta-analysis 4. Our fourth meta-analysis was set to include all between-subject correspondence studies measuring the hiring disparities between our comparator group and 50- to 59-year-old applicants. Via a random-effects meta-analysis, the three included studies revealed a significant, although weak, effect of applicant age on recruiters' hiring decisions, to the disadvantage of older applicants (OR = 1.35, 95% CI [1.1, 1.68], z = 2.73, p = 0.006; Figure 4). The analysis of heterogeneity yielded tau = 0.15 and I² = 61.83 per cent. While I² indicated a substantial amount of heterogeneity (Higgins et al., 2021), tau is considerably small, which would indicate that this substantial between-study heterogeneity might be caused by differences in study populations. Thus, a subgroup analysis is needed for further clarification. Considering the aforementioned risk of bias assessment, we assess the certainty of evidence as low.

Figure 4

Forest plot of the odds ratio effect sizes for between-subject correspondence studies with the 50- to 59-year-old target age group (k = 3).


Meta-analysis 5. Our fifth meta-analysis was set to include all between-subject correspondence studies measuring the hiring disparities between our comparator group and 60- to 65-year-old applicants. Via a random-effects meta-analysis, the five included studies revealed an effect of applicant age on recruiters' hiring decisions to the disadvantage of older applicants (OR = 1.55, 95 % CI [ 1.40, 1.70 ], z = 8.68, p = 3.99 × 10⁻¹⁸; Figure 5). The analysis of heterogeneity revealed tau = 0.13 and I² = 91.12 per cent, which was interpreted as a considerable amount of heterogeneity (Higgins et al., 2021). Once again, tau showed a low deviation of effect sizes, which could be interpreted as low random error; the heterogeneity might therefore be ascribed to contextual factors that need to be further explored by subgroup analysis. Considering the aforementioned risk of bias assessment, we assess the certainty of evidence as low.
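To make the pooling step concrete, the following minimal Python sketch shows how study-level log odds ratios could be combined under a random-effects model. The effect sizes and standard errors are hypothetical placeholders rather than the values extracted for this review, and the DerSimonian–Laird estimator is assumed for simplicity; it is a sketch of the general technique, not our exact analysis pipeline.

# Minimal sketch: DerSimonian-Laird random-effects pooling of log odds ratios.
# The effect sizes and standard errors below are hypothetical placeholders.
import numpy as np
from scipy import stats

def pool_random_effects(log_or, se):
    """Pool study-level log odds ratios with a random-effects model."""
    y = np.asarray(log_or, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = 1.0 / v                                    # fixed-effect weights
    mu_fe = np.sum(w * y) / np.sum(w)              # fixed-effect pooled log OR
    q = np.sum(w * (y - mu_fe) ** 2)               # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_re = 1.0 / (v + tau2)                        # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)           # pooled log OR
    se_mu = np.sqrt(1.0 / np.sum(w_re))
    z = mu / se_mu
    p = 2 * stats.norm.sf(abs(z))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"OR": np.exp(mu),
            "CI": (np.exp(mu - 1.96 * se_mu), np.exp(mu + 1.96 * se_mu)),
            "z": z, "p": p, "tau": np.sqrt(tau2), "I2": i2}

# Hypothetical example with five studies (log ORs above zero favour the comparator).
print(pool_random_effects(log_or=[0.35, 0.50, 0.42, 0.61, 0.38],
                          se=[0.08, 0.12, 0.05, 0.15, 0.10]))

Running the sketch returns the quantities reported for each meta-analysis above: the pooled OR with its 95 % confidence interval, z, p, tau and I².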

Meta-analysis 6. Our sixth meta-analysis was set to include all between-subject correspondence studies measuring the hiring disparities between our comparator group and applicants older than 65. Via a random-effects meta-analysis, the single included study revealed a significant and considerable effect of applicant age on recruiters' hiring decisions to the disadvantage of older applicants (OR = 1.55, 95 % CI [ 1.40, 1.70 ], z = 5.58, p = 2.40 × 10⁻⁸). As this was only one study, heterogeneity was not assessed. Considering the aforementioned risk of bias assessment, we assess the certainty of evidence as low.


Figure 5

Forest plot of the odds ratio effect sizes for between-subject correspondence studies with the 60- to 65-year-old target age group (k = 5).

Note: Neumark et al. (2019a) refers to Neumark, D., Burn, I., & Button, P. (2019). Is it harder for older workers to find jobs? New and improved evidence from a field experiment. Journal of Political Economy, 127(2), 922–970.

Reporting Biases

The funnel plots show that the studies' significance levels do not cluster around the 5 % significance threshold, which, along with the R-index calculated from the obtainable focal tests of the included studies (k = 13), indicates little concern about potential publication bias.
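As an illustration of how such funnel plots can be produced, the brief Python sketch below plots each study's log odds ratio against its standard error, with pseudo 95 % confidence limits drawn around a pooled estimate. All values are hypothetical placeholders, and the pooled estimate is simply supplied as an argument rather than computed here.

# Minimal sketch of a funnel plot: log odds ratios against their standard errors,
# with pseudo 95 % confidence limits drawn around a supplied pooled estimate.
# All values are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

def funnel_plot(log_or, se, pooled_log_or):
    se = np.asarray(se, dtype=float)
    se_grid = np.linspace(0.0, se.max() * 1.1, 100)
    plt.scatter(log_or, se)                                        # individual studies
    plt.plot(pooled_log_or - 1.96 * se_grid, se_grid, "--", color="grey")
    plt.plot(pooled_log_or + 1.96 * se_grid, se_grid, "--", color="grey")
    plt.axvline(pooled_log_or, color="grey")                       # pooled estimate
    plt.gca().invert_yaxis()                                       # precise studies on top
    plt.xlabel("log odds ratio")
    plt.ylabel("standard error")
    plt.show()

funnel_plot(log_or=[0.35, 0.50, 0.42, 0.61, 0.38],
            se=[0.08, 0.12, 0.05, 0.15, 0.10],
            pooled_log_or=0.44)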


Figure 6

Funnel plot of between-subject correspondence studies with the 50- to 59-year-old target age group (k = 3).

Figure 7

Funnel plot of between-subject correspondence studies with the 60- to 65-year-old target age group (k = 5).


Certainty of evidence

Based on the aforementioned risk of bias assessments, the funnel plots presented above, and the calculated R-index and p-curve (available in the appended materials), we conclude that the certainty of evidence is moderate. Furthermore, we conducted leave-one-out sensitivity analyses (available in the supplementary materials) of the aggregated effect sizes for the between-subject studies examining age discrimination against 50- and 60-year-old applicants. The analyses generally show robust findings; the only exception is the removal of Carlsson and Eriksson (2019), which somewhat lowers the effect size and tau. Presumably, the Carlsson and Eriksson (2019) study introduces a higher deviation in effects because age was varied continuously in that study. Following the GRADE guidelines (Higgins et al., 2021), this implies that we are moderately confident in our evidence, while acknowledging the possibility that the true effect lies outside the range of our estimates.
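The leave-one-out procedure itself is simple to reproduce: each study is removed in turn and the remaining effect sizes are re-pooled. The minimal Python sketch below illustrates the idea with the same hypothetical placeholder values as before (DerSimonian–Laird random-effects pooling assumed); it is not the code used for our analyses.

# Minimal sketch of a leave-one-out sensitivity analysis over hypothetical
# log odds ratios: drop each study in turn and re-pool the remainder
# (DerSimonian-Laird random-effects, as in the earlier pooling sketch).
import numpy as np

def pooled_or_and_tau(log_or, se):
    y = np.asarray(log_or, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    w = 1.0 / v
    mu_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fe) ** 2)
    df = len(y) - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_re = 1.0 / (v + tau2)
    return np.exp(np.sum(w_re * y) / np.sum(w_re)), np.sqrt(tau2)

log_or = [0.35, 0.50, 0.42, 0.61, 0.38]   # hypothetical study effect sizes
se = [0.08, 0.12, 0.05, 0.15, 0.10]       # hypothetical standard errors

for i in range(len(log_or)):
    rest_or = log_or[:i] + log_or[i + 1:]
    rest_se = se[:i] + se[i + 1:]
    pooled, tau = pooled_or_and_tau(rest_or, rest_se)
    print(f"without study {i + 1}: pooled OR = {pooled:.2f}, tau = {tau:.2f}")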

Discussion

General report of results

The general results lead us to assume that there is a measurable effect of age discrimination against older applicants in the recruitment process among the studies included in this review. This effect appears to be present already when the applicant's age is between 40 and 49; however, it seems to rise gradually with increasing age.

Limitation of evidence

Following the aforementioned risk of bias assessment, there are certain discussion points to consider when evaluating the overall explanatory power of our analysis. In some instances, it could be observed that applications were sent out simultaneously (e.g., Ahmed et al., 2012; Neumark et al., 2016). It could be argued that time randomization creates a more realistic replication of the in vivo hiring process and therefore decreases the chances of oversampling. However, it might also be considered that the applications, regardless of when they are received, must necessarily be processed by the recruiter at different points in time and are therefore eventually randomized either way. Also, since this concerns only two studies, we assume that it does not have any significant influence on our results. We consider the same argumentation valid for age randomization and callback platforms. Although we consider age randomization more important, it was omitted, or at least not further described, in only one study (Challe et al., 2015) and was thus interpreted as a minor inconvenience, especially considering that group sizes were equal in that experiment. However, we believe that shortcomings in the quality of the age manipulation and of the applications play a more central role in our overall assessment, and the assumptions of the respective items had to be rejected across several studies within these two factors. One item we considered especially problematic was the heterogeneity of applications (see item 4.1). In eight of the ten studies included in our meta-analysis, applications were altered with respect to features other than age. At the level of the individual study, this makes it more complex to evaluate whether hiring decisions and assessments were made for age-related or for other, unrelated reasons. Varying other applicant characteristics is often done to address the Heckman critique (Heckman, 1998), which implies that making matched fictitious resumes as similar as possible could bias inferences on age discrimination. While some studies try to address this issue in their designs by randomizing all work characteristics apart from the main variable of interest (Neumark et al., 2016; 2019a; 2019b), another possible solution is to aggregate the effect sizes of different studies in a meta-analysis. Related to this, it is important to address one particular concern of age discrimination research, namely the possible effect of experience on the likelihood of getting hired. Since experience naturally increases with age and itself increases the likelihood of being hired, true age discrimination might be masked by this confound. Being able to control for the effects of experience would increase the validity of the inferences made by these studies. Another item that was highly critical with regard to risk of bias was age salience. In seven cases, age was either not directly provided on the applications or no further information could be found in the respective studies.

We consider this especially important since age is the central unit of our meta-analysis and largely determines the meaningfulness of our results. We assume that the indirect signaling of age (e.g., through high school graduation years, see Neumark et al., 2019) might introduce bias if recruiters draw false conclusions about an applicant's age, for example by disregarding the age at which the applicant began their college education. Finally, we tried to establish whether anti-discrimination laws existed in the countries where the studies were conducted, as we considered that such laws could influence the age discrimination effects (see Neumark et al., 2019). In general, all countries had laws prohibiting recruitment discrimination based on age; however, nuances exist (e.g., in Switzerland), and it is necessary to distinguish which sectors have a higher degree of freedom in making recruitment decisions in order to properly appraise age discrimination effects.

Limitation of review process

We also acknowledge certain limitations of the review itself. In the earlier stages of the review, we exhaustively screened the available databases and resources for studies; however, we decided to integrate only the first 200 search results for each individual Google Scholar search and might therefore have missed a small number of studies that could have extended the relevant body of evidence. Moreover, although we sought, through thorough prior research, to capture all essential key terms in our search strategy, we might nonetheless have overlooked studies that fell outside our search terms. In addition, the search was restricted to articles in the English language. Especially since this review intended to screen for the moderating effect of culture, relaxing this restriction could have been an important way to extend our literature base. Furthermore, because of time constraints, we conducted the risk of bias assessment for each study individually, meaning that each study was assessed by only one rater. Although each assessment was still proof-read, approved or amended by the respective second author of this review, we acknowledge that this may have introduced an additional source of error. With regard to our meta-analyses, it can clearly be seen that the individual meta-analyses are in part underpowered, with three of them including only a single study (Meta-analyses 1, 2 and 6). This partially restricts the validity of our findings. We acknowledge that the corresponding odds ratios were unexpectedly high compared with the findings from the other meta-analyses, and these outcomes might be biased given their limited statistical power, especially considering their wide confidence intervals. In general, we were limited in accessing the available data, especially for less recent studies. This made it impossible to include some studies that would otherwise have been included after the full-text screening phase.

Conclusion and Implications

Findings from this review indicate that applicants' age has an observable effect on the likelihood of being hired, or of otherwise being considered hirable, by independent recruiters. This effect is especially pronounced for senior applicants around their 60s. In the current context of the ongoing political effort to extend working lives past the retirement age, this might prove largely unfavorable, as many potential employees are hindered from (re-)entering the labor market.

Comparable meta-analyses on the topic of hiring discrimination usually yield results similar to the findings of our review. Zschirnt and Ruedin (2016) considered the log odds ratios of 34 correspondence studies on ethnic discrimination in OECD countries between 1990 and 2015, finding that, on average, minority applicants receive lower callback rates than majority candidates, with odds ratios ranging from OR = 0.27 [ 0.17, 0.43 ] to OR = 0.94 [ 0.73, 1.12 ]. The only outlier in that analysis is Bendick et al. (1991), who obtained an odds ratio of 2.45 [ 1.86, 3.22 ]; however, this is explained by the fact that the Latino applicants in their study were given stronger qualifications in their applications. Similar results are provided by Flage (2019), who examined the magnitude of hiring discrimination between heterosexual and homosexual applicants in OECD countries. The author finds an odds ratio of 0.64, indicating that the odds of homosexual applicants being hired are 36 per cent lower than those of heterosexual applicants. Our meta-analyses mirror these findings, with odds ratios ranging from 1.06 [ 0.85, 1.33 ] to 3.45 [ 2.0, 5.95 ]; note that our ratios are expressed in the opposite direction, with values above one indicating an advantage for the younger comparator group, so they likewise imply an observable difference in treatment between younger and older applicants.
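As a brief aside on interpretation, an odds ratio below one can be read as a proportional reduction in the odds of a callback; using Flage's (2019) figure as a worked example:

\[
\mathrm{OR} = \frac{\text{odds}_{\text{homosexual}}}{\text{odds}_{\text{heterosexual}}} = 0.64
\;\;\Rightarrow\;\;
(1 - 0.64) \times 100\,\% = 36\,\% \text{ lower odds of a callback.}
\]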

Because of the narrow body of evidence, it would be highly advisable to extend our findings when more data become available. This also encompasses moving beyond the language restrictions of this review in order to capture sources we could not include. Moreover, the scope of this review is restricted to age discrimination against older people only. Findings from Bratt et al. (2018) and Finkelstein et al. (2012) suggest that age discrimination is often also targeted at younger applicants, since recruiters frequently appear to be biased against this group, especially across generations. The planned subgroup analyses were not conducted because of time constraints. It would be insightful for future literature to extend these aspects and thereby contribute to a more differentiated understanding of how age discrimination can be influenced by contextual factors. Finally, it would be interesting to examine whether certain stereotypes about older people positively influence their likelihood of being hired. Perhaps the positive stereotypes toward older people, e.g., being considered stable, loyal and trustworthy (Ng & Feldman, 2008), would benefit them in recruitment for occupations in which these traits are valued (e.g., doctors, commercial airline pilots). While the reviewed articles examined a range of occupations and subgroup analyses remain to be conducted, there is a need for new research that includes such occupations and focuses on a possible positive bias toward older applicants.

References
