Chapter 3. Social studies of science and economics: Previous research

5. The sociology of valuation and quality judgement in science

I have now discussed the literature on styles of reasoning and its application to economics extensively. I have also discussed Bourdieu's notion of a scientific habitus as a locus for the reproduction of styles by individual actors in scientific fields. However, while styles are reproduced by social actors, these actors are not just passive dopes repeating a learned pattern of behaviour. The commitments and dispositions of scientists are reproduced through social practices, and a central such practice is the activation of the embodied dispositions of the habitus in quality judgement, above all in formal peer review processes. This section provides a review of a broad literature on the evaluation of quality in various scientific peer review processes. It starts with a review of selected studies on peer review processes and quality judgements, and then turns towards a more general literature that considers valuation and evaluation as general social phenomena. Finally, it briefly turns to recent work on the epistemic impact of bibliometric indicators on science, and to applied work using expert evaluation reports from academic hiring and promotion as empirical material.

The judgement and evaluation of scientific quality in peer review processes forms the backbone of modern science. Quality evaluation takes place in settings at different levels of the academic system. First, at the macrolevel, we find large-scale evaluations of national research systems, research policy and scientific areas.

Second, in the distribution of research grants, public or private funding agencies evaluate everything from large, long-term interdisciplinary collaborative projects to smaller grant applications from individual scholars within single disciplines or research fields.

Third, in processes of academic hiring and promotion, expert evaluations of the merits of applicants rank individuals and provide the foundation for fair and meritocratic processes. Fourth, and increasingly important as bibliometric measures come to play a greater role, everyday microlevel quality judgements are made about scientific articles by journal reviewers and editors. To this we could add the everyday personal evaluation of papers by researchers, evaluations that generate citations when papers are found relevant and interesting.

Although the setting and the object of evaluation vary, the principles of quality evaluation are similar. Peer experts in the scientific field (or, in interdisciplinary settings, from another scientific field) use their judgement, acquired through training and practice, to evaluate the quality of the object in question against a set of implicit or explicit criteria. The outcome of the process affects the distribution of academic recognition and resources. Studies of these processes use a range of approaches and focus on different aspects. In the following, I review the relevant literature, divided into five parts.

First is a range of Swedish studies on scientific quality judgement, mainly from the mid-1980s to the 1990s, some of which use expert evaluation reports. This literature focuses mainly on explicating the quality criteria used by experts. Second, a number of writers in the Swedish context since the early 2000s adopt a primarily Bourdieusian perspective, with a focus on the conservative effect of a system that perpetuates its doxa through quality judgements that rely on experts' professional habitus. Third, in the literature that one could perhaps call post-Bourdieusian, inspired both by authors like Michele Lamont and, to a significant degree, by the sociology of science, the focus shifts from quality criteria and the system-preserving effects of peer evaluation to the complex practices of quality judgement as a process where notions of quality are applied and negotiated. In the fourth part, I briefly widen the perspective with some recent works that discuss the sociology of valuation and evaluation as more general social phenomena, to show that the evaluation of scientific quality is an instance of a wider class of phenomena. The fifth and final part discusses recent literature at the intersection of bibliometrics and science studies, focused on the epistemic impact of quantitative bibliometric indicators and their use as judgement devices in research evaluation practices. The review will not cover the large literatures on, for example, quantitative studies of research quality evaluation or gender bias, but will be limited to aspects that are relevant for the present chapter: the use of expert evaluation reports (sakkunnigutlåtanden), the conservative effect of habitus in quality judgement, the practice of applying quality criteria in evaluation, and the use of bibliometrics as judgement devices.

Pioneering studies of quality criteria

Pioneering the Swedish studies on scientific quality judgements were a group of psychologists in Gothenburg (Hemlin, Montgomery and Johansson), who conducted a first interview study (published in Swedish in 1985) on conceptions of scientific quality, and a later, similar study using peer expert evaluations (Hemlin and Montgomery 1990, 1993; Nilsson 2009:141). The interview study (Hemlin and Montgomery 1990) first extracted a set of quality criteria from the scientific literature, and then investigated how these were actually understood through interviews with professors from different faculties. The quality criteria were divided into six aspects of the research to be evaluated (problem, method, theory, results, reasoning, and writing style), each judged in terms of eight different attributes (correctness, novelty, stringency, intra-scientific effects, extra-scientific effects, utility in general, breadth, and competence). While the interview study mainly emphasises the generality of quality conceptions, in terms of "consistency in views from different disciplines", the authors also note a marked difference in how "theory" is emphasised at different faculties (Hemlin and Montgomery 1990:80).
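To make the structure of this coding scheme concrete, the following minimal Python sketch (my illustration of the scheme's structure, not a reconstruction of Hemlin and Montgomery's actual coding procedure) enumerates the 6 x 8 = 48 aspect-attribute cells to which an evaluative statement could in principle be assigned:

    from itertools import product

    # Aspects of the research under evaluation (Hemlin and Montgomery 1990)
    ASPECTS = ["problem", "method", "theory", "results",
               "reasoning", "writing style"]

    # Attributes in terms of which each aspect can be judged
    ATTRIBUTES = ["correctness", "novelty", "stringency",
                  "intra-scientific effects", "extra-scientific effects",
                  "utility in general", "breadth", "competence"]

    # Every evaluative statement can in principle be coded as one
    # (aspect, attribute) cell in the grid, e.g. ("theory", "novelty")
    coding_grid = list(product(ASPECTS, ATTRIBUTES))
    print(len(coding_grid))  # 48 possible combinations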

In their subsequent study of peer evaluation reports, 31 evaluation reports on professorship candidates at five faculties were coded according to four categories (including the already mentioned aspects and attributes) in a semi-quantitative analysis. The authors maintain in this study that scientists across faculties use "approximately the same conceptual system" in evaluating scientific quality, but that the "stress laid on particular components . . . may vary across disciplines" (Hemlin and Montgomery 1993:20). The authors also find a clear distinction between what they call "soft" and "hard" sciences, where evaluators in the former write longer evaluations, concentrate more on individual publications, and emphasise the theory aspect to a greater extent. This could be interpreted, they argue, as the difference between Kuhnian normal science ("hard") and pre-paradigmatic ("soft") sciences, or between Whitley's "restricted" and "configurational" sciences. "Restricted" sciences in this sense share theoretical ideals and conceptual assumptions, and use mathematical formalism to a greater extent. The same general approach to studying scientific quality is also presented in a more recent article by Hemlin (2009).

The public availability in Sweden of expert evaluation reports from academic hiring and promotion (sakkunnigutlåtanden) has also been exploited by historians of ideas as a rich source of primary material. A good example is Rangnar Nilsson's recent doctoral dissertation (2009). Her study uses sakkunnigutlåtanden to examine conceptions of good science and how they have changed during the post-war years in three disciplines: literature, political science and physics. Drawing largely on the methodological framework of Montgomery and Hemlin, Nilsson (2009:50) shifts the focus from the universality of quality criteria to their differentiation over time and between disciplines. She notes that conceptions of good science and quality criteria are normally not fully explicit, but must be explicated in the writing of evaluation reports, as reviewers need to form arguments and use them as rhetorical devices. Since the expert reviewers are senior scholars chosen as legitimate authorities within their fields, they are representatives of their disciplines, and their reasoning must take the disciplinary audience into account in the choice of relevant and reasonable arguments. Evaluation reports thus provide an excellent window into the conception of science at work in a particular time and discipline. Nilsson's study is methodologically thorough and provides qualitative depth, and her main contribution is to show how the supposedly universal criteria found by Hemlin and Montgomery turn out to be applied with variations across time and discipline. However, the common focus of all the authors discussed here is the explicated quality criteria found in evaluation reports or in interviews; the problems of the evaluation process itself fall outside the scope of this literature.

Old boy-ism, habitus and potential conservatism in peer review

A second set of studies is driven by a strong Bourdieusian theoretical influence, and a focus on the potential system-conserving properties of evaluation systems relying on peer judgement. Since the first studies of scientific evaluation in the 1970s, a central question has been whether the peer review system actually functions according to the universalist norm codified by Merton (1973), or if there is anything to the frequent suspicions of "old boy-ism", the nepotistic bias towards members of one's own social network (Gemzöe 2010). A central aspect here has long been gender, and the question of the existence and extent of discrimination against women in academia, discrimination which a range of studies have shown to exist (for reviews, see Gemzöe 2010; Mark 2003). One effect has been an interest in studying scientific quality from a gender perspective. For example, a series of reports commissioned by Gothenburg University in 2003 investigated gender equality in academic recruitment practices using different approaches.

One of these reports used discourse analysis to illuminate the gendered language used in sakkunnigutlåtanden (Gunnarsdotter Grönberg 2003), and the philosopher Eva Mark (2003) has provided a conceptual analysis of the evaluation and recruitment process, relying strongly on the conceptual framework of Bourdieu. Mark argues that Bourdieu's theory of practice, with its notion of the habitus as internalised and embodied dispositions, shows how an extra-individual system is reproduced through the practices (in this case, the quality judgements) of individuals. Scientific quality judgement is a form of practical or tacit knowledge: it is learnt through practice, and competence does not necessarily entail the ability to explicate how the practice is performed or the principles behind it. The evaluator may not even be fully aware of the criteria he or she puts to use, or they may not be the criteria he or she explicitly claims to use (Mark 2003:59).

A recent doctoral thesis by Ingegerd Gunvik-Grönbladh (2014) studies the evaluation of pedagogical skills empirically in sakkunnigutlåtanden using a similar Bourdieusian approach, constructing habitus based on actors' positions in the academic field and in terms of cultural (academic) capital, measured by proxy indicators for academic authority. For both authors, Bourdieu's theoretical framework provides a general model of how conservative and field-perpetuating forces potentially arise through the central role of the habitus in peer review quality judgements. While both authors mainly follow Bourdieu and his notions of habitus and doxa, Mark also voices serious concerns about the lack of reflexivity, and of potential for actors to initiate social change, in his conception of the habitus. Furthermore, the notion of doxa as the taken-for-granted common ground of a field is adopted without modification from Bourdieu. These are two areas where scholars in the next set of works develop both empirically and theoretically beyond the Bourdieusian understanding.

Quality judgements as practice and cognitive particularism

The third theme can be found in a number of different studies that draw together insights both from the sociology of scientific knowledge and from a post-Bourdieusian conception of quality judgement as practice. Lena Gemzöe (2010) has provided a valuable and up-to-date literature review of studies on peer review for the expert group on gender at the Swedish Research Council (Vetenskapsrådet). The study draws heavily on a seminal study by Travis and Collins (1991), the work of the Norwegian scholar Liv Langfeldt, and the work of Michele Lamont and colleagues. These all share a basic approach to studying science with the present study, and furthermore provide insights that are important for understanding the role of the peer review process in modern economics, a discipline characterised by, on the one hand, a broad cognitive consensus and, on the other, the mainstream-heterodoxy divide and the presence of a minority of heterodox scholars.

Travis and Collins (1991) utilise insights from the sociology of scientific knowledge to further the understanding of peer review processes and the types of potential bias these may generate. They show that earlier studies of potential bias in peer review have conflated two types of potential bias: cognitive and social. The latter is close to the common-sense notion of bias, whereby reviewers are biased with regard to, for example, social position (e.g. gender), institutional affiliation, or social network (e.g. old boy-ism). However, in line with the sociology of scientific knowledge, the authors argue that we must understand science not primarily as a social, but rather as a cognitive structure.48 They coin the term "cognitive particularism" to pinpoint a more relevant aspect of potential peer review bias, whereby reviewers "sometimes make decisions based upon their membership in scientific schools of thought", so that biasing occurs not primarily on the basis of institutional or social position, but on the basis of cognitive similarity (Travis and Collins 1991:323).

48 This shift echoes the move from a Mertonian sociology of science, which emphasises the social organisation of science, to a Kuhnian sociology of scientific knowledge, where the cognitive structure of a paradigm, discourse, style of thought, etc. becomes the main object.

They furthermore argue that this may or may not be of great importance, depending on the level of consensus and the cognitive boundaries in the specific area of science. If there is widespread consensus and no clear boundaries, "all the scientists working in the area share a similar conception of the current paradigm", whereas "where there are well-defined cognitive communities, on the other hand, the more pronounced the divisions, the greater will be the effect on the development of science of drawing reviewers from one side of a cognitive boundary rather than the other" (Travis and Collins 1991:327, 328). While there may be strong links between social and cognitive organisation, that is, connections between a social group and a way of thinking, this primarily holds true at the macrolevel. On closer inspection, the authors argue, we may well find that cognitive boundaries do not map completely onto social or institutional boundaries, as when scientists from a dissenting school of thought find themselves scattered in small numbers across university departments. The implications for analysing the role of the peer review system in economics should be apparent: in such a situation, cognitive particularism will potentially work against heterodox economics.
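The force of this point can be illustrated with a back-of-the-envelope calculation (my illustration, not Travis and Collins's): if a minority school makes up a fraction p of the qualified reviewer pool and reviewers are drawn at random, the probability that a submission from that school receives no reviewer who shares its way of thinking is (1 - p)^k for k reviewers.

    # Illustrative sketch, not from Travis and Collins (1991): probability
    # that none of k randomly drawn reviewers belongs to a school of thought
    # that makes up a fraction p of the reviewer pool.
    def prob_no_sympathetic_reviewer(p: float, k: int) -> float:
        return (1 - p) ** k

    # A minority school at p = 0.1 of the pool, with k = 3 reviewers:
    print(round(prob_no_sympathetic_reviewer(0.1, 3), 3))  # 0.729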

The cognitive aspects of peer review have also been emphasised in work on scientific quality judgement by Michele Lamont, culminating in her monograph on quality judgement in interdisciplinary grant panels, How Professors Think (Lamont 2009). She emphasises the fundamentally diversified nature of quality concepts and of the practices of quality judgement and evaluation, and points to the existence of different epistemic styles in different disciplines. These epistemic styles do not, however, follow disciplinary boundaries neatly, and sometimes cross them. Evaluation is here seen fundamentally as a social practice, and quality judgements as outcomes that are not predetermined. For example, a central finding is the way that experts across disciplines manage to deliberate and reach agreement despite their varying conceptions of quality and evaluation. Similar work has been done by Liv Langfeldt (2004, 2006), showing how quality evaluation is made up of somewhat messy practices, relying on tacit knowledge and silent agreements as consensus is reached in cooperative panels.

The important insight from this strand of work is that quality evaluation cannot be reduced to a set of criteria like "originality", as in the early studies by Hemlin and others, because such criteria are empty in themselves. The interesting question is how a criterion is applied in practice by experts, and how these practices vary. Criteria are resources that experts use to achieve an outcome, a valuation in accordance with their judgement in a specific epistemic style, in a social practice that is highly institutionalised, although it may be institutionalised in a variety of ways. This insight lines up well with the idea of the disunity of science and of distinct styles of reasoning: the fundamental notion that there is not one monolithic science but several scientific approaches, which may or may not align well in any particular case. However, an important message of both Lamont's and Langfeldt's work is also the possibility, after all, of communication and the establishment of consensus across epistemic styles.

The recent literature on scientific quality evaluation also connects to a growing, more general literature on valuation and evaluation as a broader class of basic social phenomena, focusing on the processes whereby actors produce classifications and establish the value of some object, ranging from art and commodities to scientific oeuvres. A thorough review of this literature is provided by Lamont and colleagues (Beljean, Chong, and Lamont 2016; Lamont 2012b), bringing together insights on the social nature of evaluation from economic sociology (for example Fourcade 2011) and on classification practices from cultural sociology (Beljean et al. 2016). The general social phenomenon of (e)valuation then includes the various processes by which social actors construct, use, maintain, or justify symbolic/cognitive classifications or schemes, and the ways social phenomena are sorted and classified by means of them.

The epistemic impact of bibliometric indicators and judgement devices

A final strand of literature, which also draws to some extent on the literature on evaluation practices, comprises recent work, primarily by professional bibliometricians, at the intersection between bibliometrics and science studies. I refer particularly to work on the role of bibliometrics and bibliometric indicators in scientific evaluation practices. For example, the epistemic impact of bibliometric indicators has attracted much recent attention in this field (Castellani, Pontecorvo, and Valente 2016; Rijcke et al. 2016). This attention has followed partly from the increasing prevalence and use of bibliometrics in research evaluation at all levels, a development that has even been called a "metric tide" (Wilsdon et al. 2016), and that has been countered by calls from bibliometricians for the sensible use of quantitative data in tandem with peer review in evaluation practices (Hicks et al. 2015).

The integration of bibliometric indicators in peer review has been studied using Swedish expert evaluation reports in recent work by the Swedish bibliometrician Björn Hammarfelt and colleagues (Hammarfelt 2017; Hammarfelt and Rushforth 2017). Hammarfelt and Rushforth (2017) suggest that the use of bibliometric indicators, like citation counts, the h-index, and journal rankings, can fruitfully be understood in terms of judgement devices. They borrow this concept from Lucien Karpik, who used it to denote the methods and devices employed in the valuation of intangible goods, like unique pieces of art. A judgement device is some form of device that an evaluator can use as a tool to support and simplify qualitative judgement and classification, like the appellations of wine producers or various forms of ranked lists. The authors argue that the use of such indicators should be understood not as an opposite of traditional peer review that has come to replace it, but as an integrated aspect of peer review, a tool used by evaluators to form judgements on scientific oeuvres.
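To give a concrete sense of such a judgement device, the h-index compresses an entire citation record into a single number: the largest h such that the author has h publications with at least h citations each (Hirsch's definition). A minimal sketch in Python follows; the citation counts are invented for illustration.

    # The h-index: largest h such that at least h publications
    # have h or more citations each.
    def h_index(citations: list[int]) -> int:
        ranked = sorted(citations, reverse=True)
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # An invented oeuvre of seven papers with these citation counts:
    print(h_index([25, 8, 5, 3, 3, 1, 0]))
    # 3: the top 3 papers each have >= 3 citations,
    # but the top 4 do not each have >= 4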

An important finding of their work, comparing the fields of biomedicine, history and economics, is the variable way in which evaluation and the use of indicators are put into practice across disciplines. These are interesting and novel studies, pointing to the understudied nature of this type of evaluation of scientific oeuvres, which plays a crucial role in academic careers and, arguably, in the reproduction of thought collectives. The role of journal rankings is mentioned in passing as traditionally having a special status in economics, although the authors do not focus on this particular aspect in their studies. On the other hand, journal impact factors (JIF), as a quantitative measure of the status of publication outlets, are used in both economics and biomedicine (Hammarfelt and Rushforth 2017).

The role of top journal rankings in maintaining the disciplinary mainstream has also increasingly been discussed by heterodox economists, for example in relation to the United Kingdom's Research Assessment Exercise, and has resulted in attempts to produce alternative rankings of heterodox economics journals (Lee 2009; Lee et al. 2010; Lee, Pham, and Gu 2013). There has also been considerable interest within the economics discipline in constructing rankings of the discipline's top journals (Kalaitzidakis, Mamuneas, and Stengos 1999, 2003, 2011), often focusing on the technical aspects of producing better rankings. However, whether produced by mainstream or heterodox economists, these studies and ranking exercises are further evidence of the importance of rankings as such in the economics discipline.