
Linköping University | IDA Master’s Thesis, 30hp | Cognitive Science

Spring 2020 | LIU-IDA/KOGVET-A--20/011--SE


In Search of Pleasure -

Decision-Making in Uncertainty

Leo Kowalski

Supervisor: Erkin Asutay, Assistant Senior Lecturer, IBL, Linköping University

Examiner: Arne Jönsson, Professor, IDA, Linköping University



Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication, barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, download, or print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/.


Abstract

Every day, people seek pleasant experiences, acting in ways they believe will lead to emotional gratification. By engaging with the world, people learn which actions lead to which outcomes and use this information to navigate their environment. Ultimately, this knowledge helps people improve their decision-making in order to increase positive emotions.

This paper investigates this process of learning about and seeking out pleasant experiences. To understand the choices people make, I propose a theoretical framework of decision-making based on two components: learning and emotions. Reinforcement learning provides a computational model for how agents receive information from the environment about the outcomes of different actions. This form of learning is coupled with psychological theories of emotion, explaining how affect plays an important role in shaping decisions.

The paper describes research conducted to empirically test the validity of this approach. In the experiment, participant affect was measured during a gambling task in which participants made decisions under uncertainty. Results from the study suggest that people do in fact have affective responses to choice outcomes, and that they are able to utilize this information to improve decision-making.


Acknowledgment

I would like to thank my family and friends for their support and everyone at this institution for contributing to my education.


Table of Contents

Copyright
Abstract
Acknowledgment
1. Introduction
2. Theoretical Background
   2.1 Decision-Making
   2.2 Learning
      2.2.1 From Conditioning to Reinforcement Learning
      2.2.2 Reinforcement Learning and the Brain
      2.2.3 Neural Processing of Learning Signals
      2.2.4 A Deeper Perspective
   2.3 Affect
      2.3.1 Valence-Arousal Model
      2.3.2 Integral and Incidental Affect
   2.4 Additional Relevant Concepts
      2.4.1 Explicit and Implicit Learning
      2.4.2 Dual-Process Theories
      2.4.3 Exploration vs. Exploitation
3. The Research
   3.1 Goals and Questions
   3.2 Paradigm
   3.3 Methods
      3.3.1 Participants
      3.3.2 Experimental Procedure
      3.3.3 Data Analysis
4. Results
   4.1 Learning and Performance
      4.1.1 Learning Blocks
      4.1.2 Estimating Value
      4.1.3 Final Choice Block
   4.2 Affect Results
5. Discussion
   5.1 Learning and Performance
   5.2 Affect
      5.2.1 Valence
      5.2.2 Arousal
   5.3 Limitations
      5.3.1 “Missing Data Points”
      5.3.2 Self-reported Affect
   5.4 Future Directions
   5.5 Contribution
6. Conclusion
References
Appendices
   Appendix A
   Appendix B


1) Introduction

“Choices are the hinges of destiny” - Edwin Markham

Decisions shape the course of our lives. The choices we make greatly influence our health, relationships, lifestyle, and overall quality of living. Given that decision-making is such a critical tool of cognition, it is important to understand how it functions. Why do people choose the things they do? What factors influence the decision-making process?

The overall literature strongly suggests that emotions play an important role in decision-making. Emotional states provide us with information about what objects in the environment are desirable and motivate us to pursue them. This paper aims to more closely explain the interaction between emotions and choice by combining two existing theories into a coherent framework. The framework proposes that affective reactions to events shape future decisions by teaching people to act in ways that increase pleasant emotions. Results from a study testing this approach are presented, providing initial support for its validity.


2) Theoretical Background

This section outlines the main theoretical concepts related to this work and research. I briefly introduce the core aspects of decision-making, followed by a deeper explanation of learning and affect and how they fit together in a larger framework.

2.1 Decision-Making

Decision-making is a critical cognitive process that significantly influences our lives. The food we eat, the partner we date, and the job we work are all dependent on the choices we make. Essentially, decision-making is a process by which a person identifies and chooses a particular action among several options: Should I take the bus or ride the train? Dinner at the restaurant or take-out? Needless to say, there are many factors that affect decision-making, and there is no shortage of theories aiming to explain this phenomenon.

Research since the 1950s provides evidence that many psychological processes play a role in explaining the choices people make, for example attention, memory, and learning (Weber and Johnson, 2009). In recent decades, automatic and affective processes have received increased attention, giving rise to dual-process theories of decision-making. These theories suggest that a deliberate, analytic process makes decisions together with a more automatic and emotionally driven one (Gawronski and Creighton, 2013).

In this research, I am mainly interested in the role that learning and affect play in decision-making under uncertainty. Previous experience with a situation has a large impact on future choices, helping people avoid past mistakes and repeat favorable results. By exploring and learning from the environment, people acquire knowledge that influences the decision-making process. Affective processes also provide valuable information about what decision is likely to be preferable, as well as motivating people to pursue this option (Schwarz, 2011).


As will be explored, learning and affect are tightly linked, and it is possible to connect these processes into a coherent theoretical framework. The following sections explain this framework and how it relates to decision-making by introducing each concept separately and subsequently bringing them together.

2.2 Learning

A huge topic within psychology, learning can be broadly defined as a persistent change in behavior or mental representation as a result of experience. For instance, burning your hand changes your future behavior so that you avoid touching hot stoves, while attending math class changes the way you mentally represent numbers and symbols. Oftentimes changes in behavior and internal representation are interdependent and cannot be wholly separated. When a friend explains the location of a coffee shop, the listener updates their internal map of the neighborhood, which enables them to physically find their way there. Though there are many kinds of learning, the following sections will focus on those that are relevant to this research.

2.2.1 From Conditioning to Reinforcement Learning - A Brief History

Few psychologists have had as much impact as Ivan Pavlov, whose experiments on salivating dogs gave rise to the foundational idea of conditioning (Clark, 2004). At its core, conditioning refers to a learning process in which a given stimulus comes to reliably evoke a specific response. This is a form of associative learning in which agents learn about regularities in the environment that enable them to respond in specific ways.

For instance, after pairing the sound of a bell with the arrival of food shortly thereafter, Pavlov's dogs started salivating in response to the sound of the bell - they had been conditioned to expect food after a specific stimulus. In short, the dogs had learned something: they exhibited a persistent change in behavior (salivating) as a result of experience (observing the pairing of bell and food several times).


The idea of conditioning became very influential in psychology, especially within the school of behaviorism. B.F. Skinner, as a prominent example, conducted experiments on rats in order to more closely understand this concept (Berridge, 2000). He discovered that certain responses from the environment would greatly affect the behavior of rats. For example, if pushing a lever in the rat box produced a food pellet, rats would push this lever at a much higher rate than if it did not. The food pellet is referred to as a reinforcer - an environmental response that increases the probability of repeating a behavior.

This is contrasted with punishers, which are environmental responses that decrease the probability of repeating a behavior. If an action regularly triggered a shock, rats learned to avoid that behavior. As can be seen, rats are conditioned to respond in certain ways depending on regularities they experience and observe in the environment.

This type of learning is known as operant (or instrumental) conditioning, in which the actions of agents are what is reinforced or punished by environmental stimuli. When rats act in certain ways - pushing levers or drinking from a particular bottle - it is this action that becomes more or less likely to be repeated depending on the response it evokes.

Importantly, Skinner did not propose a psychological construct or theoretical framework for why or how this process works. Why are certain things reinforcing while others are punishing? What is the internal representation of the environment that allows agents to alter their behavior? These questions, which behaviorism was unfit to answer, are critical to this research and will be explored in subsequent sections.

Over the years the idea of operant conditioning was developed and refined, and in recent decades reinforcement learning has emerged as a successful method for modeling this kind of learning. Stemming from the field of computer science, reinforcement learning (RL) involves computational algorithms that calculate how an agent should act in order to maximize the amount of reward in a given environment (Kaelbling et al., 1996). (The notion of reward is similar to a “reinforcer” and is explained shortly.) At a basic level, agents learn to approach and commit to actions that increase reward, while avoiding actions that decrease reward.


This is exactly the problem rats learn to solve in Skinner’s box - which levers to push or avoid in order to get as many food pellets as possible without getting a shock.

One reason for the success of reinforcement learning is its increasingly sophisticated algorithms, which can explain complex behaviors and problems. For instance, model-based reinforcement learning uses algorithms that model agents' internal representations and calculates how these states affect decision-making.

This kind of modeling is capable of explaining behaviors of animals that the behaviorist stimulus-response model is unable to account for (Dayan & Berridge, 2014).

Reinforcement learning is also capable of handling the credit-assignment problem, which concerns how credit should be assigned when a series of actions results in a reward (Maia, 2009).

Not only is RL a computationally powerful way of accounting for a wide range of behaviors, there are also biological reasons as to why reinforcement learning may be an actual learning mechanism for many organisms. Firstly, evolutionary biology provides a cogent explanation for why certain actions and outcomes are more or less rewarding. Additionally, the computations proposed by reinforcement algorithms map very closely onto neural processing in the brains of animals and humans. The next section will explain this in further detail.

2.2.2 Reinforcement Learning and the Brain

As stated, the core problem that reinforcement learning attempts to solve is how agents maximize the amount of reward in a given environment. Reward can be seen as an abstract value attached to certain objects, actions, or outcomes, allowing agents to compare the outcomes of different actions and choose the one with the highest value. Depending on the task and environment, reward value can be associated with different elements. In a game of chess, a good RL algorithm will assign higher reward values to moves that increase the likelihood of winning the game, while an algorithm for betting on the stock market will attach high value to actions that are likely to earn more money. In this way, RL guides agents in their environment toward outcomes with high value and away from actions that result in low reward value.


In order to better understand how RL is related to human and animal decision-making, it is helpful to look at it from an evolutionary perspective. In evolutionary terms, rewards are those outcomes in the environment that increase the chance of survival and reproduction; organisms that are better able to seek out these rewards are more likely to pass on their strategies to future generations. As Schultz (2015) succinctly explains about rewards, “Their function is to make us eat, drink, and mate. Species with brains that allow them to get better rewards will win in evolution.” This explains why outcomes such as food pellets are reinforcing for rats, while getting shocked is punishing - consuming nutrition is more evolutionarily advantageous than receiving a shock. Seen from this perspective, it should be clear that organisms fighting for survival are in many ways also attempting to solve the problem of reinforcement learning by maximizing the amount of evolutionarily advantageous outcomes in their environment.

The question remains as to how this process of seeking rewards is carried out in animals and humans. Just because evolutionary pressures present a somewhat similar challenge to RL does not mean that the solutions found in RL algorithms are meaningfully similar to how organisms solve the problem of maximizing reward. Interestingly, decades of research have indicated that there is indeed a close relationship between RL and the nervous system. At a basic level, brain regions that react to primary rewards such as food, water, and sex have been found in a wide range of animals (Loonen & Ivanova, 2016).

This “reward system” in the brain has the function of guiding organisms toward outcomes that result in passing on genes to the next generation. In terms of RL, an evolutionarily successful reward system is one which attaches high value to actions necessary for gene survival (those leading to nutritious food and mating opportunities), while attaching lower value to actions that lead to danger (such as approaching predators or eating poisonous foods). Just like a chess computer navigates a game and chooses the highest-value action in order to win, animals navigate the natural environment by choosing the actions their reward system values most highly.


Of course, simply having brain regions that activate in the face of immediate benefit or danger is not sufficient evidence that reinforcement learning is the mechanism actually at play when people learn about the environment and make decisions. As will be shown in the following section, the relationship between RL algorithms and neural activity is significantly deeper than this.

2.2.3 Neural Processing of Learning Signals

This section outlines a well-established theory of how the brain’s reward system instantiates RL algorithms. In one type of reinforcement learning, Temporal Difference Learning, agents make predictions about what value they will receive if they commit a certain action (Niv, 2009). The idea of prediction is critical; by predicting the outcome of various actions, agents can then choose the action they predict to have the highest value. Temporal difference learning adds an extra dimension to this idea by introducing the concept of prediction error. A prediction error signals to the agent that there is a mismatch between the anticipated value of an action and the actual outcome received. If you think a move in blackjack will earn you 10 dollars and you instead win 15, this is a positive prediction error - the actual outcome had higher value than the model predicted.

Positive prediction errors suggest that this kind of move should be assigned a higher reward value and thus be more likely to be repeated in the future - having an unexpectedly good meal makes you more likely to return to that restaurant. As a corollary, negative prediction errors (when the actual outcome has a lower value than the model predicted) update future predictions of this action to be lower, making it less likely to be repeated. In this way, temporal difference learning allows agents to learn from “mistakes” and adjust their internal model to more accurately represent actual reward values. By continuously interacting with the environment, agents gradually refine their predictions, enabling them to more effectively maximize reward.
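To make the update rule concrete, the following sketch in R (the language later used for this thesis's analyses) implements a minimal temporal-difference value update for a single repeated action; the learning rate and starting value are illustrative assumptions, not parameters drawn from any study.

    # Minimal temporal-difference value update for a single action.
    # 'alpha' (learning rate) and the starting value are illustrative choices.
    td_update <- function(value, reward, alpha = 0.1) {
      prediction_error <- reward - value  # positive if the outcome beats the prediction
      value + alpha * prediction_error    # nudge the estimate toward the outcome
    }

    # Example: a blackjack move predicted to be worth 10 dollars actually pays 15,
    # so the positive prediction error pushes the estimate above 10.
    v <- td_update(value = 10, reward = 15)

Repeating this update over many outcomes moves the value estimate toward the action's average payoff, which is the sense in which the agent learns from its “mistakes”.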


One reason temporal difference learning has become very influential in cognitive neuroscience is that a large and robust body of research has found a tight link between prediction error signals and dopaminergic activity. Though dopamine has long been associated with rewarding stimuli, seminal work by Schultz et al. (1997) showed that dopaminergic neurons do not simply signal motivational value; their firing pattern actually resembles prediction error signals.

Their experiment was an instrumental conditioning task in which monkeys were conditioned to pull a lever after seeing a visual stimulus in order to receive a rewarding sip of fruit juice. In early trials, neural activity spiked immediately following the sip of juice. This has been interpreted as a positive prediction error - the monkey had no reason to expect a reward after pulling the lever, so it was positively surprised by the sugary liquid.

Over the course of many trials, however, the neural activity changed. After conditioning, the neural spike shifted from occurring after the sip of juice to occurring after the light stimulus. The sip of juice (reward) no longer elicited a dopamine response, while the flashing light (conditioned stimulus) caused an increase in dopaminergic activity. This spike of neural activity acts as a prediction that a reward is soon to follow, motivating the monkey to retrieve it.

This is exactly the computational pattern suggested by temporal difference learning. A reward that is larger than expected produces a positive prediction error, signaled by dopaminergic neurons upon receiving a surprising sip of fruit juice. After conditioning, the signal occurs at the time of the predictive stimulus, informing the agent that a reward is likely present in the environment.


Figure 1: How dopaminergic neurons encode prediction errors in learning tasks; the red line represents the activity of dopaminergic neurons. Top: before learning, an increase in dopamine activity is observed following the reward - a positive prediction error. Middle: after repetition, when the reward is anticipated, the increase in dopamine follows the predictive stimulus; since the reward is accurately predicted, no dopamine spike occurs after the reward. Bottom: after conditioning, when the reward is omitted, a decrease in dopaminergic activity is observed at the point in time when the predicted reward was expected - a negative prediction error.


This experiment shows the multifaceted role that dopamine and prediction errors play in shaping behavior. Rather than simply being a “pleasure chemical”, dopamine is more accurately thought of as a learning and motivational signal. Dopamine reacts to unexpected rewards in the environment, informing agents about their value, and also comes to signal predictive stimuli that regularly precede those rewards (Bromberg-Martin et al., 2010). It is this basic pattern that allows organisms to connect various sensory inputs (be it the visual appearance of a cookie or the smell of rotten food) with an associated reward value, providing information about how motivated the agent should be to approach or avoid the object.

This process is likely due to the effect dopamine has on neuroplasticity. Essentially, dopamine acts as a sort of “glue” that binds active neural connections closer together so that they are more likely to trigger in the future (Glimcher, 2011). In short, the brain’s activation pattern at the time of dopamine release is strengthened. When eating a cookie, the neural representation of the cookie - its visual appearance, taste, semantic category etc. - gets reinforced through the spike of dopamine rushing through the system. When this neural pattern is activated by future encounters with cookies, it is more likely to trigger the action of consuming this reward. In this way, dopamine binds together the process of learning about rewards with motivating agents to collect them in the future. Very likely, this is the neural basis of reinforcement learning.

2.2.4 A Deeper Perspective

So far, reinforcement learning has been investigated at a computational level, with either computer programs or neural processing instantiating these computations. One limitation of this approach is that computations are fundamentally abstract - the numbers and values that go into the equations are just computer code or patterns of brain activity. This is sufficient if the goal is simply to solve a problem or explain behavior, but in this case we are interested in the motivating factors of actual living organisms making decisions. What does a prediction value actually refer to for a human being? What psychological construct are dopamine neurons coding? What exactly is the “reward” animals try to maximize?


Fundamentally, reinforcement learning and related computational models boil down to Thorndike’s law of effect - “responses that produce a satisfying effect are more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation” (Gray & Bjorklund, 2018). The language used by Thorndike, referring to satisfying or discomforting effects, suggests that the rewards sought by agents are related to their emotional state. In this sense, a reward is something that results in positive emotions, such as pleasure and satisfaction. Conversely, negative reward values are those that induce negative emotions, such as pain and anger.

Thus, reward values ultimately signal the predicted affective state of taking certain actions. Pleasant experiences teach agents that whatever action or circumstance led to them is good and should be repeated in the future, while unpleasant experiences teach agents to avoid those actions in the future. Choosing to engage in activities with positive reward values simply means that an agent anticipates experiencing positive affect. In this way, what agents attempt to maximize by their decisions is their affective state; the content of reward values refers to emotional impact. As will be explored further, reward value is a concept that ties together learning and emotional cognitive processes.

2.3 Affect

Affect refers to the felt experience of an emotion or mood. This section presents two aspects of affect that are essential for the research - the valence-arousal model and the distinction between integral and incidental affect.

2.3.1 Valence-Arousal Model

An influential model in affective psychology describes affective states along two dimensions - valence and arousal (Posner et al., 2005). Valence refers to the hedonic quality of the affective state, whether it is positive or negative. Positive valence refers to generally pleasant and desirable states, such as joy and cheerfulness, while negative valence refers to unpleasant states, such as sadness and anger.


Arousal, on the other hand, refers to the intensity or “activation” of the experienced emotion. Melancholy and despair are both negatively valenced, but most would agree there is a difference of intensity between them; this difference in intensity is modeled by the degree of arousal. (Of course, the model does not account for the full richness of the human emotional experience; despair and melancholy differ on more accounts than simply intensity. On the whole, however, the model offers a parsimonious way of mapping the affective landscape.)

The model has received significant empirical support, with research finding correlations between valence-arousal and cognition at various levels of processing. For example, valence and arousal affect psychological processes such as memory (Gerber et al., 2008); there is evidence of physiological measures that correlate with valence and arousal (Brainerd et al., 2010); and there is support for a neural basis that codes for these two dimensions (Gomez and Danuser, 2004).

This model is especially useful for our purposes because it maps onto the computational framework in a rather straightforward way. Positively valenced emotions correspond to positive reward values (approach behavior), while negatively valenced emotions correspond to negative reward values (avoidance behavior). Arousal, in turn, corresponds to the magnitude of the reward: intensely pleasurable experiences (high arousal) have a higher reward value than mildly stimulating pleasures (low arousal), and vice versa for negatively valenced emotions. In this way, the two-dimensional space of the valence-arousal model translates the abstract reward model into human psychology.
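As a purely illustrative sketch of this correspondence (my own toy operationalization, not a model taken from the cited literature), valence can be read as the sign of a reward value and arousal as its magnitude:

    # Toy mapping from a valence-arousal reading to a signed reward value.
    # Valence in [-1, 1] supplies the sign, arousal in [0, 1] the magnitude;
    # the scaling is an assumption made for illustration only.
    affect_to_reward <- function(valence, arousal) {
      sign(valence) * arousal
    }

    affect_to_reward(valence =  0.8, arousal = 0.9)  # intense pleasure  ->  0.9
    affect_to_reward(valence = -0.3, arousal = 0.2)  # mild displeasure  -> -0.2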


Figure 2: The Circumplex Model of Affect. One way of visualizing the valence-arousal theory of emotion: the horizontal axis represents the valence dimension (unpleasant to pleasant), while the vertical axis represents the arousal dimension (low to high activation).


2.3.2 Integral and Incidental Affect

A last theoretical construct necessary for framing our study is the difference between integral and incidental affect. As stated, environmental stimuli influence the affective state of an agent, which in turn influences the decision-making process. Research has generally focused on the emotions elicited by the objects directly related to a decision. For instance, how does seeing an ice cream make you feel, and how likely does that feeling make you to eat it? Affect that is directly caused by the object of decision-making is referred to as integral affect - “affective influences that result from consideration of the decision or judgmental target itself” (Västfjäll et al., 2016). Thus, my avoidance of rotten food has to do with the negative integral affect induced by a nasty smell.

In real life, however, we rarely make decisions in a vacuum and are subject to many influences other than those from the target object. Our emotional state is affected by any number of factors, relating to how much we slept, what we had for breakfast, whether it is raining, and so on. When considering whether to buy an ice cream, the fact that you are having a bad day likely influences your decision, even though your mood has nothing to do with the ice cream.

All affective states caused by factors unrelated to the decision target are known as incidental. One can think of incidental affect as “background” emotion, such as mood or priming. Importantly, incidental affect has been shown to have a significant influence on decision-making. As an example, mood induced by reading a newspaper article has been shown to influence later risk judgments (Loewenstein et al., 2001).

On the whole, integral and incidental affect interact in various ways in the decision-making process. As Västfjäll et al. (2016) point out, much research fails to examine this interaction, studying one or the other, or in some cases failing to make the distinction at all. The following research is an attempt to bridge this gap by analyzing the effects of both integral and incidental affect.


2.4 Additional Relevant Concepts

This section introduces three additional concepts necessary to understand the research. The concepts provide further depth to the different types of processing and strategies employed during learning and decision-making.

2.4.1 Explicit vs. Implicit Learning

The key difference between implicit and explicit learning is whether the learner is conscious of the learning process and the acquired knowledge. A person actively engaged in learning new vocabulary is explicitly learning the meaning of new words - they are conscious of the learning process and can verbally explain their new knowledge in a declarative way. Implicit learning is a comparatively passive process that occurs automatically as a result of repeated exposure or repetition of a task; one can learn to ride a bike without being able to explain exactly what one is doing.

A good example that illustrates the explicit-implicit distinction is language. Native speakers can effortlessly speak grammatically correctly regardless of whether they can explain why certain word orders are correct; they have learned the regularities of the language implicitly, without explicit knowledge of grammatical rules (Yang & Li, 2012). This pattern highlights the important finding that people can learn and act on implicit knowledge without being conscious of the reason for their action - “we know more than we can tell” (Bierman et al., 2005). Other studies have found the same pattern, in which performance on learning tasks increases before participants are able to state exactly how they solve the problem (Sun et al., 2001). In other words, people are able to learn about underlying patterns or regularities in the world, and act accordingly, without being consciously aware of doing so.

The implicit/explicit distinction maps fairly well onto two experimental paradigms used within decision-making research - “learning by description” and “learning by experience” (Wulff et al., 2018). Learning by description means that participants learn about a decision situation through explicit descriptions of the choices, using language or other abstract symbols. For example, one may simply write out the outcomes associated with various actions, or sketch a diagram, from which participants make their decision.


In learning from experience, however, participants are not explicitly shown information about the problem space, but have to learn about it through directly engaging with the task. For instance, they may sample different options many times in order to learn about action-outcome pairings.

Importantly, whether people learn about a choice situation explicitly or implicitly reliably alters the decisions they make. Even if the objective values of the choices are the same, people will behave differently depending on whether they learn from description or from experience - this difference in behavior is known as the “description-experience gap” (Hertwig & Erev, 2009). For instance, when learning from description, people tend to overweight the importance of rare events (behaving as if the event is more likely than it in fact is), while when learning from experience the opposite pattern is observed (they behave as if the rare event is less likely to occur than it in fact is).

2.4.2 Dual-Process Theories

In recent decades, dual-process theories of decision-making have become increasingly influential. Dual-process theories propose that there are two types of psychological processes that influence decision-making (Evans, 2008). One of these (referred to as “System 1”) is mostly unconscious, driven by emotion, and works quickly. This system is largely responsible for our instinctive reactions, habits, and other behaviors we do not control deliberately. In addition, this system is rather effortless and requires comparatively few cognitive resources.

The other process (referred to as “System 2”) is more conscious, slower, and driven by reasoning. System 2 processes are active when, for example, planning or analyzing the best course of action rather than relying on intuition. Compared with System 1, these processes are effortful and require more cognitive resources. To understand the relation between these two modes of processing one can imagine the instinctive pull of System 1 to eat a cookie, while System 2 attempts to reason that you are after all on a diet.


Of course, this neat distinction is a simplified conceptualization, and it is more accurate to think of these two systems as representing basic characteristics of many different psychological processes. As Bavel et al. (2012) explain, “Although these models [dual-process] serve as a useful heuristic for characterizing the human mind, recent developments in social and cognitive neuroscience suggest that the human evaluative system, like most of cognition, is widely distributed and highly dynamic”.

In practice, it is unlikely that evaluations and decisions are based wholly on one process accurately described by System 1 or 2. Actual decision-making is driven by many different processes (each associated with one of the two systems) that interact in complex ways.

Lee et al. (2014) present a framework for how this interaction works which is especially useful for this research. They suggest that the two systems can be thought of as opposite poles of the same spectrum, and that both types of processing provide input into the decision-making process. Importantly, the relative weighting of the input shifts depending on context. Sometimes System 1 is mostly involved, and at other times System 2 has more influence. Overall, their model suggests that whichever system most reliably predicts the environment (and thus can choose the best course of action) receives more influence in the decision-making process.

In general, this means that System 1 will have more influence when we are in familiar environments and performing well-established habits. Tying our shoes or driving to work requires little “analytic reasoning”, as we can simply rely on intuitive and automatic processes. However, when the environment presents novel challenges and automatic processing is not sufficient to handle the situation (perhaps the usual road to work is closed and one needs to plan a detour), System 2 processes become increasingly involved. This weighting of how much relative influence each system has changes dynamically depending on the demands of the environment.


In short, their theory states that the amount of uncertainty or novelty in a situation determines the relative influence of the two systems. This framework proves especially useful for our research as it provides a valuable account of how dual-process theory works in a learning context. In the initial stages of learning, when the environment is still uncertain and people perform relatively poorly, decision-making is largely driven by System 2 processes trying to “figure out” and analyze the situation to get the best outcome. Over time, however, as people learn about the domain and gain more experience, more of the decision-making can be allocated to the automatic and intuitive processes of System 1. As habits and intuition become reliable enough to make satisfying decisions, these kinds of processes gain more influence.

In this way, people generally use more deliberate and effortful cognitive processing to explore and learn about a situation until they automatize satisfying responses, which they then run habitually.
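As a toy illustration of this weighting idea (the blending rule below is my own assumption for illustration, not Lee et al.'s actual model), one can picture each system contributing an action value that is weighted by how reliably that system has recently predicted the environment:

    # Toy sketch of reliability-weighted blending of two systems' action values.
    # The functional form is an illustrative assumption, not Lee et al. (2014).
    blend_systems <- function(v_sys1, v_sys2, reliability1, reliability2) {
      w <- reliability1 / (reliability1 + reliability2)  # weight on System 1
      w * v_sys1 + (1 - w) * v_sys2
    }

    # Familiar environment: habit (System 1) dominates the blended value...
    blend_systems(v_sys1 = 1.0, v_sys2 = 0.2, reliability1 = 0.9, reliability2 = 0.3)
    # ...novel environment: deliberation (System 2) takes over.
    blend_systems(v_sys1 = 1.0, v_sys2 = 0.2, reliability1 = 0.2, reliability2 = 0.8)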

2.4.3 Exploration vs. Exploitation

A final concept that is important for this paper is the exploration vs. exploitation dilemma. This problem is well known within reinforcement learning as well as psychology, and refers to different strategies agents use in order to maximize reward (Berger-Tal et al., 2014). Exploration is the process of trying new alternatives and behavioral strategies to gather new information about what actions can be rewarding. One may try a new route to work to see if it is quicker, or try a different restaurant to see what the food is like. Exploitation, on the other hand, is the process of utilizing available information to make the best decision: one goes to a familiar barber to get the same haircut.

Exploration vs. exploitation becomes a “problem” in that it presents a trade-off for how an agent should allocate their time and effort. Spending time in exploration means that one can find better alternatives and gather valuable information, such as finding a stylist that provides a better haircut. However, one also risks wasting time on something that is not fruitful, for example by trying out a terrible restaurant.


Spending more time in exploitation provides “safe” and predictable outcomes, but one also forgoes the opportunity to learn about potentially better rewards and more effective strategies. By going to the same coffee shop every day, I might miss out on better coffee from another café.

The exploration vs. exploitation dilemma is relevant to this work in that decision-making strategies rely on the same trade-off. How much time should one spend exploring the problem and learning about uncertain outcomes? At what point is it worthwhile to “settle” and simply use current knowledge to make the best decision? As can be seen through this lens, there is a close interaction between learning and decision-making. By exploring various choices, one learns about the available alternatives and makes increasingly informed decisions about which of them to exploit, as sketched below.


3) The Research

This section provides an overview of all parts of the research. I introduce the research questions and experimental paradigm, followed by a detailed description of the study.

3.1 Goals and Questions

The overarching goal of this research is to connect theories of learning and affect into a coherent framework which can be used to understand decision-making under uncertainty. Specifically, the research investigates whether the prediction errors of reinforcement learning models map onto people's affective experiences, and whether this type of learning accurately models the decisions people make. Do prediction errors have an affective component experienced by the agent? Is positive/negative affect what conditions approach/avoidance behavior? How do agents learn to estimate rewards and make decisions to maximize value? The aim of the current experiment is to provide data on whether rewards induce affective responses, and whether this information helps participants improve their decision-making.

Ultimately, data from the experiment can contribute to the long-term goal of developing a computational model that predicts agents' decisions with affect as a core component. Such a model could prove a potent way of answering these questions and understanding decision-making, as it combines rigorous computational algorithms with emotional processes. The proposed model makes predictions about how agents would act in particular situations - what they learn, how they feel, and how this influences their choices. Collecting empirical evidence and comparing it to the predictions of the model shows whether the framework is supported and whether it is a fruitful avenue for further investigation.


Particular areas of interest are the roles of incidental and integral affect, aiming to fill the gap left by previous research, which rarely combines the influence of both processes. Does incidental affect influence decision-making? What is the relative degree of influence between integral and incidental affect?

In addition, the research looks into the distinction between explicit and implicit learning. How much of operant conditioning and decision-making is implicit or explicit? Do people make their choices with conscious awareness, or do they only “sense” which option they should choose? The next section describes the experimental paradigm and how it is fit to answer these questions.

3.2 Paradigm

The experiment uses a gambling paradigm in which participants make relatively uncertain choices in order to maximize monetary reward. This type of paradigm is commonly used and well established in decision-making research, with the benefit of being well tested and making it easy to situate our research within the previous literature on the subject. One important difference, however, is that this research uses a learning-from-experience paradigm, in which participants learn about gamble values through direct experience.

There are several motivations for using this experimental paradigm. Firstly, it is rather underexplored in the recent literature, which mostly focuses on learning from description. For this reason, the experiment could provide valuable information and empirical data to fill a gap in the currently available literature.

Additionally, a learning from experience paradigm allows us to investigate and model how participants learn about various rewards and how this influences their decisions. The research goal of understanding the interaction between implicit and explicit learning is only possible in this type of paradigm.


3.3 Methods

3.3.1 Participants

Ninety-five individuals participated in the study. Ten of these participants were excluded because their results came from the same IP address, raising the possibility that they came from the same person. The demographics of the remaining 85 participants were 57 male, 26 female, and 2 non-binary (mean age = 26.4, sd = 7.14). The study was run online using Inquisit (Inquisit version 5, 2016). Participants were recruited through a university participant pool at Linköping University. The study was conducted according to the ethical standards of the Declaration of Helsinki and was approved by a regional ethics committee.

Participants were compensated after the study based on task performance. Each participant started with 75 SEK (approx. $7.5) and made repeated choices that each led to a gain or a loss of 3 SEK. At the end, they were paid the total amount they had earned throughout the study (average compensation 96.6 SEK, sd = 42.5 SEK).

The fact that performance in the study alters the participation fee is significant, since this is intended to incentivize participants to do their best and genuinely try to maximize their earnings. Importantly, due to public health and logistical issues at the time of the study, participation fees could not be paid out until some months after the study was conducted. Potential consequences of this will be expanded on in the “Discussion” section.

3.3.2 Experimental Procedure

Participants, after reading instructions, completed four blocks: three “learning blocks” and one “choice block”. Each learning block consisted of 36 trials, each involving a choice between two distinct “shapes” (see fig. 3). Each block had three unique shapes that were paired against each other 12 times (A vs B, B vs C, A vs C), resulting in the 36 trials. Each shape carried a unique probability of winning or losing 3 SEK (roughly $0.3), giving each shape a distinct “value”.


The likelihoods of winning were 30%, 50%, and 70%, referred to as low, medium, and high value respectively. After each choice, participants received feedback on whether they won the gamble or not (see fig. 3). If participants did not make a choice within 4 seconds, the gamble was discarded and the experiment moved on to the next gamble.
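For concreteness, the reward structure of a single learning-block trial can be sketched in R as follows; the win probabilities are those of the task, but the function itself is only an illustration of the design, not the Inquisit implementation.

    # Illustrative simulation of one learning-block trial: the chosen shape
    # pays +3 SEK with its assigned win probability, otherwise -3 SEK.
    win_prob <- c(low = 0.30, medium = 0.50, high = 0.70)

    play_trial <- function(choice) {
      if (runif(1) < win_prob[[choice]]) 3 else -3
    }

    play_trial("high")  # returns +3 on roughly 70% of calls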

Figure 3 - Basic structure of learning blocks. Participants are shown two distinct shapes and make a choice between them. Immediately following the decision, they receive feedback on the outcome of the gamble.


Participants were instructed to choose shapes so as to maximize their winnings, which they were also motivated to do since their participation fee was affected by performance. Each of the three learning blocks used different shapes, for a total of nine different shapes. The value (low, medium, high) assigned to each shape was randomized for each participant to minimize any potential effect that a shape's appearance may have on value estimation. Likewise, the combination of shapes in each block was randomized, i.e. different shapes were paired against each other for each participant.

Additionally, in the learning blocks, a Wheel of Fortune lottery was drawn every 4 trials. Participants were told that if, at the end of the experiment, the total number of wins was higher than the total number of losses, they would earn an additional 30 SEK; otherwise they would lose 30 SEK. Participants had no influence on the outcome of this draw, and the outcome was immediately shown to participants as feedback (see fig. 4). Importantly, the feedback from Wheel of Fortune (WoF) draws contained no information that could affect participants' decision-making on choice trials.

Figure 4 - Wheel of Fortune draws occur every 4 trials in learning blocks. Participants see a spinning wheel followed by feedback regarding the outcome of the draw. WoF draws induce different incidental affect in each block, depending on win probability.


The purpose of including WoF draws in the learning blocks was to induce different incidental affect. The three learning blocks had different likelihoods that a WoF draw would result in a win, intended to produce different affective responses in participants. The three conditions were a low-probability block (25% chance of a win), a medium-probability block (50%), and a high-probability block (75%). The order of these blocks was randomized for each participant.

Lastly, in the learning blocks, participants were also prompted to rate their momentary affective experience every 3-5 trials. Participants rated their affective experience on visual analog scales of valence (unpleasant to pleasant) and arousal (sleepiness to high activation), with values ranging from 0 to 1. The prompt to rate their affect could appear after receiving feedback on a decision as well as after receiving feedback from a WoF draw. Finally, each learning block ended with participants estimating the value of each shape: on a slider, they were asked to estimate the probability of winning for each shape present in that block.

In addition to the three learning blocks, each experiment ended with a “choice” block. The choice block also asked participants to make choices between pairs of shapes, but included all nine shapes. We formed specific test trials in which two shapes with the same objective value from different blocks were the choice alternatives (9 possible pairs, each repeated twice). We also included 18 trials where shapes of different value from different blocks were the choice alternatives.

Importantly, the choice block did not provide participants with feedback, did not include WoF draws, nor did it ask for participant affect. Participants simply made choices based on their perceived value of each shape, trying to pick the highest valued option in each trial.


3.3.3 Data Analysis

All data was analyzed in R (3.6.1), primarily using the aov and lm functions as well as the dplyr library. Additional functions and libraries used are referenced in the “References” section.

A first descriptive analysis looks at performance over time. Performance is measured by whether participants choose the correct option in a particular trial, meaning whether they choose the highest-value option available. To see performance over time, a mean performance score based on all participants was calculated for each trial. The dependent variable is thus performance (correct or not) and the independent variable is trial number. The hypothesis is that mean performance should increase over trials.
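Assuming a long-format data frame of the kind this design produces, with one row per participant and trial, this descriptive analysis reduces to a grouped mean in dplyr; the data frame and column names below are illustrative assumptions, not the actual variable names used.

    library(dplyr)

    # Mean performance per trial across participants.
    # Assumes a data frame 'trials' with columns 'trial' (1-36) and
    # 'correct' (1 = highest-value option chosen, 0 = not); names are assumed.
    performance_by_trial <- trials %>%
      group_by(trial) %>%
      summarise(mean_correct = mean(correct, na.rm = TRUE))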

A second analysis also looks at performance over time using a Friedman test, the non-parametric version of a one-way repeated measures ANOVA. In this case, learning blocks are divided into four “quartiles” (trials 1-9, 10-18, 19-27, 28-36) to see whether there are significant differences in performance between these quartiles. The dependent variable is performance (correct or not) and the independent variable is the four levels of “quartile”. The hypothesis is that later quartiles will have significantly higher performance scores. It is not evident whether there will be significant differences between all quartiles, as this depends on learning rate and ability. Perhaps participants reach “peak performance” after 10 trials, in which case later quartiles will have similar scores. Alternatively, it may take participants many trials to learn about option values, meaning that early quartiles will have similar scores.
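In R, this amounts to binning trials into quartiles and running friedman.test on per-participant quartile scores; as above, the data frame and column names are assumptions for illustration.

    library(dplyr)

    # Friedman test on per-quartile performance.
    # Assumes the 'trials' frame also has a participant identifier 'id';
    # all names here are illustrative.
    quartile_scores <- trials %>%
      mutate(quartile = cut(trial, breaks = c(0, 9, 18, 27, 36),
                            labels = c("Q1", "Q2", "Q3", "Q4"))) %>%
      group_by(id, quartile) %>%
      summarise(score = mean(correct, na.rm = TRUE), .groups = "drop")

    friedman.test(score ~ quartile | id, data = quartile_scores)

A Nemenyi test, as reported in the Results section, is one available post-hoc follow-up (provided by, e.g., the PMCMRplus package).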

Another analysis compares the value judgments participants make of shapes with the actual value of those shapes. A one-way repeated measures ANOVA is used: participant rating is the dependent variable, and the three levels of value (low, medium, high) are the independent variable. The hypothesis is that there will be significant differences in judgments between the value options, with higher judgments corresponding to higher-value options.

(36)

It is not evident that all value options will be significantly different from each other. One possibility is that we will only see a significant difference between the high-value and low-value options, and that the mid-value option is too close for participants to make proper distinctions. In that case it is interesting to see whether the mid-value option is consistently over- or underrated (i.e., closer to the high-value or low-value option).
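A sketch of this analysis in R, assuming a data frame 'estimates' with factor columns 'id' (participant) and 'value' (low/medium/high) and a numeric 'rating'; all names are illustrative. The paired pairwise comparison shown is one post-hoc option (the Results section reports a Tukey HSD).

    # One-way repeated measures ANOVA on value estimates.
    # 'id' and 'value' are assumed to be factors; column names are illustrative.
    fit <- aov(rating ~ value + Error(id/value), data = estimates)
    summary(fit)

    # Paired pairwise comparisons as a post-hoc follow-up.
    with(estimates, pairwise.t.test(rating, value, paired = TRUE,
                                    p.adjust.method = "holm"))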

A fourth analysis investigates affect as dependent on trial outcomes and Wheel of Fortune draws. The analysis includes two linear regressions, one for each of the two affect measurements: valence and arousal. The independent variables are the outcome of particular trials (win or lose 3 SEK) and the outcome of WoF draws (win or lose 30 SEK). The hypothesis is that valence will correlate positively with outcome - better outcomes (higher earnings) mean more positive valence, and vice versa.

The hypothesis for arousal is not so straightforward, as it depends on experienced intensity, and it is difficult to predict whether positive or negative outcomes will give rise to more activation in participants. Since all blocks contain gains and losses of equal size, it is possible that there will be no significant differences, or that we will find some unpredicted interaction effect.
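These regressions map directly onto R's lm function; the data frame and column names below are assumptions for illustration, with the two outcome predictors coded in SEK.

    # Linear regressions of self-reported affect on outcomes.
    # Assumes a data frame 'affect' with columns 'valence' and 'arousal' (0-1),
    # 'outcome' (+3 / -3 SEK) and 'wof' (+30 / -30 SEK); names are assumed.
    valence_fit <- lm(valence ~ outcome + wof, data = affect)
    arousal_fit <- lm(arousal ~ outcome + wof, data = affect)
    summary(valence_fit)
    summary(arousal_fit)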

Lastly, an analysis investigates the results from the final choice block. Performance (whether participants choose the highest-value option) is the dependent variable, measuring participant accuracy. Interestingly, in the choice block, shapes of the same value are sometimes pitted against each other. An explorative analysis investigates whether participants consistently prefer shapes that appeared in particular WoF blocks. Are shapes that appeared in high-probability WoF blocks preferred over other shapes?

It is difficult to form a hypothesis about this since there are many factors at play. One potential prediction is that high incidental affect will positively influence evaluation, so that shapes from high-probability WoF blocks are considered higher value and chosen more often. It is also possible that we will find no difference in preference between shapes with the same win probability, since if participants evaluate shapes accurately this will result in an even distribution.
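The choice-block analyses can be run with R's binom.test, testing observed choice counts against the 50% chance level; the counts below are placeholders for illustration, not the study's data.

    # Did participants choose the higher-valued shape above chance (p = 0.5)?
    # 'x' and 'n' are placeholder counts, not the observed data.
    binom.test(x = 120, n = 170, p = 0.5)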


4) Results

All results from the experiment are presented in this section. They are divided into two parts - one part displaying results related to task performance and learning, and the other part showing results from affective measurements.

4.1 Learning and Performance

There were three tasks related to learning in the experiment: learning about shape values in the learning blocks; estimating shape values after each learning block; and choosing the higher-valued shape in the final choice block. Results from participant performance on these tasks are presented in the following sections.

4.1.1 Learning Blocks

A first analysis investigated how learning occurs by measuring performance over time. Performance is measured by whether participants chose the “correct” option, i.e. the shape with the highest probability of winning. Differences in performance over time imply that participants are learning, since behavior changes as a result of experience. Fig. 5 shows mean performance across all participants for each trial.

As can be seen, performance increases overall as participants complete more trials. Importantly, trials in which participants did not make a choice have been omitted. This omission is due to a disproportionate number of participants (26%) failing to respond on the first trial of each block, skewing the data. Presumably, participants were not quite ready to start the experiment and were slow or confused on the very first trial. Before omitting these data points, the average correct for the first trial was suspiciously low at 34%, when it should be closer to 50% (since participants have a 1/2 chance of choosing the correct option).


Figure 5 - Performance Over Time - X-axis shows trial number while Y-axis shows mean correct responses per trial. Higher values indicate better performance.


Another analysis investigated performance and learning by dividing each learning block into four quartiles and comparing performance across them. Table 1 presents the mean score for each quartile.

Table 1
Mean correct for each quartile in the learning blocks.

Quartile       Trials 1-9   Trials 10-18   Trials 19-27   Trials 28-36
Mean Correct   0.576        0.626          0.641          0.661

Note. Mean score for each quartile across all learning blocks.

As suggested by results from Fig. 5, one can see that performance is higher in later quartiles. In order to test for significance, I conducted a Friedman test comparing performance in the four quartiles. (This is a non-parametric version of a repeated measures Anova). The test reveals that differences in performance between quartiles are significant (X² (3) = 22.9, p < 0.0001).

A post-hoc analysis was conducted using a Nemenyi test, the results of which are presented in Table 2. Interestingly, the post-hoc analysis reveals that only the first quartile differs significantly from the others. This suggests that most of the learning and performance improvement demonstrated by participants occurs within the first half of each block, possibly within the first nine trials.

As an exploratory analysis, the data were divided into six sections (sextiles). This yielded similar results in that only the first sextile differed significantly from the others, suggesting an even quicker learning rate, with participants reaching “peak performance” after only six trials. The full results of this analysis can be found in Appendix B.


Table 2

Differences in mean scores between all quartiles

Quartiles Compared   Difference in Means   p-value
2nd - 1st            0.05                  0.13
3rd - 1st            0.065                 0.0089 **
4th - 1st            0.085                 0.00003 ***
3rd - 2nd            0.015                 0.76
4th - 2nd            0.035                 0.08
4th - 3rd            0.02                  0.5

Note. Nemenyi test comparing all quartiles with each other. Significant differences were found only between the 1st quartile and the 3rd and 4th quartiles.

4.1.2 Estimating Value

In another task, participants estimated the probability that an individual shape would result in a win. Essentially, this probes participants' explicit knowledge about the value of each shape. The dependent variable was the estimate provided by participants at the end of each block, and the independent variable was the actual value of the shape.

As can be seen in Table 3, participants did rather well on the task, clearly estimating the different shapes at their correct relative values. They were especially accurate for medium- and high-value shapes, estimating 0.517 for shapes with a 50% win probability and 0.680 for shapes with a 70% win probability. A repeated-measures ANOVA indicates that the differences between participant estimates for the different values are significant (F(2, 6) = 107.8, p < 0.0001).


Table 3

Participant estimates of shape value and standard error

Shape Win Probability        Low (30%)       Medium (50%)    High (70%)
Mean Estimate (Std. error)   0.393 (0.014)   0.517 (0.013)   0.680 (0.014)

Note. Mean participant estimates of the probability of a win for each of the three types of shapes. Standard error in parentheses.
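
As a sketch of the repeated-measures ANOVA reported above, one could use statsmodels (a tooling assumption), with a hypothetical long-format frame of estimates:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format frame: one row per estimate, with columns
# participant, value ('low'/'medium'/'high'), and estimate (the rated
# win probability). File and column names are assumptions.
estimates = pd.read_csv("value_estimates.csv")

# Repeated-measures ANOVA with shape value as the within-subject factor;
# aggregate_func averages multiple estimates per participant and level.
result = AnovaRM(
    estimates, depvar="estimate", subject="participant",
    within=["value"], aggregate_func="mean",
).fit()
print(result)
```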

The Tukey Honest Significant Difference (HSD) method was used as a post-hoc analysis. The results, presented in Table 4, reveal that all groups were significantly different from each other and in the correct order: i.e., shapes with a low chance of winning received the lowest estimates, and so forth.

Table 4

Differences between participant estimates of shape values

Value Type      Difference between Means   p-value
High - Low      0.29                       p < 0.0001
High - Medium   0.16                       p < 0.0001
Medium - Low    0.12                       p < 0.0001

Note. Differences in estimates between shapes with different values. Participant estimates were highest for shapes with the highest probability of winning, and lowest for shapes with the lowest probability of winning.
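
A corresponding sketch of the Tukey HSD comparison, reusing the hypothetical `estimates` frame from the previous sketch; note that statsmodels' pairwise_tukeyhsd treats the groups as independent, a simplification of the repeated-measures design:

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Tukey HSD over the three value levels ('low'/'medium'/'high').
tukey = pairwise_tukeyhsd(endog=estimates["estimate"], groups=estimates["value"])
print(tukey.summary())  # pairwise differences, as reported in Table 4
```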


4.1.3 Final Choice Block

Lastly, there was a final choice block in which all nine shapes from the learning blocks were compared with each other. In trials where differently valued shapes were pitted against each other, participants chose the higher-valued shape 65% of the time.

Notably, this performance is very similar to how participants performed toward the end of the learning blocks. To deepen this analysis, I performed binomial tests on each of the value pairings - low vs. high, medium vs. high, and low vs. medium. In each case, there was a significant preference for the higher-valued option.

Additionally, in the final choice block, shapes with equal probabilities of winning were compared with each other. For these shapes, I performed binomial tests to see whether participants preferred shapes that had occurred in blocks with a low, medium, or high probability of winning the WoF draw. Choice did not differ significantly by WoF block type. Full results from these analyses can be found in the appendix.
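
A minimal sketch of one such binomial test, with purely illustrative counts (the thesis's actual counts are in the appendix); scipy's binomtest is a tooling assumption:

```python
from scipy.stats import binomtest

# Illustrative counts only: k choices of the higher-valued shape
# out of n low-vs-high pairings. Not the actual data.
k, n = 130, 200
result = binomtest(k, n, p=0.5)  # two-sided test against chance (50%)
print(f"proportion = {k / n:.2f}, p = {result.pvalue:.4g}")
```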

4.2 Affect Results

The main analyses carried out on the affective measurements were two linear regressions, one with valence and one with arousal as the dependent variable. The two independent variables were outcome (win or loss on a trial) and the type of wheel of fortune block (low, medium, or high probability of winning the WoF draw). Fig. 7 and 8 show the results from these analyses. The full regression output is available in the appendix.
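
A hedged sketch of how the valence regression might be specified using statsmodels' formula API (a tooling assumption; the frame and column names are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial-level frame with columns valence, arousal, outcome
# (numeric win/loss) and wof_block ('low'/'medium'/'high').
affect = pd.read_csv("affect_ratings.csv")

# Valence regressed on outcome and WoF block type (plus their interaction),
# with the high-probability block as the reference category so that the
# low- and medium-block coefficients match the reported contrasts.
valence_model = smf.ols(
    "valence ~ outcome * C(wof_block, Treatment(reference='high'))",
    data=affect,
).fit()
print(valence_model.summary())

# The arousal model is identical, with arousal as the dependent variable.
```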

As can be seen, higher outcomes result in higher valence ratings in all WoF blocks. The regression reveals that the effect of outcome on valence is significant (b = 0.32, t(2546) = 5.8, p < 0.0001). Additionally, valence ratings from the low-probability WoF block are significantly lower than those from the high-probability WoF block (b = -0.08, t(2546) = -6.8, p < 0.0001). Valence ratings in the medium-probability and high-probability WoF blocks do not differ significantly from each other.


Figure 7 - Shows the effect of outcome on valence for each wheel of fortune probability block. The blue line shows valence ratings in low-probability (25% win chance) blocks, the green line for medium-probability (50% win chance) blocks, and the red line shows valence ratings for high-probability (75% win chance) blocks.


Figure 8 - Shows the effect of outcome on arousal for each wheel of fortune probability block. The blue line shows arousal ratings in low-probability (25% win chance) blocks, the green line for medium-probability (50% win chance) blocks, and the red line shows arousal ratings for high-probability (75% win chance) blocks.

Looking at arousal, we observe a very similar pattern, with arousal ratings increasing with higher outcomes in blocks with low and medium incidental affect. Notably, there is an interaction effect in blocks with high incidental affect, where arousal decreases with higher outcomes. The linear regression indicates that outcome has a significant effect on arousal scores (b = 0.2, t(2546) = 3.15, p = 0.0016). As with valence, only the low-probability WoF block has significantly lower arousal ratings than the high-probability block (b = -0.055, t(2546) = -3.78, p = 0.00016). There is no significant difference in arousal ratings between the medium- and high-probability WoF blocks.


5) Discussion

This section examines the results from the experiment and explores what they mean for the research questions. Methodological limitations are discussed, as well as the overall contribution of this study.

5.1 Learning and Performance

On the whole, the results from the learning tasks support our hypotheses. As can be seen in Fig. 5-6, participants do learn which shapes are more likely to result in a win and increasingly choose the best option over the course of each block. What is interesting to note is exactly how this learning occurs - how long it takes participants to improve, and how good they become at the task. Performance initially increases rather quickly and then reaches a plateau. Dividing the blocks into smaller “chunks” reveals that it takes no more than nine trials before learning tapers off and performance stops increasing significantly. This pattern raises the question of why participants stop improving after the initial trials: why do they learn a lot initially and then plateau?

One interpretation of this is that participants simply reach “peak performance” quickly, and are unable to improve past a certain point. Perhaps the task is rather difficult, and a mean score of 65% correct is as good as the average person performs on this kind of task. However, the data presented in Tables 3 and 4 contradict this interpretation. Looking at the estimates participants made of the likelihood that certain shapes would provide a win, it is clear that they had a good understanding of their relative value. Even though their guesses for “low”-probability shapes were slightly off, participants could nevertheless differentiate between the three value types in the correct order.

If participants fully utilized this explicit knowledge, they should be able to choose the correct option on all later trials, and performance should be much higher. Overall, it seems that this explicit knowledge does not translate into better decision-making.


Interestingly, this is the opposite of the pattern found in previous studies mapping the interaction between explicit and implicit knowledge (Bechara et al., 2005). Usually, people are able to perform better than they can explain explicitly, but in this case participants actually know better than they perform.

This opens up the question of why participants do not use their acquired knowledge to further improve their decision-making. This is a complex question with potentially many answers, and I will present three explanations for why performance stagnates after the initial trials: exploration vs. exploitation, dual-process theory, and lack of motivation/incentive.

One useful framework for interpreting the observed pattern is that of exploration vs. exploitation. As stated in section 2.4.3, exploration vs. exploitation is a common trade-off in reinforcement learning in which agents have to choose between exploring various alternatives (hoping to find better pay-offs but risking wasted time and resources) or exploiting their current strategy by continuing with the same behaviors.

Changes in performance can be interpreted as a period of exploration, in which participants try various options to see what works or not. Since they try different strategies and choose different options, performance is likely to change. Stable performance, on the other hand, suggests exploitation, since participants are using/exploiting a strategy they are content to stick with.

Importantly, the exploration vs. exploitation framework does not imply that agents necessarily find the perfect or best strategy to solve a problem. Rather, agents will explore different options until they find a strategy that is “good enough”, in which the benefits of continuing to use the current strategy outweigh the risk of exploring alternative solutions.
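
To make the trade-off concrete, here is a minimal epsilon-greedy bandit sketch; it illustrates the framework in general, not the participants' actual strategy, and the parameter values are arbitrary:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Explore a random option with probability epsilon; otherwise exploit
    the option with the highest current value estimate."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Two shapes with 30% and 70% win probability, as in a learning block.
win_probs = [0.3, 0.7]
q, n = [0.0, 0.0], [0, 0]
for _ in range(36):  # one 36-trial block
    a = epsilon_greedy(q, epsilon=0.1)
    reward = 1.0 if random.random() < win_probs[a] else 0.0
    n[a] += 1
    q[a] += (reward - q[a]) / n[a]  # incremental running-mean update
print(q)  # value estimates after one block
```

With a low epsilon, such an agent quickly settles on whichever option looked best early on, mirroring the "good enough" plateau described above.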

Looking at the data through this lens provides the interpretation that the initial stage of performance increase is a period of exploration, in which participants look for the best options and strategy. The subsequent plateau is a period of exploitation in which participants continually use the same strategy they developed in the first trials. Even though participants learn enough to perform better, they seem to continue with a settled strategy.


Perhaps they are not confident enough in their knowledge to risk exploring alternatives, and thus stick with exploiting their initial decision-making process. To understand this at a deeper level, it would be valuable to explore how estimates of shape value change over the course of trials and see how this affects choice. Unfortunately, we do not have the data to support such an analysis, something which will be covered more thoroughly in the “Limitations” section.

A complementary explanation for the observed performance can be found in dual-process theories. As explained in section 2.4.2, the relative influence of the two systems depends on context. In familiar contexts, such as a repeated task, automatic and intuitive processes are likely to drive most of the decision-making. On the other hand, a novel task which has not been performed many times will trigger more conscious and analytic reasoning in order to solve the problem.

In the case of this experiment, responding to 36 similar choice problems is certainly a context that could become increasingly automatized and driven by system 1 processes. As such, responses will remain similar as participants rely on habits to solve the problem (as can be seen in the plateauing performance).

The task of estimating shape value, however, occurs only three times, at widely spaced intervals. Because this is a novel task and an unfamiliar context, it will likely trigger system 2 processing, and participants might be more reflective and deliberate about their actions. In this way, it is possible for participants to know the correct values without applying this knowledge to their choices, since they apply different types of cognitive processes to the different tasks.

A final explanation for why participants settle at their performance level even though they know better is a lack of motivation. It is possible that participants simply do not care enough to expend their available cognitive effort on this decision-making problem. Firstly, a gain or loss of 3 SEK (about $0.30) is not a huge sum, and may not trigger large enough emotional reactions for participants to invest further in perfecting their decision-making process (the role of affect is explored further in the following section). Perhaps larger sums at stake would have incentivized participants to make more careful selections.
