The Accuracy of EFL Self-assessments made by Swedish Primary School Students


Academic year: 2022




Abstract

The objective of this study was to examine the accuracy of Swedish primary school students’ EFL self-assessments. Self-assessment is an important component of formative assessment, which is the main tool used by Swedish primary school teachers when evaluating EFL proficiency. In the study, a survey and a vocabulary picture test were conducted. The survey consisted of Likert-scale statements through which students self-assessed their EFL ability; it was followed by an English vocabulary picture test measuring students’ English vocabulary. To investigate whether the results of students’ self-assessments correlated with their results on the vocabulary picture test, Spearman’s rank correlation coefficient was calculated. Results were also analysed by individual, item, and gender. The students’ EFL self-assessments were found to have a weak correlation of 0.22 with their results on the English vocabulary picture test. The findings of this study therefore indicate that young primary school students’ EFL self-assessments should not be used as the sole evaluation of their English proficiency.
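The correlation analysis described in the abstract can be illustrated with a short sketch. The scores below are invented for illustration and are not the study’s data; Spearman’s rho is simply the Pearson correlation computed on the ranks of the two score lists.

```python
# Sketch of a Spearman rank correlation with invented data
# (NOT the study's actual scores).

def ranks(values):
    """Return 1-based ranks, averaging ranks for tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation of two equal-length score lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical Likert-survey totals and picture-test scores for ten students
self_assessment = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
picture_test = [30, 25, 22, 35, 28, 31, 27, 24, 33, 21]

print(round(spearman(self_assessment, picture_test), 2))  # → 0.6
```

With these invented numbers rho comes out at 0.60; on its real data the study reports 0.22, i.e., a weak correlation.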

Keywords: English as a foreign language, EFL, self-assessment, self-assessment accuracy, vocabulary, picture test, correlation, primary school, elementary school.


Table of Contents

1. Introduction
1.1 Aim and Research Question
1.2 Essay structure
2. Review of the Literature
2.1 Theoretical Framework
2.2 Assessment
2.2.1 Self-Assessment
2.3 The Accuracy of Students’ Self-Assessments
2.4 Vocabulary and Language Proficiency
2.4.1 Vocabulary Picture Tests
3. Methodology
3.1 Context and Participants
3.2 Materials
3.2.1 EFL Vocabulary Self-Assessment
3.2.2 Vocabulary Picture Test
3.3 Procedures
3.3.1 Piloting
3.3.2 Administering the EFL Self-Assessment Survey
3.3.3 Conducting the Vocabulary Picture Test
3.4 Method Discussion – Before Study
3.4.1 EFL Vocabulary Self-Assessment
3.4.2 Vocabulary Picture Test
3.4.3 Reliability and Validity
3.5 Analysis
4. Results
5. Discussion
5.1 Do Swedish Primary School Students’ EFL Self-assessments Correlate with How the Students Perform on an English Vocabulary Picture Test?
5.2 Additional Findings
5.3 Method Discussion – After Study
5.3.1 Reliability and Validity
5.3.2 EFL Vocabulary Self-Assessment
5.3.3 Vocabulary Picture Test
5.3.4 Ethical Principles
6. Conclusion
References
Appendix


1. Introduction

There are no proficiency criteria in English as a foreign language (EFL) during the first three years of Swedish primary school (Lundberg, 2016, p. 151). This makes formative assessment the main tool used by teachers for student evaluation. Formative assessment, and within it self-assessment, is mainly based on the critical perspective of language assessment (Dragemark Oscarson, 2009, p. 56). This perspective aims at giving students a more influential part in the assessment process by creating an assessment context where students’ views are given more room (Dragemark Oscarson, 2009, p. 58). Students who are taught self-regulation and self-assessment are given tools which will help them learn how to study (Dragemark Oscarson, 2009, p. 59). In other words, a large part of formative assessment is to make students aware that they are responsible for their schoolwork (Harrison & Howard, 2009/2013, p. 37; Wiliam & Leahy, 2015/2015, p. 25). By inspiring dynamic thinking, formative assessment shapes students who perceive their results as dependent on their own efforts (Wiliam & Leahy, 2015/2015, p. 144), efforts that can always be improved. To make this happen, Wiliam and Leahy (2015/2015, p. 225) suggest regular self-assessments where students fill in a form in which they can easily rate statements that begin with, I think I can…, My teacher thinks I can…, I like to… etc.

Previous research on students’ self-assessments seems to mainly focus on older students in higher education. In a meta-study of empirical studies on student self-assessment conducted between 2008 and 2018, Papanthymou and Darra (2019, p. 59) call for more self-assessment research in primary education. Existing studies mostly show a weak to moderate correlation between students’ self-assessments and summative tests and/or teacher-assessments (Andrade, 2019, p. 6; Blanch-Hartigan, 2011, p. 7; Falchikov & Boud, 1989, p. 420; Han & Riazi, 2018, pp. 394-395; Stauffer, 2011, p. 88). However, Dragemark Oscarson (2009, pp. 230-231), Goral and Bailey (2019, p. 403), Hasselgren (2000, p. 267), Kaderavek, Gillam, Ukrainetz, Justice and Eisenberg (2004, p. 45), and Chen (2008, p. 246) found students’ self-assessment skills to correlate with the evaluations made by the teacher. Blanch-Hartigan (2011, p. 7), Chen (2008, p. 246), Dragemark Oscarson (2009, p. 232), and Han and Riazi (2018, pp. 394-395) also show results indicating a stronger correlation between students’ self-assessments and summative tests and/or teacher-assessments with time and practice.

Adding to previous research, this study will investigate younger primary school students’ EFL self-assessment ability, which seems to be a relatively unexplored area. This will be done by comparing students’ self-assessed EFL vocabulary to how many English words they can correctly identify during a twenty-minute time span. Although the main focus of this study is younger primary school students’ EFL self-assessments, the accuracy of students’ self-assessments in English at a younger age should also be applicable to other subjects and is thus interesting for other practitioners in the field of teaching.

1.1 Aim and Research Question

The aim of this study is to investigate the accuracy of EFL self-assessments made by second and third grade students in a Swedish primary school.

The question this study intends to answer is: Do Swedish primary school students’ EFL self-assessments correlate with how the students perform on an English vocabulary picture test?

1.2 Essay structure

The review of the literature begins with the theoretical framework, followed by a brief summary of assessment in general, a description of self-assessment practices, and a review of relevant studies regarding students’ self-assessment accuracy. Lastly, the link between vocabulary and language proficiency is described, and a review of relevant studies about vocabulary picture tests as a measure of proficiency follows.

The methodology section describes the context and the participants, the EFL vocabulary self-assessment survey and the vocabulary picture test. The procedures are then explained, followed by a method discussion made before the study, and lastly, the methods for analysis are described.

The Results section contains tables, figures and findings from the study.

Under Discussion, the answer to the research question is presented, together with additional findings. This is followed by a method discussion made after the study.

The Conclusion first summarizes the results of the study. Secondly, pedagogical implications of the study are discussed, and finally, suggestions for further research are made.


2. Review of the Literature

This section begins by presenting the theoretical framework, which is described first in relation to the political movement in Sweden during the last fifty years, secondly in relation to theories of constructivism and learner autonomy, and lastly in relation to Foucault’s thoughts on pastoral power, used to describe the teacher-student relationship. Furthermore, formative and summative assessment are described in general, followed by a more detailed account of self-assessment. Then comes an account of relevant studies regarding students’ self-assessment accuracy. Lastly, there is a general description of vocabulary, how it affects language proficiency and how it can be assessed, followed by a few studies where picture tests were used to measure language proficiency.

2.1 Theoretical Framework

The political movement in Sweden from the 1970s until today has brought a change, from a welfare state with quite rigid frameworks to a stronger focus on the individual and how he or she can maximize his or her life opportunities (Dragemark Oscarson, 2009, p. 47). With this, the theoretical view of the student has changed, from the idea that the student is a blank slate for the teacher to fill to the view that the student already possesses qualities that gradually mature with the support and nourishment of the teacher (Strandberg, 2009, p. 18). This change can be seen from two different perspectives. The first perspective is that the change results in an improvement of individual progress and brings a form of empowerment to the students. The second perspective views the change as a way of releasing the state from responsibility for its citizens, or in this case students, by generating individuals who control themselves (Dragemark Oscarson, 2009, p. 44).

If the aim is to teach students to be self-governing in learning and in future endeavours, there has to be room for reflection (Dragemark Oscarson, 2009, pp. 34-35). This is supported by theories of constructivism and learner autonomy (Chen, 2008, p. 237). Constructivism states that knowledge is constructed by the student, who then modifies this knowledge continually based on interaction with the environment. John Dewey, one of the three foundational psychologists of constructivism, advocated learning by doing, where students’ autonomy was to be developed through the opportunity to study based on their own interests and needs, and where reflection through self-assessment was necessary for them to understand their own learning (Dragemark Oscarson, 2009, pp. 35-36). The cognitive constructivist Jean Piaget’s view of learning is that it is inherent in man to be active in one’s own learning and to construct meaning from experience (Dragemark Oscarson, 2009, pp. 35-36). This is further developed in social constructivism, in which the work of Lev Vygotsky states that knowledge is built by interacting with other students and with the supporting framework of a teacher (Dragemark Oscarson, 2009, p. 37). Vygotsky believed that self-assessment was dependent on environment and social interactions, involving both an exchange of ideas and inner dialogue. The students are not isolated from their surroundings; where they are affects who they are (Strandberg, 2009, p. 20).

Meanwhile, the second perspective sees self-regulation as a way of freeing the state from responsibility for the individual (Dragemark Oscarson, 2009, p. 44). Dragemark Oscarson (2009, p. 47) refers to the work of Michel Foucault, who uses the term pastoral power to describe the teacher-student relationship. The teacher has, or thinks he or she has, deeper knowledge than the students and uses it to nurture them into disciplined citizens. Students’ self-assessments are most of the time validated by the teacher, which in a way cancels the empowering function of the self-regulated practice. Foucault thought that power exists and is created between individuals, and that self-assessment will not even out the unequal power relation between teacher and student. The teacher is rather a part of society’s all-seeing surveillance, which makes the students regulate themselves, and each other, to please the power.

Furthermore, a world that is changing faster and faster requires that its inhabitants are able to make decisions that are not based on tradition (Dragemark Oscarson, 2009, pp. 33-34). Dewey saw knowledge as something that could help people deal with future problems, and one can argue that a more reflective approach to learning will make students better at adapting to different situations in life. In education, the concept of lifelong learning is a result of this (Dragemark Oscarson, 2009, p. 47; Chen, 2008, p. 237; Council for Cultural Co-operation Modern Languages Division Council of Europe, 2001, pp. 3, 5; Skolverket, 2018, pp. 1, 7).

Without formative assessment students have very little agency, as all validation and decisions come from external entities (Goral & Bailey, 2019, p. 392). This does not prepare them for a modern society where they, as self-regulated adults, are expected to assess their own needs and progress and to make appropriate choices. But with self-assessment skills, students can develop a critical attitude towards learning and reach full self-sufficiency (Chen, 2008, pp. 237-238).

Consequently, this study recognizes the uneven distribution of power between teacher and students in primary school, and the research conducted can be placed within the theoretical framework of constructivism. Self-assessment is not supposed to replace teacher-assessment, but to develop students’ learning skills (Little, 2005, p. 335). Its use is also in line with the guidelines of the Common European Framework of Reference for Languages (CEFR), and it is an important part of the democratic didactics emphasized in the Swedish curriculum, as the teacher’s traditional power in the classroom becomes somewhat shared with the students when there is an ongoing dialogue regarding assignments, assessment criteria and standards (Chen, 2008, p. 238; Council for Cultural Co-operation Modern Languages Division Council of Europe, 2001, p. 186; Skolverket, 2018, p. 7).

2.2 Assessment

Assessment can be seen as having two aspects: the measurement aspect and the learning aspect (Butler & Lee, 2010, p. 6). The measurement aspect of assessment focuses on students’ level of understanding, knowledge and skills. Summative assessment, as it is also called, is often based on tests with points and specific requirements, and is mainly practiced as a final measurement of achievement (Dragemark Oscarson, 2009, p. 61; Harrison & Howard, 2009/2013, p. 46). The learning aspect of assessment focuses on advancing students’ learning (Butler & Lee, 2010, p. 6). In this aspect, students are provided with opportunities to evaluate their performance and are given teacher feedback based on their evaluations, which can make them more aware of their learning process and performance, and in turn more proficient in learning. The learning aspect of assessment, or formative assessment, is used to improve learning by having both the teacher and the students assess the work while it is still being done (Dragemark Oscarson, 2009, pp. 61-62). It is to be descriptive, focus on the students’ abilities and aim forward (Harrison & Howard, 2009/2013, p. 46). With formative assessment, the quality of the students’ knowledge is essential, in that they should be able to apply it to different situations (Wiliam & Leahy, 2015/2015, p. 53). All activities providing information that can be used as feedback for additional learning are included in formative assessment (Dragemark Oscarson, 2009, p. 62).

Moreover, it is crucial for formative assessment practices that students understand where they stand in relation to the subject’s knowledge goals (Dragemark Oscarson, 2009, p. 31; Harrison & Howard, 2009/2013, pp. 13-14). Understanding this develops metacognition and control over one’s own learning. Consequently, when students are aware of what they know and what they need to learn, they become more independent (Dragemark Oscarson, 2009, p. 63). Independent students trust their own abilities and learn from their mistakes, which will help them throughout their schooling. In order to achieve this, the teacher should help the students become aware of what they can do (Lundberg, 2016, p. 88). To clarify, Afitska (2014, p. 29) divides the construct of formative assessment into three concepts: formative teacher-assessment, peer-assessment and self-assessment. Teacher- and peer-assessment should deal with the student’s work in relation to set criteria and previous performance, and also include advice on how to improve. It should not be done in comparison to other students (Dragemark Oscarson, 2009, p. 62). For self-assessment, see below.

2.2.1 Self-Assessment

Self-assessment is a form of metacognition: the student becomes aware of the strategies and mental activities needed to complete the assignment, in a process involving memory, comprehension, communication, and learning (Brown & Harris, 2014, p. 24; Butler & Lee, 2010, p. 8; Kaderavek et al., 2004, p. 38). This is why it is considered an important part of formative assessment (Dragemark Oscarson, 2009, p. 62). Students who succeed at complex cognitive tasks tend to plan an approach, organize their resources, and monitor and adapt to developments as they perform the task (Kaderavek et al., 2004, p. 38). Andrade (2019, p. 2) states that self-assessment is feedback, and as such is meant to promote learning and improve performance. She continues by saying that the learning aspect makes it formative, and that self-assessment without the opportunity to adjust and correct is pointless.

Studies have shown that using self-assessment in teaching results in a slower learning process but more deeply rooted knowledge, and it seems to help low achievers especially (Dragemark Oscarson, 2009, p. 62; Wiliam & Leahy, 2015/2015, pp. 223-224). Students taught this way did not need as much repetition and performed better on standardized examinations, compared to students who did not work formatively with self- and/or peer-assessment.

Low-performing and/or younger students need to practice the self-assessment process with the support of clear instructions and teacher feedback (Brown & Harris, 2014, p. 25; Kaderavek et al., 2004, p. 39). Brown and Harris (2014, p. 26) suggest three stages of self-assessment. At the first stage, students can begin by estimating how many correct answers they will get on a spelling/math/vocabulary test, and next time estimate how well they will do compared to their last performance; this provides an actual and personal reference point. At the intermediate stage, students should be introduced to comparing their work to common set standards and against previous work, and self-correction and self-rating are introduced when students show competence in assessing their own work. At the advanced stage, rubrics or criteria with progression, preferably developed together with the students, can be used. Throughout the development of the students’ self-assessment proficiency, the teacher needs to emphasize the focus on realism and on the aspects of the task, regardless of the level of the individual student’s performance (Andrade, 2019, p. 4; Brown & Harris, 2014, p. 26). Research shows that I could not spell most of the English words is a more effective self-assessment than I am bad at English (Andrade, 2019, p. 4). The use of self-assessment must not lead students to wrongly conclude that they are good or weak in some domain and then base personal decisions on this (Brown & Harris, 2014, p. 23). Teachers and researchers should also be aware of the impact the prevailing culture may have (Brown, Andrade & Chen, 2015, p. 452). When importance is placed on high performance, in combination with pressure from teachers and parents to continually do better, it might very well discourage or even prevent realistic self-evaluations.

There are many different ways for students to self-assess, and this study will only describe a few. Besides those already mentioned above, teachers can use portfolios, traffic lights, exit tickets and surveys. In compiling a portfolio of school assignments, the students practice self-assessment continuously (Little, 2005, p. 323). The portfolio can have a checklist of features to guide the portfolio process from the beginning, and students can then rate their portfolio against the checklist. Students can also be asked to write an evaluative account of their portfolio progress. For younger students, Wiliam and Leahy (2015/2015, p. 218) and Andrade (2019, p. 4) suggest the use of ‘traffic lights’, e.g., by putting a green, yellow or red card on their desks. A green card signals that they understand the ongoing assignment, a yellow card that they need a little help, and a red card that they are stuck. Andrade (2019, p. 4) sees this as exercising students’ reflections on how well they understand a concept or have mastered a skill, and labels it formative self-assessment of one’s learning. Yet, Brown and Harris (2014, p. 23) find evidence that this form of disclosed self-assessment creates strong psychological pressure on students that leads to pretending and dishonesty. Students may intentionally disguise the truth in order to protect their reputations. Another self-assessment method is for the teacher to hand out ‘exit tickets’ with a few statements for the students to fill in (Wiliam & Leahy, 2015/2015, pp. 219-222), for example: I thought _____ was easy, I thought _____ was difficult, and I thought _____ was interesting. This helps students formulate their self-assessments. The exit tickets are given back to the teacher when the students leave the classroom. Finally, self-assessment can be made using regular surveys, where students fill in a form in which they can easily rate statements that begin with, I think I can…, My teacher thinks I can…, I like to… etc. (Wiliam & Leahy, 2015/2015, p. 225). Older students can be asked to rate their performance using the same criteria as the teacher.

The importance of self-assessment in language learning has been recognized in the CEFR (Council for Cultural Co-operation Modern Languages Division Council of Europe, 2001, pp. 25-26). The framework asserts both that students can self-assess and that they should self-assess. What is more, it asserts that self-assessment can be productive in the process of learning a language even with very young school-age learners. Studies show that foreign language students find it difficult to compare themselves to native speakers of the language (Dragemark Oscarson, 2009, p. 66). The complexity of the process of learning a language is another factor which makes foreign language self-assessment more difficult. Younger students are often eager to show what they have learnt and are generally very honest when self-assessing, but already around age nine they start to criticise themselves, which makes them reluctant to try to talk in a foreign language (Lundberg, 2016, pp. 77, 159). The latter can make it difficult for teachers to evaluate their foreign language proficiency. Finally, if the foreign language learnt in a school context is to be used by the students outside of the classroom, the capacity to correctly self-assess allows them to turn occasions of target language use into opportunities for further language learning (Little, 2005, p. 322).

2.3 The Accuracy of Students’ Self-Assessments

This study has found five analyses of relevant literature, articles, and empirical studies about student self-assessment accuracy: Andrade (2019), Blanch-Hartigan (2011), Brown and Harris (2014), Brown et al. (2015) and Falchikov and Boud (1989). Out of these five, two analyses show inconsistency between self-assessments and external evaluations: Andrade (2019) and Brown and Harris (2014).

Andrade (2019, p. 5) reviewed 76 studies based on the terms ‘self-assessment’ and ‘self-evaluation’, published roughly between 2013 and 2019, with the age of students varying from kindergarten to university. Of these 76 studies, 44 were inquiries into the correlation between students’ self-assessments and other evaluations (e.g., teacher grades or test results). Also, 25 studies investigated the relationship between self-assessment and achievement, 15 explored students’ experiences of self-assessment, 12 focused on the association between self-assessment and self-regulated learning, one examined self-efficacy, and two documented the cognitive processes involved in self-assessment. In her discussion, Andrade (2019, p. 6) suggests that the word ‘consistency’ should be used instead of the word ‘accuracy’ when comparing students’ self-assessments to those made by teachers or researchers. There is evidence of unreliability in teachers’ grades, and therefore the term consistency, referring to the degree of alignment between evaluations made by both sides, is thought to be more fitting. Results of the meta-analysis show inconsistency between summative self-assessments and external evaluations, with males tending to overrate and females to underrate. Furthermore, results show that older and more competent learners tend to be more consistent, and that consistency can be improved through practice. Lastly, Andrade (2019, p. 8) finds that formative self-assessment endorses the development of knowledge and skill.

The analysis by Brown and Harris (2014, p. 22) responds to recent and influential reviews and position papers on self-assessment. There is no summary information on the age or gender of the participants. In the analysis, they state that the usefulness of self-assessment for learning-related decision-making seems to depend, in part, upon whether students can accurately judge the qualities of their own work (Brown & Harris, 2014, p. 23). However, the authenticity of those self-assessments is difficult to measure, since it is mainly determined through comparison to other people’s judgements. To conclude, Brown and Harris (2014, p. 27) find student self-assessment to generally have a positive impact on academic performance, although it is not a precise assessment method in terms of accuracy.

In comparison, the analyses by Blanch-Hartigan (2011), Brown et al. (2015), and Falchikov and Boud (1989) show consistency between self-assessments and external evaluations. Blanch-Hartigan (2011, p. 4) conducted three meta-analyses on the results from 35 published articles on self-assessment accuracy among medical students, 18 years of age or older. The selection was based on the terms ‘self-assessment’, ‘accuracy’, ‘confidence’, and ‘medical education’, among others. One meta-analysis was conducted for each of the three ways self-assessment consistency is typically reported: firstly, correlation between self-assessment and criterion scores; secondly, paired comparisons measuring the difference between each self-assessment and a criterion paired with that self-assessment; and thirdly, comparison of the mean self-assessment score with the mean criterion score for a group of self-assessors (Blanch-Hartigan, 2011, p. 3). Results from the three meta-analyses show that the students’ self-assessments did not correspond perfectly to criteria, but the correlation was significantly above chance; students thus seem to have some ability to self-assess, but with limited accuracy (Blanch-Hartigan, 2011, p. 7). Students were more accurate when self-assessing after an assignment than before, and when they had specific information about the evaluation criteria. Those who had just started their medical education were less accurate than students in later years. Finally, Blanch-Hartigan (2011, p. 8) found that females tend to underestimate and males to overestimate their performance, but she also points out that an alarming number of studies did not report the gender composition of their samples and calls for thorough sample description in future research.
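The three reporting formats can be sketched with invented numbers (not Blanch-Hartigan’s data); each block below computes one of the consistency measures for a small hypothetical group of self-assessors.

```python
# Invented numbers illustrating the three ways self-assessment
# consistency is typically reported (NOT data from the reviewed studies).

self_scores = [70, 55, 80, 60, 75]  # hypothetical self-assessed scores
criterion = [65, 60, 85, 50, 70]    # hypothetical external criterion scores
n = len(self_scores)

# 1) Correlation between self-assessments and criterion scores
# (Pearson correlation, computed by hand here).
ms, mc = sum(self_scores) / n, sum(criterion) / n
cov = sum((s - ms) * (c - mc) for s, c in zip(self_scores, criterion))
sd_s = sum((s - ms) ** 2 for s in self_scores) ** 0.5
sd_c = sum((c - mc) ** 2 for c in criterion) ** 0.5
correlation = cov / (sd_s * sd_c)

# 2) Paired comparison: each self-assessment minus its paired criterion,
# summarized here as a mean difference (positive = overrating).
mean_paired_diff = sum(s - c for s, c in zip(self_scores, criterion)) / n

# 3) Group-level comparison: mean self-assessment vs. mean criterion score.
group_diff = ms - mc

print(round(correlation, 2), mean_paired_diff, group_diff)
```

Note that measures 2 and 3 can agree at the group level (both 2.0 with these numbers) while individual students still over- or underrate; the correlation instead captures how well the two score orderings agree.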

Brown et al. (2015, p. 444) review relevant literature from educational psychology and psychometrics, and express the need for a better understanding of accuracy in self-assessment. There is no summary information on the age or gender of the participants. Their review finds the correlation between student self-ratings and other measures to be positive, ranging from weak to moderate, with the consistency being higher for test scores than for cognitive competence or for complex performances such as writing and projects (Brown et al., 2015, p. 446). Besides, consistency seems to improve with students’ age and experience with school, and with greater academic ability (Brown et al., 2015, p. 447). Still, the accuracy of student self-assessment cannot be said to be uniform throughout the student’s life course, nor across the full range of learning activities (Brown et al., 2015, p. 448).

Lastly, Falchikov and Boud (1989, p. 395) conducted a meta-analysis in which they examined 48 quantitative studies comparing self- and teacher-assessments. The studies came from the areas of science, social science, and the arts, though the majority came from the first two areas (Falchikov & Boud, 1989, p. 398), and the ages of participating students varied from 17 years old to adults (Falchikov & Boud, 1989, pp. 399-415). Results show that most students tended to overrate themselves compared to teacher grades, with a mean correlation value of 0.39, defined as a medium correlation (Falchikov & Boud, 1989, p. 420). The analysis further showed that students with good knowledge within a particular field made self-assessments more consistent with teachers’ assessments, compared to students with higher seniority or a longer duration of enrolment within the same field (Falchikov & Boud, 1989, p. 425). Moreover, the nature of the task being assessed does not seem to affect the consistency in any direction.

This study would also like to refer to the literature analysis by Papanthymou and Darra (2019, p. 49), in which they examine 37 empirical studies focusing on self-assessment and academic performance, published globally between 2008 and 2018. Papanthymou and Darra (2019, p. 59) draw the conclusion that most research regarding self-assessment is conducted in higher and secondary education, and they therefore call for more research in primary education. This is mentioned by Brown and Harris (2014, p. 27) as well, as they call for research to identify whether there are ages below which particular types of self-assessment are unrealistic for students to perform accurately.

To summarize, one can start with the fact that this type of study is seen as somewhat irrelevant by Andrade (2019, p. 6), who thinks that the question of self-assessment accuracy is more related to clinical research on calibration, which has very different aims and does not transfer well to classroom assessment research. She states that the tasks in self-assessment accuracy studies often have little to do with authentic student-related situations, and that those types of studies are a distraction from the true purposes of self-assessment (Andrade, 2019, pp. 6-7): to be used formatively and, supported by assessment practice, to aid students’ achievements and self-regulated learning (Andrade, 2019, p. 10). Even so, a review of the above analyses shows that there are very few self-assessment studies conducted with younger students, and with the support of Papanthymou and Darra (2019, p. 59) and Brown and Harris (2014, p. 27), the aim of this study still might have something to contribute. What is more, the method in this study, described further below in section 3 ‘Methodology’, is based on tasks that the students are familiar with, both in their classroom and everyday contexts, thus making the tasks authentic student-related situations. Secondly, Andrade (2019, p. 5), Blanch-Hartigan (2011, p. 8), Brown and Harris (2014, p. 23), Brown et al. (2015, p. 450), and Falchikov and Boud (1989, p. 427) all acknowledge that there can be problems when using teacher or researcher assessments as the criteria for self-assessment accuracy, since these may suffer from their own biases and are not reliable indicators in all situations. In response to this, the criterion used in this study is a vocabulary picture test where the result is not interpreted by either the teachers present or the researcher. Third, and last, Brown et al. (2015, p. 450) draw attention to the social response bias that might occur when self-assessment measures are threatened by socially desirable responding, and suggest that students hand in self-assessments with no identifying information attached to avoid this. A researcher can also reduce the risk of social response bias by giving students sufficient time to perform the task, thus avoiding emotional stress by not associating the task with high stakes (Brown et al., 2015, p. 451). The first variant, student anonymity, was not possible to implement in this study, due to the need to compare each student’s first task with the second, but the participants were clearly informed that the results of the assignments would not be shown to any teacher or used in relation to grading in the English subject, which hopefully reduced the social response bias. This can, in addition, reduce students’ concerns about their psychological safety if their self-evaluations were to be made public to peers, parents and teachers (Brown et al., 2015, p. 451). The second variant, sufficient time to perform the task, was applied by giving students as much time as they needed for the self-assessment assignment and by having a generous time limit for the vocabulary picture test.

An additional search for studies examining self-assessment accuracy in language learning, specifically with younger students, led to Goral and Bailey (2019) and Kaderavek et al. (2004), both of which examine self-assessment of first-language oral production. In the study by Goral and Bailey (2019, p. 395), the 58 participants were students in grades two (ages 7-8), three, four and six. These students were asked to solve a math problem individually and then explain their solution to a researcher (Goral & Bailey, 2019, pp. 396-401). This explanation was recorded. The researcher then guided the students in assessing other students’ recorded explanations. With this fresh in mind, the students listened to and assessed the explanations they themselves had made after solving the math problem, and argued for their assessments. Results show that the majority of students’ self-assessments were consistent with those made by the researcher, but a larger proportion of the second-graders showed inconsistency compared to students in grades three through six (Goral & Bailey, 2019, pp. 403-404). Girls’ self-assessments were more consistent with the researchers’ assessments than boys’ self-assessments were.

Kaderavek et al. (2004, p. 40) asked 401 students, ages 5 through 11, to self-assess their performance immediately after the completion of tasks within the Test of Narrative Language (TNL). The TNL is a measure of comprehension and expression of connected speech used to narrate stories. The self-assessment was a 5-point rating scale with simple line drawings of five faces. The study was based on the theory that accurate self-assessment is an important metacognitive skill that helps in adjusting performance to meet assignment criteria (Kaderavek et al., 2004, p. 45). Results show the older students’ self-assessments to have a low to medium correlation with examiners’ assessments of their narrative production, whereas the very young students, ages 5 to 6, were highly inaccurate in their self-evaluations.

When the search for studies was further restricted to include only studies examining self-assessment accuracy in foreign or second language learning, the following were found: Butler and Lee (2010), Chen (2008), Dragemark Oscarson (2009), Han and Riazi (2018), Hasselgren (2000), and Stauffer (2011).

Butler and Lee (2010, p. 11) studied 254 EFL students attending 6th grade, ages 11 to 12, during one semester in South Korea. Two types of self-assessments were administered in their study (Butler & Lee, 2010, p. 12): first a general self-assessment for summative purposes, and then a series of self-assessments adapted to each teaching unit in the Korean national curriculum. Butler and Lee (2010, p. 24) found that the students improved their self-assessment accuracy after practicing it regularly for one semester, in comparison to the control group. Self-assessment also seemed to have a marginal but positive effect on students’ EFL learning.

Chen’s (2008, pp. 240-241) study was conducted within a twelve-week-long EFL oral training course at a national university in southern Taiwan, in which 28 students took part.

Assessment criteria were developed, practiced, and discussed collaboratively by the teacher and students during the first two weeks. During the rest of the semester, students took turns giving oral presentations each week, being simultaneously peer- and teacher-assessed, and self-assessing their own performance after the presentation. At the end of the class, students exchanged assessments and suggestions in groups. This was conducted over two assessment cycles, from which the collected results showed that self- and teacher evaluations differed significantly in the first cycle but were consistent in the second (Chen, 2008, p. 235). Students’ practice and added knowledge from the first cycle of assessment might have contributed to their improvement in self-assessment accuracy (Chen, 2008, p. 246).


Dragemark Oscarson (2009, p. 16) investigated Swedish EFL students’ self-assessment accuracy in reference to their writing. Participating were 102 students between 17 and 20 years of age (Dragemark Oscarson, 2009, p. 94). The study began with an introduction to self-assessment and group discussions about the EFL writing criteria (Dragemark Oscarson, 2009, pp. 106, 109). During the study, several different self-assessment forms were used: as evaluations of existing EFL knowledge, for assessing performance on two written assignments, and for predicting test results (Dragemark Oscarson, 2009, pp. 101-103). The latter was conducted by students directly after the National Test of English. Dragemark Oscarson’s results show that the students in the study demonstrated competence in self-assessing their EFL writing, when compared to teacher grades, and that they improved their ability to self-assess accurately through training (Dragemark Oscarson, 2009, pp. 230-232).

Han and Riazi (2018, p. 390) studied the EFL self-assessment accuracy of students in an English-Chinese interpretation course during one semester. Participants in the study were university students with Mandarin Chinese as their first language. At the beginning of the semester, students took part in developing a common set of criteria to be used in both self- and teacher-assessment. These criteria were then used for assessments after four, nine and ten weeks of the course (Han & Riazi, 2018, p. 394). In sum, the results of the study show an overall weak-to-moderate consistency between self- and teacher assessments. Moreover, students’ self-assessment accuracy improved over time, from week four to week nine, and from week nine to week ten.

Hasselgren (2000, p. 261) was involved in a project at the University of Bergen in Norway, with the aim of developing material and procedures for systematically introducing formative assessment of the EFL ability of Norwegian primary school pupils towards the end of the sixth grade, ages 11 to 12. The project was initiated at the request of the Norwegian Ministry of Education. During the national piloting of the final version, a set of tests and assessment instruments for teachers and students, 1000 students filled in self-assessment forms at the end of each subtest (Hasselgren, 2000, pp. 261, 265). Using a four-point scale, they rated specific parts of their own performance, as well as their overall performance (Hasselgren, 2000, p. 265). The analysis of the self-assessments shows that most students, except for the very weakest, were highly accurate compared to their test results.

Lastly, Stauffer (2011, p. 84) investigated the self-assessment accuracy of 156 American Sign Language (ASL) students at an American university. Participating students were enrolled in courses from beginning to advanced level, ages 17 to 18 and up. The students assessed their ASL ability before participating in a structured conversation with a deaf interviewer (Stauffer, 2011, p. 85). Their ASL proficiency was then assessed by the course teachers, using the same rating material as the students. A moderately strong self-assessment correlation was reported between the ASL students as a whole and their teachers. At the same time, the correlation reported between the advanced students and their teachers was weaker, yet still significant, compared to the moderate self-assessment correlation of the beginning students.

To conclude, first, all the language-focused studies reviewed report students’ self-assessments to be fairly consistent with assessments made by teachers or researchers. This seems to hold regardless of whether the skill being assessed is of a receptive or productive nature, or performed in oral or written form. Second, all longitudinal studies report results indicating higher self-assessment accuracy over time. It should be added that in all longitudinal studies except one, the teachers or researchers gave students guidance on how and why to self-assess before starting the actual tasks. In the one study without guidance, the teachers involved identified this as a missing factor, and the researchers suggest that it affected the self-assessment accuracy negatively (Butler & Lee, 2010, p. 25). This is all in line with the analysis made by Andrade (2019, p. 6), stating that consistency can be improved through practice. Third, higher self-assessment accuracy with older age is reported in two of the three studies where students from different grades participated (ages 5 through 12), suggesting that the ability to self-assess is linked to the development of one’s cognitive abilities. This correlates with the analyses made by Andrade (2019, p. 6), Blanch-Hartigan (2011, p. 7), and Brown et al. (2015, p. 447). Meanwhile, the third study with participants from different grades shows somewhat lower self-assessment accuracy with age; since the students in that study were 17 years old or older, this indicates that age affects the ability to self-assess more in primary and secondary school than later. Fourth, out of eight studies only two have participants younger than 11 years old. This confirms the already mentioned call for more research in lower primary school made by Papanthymou and Darra (2019, p. 59), and Brown and Harris (2014, p. 27). Fifth, and last, it becomes clear that all the language-focused studies reviewed use teacher or researcher evaluations to measure student self-assessment accuracy. Dragemark Oscarson (2009, p. 124) refers to work by Falchikov and Goldfinch stating that confirming students’ assessments against those of teachers, and using those teacher assessments as a standard, is a concern for validity. As mentioned above, teacher assessments were avoided in this study.

Moreover, the focus in this study is on the level of agreement between student self-assessment and student skill, and no account is taken of the undoubted learning benefits of many self-assessment structures.


2.4 Vocabulary and Language Proficiency

Vocabulary can be viewed from different perspectives, and each perspective affects the choice of test method (Read, 2000, p. 16). The first perspective is that vocabulary knowledge involves knowing the meaning of a word. This leads to tests where the participants, for example, are asked to match each word with a synonym or with an equivalent word in their own language. At a beginner’s level of learning a foreign language, it can be enough to know the meaning of a word by being able to match it with a corresponding word in the mother tongue. The second perspective is that words are not single items only, but also a term for larger lexical items, such as idioms and common phrases (Read, 2000, p. 17). So, as foreign language proficiency develops, the learner needs to understand such lexical items too. The third perspective suggests that vocabulary involves not only language, but strategic competence as well. This means that students need to learn a large number of lexical items and have ready access to this knowledge in order to draw on it effectively.

In general, when vocabulary is tested, focus is on the knowledge of content words (Read, 2000, p. 18). Articles, prepositions, pronouns, etc., are more often viewed as belonging to the grammar of the language than to the vocabulary, and are referred to as function words. The function words are mainly used as links to modify the meanings of the content words, which are nouns, verbs, adjectives, and adverbs. In all, knowing a word means knowing its receptive and productive use (Read, 2000, pp. 25-26). In spoken form that includes how it sounds and how it is pronounced, and in written form how to read and write it. Grammatically, knowing a word means having knowledge of the patterns in which the word occurs, and what word types to use with that particular word. To know the function of a word one has to know when and when not, and how often, to use the word. Finally, to know the concept and associations of the word, one has to know what the word means and what other words one could have used instead.

Vocabulary knowledge is critical for students’ language learning as it mediates receptive, expressive, and written communication and supports reading achievement and proficiency (Marcotte, Clemens, Parker & Whitcomb, 2016, p. 230; Ouellette, 2006, p. 562). For young children, vocabulary size is connected to the acquisition of phonological awareness skills, and vocabulary knowledge has also been shown to help with decoding and word identification skills (Marcotte et al., 2016, p. 230).

2.4.1 Vocabulary Picture Tests

In this study, vocabulary size is used as a measure of language proficiency. Given the young age of the students, a vocabulary picture test was considered the most appropriate tool, as it can be conducted regardless of students’ ability to read or write. The test procedure, showing a number of pictures to the students and letting them name as many as possible, measures their expressive vocabulary.

Relevant studies about vocabulary picture tests, and their use as a measure of proficiency, are many, and only two will be referred to in detail here: Ouellette (2006) and Vedyanto (2016). Some further examples where vocabulary picture tests are used in relation to language proficiency are Boersma, Baker, Rispens and Weerman (2018), Edyburn, Quirk, Felix, Swami, Goldstein, Terzieva and Scheller (2017), and Yeung and King (2016).

Ouellette (2006, pp. 557-558) studied the relationship between vocabulary measures and reading skills in a project in which 60 Canadian grade four students (approximately 10 years old) participated. All students had English as their first language. Vocabulary picture tests were used to measure both receptive and expressive vocabulary, together with tests for nonverbal intelligence, word definitions, decoding, visual word recognition, and reading comprehension. The results show that reading comprehension was related to both vocabulary breadth and depth of vocabulary knowledge, and the study suggests that oral vocabulary is related to word recognition and further related to reading comprehension (Ouellette, 2006, p. 563). In short, Ouellette (2006, p. 555) proposes that it is the number of words added to the personal lexicon that is the important factor in decoding.

Vedyanto (2016, p. 55) studied the correlation between picture use in test format and vocabulary achievement. The participants included 41 Indonesian grade seven students (12 to 13 years old). Two vocabulary tests containing the same set of words were conducted, one without pictures in the control group, and one with an associated image for each word in the experimental group (Vedyanto, 2016, pp. 55-56). Students in the experimental group had a higher test score, and the findings report a strong correlation between picture use in test format and vocabulary achievement (Vedyanto, 2016, p. 58).

In relation to what has been mentioned above, the use of a vocabulary picture test as a quick measure of EFL proficiency seems acceptable.


3. Methodology

This section begins with a presentation of the context and participants, followed by an account of the survey and the vocabulary picture test used in the study, and of how they were conducted. After this comes a discussion of the method, written before the study was carried out, and finally there is an account of how the analysis was performed.

3.1 Context and Participants

The setting for the study was a Swedish primary school located on the outskirts of a larger city. The population was a convenience sample, since the researcher knew these students (McKay, 2006, p. 37), and consisted of fifty-six students in second and third grade, ages 8 through 10. All students were informed by the researcher about the reason for the study and that participation was voluntary. A consent form was sent home to caregivers together with the students’ homework and brought back signed. Three students were not allowed to participate in the study.

Participating students were fluent in Swedish, although some were bilingual and spoke either Arabic, Finnish, or German. Two students who had English as a first language were excluded, due to the focus of the study being learning English as a foreign language. Out of the participants who took part in both the EFL self-assessment survey and the vocabulary picture test, six were students with a lower Swedish literacy level, according to their teachers. To the knowledge of the students’ teachers, none of the participating students had had more than standard exposure to English outside of school.

Furthermore, each class had been taught EFL from the beginning of second grade, by different teachers over time, and no uniform EFL material had been used. The students studied EFL forty-five (second grade) and sixty (third grade) minutes per week. After consulting associated English teachers at the school and the CEFR (Council for Cultural Co-operation Modern Languages Division Council of Europe, 2001, p. 24), students’ approximate EFL level was set to A1: Breakthrough.

The EFL vocabulary self-assessment survey was conducted by forty-three students, as some were ill at the time of implementation. The subsequent vocabulary picture test was conducted by eighteen students, seven girls and eleven boys. The low attendance was a result of the COVID-19 pandemic. Due to the virus, many parents worked from home, and as a result very few students participated in extracurricular activities, which was the only time available for the researcher to carry out the vocabulary picture tests. Table 1 gives an overview of the final participants.


Table 1.

Table of Participating Students

3.2 Materials

3.2.1 EFL Vocabulary Self-Assessment

A survey with ten Likert-scale statements (Bryman, 2016, p. 154; McKay, 2006, p. 38), written in Swedish and with an even number of response options, was used for the students’ EFL self-assessments. An even number of grading options was chosen in the hope that it would encourage insecure students to take a clear stand once the middle option was removed (McKay, 2006, p. 38). The grading options were not true at all, a little bit true, pretty much true, and completely true. Moreover, they were pre-coded (Bryman, 2016, p. 227) 1 through 4, with 4 indicating high self-assessed EFL vocabulary skill and 1 indicating low self-assessed EFL vocabulary skill (see Appendix A). This resulted in a maximum total sum of forty and a minimum total sum of ten on the EFL self-assessment. The codes were not shown on the student questionnaire, due to the risk of students interpreting them as a form of grading and therefore selecting the higher number.

Due to students’ A1 EFL level, the survey was written in Swedish. Formulation of the statements was based on Dragemark Oscarson (2009, p. 69), Lundberg (2016, pp. 88, 127), Pinter (2017, p. 113) and Wiliam and Leahy (2015/2015, p. 225), who suggest the use of descriptions of specific situations where students can grade what they can do, for example: I can understand many English words on signs, in movies and commercials. Dragemark Oscarson (2009, p. 69) refers to a study made by Blanche and Merino (1989) stating a higher correlation between students’ self-assessments and teacher grading when this type of format was used. Students who were asked to do general self-appraisals of skills such as reading or writing obtained a lower teacher grade correlation. On the other hand, Dragemark Oscarson also mentions the study by Bachman and Palmer (1989) which proposes that foreign language learners might be more aware of what they cannot do. The latter was combined with Bryman's (2016, p. 228) argument that surveys with a Likert scale should have some items where the coding is reversed to identify participants who exhibit response sets, i.e., students who might answer in a way they think is “correct” rather than how they actually think. Therefore, four statements regarding situations described as difficult were included in the survey, for example:

I find it difficult to speak English and I do not understand when my teacher speaks only English.

These four negative statements were interspersed evenly throughout the survey. In keeping with McKay (2006, p. 39), the statements were kept as short as possible, while avoiding double-barrelled, leading, embarrassing and biased formulations (McKay, 2006, p. 39; Bryman, 2016, pp. 251-255).
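As a minimal illustration of the scoring logic described above, the following Python sketch applies reverse coding to the four negatively worded statements before summing, so that a total of 40 always indicates high self-assessed skill and 10 low. The responses are invented, and the item numbers assumed to be reversed (3, 7, 8 and 10) correspond to the negative statements in the final item list:

```python
# Hypothetical sketch of the survey scoring described above.
# Responses are coded 1-4; the negatively worded statements are
# reverse-coded (4 becomes 1, 3 becomes 2, and so on) before summing.

REVERSED = {3, 7, 8, 10}  # assumed numbers of the negative statements

def total_score(responses):
    """responses: dict mapping item number (1-10) to raw coded answer (1-4)."""
    total = 0
    for item, raw in responses.items():
        total += (5 - raw) if item in REVERSED else raw
    return total

# A student answering "completely true" (4) to every positive statement
# and "not true at all" (1) to every negative one scores the maximum of 40.
example = {i: (1 if i in REVERSED else 4) for i in range(1, 11)}
print(total_score(example))  # 40
```

The expression `5 - raw` simply mirrors the four-point scale, which is what reverse coding on a 1-4 Likert scale amounts to.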

The items in the survey were checked for internal reliability (Bryman, 2016, pp. 154, 157; McKay, 2006, p. 41) through a trial test with fifteen students. The results from the trial test were analysed with IBM SPSS Statistics, resulting in a Cronbach’s Alpha value of 0.89. Bryman (2016, p. 158) recommends a value of 0.8 or higher as an acceptable level of internal reliability. Cronbach’s Alpha calculates the average of all split-half reliability coefficients (Bryman, 2016, p. 158). For this survey, it means that the ten items were divided into two groups of five items each, which can be done in several ways, e.g., by odd and even numbers, on a random basis, or by first half and second half (Bryman, 2016, p. 157). Finally, if the five items in each half of the survey provide similar results, this suggests that the test has internal reliability.
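For illustration, the Cronbach’s Alpha computation can be sketched in Python using the standard item-variance formula, which is mathematically equivalent to averaging all possible split-half coefficients. The data below are invented; the study’s actual value of 0.89 was obtained from IBM SPSS Statistics on the real trial responses:

```python
# A sketch of the Cronbach's Alpha computation. The trial data below are
# invented for illustration; they are not the study's actual responses.

def cronbach_alpha(scores):
    """scores: list of per-student lists, one coded answer (1-4) per item."""
    k = len(scores[0])                      # number of items
    def var(xs):                            # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([s[i] for s in scores]) for i in range(k)]
    total_var = var([sum(s) for s in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented trial data: five students, four items each.
trial = [
    [4, 4, 3, 4],
    [3, 3, 3, 3],
    [2, 2, 1, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(trial), 2))  # 0.96
```

Because the same variance estimator is used in both numerator and denominator, the result is unchanged whether population or sample variance is used, which is why this hand computation matches the SPSS procedure.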

The final version of the survey contained the following items:

1. I can understand almost all words when my teacher speaks English.


2. I can understand many English words on signs, in movies and commercials.

3. I find it difficult to speak English.

4. I can understand many English words when playing games at home on a computer, tablet, smart phone ...

5. I can speak some English with my friends and they understand.

6. I know almost every word in several English songs.

7. I prefer to watch movies where they speak Swedish.

8. I want the help of an adult, a sibling or a friend when I play English games at home.

9. I can remember many English words when I speak English.

10. I do not understand when my teacher speaks only English.

Each statement and its related number were colour coded for added clarity, and the layout was designed so that the survey fitted onto one A4 page, making it easier for students to get an overview. For this to be possible, the fixed answers were arranged horizontally, even though Bryman (2016, p. 227) recommends a vertical format. A horizontal format can cause confusion as to where the respondent should tick, but this was avoided by placing the graded options in distinct boxes that could be crossed (see Appendix B).

3.2.2 Vocabulary Picture Test

In order for the vocabulary test to be feasible for all students, regardless of reading and writing ability, sets of pictures from different areas of life were used, since research shows that categorizing vocabulary makes it easier to remember (Sandström, 2011, p. 93). The choice of sets was based on Read (2000, p. 40), who refers to studies conducted by Rodgers (1969) and Ellis and Beaton (1993b) stating that nouns are easiest to learn, followed by adjectives, and that verbs seem to be one of the most difficult word classes to learn. In the end, the vocabulary picture test contained a total of 354 images, representing 296 nouns, 37 adjectives, and 21 verbs.

The test consisted of already existing word mats from Twinkl (n.d.), a website with educational resources in English. These had a uniform appearance, which gave a more coherent overall impression. In the selection of word mats, the aim was to cover as many areas as possible. The final twenty-three word mats chosen were Weather; the set Parts of a House, from which Bedroom, Bathroom, Kitchen, Dining Room and Garden were picked; page two of the set Character Adjective Word Mat; Clothes; Colours; My Family; Farm Animals; Food; Fruit Word Mat; Hobbies; Minibeast Word Mat; Ocean Animals; Parts of the Body; Pets Word Mat; Sports; Transport; Vegetable Word Mat; Verbs; and Zoo Word Mat (see Appendix C-Y). Due to the design of the word mats, some images appeared more than once, for example:

cat and dog were both found on the Pets Word Mat and on the Farm Animals Word Mat, but if a student named both images of a dog, it was still counted as only one answer. All text was removed from the word mats by the researcher, and the mats were then sorted by content as accurately as possible and printed onto six A3 sheets. The A3 sheets were then laminated.
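As an illustration of this counting rule, the following Python sketch credits a word only once even if the student names it on two different mats. The word lists are hypothetical examples, not the actual 354-image inventory:

```python
# Sketch of the counting rule for repeated images: a word appearing on
# more than one mat (e.g. "dog" on both Pets and Farm Animals) is
# credited only once. These word lists are illustrative only.

pets_mat = {"cat", "dog", "rabbit", "hamster"}
farm_mat = {"cat", "dog", "cow", "horse", "sheep"}

# Images the student named, possibly pointing at duplicates on both mats:
named = ["dog", "dog", "cat", "cow", "rabbit"]

# Using a set keeps each correctly named word once.
score = len(set(named) & (pets_mat | farm_mat))
print(score)  # 4: dog, cat, cow and rabbit each counted once
```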

Since each class had been taught EFL by different teachers over time and no uniform EFL material had been used, an EFL vocabulary test with larger groups of associated words seemed more objective. If the test had consisted of a smaller number of images for the students to identify, there was a risk of them knowing several words within a subject that was not represented in the test. A greater number of images provided a better opportunity for students to demonstrate their knowledge.

3.3 Procedures

3.3.1 Piloting

The EFL self-assessment survey was piloted (McKay, 2006, p. 41) with two students from the second grade and one from the third grade. This led to the exclusion of the statement I sometimes think in English, which was too abstract for the second-grade students, and of two statements which all three students thought were too similar. In those two statements, the researcher had made a distinction between traditional English children’s songs and lyrics in popular music, but the students were unable to detect any difference between the two. The second-grade students also found the boxes with the close-ended answers too large, since they seemed compelled to draw their crosses diagonally from corner to corner. The design was thus changed.

The vocabulary picture test was piloted with one second grade student. This student showed no signs of difficulty in recognising the images. No changes were made and twenty minutes seemed to be plenty of time to examine all the sets carefully.

3.3.2 Administrating the EFL Self-Assessment Survey

Students in 2nd grade were the first to self-assess their English proficiency. The researcher started by explaining why she was conducting the survey, how it was not related to their schoolwork, and that the results would not be shown to their English teachers. It was also made clear to the students that they were under no time limit and that the researcher would guide them through the survey. After this introduction, the researcher showed a dummy survey on the smart board, with three statements regarding candy, that had the same colours and layout as the EFL self-assessment survey. The candy self-assessment survey was then completed jointly by researcher and students, and it was explained to the students that the next survey would look the same but be answered individually. Now somewhat prepared, the students were handed the real survey. With the EFL self-assessment survey now on the smart board instead, the researcher read each item aloud and waited for the students to mark their responses. Students were free to ask supplementary questions, and a few asked for some statements to be read again. When finished, the students were asked to write their names on the surveys and return them to the researcher. The survey was conducted in the same way with participating students in 3rd grade, and it took approximately twenty-five minutes from start to finish on both occasions.

3.3.3 Conducting the Vocabulary Picture Test

The researcher met with each student in a quiet room, well known to and often used by the students. This context can even out the power relationship between researcher and student to some extent (McKay, 2006, pp. 54-55). To further minimize this bias, the researcher began the interview by fully explaining why she was doing the test, how the information would be used, and that she was grateful to the student for helping. The student was informed that he or she could point at the pictures and name them in English, and positive feedback was provided in Swedish by the researcher throughout the test, to reduce any tension.

The student was presented with the A3 sheets in a pile, so only the top one was visible, and as the test continued the researcher spread the sheets out and encouraged the student to examine them again, in case he or she could remember any more English words by looking at the pictures one more time. When the student named an image, it was crossed out with a whiteboard pen.

After twenty minutes, the researcher stopped the test. All students seemed to enjoy themselves during the test.

After the testing, the researcher wrote the student’s name on one of the word mats and then photographed the mats. The name and the crosses were then wiped off, and the same word mats were used for the next student. All tests were audio recorded as a precaution, but not transcribed.

3.4 Method Discussion – Before Study

3.4.1 EFL Vocabulary Self-Assessment

The survey format was chosen for its capacity to gather a lot of attitudinal information in a short period of time (McKay, 2006, p. 35), with close-ended items in the form of statements, since these are easy to answer, code and analyse (McKay, 2006, pp. 37-39). The students also had some previous experience of similar surveys, since they evaluate their self-efficacy and overall proficiency twice a year before the teacher-parent-student meeting. This is done, to some extent, by grading statements using three different smiley faces.

Bryman (2016, p. 223) points out disadvantages of surveys compared to structured interviews. He mentions the risk of respondents not understanding the questions, and that surveys leave no room for the respondents to elaborate on an answer. Nevertheless, misunderstanding of the statements was hopefully avoided by the researcher explaining them continuously during implementation, and students’ elaborated thoughts, however interesting they may be, are material for a more qualitative study than this one. Besides, McKay (2006, pp. 39, 53) states that, in order to avoid the risk of respondents not understanding the questions, a survey investigating foreign language proficiency can be conducted in their mother tongue. This guided the decision to write the survey in Swedish.

3.4.2 Vocabulary Picture Test

The use of pictures in combination with testing EFL vocabulary skill stimulates students’ interest, contributes to the context of the language currently used, and can encourage students to speak in the foreign language (Rokni & Karimi, 2013, p. 240; Vedyanto, 2016, p. 55). In the study by Vedyanto (2016, p. 58), students in grade 7 achieved better results on vocabulary tests with images than on tests without. The students were also observed to be more confident and in a better mood during tests with a picture format compared to standard vocabulary tests (Vedyanto, 2016, p. 54). A majority of the students found the picture test format easier to understand than tests without pictures (Vedyanto, 2016, p. 58). The findings mentioned above, together with the participants’ A1 EFL language level and the fact that some students within the population still struggled with reading and writing in Swedish, led to the use of an English vocabulary picture test without any written words.

According to Read (2000, pp. 8-11), this type of testing can be seen as evaluating vocabulary knowledge as an independent construct, with specific vocabulary items in focus. The respondent can produce answers without referring to any context. The opposite would be vocabulary testing where vocabulary is part of another, larger construct, and where the respondent’s ability to use context information when producing answers becomes significant. Even though the latter might give a more complex view of the students’ English proficiency, the chosen method was judged sufficient for measuring their vocabulary within a reasonable time frame.

A number of different vocabulary picture tests are mentioned in the literature. Firstly, the Receptive and Expressive One-Word Picture Vocabulary Tests and the Peabody Picture Vocabulary Test–Fourth Edition were too expensive for the budget of this study. Secondly, the Dynamic Indicators of Vocabulary Skills (DIVS) and the Singapore English Action Picture Test (SEAPT) were, according to their respective websites, available via the researchers who created them. Those researchers were contacted by e-mail, but did not respond. Lastly, the Boston Naming Test was located with a speech therapist at a nearby hospital, but due to the COVID-19 pandemic, the clinic closed for visitors before the researcher got hold of the test. To summarize, the vocabulary picture test used in this study was created with several scientifically approved tests as models, and was the best that could be achieved within the study’s finances and timeframe.

3.4.3 Reliability and Validity

In quantitative research, reliability refers to the consistency of the measures being used (Bryman, 2016, p. 156), namely, how dependable the results of a study or a measuring test are.

Bryman (2016, p. 157) mentions four factors to be considered when deciding whether a measure is reliable. Firstly, stability refers to how unchanging a measure is over time, so that the researcher can be sure that results related to the measure will not fluctuate. For instance, if a group of participants takes the same test twice, there should be little variation between the first and second results. Secondly, internal reliability refers to how consistent the measure is within itself. The indicators in the scale should be consistent, so that a participant's answer on one item is related to how they respond to the other items. Thirdly, inter-rater reliability is relevant when a great amount of subjective judgement is involved in the gathering or analysis of data, especially if more than one researcher takes part in those activities; subjective views could otherwise result in a lack of consistency in the study's decisions. A context where this could happen is, for example, when open-ended questions have to be categorized. Lastly, external reliability refers to how consistent the results would be if the study were replicated (Bryman, 2016, p. 383), that is, whether another researcher conducting a similar study would end up with similar results or not (McKay, 2006, pp. 12-13).
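Internal reliability of a Likert scale is commonly quantified with Cronbach's alpha, which compares the variance of the individual items with the variance of the summed scale: values closer to 1 indicate that the items behave consistently. The present study does not report whether such a coefficient was computed; the sketch below is only a minimal illustration of the statistic, using hypothetical response data rather than figures from the survey.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of Likert scores."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each separate item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return n_items / (n_items - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses (1-4) from five students on four Likert statements
data = np.array([
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 2, 1, 2],
    [4, 3, 4, 4],
    [1, 2, 2, 1],
])
print(round(cronbach_alpha(data), 2))  # → 0.93
```

With this made-up data the items co-vary strongly, so alpha is high; inconsistent answering patterns across items would pull the coefficient down.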

Validity refers to whether the study measures what it set out to measure (Bryman, 2016, p. 158). McKay (2006, pp. 12-13) points out three major types of validity in quantitative research. First, internal validity concerns whether the study is designed so that only the chosen variable causes the result, or whether other, unconsidered, confounding variables are at play. Second, external validity refers to how well the study's findings can be generalised to a wider population; for the findings to be generalisable, the sample should preferably be drawn at random from a representative group. Last, construct validity means that the methods used in a study are suitable for assessing the construct being investigated. Bryman (2016, p. 159) suggests that the researcher deduce a hypothesis from a theory relevant to the concept of the study.

With the above in mind, the students’ EFL self-assessment survey and the vocabulary picture test cannot be seen as having high stability. If these students were to take part in the

References
