
Estetisk-filosofiska fakulteten

Lisa Andersson

Is the Yes/No method reliable for measuring vocabulary size?

English

C-level essay

Term: Spring term 2008
Supervisor: Solveig Granath

Karlstads universitet, 651 88 Karlstad. Tel 054-700 10 00, Fax 054-700 14 60


Abstract

Title: Is the Yes/No method reliable for measuring vocabulary size?

Author: Andersson, Lisa

English C, 2008

Supervisor: Solveig Granath
Number of pages: 31

Abstract: The main purpose of this paper was to construct and try out a test that could measure the size of both the receptive and the productive vocabulary. This was a joint project, carried out by three students at the C-level in English in 1997. Before the test was constructed, the students looked into previous investigations and the different test methods used. The project group chose the Yes/No method as its test format. The test discussed in this paper was taken by 23 students in the second year of a theoretical programme in upper secondary school and by 16 adult students at Komvux. The results of the test showed that it is impossible for a language teacher to construct a reliable and valid test for measuring vocabulary size using the Yes/No method.

Keywords: vocabulary, receptive vocabulary, productive vocabulary, testing, measuring


Table of contents

1 Introduction
1.1 Aims
2 Background
2.1 Measuring vocabulary size
2.2 Test methods
2.3 Previous investigations
2.4 Comparison and contrast of the previous investigations
2.4.1 Sampling
2.4.2 Method
2.4.3 Testees
2.4.4 Range of vocabulary size
3 Method
3.1 Sampling
3.2 Testees
3.3 Test procedure
3.4 Scoring
4 Results
4.1 Receptive vocabulary
4.1.1 How the usage of the pseudo-words affected the receptive vocabulary
4.2 Productive vocabulary
4.2.1 How the usage of the pseudo-words affected the productive vocabulary
4.3 Comparison between receptive and productive vocabulary
4.4 Pseudo-words
4.5 My test compared to previous investigations
5 Discussion
6 Conclusion
List of references
Appendices:
Appendix 1. Example of the calculations
Appendix 2. Table 1. The size of the receptive vocabulary
Appendix 3. Table 2. The size of the productive vocabulary


1. Introduction

The world has come closer to us than ever before. We have access to media from all over the world on the Internet and via satellite TV, and we have opportunities to work and travel around the world. These opportunities make new demands on us, such as being able to communicate with people who speak languages other than Swedish. It is therefore a necessity that the educational system prioritises language studies, especially English. Many students want to study at various levels both in Sweden and abroad and need skills in advanced English. Furthermore, the job market is more global today, with many multinational companies which have English as their language of communication both in and outside Sweden. Moreover, citizens of the European Union have the opportunity to work in all countries in the union and to move between countries during their careers. Many people move to countries without knowledge of the language spoken there and need a language to communicate in. Lastly, a lot of manuals for technical appliances are printed in English only, which raises expectations of good knowledge of English among skilled workers such as car mechanics and plumbers.

Since I started this investigation eleven years ago, I have worked as a teacher and taught English to Swedish primary and secondary school children. A useful vocabulary is needed in order to learn a new language; I think that useful is a better word than basic to describe the vocabulary we want our pupils to learn. There must be tools to help teachers find the right level for what they teach. Testing vocabulary size is one way of establishing the level of pupils' word knowledge, and it could help teachers base the content of a course on the results of a vocabulary size test. This paper will discuss how reliable such tests are in measuring the size of learners' vocabulary.

1.1 Aims

The main aim of this paper is to see if it is possible to measure the size of a person’s vocabulary. Another aim is to find out if the results of such a test are reliable and valid as a scientific measurement that teachers can use in their profession. Furthermore, it would be useful to know whether language teachers can construct a test like this on their own.


It is necessary to define what we mean by vocabulary in order to measure it. In the background, I will present different types of words, or categories of words, that can be distinguished from each other in a vocabulary. Lastly, this paper will investigate whether it is possible to measure the size of two of these categories separately, namely the so-called productive vocabulary, which learners use when speaking and writing, and the receptive vocabulary, which learners use when listening to spoken language and when reading.

2. Background

2.1 Measuring vocabulary size

Before I start looking into the problems linguists come across when they try to measure the size of an individual's vocabulary, it is important to know what a vocabulary is. In the first section of this part of the paper, I will briefly discuss different ways of defining vocabulary. The words in a language can be divided into two main categories: function words, which are words like prepositions and determiners, and content words, which are nouns, verbs, adjectives and adverbs (Finegan 1994:161). Groups of words can be divided into lemmas or word families. A lemma consists of a base word – a root form – and words that are derived from the root. An example of a lemma is the verb work and its forms, such as working, works and worked. Furthermore, a lemma can have words that are obviously related to it, like worker, which is not an inflected form of the verb but a noun derived from the root work. In order to qualify as parts of a lemma, all the forms have to belong to the same word class (Daller et al. 2007:3). The size of a vocabulary will be larger if the measurement is based on lemmas rather than word families, since in word families, inflected and derived forms count as one, whereas with lemmas, derived forms make up separate lemmas (Daller et al. 2007:4).

Linguists are not in agreement on whether there are different parts of a vocabulary with different functions. One suggestion is that there is a passive or receptive part of the vocabulary that humans use for recognition of words and an active or productive part that humans use for producing speech and written texts (Melka 1997:84). Melka claims that there is a gap in size between these two parts of the vocabulary: the receptive vocabulary is said to be two to five times the size of the productive vocabulary. The receptive part is what humans acquire first when learning a new language, whether it is their mother tongue or a second language; later in the language learning process, the productive vocabulary increases (Melka 1997:92). Melka (1997:93) concludes that humans need interaction between these two parts to be able to learn and use their mother tongue as well as a second language.

As I will show in section 2.3, the size of an individual's vocabulary has been of interest to many linguists over the last century. The general idea has been that a large vocabulary reveals how well educated, well read and intelligent a person is (Nation & Waring 1997:7). Various investigations show huge differences in vocabulary size: some state that a person may store 3,000 words, others as many as 216,000 words (Goulden et al. 1990:342). Nation and Waring (1997:7-8) claim that a native speaker of English at university level knows roughly 20,000 word families, although they stress that there are individual differences. Non-native speakers at university are said to know about 5,000 word families in English, even though some may reach the same level as native speakers. The differences between the results of the investigations accounted for in section 2.3 can be explained by the methods the researchers used for sampling words for the tests, as well as by the ways in which the tests were constructed and assessed.

In the introduction to their book, Daller et al. (2007:4) mention two different ways of deciding whether someone knows a word: the first is when a person can identify an item as a word in the language, and the second is when a person can attach meaning to a word, either by explaining or translating it (Daller et al. 2007:6).

2.2 Test methods

Traditionally, two different test methods have been used for measuring vocabulary size. In order to follow the account of the investigations in the next section, it is necessary to understand the differences between the two, referred to as the multiple-choice method and the Yes/No method respectively. The multiple-choice test is based on words in the target language that are presented with four alternatives each, and the testee is asked to match one of these four alternatives with the word tested. The alternatives can be single words or phrases, and they can be given either in the target language or in the mother tongue (Meara and Buxton 1987:143). The Yes/No method is based on the testee's self-assessment of his/her knowledge of the words and asks the testee to tick yes or no next to the words included in the test. Such a test includes several pseudo-words, i.e. words that are non-existent in the language tested but follow the morphological and phonological rules of that language. That means that the words look and sound like real words in the target language. These words are used to adjust the scoring depending on how often a testee has claimed knowledge of them. If a testee has ticked yes next to a pseudo-word, the results are adjusted downwards in order to estimate the true size of that person's word knowledge (Daller et al. 2007:11). The advantage of this test method, according to Meara & Buxton (1987:146), is that such a test is easy to construct and does not take much time to complete. It is also easy to evaluate for those who give the test. Another advantage is that it is possible to test a larger number of items than with the multiple-choice test.

2.3 Previous investigations

In the first few decades of the 20th century, a new science called psychometrics was gaining ground in America (Read 1997:304). This had effects on education, where interest in language testing grew. The first vocabulary size test was conducted in 1916 by a researcher named Starch (accounted for in Read 1997:304). His test measured vocabulary knowledge by giving the testees lists of words in e.g. French with English equivalents. Starch used the multiple-choice method, and the tests consisted of isolated words, so it was fairly easy for the testees to match the words or phrases with the target words. The method was nevertheless claimed to be reliable and was said to correlate well with other tests, such as reading comprehension tests (Read 1997:304).

Seashore & Eckerson (1940) conducted the first modern investigation of vocabulary size in the 1940s. Their investigation marks a clear boundary between the old language testing methods and the modern view of language testing. The investigation used a multiple-choice response test based on Funk and Wagnalls' New Standard Dictionary (Seashore & Eckerson 1940:15); these two researchers were the first linguists to use such a large dictionary for their sampling. They took every third item from the top of the first column on the left-hand pages of the dictionary, which gave them a list of 1,378 words. Of these, 58 were prefixes, suffixes, abbreviations and other unusable entries, and they were therefore excluded from the test (Seashore & Eckerson 1940:19). The remaining 1,320 words were divided into four groups and handed over to undergraduate students, who were asked to arrange them in order of difficulty. The students were also asked to define each word by a synonym or by using it in a sentence. Words that none of the students could explain were excluded from the test. Thereafter, only 336 words remained, which for the test were divided into two lists. The first list consisted of 178 common basic words. These were put into a four-choice multiple-response test, which the testees were asked to complete before they continued to the second list. This list consisted of 158 rare basic words, which the testees were to define, use in a sentence or mark as recognized in written language (Seashore & Eckerson 1940:29). The test was taken by 237 students from Ohio State University and 116 Northwestern undergraduate students. The results showed that an average college undergraduate recognized thirty-five percent of the common basic words and forty-seven percent of the derived words, but as little as one percent of the rare basic words after correction for guessing. The total score on the test showed that an average student knew 155,736 words, and the overall results showed that the students' vocabularies ranged between 112,100 and 192,575 known words (1940:33). Seashore & Eckerson (1940:34) concluded that the reliability of the test was quite low, since the beginning of the test was too easy for students at this level, who could therefore spend more time working on the difficult part of the test.

Zettersten (1977:79) tested students in Finland, Norway and Sweden. He chose a multiple-choice test based on 120 English words as his method for testing vocabulary size, and divided the 120 words into six groups according to frequency. The words were sampled from the Gothenburg Computing Centre (Zettersten 1977:80). The aim of his investigation was to see whether the students from these three countries differed with respect to vocabulary size or whether they would get similar scores. His results showed that the students' vocabulary sizes were roughly the same in all three countries (Zettersten 1977:81). In his conclusion, Zettersten (1977:84) stressed that one must consider discrimination, reliability and validity when constructing a test like this. He claimed that the test should discriminate among the students, using a discrimination coefficient to indicate differences. Zettersten concluded that the test was reliable, in the sense that it would show the same results if it were given to the students on a later occasion. Finally, he found that the test was valid, which means that it measured what it was intended to measure.

Meara & Buxton (1987:143) rejected the use of the multiple-choice test for measuring vocabulary size. They argued that a testee might mark the wrong alternative in a multiple-choice test even though s/he knows the right answer, either because the student may not understand the given definitions or because the alternatives available may not match the meaning the testee attaches to the word (Meara & Buxton 1987:143). As an alternative, they suggested the Yes/No test method.


In order to evaluate the Yes/No test, the researchers asked their testees to take both a multiple-choice test and a Yes/No test. Both tests were based on the same items, used in the Cambridge First Certificate Examination (1987:146). The Yes/No test consisted of 100 words: 60 real words and 40 pseudo-words. The testees were 100 students at the Cassio College of further education, all of them second language learners. The linguists found that the two test types gave almost the same results, and even that the Yes/No test had a higher reliability, because the Yes/No method is better at distinguishing between the testees' results (1987:150).

In their article, Meara & Jones (1987:2) describe how they constructed their test, the aim of which was to test the testees' passive vocabulary. They describe different models for doing this and discuss their advantages and disadvantages. They claim that the multiple-choice test takes too much time to complete and that the definitions are often so complex that students can easily fail as a result. Another great disadvantage is that the testees can guess and get the right answer (Meara & Jones 1987:22). The researchers favour the Yes/No method because the testees only have to tick the words they believe they know and no definitions are needed; furthermore, such a test can be quickly constructed and quickly completed by the students. Like Meara & Buxton in their investigation, Meara & Jones used pseudo-words to be able to adjust for reliability (1987:22). They also used a statistical formula which reduced the total score if a testee ticked yes next to a pseudo-word. Meara & Jones (1987:30) took every tenth word from the Thorndike and Lorge and the Kucera-Francis word lists. When they sampled their words they only used headwords; inflections were seen as variants of the words. The linguists excluded words belonging to the following categories: obvious anomalies, words whose frequency is obviously at variance with the likelihood of their being known by foreign learners, words that are transparently derived from other words of much higher frequency, international words like hotel which are likely to be known everywhere, false friends, and finally words whose meanings are transparent to speakers of particular languages (1987:25-27). The words were then sorted into frequency bands. The 268 students who were tested were asked to rate each word and decide whether they could attribute a meaning to it or not. Their ratings correlated quite well with additional traditional tests (1987:35).

Goulden, Nation & Read (1990:344) argue that the main problem when sampling items to be included in a test for measuring vocabulary size is that editors of dictionaries tend to increase the number of entries in a dictionary instead of decreasing them; they often count derivations as base words, which creates a lot of entries that could be collected under the same entry. Goulden et al. (1990:344) used Webster's Third New International Dictionary from 1961, which was the largest non-historical dictionary of the English language at the time of their study. The linguists also used a book called 9000 Words from 1983 to get updated frequency figures, since they felt that Webster's Third was a bit outdated in terms of contemporary frequency. All sampled words were counted and divided into groups consisting of base words and derived words, and the linguists calculated what percentage each group represented of the total number of words in the dictionary used for sampling (1990:349). They chose 542 words (492 base words and 50 from spaced sampling). From this list they excluded derivations, abbreviations, alternative spellings, inflected words and some compounds which were transparent in meaning. The linguists checked the frequencies in Thorndike and Lorge's word frequency list and adjusted their list so that it would constitute a representative sample of the total vocabulary in the dictionary (1990:350). In the test, each item represented 100 words. The test was taken by 20 university students who were native speakers of English, all over 22 years of age (1990:356). The testees were asked to look at the words, tick the words they knew and put a question mark next to unfamiliar words. The total scores on the test varied between 13,200 and 20,700 words known, and the average score in the group was a vocabulary size of 17,200 words (1990:356).

The Yes/No method was also used by Beeckmans et al. (2001:242), who tested 488 Belgian French-speaking university students of Economics and Business Administration in order to measure the students' knowledge of Dutch. The students had all studied Dutch as an L2 in primary school, and some had continued their studies at secondary school. The method was used as a complement to a grammar test and a multiple-choice test in order to sort the students into homogeneous groups for compulsory Dutch studies in the second year of university (2001:243). The test was not supposed to measure the vocabulary size of each testee but to determine whether they knew the core vocabulary of Dutch, which the students needed in order to understand the course material (2001:243). Two versions of the test were made, each with the same ratio of words as the test Meara & Buxton created in 1987: 60 real words and 40 pseudo-words (2001:243). The words were randomly sampled from Woorden in Context, a standard Dutch reference book containing 3,700 high-frequency Dutch words of utility for users of Dutch. The test contained all sorts of words. The pseudo-words were based on existing Dutch words, which were altered in two ways: either the affixes were changed, so that a word like prettig (fun) became pretaching, or the graphemes were changed, so that timmerman (carpenter) became tommerman (2001:245). The researchers were careful to stick to the phonological and morphological rules of Dutch. When the linguists assessed the test, they noticed that many of the students had ticked pseudo-words. They also noticed that it was not appropriate to use the same correction formula as they had used for the multiple-choice test included in their testing of the students (2001:242). The test showed huge variations in levels of word knowledge, from weak to advanced students, which was in line with what the researchers already knew (2001:242). Beeckmans et al. (2001:272) concluded that this form of the Yes/No method is not reliable for measuring vocabulary knowledge, since the students often ticked pseudo-words and therefore were not honest in their answers. Another problem that affected the results was that the correction formula they used was designed primarily for multiple-choice tests.
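To make the two alteration types concrete, here is a toy Python sketch; the helper functions and their names are my own, and only the two example words come from Beeckmans et al. (2001:245):

# Two ways of deriving pseudo-words from real Dutch words, as described
# by Beeckmans et al.: altering an affix or altering a grapheme.
def change_affix(word, old_suffix, new_suffix):
    return word.removesuffix(old_suffix) + new_suffix  # requires Python 3.9+

def change_grapheme(word, old, new):
    return word.replace(old, new, 1)  # swap the first matching grapheme

print(change_affix("prettig", "tig", "aching"))  # pretaching
print(change_grapheme("timmerman", "i", "o"))    # tommerman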

2.4 Comparison and contrast of the previous investigations

Before I describe the methods used in my own investigation, it is necessary to consider why the results varied so much between the investigations presented above. As will be shown below, the differences between the results of the investigations described in section 2.3 can be explained by the methods the linguists used for sampling words for the tests, as well as by how the tests were constructed and corrected.

2.4.1 Sampling

All the investigations used different dictionaries to select words from. The size of the dictionary varied, and this may have affected the samples that were tested. The frequency of the sampled words was checked in all the investigations except the one conducted by Seashore & Eckerson, who instead let students define the words. The linguists excluded similar categories of words from their tests, however.

2.4.2 Method

The methods used in the investigations were the multiple-choice method and the Yes/No method. It was assumed that the Yes/No method would give better results regarding vocabulary size since the testees could tick off a word that they recognized without having to attach a meaning to it. The different investigations, however, showed that this was not always the case. My assumption is that this has to do with the different dictionaries that were used for sampling rather than the method used.


2.4.3 Testees

The number of participating testees varied from 20 testees in Goulden et al. (1990:356) to 488 in Beeckmans et al. (2001:242). The results of the tests are therefore hard to compare. For the different tests and the methods to be comparable, it would be better to have the same number of testees.

2.4.4 Range of vocabulary size

Average vocabulary size varies from 17,200 words in Goulden et al.’s (1990:356) investigation to 192,575 words in Seashore & Eckerson’s (1940:33) investigation. The reason for the huge difference between these tests might be that they used different criteria for what words to include or exclude from the items they tested.

3. Methods

The present project was part of a larger project, conducted by three students. The main aim of the project was to see whether it was possible to construct a reliable and valid vocabulary size test for testing Swedish learners of English in upper secondary school. After considering the test methods described in section 2.2 above, the group decided to use the Yes/No test method.

3.1 Sampling

The words to be tested were sampled from Rabén Prisma's English-Swedish/Swedish-English Dictionary (1995). This dictionary contains 85,000 words and phrases; the English-Swedish section contains 65,000 entries, from which 140 words were drawn. It was decided to exclude all proper nouns, pronouns, prepositions, abbreviations, articles and numerals if they appeared during sampling. The sampled words were checked for frequency in the General Service List of English Words (West, 1971) and were excluded if they appeared there, since such words were considered too frequent. In order to get 140 words, it was necessary to sample words from the dictionary three times.

In the first sample, the first bold-faced entry in the first column on every fourth left-hand page was selected, which gave us a total of 111 words. If a word from the categories that we had decided to exclude was drawn, the next word in the same column was chosen instead. In this sample, 25 of the words were found in the frequency book and were therefore omitted because of their high frequency. The second sample was taken in order to replace the omitted words. Another reason for taking a second sample was that we had originally decided to base the test on 100 English words but changed our minds in order to test more words. In the second sample, the first bold-faced entry in the first column on every tenth right-hand page was chosen. Again, some of the sampled words were among the most frequent words of English and therefore had to be omitted. To get the right number of items, a third sample was needed; this time, the first bold-faced entry in the second column on every tenth left-hand page was selected, and the items were again checked for frequency. The three samplings gave the group the 140 words that were used for constructing the test discussed in this paper. After a group member had typed up the test, a spelling mistake was found: she had written speedback instead of speedboat, i.e. a word that does not occur in English. The total number of real words in the test was therefore 139.
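As a rough illustration of a single sampling pass, the Python sketch below simulates the procedure of taking the first usable bold-faced entry from every n-th page. It is only a sketch under assumptions: the page layout, the word-class tags and the word lists are hypothetical stand-ins for the printed dictionary and the General Service List.

# One sampling pass: take the first usable entry from every step-th page,
# skipping excluded word classes and omitting high-frequency words.
EXCLUDED_CLASSES = {"proper noun", "pronoun", "preposition",
                    "abbreviation", "article", "numeral"}

def sample_pass(pages, high_frequency, step):
    # pages: list of pages, each a list of (word, word_class) entries
    # in column order.
    sampled = []
    for page in pages[::step]:
        for word, word_class in page:
            if word_class in EXCLUDED_CLASSES:
                continue   # excluded class: try the next entry instead
            if word not in high_frequency:
                sampled.append(word)
            break          # at most one word is taken per sampled page
    return sampled

# Toy usage with made-up data, sampling every second page.
pages = [[("aardvark", "noun")],
         [("the", "article"), ("abacus", "noun")],
         [("about", "preposition"), ("abbey", "noun")],
         [("zeal", "noun")]]
print(sample_pass(pages, high_frequency={"about"}, step=2))  # ['aardvark', 'abbey']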

In order to construct the test according to the Yes/No method, twenty pseudo-words were included. The group chose twenty different pseudo-words found in Meara & Jones (1987:27), Read (1988:22) and Finegan (1994:68) (it should be noted that Finegan did not use his words in a test but to show how the structure of the English language works). The word speedback was kept as a pseudo-word as well, since it is not a word that exists in English; the total number of pseudo-words was therefore twenty-one. The total of 160 words, 139 real words and 21 pseudo-words, was arranged alphabetically in a list. After each word there were three columns for the testees to mark. The first column was to be ticked if a testee knew the meaning of the word, the second column if a testee knew and used the word in writing or in speech, and the third column if a testee did not know the word. Both yes columns could be ticked: ticks in column 1 were expected to indicate the size of the testees' receptive vocabulary, and ticks in column 2 the size of their productive vocabulary. The testees were asked to tick the second column only if they used the item in either their written or their spoken English. As described in section 2.1, the receptive vocabulary is estimated to be larger, since it only involves word recognition.

3.2 Testees

In November 1996, my two groups of upper secondary school students took our test. The first group consisted of 23 pupils in the second year of a theoretical programme. The second group consisted of 16 pupils from KomVux who were either attending upper secondary school again because of poor grades from their first time, or who had not attended upper secondary school in their late teens for various reasons. This made a total of 39 testees. I forgot to ask the first group to indicate their gender on the test; the second group was asked to indicate their age, gender and previous education. Not everyone filled in this information. Their ages varied from 20 to 40 years, and most of those who wrote down the requested information were female.

3.3 Test procedure

The testees took the test during an English lesson, which was 50 minutes long; they were allowed to use all of that time if required. The aim of the test was explained: the testees knew that they were participating in an investigation that was supposed to measure their vocabulary size, and they were asked to take the test seriously. It was also explained that there were pseudo-words included in the test and that their test results would be adjusted if any of these were ticked.

3.4 Scoring

After the test sessions, all tests were assessed. First, all pseudo-words that the testees claimed to know the meaning of and/or use in either writing or speech were counted, and the number of marked pseudo-words was written down on the front page of the test. After that, all ticks in the first column (included in order to measure receptive vocabulary) – Yes, I know the meaning of the word – were counted, followed by the ticks in the second column, which supposedly measured the student's productive vocabulary – Yes, I know and use the word in written or spoken English. Lastly, the results of these two yes columns were written down separately on the front page.

A minor problem turned up when we assessed the tests. Sometimes a testee had ticked only the second yes column, which said that s/he used the word in writing or in speech, but not the first yes column, which said that s/he knew the meaning of the word. We decided that such answers should be counted as if the testee had ticked both yes columns, since it goes without saying that a testee who uses a word in written or spoken language is likely to know its meaning.
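As a small illustration of this rule, the following Python sketch (my own code; the two-column representation is a hypothetical stand-in for the paper answer sheets) tallies one sheet so that a tick in the use column also counts as a tick in the meaning column:

# Tally one testee's sheet. Each real word is a pair of booleans:
# (ticked "I know the meaning", ticked "I know and use the word").
def tally_sheet(sheet):
    # A "use" tick implies knowledge of the meaning, as decided above.
    receptive = sum(1 for knows, uses in sheet if knows or uses)
    productive = sum(1 for _, uses in sheet if uses)
    return receptive, productive

# Three words: known only, used only (counts as known too), unknown.
print(tally_sheet([(True, False), (False, True), (False, False)]))  # (2, 1)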


In order to estimate the receptive vocabulary from the testees' results, a formula from Meara & Jones (1987:30) was used:

P(k) = (P(h) − P(fa)) / (1 − P(fa))

P(k) signifies the probability that a testee knows a certain word. P(h) is the proportion of "hits", i.e. the testee's score on the test divided by the total number of real words in the test. P(fa) is the proportion of ticked pseudo-words, i.e. the number of ticked pseudo-words divided by the total number of pseudo-words; fa stands for "false alarm", which signifies that a testee has claimed knowledge of a pseudo-word. P(k) is multiplied by the number of words in the dictionary used for sampling, in this case 65,000, to get the estimated size of the testee's vocabulary. An example of a calculation, based on Nilsson (1997), can be found in Appendix 1. The difference is that I calculated with 21 pseudo-words instead of the 20 she used, since there were 21 pseudo-words in our test and the calculation must reflect that total.
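The adjustment is straightforward to implement. The following Python sketch (the function name and default values are my own) applies the formula with this study's figures of 139 real words, 21 pseudo-words and a 65,000-entry dictionary:

# Estimate vocabulary size from a Yes/No test, following the
# Meara & Jones (1987:30) adjustment described above.
def vocabulary_size(hits, false_alarms, real_words=139,
                    pseudo_words=21, dictionary_size=65000):
    p_h = hits / real_words             # proportion of real words ticked
    p_fa = false_alarms / pseudo_words  # proportion of pseudo-words ticked
    p_k = (p_h - p_fa) / (1 - p_fa)     # adjusted probability of knowing a word
    return p_k * dictionary_size        # scale up to the sampled dictionary

# The Appendix 1 example: 100 real words and 2 pseudo-words ticked.
print(round(vocabulary_size(100, 2)))   # 44843 (Appendix 1, which rounds
                                        # intermediate products, reports 44,928)

# Many false alarms push the estimate below zero, cf. section 4.2.1.
print(round(vocabulary_size(15, 10)))   # -45700

Note that the estimate turns negative as soon as the proportion of ticked pseudo-words exceeds the proportion of ticked real words, which is what produced the negative scores reported in section 4.2.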

4. Results

In this section the results of my investigation will be presented and compared with the surveys presented in the theoretical part of my essay. The estimated size of both the receptive and productive vocabularies will be presented together with the results for pseudo-words. The results of the third column, the No column, will not be presented since that column is not necessary in the calculation.

4.1 Receptive vocabulary

The following sections compare the two groups' vocabulary sizes based on the test I handed out for them to complete. In this section, I present the results for the size of the receptive vocabulary, which was calculated from the first yes column in the test. The two groups are presented separately: the 23 testees in Group 1 were in their second year in upper secondary school, and the 16 adult students in Group 2 took the same course at KomVux. In the group of younger students, the receptive vocabulary size ranged between 44,879 words as the highest score and 13,326 words as the lowest, and the average size of the receptive vocabulary was 25,521 words, as can be seen in Table 1 in Appendix 2. These results are similar to those of Goulden et al. (1990:356), where the scores ranged between 13,200 and 20,700 words and the average vocabulary size was 17,200 words. It has to be pointed out, however, that their investigation was based on native speakers. In Group 2, the adult group, the highest score was 47,580 words and the lowest 9,188 words, and the average vocabulary size was 27,346 words. The results for Group 2 are also presented in Table 1 in Appendix 2.

4.1.1 How the marking of pseudo-words affected the results for the receptive vocabulary

The testees had to mark every word in the test, and they ticked a surprisingly large number of words as being part of their receptive vocabulary. When a testee marked a pseudo-word, the score on the test was reduced according to the formula used by Meara & Jones (1987:30), which was presented in section 3.4. To my surprise, the adult group (Group 2) and the younger students (Group 1) generally marked almost the same number of pseudo-words. Looking at the number of pseudo-words that the testees in Group 1 ticked as words they recognised, the highest number was nine, while five of the testees did not tick any pseudo-words at all. In Group 2, the adult students, one student ticked six pseudo-words and three testees ticked none.


Figure 1. Number of pseudo-words ticked as part of the receptive vocabulary by students in Group 1 and 2.


4.2 Productive vocabulary

The results show that the size of the students' productive vocabulary was, as expected, much smaller than that of their receptive vocabulary. The highest score calculated for Group 1 was 31,200 words, which represents the size of that student's productive vocabulary; the lowest score was 911 words. The average size of the productive vocabulary in Group 1 was 7,653 words (see Appendix 3). In Group 2, the highest score was 47,580 words, whereas the lowest was 911 words, and the average size in this group was 8,362 words, as can be seen in Table 2 (see Appendix 3). Four students in Group 1 and two students in Group 2 ended up with negative scores. The reason for this is explained in section 4.2.1.

4.2.1 How the marking of pseudo-words affected the results for the productive vocabulary

The number of pseudo-words marked as part of students' productive vocabulary was counted, and the testees' total scores were adjusted downwards accordingly. If a testee had marked many pseudo-words, the total score for productive vocabulary came out negative, which was the case for some of the testees in both groups, as shown in Appendix 3. The highest number of pseudo-words marked as used in speaking and writing in Group 1 was four, and seven testees did not mark any pseudo-words at all. In Group 2, one of the testees marked an alarmingly high number, ten, whereas four testees did not mark any of the pseudo-words. The most interesting aspect of this part of the results is that the younger students generally ticked more pseudo-words as part of their vocabulary than the adult students did; I had assumed it would be the other way around. I return to this discussion in section 4.4.

Figure 2. Number of pseudo-words ticked as part of the productive vocabulary by students in Groups 1 and 2.


4.3 A comparison of students’ receptive and productive vocabulary

In section 2.1 it was said that the receptive vocabulary is somewhere between two and five times the size of the productive vocabulary. Figure 3 shows that there are tremendous differences among the students in the present investigation. Five testees actually have receptive and productive vocabularies of the same size. Four testees ticked so many pseudo-words that their results were adjusted downwards to the point of turning negative, which of course cannot reflect reality.


Figure 3. Comparison between the sizes of the receptive and productive vocabulary in Group 1.

The results for Group 2, which can be seen in Figure 4, show that one testee has receptive and productive vocabularies of the same size. Two testees had ticked so many pseudo-words that their results were adjusted downwards and became negative. On the other hand, five testees show a remarkably large productive vocabulary.



Figure 4. Comparison between the sizes of the receptive and productive vocabulary in Group 2.

In conclusion, it appears that the results for the two groups are not reliable, partly because of the high figures, which would be unlikely for learners at upper secondary level. I base this conclusion on figures from Minkova & Stockwell (2006:463), who suggest that an adult native speaker of English has 10,000-60,000 words in his/her receptive vocabulary. Compared to this, the students in my investigation would have a proficiency similar to that of adult native speakers of English.

4.4 Pseudo-words

This section will show how many times the testees ticked each pseudo-word (see Figure 5).

Figure 5. How many times each pseudo-word was ticked by students in Groups 1 and 2. The pseudo-words in the test were charp, clamate, coleritic, elbonics, felinder, glarp, harlow, hesticate, hozone, observement, pertuse, petribar, prohonsity, punge, renk, spibble, speedback, spret, tebbit, transcestuous and wepet.


The words most often ticked by the students in Group 1, the younger students, were those that looked like real English words, e.g. charp (cf. sharp), hesticate (cf. hesitate), observement (cf. observation) and punge (cf. plunge), which may support the assumption that the younger testees had been more exposed to English and were therefore more aware of the structure of English words. The adult testees in Group 2 showed more variety in their answers, and the pseudo-words they ticked looked less like real English words than those ticked by the younger students. Only one of the pseudo-words, tebbit, was not ticked by a single testee. The fact that all the other pseudo-words were ticked by at least one testee strongly indicates that the test did not manage to measure the students' vocabulary size, i.e. the results are not reliable.

4.5 Test results compared to results in previous studies

The test that I used was quite similar to Meara & Jones' test from 1987, since the test in this investigation was based on the same number of items. The number of pseudo-words differed because of a spelling mistake made when the test was typed up. However, my investigation did not include the number of testees needed to make it valid, since such an investigation needs quite a large group of informants to be representative of the population as a whole. The investigations accounted for in the theoretical background in most cases included over 100 testees, with the exception of Goulden et al.'s investigation (1990), which included only 20. The results of my investigation do not correlate with the previous results, owing to the high proportion of real words that many of my testees claimed knowledge of. My conclusion is that the test results are neither reliable nor valid compared to the earlier test results discussed in the background.


5. Discussion

Earlier tests have shown striking differences in figures for vocabulary size, as do the results of the present investigation. In this section I will discuss the reasons for my results. Given the divergent figures reported in the various studies, I think there should be a standardised test for measuring vocabulary size, the results of which could be compared with those from tests given on other occasions. Furthermore, a standardised test would have better reliability and validity.

The sampling procedures of the different researchers did not differ significantly from ours; when we did our sampling, we strove to use the same methods as in the literature we studied. This made me wonder whether the test method used could affect the results more than we believed when we chose the method for our investigation. Eyckmans (2004:165), who participated as a linguist in Beeckmans et al.'s project with Belgian students, claims that the Yes/No method needs to be revised in order to make it reliable and valid for vocabulary size testing. Furthermore, the number of testees can lead to different results, since a small group is not likely to be representative of the total population. One of the aims of my investigation was to find out whether it is possible for language teachers to construct a test for measuring vocabulary size on their own. After having studied different investigation methods and having constructed a test and tried it out on students, I do not find it possible for teachers in general to construct a reliable and valid test, since it takes more time than I had thought to get a representative sample of words in the adequate frequency band for the groups in question.

The difference between the results of my two test groups may also be caused by other factors; for example, adult students may feel more pressure from society to succeed in their studies, since many of them might have failed in their previous studies and are afraid of failing again. This can be seen in Swedish schools today as well: weak students have very low confidence in their skills and often give up before trying, even when they know things, because they are so used to failing. Many of the testees in the adult group were women, and their self-esteem may be lower than that of a girl in her late teens who is good at English. As a result, the older women may have ticked more words in their eagerness to be good students, rather than ticking only the words they were completely certain that they knew. Also, they might have felt that it was expected of them to know more words, since they were older and should have more general knowledge. It is also possible that some students in the adult test group suffered from dyslexia. Dyslexic students often encounter problems when they are supposed to read and identify words and attach meaning to written information. For them, pseudo-words are neither easier nor harder to identify than real words, since many dyslexic people have difficulty separating graphemes from each other. These students often benefit from hearing words, since many of them generally have good listening comprehension. I have had a student in my secondary school, year 9, who suffers from severe dyslexia; this year he managed to complete the listening part of the national exam in English with good results, whereas his reading ability was very poor. If he had taken the test in this investigation, he would have ticked words randomly throughout. I think many students with reading difficulties guess a lot when taking written tests.

I expected that the younger students would be more familiar with English than the adult students, since younger people come into contact with English more in various situations and also make use of it in writing, speech and listening. The younger students who took my test were all enrolled in a theoretical programme, which prepares students for further education. This might also have affected the results, since they were motivated to study, had done well at school and had good self-esteem. Figure 5 (on the pseudo-words) in section 4.4 shows that the younger testees (Group 1) were more aware of the syllable structure of English and of how English words are constructed, since they ticked the words that looked more like real words. The adult students had probably been less exposed to the English language in their childhood than the younger students.


6. Conclusion

To sum up, were the aims set up in section 1.1 realized? Whether the main aim, which was to see if it is possible to measure a person's vocabulary, was achieved is difficult to determine. The results differ a lot from those of the investigations conducted previously, and my testees' answers also varied considerably. The results are also hard to compare with the other investigations, since we used a dictionary for Swedish learners of English. In my opinion, the results are not valid, and I suspect that if a testee took the test several times, we would get different results. This suggests that it is hard to construct a test which is both reliable and valid. My conclusion is therefore that it is unlikely that a language teacher could construct a test that could be used to gauge the vocabulary level of a group of students.

If I were to write a paper in linguistics again, it would be intriguing to construct a multiple-choice test based on the same items included in our investigation and compare the results of the two tests. Would the results differ or would they be similar?


List of references

Beeckmans, Renaud, Eyckmans, June, Janssens, Vera, Dufranne, Michel & Van de Velde, Hans. 2001. Examining the Yes/No vocabulary test: Some methodological issues in theory and practice. Language Testing 18(3): 235-274.

Daller, Helmut, Milton, James & Treffers-Daller, Jeanine. 2007. Modelling and Assessing Vocabulary Knowledge. Cambridge: Cambridge University Press.

Eyckmans, June. 2004. Measuring Receptive Vocabulary Size. Utrecht: LOT.

Goulden, Robin, Nation, Paul & Read, John. 1990. How Large Can a Receptive Vocabulary Be? Applied Linguistics 11(4): 341-363.

Finegan, Edward. 1994. Language: Its Structure and Use. Fort Worth: Harcourt Brace.

Meara, Paul & Buxton, Barbara. 1987. An alternative to multiple choice vocabulary tests. Language Testing 4(2): 142-151.

Meara, Paul & Jones, Glyn. 1987. Tests of vocabulary size in English as a foreign language. Polyglot 8(1): 1-39.

Minkova, Donka & Stockwell, Robert. 2006. English words. In Aarts, Bas & McMahon, April (eds.). The Handbook of English Linguistics. 461-482. Oxford: Blackwell.

Schmitt, Norbert & McCarthy, Michael (eds.). 1997. Vocabulary: Description, Acquisition and Pedagogy. Cambridge: Cambridge University Press.

Melka, Francine. 1997. Receptive vs. productive aspects of vocabulary. In Schmitt, Norbert & McCarthy, Michael (eds.). 84-102.

Nation, Paul & Waring, Robert. 1997. Vocabulary size, text coverage and word lists. In Schmitt, Norbert & McCarthy, Michael (eds.). 6-19.

Nilsson, Monica. 1997. Is the YES/NO Checklist Method a Reliable Test Method for Measuring Vocabulary Size? Unpublished term paper. Karlstad: English Department.

Prismas Engelsk-Svenska och Svensk-Engelska Ordbok. 1995. Stockholm: Rabén Prisma.

Read, John. 1988. Measuring the vocabulary knowledge of second language learners. RELC Journal 19(3): 12-25.

Read, John. 1997. Vocabulary and testing. In Schmitt, Norbert & McCarthy, Michael (eds.). 303-320.


Seashore, Robert H. & Eckerson, Lois D. 1940. The measurement of individual differences in English vocabularies. Journal of Educational Psychology 14(38): 14-38.

West, Michael. 1971. A General Service List of English Words. London: Longman Group.

Zettersten, Arne. 1977. A Report on Experiments in English Vocabulary Testing in Denmark, Finland, Norway and Sweden. In Zettersten, Arne (ed.). Papers on English Language Testing.


Appendix 1

Examples of calculations of the receptive and productive vocabulary

The examples of the calculations in this appendix are similar to those described in a term paper by Nilsson (1997). Nilsson was one of the members of the group that constructed this test.

Receptive vocabulary size

The testee in the example below ticked 100 of the 139 real words in the first yes column (words recognized by the testee). The number of ticked real words was first divided by the total number of real words (i.e. 100/139). The testee also ticked two pseudo-words as part of his/her receptive vocabulary; the number of ticked pseudo-words was divided by 21, since there were 21 pseudo-words altogether in the test. The pseudo-word quotient was then subtracted from the real-word quotient, and the difference was divided by 1 minus the pseudo-word quotient:

(100/139 − 2/21) / (1 − 2/21) ≈ 0.6898901

The total number of real words in the test was multiplied by this quotient: 0.6898901 ∙ 139 ≈ 96.

In order to scale this up to the total number of words in the English-Swedish part of the dictionary used for sampling, which contained 65,000 entries, 65,000 was divided by 139, which gives the quotient 468, and the product 96 was multiplied by this figure:

96 ∙ 468 = 44,928

The result showed that this testee had a receptive vocabulary of 44,928 words.


Productive vocabulary size

The testee in the example below ticked 15 of the words in the second yes column. The same calculation as above was made with this new figure:

(15/139 − 2/21) / (1 − 2/21) ≈ 0.01401

0.01401 ∙ 139 ≈ 1.94739

1.94739 ∙ 468 ≈ 911

The result shows that this testee had a productive vocabulary of 911 words.
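For completeness, the two worked examples can be reproduced in a few lines of Python, following the appendix's own two-step scaling (the receptive product with 139 is rounded to 96 before multiplying by 468, which is why the figure comes out as exactly 44,928):

# Reproduce the appendix arithmetic for the two worked examples above.
p_fa = 2 / 21                                    # pseudo-word quotient

p_k_receptive = (100 / 139 - p_fa) / (1 - p_fa)  # ≈ 0.6898901
p_k_productive = (15 / 139 - p_fa) / (1 - p_fa)  # ≈ 0.01401

print(round(p_k_receptive * 139) * 468)   # 96 * 468 = 44928
print(round(p_k_productive * 139 * 468))  # 911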


Appendix 2

Table 1. The size of the receptive vocabulary

Testee   Group 1        Testee   Group 2
1        44 879         24       36 059
2        31 200         25       47 580
3        41 349         26       23 154
4        36 972         27       15 395
5        23 010         28       44 361
6        19 188         29        9 188
7        26 257         30       20 212
8        18 369         31       23 671
9        24 430         32       14 438
10       15 912         33       16 895
11       18 860         34       20 050
12       31 268         35       45 396
13       25 740         36       23 283
14       32 292         37       32 099
15       17 004         38       39 499
16       15 444         39       26 257
17       37 206
18       22 300         Average number of words: 27 346
19       22 271
20       13 326
21       15 912
22       15 912
23       37 880

Average number of words: 25 521


Appendix 3

Table 2. The size of the productive vocabulary

Group 1      Group 2
    911       23 283
 31 200       47 580
  7 818        7 119
  9 594        6 084
 10 998       40 223
 19 188       11 257
 13 326      -12 958
 18 369        6 084
-16 193        2 644
  1 716        1 661
  7 558        9 705
  7 927      -38 142
  6 084       14 438
-25 038        7 818
 -7 020        6 084
 15 444          911
 18 096
 22 300      Average number of words: 8 362
 -3 744
  3 498
  5 049
  6 084
 22 849

Average number of words: 7 653

