Assessing speaking is an extremely difficult task. This review examines what researchers have to say about speaking assessments and tests. The paper opens with an overview of the theoretical models of speaking assessment, followed by a brief review of the literature and a discussion of implications for teaching practice. The basic idea underlying the present work is that the current state of speaking assessment raises more questions than it can reasonably answer. Teachers should therefore follow a set of recommendations to ensure that the results of their language assessments are valid.
Keywords: assessing, speaking, test, model, language, skills.
Today’s scholars and practitioners in language teaching and learning generally agree that, of all the macroskills of language, speaking, particularly second-language speaking, is extremely difficult to assess. Assessing speaking is problematic for numerous theoretical and practical reasons. Tests for assessing speaking often lack theoretical grounding, and at times they do not fit the conditions and requirements of the learning process. The subjective characteristics of teachers and examiners strongly influence both the way students display their language abilities and the way those abilities are assessed. Assessing students’ speaking therefore remains one of the greatest challenges facing teachers in the classroom. The goal of this review is to consider the theoretical underpinnings of speaking assessments and tests and to examine what contemporary researchers have to say about the major problems surrounding speaking assessments and their implications for teaching.
Assessing Speaking: Theoretical Insight
Tests and models of assessing speaking represent an interesting object of analysis. According to Brown and Abeywickrama (2010), assessing speaking is a task that is not only important but also extremely challenging. These challenges stem both from the nature of the speaking activity in humans and from the methods modern practitioners use to assess speaking in children and adults. To evaluate the relevance and efficiency of various types of speaking assessments, the basic types of speaking need to be distinguished. Speaking itself is the “verbal use of language to communicate with others” (Fulcher, 2003, p. 23). Speaking can be imitative, intensive, responsive, interactive, or extensive (Brown & Abeywickrama, 2010). Imitative speaking is one’s ability to imitate someone else’s speech (Brown & Abeywickrama, 2010). Intensive speaking involves producing short oral stretches of language (Brown & Abeywickrama, 2010). Responsive speaking, in turn, takes place when there is an interaction with an interlocutor (Brown & Abeywickrama, 2010). Interactive speaking differs from responsive speaking in that it is longer and always interpersonal, whereas extensive speaking involves the delivery of monologues (Brown & Abeywickrama, 2010). These differences in speaking modes, together with the complexity of the speaking process, create considerable difficulties for the specialist assessing individual speaking abilities and skills.
One of the biggest questions is what the process of assessing speaking should involve. Brown (2004) refers to the macro- and microskills of speaking. Microskills cover the ability to produce small pieces of oral language, such as words, morphemes, and phrasal units, whereas macroskills imply that the speaker can handle larger language elements, for instance, fluency or cohesion of style (Brown & Abeywickrama, 2010). Needless to say, the criteria used to assess language microskills differ considerably from those used to evaluate individual macroskills in language and speech. For example, assessing microskills requires evaluating the ability to produce reduced words and phrases, while evaluating macroskills can involve analyzing the registers and styles used by the speaker (Brown & Abeywickrama, 2010). Which criteria to choose and how to assess micro- and macroskills is a serious problem facing education professionals and teachers. Another problem is that, although macro- and microskills are defined explicitly in theory, in reality it is never possible to isolate a single macro- or microskill during speaking (Brown & Abeywickrama, 2010). Moreover, teachers should understand that speaking can never be assessed entirely in isolation, and any assessment tasks or tests should be designed to enable the analysis of the interaction between speaking and listening, elicit the target criterion from the speaker, and produce reliable scores (Brown & Abeywickrama, 2010).
As of today, numerous models of testing have been developed to assist teachers in assessing learners’ speaking skills and capabilities. Depending on the type of speaking, various modes of assessment can be used. For imitative speaking, word repetition tasks and the PhonePass test can be utilized (Brown & Abeywickrama, 2010). Intensive speaking calls for directed response, read-aloud, picture-cued, and sentence/dialogue completion tasks (Brown & Abeywickrama, 2010). For responsive speaking, question-and-answer or elicitation-of-questions tasks can be used (Brown & Abeywickrama, 2010). Interactive speaking is usually tested with the help of interviews, role plays, discussions, and conversations (Brown & Abeywickrama, 2010). Unfortunately, there is still no single speaking assessment task that could serve as a universal answer to the major questions facing education professionals in various assessment situations. Moreover, the process of assessing speaking is associated with considerable difficulties, many of which have been described in the literature.
Assessing Speaking: Issues
As mentioned earlier, one problem with speaking assessment is that professionals cannot isolate microskills from macroskills (Brown & Abeywickrama, 2010). This, however, is not the only problem affecting speaking assessments. Generalizing the results of assessments and tests to all communicative situations can be quite problematic (Wesche, 1983). Wesche (1983) writes that the “learner’s ability to participate in a social gathering or read a newspaper” may not be related to his or her ability to make quality inquiries at the train station (p. 46). Therefore, the results of assessing one speaking skill among others may not provide sufficient information about an individual’s overall speaking skills. In situations where overall speaking abilities need to be assessed, or a holistic picture of individual speaking skills has to be created, it is always better to rely on common rule systems rather than on separate criteria and speaking routines (Wesche, 1983). Practitioners can address this problem by testing the so-called “subsidiary skills” that underlie most communicative acts, or by using context-free linguistic codes to foster greater validity of general communicative tests (Wesche, 1983).
Problems with generalizing assessment results are inseparable from the problem of establishing the right evaluation criteria for speakers (Wesche, 1983). These problems were also highlighted by Brown and Abeywickrama (2010). Developing or establishing specific criteria for speaking assessments requires attention to both global and specific communication abilities and skills (Wesche, 1983). At the same time, these assessment criteria will vary depending on the context and situation: for example, would-be interpreters will have to display grammatical and sociolinguistic competences that differ from those of students who simply study a second language (Wesche, 1983). Likewise, immigrants seeking a job abroad will require a different set of communicative strategies and skills than individuals who simply want to test their capacity to live and operate in a foreign country (Wesche, 1983). Apparently, when it comes to the assessment criteria, the type of speaking to be assessed will have to be considered (Brown & Abeywickrama, 2010). Otherwise, these complexities will prevent a full evaluation of a speaker’s skills and potential. In fact, the variability of the assessment tests themselves, not only of their criteria, presents considerable difficulties. The methods of assessment used by teachers vary greatly across schools (Akiyama, 2003). Nonetheless, the inclusion of speaking assessments in school curricula has positive impacts on students and results in better achievements and grades (Akiyama, 2003). All these issues are reflected in the literature on speaking assessments.
The current state of the literature provides enough information to evaluate the progress teachers have made in assessing speaking. Unfortunately, at present, the process of assessing speaking in schools creates more questions than researchers can reasonably answer. One of the most serious issues is that of validity: Brown (2003) claims that numerous factors affect the validity and reliability of assessment results. The interviewer’s personality and bias can become a serious barrier to achieving greater validity of assessment results (Brown, 2003). Different interviewers display different communicative skills and have different perceptions of how an ideal candidate should speak (Brown, 2003). The same candidate speaking to different interviewers will, most likely, be graded in entirely different ways. Therefore, based on what Brown (2003) writes, assessing speaking is not merely a matter of someone’s abilities or of assessment criteria, but of the way the interviewer perceives each candidate. The adequacy of interviewers should also be considered (Brown, 2003). Unfortunately, to date, the importance of interviewers’ training and its implications for the validity and reliability of assessment results have been persistently overlooked (Brown, 2003). Still, the problem of validity and reliability in assessing speaking is multifaceted.
While interviewers are criticized for a lack of competence and excessive personal bias, rated speakers also display variations in language performance. Iwashita, Brown, McNamara, and O’Hagan (2008) suggest that raters who work with the same rubrics and evaluate different speakers with different language abilities sometimes arrive at quite similar assessment results. Yet, again, the problem lies not in the speakers but in the way they are being assessed. Situations in which test scores have little relation to the quality of the language abilities displayed by the subjects are not uncommon (Iwashita et al., 2008). To a large extent, Iwashita et al. (2008) support the earlier findings of Brown (2003) in that “raters’ interpretations of test-taker performance may vary according to which facets are being assessed and how these interact” (p. 28). Added to this is the emphasis placed on speakers’ grammatical accuracy, which is further reflected in the way both speakers and raters behave (Iwashita et al., 2008). Clearly, interviewers’ personalities and perceptions of speaking assessments play a huge role in how various speaking abilities are assessed. Future researchers will have to focus on developing more appropriate test scales and analyzing interviewers’ attitudes towards different types of assessments.
Given the speaking assessment problems highlighted in the literature, one of the main questions is how such assessments should be designed. No single answer can be provided, but some recommendations to enhance the validity and reliability of assessment results can still be followed. To begin with, Wesche (1983), whose position Iwashita et al. (2008) later echoed, argues that assessing speaking should not be limited to grammar. Rather, it should cover the entire scope of communicative competences. In order to be valid, tests should be pragmatic and interactive (Wesche, 1983). They should assess language abilities in both verbal and situational contexts (Wesche, 1983). Furthermore, an ideal assessment of someone’s speaking is necessarily direct, which also means that the tasks included in the assessment need to be realistic (Wesche, 1983). An ideal assessment will help the examinee display the whole range of his or her language functions, while the examiner will have to use criterion referencing (Wesche, 1983). All these recommendations form the basis for developing more valid and reliable assessment models, although no considerable improvements have been made since Wesche (1983) wrote her work. As of today, too many problems still face language examiners (Iwashita et al., 2008). All this information also has far-reaching implications for teaching and assessment in schools.
The results of recent studies and observations have far-reaching implications for modern teaching practice. First and foremost, teachers can use a variety of assessment models and approaches to evaluate the quality of language abilities in different groups of learners. The choice of these models depends on many factors, from the type of speaking to be assessed (interactive, responsive, etc.) to the level of learning (Brown & Abeywickrama, 2010). However, at all levels of language proficiency, teachers should avoid overemphasizing grammatical accuracy and, instead, focus on analyzing a broader range of language features (Iwashita et al., 2008). Language tests and assessment scales should be developed in ways that allow assessing learners’ vocabulary knowledge, language production features, and so on (Iwashita et al., 2008).
Second, the quality of teachers’ training and their personal biases do play an important role in the way language assessments and tests are administered. Consequently, traditional approaches to training teachers and other education professionals participating in language assessments have to be abandoned (Brown, 2003). In the current state of speaking assessments, teachers’ personalities and perceptions of students’ language abilities often distort the real, objective picture of those abilities and skills (Brown, 2003). Thus, teachers should become more attentive to how they test students and how well they cope with personal bias. In the meantime, systemic changes in education will have to be made to ensure that examiners’ styles and perceptions of assessment do not become too diverse (Brown, 2003).
Third, teachers have to design language tests according to learners’ proficiency levels. The importance of proficiency levels in assessing speaking was stressed by Brown (2003), Magnan (1986), and Iwashita et al. (2008). Depending on their level of language proficiency, individuals will also differ in their vocabulary knowledge and production features (Iwashita et al., 2008). Still, even the most promising assessment tasks should be trialed (Akiyama, 2003). This is probably one of the best ways for teachers to design an appropriate and, more importantly, valid assessment model for their schools.
Finally, the results of this review provide ample food for thought for future researchers and theoreticians in language studies. Future researchers will need to focus on two main tasks: first, developing unambiguous and explicit definitions of the abilities that teachers are to measure (Brown, 2003) and, second, investigating the existing assessment tasks and their empirical results in order to develop assessment models that are universal or, at least, comparable across schools (Akiyama, 2003). Language science and practice will continue to evolve, and one of the most essential tasks is to create a reliable, valid, and flexible speaking assessment instrument that allows examiners to reduce the scope of their personal bias.
To sum up, assessing speaking is one of the most challenging tasks facing today’s teachers. Numerous theories and models of language assessment have been developed, but most problems surrounding the process of assessing speaking persist. This paper provides a wealth of information and recommendations to improve the quality, validity, reliability, and flexibility of language tests. Finally, future researchers will have to design explicit theoretical definitions of the language abilities to be measured and explore existing empirical results to develop new models of assessing speaking at various levels of language proficiency.