Welcome to "Testing: A Double-sided Analysis of Testing Tools and Techniques" (The two sides of a coin)
Prof. Juan Díaz – 2009

Content of the workshop/talk:
1. The teacher/tester personality
2. Testing activities: pros and cons, and solutions

Testing: A Double-sided Analysis of Classroom Practices and Testing Tools and Techniques at School
Juan Oscar Díaz (2009)

Abstract
Teaching and testing should be considered two phases of the same process of education, and they should be viewed as part of a cyclical process. Our educational system and educationalists, however, have encouraged and insisted on the idea that teaching and testing are two different instances in education that should be kept well apart, each bearing its own rules, conditions, expected behaviors and set of attitudes. These beliefs are so deeply rooted in our culture that even well-meaning individuals with a robust theoretical background find it hard to shun this view and stop acting accordingly. As a result of this divide between teaching and testing, two very different, clashing and sometimes schizophrenic personalities emerge: that of the teacher and that of the tester. At a closer look, this phenomenon may pose several problems and inconveniences, to say the least, to all the participants in the process of teaching and learning. This personality split is more than apparent in the wide gap between teaching and testing practices, instruments, behaviors, attitudes, emotions, and other variables. Becoming a better educationalist and teacher starts with an exploratory journey into ourselves, the language we use, and the non-judgmental observation of some of our basic human emotions. The first section of this workshop/course is aimed at exploring our own attitudes and beliefs and is meant to raise awareness of the great divide that separates our teaching practices from our testing ones. We also seek to evaluate well-known (but usually misunderstood, ill-employed or badly constructed and applied) testing tools and techniques in order to become familiar with the pluses and minuses of each of them, and to be able to construct testing/assessment instruments that are finely tuned and geared to our learners and to the learning objectives set for each particular course or learning event.

Key words: teaching vs. testing, assessment, awareness, attitudes and beliefs, classroom practices, testing tools, instruments.

Biodata
Juan Oscar Díaz is a teacher of English Language and Literature for higher education and a translator of English, and he has successfully completed the Master's Degree programme in Applied Linguistics at Facultad de Lenguas, Universidad Nacional de Córdoba. He has been a teacher of Reading Comprehension (ESP) in the engineering courses (electronics, telecommunications, aeronautics and computing) at Instituto Universitario Aeronáutico (IUA). He has also been chief instructional designer and coordinator of the Proficiency Test at the Facultad de Ingeniería. He has worked as a learning designer at Asociación de Investigaciones Tecnológicas on the design and development phases of e-learning courses. He has held the chairs of English Language 3, 4 and 9, Grammar 3 and 4, and Linguistics at Siglo 21 University and has also been in charge of mentoring undergraduates of the Licenciatura en Lengua Inglesa in their final dissertations. He has worked as a counselor and facilitator for e-learning, blended and face-to-face courses.
He has trained staff at Siglo 21 University, the Language Department of IUA, IDIPE Empresas (an in-company training institute), the German School, the National University of Villa María and the National University of Santiago del Estero, among others, in the instructional design of reading comprehension material and other related topics. He has been counselor and lecturer in the articulation project between the National University of Villa María and secondary school education (Reading Comprehension and Instructional Design) and has also lectured at several ACPI-organized seminars, courses and workshops. He has participated in national and international conferences. He has published reading comprehension teaching materials and courses in both English and Spanish. He is the author of E-Learning English: Strategic Reading and Listening for Business, a series of three ebooks for business English addressing the two receptive skills of reading and listening comprehension from a strategy-based instructional approach. At present he is working for Trainex Group, a Buenos Aires-based instructional design company working for the Royal Dutch Shell Company. At Trainex, Juan is project manager, coordinator and supervisor of blended learning developments based on e-learning formats. The job involves negotiation with clients, project management, distribution and allocation of human resources, quality control, staff training, etc. He is also working as a teacher of conversation and reading comprehension at the Engineering School of the Catholic University of Córdoba.

Meet some of our experts…
Salim Razi, Douglas Brown, Elana Shohamy, Penny Ur, Charles Alderson, John Munby, Andrew Cohen, Arthur Hughes, Tim McNamara, Barbara Dobson, Cyril Weir

Participants in the Testing Process/Event
• Boring – I see they're frightened
• Disappointed when I see the results
• I don't like preparing them or marking them
• Hate to correct tests, exams, quizzes…
• The hours I spend on that, for nothing
• I don't see that they're much use
• I'd like to tell them it isn't a matter of life or death
• I'm not quite sure what it is I'm actually assessing
• Multiple choice – Failing grades make me really angry – they don't study for ME
• I try to be innovative, but it isn't easy
• I already know who is going to pass and who isn't
• Correcting takes so much time
• How to help students to feel better
• Nobody likes them, but they are an obligation
• Tiring – Not sure how to test something
• Filling in grade sheets wears me out – Irritating
• I use and reuse material I already have because otherwise I don't have enough time
• Always have problems knowing how to be fair
• They copy from each other, it shows, and it bothers me
• I don't know how to keep them under control
• On that day I'm a real witch
• Drawing up a study plan and having too little time
• Knowing who will mark the paper and adapting what you write
• I like it when the teacher helps me understand
• I used to get very nervous and have stomach aches
• I hated fill-in-the-blanks and trick multiple-choice questions
• Trick questions
• Studying by heart… and now I tell my own students off for it!
• I felt I wasn't going to cope with so much
• I used to get really anxious
• Grades were the only thing I looked at
• How am I doing? How much do I know? What are they going to ask me? Difficulty
• Results, and whether they depended on the teachers' mood – I remember I cried several times at home
• What if I do badly?
• I felt like killing my brother…
• Anguish in oral exams
• Feeling like crying with some teachers
• Nerves, pressure, tension
• A chance to show what I had learned
• Discomfort at not knowing how I was going to be assessed
• A certain boredom
• Everything stayed at the theoretical level

The teacher personality:
• Very patient
• I love teaching
• I see myself as a tolerant teacher
• I think I'm clear and engaging
• I have a very good relationship with my students
• I like discipline, but also disorder and noise
• I see myself as patient and efficient; I know how to do my job
• I'm one to help and to answer every question
• I try to make sure they are prepared for the midterms and exams
• Loved by almost all my students
• I think I'm a good teacher

The tester personality:
• Impatient and always in a hurry
• I don't like exams
• I can't stand it when they cheat or act cocky
• Sometimes I feel I'm not fair
• They hate me at test time; I'm very demanding
• I try to get them to notice their mistakes on their own
• Sometimes I don't know what to test them on
• I don't know whether I'm fair in the way I mark…
• It bothers me that they ask me so many questions during exams
• I'm a bit authoritarian
• Sometimes I go too far and aim too high with the activities
• Very strict with time

How do we start making changes (for the better)?
[Diagram: Situation → Interpretation → Emotion → Behavior]

Reflecting on our own "testing" practices
• How often do you test your students?
• What type of instruments do you use?
• Do you test the four skills?
• What abilities do you find harder to test?
• How do you prepare your tests?
• Do you test only "book" material?
• Do you try to innovate in the type of exercises you include?
• Do you ever include a "surprise" element?
• Do you take a long time to prepare the test?
• What type of activities/exercises do you include?
• Why do you test your students?
• What type of exercises do you find harder to score?
• Do you write comments? Do you give feedback? What type? When? How?
• What do you do before the test? During the test? After the test?
• How do your students react to the tests you prepare?
• Where do you get the written and oral texts for your tests from?
• What aspects of your tests do you think need improving? What aspects are you satisfied with?

What does the TESTER do?
• Plans what to include in the test (according to objectives, range of topics covered, etc.)
• Prepares the test and the testing props
• Administers the test
• Marks and scores the test
• Hands out the tests with/without feedback

A deeply-rooted dissociation

What is a test?
"A method of measuring a person's ability, knowledge, or performance in a given domain." (H. Douglas Brown, Language Assessment, 2004)
• A test is a method
• A test must measure
• A test measures an individual's ability, knowledge, or performance
• A test measures performance (competence) – knowledge
• A test measures a given domain

Why take testing so seriously?
Tests represent a social technology deeply embedded in education, government, and business; as such they provide the mechanism for enforcing power and control. Tests are most powerful because they are often the single indicators for determining the future of individuals.

[Diagram: the TESTER produces instructions, tests, and activities]

How to test? The best method!
• Multiple-choice tests
• True and False tests
• Gap-filling tests
• Questionnaires
• Ordering tasks
• Summary writing
Gap-filling formats include:
• The cloze test
• The C-test
• The cloze-elide test
• Fill-in-the-blanks tests

Reading teachers' discomfort
Although reading teachers use a variety of techniques in their reading classes, they do not tend to use the same variety of techniques when they administer reading tests.
Any teaching activity can easily be turned into a testing item, and vice versa. No single method satisfies reading teachers: different purposes call for different methods.

Reading teachers: what they need in order to test
• To select the most appropriate testing method for their students;
• Discrete-point techniques when they intend to test one particular point at a time;
• Integrative techniques when the aim is to see the overall picture of a reader.

Multiple-choice Activities: a definition
What is the right option to complete the blank in the following sentence?
"They ______ Mary with her boyfriend in the pub last night."
1. looked
2. saw
3. turned
4. watched
A multiple-choice question consists of a stem and a number of options (usually 4), from which the testee has to select the right one.

Multiple-choice Activities
Pluses:
1. It is time-saving to score
2. It provides a very objective way of testing
3. It is a widely-researched technique
4. It provides testers with the means to control test-takers' thought processes when responding
5. It allows the tester to control the range of possible answers
6. Testees are very familiar with the technique
Minuses:
1. It tests only receptive/recognition knowledge
2. Guessing may have a considerable but unknowable effect on scores
3. It restricts what can be tested
4. It is very difficult to write successful items
5. Washback may be harmful
6. Cheating is facilitated

Multiple-choice Activities: some caution!
Being a good reader doesn't guarantee being successful in multiple-choice tests. Distracters may deliberately trick test-takers, which results in a false measurement. Test-takers do not necessarily link the stem and the answer in the same way that the tester assumes. Test-takers are provided with possibilities that they might not otherwise have thought of.

Multiple-choice Activities: difficult to construct?
It is very difficult to write successful items for a number of reasons.
• Items meant to test grammar only may also test lexical knowledge.
• Distracters may be eliminated as being absurd.
• Correct responses may be common knowledge.
• Items are answerable on the basis of memory alone and not comprehension.
• Items get encumbered with material meant to trick the careless student (e.g., the use of double negatives).
• Items are constructed with more than one correct answer (even if students have not been taught both possible answers).
• Items are constructed with clues as to which is the correct answer (differences in length or structure).
• The technique may require training in making educated guesses rather than in learning the language.

Multiple-choice Activities: using errors
…the possibility of using high-frequency errors among students (in completion exercises) to construct the distracters for grammar and vocabulary items. …the answers provided by students in a blanks exercise were somewhat different from the options chosen by professionals for the multiple-choice exercise. …all or most of the distracters are somewhat appropriate, but only one is the best answer. …suitable for developing students' problem-solving abilities: it is the students' task to reach conclusions about why one option is the best and why the others should be discarded. It may also be useful, at least from time to time, to use totally implausible distracters, just to see which students are not actively engaged in the reasoning process of finding the best answer.
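Several of the construction pitfalls listed above are mechanical enough to be screened for automatically before an item reaches the students. The sketch below is a minimal illustration only, assuming a simple home-grown item representation in Python; the class name, the checks and their thresholds are illustrative choices, not part of any published testing toolkit, and they cannot replace the tester's judgment about plausibility or content.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MCItem:
    """A multiple-choice item: a stem plus a set of options, one of which is the key."""
    stem: str
    options: List[str]
    key: int                                   # index of the intended correct option
    flags: List[str] = field(default_factory=list)

    def check(self) -> List[str]:
        """Flag a few of the mechanical pitfalls discussed above (illustrative, not exhaustive)."""
        self.flags.clear()
        banned = {"all of the above", "none of the above", "i don't know"}
        if any(opt.strip().lower() in banned for opt in self.options):
            self.flags.append("Avoid 'all/none of the above' and 'I don't know' options.")
        lengths = [len(opt) for opt in self.options]
        if lengths[self.key] == max(lengths) and lengths.count(max(lengths)) == 1:
            self.flags.append("The key is the single longest option; length may give it away.")
        stem_words = {w.strip(".,?!").lower() for w in self.stem.split()}
        for opt in self.options:
            if opt.lower() in stem_words:
                self.flags.append(f"Option '{opt}' repeats a word from the stem.")
        if {"not", "never"} & stem_words:
            self.flags.append("The stem uses a negative/absolute form; consider rewording.")
        return self.flags

# The workshop's own example item:
item = MCItem(
    stem="They ______ Mary with her boyfriend in the pub last night.",
    options=["looked", "saw", "turned", "watched"],
    key=1,  # "saw"
)
print(item.check() or "No mechanical problems flagged.")
```

Checks like these only catch surface problems; whether a distracter is genuinely plausible, or whether an item tests comprehension rather than memory, still has to be decided by the tester.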
Multiple-choice Activities
How about including errors as part of the options?
Ingram:
• incorrect forms in item elicitations are perfectly acceptable;
• teaching and testing are different: a learning situation and a discrimination situation.
Chastain:
• students make enough errors of their own;
• with some ingenuity, teachers can test common errors.

Multiple-choice Activities: the multiple-choice editing task.

Multiple-choice Activities: some useful tips!
• Only one option should be correct. Otherwise, call it multiple-response and warn students.
• There should be no question about which is the right option.
• Distracters must be plausible.
• Options should agree grammatically with the stem.
• Options should be balanced in terms of syntax, gender, number, verb tense, etc.
• Options should not repeat one another or use synonyms.
• They should be roughly the same length.
• One of the options should not help infer the right one.
• Display them at random unless a chronological or sequential order is needed.
• Options should include students' common errors as distracters.
• Do not use "all of the above," "none of the above," "I don't know," etc. (they only show the tester's inability to think of other plausible options).
• Avoid the use of negative forms (not-any, no, never, etc.) or absolute terms (never, always, totally).
• Do not repeat words in the stem and in the options.

True & False Activities
Dichotomous items (True-False technique):
1. T – F (or Yes – No)
2. T – F – Not given
3. T – F + correcting the false statements
4. T – F + How do you know?

True & False Activities
Pluses:
1. They are easy to design (echoing statements?)
2. Easy and fast to score
3. It provides a very objective way of testing
4. It is widely used – most students are familiar with the technique
5. They may tap meaning-inference skills (comprehension?)
6. Good for machine-markable activities
7. (Perceived as) less threatening than other types of testing techniques
Minuses:
1. We do not know exactly what the test measures – local comprehension, global comprehension, inference of word meaning, guessing, ability to spot lexico-grammatical patterns?
2. 50% chance of guessing correctly (T – F)
3. 33% chance of guessing correctly (T – F – Not given)
4. Chance of hitting the right answer without actual comprehension
5. High chances of cheating

Gap-filling Tasks
• Fill-in-the-blank tests: rational deletion
• Cloze tests: fixed-ratio deletion; multiple-choice cloze tests
• C-tests
• Cloze-elide tests

Cloze Test
…typically constructed by deleting from selected texts every n-th word… and simply requiring the test-taker to restore the word that has been deleted. …"n" usually varies from every 5th word to every 12th word. …according to research, in order to achieve reliable results there should be at least 50 deletions in a cloze test.
• Reading passages of 150 to 300 words
• 30-50 blanks
• Deletion of every 7th word (+/- 2)
• Integrative test approach

Cloze Test
Cloze results are good measures of overall proficiency. They tap a number of abilities (competence in language) – knowledge of:
• vocabulary
• grammatical structure
• discourse structure
• reading skills and strategies
• internalized expectancy grammar
In short: global language proficiency.

Cloze Test
Findings from the research literature differ:
• "n" is a number between 5 and 11.
• This type of test does not require extracting information by skimming.
• "n" is a number between just 5 and 7.
• Cloze tests are sensitive to constraints beyond 5 to 11 words on either side of a blank.
• Cloze tests do not assess global reading ability, but they do assess local-level reading.
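Because fixed-ratio deletion is purely mechanical, a first draft of a cloze passage can be generated automatically and then edited by hand. The following Python sketch is a minimal illustration under the guidelines above (every 7th word, 30-50 blanks); the function name, the decision to leave a lead-in sentence intact, the naive sentence splitting and the sample passage are all illustrative assumptions rather than a standard recipe.

```python
import re

def make_cloze(text: str, n: int = 7, intact_sentences: int = 1, max_blanks: int = 50):
    """Fixed-ratio cloze: delete every n-th word after a few intact lead-in sentences.

    Returns (gapped_text, answer_key). The workshop guidelines suggest passages of
    150-300 words, deletion of every 7th word (plus or minus 2) and 30-50 blanks.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lead_in = " ".join(sentences[:intact_sentences])
    body_words = " ".join(sentences[intact_sentences:]).split()

    gapped, answer_key = [], []
    for i, word in enumerate(body_words, start=1):
        if i % n == 0 and len(answer_key) < max_blanks:
            answer_key.append(word.strip(".,;:!?"))
            tail = word[len(word.rstrip(".,;:!?")):]   # keep trailing punctuation outside the blank
            gapped.append(f"({len(answer_key)}) ________{tail}")
        else:
            gapped.append(word)
    return (lead_in + " " + " ".join(gapped)).strip(), answer_key

# Illustrative passage (any 150-300 word reading text would do):
passage = ("The class had been reading short newspaper articles all week. "
           "On Friday the teacher handed out a new article and asked the students "
           "to read it in silence, underline any unfamiliar words and then compare "
           "their notes with a partner before answering the questions at the end.")
gapped_text, key = make_cloze(passage, n=7)
print(gapped_text)
print(key)
```

The output still needs a human pass: a blank may fall on a word that is impossible to recover from context, which is one reason why rational (teacher-selected) deletion is often preferred over strict fixed-ratio deletion.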
Cloze Test
Pluses:
1. It is time-saving and easy to mark and score
2. It provides a very objective way of testing
3. It is a widely-researched technique
4. It MAY provide testers with the means to control output and focus on specific language points
5. It MAY allow the tester to control the range of possible answers
6. It tests local reading
7. It activates content and formal schemata
Minuses:
1. Construction is problematic
2. It restricts what can be tested
3. We do not know exactly what the test measures
4. It is very difficult to write successful and balanced pieces geared to particular objectives
5. Scoring based on "acceptable/appropriate" answers is problematic
6. It is irritating for testees
7. High levels of exam anxiety
8. It does not assess global comprehension

The C-test
A sample C-test text:
One cool autumn evening, Bob Lang, a young professional, returned home from a trip to the supermarket to find his computer gone. Gone! All so- of cr- thoughts ra- through h- mind: H- it be- stolen? H- it be- kidnapped? H- searched h- house f- a cl- until h- noticed a sm- piece o- printout pa- stuck un- a mag- on h- refrigerator do-. His he- sank a- he re- this sim- message: "can't continue, file closed, bye."

The C-test
• An integrative testing instrument that measures overall language competence.
• It consists of four to six short, preferably authentic, texts in the target language.
• "The rule of two" is applied: the second half of every second word is deleted, beginning with the second word of the second sentence; the first and last sentences are left intact.
• If a word has an odd number of letters, the "bigger" part is omitted, e.g., proud becomes pr-.
• One-letter words, such as I, are ignored in the counting.
• The students' task is to restore the missing parts.
• In a typical C-test there are 100 gaps, that is, missing parts.
• Only entirely correct restorations are accepted.

The C-test
…the C-test is a closure test developed from the traditional cloze. Its creators, Klein-Braley and Raatz (1981), considered it a very suitable assessment instrument for measuring global linguistic competence in a foreign language.
C-tests are more reliable and valid than cloze tests as assessment instruments, but they are thought to be more irritating than cloze tests. …[they are] based upon the same theory of closure as the cloze test.

The Cloze Elide test
A sample doctored text:
A test or an examination (or "exam") is an open assessment, which often administered on this paper or on the computer, intended to measure the test-takers' or respondents' (often a good student) knowledge, skills, sport aptitudes, etc. Tests that usually are often used in education, the professional certification, counseling, psychology, the some military, and many other the fields. The measurement that is the goal of assessment testing is to called a test score, and is "a summary of the written evidence which contained in an examinee's few responses to the items....."

The Cloze Elide test
• An alternative, integrated-approach technique introduced as the "Intrusive Word Technique".
• Also known as "text retrieval", "text interruption", "doctored text", "mutilated text" and "negative cloze".
• The tester inserts words and the test-taker is asked to find the words that do not belong to the text.
• Be sure that the inserted words really do not belong to the text.
Cloze-elide tests are good, indirect measures of English language proficiency, comparing very favorably with more commonly used testing procedures.
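Assembling and scoring a cloze-elide task can likewise be mechanized once the tester has chosen intruder words that genuinely do not belong to the text. The sketch below is a minimal illustration with illustrative function names and a made-up example: one helper doctors a text by inserting the intruders at random positions, and another compares the word positions a test-taker crosses out against the positions that were actually doctored.

```python
import random

def make_cloze_elide(text: str, intruders, seed: int = 0):
    """Insert words that do not belong into `text` (the 'intrusive word' technique).

    Returns (doctored_words, intruded_positions), where positions index the doctored
    word list. Choosing intruders that really do not fit the text is the tester's
    responsibility, as noted above.
    """
    rng = random.Random(seed)
    words = text.split()
    slots = sorted(rng.sample(range(1, len(words)), k=len(intruders)))
    doctored, intruded_at = [], set()
    next_slot = 0
    for i, word in enumerate(words):
        if next_slot < len(slots) and i == slots[next_slot]:
            doctored.append(intruders[next_slot])
            intruded_at.add(len(doctored) - 1)   # position of the intruder in the doctored text
            next_slot += 1
        doctored.append(word)
    return doctored, intruded_at

def score_cloze_elide(flagged_positions, intruded_at):
    """Count hits (intruders found) and false alarms (genuine words crossed out)."""
    flagged = set(flagged_positions)
    return len(flagged & intruded_at), len(flagged - intruded_at)

text = ("Tests are often used in education, professional certification, "
        "counseling, psychology, the military and many other fields.")
doctored, answer = make_cloze_elide(text, intruders=["open", "sport", "the"])
print(" ".join(doctored))
print(score_cloze_elide(answer, answer))   # a perfect performance scores (3, 0)
```

Scoring by position rather than by word form avoids ambiguity when an intruder such as "the" also occurs legitimately elsewhere in the text.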
Questions
Questions can relate to:
• Main idea/topic
• Supporting details
• Implied information (inference)
• Textual reference
• Scanning of details
• Recognition of unstated information
• Inference of word/idiom meanings
• Grammatical features
• Pragmatic function of a chunk of text
• Cohesion devices

Questions
Three main levels or strands of comprehension, from superficial and explicit to deeper and implicit:
1. Literal comprehension
2. Interpretive or referential comprehension
3. Critical reading comprehension

Questions
The three levels in more detail:
1. Literal comprehension: comprehension at this level involves surface meanings. Teachers can ask students to find information and ideas that are explicitly stated in the text. In addition, it is also appropriate to test vocabulary.
2. Interpretive or referential comprehension: students need to be able to see relationships among ideas; re-arrange the ideas or topics discussed in the text; explain the author's purpose for writing the text; summarize the main idea when it is not explicitly stated in the text; and select conclusions that can be deduced from the text.
3. Critical reading comprehension: ideas and information are evaluated. This involves the ability to differentiate between facts and opinions, to recognize persuasive statements, and to judge the accuracy of the information given in the text.

Ordering Tasks
Through ordering tasks, test-takers are asked to put scrambled words, sentences, paragraphs or texts into the correct order.
Put the following sentences in order:
A. it was called "The Last Waltz"
B. the street was in total darkness
C. because it was one he and Richard had learnt at school
D. Peter looked outside
E. he recognized the tune
F. and it seemed deserted
G. he thought he heard someone whistling
Possible orderings: DGECABF / DBFGECA

Ordering Tasks
Pluses:
1. It is not difficult to design (?)
2. It tests cohesion-detection abilities
3. It assesses the recognition of overall text organization
4. It tests the spotting of grammatical patterns, lexical chains, etc.
Minuses:
1. It is difficult to administer (test-takers have to write out their proposed new order)
2. Scoring is very problematic (some sub-sequences may be correct even when the whole ordering is not)
3. Scoring tends to be all-or-nothing: wholly correct or wholly incorrect
4. Protocols are too complex to make the effort and time worthwhile
…testing professionals think it unfair to evaluate this type of question according to the traditional method of marking it completely right or completely wrong.

Summary-writing
• Summary tests
• Gapped summary tests

Summary-writing
Pluses:
1. It involves almost no construction work for the activity (?)
2. It tests global comprehension
3. It taps the ability to select from a hierarchy of ideas
4. It activates content and formal schemata
5. It allows for free-recall techniques
Minuses:
1. Scoring is very problematic
2. Does the test assess reading/listening comprehension or writing?
3. It is very difficult to know exactly what the test measures
4. It lends itself to very subjective measurement
5. Scoring based on selection of main ideas (?)
6. Should spelling, grammar and lexical-choice errors be considered?
7. It may allow for a copy-and-paste approach

Summary of main ideas: main ideas for whom?
• The author?
• The teacher?
• The student?
• A reader with a particular purpose in mind?
[Diagram: parallel texts – the idea units of the TEXT (IU 1 to IU 5) are matched against the STUDENT'S SUMMARY]

Is the test assessing READING COMPREHENSION or WRITING PRODUCTION? Three ways around the problem:
1. Asking the test-takers to write the summary in their first language.
2. Presenting a number of summaries and asking the testees to select the best one.
3. The gapped summary:
   1. Students read a text for a specific period of time.
   2. They put the text away.
   3. They are presented with a summary of the text with words missing.
   4. They restore the missing words.
   5. The teacher scores the summary as if it were a gap-filling test.
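Step 5 above treats the gapped summary exactly like any other gap-filling task, so scoring can be partly automated; what cannot be automated away is the decision about which alternative answers count as acceptable. The sketch below is a minimal illustration, assuming the teacher supplies an exact answer key plus an optional list of acceptable alternatives per gap; the function name and the twin exact-word/acceptable-word scores are illustrative choices, not an established scoring protocol.

```python
from typing import Dict, List, Tuple

def score_gapped_summary(responses: List[str],
                         answer_key: List[str],
                         acceptable: Dict[int, List[str]] = None) -> Tuple[int, int]:
    """Score a gapped summary (or any gap-filling task) in two ways.

    Returns (exact_word_score, acceptable_word_score). The gap between the two
    figures is exactly where the subjectivity of "acceptable/appropriate"
    scoring, noted earlier for cloze tests, creeps back in.
    """
    acceptable = acceptable or {}
    exact = accepted = 0
    for i, (given, expected) in enumerate(zip(responses, answer_key)):
        given = given.strip().lower()
        if given == expected.lower():
            exact += 1
            accepted += 1
        elif given in {alt.lower() for alt in acceptable.get(i, [])}:
            accepted += 1
    return exact, accepted

key = ["restore", "missing", "words"]
responses = ["replace", "missing", "words"]
print(score_gapped_summary(responses, key, acceptable={0: ["replace", "supply"]}))  # (2, 3)
```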
A tester's obligation…
Triangulate measurements of 2, 3 or more performances and/or contexts before drawing a conclusion.
[Diagram: the TESTER produces instructions, tests, and activities]

Questions or no questions?
"I won't answer any questions whatsoever!" vs. "Please, ask me any questions you want."
Where…? What…? How…?
Typical test-takers' questions:
• How do you answer questions 5, 6, 7 and 8? Are 1, 2, 3 and 4 correct?
• How do I mark the correct option?
• I've run out of space, do I keep writing on the back?
• Is the answer all right like this?
• Do we have more time? I'm still on number 3.
• Do we take another sheet to write the answers on?
• And do I include everything it says about…?
• Can more than one word go in the blank?
• What does … mean?
• And do you take off marks for spelling mistakes?
• And if it isn't completely right, is it worth half the marks?
• How many lines should it be?
• Do I read it first?
• And how are you going to mark this?
• How many marks is this exercise worth?
• And how do I do this exercise? We never did it in class.
• This article is full of words I don't understand, what do I do?
• Can you give an example of how the exercise is done?
• Can we answer in pencil?
• Do we put a cross in the table or write the word in the space?

A generic test template
• Introduction: objectives, topics tested, timing, tips about how to do the test, anticipation of students' FAQs.
• Test activities with clear and complete directions, reminders of time, tips about particular activities, anticipated answers to students' FAQs, marks per activity or item, etc.
• An area for students to comment on their difficulties and thought processes, express uncertainty, communicate with the teacher, express feelings, etc.
• Conclusion: motivating and relaxing remarks, information about the test results delivery date and format, the type of feedback to be expected, etc.
• The use of an AVATAR in the test/exam sheet.

So, where are the 2 sides of the coin?
• Teaching and Testing
• The Teacher and the Test
• The Teacher and the Tester
• Pluses and Minuses of activity types
• INTEGRATION

THANK YOU VERY MUCH for taking part in "Testing: A Double-sided Analysis of Testing Tools and Techniques" (The two sides of a coin)
Prof. Juan Díaz – 2009