Development and trialling of Computer-Based Cambridge English: Young Learners tests
LTF conference, 17 Nov 2013
Szilvia Papp & Agnieszka Walczak
© UCLES 2013

Overview
• Introduction
• What’s new and what’s not new in CB YLE?
• Stages of development and trialling
• Examples of validation studies
• Results of PB/CB comparability study

What’s not new in CB YLE?
• test content – identical syllabus, same wordlists, same grammar and structures list, same topics
• task types – identical across PB and CB
• number of questions and tasks – identical
• overall timing of papers – identical
• marking – identical, speaking assessed by same YLE examiners
• level of difficulty – all material calibrated to same standard
• results – shields for each skill and same YLE certificate
• purpose – to measure children’s language ability, offering candidates a positive, fun experience that has a beneficial impact on future language learning

What’s new in CB YLE?
• new style graphics
• navigation: arrows and light bulbs
• response mechanisms
• test functionality
  – adjustable sound volume
  – onscreen keyboard
  – enlargeable graphics
• onscreen timer
• simplified device-neutral rubrics

Test Format – what’s new?
• response mechanisms

What’s not new in CB YLE Speaking?
• same 1:1 ratio
• same examiner script
• same visual prompts
• same timings

What is new in CB YLE Speaking?
• speaking test delivered via computer
• uniform test experience
• animated characters to indicate speaking window
• no examiner support
• CB marking process

CB YLE development & trialling
Development started: Autumn 2011 (Papp, Dec 2011)
Stages of trialling
• Movers & Flyers
  – March-Nov 2012: China (Shanghai; Beijing, Shanghai, Guangzhou, Suzhou) (Galaczi & Miller, Apr 2012; Papp, Khabbazbashi & Miller, June 2012)
  – January 2013: Hong Kong (Papp & Miller, March 2013)
  – April 2013: Spain demo road show (Papp, June 2013)
• Starters
  – July-Aug 2013: Hong Kong (Papp & Miller, Sep 2013)
• Starters, Movers & Flyers
  – August 2013: Mexico; Sep 2013: Spain (Papp and Walczak 2013)
  – Nov 2013: Mexico, Argentina, Spain, Hong Kong (launch)

Aim of trials
To ensure comparability between CB and PB/f2f
• In CB Listening (L) and Reading & Writing (RW), this involved fine-tuning functionality and ensuring that delivery mode has no impact on marks.
• In CB Speaking, this involved investigating
  – quality of sound files
  – timing
  – support
  – marking

Timing in Speaking tests
Measuring length of responses in f2f Starters Speaking tests

Response length | Examiners (No. = 17) | Candidates (No. = 17)
Under 1 second (= average 75 ms examiners, 73 ms candidates) | 8% | 41%
1 second | 26% | 34%
2 seconds | 30% | 9%
3 seconds | 15% | 5%
4 seconds | 7% | 2%
5 seconds | 4% | 2%
6+ seconds | 10% | 7%
Total | 738 | 658

Timing in Speaking tests
Measuring length of processing time (including hesitation phenomena) in f2f Starters Speaking tests

Processing time | Response frequency | Average length | Maximum | Total
Noemi | 27 times | 1.7 seconds | 4.7 seconds | 46.4 seconds
Cheng Quing | 29 times | 1 second | 3.2 seconds | 28.9 seconds

Timing in Speaking tests
Fine-tuning length of response windows in CB Movers
(candidate thinking and talking time allowed in the March 2012 Shanghai trial, the May 2012 Beijing trial and the Jan 2013 Hong Kong trial)

Part 1: Spot the difference
  Task description: Describing 2 pictures by using short responses
  Input: 2 similar pictures, 4 questions
  March 2012 Shanghai trial: 25 seconds
  May 2012 Beijing trial: 30 seconds
  Jan 2013 Hong Kong trial: 30 seconds

Part 2: Telling a story
  Task description: Understanding the beginning of a story and then continuing it based on a series of pictures
  Input: 1 picture sequence (consisting of 4 pictures)
  March 2012 Shanghai trial: 35 seconds (approx 12 seconds per picture)
  May 2012 Beijing trial: 42 seconds (approx 14 seconds per picture)
  Jan 2013 Hong Kong trial: 45 seconds (approx 15 seconds per picture)

Part 3: Odd one out
  Task description: Suggesting a picture which is different from a group of pictures and explaining why
  Input: 3 questions about picture sets
  March 2012 Shanghai trial: 15 seconds per question (45 seconds total)
  May 2012 Beijing trial: 15 seconds per question (45 seconds total)
  Jan 2013 Hong Kong trial: 15 seconds per question (45 seconds total)

Part 4: Question and Answer
  Task description: Understanding and responding to personal questions
  Input: 4 open-ended questions about the candidate
  March 2012 Shanghai trial: 15 seconds per question (60 seconds total)
  May 2012 Beijing trial: 12 seconds per question (48 seconds total)
  Jan 2013 Hong Kong trial: 12 seconds per question (48 seconds total)

Total speaking time available
  March 2012 Shanghai trial: 2 min 45 seconds (165 seconds)
  May 2012 Beijing trial: 2 min 45 seconds (165 seconds)
  Jan 2013 Hong Kong trial: 2 min 48 seconds (168 seconds)

Support in Speaking tests
Quantifying and classifying examiner support in f2f YLE Speaking tests

Hm. Put the tomato in front of the train. Where’s the train? Where is the train? [circling] Is this the train? [pointing]
George: You put the pencil on the box. [gives him card] Where is the box? On the box. [pointing]
Trish:

Type of support (examples) | Starters (N = 12) Ave | % | Movers (N = 12) Ave | % | Flyers (N = 12) Ave | %
acknowledging candidate response (Aha. OK. Good. Right. I see.) | 18.2 | 50% | 15.6 | 35% | 17.8 | 50%
pointing to direct candidate’s attention | 5.3 | 15% | 8.7 | 19% | 2.8 | 8%
asking a YES/NO question | 4.2 | 11% | 2.8 | 6% | 2.2 | 6%
repeating question, And…? | 3.8 | 11% | 1.7 | 4% | 2.6 | 7%
back-channelling (Uhum. Yes.) | 3.3 | 9% | 10.3 | 23% | 3.3 | 9%
checking comprehension (OK? Hm? Yes?) | 0.8 | 2% | 2.5 | 6% | 4.7 | 13%
asking a Wh-question | 0.5 | 1% | 1.6 | 4% | 1.7 | 5%
supplying the answer | 0.2 | 0% | 0.5 | 1% | 0.4 | 1%
giving first half of response | 0.0 | 0% | 1.2 | 3% | 0.0 | 0%

Marking in Speaking tests

Movers | PB March 2012 Shanghai (N=25) | CB March 2012 Shanghai (N=25) | CB May 2012 Beijing (N=39) | CB July 2012 China (N=259) | CB Jan 2013 Hong Kong (N=25)
Total mark | 8.60 (.76) | 8.41 (.87) | 6.91 (1.84) | 6.21 (2.39) | 7.00 (1.63)
Reception: Listening & responding | 2.92 (.28) | 2.92 (.27) | 2.41 (.77) | 2.10 (.88) | 2.36 (.69)
Production: Appropriacy, extent and promptness | 2.72 (.46) | 2.61 (.49) | 1.95 (.73) | 1.89 (.85) | 2.08 (.72)
Production: Grammar & vocabulary | - | - | - | - | -
Production: Pronunciation | 2.96 (.20) | 2.88 (.33) | 2.37 (.77) | 2.22 (.84) | 2.34 (.69)

Flyers | PB March 2012 Shanghai (N=23) | CB March 2012 Shanghai (N=23) | CB May 2012 Beijing (N=32) | CB July 2012 China (N=263) | CB Jan 2013 Hong Kong (N=25)
Total mark | 11.13 (1.29) | 11.01 (1.24) | 9.19 (2.18) | 9.64 (2.36) | 9.52 (3.38)
Reception: Listening & responding | 2.83 (.39) | 2.93 (.25) | 2.56 (.67) | 2.60 (.70) | 2.50 (.84)
Production: Appropriacy, extent and promptness | 2.83 (.39) | 2.62 (.49) | 2.00 (.67) | 2.24 (.68) | 2.40 (.90)
Production: Grammar & vocabulary | 2.61 (.50) | 2.59 (.50) | 2.03 (.65) | 2.18 (.67) | 2.46 (.86)
Production: Pronunciation | 2.87 (.34) | 2.87 (.37) | 2.59 (.61) | 2.62 (.69) | 2.40 (.95)

PB/CB comparability study
Research design
• 129 Mexican & 219 Spanish Movers and Flyers trial candidates took a live paper-based (PB) test version & its computer-delivered format (CB) (all exam components)
• CB test taken either on PC/laptop or tablet
• 56 Starters trial candidates took CB test (all exam components) and a f2f Speaking test

Are PB & CB YLE tests comparable?
RQ1: How do PB and CB scores relate to each other?
RQ2: What explains trial candidate performance in PB and CB tests?
RQ3: If there are differences in scores in the two delivery modes, what can they be attributed to?

RQ1: Relationship between PB & CB scores for Movers & Flyers
(1) How do PB and CB scores relate to each other?

RQ2: Performance in PB & CB tests for Movers & Flyers
(2) What explains trial candidate performance in PB and CB tests?
• [M4] For Movers and Flyers separately, the effects of
  – individual background variables (age, gender) [M3] [M1&2]
  – years of English instruction
  – preference for exam mode (on computer, on paper [B], no difference)
  – frequency of computer use (every day, once or twice a week, at weekends [B])
  – reason for computer use (English homework, games, email/chat, other)
  – type of computer at home (PC/laptop [B], tablet, combination)
• For Movers and Flyers, the effects of computer device (PC/laptop, tablet) used in CB tests on candidate performance

Statistical tests show that there is a curvilinear relationship between PB/CB total scores and age for Movers and for Flyers.

MOVERS
• There is a difference in the performance of Mexican and Spanish Movers trial candidates.
• For Movers, only age matters for explaining trial candidate performance on PB and CB tests.
• Movers trial candidates who use a computer every day perform significantly better than those using a computer only at weekends, but the type of computer at home does not affect performance in the CB test.

FLYERS
• In Flyers, trial candidates in Spain performed better than trial candidates in Mexico, both in PB and CB tests.
• Flyers trial candidates with more years of English instruction and a preference for taking tests on computers perform better in the PB test and in the CB test, and boys perform better in the CB test than girls.
• Frequency of computer use does not affect Flyers trial candidates’ performance in the CB test.
• Flyers trial candidates who have a tablet at home perform significantly better than those with a PC/laptop, but the reason for computer use does not affect CB performance.

STARTERS
• For Starters, years of English instruction have a positive effect on candidate performance on the CB test.

Effect of device used (iPad vs PC) on CB performance for Movers and Flyers
• Which device (iPad or PC) trial candidates (Flyers & Movers) used in the test does not have a significant effect on their performance in the CB test.

RQ3: Difference in PB & CB scores
• Difference in PB and CB scores is not affected by frequency of computer use or type of computer at home.
• The test order does not explain the difference in trial candidates’ PB and CB scores.
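
The slides report the RQ1 and RQ3 outcomes without naming the statistics behind them. As a minimal sketch, assuming each trial candidate has one PB and one CB total score, the relationship between the two modes and the size of the mode difference could be examined with a Pearson correlation and a paired t statistic. The function names and the scores below are hypothetical illustrations and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(pb, cb):
    """Pearson correlation between paired PB and CB scores."""
    mean_pb, mean_cb = mean(pb), mean(cb)
    cov = sum((p - mean_pb) * (c - mean_cb) for p, c in zip(pb, cb))
    var_pb = sum((p - mean_pb) ** 2 for p in pb)
    var_cb = sum((c - mean_cb) ** 2 for c in cb)
    return cov / sqrt(var_pb * var_cb)


def paired_t(pb, cb):
    """Mean CB-minus-PB difference and its paired t statistic."""
    diffs = [c - p for p, c in zip(pb, cb)]
    mean_diff = mean(diffs)
    # Standard error of the mean difference: sample SD / sqrt(n).
    std_err = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, mean_diff / std_err


# Hypothetical total scores for six candidates (illustration only).
pb_scores = [9, 11, 12, 13, 14, 15]
cb_scores = [10, 11, 12, 12, 15, 15]

r = pearson_r(pb_scores, cb_scores)
mean_diff, t_stat = paired_t(pb_scores, cb_scores)
print(f"r = {r:.2f}, mean CB-PB difference = {mean_diff:.2f}, t = {t_stat:.2f}")
```

A high r together with a mean difference close to zero would be consistent with the comparability reported on the slides; the t statistic would be read against a t distribution with n - 1 degrees of freedom.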

Summary
• CB and PB provide comparable results
• Scores are affected by the same variables on PB and CB: years of English instruction, preference for computer
• Difference between PB and CB scores is not affected by individual variables that might put some children at a disadvantage
• The device on which candidates took the CB test does not have an effect on CB performance

Conclusions
For YLE candidates
• A real choice between PB and CB YLE tests, in line with Cambridge English’s ‘test for best’ principle
• CB YLE tests are an intuitive, accessible, contemporary, fun alternative way to assess children’s language ability.
For Cambridge English
• CB YLE tests provide access to learner performance data and examiner scores which allow
  – on-going research into the nature of the CB test construct and interaction
  – review and QA of test material and assessment criteria
  – refinement of data-based scales and performance descriptors.

Candidate feedback (Flyers, Hong Kong)
• I enjoyed taking the test on the computer because it was easy to use (Taylor Holly Nor Chen)
• I liked the speaking test the most. (Adam Chris Wong)
• Yes, because I can use the computer to do the test which I think it’s not bored. (Cheuk Long Ngan)

Candidate feedback (Flyers, Hong Kong)
• Speaking - It's fun/special - I can say to the computer
• I enjoyed taking the test because it was easy and fun and helped my english.
• Yes, because it is not easy and not too hard, it just right.
• Because I learnd new things.

Candidate feedback (Starters, Hong Kong)
Computer
• very funny
• I like compewter
• relax
• using computer, real human nervous
• I don't like the human
Face-to-face
• because I like to talk to men
• because I can listen the teacher voile caily / because rew people I can hear caily
• can speak more
• speak slower
Observer comment: They seemed really eager and keen in speaking to the computer - spoke freely and followed instructions well.

Candidate feedback (Starters, Hong Kong, Spain)
Computer / Face-to-face
• Yuk Ting Tina Wong, Starters trial candidate, age 8, Hong Kong
• Marta Asuncion Lacomba Gascó, Starters trial candidate, age 9, Spain
• Hiu Ching Chow, Starters trial candidate, age 5, Hong Kong
• Yuet Yiu Cheung, Starters trial candidate, age 7, Hong Kong
• Lucia Ariza Espejo, Starters trial candidate, age 7, Spain

Candidate feedback (Starters, Spain)
• Ana De Hevia Selma, age 8, Spain
• Sergio Ruiz Lozano, age 9, Spain
• Marta Asuncion Lacomba Gascón, age 9, Spain
• Elena Abadía Gonzalvo, age 9, Spain

Candidate feedback (Movers & Flyers, Mexico)
• Natalia Moreno Trejo, Flyers trial candidate, age 12, Mexico
• Jose Daniel Hurtado Bravo, Flyers trial candidate, age 12, Mexico

Selected candidate testimonials (Spain)

I enjoyed the computer exam, it was like a game - it was fun. I would tell my friends to take the exam because it's from Cambridge and they study a lot for this.
Javier Gámiz Fernández, Flyers trial candidate, age 11, Spain

I enjoyed taking the exam on the computer because you don't get as nervous and it is more fun. The best bit was the listening exercise. I would recommend it to my friends because it's a difficult exam that's fun at the same time.
Marti Ambros Viedma, Movers trial candidate, age 9, Spain

I like it - it's quicker and more fun, to tell you the truth I liked all of it, but if I had to choose one part it would be the speaking. I would recommend it to my friends, I would tell them: try it, it's fun and not boring!
Maria Palazon Guerrero, Movers trial candidate, age 11, Spain

I enjoyed taking the test on the computer - it's very fun. I would tell my friends to do the exam because it's fun, cool and entertaining.
Antonio Hidalgo Calderón, Starters trial candidate, age 8, Spain

Selected parental testimonials (Spain)

The teacher recommended that my children try the exam. They enjoyed the test because it was easier to correct yourself if you make a mistake, and it's more comfortable than the paper-based exam. I would recommend it then because the children enjoyed it, and I think it's more environmentally-friendly than on paper.
Parent of Lidia Lopez Nuñez, Starters trial candidate, age 10, Spain

Our child took the test because it seemed a good experience and you could learn how good your child is with language. She liked the listening exercises because you can hear really well with the headphones, it's easier to concentrate.
Parent of Pilar Herbella Narváez, Flyers trial candidate, age 11, Spain

My child took the test to gain more knowledge. She said it was like a game and as a mother I have seen more motivation with the computer and overall.
Parent of Lucia Arize Rubio Barreda, Starters trial candidate, age 8, Spain

[email protected]
[email protected]