A validation study of the cut scores of the VSTEP.3-5 English proficiency test

VNU ULIS doctoral dissertation validating the cut scores of the listening test that assesses English proficiency from level 3 to level 5 of the foreign language proficiency framework.

2018



Detailed table of contents

DECLARATION OF AUTHORSHIP

1. CHAPTER I: INTRODUCTION

1.1. Statement of the problem

1.2. Objectives of the study

1.3. Significance of the study

1.4. Scope of the study

1.5. Statement of research questions

1.6. Organization of the study

2. CHAPTER II: LITERATURE REVIEW

2.1. Validation in language testing

2.2. Standard setting for an English proficiency test

2.2.1. Definition of standard setting

2.2.2. Overview of standard setting methods

2.2.3. Common elements in standard setting

2.2.4. Selecting a standard-setting method

2.2.5. Choosing a standard setting panel

2.2.6. Preparing descriptions of performance-level descriptors

2.2.7. Providing feedback to panelists

2.2.8. Compiling ratings and obtaining cut scores

2.2.9. Evaluating standard setting

2.2.10. Comparisons to other standard-setting methods

2.2.11. Comparisons to other sources of information

2.2.12. Reasonableness of cut scores

2.3. Communicative language testing

2.4. Statistical analysis for a language test

2.4.1. Statistical analysis of multiple choice (MC) items

2.4.2. Investigating reliability of a language test

2.5. Review of validation studies

2.5.1. Review of validation studies on standard setting

2.5.2. Review of studies employing argument-based approach in validating language tests

2.6. Summary

3. CHAPTER III: METHODOLOGY

3.1. Context of the study

3.2. About the VSTEP

3.2.1. The development history of the VSTEP

3.2.2. The administration of the VSTEP.3-5 test in Vietnam

3.2.3. Test structure and scoring rubrics

3.2.4. The establishment of the cut scores

3.2.5. Building an interpretive argument for the VSTEP

3.3. Description of methods of the study

3.3.1. Analysis of the test tasks and test items

3.3.1.1. Analysis of test tasks
3.3.1.2. Analysis of test items

3.3.2. Analysis of test reliability

3.3.3. Validation of cut-scores

3.3.4. Description of Bookmark standard setting procedures

3.3.5. Selection of participants of the study

3.3.5.1. Test takers of early 2017 administration
3.3.5.2. Participants for Bookmark standard setting method

3.3.6. Descriptions of tools for data analysis

3.3.6.1. Text analyzing tools
3.3.6.2. Speech rate analyzing tool
3.3.6.3. Statistical analyzing tools

4. CHAPTER IV: DATA ANALYSIS

4.1. Analysis of the test tasks and test items

4.1.1. Analysis of the test tasks

4.1.2. Characteristics of the test rubric

4.1.3. Characteristics of the input

4.1.4. Relationship between the input and response

4.1.5. Analysis of the test items

4.1.6. Overall statistics of item difficulty and item discrimination

4.2. Analysis of the test reliability

4.3. Analysis of the cut-scores

4.4. External evidence

5. CHAPTER V: FINDINGS AND DISCUSSIONS

5.1. The characteristics of the test tasks and test items

5.2. The reliability of the VSTEP

5.3. The accuracy of the cut scores of the VSTEP

6. CHAPTER VI: CONCLUSION

6.1. Overview of the thesis

6.2. Contributions of the study

6.3. Limitations of the study

6.4. Implications of the study

6.5. Suggestions for further research

LIST OF THESIS-RELATED PUBLICATIONS

REFERENCES

APPENDIX 1: Structure of the VSTEP

APPENDIX 2: Summary of the directness and interactiveness between the texts and the questions of the VSTEP

APPENDIX 3: Consent form (workshops)

APPENDIX 4: Agenda for Bookmark standard-setting procedure

APPENDIX 5: Panelist recording form

APPENDIX 6: Evaluation form for standard-setting participants

APPENDIX 7: Control file for WINSTEPS

APPENDIX 8: Timeline of the VSTEP

APPENDIX 9: List of the VSTEP.3-5 developers

Excerpt from the document

VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES

NGUYỄN THỊ QUỲNH YẾN

DOCTORAL DISSERTATION

AN INVESTIGATION INTO THE CUT-SCORE VALIDITY OF THE VSTEP.3-5 LISTENING TEST
(Vietnamese title: Nghiên cứu xác trị các điểm cắt của kết quả bài thi Nghe Đánh giá năng lực tiếng Anh từ bậc 3 đến bậc 5 theo Khung năng lực Ngoại ngữ 6 bậc dành cho Việt Nam)

MAJOR: ENGLISH LANGUAGE TEACHING METHODOLOGY
CODE: 9140231.01

HANOI, 2018

This dissertation was completed at the University of Languages and International Studies, Vietnam National University, Hanoi.
This dissertation was defended on 10th May 2018.
This dissertation can be found at:
- National Library of Vietnam
- Library and Information Center, Vietnam National University, Hanoi

DECLARATION OF AUTHORSHIP

I hereby certify that the thesis I am submitting is entirely my own original work except where otherwise indicated. I am aware of the University's regulations concerning plagiarism, including those regulations concerning disciplinary actions that may result from plagiarism. Any use of the works of any other author, in any form, is properly acknowledged at their point of use.

Date of submission: _____________________________
Ph.D. Candidate's Signature: _____________________________

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Nguyễn Hòa (Supervisor)

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Fred Davidson (Co-supervisor)

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF KEY TERMS
CHAPTER I: INTRODUCTION
  Statement of the problem
  Objectives of the study
  Significance of the study
  Scope of the study
  Statement of research questions
  Organization of the study
CHAPTER II: LITERATURE REVIEW
  Validation in language testing
    The evolution of the concept of validity
    Aspects of validity
    Argument-based approach to validation
  Standard setting for an English proficiency test
    Definition of standard setting
    Overview of standard setting methods
    Common elements in standard setting
    Selecting a standard-setting method
    Choosing a standard setting panel
    Preparing descriptions of performance-level descriptors
    Providing feedback to panelists
    Compiling ratings and obtaining cut scores
    Evaluating standard setting
    Comparisons to other standard-setting methods
    Comparisons to other sources of information
    Reasonableness of cut scores
  Communicative language testing
  Statistical analysis for a language test
    Statistical analysis of multiple choice (MC) items
    Investigating reliability of a language test
  Review of validation studies
    Review of validation studies on standard setting
    Review of studies employing argument-based approach in validating language tests
  Summary
CHAPTER III: METHODOLOGY
  Context of the study
  About the VSTEP
    The development history of the VSTEP
    The administration of the VSTEP.3-5 test in Vietnam
    Test structure and scoring rubrics
    The establishment of the cut scores
    Building an interpretive argument for the VSTEP
  Description of methods of the study
    Analysis of the test tasks and test items
      Analysis of test tasks
      Analysis of test items
    Analysis of test reliability
    Validation of cut-scores
    Description of Bookmark standard setting procedures
    Selection of participants of the study
      Test takers of early 2017 administration
      Participants for Bookmark standard setting method
    Descriptions of tools for data analysis
      Text analyzing tools
      Speech rate analyzing tool
      Statistical analyzing tools
CHAPTER IV: DATA ANALYSIS
  Analysis of the test tasks and test items
    Analysis of the test tasks
    Characteristics of the test rubric
    Characteristics of the input
    Relationship between the input and response
    Analysis of the test items
    Overall statistics of item difficulty and item discrimination
  Analysis of the test reliability
  Analysis of the cut-scores
  External evidence
CHAPTER V: FINDINGS AND DISCUSSIONS
  The characteristics of the test tasks and test items
  The reliability of the VSTEP
  The accuracy of the cut scores of the VSTEP
CHAPTER VI: CONCLUSION
  Overview of the thesis
  Contributions of the study
  Limitations of the study
  Implications of the study
  Suggestions for further research
LIST OF THESIS-RELATED PUBLICATIONS
REFERENCES
APPENDIX 1: Structure of the VSTEP
APPENDIX 2: Summary of the directness and interactiveness between the texts and the questions of the VSTEP
APPENDIX 3: Consent form (workshops)
APPENDIX 4: Agenda for Bookmark standard-setting procedure
APPENDIX 5: Panelist recording form
APPENDIX 6: Evaluation form for standard-setting participants
APPENDIX 7: Control file for WINSTEPS
APPENDIX 8: Timeline of the VSTEP
APPENDIX 9: List of the VSTEP.3-5 developers

LIST OF FIGURES

Figure 2.1: Model of Toulmin's argument structure (1958, 2003)
Figure 2.2: Sources of variance in test scores (Bachman, 1990)
Figure 2.3: Overview of interpretive argument for ESL writing course placements
Figure 4.1: Item map of the VSTEP
Figure 4.2: Graph for item 2
Figure 4.3: Graph for item 3
Figure 4.4: Graph for item 6
Figure 4.5: Graph for item 13
Figure 4.6: Graph for item 14
Figure 4.7: Graph for item 15
Figure 4.8: Graph for item 19
Figure 4.9: Graph for item 20
Figure 4.10: Graph for item 28
Figure 4.11: Graph for item 34
Figure 4.12: Total score for the scored items

LIST OF TABLES

Table 2.1: Review of standard-setting methods (Hambleton & Pitoniak, 2006)
Table 2.2: Standard setting evaluation elements (Cizek & Bunch, 2007)
Table 2.3: Common steps required for standard setting (Cizek & Bunch, 2007)
Table 2.4: A framework for defining listening task characteristics (Buck, 2001)
Table 2.5: Criteria for item selection and interpretation of item difficulty index
Table 2.6: Criteria for item selection and interpretation of item discrimination index
Table 2.7: General guideline for interpreting test reliability (Bachman, 2004)
Table 2.8: Number of proficiency levels & test reliability
Table 2.9: Summary of the warrant and assumptions associated with each inference in the TOEFL interpretive argument (Chapelle et al.)
Table 3.1: Structure of the VSTEP
Table 3.2: The cut scores of the VSTEP
Table 3.3: Performance standard of Overall Listening Comprehension (CEFR: learning, teaching, assessment)
Table 3.4: Performance standard of Understanding conversation between native speakers (CEFR: learning, teaching, assessment)
Table 3.5: Performance standard of Listening as a member of a live audience (CEFR: learning, teaching, assessment)
Table 3.6: Performance standard of Listening to announcements and instructions (CEFR: learning, teaching, assessment)
Table 3.7: Performance standard of Listening to audio media and recordings (CEFR: learning, teaching, assessment)
Table 3.8: The cut scores of the VSTEP
Table 3.9: Criteria for item selection and interpretation of item difficulty index
Table 3.10: Criteria for item selection and interpretation of item discrimination index
Table 3.11: Number of proficiency levels & test reliability
Table 3.12: The venue for Angoff and Bookmark standard setting method
Table 3.13: Comparison between the Flesch-Kincaid readability analysis and the CEFR - IELTS grading systems
Table 3.14: Summary of the interpretative argument for the interpretation and use of the VSTEP.3-5 listening cut-scores
Table 4.1: General instruction of the VSTEP
Table 4.2: Instruction for Part 1
Table 4.3: Instruction for Part 2
Table 4.4: Instruction for Part 3
Table 4.5: Information provided in the specifications for the VSTEP
Table 4.6: Summary of the texts for items 1-8
Table 4.7: Description of language levels for texts of items 1-8 in the specification
Table 4.8: Summary of the texts for items 9-20
Table 4.9: Description of language levels for texts of items 9-20 in the specification
Table 4.10: Summary of the texts for items 21-35
Table 4.11: Description of language levels for texts of items 21-35 in the specification
Table 4.12: Summary of item discrimination and item difficulty
Table 4.13: Summary statistics for the flagged items
Table 4.14: Information for item 2
Table 4.15: Item statistics for item 2
Table 4.16: Option statistics for item 2
Table 4.17: Quantile plot data for item 2
Table 4.18: Information for item 3
Table 4.19: Item statistics for item 3
Table 4.20: Option statistics for item 3
Table 4.21: Quantile plot data for item 3
Table 4.22: Information for item 6
Table 4.23: Item statistics for item 6
Table 4.24: Option statistics for item 6
Table 4.25: Quantile plot data for item 6
Table 4.26: Information for item 13
Table 4.27: Item statistics for item 13
Table 4.28: Option statistics for item 13
Table 4.29: Quantile plot data for item 13
Table 4.30: Information for item 14
Table 4.31: Item statistics for item 14
Table 4.32: Option statistics for item 14
Table 4.33: Quantile plot data for item 14
Table 4.34: Information for item 15
Table 4.35: Item statistics for item 15
Table 4.36: Option statistics for item 15
Table 4.37: Quantile plot data for item 15
Table 4.38: Information for item 19
Table 4.39: Item statistics for item 19
Table 4.40: Option statistics for item 19
Table 4.41: Quantile plot data for item 19
Table 4.42: Information for item 20
Table 4.43: Item statistics for item 20
Table 4.44: Option statistics for item 20
Table 4.45: Quantile plot data for item 20
Table 4.46: Information for item 28
Table 4.47: Item statistics for item 28
Table 4.48: Option statistics for item 28
Table 4.49: Quantile plot data for item 28
Table 4.50: Information for item 34
Table 4.51: Item statistics for item 34
Table 4.52: Option statistics for item 34
Table 4.53: Quantile plot data for item 34
Table 4.54: Summary of statistics
Table 4.56: The person reliability and item reliability of the test
Table 4.57: Number of proficiency levels and test reliability
Table 4.58: The test reliability of the VSTEP
Table 4.59: Order of items in the booklet
Table 4.60: Summary of output from Round 1 of Bookmark standard-setting procedure
Table 4.62: Summary of statistics in raw score metric for Round 1
Table 4.63: Summary of output from Round 2 of Bookmark standard-setting procedure
Table 4.64: Round 3 feedback for Bookmark standard-setting procedure
Table 4.65: Summary of output from Round 3 of Bookmark standard-setting procedure
Table 4.66: The cut scores set for the VSTEP.3-5 listening test by Bookmark method
Table 4.67: The cut scores set for the VSTEP.3-5 listening test by Angoff method
Table 4.68: Comparison between the results of two standard-setting methods

LIST OF KEY TERMS

Construct: A construct refers to the knowledge, skill or ability that is being tested. In a more technical and specific sense, it refers to a hypothesized ability or mental trait which cannot necessarily be directly observed or measured, for example, listening ability. Language tests attempt to measure the different constructs which underlie language ability.

Cut score: A score that represents achievement of the criterion, the line between success and failure, mastery and non-mastery.

Descriptor: A brief description accompanying a band on a rating scale, which summarizes the degree of proficiency or type of performance expected for a test taker to achieve that particular score.

Distractor: An incorrect option in a multiple-choice item.

Expert panel: A group of target language experts or subject matter experts who provide comments about a test.

High-stakes test: Any test used to make important decisions about test takers.

Inference: A conclusion that is drawn about something based on evidence and reasoning.

Input: Material provided in a test task for the test taker to use in order to produce an appropriate response.

Interpretive argument: Statements that specify the interpretation and use of the test performances in terms of the inferences and assumptions used to get from a person's test performance to the conclusions and decisions based on the test results.

Item (also, test item): Each testing point in a test which is given a separate score or scores. Examples are: one gap in a cloze test; one multiple choice question with ...
