Abstract: Test development is the entire process of creating and using a test. The process is organized into three stages: design, operationalization, and administration. While test development is generally linear, with development progressing from one stage to the next, the process is also an iterative one, in which the decisions that are made and activities that are completed at any stage may lead us to reconsider and revise decisions and repeat activities that have been performed at another stage. Organizing test development in this way helps us monitor the usefulness of the test throughout the development process and produce a useful test.
Keywords: stages, activities, language test development
1. Introduction
Test development is the entire process of creating and using a test, beginning with its initial conceptualization and design, and culminating in one or more achieved tests and the results of their use. The amount of time and effort we put into developing language tests will, of course, vary depending upon the situation. At one extreme, with low-stakes tests, the processes might be quite informal, as might be the case if one teacher were preparing a short test to be used as one of a series of weekly quizzes to assign grades. At the other extreme, with high-stakes tests, the processes might be highly complex, perhaps involving extensive trialling and revision, as well as coordinating the efforts of a large test development team. This might be necessary if a test were to be used to make important decisions affecting a large number of people. We would again point out that although the amount of time and effort that goes into test development may vary, depending on the use for which the test is intended, the qualities of usefulness need to be carefully considered, and this consideration should not be sacrificed in either low-stakes or high-stakes situations.
We organize test development conceptually into three stages: design, operationalization, and administration. We say “conceptually” because the test development process is not strictly sequential in its implementation. In practice, although test development is generally linear, with development progressing from one stage to the next, the process is also an iterative one, in which the decisions that are made and the activities completed at one stage may lead us to reconsider and revise decisions, and repeat activities, that have been done at another stage. While there are many ways to organize the test development process, we have discovered over the years that this type of organization gives a better chance of monitoring the usefulness of the test throughout the development process and hence producing a useful test.
2. Stages and activities in language test development
Stage 1: Design
In the design stage we describe in detail the components of the test design that will enable us to ensure that performance on the test tasks will correspond as closely as possible to language use, and that the test scores will be maximally useful for their intended purposes. Design is in general a linear process, but some activities are iterative, that is, they will need to be repeated a number of times. For example, certain parts of the process, such as considering the qualities of usefulness and resource allocation and management, are recurrent and will need to be revisited throughout the process.
The product of the design stage is a design statement, which is a document that includes the following components:
1. a description of the purpose(s) of the test,
2. a description of the TLU (target language use) domain and task types,
3. a description of the test takers for whom the test is intended,
4. a definition of the construct(s) to be measured,
5. a plan for evaluating the qualities of usefulness, and
6. an inventory of required and available resources and a plan for their allocation and management.
The purpose of this document is to provide us with a principled basis for developing test tasks, a blueprint, and tests. It is important to prepare this document carefully, for this enables us to monitor the subsequent stages of development.
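As an illustration only, for developers who keep their test documentation in software, the six components might be held together as a single structured record. The following Python sketch is an invented layout, not part of the framework; all field names are placeholders.

```python
from dataclasses import dataclass

@dataclass
class DesignStatement:
    """One record holding the six components of a design statement.

    A sketch only: the framework prescribes the components, not any
    particular data layout, and these field names are invented.
    """
    purposes: list[str]           # intended uses, inferences, and decisions
    tlu_task_types: list[str]     # descriptions of target language use task types
    test_taker_profile: str       # who the test is designed for
    construct_definition: str     # theoretical definition of the ability measured
    usefulness_plan: str          # how the qualities of usefulness will be evaluated
    resource_plan: str            # required/available resources and their allocation
```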
There are six activities involved in the design stage, corresponding to the six components of the design statement, as indicated above. These are described briefly below.
Describing the purpose(s) of the test
This activity makes explicit the specific uses for which the test is intended. It involves clearly stating the specific inferences about language ability or capacity for language use we intend to make on the basis of test results, and any specific decisions which will be based upon these inferences. The resulting statement of purpose provides a basis for considering the potential impact of test use.
Identifying and describing tasks in the TLU domain
This activity makes explicit the tasks in the TLU domain to which we want our inferences about language ability to generalize, and describes TLU task types in terms of distinctive characteristics. It provides a set of detailed descriptions of the TLU task types that will be the basis for developing actual test tasks. These descriptions also provide a means for considering the potential authenticity and interactiveness of test tasks.
Describing the characteristics of the language users/test takers
This activity makes explicit the nature of the population of potential test takers for whom the test is being designed. The resulting description provides another basis for considering the potential impact of test use.
Defining the construct to be measured
This activity makes explicit the precise nature of the ability we want to measure, by defining it abstractly. The product of this activity is a theoretical definition of the construct, which provides the basis for considering and investigating the construct validity of the interpretations we make of test scores. This theoretical definition also provides a basis for the development, in the operationalization stage, of test tasks. In language testing, our theoretical construct definitions can be derived from a theory of language ability, a syllabus specification, or both.
Developing a plan for evaluating the qualities of usefulness
The plan for evaluating usefulness includes activities that are part of every stage of the test development process. Such a plan will include an initial consideration of the appropriate balance among the six qualities of usefulness, the setting of minimum acceptable levels for each, and a checklist of questions to ask about each test task we develop. Assessing usefulness during pretesting and administration will include collecting feedback. This feedback will cover a range of information, both quantitative, such as test scores and scores on individual test tasks, and qualitative, such as observers’ descriptions and verbal self-reports from students on the test-taking process. Finally, the plan will include procedures for analyzing the information we have collected, such as the descriptive analysis of test scores, estimates of reliability, and appropriate analyses of the qualitative data.
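To make the idea of minimum acceptable levels concrete, here is a minimal Python sketch. It records a floor for each of the six qualities of usefulness identified by Bachman and Palmer (1996) (reliability, construct validity, authenticity, interactiveness, impact, and practicality) and flags any quality rated below its floor. The 1-5 rating scale and the particular floors are invented assumptions, not values prescribed by the framework.

```python
# Floors for the six qualities of usefulness, on an assumed 1-5 rating
# scale; both the scale and the numbers below are illustrative only.
MIN_LEVELS = {
    "reliability": 4, "construct_validity": 4, "authenticity": 3,
    "interactiveness": 3, "impact": 3, "practicality": 4,
}

def below_floor(ratings: dict[str, int]) -> list[str]:
    """Return the qualities whose rating falls under the agreed minimum."""
    return [q for q, floor in MIN_LEVELS.items() if ratings.get(q, 0) < floor]

print(below_floor({"reliability": 5, "construct_validity": 3,
                   "authenticity": 4, "interactiveness": 3,
                   "impact": 3, "practicality": 4}))
# -> ['construct_validity']
```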
Identifying resources and developing a plan for their allocation and management
This activity makes explicit the resources (human, material, time) that will be required and that will be available for various activities during test development, and provides a plan for how to allocate and manage them throughout the development process. This activity further provides a basis for considering the potential practicality of the test, and for monitoring this throughout the test development process.
Stage 2: Operationalization
Operationalization involves developing test task specifications for the types of test tasks to be included in the test, and a blueprint that describes how test tasks will be organized to form actual tests. Operationalization also involves developing and writing the actual test tasks, writing instructions, and specifying the procedures for scoring the test. By specifying the conditions under which language use will be elicited and the method for scoring responses to these tasks, we are providing the operational definition of the construct.
Developing test tasks and a blueprint
In developing test tasks, we begin with the descriptions of the TLU task types provided in the design statement, and modify these, again taking into consideration the qualities of usefulness, to produce test task specifications. These comprise a detailed description of the relevant task characteristics, and provide the basis for writing actual test tasks. We would note that the particular task characteristics that are included and the order in which they are arranged in the test task specifications are likely to vary somewhat from one testing situation to another.
A blueprint consists of characteristics pertaining to the structure, or overall organization, of the test, along with test task specifications for each task type to be included in the test. The blueprint differs from the design statement primarily in terms of the narrowness of the focus and the amount of detail included. A design statement describes the general parameters for the design of a test, including its purpose, the TLU domain for which it is designed, the individuals who will be taking the test, what the test is intended to measure, and so forth. A blueprint, on the other hand, describes how actual test tasks are to be constructed, and how these tasks are to be arranged to form the test.
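The relationship between the two documents can be sketched in code, again purely as an illustration: a blueprint bundles the overall test structure with one task specification per task type. Every field name and value below is an invented placeholder under the same assumptions as the earlier sketch.

```python
from dataclasses import dataclass

@dataclass
class TaskSpecification:
    """Detailed characteristics of one test task type (names illustrative)."""
    task_type: str          # e.g. "reading-to-summarize"
    setting: str            # physical circumstances, participants, time
    input_format: str       # channel, length, and language of the input
    response_format: str    # selected vs. constructed response, expected length
    scoring_note: str       # how responses to this task type are scored

@dataclass
class Blueprint:
    """Overall test structure plus one specification per task type."""
    number_of_parts: int
    ordering: list[str]              # sequence of task types across the test
    time_allocation: dict[str, int]  # minutes per part
    task_specs: list[TaskSpecification]

spec = TaskSpecification(
    task_type="reading-to-summarize",
    setting="quiet classroom, 30 minutes",
    input_format="one 400-word expository text",
    response_format="constructed response, 80-100 word summary",
    scoring_note="rated for content coverage and language, 0-4 scale",
)
blueprint = Blueprint(number_of_parts=2,
                      ordering=["reading-to-summarize", "listening-mcq"],
                      time_allocation={"part 1": 30, "part 2": 20},
                      task_specs=[spec])
```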
Writing instructions
Writing instructions involves describing fully and explicitly the structure of the test, the nature of the tasks the test takers will be presented with, and how they are expected to respond. Some instructions are very general and apply to the test as a whole. Other instructions are closely linked with specific test tasks.
Specifying the scoring method
Specifying the scoring method involves two steps:
1. defining the criteria by which the quality of the test takers’ responses will be evaluated and
2. determining the procedures that will be followed to arrive at a score.
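A minimal sketch of these two steps, under invented assumptions: three weighted criteria rated on a 0-4 scale stand in for step 1, and a weighted combination of the ratings stands in for step 2. Neither the criteria nor the weights come from the framework; they are placeholders for whatever criteria and procedures a given test requires.

```python
# Step 1 (assumed): criteria and weights by which responses are judged.
CRITERIA_WEIGHTS = {"grammatical_accuracy": 0.4,
                    "task_fulfilment": 0.4,
                    "organization": 0.2}

def score_response(ratings: dict[str, float]) -> float:
    """Step 2: combine the per-criterion ratings into a single score."""
    return sum(CRITERIA_WEIGHTS[c] * r for c, r in ratings.items())

print(round(score_response({"grammatical_accuracy": 3,
                            "task_fulfilment": 4,
                            "organization": 2}), 2))  # -> 3.2
```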
Stage 3: Test administration
The test administration stage of test development involves giving the test to a group of individuals, collecting information, and analyzing this information, for two purposes:
1. assessing the usefulness of the test, and
2. making the inferences or decisions for which the test is intended.
Administration typically takes place in two phases: try-out and operational testing.
Try-out involves administering the test for the purpose of collecting information about the usefulness of the test itself, and for the improvement of the test and testing procedures. The revisions made on the basis of feedback obtained from a try-out might be fairly local, and might consist of minor editing. Or the analysis of the results of the try-out might indicate that a more global revision is required, perhaps involving returning to the design stage and rethinking some of the components in the design statement. In major testing efforts, tests or test tasks are almost always tried out before they are actually used. In classroom testing, try-outs are often omitted, although we strongly recommend giving the test to selected students or fellow teachers in advance, since this can provide the test developer with information that can be useful in improving the test and test tasks before operational test use.
Operational test use involves administering the test primarily in order to accomplish the specified use/purpose of the test, but also for collecting information about test usefulness. In all cases of test development, we administer and score the test and then analyze the results, as appropriate to the demands of the situation.
Procedures for administering tests and collecting feedback
Administering a test involves preparing the testing environment, collecting test materials, training examiners, and actually giving the test. Administrative procedures need to be developed for use in both try-out and operational test use. Collecting feedback involves obtaining qualitative and quantitative information on usefulness from test takers and test users. Feedback is collected first during try-outs and later during operational test use.
Procedures for analyzing test scores
Describing test scores: using descriptive statistics to summarize the quantitative characteristics of test scores.
Reporting test scores: using statistical procedures for determining how to report test scores most effectively both to test takers and other test users.
Item analysis: using various statistical procedures for analyzing and improving the quality of individual test tasks, or items.
Estimating reliability of test scores: using a number of statistical procedures for estimating the consistency of test scores across different specific conditions of test use.
Investigating the validity of test use: using a number of logical considerations and empirical procedures, both quantitative and qualitative, to investigate the validity of inferences made from test scores under specific conditions of test use.
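Three of these procedures can be shown in a short Python sketch for a small matrix of dichotomous item scores: descriptive statistics of the total scores, proportion-correct item difficulty, and Cronbach's alpha as one standard internal-consistency estimate of reliability. The data are invented; the formulas are the conventional ones.

```python
import statistics

scores = [  # 5 test takers x 4 dichotomous items; 1 = correct, 0 = incorrect
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
totals = [sum(row) for row in scores]  # total score per test taker
n_items = len(scores[0])

# Describing test scores: descriptive statistics of the total scores.
print("mean:", statistics.mean(totals), "sd:", round(statistics.pstdev(totals), 2))

# Item analysis: difficulty as the proportion of takers answering correctly.
difficulty = [statistics.mean(row[i] for row in scores) for i in range(n_items)]
print("difficulty:", difficulty)

# Estimating reliability: Cronbach's alpha from item and total-score variances.
item_vars = [statistics.pvariance([row[i] for row in scores]) for i in range(n_items)]
alpha = n_items / (n_items - 1) * (1 - sum(item_vars) / statistics.pvariance(totals))
print("alpha:", round(alpha, 2))
```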
Archiving
Archiving involves building up a large pool, or bank, of test tasks so as to facilitate the development of subsequent tests. Archiving can make the test more adaptable or appropriate to specific kinds of test takers. Typically, archiving procedures are designed to allow easy retrieval of tasks and of important information about each task. Archiving also facilitates the maintenance of test security. Finally, archiving procedures may be used to facilitate the selection of tasks with particular characteristics.
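The retrieval side of such a bank can be sketched as follows, under invented assumptions: each archived task carries tags (here a skill label, an empirical difficulty value, and a security flag), and tasks matching requested characteristics are pulled out when a new test form is assembled. The task ids, tags, and threshold are illustrative.

```python
# An invented task bank: each entry stores a task id plus the
# characteristics used to select tasks for a new test form.
bank = [
    {"id": "R-014", "skill": "reading",   "difficulty": 0.55, "secure": True},
    {"id": "L-102", "skill": "listening", "difficulty": 0.72, "secure": True},
    {"id": "R-031", "skill": "reading",   "difficulty": 0.80, "secure": False},
]

def select(skill: str, max_difficulty: float) -> list[str]:
    """Retrieve the ids of tasks matching the requested characteristics."""
    return [t["id"] for t in bank
            if t["skill"] == skill and t["difficulty"] <= max_difficulty]

print(select("reading", 0.6))  # -> ['R-014']
```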
3. Conclusion
The set of procedures described above is intended for developing useful language tests. Whatever the situation might be, we strongly believe that careful planning of the test development process is crucial in all language testing situations, for three reasons. First, and most importantly, we believe that careful planning provides the best means of assuring that the test will be useful for its intended purpose. Second, careful planning tends to increase accountability: the ability to say what was done and why. As teachers we must expect that test users (students, parents, and administrators) will be interested in the quality of our tests. Careful planning should make it easier to provide evidence that the test was prepared carefully and with forethought. Third, we favor careful planning because it increases the amount of satisfaction we experience. When we have a plan to do something that we value, and complete it, we feel rewarded. The more careful the plan (the more individual steps it contains), the more opportunities we create to feel rewarded. The less careful the plan, the fewer the rewards. At the extreme, with no plan at all except the completion of the test, there is only one reward: the completed test.
References:
[1] Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford, UK: Oxford University Press.