Computers were originally introduced to language testing as a means of establishing greater objectivity in the assessment process. Adaptive testing was first developed in the 1970s on the basis of Item Response Theory, which calculates the probability that a given person will answer a particular item correctly; the test then adjusts the difficulty level so that the test taker has roughly a 50-50 chance of a correct response. Traditional computer-adaptive language tests determine ability levels in reading comprehension, listening comprehension and general language proficiency by presenting items one by one, selecting each subsequent item on the basis of the responses so far, until enough items have been administered for the test to make a reliable estimate of the test taker's ability.
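The 50-50 targeting idea can be illustrated with the simplest Item Response Theory model, the one-parameter (Rasch) model. The text does not say which model a given test uses, so the choice of model and the function names `p_correct` and `best_matched_item` are assumptions made for this sketch only.

```python
import math


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (one-parameter logistic) probability of a correct response.

    Ability and difficulty are on the same logit scale; this model is an
    assumption of the sketch, not something the text specifies.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def best_matched_item(ability: float, difficulties: list[float]) -> float:
    """Return the difficulty whose success probability is closest to 0.5."""
    return min(difficulties, key=lambda b: abs(p_correct(ability, b) - 0.5))


# When difficulty equals ability, the chance of success is exactly one half.
even_odds = p_correct(0.0, 0.0)

# For a test taker at ability 1.0, the item at difficulty 0.9 is the closest match.
matched = best_matched_item(1.0, [-1.0, 0.0, 0.9, 2.5])
```

Under this model, an item is best targeted when its difficulty sits at the test taker's current ability estimate, which is why the test keeps moving the difficulty toward that point.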
The adaptive exam generally presents a more difficult question after a correct response and an easier one after an incorrect response. This is an efficient method for estimating ability because the test taker spends time responding only to items at an appropriate level of difficulty. Such tests are particularly useful when a score is required quickly for placement into one of several courses, or when proficiency must be determined under time constraints. Advances in test creation software have made it easier to develop and modify Computer-Adaptive Testing (CAT) instruments, using templates for placement, achievement and licensing purposes.
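The harder-after-correct, easier-after-incorrect rule can be sketched as a simple staircase procedure. This is a deliberate simplification: operational CATs typically re-estimate ability with maximum likelihood or Bayesian methods rather than a fixed halving step. The function name `staircase_estimate`, its parameters, and the simulated test taker are all hypothetical.

```python
from typing import Callable


def staircase_estimate(
    answers_correctly: Callable[[float], bool],
    start: float = 0.0,
    step: float = 1.0,
    n_items: int = 8,
) -> float:
    """Raise the target difficulty after a correct answer, lower it after a wrong one."""
    level = start
    for _ in range(n_items):
        if answers_correctly(level):
            level += step   # correct: next item is harder
        else:
            level -= step   # incorrect: next item is easier
        step /= 2           # take smaller moves as evidence accumulates
    return level


# Hypothetical deterministic test taker who answers correctly whenever
# the item difficulty is at or below 1.3 (real responses are probabilistic).
estimate = staircase_estimate(lambda difficulty: difficulty <= 1.3)
```

With only eight items the estimate settles close to the simulated threshold, which illustrates why adaptive tests can be short: every item after the first few is presented near the test taker's level.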
CAT development begins with clearly identifying the purpose of the assessment and taking steps to ensure that the test accurately measures an examinee's true proficiency level. To achieve this goal, the test must cover a sufficiently broad range of content areas and skill tasks to evaluate proficiency levels from low to high. A calibrated item bank large enough to administer well-targeted items across the range of candidate ability is essential for effective computer-adaptive testing. In addition, the items need to include a variety of tasks. For example, to evaluate listening comprehension in language learners, an exam creator could include items testing comprehension of the main ideas of a conversation or passage, recognition and recall of a conversation's details, identification of specific words and phrases used in a passage, and so forth. The item selection algorithm must therefore constrain the selection of items not only by difficulty level but also so that a variety of designated tasks is presented.
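One way to combine the two constraints, difficulty targeting and task variety, is to restrict each selection to the task types used least so far and then choose the closest-difficulty item among them. The sketch below is one possible balancing rule, not a standard algorithm; the `Item` class, the `select_item` function, the task labels, and the six-item bank are hypothetical illustrations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    difficulty: float
    task: str  # e.g. "main_idea", "detail", "word_phrase"


def select_item(bank, ability, used_ids, task_counts):
    """Pick the closest-difficulty unused item from the least-used task type(s)."""
    available = [it for it in bank if it.item_id not in used_ids]
    if not available:
        raise ValueError("item bank exhausted")
    # Content constraint: only consider task types presented least often so far.
    fewest = min(task_counts[it.task] for it in available)
    candidates = [it for it in available if task_counts[it.task] == fewest]
    # Difficulty constraint: among those, take the item nearest the ability estimate.
    return min(candidates, key=lambda it: abs(it.difficulty - ability))


# Hypothetical calibrated bank for a listening comprehension test.
bank = [
    Item("m1", -1.0, "main_idea"), Item("m2", 0.5, "main_idea"),
    Item("d1", 0.0, "detail"), Item("d2", 1.5, "detail"),
    Item("w1", 0.4, "word_phrase"), Item("w2", 2.0, "word_phrase"),
]

used, counts, picked = set(), Counter(), []
for _ in range(3):
    item = select_item(bank, 0.5, used, counts)  # ability held fixed for clarity
    used.add(item.item_id)
    counts[item.task] += 1
    picked.append(item.item_id)
```

In this run the three selected items each come from a different task type, even though a purely difficulty-based rule would not guarantee that. A small bank like this one also shows why a large calibrated bank matters: once the well-targeted items for a task type are used up, the algorithm has to fall back on poorly matched ones.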
In an adaptive test, it is not only the items that can be presented adaptively; the construct that the test measures can also be adapted according to performance. The efficiency and storage capacity of computer-assisted technologies can support an expanded set of test issues, but this depends on the conceptual foresight of the test developers. It is important that expectations, instructions and the test format are clearly communicated to examinees before they take the test, in order to support reliability. First-time users should be given an orientation to the hardware and the test generator software, covering operations such as scrolling and adjusting audio volume. Providing a few relevant practice questions with equivalent structure and content will promote greater test validity and reliability.