Sample report: Discussion

To summarize, the
results of this project indicate that our CRTs for the ELI reading courses are
functioning reasonably well. The two forms appear to be working about equally
well in both levels of our reading courses. Unfortunately, they are acting more
like norm-referenced tests than criterion-referenced ones. The means appear to
be fairly well centered and the scores are well dispersed around the means. In
general, from a criterion-referenced point of view, all four tests (i.e., both
forms in both courses) should be too difficult for the students when they take
the test at the beginning of the course and relatively easy at the end of the
course. If the tests were difficult for the students at the beginning, i.e., they
performed relatively poorly, that would indicate that indeed they need to study
the objectives being tested. In contrast, if the tests were relatively easy for
the students at the end of the course, that would indicate that they had learned
a fair amount of the material. Some of the pattern just described did appear in
our results in the form of gains on each and every test, but a stronger, clearer
pattern would be more compelling in terms of defending both the tests and the
curriculum they were designed to assess. In our defense, other programs do not
have this problem largely because they never address it through systematic testing
as we have. Also, the item selection processes and revisions that we have set
in motion are designed to improve this situation so that: (a) our tests will
better reflect whatever learning is going on in the courses and (b) the tests
will better help us make fair decisions about exemptions (at the beginning of
the course) and about whether students pass or fail our courses (at the end of
the course).

To those ends, all of the various item statistics are turning
out to be very useful. The NRT item statistics are telling us about how our tests
are functioning in terms with which we have long been familiar. In addition, the
NRT statistics may turn out to be useful for converting items that do not prove
useful in the criterion-referenced tests into items for our NRT placement tests.
In like manner, we expect the IRT analyses to be useful in setting up item banks
and in improving our pass/fail decisions.

Our four tests also appear to be at
least moderately reliable from the NRT perspective, especially in light of the
restrictions of range that are involved in such testing. From a criterion-referenced
point of view, the tests appear to be moderately consistent in terms of domain
score dependability (as shown by the phi coefficients). However, the dependability
of these tests, as estimated by phi(lambda), seems to depend more on which decision
is involved. The phi(lambda) estimates for the pretest exemption decisions (with
the lambda decision level at .90) are all excellent, ranging from .925 to .932.
The phi(lambda) estimates for the posttest achievement pass/fail decisions (with
the lambda decision level at .60) are lower, ranging from .686 to .892. Further
analysis of these results must be considered when we are making the actual pass/fail
decisions, now and with revised versions of the tests. Furthermore, we must use
the confidence interval statistics and obtain additional information about students
who fall close to our cut-points, especially those students who fall within
one CI above or below the cut-point. Clearly, for pass/fail decisions, the CRT
dependability approaches and the CIs are much more useful than the analogous NRT
reliability estimates reported in the same table.

We must also keep the
validity issue alive and not rest on our laurels. Yes, at the moment, we can say
with some pride that all of the test items in this project were thoroughly scrutinized
for content validity by the appropriate ELI teachers. We can also say that all
four tests in this project showed some sensitivity to instruction. We should nonetheless
learn from these experiences and revise the tests so they will be even more sensitive
to instruction. We can do this by selecting those items that had high CRT item
analysis statistics for future versions of the tests, but also by carefully examining
the curriculum to ensure that (a) objectives the students already know when they
arrive are no longer included in the curriculum (and tests) and (b) the lessons
in the reading courses are effectively addressing all of the objectives that are
being tested.
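
The discussion above notes that phi(lambda) depends on the decision level and
that students within one CI of the cut-point need a closer look. The sketch
below illustrates both ideas. It is not the procedure used in this project: it
assumes the commonly cited short-cut estimate of phi(lambda) computed from
proportion-correct scores, and the function names, the sample scores, and the
CI half-width are all hypothetical.

```python
from statistics import mean, pvariance


def phi_lambda(proportion_scores, n_items, cut_point):
    """Short-cut phi(lambda) estimate (assumed formula, not from the report).

    proportion_scores: each student's proportion of items answered correctly
    n_items: number of items on the test
    cut_point: the lambda decision level, as a proportion (e.g., .60 or .90)
    """
    m = mean(proportion_scores)
    var = pvariance(proportion_scores)  # variance with N in the denominator
    error_term = m * (1 - m) - var
    # The squared distance between the mean and the cut-point is why the
    # estimate changes with the decision level.
    decision_term = (m - cut_point) ** 2 + var
    return 1 - (1 / (n_items - 1)) * (error_term / decision_term)


def near_cut_point(proportion_scores, cut_point, ci_half_width):
    """Return the positions of students within one CI of the cut-point."""
    return [
        i
        for i, score in enumerate(proportion_scores)
        if abs(score - cut_point) <= ci_half_width
    ]


if __name__ == "__main__":
    # Hypothetical data: five students on a 30-item test.
    scores = [0.4, 0.5, 0.6, 0.7, 0.8]
    for lam in (0.60, 0.90):
        print(f"lambda={lam:.2f}  phi(lambda)={phi_lambda(scores, 30, lam):.3f}")
    # Hypothetical CI half-width of .12 around a .60 cut-point.
    print("needs more information:", near_cut_point(scores, 0.60, 0.12))
```

With scores centered near the cut-point, the estimate drops; with the cut-point
far from the mean, it rises. That is one way to read the difference between the
.90 and .60 decision levels reported above.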