Last revision: Feb. 5, 2005 (c) Georg Lind


Why must the MCT not be used for high-stakes testing but for research and program evaluation purposes only?



The MCT has been designed to answer important research questions such as "What fosters moral judgment competence?" and "How relevant is moral judgment competence for other kinds of behavior, such as cheating, helping, learning, or decision-making?" It has also been designed for evaluating programs of moral and character education (see Lind, 2002; in press).

The MCT has not been designed for, and must not be used for, selecting or sanctioning people or groups of people ("high-stakes testing"). Such use would be a clear instance of misuse.

The main reason for not allowing the MCT to be used for selection and sanctioning purposes is that the test would rapidly be devalued as a research and evaluation instrument. Using the MCT for individual diagnosis and selection would quickly prompt attempts to cheat on the test and thus invalidate it as an instrument for research and program evaluation (see Bracey, 2006; Nichols & Berliner, 2006).

Because we have carefully protected the MCT against abuse, it has been possible to preserve its integrity for more than 30 years.

The second reason is that we do not believe it is possible to measure the moral judgment competence of an individual person reliably, because its manifestation depends very much on situational factors such as fatigue, involvement, and prior experience. In research and evaluation studies with groups of people, those sources of measurement error tend to cancel each other out. According to the central limit theorem, the error variance of a sub-sample mean decreases as the size of the sub-sample increases (cf. Hays, 1963). The average C-score of a sub-sample of N = 10 or larger can be reliably interpreted as the "true" level of moral judgment competence of this sub-sample. (Note 1)
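The averaging argument can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the MCT itself: every person in a sub-sample shares the same hypothetical true score (true_score), situational factors add independent, normally distributed error (error_sd), and the function name spread_of_subsample_means is made up for this illustration. Under these assumptions the spread of sub-sample means should shrink roughly as error_sd divided by the square root of n.

```python
import random
import statistics


def spread_of_subsample_means(n, true_score=30.0, error_sd=10.0,
                              trials=2000, seed=0):
    """Empirical standard deviation of the mean score across many
    simulated sub-samples of size n (all values are hypothetical)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        # Each observed score = shared true score + independent random error.
        scores = [true_score + rng.gauss(0.0, error_sd) for _ in range(n)]
        means.append(statistics.fmean(scores))
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (1, 10, 40):
        # Theory for independent errors: error_sd / sqrt(n).
        print(n, round(spread_of_subsample_means(n), 2))
```

With error_sd = 10, theory predicts a spread of about 10 for a single person, about 3.2 for N = 10, and about 1.6 for N = 40. The simulated values should fall close to these figures, which is why a score that is unreliable for one person can still be informative as a group average.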

While the measurement error within each sub-sample should be as small as possible, the study design should ensure that the variance of the C-score in the total sample is as large as possible, so that correlations can be detected if they exist.
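The point about total-sample variance can also be sketched in code. The example below assumes a hypothetical linear relation between the C-score and some outcome (slope), plus random noise (noise_sd). None of these numbers come from MCT studies, and the function names are invented for the illustration. It compares the correlation observed in a sample with a wide spread of C-scores against one with a narrow spread.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def observed_correlation(c_score_sd, n=500, slope=0.5, noise_sd=10.0, seed=1):
    """Correlation between simulated C-scores and a simulated outcome.
    The underlying relation (slope) is identical for every c_score_sd;
    only the spread of C-scores in the sample changes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    c_scores = [rng.gauss(30.0, c_score_sd) for _ in range(n)]
    outcomes = [slope * c + rng.gauss(0.0, noise_sd) for c in c_scores]
    return pearson_r(c_scores, outcomes)


if __name__ == "__main__":
    print("wide spread  :", round(observed_correlation(20.0), 2))
    print("narrow spread:", round(observed_correlation(5.0), 2))
```

Although the assumed relation is the same in both runs, the narrow-spread sample yields a clearly weaker correlation (in theory about 0.24 versus about 0.71 for these settings). A sample that is too homogeneous in its C-scores can therefore hide a relation that actually exists.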


1 Generally, statistical reasoning should be used with great caution. As Hays (1963) notes, "it is a sad fact that if one knows nothing about the probability of occurrences for particular samples of units for observation, we can use very little of the machinery of statistical inference. This is why the assumption of random sampling is not to be taken lightly. ... Inferential methods [of statistics] apply to random samples, and there is no guarantee of their validity in other circumstances" (p. 217).
This precaution is especially important when talking about the "significance" of a finding. Usually, this word is used to signify statistical significance only, which has no direct relation to the psychological or educational significance in which one is actually interested (see Carver, 1993; Thompson, 1996).


Bracey, G.W. (2006). Reading Educational Research: How to Avoid Getting Statistically Snookered. Heinemann.

Carver, R.P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61(4), 287-292.

Hays, W. (1963). Statistics for psychologists. New York: Holt, Rinehart & Winston.

Nichols, S.L. & Berliner, D. (2006). Collateral damage: How high-stakes testing corrupts schools. Cambridge, MA: Harvard Education Press.

Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: three suggested reforms. Educational Researcher, 25(2), 26-30.