References on, and supporting, the MJT
The first article on the MJT was published in 1978: Lind (1978)
The most recent publication is Lind (2008)
A simplified version of the MJT for eight-year-olds and upward is now available in German ... more.
Prof. Lawrence Kohlberg, Ph.D., Center for Moral Education, Harvard University:
"[...] The methodology of Lind and his colleagues gets preference scores and content (pro and con), as well as a stage [structure]. Since preference is determined by both content and structure, a scoring algorithm can be arrived at for assigning a pure stage structure score for an individual. Some subjects are more consistent in preferring stage structure than content, a factor considered in the tests of Lind and his colleagues ... I believe this to be a highly promising approach." (in: Lind et al., 2010, p. xvii).
Comment: While for Kohlberg 'stage' and 'structure' were synonyms, Lind adheres only to the notion of structure but has given up the notion of stages.
Prof. Peter H. Rossi, Ph.D., University of Massachusetts, Amherst, Social and Demographic Research Institute:
"I was delighted to ... learn of your EQ method [he refers to my method of "Experimental Questionnaires" on which the MJT is based, GL]. Your method has a lot more theory behind it than we have put behind the idea of the Factorial Survey and, with your permission, I would borrow some of your ideas." (personal communication).
Prof. Michael Gross, Ph.D., Department of International Relations, University of Haifa, Israel:
"The [MJT] produces two sets of scores in an effort to distinguish between the affective and cognitive aspects of moral judgment, that is, between the moral preferences which one has and the ability to use them consistently. In this way the MJT offers a significant improvement over the single score interview technique which conflates these two elements." (p. 248) more
Prof. Dr. Manfred Schmitt, University of Trier, Department of Psychology:
"The advantages of an experimental questionnaire (...) make the MJT attractive and, in my opinion, superior to the DIT." (p. 12; my translation, GL) more
Prof. Dr. Horst Heidrink, University of Hagen, Department of Psychology:
"The large size of the Vst [validity coefficient] in both studies can be interpreted as a clear support for the MJT, and also for the validity of Kohlberg's theory." (p. 91; my translation) more
For these and other references using and supporting the MJT, see more
When you plan to use the MJT ... more
Frequently Asked Questions ... more
Abuse of the MJT ... more
List of certified versions of the MJT ... more
Validation and certification procedure ... more
Scoring and interpreting the MJT ... more
References: Studies and reviews on Lind's Dual-Aspect Theory and the MJT ... more
Meaning and measurement of Moral Judgment Competence (Lind, 2008) ... facsimile (password = kohlberg)
Constructing new dilemmas for the MJT ... more
MJT online (ask the author for details)
Cultural fairness of the MJT (Lind, 1995) ...
Cross-cultural validity of the MJT (Lind, 2003) ...
False news on the MJT ... more
When you plan to use the MJT
Before you use the MJT, you should have some basic understanding of moral psychology and should make yourself acquainted with the theory behind the MJT. Otherwise you risk misinterpreting your findings ... more
If you have specific questions, you can consult the section "Frequently Asked Questions" below for a quick answer.
Before you start planning your research or self-evaluation study, you may be interested in reading my advice, which is based on over 30 years of research and evaluation in the field of moral psychology and education, especially with the MJT:
- It is important to be aware of biasing factors in MJT research in order to draw correct conclusions from your findings. Below I will discuss factors which can bias C-scores upward and factors which can bias them downward. The question whether these factors should be considered "measurement error" or substantial influences which need to be discussed cannot be answered once and for all. You, the researcher, must decide what the best interpretation is, and must defend it with good reasons. Whatever you decide: always keep your analysis fully transparent for the reader!
- The MJT is a competence test. Like all tests of competence, ability, proficiency, etc., the MJT cannot be faked upward (Wasel, 1994; Lind, 2002), but a person's moral competence score (C-score) can fall below his or her real competence because of score-depressing circumstances. Many of the circumstances causing such measurement biases are listed below.
- The strongest biasing factors are fear and anxiety, which can depress the C-score. Therefore, do not make the MJT look like a high-stakes school test, which produces fear and anxiety (unless you have chosen to study these factors more closely). Fear can be created by an instruction to give the "right" answers, or by implicit signals such as placing the MJT after a high-stakes mental test, having an authority feared by the participants administer the test or be named as director of the research, or having the participants put their names on the questionnaire. The MJT must only be used anonymously!
- The MJT contains a difficult task for most participants! Only when participants are confronted with a really difficult task can we observe and measure their competence. A test without a difficult task can never let us measure competence. Hence, it is quite natural that some participants complain about the difficulty of the MJT.
- Instruction for participants:
- Make the instructions as short as possible, and as long as necessary. Avoid words which the participants do not understand or may misunderstand; "moral", for example, is such a potential source of confusion. Say instead: "This questionnaire contains two stories in which people have to solve a conflict. What do you say about their solution, and about the arguments that people have given on these stories?"
- If you want your participants to fill out the MJT more than once (e.g., if you want to measure the effect size of an intervention), you must make them aware of this; otherwise they might feel irritated and get lower C-scores: "This questionnaire again contains the two stories in which people have to solve a conflict. We are interested to see how your answers have changed. What do you say about their solution, and about the arguments that people have given on these stories?"
- Administering the MJT: If you cannot control the conditions in which the MJT is administered, you should at least make sure that they are the same for all your participants, and that you know how the test was administered, so you can document this in your research report. Only in this way can you be sure that the C-scores reflect differences in the participants' moral competence and not differences in the conditions of test-taking. Good documentation of the test-taking situation helps to make valid inferences when comparing MJT findings across studies with different test-taking conditions.
- Never delete data, at least not before you have documented and analyzed them! It can happen that some participants do not fill out the MJT completely or show patterns of responses which appear invalid to you. Deleting these data must be considered a breach of scientific standards, and also a waste: some of the incomplete data can be used for analysis. If no more than two answers are left out, you can substitute them by the individual mean value (please count these cases and include this count in your research report) and include them in your analysis. If you have many such cases, you should run some analyses with and without them to see how this changes your central findings. Cases with more than two missing answers in the standard MJT should not be included in your analysis, but you should inform your readers about this in your research report.
- Again: Never delete cases whose response patterns appear invalid to you! By throwing away such data you create a bias, because you are likely to throw away data which indicate low moral competence. Deleting these data artificially increases the mean C-score.
- The MJT has been very thoroughly validated over a period of 30 years (Lind, 2006). Yet it is far from perfect. In some circumstances it may not function as it should. Then you should insist on a critical discussion and revision of the test. However, be sure that unexpected results are really artifacts rather than some new phenomenon which should be studied in its own right. "Segmentation" is such a phenomenon, through which we became aware of the depressing impact of various kinds of authority and fear on moral judgment competence. Instead of changing the MJT to get rid of segmentation, we decided to leave the MJT unchanged in order to measure segmentation. Possibly, for the observation of certain types of authority and fear, we need to develop new dilemmas.
- When you analyze your data, remember that mean scores are sufficiently reliable only when they are based on the data of 15 or more individuals. This holds if you are interested only in substantial differences of 5 points or more. If you are interested in smaller differences, you should increase the number of individuals used for calculating the mean C-scores. The exact determination of these numbers must be left to future publications; for now it suffices to say: take more individuals if you want to be on the safe side.
- Follow the conventions for the graphical display of findings as closely as possible, both to allow the reader to make quick comparisons between similar studies and their findings, and to prevent false impressions even when the data are correctly depicted: if the Y-axis is stretched too much, even the smallest differences appear meaningful. Here are the most important conventions:
- C-score on Y-axis: If the (mean) C-score is shown as the dependent variable on the Y-axis of a graph, the Y-axis should range from 0 to 40 (of course, if higher scores are to be reported, the Y-axis should be extended). Most statistical programs allow you to set this range manually.
- Mean acceptance on Y-axis: If the (mean) acceptance scores for the six moral orientations are given, the Y-axis should range at least from -2 to +2, or better from -4 to +4. If sum scores are used, the Y-axis should range from -8 to +8.
- Digits: In graphs, numbers should always be shown with only one digit after the decimal point (or comma). More digits feign a higher accuracy than is available, and they blur the picture.
- Effect size reporting: The concept of "statistical significance" is often misused and not very informative about the real significance of differences and correlations. Its main drawback is that its value depends strongly a) on sample size and b) on the variance of the measures in the sample. Both can vary greatly between studies, making them incomparable, and both can be influenced by the researcher. Unfortunately, many journals and reviewers still ask for it, so you had better report "statistical significance." But you should also report (and discuss!) relative and absolute effect sizes. Relative effect sizes [rES] like "r" and "d" are now also required by scientific associations like the APA and AERA. Good statistics textbooks show how to convert significance statistics into r and d (I prefer r, but reporting both seems a good policy). Because rES still depends on the variance of the data, it is also not optimal. Better is the absolute effect size [aES], which depends only on the absolute differences between means, i.e., the means of pretest and posttest measurements for the (experimental) intervention group and for comparison groups. For more information and formulas for calculating aES, see Lind (2010).
- If you also have suggestions for improving MJT research, please send me a note.
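The conversions mentioned above can be sketched in a few lines. Note that the formulas below (r from a t statistic, and Cohen's d for two equal-sized groups) are the standard textbook conversions, not Lind's (2010) aES formula; here the absolute effect is simply taken as the raw posttest-pretest difference in C-score points, and all numbers in the example are invented:

```python
import math

def r_from_t(t, df):
    """Relative effect size r computed from a t statistic."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

def d_from_t(t, df):
    """Cohen's d from a t statistic (two equal-sized groups)."""
    return 2 * t / math.sqrt(df)

def absolute_effect(pre_mean, post_mean):
    """Raw posttest-pretest difference in C-score points (a simplification)."""
    return post_mean - pre_mean

# Hypothetical example: t = 2.5 with df = 48; mean C rises from 18.2 to 24.7
print(round(r_from_t(2.5, 48), 3))
print(round(d_from_t(2.5, 48), 3))
print(round(absolute_effect(18.2, 24.7), 1))
```

Reporting all three numbers side by side lets the reader judge both the relative and the absolute size of a difference, independent of the sample's variance.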
Frequently Asked Questions
What does the MJT measure? ... more
For what age groups can the MJT be used? ... more
"Can that be?" List for checking possible sources of error ... more
What is the psychological and methodological background of the MJT? ... more
Is the MJT a 'preference' test? ... more
Is the MJT similar to the DIT? ... more
Can I calculate a Stage score from the MJT data? ... more
Can one apply the rationale behind the C-score to any test? ... more
Applying the MJT to participants with little or low education ... more
Where can I get a copy of the MJT and different language versions? ... more
Can the MJT be Used for High-Stakes Testing and Diagnostics? ... more
Can we interpret an individual's MJT data? ... more
What is the standard version and standard administration & instruction of the MJT? ... more
Can I construct a new dilemma myself? ... more
How do I have to prepare the raw data to get them scored? ... more
How can we reduce test-taking fatigue in follow-up studies? ... more
How can one protect privacy in follow-up studies? ... more
Why do you call the MJT a "N=1 experiment" or Experimental Questionnaire? ... more
How can I check my scoring for errors? ... more [My scoring error corrected ... more]
Missing data: What if a participant has not filled out all 26 questions of the standard MJT? ... more
Is the MJT valid? ... more
Is the MJT reliable? ... more
More FAQs ... more
What does the MJT measure?
The MJT measures two aspects of judgment behavior: a) moral judgment competence as defined by Kohlberg (1964; see also Lind, 2006; 2008), and b) moral orientations or moral preferences as defined by Kohlberg's Stages of Moral Orientation. In contrast to Kohlberg's Moral Judgment Interview, the MJT measures both basic aspects, the cognitive and the affective, simultaneously but independently, and thus does not yield a mixed Stage score.
The MJT is the only test which provides measures for these two aspects. While there are many tests of moral preferences or attitudes, it is one of the few, if not the only, measurement instrument which contains a real moral task for the participant. The task is to listen to and evaluate moral arguments about a moral dilemma, especially arguments which oppose his or her stance on the dilemma.
For nearly all participants this is a difficult, if not very difficult, task. Only very few respondents get a maximum score of one hundred; even most university graduates score below 45.
What is the psychological and methodological background of the MJT?
The MJT rests on Lind's Dual-Aspect Theory of moral judgment behavior (see Lind, 2002; 2008), which borrows one of its two central psychological concepts -- the concept of cognition and affect as two inseparable but distinguishable aspects (rather than two separable components or substances) -- from Spinoza, Piaget, and Kohlberg (though Kohlberg's writing seems to fluctuate between a one-component (= one substance) point of view on the one hand and a multiple-component point of view on the other). The other psychological concept, moral judgment competence, is taken directly from Kohlberg (1964), who defines it as "the capacity to make decisions and judgments which are moral (i.e., based on internal principles) and to act in accordance with such judgments" (p. 425). Interestingly, Darwin had already spoken of "moral competencies" (see above). Yet only Kohlberg attempted to measure it, crossing the border between the cognitive and the affective domain, a border erected by many psychological theorists (e.g., Bloom et al., 1956; Rest & Narvaez, 1995).
The methodology of the MJT, the concept of the Experimental Questionnaire (Lind, 1980; 2006; 2008), has a cognitive science background, rooted in N. Anderson's concept of cognitive algebra, G. A. Kelly's Personal Construct Theory, W. S. Torgerson's concept of response-stimulus scaling, L. Guttman's measurement as structural theory, and L. Kohlberg's postulate of moral competencies, or structure as a manifest pattern of behavior (1984, p. 407).
- Bloom, B.S., Engelhart, M.D., Hill, W.H., Furst, E.J. & Krathwohl, D.R. (1956). Taxonomy of educational objectives. Handbook I: Cognitive domain. New York: McKay.
- Kohlberg, L. (1964). Development of moral character and moral ideology. In M. L. Hoffman & L. W. Hoffman (Eds.), Review of Child Development Research, Vol. I (pp. 381-431). New York: Russell Sage Foundation.
- Kohlberg, L. (1984). The meaning and measurement of moral judgment. In L. Kohlberg, Essays on moral development, Vol. II: The psychology of moral development (pp. 395-425). San Francisco, CA: Harper & Row (original 1981).
- Rest, J. R. & Narvaez, D. (1995). The four components of acting morally. In W. Kurtines & J. Gewirtz (Eds.), Moral behavior and moral development: An introduction. New York: McGraw-Hill.
Is the MJT a preference test?
No, the MJT is a competence test. Psychologists basically distinguish two kinds of psychological dispositions to measure: competencies (or abilities, or cognitive structures) on the one hand and attitudes (inclinations, motivations, values) on the other. The most distinctive feature of these two kinds of psychological measures is whether or not the scores produced with the test can be simulated "upward." Clearly, competence measures cannot be faked upward, but attitude measures can.
Some authors relate the distinction between 'preference' and 'production' tests to the response mode (closed versus open questionnaire). They call the MJT a "preference" test because the participant is asked whether s/he would accept or reject (= prefer or not prefer) a series of arguments, in contrast to Kohlberg's Moral Judgment Interview, in which the participant is to produce ("elicit") his/her moral philosophical orientations while discussing the solution of certain moral dilemmas.
But this distinction is not as important as these authors seem to believe. It characterizes only the response mode, not the nature of the target disposition which is to be measured (e.g., competence or attitude). There are many scholastic aptitude tests with a closed format (the correct answer has to be "preferred") which nobody would call preference tests.
Similarly, there are attitude tests which use an open format; the only difficulty they pose for the interviewee is to articulate his or her preferences in his or her own words. Because moral measurement is not a test of linguistic ability, the scoring procedure for a production test must make sure that it is not biased toward higher linguistic skills. Kohlberg's scoring system does so by various means, among others by the so-called "upper-stage inclusion rule" (Colby et al., 1987, p. 177; for a critical discussion see Lind, 1989).
Colby, A., Kohlberg, L., Abrahami, A., Gibbs, J., Higgins, A., & ... (1987). The measurement of moral judgment. Volume I: Theoretical foundations and research validation. New York: Columbia University Press.
Is the MJT similar to the Defining Issues Test (DIT)?
No, not at all, although some textbooks say so.
The MJT differs from most other instruments in the domain of moral psychology because it is a moral competence test (see above), though the MJT also allows one to assess simultaneously six moral orientations (attitudes, preferences) of the participants. In contrast to most, if not all, other tests of moral development, the MJT contains a moral task, namely the task of applying one's moral orientations consistently regardless of one's opinion-agreement with the arguments to be rated. The design of the test is experimental and three-factorial, with pro and contra arguments balanced.
In contrast, the Defining Issues Test (DIT) by Dr. James Rest measures only the preference for post-conventional moral reasoning: "The P score of the DIT provides a percent score that indicates the amount of post-conventional thinking (in contrast to other kinds of thinking) preferred by the participant." (Narvaez, 1998, p. 15). The DIT contains no moral task, and the DIT's P-score does not let one assess the preference for low-stage moral orientations.
Both tests have been compared and contrasted in validation and intervention studies, e.g., by Schmitt (1982), Lind (1996a, b), Ishida (2006), and Kim (2006).
A narrower comparison of the two scoring techniques (P-score versus C-score), applied only to the DIT, has been made by Rest et al. (1997). Because the DIT does not contain a moral task and is not designed as a multi-factorial, N=1 experiment like the MJT, the use of the C-score for the DIT is not warranted.
Narvaez, D. (1998). The influence of moral schemas on the reconstruction of moral narratives in eighth graders and college students. Journal of Educational Psychology, 90, 13-24.
Rest, J. R., Thoma, S. J., & Edwards, L. (1997). Designing and validating a measure of moral judgment: Stage preferences and stage consistency approaches. Journal of Educational Psychology, 89(1), 5-28.
Can I calculate a Kohlbergian Stage score from MJT data?
No, because the Kohlbergian Stages are based on a single-aspect model, while the MJT is based on a dual-aspect model of moral behavior (cf. Lind, 2008). With Kohlberg's Moral Judgment Interview, a person is assigned the highest of six "Stages" a) if he or she prefers the moral orientations typical for this Stage at least as often as all other Stage orientations, and b) if he or she does so with a certain consistency. (This is tested in an open interview situation in which the interviewer discusses moral dilemmas with the interviewee.) In other words, Stage scoring tries to combine the affective and cognitive aspects into one single score.
In contrast, we decided to construct the MJT to let us measure both aspects independently though with the very same test (see above).
In early publications I suggested an algorithm for a Stage score for the MJT. However, because the Stage theory of moral development has lost ground and is now replaced by a more multifaceted theory of a continuous developmental process, which also allows for regression of moral competencies (Lind, 1985a; 1985b; 2008), there is no longer any need for such a combined score.
Can one apply the rationale behind the C-score to any test?
No. An attitude test cannot be turned into a competence test by a special kind of scoring, but only through the definition and operationalization of some (difficult) moral task. The C-score (or Competence score) is meaningful only if it is calculated for a moral competence test (see above). When calculated for a moral preference test like the DIT, C means only some kind of cross-situational consistency of moral preferences, not competence (Lind, 1996).
Where can I get a copy of the MJT? Where can I get a specific language version of the MJT?
The original German version of the MJT ("Moralisches Urteil Test", MUT) and validated foreign language versions can be obtained from the author. Contact:
In your request, please explain briefly your institutional affiliation and the purpose of the use of the MJT.
The MJT can be used freely by members of public institutions of education and research if it is not used commercially. All others must obtain written permission from the author(s).
Can the MJT be used for high-stakes testing and diagnostics?
The MJT has been designed to answer important research questions like "What fosters moral judgment competence?" and "How relevant is moral judgment competence for other kinds of behavior like cheating, helping, learning, or decision-making?" It has also been designed for evaluating programs of moral and character education (see Lind, 2002; 2008) ... more.
The MJT has not been designed for, and must not be used for, selecting or sanctioning people or groups of people ("high stakes testing"). The latter use would clearly be an instance of misuse ... more.
Can we interpret an individual's MJT data?
"Why do you stress that MJT it is not meant to provide information about individuals? Is it because there are reasons involving the construction of calculation or are there any other issues involved?"
A person's moral judgment competence is only one among several factors influencing his/her judgment behavior in a particular situation like the MJT: fatigue, attitudes toward the test and the test administrator, associations created by the particular dilemma, time pressure, etc. may cause the indicator of moral judgment competence (the C-score) to decrease more or less. So we could err considerably if we took the C-score of an individual as the "true score" and judged him/her accordingly.
However, if we use the average score of several people (N > 10) as a basis for making inferences, the MJT is a valid instrument for evaluating educational interventions or for testing theoretical propositions. If we look at the mean C-score of several people who have something in common (like having participated in a dilemma discussion), we can assume that most of the other factors influencing an individual's judgment behavior are similar or cancel each other out, so that we can safely infer, for example, the impact of a treatment from comparing mean C-scores.
Remember, it is not the data which are valid or not valid but the use we make of them (cf. Messick, S., 1995, Validity of psychological assessment. Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741-749).
My advice: never even look at an individual C-score. Inadvertently, you will make a false judgment about the person who filled out the MJT.
What is the standard version of the MJT, and what are the standard administration and instruction of the test?
The standard version of the MJT consists of two dilemmas (Doctor's Dilemma [mercy killing] and Workers' Dilemma [breaking into a firm]), constructed in their present form in 1977 and since then only slightly modified for stylistic reasons. The standard version and the certified foreign language versions have been rigorously validated and used in many studies around the world, including some 300,000 participants.
If an uncertified modified version of the MJT is used, this must be noted in the publication to caution the reader.
The standard administration is this:
- No speed test (only when response times are extremely long may the participant be encouraged to make up his/her mind).
- Instruction: "Please read the following stories, in which people have to make a decision, and decide whether the person's decision was right or wrong. Then rate whether you accept or reject the arguments following the stories."
- Instruction for a follow-up survey: "Next you will find the same task as in the first survey. Please respond to it as sincerely as you did the first time, so one can see how your answers have changed. ... [continue as above]."
"Can that be?" List of checks for possible sources of error in interpreting C-score data.
Problems with circumstances of questionnaire-taking which have been found to lower the C-score
Time pressure: Were the participants instructed to answer quickly? Did the participants have to meet a deadline or schedule?
Testing fatigue: Has the MJT been part of a larger questionnaire and has it been given after many other questions?
Testing anxiety: Has the MJT been perceived as a kind of (ethical) knowledge test or ethical intelligence test?
Cross-over effects from other parts of the questionnaire:
A preceding speed test (for assessing IQ or specific abilities) may "prime" the participant to view the MJT as a speed task, too. Solution: present the MJT first.
If the MJT is embedded in a questionnaire that has no obvious relationship to it, the participants may be bewildered and develop negative feelings toward the MJT. Solution: alert the participants to this change of topics; rarely is a longer explanation needed.
If there is no technical explanation for low C-scores, your results may indicate severe problems in the learning environment:
Low ethical standards of recruitment into the profession: the profession serves only to make money
Few or no opportunities for responsibility-taking throughout high school and study
Emphasis solely on learning by rote
Traditional ethics teaching: learning many ethical concepts and theories
Other possible causes of low C-scores:
Administration of the test on the doorstep (by commercial interviewers) (known cause)
Payment for taking the test can lower test-taking motivation (suspected cause)
Self-constructed, non-certified version of the MJT: Did you use the standard MJT?
Wrong translation: Did you use a certified version of the standard MJT?
If you wish you can send me your raw data for checking the validity of your data. See:
http://www.uni-konstanz.de/ag-moral/mut/mjt-certification.htm#certification [A service fee may apply. Please inquire.]
If none of the technical explanations for low C-scores cited above applies, your results may indicate severe problems with the tested curriculum:
Few or no opportunities for responsibility-taking throughout high school and study
Emphasis solely on rote learning
Traditional ethics teaching: only ethical concepts and theories
No practical training of moral and democratic competence (see: the Konstanz Method of Dilemma-Discussion... more)
Can I construct a new dilemma myself?
The construction of new dilemmas is encouraged. Yet one should note that the standard MJT is applicable in most cases even though it may lack "face validity" in a particular context.
If you want to construct a new dilemma for the MJT, please read the guidelines.
After construction you can get your new dilemma certified (see certification procedure) in order to label it "certified MJT-extended." New dilemmas without a certificate should not carry the label "MJT" or "MJT-extended."
The criteria for validating new dilemmas for the MJT are as rigorous as for the standard MJT, to ensure that the new dilemma measures moral judgment competence. In order to get a new dilemma certified, the raw data of the validation study must be sent to the author.
How can we protect privacy in follow-up studies?
To protect privacy, we use a special code instead of the participants' names. The code consists of the last two digits of the house number (e.g., 05), the day of birth (e.g., 24, when the birthday is Oct. 24), the first two letters of the mother's first name, and the first two letters of the father's first name or, if the father is not known, the grandfather's first name.
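A minimal sketch of how such a code could be assembled. The zero-padding of one-digit values and the uppercasing of the letters are my own assumptions, not part of the original scheme, and the example names are invented:

```python
def privacy_code(house_number, birth_day, mother_first, father_first):
    """Assemble the anonymous participant code described above:
    last two digits of the house number, day of birth, and the first
    two letters of the mother's and the father's (or, if the father
    is unknown, the grandfather's) first name."""
    return (f"{int(house_number) % 100:02d}"
            f"{int(birth_day):02d}"
            f"{mother_first[:2].upper()}"
            f"{father_first[:2].upper()}")

print(privacy_code(105, 24, "Maria", "Karl"))  # -> 0524MAKA
```

Because the same participant produces the same code at every measurement point, pretest and posttest questionnaires can be matched without ever recording a name.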
How can we reduce test-taking fatigue?
With repeated measurement, the problem is usually not a learning effect (i.e., artificial elevation of scores due to test knowledge) but fatigue and frustration, which lower the scores. When the MJT is used for evaluating educational or therapeutic interventions in a pretest-posttest design, some subjects may respond with test-taking fatigue or frustration because the test is administered twice within a rather short period of time (a few weeks or months apart). Such reactions often lead to lower C-scores and an underestimation of intervention or therapy effects.
According to our experience, this problem can be solved through proper instruction ... more
How do I have to prepare the raw data to get them scored?
If you use the standard MJT without any modifications in the ordering of the items, a scoring service is available on request for a fee. For this the raw data must be submitted for scoring in this form:
- Ordering of data as in the standard MJT; that is, no manual re-ordering according to item stages: manual re-ordering is more error-prone than re-ordering through a scoring program.
- Minimum additional information: interview ID or consecutive numbering; the opinion rating on each dilemma and the ratings of all 12 arguments per dilemma. Information on age, gender, and level of education is also desirable in order to check the validity of the MJT data.
- All data in text-format; TABs as delimiters; first row: names of the variables/columns
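The required layout can be illustrated with a short Python sketch. The variable names used here ("id", "op1", "w1_1", ...) are my own illustrative assumptions, not an official naming scheme; only the structure follows the list above (one header row, TAB delimiters, one participant per row with 2 opinion items and 24 argument ratings):

```python
# Illustrative sketch of the expected raw-data layout. Column names
# are assumptions; the structure (header row, TABs, 1 + 2 + 24 columns)
# follows the submission requirements described above.
import csv
import io

EXPECTED_COLUMNS = (
    ["id"]                                  # participant ID or consecutive number
    + [f"op{d}" for d in (1, 2)]            # opinion on each of the two dilemmas
    + [f"w{d}_{i}" for d in (1, 2) for i in range(1, 13)]  # 12 argument ratings per dilemma
)

def parse_mjt_tsv(text):
    """Parse TAB-delimited MJT raw data whose first row names the variables."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)
```

Keeping the arguments in their original printed order, as required above, means the scoring program (not the researcher) maps each column to its stage.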
Error in older scoring guideline corrected
In an early guideline, the stage codes for the third and fourth PRO arguments in the doctor dilemma were wrong; they must be corrected as indicated in the table below:
[Table: false vs. correct stage codes for the PRO arguments of the doctor dilemma; the code values themselves could not be recovered here.]
MJT-data which have been scored in Konstanz are not affected.
Missing data: What if a participant has not filled out all 26 questions of the standard MJT?
If you use an online-version of the MJT which checks automatically for missing data and reminds the participant to complete his or her answers, missing data cannot occur.
Otherwise, missing data can be a problem for scoring the MJT. In my experience, missing data are usually not produced on purpose but are caused by distraction and fatigue. Therefore, you should make sure in your instructions that the participants do not forget to answer all questions, and you should allow sufficient time for answering the MJT. In some cases missing data can be caused by the wording of the MJT if the participants are very young or have little reading proficiency (you are allowed to explain difficult words to the participant). Note that the wording of the arguments must not be changed; a change would require the modified test to be validated and certified again. However, the wording of the story can be carefully modified to enhance readability.
If the questions about the decision of the protagonist are omitted, the C-score can still be calculated. However, omitting these two questions is a problem if you want to calculate scores that involve "opinion agreement."
If only one or two responses to the 24 arguments are missing, we replace the missing data by the individual mean score calculated on the basis of the other 22 or 23 responses of that participant. This seems to be the most neutral way to replace missing data. (Do not forget to document the number of cases with missing data in your research report.) To make sure that this replacement has no biasing effect, you should run your most central analyses both with and without the modified data and compare the findings.
As a matter of convention, we discard from analysis all participants (cases) who have more than two missing responses. (Do not forget to mention this in your research report.) Their C-score cannot be validly interpreted. In some instances, it may be interesting to analyze this phenomenon. If it cannot be explained as a technical problem, it may indicate a psychological process which deserves attention.
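The two conventions described above (mean replacement for up to two missing responses, discarding cases with more) can be sketched as follows. The function name and data layout are illustrative assumptions, not part of the official scoring program:

```python
# Sketch of the missing-data convention described above: up to two
# missing argument ratings are replaced by the participant's mean over
# the answered items; cases with more than two are discarded.
def impute_or_discard(ratings):
    """ratings: list of 24 values, with None marking a missing response."""
    answered = [r for r in ratings if r is not None]
    n_missing = len(ratings) - len(answered)
    if n_missing > 2:
        return None                       # discard this case (document it in the report!)
    mean = sum(answered) / len(answered)  # individual mean of the answered items
    return [mean if r is None else r for r in ratings]
```

Because the replacement value is the participant's own mean, it adds nothing to the deviation of the ratings around that mean, which is why it is the most neutral choice here.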
Is the MJT valid?
Yes. The MJT is highly valid: it has been subjected to more rigorous validity analysis than most, if not all, other tests of moral development. The criteria chosen for checking its validity are so demanding that even minor defects of the test would have been detected. These criteria have also proven very effective in securing the validity of new dilemmas and the cross-cultural validity of more than thirty foreign-language versions of the MJT (see Lind, 2008; certification procedure).
Moreover, it should be noted that the MJT has not been submitted to "item-selection" in order to increase the likelihood of confirming any of the predictions to be tested with the MJT. For example, no items have been omitted or included in order to maximize correlation with age. Thus the MJT is not biased for or against a specific assumption.
Note that validity is not just an attribute of a test but of the whole measurement procedure, including its interpretation: "Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (Messick, 1989, p. 13, emphasis added). Hence, the MJT can claim validity only if it is administered according to the standard procedure described above, and if the user has sufficient psychological knowledge about the Dual-Aspect Theory of moral behavior and development (see above) to interpret MJT scores adequately.
Over the past 30 years the MJT has proven very useful for testing theoretical assumptions about moral behavior and development and about the effect sizes of certain educational programs (Lind, 2005).
Messick, S. (1989). Validity. In R. L. Linn, ed., Educational measurement (3rd ed.), pp. 13-103. New York: Macmillan.
Is the MJT reliable?
Yes, the MJT is highly reliable, not only in the conventional sense but also in more meaningful ways:
- The MJT is reliable in the sense that neither its administration nor its scoring involves a "human factor," as is the case with open interviews.
- The MJT is reliable in the sense that the test instruction and the test stimuli do not change at all.
- The MJT is reliable in the sense that it is independent of the sample studied. Its scores do not change from sample to sample, as happens when sample statistics are used to calculate individual test scores, as in Guttman scales, Rasch scales, z-transformed scores, and scores based on the standard deviation in a sample.
- The concept of internal consistency does not apply because the MJT regards the consistency of the response pattern as a sign of a person's moral judgment competence, not as an attribute of the test. That is, inconsistency is not considered "measurement error" or "unreliability" but a sign of the participant's "manifest pattern of behavior" (Kohlberg, 1984, p. 407; see theoretical background).
- The concept of stability does not apply because the MJT is an instrument for measuring developmental change and change due to educational interventions. Such instruments must not be rigid; they must be sensitive to real changes.
- Hence, the MJT has not been submitted to "item analysis" to maximize internal consistency or stability, which would inevitably have lowered the validity and usefulness of the test.
- In spite of the fact that the MJT has not been tuned for classical reliability (or because of this?), Lerkiatbundit et al. (2006) report a reliability coefficient for the MJT of r = 0.90 ... more.
For what age groups can the MJT be used?
The MJT has been used from age 10 upward, provided the participant has average reading and comprehension abilities. For younger children, or for children and adolescents with educational disadvantages, the MJT can and should be modified. This is especially necessary when the participants are not fully proficient in the language of the test, or when they lack sufficient education.
These modifications can be made without diminishing the validity of the MJT:
- Use larger print
- Use shorter response scales (-2 to +2 instead of -4 to +4)
- Simplify the language of the dilemma story (but do not touch the arguments/reasons)
- Apply the MJT in small groups and have someone assist the participants when they have difficulties understanding a certain expression (but do not suggest an answer, of course)
- Offer post-test discussion of the MJT.
Please contact Dr. Lind for a simplified German version of the MJT for these low age groups (from 8 years of age) or for educationally disadvantaged participants.
For younger children (grades 2 to 4) there is a one-dilemma children's version available (Juergen's Dilemma, by Dr. Zierer), which, however, is not fully certified yet. Please write to Dr. Zierer or to me.
False news on the MJT
- "Almost all existing objective scorable instruments for measuring moral development are also based on the idea of a series of hierarchically ordered, qualitatively different steps: ... the MJT (Lind, 2000) ...". Boom, J. (2009, p. 8): Measuring moral development: Stages as markers along a latent development (chapter 8). Manuscript (personal communication).
Comment: When I surveyed the research literature on Kohlbergian moral development, I felt that the Stage model did not agree with the experimental data. Its core postulates (qualitative organization, non-regression or invariant sequencing, non-stage-skipping, and hierarchical integration) have not been shown to be empirically valid. Therefore, I proposed the Dual-Aspect Theory of moral behavior and constructed, on this basis, a new test, the MJT. For a more recent statement of the dual-aspect theory, see Lind (2008). Hence, the MJT cannot be considered a measure of "moral stages"; it is a measure of moral judgment competence.
- "Response methods [including MJT] are not as accurate a measure of individual moral stage." Haste, H. et al. (1998, p. 325): Morality, wisdom and the life-span. In: A. Demetriou, W. Doise & C. F. M. van Lieshout, eds., Life-span developmental psychology, pp. 317-350. New York: John Wiley & Sons.
- "Stages cannot be assessed [with the MJT]." Oser, F. & Althof, W. (1992, p. 176): Moralische Selbstbestimmung. Modelle der Entwicklung und Erziehung im
Comment: The MJT does not claim to measure moral stages because it is built on the assumption that Kohlbergian stages have not been shown to exist empirically.
- "The Defining Issues Test (DIT) by James Rest has been adapted in German as the Moral Judgment Test (MJT) ..." (several authors)
Comment: Both tests are based on totally different psychological and methodological theories and have nothing in common but a shared dilemma story. Rest rejected presenting counter-arguments to the participants because he regarded them as "artificial." Rest (1979, p. 89; Development in judging moral issues. Minneapolis, MN: University of Minnesota Press) writes: "The artificiality of the [con] statement interfered with its usefulness in studying modes of reasoning. For the most part, information from these statements was useless and had to be eliminated from the analysis." The reason for this elimination was that Rest and his colleagues used conventional psychometric theory as a criterion for judging the validity of the items of the DIT; thus they believe that their psychometric theory is irrefutable.
In contrast, Lind intended to measure moral judgment competence rather than "modes of moral reasoning" and built the MJT on the basis of a multivariate experimental design, in order to be able to test the validity of the assumptions underlying the MJT.
- "As a measure of moral judgment competence, the modal stage [of preference] was calculated for each of the two dilemmas [of the MJT]..." Beck, K. (1993, p. 102): Dimensionen der ökonomischen Bildung. Meßinstrumente und ... Nürnberg: Universität Erlangen-Nürnberg, Lehrstuhl für Wirtschaftspädagogik. Unpublished manuscript.
Comment: The MJT measures moral judgment competence through its C-score or C-index, but not through calculating a score for moral preferences.
- "Outcomes from [the MJT] would overestimate moral competence." Beck, K. et al. (2002, p. 112): Autonomy in heterogeneity? Development of moral judgment behavior during business education. In: K. Beck, ed., Teaching-learning processes in vocational education, pp. 87-119. Frankfurt: Peter Lang.
Comment: Actually, of all measures that claim to measure moral competence, the MJT produces the lowest scores, because it is the only one that poses a difficult task to the respondent.
- "Although the measurement of moral developmental stage [with the MJT] lacks validity. ... [The correlation between the scores of each dilemma -- it is not specified which one -- is only r =] 0.10. ... This inconsistency is the lower, the higher the level of education." [my transl., GL] Herrmann, D. (2000, pp. 13-14): Religiöse Werte, Moral und Kriminalität. In: J. Allmendinger, ed., Gute Gesellschaft? Verhandlungen des 30. Kongresses der Deutschen Gesellschaft für Soziologie in Köln 2000, Teil B, pp. 802-822. Opladen: Leske + Budrich.
Comment: The MJT does not claim to measure developmental stages, so it cannot lack validity in that respect. It is not clear which scores the author calculated, hence we cannot know what r = 0.10 means. Moreover, the correlation between subtests is not considered an index of validity in the methodological literature, and such correlations depend strongly on the variance of the scores in a given sample and on the moral judgment competence of the participants. None of these factors has been considered by this author.
- "Eine Individualdiagnose des Entwicklungsstandes der moralischen Urteilskompetenz läßt sich anhand dieses Instruments kaum vornehmen. Einerseits, weil die durch eine Varianzkomponentenzerlegung ermittelten Werte eine Stufenbeschreibung nicht ermöglichen; andererseits, weil sich wegen der fehlenden Altersdifferenzierung ... eine Ermittlung des Entwicklungsstands der präferierten Wertperspektiven nahezu erübrigt." My translation: "This instrument can hardly be used for an individual diagnosis of the developmental status of moral judgment competence. On the one hand, because the values obtained through a decomposition of variance components [the C-score] do not allow a description in terms of stages; on the other hand, because, given the lack of age differentiation ..., determining the developmental status of the preferred value perspectives becomes almost superfluous." Schmied, D. (1981, p. 61): Standardisierte Fragebogen zur Erfassung des Entwicklungsstandes der moralischen Urteilskompetenz. Diagnostica, 27, 51-65.
Comment: (a) No, the MJT has not been designed for individual diagnosis or selection purposes but for research and program evaluation. (b) In the past 35 years, hundreds of studies have shown that the MJT is a valid measure of moral judgment competence. (c) However, one must not, as this author seems to do, confuse moral competence with moral preferences. (d) Indeed, moral preferences or orientations do not correlate with age because they hardly differ among people. Neither does moral competence correlate consistently with age, because it is not a function of biological maturation but of high-quality education.
- Tests like the MJT "require high reading ability of the participants. They can be applied to people not younger than 12 years of age. For younger subjects and for adolescents with reading problems, they must not be used." (my transl., GL) Krettenauer, T. & Becker, G. (2001, p. 189): Entwicklungsniveaus sozio-moralischen Denkens. Diagnostica, 47, 188-195.
Comment: Actually, these fears are not supported by empirical evidence. The MJT has been applied to children as young as 8 years of age. For participants with reading problems, the test administrator may help them understand certain words. Moreover, since the MJT is not a speed test and participants can take as much time as they need, reading problems do not seem to affect the test scores.
- "Durch die Gewichtung der sechs Aussagen nach Akzeptabilität ergibt sich die kognitive Dimension; die affektive Dimension ergibt sich aus dem 'modalen Präferenzwert' (Oser/Althof, 2001, 176). Die Stufe wird aus einer 'intraindividuellen Konsistenzmessung' (ebd.) ermittelt." Translation: "The cognitive dimension results from weighting the six statements according to their acceptability; the affective dimension results from the 'modal preference value' (Oser/Althof, 2001, 176). The stage is inferred through an 'intra-individual measurement of consistency' (ibid.)." Sterba-Philipp, C. (2003, p. 22): Dilemma-Geschichten zur Förderung moralischer Urteilsfähigkeit einer Förder- und Hauptschulklasse einer Schule für Körperbehinderte. http://www.foepaed.net/sterba-philipp/dilemma.pdf (1.11.2004)
Comment: (a) The "cognitive dimension" of the MJT does NOT result from weighting the acceptability of six statements (which six would that be?), nor does the affective dimension of the MJT result from calculating "modal preferences." Actually, the cognitive aspect of moral judgment behavior, namely moral judgment competence, is calculated through an intra-individual analysis of variance components, and the affective aspect is indexed by the average preference for the six moral orientations as defined by Kohlberg. (b) The MJT has not been designed to measure Kohlbergian Stages because its underlying dual-aspect theory is not compatible with stage theory.
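The idea behind an intra-individual analysis of variance components can be sketched roughly in Python: the competence index reflects the share of a single participant's rating variance that is accounted for by the Kohlbergian stage of the arguments. This is a simplified illustration of the principle only, not the certified MJT scoring algorithm; the function name and the data layout are my own assumptions.

```python
# Hedged sketch of the idea only: the proportion of one participant's
# rating variance explained by the stage of the arguments, expressed
# on a 0-100 scale. NOT the certified MJT scoring algorithm.
def c_index(ratings_by_stage):
    """ratings_by_stage: dict mapping stage -> list of this participant's
    ratings of the arguments belonging to that stage."""
    all_ratings = [r for rs in ratings_by_stage.values() for r in rs]
    grand_mean = sum(all_ratings) / len(all_ratings)
    # total sum of squares around the participant's own mean
    ss_total = sum((r - grand_mean) ** 2 for r in all_ratings)
    # between-stage sum of squares: variation of the stage means
    ss_stage = sum(
        len(rs) * ((sum(rs) / len(rs)) - grand_mean) ** 2
        for rs in ratings_by_stage.values()
    )
    return 100.0 * ss_stage / ss_total if ss_total else 0.0
```

With a perfectly stage-consistent rating pattern, all variance lies between the stages and the sketch yields 100; with a completely flat pattern it yields 0. The key point illustrated here is that the index is computed within each individual, not from sample statistics.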
- "... von den 44 Protokollen nach den von Lind (1977) angegebenen Kriterien nur 21 auswertbar waren. ... In neun von 21 Protokollen [ergaben sich] theoretisch nur schwierig begründbare Ergebnisse, weil entsprechend der Auswertung vom Vor- zum Nachtest Stufenübersprünge und -regressionen von zwei bis fünf Stufen vorkamen. ... Andere Forscher (HINDER, in Vorb.) berichten von denselben Problemen mit dem MUT." Translation: "... of the 44 protocols [filled-out tests], only 21 were scorable according to the criteria given by Lind (1977). ... In nine of the 21 protocols, findings resulted that could hardly be justified by theory, because they showed stage-skipping and stage-regressions of two to five stages between pretest and posttest. ... Other researchers report the same problems with the MJT (Hinder, in preparation)." Schlaefli (1986, p. 166): Förderung der sozial-moralischen Kompetenz: Evaluation, Curriculum und Durchführung von Interventionsstudien. Frankfurt: Peter Lang.
Comment: (a) Incompletely filled-out MJT sheets and unscorable data sets are extremely rare. Complete data sets have always been scorable according to my criteria (Lind, 1977, 2008). In many cases, the data sets were submitted to strict validity analysis, and no unscorable data set was found when the researcher used a certified test version. It seems that Schlaefli and his colleague Hinder applied their own criteria, which do not seem to be compatible with the MJT. (b) It is not clear how these authors could observe "stage-skipping" and "stage-regression" with MJT data, because the MJT does NOT claim to assess "Stages" and does not facilitate stage assessment at all. (c) The report by Schlaefli is unique. Besides him and his colleague Hinder, who worked with the same data, no other researcher has ever found anything alike, although the MJT has been in use for over 35 years now and has been used in many hundreds of studies with thousands of subjects.
- "After finishing his analysis, the author [of this study] became aware of the fact that in studies with the MJT the return rate [of filled-out questionnaires] usually was 50%." (my translation, GL) Mieg, H. A. (1994, p. 208): Verantwortung. Moralische Motivation und die Bewältigung sozialer ... Opladen: Westdeutscher Verlag.
Comment: (a) Even though this author seems to be complaining, a return rate of 50% is unusually high for a survey study. In fact, in our first survey study with the MJT, the return rate was even over 70%. Typically, return rates of survey studies are much lower. (b) The MJT itself seems to keep return rates high. Many respondents tell us that answering the MJT is much more interesting than answering many of the other scales we included in our test batteries.
- "Lind has changed the MJT several times ..." Rest, J. R., Thoma, S. J., & Edwards, L. (1997, footnote 5): Designing and validating a measure of moral judgment: Stage preferences and stage consistency approaches. Journal of Educational Psychology, 89(1), 5-28.
Comment: Actually, in 37 years the MJT has proven so valid and fruitful for research that it has needed hardly any change. Only some minor corrections were made (for more details, click here). In contrast, the Defining Issues Test by Rest et al. underwent a major revision of the test content and several major revisions of the scoring system (from the P-score to the P-2 score, N-score, and U-score), which makes it hard to compare DIT findings across generations of research.
- "The studies that specifically are lacking in MJT research are (a) studies of 'moral experts' like philosophers or political scientists. (b) Relating Lind's measure of moral competence to some other psychological test of moral comprehension or moral competence. (c) Longitudinal studies that contain some way of characterizing 'enriched' or 'stimulating' life experiences other than education. (d) Detailed reports and replications of moral education programs, with control groups. (e) Studies linking moral judgment with behavioral measures (going beyond the moral judgment test itself)." Rest, J. R., Thoma, S. J., & Edwards, L. (1997, footnote 8): Designing and validating a measure of moral judgment: Stage preferences and stage consistency approaches. Journal of Educational Psychology, 89(1), 5-28.
Comment: This critical evaluation does not seem to be based on a reading of the research literature. See, for example, the compiled references on this website. In detail: (a) Five renowned moral experts were involved in constructing the MJT by stage-rating its items. (b) The MJT is the only true measure of moral competence; how could it be compared with other such tests? (c) The MJT was used in a longitudinal study of university students in five different countries; no other test has been used in a similar way. In this and other studies, stimulating life experiences (like opportunities for responsibility-taking and guided reflection) were assessed in many life areas outside the syllabus (Lind, 2000; Schillinger, 2006; Lupus, 2009; for downloading click here). There are no other studies with such a comprehensive assessment of the learning environment. In DIT studies, mostly characteristics of the learner were assessed, and only a few characteristics of his or her environment. (d) Many moral education programs have been evaluated with the MJT, including pretests, posttests, follow-up studies, and control groups, and, of course, detailed reports have been given (see, e.g., Lind, 2002 and more here). (e) In contrast to most, if not all, other tests of moral development, the MJT is itself an experimental test of behavior. Moreover, several studies link moral judgment competence as measured by the MJT to the ability to behave morally in other settings (see here). Finally, it has been shown in two experiments that the MJT's C-score cannot be faked upward, unlike the DIT's P-score.
- "By avoiding 'moral' signal words, [the MJT] fails to signal to the respondent that his moral judgment is being asked for." (p. 343) On the "dangers" of standardized questionnaires: "Reluctance, fatigue, lack of opinion, individual response habits, and a preference for socially desirable reactions potentially affect the validity of all attitude data collected with standardized procedures. Nevertheless, one would by no means forgo the enormous pragmatic advantages of standardization because of them." (p. 346) "Opinion agreement" (which, G.N.-W. notes, can be controlled for or even put to use) and "understatement ... when 'defense mechanisms' are at work or there is a 'shyness of lofty words'." (p. 347) The "creativity" of the respondent (having several options for action) is suppressed. (This is irrelevant for moral classification!) (p. 354) "Arbitrary ticking: ... long sequences of tick-marks quickly tire or bore him; complicated sentences may overtax his reading skill or his comprehension. ... Arbitrarily placed tick-marks can, of course, not be recognized as such in the analysis of standardized surveys." (p. 344) "Response set: ... acquiescence ... Such distorting response patterns may, under certain circumstances, be filtered out by statistical procedures ... [but] the validity of the cleaned data seems ... problematic, since the cut between [the response set and the moral judgment] is not justified theoretically but can only be made on the basis of the statistical construct of a 'normal response habit'." (p. 345) "Social desirability: ... that tick-marks can simply be read as expressions of agreement with, or rejection of, the propositional content of a presented statement ... presenting a positive image of oneself. [In the open interview this danger does not exist, since the moral classification] is carried out exclusively according to the structure of the justifications. And argumentation structures simply cannot be 'styled up'." (p. 345) (All quotes in my translation, GL.) Summary: The author rejects all recognition tests of moral development for these reasons: (a) lack of interest, tiredness, lack of opinion, response set, and a preference for socially desired answers threaten the validity of attitudes assessed through standardized tests (p. 343), and these biases "cannot be discovered in the analysis of such standardized tests" (p. 345). Nunner-Winkler, G. (1978). Probleme bei der Messung des moralischen Urteils mit standardisierten ... In: L. Eckensberger, ed., Entwicklung des moralischen Urteilens, pp. 337-358. Saarbrücken: Universitätsdruck.
Comment: (a) The MJT has been constructed to measure moral competence, not moral attitudes. Many of the biases which the author counts as possible threats to validity apply only to attitude tests. The measurement of competencies can also be biased, but by different threats (see above). (b) Yet these biases can be detected. There are three very rigorous criteria for checking the validity of MJT data, which allow us to detect the most severe biases in the data.
- "Nach vorliegenden Untersuchungen wären bei einem Einsatz derartiger Instrumente bei unserer Erwachsenenstichprobe außerdem wahrscheinlich ceiling-Effekte aufgetreten, d.h. die meisten Befragten hätten ... postkonventionelle Argumente präferiert. Unterschiede ihrer moralischen Urteilsfähigkeit wären dann also wenig in Erscheinung getreten." Translation: "According to existing studies, ceiling effects would probably have occurred if such instruments had been used with our sample of adults; that is, most respondents would have preferred postconventional arguments. Differences in their moral judgment competence would therefore hardly have become apparent." Spang, W. & Lempert, W. (1989, p. 19): Analyse moralischer Argumentationen (Textteil). Berlin: Max-Planck-Institut für Bildungsforschung.
Comment: According to our general understanding of the meaning of these words, moral competence is something completely different from a preference for certain types of moral reasoning. Therefore, a high preference for postconventional moral reasoning does not signify high moral judgment competence. In fact, all studies of moral judgment competence agree that even adults mostly show rather low moral judgment competence on the MJT.