The validity phase in the development of the rating scales was a nearly two-year process, with five iterations of the environment scale submitted to the validity team for review and comment. The initially disparate validator views coalesced over the five rating scale reviews. The 155-item rating scale grew and improved with the input of the 24-member validation team. Noted Montessori educators, teacher trainers, and school heads across the United States participated in the validation phase.
The reliability phase in the development of the rating scales utilized interrater reliability. The consistency with which paired raters produce the same scores is crucial: in practical use, a single rater assesses the environment, so the assessment outcome must be the same irrespective of the particular rater.
Test-retest reliability was not utilized because the same rater could simply carry the same biases into the retest. Consistent scores could be achieved, but test-retest cannot predict or assure that different raters would reach the same outcomes. Functionally, an annual re-test by the same rater measures the development of the environment and the development of the teacher. However, in the development of the rating scale, test-retest is not the appropriate measure of the scale's consistency across raters.
Internal reliability was determined to be inappropriate because each item on the scale is distinct and independent
of the other items.
The pilot reliability phase resulted in low reliability outcomes. Examination and clarification of all items, as well as of the rating process, led to higher reliability outcomes in subsequent phases. The major realization was the need for stronger and more extensive rater training prior to utilizing the scale. The final reliability phase therefore included rater training with PowerPoint and video segments addressing the issues that had contributed to low reliability in the pilot. The rater training not only emphasizes the necessity of rater objectivity, it also addresses and clarifies these issues:
· the 3-point options, and the “N/A” (not applicable) and “O” (optional) scoring choices
· the rationale for “Substandard”
· the rationale for “Above Standard” (in order to avoid the “halo effect”)
· the requirement to score all items (eliminating a rater’s avoidance of using the “Substandard” score).
Additionally, a teacher comments section is included at the conclusion of the scoring, allowing for the Q-PIP.
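The scoring rules above can be sketched in code. The snippet below is a hypothetical illustration, not part of the published scale: it assumes the 3-point options are recorded as 1 (Substandard), 2 (Standard), and 3 (Above Standard), with the strings "N/A" and "O" for the not-applicable and optional choices, and it enforces the requirement that every item receive a score.

```python
# Hypothetical check of a completed score sheet against the rules described
# above. The item count, score encoding, and function name are illustrative
# assumptions, not part of the MRS-EC-E itself.

VALID_SCORES = {1, 2, 3, "N/A", "O"}  # 1=Substandard, 2=Standard, 3=Above Standard

def check_item_scores(scores, n_items=155):
    """Return a list of problems; an empty list means the sheet is complete."""
    problems = []
    if len(scores) != n_items:
        problems.append(f"expected {n_items} items, got {len(scores)}")
    for i, s in enumerate(scores, start=1):
        if s is None:
            problems.append(f"item {i}: missing score (all items must be scored)")
        elif s not in VALID_SCORES:
            problems.append(f"item {i}: invalid score {s!r}")
    return problems
```

Rejecting unscored items mirrors the training point above: a rater cannot sidestep the "Substandard" option by simply leaving an item blank.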
The Spearman rank order correlation was used in generating a reliability rating for each item, for each subscale, and for the total scale.
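An interrater Spearman coefficient of this kind can be sketched as follows. The rater scores below are invented for illustration (they are not MRS-EC-E data), and the tie-aware ranking follows the standard definition of Spearman's rho as the Pearson correlation of the two rank vectors.

```python
# Minimal sketch of a Spearman rank-order correlation between two raters'
# scores, using invented data. Ties receive their average rank.

def average_ranks(scores):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank vectors (tie-aware Spearman rho)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented 3-point item scores from a hypothetical paired-rater session
rater_a = [3, 2, 2, 1, 3, 2, 3, 1]
rater_b = [3, 2, 1, 1, 3, 2, 2, 1]
rho = spearman_rho(rater_a, rater_b)
```

A rho near 1.0 indicates that the two raters ordered the items consistently, which is the interrater agreement the reliability phase was designed to demonstrate.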
The internal consistency of the MRS-EC-E is assessed at the subscale and total scale level. Each subscale measures quality in its particular domain, while the total scale is an indicator of the global quality of an environment.