The reliability of the total scores on three rating scales (the Melancholia Scale and the two Newcastle Scales) and of the algorithms leading to the Feighner, Research Diagnostic Criteria, and DSM-III subtypes of depression was compared. Inter-observer agreement for the various item combinations was significantly higher than would be expected by chance. The average agreement for each assessment system ranged from 80 to 93 per cent. The remaining 7 to 20 per cent of disagreement probably reflects the limitations of clinical assessment, including the influence of halo effects.
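
As an illustration of how observed agreement relates to agreement expected by chance, the following minimal sketch computes both for two raters, along with a chance-corrected coefficient. It rests on assumptions: the abstract does not name the statistic that was used, so Cohen's kappa is substituted here as one common choice, and the ratings, labels, and function names are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def chance_agreement(rater_a, rater_b):
    """Agreement expected if the raters assigned labels independently,
    each keeping their own marginal label frequencies."""
    n = len(rater_a)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    return sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement (Cohen's kappa); an assumed statistic,
    not necessarily the one used in the study."""
    p_observed = percent_agreement(rater_a, rater_b)
    p_expected = chance_agreement(rater_a, rater_b)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of ten patients by two observers
# (E = endogenous, N = non-endogenous); invented for illustration only.
observer_1 = ["E", "E", "E", "E", "E", "N", "N", "N", "N", "N"]
observer_2 = ["E", "E", "E", "E", "N", "N", "N", "N", "N", "E"]

observed = percent_agreement(observer_1, observer_2)
expected = chance_agreement(observer_1, observer_2)
kappa = cohen_kappa(observer_1, observer_2)
disagreement = 1.0 - observed
```

In this invented example the observers agree on 8 of 10 patients, which corresponds to the lower end of the 80 to 93 per cent range reported above, while independent raters with the same label frequencies would be expected to agree on half of the cases.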