Hamilton Rating Scale for Anxiety: exploring validity with robust measures of classical theory parameters and a rating scale model in university students

Md Dilshad Manzar; Faizan Z. Kashoo; Mohammed Salahuddin; Dejene Nureye; Habtamu Acho Addo; Seithikurippu R. Pandi-Perumal; Amir H. Pakpour; Ahmed S. Bahammam

doi:10.1192/bjo.2025.10055

Published online by Cambridge University Press: 12 August 2025

Md Dilshad Manzar: Affiliation:
Department of Primary Nursing Care, College of Nursing, Majmaah University, Majmaah, Saudi Arabia
Faizan Z. Kashoo: Affiliation:
Department of Physical Therapy and Health Rehabilitation, College of Applied Medical Sciences, Majmaah University, Majmaah, Saudi Arabia
Mohammed Salahuddin*: Affiliation:
Department of Pharmaceutical Sciences, School of Pharmacy, Notre Dame of Maryland University, Baltimore, Maryland, USA
Dejene Nureye: Affiliation:
School of Pharmacy, College of Medicine and Health Sciences, Mizan-Tepi University, Mizan-Aman, Ethiopia; and Research Unit of Neuroinflammatory and Cardiovascular Pharmacology, Faculty of Science, University of Dschang, Dschang, Cameroon
Habtamu Acho Addo: Affiliation:
School of Pharmacy, College of Medicine and Health Sciences, Mizan-Tepi University, Mizan-Aman, Ethiopia
Seithikurippu R. Pandi-Perumal: Affiliation:
Centre for Research and Development, Chandigarh University, Mohali, Punjab, India; and Division of Research and Development, Lovely Professional University, Phagwara, Punjab, India
Amir H. Pakpour: Affiliation:
Department of Nursing, School of Health and Welfare, Jönköping University, Jönköping, Sweden
Ahmed S. Bahammam: Affiliation:
University Sleep Disorders Center, Department of Medicine, College of Medicine, King Saud University, Riyadh, Saudi Arabia; and National Plan for Science and Technology, College of Medicine, King Saud University, Riyadh, Saudi Arabia
*: Correspondence: Mohammed Salahuddin. Email: smohammed@ndm.edu

Abstract

Background

No research has assessed Hamilton Rating Scale for Anxiety (HRSA) psychometric properties in Ethiopian university students, using item response theory (IRT) and classical theory.

Aims

This study aimed to assess psychometric properties of the English HRSA in Ethiopian students, using IRT and classical theory.

Method

University students (N = 370, age 21.44 ± 2.30 years) in Ethiopia participated in a cross-sectional study. Participants completed a self-reported measure of anxiety, a sociodemographics tool and interviewer-administered HRSA.

Results

Confirmatory factor analysis (CFA) favoured a one-factor structure: fit indices for the one-factor model and two distinct two-factor models were similar, but high interfactor correlations violated discriminant validity criteria in the two-factor models. This one-factor structure showed structural invariance as evidenced by multi-group CFA across gender groups. No ceiling/floor effects were seen for the HRSA total scores. Infit and outfit mean square values for all the items were within the acceptable range (0.6–1.4). Four threshold estimates (τi1, τi2, τi3 and τi4) for each item were ordered as expected. Differential item functioning analysis showed item-level measurement invariance for all 14 HRSA items across gender for both uniform and non-uniform estimates. McDonald’s ω and Cronbach’s α for the HRSA tool were both 0.88. The convergent validity of the interviewer-administered HRSA with the self-reported anxiety subscale of the 21-item Depression, Anxiety and Stress Scale was weak to moderate.

Conclusions

The findings favour the validity of a one-factor structure of the HRSA with adequate item properties (classical and rating scale model), convergent validity, reliability and measurement invariance (structural and item level) across gender groups in Ethiopian university students.

Keywords

Reliability; validity; item response theory; psychometrics; Ethiopia

Information

Type: Paper
Information: BJPsych Open, Volume 11, Issue 5, September 2025, e176

DOI: https://doi.org/10.1192/bjo.2025.10055
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of Royal College of Psychiatrists

Anxiety disorders are common mental health conditions that can significantly affect an individual’s quality of life. The World Health Organization estimates that 301 million people have anxiety disorders, with approximately 20% of them being children and adolescents.¹ Patients diagnosed with an anxiety disorder are three to five times more likely to visit the doctor and six times more likely to be admitted to hospital for other psychiatric problems compared with those not diagnosed with an anxiety disorder.^{Reference Hirschfeld2,Reference Kessler, Stang, Wittchen, Ustun, Roy-Burne and Walters3} Excessive fear, worry and behavioural abnormalities are some of the hallmark traits of anxiety disorders.¹ Anxiety disorders can also significantly impair an individual’s ability to perform daily activities, maintain employment and engage in social interactions.^{Reference McKnight, Monfort, Kashdan, Blalock and Calton4} The COVID-19 pandemic exacerbated stress and anxiety levels by 25%, especially among youth and women.¹ Recent systematic reviews reported a global prevalence of anxiety of 27–30%.^{Reference Mahmud, Hossain, Muyeed, Islam and Mohsin5,Reference Nochaiwong, Ruengorn, Thavorn, Hutton, Awiphan and Phosuya6} Furthermore, in 2020, anxiety disorders caused 44.5 million disability-adjusted life-years (DALYs) globally.⁷

Students at colleges and universities are the most vulnerable population who might require mental health intervention services.^{Reference Czyz, Horwitz, Eisenberg, Kramer and King8} The transitional phase of emerging adulthood, which typically occurs during the university years, can be a period filled with life events and significant stressors, ranging from starting college, leaving home and moving to a new city to making new friends, taking finals and looking for a job.^{Reference Park, Andalibi, Zou, Ambulkar and Huh-Yoo9} However, many university students do not seek mental health treatment because of concerns regarding confidentiality, time and cost constraints, unpleasant experiences with professional help and greater reliance on family and friends for support.^{Reference Eisenberg, Golberstein and Gollust10,Reference Givens and Tjia11} Failure to address the mental health needs of university students in a timely manner is associated with poor academic performance, behavioural issues, dropping out, substance misuse, school violence and suicide.^{Reference Park, Andalibi, Zou, Ambulkar and Huh-Yoo9} Furthermore, mental health illnesses among university students from low-income countries may go undetected because of the lack of a sophisticated infrastructure to screen for such disorders. In this regard, interviewer-administered anxiety screening methods may be beneficial for preliminary screening for such illnesses.

Anxiety disorders remain underdiagnosed, with only a minority detected in primary care services, presumably because of the underutilisation of diagnostic questionnaires in routine practice, as well as the cost and time involved in consultations.^{Reference Kessler, Lloyd, Lewis and Gray12} Therefore, simple, quick questionnaires are needed for routine practice. The Hamilton Rating Scale for Anxiety (HRSA) is an interviewer-administered questionnaire comprising 14 items that assess anxiety.^{Reference Hamilton13} A thorough psychometric validation of the HRSA in the Ethiopian population may help in establishing an evidence-based application as an initial screening tool for anxiety in this resource-limited setting. Such a prospective clinical application may help in the expansion of cost-effective screening for mental health issues of vulnerable groups. The HRSA is a widely used assessment tool for measuring anxiety symptoms. Unlike other tools such as the Generalised Anxiety Disorder Scale (GAD-7) and the Depression, Anxiety and Stress Scale (DASS-21), the HRSA can detect both physiological and psychological symptoms.^{Reference Thompson14} The HRSA questionnaire is valid in various primary healthcare settings.^{Reference Thompson14–Reference Matza, Morlock, Sexton, Malley and Feltner16} Although the HRSA is a widely used measure for screening anxiety symptoms, there is a relative scarcity of research using both an item response theory (IRT) framework and classical theory parameters.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} IRT modelling has several distinct advantages over classical test theory, such as the modelling of item-latent variable relationships (θ) with nonlinear functions, item parameters in the latent attribute metric and conditional precision parameters.^{Reference Baker and Kim18–Reference Samuel and Bucher22} Through IRT analysis, we could evaluate the psychometric properties of each item, examine their discrimination power, assess the overall scale functioning and explore potential improvements to enhance measurement precision in this context of use.^{Reference De Ayala20–Reference Samuel and Bucher22} However, only a few studies have investigated the psychometric validity of the HRSA employing structural invariance tests, differential item functioning, Wright map and item characteristic curves (ICCs).

Therefore, in this study, the psychometric validity of the HRSA was investigated with an elaborate list of measures from both IRT and robust classical theory parameters, in a sample of Ethiopian university students.

Method

Participants and study design

We conducted an observational study with cross-sectional data collection and random sampling. A sample of 500 students from the Mizan campus of Health Science at Mizan-Tepi University (MTU) was earmarked for participation. Random samples were drawn from class sections comprising pharmacy, midwifery and nursing students. All selected students were invited to participate in the study. The inclusion criterion was registration in MTU courses at the time of the study. Students under 18 years were not included, as their participation would have required the consent of their parents or guardians, most of whom are usually located in remote areas. The findings from the sample that completed this study (N = 370, mean age 21.44 ± 2.30 years) are presented. Participants were given a summary of the research plan in simple language. A researcher’s contact details were shared with participants in case they had any doubts or needed more information. They were informed that all personal data would be kept confidential and that participation was voluntary. There were no rewards and no risks to participants’ health. Participants were free to withdraw at any time without any liabilities, and informed written consent was collected from the participants. The Ethical Institutional Committee, College of Medicine and Health Sciences, MTU, Ethiopia, approved the research plan (approval number DoP/0281/17). The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2013.

Participating students completed the HRSA, DASS-21 and a sociodemographic information sheet in English. The HRSA was administered by researchers who were faculty members of the College of Medicine and Health Sciences, MTU. These faculty members were trained in interpreting and assessing the disease condition. The students filled in the DASS-21 and the sociodemographic information sheet themselves. Participating students had adequate English proficiency, as English is the medium of instruction at MTU.

Instruments

The HRSA

The HRSA is a brief 14-item questionnaire that measures the severity of anxiety symptoms.^{Reference Hamilton13} All items are rated on a Likert scale of 0 (not present) to 4 (very severe). Item scores are added to get a total score (range: 0–56). Higher scores indicate higher severity of anxiety symptoms.^{Reference Hamilton13} The HRSA is widely used in both clinical and research settings. Additionally, the HRSA has been shown to have adequate validity in adolescents and the general population, mostly using classical theory parameters.^{Reference Clark and Donovan23,Reference Hallit, Haddad, Hallit, Akel, Obeid and Haddad24}
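The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' scoring procedure; it assumes each of the 14 items is rated from 0 to 4, which is the only range consistent with the stated total of 0–56, and the function name `hrsa_total` is hypothetical.

```python
from typing import Sequence

N_ITEMS = 14
# Assumption: items are rated 0-4, which matches the stated total range of 0-56.
MAX_ITEM_SCORE = 4


def hrsa_total(item_scores: Sequence[int]) -> int:
    """Sum the 14 item ratings into a total score between 0 and 56."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not 0 <= score <= MAX_ITEM_SCORE:
            raise ValueError(f"item score {score} is outside 0-{MAX_ITEM_SCORE}")
    return sum(item_scores)


# Hypothetical ratings for one respondent, for illustration only.
example = [1, 2, 0, 1, 3, 0, 0, 1, 2, 0, 1, 0, 2, 1]
print(hrsa_total(example))
```

Rejecting out-of-range ratings at scoring time is a design choice of this sketch; it keeps data-entry errors from silently inflating the total.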

Sociodemographic questionnaire

Participating students completed a brief structured questionnaire for sociodemographic information. Information regarding age (years), ethnicity, gender, grade at the most recent examination, self-reported presence of chronic disease/conditions, failure to pass the previous examination and class attendance was collected.

The DASS-21

The DASS-21 is the shortened version of a questionnaire tool with 42 items, i.e. DASS-42. The DASS-21 was developed by Lovibond and Lovibond in 1995.^{Reference Lovibond and Lovibond25} All the items were scored on a scale of 0 (‘Did not apply to me at all’) to 3 (‘Applied to me very much or most of the time’). The DASS-21 has 21 items and three subscales, depression, anxiety and stress, with seven items each. Item scores were added to generate subscale scores (range: 0–21). Higher scores for the DASS-21 subscales indicate increasing severity of depression, anxiety and stress symptoms. The DASS-21 is valid in various settings and populations.^{Reference Henry and Crawford26} In this study, the anxiety subscale of DASS-21 was used to determine the convergent validity of the HRSA.

Data analysis

SPSS (version 23.0 for Windows), JASP (version 0.17.0.0 for Windows; JASP Team, Amsterdam, The Netherlands; https://jasp-stats.org) and JAMOVI (version 2.3.18 for Windows; The Jamovi Project, Sydney, Australia; https://www.jamovi.org) were used for the statistical analysis. Descriptive parameters were used to present participant characteristics and the HRSA item score distribution. The distribution of the HRSA item scores was deemed suitable for performing factor analysis based on Bartlett’s test of sphericity (χ²(91) = 2267.11, P < 0.001) and the Kaiser–Meyer–Olkin (KMO) test of sampling adequacy (0.90); in addition, most of the inter-item correlation coefficients (75 out of 91) were 0.3 and above (Supplementary Table 1 available at https://doi.org/10.1192/bjo.2025.10055).^{Reference Manzar, BaHammam, Hameed, Spence, Pandi-Perumal and Moscovitch27,Reference Manzar, Jahrami and Bahammam28} As the HRSA item scores are ordinal variables, confirmatory factor analysis (CFA) was performed with diagonally weighted least squares (DWLS) with a pairwise deletion method of handling missing values, using JASP 0.17.0.0. Standardised estimates of factor loadings with robust standard errors were determined. CFA assessed the validity of the original one-factor model as well as two distinct two-factor models that had been investigated in previous studies.^{Reference Hallit, Haddad, Hallit, Akel, Obeid and Haddad24,Reference Rodriguez-Seijas, Thompson, Diehl and Zimmerman29} As recommended by recent systematic reviews, we estimated multiple indices to unambiguously assess the fit of a model.^{Reference Manzar, BaHammam, Hameed, Spence, Pandi-Perumal and Moscovitch27,Reference Manzar, Jahrami and Bahammam28} The following fit indices were used: Bollen’s incremental fit index (IFI), parsimony normed fit index (PNFI), chi-squared (χ²) test, comparative fit index (CFI), goodness of fit index (GFI), root mean square error of approximation (RMSEA) and standardised root mean square residual (SRMR).
A value of 0.9 or higher for the CFI, IFI and GFI, and a value of 0.08 or lower for the SRMR and RMSEA, were taken to indicate a good fit. Moreover, a nonsignificant χ²-test may indicate a better fit.^{Reference Manzar, BaHammam, Hameed, Spence, Pandi-Perumal and Moscovitch27,Reference Ullman30} Multi-group CFA was performed to assess the measurement invariance of the HRSA across gender. Three levels of factorial invariance were measured: configural invariance (the same factor structure exists across different groups), metric invariance (the factor loadings for the items are equal across different groups) and scalar invariance (both the factor loadings and the intercepts of the items are equal across different groups). Factorial invariance was established if ∆CFI was less than 0.01 and ∆RMSEA was less than 0.015.^{Reference Chen31} Factorial invariance testing was conducted in JASP 0.17.0.0 with a DWLS estimator, with means and intercepts estimated.
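The decision rules stated above (good-fit cut-offs and the invariance criteria for changes in CFI and RMSEA) can be written as two small checks. This is a sketch under stated assumptions rather than the JASP procedure: the class and function names are hypothetical, the change in each index is taken as an absolute difference between successive nested models, and the numbers in the example are invented for illustration and are not the study's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    cfi: float
    ifi: float
    gfi: float
    srmr: float
    rmsea: float


def is_good_fit(fit: FitIndices) -> bool:
    """CFI, IFI and GFI of 0.9 or higher; SRMR and RMSEA of 0.08 or lower."""
    incremental_ok = min(fit.cfi, fit.ifi, fit.gfi) >= 0.90
    residual_ok = max(fit.srmr, fit.rmsea) <= 0.08
    return incremental_ok and residual_ok


def invariance_step_holds(previous: FitIndices, constrained: FitIndices) -> bool:
    """Change in CFI below 0.01 and change in RMSEA below 0.015 between nested models.

    Assumption: the change is measured as an absolute difference.
    """
    delta_cfi = abs(constrained.cfi - previous.cfi)
    delta_rmsea = abs(constrained.rmsea - previous.rmsea)
    return delta_cfi < 0.01 and delta_rmsea < 0.015


# Hypothetical values for illustration only; these are not the study's results.
configural = FitIndices(cfi=0.970, ifi=0.970, gfi=0.980, srmr=0.060, rmsea=0.050)
metric = FitIndices(cfi=0.966, ifi=0.966, gfi=0.978, srmr=0.064, rmsea=0.052)

print(is_good_fit(configural))
print(invariance_step_holds(configural, metric))
```

In use, the second check would be applied twice in sequence: configural against metric, then metric against scalar.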

The polytomous rating scale model was used because all 14 items of the HRSA are rated on Likert scales with scores ranging from 0 to 4.^{Reference Sözer and Kahraman32} Parametric IRT analysis properties such as item difficulty, an information-weighted fit statistic (infit) mean square (MnSq) and outlier-sensitive fit statistic (outfit) MnSq, and thresholds (τi1, τi2, τi3 and τi4) were determined with the eRm R package through the snowIRT module of JAMOVI 2.3.18, using marginal maximum likelihood estimates.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} Furthermore, the Wright map person–item distribution and ICCs were used to show the graphical representation of the estimated item parameters.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} The differential item functioning (DIF) test was performed with the difNLR package of R in JAMOVI 2.3.18. DIF was estimated with the following settings: an adjacent-category model,^{Reference Tutz33} type of DIF (uniform and non-uniform),^{Reference Tutz33} matching criterion z-score^{Reference Hladká and Martinková34} and Bonferroni multiple comparison adjustments. Item purification was not used as there were no concerns about DIF, and the HRSA tool is short. The difNLR::difORD function was used for ordinal data, which employs generalised logistic regression models for DIF estimation.^{Reference Hladká and Martinková34}
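To make the role of item difficulty and the four thresholds concrete, the category probabilities of a rating scale model can be computed directly. The sketch below uses the standard Andrich formulation, in which the log-odds of category k over category k − 1 equal θ minus the item difficulty minus the k-th threshold; the parameterisation used internally by eRm may differ, and the threshold values in the example are hypothetical, not estimates from this study.

```python
import math
from typing import List, Sequence


def rsm_category_probs(theta: float, difficulty: float,
                       thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the Andrich rating scale model.

    With four thresholds there are five response categories (0 to 4).
    """
    # Cumulative logits; category 0 is the reference with logit 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - difficulty - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [weight / total for weight in weights]


def thresholds_ordered(thresholds: Sequence[float]) -> bool:
    """True when each threshold is strictly larger than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for illustration only; not estimates from the study.
taus = [-2.0, -1.0, 1.0, 2.0]
probs = rsm_category_probs(theta=0.0, difficulty=0.0, thresholds=taus)
print([round(p, 3) for p in probs])
print(thresholds_ordered(taus))
```

Evaluating `rsm_category_probs` over a grid of θ values traces the category curves for an item, and ordered thresholds mean that each response category is the most probable one over some interval of θ.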

Mean and s.d. of the HRSA scores, percentage distribution across item scores, Cronbach’s α, McDonald’s ω, item-rest correlations, skewness and kurtosis were determined with JAMOVI 2.3.18. Concurrent validity was evaluated by a receiver operating characteristic (ROC) curve analysis, using SPSS 23.0. A dichotomised score of the anxiety subscale of the DASS-21 was used as the state variable, and the HRSA total score as the test variable in the ROC curve analysis. A score of 10 or above on the anxiety subscale of the DASS-21 was taken to indicate moderate to severe anxiety.^{Reference Lovibond and Lovibond25,Reference Lovibond35} Further, the area under the curve (AUC) was calculated. A cut-off score for the HRSA total score was determined at the point of highest accuracy, that is, the score at which the sum of sensitivity and specificity is highest.
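The cut-off rule described above (dichotomise the DASS-21 anxiety subscale at 10, then pick the HRSA score that maximises sensitivity plus specificity) can be sketched as follows. This is an illustration and not the SPSS procedure: the function names are hypothetical, candidate cut-offs are assumed to be midpoints between adjacent observed scores, a case counts as positive when its score is at or above the cut-off, and the data are invented.

```python
from typing import List, Sequence, Tuple


def dichotomise(dass_anxiety: Sequence[float], threshold: float = 10) -> List[int]:
    """1 when the anxiety subscale score is at or above the threshold, else 0."""
    return [1 if score >= threshold else 0 for score in dass_anxiety]


def best_cutoff(test_scores: Sequence[float],
                state: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximising sensitivity + specificity.

    Assumption: candidate cut-offs are midpoints between adjacent distinct scores.
    """
    positives = sum(state)
    negatives = len(state) - positives
    distinct = sorted(set(test_scores))
    if positives == 0 or negatives == 0 or len(distinct) < 2:
        raise ValueError("need both outcome classes and at least two distinct scores")
    best = None
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2
        true_pos = sum(1 for s, y in zip(test_scores, state) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(test_scores, state) if y == 0 and s < cutoff)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        if best is None or sensitivity + specificity > best[1] + best[2]:
            best = (cutoff, sensitivity, specificity)
    return best


# Invented toy data for illustration only; not the study's data.
hrsa_totals = [3, 6, 9, 11, 14, 18, 22, 27]
dass_anxiety = [0, 2, 4, 8, 10, 12, 16, 20]
labels = dichotomise(dass_anxiety)
print(best_cutoff(hrsa_totals, labels))
```

Maximising the sum of sensitivity and specificity is equivalent to maximising Youden's index, which is that sum minus one.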

Results

Participants’ characteristics

The mean values of age, grade point average (scale of 4.0), anxiety subscale score of the DASS-21 and the HRSA total score were 21.44 ± 2.30 years, 3.1 ± 0.58, 13.01 ± 9.14 and 15.02 ± 9.66, respectively (Table 1). Amhara and Oromo linguistic ethnicities formed the majority of the participants (63.5%). More than two-thirds of the participating students were male (67.8%). The prevalence of self-reported chronic disease/health conditions was high (18.9%). The presence of backlogs in previous examinations was common, with one in seven students reporting it (14.1%).

Table 1 Participant characteristics of the university students

GPA, grade point average; DASS-21, Depression, Anxiety and Stress Scale - 21 Item; HRSA, Hamilton Rating Scale for Anxiety.

Factor analysis

CFA

All three models performed very similarly, as indicated by multiple fit indices: CFI, GFI and IFI were above 0.95, and SRMR and RMSEA were < 0.06 for all three models (Table 2). Both two-factor models had slightly better values for all fit indices in the CFA except the parsimony-adjusted PNFI (Table 2), but had high interfactor correlations of 0.84 and 0.87 (Supplementary Fig. 1). Therefore, the one-factor solution was deemed suitable. According to the criteria of Comrey and Lee, an average factor loading of 0.65 indicates a very good level of overlap in the variance of HRSA item scores (Table 3).^{Reference Comrey and Lee36}

Table 2 Fit statistics of the Hamilton Rating Scale for Anxiety scores in university students

GFI, goodness of fit index; IFI, Bollen’s incremental fit index; CFI, comparative fit index; SRMR, standardised root mean square residual; RMSEA, root mean square error of approximation.

Table 3 Factor loading of some reported models of the Hamilton Rating Scale for Anxiety (HRSA) in university students

Structural invariance: one-factor model multi-group CFA across gender groups

The configural model tests the basic structure of the latent variable across genders, the metric model tests the factor loadings of the latent variable to be equal across genders and the scalar model tests both the factor loadings and the intercepts of the latent variable to be equal across genders.

The ∆χ², ∆CFI and ∆SRMR values show the change in fit statistics between the current model and the previous model. If the ∆χ² value is significant at the 0.05 level, it indicates that the current model has a significantly worse fit than the previous model. Conversely, if the ∆χ² value is not significant, it suggests that the current model is not significantly worse than the previous model. The CFI and SRMR values should be close to 1 and 0, respectively, indicating a good fit. The RMSEA value should be less than 0.08 for an acceptable fit, and less than 0.05 for a good fit.

The results suggest that the configural model has a good fit, as the CFI and RMSEA values are both acceptable (Table 4). The metric and scalar models both have a slightly worse fit than the configural model, but they still have acceptable fit indices. The ∆χ² values between the configural and metric models and the metric and scalar models are not significant, indicating that there is no significant difference in fit between these models (Table 4). However, the ∆RMSEA and ∆SRMR values suggest that there may be some improvement in fit from the metric to the scalar model. In conclusion, measurement invariance is confirmed across gender subgroups (Table 4).

Table 4 Structural invariance of the one-factor model of the Hamilton Rating Scale for Anxiety in university students across gender groups

CFI, comparative fit index; SRMR, standardised root mean square residual; RMSEA, root mean square error of approximation.

HRSA item analysis: classical theory parameters

There was no trend in the missing values of the HRSA scores: 0.4% missing values (19 values out of 5180) in 3.5% of cases (13 out of 370) for eight HRSA item scores. There was no major issue of non-normality in the HRSA item score distribution, as the absolute values of skewness (highest value was 1.22 for HRSA item 10) and kurtosis (highest value was 0.68 for HRSA item 10) were less than 2 and 7, respectively (Table 5),^{Reference Kim37} which was also supported by visual inspection. All HRSA items showed a floor effect (>15% of respondents recorded the lowest possible score).^{Reference Lim, Harris, Dawson, Beard, Fitzpatrick and Price38,Reference Manzar, Albougami, Salahuddin, Sony, Spence and Pandi-Perumal39} No ceiling/floor effect was seen for the HRSA total score (observed range: 0–52; 3.5% (13 students) recorded the lowest score of 0, and none recorded the highest possible score of 56).^{Reference Lim, Harris, Dawson, Beard, Fitzpatrick and Price38,Reference Manzar, Albougami, Salahuddin, Sony, Spence and Pandi-Perumal39}
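The floor and ceiling criterion used above (more than 15% of respondents at the lowest or highest possible score) reduces to two proportions. The sketch below is a minimal illustration with hypothetical function names; the total-score example reuses only the reported counts (13 of 370 respondents at 0, none at 56), and every other value in the example is a placeholder.

```python
from typing import Sequence, Tuple


def extreme_shares(scores: Sequence[int], lowest: int, highest: int) -> Tuple[float, float]:
    """Proportion of respondents at the lowest and at the highest possible score."""
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return floor_share, ceiling_share


def floor_ceiling_effects(scores: Sequence[int], lowest: int, highest: int,
                          cutoff: float = 0.15) -> Tuple[bool, bool]:
    """(floor effect, ceiling effect) using the more-than-15% criterion."""
    floor_share, ceiling_share = extreme_shares(scores, lowest, highest)
    return floor_share > cutoff, ceiling_share > cutoff


# Total score: 13 of 370 respondents at 0 and none at 56, as reported above.
# The value 20 for the remaining 357 respondents is a placeholder.
totals = [0] * 13 + [20] * 357
print(extreme_shares(totals, lowest=0, highest=56))
print(floor_ceiling_effects(totals, lowest=0, highest=56))

# Hypothetical single item where 100 of 370 respondents scored 0.
item = [0] * 100 + [1] * 270
print(floor_ceiling_effects(item, lowest=0, highest=4))
```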

Table 5 Item analysis, internal homogeneity, of Hamilton Rating Scale for Anxiety (HRSA) scores in university students

*P < 0.001.

HRSA item analysis: Rasch rating scale model parameters

HRSA item 10 (respiratory symptoms) was the most difficult item, and HRSA item 3 (fears of the dark, strangers, etc.) was the easiest item, as indicated by item difficulty/severity scores of 0.333 and −0.380, respectively (Table 6). Infit and outfit statistics of the HRSA item scores were in the desirable range: 0.6–1.4. Four threshold estimates (τi1, τi2, τi3 and τi4) were determined for all 14 HRSA items, as these are scored from 0 to 4 with five response levels. For all 14 HRSA item scores, the thresholds were ordered (τi1 < τi2 < τi3 < τi4) (Table 6).

Table 6 Summary of item difficulty, polytomous mean-square fit statistics (infit, outfit) and threshold (τi) statistics of the rating scale model: Hamilton Rating Scale for Anxiety (HRSA)

All of the HRSA items showed invariance across gender groups, as indicated by non-significant likelihood ratio chi-squared statistics at adjusted P-values for both uniform and non-uniform estimates (Table 7). A visual inspection of the Wright map (Fig. 1) shows that the spread of person ability, shown in the left panel, and the spread of item difficulty, shown in the right panel, do not match. The difficulty levels of the HRSA items mostly correspond to people with higher ability levels. An inspection of ICCs revealed that the response levels functioned as expected for all of the HRSA item scores. For instance, at a latent dimension of 1.0 (Supplementary Fig. 2), the probability of the third response level (approximately 38%) is similar for all HRSA item scores.

Table 7 Differential item function (DIF) test on the Hamilton Rating Scale for Anxiety (HRSA) in university students across gender groups

Fig. 1 Wright map person–item distribution for individual items of the Hamilton Rating Scale for Anxiety (HRSA). H1 to H14 are items of the HRSA.

Internal consistency and item discrimination

McDonald’s ω and Cronbach’s α for the HRSA tool were both 0.88. McDonald’s ω if the item was deleted and Cronbach’s α if the item was deleted ranged from 0.87 to 0.88 and from 0.86 to 0.87, respectively (Table 5). The correlation coefficients of the item-total and item-rest scores of the HRSA ranged from 0.55 to 0.71 and from 0.47 to 0.65, respectively.
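For readers less familiar with these statistics, Cronbach's α and item-rest correlations can be computed from a respondents-by-items score matrix as sketched below. This is a minimal illustration, not the JAMOVI implementation: it uses the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the total score) and a Pearson correlation between each item and the sum of the remaining items. The function names and the toy data are hypothetical, and McDonald's ω is not covered because it requires a fitted factor model.

```python
from statistics import mean, variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per respondent, one column per item."""
    k = len(rows[0])
    columns = list(zip(*rows))
    item_variance_sum = sum(variance(column) for column in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def item_rest_correlations(rows: Sequence[Sequence[float]]) -> List[float]:
    """Correlation of each item with the sum of all other items."""
    columns = list(zip(*rows))
    result = []
    for index, column in enumerate(columns):
        rest = [sum(row) - row[index] for row in rows]
        result.append(pearson(column, rest))
    return result


# Invented toy data (5 respondents, 3 items) for illustration only.
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 4], [4, 4, 3]]
print(round(cronbach_alpha(toy), 2))
print([round(r, 2) for r in item_rest_correlations(toy)])
```

The item-rest correlation removes the item's own contribution from the total, which is why it is lower than the corresponding item-total correlation.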

Convergent validity: correlation between HRSA score and self-reported measure of anxiety

All of the correlation coefficients between HRSA scores (item and total) and the anxiety subscale of the DASS-21 were significant and varied between 0.29 and 0.56 (Supplementary Table 2). The ROC test showed that an HRSA total score of 13.5 and above had a sensitivity of 72.1% and a specificity of 74.4% for screening cases of moderate to severe anxiety, as determined by the score of the anxiety subscale of the DASS-21. The area under the curve (AUC) was 0.78 (95% CI 0.73–0.83, P < 0.001) (Supplementary Fig. 3).

Discussion

To the best of our knowledge, this is the first study to employ an appropriate Rasch rating scale approach for IRT-based psychometric measures, including the ICC and Wright map, as well as the first to record statistical evidence for both structural- and item-level measurement invariance for the one-factor structure of the HRSA across gender groups. It is also the first comprehensive psychometric study of the HRSA in a sample of collegiate young adults in an African country, and one of very few studies on the factor analysis of the HRSA that verified the assumptions of various sample size adequacy measures, including the KMO test, Bartlett’s test of sphericity and the inter-item correlation matrix. In brief, the study found evidence that the HRSA showed very good psychometric properties (summary in Table 8) in this sample of university students in Ethiopia.

Table 8 Summary of psychometric tests performed in the university students

CFA, confirmatory factor analysis; HRSA, Hamilton Rating Scale for Anxiety; DIF, differential item function; DASS-21, Depression, Anxiety and Stress Scale 21 Items; ROC, receiver operating characteristic.

CFA and structural invariance

A one-factor structure of the HRSA was deemed suitable because the comparative CFA was inconclusive, as revealed by similar fit indices for the three models. However, high interfactor correlation coefficients for the two two-factor models made them untenable because of concerns about the divergent validity of the factor constructs. High interfactor correlation coefficients (0.85 and above) violate divergent validity requirements for the different latent constructs.^{Reference Manzar, BaHammam, Hameed, Spence, Pandi-Perumal and Moscovitch27,Reference Manzar, Albougami, Hassen, Sikkandar, Pandi-Perumal and Bahammam40} Moreover, the parsimony-adjusted PNFI was slightly higher for the one-factor model. Similarly, a study conducted among 725 adults with or without symptoms of depression reported an acceptable one-factor model of the HRSA that explained 34.6 and 38.5% of variance across groups.^{Reference Rodriguez-Seijas, Thompson, Diehl and Zimmerman29} The study further concluded that the anxious-distress specifier evaluates a unidimensional construct with close associations to both the psychological and somatic manifestations of anxiety.^{Reference Rodriguez-Seijas, Thompson, Diehl and Zimmerman29} This indicates that the HRSA measures both physical and psychological manifestations as a unidimensional construct. It is difficult to establish invariance, but given the outcome of the multi-group CFA across genders in this study, it is unlikely that responses vary by gender. This further supports the unidimensionality of the HRSA.^{Reference Manzar, Albougami, Hassen, Sikkandar, Pandi-Perumal and Bahammam40} The one-factor structure was found to show invariance at three levels of measurement, i.e. configural, metric and scalar, across gender groups. Few studies have reported structural invariance across gender for the HRSA, although it has been in wide clinical and research use since its development some six decades ago.
There is much disparity in the findings about the structural validity of the HRSA, with studies reporting validity of one-, two- and three-factor models in different demographics such as adolescents, non-clinical samples, people living with Parkinson’s disease and patients visiting psychiatry clinics.^{Reference Maier, Buller, Philipp and Heuser15,Reference Matza, Morlock, Sexton, Malley and Feltner16,Reference Clark and Donovan23,Reference Rodriguez-Seijas, Thompson, Diehl and Zimmerman29} The disparity in findings of previous studies on the dimensionality of the HRSA may be partly explained by the non-reporting of structural invariance measures.^{Reference Maier, Buller, Philipp and Heuser15,Reference Matza, Morlock, Sexton, Malley and Feltner16,Reference Clark and Donovan23,Reference Rodriguez-Seijas, Thompson, Diehl and Zimmerman29}

For a questionnaire to be measurement invariant, it must measure identical constructs with the same structure across all groups. Using a multi-group CFA, our procedure for testing the measurement invariance of the 14-item questionnaire included three sequential steps requiring increasingly stringent equality constraints on between-group model parameters. This is the first study to demonstrate measurement invariance for the one-factor solution of the HRSA across three levels, i.e. configural, metric and scalar, across gender groups. The results of these tests provided additional support for the one-factor model of the HRSA in the study population.

Item evaluation with classical theory

Results indicated no significant deviation from the univariate normal distribution for items, factors and total scores of the HRSA; this suggests that the score distribution followed a pattern that is typical for a general population.^{Reference Nguyen, Han, Kim and Chan21} This lends credibility to the study’s findings as a whole. In addition, the absence of a ceiling/floor effect for the HRSA total score suggests that the total score can differentiate between groups even at extreme scores. All of the HRSA item scores had floor effects, but this may be attributable to the non-clinical make-up of the study population, i.e. young people attending university. Similarly, a clinical construct for assessing insomnia severity showed floor effects for item scores, but not for the construct-level measures.^{Reference Kim37}
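As an illustration of how floor and ceiling effects are typically quantified, the sketch below computes the share of respondents at the lowest and highest possible score. The 15% cut-off is a widely used convention and is an assumption here, since this section does not state the criterion applied; the item scores are invented for the example and assume the 0 to 4 item scoring of the HRSA.

```python
def floor_ceiling_share(scores, min_score, max_score):
    """Return the proportions of respondents at the minimum and maximum possible score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


def has_effect(share, threshold=0.15):
    """Flag a floor or ceiling effect when the share exceeds the assumed 15% convention."""
    return share > threshold


# Invented item scores on an assumed 0-4 scale (not study data).
item_scores = [0, 0, 0, 1, 2, 4, 0, 3, 0, 1]
floor, ceiling = floor_ceiling_share(item_scores, min_score=0, max_score=4)
print(floor, has_effect(floor), ceiling, has_effect(ceiling))
```

The same function applies to the total score by passing its possible range, which is how an item-level floor effect can coexist with no floor effect at the total-score level.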

Convergent validity: correlation between HRSA score and self-reported measure of anxiety

A significant positive correlation with the self-reported measure of anxiety indicated the convergent validity of the HRSA scores. However, the correlations between the HRSA item scores and the anxiety subscale of the DASS-21 were all weak, and the correlation between the HRSA total score and the anxiety subscale of the DASS-21 was moderate and positive.^{Reference Lovibond and Lovibond25} Similarly, HRSA scores were shown to have low-to-moderate correlations with self-reported measures of anxiety and fear among American adolescents.^{Reference Lovibond and Lovibond25} Furthermore, in people living with Parkinson’s disease, a modest level of convergent validity was found for the HRSA score.^{Reference Maier, Buller, Philipp and Heuser15} Our results concerning convergent validity are similar to those of previous studies,^{Reference Clark and Donovan23,Reference Maier, Buller, Philipp and Heuser15} which may suggest that further modification of the scale items could improve convergent validity. However, two issues may explain this modest correlation. First, the HRSA comprises items that assess anxiety symptoms across many physiological systems, unlike other measures that usually consider only psychological symptoms. Second, the HRSA is an expert-administered tool whose item scores are assigned by experts,^{Reference Hamilton13,Reference Maier, Buller, Philipp and Heuser15,Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17,Reference Clark and Donovan23} whereas the DASS-21 and the measures used for convergent validity assessment in previous studies are mostly self-reported. Therefore, it is reasonable to expect that some of the variance will not be accounted for when an expert-level evaluation is correlated with self-reported measures. Future studies may better explore the convergent and divergent validity of the HRSA by using similar measures administered by experts/health professionals.
Furthermore, the convergent validity of the HRSA with respect to the anxiety subscale of DASS-21 was evidenced by the adequate level of sensitivity, specificity and AUC.
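The three diagnostic accuracy measures named above can be sketched as follows. The data, the cut-off of 15 and the function names are hypothetical; the sketch assumes that a binary reference classification (here standing in for one derived from the DASS-21 anxiety subscale) is available for each respondent.

```python
def sensitivity_specificity(scores, is_case, cutoff):
    """Sensitivity and specificity when scores >= cutoff are classified as positive."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, is_case):
    """Area under the ROC curve as the probability that a case outscores a non-case (ties count half)."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


# Invented total scores and reference classification (not study data).
total_scores = [5, 12, 18, 25, 16, 10]
reference_case = [False, False, True, True, False, True]
sens, spec = sensitivity_specificity(total_scores, reference_case, cutoff=15)
print(sens, spec, auc(total_scores, reference_case))
```

The pairwise formulation of the AUC is cutoff-free, whereas sensitivity and specificity depend on the chosen cut-off.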

HRSA item analysis: Rasch rating scale model parameters

A direct comparison with previous studies is not possible because, to the best of our knowledge, there are no reports of item difficulty level or MnSq infit/outfit of HRSA scores based on the rating scale model. The item difficulty analysis in this study showed that the easier items were items 3 (fears), 6 (depressed mood), 2 (tension) and 4 (insomnia), whereas the more difficult items were items 10 (respiratory symptoms), 12 (genitourinary symptoms), 13 (autonomic symptoms) and 8 (somatic sensory symptoms). Therefore, there appears to be a trend wherein items with higher difficulty levels assessed somatic symptoms, whereas items assessing psychic anxiety had lower difficulty levels.^{Reference Hamilton13} In their Rasch analysis of the HRSA in Parkinson’s disease, Forjaz et al found that the three most difficult items were item 9 (cardiovascular symptoms), item 14 (behaviour at interview) and item 10 (respiratory symptoms), and that the easier items were item 1 (anxious mood), item 8 (somatic sensory) and item 2 (tension).^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} Both studies thus agree on item 10 (respiratory symptoms), which was among the most difficult, and item 2 (tension), which was among the easiest. However, most of the other trends in difficulty level differed between the two studies; for example, item 8 (somatic sensory) was among the more difficult items here but among the easier items in Parkinson’s disease.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} This may be related to differences in the statistical approach and sample characteristics: Forjaz et al used Rasch analysis in patients with Parkinson’s disease,^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} whereas we implemented a rating scale model in university-attending young adults who were enrolled in health science courses. The findings of this study support the practical use of the HRSA in university students in Ethiopia.
Future research may further explore culture-specific development and adaptation of the scale, and may help establish an evidence-based clinical application in the Ethiopian population in general.

The outfit statistic is sensitive to unexpected observations by person or item, whereas the infit statistic is sensitive to unexpected patterns where residuals are close to estimated individual abilities.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} In general, MnSq fit indices in the range of 0.5–1.5 imply that the measurement done by item scores is productive.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} Greater values indicate underfit between the items and the model, whereas lower values indicate overfit. An MnSq of 0.6–1.4 represents the optimal range for rating scale surveys.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} In this study, the range was 0.78–1.18, which lies within this optimal range and indicates that the HRSA items are suitable for rating scale surveys or clinical observation.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} Nonetheless, most of the HRSA items are overloaded and sometimes appraise as many as 11 signs and symptoms, as in the case of item 11 (gastrointestinal symptoms). Therefore, our findings suggest that clinical implementation of the HRSA, even by experts, may benefit from attempts to simplify the HRSA items.
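For readers unfamiliar with these indices, the sketch below shows the standard way infit and outfit mean squares are computed for one item from observed scores, model-expected scores and model variances. The input values are invented, and the function names are hypothetical; this is not the software routine used in the study.

```python
def mnsq_fit(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item across persons.

    outfit: unweighted mean of squared standardised residuals (outlier sensitive).
    infit:  squared residuals summed and divided by summed model variance
            (information weighted, so well-targeted responses count more).
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(observed)
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def within_range(value, low=0.6, high=1.4):
    """Check a mean square against the 0.6-1.4 window cited for rating scale surveys."""
    return low <= value <= high


# Invented responses, expectations and variances for four persons (not study data).
infit, outfit = mnsq_fit(
    observed=[1, 0, 2, 3],
    expected=[1.5, 0.5, 2.0, 2.0],
    variance=[0.5, 0.25, 1.0, 1.0],
)
print(round(infit, 3), round(outfit, 3), within_range(outfit))
```

Applied to the limits reported in this study, `within_range(0.78)` and `within_range(1.18)` both return `True`.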

The response-level thresholds were ordered, but the differences between τi3 and τi4 were narrow for all items. This was also evident in the near overlap of the category 3 and category 4 response slopes in the ICCs. Taken together, these two findings, i.e. a narrow gap between τi3 and τi4 and the near overlap of the category 3 and category 4 response slopes, suggest that category 3 and category 4 responses may be merged. However, a degree of caution is warranted because HRSA items are highly loaded, and a further decrease in the number of response categories may result in ambiguities in response allocation by the interviewers.
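The consequence of a narrow gap between adjacent thresholds can be seen in a minimal sketch of the category probabilities under the rating scale model, in which the log-odds of category k over category k-1 equal the person ability minus the item difficulty minus the k-th threshold. The threshold values below are hypothetical and chosen only to mimic a narrow gap between the third and fourth thresholds; they are not the study estimates.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities (0..m) under the rating scale model.

    theta: person ability, delta: item difficulty, taus: thresholds tau_1..tau_m
    shared across items. Category k has log-weight sum_{j<=k} (theta - delta - tau_j).
    """
    cumulative = [0.0]  # category 0 has log-weight 0 by convention
    for tau in taus:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    peak = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical ordered thresholds with a narrow gap between the third and fourth.
taus = [-2.0, -0.5, 1.0, 1.2]
for theta in (0.9, 1.1, 1.5):
    probs = rsm_category_probs(theta, delta=0.0, taus=taus)
    print(theta, probs.index(max(probs)))
```

With ordered thresholds, category 3 is the most probable response only for abilities between the third and fourth thresholds, so a narrow gap leaves it a very small interval, which is the statistical argument for merging it with category 4.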

A visual inspection of the Wright map (Fig. 1) shows that the distribution of person ability (left panel) and the distribution of item difficulty (right panel) were asymmetrical. The difficulty levels of the HRSA items mostly correspond to persons with higher ability levels. These results are in line with the expert, interviewer-administered nature of the HRSA. This implies that more effort is needed to simplify some of the HRSA items and decrease their difficulty levels, which may help bring the width of the ability distribution closer to that of the item difficulty levels.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17} Finally, in the present study, item-level invariance was noted for all the HRSA items across gender groups. Forjaz et al found evidence for item-level invariance for only two items of the HRSA in their Rasch analysis in Parkinson’s disease.^{Reference Forjaz, Martinez-Martin, Dujardin, Marsh, Richard and Starkstein17}

Limitations

It is important to highlight some limitations that bear on the generalisability and implications of the findings. The results may not be generalisable to the general population because university students differ from the general population. The study sample had more male students, within a narrow age range, and mostly from health sciences disciplines, some of whose courses are particularly demanding. Therefore, wider generalisability needs further exploration with more representative samples. Convenience sampling was used to maximise participation, and such a sampling procedure may limit generalisability. Nevertheless, the study’s sample size was substantial, and the absence of significant skewness/kurtosis issues lends some support to the representativeness of the study sample. Moreover, robust measures of psychometric validity testing using both classical theory and rating scale model parameters were utilised in this study. Future research may investigate temporal measurement invariance, as well as invariance across other sociodemographic parameters.

In conclusion, measures from both classical theory and IRT analysis demonstrated that the HRSA possessed robust psychometric validity. Nonetheless, there is evidence that future research efforts could increase the practical utility of the HRSA. Such efforts may investigate (a) simplifying HRSA items, (b) exploring response levels further, (c) establishing item-level invariance across different demographic characteristics and (d) improving the alignment between the distributions of person ability and item difficulty.

Supplementary material

The supplementary material is available online at https://doi.org/10.1192/bjo.2025.10055

Data availability

The raw data supporting the conclusions of this article are submitted as Supplementary Material.

Author contributions

M.D.M. conceptualised the study, curated the data and conducted the formal analysis. F.Z.K. wrote the original draft of the manuscript and reviewed and edited the manuscript. M.S. conceptualised the study, curated the data, wrote the original draft of the manuscript and reviewed and edited the manuscript. D.N. and H.A.A. conceptualised the study, curated the data and reviewed and edited the manuscript. S.R.P.-P., A.H.P. and A.S.B. conceptualised the study and reviewed and edited the manuscript. All authors have approved the final draft.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at Majmaah University for funding this work under project number R-2025-1772.

Declaration of interest

None.

References

World Health Organization. COVID-19 Pandemic Triggers 25% Increase in Prevalence of Anxiety and Depression Worldwide. WHO, 2022 (https://www.who.int/news/item/02-03-2022-covid-19-pandemic-triggers-25-increase-in-prevalence-of-anxiety-and-depression-worldwide).

Hirschfeld, RM. The comorbidity of major depression and anxiety disorders: recognition and management in primary care. Prim Care Companion J Clin Psychiatry 2001; 3: 244–54.

Kessler, RC, Stang, PE, Wittchen, HU, Ustun, TB, Roy-Burne, PP, Walters, EE. Lifetime panic-depression comorbidity in the National Comorbidity Survey. Arch Gen Psychiatry 1998; 55: 801–8. doi:10.1001/archpsyc.55.9.801

McKnight, PE, Monfort, SS, Kashdan, TB, Blalock, DV, Calton, JM. Anxiety symptoms and functional impairment: a systematic review of the correlation between the two measures. Clin Psychol Rev 2016; 45: 115–30. doi:10.1016/j.cpr.2015.10.005

Mahmud, S, Hossain, S, Muyeed, A, Islam, MM, Mohsin, M. The global prevalence of depression, anxiety, stress, and, insomnia and its changes among health professionals during COVID-19 pandemic: a rapid systematic review and meta-analysis. Heliyon 2021; 7: e07393. doi:10.1016/j.heliyon.2021.e07393

Nochaiwong, S, Ruengorn, C, Thavorn, K, Hutton, B, Awiphan, R, Phosuya, C, et al. Global prevalence of mental health issues among the general population during the coronavirus disease-2019 pandemic: a systematic review and meta-analysis. Sci Rep 2021; 11: 10173. doi:10.1038/s41598-021-89700-8

Covid-Mental Disorders Collaborators. Global prevalence and burden of depressive and anxiety disorders in 204 countries and territories in 2020 due to the COVID-19 pandemic. Lancet 2021; 398: 1700–12. doi:10.1016/S0140-6736(21)02143-7

Czyz, EK, Horwitz, AG, Eisenberg, D, Kramer, A, King, CA. Self-reported barriers to professional help seeking among college students at elevated risk for suicide. J Am Coll Health 2013; 61: 398–406. doi:10.1080/07448481.2013.820731

Park, SY, Andalibi, N, Zou, Y, Ambulkar, S, Huh-Yoo, J. Understanding students’ mental well-being challenges on a university campus: interview study. JMIR Form Res 2020; 4: e15962. doi:10.2196/15962

Eisenberg, D, Golberstein, E, Gollust, SE. Help-seeking and access to mental health care in a university student population. Med Care 2007; 45: 594–601.

Givens, JL, Tjia, J. Depressed medical students’ use of mental health services and barriers to use. Acad Med 2002; 77: 918–21. doi:10.1097/00001888-200209000-00024

Kessler, D, Lloyd, K, Lewis, G, Gray, DP. Cross sectional study of symptom attribution and recognition of depression and anxiety in primary care. BMJ 1999; 318: 436–9. doi:10.1136/bmj.318.7181.436

Hamilton, M. The assessment of anxiety states by rating. Br J Med Psychol 1959; 32: 50–5. doi:10.1111/j.2044-8341.1959.tb00467.x

Thompson, E. Hamilton rating scale for anxiety (HAM-A). Occup Med (Lond) 2015; 65: 601.

Maier, W, Buller, R, Philipp, M, Heuser, I. The Hamilton Anxiety Scale: reliability, validity and sensitivity to change in anxiety and depressive disorders. J Affect Disord 1988; 14: 61–8. doi:10.1016/0165-0327(88)90072-9

Matza, LS, Morlock, R, Sexton, C, Malley, K, Feltner, D. Identifying HAM-A cutoffs for mild, moderate, and severe generalized anxiety disorder. Int J Methods Psychiatr Res 2010; 19: 223–32. doi:10.1002/mpr.323

Forjaz, MJ, Martinez-Martin, P, Dujardin, K, Marsh, L, Richard, IH, Starkstein, SE, et al. Rasch analysis of anxiety scales in Parkinson’s disease. J Psychosom Res 2013; 74: 414–9. doi:10.1016/j.jpsychores.2013.02.009

Baker, FB, Kim, S. The Basics of Item Response Theory Using R. Springer, 2017. doi:10.1007/978-3-319-54205-8

Dai, S, Vo, TT, Kehinde, OJ, He, H, Xue, Y, Demir, C, et al. Performance of polytomous IRT models with rating scale data: an investigation over sample size, instrument length, and missing data. Front Educ 2021; 6: 721963.

De Ayala, RJ. The Theory and Practice of Item Response Theory. Guilford Publications, 2013.

Nguyen, TH, Han, HR, Kim, MT, Chan, KS. An introduction to item response theory for patient-reported outcome measurement. Patient 2014; 7: 23–35. doi:10.1007/s40271-013-0041-0

Samuel, DB, Bucher, MA. Translating item response theory findings for clinical practice. J Pers Assess 2019; 101: 452–3.

Clark, DB, Donovan, JE. Reliability and validity of the Hamilton Anxiety Rating Scale in an adolescent sample. J Am Acad Child Adolesc Psychiatry 1994; 33: 354–60. doi:10.1097/00004583-199403000-00009

Hallit, S, Haddad, C, Hallit, R, Akel, M, Obeid, S, Haddad, G, et al. Validation of the Hamilton anxiety rating scale and state trait anxiety inventory A and B in Arabic among the Lebanese population. Clin Epidemiol Global Health 2020; 8: 1104–9. doi:10.1016/j.cegh.2020.03.028

Lovibond, SH, Lovibond, PF. Manual for the Depression Anxiety Stress Scales 2nd ed. Psychology Foundation, 1995.

Henry, JD, Crawford, JR. The short-form version of the Depression Anxiety Stress Scales (DASS-21): construct validity and normative data in a large non-clinical sample. Br J Clin Psychol 2005; 44: 227–39. doi:10.1348/014466505X29657

Manzar, MD, BaHammam, AS, Hameed, UA, Spence, DW, Pandi-Perumal, SR, Moscovitch, A, et al. Dimensionality of the Pittsburgh Sleep Quality Index: a systematic review. Health Qual Life Outcomes 2018; 16: 89. doi:10.1186/s12955-018-0915-x

Manzar, MD, Jahrami, HA, Bahammam, AS. Structural validity of the Insomnia Severity Index: a systematic review and meta-analysis. Sleep Med Rev 2021; 60: 101531. doi:10.1016/j.smrv.2021.101531

Rodriguez-Seijas, C, Thompson, JS, Diehl, JM, Zimmerman, M. A comparison of the dimensionality of the Hamilton Rating Scale for anxiety and the DSM-5 Anxious-Distress Specifier Interview. Psychiatry Res 2020; 284: 112788.

Ullman, JB. Structural Equation Modeling 4th ed. Allyn & Bacon, 2001.

Chen, FF. Sensitivity of goodness of fit indexes to lack of measurement invariance. Struct Equ Model 2007; 14: 464–504. doi:10.1080/10705510701301834

Sözer, E, Kahraman, N. Investigation of psychometric properties of likert items with same categories using polytomous item response theory models. J Meas Eval Educ Psychol 2021; 12: 129–46.

Tutz, G. Ordinal regression: a review and a taxonomy of models. WIREs Comput Stat 2022; 14: e1545. doi:10.1002/wics.1545

Hladká, A, Martinková, P. difNLR: generalized logistic regression models for DIF and DDF detection. R J 2020; 12: 300–23. doi:10.32614/RJ-2020-014

Lovibond, SH. Manual for the Depression Anxiety Stress Scales. Sydney Psychology Foundation, 1995.

Comrey, AL, Lee, HB. A First Course in Factor Analysis 2nd ed. Lawrence Erlbaum Associates, 1992.

Kim, HY. Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis. Restor Dent Endod 2013; 38: 52–4. doi:10.5395/rde.2013.38.1.52

Lim, CR, Harris, K, Dawson, J, Beard, DJ, Fitzpatrick, R, Price, AJ. Floor and ceiling effects in the OHS: an analysis of the NHS PROMs data set. BMJ Open 2015; 5: e007765. doi:10.1136/bmjopen-2015-007765

Manzar, MD, Albougami, A, Salahuddin, M, Sony, P, Spence, DW, Pandi-Perumal, SR. The Mizan meta-memory and meta-concentration scale for students (MMSS): a test of its psychometric validity in a sample of university students. BMC Psychol 2018; 6: 59. doi:10.1186/s40359-018-0275-7

Manzar, MD, Albougami, A, Hassen, HY, Sikkandar, MY, Pandi-Perumal, SR, Bahammam, AS. Psychometric validation of the Athens insomnia scale among nurses: a robust approach using both classical theory and rating scale model parameters. Nat Sci Sleep 2022; 14: 725–39. doi:10.2147/NSS.S325220