Development of primary care assessment tool–adult version in Tibet: implication for low- and middle-income countries

Aim: To conduct advanced psychometric analysis of Primary Care Assessment Tool (PCAT) in Tibet and identify avenues for metric performance improvement. Background: Measuring progress toward high-performing primary health care can contribute to the achievement of sustainable development goals. The adult version of PCAT is an instrument for measuring patient experience, with key elements of primary care. It has been extensively used and validated internationally. However, only little information is available regarding its psychometric properties obtained based on advanced analysis. Methods: We used data collected from 1386 primary care users in two prefectures in Tibet. First, iterative confirmatory factor analysis examined the fit of the primary care construct in the original tool. Then item response theory analysis evaluated how well the questions and individual response options perform at different levels of patient experience. Finally, multiple logistic regression modeling examined the predicative validity of primary care domains against patient satisfaction. Findings: A best final structure for the PCAT-Tibetan includes 7 domains and 27 items. Confirmatory factor analysis suggests good fit for a unidimensional model for items within each domain but doesn’t support a unidimensional model for the entire instrument with all domains. Non-parametric and parametric item response theory analysis models show that for most items, the favorable response option (4 = definitely) is overwhelmingly endorsed, the discriminability parameter is over 1, and the difficulty parameters are all negative, suggesting that the items are most sensitive and specific for patients with poor primary care experience. Ongoing care is the strongest predictor of patient satisfaction. These findings suggest the need for some principles in adapting the tool to different health system contexts, more items measuring excellent primary care experience, and update of the four-point response options.


Introduction
The contribution of primary care to health system performance has been widely examined nationally and internationally (Starfield et al., 2005;Kringos et al., 2013). A systematic review of 36 studies in low-and middle-income countries showed that strong primary care leads to improved and more equitable health outcomes, especially in infants and children (Macinko et al., 2009). Another study of 31 high-income countries in Europe showed that strong primary care system is associated with improved population health outcomes, reduced socioeconomic inequality in health outcomes, fewer unnecessary hospitalizations, and slower increases in the overall health-care expenditures (Kringos et al., 2013). Recognizing the value and effectiveness of primary care, many countries including China have identified primary care transformation as a major component of health reform.
Measuring progress towards high-performing primary health care can contribute to the achievement of sustainable development goals. Measuring the quality of primary care from patients' perspective can provide actionable and comprehensive performance information to guide primary care reform efforts. Patients are the best evaluators of key aspects of their health care, including accessibility, continuity, interpersonal communication, respectfulness, familycentered care, whole-person care, and cultural sensitivity (Haggerty et al., 2007). To strengthen people-centered integrated primary care system building, there is increasing interests in patient experience measurement globally (Kruk et al., 2017). Many instruments have been developed, such as Primary Care Assessment Tool [(PCAT); Shi et al., 2001].
The PCAT developed by Barbara Starfield has been extensively used internationally (Shi et al., 2001). Inspired by the World Health Organization definition of primary care, the PCAT was originally developed to measure the extent to which primary care is achieved from user perspective in the United States. Seven primary care domains were included in the original English PCAT-adult version: first contact utilization and access, ongoing care, coordination with specialists, comprehensiveness of service available and provided, family centeredness, community orientation, and cultural competency (Shi et al., 2001).
Some of these issues were also found in the Tibetan version of PCAT that was translated from one (Yang et al., 2013) of the three available Chinese versions (Yang et al., 2013;Wei et al., 2015;Wang et al., 2015a). The Chinese version developed by Yang et al. (2013) was adapted from adult version of PCAT longer version. The initial PCAT-Tibet included six domains of first contact, continuity, coordination, comprehensiveness, family-centeredness, and community orientation. The Tibet Autonomous Region is located in southwestern China, at an average elevation of 4000 m and an area about one eighth of China's area, with a population of 3 million scattered over this large region. Since the national health reform initiated in China in 2009 (Yip et al., 2012) township health centers in Tibet were identified as the main primary care provider, and they received strong support from regional and national governments, including more health staff and salary increases, need-based training, and capital investments in infrastructure and necessary equipment. Actionable information about the effectiveness of these programs was needed for directing the resource allocation and to address the specific weakness in primary care service delivery.
The initial validation of PCAT-Tibetan suggested departures from the original factor structure and confirmed the skewed responses seen in other studies (Wang et al., 2014). For example, three original PCAT domains (first contact, continuity, and coordination) were split across five domains in the PCAT-Tibetan; and the original comprehensiveness domain was represented by two domains (Wang et al., 2014). These issues in psychometric properties in common with results from validation studies in other PCAT versions suggest a need for advanced psychometric analysis to examine the appropriateness of domains and items in relative to the original PCAT. In order to further examine the construct validity and reliability of PCAT-Tibetan version, this article reports on the results of confirmatory factor analysis to test the congruence between the theoretical primary care domains and the empirical results in Tibet, and on item response theory analysis to examine item performance and the appropriateness of item response options.

Methods
This was a further analysis of the initial PCAT-Tibetan validation study that was conducted among 1386 patients who visited their primary care providers in three different types of health facilities in two prefectures in 2013. This survey was administered through face-to-face interview. The detailed information about sampling and data collection can be found in our previous publication (Wang et al., 2014).
Maximum likelihood (ML) imputation method was used to replace the missing values; match age, sex, education, self-rated health status; 295 respondents were excluded because of missing values for the matching variables, leaving 1091 respondents in subsequent analyses. Those excluded were more likely to be female and less educated. To examine the robustness of our conclusion, we excluded all respondents who had at least one missing value on any item (listwise deletion) and repeated all data analyses (n = 729), which did not alter any of the general conclusions. Values were also imputed for the 'not sure/don't remember' response option as an alternative to attributing the pre-set value of 2.5. In general, estimates produced with listwise deletion are less efficient than other methods of handling missing data. Therefore, we reported only results using database with ML imputation (Enders, 2001).
In user-evaluation research, it is common to treat report and rating values as quasi-cardinal (DeVellis, 1991). The items measuring patient experience in this study were strictly ordinal level, so we treated them as interval level, which was consistent with previous validation studies of original PCAT version in the United States (Shi et al., 2001) and the PCAT versions in other languages (Lee et al., 2009;Haggerty et al., 2011b;Yang et al., 2013;Aoki et al., 2016;Mei et al., 2016). This approach was also used by the validation study of World Health Organization's health system responsiveness survey (Valentine et al., 2007). Therefore, the fourpoint Likert response scale in PCAT-Tibetan version was treated as continuous variable.
First, the inter-item correlation and exploratory factor analysis (principle component analysis) by domain was conducted to flag items with low correlations (Pearson <0.20) or low factor loading (<0.30) for potential deletion. We repeated the exploratory factor analysis after deleting each item with low factor loading until all retained items had a factor loading of at least 0.30. The results of exploratory factor analysis guided subsequent confirmatory factor analysis.
Then confirmatory factor analysis using structural equation modeling was used to test the goodness of fit of the items to the theoretical domains of the original PCAT. Subsequently, confirmatory factor models were adjusted iteratively based on fit and judgment until the goodness-of-fit statistics were optimized. We also assessed the entire instrument including all domains in our data analysis. First, we included all domains in confirmatory factor analysis. Then we only included the four core domains (first contact, continuity, coordination, and comprehensiveness) in confirmatory factor analysis. The following goodness-of-fit statistics were used: normed fit index (NFI) ≥0.9 indicating good fit, comparative fit index (CFI) ≥0.9 indicating good fit, standardized root mean square residual (SRMR) ≤0.05 indicating acceptable fit.
For each domain confirmed as being unidimensional, we examined the distribution of response options of individual items within each domain based on domain performance using nonparametric item response theory analysis. However, we cannot get the exact information of each item performance. To be more precise, two parameter estimates (discriminability and difficulty) were subsequently generated using Samejima's (1969) grade response model.
We estimated the correlations between domains to examine how domains were related and whether they demonstrated distinctiveness.

Wenhua Wang and Jeannie Haggerty
Finally, we used logistic regression modeling to examine how primary care domains were associated with patients' satisfaction with service attitude (one item) and perceived technical quality (one item) of their primary care provider. The five-point Likert response scale for the two items was dichotomized to indicate satisfaction (very satisfied or satisfied) versus dissatisfaction. All domains were put in the same model, and age, sex, education, and self-rated health status were included in the model as covariates. Education level was categorized into three groups: illiterate, primary school, junior high school and above. Self-rated health status was measured by a visual analogue scale with end points of 0 and 100, where 0 corresponds to 'the worst health status', and 100 corresponds to 'the best health'.
Descriptive, correlation, and exploratory factor analysis were conducted with SPSS 22.0. ML imputation and confirmatory factor analysis were conducted with LISREL9.1. Nonparametric and parametric item response theory analyses was conducted with SPSS 22.0 and MULTILOG 7.03. Table 1 summarizes the response distribution and descriptive statistics of each item in initial PCAT-Tibetan. The percentage of true missing values for each item ranged from 0.9% to 4.4%. Most items are slightly negatively skewed, and with over 50% respondents reporting the most favorable response option '4 = Definitely' on 22 of 36 items. The percentage of respondents choosing the response option of 'not sure/don't remember' is higher, particularly for items in first contact access, coordination, and community orientation, suggesting that patients may not be the best information source for these domains or may not have direct experience with all aspects elicited.

Results
For first contact access, two of the six items were deleted because of the unacceptable goodness-of-fit statistics (FCA5 and FCA6); and two items fit better in first contact utilization (FCA1 and FCA4), leaving only two items in first contact access and four in first contact utilization. In the eight-item ongoing care, two items were deleted because of unacceptable goodness-of-fit statistics (OC1 and OC7); and a further item (OC8) was removed to improve the goodness-of-fit statistics, leaving a five-item ongoing care scale. The eight-item comprehensiveness subscale showed unacceptable goodness-of-fit statistics on a single factor (NFI = 0.27, CFI = 0.27, SMRM = 0.19); and after removing four of the poorest fitting items, a final four-item comprehensiveness subscale demonstrated good fit (NFI = 0.98, CFI = 0.99, SMRM = 0.02). There is no change in items in other domains. The detailed iterative process to finalize the original factor structure was reported in Supplemental Table S1. Table 2 shows the goodness-of-fit statistics of confirmatory factor analysis for each domain in the final structural model. The fit statistics of first contact utilization and coordination do demonstrate a moderate fit. For ongoing care, comprehensiveness, family centeredness, and community orientation, all the four confirmatory factor analysis models demonstrate a good fit. The empirical results suggest a best final structure for the PCAT-Tibetan of 7 domains and 27 items instead of 36. At least three items are needed for confirmatory factor analysis. First contact access only includes two items, so we did not have confirmatory factor analysis results.
However, either the model including all domains (NFI = 0.46, CFI = 0.46, SMRM = 0.23) or the model including only the four core domains (NFI = 0.55, CFI = 0.56, SMRM = 0.14) shows unacceptable goodness-of-fit statistics on a single factor, which suggest that the domains included in original PCAT may not measure a common single construct in Tibet context, and it is not appropriate to report a total score.
The mean of each domain score is lower than the median and negatively skewed, indicating most of patients reporting favorable response answers. The first contact utilization score is the highest (3.66±0.48), while the first contact access score is the lowest (2.94±0.95). Cronbach α is over 0.70, indicating good internal consistency of items for all domains except first contact access (0.66) and ongoing care (0.66).
Non-parametric item response theory graphs were modeled on each unidimensional domain to provide further insight into item performance and reliability. In most items, the option characteristic curve for the response option '2 = Probably not' is overshadowed by other options (see example in Figure 1a), indicating that nowhere along the primary care experience continuum was this option more likely to be chosen than other options, raising the question of the appropriateness of a four-point response scale. Only a few items perform optimally, such that the probability of choosing each response option is highest in a unique zone of primary care experience continuum, reflecting clearly ordinal response behavior appropriate to the assigned value for each option. Figure 1b shows a well-performing item from community orientation. A problem common to all items is that the extreme response option '4 = definitely' covers a large area of primary care experience continuum and is most likely to be endorsed, even at below-average primary care experience level, suggesting that additional response options may be desirable.
The results from parametric item response theory analysis provide further evidence to confirm the findings from nonparametric item response theory analysis (Table 3). The discriminability parameter is over 1 for all items except one item on ongoing care, indicating that response options discriminate well between low and high levels of primary care experience. However, the difficulty parameters for almost all items are negative, indicating that positive ratings (b2, b3) are endorsed at less than average performance, reinforcing the pattern observed in Figure 1a. Likewise the information curves show that the majority of items are most informative in the negative zone of the underlying construct. Only CO1, illustrated in Figure 1b, shows that each response option corresponds to a distinct zone of the domain, including the most positive experience. Together these suggest that the items are most sensitive and specific for patients with poor primary care experience.
The Pearson correlations between the domains indicate the distinctiveness of each domain. Correlation coefficients between domains range from 0.23 to 0.61 and are lower than Cronbach α of each domain. Coordination is most highly correlated with family centeredness (0.61) and ongoing care (0.60). First contact utilization is also highly correlated with family centeredness (0.54) and ongoing care (0.55). First contact access, comprehensiveness, and community orientation have lower correlation with other domains.
Finally, Table 4) shows the extent to which a unit increase in each domain score increases the odds of patient satisfaction. Although most patients are satisfied with the service attitude (82.6%) and the technical quality (80.2%) of their primary care provider, a higher PCAT score generally is associated with higher satisfaction. For example, every unit increase in ongoing care score increases the likelihood of being satisfied with service attitude by 2.65 times.

Discussion
This advanced psychometric analysis of the PCAT-Tibetan versions provides further insight into some of the problematic psychometric properties found in the initial validation analysis, and it suggests avenues to improve the metric performance of the tool. Despite the metric problems, the PCAT-Tibetan domains of first-contact, ongoing care, and coordination with specialists are associated with an increased likelihood of patient satisfaction. This illustrates the potential of the PCAT and underlines the importance of improving the tool to address some of the metric problems. Some of the problems, such as the skewness of item response distribution found in many items, are shared with the original PCAT and other versions; and our results suggest some solutions that could improve performance. Others may be specific to the PCAT-Tibetan versionsuch as the non-optimal resolution of items relating to First Contact constructssuggest the need for some principles in adapting the tool to different health system contexts.
The widespread use of the PCAT to evaluate primary care in many countries provides an opportunity to compare primary care across different contexts and to support a worldwide movement to improve primary care. Our results show not only the association of the PCAT-Tibetan with patient satisfaction but also other analysis that demonstrated the capacity to distinguish between health-care organizations in Tibet and other regions in China McCollum et al., 2014;Wang et al., 2015b;Hu et al., 2016;Feng et al., 2017). Other studies have shown the capacity of different versions of the PCAT to differentiate between delivery models. For instance, in the United States, community health centers have been showed to provide better quality primary care than health maintenance organizations, especially in continuity, coordination, and comprehensiveness (Shi et al., 2003). Patients   a n = 1091. Domains were scored by averaging the value of individual items. Item response scale ranged from 1 to 4. Higher score meant better patient experience. b IRT parameter estimates were generated using Samejima's (1969) grade response model and using IRT software MULTILOG 7.03. a = discriminability; value >1 indicating minimal discriminability. b = difficulty; highest probability of endorsing. mainly receiving care from private general practitioners in Hong Kong reported better primary care experiences than those mainly receiving care from public general outpatient clinics, especially in accessibility and interpersonal relationships (Wong et al., 2010). In South Korea, among four types of primary care clinics staffed by family physicians, health cooperative clinics displayed the best primary care performance, while public health center clinics showed the worst performance (Sung et al., 2010). The negative skewness of item response distribution in many items was noted in the original validation of the long PCAT version (Shi et al., 2001) and has been found in most other validation studies (Macinko et al., 2007;Lee et al., 2009;Berra et al., 2011;Haggerty et al., 2011b;Berra et al., 2013;Yang et al., 2013;Lağarlıa et al., 2014;Aoki et al., 2016;Mei et al., 2016) but may be more extreme in contexts such as China where low literacy requires face-to-face administration (Haggerty et al., 2011a).
The item response theory analysis shows that the PCAT is most reliable in identifying negative experience of care, but the low information yield in the above average range of the domains means that it will have limited sensitivity for detecting improvements in care. Some researchers suggested that new response categories should be developed to minimize the favored response (Yang et al., 2013), but this would be challenging using the current response scale as it is difficult to imagine an intermediate category between 'probably yes' and 'definitely yes'. The acceptable discriminability parameters for most items reflect adequate capacity to discriminate between poor and average performance on domains even though the four-point response values are not optimal. This suggests that some form of dichotomous scoring could be applied to each item to give greater weight to the more informative negative responses rather than averaging across all response options. This approach is used in a Brazil study (Macinko et al., 2007) and also in the Consumer Assessment of Healthcare Providers and Systems (CAHPS) (Dyer et al., 2012). Finally, to increase discriminability and improve the potential for sensitivity to improvement, it would be good to develop more items to measure excellent primary care experience. The community orientation item about home visits (CO1) is an example of an item that discriminates clearly between average and good primary care.
Another metric issue results from offering the 'not sure/don't remember' option. The high rate of endorsing this response option is common in many language versions, especially in Asian countries including Korean, Japanese and Chinese. (Lee et al., 2009;Yang et al., 2013;Wang et al., 2014;Aoki et al., 2016;Mei et al., 2016). Although respondents appreciate having such response option (Haggerty et al., 2011a), its management is analytically challenging. Although the PCAT scoring manual suggests attributing a value of 2.5, this is not supported for all items in a previous response theory analysis of English and French versions of the PCAT (Haggerty et al., 2011b). A more usual practice would be to treat the values as missing and attribute values using more ML methods. However, the approach used (excluding these as missing values) will impact on the factor analysis and the subsequent conclusions about the validity of PCAT version. Again, this seems to call for further work and international collaborations on response options that accord with patient experience in different contexts.
Collaborative international work could also address principals for measuring and/or comparing domains that affected health system specificities, for instance, in the access domain. First contact in the PCAT-Tibetan fails to meet optimal psychometric standards of construct validity and internal consistency. Similar problems were also found in other PCAT validation studies in China (Yang et al., 2013;Mei et al., 2016) and other countries in Asia (Lee et al., 2009;Lağarlıa et al., 2014;Aoki et al., 2016). Two studies of the PCAT-Chinese version found that Cronbach α was only 0.38 (Mei et al., 2016) and 0.48 (Yang et al., 2013) for first contact utilization; similar values were found in the PCAT-Turkish (Lağarlıa et al., 2014). The validation of PCAT-Korean concluded that first contact could not be assessed using a traditional scale with multiple correlated items, and it was treated to be a composite domain consisting of five independent single-item subscales (Lee et al., 2009). The exploratory factor analysis of PCAT-Japanese collapsed first contact utilization and first contact access into one scale (Aoki et al., 2016). These results are not surprising, given that access reflects the fit between how services are organized and the perceived need of the population (Penchansky and Thomas, 1981;Khan and Bhardwaj, 1994). Consequently, the access dimension will be sensitive to contextual differences in how services are organized. Although these subscales perform well in studies in the North American context (Shi et al., 2001;Haggerty et al., 2011c), the first contact items in original PCAT may not be appropriate in other countries with different health system organization and patient expectations. Making appointment in advance and gatekeeping by primary care worker do not pertain to many countries. For example, geographical accessibility is a major constraint for local people to get health-care services in Tibet, but it was not addressed in the PCAT. Context-specific items should be explored based on the operational definitions of first contact. Similar issues are likely to pertain to comprehensiveness and coordination with specialists.
In contrast, the domain of ongoing care most strongly predicts patient satisfaction with primary care provider in Tibet. This is consistent with a previous systematic review showing that the most important determinants of satisfaction are the interpersonal relationships and their related aspects of care (Crow et al., 2002). Recognizing the benefit of continuity of care, China is developing a family doctor contract service model to build a long-term trust doctor-patient relationship. Under this model, residents can sign a contract with a family doctor working at a community health center and be eligible for a service contract package including basic medical care, public health, and health management service. Although countries differ in the organizational support of continuing provider-patient relationships, our study and studies in other countries point to the value of a stable long-term relationship and mutual interactive communication between patient and primary care workers (Baker et al., 2003;Paddison et al., 2015). Nonetheless, measures may be improved by accounting for cultural specificities in interpersonal communication and therapeutic relationships that impact on both patient experience and outcomes.
This study contributes to the growing international work supporting the relevance and need for valid and reliable measures of the patient experience of health care. Several lessons from PCAT-Tibetan validation study may be shared with our colleagues in low-and middle-income countries. First, the items or content of instruments from developed countries may not be appropriate in other countries. Some specific features of local primary care system may not be reflected in the translated instruments. For example, no items in PCAT could reflect the feature of geographical accessibility, which is a major aspect in Tibet. Some items in comprehensiveness domain are not appropriate in Tibet. Therefore, instead of adapting existed instrument directly, qualitative research is needed to understand the local population preference of primary care first. Second, more items to measure excellent primary care experience 8 Wenhua Wang and Jeannie Haggerty should be developed. Most existing items have adequate capacity to discriminate between poor and average performance on different primary care domains. However, items that discriminate clearly between average and good primary care are needed. Further research is required to explore the characteristics of some exemplars with good primary care experience and what the good primary care is from their narratives. Third, the choice of response categories should be careful. The current four-point response scale and its wording in PCAT may not be appropriate in some countries. Several factors could be considered when exploring the appropriate response categories, such as literacy level, response tendency, and judgment making of local population. This could be done through qualitative research. Finally, a summary score of overall primary care experience including all domains, which is often the most used metric when assessing a health system, is not supported by our analysis of PCAT-Tibetan version. However, this psychometrically validated 27-item Tibetan version of PCAT will be useful in monitoring and evaluating the performance of primary care system in Tibet in specific areas, especially in accessibility, continuity, and coordination, which are the priorities of current health reform efforts in Tibet. The health service research is underdeveloped in Tibet, and there is no instrument measuring patient experience that could be used when this study was conducted. We hope this study could bring more researchers' attention into primary care performance evaluation in Tibet. We also recognize that different policy interventions to achieve primary care functions are inter-related, but each policy has its own priorities. For example, the family doctor contract service model is being developed and expanded now to improve performance in accessibility and continuity; and the transformation of tiered health service delivery system aims to promote collaboration between different health-care providers and to improve coordination. Under this context, PCAT-Tibetan version is a potential useful instrument to evaluate the effectiveness of these policy interventions.