Diagnosis of Mycoplasma mastitis is particularly challenging because the unique biological characteristics of Mycoplasma spp. complicate both detection and treatment. Hence, accurate and quick diagnostic tests for early detection of Mycoplasma mastitis are essential to initiate appropriate interventions or culling. The objective of this research is to estimate the diagnostic performance of the molecular microarray assay (MMA) against bacterial culture for the diagnosis of bovine intramammary infections (IMI) with Mycoplasma spp., using a gold standard approach and the Kappa agreement coefficient. A total of 395 quarter milk samples were collected from cows in 31 dairy herds with conventional milking systems in California, USA. Following dairy personnel practices, milk samples were collected from the lactating cows showing abnormal milk characteristics and shipped within 24 hours to the laboratory for bacterial culture and MMA examination. Milk samples with positive growth were confirmed via PCR test to eliminate misdiagnosis of Acholeplasma spp. Eighty-seven cows (22%) were positive for Mycoplasma spp. IMI and the test accuracy was 88.4%. The sensitivity of MMA was 90.8% (95% CI (Confidence Interval): 82.68–95.95), and the specificity was 87.66% (95% CI: 83.46–91.12). The positive predictive value of MMA in these herds was 67.52% (95% CI: 60.51–73.83), and the negative predictive value was 97.12% (95% CI: 94.57–98.49). The calculated Kappa coefficient was 0.70 (95% CI: 0.618–0.778). The high estimates of sensitivity and specificity of MMA suggest its usefulness as a routine and quick test for accurate diagnosis of Mycoplasma spp. IMI in dairy cows. Our findings indicate that MMA holds promise for enhancing the detection of Mycoplasma spp. and could potentially revolutionize diagnostic practices in the dairy industry and support udder health management.
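The accuracy measures reported in this abstract can be reproduced from a single 2 × 2 table. The sketch below is illustrative only: the four cell counts are not given in the abstract and are an assumption, back-calculated here from the reported totals (395 samples, 87 culture-positive) and percentages. The function name `diagnostic_metrics` is hypothetical.

```python
# Cell counts are ASSUMED: inferred from the reported percentages,
# not stated in the abstract itself.
tp, fn = 79, 8      # culture-positive samples: MMA positive / MMA negative
fp, tn = 38, 270    # culture-negative samples: MMA positive / MMA negative


def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard accuracy measures and Cohen's kappa for a 2 x 2 table."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Chance agreement expected from the row and column marginals.
    p_expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }


metrics = diagnostic_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts the results round to the figures reported above (sensitivity 90.8%, specificity 87.66%, PPV 67.52%, NPV 97.12%, accuracy 88.4%, Kappa 0.70). The confidence intervals are not reproduced here because the abstract does not state which interval method was used.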
Dementia, a slowly progressive disease, is poorly diagnosed. One reason is that it is difficult to use the screening tools. The six-item cognitive impairment test (6-CIT) is brief, with six items, and has a confirmed scoring system that can easily be used by an average individual. This review aimed to analyze the predictive validity of the 6-CIT including comparisons with other tools such as the Mini-Mental State Examination (MMSE).
Methods:
Literature searches were performed in MEDLINE, EMBASE, CINAHL, and PsycArticles using “dementia” and “6-CIT” as keywords. The Quality Assessment of Diagnostic Accuracy Studies-2 was applied to assess the risk of bias.
Results:
Seven studies with 6,831 participants that met the selection criteria were included. The pooled sensitivity of the 6-CIT analyzed in seven studies was 0.82 (95% CI 0.73–0.89), the pooled specificity was 0.87, and the summary receiver operating characteristic (sROC) curve was 0.90 (SE = 0.04). The diagnostic performance of the 6-CIT and MMSE was compared in three studies. The pooled sensitivity of the 6-CIT was 0.85, the pooled specificity was 0.91, and the sROC curve was 0.91, whereas the MMSE values were 0.70, 0.93, and 0.68, respectively.
Conclusion:
This review presents evidence that the 6-CIT has excellent dementia screening performance and could be used as a potential alternative to the MMSE. The 6-CIT may provide an opportunity for early detection of dementia.
This chapter generalizes Van Evera’s typology of process tracing tests to a fully probabilistic context, introducing measures of anticipated test strength that will also be used in Chapter 11.
In the absence of a simple validated instrument to screen for cognitive impairment among illiterate Lebanese older adults, the aims of this study were to validate an Arabic version of the Test of Nine Images (A-TNI93) adapted by the Working Group on Dementia at Saint Joseph University: Groupe de Travail sur les Démences de l’Université Saint Joseph (GTD-USJ) for illiterate older Lebanese and to establish normative data.
Method:
A national population-based sample of 332 community-dwelling illiterate Lebanese aged 55 years and older was administered the A-TNI93 (GTD-USJ) scoring free and overall recall. The sample is part of a larger national sample (1342 participants) used to validate an Arabic version of the Mini-Mental State Examination already reported. Reproducibility, sensitivity, specificity, and area under the curve of the A-TNI93 (GTD-USJ) scoring to detect cognitive impairment according to Clinical Dementia Rating (CDR) as the gold standard were measured. Normative data were established among 188 cognitively normal participants.
Results:
A threshold score of six on free recall (FR) provided a sensitivity of 66.7% and a specificity of 90.5%. The area under the curve was 0.93. By taking either scores, that is, a FR ≤ 6 or a total recall ≤ 8, the A-TNI93 (GTD-USJ) slightly improved dementia case detection with a sensitivity of 70.8% and a specificity of 88%. Normative data illustrate the distribution of cognitive performance among illiterate older adults.
Conclusions:
Compared to the CDR requiring physician’s competence, the A-TNI93 (GTD-USJ) is a valid Arabic adaptation to screen for cognitive impairment among illiterate Lebanese older adults.
Self-report screening instruments are frequently used as scalable methods to detect common mental disorders (CMDs), but their validity across cultural and linguistic groups is unclear. We summarized the diagnostic accuracy of brief questionnaires on symptoms of depression, anxiety and posttraumatic stress disorder (PTSD) among Arabic-speaking adults.
Methods
Five databases were searched from inception to 22 January 2021 (PROSPERO: CRD42018070645). Studies were included when diagnostic accuracy of brief (maximally 25 items) psychological questionnaires was assessed in Arabic-speaking populations and the reference standard was a clinical interview. Data on sensitivity/specificity, area under the curve, and data to generate 2 × 2 tables at various thresholds were extracted. Meta-analysis was performed using the diagmeta package in R. Quality of studies was assessed with QUADAS-2.
Results
Thirty-two studies (Nparticipants = 4042) reporting on 17 questionnaires with 5–25 items targeting depression/anxiety (n = 14), general distress (n = 2), and PTSD (n = 1) were included. Seventeen studies (53%) scored high risk on at least two QUADAS-2 domains. The meta-analysis identified an optimal threshold of 11 (sensitivity 76.9%, specificity 85.1%) for the Edinburgh Postnatal Depression Scale (EPDS) (nstudies = 7, nparticipants = 711), 7 (sensitivity 81.9%, specificity 87.6%) for the Hospital Anxiety and Depression Scale (HADS) anxiety subscale and 6 (sensitivity 73.0%, specificity 88.6%) for the depression subscale (nstudies = 4, nparticipants = 492), and 8 (sensitivity 86.0%, specificity 83.9%) for the Self-Reporting Questionnaire (SRQ-20) (nstudies = 4, nparticipants = 459).
Conclusion
We present optimal thresholds to screen for perinatal depression with the EPDS, anxiety/depression with the HADS, and CMDs with the SRQ-20. More research on Arabic-language questionnaires, especially those targeting PTSD, is needed.
BMI is a time-intensive measurement to assess nutritional status. Mid-upper arm circumference (MUAC) has been studied as a proxy for BMI in adults, but there is no consensus on its optimal use.
Design:
We calculated sensitivity, specificity and area under receiver operating characteristic curve (AUROCC) of MUAC for BMI < 18·5, <17 and <16 kg/m2. We designed a system using two MUAC cut-offs, with a healthy (non-thin) ‘green’ group, a ‘yellow’ group requiring BMI measurement and a ‘red’ group who could proceed directly to treatment for thinness.
Setting:
We retrospectively analysed monitoring data collected by the International Committee of the Red Cross in places of detention.
Participants:
11 917 male detainees in eight African countries.
Results:
MUAC had excellent discriminatory ability with AUROCC: 0·87, 0·90 and 0·92 for BMI < 18·5, BMI < 17 and BMI < 16 kg/m2, respectively. An upper cut-off of MUAC 25·5 cm to exclude healthy detainees would result in 64 % fewer detainees requiring BMI screening and had sensitivity 77 % (95 % CI 69·4, 84·7) and specificity 79·6 % (95 % CI 72·6, 86·5) for BMI < 18·5 kg/m2. A lower cut-off of MUAC < 21·0 cm had sensitivity 25·4 % (95 % CI 11·7, 39·1) and specificity 99·0 % (95 % CI 97·9, 100·0) for BMI < 16 kg/m2. An additional 50 kg weight requirement improved specificity to 99·6 % (95 % CI 99·0, 100·0) with similar sensitivity.
Conclusions:
A MUAC cut-off of 25·5 cm, above which detainees are classified as healthy and below receive further screening, would result in significant time savings. A cut-off of <21·0 cm and weight <50 kg can identify some detainees with BMI < 16 kg/m2 who require immediate treatment.
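The two-cut-off system described in this abstract amounts to a small decision rule. The following is a minimal sketch of that rule, not the authors' implementation: the function name `triage` is hypothetical, and the handling of values exactly at 25·5 cm (treated here as 'green') is an assumption, since the abstract does not specify boundary behaviour.

```python
def triage(muac_cm: float, weight_kg: float) -> str:
    """Classify a detainee as 'green', 'yellow' or 'red' from MUAC and weight.

    green  : MUAC at or above the upper cut-off, classified as healthy
    red    : MUAC below the lower cut-off AND weight below 50 kg,
             proceeds directly to treatment for thinness
    yellow : everyone else, requires a BMI measurement
    """
    UPPER_CUTOFF_CM = 25.5   # from the abstract
    LOWER_CUTOFF_CM = 21.0   # from the abstract
    WEIGHT_LIMIT_KG = 50.0   # additional weight requirement from the abstract

    if muac_cm >= UPPER_CUTOFF_CM:   # boundary handling is an assumption
        return "green"
    if muac_cm < LOWER_CUTOFF_CM and weight_kg < WEIGHT_LIMIT_KG:
        return "red"
    return "yellow"


print(triage(27.0, 65.0))  # green
print(triage(23.0, 55.0))  # yellow
print(triage(20.0, 45.0))  # red
print(triage(20.0, 55.0))  # yellow: low MUAC but weight is not below 50 kg
```

Requiring both conditions for the 'red' group mirrors the reported trade-off: the added weight criterion raised specificity for BMI < 16 kg/m2 while leaving sensitivity similar.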
Research on the lateralizing value of neuropsychological tests is limited among Latino people with epilepsy (PWE). This study aims to evaluate the utility of two confrontation naming measures in laterality determination.
Method:
Data were collected from 71 Latino PWE who completed the Vocabulario Sobre Dibujos (VSD) and the Pontón-Satz Modified Boston Naming Test (MBNT). Raw and standardized scores were examined to determine diagnostic accuracy for predicting left hemisphere (LH) epilepsy for the full sample and using a sample-specific median split of educational attainment.
Results:
The MBNT demonstrated adequate classification accuracy (65.7%, 77.1%) as did the VSD (54.3%, 74.3%) for predicting LH seizure laterality using raw and standardized scores, respectively. For participants with ≥ 9 years of education (HEdu), receiver operator characteristic curve analyses showed a raw/percentile cutoff of ≤ 26/≤ 5th on the VSD, yielding 53%–58% sensitivity/87%–83% specificity. A raw score cutoff of ≤ 17 on MBNT produced 47% sensitivity/78% specificity for HEdu participants.
Conclusions:
The VSD was found to have greater flexibility in determining cutoff scores using either raw or standardized scores for predicting seizure laterality. This study provides interpretation guidance, emphasizing education as a pertinent variable, to optimize lateralization accuracy for Latino PWE.
We aimed to assess the validity of maternal recall of exclusive breastfeeding (EBF) at 3 months obtained 12 months after childbirth.
Design:
A population-based birth cohort study. The gold standard is maternal report of EBF at the age of 3 months (yes or no) and age of introduction of other foods in the infant’s diet. EBF was considered when the mother reported that no liquid, semi-solid or solid food was introduced up to that moment. The variable to be validated was obtained at 12 months after childbirth when the mother was asked about the age of food introduction. The prevalence of EBF at 3 months, and sensitivity, specificity, positive (PPV) and negative predictive values (NPV), and accuracy of 12-month recall with 95 % CI were calculated.
Setting:
Pelotas, Brazil.
Participants:
3700 mothers of participants of the Pelotas 2004 Birth Cohort.
Results:
The prevalence of EBF at 3 months was 27·8 % (95 % CI 26·4, 29·3) and 49·0 % (95 % CI 47·4, 50·6) according to gold standard and maternal recall, respectively. The sensitivity of maternal recall at 12 months was 98·3 % (95 % CI 97·4, 99·0), specificity 70·0 % (95 % CI 68·2, 71·7), PPV 55·8 % (95 % CI 53·4, 58·1), NPV 99·1 % (95 % CI 98·6, 99·5) and accuracy 77·9 % (95 % CI 76·6, 79·2). When the analyses were stratified by maternal and infant characteristics, the sensitivity remained around 98 %, and the specificity ranged from 64·4 to 81·8 %.
Conclusions:
EBF recalled at the end of the first year of infant’s life is a valid measure to be used in epidemiological investigations.
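The predictive values in this abstract follow from the reported prevalence, sensitivity and specificity through Bayes' rule. The sketch below works that arithmetic through using only the three figures quoted in the Results; the function name `predictive_values` is hypothetical.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float) -> dict:
    """Derive PPV, NPV, accuracy and the test-positive rate as population fractions."""
    tp = prevalence * sensitivity              # true positives
    fn = prevalence * (1 - sensitivity)        # false negatives
    tn = (1 - prevalence) * specificity        # true negatives
    fp = (1 - prevalence) * (1 - specificity)  # false positives
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
        "test_positive_rate": tp + fp,  # prevalence as estimated by the index test
    }


# Figures reported in the abstract: prevalence 27.8 %, sensitivity 98.3 %, specificity 70.0 %.
result = predictive_values(0.278, 0.983, 0.700)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The output agrees with the reported PPV (55·8 %), NPV (99·1 %) and accuracy (77·9 %), and the test-positive rate matches the 49·0 % prevalence obtained from maternal recall. This shows how a specificity of 70 % is enough to nearly double the apparent prevalence of EBF even when sensitivity is close to 100 %.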
Malnutrition risk screening in cirrhotic patients is crucial, as poor nutritional status negatively affects disease prognosis and survival. Given that a variety of malnutrition screening tools is usually used in routine clinical practice, the effectiveness of eight screening tools in detecting malnutrition risk in cirrhotic patients was sought. A total of 170 patients (57·1 % male, 59·4 (sd 10·5) years, 50·6 % decompensated ones) with cirrhosis of various aetiologies were enrolled. Nutritional screening was performed using the Malnutrition Universal Screening Tool, Nutritional Risk Index, Malnutrition Screening Tool, Nutritional Risk Screening (NRS-2002), Birmingham Nutritional Risk Score, Short Nutritional Assessment Questionnaire, Royal Free Hospital Nutritional Prioritizing Tool (RFH-NPT) and Liver Disease Undernutrition Screening Tool (LDUST). Malnutrition diagnosis was defined using the Subjective Global Assessment (SGA). Data on 1-year survival were available for 145 patients. The prevalence of malnutrition risk varied according to the screening tools used, with a range of 13·5–54·1 %. RFH-NPT and LDUST were the most accurate in detecting malnutrition (AUC = 0·885 and 0·892, respectively) with a high sensitivity (97·4 and 94·9 %, respectively) and fair specificity (73·3 and 58 %, respectively). Malnutrition according to SGA was an independent prognostic factor of within 1-year mortality (relative risk was 2·17 (95 % CI 1·0, 4·7), P = 0·049) after adjustment for sex, age, disease aetiology and Model for End-stage Liver Disease score, whereas nutrition risk according to RFH-NPT, LDUST and NRS-2002 showed no association. RFH-NPT and LDUST were the only screening tools that proved to be accurate in detecting malnutrition in cirrhotic patients.
Naming or word-finding tasks are a mainstay of the typical neuropsychological evaluation, particularly with older adults. However, many older adults have significant visual impairment and there are currently no such word-finding tasks developed for use with older visually impaired populations. This study presents a verbal, non-visual measure of word-finding for use in the evaluation of older adults with possible dysnomia. Stimuli were chosen based on their frequency of usage in everyday spoken language. A 60-item scale was created and given to 131 older Veterans. Rasch analyses were conducted and differential item functioning assessed to eliminate poorly-performing items. The final 55-item scale had a coefficient alpha of 0.84 and correlated with the Neuropsychological Assessment Battery Naming test, r=0.84, p<.01, Delis-Kaplan Executive Function System (D-KEFS) Category Fluency, r=0.45, p<.01, and the D-KEFS Letter Fluency, r=0.40, p<.01. ROC analyses found the measure to have sensitivity of 79% and specificity of 85% for detecting dysnomia. Patients with dysnomia performed worse on the measure than patients with intact word-finding, t(84)=8.2, p<.001. Patients with no cognitive impairment performed significantly better than patients with mild cognitive impairment, who performed significantly better than patients with dementia. This new measure shows promise in the neuropsychological evaluation of word-finding ability in older adults with or without visual impairment. Future directions include the development of a shorter version and the generation of additional normative data. (JINS, 2015, 21, 1–10)
The six-item cognitive impairment test (6CIT) is a brief cognitive screening instrument (CSI) recommended for use in primary care settings. There are very few studies of 6CIT performance in secondary care settings.
Methods:
We undertook a pragmatic diagnostic accuracy study of 6CIT in consecutive patients referred over the course of one year to a neurology-led cognitive function clinic, and compared its performance for the diagnosis of dementia and mild cognitive impairment (MCI) to that of the simultaneously administered Mini-Mental State Examination (MMSE).
Results:
In a cohort of 245 patients with dementia prevalence around 20%, 6CIT proved quick and easy to use and acceptable to patients. It had good sensitivity (0.88) and specificity (0.78) for dementia diagnosis; it was more sensitive than MMSE (0.59) but less specific (0.85). For MCI diagnosis, 6CIT was again more sensitive (0.66) than MMSE (0.51) but less specific (0.70 vs. 0.75). Weighted comparisons showed net benefit for 6CIT compared to MMSE for both dementia and MCI diagnosis. 6CIT effect sizes (Cohen's d) were large for dementia diagnosis and moderate for MCI diagnosis.
Conclusions:
6CIT is an acceptable and accurate test for the assessment of cognitive problems, its performance being more sensitive than the MMSE. 6CIT use should be considered as a viable alternative to MMSE in the secondary care setting.
Foreign body aspiration is common and potentially life threatening. Although rigid bronchoscopy has the potential for serious complications, it is the ‘gold standard’ of diagnosis. It is used frequently in light of the inaccuracy of clinical examination and chest radiography. Computed tomography is proposed as a non-invasive alternative to rigid bronchoscopy.
Objective:
This study aimed to evaluate the accuracy and safety of computed tomography used in the diagnosis of suspected foreign body aspiration, and compare this with the current gold standard, in order to examine the possibility of using computed tomography to reduce the number of diagnostic rigid bronchoscopies performed.
Method:
The study comprised a review of literature published from 1970 to 2013, using the PubMed, Scopus, Web of Knowledge, Embase and Medline electronic databases.
Results:
The sensitivity for computed tomography ranged between 90 and 100 per cent, with four studies demonstrating 100 per cent sensitivity. Specificity was between 75 and 100 per cent. Radiation exposure doses averaged 2.16 mSv.
Conclusion:
Computed tomography is a sensitive and specific modality in the diagnosis of foreign body aspiration, and its future use will reduce the number of unnecessary rigid bronchoscopies.
The Confusion Assessment Method (CAM) is the most widely used delirium screening instrument. The aim of this study was to evaluate the reliability and validity of the European Portuguese version of CAM.
Methods:
The sample included elderly patients (≥65 years), admitted for at least 48 h, into two intermediate care units (ICMU) of Intensive Medicine and Surgical Services in a university hospital. Exclusion criteria were: score ≤11 on the Glasgow Coma Scale (GCS), blindness/deafness, inability to communicate and to speak Portuguese. For concurrent validity, a blinded assessment was conducted by a psychiatrist (DSM-IV-TR, as a reference standard) and by a trained researcher (CAM). This instrument was also compared with other cognitive measures to evaluate convergent validity. Inter-rater reliability was also assessed.
Results:
In this sample (n = 208), 25% (n = 53) of the patients had delirium, according to DSM-IV-TR. Using this reference standard, the CAM had a moderate sensitivity of 79% and an excellent specificity of 99%. The positive predictive value was 95%, indicating a strong ability to confirm delirium with a positive test result, and the negative predictive value was lower (93%). Good convergent validity was also found, in particular with Mini-Mental State Examination (MMSE) (rs = −0.676; p ≤0.01) and Digit Span Test (DST) forward (rs = −0.605; p ≤0.01), as well as a high inter-rater reliability (diagnostic k = 1.00; single items’ k between 0.65 and 1.00).
Conclusion:
Robust results on concurrent and convergent validity and good reliability were achieved. This version was shown to be a valid and reliable instrument for delirium detection in elderly patients hospitalized in intermediate care units.
A meta-analysis was conducted to reach a pooled estimate of the diagnostic accuracy of the SCOFF. The 15 selected studies represented a total of 882 cases and 4350 controls. The main criterion for inclusion was that the primary study had provided diagnostic classification with both a diagnostic reference and with the SCOFF (with five items and a cut-off point of two). The pooled estimates were .80 (sensitivity) and .93 (specificity). The moderator variables gender and type of measure for the diagnostic reference (interview versus psychometric tests) account for part of the observed variability. For diagnostic references based on interviews the estimate of the efficacy improves significantly. For the studies that match this criterion the sensitivity is .882 and the specificity .925 (diagnostic odds ratio, 92.19). The main conclusion was that the five questions of the SCOFF constitute a very useful screening tool, in several languages; it is highly recommended for screening purposes.
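The diagnostic odds ratio quoted for the interview-based studies can be checked against the accompanying sensitivity and specificity. This is a minimal sketch using the standard definition of the diagnostic odds ratio; the function name is hypothetical, and the abstract does not say whether the authors derived the figure this way or pooled it separately.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in cases divided by the odds of a positive test in controls."""
    odds_positive_in_cases = sensitivity / (1 - sensitivity)
    odds_positive_in_controls = (1 - specificity) / specificity
    return odds_positive_in_cases / odds_positive_in_controls


# Interview-based studies: sensitivity .882, specificity .925 (from the abstract).
dor_interview = diagnostic_odds_ratio(0.882, 0.925)
print(f"{dor_interview:.2f}")
```

The result rounds to the 92.19 reported above, which suggests the three figures are mutually consistent.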
To evaluate the adequacy and accuracy of cut-off values currently recommended by the WHO for assessment of cardiovascular risk in southern Brazil.
Design
Population-based study aimed at determining the predictive ability of waist circumference for cardiovascular risk based on the use of previous medical diagnosis for hypertension, diabetes mellitus and/or dyslipidaemia. Descriptive analysis was used for the adequacy of current cut-off values of waist circumference, receiver operating characteristic curves were constructed and the most accurate criteria according to the Youden index and points of optimal sensitivity and specificity were identified.
Setting
Pelotas, southern Brazil.
Subjects
Individuals (n 2112) aged ≥20 years living in the city were selected by multistage sampling, provided that they did not report previous myocardial infarction, angina pectoris or stroke.
Results
The cut-off values currently recommended by WHO were more appropriate in men than women, with overestimation of cardiovascular risk in women. The area under the receiver operating characteristic curve showed moderate predictive ability of waist circumference in men (0·74, 95 % CI 0·71, 0·76) and women (0·75, 95 % CI 0·73, 0·77). The method of optimal sensitivity and specificity showed better performance in assessing the accuracy, identifying the values of 95 cm in men and 87 cm in women as the best cut-off values of waist circumference to assess cardiovascular risk.
Conclusions
The cut-off values currently recommended for waist circumference are not suitable for women. Longitudinal studies should be conducted to evaluate the consistency of the findings.
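One of the criteria named in the Design section, the Youden index, selects the cut-off that maximizes sensitivity + specificity − 1. The sketch below illustrates the calculation only. The waist measurements and risk labels are made-up illustration values, not data from the study, and the function name `youden_best_cutoff` is hypothetical.

```python
def youden_best_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A subject tests positive when value >= cutoff.
    labels: 1 = condition present, 0 = condition absent.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# HYPOTHETICAL data for illustration only (waist circumference in cm, risk label).
waist_cm = [78, 82, 85, 88, 90, 93, 96, 99, 102, 106]
has_risk = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

best_cutoff, best_j = youden_best_cutoff(waist_cm, has_risk)
print(best_cutoff, round(best_j, 2))
```

The abstract's other criterion, the point of optimal sensitivity and specificity, is not defined there, so it is not sketched here.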
Objective: The aim of this study was to present a clear process of synthesizing test accuracy data when conducting economic evaluations of diagnostic tests for health technology assessment (HTA) assessors and health economists.
Methods: We appraised the methods advocated for using diagnostic test accuracy data in economic evaluations. We used a case study of fetal anemia in which data from a screening test are used in combination with a confirmatory test.
Results: We developed a step-by-step guide and consider two scenarios: when data on test accuracy from several studies are based on (i) the same test threshold for positivity and (ii) different test thresholds.
Conclusions: We conclude that each approach has its strengths and limitations. We show that the optimal operating point of the test should be identified to determine the true cost-effectiveness of the test. We advocate that these issues require a multidisciplinary team of health economists, decision modelers and statisticians.
Current prehospital protocols for the management of patients with altered mental status include the empiric administration of hypertonic glucose, naloxone, and thiamine. The injudicious use of 50% dextrose (D50W) may result in hyperosmolarity, a worsening of hypokalemia, and unwarranted additional health-care costs for the patient. The administration of D50W also may worsen the neurological outcome of patients with local or generalized ischemia.
Objective:
To evaluate the ExacTech blood glucose meter's ability to estimate blood glucose levels accurately and rapidly.
Methods:
Emergency medical technicians (EMTs) from selected advanced life support (ALS) units in the Portland, Ore., metropolitan area participated in a prospective clinical trial of the ExacTech blood glucose meter. A convenience sample was drawn from emergency medical services (EMS) patients with suspected diabetic emergencies, altered mental status, and other neurological deficits. Venous blood samples were drawn from these populations at the same time as the ExacTech readings were obtained. The venous blood was submitted to the receiving hospitals for laboratory analysis of blood glucose levels, and a comparison was made between the results of the two methods.
Results:
A total of 80 matched sets of data were obtained from 1 April 1990 through 6 May 1991. The hospital blood glucose values ranged from 8 to 1233 mg/dl. Sixteen (20%) of the patients were hypoglycemic (<60 mg/dl) and 23 (28.8%) were hyperglycemic (>180 mg/dl). The ExacTech device sensitivity and specificity for hypoglycemia using venous samples were 94.6% and 89.2%, respectively. For hyperglycemia, these same parameters were 87.5% and 97.1%. Pearson's r over the range of the instrument (40–450 mg/dl) was 0.8656 (p <.001). If the prehospital “definition” of hypoglycemia (for threshold-to-treat) is raised to 65 mg/dl, the device has 100% sensitivity in the sample population.
Conclusion:
The device functioned accurately and consistently in the prehospital environment over a wide range of temperatures, and in the hands of many different individuals.
Influenza causes severe illness and deaths, and global surveillance systems use different clinical case definitions to identify patients for diagnostic testing. We used data collected during January 2007–July 2010 at hospital-based influenza surveillance sites in western Kenya to calculate sensitivity, specificity, positive predictive value, and negative predictive value for eight clinical sign/symptom combinations in hospitalized patients with acute respiratory illnesses, including severe acute respiratory illness (SARI) (persons aged 2–59 months: cough or difficulty breathing with an elevated respiratory rate or a danger sign; persons aged ⩾5 years: temperature ⩾38 °C, difficulty breathing, and cough or sore throat) and influenza-like illness (ILI) (all ages: temperature ⩾38 °C and cough or sore throat). Overall, 4800 persons aged ⩾2 months were tested for influenza; 416 (9%) had laboratory-confirmed influenza infections. The symptom combination of cough with fever (subjective or measured ⩾38 °C) had high sensitivity [87·0%, 95% confidence interval (CI) 83·3–88·9], and ILI had high specificity (70·0%, 95% CI 68·6–71·3). The case definition combining cough and any fever is a simple, sensitive case definition for influenza in hospitalized persons of all age groups, whereas the ILI case definition is the most specific. The SARI case definition did not maximize sensitivity or specificity.
Objectives and Methods: Health technology assessment (HTA) often requires the identification and review of economic evaluations and models. This study surveys the available specific and general resources to search to identify economic evaluations. It also provides information on efficient searching of those resources and comments on the current evidence-base.
Results: Published checklists recommend searching for economic evaluations in specific information resources which collect economic evaluations such as NHS EED and HEED, followed by top-up searches of large biomedical bibliographic databases (such as MEDLINE and EMBASE). Other resources such as the HTA and DARE databases can yield reports of economic evaluations. Searches within NHS EED and HEED can be made more efficient by using database-specific search options. Searches within large biomedical databases such as MEDLINE and EMBASE require the use of economic search terms called search filters. Search filters are highly sensitive, retrieving most economic evaluations, but suffer from low precision returning many irrelevant records which need to be assessed.
Conclusions: It is relatively easy to identify rapidly a high proportion of economic evaluations but more research is required to improve the efficiency of this process. There are key high yield resources to search but more evidence is required on their overlap and unique contribution to searches. The value of other resources, particularly those providing access to gray literature, should be explored. Research into efficient retrieval requires clear definitions of economic evaluations to allow comparison across studies.
Objectives: The aim of this study is to review briefly different methods for determining the optimal retrieval of studies for inclusion in a health technology assessment (HTA) report.
Methods: This study reviews the methodology literature related to specific methods for evaluating yield from literature searching strategies and for deciding whether to continue or desist in the searching process.
Results: Eight different methods were identified. These include using the Capture–recapture technique; obtaining Feedback from the commissioner of the HTA report; seeking the Disconfirming case; undertaking comparison against a known Gold standard; evaluating retrieval of Known items; recognizing the Law of diminishing returns, specifying a priori Stopping rules, and identifying a point of Theoretical saturation.
Conclusions: While this study identified a variety of possible methods, there has been very little formal evaluation of the specific strengths and weaknesses of the different techniques. The author proposes an evaluation agenda drawing on an examination of existing data together with exploration of the specific impact of missing relevant studies.