Current cosmological observations place few constraints on the nature of dark matter, allowing the development of a large number of models and various methods for probing their properties, which would seem to provide ideal grounds for the employment of robustness arguments. In this article, the extent to which such arguments can be used to overcome various methodological and theoretical challenges is examined. The conclusion is that while robustness arguments have a limited scope in the context of dark matter research, they can still be used to increase scientists’ confidence in the properties of specific models.
This study aimed to assess the validity and reliability of the Turkish version of the Disaster Response Self-Efficacy Scale (DRSES).
Method:
This is a methodological study to validate the DRSES. Third- and fourth-year nursing students participated in the study (n = 340). Construct validity was evaluated by exploratory and confirmatory factor analysis. Reliability was assessed by internal consistency and test-retest reliability. Data were analyzed in SPSS 20.0 and IBM SPSS AMOS 21.0 (IBM Corp., Armonk, NY, USA).
Results:
The content validity index was 0.96, Cronbach’s alpha coefficient was 0.94, and the intraclass correlation coefficient for test-retest reliability was 0.95. Exploratory factor analysis revealed three factors that accounted for 59.4% of the variance, with factor loadings ranging from 0.50 to 0.81. Construct validity was good (χ2/df = 2.54; RMSEA = 0.067; CFI = 0.93; NFI = 0.95; GFI = 0.93; TLI = 0.94; IFI = 0.92; P < 0.001).
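As an illustration of the two headline reliability statistics, the following Python sketch computes Cronbach’s alpha and an ANOVA-based ICC from raw score matrices. The data, the item count and the function names are hypothetical; the study itself used SPSS and AMOS, and this sketch does not reproduce that workflow or the factor analyses.

```python
# Minimal sketch with simulated data: Cronbach's alpha for internal consistency and
# an ANOVA-based ICC(3,1) for test-retest reliability. Item count and samples are
# hypothetical; this is not the authors' SPSS/AMOS workflow.
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items matrix of scale responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def icc_3_1(scores):
    """scores: subjects x occasions matrix (e.g. test and retest columns)."""
    n, k = scores.shape
    grand = scores.mean()
    msb = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between subjects
    msj = n * ((scores.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between occasions
    sse = ((scores - grand) ** 2).sum() - (n - 1) * msb - (k - 1) * msj
    mse = sse / ((n - 1) * (k - 1))
    return (msb - mse) / (msb + (k - 1) * mse)

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(340, 19)).astype(float)   # hypothetical 19-item responses
test_retest = rng.normal(50, 10, size=(60, 2))                 # hypothetical test-retest scores
print(cronbach_alpha(responses), icc_3_1(test_retest))
```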
Conclusions:
The results of this study show that the Disaster Response Self-Efficacy Scale is a valid and reliable tool for determining nursing students’ disaster response self-efficacy.
We discuss multiple case studies in this chapter. We begin with a discussion of theoretical sampling and replication logic, specifically literal and theoretical replication (LR and TR) in connection with multiple case studies. The strengths and limitations of LR and TR are discussed thereafter; in particular, we deliberate upon the potential of TR to enhance the internal and external validity of a case study. We then address some common (mis)conceptions regarding replication logic, internal validity, external validity (generalizability), and reliability. We also discuss how multiple case studies might need to sacrifice depth of observation for breadth. Other potential weaknesses, such as the smaller number of independent variables and the difficulty of controlling context, are also discussed.
Motor unit number index of the upper trapezius (MUNIX-Trapezius) is a candidate biomarker for bulbar lower motor neuron function; however, reliability data are incomplete. To assess MUNIX-Trapezius reliability in controls, we conducted a systematic review, a cross-sectional study (n = 20), and a meta-analysis. We demonstrated high inter- and intra-rater intraclass correlations (0.86 and 0.94, respectively), indicating that MUNIX-Trapezius is reliable, with between-study variability moderated by age and MUNIX technique. With further validation, this measure can serve as a disease-monitoring and response biomarker of bulbar function in therapeutic development for amyotrophic lateral sclerosis.
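The abstract does not spell out the pooling model behind the meta-analysis. A common choice for combining study-level reliability coefficients is a DerSimonian-Laird random-effects model on Fisher-z-transformed values; the sketch below illustrates that approach with hypothetical per-study ICCs, hypothetical sample sizes and an approximate variance formula. It is not the authors’ analysis and omits the moderator (age, MUNIX technique) analysis.

```python
# Sketch: random-effects (DerSimonian-Laird) pooling of study-level ICCs after a
# Fisher z-transform. All study values are hypothetical; the variance 1/(n-3) is an
# approximation borrowed from the Pearson-r case.
import numpy as np

iccs = np.array([0.82, 0.90, 0.94, 0.88])   # hypothetical per-study ICC estimates
ns   = np.array([20, 35, 18, 25])           # hypothetical per-study sample sizes

y = np.arctanh(iccs)            # Fisher z-transform
v = 1.0 / (ns - 3)              # approximate sampling variance of z
w = 1.0 / v
y_fixed = (w * y).sum() / w.sum()
q = (w * (y - y_fixed) ** 2).sum()
tau2 = max(0.0, (q - (len(y) - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
w_star = 1.0 / (v + tau2)       # random-effects weights
pooled_z = (w_star * y).sum() / w_star.sum()
print("pooled ICC ~", np.tanh(pooled_z))
```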
Deferred prosecution agreements (DPAs) are a legal tool for the nontrial resolution of cases of corruption. Each DPA is accompanied by a Statement of Facts that provides a detailed and publicly available textual record of the given case, including summarized evidence of who was involved, what they did, and with whom. These statements can be translated into networks amenable to social network analysis, allowing an analysis of the structure and dynamics of each case. In this study, we show how to extract, from five Statements of Facts, information about which actors were involved in a given case, the relations and interactions among these actors (e.g., communication or payments), and their relevant individual attributes (gender, affiliation, and sector). We code the extracted information manually with two independent coders and subsequently assess the inter-coder reliability. For assessing the coding reliability of nodes and attributes, we use a matching coefficient, whereas for assessing the coding reliability of ties, we construct a network from the coding of each coder and calculate the graph correlation of the two resulting networks. The coding of nodes and ties in the five extracted networks turns out to be highly reliable, with only slightly lower coding reliability in the case of the largest network. The coding of attributes is highly reliable as well, although it is prone to missing data on actors’ gender. We conclude by discussing the flexibility of our data collection framework and its extension to include network dynamics and nonhuman actors (such as companies) in the network representation.
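The two reliability measures described above can be sketched directly. The example below (Python, with hypothetical codings of four actors) computes the matching coefficient as the share of identically coded units and the graph correlation as the Pearson correlation of corresponding off-diagonal adjacency cells; the actual coding scheme and network sizes of the five cases are not reproduced here.

```python
# Sketch of the two reliability measures on hypothetical codings: a matching
# coefficient for node/attribute coding and a graph correlation for ties.
import numpy as np

def matching_coefficient(coder_a, coder_b):
    """Share of units (nodes or attribute values) coded identically by both coders."""
    a, b = np.asarray(coder_a), np.asarray(coder_b)
    return float((a == b).mean())

def graph_correlation(adj_a, adj_b):
    """Pearson correlation of corresponding off-diagonal cells of two adjacency matrices."""
    mask = ~np.eye(adj_a.shape[0], dtype=bool)
    return float(np.corrcoef(adj_a[mask], adj_b[mask])[0, 1])

# Hypothetical example: 4 actors, two coders' tie codings of the same Statement of Facts.
a = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])
b = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])
print(matching_coefficient(["f", "m", "m", "f"], ["f", "m", "f", "f"]))  # attribute coding
print(graph_correlation(a, b))                                           # tie coding
```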
The purpose of this study was to assess the consistency of selected animal-based welfare parameters for dairy cattle over a one-year period. Eight cubicle-housed dairy herds were visited five times, at two-monthly intervals. At each visit, lameness, injuries to the carpal and tarsal joints, cleanliness, social behaviour and the avoidance distance towards an unknown person were assessed by the same observer in a random sample of animals.
At herd level, lesions of the carpal joints, udder cleanliness and frequencies of agonistic and cohesive behaviour showed low consistency. However, correlations between consecutive recordings as well as between single visits and the average were moderate to satisfactory for lameness prevalence, lesions of the tarsal joints, cleanliness of the hind leg and avoidance distances towards an unknown person in two different locations. The integration of these parameters into on-farm welfare assessment protocols seems to be justified.
Four observers were trained in lameness assessment using a subjective scoring system with five categories, and observer agreement was investigated four times at different stages of training and experience. Inter-observer reliability increased with time and reached acceptable levels in the last session. Retrospectively simplified versions of the scoring system were already satisfactorily reliable at a fairly low level of training. For experienced raters, the original scoring system with five categories is suitable, in terms of reliability, for on-farm welfare assessment.
Position papers on artificial intelligence (AI) ethics are often framed as attempts to work out technical and regulatory strategies for attaining what is commonly called trustworthy AI. In such papers, the technical and regulatory strategies are frequently analyzed in detail, but the concept of trustworthy AI is not. As a result, it remains unclear. This paper lays out a variety of possible interpretations of the concept and concludes that none of them is appropriate. The central problem is that, by framing the ethics of AI in terms of trustworthiness, we reinforce unjustified anthropocentric assumptions that stand in the way of clear analysis. Furthermore, even if we insist on a purely epistemic interpretation of the concept, according to which trustworthiness just means measurable reliability, it turns out that the analysis will, nevertheless, suffer from a subtle form of anthropocentrism. The paper goes on to develop the concept of strange error, which serves both to sharpen the initial diagnosis of the inadequacy of trustworthy AI and to articulate the novel epistemological situation created by the use of AI. The paper concludes with a discussion of how strange error puts pressure on standard practices of assessing moral culpability, particularly in the context of medicine.
Qualitative Behaviour Assessment (QBA) of cattle expression using a fixed rating scale of 20 descriptors is one of the measures of the Welfare Quality® (WQ) assessment protocol for dairy cattle. As for other on-farm measures of welfare, reliability is an important issue, especially if farms are to be certified. This study investigated the repeatability of QBA results across three different observation times during the day (early morning, late morning, early afternoon). For this purpose, 13 observers assessed a total of 30 video clips from ten commercial dairy farms, using visual analogue scales to score the 20 QBA terms. QBA scores for ‘emotional state’ were computed according to the Welfare Quality® protocol (WQ_QBA) and, additionally, a Principal Component Analysis (PCA) was carried out. The latter revealed two main dimensions which may be described as ‘mood’ and ‘activity’, the former corresponding to the ‘emotional state’ score of the WQ protocol. For scores derived from both the WQ protocol and the PCA, mixed model analysis for repeated measures revealed a significant effect of observation time depending on the farm: for three farms out of ten on both the WQ_QBA score and the PCA ‘mood’ dimension, and for eight out of ten farms on the PCA ‘activity’ dimension. These results indicate that observation time potentially affects WQ (and other QBA) outcomes on a proportion of farms. However, given that outcomes for WQ_QBA and PCA ‘mood’ were consistent for the majority of farms, the procedures suggested in the Welfare Quality® protocol may constitute a reasonable compromise between reliability and feasibility. If the QBA is intended to reflect the ‘mean mood’, multiple assessments throughout the day may be required.
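As a sketch of the PCA step described above, the following Python code standardises simulated descriptor scores and extracts two principal components as candidate ‘mood’ and ‘activity’ dimensions. The scores, the 0-125 mm scale bounds and the number of retained components are assumptions for illustration; the Welfare Quality® weighting used to compute the WQ_QBA score is not reproduced.

```python
# Sketch (simulated scores): PCA of 20 QBA visual-analogue-scale descriptors,
# keeping the first two components as candidate 'mood' and 'activity' dimensions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# 13 observers x 30 clips = 390 assessments of 20 descriptors on an assumed 0-125 mm scale
scores = rng.uniform(0, 125, size=(390, 20))

pca = PCA(n_components=2)
components = pca.fit_transform(StandardScaler().fit_transform(scores))
print("variance explained by PC1, PC2:", pca.explained_variance_ratio_)
loadings = pca.components_.T          # 20 descriptors x 2 components
print(loadings[:3])                   # loadings of the first three descriptors
```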
The present study was conducted to validate the Chinese version of the Adult Decision-Making Competence scale. A total of 364 college students were recruited from four universities in China. The results indicate that the Chinese Adult Decision-Making Competence subscales have good internal consistency and that the two-factor structure reported by Bruine de Bruin et al. (2007) was confirmed. Gender differences were found in Resistance to Sunk Cost, and differences in Applying Decision Rules and Consistency in Risk Perception were found between participants with different educational backgrounds. Overall, the Chinese Adult Decision-Making Competence scale is validated for use in China.
Decades of research show that (i) social value orientation (SVO) is related to important behavioral outcomes such as cooperation and charitable giving, and (ii) individuals differ in terms of SVO. A prominent scale to measure SVO is the social value orientation slider measure (SVOSM). Its central premise is that the SVOSM captures a stable trait, but it is unknown how reliable the SVOSM is over repeated measurements more than one week apart. To fill this knowledge gap, we followed a sample of N = 495 over 6 months with monthly SVO measurements. We find that continuous SVO scores are similarly distributed (Anderson-Darling k-sample p = 0.57) and highly correlated (r ≥ 0.66) across waves. The intra-class correlation coefficient of 0.78 attests to high test-retest reliability. Using multilevel modeling and multiple visualizations, we furthermore find that one’s prior SVO score is highly indicative of SVO in future waves, suggesting that the slider measure consistently captures one’s SVO. Our analyses validate the slider measure as a reliable SVO scale.
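A minimal sketch of the two distributional checks mentioned above (the Anderson-Darling k-sample test across waves and wave-to-wave correlations), using simulated panel data in place of the actual N = 495 sample. The distribution parameters are hypothetical, and the multilevel models are not reproduced.

```python
# Sketch (simulated data): Anderson-Darling k-sample test across waves and
# pairwise wave-to-wave Pearson correlations of SVO scores.
import numpy as np
from scipy.stats import anderson_ksamp, pearsonr

rng = np.random.default_rng(2)
n_people, n_waves = 495, 6
trait = rng.normal(25, 15, size=n_people)                            # stable 'true' SVO angle
waves = trait[:, None] + rng.normal(0, 8, size=(n_people, n_waves))  # monthly measurements

ad = anderson_ksamp([waves[:, w] for w in range(n_waves)])
print("Anderson-Darling statistic:", ad.statistic)

for w in range(n_waves - 1):
    r, _ = pearsonr(waves[:, w], waves[:, w + 1])
    print(f"wave {w + 1} vs wave {w + 2}: r = {r:.2f}")
```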
The objective of this study was to investigate the inter-observer and test-retest reliability of different behavioural observations to be used in an on-farm animal welfare monitoring system for veal calves. Twenty-three veal calf farms, varying in size, housing system, feeding regime and age of the calves, were each visited twice by two observers simultaneously. Behavioural tests were conducted in eight pens per farm, measuring the response of calves to: a human entering the barn; a novel object; a passive, unfamiliar person; disturbance in the pen; and an active approach by an unfamiliar and a familiar person. Furthermore, behaviour was recorded 20 min before and 20 min after feeding in eight other pens per farm. For all behavioural tests, inter-observer reliability was very high. Farm effects and test-retest reliabilities were high and significant for all behavioural tests, except for the test measuring response to disturbance in the pen. Although the active approach test with the familiar person was reliable, it was not feasible in practice due to the limited availability of the farmer. Since the active approach test with the unfamiliar person gave similar results, this test was recommended for an on-farm animal welfare monitoring system. For most behavioural elements recorded around feeding, farms differed significantly, and inter-observer and test-retest reliabilities were likewise high and significant. The behavioural tests involving entering the barn, a novel object and an unfamiliar person, and the behavioural observations before and after feeding, were feasible, distinctive and reliable enough to be performed on-farm. These methods are promising tools for monitoring animal welfare in veal calves.
This paper discusses the current state of development of on-farm cattle welfare assessment systems, with special regard to the approach of Welfare Quality® that focuses on animal-related measures. The central criteria of validity, reliability and feasibility are considered with regard to selected welfare measures. All welfare measures incorporated into the Welfare Quality® protocol possess face validity, but for most of them construct or criterion validity, as shown, eg, for lameness, has not been demonstrated. The cases of qualitative behaviour assessment and the measurement of avoidance distance towards humans or social licking are discussed as examples. Reliability issues have often been neglected in the past and require more thorough investigation and discussion in the future, especially with respect to appropriate test statistics and limits of acceptability. Means of improving reliability are the refinement of definitions or recording methods, and training. Consistency of results over time requires further attention, especially if farms are to be certified based on infrequent recordings. Considering feasibility, time constraints are the main concern for assessment systems that focus seriously on animal-based measures; currently they require several hours of on-farm recordings, eg about 6 h for a herd of 60 dairy cows. The Welfare Quality® project has promoted knowledge and discussion about validity, reliability and feasibility issues. Many welfare measures applied in the Welfare Quality® on-farm assessment approach can be regarded as sufficiently valid, reliable and feasible. However, there are still a considerable number of challenges, which should be tackled while using the present assessment system in order to constantly improve it.
In this study, we investigated the robustness of the WelFur welfare assessment system for farmed mink (Neovison vison) to date of assessment in the winter and growth assessment periods. The prevalence of certain measures was hypothesised to increase with date of assessment (too thin, fur-chewing and stereotypic behaviour in the winter period; injuries, diarrhoea and exploratory mink in the growth period). Welfare was assessed on eight Danish mink farms according to the WelFur-Mink protocol. Each farm was assessed once in the nursing period (to be able to calculate WelFur-Mink scores), four times in the growth period and three times in the winter period. WelFur scores were calculated based on the assessments in the three periods: one calculation for each assessment in the winter and growth periods. The odds of fur-chewing increased with date of assessment in the winter period, and the odds of injuries, diarrhoea and exploratory mink increased with date of assessment in the growth period. The odds of too thin mink in the winter period decreased, ie the change was in the opposite direction to that expected. The effect of these changes on the aggregated WelFur scores at the higher levels was limited, but could potentially lead to changes in the overall welfare categorisation of farms if the principle scores were close to a threshold between two categories. A potential way to eliminate the effect of date of assessment could be to develop a correction factor for the measurements that can be expected to change within each assessment period.
A study using a high school and college sample (aged 18–26) was conducted to validate the Slovak version of the Adult Decision-Making Competence (A-DMC) scale. The results were similar to the findings reported by Bruine de Bruin, Parker, and Fischhoff (2007) for the adult population in America. The internal consistency of the component subscales and of the whole measure was confirmed, as was the factor structure. Gender differences were found in two of the six subscales. The results highlight the usefulness of the A-DMC for assessing decision-making competence in the Slovak language, but non-student samples are needed to enhance the generalisability of the findings.
The concept of “passive risk taking”, which refers to risk brought on or magnified by inaction, has recently appeared in the literature on risk taking. Keinan and Bereby-Meyer (2012) developed a scale to measure the personal tendency for passive risk taking (PRT); the scale has criterion validity and high test-retest reliability, and it correlates with reported passive risk taking in everyday life and with the DOSPERT scale. Furthermore, it presents divergent validity from classic risk-taking constructs such as sensation seeking, and convergent validity with procrastination and avoidance. In this paper we propose a validation of the PRT scale in Italian. We performed the linguistic adaptation to Italian via the five steps suggested by Guillemin and colleagues (1993) and Beaton and colleagues (2000); we then administered the derived questionnaire to a sample of 297 adults. Results show that two out of three factors from the original scale were confirmed. However, the third factor, originally composed of six items, was not consistent. We present the scale derived from these results and discuss the differences from the original scale.
This paper focuses on the reliability of the multi-criteria evaluation model included in the Welfare Quality® protocol for growing pigs, which aggregates the animal-based indicators first to criterion level, then to principle level, and finally to an overall welfare score. This assessment was carried out in a practical application study on a sample of 24 farms in Germany. Altogether, 102 protocol assessments were carried out in repeated visits to these farms in order to evaluate the inter-observer and test-retest repeatability of the overall scores calculated by the multi-criteria evaluation system. Reliability was then assessed by the calculation of different reliability and agreement parameters: Spearman rank correlation coefficients (RS), intraclass correlation coefficients (ICC), smallest detectable changes (SDC) and limits of agreement (LoA). Inter-observer repeatability was insufficient for the criteria comfort around resting, absence of injuries, expression of social behaviours, expression of other behaviours, good human-animal relationship and positive emotional state, as well as for the principles good housing and appropriate behaviour. This is probably due mainly to the insufficient repeatability of the underlying indicators revealed in previous studies. Test-retest repeatability was predominantly insufficient. Overall, the present results highlight the importance of reliable indicators at the baseline level. Furthermore, it could be shown that the calculation procedure is partly incorrect and consequently needs correction. This study is therefore an important contribution to the future development of the Welfare Quality® protocols and of animal welfare assessment tools in general.
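The agreement parameters named above can be illustrated on hypothetical test-retest data: the sketch below computes the Spearman rank correlation, the smallest detectable change from the standard error of measurement, and Bland-Altman limits of agreement. The farm scores are simulated, the ICC could be obtained with a standard ANOVA-based formula, and the Welfare Quality® aggregation itself is not reproduced.

```python
# Sketch (hypothetical test-retest farm scores): Spearman rank correlation,
# smallest detectable change (SDC = 1.96 * sqrt(2) * SEM, with SEM taken here as
# SD of differences / sqrt(2)) and Bland-Altman limits of agreement.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
visit1 = rng.uniform(20, 80, size=24)            # 24 farms, overall score at first visit
visit2 = visit1 + rng.normal(0, 10, size=24)     # repeat visit with measurement noise

rs, _ = spearmanr(visit1, visit2)
diff = visit2 - visit1
sem = diff.std(ddof=1) / np.sqrt(2)
sdc = 1.96 * np.sqrt(2) * sem                    # equals 1.96 * SD of the differences
loa = (diff.mean() - 1.96 * diff.std(ddof=1),
       diff.mean() + 1.96 * diff.std(ddof=1))
print(f"RS = {rs:.2f}, SDC = {sdc:.1f}, LoA = ({loa[0]:.1f}, {loa[1]:.1f})")
```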
Consistency over time is a basic requirement for welfare assessment schemes, since results must not depend, for example, on the day the assessment is carried out. This study analysed the consistency of the indicators of the Animal Welfare Indicators (AWIN) protocol for horses (Equus caballus) over time. Given the multi-dimensionality of animal welfare, the AWIN protocol includes a variety of indicators evaluating, eg, the health status or the behaviour of the animals. Fourteen establishments keeping horses in Germany were each visited four times (day 0, day 3, day 42, day 90). For the evaluation of reliability and agreement between the different visits, ie across time, the reference visit on day 0 was compared to the other visits via calculation of Spearman's rank correlation (RS), intra-class correlation (ICC), smallest detectable change (SDC) and limits of agreement (LoA). The Qualitative Behaviour Assessment (QBA) indicator was analysed by Principal Component Analysis (PCA). Most of the indicators demonstrated sufficient consistency over time. Inconsistent indicators included parts of the Horse Grimace Scale, outcomes of behavioural tests and the presence of swollen joints, as well as the indicators hoof neglect, alopecia on the legs and water cleanliness. The QBA was consistent for the period of 42 days, but not for 90 days. Overall, indicators with insufficient consistency over time need to be revised or replaced in future welfare assessment schemes.
The study investigates the psychometric characteristics of the Slovak version of the original and short forms of the Indecisiveness Scale in three samples of university students and one general population sample. Exploratory as well as confirmatory factor analysis confirmed the one-factor structure of the scale, with satisfactory internal consistency and time stability of scores. Criterion validity was examined through relationships with thinking styles, decision-making styles, the Big Five factors, decision outcomes, well-being and perceived stress, as well as through a comparison of the general population sample with a sample with an obsessive-compulsive disorder diagnosis. Subjects who self-reported as undecided in their future intentions regarding migration had higher scores in indecisiveness. Both examined forms of the Slovak version of the Indecisiveness Scale were demonstrated to be reliable and valid instruments for the measurement of indecisiveness, with the short form favored as more appropriately tapping into the core aspect of indecisiveness.
Qualitative Behaviour Assessment (QBA) is part of the Welfare Quality® protocol for dairy cattle, although its inter- and intra-observer reliability has not been reported. This study evaluated the inter- and intra-observer reliability of the QBA for dairy cattle in experienced and inexperienced observers using videos. Eight experienced observers performed the QBA (20 descriptors) twice for 16 video clips (60 s per clip; series 1) showing 4–17 animals. They assessed another 11 video clips showing herds (4 shots of 30 s per clip; series 2). Ten inexperienced observers performed the QBA on both video series once. Inter-observer reliability of experienced observers ranged from slight to moderate (both assessments of series 1) and from low to high (series 2) for descriptors, and from slight to moderate for the QBA score. Inter-observer reliability of inexperienced observers ranged from low to moderate (series 1) and from low to high (series 2) for descriptors, and was moderate (both series) for the QBA score. Intra-observer correlations varied largely per descriptor and observer; they were both negative and positive, and ranged from low to very high. High correlations, however, were not necessarily associated with low paired differences. Values of half of the descriptors and of the QBA score differed between experienced and inexperienced observers. The QBA appears insufficiently reliable as a tool for welfare assessment in dairy cattle.