We use an experiment to evaluate the effects of participatory management on firm performance. Participants are randomly assigned roles as managers or workers in firms that generate output via real effort. To identify the causal effect of participation on effort, workers are exogenously assigned to one of two treatments: one in which the manager implements a compensation scheme unilaterally, and another in which the manager cedes control over compensation to the workers, who vote to implement a scheme. We find that output is between seven and twelve percentage points higher in participatory firms.
Accurate diagnosis of bipolar disorder (BPD) is difficult in clinical practice, with an average delay between symptom onset and diagnosis of about 7 years. A depressive episode often precedes the first manic episode, making it difficult to distinguish BPD from unipolar major depressive disorder (MDD).
Aims
We use genome-wide association analyses (GWAS) to identify differential genetic factors and to develop predictors based on polygenic risk scores (PRS) that may aid early differential diagnosis.
Method
Based on individual genotypes from case–control cohorts of BPD and MDD shared through the Psychiatric Genomics Consortium, we compile case–case–control cohorts, applying a careful quality control procedure. In a resulting cohort of 51 149 individuals (15 532 BPD patients, 12 920 MDD patients and 22 697 controls), we perform a variety of GWAS and PRS analyses.
Results
Although our GWAS is not well powered to identify genome-wide significant loci, we find significant chip heritability and demonstrate the ability of the resulting PRS to distinguish BPD from MDD, including BPD cases with depressive onset (BPD-D). We replicate our PRS findings in an independent Danish cohort (iPSYCH 2015, N = 25 966). We observe strong genetic correlation between our case–case GWAS and that of case–control BPD.
Conclusions
We find that MDD and BPD, including BPD-D, are genetically distinct. Our findings support the view that controls, MDD patients and BPD patients primarily lie on a continuum of genetic risk. Future studies with larger and richer samples will likely yield a better understanding of these findings and enable the development of better genetic predictors distinguishing BPD and, importantly, BPD-D from MDD.
In this chapter we discuss a few cases of scientific misconduct that turned out to be easy to spot, given some basic knowledge of statistics. We learn that it is always important to begin with a close look at the data that you are supposed to analyze. What is the source of the data, how were they collected, and who collected them and for what purpose? Next, we discuss various specific cases where the misconduct was obvious. We see that it is not difficult to create tables with fake regression outcomes, and that it is also not difficult to generate artificial data that match those tables. Sometimes results are too good to be true, and patterns in outcomes can be unbelievable. We also see that it is not difficult to make the data fit a model better. These are of course all unethical approaches and should not be replicated, but it is good to know that such things can happen and how.
The first chapter contains an overview of what is accepted as good practice. We review several general ethical guidelines. These can be used to appreciate good research and to indicate where and how research does not adhere to them. Good practice is “what we all say we (should) adhere to.” In the second part of this chapter, the focus shifts to specific ethical guidelines for statistical analysis. Of course, there is overlap with the more general guidelines, but a few are specifically relevant to statistics: examples are misinterpreting p-values and malpractice such as p-hacking and HARKing.
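As a brief, hypothetical simulation of why p-hacking is malpractice (the numbers of tests and observations below are arbitrary choices for this sketch, not examples from the chapter): screening many unrelated variables against the same outcome and keeping only the “significant” correlation all but guarantees a spurious finding.

```python
# Hypothetical simulation: testing many noise variables against the same outcome and
# reporting only the smallest p-value (a form of p-hacking) inflates false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.normal(size=100)                       # outcome: pure noise
pvals = [stats.pearsonr(rng.normal(size=100), y)[1] for _ in range(20)]

print("smallest of 20 p-values:", round(min(pvals), 3))
print("probability at least one p < 0.05 under the null:", round(1 - 0.95**20, 2))
```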
In practice it often happens that forecasts from econometric models are manually adjusted. There can be good reasons for this: foreseeable structural changes can be incorporated, and recent changes in the data, in measurement, or in the relevance of variables can be addressed. A main issue with manual adjustment is that the end user of a forecast needs to know why someone modified the forecast and how it was changed, so adjustments should be documented. We discuss an example showing that one may also need to know specific details of econometric models, here growth curves, to understand that even a seemingly harmless adjustment, such as fixing the point of inflection a priori, can deliver almost any result you would like. In this chapter we discuss why people manually adjust forecasts, what the optimal situation for adjustment looks like, and the experience with manual adjustment so far. A plea is made to consider model-based adjustment of model forecasts, thus allowing for a clear understanding of how and why an adjustment was made.
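To make the growth-curve point concrete, here is a minimal, hypothetical sketch (the simulated data, parameter values, and use of scipy's curve_fit are assumptions for illustration, not the chapter's own example): fixing the inflection point of a logistic curve a priori and re-fitting the remaining parameters from early observations can imply almost any saturation level.

```python
# Hypothetical sketch: a priori fixing the inflection point t0 of a logistic growth
# curve and re-fitting the remaining parameters from early observations can imply
# widely different saturation levels L. All numbers here are simulated.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
t = np.arange(1.0, 16.0)                          # 15 early, pre-inflection observations
y = 100 / (1 + np.exp(-0.4 * (t - 20))) + rng.normal(0, 0.5, t.size)

def logistic(t, L, k, t0):
    return L / (1 + np.exp(-k * (t - t0)))

for t0_fixed in (18.0, 22.0, 28.0):               # three "harmless" choices of the inflection point
    (L_hat, k_hat), _ = curve_fit(lambda t, L, k: logistic(t, L, k, t0_fixed),
                                  t, y, p0=(50.0, 0.3), maxfev=10000)
    print(f"t0 fixed at {t0_fixed:4.1f} -> implied saturation level L = {L_hat:8.1f}")
```

The fitted curves all track the early data closely, yet the implied long-run level differs by an order of magnitude across the three choices of the inflection point.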
In this chapter we move towards more subtle aspects of econometric analysis, where it is not immediately obvious from the numbers or the graphs that something is wrong. We see that so-called influential observations may not be visible in graphs but become apparent after fitting a model. This is one of the key takeaways from this chapter: we should not throw away data prior to econometric analysis. Instead, we should incorporate all observations in our models and, based on specific diagnostic measures, decide which observations are harmful.
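A small illustration of this idea, with made-up data and statsmodels' influence diagnostics (an assumed tool choice, not one prescribed by the chapter): the flagged point looks unremarkable in each individual scatter plot, yet Cook's distance singles it out once a model has been fitted.

```python
# Hypothetical illustration: an observation need not look odd in a scatter plot of
# y against each regressor, yet model-based diagnostics such as Cook's distance can
# flag it as influential. Variable names and values are made up for this sketch.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.normal(0, 1, 50)
x2 = 0.8 * x1 + rng.normal(0, 0.3, 50)     # x2 is correlated with x1
y = 1 + 2 * x1 - 1 * x2 + rng.normal(0, 1, 50)

# One observation is shifted in the (x1, x2) space only: each marginal scatter plot
# still looks ordinary, but the point breaks the correlation pattern between regressors.
x2[0] = x1[0] - 2.5

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()
cooks_d = res.get_influence().cooks_distance[0]
print("Most influential observation:", int(np.argmax(cooks_d)),
      "with Cook's distance", round(cooks_d.max(), 2))
```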
Econometricians develop and use methods and techniques to model economic behavior, to create forecasts, to evaluate policy, and to develop scenarios. Often, this work ends up as advice. The advice can relate to a prediction for the future or for another sector or country, offer a judgment on whether a policy measure was successful, or suggest a possible range of futures. Econometricians (must) make choices that can often only be understood by fellow econometricians. A key claim in this book is that it is important to be clear about those choices. This introductory chapter briefly describes the contents of all following chapters.
This chapter deals with features of data that suggest a certain model or method, but where this suggestion is erroneous. We highlight a few cases in which an econometrician could be steered in the wrong direction, and at the same time we show how this can be prevented. These situations arise when there is no strong prior information on how the model should be specified; the data are then used to guide model construction, and this guidance can point in an inappropriate direction. We review a few empirical cases where some data features obscure a potentially proper view of the data and may suggest inappropriate models. We discuss spurious cycles and the impact of additive outliers on detecting ARCH and nonlinearity. We also focus on a time series that may exhibit recessions and expansions, which can lead you to (wrongly) interpret the recession observations as outliers. Finally, we deal with structural breaks, trends, and unit roots, and see how data with these features can look alike.
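To illustrate the ARCH point with a hypothetical simulation (the data, the short patch of outliers, and the use of statsmodels' ARCH-LM test are assumptions made for this sketch): a couple of adjacent additive outliers in otherwise homoskedastic noise can make the test signal ARCH where there is none.

```python
# Hypothetical simulation: i.i.d. noise has no ARCH by construction, but a short patch
# of additive outliers creates clustering in the squared series, and the ARCH-LM test
# then tends to reject homoskedasticity.
import numpy as np
from statsmodels.stats.diagnostic import het_arch

rng = np.random.default_rng(2)
e = rng.normal(0, 1, 500)                      # homoskedastic white noise

print("clean series  : ARCH-LM p-value =", round(het_arch(e, nlags=4)[1], 3))

e_ao = e.copy()
e_ao[250:252] = 8.0                            # two adjacent additive outliers
print("with outliers : ARCH-LM p-value =", round(het_arch(e_ao, nlags=4)[1], 3))
```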
This last chapter summarizes most of the material in this book in a range of concluding statements. It provides a summary of the lessons learned. These lessons can be viewed as guidelines for research practice.
We first discuss a phenomenon called data mining. This can involve running multiple tests on which variables or correlations are relevant; if used improperly, data mining may be associated with scientific misconduct. Next, we discuss one way to arrive at a single final model, involving stepwise methods, and we see that various stepwise methods lead to different final models. We then see that various configurations in test situations, here illustrated for testing for cointegration, lead to different outcomes. It may be possible to see which configurations make the most sense and can be used for empirical analysis. However, we suggest that it is better to keep various models and somehow combine inferences. This is illustrated by an analysis of the losses in airline revenues in the United States owing to 9/11. We see that out of four different models, three estimate a similar loss, while the fourth suggests only 10 percent of that figure. We argue that it is better to maintain various models, that is, models that withstand various diagnostic tests, for inference and for forecasting, and to combine what can be learned from them.
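A hypothetical sketch of how two stepwise routes can end in different places (the simulated data, the BIC criterion, and the simple forward/backward implementations below are all assumptions for illustration, not the chapter's own example): forward selection, which adds one regressor at a time, can miss a pair of nearly collinear regressors that only matter jointly, while backward elimination, which starts from the full model, typically retains them.

```python
# Hypothetical illustration (simulated data, BIC as selection criterion): forward
# selection and backward elimination can end in different final models when two
# nearly collinear regressors matter only through their difference.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, k = 500, 5
X = rng.normal(size=(n, k))
X[:, 4] = X[:, 3] + rng.normal(0, 0.05, n)             # regressors 3 and 4 nearly collinear
y = 20.0 * (X[:, 3] - X[:, 4]) + rng.normal(0, 1, n)   # only their difference matters

def bic(cols):
    Z = sm.add_constant(X[:, sorted(cols)]) if cols else np.ones((n, 1))
    return sm.OLS(y, Z).fit().bic

def forward():
    chosen = set()
    while len(chosen) < k:
        cand, j = min((bic(chosen | {j}), j) for j in range(k) if j not in chosen)
        if cand >= bic(chosen):
            break
        chosen.add(j)
    return chosen

def backward():
    chosen = set(range(k))
    while chosen:
        cand, j = min((bic(chosen - {j}), j) for j in chosen)
        if cand >= bic(chosen):
            break
        chosen.discard(j)
    return chosen

print("forward selection keeps   :", sorted(forward()))
print("backward elimination keeps:", sorted(backward()))
```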
In practice it may happen that a first-try econometric model is not appropriate because it violates one or more of the key assumptions needed to obtain valid results. If there is something wrong with the variables, such as measurement error or strong collinearity, we may be better off modifying the estimation method or changing the model. In the present chapter we deal with endogeneity, which can, for example, be caused by measurement error, and which implies that one or more regressors are correlated with the unknown error term. This is of course not immediately visible, because the errors are not known beforehand and are estimated jointly with the unknown parameters. Endogeneity can thus arise when a regressor is measured with error and, as we see, when the data are aggregated at too low a frequency. Another issue is multicollinearity, which makes it difficult to disentangle (the statistical significance of) the separate effects; this certainly holds for levels and squares of the same variable. Finally, we deal with the interpretation of model outcomes.
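As a hypothetical numerical sketch of the measurement-error case (the names, numbers, and the choice of a second noisy measurement as instrument are illustrative assumptions): OLS on a mismeasured regressor is biased toward zero, while a simple instrumental-variables estimate recovers the true slope.

```python
# Hypothetical simulation: measurement error makes a regressor endogenous, biasing OLS
# toward zero; an instrument (a second, independent noisy measurement) recovers the slope.
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x_true = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x_true + rng.normal(0, 1, n)   # true slope is 2

x_obs = x_true + rng.normal(0, 1, n)           # regressor observed with error
z = x_true + rng.normal(0, 1, n)               # instrument: another noisy measurement

ols_slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, x_obs)[0, 1]
print(f"OLS slope (attenuated): {ols_slope:.2f}")   # roughly 2 * 1/(1 + 1) = 1
print(f"IV slope              : {iv_slope:.2f}")    # close to 2
```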
This chapter opens with some quotes and insights on megaprojects. We then turn to the construction and use of prediction intervals in a time series context. We see that, depending on the choice of the number of unit roots (stochastic trends) or the sample size (when does the sample start?), we can compute a wide range of prediction intervals. Next, we see that those trends, and breaks in levels and in trends, can yield a wide variety of forecasts. Again, we reiterate that maintaining a variety of models and outcomes is useful, and that an equal-weighted combination of results can be most appropriate; indeed, any specific choice leads to a different outcome. Finally, we discuss, for a simple first-order autoregression, how you can see what the limits to predictability are. We see that these limits are closer than we may think at the outset.
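A small worked example of the last point (the AR(1) parameter values below are assumptions chosen for illustration): the h-step forecast-error variance of an AR(1) approaches the unconditional variance within a handful of steps, which is one way to see how close the limits to predictability are.

```python
# Hedged sketch with assumed parameter values: for an AR(1) process y_t = phi*y_{t-1} + e_t,
# the h-step forecast-error variance sigma^2 * (1 - phi^(2h)) / (1 - phi^2) converges
# quickly to the unconditional variance, so beyond a short horizon the model forecast is
# barely more informative than the unconditional mean.
phi, sigma2 = 0.7, 1.0
unconditional = sigma2 / (1 - phi**2)
for h in (1, 2, 4, 8, 12):
    fvar = sigma2 * (1 - phi**(2 * h)) / (1 - phi**2)
    print(f"h={h:2d}: forecast variance = {fvar:.3f} "
          f"({100 * fvar / unconditional:.0f}% of unconditional variance)")
```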
This chapter uses a range of quotes and findings from the internet and the literature. The key premises of this chapter, illustrated with examples, are as follows. First, Big Data requires the use of algorithms. Second, algorithms can create misleading information. Third, algorithms can lead to destructive outcomes. But we should not forget that humans program algorithms. Big Data comes with algorithms that run many, often involved, computations. We cannot take in all these data ourselves, so we need the help of algorithms to make the computations for us. We might label these algorithms Artificial Intelligence, but this might suggest that they can do things on their own. They can run massive computations, but they need to be fed with data. And this feeding is usually done by us, by humans, and we also choose the algorithms to be used.