The classical analysis model for agricultural field trials is based on the principles of experimental design – randomization, replication and blocking – and it assumes independent residual effects. Accounting for any existing spatial correlation as an add-on component may be beneficial, but it requires selection of a suitable spatial model and modification of the classical tests of treatment contrasts. Using a sugar beet trial laid out in complete blocks for illustration, it is shown that tests obtained with different modifications yield diverging results. Simulations were performed to assess whether the different test modifications lead to valid statistical inferences. For the spherical, power and Gaussian models, each with six different values of the range parameter and without a nugget effect, the suitability of the following modifications was studied: a generalization of the method of Satterthwaite (1941), the method of Kenward and Roger (1997), and the first-order corrected method described by Kenward and Roger (2009). A second-order method described by Kenward and Roger (2009) is also discussed, and detailed results are provided as Supplemental Material (available at: http://journals.cambridge.org/AGS). Simulations were done for experiments with 10 or 30 treatments in complete and incomplete block designs. Model selection was performed using the corrected Akaike information criterion and likelihood-ratio tests. When simulation and analysis models were identical, at least one of the modifications for the t-test guaranteed control of the nominal Type I error rate in most cases. With the first-order method of Kenward and Roger, and considering the best-fitting models for a given simulation setting, control of the t-test Type I error rate was poor for 10 treatments but on average very good for 30 treatments. Results were not satisfactory for the F-test.
The more pronounced the spatial correlation, the more substantial was the gain in power compared with the classical analysis. For experiments with 20 or more treatments, the recommendation is to select the best-fitting model and then use the first-order method for t-tests. For F-tests, a randomization-based model with independent error effects should be used.
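
To make the three spatial models concrete, the following minimal sketch shows one common parameterization of the spherical, power and Gaussian correlation functions without a nugget effect. The function names, the plot coordinates and the exact parameterizations are assumptions chosen for illustration; the paper's own software and parameter definitions may differ.

```python
import math

def spherical_corr(d, range_):
    """Spherical model: correlation falls to exactly zero at distance >= range_."""
    if d >= range_:
        return 0.0
    h = d / range_
    return 1.0 - 1.5 * h + 0.5 * h ** 3

def power_corr(d, rho):
    """Power model: correlation rho**d, with 0 <= rho < 1 assumed."""
    return rho ** d

def gaussian_corr(d, range_):
    """Gaussian model: correlation exp(-(d / range_)**2)."""
    return math.exp(-((d / range_) ** 2))

def corr_matrix(coords, corr_fn, param):
    """Residual correlation matrix for plots at the given (row, column) coordinates.

    Distances are Euclidean; no nugget effect is added, so the diagonal is 1.
    """
    n = len(coords)
    return [
        [corr_fn(math.dist(coords[i], coords[j]), param) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical layout: four plots in a single row, one unit apart.
plots = [(0, 0), (0, 1), (0, 2), (0, 3)]
R_sph = corr_matrix(plots, spherical_corr, 3.0)
R_pow = corr_matrix(plots, power_corr, 0.5)
R_gau = corr_matrix(plots, gaussian_corr, 2.0)
```

In all three functions a larger range parameter (or a larger `rho` in the power model) gives stronger correlation between neighbouring plots, which is the situation in which the abstract reports the largest gain in power over the classical analysis.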