Regression discontinuity (RD) designs enable researchers to estimate causal effects from observational data. These effects are identified at the cutoff that separates observations that receive the treatment from those that do not. One challenge in applying RD in practice is that data may be sparse in the immediate vicinity of the cutoff. Expanding the analysis to observations farther from the cutoff can improve the statistical precision with which treatment effects are estimated, but including more distant observations also increases the risk of bias. Model specification is another source of uncertainty: as the bandwidth around the cutoff expands, linear approximations may break down, requiring more flexible functional forms. Using data from a large randomized experiment conducted by Gerber, Green, and Larimer (2008), this study attempts to recover an experimental benchmark using RD and assesses the uncertainty introduced by bandwidth and model selection. More generally, we demonstrate how experimental benchmarks can be used to gauge and improve the reliability of RD analyses.
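The bandwidth tradeoff described above can be sketched with a small simulation. This is a hypothetical illustration, not the paper's analysis: the data-generating process, the true effect of 2.0, and the `rd_estimate` helper are all invented for exposition. A local linear estimator fit separately on each side of the cutoff is roughly unbiased at narrow bandwidths but noisy; widening the bandwidth over a nonlinear outcome curve trades that noise for bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: running variable x, cutoff at 0,
# true treatment effect of 2.0, and a cubic term that makes a
# linear fit increasingly misspecified at wide bandwidths.
n = 5000
x = rng.uniform(-1.0, 1.0, n)
treated = (x >= 0).astype(float)
y = 2.0 * treated + 1.5 * x + 0.8 * x**3 + rng.normal(0.0, 1.0, n)

def rd_estimate(x, y, bandwidth, cutoff=0.0):
    """Local linear RD: fit a separate line on each side of the cutoff
    using only observations within the bandwidth; the estimate is the
    gap between the two intercepts at the cutoff."""
    mask = np.abs(x - cutoff) <= bandwidth
    xs, ys = x[mask], y[mask]
    right = xs >= cutoff
    # np.polyfit(degree 1) returns [slope, intercept]; the intercept
    # is the predicted outcome at the cutoff from that side.
    _, a_right = np.polyfit(xs[right] - cutoff, ys[right], 1)
    _, a_left = np.polyfit(xs[~right] - cutoff, ys[~right], 1)
    return a_right - a_left

for h in (0.05, 0.25, 1.0):
    print(f"bandwidth {h:.2f}: estimated effect {rd_estimate(x, y, h):+.3f}")
```

At the narrowest bandwidth the estimate is unbiased but rests on few observations; at the widest, every observation is used but the linear approximation to the cubic curve distorts the intercepts, pulling the estimate away from the true 2.0.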