We developed a method to generate omnidirectional depth maps from corresponding omnidirectional images of cityscapes by learning each pair of an omnidirectional and a depth map, created by computer graphics, using pix2pix. Models trained with different series of images, shot under different site and sky conditions, were applied to street view images to generate depth maps. The validity of the generated depth maps was then evaluated quantitatively and visually. In addition, we conducted experiments to evaluate Google Street View images using multiple participants. We constructed a model that predicts the preference label of these images with and without the generated depth maps using the classification method with deep convolutional neural networks for general rectangular images and omnidirectional images. The results demonstrate the extent to which the generalization performance of the cityscape preference prediction model changes depending on the type of convolutional models and the presence or absence of generated depth maps.