Three-dimensional mapping-aided (3DMA) Global Navigation Satellite System (GNSS) positioning improves the positioning in urban canyons for non-precision GNSS receivers. However, the 3DMA GNSS algorithms often produce a multimodal position solution, and simply taking the average of these modes reduces accuracy. A further problem, named ‘solution shifting’, is the effect of large numbers of low-scoring candidates shifting the overall position solution away from high-scoring regions. This study uses a clustering method to separate the different modes and exclude low-scoring regions from the position solution. Factor graph optimisation (FGO) is then used to integrate clustered 3DMA GNSS position and GNSS Doppler measurements or estimated velocity over multiple epochs. Positioning performance is assessed using data collected in London. The results show that the clustering method can successfully mitigate the multimodal effect, and integrating the FGO can mitigate the occurrence of multimodality and solution shifting. Static experiments in London achieve an RMSE of approximately 10 m for FGO 3DMA GNSS with clustering and 11 m without clustering.