We investigate the synchronization of speech and co-speech actions (i.e., manual game moves) in a dyadic game interaction across different levels of information structure and mutual visibility. We analyze cross-modal synchronization as the temporal distance between co-speech actions and the corresponding (1) pitch accent peaks and (2) word onsets. For (1), we find no effect of mutual visibility on cross-modal synchronization. However, pitch accent peaks and co-speech actions are more tightly aligned when game moves are prosodically prominent due to information-structural needs. This result is in line with a view of tightly coupled processing across modalities, in which prominence-lending modifications to the prosodic structure attract the corresponding manual actions, which are then realized in a manner similar to ‘beat’ gestures. For (2), we do find an effect of mutual visibility: when interlocutors can see each other, co-speech actions are produced earlier and are more tightly aligned with word onsets. This result is in line with a view in which co-speech actions serve as communicative affordances that may support early disambiguation of the message, similar to ‘representational’ co-speech gestures.