A central question in foreign language (LX) learning is how vocabulary acquisition is affected by using image versus orthographic referents. According to the picture superiority effect (PSE) and bilingual/dual coding theory (b/DCT), images should lead to better novel word encoding and retrieval. We tested this prediction using behavioral and event-related potential (ERP) measures. Thirty Polish native speakers learned 40 LX (artificial language) words using either image or L1 orthographic referents. After 24 hours, participants were tested using a translational priming paradigm in congruent and incongruent training-testing modalities. Behavioral results showed higher accuracy and faster responses for LX words learned and tested with images, in line with the PSE and b/DCT. ERP results revealed smaller Late Positive Complex (LPC) amplitudes for words preceded by image primes than for words preceded by lexical primes, likely reflecting less cognitively demanding lexical retrieval. These results offer converging evidence that visual referents are a more salient modality for LX learning.