
Learning priors for adversarial autoencoders

Published online by Cambridge University Press:  20 January 2020

Hui-Po Wang
Affiliation:
National Chiao Tung University, 1001 Ta-Hsueh Rd., Hsinchu 30010, Taiwan
Wen-Hsiao Peng*
Affiliation:
National Chiao Tung University, 1001 Ta-Hsueh Rd., Hsinchu 30010, Taiwan
Wei-Jan Ko
Affiliation:
National Chiao Tung University, 1001 Ta-Hsueh Rd., Hsinchu 30010, Taiwan
* Corresponding author: Wen-Hsiao Peng. E-mail: wpeng@cs.nctu.edu.tw

Abstract

Most deep latent factor models adopt simple priors for simplicity and tractability, or for lack of a better alternative. Recent studies show that the choice of prior may have a profound effect on the expressiveness of the model, especially when its generative network has limited capacity. In this paper, we propose to learn a proper prior from data for adversarial autoencoders (AAEs). We introduce the notion of code generators, which transform manually selected simple priors into priors that better characterize the data distribution. Experimental results show that the proposed model generates images of better quality and learns better-disentangled representations than AAEs in both supervised and unsupervised settings. Lastly, we demonstrate its ability to perform cross-domain translation in a text-to-image synthesis task.
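The code-generator idea in the abstract can be illustrated in a few lines. The following is a minimal NumPy sketch, not the paper's implementation: a small MLP (all layer sizes and parameter names here are illustrative assumptions) maps samples from a manually selected simple prior, N(0, I), into codes from a learned, more expressive prior; in the full model, the AAE's code discriminator would match these transformed codes against the encoder output.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden, out_dim):
    # Illustrative two-layer MLP parameters (hypothetical sizes, not the paper's).
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def code_generator(z, params):
    # Transform samples z ~ N(0, I) from the simple prior into codes
    # drawn from a learned, more expressive prior distribution.
    h = np.tanh(z @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

# Draw from the simple prior and map it through the code generator;
# the mapped codes play the role of the learned prior that the AAE's
# code discriminator tries to match to the encoder output.
params = init_mlp(in_dim=8, hidden=64, out_dim=8)
z = rng.standard_normal((16, 8))
codes = code_generator(z, params)
print(codes.shape)  # (16, 8)
```

In the actual model, the code generator's weights would be trained adversarially during the prior improvement phase rather than left at their random initialization as in this sketch.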

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2020
Fig. 1. The relations of our work with prior art.

Fig. 2. The architecture of AAE without (a) and with (b) the code generator.

Fig. 3. Alternation of training phases: (a) the prior improvement phase and (b) the AAE phase. The shaded building blocks indicate the blocks to be updated.

Fig. 4. Supervised learning architecture with the code generator.

Fig. 5. Unsupervised learning architecture with the code generator.

Algorithm 1. Training procedure.

Table 1. Comparison with AAE, VAE, and VampPrior on CIFAR-10.

Fig. 6. Sample images produced by (a) AAE [3], (b) VAE [7], (c) VampPrior [11], and (d) the proposed model.

Table 2. Comparison with other state-of-the-art generative models on CIFAR-10.

Fig. 7. Subjective quality evaluation of generated images produced by state-of-the-art generative models: (a) BEGAN [15], (b) DCGAN [16], (c) LSGAN [17], (d) WGAN-GP [18], and (e) the proposed model.

Table 3. Inception score of generated images with the models trained on CIFAR-10: A, B, and C denote respectively the design choices of enabling the learned prior, the perceptual loss, and the updating of the decoder in both phases.

Fig. 8. Images generated by our model and AAE trained on MNIST (upper) and CIFAR-10 (lower): (a) our model + 8-D latent code, (b) AAE [3] + 8-D latent code, (c) our model + 64-D latent code, and (d) AAE [3] + 64-D latent code.

Fig. 9. Images generated by our model and AAE trained on MNIST (upper) and CIFAR-10 (lower). In this experiment, the latent code dimension is increased significantly to 100-D and 2000-D for MNIST and CIFAR-10, respectively. For AAE, the re-parameterization trick is applied to the output of the encoder as suggested in [3]. (a) Our model + 100-D latent code, (b) AAE [3] + 100-D latent code, (c) our model + 2000-D latent code, and (d) AAE [3] + 2000-D latent code.

Fig. 10. Images generated by the proposed model (a)(c)(e) and AAE [3] (b)(d)(f) trained on the MNIST, SVHN, and CIFAR-10 datasets in the supervised setting. Each column of images has the same label/class information but varied Gaussian noise; each row has the same Gaussian noise but varied label/class variables.

Fig. 11. Visualization of the code generator output in the supervised setting. (a) MNIST, (b) SVHN, and (c) CIFAR-10.

Fig. 12. Images generated by the proposed model (a)(c)(e) and AAE (b)(d)(f) trained on the MNIST, SVHN, and CIFAR-10 datasets in the unsupervised setting. Each column of images has the same label/class information but varied Gaussian noise; each row has the same Gaussian noise but varied label/class variables.

Fig. 13. Visualization of the encoder output versus the code generator output in the unsupervised setting. (a) Encoder (MNIST), (b) Encoder (SVHN), (c) Encoder (CIFAR-10), (d) Code generator (MNIST), (e) Code generator (SVHN), and (f) Code generator (CIFAR-10).

Fig. 14. Generated images from text descriptions. (a) "This vibrant flower features lush red petals and a similar colored pistil and stamen." (b) "This flower has white and crumpled petals with yellow stamen."

Fig. 15. Generated images in accordance with the varying color attribute in the text description "The flower is pink in color and has petals that are rounded in shape and ruffled." From left to right, the color attribute is set to pink, red, yellow, orange, purple, blue, white, green, and black, respectively. Note that there is no green or black flower in the dataset.

Table A.1. Implementation details of the encoder and decoder networks.

Table A.2. Implementation details of the code generator networks.

Table A.3. Implementation details of the image and code discriminators.