
Understanding convolutional neural networks via discriminant feature analysis

Published online by Cambridge University Press:  11 December 2018

Hao Xu*
Affiliation:
Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, Los Angeles, CA, USA
Yueru Chen
Affiliation:
Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, Los Angeles, CA, USA
Ruiyuan Lin
Affiliation:
Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, Los Angeles, CA, USA
C.-C. Jay Kuo
Affiliation:
Ming Hsieh Department of Electrical Engineering, University of Southern California, 3740 McClintock Avenue, Los Angeles, CA, USA
*Corresponding author: Hao Xu. Email: iamxuhao@gmail.com

Abstract

In this work, the trained features of a convolutional neural network (CNN) at different convolutional layers are analyzed using two quantitative metrics. We first show mathematically that the Gaussian confusion measure (GCM) can be used to identify the discriminative ability of an individual feature. Next, we generalize this idea by introducing another measure, the cluster purity measure (CPM), and use it to analyze the discriminative ability of multiple features jointly. The discriminative ability of trained CNN features is validated by experimental results. Studying CNNs with the GCM and CPM tools offers important insights into their operational mechanism, including the behavior of trained CNN features and the good detection performance on some object classes that were considered difficult in the past. Finally, the trained feature representations of different CNN structures are compared to explain the superiority of deeper networks.

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2018

Fig. 1. Non-GCM-like and GCM-like features are compared in the left and right halves, respectively. The first and third subfigures give the top nine activations for two conv5 filters, and the second and fourth subfigures show their Gaussian confusion plots, which reflect the distribution of each filter's response values (see discussion in Section A). In this example, the GCM-like filter (i.e., filter 142 in conv5 of the fast-RCNN [9] CaffeNet model) is mainly activated by the dog class, and its Gaussian confusion plot shows better separation.

Fig. 2. The red and blue bars correspond to the decision error, ℚkj(t, c), and the null mean response, m0(fkj, c), respectively, of the 256 conv5 filters of the fast-RCNN CaffeNet with 40,000 iterations. The blue bars are plotted bottom-up using the main vertical axis, while the red bars are plotted top-down using the secondary vertical axis. The red dots indicate filters 132, 178, and 205 (from left to right), which have GCM scores of 0.05, 0.9, and 0.17, respectively. These filter responses have low decision errors, so they are selected as GCM-like features. The top nine activations and the Gaussian confusion plot for each of these three cases are presented for validation.
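As a rough illustration of the two quantities plotted in Fig. 2, the following minimal sketch fits one Gaussian to a filter's responses on the target class and another to its responses on all other samples (the null case), then evaluates a threshold-based decision error. The function names, the equal weighting of the two error terms, and the use of the Gaussian tail probability are assumptions made for illustration only; the exact definitions of ℚkj(t, c) and m0(fkj, c) are those given in the body of the paper and may differ.

```python
import math
from statistics import mean, pstdev


def gaussian_tail(x, mu, sigma):
    """P(X > x) for X ~ N(mu, sigma^2), i.e. the Gaussian Q-function."""
    sigma = max(sigma, 1e-12)  # guard against a degenerate (constant) sample
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def decision_error(class_responses, null_responses, t):
    """Hypothetical two-Gaussian decision error at threshold t.

    Returns (error, null_mean). This is an illustrative assumption,
    not the paper's exact formula.
    """
    m1, s1 = mean(class_responses), pstdev(class_responses)
    m0, s0 = mean(null_responses), pstdev(null_responses)
    false_alarm = gaussian_tail(t, m0, s0)   # a null sample exceeds t
    miss = 1.0 - gaussian_tail(t, m1, s1)    # a class sample falls below t
    return 0.5 * (false_alarm + miss), m0


if __name__ == "__main__":
    # Toy responses: the target class fires strongly, everything else weakly.
    err, null_mean = decision_error([9, 10, 11], [0, 1, 2], t=5.5)
    print(f"decision error = {err:.3g}, null mean response = {null_mean}")
```

Under these assumptions, a filter whose two response distributions barely overlap yields a decision error near zero, while a filter that responds identically to the class and to everything else yields an error of 0.5 at the midpoint threshold.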

Fig. 3. The response vectors of filters 188 (top left) and 212 (bottom left) in the 40,000-iteration CaffeNet model are plotted in the 2D space. The blue dots in the plot correspond to response vectors obtained from testing samples of cars, and the red dots correspond to response vectors obtained from other classes. The plot on the right takes the top 200 response vectors that are closest to the point P (shown as the star at the top right). The CPM score in this example is 0.87.
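The procedure described in the Fig. 3 caption can be sketched as a nearest-neighbor purity computation. The sketch below assumes that the CPM score is the fraction of the K response vectors closest to the point P that belong to the target class; this reading is inferred from the caption only. The function name `cluster_purity`, the Euclidean distance, and the way P is supplied by the caller are illustrative assumptions, and the formal definition is the one given in the body of the paper.

```python
import math


def cluster_purity(vectors, labels, target, anchor, k=200):
    """Fraction of the k vectors nearest to `anchor` labeled `target`.

    `vectors` holds joint filter responses (one tuple per sample),
    `anchor` plays the role of the point P. Illustrative assumption,
    not the paper's exact CPM formula.
    """
    if len(vectors) != len(labels):
        raise ValueError("vectors and labels must have the same length")
    # Rank samples by Euclidean distance to the anchor point.
    ranked = sorted(zip(vectors, labels),
                    key=lambda pair: math.dist(pair[0], anchor))
    nearest = ranked[:k]
    if not nearest:
        raise ValueError("no response vectors given")
    hits = sum(1 for _, label in nearest if label == target)
    return hits / len(nearest)


if __name__ == "__main__":
    # Toy 2D responses of two filters; "car" samples sit near the top right.
    vectors = [(9, 9), (8, 9), (9, 8), (1, 1), (0, 1)]
    labels = ["car", "car", "other", "other", "other"]
    print(cluster_purity(vectors, labels, "car", anchor=(10, 10), k=3))
```

Because the vectors may have any dimension, the same function covers the single-filter, two-filter, and three-filter cases by passing tuples of length one, two, or three.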

Fig. 4. The red and blue bars correspond to the decision error, min_c ℚkj(t, c), and the null mean response value, m0(fkj, c), respectively, of the 256 conv5 filters of the fast-RCNN CaffeNet with 40,000 iterations. (Note that these bars are different from those in Fig. 2, where ℚkj(t, c) and m0(fkj, c) were plotted for c = “person”.) The red bars are plotted top-down using the secondary vertical axis, while the blue bars are plotted bottom-up using the main vertical axis. The red dots highlight filters 7, 132, and 246 from left to right. These filters have low decision errors and are selected as GCM-like features. The blue dots highlight filters 35, 45, and 51. These filters have higher decision errors, and their responses are selected as non-GCM-like features. The top nine activations and their deconv results are presented to validate whether each is a GCM-like or non-GCM-like feature.

Fig. 5. From left to right: the top nine activations and the 20-class Gaussian confusion plots of filters 188, 212, and 172 in conv5 of the fast-RCNN CaffeNet with 40,000 iterations. The CPM score is 0.54 for filter 188 alone, 0.86 for filters 188 and 212 together, and 0.90 for the group of all three filters. The filters correspond to the tail, window, and head of the car, respectively.

Fig. 6. From left to right: the top nine activations and the 20-class Gaussian confusion plots of filters 16, 180, and 220 in conv5 of the fast-RCNN CaffeNet model with 40,000 iterations. The CPM score is 0.36. These filters correspond to dishes, cups, and an elliptical contour, respectively. Filters 16 and 180 are not related to a dining table itself but rather to objects placed on top of it.

Fig. 7. From left to right: the top nine activations and the 20-class Gaussian confusion plots of filters 56, 164, and 206 in conv5 of the fast-RCNN CaffeNet model with 40,000 iterations. The CPM score is 0.61. These filters are all dedicated to detecting blue or gray colors in the input, which are the typical background colors associated with “airplanes”.

Fig. 8. From left to right: (1) the top nine activations of filter 216 in the conv5 layer of the 40,000-iteration CaffeNet model, and (2) the Gaussian confusion plots of the bicycle versus others, the motorbike versus others, and the car versus others.

Table 1. The CPM scores of the top 5 features in the CaffeNet and the VGG_M_1024 for each object class.

Fig. 9. From left to right: (1) the top nine activations of filter 141 in the conv5 layer of the 40,000-iteration CaffeNet model, and (2) the Gaussian confusion plots of the cat versus others, the horse versus others, and the dog versus others.

Fig. 10. From left to right: (1) the top nine activations of filter 214 in the conv5 layer of the 40,000-iteration CaffeNet model, and (2) the Gaussian confusion plot of all 20 object classes.

Fig. 11. From left to right: (1) the top nine activations of filter 35 in the conv5 layer of the 40,000-iteration CaffeNet model, and (2) the Gaussian confusion plot of all 20 object classes.

Fig. 12. The top three filters in the CaffeNet trained to detect the “bottle” object. Their corresponding Gaussian confusion plots are shown under the top nine activations.

Fig. 13. The top three filters in the VGG_M_1024 (right) trained to detect the “bottle” object. Their corresponding Gaussian confusion plots are shown under the top nine activations.

Table 2. The functionality summary of conv5 filters trained by the CaffeNet and the VGG_M_1024.