By FAIZAN SHAIKH | Compiled by VK | Source: Analytics Vidhya

Introduction
One of the most debated topics in deep learning is how to interpret and understand a trained model, especially in high-risk industries such as healthcare. The term "black box" is often attached to deep learning algorithms, and if we can't explain how a model works, how can we trust its results? This is a reasonable question.
Take a deep learning model trained for cancer detection as an example. The model tells you that it is 99% sure it has detected cancer, but it does not tell you why or how it made that decision.
Did the model find an important clue in the MRI scan, or did it simply mistake a smudge on the scan for a tumor? For the patient this is a matter of life and death, and the consequences of a wrong call are severe.
In this article, we will explore how to visualize convolutional neural networks (CNNs), the deep learning architecture behind most state-of-the-art image-based applications. We'll learn why visualizing CNN models matters and look at the methods for doing so. We will also work through a use case that will help you better understand the concepts.
Table of contents
- The importance of CNN model visualization
- Visualization methods
  - Basic methods
    - Plotting the model architecture
    - Visualizing filters
  - Activation-based methods
    - Maximal activation
    - Image occlusion
  - Gradient-based methods
    - Saliency maps
    - Gradient-based class activation maps
The importance of CNN model visualization
As we saw in the cancer detection example above, it is crucial that we know what our model is doing and how it makes its predictions. In general, the reasons listed below are the most important ones a deep learning practitioner should keep in mind:
- Understanding how the model works
- Hyperparameter tuning
- Finding out where the model fails and being able to fix it
- Explaining decisions to consumers / end users or business executives
Let's look at an example in which visualizing a neural network model helped uncover the model's bad behavior and improve its performance (the example comes from: http://intelligence.org/files/AIPosNegFactor.pdf).
Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged enemy tanks. The researchers trained a neural network on 50 photos of tanks camouflaged by trees and 50 photos of trees without tanks. Using standard supervised learning techniques, they trained the network until its weights fit the training set correctly: outputting "yes" for the 50 photos of camouflaged tanks and "no" for the 50 photos of trees.
This does not ensure, or even imply, that new examples will be classified correctly. The neural network may have "learned" 100 special cases that do not generalize to any new problem. Wisely, the researchers had originally taken 200 photos, 100 of tanks and 100 of trees, and had used only 50 of each in the training set. They ran the neural network on the remaining 100 photos, and without further training it classified all of them correctly. Success confirmed! The researchers handed the work over to the Pentagon, which soon handed it back, complaining that in its own tests the neural network did no better than chance at discriminating photos.
It turned out that in the researchers' dataset, photos of camouflaged tanks had been taken on cloudy days, while photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, rather than camouflaged tanks from empty forest.
Visualization methods for CNN models
Broadly, the visualization methods for CNN models can be divided into three parts based on how they work internally:
- Basic methods - simple ways to show us the overall architecture of a trained model
- Activation-based methods - in these methods, we decipher the activations of a single neuron or a group of neurons to understand what they are doing
- Gradient-based methods - these methods manipulate the gradients formed by the forward and backward passes when training a model
We will cover them in more detail in the sections below. Here, we will use Keras as our library for building deep learning models and keras-vis for visualizing them. Before proceeding, make sure you have both installed on your system.
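If you don't have them yet, both are available from PyPI; a typical install looks like the following (the package names here are my assumption of the published ones):

```
pip install keras keras-vis
```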
Note: This article uses the dataset from the "Identify the Digits" contest. To run the code below, you must download the dataset. Also, perform the preparation steps given at the link below before you begin the implementation.
Dataset: https://datahack.analyticsvidhya.com/contest/practice-problem-identify-the-digits/
Preparation steps: https://www.analyticsvidhya.com/keras_script-py/
1. Basic methods

1.1 Plotting the model architecture

The easiest way is simply to print the model. Here you can also print the shapes and parameter counts of each layer of the neural network.
In Keras, this can be done as follows:
```python
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1179776
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0
_________________________________________________________________
preds (Dense)                (None, 10)                1290
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
```
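For reference, here is a minimal sketch of a model definition consistent with the summary above. The layer types, names, and output shapes are taken from the summary; the activations, dropout rates, and compile settings are assumptions, since the original training script is not shown here:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# A sketch matching the summary above; dropout rates and activations
# are assumptions, not taken from the original training script.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax', name='preds'),
])
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
```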
To be more creative and expressive, you can draw an architecture diagram using the keras.utils.vis_utils module.
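A minimal sketch using that module's plot_model function, which renders the layer graph to an image file (this requires the pydot and graphviz packages to be installed):

```python
from keras.utils.vis_utils import plot_model

# Renders the layer graph of `model` to a PNG, including output shapes.
plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
```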
1.2 Visualizing filters

Another method is to plot the filters of a trained model, which helps us understand how those filters behave. For example, the first filter of the first layer of the above model looks like this:
```python
from matplotlib import pyplot as plt

top_layer = model.layers[0]
# Weights of the first filter of the first convolutional layer.
plt.imshow(top_layer.get_weights()[0][:, :, :, 0].squeeze(), cmap='gray')
```
Generally speaking, we see that low-level filters act as edge detectors, while filters higher up in the network tend to capture high-level concepts such as objects and faces.
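Extending the snippet above, here is a small sketch that tiles every filter of the first layer into one grid (it assumes the model defined earlier, whose first layer holds 32 filters of shape 3x3x1):

```python
import matplotlib.pyplot as plt

# Assumes the model from the summary above:
# layer 0 has kernel weights of shape (3, 3, 1, 32).
weights = model.layers[0].get_weights()[0]
fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(weights[:, :, 0, i], cmap='gray')
    ax.axis('off')
plt.show()
```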
2. Activation-based methods

2.1 Maximal activation

To understand what our neural network is doing, we can generate the input image that maximally activates a particular filter and plot it. This tells us which input patterns activate that filter. For example, there might be a filter for human faces that activates whenever a face appears in the image.
```python
from vis.visualization import visualize_activation
from vis.utils import utils
from keras import activations
from matplotlib import pyplot as plt
%matplotlib inline

plt.rcParams['figure.figsize'] = (18, 6)

# Search for the layer index by name.
# Alternatively, we can specify -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'preds')

# Replace softmax with a linear activation.
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)

# This is the output node we want to maximize.
filter_idx = 0
img = visualize_activation(model, layer_idx, filter_indices=filter_idx)
plt.imshow(img[..., 0])
```
We can apply the same idea to all ten classes and inspect each one. Run the script below to see them.
```python
import numpy as np

for output_idx in np.arange(10):
    # Turn off verbose output this time to avoid clutter.
    img = visualize_activation(model, layer_idx, filter_indices=output_idx,
                               input_range=(0., 1.))
    plt.figure()
    plt.title('Networks perception of {}'.format(output_idx))
    plt.imshow(img[..., 0])
```

2.2 Image occlusion
In an image classification problem, a natural question is whether the model truly identifies the location of the object in the image, or only uses the surrounding context. The gradient-based methods covered below also touch on this. Occlusion-based methods attempt to answer the question by systematically occluding different portions of the input image with a gray square and monitoring the classifier's output. If the probability of the correct class drops significantly when the object is occluded, the model is clearly localizing the object in the scene.
To make this concrete, let's pull a random image from the dataset and try to plot a heat map for it. This will give us an intuition about which parts of the image are important for clearly distinguishing its class.
```python
import numpy as np
from collections import defaultdict

def iter_occlusion(image, size=8):
    occlusion = np.full((size * 5, size * 5, 1), [0.5], np.float32)
    occlusion_center = np.full((size, size, 1), [0.5], np.float32)
    occlusion_padding = size * 2

    # print('padding...')
    image_padded = np.pad(
        image,
        ((occlusion_padding, occlusion_padding),
         (occlusion_padding, occlusion_padding),
         (0, 0)),
        'constant', constant_values=0.0
    )

    for y in range(occlusion_padding, image.shape[0] + occlusion_padding, size):
        for x in range(occlusion_padding, image.shape[1] + occlusion_padding, size):
            tmp = image_padded.copy()
            tmp[y - occlusion_padding:y + occlusion_center.shape[0] + occlusion_padding,
                x - occlusion_padding:x + occlusion_center.shape[1] + occlusion_padding] = occlusion
            tmp[y:y + occlusion_center.shape[0], x:x + occlusion_center.shape[1]] = occlusion_center
            yield x - occlusion_padding, y - occlusion_padding, \
                  tmp[occlusion_padding:tmp.shape[0] - occlusion_padding,
                      occlusion_padding:tmp.shape[1] - occlusion_padding]

i = 23  # for example
data = val_x[i]
correct_class = np.argmax(val_y[i])

# Input vector for model.predict
inp = data.reshape(1, 28, 28, 1)

# Image for matplotlib's imshow
img = data.reshape(28, 28)

# Occlusion parameters
img_size = img.shape[0]
occlusion_size = 4

print('occluding...')
heatmap = np.zeros((img_size, img_size), np.float32)
class_pixels = np.zeros((img_size, img_size), np.int16)

counters = defaultdict(int)

for n, (x, y, img_float) in enumerate(iter_occlusion(data, size=occlusion_size)):
    X = img_float.reshape(1, 28, 28, 1)
    out = model.predict(X)
    # print('#{}: {} @ {} (correct class: {})'.format(n, np.argmax(out), np.amax(out), out[0][correct_class]))
    # print('x {} - {} | y {} - {}'.format(x, x + occlusion_size, y, y + occlusion_size))
    heatmap[y:y + occlusion_size, x:x + occlusion_size] = out[0][correct_class]
    class_pixels[y:y + occlusion_size, x:x + occlusion_size] = np.argmax(out)
    counters[np.argmax(out)] += 1
```
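The loop above fills `heatmap` but never displays it. A minimal sketch for plotting it (the figure styling here is my own choice, not from the original code):

```python
from matplotlib import pyplot as plt

# Low values mean the correct-class probability dropped when that patch
# was occluded, i.e. the patch matters for the prediction.
plt.figure(figsize=(6, 6))
plt.imshow(heatmap, cmap='jet', interpolation='none')
plt.colorbar(label='probability of correct class')
plt.title('Occlusion heat map (digit {})'.format(correct_class))
plt.show()
```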
3. Gradient-based methods

3.1 Saliency maps

As we saw in the tank example, how can we know which part of the image our model focuses on to reach its prediction? For this, we can use saliency maps.
The concept behind saliency maps is straightforward: we compute the gradient of the output category with respect to the input image. This tells us how the output category value changes with small changes to the input image pixels. Positive values in the gradient indicate that a small change to that pixel will increase the output value, so visualizing these gradients, which have the same shape as the image, provides some intuition about where the model is looking.
Intuitively, this method highlights the salient image regions that contribute most towards the output.
```python
class_idx = 0
indices = np.where(val_y[:, class_idx] == 1.)[0]

# Pick some random input from here.
idx = indices[0]

# Sanity check: visualize the selected image.
from matplotlib import pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (18, 6)
plt.imshow(val_x[idx][..., 0])

from vis.visualization import visualize_saliency
from vis.utils import utils
from keras import activations

# Search for the layer index by name.
# Alternatively, we can specify -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'preds')

# Replace softmax with a linear activation.
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)

grads = visualize_saliency(model, layer_idx, filter_indices=class_idx,
                           seed_input=val_x[idx])
# Visualize as a heat map.
plt.imshow(grads, cmap='jet')

# This corresponds to the linear layer.
for class_idx in np.arange(10):
    indices = np.where(val_y[:, class_idx] == 1.)[0]
    idx = indices[0]
    f, ax = plt.subplots(1, 4)
    ax[0].imshow(val_x[idx][..., 0])
    for i, modifier in enumerate([None, 'guided', 'relu']):
        grads = visualize_saliency(model, layer_idx, filter_indices=class_idx,
                                   seed_input=val_x[idx],
                                   backprop_modifier=modifier)
        if modifier is None:
            modifier = 'vanilla'
        ax[i + 1].set_title(modifier)
        ax[i + 1].imshow(grads, cmap='jet')
```

3.2 Gradient-based class activation maps
Class activation maps are another way of visualizing what the model looks at while making a prediction. Instead of using gradients with respect to the model output, they use the output of the penultimate convolutional layer. This is done to exploit the spatial information stored in that layer.
```python
from vis.visualization import visualize_cam

# This corresponds to the linear layer.
for class_idx in np.arange(10):
    indices = np.where(val_y[:, class_idx] == 1.)[0]
    idx = indices[0]
    f, ax = plt.subplots(1, 4)
    ax[0].imshow(val_x[idx][..., 0])
    for i, modifier in enumerate([None, 'guided', 'relu']):
        grads = visualize_cam(model, layer_idx, filter_indices=class_idx,
                              seed_input=val_x[idx],
                              backprop_modifier=modifier)
        if modifier is None:
            modifier = 'vanilla'
        ax[i + 1].set_title(modifier)
        ax[i + 1].imshow(grads, cmap='jet')
```
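As a follow-up, keras-vis also ships an `overlay` helper for blending a heat map onto the image it explains. The sketch below is an assumption built on that helper and on the variables (`model`, `layer_idx`, `val_x`, `idx`) from the snippets above, not part of the original post:

```python
import numpy as np
import matplotlib.cm as cm
from vis.visualization import visualize_cam, overlay

grads = visualize_cam(model, layer_idx, filter_indices=class_idx,
                      seed_input=val_x[idx])

# Map the normalized heat map through the jet colormap and convert the
# grayscale digit to RGB so the two arrays can be blended.
jet_heatmap = np.uint8(cm.jet(grads)[..., :3] * 255)
img_rgb = np.uint8(np.repeat(val_x[idx], 3, axis=-1) * 255)
plt.imshow(overlay(jet_heatmap, img_rgb))
```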
Conclusion
In this article, we covered how to visualize CNN models, why it matters, and worked through an example implementation. Visualization has a wide range of applications.
Original link: https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-visualizing-convolutional-neural-networks/