Detection of pneumonia in X-ray films based on a convolutional neural network

[!] This task was completed at the beginning of my freshman year, when my professional knowledge was not yet mature

1 mission objective

  1. Understand the lung and the types of lung disease

  2. Understand the principles of X-ray examination and computed tomography

  3. Understand COVID-19

  4. Master the concept and basic structure of convolutional neural networks

  5. Master the application of convolutional neural networks to the X-ray examination of patients with pneumonia

  6. Build a convolutional neural network system for detecting pneumonia in X-ray films

2 task description

2.1 lung and lung diseases

2.1.1 lung
  • The respiratory system is composed of the respiratory tract (nose, pharynx, larynx, trachea and bronchi at all levels) and the alveoli. The lung is the main organ of the respiratory system, so lung disease is a respiratory system disease. To sustain metabolism, the human body continuously takes in oxygen from the air and discharges carbon dioxide (gas exchange); this process is called breathing. Lung ventilation (gas exchange between the lung and the external environment) together with gas exchange between the alveoli and the blood is called external respiration (also known as lung respiration). Gas exchange between the blood and tissue cells or tissue fluid is called internal respiration (also called tissue respiration). The lung is therefore closely related to the cardiovascular system. Besides respiration, the lung also has non-respiratory functions: defense, immunity, and endocrine and metabolic functions.
2.1.2 definition of lung disease
  • Lung disease is a disease of the lung itself or a pulmonary manifestation of a systemic disease.
2.1.3 types of common diseases
  • Infectious lung diseases
  • Lung diseases associated with air pollution and smoking
  • Occupational lung diseases
  • Immune-related lung diseases
  • Genetic lung diseases
  • Lung tumors
  • Pulmonary diseases of unknown origin
  • Pulmonary manifestations of systemic diseases
2.1.4 differential diagnosis methods
  • X-ray examination and computed tomography (CT) give a clearer picture of the shape and nature of lung shadows and tumors. Fiberoptic bronchoscopy and, if necessary, percutaneous lung biopsy provide samples for pathological, bacteriological and biochemical tests, which greatly improves the rate of etiological diagnosis. Bronchoalveolar lavage fluid (BALF) can be examined under the microscope to count and classify cells and to measure the immune antibodies, complement and enzymes in the fluid, which helps to understand the nature of the disease. Lung scanning with radioactive gallium (Ga) relies on gallium accumulating at metabolically active sites; it can be used to diagnose lung cancer, sarcoidosis, interstitial alveolitis and so on, and to distinguish pulmonary infarction from pneumonia. Pulmonary arteriography is helpful in the diagnosis of pulmonary infarction. In this system, we assist X-ray examination and computed tomography in predicting pneumonia in patients.

2.2 X-ray examination and computed tomography (CT)

Note: the following items may cause discomfort

2.2.1 principle of X-ray testing
  • X-ray examination is mainly based on four properties of X-rays: penetration, differential absorption, photosensitivity and fluorescence. As X-rays pass through the human body, they are absorbed to varying degrees. For example, bones absorb more X-rays than muscles, so the amount of X-rays that passes through differs from place to place. In this way, the transmitted X-rays carry information about the density distribution of the various parts of the body, and the intensity of the fluorescence or photosensitive effect they cause on a fluorescent screen or photographic film differs accordingly. Shadows of different density therefore appear on the fluorescent screen or on the photographic film (after developing and fixing). From the contrast of the shadows, combined with clinical manifestations, laboratory results and pathological diagnosis, we can judge whether a part of the human body is normal.
2.2.2 principle of computed tomography (CT)
  • Traditional X-ray examination produces at most a single shadow image of the brain, so the resolution of the image is not high. To solve this problem, researchers designed computed tomography (CT). In CT, X-rays irradiate a selected section of the head from multiple directions and the amount of transmitted X-rays is measured. After digitization, a computer calculates the absorption coefficient of each unit volume of tissue in that section, and the image is then reconstructed. The method offers good image quality and high diagnostic value, and it is non-invasive and painless. It allows us to reconstruct the layers of the brain at any depth or angle. Compared with traditional X-ray examination, it has the advantages of a large scanning range, good image quality and fast imaging speed.

2.3 COVID-19

2.3.1 introduction
  • COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in Wuhan, Hubei, China, at the end of 2019 and then spread rapidly around the world. As of April 28, 2020, it had spread to 226 countries and regions, infected more than 3 million people and caused over 200 thousand deaths. It has been described as the most severe global crisis since World War II.
2.3.2 clinical manifestations
  • The symptoms and severity of COVID-19 vary from person to person. Asymptomatic infection exists, and the majority of symptomatic cases are mild (about 81%). Most patients mainly show influenza-like symptoms. Common clinical manifestations include fever, fatigue and dry cough. Other manifestations include nasal congestion, sneezing, runny nose, headache, sore throat, hemoptysis, expectoration, myalgia and diarrhea. It has been reported that the first symptom in mild cases may be a loss of the sense of smell and taste. Some mild cases show no signs of pneumonia, only low fever and slight fatigue. Severe symptoms include dyspnea, persistent chest pain, confusion, difficulty walking, and a bluish face or lips.
  • Severe complications include acute respiratory distress syndrome (ARDS), septic shock, systemic inflammatory response syndrome, refractory metabolic acidosis, acute myocardial injury, coagulation dysfunction, and even death.
  • The incubation period is usually about 4-5 days after exposure and is generally believed not to exceed 14 days. 97.5% of the patients who develop symptoms do so within 11.5 days after infection. Asymptomatic patients can also transmit the disease.

2.3.3 image detection
  • In the early stage, the lungs of patients show multiple small patchy shadows and interstitial changes, especially in the peripheral zone of the lung. As the disease develops, multiple ground-glass opacities and infiltrating shadows are observed in both lungs. In severe cases, this develops further into consolidation of a lung segment or lobe. Pleural effusion is rare. In some patients, computed tomography already shows typical early pulmonary changes while RT-PCR is still negative.

3 convolution neural network

3.1 brief introduction

  • A convolutional neural network (CNN) is a kind of feedforward neural network. Each of its artificial neurons responds only to the units within a local region of its input (its receptive field), which gives it excellent performance on large images.

3.2 structure

A convolutional neural network is composed of one or more convolution layers topped by fully connected layers (corresponding to a classical neural network), together with the associated weights and pooling layers

3.2.1 convolution layer
  • A convolution layer produces a group of parallel feature maps, one for each of its convolution kernels. Each kernel slides over the input image; at every position the overlapping values are multiplied element by element and summed, and the resulting value is written to the output feature map.
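
The sliding operation described above can be sketched in plain Python. This is only an illustration under simplifying assumptions (one channel, no padding, stride 1); the function name `conv2d_valid` and the kernel values are made up for the example. Like most deep learning libraries, it does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of lists) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the current window by the kernel element-wise and sum.
            value = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(value)
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
# Hypothetical 2x2 kernel: top-left value minus bottom-right value.
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4]]
```

A 3x4 input and a 2x2 kernel give a 2x3 feature map; with padding = 'SAME', as used later in this article, the output would keep the input size instead.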
3.2.2 ReLU layer
  • ReLU is the rectified linear unit layer, $f(x)=\max(0,x)$. Compared with the Sigmoid activation function, ReLU effectively alleviates the vanishing gradient problem. The ReLU activation function also mimics the excitatory behavior of biological neurons, which can amplify the influential factors and improve the training speed of the neural network

    The red line is the ReLU activation function $f(x)=\max(0,x)$, and the blue line is the Sigmoid activation function $f(x)=\frac{1}{1+e^{-x}}$
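
To make the comparison concrete, here is a small sketch of both activation functions and their derivatives in plain Python. The helper names are chosen for this example only.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu_grad(x):
    # The derivative of ReLU is 1 for positive inputs and 0 otherwise.
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-5.0, 0.5, 5.0):
    print(x, relu(x), round(sigmoid(x), 4), relu_grad(x), round(sigmoid_grad(x), 4))
```

For a large input such as x = 5 the Sigmoid derivative is already below 0.01, while the ReLU derivative stays at 1. Multiplying many such small derivatives across layers is what makes gradients vanish.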

3.2.3 pooling layer
  • Pooling is a non-linear form of down-sampling. Convolutional neural networks generally use "max pooling": the input feature map is divided into several rectangular regions, and the maximum value of each region is output.
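
A minimal sketch of max pooling in plain Python, assuming non-overlapping square windows and an input whose height and width are divisible by the window size (the function name `max_pool2d` is an assumption of this example):

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling over size x size windows."""
    pooled = []
    for i in range(0, len(feature_map), size):
        row = []
        for j in range(0, len(feature_map[0]), size):
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 4],
]
print(max_pool2d(feature_map))  # [[4, 5], [3, 7]]
```

Each 2x2 window is reduced to its largest value, so the 4x4 input becomes a 2x2 output.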
3.2.4 full connection layer
  • After several rounds of convolution and pooling followed by a flattening operation, the convolutional neural network obtains a one-dimensional tensor for further processing. The fully connected layer is the part of the network that carries out the high-level reasoning; it works like a multi-layer perceptron, that is, it applies an affine transformation.
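
The flattening step and the affine transformation can be sketched as follows. The weights and biases are hypothetical values picked for the example, not values from the trained model.

```python
def flatten(feature_map):
    """Turn a nested list (rows x columns) into a flat vector."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """Affine transformation y = W x + b, one output per weight row."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


pooled = [[4, 5], [3, 7]]
x = flatten(pooled)  # [4, 5, 3, 7]
weights = [
    [1, 0, 0, 0],
    [0.5, 0.5, 0.5, 0.5],
]  # hypothetical weights: 2 outputs, 4 inputs
biases = [0.0, -1.0]  # hypothetical biases
print(dense(x, weights, biases))  # [4.0, 8.5]
```

In a real layer an activation function such as ReLU or softmax is applied to the result.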
3.2.5 loss function layer
  • The loss function layer measures how well the model performs during training and feeds the result back to the model so that it can be adjusted. In this system, we use the sparse categorical cross-entropy loss function to express the difference between the predicted results and the real results.
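
For a single sample, this loss is the negative logarithm of the probability that the model assigns to the true class. "Sparse" means that the label is an integer index rather than a one-hot vector. A minimal sketch with made-up probabilities:

```python
import math


def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one sample: -log of the probability of the true class index."""
    return -math.log(probabilities[label])


true_label = 1
confident_right = [0.05, 0.90, 0.05]  # hypothetical softmax outputs
confident_wrong = [0.90, 0.05, 0.05]

loss_right = sparse_categorical_crossentropy(confident_right, true_label)
loss_wrong = sparse_categorical_crossentropy(confident_wrong, true_label)
print(round(loss_right, 4), round(loss_wrong, 4))  # 0.1054 2.9957
```

A confident wrong prediction is punished far more heavily than a confident correct one is rewarded, which is why the loss can grow even while the accuracy looks acceptable.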

4 task realization

Detecting pneumonia in patients' X-ray images is a classic multi-class image classification task. Based on the task description and the background knowledge above, we use the X-ray image data set compiled by Joseph Cohen, a postdoctoral fellow at the University of Montreal, as the source of patient data, and the healthy chest X-ray image data set from Kaggle as the source of healthy data

4.1 realization ideas

  1. Preprocess the data set
  2. Construct the convolutional neural network model
  3. Train the convolutional neural network model
  4. Test the convolutional neural network model

The required modules for this task are as follows:

import tensorflow as tf
from tensorflow.keras import Sequential,layers
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import pandas as pd
import glob,os,random,pathlib

4.2 implementation

4.2.1 reading data

The data set is composed of two data sets, and the Kaggle chest health X-ray image data set has a little noise. The following is the content comparison of the two data sets.

The file storage path of the system is

#Load dataset 1
data1 = pd.read_csv('./data/metadata.csv',encoding='gb18030') 

#View the first picture
temp = mpimg.imread('./data/images/' +data1.filename[0])

4.2.2 sorting paths and labels

Labels are converted to indexes with a Python dictionary

#finding and filename are columns in the csv file
labelToIndex = dict((name,number) for number,name in enumerate(list(set(data1.finding))))
{'Chlamydophila': 0,
 'Pneumocystis': 1,
 'Streptococcus': 2,
 'No Finding': 3,
 'Klebsiella': 4,
 'SARS': 5,
 'COVID-19': 6,
 'COVID-19, ARDS': 7,
 'ARDS': 8,
 'E.Coli': 9,
 'Legionella': 10}
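
One caveat: the iteration order of a set of strings can differ between Python runs, so the indexes above may change when the script is restarted, which matters if a saved model is reused. A possible variant, sketched here with made-up label values, sorts the names first so that the mapping is reproducible:

```python
# Hypothetical values standing in for the `finding` column.
findings = ['COVID-19', 'SARS', 'COVID-19', 'ARDS', 'No Finding', 'SARS']

# Sorting the unique names gives the same index for each label on every run.
label_to_index = {name: index for index, name in enumerate(sorted(set(findings)))}
index_to_label = {index: name for name, index in label_to_index.items()}

labels = [label_to_index[name] for name in findings]
print(label_to_index)  # {'ARDS': 0, 'COVID-19': 1, 'No Finding': 2, 'SARS': 3}
print(labels)          # [1, 3, 1, 0, 2, 3]
```

The reverse dictionary `index_to_label` is what turns a predicted index back into a readable label.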

Then two parallel lists in Python are used to store the location of the image and the corresponding tag

#Package path and label
dataPaths1 = ['./data/images/' + path for path in data1.filename]
dataLabels1 = [labelToIndex[name] for name in data1.finding]

4.2.3 processing health data sets

According to the csv, the first 1300 samples are all healthy samples, so we can take a number of them in order for processing. Joseph Cohen's data set has no label for healthy lungs, so an additional label needs to be added

#Load dataset 2
data2 = pd.read_csv('./data/seMetadata.csv',encoding='gb18030') 

#Package path and label
SIZE = 150
SIZE = min(1300,SIZE)
unormalType = len(list(set(data1.finding))) #Number of abnormal types = index of the new label
labelToIndex['Normal'] = unormalType #Add the label for healthy samples
dataPaths2 = ['./data/seImages/' + path for path in data2.X_ray_image_name[:SIZE]]
dataLabels2 = [unormalType] * SIZE

4.2.4 integrating and shuffling data sets

Then the two pairs of parallel lists only need to be concatenated to get the complete source data set

#Consolidate two datasets
dataPaths = dataPaths1 + dataPaths2
dataLabels = dataLabels1 + dataLabels2

Because the two data sets are arranged in order, serious over-fitting will occur if the data is not shuffled

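#Shuffle the data sets (reconstructed: the same seed is assumed for both lists so that paths and labels stay aligned)
seed = random.randint(1,100)
random.seed(seed)
random.shuffle(dataPaths)
random.seed(seed)
random.shuffle(dataLabels)

#Show the first 10 labels
print(dataLabels[:10])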

The output is as follows:

 [6, 6, 6, 6, 11, 2, 6, 11, 11, 11]

After inspection, it is confirmed that there is no error

4.2.5 create Dataset class

In TensorFlow, users can wrap data in a Dataset object, which Keras can use directly for training

#Split off 60% of the data for training (the data sets are divided in a ratio of 3:1:1)
SPILT = 0.6
trainData = tf.data.Dataset.from_tensor_slices((dataPaths[:int(SPILT*len(dataLabels))],dataLabels[:int(SPILT*len(dataLabels))]))
testData = tf.data.Dataset.from_tensor_slices((dataPaths[int(SPILT*len(dataLabels)):],dataLabels[int(SPILT*len(dataLabels)):]))
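
The code above separates the first 60% of the samples from the remaining 40%. How the remainder is divided is not shown, so the following is only a sketch of one way to obtain a 3:1:1 split of a list into training, test and validation parts; the function name `split_3_1_1` is an assumption of this example. Integer arithmetic avoids floating-point rounding at the boundaries.

```python
def split_3_1_1(items):
    """Split a list into train/test/validation parts in a 3:1:1 ratio."""
    n = len(items)
    train_end = n * 3 // 5  # first 60%
    test_end = n * 4 // 5   # next 20%
    return items[:train_end], items[train_end:test_end], items[test_end:]


samples = list(range(10))  # stand-in for the shuffled list of paths or labels
train, test, valid = split_3_1_1(samples)
print(train, test, valid)  # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```

Applying the same function to the path list and to the label list keeps the two aligned, provided both were shuffled in the same way beforehand.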

Then the function is constructed to preprocess the image

def prePicPNG(path,label):
    """Preprocessing image"""
    temp = tf.io.read_file(path)
    temp = tf.cond(tf.image.is_jpeg(temp),
      lambda: tf.image.decode_jpeg(temp, channels=3),
      lambda: tf.image.decode_png(temp, channels=3))
    temp = tf.cast(temp,tf.float32)
    temp /= 255.0
    temp = tf.image.resize(temp,[192,192])
    return temp,label

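Use map() to apply the preprocessing to every element of the Dataset, then shuffle and batch the Dataset

SHUFFLE = 1000
BATCH = 32 #Assumed value; the original definition is missing
trainData = trainData.map(prePicPNG).shuffle(SHUFFLE)
trainData = trainData.batch(BATCH)
#Assumption: the held-out 40% is halved into test and validation sets
testCount = (len(dataLabels) - int(SPILT*len(dataLabels))) // 2
heldOut = testData.map(prePicPNG)
testData = heldOut.take(testCount).batch(BATCH)
vaildData = heldOut.skip(testCount).batch(BATCH)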

At this point, the data is ready for use

4.2.6 building CNN model

Since the task is relatively simple, we use the classic network model which imitates LeNet-5 to predict

#Building models
model = Sequential()
model.add(layers.Conv2D(32,(5,5),padding = 'SAME',activation='relu',input_shape=[192,192,3]))
model.add(layers.Conv2D(64,(3,3),padding = 'SAME',activation='relu'))


The output results are as follows:

4.2.7 training model

The training set and test set are loaded into the model, and the training can be started after the basic parameters are configured

#Train the model and save the history
history = model.fit(trainData,shuffle=True,epochs=10,validation_data=testData)

So far, the training of the model has been completed

4.2.8 visual data and validation set

Use matplotlib and vaildData to complete the analysis

#Show the data
plt.plot(history.history['accuracy'],label = 'accuracy')
plt.plot(history.history['val_accuracy'],label = 'val_accuracy')
plt.plot(history.history['loss'],label = 'loss')
plt.plot(history.history['val_loss'],label = 'val_loss')
plt.legend()
plt.show()
#Evaluate on the validation set
model.evaluate(vaildData)

It can be seen that although the accuracy reaches 83.48%, the loss value is out of control

4.3 case test

4.3.1 processing pictures

Before the convolutional neural network can predict an image, the image must be preprocessed in the same way as the training data

def editInput(path):
    """Decode and preprocess a picture"""
    temp = tf.io.read_file(path)
    temp = tf.image.decode_jpeg(temp,channels = 3) #An RGB image corresponds to three feature maps
    temp = tf.cast(temp,tf.float32) / 255.0 #Same scaling as in prePicPNG
    temp = tf.image.resize(temp,[192,192])
    return tf.reshape(temp,(1,192,192,3))

4.3.2 functional packaging

The function is encapsulated to facilitate subsequent use

#Index - tag list
indexName = [name for name,val in labelToIndex.items()]

def predictPic(path):
    """Predict the label of one X-ray case"""
    test = editInput(path)
    ans = model.predict(test).tolist()[0]
    ans = ans.index(max(ans))
    return indexName[ans]

The results were as follows:

5 data analysis and optimization

5.1 data analysis

5.1.1 main problems
  • Over-fitting: during training, the model is over-fitted after the third epoch
  • The training effect is not good: over the ten epochs, the loss on the test set keeps rising
5.1.2 improvement ideas
  • Adjust network structure
  • Using callback functions to store the best model
  • Add data set

5.2 optimization implementation

5.2.1 adjustment of network structure

The network structure is slightly modified: Inception modules and a Dropout layer are added

def Inception(x,n=1):
    """Inception module"""
    Inception1 = layers.Conv2D(n * 2,(1,1),padding='SAME',activation = 'relu')(x)
    Inception1 = layers.Conv2D(n * 8,(1,3),padding='SAME',activation = 'relu')(Inception1)
    Inception1 = layers.Conv2D(n * 8,(3,1),padding='SAME',activation = 'relu')(Inception1)
    Inception1 = layers.Conv2D(n * 8,(1,3),padding='SAME',activation = 'relu')(Inception1)
    Inception1 = layers.Conv2D(n * 8,(3,1),padding='SAME',activation = 'relu')(Inception1)

    Inception2 = layers.Conv2D(n * 16,(1,1),padding='SAME',activation = 'relu')(x)

    Inception3 = layers.Conv2D(n * 4,(1,1),padding='SAME',activation = 'relu')(x)
    Inception3 = layers.Conv2D(n * 32,(1,3),padding='SAME',activation = 'relu')(Inception3)
    Inception3 = layers.Conv2D(n * 32,(3,1),padding='SAME',activation = 'relu')(Inception3)

    Inception4 = layers.MaxPooling2D((3,3),strides=(1, 1),padding = 'SAME')(x)
    Inception4 = layers.Conv2D(n * 4,(1,1),padding='SAME',activation = 'relu')(Inception4)

    return  layers.Concatenate()([Inception1,Inception2,Inception3,Inception4])

The overall structure code is as follows:

inputs = layers.Input(shape=(192,192,3))
x = Inception(inputs)
x = layers.MaxPooling2D((2,2),strides=(2,2),padding = 'SAME')(x)

x = Inception(x,2)
x = layers.MaxPooling2D((2,2),strides=(2,2),padding = 'SAME')(x)

x = Inception(x,2)
x = layers.MaxPooling2D((2,2),strides=(2,2),padding = 'SAME')(x)

x = Inception(x,3)
x = layers.MaxPooling2D((2,2),strides=(2,2),padding = 'SAME')(x)

x = Inception(x,3)
x = layers.MaxPooling2D((2,2),strides=(2,2),padding = 'SAME')(x)

x = Inception(x,4)
x = layers.MaxPooling2D((2,2),strides=(2,2),padding = 'SAME')(x)
x = layers.Flatten()(x)
x = layers.Dense(1024,activation = 'relu')(x)
x = layers.Dense(512,activation = 'relu')(x)
x = tf.keras.layers.Dropout(0.5)(x)

outputs = layers.Dense(12,activation='softmax')(x)
model = tf.keras.Model(inputs=inputs,outputs=outputs)
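
The tensor sizes in this structure can be checked by hand. The sketch below only does the arithmetic, without TensorFlow: each Inception(x, n) call concatenates branches with 8n, 16n, 32n and 4n channels, and each 'SAME'-padded pooling layer with stride 2 halves the spatial size, rounding up. The helper names are assumptions of this example.

```python
import math


def inception_channels(n):
    """Channels after concatenating the four branches of Inception(x, n)."""
    return n * 8 + n * 16 + n * 32 + n * 4


def same_pool(size, stride=2):
    """Output size of a 'SAME'-padded pooling layer."""
    return math.ceil(size / stride)


size = 192
for n in (1, 2, 2, 3, 3, 4):  # the six Inception + MaxPooling2D stages
    channels = inception_channels(n)
    size = same_pool(size)
    print(n, size, channels)

flat = size * size * channels
print(flat)  # 2160
```

The feature map shrinks from 192x192 to 3x3 with 240 channels, so Flatten() hands 2160 values to the first Dense layer.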

The training effect is as follows:

It can be seen that the over-fitting is effectively alleviated, and the accuracy and loss values of the training set and the test set stay at the same level. The accuracy probably cannot be improved further because the number of samples is insufficient

On the validation set, the loss value has dropped to a relatively low level, and the accuracy is as expected

5.2.2 callback function

Use a callback to monitor the loss value on the validation data and restore the model weights with the minimum loss value

callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,restore_best_weights=True)
history = model.fit(trainData,shuffle=True,epochs=12,validation_data=testData,steps_per_epoch=120,callbacks=[callback])
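
The stopping rule behind this callback can be illustrated without TensorFlow. The sketch below is a simplified model of the logic (no minimum delta, 0-based epoch indexes) applied to made-up validation losses; it is not the Keras implementation itself.

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float('inf')
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best value: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


val_losses = [0.90, 0.70, 0.65, 0.72, 0.80, 0.95, 1.10]  # hypothetical values
stop_epoch, best_epoch = early_stopping(val_losses)
print(stop_epoch, best_epoch)  # 5 2
```

Training stops after three epochs without improvement, and with restore_best_weights=True the model goes back to the weights of the best epoch instead of keeping the last ones.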

5.2.3 expanding the data set

It can be seen that the number of samples in the data set is very small. Therefore, more healthy samples from the Kaggle data set can be added to enlarge the data set

The training effects are as follows:

It can be seen that the accuracy is improved and stabilized at about 90%, and the loss value further decreases

Use validation set for validation:

It can be seen that the result is almost the same as in the analysis above

Posted on Mon, 29 Jun 2020 22:29:41 -0400 by herrin