Deep Learning: COVID-19 detection in X-Ray with CNN


In this project we develop a deep learning detector of Covid-19 in chest radiographs. For this purpose, we use images from the “covid-chestxray-dataset” [3], created by researchers from the Mila research institute and the University of Montreal [4]. We also use radiographs of healthy patients and of patients with bacterial pneumonia, taken from the Kaggle dataset “Chest X-Ray Images (Pneumonia)” [5].

In total, we have 426 images, divided into training (339 images), validation (42 images) and test (45 images) sets.
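As a quick sanity check on the split sizes, the short sketch below confirms that the three partitions add up to 426 images and shows what share of the data each one holds (roughly 80% / 10% / 10%). The numbers are the ones stated above; nothing else is assumed.

```python
# Partition sizes stated in the text.
splits = {"train": 339, "validation": 42, "test": 45}

total = sum(splits.values())
print("total:", total)  # total: 426

# Share of the whole dataset held by each partition.
shares = {name: n / total for name, n in splits.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")  # train: 79.6%, validation: 9.9%, test: 10.6%
```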

The partitions are given in “.txt” lists, in which each image is assigned a tag:

  • 0) Healthy
  • 1) Covid-19
  • 2) Pneumonia
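Note that these tags are not necessarily the label indices the model will see. Keras' `flow_from_directory` infers the classes from the subdirectory names and, when no `classes` list is passed, orders them alphanumerically. With the `COVID`, `HEALTHY` and `PNEUMONIA` folders created later in this notebook, that gives COVID=0, HEALTHY=1, PNEUMONIA=2. The sketch below (the helper name `tag_to_generator_index` is hypothetical) translates a ".txt" tag into the generator's index; in the notebook itself, `train_generator.class_indices` reports the actual mapping.

```python
# Tags used in the ".txt" partition lists.
TXT_TAGS = {0: "Healthy", 1: "Covid-19", 2: "Pneumonia"}

# Class subfolders used for flow_from_directory in this notebook.
FOLDERS = {"Healthy": "HEALTHY", "Covid-19": "COVID", "Pneumonia": "PNEUMONIA"}

# flow_from_directory sorts the inferred class folders, so each class index
# is simply the folder's position in the sorted list of folder names.
class_indices = {name: i for i, name in enumerate(sorted(FOLDERS.values()))}

def tag_to_generator_index(tag):
    """Hypothetical helper: map a .txt tag to the generator's label index."""
    return class_indices[FOLDERS[TXT_TAGS[tag]]]

for tag, label in TXT_TAGS.items():
    print(tag, label, "->", tag_to_generator_index(tag))
# 0 Healthy -> 1
# 1 Covid-19 -> 0
# 2 Pneumonia -> 2
```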

Note: The results obtained by the models trained in this database are purely for educational purposes and cannot be used for actual diagnosis without clinical validation.

References

  1. Climent, M., 2020. Covid-19: La Inteligencia Artificial De La Española Quibim Puede Acelerar El Diagnóstico Del Coronavirus [Covid-19: Artificial intelligence from the Spanish company Quibim could speed up coronavirus diagnosis].
  2. Alberich-Bayarri, A., 2020. Imaging, AI and Radiomics to understand and fight Coronavirus Covid-19.
  3. ieee8023/covid-chestxray-dataset (GitHub repository).
  4. Cohen, J.P., Morrison, P. and Dao, L., 2020. COVID-19 image data collection.
  5. Mooney, P., 2019. Chest X-Ray Images (Pneumonia). Kaggle.
dordorica_M2_875_20192_PracticaFinal
This notebook runs on Google Colab, so we mount Google Drive to access the images.
In [8]:
from google.colab import drive 
drive.mount('/content/gdrive')
In [0]:
#Import libraries

import numpy as np
import re,shutil,os,timeit,glob
import matplotlib.pyplot as plt
import random
from IPython.display import Image, display
from sklearn.dummy import DummyClassifier
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, Activation, Dropout, MaxPooling2D, BatchNormalization
from sklearn.metrics import accuracy_score
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam,RMSprop,SGD,Adadelta
To load the images we use Keras' ImageDataGenerator, which yields batches of images from the training, validation and test sets with the specified transformations. To augment the training data, we generate randomly shifted, rotated, zoomed and horizontally flipped versions of the original images, and we also add random Gaussian noise. All images are resized to the input size expected by each network architecture, and pixel values are normalized to the [0, 1] range.
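To make the augmentation step more concrete, here is a simplified NumPy sketch of three of the operations: a random horizontal flip, a random horizontal shift with edge-replicating fill, and rescaling to [0, 1]. It is only an approximation under stated assumptions: Keras applies its geometric augmentations internally as affine transforms, and this sketch omits rotation, vertical shift and zoom. The function names `shift_columns` and `augment` are hypothetical.

```python
import numpy as np

def shift_columns(img, dx):
    """Shift an (H, W, C) image right by dx pixels (left if dx < 0).
    Columns shifted in from outside repeat the nearest edge column,
    roughly what fill_mode='nearest' does."""
    w = img.shape[1]
    # Source column for every output column, clamped to the image borders.
    src = np.clip(np.arange(w) - dx, 0, w - 1)
    return img[:, src, :]

def augment(img, rng, width_shift_range=0.3, flip_prob=0.5):
    """Hypothetical, simplified training-time pipeline:
    random horizontal flip, random width shift, then rescale to [0, 1]."""
    x = img.astype(np.float32)
    if rng.random() < flip_prob:              # like horizontal_flip=True
        x = x[:, ::-1, :]
    w = x.shape[1]
    dx = int(round(rng.uniform(-width_shift_range, width_shift_range) * w))
    x = shift_columns(x, dx)                  # like width_shift_range=0.3
    return x * (1.0 / 255)                    # like rescale=1./255

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3))    # toy 4x6 RGB "image"
out = augment(img, rng)
print(out.shape, float(out.min()), float(out.max()))

# Nearest fill on a ramp image whose pixel value equals its column index.
ramp = np.tile(np.arange(6), (2, 1))[..., None]   # shape (2, 6, 1)
print(shift_columns(ramp, 2)[0, :, 0])            # [0 0 0 1 2 3]
```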
In [0]:
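#Function to clean the image paths read from the txt files.
def remChars(string):
    #Drop the two-character prefix of each line and the trailing newline.
    ret=string[2:]
    ret=ret.rstrip("\r\n")
    return ret

#Function to add Gaussian noise to the images.
#Keras runs it before rescaling, so pixel values are still in [0, 255].
def add_noise(img):
    VARIABILITY = 50
    deviation = VARIABILITY*random.random()
    noise = np.random.normal(0, deviation, img.shape)
    img += noise
    #np.clip returns a new array, so the result must be assigned.
    img = np.clip(img, 0., 255.)
    return img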
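The sketch below isolates the noise step so it can be checked outside Keras. It draws the noise standard deviation uniformly from [0, 50), adds zero-mean Gaussian noise and clips the result back to [0, 255]. The last lines show why the clip result has to be assigned: `np.clip` returns a new array unless an `out=` argument is given. The name `add_noise_sketch` and the seeded `rng` argument are assumptions added for reproducibility; they are not part of the notebook.

```python
import numpy as np

def add_noise_sketch(img, variability=50.0, rng=None):
    """Add zero-mean Gaussian noise whose standard deviation is drawn
    uniformly from [0, variability), then clip back to the 0-255 range."""
    rng = np.random.default_rng() if rng is None else rng
    deviation = variability * rng.random()
    noisy = img.astype(np.float32) + rng.normal(0.0, deviation, img.shape)
    # np.clip returns a new array; the result must be assigned.
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(42)
img = np.full((8, 8, 3), 250.0, dtype=np.float32)   # bright toy image
noisy = add_noise_sketch(img, rng=rng)
print(float(noisy.min()) >= 0.0, float(noisy.max()) <= 255.0)

unclipped = img + 100.0
np.clip(unclipped, 0.0, 255.0)   # result discarded...
print(unclipped.max())           # ...so this still prints 350.0
```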
In [0]:
#Images path base
basepath="/content/gdrive/My Drive/"
In [0]:
#Create the folder structure: train, test and validation, each with one folder per class.

#Partition folders and the txt files listing their images.
paths=['test','train','validation']
files=['testing.txt','training.txt','validation.txt']

for p,f in zip(paths, files):
    #Read the image names from the txt file.
    with open(basepath+f,"r") as file:
        imgfiles= file.readlines()

    #Create one folder per class inside the partition folder.
    for cls in ["COVID","HEALTHY","PNEUMONIA"]:
        os.makedirs(basepath+p+"/"+cls,exist_ok=True)

    #Clean each path and copy the image into the folder of its class.
    for s in imgfiles:
        s=remChars(s)
        if not s:
            continue  #skip blank lines
        if "COVID" in s:
            shutil.copy(basepath+s, basepath+p+"/COVID")
        elif "HEALTHY" in s:
            shutil.copy(basepath+s,basepath+p+"/HEALTHY")
        else:
            shutil.copy(basepath+s,basepath+p+"/PNEUMONIA")
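The routing rule above can be tested without Google Drive. This sketch reproduces it as a small function (COVID first, then HEALTHY, otherwise PNEUMONIA) and runs it on a temporary directory with hypothetical file names. One design caveat follows from the substring test: any path containing "COVID" anywhere is routed to the COVID folder, so the check relies on the dataset's naming convention.

```python
import os
import shutil
import tempfile

CLASSES = ("COVID", "HEALTHY")  # anything else is treated as PNEUMONIA

def target_folder(image_path):
    """Return the class subfolder for an image path, mirroring the
    if/elif/else rule: COVID, then HEALTHY, otherwise PNEUMONIA."""
    for name in CLASSES:
        if name in image_path:
            return name
    return "PNEUMONIA"

def copy_split(base, split, image_paths):
    """Copy each image into base/split/<CLASS>/, creating folders as needed."""
    for cls in ("COVID", "HEALTHY", "PNEUMONIA"):
        os.makedirs(os.path.join(base, split, cls), exist_ok=True)
    for rel in image_paths:
        shutil.copy(os.path.join(base, rel),
                    os.path.join(base, split, target_folder(rel)))

# Demo on a temporary directory with hypothetical, empty image files.
with tempfile.TemporaryDirectory() as base:
    names = ["COVID/a.png", "HEALTHY/b.png", "PNEUMONIA/c.png"]
    for rel in names:
        os.makedirs(os.path.dirname(os.path.join(base, rel)), exist_ok=True)
        open(os.path.join(base, rel), "wb").close()
    copy_split(base, "train", names)
    counts = {c: len(os.listdir(os.path.join(base, "train", c)))
              for c in ("COVID", "HEALTHY", "PNEUMONIA")}

print(counts)  # {'COVID': 1, 'HEALTHY': 1, 'PNEUMONIA': 1}
```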
In [6]:
#Import images. Reduce size to 224x224
#Training dataset augmentation.

train_data_gen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=25,
    width_shift_range=0.3,
    height_shift_range=0.3,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest',
    preprocessing_function=add_noise)
#Only scale validation and test images.
validation_data_gen = ImageDataGenerator(rescale=1./255)
test_data_gen = ImageDataGenerator(rescale=1./255)

train_generator = train_data_gen.flow_from_directory(basepath+'train',
                                                     target_size=(224,224),
                                                     batch_size=32,
                                                     class_mode='categorical',
                                                     shuffle=True)

validation_generator = validation_data_gen.flow_from_directory(basepath+'validation',
                                                               target_size=(224,224),
                                                               batch_size=32,
                                                               class_mode='categorical',
                                                               shuffle=True)

test_generator = test_data_gen.flow_from_directory(basepath+'test',
                                                   target_size=(224,224),
                                                   batch_size=32,
                                                   class_mode='categorical',
                                                   shuffle=False)
Found 339 images belonging to 3 classes.
Found 42 images belonging to 3 classes.
Found 45 images belonging to 3 classes.
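With a batch size of 32, the image counts reported above determine how many batches each generator yields per pass and how large the final, partial batch is. These are the values one would pass as `steps_per_epoch` or `validation_steps` if setting them by hand; for a Keras directory iterator, `len(train_generator)` should return the same ceiling. The sketch below only does the arithmetic.

```python
import math

BATCH_SIZE = 32
sizes = {"train": 339, "validation": 42, "test": 45}  # counts reported above

# Batches needed to see every image once, and the size of the last batch.
steps = {name: math.ceil(n / BATCH_SIZE) for name, n in sizes.items()}
last_batch = {name: n - (steps[name] - 1) * BATCH_SIZE for name, n in sizes.items()}

for name in sizes:
    print(f"{name}: {steps[name]} batches, last batch has {last_batch[name]} images")
# train: 11 batches, last batch has 19 images
# validation: 2 batches, last batch has 10 images
# test: 2 batches, last batch has 13 images
```

Because the test generator uses `shuffle=False`, its batches follow a fixed file order, which keeps predictions aligned with the true labels during evaluation.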
In [0]:
#Generate a few augmented examples of the first training image.
img = load_img(train_generator.filepaths[0])  # this is a PIL image
x = img_to_array(img)  # NumPy array with shape (height, width, 3), channels last
x = x.reshape((1,) + x.shape)  # add a batch axis: (1, height, width, 3)

#save_to_dir does not create the folder, so make sure it exists.
os.makedirs(basepath+'augmented', exist_ok=True)

i = 0
for batch in train_data_gen.flow(x, batch_size=1,
                          save_to_dir=basepath+'augmented', save_prefix='hi', save_format='jpeg'):
    i += 1
    if i > 2:
        break  # stop after three augmented images
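`ImageDataGenerator.flow` expects a rank-4 batch of shape (N, height, width, channels), which is why the single image above gets a leading batch axis. The NumPy sketch below shows three equivalent ways to add that axis; the 256x320 size is an arbitrary stand-in for a loaded radiograph, which keeps its original size at this point because `load_img` is called without `target_size`.

```python
import numpy as np

# Stand-in for img_to_array(load_img(...)): a channels-last (H, W, 3) array.
x = np.zeros((256, 320, 3), dtype=np.float32)

batch_a = x.reshape((1,) + x.shape)   # as in the cell above
batch_b = np.expand_dims(x, axis=0)   # equivalent, more explicit
batch_c = x[np.newaxis, ...]          # same result via indexing

print(batch_a.shape)  # (1, 256, 320, 3)
```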
In [8]:
#Show images

for filename in glob.glob(basepath+'augmented/*.jpeg'):
    display(Image(filename,width=150,height=150))