Tutorials/CNTK_302A_Evaluation_of_Pretrained_Super-resolution_Models.ipynb
Contributed by Borna Vukorepa October 30, 2017
<a href="https://arxiv.org/pdf/1608.00367.pdf">Accelerating the Super-Resolution Convolutional Neural Network</a>
<a href="http://cvlab.cse.msu.edu/pdfs/Tai_Yang_Liu_CVPR2017.pdf">Image Super-Resolution via Deep Recursive Residual Network</a>
<a href="https://arxiv.org/pdf/1511.04587.pdf">Accurate Image Super-Resolution Using Very Deep Convolutional Networks</a>
<a href="https://arxiv.org/pdf/1609.04802.pdf">Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network</a>
<a href="https://arxiv.org/pdf/1606.01299.pdf">RAISR: Rapid and Accurate Image Super Resolution</a>
Encouraged by the recent results in deep learning (e.g. GANs), our goal is to explore the space of image super-resolution and look at both <b>GAN and non-GAN</b> approaches. All work is done for an upscaling factor of 2; for other factors, the dataset preparation and methods are completely analogous.
import cntk as C
from PIL import Image
import os
import numpy as np
import urllib
from scipy.misc import imsave

# Python 2 / 3 compatible import of the download helpers.
try:
    from urllib.request import urlretrieve, urlopen
except ImportError:
    from urllib import urlretrieve, urlopen

# Prefer the first GPU; fall back to the CPU when no GPU is available.
# NOTE: narrowed from a bare `except:` so SystemExit/KeyboardInterrupt
# are not silently swallowed.
try:
    C.device.try_set_default_device(C.device.gpu(0))
except Exception:
    print("GPU unavailable. Using CPU instead.")
# Determine the data path for testing.
# Check for an environment variable defined in CNTK's test infrastructure.
envvar = 'CNTK_EXTERNAL_TESTDATA_SOURCE_DIRECTORY'

def is_test():
    """True when CNTK's external-test-data environment variable is set."""
    return envvar in os.environ

def _ensure_dir(path):
    """Create *path* (and any parents) unless it already exists."""
    if not os.path.exists(path):
        os.makedirs(path)

if is_test():
    test_data_path_base = os.path.join(os.environ[envvar], "Tutorials", "data")
    test_data_dir = os.path.normpath(
        os.path.join(test_data_path_base, "BerkeleySegmentationDataset"))

# prefer our default path for the data
data_dir = os.path.join("data", "BerkeleySegmentationDataset")
_ensure_dir(data_dir)

# folder with images to be evaluated
example_folder = os.path.join(data_dir, "example_images")
_ensure_dir(example_folder)

# folder with resulting images
results_folder = os.path.join(data_dir, "example_results")
_ensure_dir(results_folder)

# names of the models used below
model_names = ["VDSR", "DRNN", "SRResNet", "SRGAN"]
# output dimensions of the models, respectively (outputs are assumed square)
output_dims = [64, 64, 224, 224]
The evaluation algorithm above is implemented here in function <code>evaluate</code>. See code comments for details about each step.
def evaluate(filename, model, outfile, output_dims, pre_upscale=False, clear_up=False, residual_model=False):
    """Run a super-resolution model over an image, patch by patch, and save the result.

    The image is resolved subpatch by subpatch; predicted boundary pixels are
    less accurate, so an `offset`-wide rim of each patch is discarded except at
    the image edges, and consecutive patches overlap to re-cover those pixels.

    Parameters:
    filename       -- relative path of the image being processed
    model          -- the super-resolution model (CNTK Function) to apply
    outfile        -- relative path where the resulting image is saved
    output_dims    -- side length of the model's output patch
                      (the output is assumed to be square)
    pre_upscale    -- if True, the image is first upscaled by the target factor
                      with bicubic interpolation and the model only clears it
                      up afterwards; requires clear_up=True (models that do not
                      upscale by themselves)
    clear_up       -- if True, the forwarded image is cleared up at its current
                      size and not upscaled (step variables differ, see code)
    residual_model -- if True, the model predicts only the residual image (the
                      difference between the blurry and the original patch),
                      which is scaled back and added to the low-resolution
                      input; otherwise the prediction is just scaled back
    """
    img = Image.open(filename)
    # upscaling coefficient
    coef = 2

    # at each step we evaluate subpatch (x : x + range_x, y : y + range_y)
    # of the original image; patch by patch, the whole image is resolved
    range_x = output_dims // coef
    range_y = output_dims // coef

    # how many boundary pixels of a resulting patch are excluded
    # (boundaries tend to be predicted less accurately)
    offset = output_dims // 10

    # stride between consecutive subpatches; offset is subtracted so pixels
    # that were boundary in the previous subpatch get re-covered
    step_x = range_x - offset
    step_y = range_y - offset

    # a pre-magnified image must be cleared up afterwards
    if pre_upscale and not clear_up:
        print("Pre-magnified image is not being cleared up.")
        return

    # pre-magnify the picture if needed
    if pre_upscale:
        img = img.resize((coef * img.width, coef * img.height), Image.BICUBIC)

    if clear_up:
        # image is only cleared up, no further upscaling: coef becomes 1
        # and the patch/stride parameters are set accordingly
        result = np.zeros((img.height, img.width, 3))
        range_x = output_dims
        range_y = output_dims
        step_x = range_x - 2 * offset
        step_y = range_y - 2 * offset
        coef = 1
    else:
        # result is coef (2 by default) times larger than the input image
        result = np.zeros((coef * img.height, coef * img.width, 3))

    rect = np.array(img, dtype=np.float32)

    # pad with zeros if the image is too small for the model's patch size
    if rect.shape[0] < range_y:
        pad = np.zeros((range_y - rect.shape[0], rect.shape[1], rect.shape[2]))
        rect = np.concatenate((rect, pad), axis=0).astype(dtype=np.float32)
    if rect.shape[1] < range_x:
        pad = np.zeros((rect.shape[0], range_x - rect.shape[1], rect.shape[2]))
        rect = np.concatenate((rect, pad), axis=1).astype(dtype=np.float32)

    # take subpatch by subpatch and resolve them into the final image
    y = 0
    while y < img.width:
        x = 0
        while x < img.height:
            rgb_patch = rect[x : x + range_x, y : y + range_y]
            # RGB -> BGR, then channels-first, as the models expect
            rgb_patch = rgb_patch[..., [2, 1, 0]]
            rgb_patch = np.ascontiguousarray(np.rollaxis(rgb_patch, 2))
            pred = np.squeeze(model.eval({model.arguments[0]: [rgb_patch]}))
            img1 = np.ascontiguousarray(rgb_patch.transpose(2, 1, 0))
            img2 = np.ascontiguousarray(pred.transpose(2, 1, 0))
            if residual_model:
                # scale the predicted residual back and add the input patch
                img2 = 255.0 * img2 + img1
            else:
                # just scale the prediction back to [0, 255]
                img2 = img2 * 255.0
            # clamp pixels to [0, 255]
            # (replaces the original per-patch CNTK relu graph trick,
            # which computed exactly min(max(x, 0), 255))
            img2 = np.clip(img2, 0.0, 255.0)
            # BGR -> RGB, back to (height, width, channel) layout
            rgb = img2[..., ::-1]
            patch = np.ascontiguousarray(rgb.transpose(1, 0, 2))

            cx = coef * x
            cy = coef * y
            # fill in the pixels in the middle of the subpatch,
            # skipping the offset-wide rim near the patch boundary
            result[cx + offset : cx + output_dims - offset,
                   cy + offset : cy + output_dims - offset] = \
                patch[offset : output_dims - offset,
                      offset : output_dims - offset]
            # top edge of the image: keep the rim pixels too
            if x == 0:
                result[:offset, cy : cy + output_dims] = patch[:offset, :]
            # left edge
            if y == 0:
                result[cx : cx + output_dims, :offset] = patch[:, :offset]
            # bottom edge
            if x == img.height - range_x:
                result[coef * img.height - offset : coef * img.height,
                       cy : cy + output_dims] = \
                    patch[coef * img.height - offset - cx : coef * img.height - cx, :]
            # right edge
            if y == img.width - range_y:
                result[cx : cx + output_dims,
                       coef * img.width - offset : coef * img.width] = \
                    patch[:, coef * img.width - offset - cy : coef * img.width - cy]

            # reached the bottom of the image
            if x == img.height - range_x:
                break
            # next step by x; must not go out of bounds
            x = min(x + step_x, img.height - range_x)
        # reached the right edge of the image
        # (fixed: the original compared against range_x here)
        if y == img.width - range_y:
            break
        # next step by y; must not go out of bounds
        y = min(y + step_y, img.width - range_y)

    # save the result (scipy.misc.imsave was removed in SciPy >= 1.3;
    # PIL, already imported, does the same job)
    Image.fromarray(result.astype(np.uint8)).save(outfile)
# Resolve the locations of the pre-trained models and example images:
# test runs read from the CNTK test-data tree, normal runs use data_dir.
if is_test():
    models_dir = os.path.join(test_data_dir, "PretrainedModels")
    image_dir = os.path.join(test_data_dir, "Images")
else:
    models_dir = os.path.join(data_dir, "PretrainedModels")
    image_dir = os.path.join(data_dir, "Images")
    for _folder in (models_dir, image_dir):
        if not os.path.exists(_folder):
            os.makedirs(_folder)

print("Model directory", models_dir)
print("Image directory", image_dir)
# Download each pretrained model unless a cached copy is already on disk.
# (Replaces four copy-pasted stanzas whose status messages had drifted
# apart; messages are now uniform across models.)
for _name in model_names:
    _model_file = os.path.join(models_dir, _name + ".model")
    if not os.path.isfile(_model_file):
        print("Downloading %s model..." % _name)
        urlretrieve("https://www.cntk.ai/Models/SuperResolution/" + _name + ".model",
                    _model_file)
    else:
        print("Using cached %s model" % _name)
print("Loading pretrained models...")
# Load the four pretrained super-resolution models from disk with CNTK.
VDSR_model = C.load_model(os.path.join(models_dir, "VDSR.model"))
DRNN_model = C.load_model(os.path.join(models_dir, "DRNN.model"))
SRResNet_model = C.load_model(os.path.join(models_dir, "SRResNet.model"))
SRGAN_model = C.load_model(os.path.join(models_dir, "SRGAN.model"))
# Keep the models in a list in the same order as model_names / output_dims.
models = [VDSR_model, DRNN_model, SRResNet_model, SRGAN_model]
print("Loaded pretrained models.")
from shutil import copyfile

# Fetch the example image into the cache location (image_dir) once, then
# copy it into example_folder for evaluation.
# BUG FIX: the original downloaded straight into example_folder and never
# wrote the cache, so the "cached" branch could never trigger on a re-run.
_cached_image = os.path.join(image_dir, "253027.jpg")
if not os.path.isfile(_cached_image):
    print("Downloading example image ...")
    link = "https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/BSDS300/html/images/plain/normal/color/253027.jpg"
    urlretrieve(link, _cached_image)
else:
    print("Using cached image file")
copyfile(_cached_image, os.path.join(example_folder, "253027.jpg"))
We will also save a copy of the bicubic interpolation result for every image we test on, just for reference.
save_folder = os.path.join(results_folder, "bicubic")
# Create the output folder once, outside the loop (the existence check was
# previously repeated on every iteration).
if not os.path.exists(save_folder):
    os.makedirs(save_folder)

# upscale every example image by bicubic interpolation and save for reference
for entry in os.listdir(example_folder):
    filename = os.path.join(example_folder, entry)
    img = Image.open(filename)
    out = img.resize((2 * img.width, 2 * img.height), Image.BICUBIC)
    out.save(os.path.join(save_folder, entry))
# loop through every model
for i in range(4):
    save_folder = os.path.join(results_folder, model_names[i] + "_results")
    # create the result folder once per model (hoisted out of the image loop)
    if not os.path.exists(save_folder):
        os.makedirs(save_folder)
    # loop through every image in example_folder
    for entry in os.listdir(example_folder):
        filename = os.path.join(example_folder, entry)
        outfile = os.path.join(save_folder, entry)
        print("Now creating: " + outfile)
        if i < 2:
            # VDSR / DRNN: residual learning — the image is pre-upscaled
            # by bicubic interpolation and then cleared up by the model
            evaluate(filename, models[i], outfile, output_dims[i],
                     pre_upscale=True, clear_up=True, residual_model=True)
        else:
            # SRResNet / SRGAN: all upscaling happens inside the model
            evaluate(filename, models[i], outfile, output_dims[i],
                     pre_upscale=False, clear_up=False, residual_model=False)
    # loop through the models which can additionally clear up an image
    # after it was enlarged (VDSR and DRNN)
    for j in range(2):
        filter_folder = os.path.join(results_folder,
                                     model_names[j] + "_" + model_names[i] + "_results")
        # folder depends only on (i, j); create it outside the image loop
        if not os.path.exists(filter_folder):
            os.makedirs(filter_folder)
        # loop through the results of the previously applied model
        for entry in os.listdir(save_folder):
            filename = os.path.join(save_folder, entry)
            outfile = os.path.join(filter_folder, entry)
            print("Now creating: " + outfile)
            # additionally clear up the image without pre-magnifying
            evaluate(filename, models[j], outfile, output_dims[j],
                     pre_upscale=False, clear_up=True, residual_model=True)
# Display reference result images hosted online.
# NOTE: the import is aliased so it does not shadow PIL's `Image`,
# which the rest of this notebook uses.
from IPython.display import Image as IPythonImage
IPythonImage(url="https://superresolution.blob.core.windows.net/superresolutionresources/example.PNG")
IPythonImage(url="https://superresolution.blob.core.windows.net/superresolutionresources/analysis_zebra.PNG")
IPythonImage(url="https://superresolution.blob.core.windows.net/superresolutionresources/analysis_village.PNG")
IPythonImage(url="https://superresolution.blob.core.windows.net/superresolutionresources/analysis_pot.PNG")