Jasmine L. Collins

jlcollins121@gmail.com

Refining Images using DeepDream Inceptionism

For my Machine Learning class project, I decided to look at whether we can refine images using techniques for visualizing what a Convolutional Neural Network (CNN) trained on an image recognition task has learned. The class project was inspired by this blog post from the Google Research Blog.

DeepDream
In DeepDream, an input image is fed through a trained network in a forward pass up to a specified layer, where the gradients are set equal to the activations themselves. Backpropagation is then performed, but rather than using the gradients to update the weights (as is typically done in training), the gradients are used to update the input image. At a high level, this is like asking the neural network to amplify the features that it "sees".
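Concretely, one DeepDream step might look like the following minimal PyTorch sketch. This is my own illustration rather than the exact code used for this project, and the choice of VGG16 (and of where to truncate it) is an assumption:

import torch
from torchvision import models

# An arbitrary slice of VGG16's convolutional stack stands in for
# "a specified layer" of the trained network.
dream_net = models.vgg16(pretrained=True).features[:16].eval()

def deepdream_step(model, image, step_size=0.01):
    image = image.clone().detach().requires_grad_(True)
    acts = model(image)
    # Set the gradients at the target layer equal to the activations
    # themselves, then backpropagate down to the input image.
    acts.backward(gradient=acts.detach())
    with torch.no_grad():
        g = image.grad
        # Gradient ascent on the image (not the weights), with a
        # normalized step size.
        image = image + step_size * g / (g.abs().mean() + 1e-8)
    return image.detach()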

Activity maximization
Creating maximally activating class images is just a small departure from DeepDream. Rather than performing a forward pass to an arbitrary intermediate layer, maximally activating class images are generated by targeting the layer right before class prediction, and rather than setting the gradients equal to the activations, only the gradient of the class of interest is set to 1 and the rest are zeroed out. As before, backpropagation is performed and the image, rather than the weights, is updated. In effect, this produces what the CNN would “like to see” when classifying a specific class.
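A single step of this procedure could be sketched as follows, again in PyTorch and again only as an illustration; it reuses the same normalized update as the DeepDream sketch above, but runs the full network to the pre-softmax class scores:

import torch
from torchvision import models

net = models.vgg16(pretrained=True).eval()  # full network, up to class scores

def class_max_step(model, image, class_idx, step_size=0.01):
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)
    # Gradient of 1 for the class of interest, 0 everywhere else.
    grad_out = torch.zeros_like(logits)
    grad_out[0, class_idx] = 1.0
    logits.backward(gradient=grad_out)
    with torch.no_grad():
        g = image.grad
        image = image + step_size * g / (g.abs().mean() + 1e-8)
    return image.detach()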

In the blog post, Mordvintsev, Olah, and Tyka show various results of both DeepDream and the activity maximization method. They show the transformation of random noise into an image that maximally activates a specified class (for example, "banana"), as well as the transformation of one class into another (turning a tree into a building). This led me to wonder whether this technique could be used to refine images. That is, starting with an image from a particular class, can we make the image look even more like that class?

Constraints and regularizers
It is important to set the right constraints in order to generate outputs that resemble natural images; in natural images, for example, neighboring pixels are typically correlated. To understand the need for constraints, I tried a few well-known techniques from the literature. Specifically, I looked at the effects of L2 regularization, Gaussian blur, and jitter with pixel clipping.
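The sketch below shows one way these techniques might be combined in a single optimization loop, reusing class_max_step from above. The particular strengths and schedules are illustrative guesses, not the values behind the figures:

import torch
from torchvision.transforms import GaussianBlur

blur = GaussianBlur(kernel_size=3, sigma=0.5)

def regularized_class_max(model, image, class_idx, n_iter=500,
                          step_size=0.01, l2_decay=1e-4, jitter=4,
                          blur_every=10):
    for t in range(n_iter):
        # Jitter: randomly shift the image before each step, then
        # shift it back afterwards.
        ox, oy = torch.randint(-jitter, jitter + 1, (2,)).tolist()
        image = torch.roll(image, shifts=(ox, oy), dims=(2, 3))
        image = class_max_step(model, image, class_idx, step_size)
        image = torch.roll(image, shifts=(-ox, -oy), dims=(2, 3))
        # L2 regularization: decay pixel values toward zero.
        image = image * (1 - l2_decay)
        # Gaussian blur: periodically smooth out high-frequency noise.
        if t % blur_every == 0:
            image = blur(image)
        # Pixel clipping: keep values in a plausible (normalized) range.
        image = image.clamp(-1.5, 1.5)
    return image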

Maximally activating image for the Black Swan class, starting from noise. From left to right: No constraints, only clipping, only jitter, only Gaussian blur, all methods combined.


Refining images
Now, rather than starting from noise, I wanted to start from a real image of a class and modify it to look even more like an image from that class.
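In code, only the starting point changes: a real photo is loaded and preprocessed instead of sampling noise. The filename below is a placeholder, and the preprocessing assumes the usual ImageNet normalization:

from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "black_swan.jpg" is a placeholder filename; class index 100 is
# "black swan" in the standard ImageNet ordering.
swan = preprocess(Image.open("black_swan.jpg")).unsqueeze(0)
refined = regularized_class_max(net, swan, class_idx=100, n_iter=500)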

The effect of increasing the number of iterations in activity maximization for the Black Swan class. From left to right: Original image, 100 iterations, 500 iterations, 1000 iterations, 5000 iterations.


The effect of increasing the number of iterations in activity maximization for the Flamingo class. From left to right: Original image, 100 iterations, 1000 iterations.


Here, it is clear that the number of iterations the algorithm is run for makes a considerable difference in how much the output image is refined. With more iterations, more class tiling occurs: the image becomes less like a flamingo or a swan in terms of shape and structure, yet more like a flamingo or swan in terms of overall texture.

Computational biology application
A difficult problem in computational drug discovery is automatically finding small molecules that will bind strongly to a target protein of interest and block its activity. While there is a wealth of information about chemical compounds online, it is difficult to search these large databases and intelligently select only a few top candidates to advance toward clinical trials. If a drug can be modeled as a 3-dimensional “image” and fed into a CNN, perhaps it can benefit from the methods described above. That is, a CNN could be trained on labeled examples of binding and non-binding small molecules (for a specific protein target) to distinguish with high accuracy between a good (strongly binding) drug and a bad (non-binding) one. Then, given the input of a drug that we already know is good, this pre-trained CNN could transform it into an even better drug using the activity maximization approach.
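Purely as a speculative sketch, such a classifier might be a small 3D CNN over a voxelized molecule, with one input channel per atom type; every shape and layer size below is invented for illustration:

import torch
import torch.nn as nn

class BindingNet(nn.Module):
    """Toy 3D CNN over a voxelized molecule: one channel per atom type."""
    def __init__(self, n_atom_types=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(n_atom_types, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, 2)  # binding vs. non-binding

    def forward(self, x):  # x: (batch, n_atom_types, depth, height, width)
        return self.classifier(self.features(x).flatten(1))

# Activity maximization would then ascend the "binding" logit with
# respect to the voxelized input, just as with images above.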

In this scenario, it is clear that different constraints will be needed (as opposed to “neighboring pixels must be correlated”). Rather, constraints such as “atoms may not be too close to each other in space” or “bonds can only twist and stretch so much” must be conveyed through the implementation. Regardless, activity maximization for drug discovery is an interesting future direction to explore.
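For example, a minimum-distance constraint might be expressed as a differentiable penalty added to the objective. This is only an illustrative guess at how such a term could look:

import torch

def min_distance_penalty(coords, d_min=1.0):
    # coords: (n_atoms, 3) tensor of atom positions; d_min is a
    # made-up minimum allowed interatomic distance.
    dists = torch.cdist(coords, coords)                 # pairwise distances
    off_diag = ~torch.eye(len(coords), dtype=torch.bool)
    violation = (d_min - dists[off_diag]).clamp(min=0)  # positive if too close
    return (violation ** 2).sum()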