For my Machine Learning class project, I decided to look at whether we can
refine images by using techniques for visualizing what a Convolutional Neural Network
(CNN) trained on an image recognition task has learned. The project was inspired
by a blog post from the Google Research Blog.
In DeepDream, an image input is fed through a trained network in a forward pass to
a specified layer, where the gradients are then
set equal to the activations themselves. Backpropagation is then performed, but rather
than using the gradients to update the weights (as is typically done during training),
the gradient values are used to update the input image. At a high level, this is like asking
the neural network to amplify the features that it "sees".
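To make that update rule concrete, here is a minimal numpy sketch in which a single frozen linear layer stands in for a trained CNN. The layer shapes, step size, and iteration count are illustrative assumptions, not the actual network or settings used; a real DeepDream run uses a full trained CNN and automatic differentiation.

```python
import numpy as np

# Stand-in for a trained network: one frozen linear layer.
# (Illustrative assumption; real DeepDream uses a full trained CNN.)
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))     # frozen "trained" weights
x = rng.standard_normal(16) * 0.1    # the input "image"

norm_before = np.linalg.norm(W @ x)

step = 0.01
for _ in range(100):
    a = W @ x                # forward pass to the chosen layer
    grad_a = a               # DeepDream trick: gradient := activations
    grad_x = W.T @ grad_a    # backprop that gradient to the input
    x = x + step * grad_x    # update the image, not the weights

norm_after = np.linalg.norm(W @ x)   # activations have been amplified
```

Note that setting the gradient equal to the activations is the same as gradient ascent on half the squared norm of the activations, which is why the loop amplifies whatever the layer already responds to.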
Creating maximally activating class images is just a small departure from DeepDream.
Rather than performing a forward pass to any arbitrary layer, maximally activating
class images are generated by targeting the layer right before class prediction, and
rather than setting the gradients equal to the activations, only the gradient of the
class of interest is set to 1 and the rest are zeroed out. As before, backpropagation
is performed and the image, rather than the weights, is updated. In effect, this
produces what the CNN would “like to see” when classifying a specific class.
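A minimal sketch of this variant, again using a toy frozen layer in numpy (the 10-class layer, target index, and step size are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 16))   # frozen final layer: 10 class logits
x = rng.standard_normal(16) * 0.1   # starting image (could also be noise)
target = 3                          # class of interest

logit_before = (W @ x)[target]

step = 0.05
for _ in range(50):
    grad_logits = np.zeros(10)
    grad_logits[target] = 1.0        # target class set to 1, rest zeroed out
    grad_x = W.T @ grad_logits       # backprop to the input
    x = x + step * grad_x            # update the image, not the weights

logit_after = (W @ x)[target]        # the target class score has increased
```

Each step here pushes the input a little further in the direction that raises the target class's score, which is exactly the "maximally activating class image" procedure described above, minus the convolutional machinery.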
In the blog post, Mordvintsev, Olah, and Tyka show various results of both DeepDream
and the activity maximization method.
They show the transformation of random noise into an image that maximally activates
a specified class (for example, "banana"). They also show the transformation of one
class into another (turning a tree into a building). This led me to wonder if this
technique could be used to refine images. That is, starting with an image from a
particular class, can we make the image look even more like that class?
Constraints and regularizers
It is important to set the right constraints in order to generate outputs that
look like natural images. For example, in natural images, neighboring pixels are
typically correlated. To understand the need for constraints, I tried a few
well-known techniques from the literature; specifically, I looked at the effects of
Gaussian blur.
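As one concrete example of such a regularizer, here is a numpy-only sketch of a separable Gaussian blur that can be applied to the image between update steps. The helper name, sigma, and blur schedule are my own illustrative choices, not necessarily those used in the project.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Blur a 2-D image with a separable Gaussian kernel."""
    radius = int(3 * sigma)
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-xs**2 / (2 * sigma**2))
    kernel /= kernel.sum()               # normalize so brightness is preserved
    # Convolve rows, then columns (a 2-D Gaussian blur is separable).
    out = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    out = np.apply_along_axis(np.convolve, 0, out, kernel, mode="same")
    return out
```

In the refinement loop, one might apply `x = gaussian_blur(x, sigma=0.5)` every few iterations; this suppresses the high-frequency artifacts that unconstrained gradient ascent tends to produce, nudging the output toward the pixel correlations seen in natural images.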
Now, rather than starting from noise, I wanted to see the result of starting with an
image from a class and modifying it to look even more like an image from that class.
Here, it is clear that the number of iterations the algorithm is run for makes
a considerable difference in the amount of refinement of the output image. With more
iterations, more class tiling occurs and the image becomes less like a flamingo or a
swan in terms of shape and structure, yet more like a flamingo or swan in terms of
texture.
Computational biology application
A difficult problem in computational drug discovery is automatically finding
small-molecule drugs that will bind strongly to a protein target of interest and
block its activity. While there is a wealth of information about chemical compounds online,
it is difficult to search these large databases and intelligently select only a few
top candidates to move on for clinical trials. If a drug can be modeled as a
3-dimensional “image” and fed into a CNN, perhaps it can benefit from the methods
described. That is, a CNN can be trained on labeled examples of binding
and non-binding small molecules (for a specific protein target) to distinguish with
high accuracy between a good (strongly binding) drug and a bad
(non-binding) one. Then, when given the input of a drug that we already know is
good, this pre-trained CNN can transform it into an even better drug using the
activity maximization approach.
In this scenario, it is clear that different constraints will be needed (as opposed
to “neighboring pixels must be correlated”). Rather, constraints such as “atoms may not
be too close to each other in space” or “bonds can only twist and stretch so much”
must be conveyed through the implementation. Regardless, activity maximization for
drug discovery is an interesting future direction to explore.