I used MATLAB's built-in kmeans function to create my codebook of image patches. When I first started using it, it kept reporting that one of the clusters became empty during an update phase. To fix this, I used the 'EmptyAction','singleton' option, which "creates a new cluster consisting of the one observation furthest from its centroid."
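The project itself is in MATLAB, but the 'singleton' behavior is easy to illustrate. Below is a Python/NumPy sketch (function name and array layout are my own, not from the project) of re-seeding an empty cluster with the observation farthest from its currently assigned centroid:

```python
import numpy as np

def singleton_fix(X, centroids, labels):
    # Mimics MATLAB kmeans' 'singleton' EmptyAction: any empty cluster
    # is re-seeded with the single observation farthest from its
    # currently assigned centroid.
    for k in range(len(centroids)):
        if not np.any(labels == k):
            dists = ((X - centroids[labels]) ** 2).sum(axis=1)
            far = int(np.argmax(dists))
            centroids[k] = X[far]
            labels[far] = k
    return centroids, labels
```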
The kmeans clustering algorithm accepts n p-dimensional vectors as an n×p matrix. This means that I needed to unroll each 11×11 2D image sample into a 1×121 row vector. Note that no information is lost this way; it's just the format that MATLAB requires.
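The unrolling step is just a reshape, shown here as a Python/NumPy sketch (the patch values are placeholders):

```python
import numpy as np

# Hypothetical 11x11 grayscale patch (any pixel values would do).
patch = np.arange(121, dtype=float).reshape(11, 11)

# Unroll to a 1x121 row vector, matching the n-by-p input format
# that kmeans expects (one row per sample).
vec = patch.reshape(1, 121)

# Nothing is lost: reshaping back recovers the original patch exactly.
recovered = vec.reshape(11, 11)
```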
ImClassify(number of patches, number of codewords in the codebook, distance type ('sqEuclidean' or 'cityblock'), use corner information (1/0))
The distance type determines whether the k-means clustering algorithm (and, subsequently, the codebook lookup comparison) uses squared Euclidean distance or "cityblock" distance (the L1 norm: the sum of absolute differences of the components) as its similarity metric. I found that this didn't make a huge difference; there seems to be no way to get around the curse of dimensionality. Therefore, I stuck with squared Euclidean distance in most of my tests.
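For concreteness, the two metrics can be sketched in Python/NumPy (function names are mine, not MATLAB's):

```python
import numpy as np

def sq_euclidean(a, b):
    # Squared Euclidean distance: sum of squared component differences.
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(np.dot(d, d))

def cityblock(a, b):
    # Cityblock (L1) distance: sum of absolute component differences.
    return float(np.abs(np.asarray(a, float) - np.asarray(b, float)).sum())

sq_euclidean([1, 2, 3], [4, 0, 3])  # -> 13.0
cityblock([1, 2, 3], [4, 0, 3])     # -> 5.0
```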
When the "use corner information" parameter is set to 1, I try to make the majority of the sampled spots "corners" using techniques from assignment 1, since I expect corners to be points of interest more often than purely random samples are. I explain this addition in more detail later; at first I just used random samples (last parameter set to 0).
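The corner-first sampling strategy can be sketched as follows in Python/NumPy. The `corners` argument stands in for the output of a corner detector sorted by descending strength; its name and format are assumptions for illustration, not the actual CornerFind.m interface:

```python
import numpy as np

def sample_patches(img, n_patches, corners, half=5, seed=0):
    # corners: (row, col) centers, strongest first. Take 11x11 patches
    # centered on corners first, then fill with random samples.
    h, w = img.shape
    patches = []
    for r, c in corners:
        if len(patches) == n_patches:
            break
        # Skip corners too close to the border to center a full patch.
        if half <= r < h - half and half <= c < w - half:
            patches.append(img[r - half:r + half + 1,
                               c - half:c + half + 1].ravel())
    # Not enough strong corners: fill the rest with random 11x11 samples.
    rng = np.random.default_rng(seed)
    while len(patches) < n_patches:
        r = int(rng.integers(half, h - half))
        c = int(rng.integers(half, w - half))
        patches.append(img[r - half:r + half + 1,
                           c - half:c + half + 1].ravel())
    return np.stack(patches)
```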
Overall, the results are not great. This is to be expected, since we're only taking 10 training images per class. They are at least consistently better than chance (which would be 0.1, since each image has an equal probability of falling into any one of the 10 classes). This is a start. Varying a few parameters can help:
NOTE: In the confusion matrix results below, the entry in row x, column y corresponds to the probability of classifying an object of class x as being part of class y. This means that the probabilities should sum to 1 within each row.
ALSO NOTE: In the confusion matrices, the objects are listed in alphabetical order from row 1 to row 10 and column 1 to column 10: {airplane, butterfly, camera, helicopter, lotus, panda, pizza, pyramid, snoopy, yin_yang}
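The convention above (row = true class, column = predicted class, rows normalized to probabilities) can be sketched in Python/NumPy; this is a generic illustration, not the project's actual evaluation code:

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    # C[x, y] counts objects of true class x classified as class y.
    # Assumes every class appears at least once in true_labels,
    # so no row is all zeros before normalizing.
    C = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        C[t, p] += 1
    # Normalize each row to probabilities, so each row sums to 1.
    return C / C.sum(axis=1, keepdims=True)
```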
Fixing the number of random 11x11 samples per image to 50 (both for creating the codebook and for classifying), and varying the number of entries in the codebook:
Increasing the number of codewords in the codebook helps up to a point, after which there are simply too many (since the space is so high-dimensional). The best codebook size seems to be about 100.
Fixing the number of codewords to 100 and varying the number of random patches:
Increasing the number of samples also helps initially, because the probability that important features are picked up by random sampling grows with the number of samples taken.
To try to improve the results, I tried to give the Naive Bayes classifier a "push in the right direction" by picking up on features when sampling patches, instead of sampling completely at random. I decided to use my corner detector from assignment 1. If there weren't enough strong corners, I simply filled in the rest of the patches with random samples. This way, I hoped to make better use of my samples (especially when using a sparser set). Here are some of the results I got (with the codebook size fixed at 100):
It appears that using the corner detector does boost performance, as it consistently beats random sampling. Looking at the codebooks, it's not hard to see why. In particular, compare the codebooks with 50 and 100 samples between using corners and not using corners. The random ones (without corners) have many more patches that are constant shades of grey (without much other detail) or just one edge, which I would argue are much less descriptive than the patches with a lot "going on" in the corner-based codebooks.
Using Naive Bayes on a really small data set gives poor results. A few things can improve it slightly: choosing a codebook size of around 100, and increasing the number of random patches taken (though this has diminishing returns). Also, using corner information (instead of just taking random patches) can significantly boost performance.
Also, butterflies are the hardest images to recognize.
CannyGradient.m
The CannyGradient filter from assignment 1 (used with corner detection)
CornerFind.m
The corner finder from assignment 1, slightly modified to return corner centers in descending order of strength
estimateClass.m
Given an image, a codebook, and a histogram associated with the codebook for each class, this function returns the class that the image most likely belongs to
getRandPatch.m
Returns a random 11x11 grayscale patch from an image as a 121-D vector
ImClassify.m
The main program that goes through and loads all of the images, and connects the different phases of the image classification pipeline
OutputCodeBook.m
Outputs the codebook from the current session as a PNG image (this is how I generated the codebooks for viewing in my data section above)
randuint.m
Returns a random unsigned 32-bit integer within a given range
TrainCodeBook.m
Given a set of training images, create a codebook and histogram associated with that codebook for each class
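The training side (building one codeword histogram per class from that class's training patches) can be sketched in Python/NumPy; this is a generic illustration of the idea, not TrainCodeBook.m itself:

```python
import numpy as np

def train_histograms(patches_by_class, codebook):
    # For each class, count how often each codeword is the nearest
    # neighbor (squared Euclidean) of that class's training patches.
    k = codebook.shape[0]
    hists = []
    for patches in patches_by_class:
        d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        hists.append(np.bincount(d.argmin(axis=1), minlength=k))
    return np.stack(hists)
```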