# Image processing and machine learning
Some image processing numerical techniques are very specific to image processing, such as mathematical morphology or anisotropic diffusion segmentation. However, it is also possible to adapt generic machine learning techniques for image processing.
## A short introduction to machine learning
This section is adapted from the quick start tutorial from the scikit-learn documentation.
In general, a learning problem considers a set of N samples of data and then tries to predict properties of unknown data. If each sample is more than a single number and, for instance, a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features.
Typical machine learning tasks are:
- classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. For example, given examples of pixels belonging to an object of interest and to the background, we want the algorithm to label all the other pixels of the image. Or, given images of cats and dogs, we want to automatically label new images as showing a cat or a dog.
- clustering: grouping together similar samples. For example, given a set of pictures, can we group them automatically by subject (e.g. people, monuments, animals...)?
In image processing, a sample can be
- a whole image, its features being pixel values, or sub-regions of an image (e.g. for face detection),
- a pixel, its features being intensity values in colorspace, or statistical information about a neighbourhood centered on the pixel,
- a labeled region, e.g. for classifying particles in an image of labels.
The only requirement is to create a dataset composed of N samples, of m features each, which can be passed to the estimators of scikit-learn.
Let us start with an example, using the digits dataset from scikit-learn.
The dataset is a dictionary-like object that holds all the data and some metadata about the data. The data is stored in the `.data` member, which is an `(n_samples, n_features)` array. Response variables (if available, as here) are stored in the `.target` member.
From the shape of the `data` array, we see that there are 1797 samples, each having 64 features. In fact, these 64 features are the raveled pixel values of an 8x8 image. For convenience, the 2D images are also provided in the `.images` member. In a machine learning problem, a sample always consists of a flat array of features, which sometimes requires reshaping the data.
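
As a minimal sketch of the loading step described above (assuming scikit-learn is installed; the variable name `first_flat` is only illustrative), the members of the dataset can be inspected like this:

```python
from sklearn import datasets

# Load the bundled digits dataset (a dictionary-like object).
digits = datasets.load_digits()

print(digits.data.shape)    # (n_samples, n_features)
print(digits.target.shape)  # one response value per sample
print(digits.images.shape)  # the same samples, kept as 2D images

# Each row of .data is the raveled version of the matching 8x8 image.
first_flat = digits.images[0].ravel()
print((first_flat == digits.data[0]).all())
```
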
We now use one of scikit-learn's estimator classes in order to predict the digit from an image.
Here we use an SVC (support vector machine classification) classifier, which uses a part of the dataset (the training set) to find the best way to separate the different classes. Even without knowing the details of the SVC, we can use it as a black box thanks to the common estimator API of scikit-learn. An estimator is created by initializing an estimator object:
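
A possible way to create such an estimator is sketched here. The values of `gamma` and `C` are illustrative assumptions, not tuned settings.

```python
from sklearn import svm

# gamma and C are illustrative hyperparameter values (assumptions),
# not values tuned for any particular dataset.
clf = svm.SVC(gamma=0.001, C=100.0)
print(clf)
```
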
The estimator is trained from the learning set using its `.fit` method.
Then the target value of new data is predicted using the `.predict` method of the estimator.
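
Put together, the create / fit / predict sequence could look like the sketch below. The half-and-half split into training and test data and the hyperparameter values are assumptions made for illustration.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Hypothetical split: first half for training, second half for testing.
n_train = len(digits.data) // 2

clf = svm.SVC(gamma=0.001, C=100.0)  # illustrative hyperparameters
clf.fit(digits.data[:n_train], digits.target[:n_train])

# Predict the digit for samples the estimator has never seen.
predicted = clf.predict(digits.data[n_train:])
accuracy = (predicted == digits.target[n_train:]).mean()
print(f"fraction of correct predictions: {accuracy:.2f}")
```
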
So far, so good? We completed our first machine learning example!
In the following, we will see how to use machine learning for image processing. We will use different kinds of samples and features, starting from low-level pixel-based features (e.g. RGB color), to mid-level features (e.g. corner, patches of high contrast), and finally to properties of segmented regions.
### Outline
- Image segmentation using pixel-based features (color and texture)
- Panorama stitching / image registration based on mid-level features
- Classifying labeled objects using their properties
### What we will not cover
- computer vision: automatic detection / recognition of objects (faces, ...)
A follow-up by Stéfan after this part: image classification using deep learning with Keras.
## Thresholding and vector quantization
Image binarization is a common operation. For grayscale images, finding the best threshold for binarization can be a manual operation. Alternatively, algorithms can select a threshold value automatically, which is convenient for computer vision, or for batch-processing a series of images.
Otsu's algorithm is the best-known thresholding algorithm. It chooses the threshold that maximizes the variance between the two segmented groups of pixels. Therefore, it can be interpreted as a clustering algorithm: samples are pixels and have a single feature, which is their grayscale value.
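
A minimal sketch of automatic thresholding with `skimage.filters.threshold_otsu` follows. The synthetic bimodal image is an assumption used to keep the example self-contained.

```python
import numpy as np
from skimage import filters

rng = np.random.default_rng(0)

# Synthetic grayscale image (assumption): dark background, brighter square.
image = rng.normal(0.2, 0.05, size=(64, 64))
image[16:48, 16:48] = rng.normal(0.8, 0.05, size=(32, 32))

# Otsu picks the threshold that best separates the two groups of pixels.
threshold = filters.threshold_otsu(image)
binary = image > threshold
print(f"threshold: {threshold:.2f}, foreground fraction: {binary.mean():.2f}")
```
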
How can we transpose the idea of Otsu thresholding to RGB or multichannel images? We can use the k-means algorithm, which aims to partition samples into k clusters, where each sample belongs to the cluster with the nearest mean.
Below we show a simple example of k-means clustering, based on the Iris dataset of scikit-learn. Note that the `KMeans` estimator uses a similar API to the SVC we used for digits classification, with the `.fit` method.
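
As a sketch, clustering the Iris samples into three groups might look like this. The choice of three clusters, `n_init` and `random_state` are assumptions for illustration.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()

# Three clusters is an assumption (Iris contains three species);
# random_state only makes the run reproducible.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(iris.data)

labels = kmeans.labels_            # cluster index of every sample
centers = kmeans.cluster_centers_  # mean of each cluster in feature space
print(labels[:10], centers.shape)
```
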
k-means clustering uses the Euclidean distance in feature space to cluster samples. If we want to cluster together pixels of similar color, the RGB space is not well suited since it mixes together information about color and light intensity. Therefore, we first transform the RGB image into Lab colorspace, and only use the color channels (a and b) for clustering.
Then we create a `KMeans` estimator for two clusters.
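
The two steps (conversion to Lab, then k-means on the a and b channels) can be sketched as follows. The synthetic image, a reddish disk on a greenish background under uneven lighting, is an assumption that stands in for a real photograph.

```python
import numpy as np
from skimage import color
from sklearn.cluster import KMeans

# Synthetic RGB image (assumption): reddish disk on a greenish background,
# with brightness increasing from left to right.
rows, cols = np.mgrid[0:60, 0:80]
disk = (rows - 30) ** 2 + (cols - 40) ** 2 < 20 ** 2
image = np.empty((60, 80, 3))
image[...] = (0.1, 0.6, 0.1)
image[disk] = (0.8, 0.1, 0.1)
image *= np.linspace(0.5, 1.0, 80)[np.newaxis, :, np.newaxis]

# Keep only the color channels (a, b) and drop the lightness L.
lab = color.rgb2lab(image)
ab = lab[..., 1:].reshape(-1, 2)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ab)
segmentation = kmeans.labels_.reshape(image.shape[:2])
```

Each pixel is one sample with two features, so the image has to be reshaped to `(n_pixels, 2)` before fitting and the labels reshaped back afterwards.
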
Of course we can generalize this method to more than two clusters.
### Exercise
For the chapel floor image, cluster the image into 3 clusters, using only the color channels (not the lightness one). What happens?
## SLIC algorithm: clustering using color and spatial features
In the thresholding / vector quantization approach presented above, pixels are characterized only by their color features. However, in most images neighboring pixels correspond to the same object. Hence, information on spatial proximity between pixels can be used in addition to color information.
SLIC (Simple Linear Iterative Clustering) is a segmentation algorithm which clusters pixels in both space and color. Therefore, regions of space that are similar in color will end up in the same segment.
Let us try to segment the different spices using the previous k-means approach. One problem is that there is a lot of texture coming from the relief and shades.
SLIC is a superpixel algorithm, which segments an image into patches (superpixels) of neighboring pixels with a similar color. SLIC also works in the Lab colorspace. The `compactness` parameter controls the relative importance of the distance in image space and in color space: higher values give more weight to spatial proximity and produce more regularly shaped superpixels.
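
A minimal SLIC call is sketched below. It uses the `coffee` image bundled with scikit-image as a stand-in (an assumption), illustrative values for `n_segments` and `compactness`, and assumes a scikit-image version recent enough to accept `start_label`.

```python
import numpy as np
from skimage import data, segmentation

image = data.coffee()  # stand-in RGB image (assumption)

# n_segments and compactness are illustrative values to experiment with.
segments = segmentation.slic(image, n_segments=200, compactness=10,
                             start_label=1)

print(segments.shape, len(np.unique(segments)))
```
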
After the superpixel segmentation (which is also called oversegmentation, because we end up with more segments than we want), we can add a second clustering step to join superpixels belonging to the same spice heap.
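
The second clustering step could be sketched as follows: each superpixel becomes one sample whose features are its mean Lab color, and k-means groups the superpixels. The stand-in image, the number of clusters and the use of all three Lab channels are assumptions.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import color, data, segmentation
from sklearn.cluster import KMeans

image = data.coffee()  # stand-in RGB image (assumption)
segments = segmentation.slic(image, n_segments=200, compactness=10,
                             start_label=1)

# One sample per superpixel: its mean color in Lab space.
lab = color.rgb2lab(image)
ids = np.unique(segments)
mean_colors = np.column_stack(
    [ndi.mean(lab[..., c], labels=segments, index=ids) for c in range(3)]
)

# Group superpixels of similar mean color (4 clusters is illustrative).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(mean_colors)

# Map every superpixel id to its cluster to get a per-pixel label image.
lookup = np.zeros(segments.max() + 1, dtype=int)
lookup[ids] = kmeans.labels_
merged = lookup[segments]
```
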
Note that other superpixel algorithms are available, such as Felzenszwalb segmentation.
### Exercise
Repeat the same operations (SLIC superpixel segmentation, followed by K-Means clustering on the average color of superpixels) on the astronaut image. Vary the following parameters:
- `slic`: `n_segments` and `compactness`
- `KMeans`: `n_clusters` (start with 8 for example)
## Increasing the number of low-level features: trained segmentation using Gabor filters and random forests
In the examples above, a small number of features per pixel was used: either a color triplet only, or a color triplet and its (x, y) position. However, it is possible to use other features, such as the local texture. Texture features can be obtained using Gabor filters, which are Gaussian kernels modulated by a sinusoidal wave.
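
To illustrate how a Gabor filter responds to texture orientation, here is a small sketch on a synthetic striped image (an assumption). The magnitude of the complex response is used as the texture feature, which is one common choice rather than the only one.

```python
import numpy as np
from skimage import filters

# A Gabor kernel is complex: a Gaussian envelope times a complex sinusoid.
kernel = filters.gabor_kernel(frequency=0.2, theta=0)

# Synthetic image (assumption): stripes whose intensity varies along columns.
cols = np.arange(64)
image = np.tile(np.sin(2 * np.pi * 0.2 * cols), (64, 1))

# Filter with two orientations; gabor returns the real and imaginary parts.
real_0, imag_0 = filters.gabor(image, frequency=0.2, theta=0)
real_90, imag_90 = filters.gabor(image, frequency=0.2, theta=np.pi / 2)

# The response magnitude is large only when the filter matches the texture.
mag_0 = np.hypot(real_0, imag_0)
mag_90 = np.hypot(real_90, imag_90)
print(mag_0.mean() > mag_90.mean())
```
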
We define a segmentation algorithm which:
- computes different features for Gabor filters of different scale and angle, for every pixel,
- trains a RandomForest classifier from user-labeled data, which are given as a mask of labels,
- predicts the label of the remaining non-labeled pixels.
The RandomForest algorithm automatically chooses thresholds along the different feature directions, and also decides which features are the most significant to discriminate between the different classes. This is very useful when we don't know whether all features are relevant.
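
The three steps listed above can be sketched as follows. The helper `gabor_features`, the synthetic two-texture image, the label mask and all parameter values are hypothetical choices made for this illustration.

```python
import numpy as np
from skimage import filters
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:64, 0:64]

# Synthetic image (assumption): two textures side by side, plus noise.
truth = (cols >= 32).astype(int) + 1  # 1 = left texture, 2 = right texture
image = np.where(truth == 1,
                 np.sin(2 * np.pi * 0.25 * cols),
                 np.sin(2 * np.pi * 0.25 * rows))
image = image + rng.normal(0, 0.1, image.shape)


def gabor_features(img, frequencies=(0.1, 0.25), n_angles=4):
    """Hypothetical helper: stack Gabor response magnitudes per pixel."""
    feats = []
    for frequency in frequencies:
        for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
            real, imag = filters.gabor(img, frequency=frequency, theta=theta)
            feats.append(np.hypot(real, imag))
    return np.stack(feats, axis=-1)


features = gabor_features(image)  # shape (rows, cols, n_features)

# User-provided labels as a mask: 0 means "not labeled".
mask = np.zeros(image.shape, dtype=int)
mask[20:40, 5:15] = 1
mask[20:40, 50:60] = 2

# Train on labeled pixels only, then predict the remaining pixels.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[mask > 0], mask[mask > 0])
result = mask.copy()
result[mask == 0] = clf.predict(features[mask == 0])

# Relative importance of each Gabor feature, as estimated by the forest.
print(clf.feature_importances_.round(2))
```
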
## Using mid-level features
## Clustering or classifying labeled objects
We have already seen how to use `skimage.measure.regionprops` to extract the properties (area, perimeter, ...) of labeled objects. These properties can be used as features in order to cluster the objects in different groups, or to classify them if given a training set.
In the example below, we use `skimage.data.binary_blobs` to generate a binary image. We use several properties to generate features: the area, the ratio between squared perimeter and area, and the solidity (which is the area fraction of the object as compared to its convex hull). We would like to separate the big convoluted particles from the smaller round ones. Here I did not want to bother with a training set, so we will just use clustering instead of classification.
Once again we use the KMeans algorithm to cluster the objects. We visualize the result as an array of labels.
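
A sketch of this pipeline is given below. The blob parameters and the cutoff that skips very small regions are assumptions, and since no seed is passed the generated image differs from run to run.

```python
import numpy as np
from skimage import data, measure
from sklearn.cluster import KMeans

# Random binary image; parameter values are illustrative.
blobs = data.binary_blobs(length=512, blob_size_fraction=0.05,
                          volume_fraction=0.3)
labels = measure.label(blobs)

# Skip tiny specks (the cutoff of 20 pixels is an arbitrary assumption).
regions = [r for r in measure.regionprops(labels) if r.area >= 20]

# One sample per object: area, perimeter**2 / area, solidity.
features = np.array(
    [[r.area, r.perimeter ** 2 / r.area, r.solidity] for r in regions]
)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Array of labels for display: 0 = background or skipped, 1 and 2 = clusters.
lookup = np.zeros(labels.max() + 1, dtype=int)
lookup[[r.label for r in regions]] = kmeans.labels_ + 1
clustered = lookup[labels]
```
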
However, our features were not carefully designed. Since the `area` property can take much larger values than the other properties, it dominates the other ones. To correct this effect, we can normalize the area to its maximal value.
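
The normalization amounts to dividing the area column by its maximum. The feature values below are made up for illustration and do not come from a real image.

```python
import numpy as np

# Made-up feature rows: [area, perimeter**2 / area, solidity].
features = np.array([[2000.0, 50.0, 0.60],
                     [90.0, 14.0, 0.95],
                     [2500.0, 60.0, 0.55],
                     [110.0, 15.0, 0.93]])

# Rescale only the area column so that its largest value is 1.
normalized = features.copy()
normalized[:, 0] /= normalized[:, 0].max()
print(normalized)
```
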
A better way to do the rescaling is to use one of the scaling methods provided by `sklearn.preprocessing`. The `StandardScaler` makes sure that every feature has a zero mean and a unit standard deviation.
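
A short sketch of this rescaling, followed by clustering on the scaled features, is shown below. The feature values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up feature rows: [area, perimeter**2 / area, solidity].
features = np.array([[2000.0, 50.0, 0.60],
                     [90.0, 14.0, 0.95],
                     [2500.0, 60.0, 0.55],
                     [110.0, 15.0, 0.93]])

# Every column now has zero mean and unit standard deviation.
scaled = StandardScaler().fit_transform(features)
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))

# All features now contribute on a comparable scale to the distances.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
print(kmeans.labels_)
```
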
### Exercise
Replace the area property by the eccentricity, so that clustering separates compact and convoluted particles, regardless of their size.