Stereo Imaging
CS484: Introduction to Computer Vision


Download the raw material




Most students are doing great, but it would be even better if you follow the guidelines below.


You will implement code for image rectification and a basic stereo algorithm. We provide several pairs of images and sets of corresponding feature points. All outliers have already been eliminated.


Forbidden functions:

Useful functions:

Starter Code Details

The main code starts by reading images and loading feature points from the data directory. The reading and loading code is already implemented, and we also provide code for visualizing the feature points on each stereo image pair. The final output of this homework is a disparity map, which you will obtain using the functions described below. There are hyperparameters such as the NCC window size and the maximum disparity; try various values, search for good ones on your own, and report your findings in the writeup.

We provide several image pairs for the stereo algorithm, but the images are RGGB Bayer patterned, so you need to convert them into RGB color images first. There are many interpolation methods, but here we give you two choices: bilinear and bicubic interpolation. Note that you should implement Bayer to "RGB" (the general convention), NOT "BGR" (which OpenCV uses); the RGB-to-BGR conversion is done automatically in the main code.

From the two given sets of feature points, find the fundamental matrix using the normalized eight-point algorithm. Use the normalize_points function to normalize the points: it centers the image data at the origin and scales it so that the mean squared distance between the origin and the data points is 2 pixels. This function is already implemented, and you do not need to work on it.

Rectify the stereo images by applying a homography to each image. Then build a cost volume of size H x W x D filled with NCC matching costs. Before selecting the best disparity for each pixel, aggregate the cost using a box filter.

Your function should run within 180 seconds (3 minutes) on the grading machine (CPU: i7-7700K 4.20GHz); otherwise you will not get a score. You should also report the running time on your machine, along with machine information (CPU model, ...), in the writeup. FYI, the naive plane-sweep implementation described in the supplementary slides generates the disparity map in about 70 seconds on the grading machine. (You can make it even faster!)
If possible, also print the progress of the computation so you can check that your code is running (e.g., print iteration numbers, use the tqdm library, or any other method).


Describe your process and algorithm, show your results, describe any extra credit, and tell us any other information you feel is relevant. We provide you with a LaTeX template writeup/writeup.tex. Please compile it into a PDF and submit it along with your code.


Rubric and Extra Credit

Extra Credit

Bayer image interpolation

Raw image data captured by cameras, such as our data images data\scene0\bayer_cam1.png and data\scene0\bayer_cam2.png, has a Bayer pattern: alternating pixels in the image store different color intensities. There are several types of Bayer patterns, but our data images use the RGGB pattern, meaning the four pixels in each 2*2 window of the image store red, green, green, and blue intensities, respectively. The following figure illustrates the RGGB Bayer pattern.

To implement Bayer image interpolation, you should first extract each channel from the raw image.

Then you will have three image channels, but each channel has many missing values. The white blocks in the figure above indicate the missing pixels of each channel: 3/4 of the R and B channels and half of the G channel are missing. You therefore need to implement interpolation for each channel. There are many interpolation methods, but here we give you two choices: bilinear and bicubic interpolation. If you implement bicubic interpolation, you will get +10 extra credit.
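For the bilinear case, interpolating each sparse channel can be done with a normalized box average: sum the known neighbors in a 3x3 window and divide by how many there are. The sketch below is one way to do this in pure NumPy; the function name and the zero padding at the borders are our own choices, not part of the starter code.

```python
import numpy as np

def bayer_to_rgb_bilinear(bayer):
    """Sketch of bilinear RGGB demosaicing via a normalized box average."""
    h, w = bayer.shape
    # Boolean masks marking where each channel is actually sampled (RGGB).
    r = np.zeros((h, w), bool); r[0::2, 0::2] = True
    g = np.zeros((h, w), bool); g[0::2, 1::2] = True; g[1::2, 0::2] = True
    b = np.zeros((h, w), bool); b[1::2, 1::2] = True

    def box_sum(a):
        # Sum over each 3x3 neighborhood (zero padding at the borders).
        p = np.pad(a, 1)
        return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

    rgb = np.zeros((h, w, 3))
    for c, mask in enumerate([r, g, b]):
        known = np.where(mask, bayer, 0).astype(np.float64)
        # Average of the known neighbors = bilinear interpolation on this grid.
        rgb[..., c] = box_sum(known) / np.maximum(box_sum(mask.astype(float)), 1)
    return rgb
```

At a pixel where the channel is already sampled, the 3x3 neighborhood contains only that sample, so known values pass through unchanged; at missing pixels the average reduces to the standard two- or four-neighbor bilinear estimate.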

Fundamental matrix

We have learned the definition and properties of the fundamental matrix. After implementing Bayer image interpolation, you should implement the eight-point algorithm to find the fundamental matrix. Details of the eight-point algorithm are described in the supplementary slides in the provided zip file. Studying only the "Eight Point Algorithm: Implementation" slide is enough for this task, but the "Eight Point Algorithm: Proof" slides in the supplementary material will be helpful if you are interested in the optimization methods and the detailed proof behind the normalized eight-point algorithm.
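The normalized eight-point algorithm can be sketched as follows. This is an illustrative implementation, not the one from the supplementary slides: the function names are ours, and in the actual homework you should use the provided normalize_points helper rather than the inline normalization shown here.

```python
import numpy as np

def compute_fundamental_normalized(pts1, pts2):
    """Sketch of the normalized eight-point algorithm.
    pts1, pts2: N x 2 arrays of corresponding points (N >= 8)."""
    def normalize(pts):
        # Center at the origin, scale so mean squared distance is 2.
        centroid = pts.mean(axis=0)
        shifted = pts - centroid
        scale = np.sqrt(2.0 / np.mean(np.sum(shifted**2, axis=1)))
        T = np.array([[scale, 0.0, -scale * centroid[0]],
                      [0.0, scale, -scale * centroid[1]],
                      [0.0, 0.0, 1.0]])
        homog = np.column_stack([pts, np.ones(len(pts))])
        return (T @ homog.T).T, T

    p1, T1 = normalize(pts1)
    p2, T2 = normalize(pts2)
    # Each correspondence gives one row of A f = 0, from x2^T F x1 = 0.
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)       # right singular vector of smallest s.v.
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # Undo the normalization: x2^T T2^T Fn T1 x1 = 0.
    return T2.T @ F @ T1
```

A quick sanity check is to verify that the epipolar constraint x2^T F x1 is near zero for the input correspondences and that F has rank 2.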

Homography matrix

After finding the fundamental matrix, you should rectify your cameras. This means reprojecting the input image planes onto a common plane parallel to the line between the camera centers; after the reprojection, all pixel motion is horizontal. The reprojection is expressed as two 3*3 homography matrices H and H', one for each input image. H and H' can be derived from the fundamental matrix and the feature point sets, but this requires many more steps than what we covered in the lecture. Therefore, finding H and H' is done by the native OpenCV function cv2.stereoRectifyUncalibrated(), and you do not need to work on that part.

Image rectification

Once you have the two homography matrices H and H', you need to actually apply these transformations to the images. Note that you should take care to keep the two images aligned. Details can be found in the supplementary slides.

Disparity map

Start by constructing a 3D cost volume of size H x W x D, where H is the image height, W is the image width, and D is the number of disparities. Then fill in the value of the cost volume at each index (x, y, d) by computing the NCC between a block at (x, y) and a block at (x + d, y). You may choose your own window size and D. A larger window gives a less noisy result but suffers from edge bleeding, so test various block window sizes and compare their performance.
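A plane-sweep formulation computes one full disparity plane at a time: shift the second image by d, then compute windowed NCC against the first image using box-filtered means and variances. This is a minimal sketch of that idea, not the reference implementation from the slides; all names and the edge-replication padding are our own choices, and since NCC is a similarity, larger values mean better matches here.

```python
import numpy as np

def ncc_cost_volume(left, right, max_disp, win=5):
    """Sketch of an H x W x D NCC cost volume on grayscale float images.
    cost[y, x, d] = NCC between the win x win window at (x, y) in `left`
    and the window at (x + d, y) in `right`."""
    h, w = left.shape
    half = win // 2

    def box(a):
        # Mean over each win x win neighborhood, with edge replication.
        p = np.pad(a, half, mode='edge')
        return sum(p[i:i + h, j:j + w]
                   for i in range(win) for j in range(win)) / win**2

    ml = box(left)
    vl = box(left * left) - ml * ml           # windowed variance of `left`
    cost = np.full((h, w, max_disp), -1.0)    # NCC lies in [-1, 1]
    for d in range(max_disp):
        # shifted[:, x] = right[:, x + d]; replicate the last column past w.
        shifted = np.empty_like(right)
        shifted[:, :w - d] = right[:, d:]
        shifted[:, w - d:] = right[:, -1:]
        ms = box(shifted)
        vs = box(shifted * shifted) - ms * ms
        cov = box(left * shifted) - ml * ms
        cost[:, :, d] = cov / np.sqrt(np.maximum(vl * vs, 1e-12))
    return cost
```

Precomputing the mean and variance of the left image outside the disparity loop is one of the easy speedups that helps meet the time limit.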
The initial cost volume contains a lot of noise due to inaccurate matching values. Cost aggregation is a process for removing those noisy values by smoothing them out. A naive method for cost aggregation is to apply a box filter to each cost plane of a specific disparity d. There are many more sophisticated methods for cost aggregation; if you implement one and explain it in your writeup, you can get (up to) +10 extra points.
The final step is deciding the winning disparity of each pixel by finding the optimal cost value at (x, y).
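The naive box-filter aggregation and the winner-take-all selection can be sketched as follows. Note that if the volume stores NCC (a similarity rather than a distance), the winner is the argmax, not the argmin; the function name and the agg parameter are illustrative.

```python
import numpy as np

def aggregate_and_select(cost_volume, agg=3):
    """Sketch: box-filter each disparity plane, then pick the best disparity
    per pixel. Assumes the volume stores NCC, so higher is better."""
    h, w, D = cost_volume.shape
    half = agg // 2
    out = np.empty_like(cost_volume)
    for d in range(D):
        # Box filter one cost plane (edge replication at the borders).
        p = np.pad(cost_volume[:, :, d], half, mode='edge')
        out[:, :, d] = sum(p[i:i + h, j:j + w]
                           for i in range(agg) for j in range(agg)) / agg**2
    return out.argmax(axis=2)  # winner-take-all disparity map
```

The aggregation window size agg is another hyperparameter worth reporting in the writeup alongside the NCC window size and maximum disparity.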


The Python version of this project was revised by Dahyun Kang, Inseung Hwang, Donggun Kim, Jungwoo Kim, and Min H. Kim. The original project description and code were made by James Hays, based on a similar project by Derek Hoiem.