Figure 1. Left: Anaglyph image. See the position difference between red and cyan. Right: Disparity map.

Logistics

You can download the raw material (hw3_2023.zip) and the supplementary slides (hw3_supplementary_slides.pdf) from KLMS.

To do

  • Code template in code/.

  • Writeup template in writeup/.

Submission

  • Due: Friday, Nov. 3rd 2023, 23:59

  • Do NOT change the file structure

  • Use create_submission_zip.py to create the zip file.

  • Rename your zip file to hw3_studentID_name.zip (ex: hw3_20231234_ShinyoungYi.zip).

  • The zip file must include your code and writeup (in PDF).

  • Submit your homework to KLMS.

  • A submission after the original due date will be penalized 20% per day (see the course introduction for details).

QnA

  • Please use the KLMS QnA board, and title your post like: [hw3] question about … .

  • Note that the TAs try to answer questions as soon as possible, but it may take 1–2 days on weekends. Please post questions on weekdays if possible.

  • TA in charge: Shinyoung Yi <class255@kaist.ac.kr>

Overview

You will implement code for image rectification and a basic stereo algorithm. We provide pairs of images and sets of corresponding feature points. The four tasks and their associated function names are listed below.

  • (a) Bayer image interpolation. (bayer_to_rgb_bilinear, bayer_to_rgb_bicubic)

  • (b) Find fundamental matrix using the normalized eight point algorithm. (calculate_fundamental_matrix)

  • (c) Rectify stereo images by applying homography. (transform_fundamental_matrix, rectify_stereo_images)

  • (d) Make a cost volume using a zero-mean NCC (normalized cross-correlation) matching cost function for the two rectified images, then obtain a disparity map from the cost volume after aggregating it with a box filter. (calculate_disparity_map)

Forbidden functions

  • np.correlate(), np.corrcoef()

  • cv2.cvtColor(), cv2.demosaicing()

  • cv2.stereo*() (e.g., cv2.stereoBM(), cv2.stereoRectify(), …)

  • cv2.get*Transform() (e.g., cv2.getPerspectiveTransform(), …​)

  • cv2.findEssentialMat(), cv2.findFundamentalMat(), cv2.findHomography()

  • Any other pre-implemented functions to get direct result of main task.

Useful functions

  • np.linalg.inv(), np.linalg.svd()

  • cv2.perspectiveTransform()

Code details

[NB] We will only use hw3_functions.py from your submission for grading. To stay consistent with the grading environment, do NOT modify the main code hw3_main.py, the function names, or the input variables. Note that only basic scientific libraries (e.g., NumPy, SciPy, and OpenCV) and other commonly used Python libraries (e.g., tqdm, matplotlib) are installed on the grading machine.

hw3_main.py

The main code starts by reading images and loading feature points from the data directory. The reading and loading code is already implemented. The outputs of the main code are:

  • (a) Bilinear interpolated images / Bicubic interpolated images

  • (b) Images overlaid with feature points and their epipolar lines

  • (c) Rectified images (w/ features & epipolar lines overlay) / Anaglyph before and after rectification / Values of fundamental matrices between rectified images

  • (d) Calculated disparity map, as a color-mapped plot (.png) and raw data (.exr)

hw3_functions.py

Here you implement the functions. You have to implement five functions:

  • (a) bayer_to_rgb_bilinear

  • (b) calculate_fundamental_matrix

  • (c) transform_fundamental_matrix

  • (c) rectify_stereo_images

  • (d) calculate_disparity_map

And optionally for extra credit:

  • (a) bayer_to_rgb_bicubic

Also, there are some hyperparameters (WINDOW_SIZE, DISPARITY_RANGE, AGG_FILTER_SIZE) at the top of this file. You can modify these to get the best results (actually, you must). More details are described in the next Task details section.

utils.py

This file contains some utility functions. You can freely use the functions here; some of them are already used in the main code.

  • homogenize: Convert nonhomogeneous coordinate vectors (or matrices/arrays obtained by stacking such vectors) to homogeneous coordinates by appending 1’s as the last elements (a small sketch follows this list).

  • normalize_points: Center the image data at the origin, and scale it so that the mean squared distance between the origin and the data points is 2 pixels. This function is already implemented, so you do not need to work on it.

  • image_overlay_line, image_overlay_circle, images_overlay_fundamental, get_anaglyph, draw_epipolar_overlayed_img, draw_stereo_rectified_img, draw_disparity_map: Functions to visualize your work. These functions are already used in the main code.

  • compute_epe: Computes the evaluation metrics. We will use two types of metrics; details are in the next Task details section.
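
As a small illustration of what homogenize does (an (N, 2) input array is assumed here for the example; the exact input layout the utility accepts may differ):

    import numpy as np

    def homogenize_sketch(pts):
        # Append a column of 1's: (N, 2) -> (N, 3).
        return np.hstack([pts, np.ones((pts.shape[0], 1))])

    # e.g., [[3., 4.]] -> [[3., 4., 1.]]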

Task details

(a) Bayer image interpolation

  • TODO code: bayer_to_rgb_bilinear, bayer_to_rgb_bicubic

We provide a pair of images (data/img1_bayer.png and data/img2_bayer.png) for the stereo algorithm. However, the images are RGGB Bayer patterned, so you need to convert them into RGB color images first. There are many interpolation methods, but here we only cover two: bilinear and bicubic interpolation. Note that you should convert Bayer to "RGB" (the general convention), NOT "BGR" (which OpenCV uses). RGB-to-BGR conversion is done automatically in the main code.

Raw image data obtained by cameras, such as our data images, has a Bayer pattern: alternating pixels store different color intensities. There are several types of Bayer patterns, but our data images use the RGGB pattern, meaning the four pixels in each 2×2 window of the image store red, green, green, and blue intensities, respectively. The following figure illustrates the RGGB Bayer pattern.

Figure: RGGB Bayer pattern.

To implement Bayer image interpolation, first extract each channel from the raw image. You will then have three image channels, but each channel has many missing values. The white blocks in the figure above are the missing pixels of each channel: 3/4 of the R and B channels and half of the G channel are missing. You should therefore implement interpolation for each channel.

Note that you have to implement bilinear interpolation first. Additionally, you can work on bicubic interpolation for +10 extra credit.
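
As a rough illustration of the channel extraction and filling steps described above, here is a minimal sketch of a convolution-based bilinear demosaic. The function name and the use of scipy.ndimage.convolve are illustrative assumptions, not the required implementation, and border handling is glossed over:

    import numpy as np
    from scipy.ndimage import convolve

    def bayer_to_rgb_bilinear_sketch(bayer):
        # Scatter the known RGGB samples into three sparse channels.
        bayer = bayer.astype(np.float64)
        H, W = bayer.shape
        R, G, B = np.zeros((H, W)), np.zeros((H, W)), np.zeros((H, W))
        R[0::2, 0::2] = bayer[0::2, 0::2]
        G[0::2, 1::2] = bayer[0::2, 1::2]
        G[1::2, 0::2] = bayer[1::2, 0::2]
        B[1::2, 1::2] = bayer[1::2, 1::2]
        # Bilinear filling as a convolution: at a known pixel the center
        # weight keeps the value; at a missing pixel the kernel averages
        # the known neighbors.
        k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
        k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
        return np.stack([convolve(R, k_rb), convolve(G, k_g), convolve(B, k_rb)],
                        axis=-1)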

(b) Calculate fundamental matrix

  • TODO code: calculate_fundamental_matrix

From the given two sets of feature points, find the fundamental matrix using the normalized eight-point algorithm. Use the normalize_points function to normalize the points.

We have learned the definition and properties of fundamental matrices. After implementing Bayer image interpolation, you should implement the eight-point algorithm to find the fundamental matrix. Details of the eight-point algorithm are described in the supplementary slides in the provided zip file. The "Eight Point Algorithm: Implementation" slide is enough to implement this task, but the "Eight Point Algorithm: Proof" slides in the supplementary material will be helpful if you are interested in the optimization methods and a detailed proof of the normalized eight-point algorithm.
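
For orientation, a minimal sketch of this pipeline, assuming normalize_points returns the normalized points together with the 3×3 normalization transforms T1 and T2 (its actual interface in utils.py may differ):

    import numpy as np

    def eight_point_sketch(p1, p2, T1, T2):
        # p1, p2: (N, 2) arrays of already-normalized corresponding points.
        x1, y1 = p1[:, 0], p1[:, 1]
        x2, y2 = p2[:, 0], p2[:, 1]
        # Each correspondence x2^T F x1 = 0 contributes one row of A f = 0.
        A = np.stack([x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2,
                      x1, y1, np.ones_like(x1)], axis=1)
        # Least-squares solution: the right singular vector belonging to the
        # smallest singular value of A.
        F = np.linalg.svd(A)[2][-1].reshape(3, 3)
        # Enforce rank 2 by zeroing the smallest singular value of F.
        U, S, Vt = np.linalg.svd(F)
        F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
        # Undo the normalization so F applies to the original pixel coordinates.
        return T2.T @ F @ T1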

Note that a fundamental matrix maps an image point to its epipolar line in the opposite image. After implementing calculate_fundamental_matrix, hw3_main.py will generate an image file result/imgs_epipolar_overlay.png, overlaid with the feature points and their epipolar lines. You can use this image to qualitatively check your implementation.

Figure 2. result/imgs_epipolar_overlay.png with an accurately implemented calculate_fundamental_matrix

(c) Stereo rectification

Rectify stereo images by applying homography to each image.

After finding the fundamental matrix, you should rectify your cameras. This means reprojecting the input image planes onto a common plane parallel to the line between the camera centers. After the reprojection, pixel motion will be purely horizontal. This reprojection is expressed as two 3×3 homography matrices \(H\) and \(H'\), one for each input image. \(H\) and \(H'\) can be derived from the fundamental matrix and the feature point sets; however, this requires many more steps than what we covered in the lecture. Therefore, finding \(H\) and \(H'\) is done by the OpenCV function cv2.stereoRectifyUncalibrated(), and you do not need to work on that part.

i) Rectification constraint for fundamental matrices

  • TODO code: transform_fundamental_matrix

Instead of implementing the rectification method itself, we will only validate the mathematical constraint that rectification should satisfy. First, you have to recalculate the fundamental matrix between the rectified images. Implement transform_fundamental_matrix so that it computes the fundamental matrix between the rectified images from the original fundamental matrix, \(H\), and \(H'\). Then hw3_main.py will automatically report the recomputed fundamental matrix as in the following figure (see only the first matrix for now; the second one will be explained in ii)).

Figure 3. Fundamental matrices reported by hw3_main.py after implementing transform_fundamental_matrix and rectify_stereo_images.

Report this matrix value in your writeup. You may notice a constraint on the resulting fundamental matrix; also write in your writeup the constraint that fundamental matrices between rectified images should satisfy. You will then see which mathematical equation the rectification method is designed to solve.
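
For reference, a minimal sketch of the transform itself: if \(x_2^\top F x_1 = 0\) and the rectified points are \(x_1' = H x_1\) and \(x_2' = H' x_2\), then \(x_2'^\top (H'^{-\top} F H^{-1}) x_1' = 0\). The scale normalization in the last line is an illustrative assumption (fundamental matrices are only defined up to scale):

    import numpy as np

    def transform_fundamental_matrix_sketch(F, H1, H2):
        # Fundamental matrix between the images warped by homographies H1, H2.
        F_new = np.linalg.inv(H2).T @ F @ np.linalg.inv(H1)
        # Fix the arbitrary overall scale for easier comparison (assumption).
        return F_new / np.linalg.norm(F_new)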

ii) Rectification without cropping

  • TODO code: rectify_stereo_images

Once you have the two homography matrices \(H\) and \(H'\), you can actually apply those transformations to the images. However, the \(H\) and \(H'\) obtained from cv2.stereoRectifyUncalibrated() will produce cropped images. There are many ways to get rectified images without cropping, but here we suggest one: modifying the homographies. In rectify_stereo_images, you should modify \(H\) and \(H'\) into \(H_{\mathrm{mod}}\) and \(H_{\mathrm{mod}}'\), which still rectify the images but do not yield cropping. Note that cv2.perspectiveTransform() may be useful.
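
One common way to do this (an illustrative assumption, not necessarily the intended solution) is to warp the image corners with cv2.perspectiveTransform, find where they land, and prepend translations so nothing falls outside the frame; the vertical shift must be shared between the two images so that corresponding rows stay aligned:

    import numpy as np
    import cv2

    def uncrop_homographies_sketch(H1, H2, img_shape):
        # Warp the four image corners to see where each homography sends them.
        h, w = img_shape[:2]
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        c1 = cv2.perspectiveTransform(corners, H1).reshape(-1, 2)
        c2 = cv2.perspectiveTransform(corners, H2).reshape(-1, 2)
        tx1, tx2 = -c1[:, 0].min(), -c2[:, 0].min()  # per-image horizontal shifts
        ty = -min(c1[:, 1].min(), c2[:, 1].min())    # shared vertical shift
        T1 = np.array([[1, 0, tx1], [0, 1, ty], [0, 0, 1]], dtype=np.float64)
        T2 = np.array([[1, 0, tx2], [0, 1, ty], [0, 0, 1]], dtype=np.float64)
        return T1 @ H1, T2 @ H2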

After implementing rectify_stereo_images, hw3_main.py will once more report the recomputed fundamental matrix, this time using \(H_{\mathrm{mod}}\) and \(H_{\mathrm{mod}}'\) (the second matrix in the figure above). You have to report this matrix in your writeup and check that it also satisfies the same constraint as investigated in i). Details can be found in the supplementary slides. In addition, result/rectified_imgs_epipolar_overlay.png generated by hw3_main.py will be helpful for checking your implementation.

(d) Calculate disparity map

  • TODO code: calculate_disparity_map

It is possible to estimate a disparity map from the images you rectified above. But to better evaluate the disparity map estimation itself, we provide perfectly rectified images from a calibrated camera (data/img3.png and data/img4.png) and a ground-truth disparity map (data/img3_disp.exr). The data reading part is already implemented and you don’t need to work on it. And of course, do NOT use the ground-truth disparity to calculate your disparity map. If we find this kind of cheating, you will get 0 points on this homework, regardless of anything else you did.

The final goal of this task is to get the disparity map of the first image (left image, data/img3.png). In other words, for a point \((x,y)\) in the first image, the corresponding point in the second image (right image, data/img4.png) is given as:

\[(x + \mathbf{d}(x, y), y),\]

where \(\mathbf{d}(x, y)\) is the disparity map of the first image. (Hint: for rectified left/right images from a calibrated camera, \(\mathbf{d}(x, y)\) of the left image may in general have negative values. Think about why this happens.) Since this process takes a long time, print the progress of the computation to check that your code is running (e.g., print iteration numbers, use the tqdm library, or any other method) if possible. But don’t print every iteration number, which makes the output messy.

Before starting, note that there are some hyperparameters such as the NCC window size, the range of disparities, and the cost aggregation window size. Try various hyperparameter values and search for good values on your own. Provide your findings in the writeup. Start by constructing a 3D cost volume of size H × W × D, where H is the height of the images, W is the width of the images, and D is the number of disparities. Then fill in the value of the cost volume at index (x, y, d) by calculating the zero-mean NCC between a block at (x, y) in the first image and a block at (x + d, y) in the second. You may choose your own window size and D. A larger window size gives a less noisy result but suffers from edge bleeding. Test various block window sizes for performance.
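
A minimal sketch of one way to build such a volume, using the identity that the windowed zero-mean cross term equals \(E[I_1 I_2] - \mu_1 \mu_2\), so all local statistics come from box filters. The function name and disp_range parameter are illustrative, and border handling via np.roll is glossed over:

    import numpy as np
    import cv2

    def zncc_cost_volume_sketch(left, right, disp_range, win):
        # left, right: HxW float64 grayscale images; returns an HxWxD volume
        # of zero-mean NCC scores in [-1, 1].
        H, W = left.shape
        cost = np.full((H, W, len(disp_range)), -1.0)
        box = lambda img: cv2.boxFilter(img, -1, (win, win))  # windowed mean
        m1 = box(left)
        var1 = box(left * left) - m1 * m1
        for i, d in enumerate(disp_range):
            shifted = np.roll(right, -d, axis=1)  # shifted[:, x] == right[:, x + d]
            m2 = box(shifted)
            var2 = box(shifted * shifted) - m2 * m2
            cov = box(left * shifted) - m1 * m2   # zero-mean cross term
            cost[:, :, i] = cov / (np.sqrt(np.maximum(var1 * var2, 0.0)) + 1e-8)
        return cost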

The initial cost volume contains a lot of noise due to inaccurate matching values. Cost aggregation is a process for removing those noisy values by smoothing them out. The naive method that you have to implement is applying a box filter to each cost plane of a given disparity d. There are many more sophisticated methods for cost aggregation; if you implement one (other than the box filter) and explain it in your writeup, you can get (up to) +5 extra points. The final step is deciding the winning disparity of each pixel by finding the optimal cost value at (x, y).
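
Continuing the sketch above (AGG_FILTER_SIZE is the hyperparameter defined at the top of hw3_functions.py; since zero-mean NCC is a similarity score, the optimal value here is the maximum):

    # Aggregate each fixed-disparity cost plane with a box filter,
    # then pick the winner per pixel (winner-take-all).
    agg = np.stack([cv2.boxFilter(cost[:, :, i], -1,
                                  (AGG_FILTER_SIZE, AGG_FILTER_SIZE))
                    for i in range(cost.shape[2])], axis=-1)
    best = np.argmax(agg, axis=2)              # index of the best-matching plane
    disparity = np.asarray(disp_range)[best]   # map indices back to disparities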

You can also estimate sub-pixel disparity (as you learned in class) from the cost volume, rather than just selecting the winning disparity. If you implement this and explain it in your writeup, you can get +10 extra points. NB: try sub-pixel disparity estimation AFTER you successfully implement the disparity map computation with full credit (see Rubric and Extra Credit for details).
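
One common choice (an assumption here; use whatever method was covered in class) fits a parabola to the aggregated scores of the winning disparity \(d^*\) and its two neighbors and takes the vertex:

\[d_{\mathrm{sub}} = d^* + \frac{c(d^*-1) - c(d^*+1)}{2\left(c(d^*-1) - 2c(d^*) + c(d^*+1)\right)},\]

where \(c\) denotes the aggregated matching score. The offset always lies in \([-0.5, 0.5]\) when \(d^*\) is a strict local optimum.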

Disparity map evaluation

Your function for disparity map estimation must run within 75 seconds on the grading machine (CPU: i9-12900K, RAM: 64GB DDR4 2666MHz); otherwise you will not get the points. You should also provide the running time on your machine and your machine information (CPU info, …) in the writeup. Make it as efficient as you can. FYI, the TA’s implementation, based on plane sweep and achieving full credit, runs in about 25 seconds on the grading machine and 32 seconds on a MacBook Air (M1, 2020). Hence, you can assume that anything under ~96 seconds on an M1 MacBook Air is acceptable. We will take differences in computational power into account (that’s why we need your machine information).

We will use two metrics to evaluate your disparity map: EPE (end-point error) and bad pixel ratio. The evaluation function is already implemented in utils.py and used in the main code, so you don’t need to do anything additional. EPE measures the root mean squared error (RMSE) between the estimated disparity map and the ground-truth disparity map. The bad pixel ratio measures the ratio of pixels whose error exceeds a threshold; we use 3 px as the threshold for deciding whether a pixel is bad. Example evaluation results are in the supplementary slides.
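
For reference, a hypothetical equivalent of what these two metrics compute, per the description above (compute_epe in utils.py is the authoritative version):

    import numpy as np

    def metrics_sketch(d_est, d_gt):
        err = np.abs(d_est - d_gt)
        epe = np.sqrt(np.mean(err ** 2))  # RMSE between estimate and ground truth
        bad = np.mean(err > 3.0)          # fraction of pixels off by more than 3 px
        return epe, bad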

Writeup

Describe your process and algorithm, show your results, describe any extra credit, and tell us any other information you feel is relevant. We provide you with a LaTeX template. Please compile it into a PDF and submit it along with your code. Your report should also include the following information:

  • How you implemented Bayer interpolation.

  • Hyperparameter findings and their effects.

  • Running time with your machine information.

In addition, you should answer the questions asked directly in (c).

Rubric and Extra Credit

Total 100pts

  • [+10 pts] Implement bilinear interpolation to get RGB images from RGGB Bayer images. (bayer_to_rgb_bilinear)

  • [+30 pts] Find the fundamental matrix from the two given feature point sets. (calculate_fundamental_matrix)

  • [+20 pts] Rectify stereo images by applying homography.

    • [+5 pts] transform_fundamental_matrix: implementation, reporting the fundamental matrix, writing the constraint

    • [+15 pts] rectify_stereo_images: implementation, reporting the fundamental matrix, checking the constraint

  • [+35 pts] Disparity map computation. 0 pts if running time >= 75s (calculate_disparity_map)

    • [+15 pts] EPE (End-point-error)

      • [15 pts] \(\mathsf{EPE}<3.0\)

      • [10 pts] \(3.0\le \mathsf{EPE}<4.0\)

      • [5 pts] \(4.0\le \mathsf{EPE}<5.0\)

      • [0 pts] \(5.0\le \mathsf{EPE}\)

    • [+15 pts] Bad pixel ratio

      • [15 pts] \(\mathsf{Bad\ pixel\ ratio}<25\%\)

      • [10 pts] \(25\%\le\mathsf{Bad\ pixel\ ratio}<30\%\)

      • [5 pts] \(30\%\le\mathsf{Bad\ pixel\ ratio}<40\%\)

      • [0 pts] \(40\%\le\mathsf{Bad\ pixel\ ratio}\)

    • [+5 pts] Aggregate the cost volume using a box filter. (should be explained in the writeup)

  • [+5 pts] Writeup with design decisions and evaluation.

Extra credit

Total 25pts

  • [+10 pts] Implement bicubic Bayer interpolation. (bayer_to_rgb_bicubic)

  • [+5 pts] Implement any existing or new sophisticated cost aggregation algorithm other than the box filter and explain it in the writeup.

  • [+10 pts] Sub-pixel disparity interpolation, explained in the writeup. (NB: try this after you successfully implement the disparity map computation.)

To get extra credit, you still need to submit code for the original task. Beyond that, feel free to add your own functions or parameters to switch between the original task and the extra credit task.

Credits

The original project description and code are made by James Hays based on a similar project by Derek Hoiem. The Python version of this project is revised by Dahyun Kang, Inseung Hwang, Donggun Kim, Jungwoo Kim and Min H. Kim. The recent version is revised by Shinyoung Yi.