CS-6476: Computer Vision

Instructor(s): Aaron Bobick/ Irfan Essa
Course Page: Link

This course provides a gentle introduction to computer vision covering a wide range of topics including:

  • Feature Detection & Extraction: Hough Transforms, SIFT descriptors & RANSAC
  • Camera Models and Stereo Geometry
  • Motion Models: Hierarchical Lucas-Kanade
  • Kalman and Particle Filters for Motion Tracking
  • Convolutional Neural Networks

Overall, this course was very heavy in terms of project work and course material, easily requiring 20+ hours of work and studying every week. The recorded lectures are done very well, with high production quality and a decent amount of depth (you'll need to brush up on your linear algebra and calculus). In total there were 6 projects, 1 final project and 1 exam, which can be pretty demanding since each assignment needs a non-trivial amount of coding and one is due every two weeks.

The individual projects were definitely the highlight of this course: you're expected to implement core computer vision algorithms and learn how to use them to solve a toy or real-life problem. You'll be well-versed in OpenCV by the end of the course! I've listed some of the projects that I found to be the most interesting:

Edge and Object Detection:

Using Hough Transforms, we can detect shapes by transforming edge points into a parameter space and using a simple voting procedure to decide which parameter values most likely correspond to a real shape. Hough transforms work well for parameterized shapes like lines or circles (like our traffic signs), but can also be generalized to arbitrary shapes.

Detected Stop Sign with Hough Transform

Combined with some fairly basic image pre-processing, this let us implement simple traffic sign detection, since signs are mostly composed of simple shapes like triangles, circles and octagons.
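To make the voting idea concrete, here is a minimal sketch of a Hough transform for straight lines, written in plain Python rather than OpenCV. The function names (`hough_line_votes`, `strongest_line`), the 1-pixel/1-degree accumulator resolution and the synthetic edge points are all assumptions for illustration, not the course's reference implementation.

```python
import math

def hough_line_votes(edge_points, width, height, n_theta=180):
    """Accumulate votes in (rho, theta) space for a list of (x, y) edge pixels."""
    diag = int(math.ceil(math.hypot(width, height)))
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    # rho = x*cos(theta) + y*sin(theta) can be negative, so shift indices by diag.
    acc = [[0] * n_theta for _ in range(2 * diag + 1)]
    for x, y in edge_points:
        for t, theta in enumerate(thetas):
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[int(round(rho)) + diag][t] += 1
    return acc, thetas, diag

def strongest_line(acc, thetas, diag):
    """Return (rho, theta, votes) for the accumulator cell with the most votes."""
    best_votes, best_r, best_t = -1, 0, 0
    for r, row in enumerate(acc):
        for t, votes in enumerate(row):
            if votes > best_votes:
                best_votes, best_r, best_t = votes, r, t
    return best_r - diag, thetas[best_t], best_votes

# Synthetic "edge map" (assumed data): a horizontal line y = 5 plus two stray pixels.
points = [(x, 5) for x in range(100)] + [(3, 17), (40, 11)]
rho, theta, votes = strongest_line(*hough_line_votes(points, 100, 20))
print(rho, round(math.degrees(theta)), votes)
```

Every edge pixel votes for all lines that could pass through it, so pixels that really lie on a common line pile their votes into one cell while stray pixels spread theirs thinly. Circle detection follows the same pattern with a (center x, center y, radius) accumulator.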

Feature Tracking:

A more robust way to do tracking is to use template matching against a known pattern. Utilizing markers placed in a real scene, we can use this to calculate their location relative to a video camera and find a homography, or projective transform, between the marker plane and the image.

This lets us project a video onto the scene, implementing some very basic Augmented Reality!
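As a rough illustration of the homography step, the sketch below estimates a 3x3 projective transform from four point correspondences by solving the standard 8x8 linear system with the bottom-right entry fixed to 1. The helper names and the corner coordinates are made up for the example. With OpenCV you would more likely reach for `cv2.findHomography` and `cv2.warpPerspective`.

```python
def solve_linear(a, b):
    """Solve a square system a*x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def find_homography(src, dst):
    """Estimate H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def project(h, point):
    """Apply homography h to an (x, y) point, dividing out the projective scale."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Assumed data: corners of a unit "video frame" and where four markers appear in the scene.
frame_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
marker_corners = [(10, 20), (110, 30), (120, 140), (5, 120)]
H = find_homography(frame_corners, marker_corners)
print(project(H, (0.5, 0.5)))  # where the center of the video lands in the scene
```

Once H is known, every pixel of the video frame can be mapped into the scene in the same way, which is the core of the projection effect described above.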

Optical Flow:

We implemented basic motion detection/optical flow estimation using an iterative Lucas-Kanade algorithm with Gaussian pyramids.
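The core of Lucas-Kanade is a small least-squares solve over a window, which the sketch below shows for a single window and a single step. The iterative, pyramid-based version repeats this step coarse-to-fine, warping the image between steps, and that part is not shown here. The synthetic sine-pattern frames and all function names are assumptions for illustration.

```python
import math

def synthetic_frame(size, shift_x=0.0, shift_y=0.0):
    """Smooth test pattern whose content is translated by (shift_x, shift_y)."""
    return [[math.sin(0.3 * (x - shift_x)) + math.sin(0.2 * (y - shift_y))
             for x in range(size)] for y in range(size)]

def lucas_kanade_step(frame1, frame2):
    """Estimate one (u, v) flow vector for the whole window from two frames."""
    h, w = len(frame1), len(frame1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # spatial gradient in y
            it = frame2[y][x] - frame1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt] in closed form.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Assumed ground truth: the pattern moves 0.3 px right and 0.2 px up between frames.
u, v = lucas_kanade_step(synthetic_frame(32), synthetic_frame(32, 0.3, -0.2))
print(u, v)
```

The linearization only holds for small displacements, which is why larger motions call for pyramids: at a coarse level a large shift becomes a sub-pixel one.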

Final Project - Convolutional Neural Networks:

Digit Classification with a ResNet

As a final project, we had free rein to implement a range of advanced computer vision algorithms. I chose to train a Deep Residual Network on the Street View House Numbers (SVHN) dataset and performed a comparative study of its performance against a simpler convolutional neural network and a CNN pretrained on ImageNet and adapted through transfer learning.
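For readers unfamiliar with residual networks, the defining idea is the skip connection: each block outputs its input plus a learned correction. The toy sketch below uses small fully connected layers in plain Python purely to show that structure. A real ResNet block would use convolutions and batch normalization, and the names and weights here are hypothetical, not taken from the project.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]

def linear(weights, bias, values):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, values)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute relu(x + F(x)), where F is linear -> relu -> linear."""
    hidden = relu(linear(w1, b1, x))
    correction = linear(w2, b2, hidden)
    # Skip connection: the input is added back before the final activation.
    return relu([xi + ci for xi, ci in zip(x, correction)])

# With all-zero weights the block reduces to the identity on non-negative inputs,
# which is part of why very deep stacks of such blocks remain trainable.
zeros = [[0.0] * 3 for _ in range(3)]
x = [1.0, 2.0, 0.5]
identity_out = residual_block(x, zeros, [0.0] * 3, zeros, [0.0] * 3)
print(identity_out)
```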