OpenCV: Interview Questions

This document compiles common interview questions about OpenCV, ranging from fundamental concepts to more advanced topics. The questions are designed to test a candidate's understanding of OpenCV's capabilities, its role in computer vision, and its practical application.

Foundational Concepts

What is OpenCV, and what is its primary purpose?
- Answer: OpenCV (Open Source Computer Vision Library) is a widely used open-source library for computer vision and machine learning. Its primary purpose is to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. It offers a vast collection of algorithms for image and video processing, analysis, and understanding.
How does OpenCV represent images internally? What is the default color order for color images loaded by OpenCV?
- Answer: In the Python bindings, OpenCV represents images as NumPy arrays (the C++ API uses cv::Mat), typically with shape (height, width, channels) and dtype uint8. This allows seamless integration with NumPy's numerical operations. The default channel order for color images loaded by OpenCV (e.g., using cv2.imread()) is BGR (Blue, Green, Red), unlike many other libraries (such as Matplotlib), which expect RGB.
How do you read, display, and save an image using OpenCV in Python?
- Answer:
  - Read: img = cv2.imread('image.jpg', cv2.IMREAD_COLOR) (or cv2.IMREAD_GRAYSCALE for grayscale). If the file cannot be read, cv2.imread() returns None instead of raising an error.
  - Display (with Matplotlib): plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)), plt.show().
  - Display (with OpenCV): cv2.imshow('Image Title', img), cv2.waitKey(0), cv2.destroyAllWindows().
  - Save: cv2.imwrite('output.png', img).
What is the purpose of cv2.waitKey() and cv2.destroyAllWindows()?
- Answer:
  - cv2.waitKey(delay): Waits up to delay milliseconds for a key press; if delay is 0, it waits indefinitely. It also processes GUI events, so windows created with cv2.imshow() are not drawn or refreshed unless it is called. It returns the code of the key pressed, or -1 if no key was pressed before the timeout.
  - cv2.destroyAllWindows(): This function destroys all the OpenCV windows that were created. It's good practice to call this after you're done displaying images or videos.
How do you convert a color image to grayscale in OpenCV? Why is grayscale conversion often a preprocessing step?
- Answer: Use gray_img = cv2.cvtColor(color_img, cv2.COLOR_BGR2GRAY).
- Importance: Grayscale conversion is often a preprocessing step because:
  - Reduces Complexity: It simplifies the image by reducing the number of channels from three (color) to one (intensity), reducing computational cost.
  - Focuses on Luminance: Many image processing algorithms (e.g., edge detection, corner detection, some feature matching) primarily rely on intensity variations and perform better or are designed for single-channel images.
  - Memory Efficiency: Grayscale images consume less memory.

Intermediate Concepts

Explain the role of image blurring/smoothing. Name two blurring techniques in OpenCV and describe when to use each.
- Answer: Image blurring (smoothing) is used to reduce noise, reduce image detail, or prepare images for further processing. It works by replacing each pixel with a value computed from its neighborhood, such as a weighted average or the median.
- Techniques:
  - cv2.GaussianBlur(): Uses a Gaussian kernel to smooth the image. It's very effective for removing Gaussian noise and generally preserves edges better than simple averaging. Good for general-purpose blurring.
  - cv2.medianBlur(): Replaces each pixel value with the median of its neighbors. It is particularly effective at removing "salt-and-pepper" noise (impulse noise) while preserving edges relatively well.
What is edge detection? Describe the Canny edge detector and its steps.
- Answer: Edge detection is a technique used to identify points in an image where the image brightness changes sharply. These points often correspond to boundaries of objects.
- Canny Edge Detector (cv2.Canny()): A multi-stage algorithm widely recognized for its effectiveness. Its steps are:
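  1. Noise Reduction: Smooths the image with a Gaussian filter to suppress noise. Note that cv2.Canny() does not blur its input, so apply cv2.GaussianBlur() yourself beforehand if the image is noisy.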
  2. Gradient Calculation: Finds the intensity gradients of the image using Sobel operators.
  3. Non-maximum Suppression: Thins out the edges to 1-pixel wide lines by suppressing pixels that are not the maximum along the gradient direction.
  4. Hysteresis Thresholding: Uses two thresholds (minVal, maxVal) to identify strong edges (above maxVal) and weak edges (between minVal and maxVal). Weak edges are kept only if they are connected to strong edges, filtering out noise.
How do you resize an image in OpenCV? What are some common interpolation methods you might use?
- Answer: Use cv2.resize(src, dsize, fx, fy, interpolation).
  - src: The input image.
  - dsize: Desired output size as (width, height). If dsize is None or (0, 0), the output size is computed from fx and fy.
  - fx, fy: Scaling factors along the horizontal and vertical axes.
  - interpolation: Algorithm for pixel interpolation. Common methods:
    - cv2.INTER_AREA: For shrinking images (resampling using pixel area relation).
    - cv2.INTER_LINEAR: For zooming (bilinear interpolation). Default.
    - cv2.INTER_CUBIC: For zooming (bicubic interpolation, generally better quality but slower).
    - cv2.INTER_LANCZOS4: For zooming (Lanczos interpolation over 8x8 neighborhood).
Explain what feature detection and description are in computer vision. Give an example of a feature detector/descriptor from OpenCV.
- Answer:
  - Feature Detection: The process of finding distinct and salient points or regions (e.g., corners, blobs, edges) in an image that are robust to variations like scale, rotation, and illumination.
  - Feature Description: Once a feature is detected, a descriptor is computed for it. This descriptor is a compact, numerical representation (a vector) that describes the local image region around the feature. It should be unique enough to distinguish the feature from others and robust enough to match it even under transformations.
- Example (ORB): ORB (Oriented FAST and Rotated BRIEF) is a fast, patent-free alternative to SIFT/SURF. It detects keypoints with FAST, assigns each one an orientation, and computes a BRIEF descriptor steered by that orientation, which is what makes the descriptors rotation-invariant.
How would you perform object tracking in a video stream using OpenCV with simple methods?
- Answer: Simple options include template matching, the mean-shift/CAMShift algorithms, and OpenCV's built-in tracker classes (e.g., CSRT, KCF).
  - Template Matching (simplified): If the object's appearance doesn't change much, locate a small template image of the object in each successive frame with cv2.matchTemplate(). (Limitations: computationally expensive, sensitive to scale/rotation changes.)
  - Mean-Shift/CAMShift: Track an object by its color histogram using cv2.meanShift() or cv2.CamShift(). CAMShift also adapts the search window to changes in object size and orientation.
  - Tracker APIs (cv2.TrackerCSRT_create()): OpenCV's tracker classes (e.g., CSRT, KCF) are generally more robust. You initialize the tracker with a bounding box in the first frame (tracker.init(frame, bbox)), then call tracker.update(frame) on each subsequent frame, which returns a success flag and the new bounding box. These trackers ship with the opencv_contrib modules (the opencv-contrib-python package), and their location in the API has varied between OpenCV versions.

Advanced Concepts

What is image segmentation, and how can it be achieved using OpenCV? Name a few methods.
- Answer: Image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also called regions) so that the image is represented in a form that is more meaningful and easier to analyze. The goal is to locate objects and boundaries in images.
- Methods in OpenCV:
  - Thresholding: cv2.threshold() for simple foreground-background separation based on pixel intensity.
  - GrabCut Algorithm: cv2.grabCut() for interactive foreground extraction, using graph cuts.
  - Watershed Algorithm: cv2.watershed() for segmenting touching objects based on markers.
  - Contour Detection: While not strictly segmentation, cv2.findContours() can find boundaries of objects after thresholding, which is a step towards object-level analysis.
  - Deep Learning Models: For more advanced semantic and instance segmentation, deep learning models (e.g., U-Net, Mask R-CNN) integrated with OpenCV's DNN module can be used.
Discuss the concept of image homography. When is it used, and how can OpenCV estimate it?
- Answer: Homography is a transformation (a 3x3 matrix) that maps points from one plane to another plane. In computer vision, it's used to describe the geometric transformation between two images of the same planar object, or images taken from different viewpoints, assuming the scene is planar or the camera is rotating around its optical center.
- Use cases: Image stitching, camera calibration, augmented reality, perspective correction.
- OpenCV Estimation: cv2.findHomography(src_points, dst_points, method, ransacReprojThreshold). It takes corresponding pairs of points (src_points from the first image, dst_points from the second) and uses algorithms like RANSAC (RANdom SAmple Consensus) to robustly estimate the homography matrix, handling outliers.
What are morphological operations in OpenCV? Provide an example (e.g., erosion or dilation).
- Answer: Morphological operations are a set of image processing operations based on shape. They apply a "structuring element" (kernel) to an input image to create an output image. They are typically performed on binary images (though can be applied to grayscale).
- Example (Dilation): cv2.dilate(src, kernel, iterations). Dilation adds pixels to the boundaries of objects in an image. It's useful for making objects more prominent, joining broken parts of an object, or filling small holes.

  ```python
  import cv2
  import numpy as np

  # Stand-in binary image (assumed for illustration): a white square on black.
  binary_img = np.zeros((100, 100), np.uint8)
  binary_img[40:60, 40:60] = 255

  kernel = np.ones((5, 5), np.uint8)  # 5x5 square structuring element
  dilated_img = cv2.dilate(binary_img, kernel, iterations=1)
  ```
- Example (Erosion): cv2.erode(src, kernel, iterations). Erosion removes pixels from object boundaries. It's useful for removing noise, detaching two connected objects, or thinning objects.
How do you perform real-time object detection using OpenCV and pre-trained deep learning models?
- Answer: OpenCV's dnn (Deep Neural Network) module allows loading and running pre-trained deep learning models from various frameworks and formats (TensorFlow, Caffe, Torch, Darknet, ONNX). PyTorch models are typically exported to ONNX first.
  1. Load Model: Load the pre-trained model (e.g., SSD, YOLO) and its configuration file (cv2.dnn.readNet()).
  2. Prepare Input: Preprocess the input image/frame into a format suitable for the network (e.g., resizing, scaling, mean subtraction) using cv2.dnn.blobFromImage().
  3. Forward Pass: Pass the processed blob through the network (net.setInput(blob), net.forward(output_layers)).
  4. Parse Output: Interpret the network's output to get bounding boxes, class labels, and confidence scores.
  5. Non-Maximum Suppression (NMS): Apply NMS (cv2.dnn.NMSBoxes()) to remove overlapping duplicate bounding boxes.
Discuss camera calibration. Why is it important in computer vision, and what OpenCV functions are used?
- Answer: Camera calibration is the process of estimating the intrinsic and extrinsic parameters of a camera.
  - Intrinsic Parameters: Describe the camera's internal properties (focal length, optical center, lens distortion coefficients).
  - Extrinsic Parameters: Describe the camera's position and orientation in the world (rotation and translation vectors).
- Importance: It's crucial for:
  - Accurate 3D Reconstruction: To accurately infer 3D information from 2D images.
  - Augmented Reality: To correctly overlay virtual objects onto real-world scenes.
  - Robotics: For robot navigation and manipulation tasks.
  - Removing Lens Distortion: Correcting barrel or pincushion distortion.
- OpenCV Functions:
  - cv2.findChessboardCorners(): Detects the corners of a chessboard pattern.
  - cv2.calibrateCamera(): Computes the camera matrix, distortion coefficients, rotation, and translation vectors.
  - cv2.getOptimalNewCameraMatrix(): Refines the camera matrix based on the free scaling parameter.
  - cv2.undistort(): Removes lens distortion from an image.

Scenario-Based Questions

You have a live video feed from a camera. How would you detect faces in each frame and draw a bounding box around them?
- Answer:
  1. Load Haar Cascade Classifier: face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml').
  2. Read Video Frame by Frame: Use cv2.VideoCapture().
  3. Grayscale Conversion: Convert each frame to grayscale (cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)), as Haar cascades work on grayscale.
  4. Detect Faces: faces = face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30)).
  5. Draw Bounding Boxes: Iterate through faces (each entry is an (x, y, w, h) rectangle for one detected face) and draw rectangles on the original color frame using cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2).
  6. Display: cv2.imshow('Face Detection', frame).
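  7. Loop/Exit: Call cv2.waitKey(1) in each iteration and break out of the loop when a chosen key (e.g., 'q') is pressed. On exit, release the capture with cap.release() and call cv2.destroyAllWindows().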
You need to compare two images to find out if they contain the same object, even if one is rotated or scaled slightly. Which OpenCV features/techniques would you use?
- Answer: Feature matching using invariant feature detectors and descriptors.
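  1. Detect Keypoints and Compute Descriptors: Use an algorithm like ORB, SIFT (part of the main OpenCV module since version 4.4, after its patent expired), or SURF (still patented, available only in opencv_contrib builds with non-free algorithms enabled) on both images (e.g., orb.detectAndCompute()).
  2. Match Features: Use a BFMatcher (Brute-Force Matcher) or FlannBasedMatcher to find corresponding features between the two images (bf.match() or bf.knnMatch()). Use cv2.NORM_HAMMING for binary descriptors such as ORB and cv2.NORM_L2 for SIFT/SURF.
  3. Filter Matches: Refine matches, e.g., by applying Lowe's ratio test to the two nearest neighbors returned by knnMatch.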
  4. Find Homography: If the object is planar, use cv2.findHomography() with RANSAC to estimate the transformation matrix from the matched points.
  5. Verify: Transform one image/object onto the other using the homography to confirm correspondence.
How would you perform background subtraction in a video to isolate moving objects?
- Answer: OpenCV provides background subtraction algorithms in its video module, with additional ones in the contrib bgsegm module.
  1. Initialize Background Subtractor: Use cv2.createBackgroundSubtractorMOG2() (Gaussian Mixture-based Background/Foreground Segmentation Algorithm) or cv2.createBackgroundSubtractorKNN().
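  2. Process Frames: For each frame in the video, apply the background subtractor (fgmask = fgbg.apply(frame)). This generates a mask where white pixels are foreground (moving objects) and black pixels are background. With the default detectShadows=True, detected shadows are marked in gray (value 127), so threshold the mask or pass detectShadows=False if a strictly binary mask is needed.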
  3. Post-processing: Apply morphological operations (e.g., erode, dilate, opening, closing) to clean up the foreground mask and remove noise.
  4. Contour Detection: Find contours in the foreground mask to detect and draw bounding boxes around the moving objects.
You have an image that contains text, and you want to isolate the text regions (e.g., for OCR). How can morphological operations help in this task?
- Answer: Morphological operations are very useful for text extraction.
  - Dilation: Can be used to make text characters thicker and connect fragmented parts of letters, making them easier to detect as connected components.
  - Erosion: Can remove small specks of noise around text.
  - Opening (Erosion then Dilation): Removes small objects (noise) from the foreground.
  - Closing (Dilation then Erosion): Fills small holes within objects and connects nearby objects, which is particularly useful for connecting parts of text characters that might have been broken due to noise or varying stroke widths.
  - Black Hat / Top Hat: Black Hat highlights text that is darker than its background, while Top Hat highlights text that is lighter than its background.
Describe a situation where a cv2.cvtColor function might return an error, and how you would prevent it.
- Answer: A common error occurs if the input image to cv2.cvtColor is None (e.g., cv2.imread() failed to load the image) or has an incorrect number of channels for the specified conversion.
  - Scenario:

    ```python
    import cv2

    img = cv2.imread('non_existent_image.jpg')  # img will be None
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # This will raise an error
    ```
  - Prevention: Always check if the image was loaded successfully before performing operations on it.

    ```python
    import cv2

    img = cv2.imread('image.jpg')
    if img is None:
        print("Error: Image not loaded.")
    else:
        gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # ... proceed with processing
    ```