NumPy: Interview Questions

This document compiles a range of common interview questions related to NumPy, covering fundamental concepts to more advanced topics. These questions are designed to test a candidate's understanding of NumPy's architecture, core functionalities, and practical application in numerical computing.

Foundational Concepts

  1. What is NumPy, and what is its primary data structure?

    • Answer: NumPy (Numerical Python) is the fundamental package for numerical computing in Python. Its primary data structure is the ndarray (N-dimensional array), which is a powerful, fixed-size container for items of the same type and size.
  2. How do you create a NumPy array? Provide several methods.

    • Answer:
      • From Python lists/tuples: np.array([1, 2, 3])
      • Placeholders: np.zeros((2, 3)), np.ones((2, 3)), np.empty((2, 3)), np.full((2, 3), 7)
      • Numerical ranges: np.arange(10), np.linspace(0, 1, 5)
      • Identity matrix: np.identity(3), np.eye(4)
  3. Explain the importance of dtype in NumPy arrays.

    • Answer: dtype (data type) specifies the type of elements stored in a NumPy array (e.g., np.int32, np.float64, np.bool_). It's important because:
      • Memory Efficiency: Explicitly choosing smaller dtypes (e.g., int8 vs int64) can significantly reduce memory consumption for large arrays.
      • Performance: Operations on arrays with homogeneous dtype are highly optimized in C.
      • Numerical Precision: float32 vs float64 impacts precision and can be crucial for scientific computations.
      • Interoperability: Ensures compatibility when exchanging data with other libraries or systems.
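The memory and precision points above can be checked directly. The following is a minimal sketch; the array length and variable names are illustrative assumptions, not part of the original answer.

```python
import numpy as np

n = 1_000_000  # illustrative array length

# Same number of elements, 8x difference in memory.
small = np.zeros(n, dtype=np.int8)
large = np.zeros(n, dtype=np.int64)
print(small.nbytes, large.nbytes)  # 1000000 8000000

# float32 cannot represent 0.1 as closely as float64 does.
x32 = np.float32(0.1)
x64 = np.float64(0.1)
print(float(x32) == 0.1, float(x64) == 0.1)  # False True

# The cost of a small dtype: int8 array arithmetic wraps around on overflow.
counter = np.array([127], dtype=np.int8)
wrapped = counter + 1
print(wrapped)  # [-128]
```

The last lines show why "smallest dtype" should be read as "smallest dtype that safely holds the expected range of values".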
  4. What is "broadcasting" in NumPy? Give a simple example.

    • Answer: Broadcasting is NumPy's mechanism for performing arithmetic operations on arrays of different shapes. It effectively "stretches" the smaller array across the larger array so that they have compatible shapes for element-wise operations, without actually making copies of the data. This makes operations efficient.
    • Example:

      ```python
      import numpy as np

      a = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)
      b = np.array([10, 20, 30])            # Shape (3,)
      c = a + b  # b is broadcast across the rows of a
      # c will be [[11, 22, 33], [14, 25, 36]]
      ```
  5. How do you select elements from a 2D NumPy array? Provide examples using single elements, slices, and boolean conditions.

    • Answer:

      ```python
      import numpy as np

      arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

      # Single element:
      print(arr[1, 2])  # Output: 6 (row 1, col 2)

      # Slicing:
      print(arr[:2, 1:])  # Output: [[2, 3], [5, 6]] (first 2 rows, cols 1 onwards)

      # Boolean (mask) indexing:
      print(arr[arr > 5])  # Output: [6 7 8 9] (elements greater than 5)
      ```

Intermediate Concepts

  1. Explain the use of axis parameter in NumPy functions (e.g., sum(), mean()).

    • Answer: The axis parameter specifies the dimension(s) along which an operation is performed.
      • axis=0: Operations are performed column-wise (across rows). For a 2D array, it collapses rows.
      • axis=1: Operations are performed row-wise (across columns). For a 2D array, it collapses columns.
      • axis=None (default): Operations are performed over the entire array, flattening it.
      • Example: arr.sum(axis=0) sums elements down each column. arr.mean(axis=1) calculates the mean across each row.
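A short sketch of how the axis argument changes the result shape. The array values are arbitrary, and the keepdims usage at the end is an addition not mentioned in the answer above.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # shape (2, 3)

col_sums = arr.sum(axis=0)    # collapses the row axis, result shape (3,)
row_means = arr.mean(axis=1)  # collapses the column axis, result shape (2,)
total = arr.sum()             # axis=None: reduces over every element

print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
print(total)      # 21

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts back against the original array without reshaping.
centered = arr - arr.mean(axis=1, keepdims=True)
print(centered.shape)  # (2, 3)
```

A useful way to remember the rule: the axis you pass is the one that disappears from the result's shape.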
  2. What is the difference between reshape() and resize()?

    • Answer:
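      • reshape(shape): Returns an array with the specified shape and leaves the original array's shape unchanged. The result is a view of the same data whenever possible (otherwise a copy), so writing to it can modify the original. The new shape must contain the same number of elements as the original.
      • resize(shape): The method arr.resize(shape) modifies the array in place. If the new shape is larger than the original, the new elements are filled with zeros; if smaller, trailing elements are dropped. It raises a ValueError if the array does not own its data or is referenced elsewhere (unless refcheck=False). The function np.resize(arr, shape) behaves differently: it returns a new array and fills any extra space with repeated copies of arr rather than zeros.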
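The differences are easiest to see side by side. This is a small sketch with arbitrary values; refcheck=False is passed only so the in-place call also works in interactive sessions that hold extra references, and it should be used with care in real code.

```python
import numpy as np

a = np.arange(6)

# reshape: same data, new shape; a itself keeps shape (6,).
r = a.reshape(2, 3)
print(a.shape, r.shape)        # (6,) (2, 3)
print(np.shares_memory(a, r))  # True, so r is a view here

# np.resize (function): returns a new array and repeats a to fill it.
tiled = np.resize(a, (2, 4))   # rows: 0 1 2 3 and 4 5 0 1

# ndarray.resize (method): changes b in place and pads with zeros.
b = np.arange(3)
b.resize((5,), refcheck=False)
print(b)  # [0 1 2 0 0]
```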
  3. How would you perform matrix multiplication in NumPy?

    • Answer: There are several ways:
      • np.dot(a, b): General dot product, can be matrix multiplication, vector dot product, or scalar multiplication depending on arguments.
      • a.dot(b): Instance method, same as np.dot.
      • @ operator (Python 3.5+): Dedicated operator for matrix multiplication. a @ b. This is the most readable and recommended method for true matrix multiplication.
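As a quick check that the three spellings agree for 2D inputs, and that * is something else entirely, consider this sketch with two arbitrary 2x2 matrices:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

product = a @ b  # matrix product; rows are [19 22] and [43 50]

# For 2D arrays, np.dot and the .dot method give the same result as @.
same_as_dot = np.array_equal(product, np.dot(a, b))
same_as_method = np.array_equal(product, a.dot(b))
print(same_as_dot, same_as_method)  # True True

# The * operator is element-wise, NOT matrix multiplication.
elementwise = a * b  # rows are [5 12] and [21 32]
```

Confusing * with @ is a common interview pitfall, since both run without error on square matrices of the same shape.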
  4. You have two 1D arrays, a and b. How would you concatenate them horizontally and vertically?

    • Answer:
      • Horizontally: np.hstack((a, b)) or np.concatenate((a, b)) joins two 1D arrays end to end into one longer 1D array (axis=0 is the only axis a 1D array has). If both are 2D row vectors of shape (1, n), use np.concatenate((a, b), axis=1) instead.
      • Vertically: np.vstack((a, b)) stacks two equal-length 1D arrays as the rows of a 2D array with shape (2, n). To place them side by side as columns, giving shape (n, 2), use np.column_stack((a, b)) or np.concatenate((a[:, np.newaxis], b[:, np.newaxis]), axis=1). Note that np.vstack((a[:, np.newaxis], b[:, np.newaxis])) stacks the two column vectors on top of each other into a single column of shape (2n, 1).
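Because the stacking functions are easy to mix up, printing the resulting shapes is the quickest sanity check. A minimal sketch with two arbitrary length-3 arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

flat = np.hstack((a, b))          # end to end, shape (6,)
rows = np.vstack((a, b))          # each input becomes a row, shape (2, 3)
cols = np.column_stack((a, b))    # each input becomes a column, shape (3, 2)
tall = np.vstack((a[:, np.newaxis], b[:, np.newaxis]))  # one column, shape (6, 1)

print(flat.shape, rows.shape, cols.shape, tall.shape)
# (6,) (2, 3) (3, 2) (6, 1)
```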
  5. When would you use np.newaxis?

    • Answer: np.newaxis (or None) is used to increase the dimension of an existing array by one. It's often used to convert a 1D array into a 2D row or column vector, which is useful for broadcasting or for making arrays compatible with operations that expect a certain number of dimensions (e.g., batch processing in deep learning).

      ```python
      import numpy as np

      arr_1d = np.array([1, 2, 3])
      row_vec = arr_1d[np.newaxis, :]  # shape (1, 3)
      col_vec = arr_1d[:, np.newaxis]  # shape (3, 1)
      ```

Advanced Concepts

  1. Explain the difference between a "view" and a "copy" in NumPy, and why it's important.

    • Answer:
      • Copy: A completely new array object and data buffer are created. Modifying the copy does not affect the original, and vice-versa. Operations like explicit copying (.copy()) create copies.
      • View: A new array object is created, but it references the same data buffer as the original array. Modifying the view will modify the original array, and vice-versa. Basic slicing (e.g., arr[1:4]) creates a view, whereas advanced indexing with integer or boolean arrays returns a copy.
    • Importance: Understanding this prevents unexpected side effects. If you slice an array and then modify the slice, you might inadvertently change your original data if it was a view, potentially corrupting subsequent calculations. If you need an independent version, always use .copy().
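The side effect described above can be reproduced in a few lines. This sketch uses np.shares_memory to test whether two arrays share a buffer; the variable names are illustrative.

```python
import numpy as np

original = np.arange(5)  # values 0, 1, 2, 3, 4

# Basic slicing returns a view: writes go through to the original.
view = original[1:4]
view[0] = 99
print(original[1])  # 99

# .copy() gives an independent buffer: the original is untouched.
independent = original[1:4].copy()
independent[0] = -1
print(original[1])  # still 99

# Advanced (integer-array) indexing also returns a copy.
fancy = original[[0, 1]]

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
print(np.shares_memory(original, fancy))        # False
```

When in doubt about a particular operation, np.shares_memory gives a direct answer instead of relying on memory of the indexing rules.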
  2. How can you optimize NumPy code for performance?

    • Answer:
      • Vectorization: Always prefer vectorized operations over Python loops.
      • Broadcasting: Use broadcasting to avoid creating large intermediate arrays.
      • Correct dtype: Use the smallest dtype that preserves necessary precision.
      • Avoid unnecessary copies: Be mindful of view vs. copy.
      • In-place operations: Use in-place operators (+=, *=) when possible.
      • ufuncs: Leverage NumPy's optimized universal functions.
      • Numba / Cython: For complex operations that can't be vectorized, consider JIT compilation with Numba or writing C extensions with Cython.
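To make the first and fifth points concrete, here is a sketch that computes a sum of squares with a Python loop and with a single vectorized call, then applies an in-place update. The function names and array size are hypothetical; actual speedups depend on the machine and array size, so no timings are claimed here.

```python
import numpy as np

values = np.arange(10_000, dtype=np.float64)  # illustrative size


def loop_sum_of_squares(x):
    """Python-level loop: one interpreter step per element."""
    total = 0.0
    for v in x:
        total += v * v
    return total


def vectorized_sum_of_squares(x):
    """Single call into compiled code: no per-element Python overhead."""
    return float(np.dot(x, x))


slow = loop_sum_of_squares(values)
fast = vectorized_sum_of_squares(values)
print(np.isclose(slow, fast))  # True

# In-place update: *= writes into the existing buffer instead of
# allocating a new array the way `work = work * 2.0` would.
work = values.copy()
same_object = work  # second name bound to the same array
work *= 2.0
print(same_object is work)  # True
```

To compare the two functions on a given machine, the standard library's timeit module is a reasonable starting point.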
  3. How do you generate reproducible random numbers in NumPy?

    • Answer: By seeding the random number generator. The recommended modern way is to create a Generator object with a seed:

      ```python
      import numpy as np

      rng = np.random.default_rng(seed=42)
      random_array = rng.random(5)  # Generator uses random(), not the legacy rand()
      ```

      The legacy way is np.random.seed(42), which seeds NumPy's global state. default_rng creates an independent stream that does not touch that global state, which is better for complex simulations.
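Reproducibility can be verified by drawing from two generators built with the same seed. A minimal sketch; the seed values are arbitrary.

```python
import numpy as np

# Two generators with the same seed produce identical streams.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draw_a = rng_a.random(5)
draw_b = rng_b.random(5)
print(np.array_equal(draw_a, draw_b))  # True

# A different seed gives a different stream.
rng_c = np.random.default_rng(43)
draw_c = rng_c.random(5)
print(np.array_equal(draw_a, draw_c))  # False

# Each generator carries its own state: a second draw from rng_a
# continues its stream rather than restarting it.
next_a = rng_a.random(5)
print(np.array_equal(draw_a, next_a))  # False
```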
  4. What is np.where() and when would you use it?

    • Answer: np.where(condition, x, y) is a vectorized conditional function. It returns elements chosen from x or y depending on condition. If condition is True, it chooses from x; otherwise, it chooses from y. It's equivalent to a ternary operator (if-else) for arrays.
    • Use case: Creating a new array or modifying an existing one based on a condition, e.g., creating a binary flag, replacing values, or applying conditional logic element-wise.

      ```python
      import numpy as np

      scores = np.array([85, 92, 78, 65, 95])
      grades = np.where(scores >= 90, 'A', np.where(scores >= 80, 'B', 'C'))
      print(grades)  # Output: ['B' 'A' 'C' 'C' 'A']
      ```
  5. How would you perform polynomial fitting using NumPy?

    • Answer: Use np.polyfit() to find the coefficients of a polynomial that best fits a set of data points, and np.poly1d() to create a polynomial object from these coefficients for evaluation.

      ```python
      import numpy as np

      x = np.array([0, 1, 2, 3])
      y = np.array([0, 0.8, 0.9, 0.1])

      # Fit a 2nd degree polynomial
      coefficients = np.polyfit(x, y, 2)
      poly = np.poly1d(coefficients)

      # Predict new values
      x_new = np.linspace(0, 3, 10)
      y_new = poly(x_new)
      ```

Scenario-Based Questions

  1. You have a large image loaded as a NumPy array (height, width, channels). How would you extract a 100x100 pixel patch starting from pixel (50, 50)?

    • Answer: Use slicing:

      ```python
      image[50:150, 50:150, :]  # Assuming channels is the last dimension
      ```
  2. You have a dataset where each row is a sample and each column is a feature. You want to normalize each feature (column) so it has a mean of 0 and a standard deviation of 1. How would you do this using NumPy?

    • Answer: Calculate mean and std along axis=0 (columns), then broadcast the subtraction and division.

      ```python
      import numpy as np

      data_matrix = np.random.rand(100, 10)  # 100 samples, 10 features
      mean = data_matrix.mean(axis=0)
      std = data_matrix.std(axis=0)
      normalized_data = (data_matrix - mean) / std
      ```
  3. How would you count the occurrences of each unique value in a 1D NumPy array?

    • Answer: Use np.unique() with return_counts=True.

      ```python
      import numpy as np

      arr = np.array([1, 2, 1, 3, 2, 1, 4, 3])
      unique_elements, counts = np.unique(arr, return_counts=True)
      # Output: (array([1, 2, 3, 4]), array([3, 2, 2, 1]))
      ```
  4. You have two NumPy arrays representing two vectors. How would you calculate their Euclidean distance efficiently?

    • Answer:

      ```python
      import numpy as np

      vec1 = np.array([1, 2, 3])
      vec2 = np.array([4, 5, 6])
      euclidean_distance = np.sqrt(np.sum((vec1 - vec2)**2))

      # Alternatively, using np.linalg.norm
      euclidean_distance = np.linalg.norm(vec1 - vec2)
      ```
  5. Describe a situation where a NumPy array operation might raise a ValueError due to shape incompatibility, and how you would fix it.

    • Answer: This often happens during arithmetic operations without proper broadcasting rules being met, or when trying to concatenate arrays with incompatible dimensions along the specified axis.
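    • Scenario: Attempting to add a (2, 3) array to a (4,) array. Broadcasting compares shapes starting from the trailing dimension: 3 and 4 are unequal and neither is 1, so NumPy raises a ValueError. Reshaping the second array to (1, 4) or (4, 1) does not help, because the mismatched lengths remain.
    • Fix: Make the shapes compatible, so that each pair of trailing dimensions is either equal or contains a 1. Depending on the intent, that means selecting or rearranging the data with slicing, reshape(), or np.newaxis.

      ```python
      import numpy as np

      a = np.ones((2, 3))         # shape (2, 3)
      b = np.array([1, 2, 3, 4])  # shape (4,)
      # a + b would raise ValueError

      # Fix for adding b to each row, if only its first 3 values are wanted:
      b_reshaped = b[:3].reshape(1, 3)  # take first 3 elements, reshape to (1, 3)
      result = a + b_reshaped           # broadcasting works, result has shape (2, 3)
      ```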