NumPy: Interview Questions
This document compiles common interview questions about NumPy, ranging from fundamental concepts to more advanced topics. The questions are designed to test a candidate's understanding of NumPy's architecture, core functionality, and practical application in numerical computing.
Foundational Concepts
- What is NumPy, and what is its primary data structure?
  - Answer: NumPy (Numerical Python) is the fundamental package for numerical computing in Python. Its primary data structure is the `ndarray` (N-dimensional array), which is a powerful, fixed-size container for items of the same type and size.
- How do you create a NumPy array? Provide several methods.
  - Answer:
    - From Python lists/tuples: `np.array([1, 2, 3])`
    - Placeholders: `np.zeros((2, 3))`, `np.ones((2, 3))`, `np.empty((2, 3))`, `np.full((2, 3), 7)`
    - Numerical ranges: `np.arange(10)`, `np.linspace(0, 1, 5)`
    - Identity matrix: `np.identity(3)`, `np.eye(4)`
- Explain the importance of `dtype` in NumPy arrays.
  - Answer: `dtype` (data type) specifies the type of elements stored in a NumPy array (e.g., `np.int32`, `np.float64`, `np.bool_`). It's important because:
    - Memory Efficiency: Explicitly choosing smaller dtypes (e.g., `int8` vs `int64`) can significantly reduce memory consumption for large arrays.
    - Performance: Operations on arrays with homogeneous `dtype` are highly optimized in C.
    - Numerical Precision: `float32` vs `float64` impacts precision and can be crucial for scientific computations.
    - Interoperability: Ensures compatibility when exchanging data with other libraries or systems.
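As a rough illustration of the memory point above, the sketch below stores the same values as `int64` and as `int8` and compares the size of the data buffers. The array size and variable names are arbitrary choices for this example, and `nbytes` counts only the data buffer, not the Python object overhead.

```python
import numpy as np

# One million small integers; the values 0..99 fit in int8 (range -128..127).
values = np.arange(1_000_000) % 100

as_int64 = values.astype(np.int64)
as_int8 = values.astype(np.int8)

print(as_int64.nbytes)  # 8 bytes per element
print(as_int8.nbytes)   # 1 byte per element

# Caveat: a dtype that is too small silently wraps around on conversion.
# 200 does not fit in int8, so the stored value is no longer 200.
wrapped = np.array([200], dtype=np.int64).astype(np.int8)
print(wrapped)
```

The trade-off is that the smaller dtype is only safe when the value range is known in advance.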
- What is "broadcasting" in NumPy? Give a simple example.
  - Answer: Broadcasting is NumPy's mechanism for performing arithmetic operations on arrays of different shapes. It effectively "stretches" the smaller array across the larger array so that they have compatible shapes for element-wise operations, without actually making copies of the data. This makes operations efficient.
  - Example:

    ```python
    import numpy as np

    a = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)
    b = np.array([10, 20, 30])            # Shape (3,)
    c = a + b  # b is broadcast across rows of a
    # c will be [[11, 22, 33], [14, 25, 36]]
    ```
- How do you select elements from a 2D NumPy array? Provide examples using single elements, slices, and boolean conditions.
  - Answer:

    ```python
    import numpy as np

    arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Single element:
    print(arr[1, 2])  # Output: 6 (row 1, col 2)

    # Slicing:
    print(arr[:2, 1:])  # Output: [[2, 3], [5, 6]] (first 2 rows, cols 1 onwards)

    # Boolean (mask) indexing:
    print(arr[arr > 5])  # Output: [6 7 8 9] (elements greater than 5)
    ```
Intermediate Concepts
- Explain the use of the `axis` parameter in NumPy functions (e.g., `sum()`, `mean()`).
  - Answer: The `axis` parameter specifies the dimension(s) along which an operation is performed.
    - `axis=0`: Operations are performed column-wise (down the rows). For a 2D array, it collapses the rows.
    - `axis=1`: Operations are performed row-wise (across the columns). For a 2D array, it collapses the columns.
    - `axis=None` (default): Operations are performed over the entire array, as if it were flattened.
    - Example: `arr.sum(axis=0)` sums elements down each column. `arr.mean(axis=1)` calculates the mean across each row.
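The effect of `axis` is easiest to see on a small array. The following is a minimal sketch; the array contents are made up for illustration, and the `keepdims=True` usage at the end is an addition not discussed above.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])  # shape (2, 3)

col_sums = arr.sum(axis=0)    # collapses the 2 rows    -> shape (3,)
row_means = arr.mean(axis=1)  # collapses the 3 columns -> shape (2,)
total = arr.sum()             # axis=None: a single scalar

print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
print(total)      # 21

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts cleanly against the original array.
row_means_2d = arr.mean(axis=1, keepdims=True)  # shape (2, 1)
centered = arr - row_means_2d                   # each row now has mean 0
```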
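- What is the difference between `reshape()` and `resize()`?
  - Answer:
    - `reshape(shape)`: Returns an array with the specified shape and leaves the shape of the original array unchanged. The result must have the same number of elements as the original. It is a view of the same data whenever possible, so it is not guaranteed to be an independent copy.
    - `resize(shape)`: The method `ndarray.resize()` modifies the array in place. If the new shape is larger than the original, the new elements are filled with zeros; if smaller, elements are truncated. It raises a `ValueError` if the array does not own its data, and by default also if other objects reference it (this reference check can be disabled with `refcheck=False`).
    - `np.resize(a, shape)`: The function form returns a new array instead. When the new shape is larger, it fills the extra elements with repeated copies of `a` rather than zeros.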
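The differences can be checked with a short experiment. This is a sketch with made-up arrays; `refcheck=False` is passed only so the in-place call does not fail in environments (such as interactive shells) that hold extra references to the array.

```python
import numpy as np

a = np.arange(6)                # [0 1 2 3 4 5]
b = a.reshape(2, 3)             # same 6 elements, new shape
print(a.shape, b.shape)         # (6,) (2, 3): a itself is unchanged
print(np.shares_memory(a, b))   # True: here reshape returned a view

# np.resize (function) returns a new array and repeats the data to fill it.
c = np.resize(a, (2, 4))
print(c)                        # [[0 1 2 3]
                                #  [4 5 0 1]]

# ndarray.resize (method) works in place and pads with zeros.
d = np.arange(3)
d.resize(5, refcheck=False)
print(d)                        # [0 1 2 0 0]
```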
- How would you perform matrix multiplication in NumPy?
  - Answer: There are several ways:
    - `np.dot(a, b)`: General dot product; it performs matrix multiplication, a vector dot product, or scalar multiplication depending on the arguments.
    - `a.dot(b)`: Instance method, same as `np.dot`.
    - `@` operator (Python 3.5+): Dedicated operator for matrix multiplication, written `a @ b`. This is the most readable and recommended method for true matrix multiplication.
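A small worked example, using two made-up 2x2 matrices, shows that these forms agree for 2D inputs and that `*` is something different (element-wise multiplication). `np.matmul` is included here as the function equivalent of `@`.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

product = a @ b
print(product)  # [[19 22]
                #  [43 50]]

# For 2D arrays, np.dot, ndarray.dot and np.matmul give the same result.
same_as_dot = np.array_equal(product, np.dot(a, b))
same_as_method = np.array_equal(product, a.dot(b))
same_as_matmul = np.array_equal(product, np.matmul(a, b))
print(same_as_dot, same_as_method, same_as_matmul)  # True True True

# Common pitfall: * multiplies element by element, not as matrices.
elementwise = a * b
print(elementwise)  # [[ 5 12]
                    #  [21 32]]
```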
- You have two 1D arrays, `a` and `b`. How would you concatenate them horizontally and vertically?
  - Answer:
    - Horizontally: `np.hstack((a, b))` or `np.concatenate((a, b))` joins them end to end into a single 1D array of length `len(a) + len(b)`. A 1D array has only `axis=0`; if both inputs were 2D row vectors of shape (1, n), you would use `axis=1` instead.
    - Vertically: `np.vstack((a, b))` stacks two 1D arrays of equal length as the rows of a 2D array of shape (2, n). If you first reshape them into column vectors, `np.vstack((a[:, np.newaxis], b[:, np.newaxis]))` produces one long column of shape (len(a) + len(b), 1), whereas `np.concatenate((a.reshape(-1, 1), b.reshape(-1, 1)), axis=1)` places them side by side as the columns of an (n, 2) array.
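Printing the resulting shapes is a quick way to keep the stacking functions apart. The sketch below uses two made-up arrays of length 3; `np.column_stack` is an additional helper, not mentioned above, that builds the side-by-side layout directly from 1D inputs.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

end_to_end = np.concatenate((a, b))   # [1 2 3 4 5 6]
as_rows = np.vstack((a, b))           # [[1 2 3], [4 5 6]]
as_columns = np.column_stack((a, b))  # [[1 4], [2 5], [3 6]]
one_column = np.vstack((a[:, np.newaxis], b[:, np.newaxis]))

print(end_to_end.shape)  # (6,)
print(as_rows.shape)     # (2, 3)
print(as_columns.shape)  # (3, 2)
print(one_column.shape)  # (6, 1)
```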
- When would you use `np.newaxis`?
  - Answer: `np.newaxis` (an alias for `None`) is used to increase the dimension of an existing array by one. It's often used to convert a 1D array into a 2D row or column vector, which is useful for broadcasting or for making arrays compatible with operations that expect a certain number of dimensions (e.g., batch processing in deep learning).

    ```python
    import numpy as np

    arr_1d = np.array([1, 2, 3])
    row_vec = arr_1d[np.newaxis, :]  # shape (1, 3)
    col_vec = arr_1d[:, np.newaxis]  # shape (3, 1)
    ```
Advanced Concepts
- Explain the difference between a "view" and a "copy" in NumPy, and why it's important.
  - Answer:
    - Copy: A completely new array object and data buffer are created. Modifying the copy does not affect the original, and vice versa. Explicit copying with `.copy()` creates a copy, as does advanced indexing with integer or boolean arrays.
    - View: A new array object is created, but it references the same data buffer as the original array. Modifying the view modifies the original array, and vice versa. Basic slicing in NumPy creates a view.
    - Importance: Understanding this prevents unexpected side effects. If you slice an array and then modify the slice, you change the original data as well, which can corrupt subsequent calculations. If you need an independent version, always use `.copy()`.
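The side effect described above can be reproduced in a few lines. This is a minimal sketch with made-up values; `np.shares_memory` is used here as one way to check whether two arrays use the same buffer.

```python
import numpy as np

original = np.arange(5)              # [0 1 2 3 4]

view = original[1:4]                 # basic slice: shares memory with original
view[0] = 99
print(original)                      # [ 0 99  2  3  4]  (original changed too)

independent = original[1:4].copy()   # explicit copy: separate buffer
independent[0] = -1
print(original)                      # [ 0 99  2  3  4]  (unaffected this time)

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```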
- How can you optimize NumPy code for performance?
  - Answer:
    - Vectorization: Always prefer vectorized operations over Python loops.
    - Broadcasting: Use broadcasting to avoid creating large intermediate arrays.
    - Correct `dtype`: Use the smallest `dtype` that preserves necessary precision.
    - Avoid unnecessary copies: Be mindful of view vs. copy.
    - In-place operations: Use in-place operators (`+=`, `*=`) when possible.
    - `ufuncs`: Leverage NumPy's optimized universal functions.
    - `Numba`/`Cython`: For complex operations that can't be vectorized, consider JIT compilation with `Numba` or writing C extensions with `Cython`.
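To make the vectorization and in-place points concrete, here is a small sketch that computes a sum of squares both ways. The function names `sum_of_squares_loop` and `sum_of_squares_vectorized` are hypothetical, chosen for this example; actual speedups depend on array size and hardware, so no timings are claimed.

```python
import numpy as np

def sum_of_squares_loop(values):
    """Python-level loop: one interpreter step per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_vectorized(values):
    """Same computation, pushed into NumPy's compiled code."""
    return float(np.dot(values, values))

data = np.linspace(0.0, 1.0, 10_000)
loop_result = sum_of_squares_loop(data)
vectorized_result = sum_of_squares_vectorized(data)
print(np.isclose(loop_result, vectorized_result))  # True

# In-place update: *= writes into the existing buffer instead of
# allocating a new array, so the name still refers to the same object.
same_object = data
data *= 2.0
print(same_object is data)  # True
```

When comparing the two versions on real workloads, `timeit` from the standard library is a reasonable way to measure the difference.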
- How do you generate reproducible random numbers in NumPy?
  - Answer: By seeding the random number generator. The recommended modern way is to create a `Generator` object with a seed:

    ```python
    import numpy as np

    rng = np.random.default_rng(seed=42)
    random_array = rng.random(5)  # 5 uniform floats in [0, 1)
    ```

    The legacy way used `np.random.seed(42)`, which seeds NumPy's global random state. Using `default_rng` creates an independent stream, which is better for complex simulations.
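Reproducibility can be verified directly: two generators created with the same seed produce the same sequence. The sketch below assumes nothing beyond `np.random.default_rng`; the seed value and variable names are arbitrary.

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

sample_a = rng_a.random(5)
sample_b = rng_b.random(5)
print(np.array_equal(sample_a, sample_b))  # True: same seed, same stream

# The Generator object also provides other distributions.
dice = rng_a.integers(1, 7, size=3)                 # integers in [1, 7)
noise = rng_a.normal(loc=0.0, scale=1.0, size=3)    # standard normal draws
```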
- What is `np.where()` and when would you use it?
  - Answer: `np.where(condition, x, y)` is a vectorized conditional function. It returns elements chosen from `x` or `y` depending on `condition`: where `condition` is `True` it takes the element from `x`, otherwise from `y`. It is the array equivalent of a ternary (if-else) expression.
  - Use case: Creating a new array or modifying an existing one based on a condition, e.g., creating a binary flag, replacing values, or applying conditional logic element-wise.

    ```python
    import numpy as np

    scores = np.array([85, 92, 78, 65, 95])
    grades = np.where(scores >= 90, 'A', np.where(scores >= 80, 'B', 'C'))
    print(grades)  # Output: ['B' 'A' 'C' 'C' 'A']
    ```
- How would you perform polynomial fitting using NumPy?
  - Answer: Use `np.polyfit()` to find the coefficients of a polynomial that best fits a set of data points, and `np.poly1d()` to create a polynomial object from these coefficients for evaluation.

    ```python
    import numpy as np

    x = np.array([0, 1, 2, 3])
    y = np.array([0, 0.8, 0.9, 0.1])

    # Fit a 2nd degree polynomial
    coefficients = np.polyfit(x, y, 2)
    poly = np.poly1d(coefficients)

    # Predict new values
    x_new = np.linspace(0, 3, 10)
    y_new = poly(x_new)
    ```
Scenario-Based Questions
- You have a large image loaded as a NumPy array (height, width, channels). How would you extract a 100x100 pixel patch starting from pixel (50, 50)?
  - Answer: Use slicing:

    ```python
    image[50:150, 50:150, :]  # Assuming channels is the last dimension
    ```
- You have a dataset where each row is a sample and each column is a feature. You want to normalize each feature (column) so it has a mean of 0 and a standard deviation of 1. How would you do this using NumPy?
  - Answer: Calculate the mean and standard deviation along `axis=0` (one value per column), then broadcast the subtraction and division.

    ```python
    import numpy as np

    data_matrix = np.random.rand(100, 10)  # 100 samples, 10 features
    mean = data_matrix.mean(axis=0)
    std = data_matrix.std(axis=0)
    normalized_data = (data_matrix - mean) / std
    ```
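One practical detail the approach above leaves open is a feature with zero variance, where dividing by the standard deviation would produce `nan`. The sketch below shows one possible guard, under the assumption that a constant column should simply map to zeros; the data and the name `safe_std` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 3))
data[:, 2] = 5.0  # constant feature: its standard deviation is 0

mean = data.mean(axis=0)
std = data.std(axis=0)

# Replace zero standard deviations with 1 so the division is well defined.
safe_std = np.where(std == 0, 1.0, std)
normalized = (data - mean) / safe_std

print(normalized.mean(axis=0))  # approximately 0 for every column
print(normalized.std(axis=0))   # approximately 1, except 0 for the constant column
```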
- How would you count the occurrences of each unique value in a 1D NumPy array?
  - Answer: Use `np.unique()` with `return_counts=True`.

    ```python
    import numpy as np

    arr = np.array([1, 2, 1, 3, 2, 1, 4, 3])
    unique_elements, counts = np.unique(arr, return_counts=True)
    # Output: (array([1, 2, 3, 4]), array([3, 2, 2, 1]))
    ```
- You have two NumPy arrays representing two vectors. How would you calculate their Euclidean distance efficiently?
  - Answer:

    ```python
    import numpy as np

    vec1 = np.array([1, 2, 3])
    vec2 = np.array([4, 5, 6])
    euclidean_distance = np.sqrt(np.sum((vec1 - vec2)**2))

    # Alternatively, using np.linalg.norm
    euclidean_distance = np.linalg.norm(vec1 - vec2)
    ```
- Describe a situation where a NumPy array operation might raise a `ValueError` due to shape incompatibility, and how you would fix it.
  - Answer: This often happens when an arithmetic operation does not satisfy the broadcasting rules, or when concatenating arrays whose dimensions do not match along the axes that are not being joined.
  - Scenario: Attempting to add a (2, 3) shaped array to a (4,) shaped array. Broadcasting compares shapes starting from the trailing dimension; 3 and 4 are neither equal nor 1, so the addition fails. Reshaping the second array to (1, 4) or (4, 1) does not help either, because it still cannot be matched against (2, 3).
  - Fix: Make the data match the intended operation, for example by selecting the elements that correspond to the three columns, and then use `reshape()` or `np.newaxis` so that the shapes satisfy the broadcasting rules.

    ```python
    import numpy as np

    a = np.ones((2, 3))         # (2, 3)
    b = np.array([1, 2, 3, 4])  # (4,)
    # a + b would raise ValueError

    # Fix for adding b to each row:
    b_reshaped = b[:3].reshape(1, 3)  # Take first 3 elements and reshape to (1, 3)
    result = a + b_reshaped           # Broadcasting works
    ```