SQL Interview Questions and Answers

This guide is structured to build from basic concepts to advanced topics, using a consistent sample schema for all query examples.

Sample Schema

For the query-based questions, we will use the following simple schema representing employees and their departments.

Departments Table: | id (PK) | name | | :------ | :-------------- | | 1 | 'Engineering' | | 2 | 'Human Resources' | | 3 | 'Sales' | | 4 | 'Marketing' |

Employees Table: | id (PK) | first_name | last_name | salary | department_id (FK) | | :------ | :--------- | :-------- | :----- | :----------------- | | 1 | 'John' | 'Doe' | 90000 | 1 | | 2 | 'Jane' | 'Smith' | 110000 | 1 | | 3 | 'Peter' | 'Jones' | 80000 | 3 | | 4 | 'Alice' | 'Williams'| 120000 | 1 | | 5 | 'Bob' | 'Brown' | 75000 | 3 | | 6 | 'Emily' | 'Davis' | 60000 | 2 | | 7 | 'Chris' | 'Green' | 95000 | NULL |

(Note: Chris Green is not assigned to any department.)

Basic Concepts & Queries

1. What is a Primary Key and a Foreign Key?

Primary Key (PK): A column (or a set of columns) that uniquely identifies each row in a table. It cannot contain NULL values, and all values in the column must be unique. Think of it as a person's unique national ID number; no two people have the same one.
- In our schema, Departments.id and Employees.id are primary keys.
Foreign Key (FK): A column (or a set of columns) in one table that refers to the Primary Key of another table. It is used to create a link and enforce integrity between the two tables.
- In our schema, Employees.department_id is a foreign key that references Departments.id.

Real-World Analogy: E-commerce Site

Imagine an e-commerce website with two tables: Users and Orders.

The Users table has a primary key column called user_id. This is a unique number assigned to every user.
The Orders table needs to know which user placed each order. So, it has a foreign key column, also called user_id.
When a user places an order, a new row is created in the Orders table, and the user_id from the Users table is copied into the Orders.user_id column.

This link ensures that: 1. You can easily find all orders placed by a specific user. 2. You can't create an order for a user that doesn't exist, preserving data integrity.

2. What is the difference between `DELETE`, `TRUNCATE`, and `DROP`?

Feature	`DELETE`	`TRUNCATE`	`DROP`
Type	DML (Data Manipulation Language)	DDL (Data Definition Language)	DDL (Data Definition Language)
Scope	Removes one or more rows.	Removes all rows.	Removes the entire table.
`WHERE` Clause	Can be used.	Cannot be used.	Cannot be used.
Speed	Slower (logs each row deletion).	Much faster (deallocates data pages).	Fastest (removes the table definition).
Rollback	Can be rolled back.	Cannot be rolled back (in most systems).	Cannot be rolled back.
Triggers	Fires `DELETE` triggers for each row.	Does not fire triggers.	Does not fire triggers.

When to use which? A Real-World Scenario Guide:

Use DELETE when you need precision.
- Scenario: A user decides to close their account on your platform. You need to remove only their specific data from the Users table.
- Action: DELETE FROM Users WHERE user_id = 'user123';
Use TRUNCATE when you want to reset a table.
- Scenario 1: You have a staging table (stg_daily_sales) that you load with fresh data every day. Before each morning's ETL job, you need to wipe all of yesterday's data from it, but keep the table structure for the new load.
- Scenario 2: You are running integration tests and need to quickly clear out all records from test tables (test_orders, test_users) to ensure a clean state before each test run.
- Action: TRUNCATE TABLE stg_daily_sales;
Use DROP when the object is no longer needed at all.
- Scenario: Your company has decided to deprecate an old feature, and the LegacyFeatureFlags table that supported it is now completely obsolete. Keeping it just clutters the database.
- Action: DROP TABLE LegacyFeatureFlags;

3. How do you filter and sort query results?

WHERE: Used to filter rows based on a specific condition.
ORDER BY: Used to sort the result set in ascending (ASC) or descending (DESC) order.

Example: Find all employees in the Sales department (department_id = 3) and order them by salary, highest first.

SELECT first_name, last_name, salary
FROM Employees
WHERE department_id = 3
ORDER BY salary DESC;

JOINs

4. Explain the different types of SQL JOINs.

JOINs are used to combine rows from two or more tables based on a related column between them.

INNER JOIN: Returns only the rows where the join condition is met in both tables. It's the most common type of join.

Example: Get a list of all employees and their department names. (Chris Green will not be included as his department_id is NULL). sql SELECT e.first_name, e.last_name, d.name AS department_name FROM Employees e INNER JOIN Departments d ON e.department_id = d.id;
LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table (Employees), and the matched rows from the right table (Departments). If there is no match, the columns from the right table will be NULL.

Example: Get a list of all employees and their department names. (Chris Green will be included, but his department_name will be NULL). sql SELECT e.first_name, e.last_name, d.name AS department_name FROM Employees e LEFT JOIN Departments d ON e.department_id = d.id;
RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table (Departments), and the matched rows from the left table (Employees). If there is no match, the columns from the left table will be NULL.

Example: Get a list of all departments and the employees in them. (The "Marketing" department will be included, even though it has no employees). sql SELECT e.first_name, e.last_name, d.name AS department_name FROM Employees e RIGHT JOIN Departments d ON e.department_id = d.id;
FULL OUTER JOIN: Returns all rows when there is a match in either the left or the right table. It combines the results of both LEFT and RIGHT joins.

Example: Get a list of all employees and all departments, matching them where possible. (Both Chris Green and the Marketing department will be included). sql SELECT e.first_name, e.last_name, d.name AS department_name FROM Employees e FULL OUTER JOIN Departments d ON e.department_id = d.id;

Aggregation & Grouping

5. What are aggregate functions? Explain `GROUP BY` and `HAVING`.

Aggregate functions perform a calculation on a set of values and return a single value. Common aggregate functions are COUNT(), SUM(), AVG(), MAX(), and MIN().

GROUP BY The GROUP BY clause is used with aggregate functions to group rows that have the same values in specified columns into summary rows.

Example: Find the number of employees in each department.

SELECT d.name, COUNT(e.id) AS number_of_employees
FROM Employees e
JOIN Departments d ON e.department_id = d.id
GROUP BY d.name;

HAVING The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions. HAVING is used to filter groups based on the results of aggregate functions.

Example: Find only the departments with more than 1 employee.

SELECT d.name, COUNT(e.id) AS number_of_employees
FROM Employees e
JOIN Departments d ON e.department_id = d.id
GROUP BY d.name
HAVING COUNT(e.id) > 1;

Key Difference: WHERE filters rows before aggregation, while HAVING filters groups after aggregation.

Advanced Querying (Window Functions)

6. What are window functions, and how are they different from aggregate functions?

Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, which group rows into a single output row, window functions perform the calculation on a "window" of rows but still return a value for every single row.

Key Clause: OVER() is what makes a function a window function. It defines the window of rows to operate on. * PARTITION BY: Divides the rows into partitions (groups). The window function is applied independently to each partition. * ORDER BY: Orders the rows within each partition.

Example: Find the salary of each employee, and also rank them within their department based on salary.

SELECT
    e.first_name,
    d.name AS department_name,
    e.salary,
    RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) as salary_rank_in_dept
FROM
    Employees e
JOIN
    Departments d ON e.department_id = d.id;

Result: | first_name | department_name | salary | salary_rank_in_dept | | :--- | :--- | :--- | :--- | | 'Alice' | 'Engineering' | 120000 | 1 | | 'Jane' | 'Engineering' | 110000 | 2 | | 'John' | 'Engineering' | 90000 | 3 | | 'Peter' | 'Sales' | 80000 | 1 | | 'Bob' | 'Sales' | 75000 | 2 | | 'Emily' | 'Human Resources'| 60000 | 1 |

Common window functions include RANK(), DENSE_RANK(), ROW_NUMBER(), LEAD(), LAG(), and aggregate functions used with an OVER() clause (e.g., AVG(salary) OVER (PARTITION BY department)).

Database Theory & Design

7. What is an index, and why is it important for performance?

An index is a special lookup table that the database search engine can use to speed up data retrieval. Essentially, it's a data structure (typically a B-tree) that stores the values for a specific column or set of columns in a sorted order. It works like the index in the back of a book; instead of scanning the entire book (the table) for a topic, you go to the index to find the page number (the data location) and go directly there.

Real-World Scenario: Searching for a Product

Imagine you have a Products table with 10 million rows. A user on your e-commerce site searches for a product with a specific SKU, ABC-12345.

Without an Index: The database has to perform a full table scan. It starts at row 1 and reads every single row, checking if the product_sku column matches ABC-12345. On average, it would have to read 5 million rows to find the product. This could take several seconds and is very inefficient.
With an Index on product_sku: The database doesn't touch the table first. It goes to the much smaller, sorted index file. Because the index is sorted alphabetically, it can perform a binary search (an "index seek") to find ABC-12345 almost instantly. The index entry contains a pointer directly to the location of the full row in the main table. The database then retrieves just that one row. This process is typically measured in milliseconds.

How to create an index: You would add an index to a column that is frequently used in WHERE clauses or JOIN conditions.

-- Creates a non-clustered index named 'idx_employees_lastname'
-- on the 'last_name' column of the 'Employees' table.
CREATE INDEX idx_employees_lastname ON Employees(last_name);

Trade-offs: * Faster Reads, Slower Writes: Indexes dramatically speed up SELECT queries but can slow down data modification (INSERT, UPDATE, DELETE) because every time you change the data, the index must also be updated to stay in sync. * Storage: Indexes take up additional disk space.

It's a best practice to be selective and only index columns that are frequently queried.

8. What is the difference between a clustered and a non-clustered index?

Clustered Index:
- Determines the physical order of data in a table. The rows of the table are stored on disk in the same order as the clustered index.
- Because of this, there can be only one clustered index per table.
- The primary key of a table is often the clustered index by default.
Non-Clustered Index:
- Has a structure separate from the data rows. It contains the index key values, and each key value has a pointer (a row locator) to the actual data row.
- It's like the index of a book, which is separate from the content and points you to the right page.
- You can have multiple non-clustered indexes on a single table.

9. What is a transaction and what are the ACID properties?

A transaction is a sequence of one or more database operations that are executed as a single logical unit of work. All operations in a transaction must succeed; if any one of them fails, the entire transaction is rolled back.

ACID is an acronym that describes the four key properties of a reliable transaction. The classic example to illustrate this is a bank transfer.

Scenario: Transferring $100 from Alice's account to Bob's account.

The transaction consists of two operations: 1. Debit $100 from Alice's balance. 2. Credit $100 to Bob's balance.

Here is how the ACID properties ensure this works correctly:

Atomicity (All or nothing):
- What it means: The transaction is an indivisible, "atomic" unit. Either both operations succeed, or neither does.
- In our scenario: If the system successfully debits Alice's account but then crashes before it can credit Bob's account, Atomicity ensures that the debit operation is automatically rolled back when the system recovers. The money is "magically" back in Alice's account. It prevents the money from vanishing into thin air.
Consistency (The database stays valid):
- What it means: The transaction must bring the database from one valid state to another, preserving all integrity rules.
- In our scenario: Let's say the total amount of money in the bank across all accounts is $1 million. After the transfer, the total amount of money must still be $1 million. Consistency ensures that a transaction can't just invent or destroy money. The overall state of the database remains correct.
Isolation (Transactions don't step on each other):
- What it means: Transactions are often executed concurrently, but Isolation ensures that they don't interfere with each other. The result of concurrent transactions is the same as if they were executed one after another (serially).
- In our scenario: Imagine that at the exact same moment Alice is sending money to Bob, a separate process is trying to calculate the total interest for all accounts, including Alice's and Bob's. Isolation guarantees that the interest calculation process will see the balances either before the transaction started or after it completed, but never in an inconsistent intermediate state where Alice has been debited but Bob has not yet been credited.
Durability (Once it's done, it stays done):
- What it means: Once a transaction has been successfully committed, it is permanently saved. It will survive any subsequent system failure, like a power outage or server crash.
- In our scenario: Once the transfer is complete and the user gets a "Success" message, that $100 is permanently in Bob's account. Even if the database server's power cord is pulled out a second later, the transaction's result is already written to a transaction log or non-volatile memory and will be reapplied, ensuring the change is durable.

10. What is SQL Injection and how can you prevent it?

SQL Injection is a web security vulnerability that allows an attacker to interfere with the queries that an application makes to its database. It occurs when an application insecurely includes user-provided data in its database queries. This can allow an attacker to bypass authentication, view data they are not normally able to retrieve, and even modify or delete data.

Prevention: The Golden Rule The most effective way to prevent SQL injection is to never trust user input and to use Prepared Statements (also known as Parameterized Queries).

How it works: With prepared statements, the application developer first writes a SQL query template with placeholders for the user input. This template is sent to the database engine, which parses it and compiles a query plan. Only then does the application send the user's input separately. The database engine treats this input strictly as data, not as part of the SQL command, making injection impossible.

Example: Bypassing a Login Form

Imagine a login form that takes a username and password.

Vulnerable Code (in Python, conceptual): The application builds the query by concatenating the raw user input directly into the SQL string. ```python # Unsafe - user input is directly concatenated into the query string username = request.form['username'] # User enters: 'admin' password = request.form['password'] # User enters: 'password'

query = f"SELECT * FROM Users WHERE username = '{username}' AND password = '{password}'"

The query becomes:

SELECT * FROM Users WHERE username = 'admin' AND password = 'password'

An attacker can exploit this. If they enter `' OR '1'='1` as the password, look what happens:python password = "' OR '1'='1" query = f"SELECT * FROM Users WHERE username = 'admin' AND password = '' OR '1'='1'" `` Since'1'='1'is always true, theWHERE` clause becomes true for every row, and the query will likely log the attacker in as the first user in the table (often the admin).
Safe Code (using parameters): Here, we use a placeholder (%s or ? depending on the library) for the user input. ```python username = request.form['username'] password = request.form['password']

query = "SELECT * FROM Users WHERE username = %s AND password = %s" cursor.execute(query, (username, password)) `` Even if an attacker provides a malicious string like' OR '1'='1, the database engine does not execute it. It treats it as a literal string and looks for a user whose password is the exact phrase' OR '1'='1`, which will almost certainly fail, and the attack is thwarted.

Practical Query Challenges

11. Write a query to find the second-highest salary from the `Employees` table.

This is a classic interview question with multiple solutions.

Solution 1: Using a Subquery with MAX()

This approach finds the highest salary first and then finds the highest salary that is less than the overall highest.

SELECT MAX(salary) AS SecondHighestSalary
FROM Employees
WHERE salary < (SELECT MAX(salary) FROM Employees);

How it works: The inner query (SELECT MAX(salary) FROM Employees) finds the absolute highest salary (120000). The outer query then finds the maximum salary among all employees whose salary is less than 120000.
Pros: Conceptually simple and works in almost any version of SQL.
Cons: This method can be inefficient for finding the "Nth" highest value (e.g., 5th highest) as the subqueries would get complicated. It also doesn't gracefully handle returning the employee(s) who have that salary, just the salary value itself.

Solution 2: Using a Window Function (DENSE_RANK)

This is a more modern and flexible approach. DENSE_RANK() is a good choice because it handles ties without skipping ranks.

WITH SalaryRanks AS (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) as salary_rank
    FROM
        Employees
)
SELECT salary
FROM SalaryRanks
WHERE salary_rank = 2;

How it works:
1. The WITH clause creates a Common Table Expression (CTE) called SalaryRanks.
2. Inside the CTE, the DENSE_RANK() window function is used to assign a rank to each unique salary value in descending order. If two employees have the same salary, they get the same rank.
3. The final SELECT statement simply pulls the salary (or salaries) from the CTE where the rank is 2.
Pros: Very flexible (you can easily find the 3rd, 4th, or Nth highest salary by changing the WHERE clause). It also correctly handles cases where multiple people share the second-highest salary.
Cons: Slightly more complex syntax than the subquery method. Requires a version of SQL that supports window functions (most modern databases do).