My solutions to various SQL challenges (Leetcode, Hackerrank, etc.)
Skills Covered
- With Statement and Sub-queries
- Window Function(RANK, LAG, SUM, etc.)
- CASE Statement
- UNION ALL, FULL JOIN, Self-Join
Topics Covered
- Month-over-month Difference
- Top N Per Group
- Ratio
Challenges
Nth Highest Salary
Source: Leetcode
Difficulty: Medium
Write a SQL query to get the nth highest salary from the Employee
table.
+----+--------+
| Id | Salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
For example, given the above Employee
table, the nth highest salary where n = 2 is 200
.
If there is no nth highest salary, then the query should return null
.
+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| 200 |
+------------------------+
Solution:
- First, get the dense ranking of each employee (and wrap them in a subquery).
- Second, select the salary with the ranking n.
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
RETURN (
WITH temp AS (
SELECT
Id,
Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC) AS RANK_DESC
FROM
Employee)
SELECT
Salary
FROM
temp
WHERE
RANK_DESC = N
LIMIT 1
);
END
Tags:
Window Function, Sub-queries, DENSE_RANK Function, Return Nth Highest
Department Top Three Salaries
Source: Leetcode
Difficulty: Hard
Table: Employee
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| Id | int |
| Name | varchar |
| Salary | int |
| DepartmentId | int |
+--------------+---------+
Id is the primary key for this table.
Each row contains the ID, name, salary, and department of one employee.
Table: Department
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| Id | int |
| Name | varchar |
+-------------+---------+
Id is the primary key for this table.
Each row contains the ID and the name of one department.
A company’s executives are interested in seeing who earns the most money in each of the company’s departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department.
Write an SQL query to find the employees who are high earners in each of the departments.
Return the result table in any order.
The query result format is in the following example:
Employee table:
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 85000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
| 7 | Will | 70000 | 1 |
+----+-------+--------+--------------+
Department table:
+----+-------+
| Id | Name |
+----+-------+
| 1 | IT |
| 2 | Sales |
+----+-------+
Result table:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Joe | 85000 |
| IT | Randy | 85000 |
| IT | Will | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
In the IT department:
- Max earns the highest unique salary
- Both Randy and Joe earn the second-highest unique salary
- Will earns the third-highest unique salary
In the Sales department:
- Henry earns the highest salary
- Sam earns the second-highest salary
- There is no third-highest salary as there are only two employee
Solution:
- First, join the two tables together by department id.
- Second, get the dense ranking of each employee’s salary within their own department (classic window function use case)
- Return all employees with a ranking smaller than or equal to 3.
WITH temp AS (
SELECT
e.Id,
e.Name AS Employee,
e.Salary,
e.DepartmentId,
d.Name AS Department,
DENSE_RANK() OVER (PARTITION BY d.Id ORDER BY Salary DESC) AS Rank_Per_Department
FROM
Employee AS e
JOIN
Department AS d
ON
e.DepartmentId = d.Id)
SELECT
Department,
Employee,
SALARY
FROM
temp
WHERE
Rank_Per_Department <=3
Tags:
Window Function, Sub-queries, DENSE_RANK Function, Return Top N
Trips and Users
Source: Leetcode
Difficulty: Hard
Table: Trips
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| Id | int |
| Client_Id | int |
| Driver_Id | int |
| City_Id | int |
| Status | enum |
| Request_at | date |
+-------------+----------+
Id is the primary key for this table.
The table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are foreign keys to the Users_Id at the Users table.
Status is an ENUM type of (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’).
Table: Users
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| Users_Id | int |
| Banned | enum |
| Role | enum |
+-------------+----------+
Users_Id is the primary key for this table.
The table holds all users. Each user has a unique Users_Id, and Role is an ENUM type of (‘client’, ‘driver’, ‘partner’).
Status is an ENUM type of (‘Yes’, ‘No’).
Write a SQL query to find the cancellation rate of requests with unbanned users
(both client and driver must not be banned) each day between
"2013-10-01"
and "2013-10-03"
.
The cancellation rate is computed by dividing the number of canceled (by client or driver) requests with unbanned users by the total number of requests with unbanned users on that day.
Return the result table in any order. Round Cancellation Rate
to two decimal points.
The query result format is in the following example:
Trips table:
+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1 | 1 | 10 | 1 | completed | 2013-10-01 |
| 2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 |
| 3 | 3 | 12 | 6 | completed | 2013-10-01 |
| 4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 |
| 5 | 1 | 10 | 1 | completed | 2013-10-02 |
| 6 | 2 | 11 | 6 | completed | 2013-10-02 |
| 7 | 3 | 12 | 6 | completed | 2013-10-02 |
| 8 | 2 | 12 | 12 | completed | 2013-10-03 |
| 9 | 3 | 10 | 12 | completed | 2013-10-03 |
| 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
Users table:
+----------+--------+--------+
| Users_Id | Banned | Role |
+----------+--------+--------+
| 1 | No | client |
| 2 | Yes | client |
| 3 | No | client |
| 4 | No | client |
| 10 | No | driver |
| 11 | No | driver |
| 12 | No | driver |
| 13 | No | driver |
+----------+--------+--------+
Result table:
+------------+-------------------+
| Day | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
+------------+-------------------+
On 2013-10-01:
- There were 4 requests in total, 2 of which were canceled.
- However, the request with Id=2 was made by a banned client (User_Id=2), so it is ignored in the calculation.
- Hence there are 3 unbanned requests in total, 1 of which was canceled.
- The Cancellation Rate is (1 / 3) = 0.33
On 2013-10-02:
- There were 3 requests in total, 0 of which were canceled.
- The request with Id=6 was made by a banned client, so it is ignored.
- Hence there are 2 unbanned requests in total, 0 of which were canceled.
- The Cancellation Rate is (0 / 2) = 0.00
On 2013-10-03:
- There were 3 requests in total, 1 of which was canceled.
- The request with Id=8 was made by a banned client, so it is ignored.
- Hence there are 2 unbanned request in total, 1 of which were canceled.
- The Cancellation Rate is (1 / 2) = 0.50
Solution:
- First, get the status(banned or not banned) of each driver and client with a triple join
- Then, filter out transactions with banned users and outside the time period concerned
- Next, put the above query inside a WITH statement.
- Last, calculate the cancellation ratio using a CASE statement.
WITH temp AS (
SELECT
t.Client_Id,
t.Driver_ID,
t.Status,
t.Request_at
FROM
Trips AS t,
Users AS uc,
Users AS ud
WHERE
t.Client_Id = uc.Users_Id AND
t.Driver_Id = ud.Users_Id AND
uc.Banned = 'No' AND
ud.Banned = 'No' AND
t.Request_at >= "2013-10-01" AND
t.Request_at <= "2013-10-03")
SELECT
Request_at AS Day,
ROUND(1-SUM(CASE WHEN Status='completed' THEN 1 ELSE 0 END)/COUNT(*), 2) AS 'Cancellation Rate'
FROM
temp
GROUP BY
Day
Tags:
Sub-queries, Triple Joins, CASE Statement, Ratio Calculation
Monthly Percentage Difference
Source: StrataScratch
Difficulty: Hard
Amazon Interview Question
Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year.
The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month’s revenue - last month’s revenue) / last month’s revenue)*100.
id | created_at | value | purchase_id |
---|---|---|---|
1 | 2019-01-01 | 172692 | 43 |
2 | 2019-01-05 | 177194 | 36 |
3 | 2019-01-09 | 109513 | 30 |
4 | 2019-01-13 | 164911 | 30 |
5 | 2019-01-17 | 198872 | 39 |
Solution:
- To begin with, get the monthly aggregated revenue.
- Then, for each month, get its previous month’s revenue in the same row with LAG.
- Last, just calculate the month-over-month percentage difference.
WITH
temp AS (
SELECT
TO_CHAR(created_at, 'YYYY-MM') AS year_month,
SUM(value) AS revenue
FROM
sf_transactions
GROUP BY
year_month
ORDER BY
year_month),
temp2 AS (
SELECT
year_month,
LAG(revenue) OVER (ORDER BY year_month) AS revenue_previous_month,
revenue
FROM
temp)
SELECT
year_month,
ROUND(revenue/revenue_previous_month - 1, 4) * 100 AS revenue_diff_pct
FROM
temp2
Tags:
Window Function, LAG Function, Sub-queries, Month-over-month Percentage
Popularity Percentage
Source: StrataScratch
Difficulty: Hard
Facebook Interview Question
Find the popularity percentage for each user on Facebook.
The popularity percentage is defined as the total number of friends the user has divided by the total number of users on the platform, then converted into a percentage by multiplying by 100. Output each user along with their popularity percentage. Order records in ascending order by user id. The ‘user1’ and ‘user2’ column are pairs of friends.
user1 | user2 |
---|---|
2 | 1 |
1 | 3 |
4 | 1 |
1 | 5 |
1 | 6 |
Solution:
- First of all, create another table by swiping the two columns
- Second, UNION ALL the two tables so that every person appears in the first column and all their friends the second
- Third, count the number of their friends by doing a GROUP BY
- Last, calculate the popularity ratio by diving the number of their friends by the number of users on the platform
WITH
temp AS (
(SELECT
user1,
user2
FROM facebook_friends)
UNION ALL
(SELECT
user2 AS user1,
user1 AS user2
FROM facebook_friends)),
temp2 AS (
SELECT
user1,
COUNT(*) AS num_friends
FROM
temp
GROUP BY
user1
)
SELECT
user1,
100.00*num_friends/COUNT(*) OVER () AS popularity_percent
FROM
temp2
ORDER BY
user1
Tags:
UNION ALL, Sub-queries, Window Function, Ratio
Monthly Percentage Difference
Source: Leetcode
Difficulty: Medium
Mary is a teacher in a middle school and she has a table seat
storing students’ names and their corresponding seat ids.
The column id is continuous increment.
Mary wants to change seats for the adjacent students.
Can you write a SQL query to output the result for Mary?
+---------+---------+
| id | student |
+---------+---------+
| 1 | Abbot |
| 2 | Doris |
| 3 | Emerson |
| 4 | Green |
| 5 | Jeames |
+---------+---------+
For the sample input, the output is:
+---------+---------+
| id | student |
+---------+---------+
| 1 | Doris |
| 2 | Abbot |
| 3 | Green |
| 4 | Emerson |
| 5 | Jeames |
+---------+---------+
Note:
If the number of students is odd, there is no need to change the last one’s seat.
Solution:
- First, join the table with itself
- Second, write down the update rules in the WHERE clause:
- If the old seat number is odd, add one to it (e.g. 1->2).
- If the old seat number is even, subtract one from it (e.g. 2->1).
- If it’s the last seat and the total number is odd, keep it as it is (e.g. 7->7)
SELECT
old.id,
new.student
FROM
seat AS old,
seat AS new
WHERE
(old.id % 2 = 1 AND new.id = old.id + 1) OR
(old.id % 2 = 0 AND new.id = old.id - 1) OR
(old.id % 2 = 1 AND old.id = (SELECT MAX(id) FROM seat) AND new.id = old.id)
ORDER BY
old.id
Tags:
Self-Join, Swipe Position, Sub-Query