My solutions to various SQL challenges (Leetcode, Hackerrank, etc.)

Skills Covered

  • With Statement and Sub-queries
  • Window Function(RANK, LAG, SUM, etc.)
  • CASE Statement
  • UNION ALL, FULL JOIN, Self-Join

Topics Covered

  • Month-over-month Difference
  • Top N Per Group
  • Ratio

Challenges

Nth Highest Salary

Source: Leetcode

Difficulty: Medium

Write a SQL query to get the nth highest salary from the Employee table.

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

For example, given the above Employee table, the nth highest salary where n = 2 is 200. If there is no nth highest salary, then the query should return null.

+------------------------+
| getNthHighestSalary(2) |
+------------------------+
| 200                    |
+------------------------+

Solution:

  • First, get the dense ranking of each employee (and wrap them in a subquery).
  • Second, select the salary with the ranking n.
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
  RETURN (
      WITH temp AS (
          SELECT
            Id,
            Salary,
            DENSE_RANK() OVER (ORDER BY Salary DESC) AS RANK_DESC
          FROM
            Employee)
      SELECT
        Salary
      FROM 
        temp
      WHERE
        RANK_DESC = N
      LIMIT 1
  );
END

Tags:

Window Function, Sub-queries, DENSE_RANK Function, Return Nth Highest

Department Top Three Salaries

Source: Leetcode

Difficulty: Hard

Table: Employee

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| Id           | int     |
| Name         | varchar |
| Salary       | int     |
| DepartmentId | int     |
+--------------+---------+


Id is the primary key for this table.
Each row contains the ID, name, salary, and department of one employee.

Table: Department

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| Id          | int     |
| Name        | varchar |
+-------------+---------+

Id is the primary key for this table.
Each row contains the ID and the name of one department.

A company’s executives are interested in seeing who earns the most money in each of the company’s departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department.

Write an SQL query to find the employees who are high earners in each of the departments.

Return the result table in any order.

The query result format is in the following example:

Employee table:
+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 85000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
| 7  | Will  | 70000  | 1            |
+----+-------+--------+--------------+

Department table:
+----+-------+
| Id | Name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+

Result table:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Joe      | 85000  |
| IT         | Randy    | 85000  |
| IT         | Will     | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+

In the IT department:
- Max earns the highest unique salary
- Both Randy and Joe earn the second-highest unique salary
- Will earns the third-highest unique salary

In the Sales department:
- Henry earns the highest salary
- Sam earns the second-highest salary
- There is no third-highest salary as there are only two employee

Solution:

  • First, join the two tables together by department id.
  • Second, get the dense ranking of each employee’s salary within their own department (classic window function use case)
  • Return all employees with a ranking smaller than or equal to 3.
WITH temp AS (
    SELECT
        e.Id,
        e.Name AS Employee,
        e.Salary,
        e.DepartmentId,
        d.Name AS Department,
        DENSE_RANK() OVER (PARTITION BY d.Id ORDER BY Salary DESC) AS Rank_Per_Department
    FROM
        Employee AS e
    JOIN
        Department AS d
    ON
        e.DepartmentId = d.Id)

SELECT
    Department,
    Employee,
    SALARY
FROM
    temp
WHERE
    Rank_Per_Department <=3

Tags:

Window Function, Sub-queries, DENSE_RANK Function, Return Top N

Trips and Users

Source: Leetcode

Difficulty: Hard

Table: Trips

+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| Id          | int      |
| Client_Id   | int      |
| Driver_Id   | int      |
| City_Id     | int      |
| Status      | enum     |
| Request_at  | date     |     
+-------------+----------+
Id is the primary key for this table.
The table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are foreign keys to the Users_Id at the Users table.
Status is an ENUM type of (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’).

Table: Users

+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| Users_Id    | int      |
| Banned      | enum     |
| Role        | enum     |
+-------------+----------+
Users_Id is the primary key for this table.
The table holds all users. Each user has a unique Users_Id, and Role is an ENUM type of (‘client’, ‘driver’, ‘partner’).
Status is an ENUM type of (‘Yes’, ‘No’).

Write a SQL query to find the cancellation rate of requests with unbanned users (both client and driver must not be banned) each day between "2013-10-01" and "2013-10-03".

The cancellation rate is computed by dividing the number of canceled (by client or driver) requests with unbanned users by the total number of requests with unbanned users on that day.

Return the result table in any order. Round Cancellation Rate to two decimal points.

The query result format is in the following example:

Trips table:
+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status              | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1  | 1         | 10        | 1       | completed           | 2013-10-01 |
| 2  | 2         | 11        | 1       | cancelled_by_driver | 2013-10-01 |
| 3  | 3         | 12        | 6       | completed           | 2013-10-01 |
| 4  | 4         | 13        | 6       | cancelled_by_client | 2013-10-01 |
| 5  | 1         | 10        | 1       | completed           | 2013-10-02 |
| 6  | 2         | 11        | 6       | completed           | 2013-10-02 |
| 7  | 3         | 12        | 6       | completed           | 2013-10-02 |
| 8  | 2         | 12        | 12      | completed           | 2013-10-03 |
| 9  | 3         | 10        | 12      | completed           | 2013-10-03 |
| 10 | 4         | 13        | 12      | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+

Users table:
+----------+--------+--------+
| Users_Id | Banned | Role   |
+----------+--------+--------+
| 1        | No     | client |
| 2        | Yes    | client |
| 3        | No     | client |
| 4        | No     | client |
| 10       | No     | driver |
| 11       | No     | driver |
| 12       | No     | driver |
| 13       | No     | driver |
+----------+--------+--------+

Result table:
+------------+-------------------+
| Day        | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33              |
| 2013-10-02 | 0.00              |
| 2013-10-03 | 0.50              |
+------------+-------------------+

On 2013-10-01:
  - There were 4 requests in total, 2 of which were canceled.
  - However, the request with Id=2 was made by a banned client (User_Id=2), so it is ignored in the calculation.
  - Hence there are 3 unbanned requests in total, 1 of which was canceled.
  - The Cancellation Rate is (1 / 3) = 0.33
On 2013-10-02:
  - There were 3 requests in total, 0 of which were canceled.
  - The request with Id=6 was made by a banned client, so it is ignored.
  - Hence there are 2 unbanned requests in total, 0 of which were canceled.
  - The Cancellation Rate is (0 / 2) = 0.00
On 2013-10-03:
  - There were 3 requests in total, 1 of which was canceled.
  - The request with Id=8 was made by a banned client, so it is ignored.
  - Hence there are 2 unbanned request in total, 1 of which were canceled.
  - The Cancellation Rate is (1 / 2) = 0.50

Solution:

  • First, get the status(banned or not banned) of each driver and client with a triple join
  • Then, filter out transactions with banned users and outside the time period concerned
  • Next, put the above query inside a WITH statement.
  • Last, calculate the cancellation ratio using a CASE statement.
WITH temp AS (
    SELECT
        t.Client_Id,
        t.Driver_ID,
        t.Status,
        t.Request_at
    FROM
        Trips AS t,
        Users AS uc,
        Users AS ud
    WHERE
        t.Client_Id = uc.Users_Id AND
        t.Driver_Id = ud.Users_Id AND
        uc.Banned = 'No' AND
        ud.Banned = 'No' AND
        t.Request_at >= "2013-10-01" AND
        t.Request_at <= "2013-10-03")

SELECT
    Request_at AS Day,
    ROUND(1-SUM(CASE WHEN Status='completed' THEN 1 ELSE 0 END)/COUNT(*), 2) AS 'Cancellation Rate'
FROM
    temp
GROUP BY
    Day

Tags:

Sub-queries, Triple Joins, CASE Statement, Ratio Calculation

Monthly Percentage Difference

Source: StrataScratch

Difficulty: Hard

Amazon Interview Question

Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year.

The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month’s revenue - last month’s revenue) / last month’s revenue)*100.

id created_at value purchase_id
1 2019-01-01 172692 43
2 2019-01-05 177194 36
3 2019-01-09 109513 30
4 2019-01-13 164911 30
5 2019-01-17 198872 39

Solution:

  • To begin with, get the monthly aggregated revenue.
  • Then, for each month, get its previous month’s revenue in the same row with LAG.
  • Last, just calculate the month-over-month percentage difference.
WITH 
    temp AS (
        SELECT
            TO_CHAR(created_at, 'YYYY-MM') AS year_month,
            SUM(value) AS revenue
        FROM
            sf_transactions
        GROUP BY
            year_month
        ORDER BY
            year_month),
    temp2 AS (
        SELECT
            year_month,
            LAG(revenue) OVER (ORDER BY year_month) AS revenue_previous_month,
            revenue
        FROM
            temp)
    
SELECT
    year_month,
    ROUND(revenue/revenue_previous_month - 1, 4) * 100 AS revenue_diff_pct
FROM
    temp2

Tags:

Window Function, LAG Function, Sub-queries, Month-over-month Percentage

Popularity Percentage

Source: StrataScratch

Difficulty: Hard

Facebook Interview Question

Find the popularity percentage for each user on Facebook.

The popularity percentage is defined as the total number of friends the user has divided by the total number of users on the platform, then converted into a percentage by multiplying by 100. Output each user along with their popularity percentage. Order records in ascending order by user id. The ‘user1’ and ‘user2’ column are pairs of friends.

user1 user2
2 1
1 3
4 1
1 5
1 6

Solution:

  • First of all, create another table by swiping the two columns
  • Second, UNION ALL the two tables so that every person appears in the first column and all their friends the second
  • Third, count the number of their friends by doing a GROUP BY
  • Last, calculate the popularity ratio by diving the number of their friends by the number of users on the platform
WITH 
    temp AS (
        (SELECT
            user1,
            user2
        FROM facebook_friends)
        UNION ALL
        (SELECT
            user2 AS user1,
            user1 AS user2
        FROM facebook_friends)),
        
    temp2 AS (
        SELECT
            user1,
            COUNT(*) AS num_friends
        FROM
            temp
        GROUP BY
            user1    
    )

SELECT
    user1,
    100.00*num_friends/COUNT(*) OVER () AS popularity_percent
FROM
    temp2
ORDER BY
    user1

Tags:

UNION ALL, Sub-queries, Window Function, Ratio

Monthly Percentage Difference

Source: Leetcode

Difficulty: Medium

Mary is a teacher in a middle school and she has a table seat storing students’ names and their corresponding seat ids.

The column id is continuous increment.

Mary wants to change seats for the adjacent students.

Can you write a SQL query to output the result for Mary?

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Abbot   |
|    2    | Doris   |
|    3    | Emerson |
|    4    | Green   |
|    5    | Jeames  |
+---------+---------+

For the sample input, the output is:

+---------+---------+
|    id   | student |
+---------+---------+
|    1    | Doris   |
|    2    | Abbot   |
|    3    | Green   |
|    4    | Emerson |
|    5    | Jeames  |
+---------+---------+

Note:

If the number of students is odd, there is no need to change the last one’s seat.

Solution:

  • First, join the table with itself
  • Second, write down the update rules in the WHERE clause:
    • If the old seat number is odd, add one to it (e.g. 1->2).
    • If the old seat number is even, subtract one from it (e.g. 2->1).
    • If it’s the last seat and the total number is odd, keep it as it is (e.g. 7->7)
SELECT
    old.id,
    new.student
FROM
    seat AS old,
    seat AS new
WHERE
    (old.id % 2 = 1 AND new.id = old.id + 1) OR
    (old.id % 2 = 0 AND new.id = old.id - 1) OR
    (old.id % 2 = 1 AND old.id = (SELECT MAX(id) FROM seat) AND new.id = old.id)
ORDER BY
    old.id

Tags:

Self-Join, Swipe Position, Sub-Query