Finding missing orders with a recursive CTE
September 22, 2024
Recently I had a customer write into support for our SQLite course, saying that they paid but they still didn't have access. I looked in the database and, sure enough, their order wasn't in there.
I'm going to walk you through how I used a recursive CTE (Common Table Expression) in SQL to find the missing order. It's an effective method for finding gaps in data, even when dealing with unconventional sequence patterns.
(Prefer to watch instead of read? Check out the video version of this article on YouTube: SQL for fun and profit.)
What is a recursive CTE?
A recursive CTE is a query that references itself, allowing for hierarchical or iterative data processing. It consists of an initial condition and a recursive condition that refers back to the CTE.
It's like a loop in a programming language, but in SQL.
Here's the basic structure of a recursive CTE:
WITH RECURSIVE cte_name AS ( -- Initial condition (base case, or anchor member) SELECT ... UNION ALL -- Recursive member SELECT ... FROM cte_name WHERE ...) SELECT * FROM cte_name;
The cte_name
is the name of the recursive CTE. The recursive CTE consists of two parts:
- Anchor member: This is the initial query that starts the recursion. It's the base case for the recursion.
- Recursive member: This is the query that references the CTE itself. This is the loop.
The UNION ALL
keyword is used to combine the results of the anchor and recursive members.
The WHERE
clause in the recursive member is crucial because it defines the termination condition for the recursion.
When the WHERE
clause evaluates to an empty result set, the recursion stops.
Now that we have a basic understanding of recursive CTEs, let's see how we can use them!
Generating a sequence with recursive CTEs
Before we explore finding a missing order in the database, let's start with a simple example of a recursive CTE. In this case, we'll use it to generate a sequence of numbers from 1 to 10.
WITH RECURSIVE numbers AS ( -- Anchor member: Start with 1 SELECT 1 AS num UNION ALL -- Recursive member: Increment num by 1 in each iteration SELECT num + 1 FROM numbers -- Selecting from the CTE itself! WHERE num < 10 -- Termination condition: Stop when num reaches 10)-- Select all the numbers generated by the CTESELECT * FROM numbers;
Running this query will return the following sequence:
12345678910
This is a nice way to generate sequential data, either numbers or dates, traverse hierarchical or tree structures, or perform cumulative calculations.
The missing order
To the issue at hand: the missing order!
The order numbers are generated externally by our payment processor so they don't follow the traditional incremental order number pattern.
Instead, they're a combination of a store ID and an incremental number that resets in a somewhat unconventional way:
Our store ID is 89230
. The order numbers increment normally, but when you combine them it looks weird.
The first order is: 892301 ("89230" + "1")[...]The ninth order is: 892309 ("89230" + "9")The tenth order is: 8923010 ("89230" + "10")
So we went from 892,309 to 8,923,010. Kinda wonky.
This numbering system works well within the scope of a store, but it can make finding gaps in the sequence a bit tricky. We can still use a recursive CTE to find the missing order though!
Here's a simplified version of the orders table for reference:
SELECT * FROM orders; -- | id | order_number | product_name |-- | ------- | ------------ | -------------------- |-- | 2983419 | 892301 | SQLite: Early access |-- | 2983469 | 892302 | SQLite: Early access |-- | 2983584 | 892303 | SQLite: Early access |-- | 2983858 | 892304 | SQLite: Early access |-- | 2985444 | 892305 | SQLite: Early access |-- | 2985462 | 892306 | SQLite: Early access |-- | 2985483 | 892307 | SQLite: Early access |-- | 2985485 | 892308 | SQLite: Early access |-- | 2985521 | 892309 | SQLite: Early access |-- | 2985325 | 8923010 | SQLite: Early access |
Based on the orders count from our payment processor, we should have 741 orders. However, the actual count in our database shows only 740. Instead of manually hunting down the missing order, we can use SQL to do the work for us.
Find the missing order with recursive CTE
We're going to use a recursive CTE to generate a list of all possible order numbers, and then compare it against our existing orders to find the gap.
- Anchor member / initial condition: We start with the first order number, "892301". This part of the CTE is the base case for the recursion.
SELECT "892301" AS order_number, 1 AS n
- Recursive member: We increment the order number by 1 in each iteration until we reach the last possible order number, 741.
UNION ALLSELECT CONCAT("89230", n+1), n+1 FROM all_orders_numbers WHERE n < 741
- Putting it all together, we select the complete list of order numbers.
WITH RECURSIVE all_orders_numbers as ( SELECT "892301" as order_number, 1 as n UNION ALL SELECT CONCAT("89230", n+1), n+1 FROM all_orders_numbers WHERE n < 741) SELECT * FROM all_orders_numbers ORDER BY n DESC; -- Partial output-- | order_number | n |-- | ------------ | --- |-- | 89230741 | 741 |-- | 89230740 | 740 |-- | 89230739 | 739 |-- | 89230738 | 738 |-- | 89230737 | 737 |-- | 89230736 | 736 |-- | 89230735 | 735 |-- | 89230734 | 734 |-- | 89230733 | 733 |-- | 89230732 | 732 |
- Find the missing order: Finally, we join the recursive CTE with the orders table and filter out the orders that
exist. The missing order will be the one that doesn't have a corresponding entry in the orders table. We do this by
performing a left join on the
order_number
column and filtering out the rows whereorders.id
is null.
WITH RECURSIVE all_orders_numbers AS ( SELECT "892301" as order_number, 1 as n UNION ALL SELECT CONCAT("89230", n+1), n+1 FROM all_orders_numbers WHERE n < 741) SELECT * FROM all_orders_numbers LEFT JOIN orders ON all_orders_numbers.order_number = orders.order_number WHERE orders.id is NULL -- | order_number | n | id | order_number | product_name |-- | ------------ | --- | ---- | ------------- | ------------ |-- | 89230458 | 458 | NULL | NULL | NULL |
Boom. Order 89230458 is the missing order. It exists in the sequence but not in the database.
I can go back to my payment processor, retrieve the missing order, and insert it into my database. Problem solved, all thanks to a recursive CTE!
When to use recursive CTEs
Recursive CTEs are incredibly powerful for any scenario where you need to identify gaps in a sequence. While we used it here to find a missing order number, the same technique can be applied to other types of sequences, like dates. For example, if you need to ensure that every day of the year is accounted for in your data, a recursive CTE can generate a complete list of dates, which you can then join with your data to spot any gaps.
Next time you encounter a missing sequence in your data, remember that you can use recursive CTEs! They're a handy tool to have in your SQL toolkit for solving complex data problems.
YouTube video
Check out the video version of this article on YouTube!