Finding missing orders with a recursive CTE

September 22, 2024

Recently I had a customer write into support for our SQLite course, saying that they paid but they still didn't have access. I looked in the database and, sure enough, their order wasn't in there.

I'm going to walk you through how I used a recursive CTE (Common Table Expression) in SQL to find the missing order. It's an effective method for finding gaps in data, even when dealing with unconventional sequence patterns.

(Prefer to watch instead of read? Check out the video version of this article on YouTube: SQL for fun and profit.)

What is a recursive CTE?

A recursive CTE is a query that references itself, allowing for hierarchical or iterative data processing. It consists of an initial condition and a recursive condition that refers back to the CTE.

It's like a loop in a programming language, but in SQL.

Here's the basic structure of a recursive CTE:

WITH RECURSIVE cte_name AS (
-- Initial condition (base case, or anchor member)
SELECT ...
UNION ALL
-- Recursive member
SELECT ... FROM cte_name WHERE ...
)
 
SELECT * FROM cte_name;
Code highlighting powered by torchlight.dev (A service I created!)

The cte_name is the name of the recursive CTE. The recursive CTE consists of two parts:

  1. Anchor member: This is the initial query that starts the recursion. It's the base case for the recursion.
  2. Recursive member: This is the query that references the CTE itself. This is the loop.

The UNION ALL keyword is used to combine the results of the anchor and recursive members.

The WHERE clause in the recursive member is crucial because it defines the termination condition for the recursion. When the WHERE clause evaluates to an empty result set, the recursion stops.

Now that we have a basic understanding of recursive CTEs, let's see how we can use them!

Generating a sequence with recursive CTEs

Before we explore finding a missing order in the database, let's start with a simple example of a recursive CTE. In this case, we'll use it to generate a sequence of numbers from 1 to 10.

WITH RECURSIVE numbers AS (
-- Anchor member: Start with 1
SELECT 1 AS num
UNION ALL
-- Recursive member: Increment num by 1 in each iteration
SELECT num + 1
FROM numbers -- Selecting from the CTE itself!
WHERE num < 10 -- Termination condition: Stop when num reaches 10
)
-- Select all the numbers generated by the CTE
SELECT * FROM numbers;

Running this query will return the following sequence:

1
2
3
4
5
6
7
8
9
10

This is a nice way to generate sequential data, either numbers or dates, traverse hierarchical or tree structures, or perform cumulative calculations.

The missing order

To the issue at hand: the missing order!

The order numbers are generated externally by our payment processor so they don't follow the traditional incremental order number pattern.

Instead, they're a combination of a store ID and an incremental number that resets in a somewhat unconventional way:

Our store ID is 89230. The order numbers increment normally, but when you combine them it looks weird.

The first order is: 892301 ("89230" + "1")
[...]
The ninth order is: 892309 ("89230" + "9")
The tenth order is: 8923010 ("89230" + "10")

So we went from 892,309 to 8,923,010. Kinda wonky.

This numbering system works well within the scope of a store, but it can make finding gaps in the sequence a bit tricky. We can still use a recursive CTE to find the missing order though!

Here's a simplified version of the orders table for reference:

SELECT * FROM orders;
 
-- | id | order_number | product_name |
-- | ------- | ------------ | -------------------- |
-- | 2983419 | 892301 | SQLite: Early access |
-- | 2983469 | 892302 | SQLite: Early access |
-- | 2983584 | 892303 | SQLite: Early access |
-- | 2983858 | 892304 | SQLite: Early access |
-- | 2985444 | 892305 | SQLite: Early access |
-- | 2985462 | 892306 | SQLite: Early access |
-- | 2985483 | 892307 | SQLite: Early access |
-- | 2985485 | 892308 | SQLite: Early access |
-- | 2985521 | 892309 | SQLite: Early access |
-- | 2985325 | 8923010 | SQLite: Early access |

Based on the orders count from our payment processor, we should have 741 orders. However, the actual count in our database shows only 740. Instead of manually hunting down the missing order, we can use SQL to do the work for us.

Find the missing order with recursive CTE

We're going to use a recursive CTE to generate a list of all possible order numbers, and then compare it against our existing orders to find the gap.

  1. Anchor member / initial condition: We start with the first order number, "892301". This part of the CTE is the base case for the recursion.
SELECT "892301" AS order_number, 1 AS n
Code highlighting powered by torchlight.dev (A service I created!)
  1. Recursive member: We increment the order number by 1 in each iteration until we reach the last possible order number, 741.
UNION ALL
SELECT CONCAT("89230", n+1), n+1
FROM all_orders_numbers
WHERE n < 741
  1. Putting it all together, we select the complete list of order numbers.
WITH RECURSIVE all_orders_numbers as (
SELECT "892301" as order_number, 1 as n
UNION ALL
SELECT CONCAT("89230", n+1), n+1
FROM all_orders_numbers
WHERE n < 741
)
 
SELECT * FROM all_orders_numbers ORDER BY n DESC;
 
-- Partial output
-- | order_number | n |
-- | ------------ | --- |
-- | 89230741 | 741 |
-- | 89230740 | 740 |
-- | 89230739 | 739 |
-- | 89230738 | 738 |
-- | 89230737 | 737 |
-- | 89230736 | 736 |
-- | 89230735 | 735 |
-- | 89230734 | 734 |
-- | 89230733 | 733 |
-- | 89230732 | 732 |
  1. Find the missing order: Finally, we join the recursive CTE with the orders table and filter out the orders that exist. The missing order will be the one that doesn't have a corresponding entry in the orders table. We do this by performing a left join on the order_number column and filtering out the rows where orders.id is null.
WITH RECURSIVE all_orders_numbers AS (
SELECT "892301" as order_number, 1 as n
UNION ALL
SELECT CONCAT("89230", n+1), n+1 FROM all_orders_numbers WHERE n < 741
)
 
SELECT * FROM all_orders_numbers
LEFT JOIN orders ON all_orders_numbers.order_number = orders.order_number
WHERE orders.id is NULL
 
 
-- | order_number | n | id | order_number | product_name |
-- | ------------ | --- | ---- | ------------- | ------------ |
-- | 89230458 | 458 | NULL | NULL | NULL |

Boom. Order 89230458 is the missing order. It exists in the sequence but not in the database.

I can go back to my payment processor, retrieve the missing order, and insert it into my database. Problem solved, all thanks to a recursive CTE!

When to use recursive CTEs

Recursive CTEs are incredibly powerful for any scenario where you need to identify gaps in a sequence. While we used it here to find a missing order number, the same technique can be applied to other types of sequences, like dates. For example, if you need to ensure that every day of the year is accounted for in your data, a recursive CTE can generate a complete list of dates, which you can then join with your data to spot any gaps.

Next time you encounter a missing sequence in your data, remember that you can use recursive CTEs! They're a handy tool to have in your SQL toolkit for solving complex data problems.

YouTube video

Check out the video version of this article on YouTube!

Me

Thanks for reading! My name is Aaron and I write, make videos , and generally try really hard .

If you ever have any questions or want to chat, I'm always on Twitter.

If you want to give me money (please do), you can buy my course on SQLite at HighPerformanceSQLite.com or my course on screencasting at Screencasting.com . On the off chance you're a sophomore at Texas A&M University, you can buy my accounting course at acct229.com .

You can find me on YouTube on my personal channel . If you love podcasts, I got you covered. You can listen to me on Mostly Technical .