What is the most efficient way to write a select statement with a "not in" subquery? (tsql)

Accepted answer
Score: 22

For starters, a link to an old article in my blog on how the NOT IN predicate works in SQL Server (and in other systems too):


You can rewrite it as follows:

SELECT  *
FROM    Orders o
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    HeldOrders ho
        WHERE   ho.OrderID = o.OrderID
        )

However, most databases will treat these queries the same.

Both of these queries will use some kind of ANTI JOIN.

The NOT EXISTS form is useful in SQL Server if you want to check two or more columns, since SQL Server does not support this syntax:

SELECT  *
FROM    Orders o
WHERE   (col1, col2) NOT IN
        (
        SELECT  col1, col2
        FROM    HeldOrders ho
        )
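
For comparison, here is a minimal sketch of the same two-column check written with NOT EXISTS, which SQL Server does accept. The table variables and sample rows are made up for illustration, and col1 and col2 are just the placeholder names from the query above.

```sql
-- Illustrative table variables standing in for Orders and HeldOrders.
DECLARE @Orders TABLE (col1 INT NOT NULL, col2 CHAR(1) NOT NULL);
DECLARE @HeldOrders TABLE (col1 INT NOT NULL, col2 CHAR(1) NOT NULL);

INSERT INTO @Orders (col1, col2) VALUES (1, 'a');
INSERT INTO @Orders (col1, col2) VALUES (2, 'b');
INSERT INTO @HeldOrders (col1, col2) VALUES (1, 'a');
INSERT INTO @HeldOrders (col1, col2) VALUES (2, 'x');

-- A row is excluded only when BOTH columns match a held row.
-- (1, 'a') matches and is dropped; (2, 'b') has no full match and is returned.
SELECT  o.col1, o.col2
FROM    @Orders o
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    @HeldOrders ho
        WHERE   ho.col1 = o.col1
                AND ho.col2 = o.col2
        );
```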

Note, however, that NOT IN may be tricky due to the way it treats NULL values.

If HeldOrders.OrderID is nullable, no matching records are found and the subquery returns even a single NULL, the whole query will return nothing (both IN and NOT IN will evaluate to NULL in this case).

Consider these data:

Orders:

OrderID
---
1

HeldOrders:

OrderID
---
2
NULL

This query:

SELECT  *
FROM    Orders o
WHERE   OrderID NOT IN
        (
        SELECT  OrderID
        FROM    HeldOrders ho
        )

will return nothing, which is probably not what you'd expect.

However, this one:

SELECT  *
FROM    Orders o
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    HeldOrders ho
        WHERE   ho.OrderID = o.OrderID
        )

will return the row with OrderID = 1.
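
To try this yourself, the sample data can be reproduced in a single batch. This is only a sketch: the table variables @Orders and @HeldOrders are illustrative stand-ins for the real tables.

```sql
-- Illustrative table variables holding the sample data shown above.
DECLARE @Orders TABLE (OrderID INT NOT NULL);
DECLARE @HeldOrders TABLE (OrderID INT NULL);

INSERT INTO @Orders (OrderID) VALUES (1);
INSERT INTO @HeldOrders (OrderID) VALUES (2);
INSERT INTO @HeldOrders (OrderID) VALUES (NULL);

-- NOT IN: 1 <> 2 is true, but 1 <> NULL is unknown, so the predicate is
-- unknown for OrderID = 1 and the row is filtered out. Returns no rows.
SELECT  o.OrderID
FROM    @Orders o
WHERE   o.OrderID NOT IN
        (
        SELECT  ho.OrderID
        FROM    @HeldOrders ho
        );

-- NOT EXISTS: no held row satisfies ho.OrderID = 1, so the row survives.
-- Returns OrderID = 1.
SELECT  o.OrderID
FROM    @Orders o
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    @HeldOrders ho
        WHERE   ho.OrderID = o.OrderID
        );
```

If you have to keep NOT IN, adding WHERE ho.OrderID IS NOT NULL inside the subquery removes the NULLs and makes it return the row as well.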

Note that the LEFT JOIN solutions proposed by others are far from being the most efficient.

This query:

SELECT  *
FROM    Orders o
LEFT JOIN
        HeldOrders ho
ON      ho.OrderID = o.OrderID
WHERE   ho.OrderID IS NULL

will use a filter condition that will need to evaluate and filter out all matching rows, which can be numerous.

The ANTI JOIN method used by both IN and EXISTS will just need to make sure that a record does not exist once for each row in Orders, so it will eliminate all possible duplicates first:

  • NESTED LOOPS ANTI JOIN and MERGE ANTI JOIN will just skip the duplicates when evaluating HeldOrders.
  • A HASH ANTI JOIN will eliminate duplicates when building the hash table.
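
One way to check which join method your server actually picks is to compare the estimated plans. The sketch below assumes the Orders and HeldOrders tables already exist and that you run it from SSMS or sqlcmd, since GO is a client-side batch separator rather than T-SQL.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- While SHOWPLAN_TEXT is on, queries are not executed;
-- the server returns the estimated plan as text instead.
SELECT  *
FROM    Orders o
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    HeldOrders ho
        WHERE   ho.OrderID = o.OrderID
        );
GO

SELECT  *
FROM    Orders o
LEFT JOIN
        HeldOrders ho
ON      ho.OrderID = o.OrderID
WHERE   ho.OrderID IS NULL;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

In the first plan, look for an Anti Semi Join operator; in the second, look for a Filter on top of an outer join. What you actually see depends on the server version, indexes and statistics, so treat this as a way to verify rather than a guaranteed result.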
Score: 16

"Most efficient" is going to be different depending on table sizes, indexes, and so on. In other words, it's going to differ depending on the specific case you're using.

There are three ways I commonly use to accomplish what you want, depending on the situation.

1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.

2. Another method is the "correlated subquery" which is a slight variation of what you have...

SELECT *
FROM Orders o
WHERE o.Order_ID not in (Select h.Order_ID
                         FROM HeldOrders h
                         where h.order_id = o.order_id)

Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.
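
If those indexes are missing, they could be created along these lines. The index names are hypothetical, and this is unnecessary where Order_ID is already the primary key or otherwise indexed.

```sql
-- Hypothetical index names; adjust to your own naming convention.
CREATE INDEX IX_Orders_Order_ID ON Orders (Order_ID);
CREATE INDEX IX_HeldOrders_Order_ID ON HeldOrders (Order_ID);
```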

3. Another method I use sometimes is left outer join...

SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null

When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
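
A small sketch makes this easier to see: run the join once without the where clause and once with it. The table variables and sample rows here are made up for illustration.

```sql
-- Illustrative table variables: orders 1 and 2 exist, only order 2 is held.
DECLARE @Orders TABLE (order_id INT NOT NULL);
DECLARE @HeldOrders TABLE (order_id INT NOT NULL);

INSERT INTO @Orders (order_id) VALUES (1);
INSERT INTO @Orders (order_id) VALUES (2);
INSERT INTO @HeldOrders (order_id) VALUES (2);

-- Without the where clause every order is returned.
-- Order 1 has no match, so held_order_id is NULL: rows (1, NULL) and (2, 2).
SELECT  o.order_id, h.order_id AS held_order_id
FROM    @Orders o
left outer join @HeldOrders h on h.order_id = o.order_id
ORDER BY o.order_id;

-- With the where clause only the unmatched order remains: row (1, NULL).
SELECT  o.order_id, h.order_id AS held_order_id
FROM    @Orders o
left outer join @HeldOrders h on h.order_id = o.order_id
where   h.order_id is null;
```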

Each of these variations can work more or less efficiently in various scenarios.

Score: 5

You can use a LEFT OUTER JOIN and check for NULL on the right table.

SELECT O1.*
FROM Orders O1
LEFT OUTER JOIN HeldOrders O2
ON O1.Order_ID = O2.Order_Id
WHERE O2.Order_Id IS NULL
Score: 1

I'm not sure which is the most efficient, but other options are:

1. Use EXISTS

SELECT *
FROM ORDERS O
WHERE NOT EXISTS (SELECT 1
                  FROM HeldOrders HO
                  WHERE O.Order_ID = HO.Order_ID)

2. Use EXCEPT

SELECT O.Order_ID
FROM ORDERS O
EXCEPT
SELECT HO.Order_ID
FROM HeldOrders HO
Score: 0

Try

SELECT *
FROM Orders
LEFT JOIN HeldOrders
ON HeldOrders.Order_ID = Orders.Order_ID
WHERE HeldOrders.Order_ID IS NULL
