[ACCEPTED]-What is the most efficient way to write a select statement with a "not in" subquery?-tsql
For starters, here is a link to an old article on my blog about how the NOT IN predicate works in SQL Server (and in other systems too):
You can rewrite it as follows:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
However, most databases will treat these queries the same.
Both queries will use some kind of ANTI JOIN.
The NOT EXISTS form is useful in SQL Server if you want to check two or more columns, since SQL Server does not support this syntax:
SELECT *
FROM Orders o
WHERE (col1, col2) NOT IN
(
SELECT col1, col2
FROM HeldOrders ho
)
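A minimal sketch of the two-column NOT EXISTS rewrite, run against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, purely so it is easy to try. The col1/col2 columns and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema: col1/col2 stand in for whatever column pair identifies a match.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders     (col1 INTEGER, col2 INTEGER);
    CREATE TABLE HeldOrders (col1 INTEGER, col2 INTEGER);
    INSERT INTO Orders     VALUES (1, 10), (1, 20), (2, 10);
    INSERT INTO HeldOrders VALUES (1, 10);
""")

# Each extra column to compare becomes one more predicate in the correlated subquery.
rows = conn.execute("""
    SELECT o.col1, o.col2
    FROM Orders o
    WHERE NOT EXISTS
    (
        SELECT NULL
        FROM HeldOrders ho
        WHERE ho.col1 = o.col1
          AND ho.col2 = o.col2
    )
    ORDER BY o.col1, o.col2
""").fetchall()

print(rows)  # [(1, 20), (2, 10)]
```

Only the pair (1, 10) is held, so the other two orders come back. The same SELECT should work unchanged in SQL Server.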
Note, however, that NOT IN may be tricky due to the way it treats NULL values.
If HeldOrders.OrderID is nullable, no match is found, and the subquery returns even a single NULL, the whole query will return nothing (both IN and NOT IN will evaluate to NULL in this case).
Consider these data:
Orders:
OrderID
---
1
HeldOrders:
OrderID
---
2
NULL
This query:
SELECT *
FROM Orders o
WHERE OrderID NOT IN
(
SELECT OrderID
FROM HeldOrders ho
)
will return nothing, which is probably not what you'd expect.
However, this one:
SELECT *
FROM Orders o
WHERE NOT EXISTS
(
SELECT NULL
FROM HeldOrders ho
WHERE ho.OrderID = o.OrderID
)
will return the row with OrderID = 1.
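The difference is easy to reproduce. Here is a small sketch that loads the sample data above into an in-memory SQLite database via Python's sqlite3 module (an assumption made for convenience; SQLite applies the same three-valued logic to NOT IN as SQL Server does) and runs both queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders     (OrderID INTEGER);
    CREATE TABLE HeldOrders (OrderID INTEGER);
    INSERT INTO Orders     VALUES (1);
    INSERT INTO HeldOrders VALUES (2), (NULL);
""")

# 1 NOT IN (2, NULL) evaluates to NULL, not TRUE, so the row is filtered out.
not_in = conn.execute("""
    SELECT o.OrderID
    FROM Orders o
    WHERE o.OrderID NOT IN
    (
        SELECT ho.OrderID
        FROM HeldOrders ho
    )
""").fetchall()

# The correlated subquery finds no row with OrderID = 1, so NOT EXISTS is TRUE.
not_exists = conn.execute("""
    SELECT o.OrderID
    FROM Orders o
    WHERE NOT EXISTS
    (
        SELECT NULL
        FROM HeldOrders ho
        WHERE ho.OrderID = o.OrderID
    )
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(1,)]
```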
Note that the LEFT JOIN solutions proposed by others are far from being the most efficient.
This query:
SELECT *
FROM Orders o
LEFT JOIN
HeldOrders ho
ON ho.OrderID = o.OrderID
WHERE ho.OrderID IS NULL
will use a filter condition that needs to evaluate and filter out all matching rows, which can be numerous.
The ANTI JOIN method used by both NOT IN and NOT EXISTS only needs to make sure that a record does not exist once per row in Orders, so it will eliminate all possible duplicates first:
- NESTED LOOPS ANTI JOIN and MERGE ANTI JOIN will just skip the duplicates when evaluating HeldOrders.
- A HASH ANTI JOIN will eliminate duplicates when building the hash table.
"Most efficient" is going to be different depending on table sizes, indexes, and so on. In other words, it's going to differ depending on the specific case you're dealing with.
There are three ways I commonly use to accomplish what you want, depending on the situation.
1. Your example works fine if Orders.order_id is indexed, and HeldOrders is fairly small.
2. Another method is the "correlated subquery" which is a slight variation of what you have...
SELECT *
FROM Orders o
WHERE o.Order_ID not in (Select Order_ID
FROM HeldOrders h
where h.order_id = o.order_id)
Note the addition of the where clause. This tends to work better when HeldOrders has a large number of rows. Order_ID needs to be indexed in both tables.
3. Another method I use sometimes is left outer join...
SELECT *
FROM Orders o
left outer join HeldOrders h on h.order_id = o.order_id
where h.order_id is null
When using the left outer join, h.order_id will have a value in it matching o.order_id when there is a matching row. If there isn't a matching row, h.order_id will be NULL. By checking for the NULL values in the where clause you can filter on everything that doesn't have a match.
Each of these variations can work more or less efficiently in various scenarios.
You can use a LEFT OUTER JOIN and check for NULL on the right table.
SELECT O1.*
FROM Orders O1
LEFT OUTER JOIN HeldOrders O2
ON O1.Order_ID = O2.Order_Id
WHERE O2.Order_Id IS NULL
I'm not sure what is the most efficient, but other options are:
1. Use EXISTS
SELECT *
FROM ORDERS O
WHERE NOT EXISTS (SELECT 1
FROM HeldOrders HO
WHERE O.Order_ID = HO.Order_ID)
2. Use EXCEPT
SELECT O.Order_ID
FROM ORDERS O
EXCEPT
SELECT HO.Order_ID
FROM HeldOrders HO
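For what it's worth, here is a quick sketch that checks the LEFT JOIN, NOT EXISTS and EXCEPT variants against each other. It uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server, and the sample rows are made up (no NULLs, one duplicated held order). It says nothing about which form is fastest, only that they agree on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders     (Order_ID INTEGER);
    CREATE TABLE HeldOrders (Order_ID INTEGER);
    INSERT INTO Orders     VALUES (1), (2), (3);
    INSERT INTO HeldOrders VALUES (2), (2);  -- order 2 is held twice
""")

queries = {
    "left_join": """
        SELECT O1.Order_ID
        FROM Orders O1
        LEFT OUTER JOIN HeldOrders O2
          ON O1.Order_ID = O2.Order_ID
        WHERE O2.Order_ID IS NULL
    """,
    "not_exists": """
        SELECT O.Order_ID
        FROM Orders O
        WHERE NOT EXISTS (SELECT 1
                          FROM HeldOrders HO
                          WHERE O.Order_ID = HO.Order_ID)
    """,
    "except": """
        SELECT O.Order_ID
        FROM Orders O
        EXCEPT
        SELECT HO.Order_ID
        FROM HeldOrders HO
    """,
}

# Sort in Python so the comparison does not depend on the row order each plan produces.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

Keep in mind that EXCEPT compares whole rows and removes duplicates from its output, so it only matches the other two forms when you select just the key column and that column is unique in Orders.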
Try
SELECT *
FROM Orders
LEFT JOIN HeldOrders
ON HeldOrders.Order_ID = Orders.Order_ID
WHERE HeldOrders.Order_ID IS NULL