What Makes My Query Non Sargable?


Sargable queries

A query is SARGable (Search ARGument able) if a database engine can take advantage of index.

What makes a query non SARGable?

These are the few factors that makes a query non sargable.

1. Put a column inside a function in the where clause.

select
 s.SalesOrderID,
 s.SalesPersonID,
 s.OrderDate,
 s.SalesOrderNumber
from Sales.SalesOrderHeader s
where YEAR(s.OrderDate) = 2011; -- Row Retrieved: 1607 and Row read : 31465

Note: I have created covering index on the OrderDate:

create nonclustered index IX_Sale_OrderDate on Sales.SalesOrderHeader(OrderDate)
include
(
 SalesOrderID,
 SalesPersonID,
 SalesOrderNumber
);

Let’s look at the execution plan:

blah blah

There are total of 1607 rows for year 2011, but total number of row reads are 31465. Let me explain what happened here. The OrderDate has a non-clustered index but database engine won’t optimize it and this query is going to read all the rows from the table. Because, YEAR function will evaluate against each row that causes the non clustered index scan (scan a whole index which have all the needed columns because of covering index). If you don’t understand scans and seeks, I would refer you to check this article.

We can re-write the query to optimize the index on OrderDate.

select
 s.SalesOrderID,
 s.SalesPersonID,
 s.OrderDate,
 s.SalesOrderNumber
from Sales.SalesOrderHeader s
where s.OrderDate>= '2011-01-01' and s.OrderDate<='2011-12-31'; -- rows retrieved 1607 and row read: 1607

bb2

Now this query is optimizing the index.


2. Using the leading wildcards in the LIKE keyword

SELECT
 b.*
FROM Book b
WHERE b.Title Like '%fairy%';

img

The query defined above won’t use the index which is defined on the column ‘Title’. It performs an ‘index scan’ and read all the rows. Because, we need to scan all the rows where a title contains a string ‘fairy’.

Let’s re-write the query:

SELECT
 b.*
FROM Book b
WHERE b.Title Like 'fairy%';

img

This query uses the index and read only appropriate rows. Since the ‘Book’ table have only 1 record which starts with title ‘fairy’, so it will read only 1 row.


3. Implicit conversions (in the table side)

When we compare two values with different data types, the sql server converts the lower-precedence data type to higher-precedence data type. Let’s look at the example below (which I have taken from the article mssqltips and I am using AdventureWorks2022 database).

-- Note: `CardNumber` is `nvarchar`, there is an index on the `CardNumber`
-- and we are comparing it with an integer 11119775847802
SELECT
 CreditCardID,
 CardNumber
FROM Sales.CreditCard
WHERE CardNumber = 11119775847802;

img

Here CardNumber is a type of nvarchar and we are comparing it with an integer. CardNumber has an index. There is only one record with CardNumber=11119775847802, so it should read only 1 row to fetch the record. But that won’t happen. Sql server database engine reads all the rows to find this record. You wonder why? Ok let me explain.

CardNumber is a nvarchar type, we are searching for record with CardNumber = 11119775847802 which is an integer type. Instead of converting 11119775847802 to the nvarchar, the SQL Server converts all CardNumber rows to an integer, leading to index scan. Are you getting the problem here. We just need a single record, whose CardNumber is **11119775847802**, for that the whole table is being scanned.

So, use ‘11119775847802’instead of 11119775847802 , which avoids implicit conversion in a table side, as shown below:

SELECT
 CreditCardID,
 CardNumber
FROM Sales.CreditCard
WHERE CardNumber = '11119775847802';

img

This query will read 1 row only and use the index effectively.

Note: Sometimes sql server smartly handles the implicit conversion and avoid conversion in the table side, like in this example:

SELECT
 soh.SalesOrderID,
 soh.SalesOrderNumber,
 soh.ModifiedDate
FROM Sales.SalesOrderHeader soh
WHERE soh.SalesOrderID= '43668';

The query above won’t perform an implicit conversion on the table side. Rather than converting all SalesOrderId to nvarchar, it just have converted ‘43668’ to integer.

I have tested various queries, where I am comparing the string values (eg. ‘43668’) to an integer type columns (eg. SalesOrderID). What I found is, sql server handles it very well and converts the comparing value not all the rows. That causes the index seek.

So, sometimes sql optimizer also optimize things.


4. Using optional parameter (where column_name=@some_parameter or @some_parameter is NULL)

I am the victim of this one. I have been doing this for my entire life whenever I was writing raw sql queries (however I have used EF most of the time). And I was not even aware that it is not a good query.

Let’s look at the example :

create or alter procedure GetBooks
 @SearchTerm nvarchar(40)=null
as
begin
 set nocount on;

    select * from Book
    where  @SearchTerm is null or Title like @SearchTerm+'%';

end
go

-- execute the procedure
exec GetBooks @SearchTerm='The Epic Of Gilgamesh'; -- returns 1 row

some_image

The above query wont use the index on Title and end up in index scan (scanning all the rows).

We can solve it in two ways:

(i). Using dynamic sql

create or alter procedure GetBooks
 @SearchTerm nvarchar(40)=null
as
begin

set nocount on;

declare @SQL nvarchar(max);

set @SQL = N'SELECT * FROM Book WHERE 1 = 1';

if @SearchTerm is not null
 begin
 set @SearchTerm = @SearchTerm+'%';
 set @SQL += N' AND Title LIKE @SearchTerm';
 end

exec sp_executesql @SQL, N'@SearchTerm nvarchar(40)', @SearchTerm;
end
go

-- Executing the procedure

exec GetBooks @SearchTerm='The Epic Of Gilgamesh'; -- return 1 row

some_img

(ii). Using OPTION (RECOMPILE)

create or alter procedure GetBooks
 @SearchTerm nvarchar(40)=null
as
begin
 set nocount on;

select * from Book
 where @SearchTerm is null or Title like @SearchTerm+'%'
 option (RECOMPILE);
end

-- executing the procedure
exec GetBooks @SearchTerm='The Epic Of Gilgamesh'; -- returns 1 row

some_img

RECOMPILE causes increased CPU usage. In the most cases it won’t be a huge problem. If it is then consider using dynamic sql. But first look for the RECOMPILE. If a query has long compilation time and it runs frequently then consider dynamic sql. It is explained in depth by Gail, check it out, it has an amazing depth.


5. Using OR In where conditions

SELECT
 SalesOrderID,
 OrderDate,
 SalesOrderNumber,
 CustomerID
FROM
Sales.SalesOrderHeader
WHERE
 CustomerID= 29642
 OR
 OrderDate < '2011-10-31';

In the table Sales.SalesOrderHeader, the column CustomerId has a primary key. And I have defined convering index On OrderDate (SalesOrderID, CustomerID).

CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_OrderDate_Covering
ON Sales.SalesOrderHeader (OrderDate)
INCLUDE (SalesOrderID, CustomerID)

Execution plan:

some_img

This query looks for two different conditions and end up in index scan.

👉 One solution is to use FORCESEEK hint as mentioned here:

SELECT
 SalesOrderID,
 OrderDate,
 SalesOrderNumber,
 CustomerID
FROM
Sales.SalesOrderHeader
WITH (FORCESEEK)
WHERE
 CustomerID= 29642
 OR
 OrderDate < '2011-10-31';

some_img

👉 Another solution is to use union:

SELECT
 SalesOrderID,
 OrderDate,
 SalesOrderNumber,
 CustomerID
FROM
Sales.SalesOrderHeader
WHERE CustomerID= 29642

UNION

SELECT
 SalesOrderID,
 OrderDate,
 SalesOrderNumber,
 CustomerID
FROM
Sales.SalesOrderHeader
WHERE OrderDate < '2011-10-31';

some_img


One more thing

It is not exactly related to the index but it is related to performance. Do not use OR with JOIN. I’ve personally never encountered this situation but I found it here during the research for this article. I have simulated the situation and presented it to you.

SELECT DISTINCT c.name, o.order_id, o.amount, o.status
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
OR c.email = o.customer_email

Execution plan:

some_img

Statistics:

some_img

This query took considerable amount of time to run. Even I have only 10k records in each table.

We can use union to tackle the situation.

SELECT c.name, o.order_id, o.amount, o.status
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id

UNION

SELECT c.name, o.order_id, o.amount, o.status
FROM customers c
left JOIN orders o ON c.email = o.customer_email

some_img

This query took less time and logical reads.


Summary

Any of these points do not always scan the table. This only happens when you have large data set (I am talking about thousands of records). Sql server optimizer is smart enough to optimize a query (thats why it is hired). But it is always good to follow these things.

If we sum up, don’t do this:

  • where some_function(some_column)= some_value.
  • where some_column like %search_string%
  • where some_nvarchar_column= integer_value
  • where column_name=@some_parameter or @some_parameter is NULL
  • where condition1 OR condition2;

References

These are the list of articles I’ve followed during my research. They are amazing and provides depth. If you want to deep dive then you must checkout them.


Canonical link

sql