When writing queries, we tend to care little about performance at first. Our first goal is to get the output right. Once the output is correct, we move on to the next phase: performance tuning. Today we will try to understand what happens when you write OR in your query. We will then do the same thing with UNION ALL and see which one performs better.
Figure: performance boost means winning the race
SQL Server Version
You can follow these instructions on any version of SQL Server.
Database
You don’t need any specific database for this. I will create temp tables and run the queries against them so that you can follow along easily.
Step 1
We create a temp table #User as follows
create table #User ( Id int, [type] int, [Name] nvarchar(max), CreatedOn Datetime)
This table holds an Id column, a type column that stores the user type, a Name column, and a CreatedOn column. The columns are more or less self-explanatory.
Step 2
Once the temp table is created, we add rows to it. A simple WHILE loop inserts 100,000 rows into the table.
declare @i int;
set @i = 1;
while @i <= 100000
begin
    insert into #User values (@i, @i % 10, 'User-' + Cast(@i as varchar(max)), GETDATE());
    set @i = @i + 1;
end
Here, the Id column receives values incrementing from 1 to 100000. The type column receives a number from 0 to 9 (@i % 10), so each type value appears in 10,000 rows. The Name column receives 'User-' followed by the iteration count, so a Name looks like 'User-34'.
So, our table now holds 100,000 rows, with the type values spread evenly across 0 to 9.
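To see a few rows for yourself, a query like the following sketch can be used. The TOP value and the ORDER BY are arbitrary choices for illustration.

```sql
-- Preview the first few rows of the temp table created in Step 1.
select top (5) Id, [type], [Name], CreatedOn
from #User
order by Id;
```

With the insert loop above, the first row should have Id 1, type 1, and Name 'User-1'.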
Statistics IO
To measure the I/O cost of each query, we turn on the STATISTICS IO option like this:
Set Statistics IO ON;
This setting applies to the current session, so we need to run it only once per session.
Include Actual Execution Plan
We will also turn on the Include Actual Execution Plan option so that we can see the plan SQL Server chooses for each query.
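If you prefer to stay in T-SQL rather than use the toolbar option, the actual plan can also be requested per session. This is a minimal sketch, assuming your client can display XML showplan output (SSMS can).

```sql
-- Return the actual execution plan as XML alongside each result set.
set statistics xml on;

-- ... run the queries under test here ...

-- Turn the option off again when you are done looking at plans.
set statistics xml off;
```

Like STATISTICS IO, this setting only affects the current session.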
OR [Type]
select Id, [Name] from #User where [type] = 4 OR [type] = 6
Scan count 1, logical reads 634
UNION ALL [Type]
select Id, [Name] from #User where [type] = 4
UNION ALL
select Id, [Name] from #User where [type] = 6
Scan count 2, logical reads 1268
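The UNION ALL figure is exactly double the OR figure (2 × 634 = 1268), which is what we would expect if each branch scans the whole unindexed table once. One way to sanity-check this is to look at how many pages the table occupies. The sketch below assumes the #User table from Step 1 still exists in the current session and that you have permission to query sys.dm_db_partition_stats in tempdb.

```sql
-- Page and row counts for the #User heap (temp tables live in tempdb).
select in_row_data_page_count, row_count
from tempdb.sys.dm_db_partition_stats
where object_id = object_id('tempdb..#User');
```

The page count should be close to the logical reads reported for a single scan of the table.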
OR [type and Name]
select Id, [Name] from #User where [type] = 4 OR [Name] ='User-34'
Scan count 1, logical reads 634
UNION ALL [Type and Name]
select Id, [Name] from #User where [type] = 4
UNION ALL
select Id, [Name] from #User where [Name] = 'User-34'
Scan count 2, logical reads 1268
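One caveat with this pair of queries: OR and UNION ALL are only interchangeable when no row satisfies both predicates. With the data from Step 2, the row with Id 34 has type 4 and Name 'User-34', so it matches both branches and UNION ALL returns it twice. The sketch below confirms this and shows one possible way to remove the overlap. The extra [type] <> 4 filter is an addition for illustration, not part of the original queries.

```sql
-- OR returns each matching row once: 10,000 rows with the Step 2 data.
select count(*) as or_rows
from #User
where [type] = 4 or [Name] = 'User-34';

-- UNION ALL keeps the overlapping row twice: 10,001 rows.
select count(*) as union_all_rows
from (
    select Id from #User where [type] = 4
    union all
    select Id from #User where [Name] = 'User-34'
) as u;

-- Excluding rows already returned by the first branch restores 10,000 rows.
select count(*) as deduplicated_rows
from (
    select Id from #User where [type] = 4
    union all
    select Id from #User where [Name] = 'User-34' and [type] <> 4
) as u;
```

This works here because [type] is never NULL in the sample data. With nullable values the second filter would need to account for NULLs, or UNION could be used instead at the cost of a duplicate-removal step.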
Findings
For a table with no index on it, we get better performance with OR, because the table is scanned once instead of once per branch. In the real world, however, most tables have indexes, and for indexed columns the result can be very different and interesting. Next, we will look at how OR and UNION ALL perform on an indexed table.
Performance Analysis in Skewed Data
We have seen OR perform better when the table has no indexes and its values are more or less evenly distributed. Now we will use an indexed table with skewed data.
Database
We start with an empty database named “PerformanceTuning”.
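Creating the database is not shown here. As a minimal sketch, assuming the default file and collation settings are acceptable and you have permission to create databases, it could look like this:

```sql
-- Create the empty database used in this section if it does not exist yet.
if db_id(N'PerformanceTuning') is null
    create database PerformanceTuning;
```

Run this as its own batch before switching to the database.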
USE PerformanceTuning
go
Step 1:
We create a User table with Id (int), Name (char(200)), and UserType (int) columns as follows:
CREATE TABLE dbo.[User]( [Id] int, [Name] char(200), [UserType] int )
GO
Step 2:
We will be inserting some skewed data as follows
INSERT INTO dbo.[User] VALUES (1,'',1)
GO 10000
INSERT INTO dbo.[User] VALUES (2,'',2)
GO 50
INSERT INTO dbo.[User] VALUES (3,'',3)
GO 50
Note that UserType 2 and UserType 3 have very few rows (50 each) compared to UserType 1 (10,000 rows).
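To confirm the skew after running the inserts, a simple aggregate is enough. The column alias below is an arbitrary name chosen for illustration.

```sql
-- Count rows per UserType to confirm the skew introduced in Step 2.
select UserType, count(*) as rows_per_type
from dbo.[User]
group by UserType
order by UserType;
-- Expected with the inserts above: 1 -> 10000, 2 -> 50, 3 -> 50
```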
Step 3:
Next, we create a clustered index on the Id column and a nonclustered index on the UserType column as follows:
CREATE CLUSTERED INDEX CL_Col1 ON dbo.[User] ( Id )
GO
CREATE NONCLUSTERED INDEX IX_Col3 ON dbo.[User] ([UserType])
GO
You can use the following script to perform all three steps at once:
USE PerformanceTuning
go
CREATE TABLE dbo.[User]
( Id int,
[Name] char(200),
[UserType] int )
GO
INSERT INTO dbo.[User] VALUES (1,'',1)
GO 10000
INSERT INTO dbo.[User] VALUES (2,'',2)
GO 50
INSERT INTO dbo.[User] VALUES (3,'',3)
GO 50
CREATE CLUSTERED INDEX CL_Col1 ON dbo.[User] ( Id )
GO
CREATE NONCLUSTERED INDEX IX_Col3 ON dbo.[User] ([UserType])
GO
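Since the optimizer's choices in the following queries depend on what it knows about the data distribution, it can help to look at the statistics behind the nonclustered index. This is a sketch, assuming the index name IX_Col3 from Step 3 and sufficient permissions to view statistics.

```sql
-- Show the histogram SQL Server keeps for the UserType index.
dbcc show_statistics (N'dbo.[User]', IX_Col3) with histogram;
```

Each histogram step lists a UserType value in RANGE_HI_KEY and the estimated number of rows equal to it in EQ_ROWS.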
OR
We will use the following query first.
Query 1:
SELECT
[Name],[UserType]
FROM dbo.[User]
WHERE UserType = 2
Scan count 1, logical reads 114
We can see that SQL Server has decided to use an index seek in this particular scenario, which is great.
Query 2:
SELECT
[Name],[UserType]
FROM dbo.[User]
WHERE UserType = 2 OR UserType = 3
Scan count 1, logical reads 291
We can see that SQL Server has decided to scan the whole table instead of using an index seek with key lookups, which is frustrating given that only 100 of the 10,100 rows qualify.
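To check whether a seek-based plan would actually have been cheaper for this query, one option is to compare it against a version that uses the FORCESEEK table hint. This is only a diagnostic sketch and not part of the comparison in this article. Hints like this are usually best kept out of production queries.

```sql
-- Same predicate as Query 2, but asking the optimizer to use an index seek.
SELECT
[Name],[UserType]
FROM dbo.[User] WITH (FORCESEEK)
WHERE UserType = 2 OR UserType = 3
```

Comparing its logical reads under STATISTICS IO with the 291 reported above shows whether the scan was the cheaper choice in terms of I/O.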
UNION ALL
Now let’s see if UNION ALL can be a better alternative.
For the UNION ALL version, we will use the following query:
SELECT
[Name],[UserType]
FROM dbo.[User]
WHERE UserType = 2
UNION ALL
SELECT
[Name],[UserType]
FROM dbo.[User]
WHERE UserType = 3
If we look at the Messages tab for the Statistics IO information, we see:
Scan count 2, logical reads 228
That looks better than the OR version (228 logical reads versus 291), right?
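Now, let's look at the execution plan of the query (the plan image is not reproduced here). The 228 logical reads are exactly twice the 114 reads of Query 1, which is consistent with each branch performing the same kind of index seek as Query 1.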
Conclusion:
- OR can perform better if the table has uniformly distributed data and there are no indexes on the table.
- UNION ALL can perform better if the table has skewed data and there are indexes on the table.