
SQL Server Performance : OR vs UNION ALL

When writing queries, we tend to care little about performance at first. Our first goal is to get the right output. Once the output is correct, we move on to the next phase: performance tuning. Today we will look at what happens when you write OR in a query, do the same thing with UNION ALL, and see which one performs better.

Figure: performance boost means winning the race

SQL Server Version

You can follow these instructions on any version of SQL Server.

Database

You don’t need any specific database for this. I will create temp tables and run the queries against them so that you can follow along easily.

Step 1

We create a temp table #User as follows:

create table #User (
	Id int,
	[type] int,
	[Name] nvarchar(max),
	CreatedOn Datetime)

The table holds an Id column, a type column for the user type, a Name column, and a CreatedOn column. The columns are more or less self-explanatory.

Step 2

After creating the temp table, I will add rows to it, using a simple while loop to insert 100,000 rows.

declare @i int;
set @i = 1;

while @i<=100000
begin
	insert into #User values (@i, @i%10, 'User-' + Cast(@i as varchar(max)), GETDATE())
	set @i = @i +1;
end

Here, the Id column gets an incrementing value from 1 to 100000. The type column gets @i % 10, a number from 0 to 9. The Name column gets 'User-' followed by the iteration count, so a name looks like 'User-34'.
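The loop issues 100,000 single-row inserts, which can take a while. As an optional alternative, a set-based insert along the following lines should load the same data in one statement. This is only a sketch and not part of the original walkthrough: it assumes that sys.all_objects cross joined with itself yields at least 100,000 rows (true on typical installations), and the names Numbers and n are made up for illustration.

```sql
-- Optional alternative to the while loop: run this INSTEAD of the loop, not in addition to it.
-- Assumption: sys.all_objects cross joined with itself returns at least 100,000 rows.
-- The leading semicolon guards against a preceding statement that was not terminated.
;with Numbers as (
	select top (100000)
		row_number() over (order by (select null)) as n
	from sys.all_objects a
	cross join sys.all_objects b
)
insert into #User (Id, [type], [Name], CreatedOn)
select
	n,                                  -- Id: 1 to 100000
	n % 10,                             -- type: 0 to 9
	'User-' + cast(n as varchar(10)),   -- Name: 'User-1', 'User-2', ...
	getdate()                           -- CreatedOn: one timestamp for the whole statement
from Numbers;
```

The main difference from the loop is that every row gets the same CreatedOn value, which does not matter for the tests that follow.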

So, our table looks like this –
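To inspect the rows yourself, queries along these lines should work. They are a small sketch that only assumes the #User table and the insert from Step 2; the alias RowsPerType is made up for illustration.

```sql
-- Peek at the first few rows.
select top (5) Id, [type], [Name], CreatedOn
from #User
order by Id;

-- Check the distribution: each type value from 0 to 9 should appear 10,000 times.
select [type], count(*) as RowsPerType
from #User
group by [type]
order by [type];
```

The second query confirms that the data is evenly distributed across the ten type values, which matters when we compare against skewed data later.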

Statistics IO

To measure the effect on performance, we will turn on the Statistics IO option like this:

Set Statistics IO ON;

This setting applies to the current session, so we need to run it only once per session.

Include Actual Execution Plan

We will also turn on the actual execution plan (Include Actual Execution Plan in SSMS, Ctrl+M) so that we can inspect the plan for each query.
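If you prefer T-SQL over the SSMS toolbar button, the same information can be requested from the session itself. A minimal sketch, using the standard SET STATISTICS options:

```sql
-- Report logical reads per table in the Messages tab.
set statistics io on;

-- Return the actual execution plan as an extra XML result for each query.
set statistics xml on;

-- ... run the test queries here ...

-- Switch both off again when done.
set statistics xml off;
set statistics io off;
```

Either way works for this article; the toolbar button is usually more convenient when reading plans graphically.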

OR [Type]

select Id, [Name]
from #User
where [type] = 4 OR [type] = 6

Scan count 1, logical reads 634


UNION ALL [Type]

select Id, [Name] from #User where [type] = 4

UNION ALL

select Id, [Name] from #User where [type] = 6

Scan count 2, logical reads 1268


OR [Type and Name]

select Id, [Name] from #User where [type] = 4 OR [Name] ='User-34'

Scan count 1, logical reads 634


UNION ALL [Type and Name]

select Id, [Name] from #User where [type] = 4

UNION ALL

select Id, [Name] from #User where [Name] = 'User-34'

Scan count 2, logical reads 1268
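One caveat when comparing these two queries: unlike the [type] = 4 and [type] = 6 pair, these predicates are not mutually exclusive. 'User-34' has [type] = 34 % 10 = 4, so it satisfies both branches. The UNION ALL version returns that row twice, while the OR version returns it once. If the two forms must return the same rows, one way to sketch it is to exclude the first branch's rows from the second branch:

```sql
select Id, [Name] from #User where [type] = 4

UNION ALL

-- Skip rows already returned by the first branch.
-- The IS NULL check keeps rows whose [type] is NULL (there are none in this test data).
select Id, [Name] from #User
where [Name] = 'User-34'
  and ([type] <> 4 or [type] is null)
```

The extra predicate does not change the number of scans on an unindexed table, so the I/O comparison above still holds.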


Findings

For a table with no index on it, we get better performance with OR: each query has to scan the whole table, so the OR version scans it once (634 logical reads) while the UNION ALL version scans it once per branch (2 scans, 1268 logical reads). In the real world, however, most tables have indexes, and for indexed columns the result can be very different and more interesting. Next, we will look at how OR and UNION ALL perform on an indexed table.

Performance Analysis in Skewed Data

We have seen OR perform better when the table has no indexes and its values are more or less evenly distributed.

Now we will use an indexed table with skewed data.

Database

We start with an empty database named “PerformanceTuning”.
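If the database does not exist yet, a statement like the following should create it. This is a minimal sketch that relies on the server's default file locations and sizes, which is an assumption and fine for a throwaway test database.

```sql
-- Create the empty test database (skip this if it already exists).
CREATE DATABASE PerformanceTuning;
GO
```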

USE PerformanceTuning

go

Step 1:

We create a User table with Id (int), Name (char(200)) and UserType (int) columns as follows:

CREATE TABLE dbo.[User]( 
    [Id] int,
    [Name] char(200),
    [UserType] int )
GO

Step 2:

We insert some skewed data as follows (in SSMS and sqlcmd, GO followed by a number runs the preceding batch that many times):

INSERT INTO dbo.[User] VALUES (1,'',1)

GO 10000

INSERT INTO dbo.[User] VALUES (2,'',2)

GO 50

INSERT INTO dbo.[User] VALUES (3,'',3)

GO 50

Note that UserType 2 and UserType 3 have very few rows (50 each) compared to UserType 1 (10,000 rows).
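A quick way to confirm the skew is to count the rows per UserType. The sketch below assumes the table was empty before the inserts in Step 2; the alias RowsPerType is made up for illustration.

```sql
SELECT UserType, COUNT(*) AS RowsPerType
FROM dbo.[User]
GROUP BY UserType
ORDER BY UserType;
-- Expected with the inserts above: 1 -> 10000, 2 -> 50, 3 -> 50
```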

Step 3:

Next, we create a clustered index on the Id column and a nonclustered index on the UserType column as follows:

CREATE CLUSTERED INDEX CL_Col1 ON dbo.[User] ( Id )

GO

CREATE NONCLUSTERED INDEX IX_Col3 ON dbo.[User] ([UserType])

GO

 

You can use the following script to perform all three steps at once:

USE PerformanceTuning
GO

CREATE TABLE dbo.[User](
    [Id] int,
    [Name] char(200),
    [UserType] int )
GO

INSERT INTO dbo.[User] VALUES (1,'',1)
GO 10000

INSERT INTO dbo.[User] VALUES (2,'',2)
GO 50

INSERT INTO dbo.[User] VALUES (3,'',3)
GO 50

CREATE CLUSTERED INDEX CL_Col1 ON dbo.[User] ( Id )
GO

CREATE NONCLUSTERED INDEX IX_Col3 ON dbo.[User] ([UserType])
GO

Now, let's start our performance testing of OR and UNION ALL.

OR

We will use the following query first.

Query 1:

SELECT [Name], [UserType]
FROM dbo.[User]
WHERE UserType = 2

Scan count 1, logical reads 114


We can see that SQL Server chose an index seek in this scenario, which is great.

Query 2:

SELECT [Name], [UserType]
FROM dbo.[User]
WHERE UserType = 2 OR UserType = 3

Scan count 1, logical reads 291

This time SQL Server decided to scan the whole table (an index scan) instead of using an index seek with key lookups, which is frustrating: it reads 291 pages to return 100 rows.
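To check whether the scan is a costing decision by the optimizer rather than a limitation of OR, one experiment is to ask for a seek explicitly with the FORCESEEK table hint (available since SQL Server 2008). This is a sketch that was not part of the original test and is not a recommendation for production code. No numbers are quoted because none were measured for this article; compare the logical reads on your own instance.

```sql
-- Experiment only: force a seek-based plan for the OR query.
SELECT [Name], [UserType]
FROM dbo.[User] WITH (FORCESEEK)
WHERE UserType = 2 OR UserType = 3
```

If the forced plan reads fewer pages than the scan, the optimizer's choice came from its row and cost estimates, not from the OR itself.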

 

UNION ALL

Now let’s see if UNION ALL can be a better alternative.

For the UNION ALL version, we will use the following query:

SELECT [Name], [UserType]
FROM dbo.[User]
WHERE UserType = 2

UNION ALL

SELECT [Name], [UserType]
FROM dbo.[User]
WHERE UserType = 3

The Messages tab shows the following Statistics IO information:

Scan count 2, logical reads 228

That is better than the OR version: 228 logical reads instead of 291.

The execution plan shows that SQL Server used an index seek for both parts of the UNION ALL, meaning SQL Server was less confused this time.

Conclusion:

  • OR can perform better when the table has no indexes and the data is evenly distributed, because it needs only a single scan.
  • UNION ALL can perform better when the filtered column is indexed and the data is skewed, because each branch can use its own index seek.

