SQL Server 2019: Intelligent Query Processing
SQL Server 2019 ships with some brand-new features. Many of these features are targeted for Big Data Solutions. No wonder in that, since the world is moving faster towards Big Data and it is absolutely necessary to cope up with that. Today we will discuss one such feature called Approximate Query Processing.
Approximate Query Processing
SQL Server ships with Intelligent Query Processing out of the box with SQL Server 2019 installation. Approximate Query processing is a part of Intelligent Query Processing. Things we will be covering in this article –
- Understand the need for Approximation with Case Study
- Case Study 1: Railway
- Case Study 2: e-commerce
- How to use Approximate Query Processing
- Demo Code for Comparing Performance
- Results
- Limitations
- When to avoid Approximate Query Processing
Understand the need for Approximate Query Processing
Before using any technological
feature, we must understand why we should use it? Should we just start using it
because it says it beneficial? Every beneficial thing is beneficial for a
certain business environment. So, lets first try to understand when using Approximate
Query Processing can bring positive gains for our business.
Case Study 1
Imagine you have an application for Railway Management System. It tracks train schedules, tickets bought, tickets available, passengers’ information and so on. Let’s assume the following things -
- The Railway Management System uses “Tickets” table to storing information about the ticket.
- “Tickets” table is HUGE, holding billions of rows
- “Tickets” Table takes part in INSERT, UPDATE, DELETE – all three operations
- Railway management system has a Dashboard, which shows the number of tickets available or the number of tickets sold.
What can we do to get the number
of tickets? Pretty simple answer is using
SELECT DISTINCT COUNT (*) FROM dbo.TICKETS
But imagine the memory and IO pressure on your server since you have billions of rows there. The scenario can be very bad if this query has to run multiple times so that the Dashboard is up to date. In this scenario, you should use Approximate Query Processing if –
- You need responsiveness more than the accuracy
- You want your server not to face memory/ IO pressure
Case Study 2
Now imagine you have an e-commerce
system and inventory management system is integrated right into it. Similar to
our Railway Management System, lets say we have a Dashboard for the inventories
and it showcases the number of inventories currently available in your system
with the help of following query
SELECT DISTINCT COUNT (*) FROM dbo.INVENTORY
You can also use Approximate Query Processing here if –
- You don’t need the exact number of inventories which takes a lot of time, rather -
- You are happy with an approximation of inventory count which takes far less time to execute.
How to use Approximate Query Processing
Now you have understood when you
can use Approximate Query Processing, lets start how we can use it and gain
better performance.
Demo Code
We will be using Microsoft GitHub repo for Intelligent Query Processing
ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 150; GO USE [WideWorldImportersDW]; GO -- Compare execution time and distinct counts SELECT COUNT(DISTINCT [WWI Order ID]) FROM [Fact].[OrderHistoryExtended] OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR SELECT APPROX_COUNT_DISTINCT([WWI Order ID]) FROM [Fact].[OrderHistoryExtended] OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR
Understanding the Query
- We have set the compatibility level to 150 so that we can get this SQL Server 2019 feature
- The first portion under the compare section simply counts the distinct WWI Order Id, which is similar to our select count (*) from dbo.Tickets used in our case study.
- We have used option with DISALLOW_BATCH_EXECUTION since SQL Server 2019 comes with two variants namely rowstore and batch executions
- We used RECOMPILE option so that we can be assured that our query is not getting benefit from cached execution plans.
Results
After we have executed both the version of queries, let's compare the memory pressure on the server. We have enabled actual execution plan to see the difference.
- Our age-old version of Select count (*) used approximate 14 GB of memory (memory grant)
- Our new APPROX_COUNT_DISTINCT used only about 70 KB of memory (memory grant)
As you can see, APPROX_COUNT_DISTINCT resulted in an unbelievable performance gain!
Limitations
With APPROX_COUNT_DISTINCT, we cannot guarantee the actual output, that’s why the naming starts
with APPROX_. That raises a question, “How much inaccurate is our approximation?”
The answer is –
·
The APPROX_COUNT_DISTINCT is guaranteed to have no more
than 2% of error.
· The APPROX_COUNT_DISTINCT is guaranteed to have no more than 2% of error with 97% of probability.
When NOT to use Approximate Query Processing
Since the approximate Query Processing is not 100% accurate, you cannot blindly use this feature. Do not use Approximate Query Processing if –
- Your business application requires the Actual number
- Responsiveness is important to you, but you cannot sacrifice accuracy for responsiveness
- Your business involves financial transactions and approximation is not an option there.
Comments
Post a Comment