Intelligent Query Processing in SQL Server 2019 Big Data

SQL Server 2019: Intelligent Query Processing

SQL Server 2019 ships with some brand-new features. Many of these features are targeted for Big Data Solutions. No wonder in that, since the world is moving faster towards Big Data and it is absolutely necessary to cope up with that. Today we will discuss one such feature called Approximate Query Processing.

Approximate Query Processing

SQL Server ships with Intelligent Query Processing out of the box with SQL Server 2019 installation. Approximate Query processing is a part of Intelligent Query Processing. Things we will be covering in this article –

Understand the need for Approximation with Case Study

Case Study 1: Railway
Case Study 2: e-commerce

How to use Approximate Query Processing

Demo Code for Comparing Performance
Results

Limitations
When to avoid Approximate Query Processing

Understand the need for Approximate Query Processing

Before using any technological feature, we must understand why we should use it? Should we just start using it because it says it beneficial? Every beneficial thing is beneficial for a certain business environment. So, lets first try to understand when using Approximate Query Processing can bring positive gains for our business.

Case Study 1

Imagine you have an application for Railway Management System. It tracks train schedules, tickets bought, tickets available, passengers’ information and so on. Let’s assume the following things -

The Railway Management System uses “Tickets” table to storing information about the ticket.

“Tickets” table is HUGE, holding billions of rows

“Tickets” Table takes part in INSERT, UPDATE, DELETE – all three operations

Railway management system has a Dashboard, which shows the number of tickets available or the number of tickets sold.

What can we do to get the number of tickets? Pretty simple answer is using

SELECT DISTINCT COUNT (*) FROM dbo.TICKETS

But imagine the memory and IO pressure on your server since you have billions of rows there. The scenario can be very bad if this query has to run multiple times so that the Dashboard is up to date. In this scenario, you should use Approximate Query Processing if –

You need responsiveness more than the accuracy

You want your server not to face memory/ IO pressure

Case Study 2

Now imagine you have an e-commerce system and inventory management system is integrated right into it. Similar to our Railway Management System, lets say we have a Dashboard for the inventories and it showcases the number of inventories currently available in your system with the help of following query

SELECT DISTINCT COUNT (*) FROM dbo.INVENTORY

You can also use Approximate Query Processing here if –

You don’t need the exact number of inventories which takes a lot of time, rather -
You are happy with an approximation of inventory count which takes far less time to execute.

How to use Approximate Query Processing

Now you have understood when you can use Approximate Query Processing, lets start how we can use it and gain better performance.

Demo Code

We will be using Microsoft GitHub repo for Intelligent Query Processing

ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 150;
GO

USE [WideWorldImportersDW];
GO

-- Compare execution time and distinct counts

SELECT COUNT(DISTINCT [WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR


SELECT APPROX_COUNT_DISTINCT([WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR

Understanding the Query

We have set the compatibility level to 150 so that we can get this SQL Server 2019 feature
The first portion under the compare section simply counts the distinct WWI Order Id, which is similar to our select count (*) from dbo.Tickets used in our case study.
We have used option with DISALLOW_BATCH_EXECUTION since SQL Server 2019 comes with two variants namely rowstore and batch executions
We used RECOMPILE option so that we can be assured that our query is not getting benefit from cached execution plans.

Results

After we have executed both the version of queries, let's compare the memory pressure on the server. We have enabled actual execution plan to see the difference.

Our age-old version of Select count (*) used approximate 14 GB of memory (memory grant)
Our new APPROX_COUNT_DISTINCT used only about 70 KB of memory (memory grant)

As you can see, APPROX_COUNT_DISTINCT resulted in an unbelievable performance gain!

Figure: Approximate Query Processing

can be faster than this powerboat

Limitations

With APPROX_COUNT_DISTINCT, we cannot guarantee the actual output, that’s why the naming starts with APPROX_. That raises a question, “How much inaccurate is our approximation?” The answer is –

· The APPROX_COUNT_DISTINCT is guaranteed to have no more than 2% of error.

· The APPROX_COUNT_DISTINCT is guaranteed to have no more than 2% of error with 97% of probability.

When NOT to use Approximate Query Processing

Since the approximate Query Processing is not 100% accurate, you cannot blindly use this feature. Do not use Approximate Query Processing if –

Your business application requires the Actual number
Responsiveness is important to you, but you cannot sacrifice accuracy for responsiveness
Your business involves financial transactions and approximation is not an option there.

SQL Server Pain Relief

Search This Blog