Skip to main content

Intelligent Query Processing in SQL Server 2019 Big Data

SQL Server 2019: Intelligent Query Processing

SQL Server 2019 ships with some brand-new features. Many of these features are targeted for Big Data Solutions. No wonder in that, since the world is moving faster towards Big Data and it is absolutely necessary to cope up with that. Today we will discuss one such feature called Approximate Query Processing.

Approximate Query Processing

SQL Server ships with Intelligent Query Processing out of the box with SQL Server 2019 installation. Approximate Query processing is a part of Intelligent Query Processing. Things we will be covering in this article –

  • Understand the need for Approximation with Case Study
    • Case Study 1: Railway
    • Case Study 2: e-commerce
  • How to use Approximate Query Processing
    • Demo Code for Comparing Performance
    • Results
  • Limitations
  • When to avoid Approximate Query Processing

Understand the need for Approximate Query Processing

Before using any technological feature, we must understand why we should use it? Should we just start using it because it says it beneficial? Every beneficial thing is beneficial for a certain business environment. So, lets first try to understand when using Approximate Query Processing can bring positive gains for our business.

Case Study 1

Imagine you have an application for Railway Management System. It tracks train schedules, tickets bought, tickets available, passengers’ information and so on. Let’s assume the following things - 

  • The Railway Management System uses “Tickets” table to storing information about the ticket.
  • “Tickets” table is HUGE, holding billions of rows
  • “Tickets” Table takes part in INSERT, UPDATE, DELETE – all three operations
  • Railway management system has a Dashboard, which shows the number of tickets available or the number of tickets sold.

What can we do to get the number of tickets? Pretty simple answer is using

SELECT DISTINCT COUNT (*) FROM dbo.TICKETS

But imagine the memory and IO pressure on your server since you have billions of rows there. The scenario can be very bad if this query has to run multiple times so that the Dashboard is up to date. In this scenario, you should use Approximate Query Processing if –

  • You need responsiveness more than the accuracy
  • You want your server not to face memory/ IO pressure

Case Study 2

Now imagine you have an e-commerce system and inventory management system is integrated right into it. Similar to our Railway Management System, lets say we have a Dashboard for the inventories and it showcases the number of inventories currently available in your system with the help of following query

SELECT DISTINCT COUNT (*) FROM dbo.INVENTORY

You can also use Approximate Query Processing here if –

  • You don’t need the exact number of inventories which takes a lot of time, rather -
  • You are happy with an approximation of inventory count which takes far less time to execute.

How to use Approximate Query Processing

Now you have understood when you can use Approximate Query Processing, lets start how we can use it and gain better performance.

Demo Code

We will be using Microsoft GitHub repo for Intelligent Query Processing

ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 150;
GO

USE [WideWorldImportersDW];
GO

-- Compare execution time and distinct counts

SELECT COUNT(DISTINCT [WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR


SELECT APPROX_COUNT_DISTINCT([WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR


Understanding the Query

  • We have set the compatibility level to 150 so that we can get this SQL Server 2019 feature
  • The first portion under the compare section simply counts the distinct WWI Order Id, which is similar to our select count (*) from dbo.Tickets used in our case study.
  • We have used option with DISALLOW_BATCH_EXECUTION since SQL Server 2019 comes with two variants namely rowstore and batch executions
  • We used RECOMPILE option so that we can be assured that our query is not getting benefit from cached execution plans.

Results

After we have executed both the version of queries, let's compare the memory pressure on the server. We have enabled actual execution plan to see the difference.

  • Our age-old version of Select count (*) used approximate 14 GB of memory (memory grant)
  • Our new APPROX_COUNT_DISTINCT used only about 70 KB of memory (memory grant)

As you can see, APPROX_COUNT_DISTINCT resulted in an unbelievable performance gain!

   Figure: Approximate Query Processing 
can be faster than this powerboat

Limitations

With APPROX_COUNT_DISTINCT, we cannot guarantee the actual output, that’s why the naming starts with APPROX_. That raises a question, “How much inaccurate is our approximation?” The answer is –

·       The APPROX_COUNT_DISTINCT is guaranteed to have no more than 2% of error.

·       The APPROX_COUNT_DISTINCT is guaranteed to have no more than 2% of error with 97% of probability.

When NOT to use Approximate Query Processing

Since the approximate Query Processing is not 100% accurate, you cannot blindly use this feature. Do not use Approximate Query Processing if –

  • Your business application requires the Actual number
  • Responsiveness is important to you, but you cannot sacrifice accuracy for responsiveness
  • Your business involves financial transactions and approximation is not an option there.

 

 

Comments

Most Loved Posts

SQL Data Tools - Compare Data

Compare Data between two tables SQL Server Database with the same schema architecture can differ in different environments like Dev, Staging, and Production, especially in configuration tables. Let's see how we can easily sync the data in two different tables.

How to deal with Slow SQL Server due to Autogrowth issue

  Why you should not stick to SQL Server’s default Initial file size and autogrowth We hear a lot of these statements : My SQL Server is running slow My Production DB was fine when we started, But it is staggeringly slow now My Business end users are frustrated to wait too long Well, there are lots of reasons why your SQL Server might be slow. Setting the Autogrowth option to default is definitely one of the vital ones which we seem to ignore most of the time. Slow SQL Server and Tortoise SQL Server provides you with some default settings for autogrowth when you install it for the first time. These default cases are defined with increment by 8MB or by 10%. You need to change it to suit your own needs. For Small application, this default value might work but as soon as your system grows, you feel the impact of it more often. What Happens SQL Server Files needs more space SQL Server Requests the Server PC for more space The Server PC takes the request and asks the SQL request...