Skip to main content

Intelligent Query Processing in SQL Server 2019 Big Data

SQL Server 2019: Intelligent Query Processing

SQL Server 2019 ships with some brand-new features. Many of these features are targeted for Big Data Solutions. No wonder in that, since the world is moving faster towards Big Data and it is absolutely necessary to cope up with that. Today we will discuss one such feature called Approximate Query Processing.

Approximate Query Processing

SQL Server ships with Intelligent Query Processing out of the box with SQL Server 2019 installation. Approximate Query processing is a part of Intelligent Query Processing. Things we will be covering in this article –

  • Understand the need for Approximation with Case Study
    • Case Study 1: Railway
    • Case Study 2: e-commerce
  • How to use Approximate Query Processing
    • Demo Code for Comparing Performance
    • Results
  • Limitations
  • When to avoid Approximate Query Processing

Understand the need for Approximate Query Processing

Before using any technological feature, we must understand why we should use it? Should we just start using it because it says it beneficial? Every beneficial thing is beneficial for a certain business environment. So, lets first try to understand when using Approximate Query Processing can bring positive gains for our business.

Case Study 1

Imagine you have an application for Railway Management System. It tracks train schedules, tickets bought, tickets available, passengers’ information and so on. Let’s assume the following things - 

  • The Railway Management System uses “Tickets” table to storing information about the ticket.
  • “Tickets” table is HUGE, holding billions of rows
  • “Tickets” Table takes part in INSERT, UPDATE, DELETE – all three operations
  • Railway management system has a Dashboard, which shows the number of tickets available or the number of tickets sold.

What can we do to get the number of tickets? Pretty simple answer is using

SELECT DISTINCT COUNT (*) FROM dbo.TICKETS

But imagine the memory and IO pressure on your server since you have billions of rows there. The scenario can be very bad if this query has to run multiple times so that the Dashboard is up to date. In this scenario, you should use Approximate Query Processing if –

  • You need responsiveness more than the accuracy
  • You want your server not to face memory/ IO pressure

Case Study 2

Now imagine you have an e-commerce system and inventory management system is integrated right into it. Similar to our Railway Management System, lets say we have a Dashboard for the inventories and it showcases the number of inventories currently available in your system with the help of following query

SELECT DISTINCT COUNT (*) FROM dbo.INVENTORY

You can also use Approximate Query Processing here if –

  • You don’t need the exact number of inventories which takes a lot of time, rather -
  • You are happy with an approximation of inventory count which takes far less time to execute.

How to use Approximate Query Processing

Now you have understood when you can use Approximate Query Processing, lets start how we can use it and gain better performance.

Demo Code

We will be using Microsoft GitHub repo for Intelligent Query Processing

ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 150;
GO

USE [WideWorldImportersDW];
GO

-- Compare execution time and distinct counts

SELECT COUNT(DISTINCT [WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR


SELECT APPROX_COUNT_DISTINCT([WWI Order ID])
FROM [Fact].[OrderHistoryExtended]
OPTION (USE HINT('DISALLOW_BATCH_MODE'), RECOMPILE); -- Isolating out BMOR


Understanding the Query

  • We have set the compatibility level to 150 so that we can get this SQL Server 2019 feature
  • The first portion under the compare section simply counts the distinct WWI Order Id, which is similar to our select count (*) from dbo.Tickets used in our case study.
  • We have used option with DISALLOW_BATCH_EXECUTION since SQL Server 2019 comes with two variants namely rowstore and batch executions
  • We used RECOMPILE option so that we can be assured that our query is not getting benefit from cached execution plans.

Results

After we have executed both the version of queries, let's compare the memory pressure on the server. We have enabled actual execution plan to see the difference.

  • Our age-old version of Select count (*) used approximate 14 GB of memory (memory grant)
  • Our new APPROX_COUNT_DISTINCT used only about 70 KB of memory (memory grant)

As you can see, APPROX_COUNT_DISTINCT resulted in an unbelievable performance gain!

   Figure: Approximate Query Processing 
can be faster than this powerboat

Limitations

With APPROX_COUNT_DISTINCT, we cannot guarantee the actual output, that’s why the naming starts with APPROX_. That raises a question, “How much inaccurate is our approximation?” The answer is –

·       The APPROX_COUNT_DISTINCT is guaranteed to have no more than 2% of error.

·       The APPROX_COUNT_DISTINCT is guaranteed to have no more than 2% of error with 97% of probability.

When NOT to use Approximate Query Processing

Since the approximate Query Processing is not 100% accurate, you cannot blindly use this feature. Do not use Approximate Query Processing if –

  • Your business application requires the Actual number
  • Responsiveness is important to you, but you cannot sacrifice accuracy for responsiveness
  • Your business involves financial transactions and approximation is not an option there.

 

 

Comments

Most Loved Posts

How to encrypt an entire excel file using AES-128 encryption

What Is AES? AES stands for Advanced Encryption Standard. It is a symmetric block cipher that is used by the U.S. government to protect classified information. AES is used worldwide to protect classified data around the world. AES is essential in cybersecurity, electronic data protection, and computer security. Variations of AES AES is used in three block cipher versions namely AES 128, AES 192, AES 256. AES 128 uses 128-bit key length to encrypt and decrypt block messages. This is a symmetric secret key which means it uses same secret key for encrypting and decrypting message blocks. AES Encryption of Excel File Today we will AES 128 to encrypt an Excel File. The excel file looks like the following image. To encrypt this excel file, we will use C#. First, we need to create a console project. Let's name the project ExcelEncryption. First, we try to understand what we need to achieve. We need a KEY and Initialization Vector (IV) pair for AES 128 encryption. Let’s say, th

SQL Insider 01 : An Anatomy of SELECT

Introduction When we write queries, we tend to think about the internals very little. In the new series of SQL Insider, I shall try to demonstrate what your SQL Server has to go through when you write a specific query, more specifically a specific operator. In the series, we shall try to cover all the important operators in SQL. Our today's SQL participant in SELECT. SELECT  With the SELECT query, we can select one, some, or all the columns of a SQL table. The typical syntax for SELECT is like this  SELECT * FROM Sales.SalesOrderDetail SELECT sod.OrderQty, sod.UnitPrice FROM Sales.SalesOrderDetail sod Please note that we will not be dealing with WHERE clause in today's episode.  Database We will be using AdventureWorks2019 Database for the demonstration Important Configuration We will be setting STATISTICS IO ON like this - SET STATISTICS IO ON; SET STATISTICS IO ON ; We will turn on Actual Execution Plan to examine the query SQL Insider Let's start with