
Monday 25 August 2014

Performance considerations for common SQL queries

A table variable is a viable alternative to sub-queries.

SQL offers many different ways to produce the same results. There is a never-ending debate among SQL developers as to the “best way” or the “most efficient way” to render a result set. Sometimes these disputes even come to blows... well, I am a lover, not a fighter, so I decided to collect some data to show which way is the best and most efficient.
For the queries below, I downloaded the test database from SQLSkills:  http://www.sqlskills.com/sql-server-resources/sql-server-demos/.  There isn’t a lot of data, but enough to prove my point: dbo.member has 10,000 records, and dbo.payment has 15,554.  Our result set contains 6,706 records.
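If you restore the same database and want to confirm you are working with the same data, a quick sanity check of the row counts (assuming the tables live in the dbo schema, as they do in the download) looks like this:

/*******************************/ 
/* Sanity check the row counts  */ 
/*******************************/ 
SELECT  COUNT(*) 'Member Rows'  FROM dbo.member   -- expect 10,000 
SELECT  COUNT(*) 'Payment Rows' FROM dbo.payment  -- expect 15,554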
The following queries produce an identical result set: aggregate payment information from dbo.payment for each member who has made more than one payment, along with that member's first and last name from dbo.member.

/*************/ 
/* Sub Query  */ 
/*************/ 
SELECT  a.[Member Number] , 
        m.lastname , 
        m.firstname , 
        a.[Number Of Payments] , 
        a.[Average Payment] , 
        a.[Total Paid] 
FROM    ( SELECT    member_no 'Member Number' , 
                    AVG(payment_amt) 'Average Payment' , 
                    SUM(payment_amt) 'Total Paid' , 
                    COUNT(Payment_No) 'Number Of Payments' 
          FROM      dbo.payment 
          GROUP BY  member_no 
          HAVING    COUNT(Payment_No) > 1 
        ) a 
        JOIN dbo.member m ON a.[Member Number] = m.member_no 
       
/***************/ 
/* Cross Apply  */ 
/***************/ 
SELECT  ca.[Member Number] , 
        m.lastname , 
        m.firstname , 
        ca.[Number Of Payments] , 
        ca.[Average Payment] , 
        ca.[Total Paid] 
FROM    dbo.member m 
        CROSS APPLY ( SELECT    member_no 'Member Number' , 
                                AVG(payment_amt) 'Average Payment' , 
                                SUM(payment_amt) 'Total Paid' , 
                                COUNT(Payment_No) 'Number Of Payments' 
                      FROM      dbo.payment 
                      WHERE     member_no = m.member_no 
                      GROUP BY  member_no 
                      HAVING    COUNT(Payment_No) > 1 
                    ) ca

/********/                    
/* CTEs  */ 
/********/ 

WITH    Payments 
          AS ( SELECT   member_no 'Member Number' , 
                        AVG(payment_amt) 'Average Payment' , 
                        SUM(payment_amt) 'Total Paid' , 
                        COUNT(Payment_No) 'Number Of Payments' 
               FROM     dbo.payment 
               GROUP BY member_no 
               HAVING   COUNT(Payment_No) > 1 
             ), 
        MemberInfo 
          AS ( SELECT   p.[Member Number] , 
                        m.lastname , 
                        m.firstname , 
                        p.[Number Of Payments] , 
                        p.[Average Payment] , 
                        p.[Total Paid] 
               FROM     dbo.member m 
                        JOIN Payments p ON m.member_no = p.[Member Number] 
             ) 
    SELECT  * 
    FROM    MemberInfo

/************************/ 
/* SELECT with Grouping   */ 
/************************/
SELECT  p.member_no 'Member Number' , 
        m.lastname , 
        m.firstname , 
        COUNT(Payment_No) 'Number Of Payments' , 
        AVG(payment_amt) 'Average Payment' , 
        SUM(payment_amt) 'Total Paid' 
FROM    dbo.payment p 
        JOIN dbo.member m ON m.member_no = p.member_no 
GROUP BY p.member_no , 
        m.lastname , 
        m.firstname 
HAVING  COUNT(Payment_No) > 1
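
If you would rather see elapsed time and I/O numbers than rely on plan cost percentages alone, one option (a sketch only; this is not how the screenshots below were captured) is to wrap the batch in SQL Server's timing and I/O statistics:

/************************************/ 
/* Optional: timing and I/O stats    */ 
/************************************/ 
SET STATISTICS TIME ON 
SET STATISTICS IO ON 

-- run the four queries above as a single batch here 

SET STATISTICS TIME OFF 
SET STATISTICS IO OFF

With those options on, the Messages tab in Management Studio reports CPU time, elapsed time, and logical reads for each statement.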

We can see what is going on in SQL Server's brain by looking at the execution plan. The execution plan shows which steps SQL Server performs, in what order, and what percentage of the batch cost each query accounts for. So, if I execute all four of these queries in a single batch, I will get an idea of their relative cost and how SQL Server builds each plan. We can settle this once and for all. Here is what SQL did with these queries:

[Screenshot: execution plans for the four-query batch]
Not only did the queries take the same amount of time to execute, but SQL Server also generated the same execution plan for each of them. Everybody is right... I guess we can all finally go to lunch together! But wait a second: I may not be a fighter, but I AM an instigator. Let's see how a table variable stacks up. Here is the code I executed:
/********************/
/*  Table Variable  */ 
/********************/ 
DECLARE @AggregateTable TABLE 
    ( 
      member_no INT , 
      AveragePayment MONEY , 
      TotalPaid MONEY , 
      NumberOfPayments INT   -- a count of payments, so INT rather than MONEY 
    )
INSERT  @AggregateTable 
        SELECT  member_no 'Member Number' , 
                AVG(payment_amt) 'Average Payment' , 
                SUM(payment_amt) 'Total Paid' , 
                COUNT(Payment_No) 'Number Of Payments' 
        FROM    dbo.payment 
        GROUP BY member_no 
        HAVING  COUNT(Payment_No) > 1 
 
SELECT  at.member_no 'Member Number' , 
        m.lastname , 
        m.firstname , 
        at.NumberOfPayments 'Number Of Payments' , 
        at.AveragePayment 'Average Payment' , 
        at.TotalPaid 'Total Paid' 
FROM    @AggregateTable at 
        JOIN dbo.member m ON m.member_no = at.member_no
In the interest of keeping things in groupings of 4, I removed the last query from the previous batch and added the table variable query.  Here’s what I got:

[Screenshot: execution plans for the batch including the table variable query]

Since we first insert into the table variable and then read from it, the execution plan shows two separate statements. BUT, the combination of the two steps is only 22% of the batch, so it actually comes out ahead of the other methods even though it is treated as two separate queries in the execution plan. The argument I often hear against table variables is that SQL Server estimates only 1 row for the table in the execution plan. While this is true, the estimate does not come into play until you read from the table variable. In this case, the table variable held 6,706 rows, yet it still outperformed the other queries. People argue that table variables should only be used for hash or lookup tables. The fact is, you control what you put into the variable, so as long as you keep it within reason, these results suggest that a table variable is a viable alternative to sub-queries.
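As an aside, if the 1-row estimate ever does hurt you on a larger data set, a commonly cited workaround (not something I tested here) is to add OPTION (RECOMPILE) to the statement that reads from the table variable, which lets the optimizer see the actual row count at execution time:

SELECT  at.member_no 'Member Number' , 
        m.lastname , 
        m.firstname , 
        at.NumberOfPayments 'Number Of Payments' , 
        at.AveragePayment 'Average Payment' , 
        at.TotalPaid 'Total Paid' 
FROM    @AggregateTable at 
        JOIN dbo.member m ON m.member_no = at.member_no 
OPTION  ( RECOMPILE )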
If anyone does volume testing on this theory, I would be interested in the results.  My suspicion is that there is a breaking point where efficiency goes down the tubes immediately, and it would be interesting to see where the threshold is.
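If you want a quick way to scale the data up for that kind of test, one rough approach (purely illustrative; the 100x multiplier and the use of sys.all_objects as a numbers source are arbitrary choices, not part of the original test) is to fan out the aggregation so the table variable receives many times more rows:

/***********************************************/ 
/* Illustrative volume test: ~100x more rows    */ 
/***********************************************/ 
DECLARE @VolumeTest TABLE 
    ( 
      member_no INT , 
      copy_no INT , 
      AveragePayment MONEY , 
      TotalPaid MONEY , 
      NumberOfPayments INT 
    )

INSERT  @VolumeTest 
        SELECT  p.member_no , 
                x.n , 
                AVG(p.payment_amt) , 
                SUM(p.payment_amt) , 
                COUNT(p.Payment_No) 
        FROM    dbo.payment p 
                CROSS JOIN ( SELECT TOP (100) 
                                    ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) n 
                             FROM   sys.all_objects 
                           ) x            -- each member/payment group repeated 100 times 
        GROUP BY p.member_no , x.n 
        HAVING  COUNT(p.Payment_No) > 1 

SELECT  COUNT(*) 'Rows In Table Variable' FROM @VolumeTest   -- roughly 100x the original 6,706

Joining that back to dbo.member the same way as above should show whether, and when, the single-row estimate finally starts to hurt.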
