Moving from procedural code to sets: Part 2

time to read 31 min | 6072 words

In the previous post I showed how it is possible to use aggerate functions to avoid procedural code in SQL. The result of this post actually surprised me, since I didn't know that this was possible until I tried it (I intended to show that this is not possible, and then show another way). As an aside, looking into the results of both queries, the cursor based one was wrong by several thousands, because of data type conversions that were happening (losing precision from numeric to money, mostly).

Well, I still want show how to do it in more complex statements, but I'm having trouble of thinking of a problem simple enough to explain that cannot be solved better, as I showed yesterday. So, let's go back to my mythological Payments table, which now looks like this:

Amount To Payments Date
626.329 Gas 23 7/1/06
794.995 Food       10 7/1/06
296.007 Drink       20 7/1/06

The table contains 1,000,000 rows, and has no indexing / primak keys whatsoever.

Here is another contrived example, which now will also calculate the average amount of a single payment, as well as the total number of single payments:

DECLARE

      @current_amount money,

      @current_NumOfPayments int,

      @total money,

      @tmp money,

      @avg money,

      @count NUMERIC

 

DECLARE big_payments CURSOR

            FOR SELECT amount,NumOfPayments FROM Payments

 

SET @total = 0

SET @count = 0

 

OPEN big_payments

 

FETCH NEXT FROM

     big_payments INTO @current_amount, @current_NumOfPayments

 

WHILE @@FETCH_STATUS != -1

BEGIN

 

      IF (@current_NumOfPayments IS NOT NULL AND @current_NumOfPayments <> 0)

            SET @tmp = @current_amount/@current_NumOfPayments   

      ELSE

        SET @tmp = @current_amount

 

      SET @total = @total + @tmp  

        SET @count = @count +1

 

      FETCH NEXT FROM

            big_payments INTO @current_amount, @current_NumOfPayments

 

END

 

CLOSE big_payments

DEALLOCATE big_payments

 

IF (@count <> 0)

      SET @avg = @total / @count

 

SELECT @total, @avg

This query execute in over a minute (01:30, to be exact), this is obviously quite unacceptable, it is possible to use the previous technique here as well, using this statement (runtime over million records is less than a second):

SELECT SUM(

        CASE NumOfPayments

                  WHEN NULL THEN Amount

                  WHEN 0 THEN Amount

                  ELSE Amount/NumOfPayments

        END

) SumOfPayments,

SUM(

        CASE NumOfPayments

                  WHEN NULL THEN Amount

                  WHEN 0 THEN Amount

                  ELSE Amount/NumOfPayments

        END

) / count(*) as AveragePayment
FROM Payments

But this is just ugly, in my opinion, not to mention that it duplicate code. Also, in certain situations it gets ugly (very) fast(think of the case where I need to use both the sum and the average in yet another calculation (the code that started this talk had about seven such calculations, btw).

Here is a better solution, which involve nesting statements, like this:

SELECT

      SumOfPayments,

      AvergatePayment = CASE CountOfPayments

            WHEN NULL THEN

                  SumOfPayments

            WHEN 0 THEN

                  SumOfPayments

            ELSE

                  SumOfPayments / CountOfPayments

            END

FROM (

      SELECT SUM(

              CASE NumOfPayments

                        WHEN NULL THEN Amount

                        WHEN 0 THEN Amount

                        ELSE Amount/NumOfPayments

              END

      ) SumOfPayments,

      count(*) as CountOfPayments

      FROM Payments

) TmpCalc

This method allows for temporary calculations and avoid duplicating code. This statement also run in under a second, and has the same execution plan as the previous one. I have used this method to break up the calculations so I could turn a true masterpiece of cursor based logic into a single (big) SQL statement.