Moving from procedural code to sets: Part 2

Jan 08 2006

Moving from procedural code to sets: Part 2

time to read 31 min | 6072 words

In the previous post I showed how it is possible to use aggerate functions to avoid procedural code in SQL. The result of this post actually surprised me, since I didn't know that this was possible until I tried it (I intended to show that this is not possible, and then show another way). As an aside, looking into the results of both queries, the cursor based one was wrong by several thousands, because of data type conversions that were happening (losing precision from numeric to money, mostly).

Well, I still want show how to do it in more complex statements, but I'm having trouble of thinking of a problem simple enough to explain that cannot be solved better, as I showed yesterday. So, let's go back to my mythological Payments table, which now looks like this:

Amount	To	Payments	Date
626.329	Gas	23	7/1/06
794.995	Food	10	7/1/06
296.007	Drink	20	7/1/06

The table contains 1,000,000 rows, and has no indexing / primak keys whatsoever.

Here is another contrived example, which now will also calculate the average amount of a single payment, as well as the total number of single payments:

DECLARE

@current_amount money,

@current_NumOfPayments int,

@total money,

@tmp money,

@avg money,

@count NUMERIC

DECLARE big_payments CURSOR

FOR SELECT amount,NumOfPayments FROM Payments

SET @total = 0

SET @count = 0

OPEN big_payments

FETCH NEXT FROM

big_payments INTO @current_amount, @current_NumOfPayments

WHILE @@FETCH_STATUS != -1

BEGIN

IF (@current_NumOfPayments IS NOT NULL AND @current_NumOfPayments <> 0)

SET @tmp = @current_amount/@current_NumOfPayments

ELSE

SET @tmp = @current_amount

SET @total = @total + @tmp

SET @count = @count +1

FETCH NEXT FROM

big_payments INTO @current_amount, @current_NumOfPayments

END

CLOSE big_payments

DEALLOCATE big_payments

IF (@count <> 0)

SET @avg = @total / @count

SELECT @total, @avg

This query execute in over a minute (01:30, to be exact), this is obviously quite unacceptable, it is possible to use the previous technique here as well, using this statement (runtime over million records is less than a second):

SELECT SUM(

CASE NumOfPayments

WHEN NULL THEN Amount

WHEN 0 THEN Amount

ELSE Amount/NumOfPayments

END

) SumOfPayments,

SUM(

CASE NumOfPayments

WHEN NULL THEN Amount

WHEN 0 THEN Amount

ELSE Amount/NumOfPayments

END

) / count(*) as AveragePayment
FROM Payments

But this is just ugly, in my opinion, not to mention that it duplicate code. Also, in certain situations it gets ugly (very) fast(think of the case where I need to use both the sum and the average in yet another calculation (the code that started this talk had about seven such calculations, btw).

Here is a better solution, which involve nesting statements, like this:

SELECT

SumOfPayments,

AvergatePayment = CASE CountOfPayments

WHEN NULL THEN

SumOfPayments

WHEN 0 THEN

SumOfPayments

ELSE

SumOfPayments / CountOfPayments

END

FROM (

SELECT SUM(

CASE NumOfPayments

WHEN NULL THEN Amount

WHEN 0 THEN Amount

ELSE Amount/NumOfPayments

END

) SumOfPayments,

count(*) as CountOfPayments

FROM Payments

) TmpCalc

This method allows for temporary calculations and avoid duplicating code. This statement also run in under a second, and has the same execution plan as the previous one. I have used this method to break up the calculations so I could turn a true masterpiece of cursor based logic into a single (big) SQL statement.

Tweet Share Share 0 comments

Tags:

Oren Eini

Oren Eini

CEO of RavenDB