Moving from procedural code to sets: Part 2
In the previous post I showed how it is possible to use aggerate functions to avoid procedural code in SQL. The result of this post actually surprised me, since I didn't know that this was possible until I tried it (I intended to show that this is not possible, and then show another way). As an aside, looking into the results of both queries, the cursor based one was wrong by several thousands, because of data type conversions that were happening (losing precision from numeric to money, mostly).
Well, I still want show how to do it in more complex statements, but I'm having trouble of thinking of a problem simple enough to explain that cannot be solved better, as I showed yesterday. So, let's go back to my mythological Payments table, which now looks like this:
Amount | To | Payments | Date |
626.329 | Gas | 23 | 7/1/06 |
794.995 | Food | 10 | 7/1/06 |
296.007 | Drink | 20 | 7/1/06 |
The table contains 1,000,000 rows, and has no indexing / primak keys whatsoever.
Here is another contrived example, which now will also calculate the average amount of a single payment, as well as the total number of single payments:
DECLARE
@current_amount money,
@current_NumOfPayments int,
@total money,
@tmp money,
@avg money,
@count NUMERIC
DECLARE big_payments CURSOR
FOR SELECT amount,NumOfPayments FROM Payments
SET @total = 0
SET @count = 0
OPEN big_payments
FETCH NEXT FROM
big_payments INTO @current_amount, @current_NumOfPayments
WHILE @@FETCH_STATUS != -1
BEGIN
IF (@current_NumOfPayments IS NOT NULL AND @current_NumOfPayments <> 0)
SET @tmp = @current_amount/@current_NumOfPayments
ELSE
SET @tmp = @current_amount
SET @total = @total + @tmp
SET @count = @count +1
FETCH NEXT FROM
big_payments INTO @current_amount, @current_NumOfPayments
END
CLOSE big_payments
DEALLOCATE big_payments
IF (@count <> 0)
SET @avg = @total / @count
SELECT @total, @avg
This query execute in over a minute (01:30, to be exact), this is obviously quite unacceptable, it is possible to use the previous technique here as well, using this statement (runtime over million records is less than a second):
SELECT SUM(
CASE NumOfPayments
WHEN NULL THEN Amount
WHEN 0 THEN Amount
ELSE Amount/NumOfPayments
END
) SumOfPayments,
SUM(
CASE NumOfPayments
WHEN NULL THEN Amount
WHEN 0 THEN Amount
ELSE Amount/NumOfPayments
END
) / count(*) as AveragePayment
FROM Payments
But this is just ugly, in my opinion, not to mention that it duplicate code. Also, in certain situations it gets ugly (very) fast(think of the case where I need to use both the sum and the average in yet another calculation (the code that started this talk had about seven such calculations, btw).
Here is a better solution, which involve nesting statements, like this:
SELECT
SumOfPayments,
AvergatePayment = CASE CountOfPayments
WHEN NULL THEN
SumOfPayments
WHEN 0 THEN
SumOfPayments
ELSE
SumOfPayments / CountOfPayments
END
FROM (
SELECT SUM(
CASE NumOfPayments
WHEN NULL THEN Amount
WHEN 0 THEN Amount
ELSE Amount/NumOfPayments
END
) SumOfPayments,
count(*) as CountOfPayments
FROM Payments
) TmpCalc
This method allows for temporary calculations and avoid duplicating code. This statement also run in under a second, and has the same execution plan as the previous one. I have used this method to break up the calculations so I could turn a true masterpiece of cursor based logic into a single (big) SQL statement.
Comments
Comment preview