RBAR and UDFs

I was reading this great article on Hidden RBAR: Triangular Joins from 2007 and a similar one on CTEs (who would have thought something everyone does is such bad practice?!)

Naturally the first RBAR article was meant to have a follow up with solutions but it has been 7 years now so I guess it's not coming. I wanted to try having a look at the execution plans, because I'm not very good at them, to see if I could work out what kind of things I should look out for in the future.

So I installed the Northwind database and had a look. First I created a table of 10,000 rows (using the bad method of row generation because I haven't finished that article yet!) and had a look at the execution plan of the last part:

Create Table TenK (
	[Id] int identity(1, 1) Constraint PK_TenK_ID Primary Key,
	Something int
)
Go

; With Cte As (
	Select 	1 As Something
	Union All
	Select 	Something + 1
	From 	Cte
	Where 	Something < 10000)
Insert 	TenK
Select 	Something
From 	Cte
Option 	(MaxRecursion 0)
Go

--===== List the items with a running count
Select 	(	Select	Count(*)
		From	TenK tks
		Where	tks.Something <= tk.Something
	) As RunCount,
	tk.Something
From	TenK tk
Order By Something

And onto the execution plan :

What I first notice is that the arrow thickness increases, only a few go into the Index Spool but far more come out. Here are the properties of the Clustered Index Scan:

It looks fine to me and only has a Number of Executions set to 1. Here are the properties for the Index Spool:

Number of Executions = 10,000! And 10,000 rows went in while "Actual Number of Rows "out was 50,005,000! Just like the article predicted. Gross.

Here are the properties for the Stream Aggregate where it also executed 10,000 times:

Sick.

I also found another discussion on RBAR but involving UDFs, titled Performance Considerations of User-Defined Functions in SQL Server 2012. But again specifically I wanted to look at this the execution plans and work out what kind of things I should be on the lookout for in my day to day work.

So I set up the table by running the first bit of code on that page (I've reproduced it in case that page ever goes down, this is all their code and hard work!):

Create Function dbo.Triple(@Input Int) 
Returns Int
As
Begin
	Declare @Result Int
	Set		@Result = @Input * 3
	Return @Result
End
Go

Create Table dbo.LargeTable (
	KeyVal Int Not Null Primary Key,
	DataVal Int Not Null Check (DataVal Between 1 And 10)
)
Go

; With Digits As (
	Select	d
	From	(Values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) d(d)
) 
Insert	Into dbo.LargeTable (KeyVal, DataVal)
Select	1000000 * sm.d 
		+ 100000 * ht.d + 10000 * tt.d + 1000 * st.d 
		+ 100 * h.d + 10 * t.d + s.d + 1, 
		10 * Rand(Checksum(Newid())) + 1 
From	Digits AS s,  
		Digits AS t,  
		Digits AS h, 
		Digits AS st, 
		Digits AS tt, 
		Digits AS ht, 
		Digits AS sm
Go

Create Index NCL_LargeTable_DataVal On dbo.LargeTable (DataVal)
Go

Then I ran the two bits I wanted to compare; one running the UDF normally and the other running the UDF with a smaller result set from a sub-select and thus avoiding most of the RBAR problem. I wanted to only see a few columns from SHOWPLAN so I whipped it into some PowerShell:

Import-Module SQLPS -DisableNameChecking

$queryText = @"
Set Showplan_All On
Go

Select	Max(dbo.Triple(DataVal)) As MaxTriple
From	dbo.LargeTable d

Select	Max(dbo.Triple(DataVal)) As MaxTriple 
From 	(
	Select	Distinct DataVal
	From	dbo.LargeTable
	) d
Go

Set Showplan_All Off
Go
"@

$queryResult = Invoke-SqlCmd -Query $queryText -Database "Temp" -ServerInstance .\SQL2014
$queryResult | Select-Object -Property StmtText, EstimateRows | ConvertTo-Html # | Format-Table  StmtText, EstimateRows -AutoSize

StmtText	EstimateRows
Select Max(dbo.Triple(DataVal)) As MaxTriple From dbo.LargeTable d	1
\|-Stream Aggregate(DEFINE:([Expr1001]=MAX([Temp].[dbo].[Triple]([Temp].[dbo].[LargeTable].[DataVal] as [d].[DataVal]))))	1
\|-Index Scan(OBJECT:([Temp].[dbo].[LargeTable].[NCL_LargeTable_DataVal] AS [d]))	1E+07
UDF: [Temp].[dbo].[Triple]
Create Function dbo.Triple(@Input Int) Returns Int As Begin Declare @Result Int Set @Result = @Input * 3
Return @Result End
Select Max(dbo.Triple(DataVal)) As MaxTriple From ( Select Distinct DataVal From dbo.LargeTable ) d	1
\|-Stream Aggregate(DEFINE:([Expr1002]=MAX([Temp].[dbo].[Triple]([Temp].[dbo].[LargeTable].[DataVal]))))	1
\|-Stream Aggregate(GROUP BY:([Temp].[dbo].[LargeTable].[DataVal]))	10
\|-Index Scan(OBJECT:([Temp].[dbo].[LargeTable].[NCL_LargeTable_DataVal]), ORDERED FORWARD)	1E+07
UDF: [Temp].[dbo].[Triple]
Create Function dbo.Triple(@Input Int) Returns Int As Begin Declare @Result Int Set @Result = @Input * 3
Return @Result End

You can see that in the first execution, a million rows is going to be shoved into the UDF for processing. In the second execution, only 10 rows gets shoved into the UDF. These are estimates, but close enough to see there's a problem.

It is a both more obvious and more obscure in the Actual Execution Plan in SSMS:

The thickness of the arrow shows that there's a lot of rows being passed into Stream Aggregate. But you can't tell visually what the Stream Aggregate is doing, and the second execution plan has two! You have to right click and open the Properties to inspect the detail.

Here you can see in the original run it all gets passed to the UDF. But you have to dig for it.

In the second execution, the rightmost Stream Aggregate received those million rows but doesn't execute any UDF with it.

The second Stream Aggregate has a much smaller arrow (less rows) going into it and it's the one that calls the function.

If I've misinterpreted anything please do let me know.

Share on: