Monday, February 20, 2012

Join on subquery with distinct?

Can someone give me some reasonable explanation why these two queries are performing dramatically different:

First (slow):

select
tableA.column1,
tableA.column2
from
tableA join (select distinct col1,col2,col3 from myView) vw
on vw.col1=tableA.col1 and vw.col2=tableA.col2 and vw.col3=tableA.col3

Second (fast):

create table #tempTable (col1 varchar(200),col2 int,col3 varchar(200))

insert into #tempTable
select distinct col1,col2,col3 from myView

select

tableA.column1,

tableA.column2

from

tableA join #tempTable vw

on vw.col1=tableA.col1 and vw.col2=tableA.col2 and vw.col3=tableA.col3

SQL Server 2005 Express
Win2K3 Server

Regards,
Marko Simic

Marko:

I can't really mock this without knowing how the view is structured. It is not possible to mock a query plan that involves a view without knowing the components of the view. Can you give details of the makeup of the view?


Dave

|||

Can you post the plan for your queries, such as:

set showplan_text on
--set statistics profile on
go

--your query--


go
--set statistics profile off
set showplan_text off

The showplan one gives you estimated, the profile one gives you actual plan. That will be very helpful in this.

|||

When I executed queries (for generating estimated and actual plans), suddenly problematic one executed much faster.
Most probably, problem was about server's hardware performance, like insufficient RAM, at the moment of query execution.
My assumption is based on the fact that problematic query is using subquery, which by many views, is using more memory then query with temp table.

Anyway, I am sending you a link to XLS file with all statistics (of query with subquery) you asked, in case that something else could cause strange behavior and can be seen from this.

http://139.142.50.130/test/statistics.xls

Thank you for your efforts

|||

Without both to compare it is really impossible to say too confidently, but there are a couple of really large looking table scans (Clustered Index Scan = ordered table scan) followed by a hash match join. This kind of operation can be greatly affected by hardware performance/contention, as the hash match executes faster with more ram.

Comparing the plans might shed some light on it, but if it runs adequately now... :)

|||The optimiser will expand out queries like the first one. Just one of those annoying things about it. You could improve the speed by putting changing it to "select top (999999) distinct col1, col2, col3 from myView", because that will force the server to materialise the table first. Any number will do, so long as it's larger than the number of rows you expect to get out (otherwise you'll lose rows).

Which effectively is the same as populating a temporary table. :)

Rob|||

First of all, thank you all for your replies.
Rob, will you please tell me, what did you mean by "materialise table".
Did you mean that database would move view from memory to hard drive?
And, if that is true, would db engine make that table within tempdb space (as temp tables are) or in original database space?

|||When I say 'materialise', I just mean that the engine will create the table in memory and then join the other tables to it. It won't move it from memory to the hard drive - it will stay in RAM.

Rob|||

ok.Thanks. Then it works as I expected.

|||Ah, great. Glad to help. Can you mark it as an answer please? I know MS are keen for this to happen on all threads.|||With pleasure :) Done|||:) Cheers

No comments:

Post a Comment