Monday, February 20, 2012

JOIN Performace Vs. Multiple RersultSets

Can anyone give me some guidelines as to when to chose JOINS over returning multiple resultsets in a strored procedure..

For eample, I have two tables, Orders and OrderDetails, which are linked by a primary key field. There can be orders w/o a corresponding record in orderdetails.

1.) I can return all orders and their details using a stored preocedure that has:

SELECT o.order_id as OrderId, o.customername, od.order_id, od.orderdate FROM orders AS o LEFT OUTER JOIN orderdetails AS od ON (o.order_id=od.order_id)

2.) I can do the same by returning two results sets in a different stored procedure:

SELECT order_id, customername FROM orders

SELECT order_id, orderdate FROM orderdetails

I think the client processing time for the second option will be slightly less, because the resultset I need to filter will only be as big as the orederdetails table (it will not include records for orders that have no details). Regardless, I think this would only make for a small performance gain, so if Option 1 is better in Database performace, I would probably go with that.

I assume the method to choose also depends on table size and # of JOINS. Any guidance would be appreciated. Thanks,

Al

JOIN is a Cartesian operation INNER JOIN is fastest because technically the tables are equal OUTER JOIN defaults to a mathematical NULL because both tables are not equal and there is fixed table position. UNION is a SET operation it also performs implicit distinct by removing duplicates while UNION ALL leaves the duplicates in. UNION operation is slower than JOIN. So here is the performance list INNER JOIN, OUTER JOIN and UNION. I am assuming you know JOIN uses the FROM and ON clause the WHERE is just a filter like the AND operator. Browse SQL Performance by Peter Gulutzan for more info. Hope this helps.|||Thanks for the reply. The various differences in the performance of different types of JOIN is certainly interesting, but I was hoping to get some info on the relative performance of doing a JOIN versus returning multiple resultsets in a single query or stored procedure. I will look into that book, though.|||

I thought that was what I answered you get multiple result set with the UNION operator and it is slower than any JOIN. Try the link below for ADO.NET 2.0 MARS (multiple active result set) from the SQL Server team. Hope this helps.

http://blogs.msdn.com/sqlprogrammability/archive/2006/05/01/MARSIntroduction1.aspx

|||

I have looked into MARS, and that's not exactly what I am talking about either... I am talking about a single stored procedure which has 2 select statements in it like...

Select * From TableA

Select * From TableB

When you have a .net component which returns a dataset from running this procedure, the dataset will contain 2 tables, so you have access to all the data.

The same can be achieved by returning a dataset with one table in it by using a JOIN. My consern is at what point is the JOIN operation more costly than running the two (or more) separate SELECT statements. My understanding of JOINS is that the db has to scan the entire table (the right hand side of the JOIN) in order to find the corresponding records. Wouldn't this take longer than simply returning all the records from the second table, as is done in multiple resultsets?

|||I don't want to confuse you but JOIN started life with the UNION operator but changed math because it was too slow. Check out the book I told you about earlier Peter said when you see those queries remember to rewrite them with the new syntax. There is a limit to the level for OUTER JOIN ANSI SQL limit is four after that you may get strange results. Hope this helps.|||Thanks for the help|||

Caddre:

... There is a limit to the level for OUTER JOIN ANSI SQL limit is four after that you may get strange results. ....

Hello, Caddre

Do you have references for this statement? Plus any SQL Server implementation? I didn't catch this information from Books Online. Thanks

|||

The reference is in SQL Performance by Peter Gulutzan and you did not see it because Microsoft just implemented ANSI SQL 92 JOIN syntax while Peter the person who gave MySQL stored procs in six months was talking about ANSI SQL 1999 because the current version of ANSI SQL is 2003. ANSI SQL experts will tell you about things vendors like Microsoft will not cover in the docs because although their product works with current ANSI SQL standard they don't want users coming with complains so their docs show older ANSI SQL standard in the docs. Try the link below and read what they covered online for SQL Server 2005, if you write T-SQL and not ANSI SQL it is almost confusing.

http://msdn2.microsoft.com/en-us/library/ms187518.aspx

|||

Strictly speaking, doing 2 selects will return less data, since the join returns data from the parent table multiple times. On the other hand, if you do the join, then you don't have to do a lookup to find the parent order for an order detail. I'd go with the second option because it's easier, and there are other ways to handle memory issues.

In general, it's bad form to suck everything down from a table, unless you know that it's small. Assuming that you get 100 orders/day, in 4 years you'll have 100,000 rows in your orders table. I doubt that you'll need them all in memory very often. With a decent where clause, you can cut that down a lot, improving speed and memory use. (I once had to do some major performance enhancements on a faxing application that someone else wrote. I more than doubled it's speed by replacing a "select * from table" to a query that just got the oldest fax that was eligible to be sent out.)

|||

akashenk,

have you read the article where John Papa has done some tests on

Retrieve all of the data from a single joined query into a single DataTable
Retrieve each rowset (three in all) from a query into its own DataTable using subqueries
Retrieve each rowset (three in all) from a query into its own DataTable using joins
Retrieve each rowset (three in all) from a query into its own DataTable using parameters.

http://msdn.microsoft.com/msdnmag/issues/03/06/DataPoints/

|||Yes, but he's using Northwind, which is a demo database. At the last place I worked, retrieving the Customers database took several minutes. Doing a join took negligably longer than not doing a join. (Several aplications had small middle tier servers just for getting Customer inforation in a timely manner.)|||

This was exactly the sort of info I was looking for. I had a suspicion that doing a join could make for slower performance, but i had no way of knowing for sure.

No comments:

Post a Comment