Friday, February 24, 2012

JOIN Question

Good Morning,

I'm attempting to pull information regarding the orders placed over the last week for all of our customers. I want my result set to show a listing for each customer and any orders they have placed, or if they have not placed any orders, just a line with a NULL value or something similar. Ultimately, I may use a COUNT on this information, but for right now, I am just trying to work around the JOIN issue I'm having. Here is an example of what I am trying to do. I rewrote the query with generic names.

Code Snippet

SELECT Customers.CustID, Orders.OrderID, Order.OrderDate
FROM Customers FULL OUTER JOIN Orders
ON Customers.CustID = Orders.CustID
WHERE Orders.OrderDate >= DATEADD(week, DATEDIFF(week, 0, GETDATE())-1, 0) AND
Orders.OrderDate < DATEADD(week, DATEDIFF(week, 0, GETDATE()), 0)
ORDER BY Orders.OrderDate, Customers.CustID

I've tried this query using LEFT, RIGHT and FULL OUTER JOINS, but none seem to give me the results I desire. Below is an example of what I would like to see.

CustID OrderID OrderDate

00001 02345 08/05/2007
00001 02356 08/05/2007
00002 02347 08/05/2007
00003 NULL 08/05/2007
00004 02349 08/05/2007
00001 NULL 08/06/2007
00002 02358 08/06/2007
00003 02360 08/06/2007
00004 NULL 08/06/2007
.
.
.
.
.

Many thanks in advance for any help you can provide.
--
Anthony

The following query might help you..

Code Snippet

Create Table #orderdata (

[CustID] Varchar(100) ,

[OrderID] Varchar(100) ,

[OrderDate] datetime

);

Insert Into #orderdata Values('00001','02345','08/05/2007');

Insert Into #orderdata Values('00001','02356','08/05/2007');

Insert Into #orderdata Values('00002','02347','08/05/2007');

Insert Into #orderdata Values('00004','02349','08/05/2007');

Insert Into #orderdata Values('00002','02358','08/06/2007');

Insert Into #orderdata Values('00003','02360','08/06/2007');

Create Table #customer (

[CustID] Varchar(100)

);

Insert Into #customer Values('00001');

Insert Into #customer Values('00002');

Insert Into #customer Values('00003');

Insert Into #customer Values('00004');

Create function which will list all the dates between given from & to dates.

Code Snippet

Create Function DaysBetween(@.Startdate datetime,@.Enddate datetime)

returns @.dates Table (Date datetime)

as

Begin

While @.Startdate <= @.Enddate

Begin

Insert Into @.dates values(@.Startdate);

Set @.Startdate = Dateadd(dd,1,@.Startdate);

End

return;

End

First Cross Join the dates & customer id, then use outer join to link with your order table

Code Snippet

Select CustAndDate.CustID, OrderID, CustAndDate.Date from

(Select date,[CustID] from DaysBetween( DATEADD(week, DATEDIFF(week, 0, GETDATE())-2, 0), DATEADD(week, DATEDIFF(week, 0, GETDATE()), 0)) days

cross join #customer) as CustAndDate

Left Outer Join #orderdata orders on orders.[CustID] = CustAndDate.[CustID] and orders.[OrderDate] = CustAndDate.date

|||

You were unable to get the results you desired simple because if a Customer did NOT place an order on a particular day, there would not be a OrderDate available.

As Mani has demonstrated very well, the solution to problems like this is to use some form of a Calendar table, and then to include the Calendar table in the JOIN to have a way to include 'missing dates'. You may find this article about the benefits of having a Calendar table to be useful. (Most databases 'should' have a permanent Calendar table. So many different operations involving dates are simplified by using a Calendar table.)

Datetime -Calendar Table
http://www.aspfaq.com/show.asp?id=2519

Mani -Excellent presentation of your suggested solution!!

|||First, thanks to both Mani and Arnie.

I'm still a bit confused on this. I'm reading about creating a calendar table now, but I guess I don't understand why if I request the names from one table, then join with orders in another table that it would not return at least one row for each name, regardless of whether it had an order associated and my thought was that in those situations, that single row would return NULL and I could work with that.

The reason I went the route of using a JOIN was that I saw if I just pulled from the orders table and a particular customer did not place an order, there would be no way of coming up with those names out of thin air.

I apologize for my continued confusion here.

Thanks Again
--
Anthony
|||

Anthony,

You can create a LEFT JOIN between Customers and Orders, and ALL Customers will be listed -BUT the OrderDate will be NULL.

From you presentation of desired results, there would be no way to indicate a OrderDate for a Customer that has not placed an Order -SINCE you don't know on what date they didn't place it.

So to have a 'placeholder' for the Customer on each date (as in your desired results), you have to JOIN with a table that has all possible dates -therefore the JOIN with a Calendar table.

|||Arnie,

After playing around with this a bit more, I understand now. It turns out, we had created a function previously that has the same effect as the Calendar table. I was able to select the customers and dates using the cross join on the function and customers table, then use the left join on that and the orders table and it worked just as Mani had stated.

I thank you both again for your assistance. I may still look into creating numbers and calendar tables as this seems like it might be a better approach than our existing function.
--
Anthony
|||Having static tables is most definitely better than having a function that is 'on demand' creating a temporary table. Every time you call the function, it creates a table -so every time you use it, you are wasting time and slowing performance.

|||Alright. I've got another question that is directly related to this. If I should post to a new thread, I apologize. What if I wanted to get the same result, but also factor in a particular product? So my example of how the output would look would be the same, but there would be an addition condition of something like ...

Code Snippet

AND ProductID = 23456

The Product ID is in the Orders table, so I tried just tacking that on to the WHERE clause in my outer select (the one with the LEFT OUTER JOIN), but it results in the ones that do not have that product not showing instead of showing zeros. In fact, if I use the query for all orders, it works, but when I add on that extra condition, with no other changes, it does not. I would venture this has something to do with the nested selects and the couple joins.

I feel like I understand what was done in your previous suggestion, but I'm not sure I am familiar enough to modify that in a way to suit this need.

Thanks again for the help.
--
Anthony

|||

Add

AND ProductID = 23456

to the JOIN condition for the #OrderData table (using Mani's sample code).

|||Arnie,

Sorry I didn't get your question until this morning. I believe I am already doing what you are suggesting, but it is not working. Currently, I have my query pulling the orders for specific date ranges dependant upon the day of the week. In my existing query, I'm using the function that we already had setup (I will likely setup the numbers and calendar tables) and so long as I am looking at all orders, it works fine. Here is my query so far. By the way, I've had to modify my query so not to give out sensitive information. I apologize if I've messed something up in the query below, but I assure you, this does give me the desired result.

Code Snippet

DECLARE @.StartTime datetime
DECLARE @.EndTime datetime

IF DATEPART(weekday, GETDATE()) IN (1, 2)
BEGIN
SET @.StartTime = DATEADD(week, DATEDIFF(week, 0, GETDATE())-1, 0)
SET @.EndTime = DATEADD(day, -1, DATEADD(week, DATEDIFF(week, 0, GETDATE()), 0))
END
ELSE IF DATEPART(weekday, GETDATE()) = 3
BEGIN
SET @.StartTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-1, 0)
SET @.EndTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-1, 0)
END
ELSE IF DATEPART(weekday, GETDATE()) = 4
BEGIN
SET @.StartTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-2, 0)
SET @.EndTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-1, 0)
END
ELSE IF DATEPART(weekday, GETDATE()) = 5
BEGIN
SET @.StartTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-3, 0)
SET @.EndTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-1, 0)
END
ELSE IF DATEPART(weekday, GETDATE()) = 6
BEGIN
SET @.StartTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-4, 0)
SET @.EndTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-1, 0)
END
ELSE IF DATEPART(weekday, GETDATE()) = 7
BEGIN
SET @.StartTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-5, 0)
SET @.EndTime = DATEADD(day, DATEDIFF(day, 0, GETDATE())-1, 0)
END

SELECT CustAndDate.CustName, COUNT(Orders.[Order Num]) AS Total, LEFT(DATENAME(weekday, CustAndDate.TheDate), 3) AS DayName
FROM (
SELECT TheDate, CustName, CustID
FROM fn_TimeDimension(@.StartTime,@.EndTime) CROSS JOIN Customers
WHERE CustID IN (00020, 00025, 00027, 00029, 00030, 00032, 00034)
) AS CustAndDate LEFT OUTER JOIN Orders
ON CustAndDate.CustID = Orders.CustID AND CONVERT(char(8), CustAndDate.TheDate, 112) = CONVERT(char(8), Orders.[Order Date], 112)
WHERE DATEPART(weekday, CustAndDate.TheDate) IN (2, 3, 4, 5, 6)
GROUP BY DATEPART(weekday, CustAndDate.TheDate), CustAndDate.CustName, LEFT(DATENAME(weekday, CustAndDate.TheDate), 3)
ORDER BY DATEPART(weekday, CustAndDate.TheDate), CustAndDate.CustName


Now what I'm trying to do is restrict this to only show certain products, so my thought was that after the line that reads

Code Snippet

WHERE DATEPART(weekday, CustAndDate.TheDate) IN (2, 3, 4, 5, 6)


I would add another line that reads

Code Snippet

AND ProdID = 12345


Unfortunately, this does not work. So I don't know if I am putting that additional condition in the wrong place or what. The ProdID is in the Orders table.

Thanks Again for all your help.
--
Anthony

|||

Try something more like this:

LEFT OUTER JOIN Orders
ON ( CustAndDate.CustID = Orders.CustID

AND CONVERT(char(8), CustAndDate.TheDate, 112) = CONVERT(char(8), Orders.[Order Date], 112)

AND ProdID = 12345

)
WHERE ...

AS a side comment, using CONVERT() on both sides of the equality ensures that indexes CANNOT be used and the the entire table has to be scanned, adding significantly to execution time.

|||Arnie,

This seems to work. I will have to do some more testing. The reason I was using CONVERT on both sides of the equation was to get both dates in the same format as I am under the impression that when you are performing a JOIN, your columns that you join on must be the same, so if TheDate = '08/09/2007 00:00:00' and Order Date = '08/09/2007 11:22:05', then you would have an issue. I guess I could use ISODate instead of TheDate from our calendar function or the calendar table once that has been established and just perform the convert on the date from the orders table.

What would you suggest in a situation such as mine?
|||

You could do something like this:

LEFT OUTER JOIN Orders
ON ( CustAndDate.CustID = Orders.CustID

AND ( Orders.OrderDate >= CustAndDate.TheDate

AND Orders.OrderDate < ( dateadd( day, 1, CustAndDate.TheDate ))

)

AND ProdID = 12345

)
WHERE ...

The Orders.OrderDate is not converted, and it will properly use any indexing.

Substitute your Calendar table date for the CustAndDate.TheDate value, if a better 'match'.

|||So in your example, you are joining on the condition that the Order Date be greater than or equal to 'TheDate', which would be midnight of the day in question, but not more than a day difference between the two. I will certainly take your suggestion under consideration. At present, the query is not taking more than a couple seconds to run, however I any opportunity I have to improve performance is not something to be ignored.

Thanks again to you and Mani for your assistance. I've learned quite a lot since working with SQL, but there is always something new that you've not run into before and it is great to have a resource such as these forums and individuals who take the time to assist others.

No comments:

Post a Comment