Monday, March 19, 2012

JOINing on derived tables?

I have an UPDATE query that sets a "quantity" field in a table based
on the sum of events in another table. Those events are "basketed"
into different accounts through a four-way tupple, (acctId, portcode,
mgrgpcode, invpgm).
UPDATE requires the use of a derived table when using aggregates, fair
enough. The problem is that the interior derived table query is very
expensive, yet only a few rows of the returned recordset match in the
outer table. Without artificial limits, the query takes on the order
of 30 seconds, when the entire query batch otherwise takes about 5 to
10.
Here is the query in question (tpPNL means "temporary profit 'n
loss") . tpHPL already contains a number of rows for various accounts,
ONE of these rows is in an account that needs the complex calculation
of the inner query. Yet when the query runs, it does so for every
record in tblTrades, which has 2 million+ rows. I have artificially
introduced a WHERE constraint to limit this down for testing purposes,
but this is far from ideal. What should happen is that the inner query
will return one row for every (acctId, portcode, mgrgpcode, invpgm)
tupple in the outer table (tpHPL).
I realize I can do another sub-select on tpHPL and return a list of
which of those tupples is being used, but this strikes me as yet
another performance hit. Is there some easy way to have the inner
JOINed on the outer so this "just happens"?
UPDATE tpPNL SET
openingMVLocalCcy = s.openingMV,
closingMVLocalCcy = s.closingMV,
openingMVAcctCcy = s.openingMV * h.openingFX,
closingMVAcctCcy = s.closingMV * h.closingFX
FROM tpPNL h JOIN
(SELECT acctId, portcode, mgrgpcode, invpgm,
SUM(
CASE
WHEN TranDate>@.startDate THEN 0
ELSE amount
END) as openingMV,
SUM(
CASE
WHEN TranDate>@.endDate THEN 0
ELSE amount
END) as closingMV
FROM tblTrades
WHERE deleted=0
AND portcode=400
GROUP BY acctId, portcode, mgrgpcode, invpgm) as s
ON s.acctId=h.accountId AND s.portcode=h.portfolioId AND
s.mgrgpcode=h.groupIdI would write an EXISTS test in the subquery that checks for matches
in tpHPL. If there are really as few matches as you say it should pay
off. Alternately, it might be possible to simply JOIN tblTrades and
tpHPL in the subquery, though that would only work if the set of join
columns constitutes the full key to tpHPL.
Not that it sounds like you need me to tell you how to do that, just
saying that is what I would do.
Roy Harvey
Beacon Falls, CT
On Wed, 30 Apr 2008 08:02:13 -0700 (PDT), Maury Markowitz
<maury.markowitz@.gmail.com> wrote:
>I have an UPDATE query that sets a "quantity" field in a table based
>on the sum of events in another table. Those events are "basketed"
>into different accounts through a four-way tupple, (acctId, portcode,
>mgrgpcode, invpgm).
>UPDATE requires the use of a derived table when using aggregates, fair
>enough. The problem is that the interior derived table query is very
>expensive, yet only a few rows of the returned recordset match in the
>outer table. Without artificial limits, the query takes on the order
>of 30 seconds, when the entire query batch otherwise takes about 5 to
>10.
>Here is the query in question (tpPNL means "temporary profit 'n
>loss") . tpHPL already contains a number of rows for various accounts,
>ONE of these rows is in an account that needs the complex calculation
>of the inner query. Yet when the query runs, it does so for every
>record in tblTrades, which has 2 million+ rows. I have artificially
>introduced a WHERE constraint to limit this down for testing purposes,
>but this is far from ideal. What should happen is that the inner query
>will return one row for every (acctId, portcode, mgrgpcode, invpgm)
>tupple in the outer table (tpHPL).
>I realize I can do another sub-select on tpHPL and return a list of
>which of those tupples is being used, but this strikes me as yet
>another performance hit. Is there some easy way to have the inner
>JOINed on the outer so this "just happens"?
>UPDATE tpPNL SET
> openingMVLocalCcy = s.openingMV,
> closingMVLocalCcy = s.closingMV,
> openingMVAcctCcy = s.openingMV * h.openingFX,
> closingMVAcctCcy = s.closingMV * h.closingFX
>FROM tpPNL h JOIN
>(SELECT acctId, portcode, mgrgpcode, invpgm,
> SUM(
> CASE
> WHEN TranDate>@.startDate THEN 0
> ELSE amount
> END) as openingMV,
>SUM(
> CASE
> WHEN TranDate>@.endDate THEN 0
> ELSE amount
> END) as closingMV
> FROM tblTrades
> WHERE deleted=0
> AND portcode=400
>GROUP BY acctId, portcode, mgrgpcode, invpgm) as s
> ON s.acctId=h.accountId AND s.portcode=h.portfolioId AND
>s.mgrgpcode=h.groupId|||On Apr 30, 11:41=A0am, "Roy Harvey (SQL Server MVP)"
<roy_har...@.snet.net> wrote:
> off. =A0Alternately, it might be possible to simply JOIN tblTrades and
> tpHPL in the subquery, though that would only work if the set of join
> columns constitutes the full key to tpHPL.
Can you give me a simple example of this? I was thinking of something
like...
WHERE acctId IN (select distinct acctId from tpHPL)
AND portcode IN (select distinct portfolio from tpHPL)
but that seems expensive!
Maury|||Your approach of using two independent IN clauses is incorrect.
Imagine that we had two rows of data in tpHPL:
acctId portcode
ABC XYZ
BCD MNO
Using two independent IN clauses that would match any of four
combinations:
ABC XYZ
ABC MNO
BCD MNO
BCD XYZ
What I suggest instead is to use an EXISTS test in the subquery.
UPDATE tpPNL
SET openingMVLocalCcy = s.openingMV,
closingMVLocalCcy = s.closingMV,
openingMVAcctCcy = s.openingMV * h.openingFX,
closingMVAcctCcy = s.closingMV * h.closingFX
FROM tpPNL h
JOIN (SELECT acctId, portcode, mgrgpcode, invpgm,
SUM(CASE WHEN TranDate > @.startDate
THEN 0
ELSE amount
END) as openingMV,
SUM(CASE WHEN TranDate > @.endDate
THEN 0
ELSE amount
END) as closingMV
FROM tblTrades
WHERE deleted = 0
AND portcode = 400
AND EXISTS
(SELECT * FROM tpPNL as X
WHERE tblTrades.acctId = X.accountId
AND tblTrades.portcode = X.portfolioId
AND tblTrades.mgrgpcode = X.groupId)
GROUP BY acctId, portcode, mgrgpcode, invpgm) as s
ON s.acctId = h.accountId
AND s.portcode = h.portfolioId
AND s.mgrgpcode = h.groupId
Roy Harvey
Beacon Falls, CT
On Wed, 30 Apr 2008 11:28:02 -0700 (PDT), Maury Markowitz
<maury.markowitz@.gmail.com> wrote:
>On Apr 30, 11:41 am, "Roy Harvey (SQL Server MVP)"
><roy_har...@.snet.net> wrote:
>> off. Alternately, it might be possible to simply JOIN tblTrades and
>> tpHPL in the subquery, though that would only work if the set of join
>> columns constitutes the full key to tpHPL.
>Can you give me a simple example of this? I was thinking of something
>like...
>WHERE acctId IN (select distinct acctId from tpHPL)
> AND portcode IN (select distinct portfolio from tpHPL)
>but that seems expensive!
>Maury

No comments:

Post a Comment