Friday, March 23, 2012

Joins and inices

I can't understand this behavior:
I'm doing a join on a large table, and I created a
clustered index on the column in the join condition.
I would expect that this should enhance performance,
because of the merge join advantage.
It didn't perform well.
I then tried making that index non-clustered, and doing
a clustered index on an identity column (as a primary
key).
It's weird, but the join performs far better this way.
The only way I can explain this is that the non-clustered
index keeps the values of the join column in fewer pages,
and therefore closer together on disk (fewer reads).
Then again, there must still be the additional seek to
get to the rest of the "data" that row contains.
Even stranger, if the clustered index on the identity col-
umn (PK) is not there, the join doesn't perform well also.
If anyone has any experience with this, or suggestions,
I'd be really grateful for some help.
Thanks in advance,
Andrew.It would be easier to offer ideas if you posted DDL and ideally insert
statements to load some sample data...
this doesn't answer all of your questions but as an FYI...
<<
Even stranger, if the clustered index on the identity col-
umn (PK) is not there, the join doesn't perform well also.
the NC index actually keeps the clustering key as part of it's own key
information at the leaf level of the NC index. So... you're actually
changing the contents of the NC index if you get rid of the clustering key.
Also, NC index is stored entirely differently if there is no clustered index
at all...
--
Brian
"Andrew" <a@.b.com> wrote in message
news:01a401c38dc5$d91d84d0$a401280a@.phx.gbl...
> I can't understand this behavior:
> I'm doing a join on a large table, and I created a
> clustered index on the column in the join condition.
> I would expect that this should enhance performance,
> because of the merge join advantage.
> It didn't perform well.
> I then tried making that index non-clustered, and doing
> a clustered index on an identity column (as a primary
> key).
> It's weird, but the join performs far better this way.
> The only way I can explain this is that the non-clustered
> index keeps the values of the join column in fewer pages,
> and therefore closer together on disk (fewer reads).
> Then again, there must still be the additional seek to
> get to the rest of the "data" that row contains.
> Even stranger, if the clustered index on the identity col-
> umn (PK) is not there, the join doesn't perform well also.
> If anyone has any experience with this, or suggestions,
> I'd be really grateful for some help.
> Thanks in advance,
> Andrew.|||Still...
><<
>Even stranger, if the clustered index on the identity
col-
>umn (PK) is not there, the join doesn't perform well
also.
>the NC index actually keeps the clustering key as part
of it's own key
>information at the leaf level of the NC index. So...
you're actually
>changing the contents of the NC index if you get rid of
the clustering key.
>Also, NC index is stored entirely differently if there
is no clustered index
>at all...
>--
I realize the NC index keeps the clustering key as part
of it's own key, but I had understood that SQL server
keeps a RID (row identifier) internally when there is no
clustered index... so rather thank keeping the clustered
index in the leaf, the RID is kept.
Now, if accessing that RID is slow, I can understand, but
I'd imagined it as a kind of internal indexed identity
of its own. I guess this doesn't make sense, though.
As a follow up, though, is there no use for a NC index
without a clustered index when you want to access columns
outside of the NC indexed column?
Also, can a column be both a clustered and a non-clustered
index (would this help on join performance to access those
other columns?).
Thanks again,
Andrew
>Brian
>
>"Andrew" <a@.b.com> wrote in message
>news:01a401c38dc5$d91d84d0$a401280a@.phx.gbl...
>> I can't understand this behavior:
>> I'm doing a join on a large table, and I created a
>> clustered index on the column in the join condition.
>> I would expect that this should enhance performance,
>> because of the merge join advantage.
>> It didn't perform well.
>> I then tried making that index non-clustered, and doing
>> a clustered index on an identity column (as a primary
>> key).
>> It's weird, but the join performs far better this way.
>> The only way I can explain this is that the non-
clustered
>> index keeps the values of the join column in fewer
pages,
>> and therefore closer together on disk (fewer reads).
>> Then again, there must still be the additional seek to
>> get to the rest of the "data" that row contains.
>> Even stranger, if the clustered index on the identity
col-
>> umn (PK) is not there, the join doesn't perform well
also.
>> If anyone has any experience with this, or suggestions,
>> I'd be really grateful for some help.
>> Thanks in advance,
>> Andrew.
>
>.
>

No comments:

Post a Comment