Wednesday, February 14, 2018

"Cross-Database" Execution Plans Favor Row Mode instead of Batch Mode in Clustered Columns Indexes

Columnstore indexes are the most amazing technology. I have come to love them because of their massive increases in compression ratios and also their performance. My database servers can literally have terabytes of data cached in memory in a tiny 25GB footprint. It's some kind of miracle.

While brushing up for an MCSE refresher in a couple of months, I came across a good series of articles. Itzik Ben-Gan wrote an article about batch mode processing in columnstore indexes a year or so ago back when SQL Server 2016 SP1 was released. In them, he provides some examples. I had some tables with clustered columnstore indexes that looked almost exactly like his so I decided to try out his examples. Unfortunately, I couldn't get batch mode to work as advertised.

Why, I wondered? Perhaps it was a difference of the data? Statistically, his test table and my real table looked almost identical. I kept digging. Then I paused to think about it. What was the difference between his query and mine? Well ... it turns out that I was actually using the master database but referencing the other database with 3-part object notation. It was a form of laziness that has crept into my ad-hoc queries over the past few years. Why bother using a database when I can just reference the database I need? Could that be the problem?

Lo and behold, when I used the same database as the one where the data resided, the execution plan changed and surprisingly looked exactly like Ben-Gan's.

I had accidentally rediscovered something I had learned many years ago: "cross-database" execution plans can be skewed because of the way the optimizer views "remote-database" data. Gak!

Below are the two execution plans (with queries) for reference. Notice that the only difference between the two is that one employs USE BI_DM and the other employs USE master. Both employ 3-part object notation.