Columnstore indexes are the most amazing technology. I have come to love them because of their massive increases in compression ratios and also their performance. My database servers can literally have terabytes of data cached in memory in a tiny 25GB footprint. It's some kind of miracle.
While brushing up for an MCSE refresher in a couple of months, I came across a good series of articles. Itzik Ben-Gan wrote an article about batch mode processing in columnstore indexes a year or so ago back when SQL Server 2016 SP1 was released. In them, he provides some examples. I had some tables with clustered columnstore indexes that looked almost exactly like his so I decided to try out his examples. Unfortunately, I couldn't get batch mode to work as advertised.
Why, I wondered? Perhaps it was a difference of the data? Statistically, his test table and my real table looked almost identical. I kept digging. Then I paused to think about it. What was the difference between his query and mine? Well ... it turns out that I was actually using the master database but referencing the other database with 3-part object notation. It was a form of laziness that has crept into my ad-hoc queries over the past few years. Why bother using a database when I can just reference the database I need? Could that be the problem?
Lo and behold, when I used the same database as the one where the data resided, the execution plan changed and surprisingly looked exactly like Ben-Gan's.
I had accidentally rediscovered something I had learned many years ago: "cross-database" execution plans can be skewed because of the way the optimizer views "remote-database" data. Gak!
Below are the two execution plans (with queries) for reference. Notice that the only difference between the two is that one employs USE BI_DM and the other employs USE master. Both employ 3-part object notation.
As a professional, I am a data scientist with an interest in the languages, linguistics, and cultures of ancient civilizations. I primarily work with Microsoft SQL Server, Access, Excel, and PostgreSQL but I have 15 years of experience with databases. I am interested in data analysis, visualization, big data, and business intelligence. I have a master's degree in Classical Studies from UNM where I taught Latin for four years. I run an online dictionary called "Numen - The Latin Lexicon."
Wednesday, February 14, 2018
"Cross-Database" Execution Plans Favor Row Mode instead of Batch Mode in Clustered Columns Indexes
Subscribe to: Posts (Atom)