SyntaxHighlighter

Wednesday, February 14, 2018

"Cross-Database" Execution Plans Favor Row Mode instead of Batch Mode in Clustered Columns Indexes

Columnstore indexes are the most amazing technology. I have come to love them because of their massive increases in compression ratios and also their performance. My database servers can literally have terabytes of data cached in memory in a tiny 25GB footprint. It's some kind of miracle.

While brushing up for an MCSE refresher in a couple of months, I came across a good series of articles. Itzik Ben-Gan wrote an article about batch mode processing in columnstore indexes a year or so ago back when SQL Server 2016 SP1 was released. In them, he provides some examples. I had some tables with clustered columnstore indexes that looked almost exactly like his so I decided to try out his examples. Unfortunately, I couldn't get batch mode to work as advertised.

Why, I wondered? Perhaps it was a difference of the data? Statistically, his test table and my real table looked almost identical. I kept digging. Then I paused to think about it. What was the difference between his query and mine? Well ... it turns out that I was actually using the master database but referencing the other database with 3-part object notation. It was a form of laziness that has crept into my ad-hoc queries over the past few years. Why bother using a database when I can just reference the database I need? Could that be the problem?

Lo and behold, when I used the same database as the one where the data resided, the execution plan changed and surprisingly looked exactly like Ben-Gan's.

I had accidentally rediscovered something I had learned many years ago: "cross-database" execution plans can be skewed because of the way the optimizer views "remote-database" data. Gak!

Below are the two execution plans (with queries) for reference. Notice that the only difference between the two is that one employs USE BI_DM and the other employs USE master. Both employ 3-part object notation.




Wednesday, October 28, 2015

SSDT: What is the Solution Items Folder?

There's a folder in a SQL Server Data Tools (SSDT) Solution Explorer called Solution Items*. It's a folder and its icon has a dotted outline -- so the implication seems to be that it's some kind of ghost folder, almost to say that it isn't quite like the others. I don't know how long it's been there, but today was the first day I noticed it**.

"What's that?" I wondered to myself. I searched and searched long and hard hours, in the rain, in the snow, and near mountains****. Naturally, the first article that popped up was an MSDN article: Solution Items. There weren't very many articles about it, which means it was a good candidate for this blog****.

In a lot of cases, there are files that a developer might want to draw attention to a file and add it to a Solution that doesn't necessarily fit into the actual code, for instance README files or installation instructions. They are not source files, they are not project files, they are just ... miscellaneous files! But they are still important.

In cases like the one below (an SSIS template), such files can be added (or drag-and-dropped!) into the Solution Items folder, as depicted below.



* For those who don't know, SSDT is just a shell of Visual Studio. Ergo, this information also applies to the full product. 
** Which is weird because I spend 30 hours a week in SSDT. 
*** It's not hard. I live in Denver and it was raining that day. 
**** I try to only post about things that don't already have a good presence in the Internet Mindshare.

Thursday, June 11, 2015

This is what I do with my time ...

I will upload this Pivot Chart which comes from a data warehouse built off an ETL that pulls from an app on my phone that I use to track all my time.


Wednesday, April 22, 2015

DAX: Introductory Resources

As anyone who has worked with a cube or a tabular model knows, getting data out is easy -- but doing more complex calculations is difficult.

I've been developing a data warehouse (Kimball) for my job over the last two months and two of our business analysts are trying to figure out how to get data out of the SSAS tabular models. They know about DAX (the query language of SSAS Tabular and PowerPivot) but they've been having a hard time getting basic tutorials related to DAX.

For analysts new to DAX, I'm going to post some links to sites that I think are helpful for learning DAX. They will be broken down into three categories. I will also try augmenting and improving these lists from time to time.

Update 20150422: If SSMS is too primitive for you, there is also a tool on CodePlex called DAX Studio.

Introductory DAX

  1. Using DAX to retrieve tabular data. This link is a really excellent starting point if you want to run DAX queries in SSMS. It's a much, much easier sandbox compared to PowerPivot or SSDT. 
  2. Data Analysis Expressions (DAX) Reference. DAX is a Microsoft expression language. It was built to resemble the Excel Formula Bar, but it goes way, way beyond that. Microsoft's documentation is sometimes heavy on the technicals and low on intuition. Nonetheless, it's always a good idea to start at the source. 
  3. DAX Queries, Part 1. Chris Webb also has a multi-part series that is an intro to DAX. 

Intermediate DAX

  1. Context in DAX Formulas. Context is an important topic. However, it's not immediately necessary to get data out of your cubes. But rest assured, it will come up. And when it does, it's a challenging topic and will require extra brain-power to tackle. 

Advanced DAX

  1. DAX Patterns. You can pull of some amazing queries and calculations in DAX. Once you've mastered the introductory and intermediate levels, it's time to start learning these great patterns. 

Books

  1. Microsoft Excel 2013 Building Data Models with PowerPivot. I read this book. It's slightly advanced, but a good reference. I recommend reading it after you've figured out the basics of DAX. I also recommend working through as many of the examples as possible!
  2. Microsoft SQL Server 2012 Analysis Services: The BISM Tabular Model. This is an advanced book that is aimed at developers of tabular models. I recommend this only if you are going to be building cubes/tabular models.

Wednesday, January 28, 2015

Pentaho Hosted Demo



The Pentaho Hosted Demo went well.

First and foremost, every time I tried to login, the first attempt failed. I always had to backtrack and login again because the site would just hang at “Just a few moments please.” This is extremely frustrating. I was only able to get in every once in a while. Not cool. Also, the timeout is fairly low – I think I even got booted once while watching a tutorial video!

The demo, naturally, is targeted at Business Analysts. That means it’s more functional and less technical. As a Data and Business Intelligence specialist, I had lots of questions initially. However, I got the gist of what was going on from the demo. Clearly this demo site wouldn’t help me install and configure a Pentaho server, but it would help me connect to data sources and build some reports.

One thing to note here is that there are some terms used that aren’t defined well on the demo site. Terms such dimensions and measures. These are specialized jargon used in the database industries. However, I’ve had to explain the terms to every Business Analyst I’ve ever met. I don’t think they are common terms on the Business side of the house. Let me quickly explain these terms using analogies.
  1. A dimension (which has nothing to do with dimensions as in Star Trek or The Twilight Zone) is like the horizontal axis on a chart. A dimension is a “thing” so to speak, like time, people, houses and so on. 
  2. A measure is like the values that are inside the chart. These are typically numbers – anything that can be counted, really. A measure is a “value”, like $52, 30.525. If you’re just counting “things” (like dimensions, say, fruit) the value can just be 1 – which perhaps indicates “it exists.” Measures can be additive (meaning they can be summed across any dimension), semi-additive (they can be summed across only some dimensions), or non-additive (they cannot be summed at all). Please consult the information superhighway for more information. 
The demo site itself is minimalist and well-designed. It has the now standard soft-gradient look. There is an intro video and several tutorial videos. Each one is about 5 minutes long and gives actual, functional information. In less than 30 minutes, any Business Analyst should be able to create an interactive report, an analyzer report, and a dashboard.

For those of you who aren’t familiar with those terms, I’ll break them down.
  1. An interactive report is a fairly standard report (a grid, a matrix, a chart) that can be filtered using – as Pentaho calls them – prompts. Prompts in other systems are known as parameters or run-time options.
  2. An analyzer report is much more interactive report that allows drill-downs. In fact, let’s say you click on a country: USA. If you have your data hierarchies set up correctly, the report will drill down to states. Pick Colorado and it will further drill down to cities. This type of report is dynamic and allows data to be "sliced and diced" as the saying goes.
  3. A dashboard is a collection of charts and graphs. Each graph or chart is shown in a panel. The panels can be moved around, made larger, smaller, or reshaped. Colors and indicators are integral to dashboards. The whole point of a dashboard is to give basic, high-level information. A red color is often used to indicate to the viewer, “hey, pay attention to this!” 
Somewhere during the middle of the demo, I got a call from a Pentaho rep. My one big question was this: how do I publish one of these reports or dashboards? Is there a plugin or is there some sort of widget that will allow me to integrate this into a webpage? What if I’ve created a marketing website and I want to show off my numbers. Can I throw a dashboard into that webpage easily? Or is Pentaho an data exploration tool meant for internal use only?

He couldn’t answer the question, unfortunately. It sounds like publication is not a major part of Pentaho’s business model. If it’s not, then the product will unfortunately not be useful to me. I have plenty of tools for data exploration that do the exact same thing (Power View jumps to mind). What I need is a way to get these interactive reports and dashboards on a website! Anyway, he said he’ll get back to me with that information.

UPDATE: Embedding Pentaho in a webpagesounds way more complicated than I was hoping.


Open Source Business Intelligence: Pentaho

Pentaho Demo? Looks sweet. 
Business Intelligence has been around for a few years, but the underlying foundation of BI has been around forever. Since data has existed, businesses have wanted to summarize that data and extract valuable information from it like blood from a stone. When modern accounting was codified by Luca Pacioli in the 15th century, he demonstrated not only the double-entry book-keeping system but also ways to summarize that data. You could even go back to Mesopotamia to see early reports.

Wait. BI is just reporting?

Why, yes. That's true. In fact. However, BI tends toward the more modern, the more technical, the more visual, the more flashy. Also, BI might lean a little bit more towards the ... intelligent side of things. KPIs and such.

What kind of offerings does Microsoft have out there?

Excel and SharePoint has some really awesome BI tools thanks to Microsoft's recent initiatives. They want to compete with SAP, Tableau Software, and so forth. But guess what? Those tools are massively expensive and SharePoint is ridiculously hard to install and configure. (There's an MCSE for SharePoint if that gives you any clue).

So what's the cure to all software expensive? Open source, of course.

The one that looks most intriguing right now? I just found Pentaho Community. It looks pretty slick. I'm about to try the Pentaho Hosted Demo. Wish me luck! I'll report back -- if I make it back!

UPDATE: Here's a direct link to the Pentaho demo. Username and password are under the link that says "Login as an Evaluator."

Friday, November 21, 2014

SSAS Warning: Define attribute relationships as 'Rigid' where appropriate

When you're first deploying a dimension with hierarchies in SSDT-BI, you might notice this warning (or more appropriately, this directive): Define attribute relationships as 'Rigid' where appropriate.

A quick interwebs search reveals the answer in dozens of different websites and you'll also probably run across the answer on Microsoft's website. The answer is -- and I quote:
This rule analyzes dimensions to determine whether they contain at least one user-defined hierarchy in which each member attribute uniquely implies the member attribute above it. If the dimension does have such a hierarchy, the rule checks to see whether all the attribute relationships in the dimension have the RelationshipType property set to Flexible.
Thanks, Microsoft documentation! You've given me the answer. Now all I have to do is find the RelationshipType property and we're good to go.

Now where is it? Is it here? Nope. There? Nope. Over yon? Nuh-uh.

Well! Did you just spend 20 minutes of your life searching for the answer and you still haven't found it? Well, it's not completely obvious. Remember, everything in every relationship diagram has properties, even lines and arrows!

  1. In your Multidimensional Analysis Services project in SSDT-BI (or SSDT or BIDS or whatever Microsoft has renamed it this week), open your dimension for editing.
  2. Click on Attribute Relationships
  3. Click on the line-arrow between any two attributes. 
  4. Take a look at the Properties window and you'll see RelationshipType. That's where you can change it to Rigid

Of course, you could read that statement all day and it still might take another twenty minutes to figure out what I'm talking about. In that case -- like me -- you're a visual learner. So in that case, there are two screenshots below to help you find it.

Click on the line-arrow!

Check the properties window. Hint: press the F4 key.
You're welcome, internet.

Update (same day): I just came up with two new helpful tips. Firstly, You can double-click the warning message in the error list and it will take you to the correct location in the project. Secondly, you can also double click on the line-arrow to bring up a dialog bog that will allow you to change the relationship type. See screenshots below.