Friday, December 30, 2011

DAX Query Plan, Part 1, Introduction

SQL Server 2012 Analysis Services RC0 introduced a new SQL Server Profiler event class, DAX Query Plan, under the Query Processing event category. It is an advanced and rich event class, but no official documentation exists yet. Nonetheless, it has already attracted the attention of some users, who are pushing us to release more information as soon as possible. While we wait for official documentation, I’ll try to find some spare time to fill the gap by providing background information on this event class in a series of blog posts. As always, my goal is to provide accurate information with sufficient technical detail. There are plenty of other BI professionals who are eager to help average users learn this feature through intuitive, practical examples. When I need to demonstrate different aspects of DAX query plans through examples, I’ll use the AdventureWorks tabular model for SQL Server 2012 RC0.

The DAX Query Plan event class has four event subclasses:
  1. DAX VertiPaq Logical Plan
  2. DAX VertiPaq Physical Plan
  3. DAX DirectQuery Algebrizer Tree
  4. DAX DirectQuery Logical Plan
Trace events of subclasses 1 and 2 are fired when a tabular database is in VertiPaq mode. Trace events of subclasses 3 and 4 are fired when a database is in DirectQuery mode. Since most tabular databases are likely to run in VertiPaq mode, I’ll focus my discussions on the first two types of events.

Logical Plans and Physical Plans

The DAX Formula Engine evaluates a DAX expression in multiple stages and generates several tree data structures along the way (see Figure 1). The new trace event outputs two of these trees to help users investigate logic or performance issues. This is a great leap forward from the dark days of debugging MDX expressions. Logical plan trees show the primitive operations that make up the higher-level user functions. The powerful yet sometimes mysterious automatic cross-table filtering becomes explicit in logical plan trees. Properties related to the sparsity of a scalar subtree tell you why the DAX Formula Engine chooses one execution plan over another. If poor performance is caused by the Formula Engine, physical plan trees can help you locate the expensive sub-expressions that caused the problem.

Format of the Query Plans

Let’s first study the general structure and format common to both types of plan trees. Send the following DAX query to the tabular AdventureWorks database:
define measure 'Internet Sales'[Total Sales Amount] = Sum([Sales Amount])
evaluate
  filter(
    addcolumns(
      crossjoin(
        values('Date'[Calendar Year]),
        values('Product Category'[Product Category Name])
      ),
      "Total Sales Amount", [Total Sales Amount]
    ),
    not isblank([Total Sales Amount])
  )

When you execute this query, the logical plan tree and the physical plan tree are shown in Figures 2 and 3 respectively.

As you can see, each plan tree is output as multi-line text. Each line represents a single operator node in the tree. The hierarchical structure of the tree is conveyed by indentation: child nodes appear indented below their parent node, and sibling nodes share the same level of indentation under their parent. Each line begins with the name of the operator, followed by a colon and then the properties of the operator, starting with the operator type.
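Because the format is just indented lines of the form "name: type properties", it is easy to load a captured plan into a tree for inspection. The following Python sketch is one possible way to do that, not part of Analysis Services. It assumes indentation is consistent within a plan and that the operator name ends at the first colon. PlanNode, parse_plan, and the SAMPLE text are hypothetical names and made-up data used only for illustration; SAMPLE is not actual trace output.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str          # operator name, the text before the first colon
    op_type: str       # first token after the colon, e.g. RelLogOp
    properties: str    # remaining text on the line, kept unparsed
    children: list = field(default_factory=list)


def parse_plan(text):
    """Build a list of root PlanNode objects from indented plan text."""
    roots = []
    stack = []  # (indent width, node) pairs for the current ancestor chain
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        name, _, rest = raw.strip().partition(":")
        parts = rest.split(None, 1)
        op_type = parts[0] if parts else ""
        properties = parts[1] if len(parts) > 1 else ""
        node = PlanNode(name.strip(), op_type, properties)
        # Pop until the top of the stack is less indented: that is the parent.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            roots.append(node)
        stack.append((indent, node))
    return roots


# Made-up, simplified plan text (not real trace output).
SAMPLE = (
    "Filter: RelLogOp\n"
    "\tAddColumns: RelLogOp\n"
    "\t\tCrossJoin: RelLogOp\n"
    "\t\tSum: ScaLogOp\n"
    "\tNot: ScaLogOp\n"
)

roots = parse_plan(SAMPLE)
```

With the tree in hand, walking roots[0].children recursively gives you each operator together with its depth, which is handy when a plan runs to hundreds of lines.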



Types of Operators

There are two types of logical plan nodes and two types of physical plan nodes, as shown in the table below. We’ll spend a lot more time in future posts drilling into the details of various operators and their properties.

Plan Type      Operator Type  Description
-------------  -------------  -----------
Logical Plan   ScaLogOp       Scalar Logical Operator. Outputs a scalar value of type numeric, string, Boolean, etc.
Logical Plan   RelLogOp       Relational Logical Operator. Outputs a table of columns and rows.
Physical Plan  LookupPhyOp    Lookup Physical Operator. Given a current row as input, calculates and returns a scalar value.
Physical Plan  IterPhyOp      Iterator Physical Operator. Given a current row as an optional input, returns a sequence of rows.
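
Since the operator type is the first token after the colon on every line, the four types above are enough to tell a logical plan from a physical plan and to count how many nodes of each type a plan contains. The Python sketch below is one way to do this under that assumption. The names OPERATOR_TYPES, operator_type, summarize, and plan_kind are hypothetical, and the sample text is made up rather than copied from a trace.

```python
from collections import Counter

# The four operator types listed in the table, mapped to their plan type.
OPERATOR_TYPES = {
    "ScaLogOp": "logical",
    "RelLogOp": "logical",
    "LookupPhyOp": "physical",
    "IterPhyOp": "physical",
}


def operator_type(line):
    """Return the token that follows 'Name:' on a plan line, or None."""
    _, sep, rest = line.strip().partition(":")
    if not sep:
        return None
    tokens = rest.split()
    return tokens[0] if tokens else None


def summarize(plan_text):
    """Count the nodes of each known operator type in a plan."""
    counts = Counter()
    for line in plan_text.splitlines():
        op = operator_type(line)
        if op in OPERATOR_TYPES:
            counts[op] += 1
    return counts


def plan_kind(plan_text):
    """Return 'logical' or 'physical', or 'unknown' if the text is empty or mixed."""
    kinds = {OPERATOR_TYPES[op] for op in summarize(plan_text)}
    return kinds.pop() if len(kinds) == 1 else "unknown"


# Made-up, simplified plan text (not real trace output).
sample = "Filter: RelLogOp\n\tAddColumns: RelLogOp\n\tNot: ScaLogOp\n"
counts = summarize(sample)
kind = plan_kind(sample)
```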

Number of Trace Events per Query

Each time the DAX Formula Engine is called to evaluate a DAX expression, a pair of DAX Query Plan events is generated. Therefore, a DAX query (an Evaluate statement) triggers exactly two events: a logical plan event and a physical plan event. An MDX query, however, may produce any number of pairs of events, depending on how many times the MDX Formula Engine has to call into the DAX Formula Engine. At the time of this writing, the DAX Formula Engine cannot call back into the MDX Formula Engine, although this may change in the future.

Event Trigger Points

Ideally, the DAX Formula Engine should generate both the logical plan and the physical plan before any query execution happens, so that users can capture the plans without being blocked by potentially long-running operations. But this is not the case in the current implementation.

Logical plans are built in two stages. The first stage is quick and lightweight, but the second stage may need to execute a portion of the tree and is therefore potentially expensive. Unfortunately, the logical plan event is fired only after the second stage has completed, so users may sometimes have to wait for time-consuming operations to finish before they can capture the logical plan event. In most cases, though, constructing and simplifying a logical plan is a quick process.

Building a physical plan, on the other hand, can often involve expensive operations. Although the trace event currently shows only two types of physical plan nodes, lookup and iterator, there is actually a third type of plan node: spool. A spool is an operator that materializes its result in memory by executing its entire subtree. A physical plan tree may contain many nodes built from spools, each of which requires partial execution of a subtree before the entire physical plan tree is fully constructed. In particular, all leaf-level nodes that fetch data from the VertiPaq Engine currently always build spools to store the VertiPaq results. As a result, users can see the physical plan event only after all VertiPaq queries have completed.
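
To see why spools delay the physical plan event, it helps to contrast an iterator, which does no work until rows are requested, with a spool, which runs its child to completion the moment it is built. The Python sketch below is only an analogy under that reading of the paragraph above, not the engine's implementation. scan_rows and Spool are hypothetical names, and the rows are made-up data.

```python
def scan_rows(log):
    """Hypothetical leaf operator: yields rows one at a time, logging each fetch."""
    sample = [("2005", 10.0), ("2006", 20.0), ("2007", 30.0)]  # made-up data
    for row in sample:
        log.append("fetch " + row[0])
        yield row


class Spool:
    """Runs its child to completion when constructed and keeps the rows in memory."""

    def __init__(self, child_rows):
        self.rows = list(child_rows)  # the entire child is executed here

    def __iter__(self):
        return iter(self.rows)  # later reads never touch the child again


iterator_log = []
lazy_rows = scan_rows(iterator_log)  # building an iterator runs nothing
fetches_after_building_iterator = len(iterator_log)

spool_log = []
spool = Spool(scan_rows(spool_log))  # building a spool runs the whole child
fetches_after_building_spool = len(spool_log)
```

In this analogy, a plan made only of iterators could be reported before any rows are fetched, while a plan containing a spool cannot be finished until the spool's child has run, which matches the behavior described above for leaf nodes that fetch VertiPaq data.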

The new DAX Query Plan trace events can assist you in writing efficient DAX expressions and troubleshooting problematic DAX behavior. How you use them is up to you, but first you need to understand the information contained in the plans and how to interpret it. Today we have gone over the basics: the types of plans, the format of the text, the types of plan nodes, and when and how frequently the events are fired. Next time we will go one step further and examine the various properties of plan nodes.