Friday, December 30, 2011

DAX Query Plan, Part 1, Introduction

SQL Server 2012 Analysis Services RC0 introduced a new SQL Server Profiler event class, DAX Query Plan, under the Query Processing event category. This is an advanced and rich new event class, but no official documentation has been published yet. Nonetheless, it has already attracted the attention of some users who are pushing us to release more information as soon as possible. While waiting for an official document to come out, I’ll try to find some spare time to fill the gap temporarily by providing some background information on this event class in a series of blog posts. As always, my goal is to provide accurate information with sufficient technical detail. There are plenty of other BI professionals who are eager to help users learn this feature through intuitive, practical examples. I’ll be using the tabular model AdventureWorks for SQL Server 2012 RC0 whenever I need to demonstrate different aspects of DAX query plans through examples.

The DAX Query Plan event class has four event subclasses:
  1. DAX VertiPaq Logical Plan
  2. DAX VertiPaq Physical Plan
  3. DAX DirectQuery Algebrizer Tree
  4. DAX DirectQuery Logical Plan
Trace events of subclasses 1 and 2 are fired when a tabular database is in VertiPaq mode. Trace events of subclasses 3 and 4 are fired when a database is in DirectQuery mode. Since most tabular databases are likely to run in VertiPaq mode, I’ll focus my discussions on the first two types of events.

Logical Plans and Physical Plans

The DAX Formula Engine evaluates a DAX expression in multiple stages and generates several tree data structures along the way; see Figure 1. The new trace event outputs two of these trees to help users investigate logic or performance issues. This is a great leap forward from the dark days of debugging MDX expressions. Logical plan trees show the primitive operations that make up the higher-level user functions. The powerful yet sometimes mysterious automatic cross-table filtering becomes explicit in logical plan trees. Properties related to the sparsity of a scalar subtree tell you why the DAX Formula Engine chooses one execution plan over another. If poor performance is caused by the Formula Engine, physical plan trees can help you locate the expensive sub-expressions that caused the problem.
Format of the Query Plans

Let’s first study the general structure and format common to both types of plan trees. Send the following DAX query to the tabular AdventureWorks database:
define measure 'Internet Sales'[Total Sales Amount] = Sum([Sales Amount])
evaluate
  filter(
    addcolumns(
      crossjoin(
        values('Date'[Calendar Year]),
        values('Product Category'[Product Category Name])
      ),
      "Total Sales Amount", [Total Sales Amount]
    ),
    not isblank([Total Sales Amount])
  )

When you execute this query, the logical plan tree and the physical plan tree are shown in Figures 2 and 3 respectively.

As you can see, each plan tree is output as multi-line text. Each line represents a single operator node in the tree. The hierarchical structure of the tree is maintained by indentation: child nodes appear indented below their parent node, and sibling nodes share the same level of indentation under their parent. Each line begins with the name of the operator, followed by a colon and then the operator’s properties, the first of which is the operator type.
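Because the hierarchy is carried entirely by indentation, the plan text can be loaded back into a tree with a simple stack. Below is a minimal Python sketch; the sample plan text and its operator properties are shortened stand-ins for illustration, not a verbatim plan from the engine:

```python
# Parse indentation-based plan text into a tree. Each line has the form
# "OperatorName: properties"; a child is indented one tab deeper than its parent.

def parse_plan(text):
    root = {"name": "<root>", "props": "", "children": []}
    stack = [(-1, root)]  # (indent depth, node) pairs
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = len(line) - len(line.lstrip("\t"))
        name, _, props = line.strip().partition(": ")
        node = {"name": name, "props": props, "children": []}
        while stack[-1][0] >= depth:  # pop back to this node's parent
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root["children"]

# A shortened, made-up plan resembling the query above.
sample = (
    "Filter: RelLogOp DependOnCols()\n"
    "\tAddColumns: RelLogOp DependOnCols()\n"
    "\t\tCrossJoin: RelLogOp DependOnCols()\n"
    "\tNot: ScaLogOp BOOL\n"
)
tree = parse_plan(sample)
```

The stack holds the chain of ancestors of the current line; popping back to the first shallower entry finds the parent, which is exactly how the indentation encodes the tree.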



Types of Operators
There are two types of logical plan nodes and two types of physical plan nodes as shown in the table below. We’ll spend a lot more time in future posts drilling into the details of various operators and their properties.
Logical Plan:
  1. ScaLogOp (Scalar Logical Operator): outputs a scalar value of type numeric, string, Boolean, etc.
  2. RelLogOp (Relational Logical Operator): outputs a table of columns and rows.
Physical Plan:
  1. LookupPhyOp (Lookup Physical Operator): given a current row as input, calculates and returns a scalar value.
  2. IterPhyOp (Iterator Physical Operator): given a current row as an optional input, returns a sequence of rows.

Number of Trace Events per Query

Each time the DAX Formula Engine is called to evaluate a DAX expression, a pair of DAX Query Plan events is generated. Therefore, a DAX query (an Evaluate statement) triggers exactly two events: a logical plan event and a physical plan event. But an MDX query may produce any number of pairs of events, depending on how many times the MDX Formula Engine has to call into the DAX Formula Engine. At the time of this writing, the DAX Formula Engine cannot call back into the MDX Formula Engine, although this may change in the future.

Event Trigger Points

Ideally, the DAX Formula Engine would generate both the logical plan and the physical plan before any query execution happens, so users could capture the plans without being blocked by potentially long-running operations. But this is not the case in the current implementation. Logical plans are built in two stages. The first stage is quick and lightweight, but the second stage may need to execute a portion of the tree and is therefore potentially expensive. Unfortunately, the logical plan event is fired after the second stage is completed, so sometimes users may have to wait for certain time-consuming operations to finish before they can capture the logical plan event. In most cases, though, constructing and simplifying a logical plan is a quick process.

Building a physical plan, on the other hand, can often involve expensive operations. Although the trace event currently shows only two types of physical plan nodes, lookup and iterator, there is actually a third kind of plan node: spool. A spool node is built when an operator materializes its result in memory by executing its entire subtree. A physical plan tree may contain many nodes built from spools, each of which requires partial execution of a subtree before the entire physical plan tree is fully constructed. In particular, all leaf-level nodes that fetch data from the VertiPaq Engine currently always build spools to store the VertiPaq results; therefore, users can see the physical plan event only after all VertiPaq queries have completed.
The new DAX Query Plan trace event can assist you in writing efficient DAX expressions and troubleshooting problematic DAX behavior. How you use them is up to you, but first you need to understand the information contained within the plans and how to interpret it. Today we have gone over the basics such as types of plans, format of text, types of plan nodes, and when and how frequently the events are fired. Next time we are going one step further to examine the various properties of plan nodes.

Tuesday, November 15, 2011

Avoid Pitfalls of Fact Data Prefetching

As mentioned in Chris Webb’s article Query Performance Tuning in Microsoft Analysis Services and the SQL Server CAT team’s white paper SQL Server Best Practices Article: Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services, the MDX Formula Engine (FE) may request more data from the Storage Engine (SE) upfront so that subsequent SE queries can be answered directly from the SE cache. In order to retrieve the right amount of data, FE uses some clever heuristics to construct the prefetch query. While the underlying design delivers great performance most of the time, sometimes the resulting query prefetches too much data. In this blog post I am going to show you a pathological case where a prefetch query seems to completely ignore the slices in the original MDX query, thereby scanning all partitions instead of just the needed ones. I will explain why this happens so that you understand what’s going on in the engine if you ever run into such a case and can deal with it accordingly.

To illustrate the problem, I am going to run a series of MDX queries against the AdventureWorks sample database for Microsoft SQL Server 2008 R2. In between queries, I send ClearCache commands to eliminate inter-query interference. For each MDX query, we are going to observe the sequence of SE queries through the Query Subcube Verbose profiler trace events. Instead of showing the complete output of the Query Subcube Verbose events, I am going to extract only those attributes in the [Date] dimension which are relevant to our discussion. As is well known in the MDX community, the Analysis Services engine internally uses subcubes to represent queries, so I will use the terms query and subcube interchangeably throughout the post.
Now, let’s run our first MDX query to retrieve the values of measure [Internet Sales Amount] for all days in January 2007.
Query 1:
// January 2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007], [Date].[Calendar].[Date]), 31) on 1
from [Adventure Works]

Since measure [Internet Sales Amount] is defined with a measure expression, there are three Query Subcube Verbose events: one for the [Exchange Rates] measure group and two for the [Internet Sales] measure group. Only one of the latter two hits the partitions; the other is just an artifact of measure expression evaluation.

Here are the attributes corresponding to levels in the [Calendar] hierarchy extracted from the trace events:
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0 0 20 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1 CY 2007]
[Calendar Semester]:[H1 CY 2007]
[Month Name]:[January 2007]
[Calendar Year]:[CY 2007]

The SE query asks for all days in January 2007, which exactly matches the days listed on Axis 1 of the MDX query, with the added benefit of not listing each day individually. There is no prefetch query in this case, which makes sense: the SE query already retrieves exactly what the original query asks for.
Now let’s add one more day to the query.
Query 2:

// January 2007 + February 1 2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007], [Date].[Calendar].[Date]), 32) on 1
from [Adventure Works]

From the captured profiler trace, you can extract the following information from the three queries sent to the [Internet Sales] measure group:

Non-cache data
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0 0 20 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1 CY 2007]
[Calendar Semester]:[H1 CY 2007]
[Month Name]:[January 2007]
[Calendar Year]:[CY 2007]

Non-cache data
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0 0 21 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1 CY 2007]
[Calendar Semester]:[H1 CY 2007]
[Month Name]:[February 2007]
[Calendar Year]:[CY 2007]

Cache data
Dimension 7 [Date] (0 582 8 0 6 0 0 0 0 0 0 21 4 0 0 0 0 0 0 0 0)
[Date]:[February 1, 2007]
[Calendar Quarter]:[Q1 CY 2007]
[Calendar Semester]:[H1 CY 2007]
[Month Name]:[February 2007]
[Calendar Year]:[CY 2007]

Again, there is no prefetch query for January 2007, but FE does prefetch all days in February 2007 before getting the value of February 1, 2007 from the cache.
Now let’s add one more day again to the query.
Query 3:
// January 2007 + February 1 2007 + February 2 2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007], [Date].[Calendar].[Date]), 33) on 1
from [Adventure Works]

Below are the profiler trace events and the relevant information extracted from them:

Non-cache data:
Dimension 7 [Date] (0 * * 0 * 0 0 0 0 0 0 * * 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:*
[Calendar Semester]:*
[Month Name]:*
[Calendar Year]:*

Cache data:
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0 0 20 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1 CY 2007]
[Calendar Semester]:[H1 CY 2007]
[Month Name]:[January 2007]
[Calendar Year]:[CY 2007]

Cache data:
Dimension 7 [Date] (0 + * 0 * 0 0 0 0 0 0 * * 0 0 0 0 0 0 0 0)
[Date]:+
[Calendar Quarter]:*
[Calendar Semester]:*
[Month Name]:*
[Calendar Year]:*

One EventSubclass = Non-cache data query prefetches fact data into the SE cache, which is then used to answer the subsequent two SE queries shown as EventSubclass = Cache data: one for all days in January 2007, the other for the first two days of February 2007. But why did the prefetch query ask for data for all dates? Why did we lose all the slices in the [Date] dimension? This is really bad, as it would have caused all partitions to be scanned. The answer lies in the imprecise nature of the algorithm used to construct the prefetch subcube.

In order to build the prefetch SE subcube, FE keeps track of all relevant MDX sets in a stack-like data structure called the Sonar registry. Later on, each MDX set in the Sonar registry is used to build a collection of Sonar subcubes. The Perfmon counter MDX\Total Sonar subcubes displays the total number of Sonar subcubes generated inside FE.


By default, the conversion from an MDX set to its collection of Sonar subcubes is not a precise process. The resulting Sonar subcubes will cover the original MDX set but may include additional members on some attributes. The Cache Ratio connection string property controls how far a Sonar subcube attribute is expanded to include more members than the original set contains. The actual algorithm is very complex; here I will show only a greatly simplified version that illustrates its main idea.

Algorithm 1: Build Sonar subcubes from an MDX set
Input: an MDX set S (for simplicity, assume S is single-grained)

For each hierarchy H in S
  Since S is single-grained, let Ls be the level of H where the members of S reside
  For each level L in H, from the topmost level down to Ls
    Identify all members on L with good coverage of S
      (a member M has good coverage of S if the ratio of the number of its descendants in S to the number of all of its descendants on level Ls is greater than or equal to the Cache Ratio)
    Add each such member M to the list of slicers on level L
  For each level with a nonempty list of slicers
    Create a Sonar subcube with granularities on the attributes corresponding to the levels of H from the topmost level down to Ls
    Apply the list of slicers to the subcube

Output: the collection of generated Sonar subcubes
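The heart of Algorithm 1 is the coverage test, which can be sketched in a few lines of Python. This is an illustration using the day counts from this post, not the engine’s actual code:

```python
# Simplified "good coverage" test from Algorithm 1: member M is a good cover
# of set S if the fraction of M's level-Ls descendants that belong to S is at
# least the Cache Ratio (0.5 by default).

DEFAULT_CACHE_RATIO = 0.5

def good_coverage(descendants_in_set, total_descendants,
                  cache_ratio=DEFAULT_CACHE_RATIO):
    return descendants_in_set / total_descendants >= cache_ratio

# Query 2's set: all 31 days of January 2007 plus February 1, 2007.
jan = good_coverage(31, 31)   # [January 2007]: 31 of its 31 days are in the set
feb = good_coverage(1, 28)    # [February 2007]: only 1 of 28 days, so FE descends
feb1 = good_coverage(1, 1)    # [February 1, 2007]: a leaf always covers itself
```

This is exactly why the traces below stop at the month level for January but descend to individual dates for February.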

First, apply Algorithm 1 to the [Calendar] set in Query 1

head(descendants([Date].[Calendar].[Calendar Year].&[2007], [Date].[Calendar].[Date]), 31)

you get the following sequence of slices, encoded in internal data IDs; eventually FE finds a single good-cover slice at the month level.

Sonar [Calendar] [1 * * * * *] ratio: 31/1188
  Sonar [Calendar] [1 4 * * * *] ratio: 31/365
    Sonar [Calendar] [1 4 6 * * *] ratio: 31/181
      Sonar [Calendar] [1 4 6 8 * *] ratio: 31/90
        Sonar [Calendar] [1 4 6 8 20 *] ratio: 31/31 ← good cover

The resulting Sonar subcube is

Sonar Subcube 1
Calendar Year  Calendar Semester  Calendar Quarter  Month Name    Date
2007           H1 CY 2007         Q1 CY 2007        January 2007  *

Here and in the rest of the post, I show only the important attributes of each Sonar subcube, namely those corresponding to the levels of the [Calendar] hierarchy. Next, apply Algorithm 1 to the [Calendar] set in Query 2

head(descendants([Date].[Calendar].[Calendar Year].&[2007], [Date].[Calendar].[Date]), 32)

FE finds two good cover slices, one on the month level and the other on the date level.

Sonar [Calendar] [1 * * * * *] ratio: 32/1188
  Sonar [Calendar] [1 4 * * * *] ratio: 32/365
    Sonar [Calendar] [1 4 6 * * *] ratio: 32/181
      Sonar [Calendar] [1 4 6 8 * *] ratio: 32/90
        Sonar [Calendar] [1 4 6 8 20 *] ratio: 31/31 ← good cover
        Sonar [Calendar] [1 4 6 8 21 *] ratio: 1/28
          Sonar [Calendar] [1 4 6 8 21 582] ratio: 1/1 ← good cover

The two Sonar subcubes created are:

Sonar Subcube 2.1
Calendar Year  Calendar Semester  Calendar Quarter  Month Name    Date
2007           H1 CY 2007         Q1 CY 2007        January 2007  *

Sonar Subcube 2.2
Calendar Year  Calendar Semester  Calendar Quarter  Month Name     Date
2007           H1 CY 2007         Q1 CY 2007        February 2007  February 1, 2007


Finally, apply Algorithm 1 to the [Calendar] set in Query 3

head(descendants([Date].[Calendar].[Calendar Year].&[2007], [Date].[Calendar].[Date]), 33)

FE finds three good cover slices, one on the month level and two on the date level.

Sonar [Calendar] [1 * * * * *] ratio: 33/1188
  Sonar [Calendar] [1 4 * * * *] ratio: 33/365
    Sonar [Calendar] [1 4 6 * * *] ratio: 33/181
      Sonar [Calendar] [1 4 6 8 * *] ratio: 33/90
        Sonar [Calendar] [1 4 6 8 20 *] ratio: 31/31 ← good cover
        Sonar [Calendar] [1 4 6 8 21 *] ratio: 2/28
          Sonar [Calendar] [1 4 6 8 21 582] ratio: 1/1 ← good cover
          Sonar [Calendar] [1 4 6 8 21 583] ratio: 1/1 ← good cover

Again, FE created two Sonar subcubes.

Sonar Subcube 3.1
Calendar Year  Calendar Semester  Calendar Quarter  Month Name    Date
2007           H1 CY 2007         Q1 CY 2007        January 2007  *

Sonar Subcube 3.2
Calendar Year  Calendar Semester  Calendar Quarter  Month Name  Date
*              *                  *                 *           +

Note that Sonar Subcube 2.2 retains slices on all levels of the [Calendar] hierarchy while Sonar Subcube 3.2 only has a slice on the lowest level of the [Calendar] hierarchy. This is an artifact of the current implementation. When multiple covering members of a hierarchy are added to a Sonar subcube, only a single slice is created at the lowest level of the hierarchy.

FE uses Sonar subcubes in several different places, including constructing a larger evaluation node when FE is executing in cell-by-cell mode. In the simple case where an MDX query can be answered directly from SE, Sonar subcubes are used to construct SE queries.

Algorithm 2: Evaluate cell values
Input: a cell iterator and the current Sonar registry

For each cell C
  Find the Sonar subcube S that contains C
  Build an evaluation node E for S
  Execute E (in the simplest case, create an SE subcube SSE from S and send SSE to SE)
  Look up the value of C from the cached result in E

Output: cell values

Another use of Sonar subcubes is to build a prefetch subcube from a given SE query subcube. The full algorithm is again very complex, with exceptions and special handling of arbitrary shapes, unary operators, etc. Here I include a simplified version to illustrate the essence of the algorithm:

Algorithm 3: Build a prefetch subcube for a query subcube
Input: an SE query subcube Squery and the current Sonar registry

Initialize the prefetch subcube Sprefetch as a clone of Squery
For each Sonar subcube Ssonar in the Sonar registry
  For each attribute Aprefetch in Sprefetch
    Find the corresponding attribute Asonar in Ssonar
    Skip Asonar if it is not a granularity attribute
    If Aprefetch is a granularity attribute
      If the slice of Asonar is a superset of the slice of Aprefetch
        Copy the slice of Asonar to Aprefetch
    Else
      If the slice of Asonar includes all members
        Make Aprefetch a granularity attribute

Output: Sprefetch if it is different from Squery
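To make the attribute-by-attribute behavior of Algorithm 3 concrete, here is a toy Python model. It is a deliberate simplification under my own encoding, not the engine’s code: a subcube is a dict from attribute name to slice, "*" stands for the all-members slice, and an attribute absent from a dict is not a granularity attribute. Feeding it Query 3’s second SE subcube and the two Sonar subcubes shows every slice widening to *:

```python
ALL = "*"  # the "all members" slice

def is_superset(a, b):
    """Is slice a a superset of slice b? ALL covers every slice."""
    if a == ALL:
        return True
    if b == ALL:
        return False
    return set(b) <= set(a)

def build_prefetch(query, sonar_registry):
    """Simplified Algorithm 3: widen the query subcube using Sonar subcubes."""
    prefetch = dict(query)  # clone Squery
    for sonar in sonar_registry:
        for attr, sonar_slice in sonar.items():  # granularity attributes only
            if attr in prefetch:
                # Widen the slice when the Sonar slice covers it.
                if is_superset(sonar_slice, prefetch[attr]):
                    prefetch[attr] = sonar_slice
            elif sonar_slice == ALL:
                # Promote a non-granularity attribute to a granularity attribute.
                prefetch[attr] = ALL
    return prefetch if prefetch != query else None

# The second SE query of Query 3: the first two days of February 2007.
query = {"Year": {"2007"}, "Semester": {"H1"}, "Quarter": {"Q1"},
         "Month": {"Feb 2007"}, "Date": {"Feb 1", "Feb 2"}}
sonar_3_1 = {"Year": {"2007"}, "Semester": {"H1"}, "Quarter": {"Q1"},
             "Month": {"Jan 2007"}, "Date": ALL}
sonar_3_2 = {"Year": ALL, "Semester": ALL, "Quarter": ALL,
             "Month": ALL, "Date": {"Feb 1", "Feb 2"}}
prefetch = build_prefetch(query, [sonar_3_1, sonar_3_2])
# Every slice ends up widened to ALL: all partitions would be scanned.
```

Sonar Subcube 3.1 contributes the * on [Date], and Sonar Subcube 3.2 contributes a * on every other attribute, because each attribute is compared independently.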

In Query 1, Sonar Subcube 1 is equivalent to the SE query subcube, so no prefetch subcube is needed. In Query 2, the first SE query subcube matches Sonar Subcube 2.1, so again no prefetch subcube is needed. But the second SE query copies the * from [Date] in Sonar Subcube 2.1 and therefore produces a prefetch subcube that fetches all days in February. The * in Sonar Subcube 2.1 really means all days in January 2007, but Algorithm 3 treats each attribute independently and is oblivious to the fact that [Date]:* is cross-filtered by the slices on the other attributes. Still, the end result of prefetching all days in February 2007 isn’t too bad.

Prefetch Subcube 1
Calendar Year  Calendar Semester  Calendar Quarter  Month Name     Date
2007           H1 CY 2007         Q1 CY 2007        February 2007  *

But when we come to Query 3, there is a * on each attribute, coming from either Sonar Subcube 3.1 or Sonar Subcube 3.2, so we end up with a prefetch subcube that has * on all of its attributes.

Prefetch Subcube 2
Calendar Year  Calendar Semester  Calendar Quarter  Month Name  Date
*              *                  *                 *           *

Nowadays we are seeing more and more implementations of systems with multiple terabytes of data in Analysis Services. The above example of overly aggressive prefetching of fact data is unacceptable in such environments. You can disable prefetching altogether by setting the connection string property Disable Prefetch Facts=true, or you can play with the Cache Ratio connection string property to influence the Sonar subcubes generated by FE. But either approach is likely to have a negative performance impact that would require a lot of testing to confirm its overall benefit against a given query set. If you can control the types of queries in your deployment, it is better to replace queries that fall into this pitfall with ones that produce good Sonar subcubes under the default settings. In our example, since we know there are 90 days in the first quarter and the default Cache Ratio is 0.5, we can simply ask for 45 days instead of 33 to produce a single good cover at the quarter level.
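The back-of-the-envelope arithmetic behind this workaround, using the simplified coverage test from Algorithm 1 (the real engine counts actual attribute members, so treat the numbers as illustrative):

```python
CACHE_RATIO = 0.5  # default value of the Cache Ratio connection string property
Q1_DAYS = 90       # number of days under [Q1 CY 2007]

def quarter_is_good_cover(days_requested):
    # Algorithm 1's coverage test applied at the quarter level.
    return days_requested / Q1_DAYS >= CACHE_RATIO

q3 = quarter_is_good_cover(33)  # 33/90 < 0.5: FE descends to month and date levels
q4 = quarter_is_good_cover(45)  # 45/90 == 0.5: one good cover at the quarter level
```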

Query 4:

// January 2007 + February 1-14 2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007], [Date].[Calendar].[Date]), 45) on 1
from [Adventure Works]

 

The captured profiler trace confirms our conjecture: it shows a single SE query to the [Internet Sales] measure group retrieving all days in the first quarter of 2007.

Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0 0 * 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1 CY 2007]
[Calendar Semester]:[H1 CY 2007]
[Month Name]:*
[Calendar Year]:[CY 2007]