Thursday, January 6, 2011

Execution Plans and Plan Hints for MDX IIF Function and CASE Statement

MDX IIF function, and similarly MDX CASE statement, is often a cause of performance issues during execution of MDX formulas.  As a developer who worked on this feature, I'll explain the execution plans considered by the MDX formula engine (FE) in this post.

Introduction to some FE concepts

Before I delve into IIF function, I first need to introduce some basic concepts used by MDX FE. I admit that these concepts can be very complex and are worth several dedicated blog posts. The cursory description here can only give you some general ideas to help you understand execution plans for IIF. You can skip this section directly to the discussion about IIF plans and come back here for references when you need to.
  • Subspace
Every MDX expression is evaluated in a current context which includes a subspace.  A subspace is a subset of cells within the cube space. The original subspace is defined by the MDX query, for example, evaluate a measure expression for all product categories and all years. For a given MDX expression, FE will produce a result, like a value for scalar expressions, for every cell in the current subspace.
  • Rectangular subspace vs. arbitrary-shaped subspace
FE represents most subspaces in a very efficient rectangular shape which is a crossjoin of all attributes with non-trivial slices. Many common operations are fast and efficient when performed on rectangular subspaces. For example, search for a cached execution plan; determine whether a cell belongs to a subspace; decide whether an MDX scoped calculation applies to a subspace. But sometimes FE has to create arbitrary-shaped subspaces which you can imagine as rectangular subspaces plus additional constraints such as an enumerated set of tuples. { food, drink } x { 2009, 2010 } is an example of rectangular subspace. { (food, 2009), (drink, 2010) } is an example of arbitrary-shaped subspace. In general, Arbitrary-shaped subspaces are bad for performance.
  • Cell-by-cell mode vs. block mode
FE can evaluate a given MDX formula in cell-by-cell mode, in which case FE first builds a context for each cell in the subspace and then calculates the expression in each single-cell context, or in block mode, in which case FE builds a context which contains all cells in the subspace and evaluates the expression in a single scoop. Microsoft introduced MDX block mode execution in SQL Server 2005 and extended it to much more functions in SQL Server 2008. Typically block mode execution is orders of magnitude faster than cell-by-cell execution.
  • Logical plan
An MDX expression is evaluated in multiple stages, at one stage, a logical plan tree is built. If you have any SQL background, don't confuse the logical plan for an MDX expression with the logical plan of relational engine. In particular, MDX FE may need to execute some subtrees in the middle of building a large logical plan tree.
  • Varying attributes
The varying attributes of an MDX expression in a given subspace are those attributes that have more than one member in the subspace and on which the MDX expression depends. Varying attributes of an expression is derived after FE has built a logical plan tree for the expression.
  • Sparse expression vs. dense expression
An MDX scalar expression is called sparse in a subspace if most of the time it produces a single value, typically NULL. For example, physical measures are sparse in typical query subspaces consisting of crossjoin of dimensional attributes. If an MDX expression is not sparse, it is called dense.
  • Default value
The common single value of a sparse MDX scalar expression is called the default value of that expression. The default value of an MDX scalar expression is determined after FE has built a logical plan for the expression.

Execution plans for IIF function


Now we are ready to explore the execution plans of IIF. Implementation of IIF underwent a major overhaul in SQL Server 2008. In SQL Server 2005, there is limited support of block mode for most MDX functions, therefore IIF branches are more often evaluated in cell-by-cell mode. When SQL Server 2005 is able to evaluate IIF in block mode, it evaluates both branches in the original subspace with some special flag to indicate that the subspace may be too large, see eager mode below.

In this post I focus on the algorithm adopted by SQL Server 2008 and above. As a first step, FE transforms a single IIF function call into nested IIF function calls by eliminating AND operators and OR operators in the predicate expression. Notice that one branch expression is duplicated as a result of this rewrite.

        IIF(a AND b, x, y) => IIF(a, IIF(b, x, y), y)
        IIF(a OR b, x, y) => IIF(a, x, IIF(b, x, y))

Next, FE builds a logical plan for the simplified predicate expression. Note that FE may execute some subtrees while building a logical plan tree, this is a big topic which I have to discuss in a seperate blog post. After FE has built a logical plan tree for the predicate expression, FE performs constant folding by checking whether the simplified predicate is constant in the current subspace, and if so, reducing IIF function to one of its branch expressions.

The next step is to build logical plans for the then branch expression and the else branch expression. FE can evaluate each branch expression in one of two modes: eager mode and strict mode. As mentioned before, the IIF expression is evaluated in a subspace, let's denote it as S0. FE can either evaluate the then branch expression in S0 or FE can build a new subspace, S1, by restricting S0 to those cells that the predicate is true. Similarly, FE can either evaluate the else branch expression in S0 or FE can build another new subspace, S2, by restricting S0 to those cells that the predicate is false. If FE evaluates a branch expression in the original subspace S0, we call it in eager mode. If FE evaluates a branch expression in the restricted subspace S1 or S2, we call it in strict mode.

Strict mode has the obvious benefit of reducing the number of cells an expression is evaluated, so why doesn't FE choose it all the time? As it turns out, constructing a new subspace by evaluating the predicate will produce an arbitrary-shaped subspace if the varying attributes of the predicate expression contain more than one attribute. In general, arbitrary-shaped subspace can lead to bad performance due to various reasons such as cache misses. Moreover, if a branch expression is simple enough, like when it is a constant value, it doesn't make sense to build a new subspace at all.

MDX gives user direct control over whether a branch is executed in eager mode or strict mode through branch hints.  For example, IIF(p, x HINT EAGER, y HINT STRICT) tells FE to build logical plan in eager mode for the then branch and in strict mode for the else branch.

When there is no direct hint from user, FE takes several steps to come up with a final logical plan for each branch expression. It first tries to build a logical plan for each branch in eager mode if it is cheap. It is always cheap to build a plan for a constant value expression. When branch expressions are more complex, FE looks at the varying attributes of the predicate logical plan. If the varying attributes contains more than one attribute, hence will produce an arbitrary-shaped subspace which makes an eager plan more appealing, FE first tries to build an eager logical plan tree for a branch expression if it can be cheaply done, in other words, if it can build a plan tree without executing any subtree. In the cheap mode, if at any point in the process of building a logical plan tree for a branch expression, FE finds out that it needs to execute a subtree before it can proceed, FE will abandon the process and roll back all the way to the root of the current tree. I have seen bad performance caused by this trial-and-error process when a deeply nested IIF tree recursively tried to build a cheap plan, aborted the effort, built a real plan, and then repeated this process at the next nesting level. FE may abandon a cheaply built plan when the predicate expression is sparse and a branch with a cheaply built eager plan is on the minority side. For example, if the predicate is mostly true, the else branch is on the minority side. If a branch is on the minority side and if it is dense or if its default value doesn't match the default value of the IIF expression, its cheaply built eager plan will be abandoned.

At this stage, if FE still doesn't have a logical plan for a branch expression, FE will build a logical plan for that branch in strict mode. It first produces a new subspace for the branch by actually executing the predicate expression, producing a materialized list of tuples of the varying attributes of the predicate that satisfy the condition for that branch, and then applying the list of tuples to the original subspace as a new constraint. FE then generates a logical plan of the branch expression in the new subspace.

As you can see, MDX FE employes a rule-based approach when building execution plans for IIF function.  FE makes decisions based on sparsity and complexity of sub-expressons. Since sparisity of MDX expression is based on heuristics rather than statistics, FE may make the wrong decision which is why we need user hints.

IIF functions are often deeply nested, although this may not be obvious at first glance. As mentioned earlier in the post, AND and OR operators in IIF predicate are rewritten into nested IIF calls. Searched CASE statement is internally rewritten as nested IIF calls as well. Moreover, MDX script IF statement is also internally mapped to an IIF function call. All these internal tree rewrites make deeply nested IIF trees a common phenomenon in many cube.

Although MDX FE employes a complex algorithm to build a final plan for the IIF function, we can still learn some basic lessons that may help us understand performance issues associated with the function.
  • Strict mode helps when a branch expression is very expensive. It can be a lot of extra data has to be read from disk for those extra cells in eager mode. It can also be a very expensive MDX scoped calculation has to be included for one unwanted cell in eager mode. Try plan HINT STRICT if you think AS fetches too much data from disk as a result of evaluating a branch in eager mode.
  • All the trial-and-errors when building the eager plan can be a source of bad performance when the same IIF function is evaluated many times, typically since the parent function chooses cell-by-cell mode, or when deeply nested IIF calls recursively try eager mode and then fall back to strict mode.  Adding plan hints in this case will force FE to make up its mind right away.
  • Order of predicates in nested IIF calls can be important. A wrong condition at the top level may produce an expensive arbitrary-shaped subspace which makes it expensive to evaluate a child subtree. Or a right condition at the top may save a lot of time by eliminating the need to evaluate a more expensive condition at lower level.  Keep in mind that AND and OR operators are rewritten as nested IIF so the order in AND and OR operators is important for the same reason.
  • Sometimes cell-by-cell mode can be fast especially when a top level function is in cell-by-cell mode. I have seen cases where MDX FE spends a lot of time building plans for a branch expression but never execute the plans as the IIF condition was never met. When the IIF function itself is called many times since its parent is in cell-by-cell mode, all the time FE spent on building unused plans is wasted. Here you may try an undocumented HINT LAZY which applies to all MDX expressions but only takes effect when a server config flag LazyEnabled is set to 1.  Using this hint will force FE to delay evaluating the MDX expression until execution time.  No logical plan is built by traversing the expression tree.  WARNING: You will not get support from Microsoft if you use this hint on your own.  You must get approval from Microsoft Customer Service and Support if you want to use this hint in your production system.
Execution plans for CASE statement

There are two types of CASE statements: simple case statement and search case statement.

Search case statement is internally rewritten as nested IIF function calls in the following way:

CASE
WHEN p1 THEN e1
WHEN p2 THEN e2
...
WHEN pN THEN eN
ELSE eL
END

=>

IIF (p1, e1, IIF(p2, e2, ... IIF(pN, eN, eL)...))

After the rewrite, FE executes all IIF functions according to the algorithm outlined in the previous section.

SQL Server 2008 SP2 CTP1 introduced a hint for search case statement. The syntax of hint for CASE is:

CASE HINT <EAGER|STRICT>
WHEN ... THEN ...
...
ELSE ...
END

Internally, the provided hint is propagated to all branches of all IIF functions generated during the rewrite. Afterwards, FE will use the hint in the same way as hints provided directly to an explicit IIF function.

The simple case statement has an execution plan similar to that of IIF function. The main difference is that IIF function only has two branches while simple case statement has N branches. As of this writing, users cannot specify eager or strict hints to simple case statements.

In summary, since MDX IIF function and CASE statement are frequently a source of performance problems, using plan hints may help you solve your problem. Note that plan hints for IIF function are only available in SQL Server 2008 and later and plan hints for search case statement are introduced after SQL Server 2008 SP2 CTP1.  I have to admit trying out plan hints is like shooting in the dark due to the lack of end user tools to display MDX execution plan trees.

8 comments:

  1. Great post, Loved going through it. Took me some time and many reads to finally understand it though :).. Thanks

    ReplyDelete
  2. Great post! We want more! We want more!

    ReplyDelete
  3. Awesome Post!!! Would you be able to let me know of any ebooks, books or sites that dwell on execution of MDX queries as well as performance tuning of MDX queries?
    Thanks in Advance.....

    ReplyDelete
  4. The most recent materials on this topic would be

    Analysis Services 2008 R2 Performance Guide and SQL Server 2008R2 Analysis Services Operations Guide

    http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/10/10/analysis-services-2008-r2-performance-guide.aspx

    http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/06/01/sql-server-2008r2-analysis-services-operations-guide.aspx

    Many people have published all kinds of blogs on MDX in the past, like Mosha Pasumansky, Chris Webb, Marco Russo, etc.

    One of the insider books was Microsoft SQL Server 2005 Analysis Services.

    If you have a specific question, you can always ask it on the MSDN forum for Analysis Services.

    http://social.msdn.microsoft.com/forums/en-US/sqlanalysisservices/threads/

    ReplyDelete